gokhan | good morning noonedeadpunk, I am trying distribution upgrades, but when reinstalling infra nodes, lxc containers are not created with "openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*". it seems we also need to add lxc_hosts when using a limit | 06:27 |
gokhan | noonedeadpunk, I found it, we also need to add reinstalled_host,reinstalled_host-host_containers. we need to update the distribution upgrade document | 07:48 |
noonedeadpunk | gokhan: oh, yes, sure | 08:32 |
noonedeadpunk | you definitely need it | 08:32 |
noonedeadpunk | however, I somehow thought that reinstalled_host* includes both it and containers? | 08:32 |
noonedeadpunk | gokhan: like, when I do `ansible -m ping os-control01*` I get both host and containers | 08:33 |
noonedeadpunk | so `openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*` should do the trick? | 08:34 |
gokhan | noonedeadpunk, sorry, it was my fault, I didn't use the asterisk * after the node name :( it is working | 08:36 |
noonedeadpunk | yeah, asterisk is important there :) | 08:38 |
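A minimal sketch of the pattern resolved above, with `reinstalled_host` standing in for the actual node name; the trailing asterisk is what pulls the host's LXC containers into the limit:

```bash
# Confirm the wildcard matches both the bare host and its containers
ansible -m ping reinstalled_host*

# With the asterisk, a single limit entry covers host + containers; lxc_hosts is not needed
openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*
```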
gokhan | noonedeadpunk, in this command "openstack-ansible set-haproxy-backends-state.yml -e hostname=<infrahost> -e backend_state=disabled --limit reinstalled_host", is infrahost the reinstalled host or another infrahost | 08:58 |
gokhan | ? | 08:58 |
noonedeadpunk | good question | 09:04 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable RPC configuration for Neutron with OVN in CI https://review.opendev.org/c/openstack/openstack-ansible/+/908521 | 09:33 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable RPC configuration for Neutron with OVN in CI https://review.opendev.org/c/openstack/openstack-ansible/+/908521 | 09:34 |
noonedeadpunk | so, seems that centos9 is broken on nova-compute | 09:41 |
noonedeadpunk | and also we have broken CI overall... | 09:42 |
noonedeadpunk | jobs are not scheduled due to zuul config error | 09:43 |
noonedeadpunk | https://review.opendev.org/c/openstack/openstack-ansible/+/908322 to solve it | 09:43 |
noonedeadpunk | gokhan: sorry got distracted | 09:43 |
noonedeadpunk | gokhan: yes infrahost is reinstalled_host in this context | 09:44 |
noonedeadpunk | would be good to align these 2 in doc | 09:45 |
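To make that alignment concrete, a hedged restatement of the command from the chat with both placeholders pointing at the same node, i.e. the one being reinstalled:

```bash
# <reinstalled_host> is the node being reinstalled, used for both -e hostname and --limit
openstack-ansible set-haproxy-backends-state.yml \
  -e hostname=<reinstalled_host> -e backend_state=disabled --limit <reinstalled_host>
```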
andrewbonney | I can fix that in my patch. Should have it ready to go within a day or two | 09:53 |
noonedeadpunk | NeilHanlon: just in case you might be interested, libvirt 9.10 has a nasty regression https://fedoraproject.org/wiki/Changes/LibvirtModularDaemons | 10:48 |
noonedeadpunk | ugh | 10:48 |
noonedeadpunk | https://issues.redhat.com/browse/RHEL-20609 | 10:48 |
noonedeadpunk | andrewbonney: we're having another OS upgrade next week, so can practice it a bit :) | 11:00 |
halali | folks, it would be good to land/merge this change soon https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/907708 :) | 11:04 |
gokhan | noonedeadpunk, when running "openstack-ansible setup-infrastructure.yml --limit localhost,repo_all,rabbitmq_all,reinstalled_host*", it throws an error when creating the wheel directory on the build host | 11:07 |
gokhan | https://paste.openstack.org/show/bTsHar3QBasMsJk1UyEP/ | 11:07 |
gokhan | the error is "msg": "chown failed: failed to look up user nginx", the nginx user is not created on the repo container | 11:09 |
noonedeadpunk | gokhan: maybe it failed somewhere before that? | 11:10 |
noonedeadpunk | what if you run repo-install.yml? | 11:10 |
gokhan | noonedeadpunk, I am checking now | 11:11 |
noonedeadpunk | as it feels like the repo installation also potentially failed. | 11:11 |
noonedeadpunk | As the task is delegated to the repo container | 11:11 |
noonedeadpunk | and it expects nginx to be present there | 11:12 |
gokhan | noonedeadpunk, when I run "openstack-ansible repo-install.yml --limit localhost,dev-compute1*", I am getting: failed: dev-infra1-repo-container-c3e5f3be is either already part of another cluster or having volumes configured | 11:14 |
gokhan | I have previously removed this infra node from peers | 11:15 |
noonedeadpunk | ok, frankly speaking I'm not that an expert in gluster... And we don't use it locally either... | 11:17 |
noonedeadpunk | So I can hardly help with this part | 11:18 |
noonedeadpunk | But I know andrewbonney dealt with it | 11:18 |
noonedeadpunk | there's a doc update covering removing the brick in advance: https://review.opendev.org/c/openstack/openstack-ansible/+/906832/2/doc/source/admin/upgrades/distribution-upgrades.rst | 11:18 |
noonedeadpunk | L195 | 11:19 |
andrewbonney | Assuming the brick/peer was removed in advance, the issue may be that the repo install needs to run against all hosts (no limit) | 11:20 |
gokhan | noonedeadpunk, sorry it was my fault again :( repo-install.yml is commented out in setup-infrastructure.yml :( | 11:20 |
noonedeadpunk | heh, ok :) | 11:21 |
gokhan | I previously followed https://review.opendev.org/c/openstack/openstack-ansible/+/906832/2/doc/source/admin/upgrades/distribution-upgrades.rst and removed the brick/peers | 11:21 |
gokhan | noonedeadpunk, you are right repo install needs to run against all hosts | 11:22 |
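A hedged sketch of the fix that worked here: since gluster peering is cluster-wide state, the repo playbook is run without a host limit once the old brick/peer has been removed per the distribution-upgrade doc linked above:

```bash
# Run against all repo hosts so the rebuilt container can join the existing gluster cluster
openstack-ansible repo-install.yml
```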
gokhan | thanks noonedeadpunk andrewbonney it is now working | 11:22 |
noonedeadpunk | was not me, but ok | 11:22 |
andrewbonney | :) | 11:22 |
noonedeadpunk | also your original paste does limit repo_all | 11:23 |
noonedeadpunk | so I guess it should be fine | 11:23 |
gokhan | yes my original paste does limit repo_all but when I ran it repo-install.yml was commented out :( | 11:23 |
noonedeadpunk | yeah, ok, gotcha | 11:24 |
gokhan | noonedeadpunk, mariadb is installed on the new node but it doesn't create the /root/.my.cnf file; I manually created this file | 11:34 |
gokhan | also rabbitmq failed with "To install a new major/minor version of RabbitMQ set '-e rabbitmq_upgrade=true'." | 11:36 |
gokhan | do we need to add "-e rabbitmq_upgrade=true"? | 11:36 |
noonedeadpunk | gokhan: yes, so that is kinda known thing.... | 11:37 |
noonedeadpunk | And I'm not sure about it at all | 11:38 |
noonedeadpunk | Known - missing /root/.my.cnf | 11:38 |
noonedeadpunk | Eventually, with any modern MariaDB you are not supposed to have my.cnf | 11:38 |
noonedeadpunk | As you're expected to login as root through socket auth | 11:38 |
noonedeadpunk | which is default | 11:38 |
noonedeadpunk | And old envs that have root auth messed up would struggle with not being able to auth as root without my.cnf | 11:39 |
noonedeadpunk | I guess we might want to add a note about that.... | 11:39 |
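A minimal check, assuming the socket-auth setup described above is in place: run as root on the galera container/host, no /root/.my.cnf required:

```bash
# root authenticates over the unix socket, so no credentials file is needed
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
```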
noonedeadpunk | not sure about rabbit, but potentially yes | 11:40 |
noonedeadpunk | actually, another thing about `rabbitmq_upgrade=true` is that it feels like a bad/wrong approach when quorum queues are enabled | 11:40 |
noonedeadpunk | What I see in my sandbox now, is that rabbitmq behaves like mysql more or less - being in `activating` state until it can get clustered properly | 11:41 |
noonedeadpunk | And our rabbitmq_upgrade currently stops everything except 1 node "by design" | 11:41 |
noonedeadpunk | but it's future problem.... | 11:41 |
noonedeadpunk | or well | 11:41 |
noonedeadpunk | not for you gokhan at least :) | 11:41 |
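For the rabbitmq error quoted earlier, a hedged sketch of the flag the message asks for; note the caveat above that rabbitmq_upgrade stops everything except one node by design:

```bash
# Only one rabbitmq node is kept running during this, so expect some disturbance
openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true
```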
gokhan | noonedeadpunk, yes, when I tried to check the galera status from the deployment host, I realized that .my.cnf is missing on the new host. I need it when checking the status. | 11:41 |
gokhan | yes, in my env mirrored queues are enabled :) | 11:42 |
noonedeadpunk | mirroring of queues != quorum queues | 11:42 |
noonedeadpunk | these are 2 very distinct things and switching is not very trivial, available only since 2023.2 | 11:43 |
noonedeadpunk | and mirrored queues are considered deprecated at this point | 11:44 |
gokhan | are quorum queues enabled by default on bobcat? is there any migration path from mirrored queues to quorum queues? | 11:47 |
noonedeadpunk | no, not default, yes, upgrade is possible | 11:52 |
noonedeadpunk | but it's involving some downtime/disturbance | 11:52 |
noonedeadpunk | eventually, the upgrade path is already there. The problem is that to upgrade to quorum, you actually need to drop the existing vhost and create a new one which will be replicated | 11:53 |
noonedeadpunk | So after removing the vhost for the service (which happens around the beginning) and until the playbook ends - the service might misbehave | 11:54 |
noonedeadpunk | But so far in sandbox experience is waaay better | 11:54 |
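A hedged sketch of how that switch is typically driven from user_variables.yml; the variable name is written from memory and should be verified against the 2023.2 release notes, and re-running the service playbooks afterwards is what triggers the vhost drop/recreate described above:

```bash
# Variable name is an assumption -- double-check against the 2023.2 release notes before use
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
oslomsg_rabbit_quorum_queues: true
EOF
```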
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Add the abillity to configure the logging options https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/902908 | 11:59 |
gokhan | thanks for information noonedeadpunk :) | 12:09 |
NeilHanlon | noonedeadpunk ah.. yeah. i had heard about that in the Integration SIG.. :\ | 13:46 |
noonedeadpunk | if you around... can you check this backport pls?:) https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/907708 | 13:47 |
spatel | mgariepy morning! | 14:35 |
spatel | any luck with CAPI? | 14:35 |
mgariepy | didn't have time to try it. | 15:02 |
mgariepy | it's for my future self ;) haha | 15:03 |
nixbuilder | I know this may not be the proper place for this question... however I need to know if anyone has a procedure for deleting images and volumes using only mysql? There are a few images/volumes that are in error. Somehow the image/volume already was deleted on our SAN but not within the openstack databases. I am attempting to clean this up. | 15:10 |
noonedeadpunk | update volumes set deleted = 1, deleted_at = "2024-02-09 15:11:23" where id = UUID ? | 15:12 |
noonedeadpunk | but eventually for volumes specifically - it should not get to error if backing device is gone | 15:12 |
noonedeadpunk | it should be marked as deleted properly | 15:12 |
noonedeadpunk | So you should be able to issue a delete request through the api | 15:13 |
nixbuilder | noonedeadpunk: from what I can tell cinder makes a call through the SAN driver to delete the volume, that call fails because the volume is not there and then I get an "error deleting" status on the volume. But I will try your suggestion. | 15:16 |
noonedeadpunk | huh | 15:16 |
noonedeadpunk | Ok, that's different with ceph. Or well. It still tries to issue the request to ceph, ceph says - no image, and cinder happily marks it as "deleted" afterwards. | 15:17 |
noonedeadpunk | So potentially a bug in the driver, as I would expect such an exception to be caught | 15:17 |
nixbuilder | noonedeadpunk: Perhaps a bug in the driver... as always thanks for your help! | 15:18 |
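Expanding the one-liner above into a hedged sketch covering both volumes and images; the column names and the default database names (cinder, glance) are written from memory, so back up the databases and check the actual schema before running anything like this:

```bash
# Mark an orphaned volume and image as soft-deleted directly in MySQL (use with care)
mysql cinder -e "UPDATE volumes SET deleted = 1, status = 'deleted', deleted_at = NOW() WHERE id = '<volume-uuid>';"
mysql glance -e "UPDATE images SET deleted = 1, status = 'deleted', deleted_at = NOW() WHERE id = '<image-uuid>';"
```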
drarvese | Greetings! I'm running into an issue during the Keystone playbook where I get a "504 Gateway timeout" when adding the service project -- https://paste.openstack.org/show/bwzv3tuyCp8mLQNaTf5w/. Does anyone have any ideas? This is an AIO deployment, though I'm not using the bootstrap-aio.sh script or scenarios. This is the second time I've run into this. The previous time (also an AIO | 16:41 |
drarvese | deployment) I was able to get around it by deploying everything on baremetal, but that seems like a really heavy-handed solution. | 16:41 |
noonedeadpunk | o/ | 16:51 |
noonedeadpunk | drarvese: I guess, first question should be if you can access a keystone with curl from the VM? | 16:52 |
noonedeadpunk | meaning - through container IP | 16:52 |
noonedeadpunk | probably you can, as that's container timeout.... | 16:52 |
noonedeadpunk | *API | 16:52 |
noonedeadpunk | and then if you can reach MySQL and what you see in logs inside keystone container | 16:53 |
noonedeadpunk | as that sounds like some kind of connectivity issue to me... | 16:54 |
noonedeadpunk | between what parts is a good question... | 16:54 |
noonedeadpunk | so it can be haproxy -> keystone or keystone -> mysql, keystone -> memcached | 16:54 |
drarvese | Yeah, I can curl the keystone endpoint through its container IP. I can reach MySQL through the utility container. Lemme grab the logs from the keystone container | 17:00 |
noonedeadpunk | Huh, ok, interesting | 17:08 |
noonedeadpunk | and with curl it returns api version and some json? | 17:08 |
drarvese | Yeah | 17:09 |
noonedeadpunk | I guess I would install telnet or smth like that to keystone container and would try to reach mariadb and memcached ips from it | 17:10 |
noonedeadpunk | via ips defined in /etc/keystone/keystone.conf | 17:10 |
noonedeadpunk | oh, btw, can you run smth like `openstack endpoint list` from utility container? | 17:11 |
noonedeadpunk | As I assume you should get same 504? | 17:11 |
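A hedged sketch of the connectivity check suggested above; the container name and backend IPs are placeholders to be read out of /etc/keystone/keystone.conf, and nc/telnet may need installing inside the container first:

```bash
# From the controller, run the checks inside the keystone container
lxc-attach -n <keystone-container> -- nc -vz <galera-ip> 3306       # mariadb
lxc-attach -n <keystone-container> -- nc -vz <memcached-ip> 11211   # memcached
```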
drarvese | Logs from the keystone container: https://paste.openstack.org/show/byw3oUGWo0NzkzXIBHtW/ | 17:19 |
drarvese | And, yes, that returns a 504 | 17:19 |
noonedeadpunk | huh | 17:21 |
noonedeadpunk | according to the log - keystone answers eventually | 17:22 |
noonedeadpunk | log looks quite short though.... | 17:24 |
noonedeadpunk | another thing - have you applied same overrides as for aio? | 17:25 |
noonedeadpunk | ie: https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L74-L81 | 17:25 |
noonedeadpunk | but frankly speaking I'm not sure what's really wrong, given that keystone can connect to memcached and mariadb | 17:26 |
noonedeadpunk | and system is not under some weird load | 17:26 |
drarvese | No, I haven't applied any overrides like that | 17:27 |
noonedeadpunk | ofc you can try to increase timeouts and see if request will eventually pass.... | 17:29 |
noonedeadpunk | there're couple of variables for that: https://opendev.org/openstack/openstack-ansible-haproxy_server/src/branch/master/defaults/main.yml#L244-L251 | 17:29 |
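A hedged sketch of bumping those timeouts via user_variables.yml; the variable names here are from memory and should be double-checked against the linked haproxy_server defaults before relying on them:

```bash
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
haproxy_client_timeout: 120s
haproxy_server_timeout: 120s
EOF
openstack-ansible haproxy-install.yml
```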
noonedeadpunk | BUt in fact I experienced this sort of issues only when keystone was not able to reach memcached due to some firewalling | 17:30 |
drarvese | I'm able to telnet to the MySQL IP (the internal_lb_vip_ip), but not memcached or the IP of the MySQL container | 17:30 |
noonedeadpunk | when the connection was not reset, but dropped | 17:31 |
noonedeadpunk | yeah, so then, when keystone can not reach memcached, it will wait for connection timeout and only then proceed with request | 17:32 |
noonedeadpunk | Which has high probability of timing out on haproxy | 17:32 |
noonedeadpunk | I dunno how the aio is done (and if it is an aio), but the memcached and keystone containers are ideally on the same bridge inside the controller | 17:32 |
noonedeadpunk | so unless it's some multi-node aio - issue is strange | 17:33 |
drarvese | Yeah, they are on the same bridge | 17:34 |
noonedeadpunk | and memcached container does have IP on eth1? | 17:35 |
noonedeadpunk | and running? | 17:35 |
drarvese | Yep | 17:36 |
noonedeadpunk | Then I can only guess the reason might be a disabled net.ipv4.ip_forward or smth like that... | 17:37 |
noonedeadpunk | but that should be set by openstack_hosts role even.... | 17:37 |
noonedeadpunk | drarvese: ok, easy test. comment out memcached in /etc/keystone/keystone.conf, restart service. After that you should be able to issue request from utility container | 17:38 |
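A hedged sketch of that isolation test; the option name follows the usual [cache] section layout and the exact uwsgi unit name varies per release, hence the lookup rather than a hard-coded restart:

```bash
# Inside the keystone container: disable memcached caching, then restart keystone and retry
sed -i 's/^memcache_servers/#memcache_servers/' /etc/keystone/keystone.conf
systemctl list-units | grep -i keystone   # find the keystone uwsgi unit, then restart it
```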
drarvese | That works | 17:40 |
noonedeadpunk | mhm... well... you need to find out why a direct connection within the same bridge does not work... While you can reach the host - you can't reach the other container somehow... | 17:44 |
noonedeadpunk | maybe proxy_arp is needed, but I'd doubt... | 17:44 |
noonedeadpunk | that really feels like some firewall frankly speaking | 17:44 |
noonedeadpunk | drarvese: you will definitely need that working for rabbitmq in the future | 17:47 |
drarvese | Yeah. It does seem like a firewall issue. I'll look closer at that | 17:49 |
noonedeadpunk | from osa prespective - nothing touches firewall | 17:53 |
drarvese | Sigh, it was a firewall issue. The FORWARD iptables chain was configured to deny stuff. | 18:03 |
noonedeadpunk | that would explain it :D | 18:07 |
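For reference, a hedged sketch of how a restrictive FORWARD chain like the one above shows up and a quick, non-persistent workaround; br-mgmt is the OSA default management bridge name and should be adjusted to the local setup:

```bash
iptables -S FORWARD                                  # look for a DROP/REJECT policy or rule
iptables -I FORWARD -i br-mgmt -o br-mgmt -j ACCEPT  # allow traffic bridged between containers
```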
opendevreview | Merged openstack/openstack-ansible master: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908322 | 18:41 |
noonedeadpunk | folks, does anybody know how VLAN in OVN works? :D | 18:54 |
noonedeadpunk | Like - I do see there's a virtual switch, I also see patch-provnet that in nbdb maps to vlan | 18:55 |
noonedeadpunk | as well as all ports in the network | 18:55 |
noonedeadpunk | but the question is more - where does traffic go out from this vlan? | 18:56 |
noonedeadpunk | I mean, if gateway != compute, should the compute have access to the vlan? | 18:56 |
noonedeadpunk | As it feels like there's anyway geneve in between | 18:57 |
noonedeadpunk | jamesdenton: sorry, not sure if you're around, but I guess you might know best :D | 18:57 |
jamesdenton | hi | 19:14 |
jamesdenton | IIRC your gateway nodes will handle non-floatingip traffic always, and compute nodes would handle floatingip traffic when distributed routing is enabled. Otherwise, the gateway nodes handle that too | 19:16 |
jamesdenton | If it's just a provider network (w/o a neutron router) then the computes would need to have access to that vlan | 19:16 |
noonedeadpunk | aha | 19:17 |
noonedeadpunk | and what is non-floating ip traffic then? | 19:17 |
jamesdenton | SNAT | 19:18 |
noonedeadpunk | ok, so routers | 19:18 |
jamesdenton | So, tenant network behind neutron router, likely geneve | 19:18 |
jamesdenton | yes | 19:18 |
noonedeadpunk | and fip in routers if distributed is disabled | 19:18 |
noonedeadpunk | mhm, ok yes | 19:18 |
noonedeadpunk | I somehow started assuming that vlan traffic goes through the gateways as well | 19:18 |
jamesdenton | yep | 19:19 |
noonedeadpunk | but didn't find how to prove that or dismiss | 19:19 |
noonedeadpunk | and I was thinking about the octavia lbaas vlan per se | 19:19 |
jamesdenton | yeah that gateway node is only used when routers are in play, if it's just a VM on a vlan network straight up to the fabric then that's all through the compute | 19:20 |
noonedeadpunk | so it does not have router or anything in ovn | 19:20 |
noonedeadpunk | just being in nbdb confused me I guess :D | 19:20 |
noonedeadpunk | ok, thanks! | 19:20 |
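A hedged sketch of how to inspect the pieces discussed above: the localnet (provnet) port on the logical switch carries the VLAN tag in the northbound DB, and traffic leaves via the per-chassis provider bridge mapping on the compute:

```bash
ovn-nbctl show                                                  # logical switches, routers, ports
ovn-nbctl find logical_switch_port type=localnet                # provnet port and its VLAN tag
ovs-vsctl get open_vswitch . external_ids:ovn-bridge-mappings   # provider bridge mapping on this chassis
```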
jamesdenton | octavia w/ ovn provider does not require the lbaas mgmt network | 19:20 |
noonedeadpunk | yeah, I know that | 19:20 |
jamesdenton | cool | 19:20 |
noonedeadpunk | But it does not have l7 either | 19:20 |
noonedeadpunk | so meh :( | 19:20 |
jamesdenton | it's a little more basic :) | 19:20 |
noonedeadpunk | yeah, I mean, it can replace some use cases, but not all I guess | 19:21 |
noonedeadpunk | btw, I did some cleanup of your octavia ovn patch | 19:21 |
jamesdenton | but cheap! | 19:21 |
jamesdenton | how's the ovn vpnaas stuff coming along? | 19:21 |
noonedeadpunk | and tested it - works nicely | 19:21 |
jamesdenton | oh thank you | 19:21 |
noonedeadpunk | though it's somehow failing CI on quite unrelated failures.... | 19:22 |
noonedeadpunk | jamesdenton: well. it looks very nice | 19:22 |
noonedeadpunk | and about working | 19:22 |
jamesdenton | does it use a namespace? | 19:22 |
noonedeadpunk | I guess I just keep messing up bringing the tunnel up | 19:22 |
noonedeadpunk | it does | 19:22 |
noonedeadpunk | And mixing up which side is left and which is right.... | 19:22 |
noonedeadpunk | So it creates a namespace with ipsec, it does use one more IP from the external network, as it can't share one with the router | 19:23 |
jamesdenton | oh ok, not terrible i guess | 19:23 |
noonedeadpunk | Then it also creates an internal /30 network and wires it up with the router | 19:23 |
jamesdenton | and adds some routes to the router? | 19:24 |
noonedeadpunk | yeah, I believe it does | 19:24 |
noonedeadpunk | I didn't manage to make a pair fully working yet:) | 19:24 |
noonedeadpunk | but all pieces are in place, so it must work | 19:24 |
noonedeadpunk | Ah! And the VPN is running as an extra service, similar to metadata, and is registered in neutron agents | 19:25 |
noonedeadpunk | And uses RPC.... | 19:25 |
jamesdenton | oh nice | 19:25 |
jamesdenton | i'll give the patch a go locally this weekend | 19:25 |
jamesdenton | i could never get a tunnel up in an OVS environment for some reason | 19:25 |
noonedeadpunk | But I really think I'm making some very basic and stupid mistake when bringing 2 VPNs up | 19:25 |
jamesdenton | been a few months since i tried though | 19:25 |
noonedeadpunk | (in the same env) | 19:25 |
noonedeadpunk | I'm also about to look into ovn-bgp-agent really shortly | 19:26 |
noonedeadpunk | but dunno where to take frr from... | 19:26 |
jamesdenton | been keeping eyes on that too | 19:26 |
noonedeadpunk | Yeah, according to internal planning I should have done that 2 weeks ago... | 19:27 |
jamesdenton | don't be so hard on yourself, i'm still working on backlog from 3 years ago | 19:27 |
noonedeadpunk | haha | 19:28 |
noonedeadpunk | yeah, true | 19:28 |
noonedeadpunk | the backlog from 3y ago hasn't gone anywhere | 19:28 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove galera_client from required projects https://review.opendev.org/c/openstack/openstack-ansible/+/908324 | 19:32 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.2: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908280 | 19:33 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908681 | 19:34 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908682 | 19:34 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908682 | 19:35 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908682 | 19:35 |
spatel | jamesdenton hey! after long time | 19:42 |
jamesdenton | hey spatel ! | 19:43 |
spatel | how is your EVPN issue? | 19:43 |
jamesdenton | what's new? | 19:43 |
jamesdenton | we got that worked out... i think there were a few issues but mainly a mismatch between switches on the reserved vlan ranges, in addition to the lack of an infra vlan configuration | 19:44 |
jamesdenton | but we're cookin' now | 19:44 |
spatel | oh so it was a misconfig issue right? | 19:44 |
jamesdenton | yeah, at the end of the day it was | 19:45 |
jamesdenton | our setup is ingress replication, no multicast | 19:45 |
jamesdenton | all is well, for now | 19:45 |
spatel | I am busy building a new DC and a new openstack. I am looking for k8s with sriov support | 19:45 |
jamesdenton | the fun stuff | 19:45 |
spatel | Did you ever run k8s with sriov ? | 19:45 |
jamesdenton | i have not | 19:46 |
spatel | developers want to run a voice application on k8s with sriov support | 19:46 |
spatel | Yes OVN-BGP-AGENT is in my list | 19:47 |
spatel | jamesdenton why are you using ingress replication? | 19:48 |
spatel | Multicast is easy and scalable.. | 19:48 |
jamesdenton | this is the way our network guys wanna run it | 19:49 |
spatel | Ingress is easy so I can understand, but using multicast gives you better control over BUM engineering.. | 19:50 |
spatel | if you don't want to send BUM traffic to the ABC rack then you can do that without any issue :) | 19:50 |