Tuesday, 2024-10-29

opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Set Rocky 9 Ceph jobs as voting  https://review.opendev.org/c/openstack/openstack-ansible/+/933592  09:31
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Move healthcheck playbooks to collection  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/933610  10:46
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Move healthcheck playbooks to the collection  https://review.opendev.org/c/openstack/openstack-ansible/+/933611  10:49
noonedeadpunkI think it's the meeting time?15:02
noonedeadpunk#startmeeting openstack_ansible_meeting15:02
opendevmeetMeeting started Tue Oct 29 15:02:58 2024 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.15:02
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:02
opendevmeetThe meeting name has been set to 'openstack_ansible_meeting'15:02
noonedeadpunk#topic rollcall15:03
noonedeadpunko/15:03
noonedeadpunksorry for the delay - haven't adapted to winter time yet15:03
NeilHanlono/ 15:04
noonedeadpunk#topic office hours15:08
opendevreviewMerged openstack/openstack-ansible master: Bump ansible version to 2.17.5  https://review.opendev.org/c/openstack/openstack-ansible/+/932067  15:09
noonedeadpunkSo yesterday I was able to figure out what's wrong with ceph jobs on rocky, as also we got an incoming bug on the topic15:09
noonedeadpunk#link https://bugs.launchpad.net/openstack-ansible/+bug/2085146  15:09
noonedeadpunkAnd now ceph jobs on rocky are passing15:10
noonedeadpunk#link https://review.opendev.org/c/openstack/openstack-ansible/+/933592  15:10
noonedeadpunkI've suggested to make them voting now15:10
noonedeadpunkthough, NeilHanlon, somehow ansible still does not have ansible_python_interpreter defined. I guess it's smth to take to the ansible community though15:11
jrossero/ hello15:11
noonedeadpunkAnd I've suggested a workaround which should work normally15:11
noonedeadpunkas in ceph-ansible it doesn't look to be heavily used anyway15:11
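The `ansible_python_interpreter` issue above is about ansible's interpreter auto-detection coming up empty. As an illustration only (not ansible's actual code), the idea behind its fallback-list discovery can be sketched like this - the candidate names here are assumptions, loosely modeled on ansible's documented `INTERPRETER_PYTHON_FALLBACK` behaviour:

```python
import shutil

# Hypothetical candidate list -- an illustration of the fallback idea,
# not ansible's real implementation or its exact candidate order.
CANDIDATES = ["python3.12", "python3.11", "python3.10", "python3", "python"]

def discover_interpreter(candidates=CANDIDATES):
    """Return the first candidate interpreter found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

Pinning `ansible_python_interpreter` in inventory (the workaround discussed above) effectively short-circuits this discovery entirely.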
noonedeadpunkas of CI stability, we keep getting `The read operation timed out` for https://releases.openstack.org/constraints/  15:13
noonedeadpunkdoes anyone recall where to see the nodepool provider being used?15:13
noonedeadpunkas I wanna collect some stats about where failures occur15:13
noonedeadpunk#link https://zuul.opendev.org/t/openstack/build/87cb2dbe3f554df6aabd16ecf04f829d  15:13
noonedeadpunkas example15:13
noonedeadpunkaha, here: https://zuul.opendev.org/t/openstack/build/87cb2dbe3f554df6aabd16ecf04f829d/log/zuul-info/inventory.yaml  15:14
jrosser^ yes15:14
NeilHanlonnoted -- thanks for working around that and sorry i've been absent on it15:14
jrosserbut also here https://zuul.opendev.org/t/openstack/build/87cb2dbe3f554df6aabd16ecf04f829d/log/job-output.txt#48  15:15
NeilHanlondouble booked here, so have to drop for another, will respond after15:15
noonedeadpunkah, yeah, fair15:20
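For the provider stats mentioned above, the `nodepool.provider` key in each failed job's `zuul-info/inventory.yaml` can be scraped and tallied. A minimal sketch, assuming the usual Zuul inventory layout (the sample document is invented and the indentation-based parsing is deliberately naive, avoiding a PyYAML dependency):

```python
import re
from collections import Counter

# Invented snippet of a zuul-info/inventory.yaml; the real file carries
# many more keys under each host's vars.
SAMPLE_INVENTORY = """\
all:
  hosts:
    ubuntu-jammy:
      nodepool:
        cloud: rax
        provider: rax-dfw
        region: DFW
"""

def provider_of(inventory_text):
    """Naively pull the first 'provider:' value out of an inventory dump."""
    m = re.search(r"^\s*provider:\s*(\S+)", inventory_text, re.MULTILINE)
    return m.group(1) if m else None

def tally(inventories):
    """Count occurrences per nodepool provider across inventory dumps."""
    return Counter(filter(None, (provider_of(t) for t in inventories)))
```

In practice one would first download each failed build's inventory.yaml (the URL pattern shown above) and then feed the texts to `tally()`.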
noonedeadpunkI also did send patch to move healthcheck playbooks to the collection15:21
noonedeadpunkthough this patch fails on being unable to find them15:21
noonedeadpunk#link https://review.opendev.org/c/openstack/openstack-ansible/+/933611  15:21
noonedeadpunkI think it's actually a red herring and a recheck is needed after the dependency is merged15:21
jrosseris that allowed to have subdirectories of playbooks?15:22
jrosseropenstack.osa.healthcheck.hosts -> plugins-repo/playbooks/healthchecks/hosts.yml15:23
noonedeadpunkI think it should be15:23
noonedeadpunkas this does pass https://zuul.opendev.org/t/openstack/build/ecd042a433314549a9a4eb5ea1b1923a  15:23
noonedeadpunkand it's running it: https://zuul.opendev.org/t/openstack/build/ecd042a433314549a9a4eb5ea1b1923a/log/job-output.txt#13220  15:24
noonedeadpunkmariadb is apparently being late with the 11.4.4 release: https://jira.mariadb.org/projects/MDEV/versions/29907  15:25
noonedeadpunkthough everything seems to be ready for it15:25
noonedeadpunkonce it's published - I think we can finally land the upgrade patch for it15:27
noonedeadpunkI'm also due to send the ptg results out15:28
noonedeadpunkalso, apparently, we've released broken beta due to rabbitmq role being unable to bootstrap after some patch merged there15:30
noonedeadpunkI assume that my bump script doesn't account for some corner cases, which in fact I do not fully understand15:31
noonedeadpunkBlind guess is that it takes some SHA which then gets shadowed by a merge, as the tree is not flat, and the git history doesn't always show it in a good way15:32
noonedeadpunk#endmeeting15:58
opendevmeetMeeting ended Tue Oct 29 15:58:25 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:58
opendevmeetMinutes:        https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-10-29-15.02.html  15:58
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-10-29-15.02.txt  15:58
opendevmeetLog:            https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-10-29-15.02.log.html  15:58
spatelnoonedeadpunk how is your OVN deployment going on ?18:06
noonedeadpunkwell, quite good. I was able to get everything working18:06
noonedeadpunkthough migration from OVS would be very tough for us18:07
spatelFlat OVN or ovn-bgp?18:07
noonedeadpunknow looking closely into running my ovn deployment and getting the bgp agent in18:07
noonedeadpunkflat for now18:07
spatelI have started OVN deployment.18:07
noonedeadpunkovn-bgp is known not to work with net nodes18:07
spatelJust simple OVN 18:07
noonedeadpunkif you are to use the supported northd driver18:07
noonedeadpunkit does with the southbound driver, but it's not gonna be further developed18:08
noonedeadpunkor you need to get rid of net nodes18:08
noonedeadpunkspecifically FIPs not being announced from net nodes, as they attempt to be announced from computes always18:08
noonedeadpunkeven when they should not18:08
spatelhmm18:09
noonedeadpunkconsidered a bug, but never got a chance to look deeper into it since february18:09
spatelAre you seeing any issue with OVN deployment or any tuning :)18:09
spatelHow many compute nodes you have in OVN deployment?18:10
noonedeadpunkah, it's still non-production env. so like 5 computes in each AZ18:10
noonedeadpunkconsider a nice sandbox :)18:10
noonedeadpunk3 azs18:10
spateloh!! 18:11
spatelI am going to production with 200 computes :D18:11
noonedeadpunkso I haven't gotten to the point where I'd need to scale the NB DB18:11
spatelI thought you have story for me 18:11
noonedeadpunkwell. I managed to break the raft cluster and it got completely stuck on me18:12
spatelReally?18:12
noonedeadpunkso one thing I learned - is that while it's "trivial" to restore state of neutron from mysql - it's not for octavia18:12
spatelBut i thought you can re-build raft and use neutron script to re-create all the flow18:12
noonedeadpunkSo realized that https://review.opendev.org/c/openstack/ovn-octavia-provider/+/925747 is required for it18:13
noonedeadpunkyeah, sure, you can drop cluster and rebuild18:13
noonedeadpunkbut it won't include load balancers18:13
noonedeadpunkas these are managed by octavia18:13
spatelI am not planning to use octavia OVN lb 18:14
spatelI will use amphora driver18:14
spatelAlso you can take a snapshot backup of the OVN DB and restore, right?18:15
noonedeadpunkUm, don't really know18:16
spatelMy plan is to run k8s on new OVN based openstack and k8s has lots of routers and networking requirement so OVN will be good fit here 18:16
spatelIf I use OVS then it will have lots of routers running in namespaces and make it a mess to debug 18:17
noonedeadpunkwell, if you're to use magnum with capi driver - ovn lb makes total sense18:17
noonedeadpunklol, debugging ovn is not a smaller mess :D18:17
spatelAgreed 18:17
mgariepyi'm currently debuging some ovn issues :D18:17
spatellol... but world is moving toward OVN :) 18:18
spatelmgariepy what are those :)18:18
mgariepyat first i thought that my vms were not getting the dns from the dhcp lease from ovn.18:18
mgariepybut it turns out that for some reason the metadata is configuring the network as static.18:19
mgariepybut only on a few select nodes.. like the 6 i newly deployed.18:19
spatelHmm! 18:20
mgariepypretty much the same config beside the hostnames.. 18:20
spatelMy deployment is all VLAN based provider so hope it will simplify flow entries. 18:20
spatelWhen you use lots of floating IPs etc.. that is where problem start in OVN 18:21
mgariepydefine lots ?18:21
mgariepy1000s ?18:21
spatelI have 200 computes 18:21
spatelIf you do floating and lots of VPC etc.. that adds more flow entries in the control plane 18:22
spatelWe run SRIOV and VLAN base tenants. (no floating IPs) 18:23
spatelin k8s case sure we will have floating IPs 18:23
mgariepyi'll have more nodes soon-ish 18:27
spatelIf OVN is scary then why are big companies adopting it in their environments :)18:29
spatelI think its just new toy and not enough doc around. 18:30
spatelI am running 22 nodes production cluster on OVN and its been 3 years. Not a single issue. 18:30
mgariepyovn is getting pushed everywhere. and it works quite well from what i see.18:30
spatelIt's running on the Yoga release which is freaking old 18:30
spatelRecent versions are much more stable compared to previous ones. 18:31
mgariepycurrently on 2023.1 18:31
spatelI am deploying 2024.1 on new deployment 18:31
mgariepyi need to upgrade to it :D18:32
noonedeadpunkWe actually finished upgrades from 2023.1 to 2024.118:33
noonedeadpunkon 29.0.2, but a lot of things landed to 29.1.0 to be frank18:33
mgariepywow. i'm quite puzzled.. how does the metadata service give me static config on one node and dhcp on the other one ?18:41
mgariepywith something like : `curl http://169.254.169.254/openstack/2020-10-14/network_data.json`18:41
spatelThere is no DHCP agent or anything like that in OVN 19:28
mgariepyovn does generate the packet19:29
mgariepydhcp works fine.19:29
noonedeadpunk++19:30
mgariepyit's the metadata-agent that does spit garbage to me ;) haha19:30
noonedeadpunkI have seen some kind of native implementation in ovn for metadata as well... but I could be hallucinating as well19:30
mgariepynot sure / haven't found the `if` that is responsible for taking the static path.19:30
mgariepyi'm using what's in tree.19:31
mgariepyhttps://github.com/openstack/neutron/tree/master/neutron/agent/ovn/metadata  19:31
noonedeadpunkyeah, I guess I am 19:32
noonedeadpunkI also have VPNaaS that kinda brings its own mess19:32
mgariepyhttps://paste.opendev.org/show/bAVYgLAFuiTtuYFum4NN/  19:40
mgariepysame neutron config same nova config. same-ish ovs db dump on the server19:41
mgariepytop one is dhcp the bottom one is static.19:42
noonedeadpunkso what specifically are you confused about?19:52
noonedeadpunkthat entries are different?19:52
noonedeadpunkas I guess that should instruct on how to provision, say, netplan19:52
noonedeadpunk"type": "ipv4_dhcp" vs "type": "ipv4" should be the difference you explain19:53
noonedeadpunkso you don't end up without networking if dhcp is disabled. which kinda makes sense....19:54
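The `"type": "ipv4_dhcp"` vs `"type": "ipv4"` difference being discussed lives in `network_data.json`. A minimal sketch of how a consumer like cloud-init ends up choosing dhcp vs static from that field - the sample documents follow the OpenStack network_data.json shape, but their values are invented, not taken from the environment above:

```python
import json

# Invented example documents -- shape follows the OpenStack
# network_data.json format, values are made up.
DHCP_DOC = json.loads("""
{"networks": [{"id": "network0", "link": "tap0",
               "type": "ipv4_dhcp", "network_id": "net-1"}]}
""")
STATIC_DOC = json.loads("""
{"networks": [{"id": "network0", "link": "tap0", "type": "ipv4",
               "ip_address": "10.0.0.5", "netmask": "255.255.255.0",
               "network_id": "net-1"}]}
""")

def network_modes(metadata):
    """Classify each network entry as 'dhcp' or 'static' by its type."""
    return ["dhcp" if net["type"].endswith("_dhcp") else "static"
            for net in metadata.get("networks", [])]
```

This is why a static-looking `network_data.json` produces a static interface config even when a dhcp lease is actually available on the wire.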
mgariepyit should be dhcp.19:57
mgariepybut the metadata agent is not giving me the same network config.19:58
noonedeadpunkaaaaah19:58
mgariepyin the dhcp lease there is a dns server.. not in the metadata json file. so cloud init does generate dhcp or static ip depending on the value of this file.19:59
mgariepyi just don't get why it's different20:00
noonedeadpunkI somehow was thinking these are different networks20:01
mgariepynop 20:01
mgariepysame network20:01
mgariepy:)20:01
noonedeadpunkwhere one has dhcp enabled and the second does not20:01
noonedeadpunk*or well, subnets20:01
mgariepythey both have20:01
mgariepysame subnet also.20:01
noonedeadpunkthat's extremely weird otherwise...20:02
mgariepyanyway i'll continue to poke it tomorrow i guess ..20:02
noonedeadpunkcould it be that subnet changed but was not synced somehow?20:02
mgariepyif i set the interface to dhcp i get the lease.20:02
mgariepyno20:03
mgariepythe subnet was created last year.20:03
mgariepythe only diff is that i deployed the new host a couple weeks ago.20:03
noonedeadpunkwell, dhcp you can enable/disable dynamically 20:03
mgariepyit's enabled20:04
noonedeadpunkalso - we had https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/stable/2023.1/handlers/main.yml#L38-L51 which actually was never working20:05
mgariepyhttps://blog.oddbit.com/post/2019-12-19-ovn-and-dhcp/  20:05
noonedeadpunkso for upgrade it could be you're running old code 20:05
noonedeadpunkor well s/never/for quite a while/20:05
noonedeadpunkmy guess it's not needed anymore though, so I've suggested to drop such handler as a whole lately...20:07
mgariepyi deployed the first nodes with zed, upgraded to 2023.1, a couple of other nodes last april on 2023.1 - these work correctly - and some a couple of weeks ago.20:07
noonedeadpunkbut that totally sounds like a discrepancy in where the request ends up20:07
mgariepythe first batch works the second also, but not the last one20:07
noonedeadpunklike some obsolete namespace or smth20:07
mgariepywhere ? the host is brand new.. rebooted it a couple of times also20:08
mgariepyotherwise the network is working ok-ish, i can connect to it via a FIP20:08
noonedeadpunkthat is great question....20:08
noonedeadpunkhardly know how to trace traffic inside ovn20:09
mgariepyread the flow tables ;p20:10
noonedeadpunkyeah, and I can hardly read them20:10
mgariepythe ovn-trace command to test dhcp works correctly ;)20:10
mgariepyi suspect an issue with neutron itself more than ovn.20:11
noonedeadpunkbut why would you test dhcp?20:11
noonedeadpunkwell, I've heard that discrepancies between the neutron db and ovn db can grow and evolve with time20:11
mgariepyi thought that was the part that wasn't working correctly haha20:11
noonedeadpunkbut I don't think in ovn dhcp has smth to do with formatting metadata?20:11
noonedeadpunkI thought now it's only up to metadata agent?20:12
mgariepyno but the metadata agent does some query to the ovsdb20:12
noonedeadpunkwell, if some metadata agent is stuck... or lost control over haproxy...20:12
noonedeadpunkthen if someone would disable dhcp and re-enable it, while 1 metadata was misbehaving20:13
noonedeadpunkthis could lead to the result you see...20:13
noonedeadpunkbut dunno20:13
mgariepycurl to meta > ovn > haproxy > socket to the meta with magic headers20:13
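Part of the "magic headers" at the end of that chain is an HMAC signature the metadata path uses so the backend can trust the proxied request. A sketch of the signing step, assuming the shared-secret scheme used with neutron's `metadata_proxy_shared_secret` - the secret and instance id below are invented values for illustration:

```python
import hmac
import hashlib

def sign_instance_id(shared_secret, instance_id):
    """Compute an X-Instance-ID-Signature-style header value:
    an HMAC-SHA256 of the instance UUID keyed with the shared secret."""
    return hmac.new(shared_secret.encode(),
                    instance_id.encode(),
                    hashlib.sha256).hexdigest()

# Invented values for illustration only.
headers = {
    "X-Instance-ID": "11111111-2222-3333-4444-555555555555",
    "X-Tenant-ID": "demo-project",
    "X-Instance-ID-Signature": sign_instance_id(
        "s3cret", "11111111-2222-3333-4444-555555555555"),
}
```

If both sides agree on the secret, the receiver recomputes the HMAC and compares; a mismatch means the request did not come through a trusted proxy hop.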
mgariepywhy do the nodes from last april work fine then ?20:14
mgariepycomputers should output only 1's and 0's... 20:14
mgariepyanyhow i will probably find the issue at some point... just need to poke at it a bit more.20:14
noonedeadpunkwell there were really a ton of changes to neutron landed and backported...20:15
noonedeadpunkbut I'd guess you're running same neutron version anyway20:15
mgariepyyeah20:15
mgariepysame git sha everywhere haha20:15
noonedeadpunkyeah20:15
noonedeadpunkbut ovn is not pinned that way fwiw20:16
mgariepyi dumped a lot of DBs from ovn/ovs and didn't find much20:16
noonedeadpunkbut it's uca which is stable kinda (would be more concerned with rdo)20:16
mgariepyand i have the same ovn version installed.20:16
noonedeadpunkI somehow don't think it's about ovn itself20:16
mgariepyi dont think either20:17
noonedeadpunkbut what responds for each VM at the end20:17
noonedeadpunkas ovn just passes traffic through20:17
noonedeadpunkit has nothing to do with content20:17
mgariepyneutron-ovn-metadata-agent20:17
noonedeadpunkso maybe you can try to ps | grep or smth to find where are namespaces / haproxy serving the network?20:18
mgariepywhich is accessed via a socket on the hypervisor from the haproxy service20:18
noonedeadpunkdunno20:18
noonedeadpunkbut if neutron is updated, and it's restarted, and it lost haproxy it's managing...20:18
noonedeadpunkcould be that some agents just serve stale data20:19
noonedeadpunkbut again that's only a thing if at some point the subnet was without dhcp enabled20:19
mgariepyeverything restarts on meta service restart.20:20
noonedeadpunkbad that neutron doesn't store any action history for anything20:20
noonedeadpunkactually - it should not be restarted20:20
mgariepyit was with dhcp at the start and has been for months if not more than a year20:20
mgariepyanyway when i find the issue i'll share my findings.20:21
noonedeadpunkI think when you restart the metadata agent - the haproxy services likely remain running20:21
noonedeadpunkbut again - not 100% sure20:21
mgariepynah they restart20:22
mgariepyps aux all the services are ok20:22
noonedeadpunk(at least that was the case with l3 agent on ovs and keepalived due to https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/stable/2023.1/vars/main.yml#L538)20:22
noonedeadpunkyeah, pls share- that's very interesting20:23
mgariepyyep20:23
mgariepya real bug that's fun doesn't occur too often these days ;)20:24

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!