opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Set Rocky 9 Ceph jobs as voting https://review.opendev.org/c/openstack/openstack-ansible/+/933592 | 09:31 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Move healthcheck playbooks to collection https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/933610 | 10:46 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Move healthcheck playbooks to the collection https://review.opendev.org/c/openstack/openstack-ansible/+/933611 | 10:49 |
noonedeadpunk | I think it's the meeting time? | 15:02 |
noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:02 |
opendevmeet | Meeting started Tue Oct 29 15:02:58 2024 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:02 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:02 |
opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:02 |
noonedeadpunk | #topic rollcall | 15:03 |
noonedeadpunk | o/ | 15:03 |
noonedeadpunk | sorry for the delay - haven't adjusted to winter time yet | 15:03 |
NeilHanlon | o/ | 15:04 |
noonedeadpunk | #topic office hours | 15:08 |
opendevreview | Merged openstack/openstack-ansible master: Bump ansible version to 2.17.5 https://review.opendev.org/c/openstack/openstack-ansible/+/932067 | 15:09 |
noonedeadpunk | So yesterday I was able to figure out what's wrong with ceph jobs on rocky, as we also got an incoming bug on the topic | 15:09 |
noonedeadpunk | #link https://bugs.launchpad.net/openstack-ansible/+bug/2085146 | 15:09 |
noonedeadpunk | And now ceph jobs on rocky are passing | 15:10 |
noonedeadpunk | #link https://review.opendev.org/c/openstack/openstack-ansible/+/933592 | 15:10 |
noonedeadpunk | I've suggested to make them voting now | 15:10 |
noonedeadpunk | though, NeilHanlon, somehow ansible still does not have ansible_python_interpreter defined. I guess it's something to take to the ansible community though | 15:11 |
jrosser | o/ hello | 15:11 |
noonedeadpunk | And I've suggested a workaround which should work normally | 15:11 |
noonedeadpunk | as it doesn't look to be heavily used in ceph-ansible anyway | 15:11 |
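The workaround itself isn't quoted in the log; a minimal sketch of the usual fix is pinning the interpreter explicitly rather than relying on discovery (the path is an assumption for Rocky 9):

```shell
# Hedged sketch: pin ansible_python_interpreter so plays don't depend on
# automatic interpreter discovery; /usr/bin/python3 is an assumed path.
ansible-playbook playbook.yml -e ansible_python_interpreter=/usr/bin/python3
```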
noonedeadpunk | as for CI stability, we keep getting "The read operation timed out" for https://releases.openstack.org/constraints/ | 15:13 |
noonedeadpunk | does anyone recall where to see the nodepool provider being used? | 15:13 |
noonedeadpunk | as I wanna collect some stats about where failures occur | 15:13 |
noonedeadpunk | #link https://zuul.opendev.org/t/openstack/build/87cb2dbe3f554df6aabd16ecf04f829d | 15:13 |
noonedeadpunk | as example | 15:13 |
noonedeadpunk | aha, here: https://zuul.opendev.org/t/openstack/build/87cb2dbe3f554df6aabd16ecf04f829d/log/zuul-info/inventory.yaml | 15:14 |
jrosser | ^ yes | 15:14 |
NeilHanlon | noted -- thanks for working around that and sorry i've been absent on it | 15:14 |
jrosser | but also here https://zuul.opendev.org/t/openstack/build/87cb2dbe3f554df6aabd16ecf04f829d/log/job-output.txt#48 | 15:15 |
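For reference, a hedged sketch of pulling the provider out of that file; the build's zuul-info/inventory.yaml records nodepool metadata per host:

```shell
# Sketch: grep the nodepool metadata (provider/cloud keys) out of the
# build's recorded inventory.
curl -sL https://zuul.opendev.org/t/openstack/build/87cb2dbe3f554df6aabd16ecf04f829d/log/zuul-info/inventory.yaml \
  | grep -E 'provider|cloud'
```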
NeilHanlon | double booked here, so have to drop for another, will respond after | 15:15 |
noonedeadpunk | ah, yeah, fair | 15:20 |
noonedeadpunk | I also sent a patch to move the healthcheck playbooks to the collection | 15:21 |
noonedeadpunk | though this patch fails, being unable to find them | 15:21 |
noonedeadpunk | #link https://review.opendev.org/c/openstack/openstack-ansible/+/933611 | 15:21 |
noonedeadpunk | I think it's actually a red herring and a recheck is needed after the dependency is merged | 15:21 |
jrosser | is it allowed to have subdirectories of playbooks? | 15:22 |
jrosser | openstack.osa.healthcheck.hosts -> plugins-repo/playbooks/healthchecks/hosts.yml | 15:23 |
noonedeadpunk | I think it should be | 15:23 |
noonedeadpunk | as this does pass https://zuul.opendev.org/t/openstack/build/ecd042a433314549a9a4eb5ea1b1923a | 15:23 |
noonedeadpunk | and it's running it: https://zuul.opendev.org/t/openstack/build/ecd042a433314549a9a4eb5ea1b1923a/log/job-output.txt#13220 | 15:24 |
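A sketch of what the passing job implies about the layout, using only names from the log (the exact resolution rules are ansible-core's, so treat this as an assumption):

```shell
# Per the log, the dotted playbook name resolves into a subdirectory:
#   openstack.osa.healthcheck.hosts -> plugins-repo/playbooks/healthchecks/hosts.yml
# so it can be invoked by its fully qualified name:
ansible-playbook openstack.osa.healthcheck.hosts
```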
noonedeadpunk | mariadb is apparently late with the 11.4.4 release: https://jira.mariadb.org/projects/MDEV/versions/29907 | 15:25 |
noonedeadpunk | though everything seems to be ready for it | 15:25 |
noonedeadpunk | once it's published - I think we can finally land the upgrade patch for it | 15:27 |
noonedeadpunk | I'm also due to send the ptg results out | 15:28 |
noonedeadpunk | also, apparently, we've released a broken beta due to the rabbitmq role being unable to bootstrap after some patch merged there | 15:30 |
noonedeadpunk | I assume that my bump script doesn't account for some corner cases, which in fact I do not fully understand | 15:31 |
noonedeadpunk | My blind guess is that it picks some SHA which then gets shadowed by a merge, as the tree is not linear, and `git log` doesn't always show it in a good way | 15:32 |
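A hedged sketch of how such shadowing can be checked, assuming the script picks SHAs from a plain log (`<sha>` is a placeholder):

```shell
# Commits as the branch tip actually sees them (first-parent line only):
git log --oneline --first-parent origin/master
# Verify a candidate SHA is reachable from the branch at all:
git merge-base --is-ancestor <sha> origin/master && echo reachable
```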
noonedeadpunk | #endmeeting | 15:58 |
opendevmeet | Meeting ended Tue Oct 29 15:58:25 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:58 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-10-29-15.02.html | 15:58 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-10-29-15.02.txt | 15:58 |
opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-10-29-15.02.log.html | 15:58 |
spatel | noonedeadpunk how is your OVN deployment going? | 18:06 |
noonedeadpunk | well, quite good. I was able to get everything working | 18:06 |
noonedeadpunk | though migration from OVS would be very tough for us | 18:07 |
spatel | Flat OVN or ovn-bgp? | 18:07 |
noonedeadpunk | now looking closely into running my ovn deployment and getting the bgp agent in | 18:07 |
noonedeadpunk | flat for now | 18:07 |
spatel | I have started OVN deployment. | 18:07 |
noonedeadpunk | ovn-bgp is known not to work with net nodes | 18:07 |
spatel | Just simple OVN | 18:07 |
noonedeadpunk | if you are to use the supported northd driver | 18:07 |
noonedeadpunk | it does work with the southbound driver, but that's not gonna be further developed | 18:08 |
noonedeadpunk | or you need to get rid of net nodes | 18:08 |
noonedeadpunk | specifically, FIPs are not announced from net nodes, as they always attempt to be announced from computes | 18:08 |
noonedeadpunk | even when they should not be | 18:08 |
spatel | hmm | 18:09 |
noonedeadpunk | it's considered a bug, but I never got a chance to look deeper into it since February | 18:09 |
spatel | Are you seeing any issues with the OVN deployment, or any tuning needed :) | 18:09 |
spatel | How many compute nodes do you have in the OVN deployment? | 18:10 |
noonedeadpunk | ah, it's still a non-production env. so like 5 computes in each AZ | 18:10 |
noonedeadpunk | consider it a nice sandbox :) | 18:10 |
noonedeadpunk | 3 azs | 18:10 |
spatel | oh!! | 18:11 |
spatel | I am going to production with 200 computes :D | 18:11 |
noonedeadpunk | so I haven't gotten to the point where I'd need to scale the NB DB | 18:11 |
spatel | I thought you had a story for me | 18:11 |
noonedeadpunk | well. I managed to break the raft cluster and it got completely stuck on me | 18:12 |
spatel | Really? | 18:12 |
noonedeadpunk | so one thing I learned is that while it's "trivial" to restore the state of neutron from mysql - it's not for octavia | 18:12 |
spatel | But I thought you can rebuild the raft cluster and use the neutron script to re-create all the flows | 18:12 |
noonedeadpunk | So I realized that https://review.opendev.org/c/openstack/ovn-octavia-provider/+/925747 is required for it | 18:13 |
noonedeadpunk | yeah, sure, you can drop cluster and rebuild | 18:13 |
noonedeadpunk | but it won't include load balancers | 18:13 |
noonedeadpunk | as these are managed by octavia | 18:13 |
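For the neutron side, the "script" alluded to above is presumably neutron's OVN sync utility; a hedged sketch of a repair run (config paths are assumptions):

```shell
# Sketch: repopulate a rebuilt OVN NB database from the neutron SQL DB.
neutron-ovn-db-sync-util \
  --config-file /etc/neutron/neutron.conf \
  --config-file /etc/neutron/plugins/ml2/ml2_conf.ini \
  --ovn-neutron_sync_mode repair
```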
spatel | I am not planning to use octavia OVN lb | 18:14 |
spatel | I will use amphora driver | 18:14 |
spatel | Also, you can take a snapshot backup of the OVN DB and restore it, right? | 18:15 |
noonedeadpunk | Um, don't really know | 18:16 |
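For what it's worth, ovsdb-client does ship backup/restore subcommands that should apply to the OVN databases too; a sketch, with the socket path as an assumption:

```shell
# Sketch: snapshot and restore the OVN northbound DB over its control socket.
# Restore semantics for a raft cluster are more involved than this.
ovsdb-client backup unix:/var/run/ovn/ovnnb_db.sock > ovnnb.backup
ovsdb-client restore unix:/var/run/ovn/ovnnb_db.sock < ovnnb.backup
```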
spatel | My plan is to run k8s on the new OVN-based openstack, and k8s has lots of routers and networking requirements, so OVN will be a good fit here | 18:16 |
spatel | If I use OVS then it will have lots of routers running in namespaces, which makes it a mess to debug | 18:17 |
noonedeadpunk | well, if you're to use magnum with capi driver - ovn lb makes total sense | 18:17 |
noonedeadpunk | lol, debugging ovn is not a smaller mess :D | 18:17 |
spatel | Agreed | 18:17 |
mgariepy | i'm currently debugging some ovn issues :D | 18:17 |
spatel | lol... but world is moving toward OVN :) | 18:18 |
spatel | mgariepy what are those :) | 18:18 |
mgariepy | at first i thought that my vms were not getting the dns from the dhcp lease from ovn. | 18:18 |
mgariepy | but it turns out that for some reason the metadata is configuring the network as static. | 18:19 |
mgariepy | but only on a few select nodes.. like the 6 i newly deployed. | 18:19 |
spatel | Hmm! | 18:20 |
mgariepy | pretty much the same config besides the hostnames.. | 18:20 |
spatel | My deployment is all VLAN-based providers, so hopefully that will simplify the flow entries. | 18:20 |
spatel | When you use lots of floating IPs etc.. that is where problems start in OVN | 18:21 |
mgariepy | define lots ? | 18:21 |
mgariepy | 1000s ? | 18:21 |
spatel | I have 200 computes | 18:21 |
spatel | If you do floating IPs and lots of VPCs etc.. that adds more flow entries to the control plane | 18:22 |
spatel | We run SRIOV and VLAN-based tenants. (no floating IPs) | 18:23 |
spatel | in the k8s case sure we will have floating IPs | 18:23 |
mgariepy | i'll have more nodes soon-ish | 18:27 |
spatel | If OVN is scary then why are big companies adopting it in their environments :) | 18:29 |
spatel | I think it's just a new toy and there's not enough documentation around. | 18:30 |
spatel | I am running a 22-node production cluster on OVN and it's been 3 years. Not a single issue. | 18:30 |
mgariepy | ovn is getting pushed everywhere. and it works quite well from what i see. | 18:30 |
spatel | It's running on the Yoga release which is freaking old | 18:30 |
spatel | Recent versions are much more stable compared to previous ones. | 18:31 |
mgariepy | currently on 2023.1 | 18:31 |
spatel | I am deploying 2024.1 on new deployment | 18:31 |
mgariepy | i need to upgrade to it :D | 18:32 |
noonedeadpunk | We actually finished upgrades from 2023.1 to 2024.1 | 18:33 |
noonedeadpunk | on 29.0.2, but a lot of things landed in 29.1.0 to be frank | 18:33 |
mgariepy | wow. i'm quite puzzled.. how does the metadata service give me a static config on one node and dhcp on the other one? | 18:41 |
mgariepy | with something like : `curl http://169.254.169.254/openstack/2020-10-14/network_data.json` | 18:41 |
spatel | There is no DHCP agent or anything like that in OVN | 19:28 |
mgariepy | ovn does generate the packet | 19:29 |
mgariepy | dhcp works fine. | 19:29 |
noonedeadpunk | ++ | 19:30 |
mgariepy | it's the metadata-agent that spits garbage at me ;) haha | 19:30 |
noonedeadpunk | I have seen some kind of native implementation in ovn for metadata as well... but I could be hallucinating too | 19:30 |
mgariepy | not sure / haven't found the `if` that is responsible for taking the static path. | 19:30 |
mgariepy | i'm using what's in tree. | 19:31 |
mgariepy | https://github.com/openstack/neutron/tree/master/neutron/agent/ovn/metadata | 19:31 |
noonedeadpunk | yeah, I guess I am | 19:32 |
noonedeadpunk | I also have VPNaaS, which kinda brings its own mess | 19:32 |
mgariepy | https://paste.opendev.org/show/bAVYgLAFuiTtuYFum4NN/ | 19:40 |
mgariepy | same neutron config same nova config. same-ish ovs db dump on the server | 19:41 |
mgariepy | top one is dhcp the bottom one is static. | 19:42 |
noonedeadpunk | so what specifically are you confused about? | 19:52 |
noonedeadpunk | that entries are different? | 19:52 |
noonedeadpunk | as I guess that should instruct on how to provision, say, netplan | 19:52 |
noonedeadpunk | "type": "ipv4_dhcp" vs "type": "ipv4" should be the difference you explain | 19:53 |
noonedeadpunk | so you don't end up without networking if dhcp is disabled. which kinda makes sense.... | 19:54 |
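A sketch of checking this from inside a VM, combining the curl from earlier with the two type values under discussion:

```shell
# Sketch: a working node should report "ipv4_dhcp" here, a broken one
# plain "ipv4" with a static address (jq assumed available in the image).
curl -s http://169.254.169.254/openstack/2020-10-14/network_data.json \
  | jq '.networks[].type'
```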
mgariepy | it should be dhcp. | 19:57 |
mgariepy | but the metadata agent is not giving me the same network config. | 19:58 |
noonedeadpunk | aaaaah | 19:58 |
mgariepy | in the dhcp lease there is a dns server.. but not in the metadata json file. so cloud-init generates a dhcp or static ip config depending on the value in this file. | 19:59 |
mgariepy | i just don't get why it's different | 20:00 |
noonedeadpunk | I somehow was thinking these are different networks | 20:01 |
mgariepy | nop | 20:01 |
mgariepy | same network | 20:01 |
mgariepy | :) | 20:01 |
noonedeadpunk | where one has dhcp enabled and the second does not | 20:01 |
noonedeadpunk | *or well, subnets | 20:01 |
mgariepy | they both have | 20:01 |
mgariepy | same subnet also. | 20:01 |
noonedeadpunk | that's extremely weird otherwise... | 20:02 |
mgariepy | anyway i'll continue to poke it tomorrow i guess .. | 20:02 |
noonedeadpunk | could it be that subnet changed but was not synced somehow? | 20:02 |
mgariepy | if i set the interface to dhcp i get the lease. | 20:02 |
mgariepy | no | 20:03 |
mgariepy | the subnet was created last year. | 20:03 |
mgariepy | the only diff is that i deployed the new host a couple weeks ago. | 20:03 |
noonedeadpunk | well, you can enable/disable dhcp dynamically | 20:03 |
mgariepy | it's enabled | 20:04 |
noonedeadpunk | also - we had https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/stable/2023.1/handlers/main.yml#L38-L51 which actually was never working | 20:05 |
mgariepy | https://blog.oddbit.com/post/2019-12-19-ovn-and-dhcp/ | 20:05 |
noonedeadpunk | so after an upgrade it could be you're running old code | 20:05 |
noonedeadpunk | or well s/never/for quite a while/ | 20:05 |
noonedeadpunk | my guess is it's not needed anymore though, so I've lately suggested dropping that handler as a whole... | 20:07 |
mgariepy | i deployed the first nodes with zed, upgraded to 2023.1, added a couple of other nodes last april on 2023.1 (these work correctly) and some a couple of weeks ago. | 20:07 |
noonedeadpunk | but that totally sounds like a discrepancy in where the request ends up | 20:07 |
mgariepy | the first batch works, the second also, but not the last one | 20:07 |
noonedeadpunk | like some obsoleted namespace or something | 20:07 |
mgariepy | where ? the host is brand new.. rebooted it a couple of times also | 20:08 |
mgariepy | otherwise the network is working ok-ish, i can connect to it via a FIP | 20:08 |
noonedeadpunk | that is a great question.... | 20:08 |
noonedeadpunk | I hardly know how to trace traffic inside ovn | 20:09 |
mgariepy | read the flow tables ;p | 20:10 |
noonedeadpunk | yeah, and I can hardly read them | 20:10 |
mgariepy | the ovn-trace command to test dhcp works correctly ;) | 20:10 |
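A hedged sketch of such a trace; the datapath, port, and MAC names are hypothetical placeholders:

```shell
# Sketch: trace a DHCP DISCOVER-shaped microflow through the logical pipeline.
ovn-trace tenant-net 'inport == "vm-port-1" &&
  eth.src == fa:16:3e:00:00:01 && eth.dst == ff:ff:ff:ff:ff:ff &&
  ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 &&
  udp.src == 68 && udp.dst == 67'
```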
mgariepy | i suspect an issue with neutron itself more than ovn. | 20:11 |
noonedeadpunk | but why would you test dhcp? | 20:11 |
noonedeadpunk | well, I've heard that discrepancies between the neutron db and the ovn db can grow and evolve with time | 20:11 |
mgariepy | i thought that was the part that wasn't working correctly haha | 20:11 |
noonedeadpunk | but I don't think dhcp in ovn has something to do with formatting metadata? | 20:11 |
noonedeadpunk | I thought it's only up to the metadata agent now? | 20:12 |
mgariepy | no but the metadata agent does some queries to the ovsdb | 20:12 |
noonedeadpunk | well, if some metadata agent is stuck... or lost control over haproxy... | 20:12 |
noonedeadpunk | then if someone disabled dhcp and re-enabled it, while 1 metadata agent was misbehaving | 20:13 |
noonedeadpunk | this could lead to the result you see... | 20:13 |
noonedeadpunk | but dunno | 20:13 |
mgariepy | curl to meta > ovn > haproxy > socket to the meta with magic headers | 20:13 |
mgariepy | why do the nodes from last april work fine then ? | 20:14 |
mgariepy | computers should output only 1's and 0's... | 20:14 |
mgariepy | anyhow i will probably find the issue at some point... just need to poke at it a bit more. | 20:14 |
noonedeadpunk | well there were really a ton of changes to neutron landed and backported... | 20:15 |
noonedeadpunk | but I'd guess you're running same neutron version anyway | 20:15 |
mgariepy | yeah | 20:15 |
mgariepy | same git sha everywhere haha | 20:15 |
noonedeadpunk | yeah | 20:15 |
noonedeadpunk | but ovn is not pinned that way fwiw | 20:16 |
mgariepy | i dumped a lot of DBs from ovn/ovs and didn't find much | 20:16 |
noonedeadpunk | but it's uca which is kinda stable (I would be more concerned with rdo) | 20:16 |
mgariepy | and i have the same ovn version installed. | 20:16 |
noonedeadpunk | I somehow don't think it's about ovn itself | 20:16 |
mgariepy | i don't think so either | 20:17 |
noonedeadpunk | but what responds to each VM at the end | 20:17 |
noonedeadpunk | as ovn just passes traffic through | 20:17 |
noonedeadpunk | it has nothing to do with content | 20:17 |
mgariepy | neutron-ovn-metadata-agent | 20:17 |
noonedeadpunk | so maybe you can try ps | grep or something to find where the namespaces / haproxy serving the network are? | 20:18 |
mgariepy | which is accessed via a socket on the hypervisor from the haproxy service | 20:18 |
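A sketch of that inspection, assuming the usual neutron-ovn-metadata-agent layout where each network gets an haproxy inside an ovnmeta-<network-uuid> namespace (the uuid is a placeholder):

```shell
# Sketch: list the metadata namespaces, then the processes in one of them.
ip netns list | grep ovnmeta
ps -fp $(ip netns pids ovnmeta-<network-uuid>)
```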
noonedeadpunk | dunno | 20:18 |
noonedeadpunk | but if neutron is updated, and it's restarted, and it lost the haproxy it's managing... | 20:18 |
noonedeadpunk | could be that some agents just serve stale data | 20:19 |
noonedeadpunk | but again that's only a thing if at some point the subnet was without dhcp enabled | 20:19 |
mgariepy | everything restarts on meta service restart. | 20:20 |
noonedeadpunk | bad that neutron doesn't store any action history for anything | 20:20 |
noonedeadpunk | actually - it should not be restarted | 20:20 |
mgariepy | it was with dhcp at the start and has been for months if not more than a year | 20:20 |
mgariepy | anyway when i find the issue i'll share my findings. | 20:21 |
noonedeadpunk | I think when you restart the metadata agent - the haproxy services likely remain running | 20:21 |
noonedeadpunk | but again - not 100% sure | 20:21 |
mgariepy | nah they restart | 20:22 |
mgariepy | per ps aux all the services are ok | 20:22 |
noonedeadpunk | (at least that was the case with l3 agent on ovs and keepalived due to https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/stable/2023.1/vars/main.yml#L538) | 20:22 |
noonedeadpunk | yeah, pls share- that's very interesting | 20:23 |
mgariepy | yep | 20:23 |
mgariepy | a real bug that's fun doesn't occur too often these days ;) | 20:24 |