f0o | Hi, does openstack-ansible handle migration of network_interface_mappings? I would like to test out distributed FIPs but for that I need to move the Gateway Hosts' ports from the internal cross connect to the switch MLAG interface... I know this will cause downtime but wonder what possible manual steps are required to move the OVN ports around without neutron going nuts | 08:13 |
noonedeadpunk | hey | 08:13 |
noonedeadpunk | ok, so you're using OVN? As I think in this case there might even be no downtime | 08:15 |
noonedeadpunk | So what would happen, is that the new mapped interface will be added to the bridge, but the old one will not be removed | 08:16 |
noonedeadpunk | so pretty much you'd mainly need to worry about not making a loop with that | 08:16 |
f0o | oh thats neat | 08:17 |
f0o | it *shouldnt* loop but I can prepare for that by using packet filters on the crossconnect | 08:17 |
f0o | guess I'll cause some havoc today | 08:18 |
noonedeadpunk | that's the task which ensures that behaviour: https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/tasks/providers/setup_ovs_ovn.yml#L79-L92 | 08:19 |
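For reference, the behaviour that task ensures can be sketched with plain `ovs-vsctl` calls — a hand-written approximation, not the actual role code, and the bridge/interface names here are made up:

```sh
# Map the physical network name to the provider bridge (idempotent overwrite)
ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-provider

# Attach the newly mapped interface to the bridge; --may-exist makes re-runs
# a no-op. Note that nothing here removes the previously mapped interface,
# so the old port (e.g. the cross-connect) stays attached until removed by hand:
ovs-vsctl --may-exist add-port br-provider bond-mlag
# ovs-vsctl del-port br-provider eno1-crossconnect   # manual cleanup step
```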
noonedeadpunk | frankly I don't remember about any precaution measures and logic before this task.... But I think it should be always executed... | 08:20 |
noonedeadpunk | except might be broken with some tags, but not sure | 08:20 |
f0o | I'll just limit the playbook to the standby gateway node and make sure they have packet filters on the crossconnect; then just perform a failover and do the same on the other side and finally run distributed fip on all compute nodes | 08:21 |
f0o | that sounds rather safe | 08:22 |
noonedeadpunk | so you have standalone gateway nodes which are not computes? | 08:22 |
f0o | yep | 08:22 |
f0o | and now I'm hitting conntrack issues haha | 08:23 |
noonedeadpunk | are you planning to leave them or for cost efficiency get rid of them? | 08:23 |
noonedeadpunk | oh? | 08:23 |
f0o | I will leave them as they run edge-bgp as well | 08:23 |
f0o | the architecture didnt account for OVN using kernel conntrack | 08:23 |
noonedeadpunk | ah, so you use ovn-bgp-agent? or old dragent? | 08:23 |
f0o | neither | 08:23 |
noonedeadpunk | aha | 08:23 |
noonedeadpunk | ok then :D | 08:23 |
noonedeadpunk | as yeah, was surprised about conntrack before you got bgp involved | 08:24 |
f0o | gateway nodes are the actual core routers that sit on all transit/ix'es/pnis/... | 08:24 |
noonedeadpunk | aha | 08:24 |
f0o | we assumed OVN would just do p2p vteps and then offload the packets on the gateway nodes, which was sort of correct but it also utilizes conntrack for it which now blows up at scale | 08:24 |
f0o | so we're going to see if distributed fip solves it. if not, we get to figure out how to configure ovn-bgp | 08:25 |
noonedeadpunk | ok, that is super interesting actually | 08:26 |
noonedeadpunk | ' | 08:26 |
noonedeadpunk | yeah, from my experience ovn-bgp is a little bit too edgy right now in terms of code maturity | 08:27 |
noonedeadpunk | but we made it work after all | 08:27 |
f0o | conntrack on primary is at 3368855 and on the secondary 824167 | 08:27 |
noonedeadpunk | though we're actually using gateway nodes as just "gateways" - which have connectivity to public vlans | 08:28 |
noonedeadpunk | so it seems that what you see with conntrack is extremely relevant for our setup | 08:28 |
f0o | if you use distributed fip our understanding is that each compute node will handle their own conntrack since you span the public vlan to all nodes | 08:29 |
f0o | the drawback is that you are spanning the whole vlan across all nodes... which is really not ideal at scale either | 08:29 |
noonedeadpunk | yes, that was exactly the issue that we don't want to stretch vlan | 08:30 |
f0o | honestly same here | 08:30 |
noonedeadpunk | as public network is shared between multiple datacenters | 08:30 |
noonedeadpunk | also, I was able to convert an environment from normal ovn to ovn-bgp with almost no downtime | 08:31 |
noonedeadpunk | though keep in mind, it was only a "test" one, with limited amount of workloads | 08:31 |
noonedeadpunk | so no idea how it will run at scale | 08:32 |
noonedeadpunk | so such migration is totally doable fwiw | 08:32 |
noonedeadpunk | was a matter of getting an absolutely working configuration and running an os-neutron-install playbook | 08:33 |
f0o | how does that work, do the gateway nodes talk to the compute nodes? and the gateway nodes then dump it on the wire? | 08:34 |
noonedeadpunk | the biggest hassle was to get a "correct" version of agent/neutron... | 08:34 |
f0o | or do all nodes (gateway+compute) talk to a core-router? | 08:34 |
f0o | reason I ask is the gateway and my core-router are one and the same. Running two bgpd processes on the same box will likely conflict | 08:34 |
f0o | let alone figuring out how ovn-bgp handles VRFs in case they do need to run on the same box | 08:35 |
noonedeadpunk | so you get a new neutron-ovn-bgp-agent service. it listens to events on the NB DB and executes "actions" based on them | 08:35 |
noonedeadpunk | it also leverages/requires FRR | 08:35 |
f0o | is this inside an lxc or on the host? | 08:35 |
noonedeadpunk | Where you run it - depends on the setup. So in our case with no distributed FIPs and standalone gateway nodes - we need to run it only on gateway nodes | 08:36 |
noonedeadpunk | if you do distributed fips - also on computes | 08:36 |
noonedeadpunk | it is on the host, as the agent needs to mess with OVS and inject flows into it to eject traffic from it to kernel networking | 08:37 |
f0o | how does this work traffic flow wise? because it sounds almost identical to how non-bgp non-distributed-fips work | 08:37 |
noonedeadpunk | so it depends on exposure_method. | 08:38 |
f0o | if only the gateway nodes speak BGP to presumably the core-router, that only eliminates the need to span the vlan to all gateway nodes. But the gateway node <> compute would still use conntrack then just like now, no? | 08:38 |
noonedeadpunk | in case of "ovn" - you need an extra ovn "cluster" per node | 08:39 |
noonedeadpunk | in case of "underlay" - yes, it will be pretty much the same | 08:39 |
f0o | I guess to fully remove conntrack here, or at very least offload as much as possible, I'd need the compute nodes to speak BGP to the core-routers for their VM's FIPs. | 08:39 |
noonedeadpunk | in case of "vrf" - agent will maintain and split networks into multiple VRFs as well | 08:40 |
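For orientation, the method is selected in the agent's own config file; a minimal sketch (option names recalled from the ovn-bgp-agent docs — verify against your version; the AS number and router-id are placeholders):

```ini
# /etc/ovn-bgp-agent/bgp-agent.conf (illustrative)
[DEFAULT]
driver = nb_ovn_bgp_driver
exposing_method = underlay   ; or: vrf, ovn
bgp_AS = 64999
bgp_router_id = 192.0.2.10
```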
noonedeadpunk | yeah, I think so. But then probably it makes sense also to use computes as gateway hosts? | 08:40 |
f0o | probably! | 08:41 |
noonedeadpunk | as why I started asking, I was trying to understand if there's a good use-case to have distributed fips, but also standalone gateway nodes | 08:41 |
noonedeadpunk | as if you have to add vlan to all computes.... | 08:42 |
f0o | I guess the use-case is to stop fires from spreading haha | 08:42 |
noonedeadpunk | then why waste rack space for gateway nodes kinda | 08:42 |
f0o | that is correct | 08:42 |
noonedeadpunk | but when they act as leaves or core routers - then yeah | 08:42 |
f0o | I will look into how to hook up the compute nodes (and/or collapsed gateway nodes) to the core-routers with BGP so each node just announces their respective FIPs/NATs | 08:43 |
noonedeadpunk | so that is what ovn-bgp-agent does kinda | 08:43 |
noonedeadpunk | so it takes a port binding from ovn nb and adds a respective route to the vrf where it ejects traffic from ovs | 08:43 |
noonedeadpunk | and this route is being picked up by frr | 08:44 |
noonedeadpunk | (unless it's "ovn" exposure method I guess - I just never looked into it in detail) | 08:44 |
noonedeadpunk | and then a bgp session is established for each compute with announcements of own fips | 08:45 |
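The "picked up by frr" step above can be sketched as an FRR fragment — hand-written for illustration, not the agent's generated config; the ASNs and peer address are made up:

```
! /etc/frr/frr.conf fragment (illustrative) - per-compute session
router bgp 64999
 neighbor 203.0.113.1 remote-as 64512
 address-family ipv4 unicast
  redistribute kernel          ! announce the routes the agent installs
  neighbor 203.0.113.1 activate
 exit-address-family
```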
noonedeadpunk | but it does work pretty much the same way without BGP tbh :D | 08:45 |
noonedeadpunk | only you need vlans | 08:45 |
noonedeadpunk | so it's kinda about - where you want to build a complexity | 08:46 |
f0o | well for BGP sessions I can reuse the management vlan | 08:46 |
f0o | that's stretched everywhere anyways | 08:46 |
f0o | same as the vxlan vlan | 08:46 |
f0o | so there are ways to aggregate/reuse things there | 08:46 |
noonedeadpunk | one big downside of ovn-bgp-agent so far, is that ovn knows nothing about its existence. so in case frr or the agent dies for some reason - networks/fips just become unreachable | 08:47 |
f0o | I wonder | 08:47 |
f0o | can I run a half ovn-bgp-agent? | 08:47 |
noonedeadpunk | which half?:) | 08:47 |
f0o | because what you say and what the docs suggest is that you have ovn create route entries into the kernel for the FIPs | 08:47 |
f0o | this would entirely bypass conntrack | 08:47 |
f0o | my current gateway nodes can pick up those entries already | 08:48 |
f0o | so can I skip FRR? | 08:48 |
noonedeadpunk | I don't think you can | 08:48 |
noonedeadpunk | As it also tries to inject things into frr from time to time | 08:48 |
noonedeadpunk | so in case vtysh is not there - it will just crash | 08:49 |
f0o | shame | 08:49 |
noonedeadpunk | you can probably have some kind of "fake" frr running... | 08:49 |
noonedeadpunk | which does have only noop neighbour or none at all | 08:50 |
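A "fake" FRR along those lines could be as small as an frr.conf defining a BGP instance with no neighbors — just enough for vtysh to answer the agent. Illustrative only; ASN and router-id are placeholders:

```
! minimal frr.conf - satisfies vtysh without peering or announcing anything
frr defaults traditional
router bgp 64999
 bgp router-id 192.0.2.10
! no neighbor statements: no sessions are ever established
```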
f0o | the loops and hoops one must go through to remove conntrack | 08:50 |
noonedeadpunk | as in case of the underlay exposure method - I don't think it injects anything very meaningful there | 08:50 |
f0o | and this doesn't even guarantee that conntrack is actually removed, I'm sure the compute nodes just have them then | 08:51 |
noonedeadpunk | I don't think it's removed tbh | 08:51 |
f0o | so ultimately this is just to remove the L2 for a L3 | 08:51 |
noonedeadpunk | as I was able to break networks with a missing iptables rule for the FORWARD chain | 08:51 |
noonedeadpunk | yeah | 08:51 |
*** kleini_ is now known as kleini | 08:53 | |
noonedeadpunk | and tbh, I'd prefer having just an l2 setup... but it's me :) | 08:54 |
noonedeadpunk | as I don't need to deal with the network side of things with it - it's on other teams then :D | 08:55 |
f0o | yeah I think I'm doing a full 180 to 360 here because I'm back at distributed fip and biting the bullet of stretching vlans... However I can use the gateway nodes to segment it a bit... give each rack a /24 or so and only care for a vlan per rack and then the gateway nodes which already speak bgp just do their thing as they are now | 08:56 |
f0o | in an ideal world, I could dump routes from OVN NB like ovn-bgp-agent does onto the vxlan bridge of the gateway node and use redistribute connected in my gateway node's bgpd. This would be the half-ovn-bgp-agent haha | 08:57 |
f0o | noonedeadpunk: https://github.com/search?q=repo%3Aopenstack%2Fovn-bgp-agent%20run_vtysh_command&type=code do I read this right that vtysh is solely used to obtain the router_id? | 09:01 |
f0o | I can 100% mock this | 09:01 |
f0o | oh I see it also changes FRR's config every so often | 09:03 |
f0o | hrm maybe I just fork this and make it vty-less | 09:03 |
f0o | HereBeDragons | 09:03 |
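A vty-less fork could start by stubbing out the vtysh wrapper; a hypothetical sketch only — the function name mirrors the `run_vtysh_command` seen in the repo search above, but the return shapes and canned data are assumptions, not the actual ovn-bgp-agent internals:

```python
# Hypothetical stand-in for the agent's vtysh wrapper: instead of shelling
# out to `vtysh -c`, answer the handful of queries the agent makes from
# canned data, so the agent could run without FRR installed.
CANNED = {
    "show run": "router bgp 64999\n bgp router-id 192.0.2.10\n",
}

def run_vtysh_command(command: str) -> str:
    """Return a canned reply instead of invoking vtysh."""
    try:
        return CANNED[command]
    except KeyError:
        # Config-changing commands become no-ops in a vty-less agent.
        return ""

def extract_router_id(show_run_output: str) -> str:
    """Parse `bgp router-id <addr>` out of a running-config dump."""
    for line in show_run_output.splitlines():
        line = line.strip()
        if line.startswith("bgp router-id"):
            return line.split()[-1]
    raise ValueError("no router-id found")
```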
noonedeadpunk | Yeah, it uses frr for more than fetching the id for sure.... | 09:16 |
noonedeadpunk | but again, I guess you could just have frr that does mainly nothing? | 09:16 |
f0o | https://github.com/openstack/ovn-bgp-agent/blob/master/ovn_bgp_agent/drivers/openstack/ovn_bgp_driver.py only utilizes vtysh for a quick config sanity check which I can just patch out with a config flag for instance | 09:17 |
noonedeadpunk | as if you have no peers - I think it will still likely start and run | 09:17 |
f0o | I need to draw this up and see what I'm actually trying to fix and which driver is the one I need to meddle with | 09:18 |
noonedeadpunk | but it would very much depend on the exposure method tbh. as in case of vrf it's also being used to handle bgp evpn between computes | 09:19 |
noonedeadpunk | so that traffic would flow directly if it's "inside" of the public network | 09:19 |
f0o | in an ideal world, I'd have my ToR-routers listen for any bgp peers on the vxlan subnet. The compute nodes would then peer with them and announce their respective FIPs. From what I understand this is the stretched-l2 bgp driver? | 09:22 |
noonedeadpunk | um, probably?:) | 09:22 |
f0o | I'm going back and forth between the docs on the drivers and they all start to sound the same xD | 09:23 |
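The "ToR-routers listen for any bgp peers on the vxlan subnet" idea maps onto FRR's dynamic-neighbor support; a sketch of the ToR side (ASNs and the subnet are placeholders):

```
! ToR frr.conf fragment - accept BGP sessions from any compute in the subnet
router bgp 64512
 neighbor COMPUTES peer-group
 neighbor COMPUTES remote-as 64999
 bgp listen range 10.20.0.0/24 peer-group COMPUTES
 address-family ipv4 unicast
  neighbor COMPUTES activate
 exit-address-family
```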
noonedeadpunk | Just skip the SB DB driver | 09:23 |
noonedeadpunk | and all related to it | 09:23 |
noonedeadpunk | I think the mainly maintained one is the NB DB driver, but also keep in mind there's another level of complexity in each driver, which is "exposure_method" | 09:24 |
noonedeadpunk | and it changes the driver behaviour really dramatically | 09:24 |
noonedeadpunk | but keeping the purpose of the driver the same | 09:24 |
f0o | what a rabbit hole | 09:25 |
jrosser | it does surprise me just how complicated this ovn stuff all is | 11:33 |
noonedeadpunk | well... I think it depends. If you take a more or less regular ovs setup - ovn seems like a way better choice in general. And somehow a simpler one. As long as it works | 11:49 |
noonedeadpunk | If it doesn't and you need to dig into flows - that's /o\ | 11:50 |
noonedeadpunk | and comparing to lxb that all is a mess | 11:50 |
noonedeadpunk | I still don't get why 80% of deployments need anything but just lxb (if it was supported) | 11:51 |
mgariepy | flows are a lot harder to parse by dumping them compared to plain iptables stuff ;) | 12:27 |
f0o | ^ 100% | 13:44 |
f0o | I've managed to drop my conntrack from 4.5M to 800K by making all timeouts 3s and removing retrans. This seemingly had no effect on the network stability - but it ofc now requires a much more stable upstream network | 13:46 |
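That kind of timeout tuning is done via nf_conntrack sysctls; a fragment along these lines (values mirror the 3s experiment described above — aggressive, since it drops state for any flow idle longer than that):

```
# /etc/sysctl.d/90-conntrack.conf (illustrative)
net.netfilter.nf_conntrack_tcp_timeout_established = 3
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 3
net.netfilter.nf_conntrack_udp_timeout = 3
net.netfilter.nf_conntrack_udp_timeout_stream = 3
# retransmission tracking effectively disabled:
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 3
# current usage vs limit: /proc/sys/net/netfilter/nf_conntrack_count and _max
```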
f0o | however, while conntrack is now lower it did not ultimately increase network performance by any notable margin. I think the main reason is still the immense amount of context switching ovs-vswitchd performs here to move packets around and 32 dedicated cores just aren't enough to surpass 12G | 13:53 |
f0o | I hope moving forward with distributed fips would offload some of that load onto the compute nodes resulting in a higher BW capacity | 13:54 |
f0o | even if it seems less efficient at a glance and having to deal with the stretched l2 pains | 13:55 |
f0o | I honestly wonder how operators at scale solved this congestion | 13:56 |
f0o | ovn-bgp-agent seems to just solve the L2 part by swapping to L3 but the packet pushing load of ovs-vswitchd and conntrack needs are still there, just at a different level | 13:57 |
f0o | how do "they" cope with the resource steal on the compute nodes then? | 13:57 |
f0o | s/cope/deal/g | 13:58 |
noonedeadpunk | yeah, that is actually a great question indeed | 14:06 |
noonedeadpunk | from what I got talking to RH folks lately about ovn - they very rarely do have standalone net nodes | 14:06 |
noonedeadpunk | mostly it's spread between all computes | 14:06 |
noonedeadpunk | so I guess that distributed fip is likely an answer indeed | 14:07 |
noonedeadpunk | and also all computes serving as gateway hosts | 14:07 |
noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:01 |
opendevmeet | Meeting started Tue Mar 25 15:01:48 2025 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:01 |
noonedeadpunk | #topic rollcall | 15:01 |
noonedeadpunk | o/ | 15:01 |
NeilHanlon | o/ was just about to poke you! :D | 15:01 |
jrosser | o/ hello | 15:01 |
noonedeadpunk | sorry for being a bit late :) | 15:02 |
NeilHanlon | i only just got here 3 seconds before, so... ;) | 15:02 |
noonedeadpunk | #topic office hours | 15:04 |
noonedeadpunk | so, I think the main question for today is the PTG and its planned time? | 15:05 |
noonedeadpunk | given that we're still lacking a good agenda for it... and last year's participation was not high... | 15:05 |
noonedeadpunk | I'd guess we should be fine with 2h timeslot? | 15:05 |
jrosser | we will be missing andrew this time i think | 15:06 |
NeilHanlon | 2h sounds good, i'll make it :) | 15:06 |
noonedeadpunk | 15:00 - 17:00 UTC? | 15:07 |
noonedeadpunk | Tuesday? | 15:07 |
noonedeadpunk | or 16-17 to reduce conflicts a little bit? | 15:08 |
* jrosser tries to work out even which dates this is | 15:08 | |
noonedeadpunk | April 7-11 | 15:09 |
noonedeadpunk | So April 8 | 15:09 |
noonedeadpunk | or well. we can do this April 7 on Monday - these slots are very empty there | 15:09 |
noonedeadpunk | #link https://ptg.opendev.org/ptg.html | 15:10 |
noonedeadpunk | I'm just thinking that the eventlet one might be an interesting one for sure | 15:10 |
jrosser | 8th is difficult for me | 15:10 |
noonedeadpunk | should I create some kind of poll and ask for some votes through ML? | 15:11 |
noonedeadpunk | (ie doodle or smth) | 15:12 |
noonedeadpunk | or what would fit you best ? | 15:16 |
jrosser | 9th is pretty clear | 15:17 |
NeilHanlon | whatever works for yall i can make work | 15:17 |
noonedeadpunk | 15 - 17? | 15:17 |
NeilHanlon | +1 here | 15:18 |
jrosser | yeah thats ok | 15:18 |
noonedeadpunk | I'm fine with 9th | 15:18 |
noonedeadpunk | ok, agreed then | 15:18 |
noonedeadpunk | I will try to come up with some kind of agenda this week | 15:19 |
noonedeadpunk | and book a timeslot :) | 15:19 |
noonedeadpunk | anything else to discuss now? | 15:20 |
jrosser | only that we need extra attention on code review for the next few months | 15:20 |
noonedeadpunk | yes | 15:20 |
jrosser | andrewbonney is on paternity leave for ~4mo and has been very active reviewing | 15:21 |
noonedeadpunk | and some reviews are highly appreciated | 15:21 |
noonedeadpunk | especially as we're getting closer and closer to the release date | 15:21 |
jrosser | so we have to make up the effort from elsewhere during his absence | 15:21 |
noonedeadpunk | and our review dashboard can help to figure out outstanding things quite quickly | 15:22 |
noonedeadpunk | #link https://openinfra.org/cla/ | 15:22 |
noonedeadpunk | #link http://bit.ly/osa-review-board-v5 | 15:22 |
noonedeadpunk | that is the correct one ^ | 15:22 |
noonedeadpunk | ignore CLA :D | 15:22 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-ops master: Add a collection for managing encryption of secret data https://review.opendev.org/c/openstack/openstack-ansible-ops/+/943866 | 15:24 |
noonedeadpunk | also - once https://review.opendev.org/c/openstack/openstack-ansible/+/945115 lands - I want to issue a beta release | 15:25 |
NeilHanlon | awesome :) | 15:25 |
NeilHanlon | I will pick up some slack for Andrew | 15:25 |
noonedeadpunk | (or well, I'd need to push another patch for beta to freeze roles) | 15:26 |
noonedeadpunk | (and make beta out of it) | 15:26 |
noonedeadpunk | but https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/945089 needs to land to create such freeze | 15:27 |
noonedeadpunk | given that we had quite a slow-paced cycle - hopefully we'd be able to release early at least this time... | 15:27 |
noonedeadpunk | but yes - please, go through the review dashboard once in a while :) | 15:30 |
noonedeadpunk | there are couple of nice changes lying around, fwiw | 15:32 |
noonedeadpunk | and https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/942581 is a one with potential side-effects | 15:33 |
noonedeadpunk | (or well, there's a whole topic around this one) | 15:36 |
NeilHanlon | :) | 15:39 |
noonedeadpunk | I guess I'm not sure why https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/942783/ is failing though... | 15:41 |
noonedeadpunk | oh, well | 15:42 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Switch volume catalog_type to block-storage https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/945476 | 15:45 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_cinder master: Disable v3 endpoints by default https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/942783 | 15:46 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Align on cinder service naming https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/942582 | 15:47 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Switch volume catalog_type to block-storage https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/945477 | 15:49 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_cinder master: Disable v3 endpoints by default https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/942783 | 15:50 |
noonedeadpunk | #endmeeting | 15:53 |
opendevmeet | Meeting ended Tue Mar 25 15:53:59 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:53 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2025/openstack_ansible_meeting.2025-03-25-15.01.html | 15:53 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2025/openstack_ansible_meeting.2025-03-25-15.01.txt | 15:53 |
opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2025/openstack_ansible_meeting.2025-03-25-15.01.log.html | 15:53 |
noonedeadpunk | fwiw: https://openinfra.org/blog/openinfra-summit-2025 | 16:24 |
noonedeadpunk | the next openinfra summit EU is in Paris in October | 16:24 |
admin1 | will be there :) | 18:12 |
noonedeadpunk | I plan to go there as well | 18:24 |
noonedeadpunk | but will see | 18:24 |