14:00:54 #startmeeting neutron_drivers
14:00:54 Meeting started Fri May 30 14:00:54 2025 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:54 The meeting name has been set to 'neutron_drivers'
14:00:59 o/
14:01:02 Ping list: ykarel, mlavalle, mtomaska, slaweq, obondarev, tobias-urdin, lajoskatona, haleyb, ralonsoh
14:01:06 hello!
14:01:10 o/
14:01:11 o/
14:01:14 o/
14:01:24 o/
14:02:26 we can get started, others might show up late, but we have a number of topics
14:02:39 jlibosva: are you coming?
14:02:43 yes :) hi :)
14:02:50 ah, perfect
14:02:56 first is this one
14:03:00 #link https://bugs.launchpad.net/neutron/+bug/2111276
14:03:06 [RFE] Integrate OVN BGP capabilities into Neutron OVN driver
14:03:26 i know you added it, but fnordahl also has an interest
14:03:40 o/
14:03:50 i'll let either or both of you talk about it while i read the latest update
14:04:51 right, so basically the idea would be to replace the ovn-bgp-agent - either by implementing the same thing: right now I have a PoC where the public switches connecting to the provider network connect to some underlying BGP routers instead of a localnet port. That would require no Neutron API changes
14:05:28 or something smarter using an existing API like networking-bgpvpn, if there is interest and a use case for it
14:05:48 +1 for reusing existing APIs or extending them
14:06:01 (or no new API :P)
14:06:41 the first approach would likely be a knob in the Neutron conf, and then Neutron would manage the BGP resources similarly to what ovn-bgp-agent does today
14:07:07 and FIPs, ports directly connected to provider networks, and LBs would automatically get their routes advertised
14:07:10 if EVPN is in scope (I certainly hope it would be), it would be necessary to be able to associate the provider network with L2VNI and/or L3VNI values
14:07:41 EVPN is in mind - but currently the work in core OVN is at very early stages, though we would be interested in it too :)
14:08:09 tore: I linked the OVN Community A/V meeting notes, which have a link to the spec, on the bug
14:08:24 but for this particular RFE it is out of scope; we definitely keep it in mind so once we get the support we can continue with the work
14:08:52 fnordahl: I saw, thanks
14:09:21 and I assume a neutron-spec with details would be required :)
14:09:43 fnordahl: i have not read through all the docs you linked yet, but does this match with what you were thinking?
14:10:08 +1 on spec and we have roadmap time to contribute to such a spec
14:10:36 a couple of questions
14:10:41 is there something that has to happen first in core OVN?
14:11:07 thanks fnordahl for the comments on the LP - just to make it more clear - the BGP protocol itself would be managed by an external tool - like FRR
14:11:33 lajoskatona1: I have a working PoC - manually configured, running with Neutron and core OVN, advertising FIPs
14:11:43 I am not very familiar with the Neutron or OVN codebases, so I don't know how much I could contribute development-wise (would perhaps need a mentor), but I am very happy to provide an operator's perspective on how this feature could be implemented in a sensible and useful manner
14:11:46 so, so far it seems it has everything we need :)
14:11:53 jlibosva: ack, thanks
14:11:53 jlibosva, you said there is no API change. Do we need DB changes? Any agent (we do, I think)?
14:12:38 and would this be implemented as a service plugin? So anyone can enable/disable it?
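[Note: purely as an illustration of the "external tool - like FRR" split described above (Neutron/OVN builds the topology, an external daemon speaks BGP), a minimal FRR bgpd snippet for such a speaker could look roughly like the sketch below; the AS numbers, addresses and redistribution choice are hypothetical and not part of the RFE.]

    router bgp 64512
     bgp router-id 192.0.2.10
     neighbor 192.0.2.1 remote-as 64999
     address-family ipv4 unicast
      redistribute kernel
     exit-address-family
     address-family ipv6 unicast
      neighbor 192.0.2.1 activate
      redistribute kernel
     exit-address-family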
14:12:39 jlibosva: "yes" to your question about the external tool like FRR
14:12:39 ralonsoh: The changes will be mostly creating the underlying topology and avoiding using localnet ports. No MySQL DB changes or agents
14:12:56 and what about migration from the existing ovn-bgp-agent - will that require any extra steps, or will it be as easy as stopping the old agent and running the new one with the proper config/extension?
14:13:25 ralonsoh: that's a great question about where to put it - it would make sense to separate it from the mech driver
14:13:35 cool
14:14:06 slaweq: probably a migration path would be needed, if required. the agent configures the host networking so we would need some cleanup
14:14:58 There are multiple approaches to this, it could either be a light touch (from Neutron's POV) implementation where we only do the changes needed for it to work with the features provided by OVN, and then have downstream deployment tooling do the lifting of BGP agents and such, or Neutron could take on more. I think this needs to be ironed out at spec creation time
14:15:17 jlibosva will such a migration tool be part of this RFE?
14:16:11 fnordahl: I think the BGP speaker configuration and such should be done by some deployment tooling
14:16:18 +1
14:17:02 slaweq: I don't think it will, in my opinion
14:18:15 are there any other questions? should we vote on this?
14:18:37 * haleyb is assuming jlibosva and fnordahl will work together on the spec
14:18:53 I would be happy to
14:19:02 I might include other people in my team too, and yes
14:19:10 No questions from me, big +1 for spec
14:19:19 +1 (with spec, of course). I would like to have this RFE
14:19:20 great, thanks
14:19:24 +1
14:19:34 +1 for me to move forward with spec
14:19:36 +1
14:20:07 slaweq: ?
14:20:35 ahh sorry
14:20:36 +1
14:20:48 even if migration will be done separately
14:21:06 great, that's consensus, i'll approve the rfe and look forward to the spec
14:21:26 thanks for the questions and for the interest :)
14:21:31 \o/
14:22:00 we can move on to the next rfe
14:22:05 #link https://bugs.launchpad.net/neutron/+bug/2111899
14:22:11 [RFE] Use stateless NAT rules for FIPs
14:22:32 yeah, this is a feature already implemented
14:22:35 and then reverted
14:22:53 it was reverted because the expected benefit with HW offloaded envs didn't happen
14:23:09 because the dnat rule flows were not offloaded
14:23:27 but this feature can "bring joy" (and good performance) to DPDK envs
14:23:41 because we skip the conntrack module (that is always a pain with DPDK)
14:23:51 so I would like to re-propose it, but this time configurable
14:24:01 questions?
14:24:33 all clear
14:24:34 i see from the original change stateless fips were added to OVN 20.03 so that's good
14:24:42 did the hw offloaded environments experience a performance degradation, or was it just the same performance as before?
14:24:44 yes, very old core OVN
14:25:05 tore, performance degradation, because these flows in particular were not offloaded
14:25:25 (don't ask me for more details on this, I would need to request them)
14:25:45 understood, then it makes sense to me why it was reverted
14:25:47 but for DPDK, for sure, this is a performance boost, as long as the FIP rules do not need conntrack
14:26:16 +1 for configurable
14:27:44 ralonsoh: so is the offloading of dnat particular to the nic involved? i.e. they didn't exist when this originally merged? but do now?
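[Note: for context on what "stateless" means at the OVN layer in this RFE, the difference is whether a FIP's dnat_and_snat NAT rule goes through conntrack. A rough operator-level sketch, assuming an OVN release whose ovn-nbctl lr-nat-add supports the --stateless flag; the router name and addresses are made up, and under the proposal the ML2/OVN driver would manage these NAT rows itself based on the new config option.]

    # default, stateful FIP NAT rule (conntrack-backed)
    ovn-nbctl lr-nat-add neutron-router1 dnat_and_snat 172.24.4.10 10.0.0.5
    # stateless variant that skips conntrack - the behaviour being re-proposed here
    ovn-nbctl --stateless lr-nat-add neutron-router1 dnat_and_snat 172.24.4.10 10.0.0.5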
14:27:55 I agree, we can add it back as a configurable option
14:28:22 haleyb, at least, as far as I know, the tests done with this feature were not good
14:28:26 makes perfect sense for me to have 1:1 NAT rules be stateless, so +1 from me for being able to activate this
14:28:43 and actually they initially implemented that for HW offloaded envs
14:30:06 ralonsoh: ack. i think i'm fine with the change and config option, but just wanted to make sure we somehow explain where it's expected to work or not work, maybe the release note would be enough?
14:30:35 for sure I'll add a release note explaining the rationale of this implementation
14:30:42 since it is clear as mud to me how to get this improvement other than trying it out and doing perf testing
14:30:43 and where it could be beneficial
14:31:12 of course, anyone will be able to enable/disable (restarting the API) this feature and retest again
14:31:34 I'll add a maintenance method to change the existing FIPs to the configured value
14:31:48 so it will be just a matter of changing the parameter and testing
14:34:20 ralonsoh: and since you did mention a performance degradation initially, we should include that wording so it's not treated as a silver bullet
14:34:47 anyways, let's vote
14:34:51 sure, I'll add where this is actually a benefit (DPDK) and why
14:35:00 perfect
14:35:17 +1
14:35:32 +1 from me
14:35:57 +1
14:36:36 +1
14:37:39 ok, i believe that is quorum (4)
14:38:17 i will approve and add a link to the discussion
14:38:27 thanks folks
14:38:36 (please, skip my next point in the agenda)
14:38:52 ralonsoh: ack, can discuss in a future meeting
14:39:04 sure
14:39:27 next is a bug created by tore so glad he is here
14:39:31 #link https://bugs.launchpad.net/neutron/+bug/2111891
14:39:37 IPv6 Subnet-Router anycast address inappropriately used for gateway_ip when specifying --subnet-range
14:40:02 thanks for the invite, haleyb :)
14:40:03 slaweq mentioned making this change is a change in the API behaviour
14:41:20 so to my mind it makes no sense that the behaviour differs depending on whether --subnet-range is used or not, that seems very inconsistent, and just that alone is imho a bug
14:42:12 for what it is worth, for ipv4 "address one" is always used, both with and without --subnet-range being specified
14:42:35 tore: so my IPv6 knowledge in this area is a little rusty, but if that address is configured in the neutron router it cannot be used? not sure if i read that correctly
14:43:01 i thought all the routers could use the anycast address
14:43:11 haleyb: it cannot be used by VMs that are routers themselves, i.e., have enabled IPv6 forwarding, which I believe dockerd and such do automatically
14:43:14 but i do agree it's odd this one case is different from the others
14:43:33 ah, ok, vm deployed as a router
14:43:57 haleyb: if the VM enables the forwarding sysctl (is a router), it is required by the spec to consider the subnet-router "address zero" as a *local* address
14:44:20 meaning it is unusable as an ipv6 default gateway, since it is essentially a next-hop pointing to itself
14:44:32 for me this proposal sounds reasonable, I just wanted to have it discussed here as it will change the API behaviour which we have now
14:44:55 tore: understood, thanks for the explanation
14:45:45 note it will not be a problem if using ipv6_ra_mode=something, because then the default gateway is a fe80::x link-local address learned from RA, and the gateway_ip address isn't used for anything
14:46:37 but if ipv6_ra_mode is left at the default "none" (another bad default imho, but that's a different issue), the VM would need to get the default route from the metadata service or manual config, and in that case gateway_ip will be used as the next-hop
14:47:16 …and break the moment the VM enables the forwarding sysctl
14:48:04 slaweq: i can understand the API change, but it seems like as it's implemented it creates a broken object
14:48:21 i'm curious what the API says about the address used, if anything
14:49:32 "If the gateway_ip is not specified, OpenStack Networking allocates an address from the CIDR for the gateway for the subnet by default"
14:50:07 Ok, it took me a while to find it
14:50:07 https://review.opendev.org/q/I3f2905c2c4fca02406dfa3c801c166c14389ba41
14:50:14 so we don't actually say, although we do typically pick .1/::1
14:51:44 where in the RFC is this issue mentioned?
14:51:46 there is one exception to this problem, which is /127 p2p subnets where the subnet-router anycast address is disabled, cf. https://datatracker.ietf.org/doc/html/rfc6164. so for /127s "address zero" would be the appropriate choice
14:51:54 ralonsoh: thanks for finding that, i even +2'd it
14:51:56 (same thing for ipv4 /31s I guess)
14:53:07 ralonsoh: it just says you can use it - "This anycast address is syntactically the same as a unicast address for an interface on the link with the interface identifier set to zero."
14:53:45 it doesn't say you *should* use it, and maybe in practice it's not a good idea
14:54:12 https://blog.apnic.net/2023/08/28/behavioural-differences-of-ipv6-subnet-router-anycast-address-implementations/ Ctrl-F "strange behaviour"
14:54:44 but yes, we clearly single out IPv6
14:54:45 https://review.opendev.org/c/openstack/neutron/+/647484/10/neutron/db/ipam_backend_mixin.py
14:55:38 tore: and obviously, you can specify the ::1 address and it all works
14:56:34 yep, so this is about default behaviour (and the inconsistency between using --subnet-range and not)
14:58:42 it also seems conceivable that on a multi-node network where >0 VMs have forwarding enabled, this could be problematic for other VMs even with forwarding disabled, because the router VM(s) might answer Neighbour Solicitations for that address, effectively hijacking traffic
14:59:02 haven't tested this though. maybe portsec will prevent it
15:00:45 i don't know the best thing to do here, seems you've found a corner case, but it does change the API. I would be inclined to make it consistent, saying you can still choose :: manually, but have reservations
15:01:05 (sorry, have to logoff)
15:01:42 I think we can document it
15:01:43 i have a hard stop as well, any other opinions?
15:01:52 agreed, would be fine to explicitly request the ::0 address, but it's not a good default imho, regardless of --subnet-range being used
15:01:56 the user can always define the gw-ip when creating the subnet
15:02:15 at least let's document it
15:02:32 ralonsoh: change and document? or just document it could be an issue?
15:02:47 document that this could be an issue
15:02:52 I think we have a section for ipv6
15:02:59 (somewhere...)
15:03:12 https://docs.openstack.org/neutron/latest/admin/config-ipv6.html
15:03:22 I believe this one
15:03:27 correct
15:05:16 so i don't think we'll have consensus on changing to use ::1, so documenting would be the next best thing
15:05:39 I'm most concerned about the user behaviour here. users clicking through the web ui or using "openstack subnet create" shouldn't be handed a loaded gun pointing at themselves, imho
15:05:44 i'm not a fan of creating a broken config by default though
15:06:24 there's also the issue of the unexplained inconsistency between not using --subnet-range and using it. if you want to clean up that inconsistency, you'll need to change something anyway...
15:06:34 IMHO we can fix it in master, maybe not backport it to stable branches though
15:06:46 we can maybe continue the discussion in the bug or next week, but i have to go to my kids' school presentation
15:06:53 good idea
15:06:58 and highlight in the release notes that the default behaviour has changed in this release
15:07:03 I would be fine with slaweq's suggestion
15:07:13 +1 to slaweq
15:07:38 i really have to go so will end the meeting
15:07:40 #endmeeting
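[Note: until a default change or documentation lands, the workaround mentioned in the discussion ("the user can always define the gw-ip when creating the subnet") is to pass the gateway explicitly at subnet creation time, e.g. with the openstack CLI; the network/subnet names and prefix below are only examples.]

    openstack subnet create --network demo-net --ip-version 6 \
        --subnet-range 2001:db8::/64 --gateway 2001:db8::1 demo-v6-subnet
    # per the bug report, omitting --gateway while passing --subnet-range currently
    # yields the Subnet-Router anycast address (2001:db8::) as gateway_ip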