14:01:15 <ralonsoh> #startmeeting neutron_drivers
14:01:15 <opendevmeet> Meeting started Fri Jul 21 14:01:15 2023 UTC and is due to finish in 60 minutes. The chair is ralonsoh. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:15 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:15 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:01:17 <felixhuettner[m]> o/
14:01:18 <ralonsoh> hello all
14:01:21 <obondarev> hi
14:01:26 <slaweq> hi
14:01:30 <racosta> hi
14:01:37 <haleyb> hi
14:01:53 <ralonsoh> this is the agenda of the meeting: https://wiki.openstack.org/wiki/Meetings/NeutronDrivers
14:02:05 <ralonsoh> let's give 30 seconds more
14:02:52 <ralonsoh> ok, let's start
14:02:56 <ralonsoh> the topic we have is
14:03:04 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/2027742
14:03:09 <ralonsoh> [RFE] unmanaged dynamic router resources - OVN
14:03:12 <ralonsoh> racosta, please
14:03:27 <racosta> First of all, thank you very much for taking the time to discuss this RFE.
14:03:37 <racosta> This is an opensource project with global impact, and I try to contribute as much as possible with the development of the solution and with experience/test results.
14:03:44 <racosta> As I mentioned in the RFE information and mailing list messages, I don't know any other option that allows learning and reinjecting routes at the SDN level (BGP with default gw/static routes does not do the same thing). https://lists.openstack.org/pipermail/openstack-discuss/2023-July/034377.html
14:03:51 <racosta> Why is it important? -> high availability
14:04:01 <racosta> In the proposed RFE, all the basic resources are created by OpenStack Neutron, including the tenant router. What do I need to add for interconnect to work? the LRP in the TS network.
14:04:09 <racosta> The LRP of the tenant's router connects with the TS (this transit switch behaves like a provider), as shown in the figure: https://drive.google.com/file/d/1lBP7MdvukRlQIE1yWU3K3jWGqnAGGQd3/view?usp=sharing
14:04:20 <racosta> I tested an OVN interconnect integrated with 3 OpenStack installations and it worked very well. The TS is not known to Neutron and nothing happens to it. The point here is the "learned" Static Routes and LRPs (used to connect with the TS).
14:04:34 <racosta> I understand the arguments about security that Rodolfo commented on the thread, but in this case, this LRP and these routes are created solely and exclusively in the tenant's network (having no overlap or problems with other tenants' subnets).
14:04:56 <racosta> Why should we make db_sync flexible? Even if other ways of managing TS are applied in Neutron, the static routes (which are the main point for the interconnect to work) would have to be made more flexible, and the tenant router would need to skip static routes learned from db_sync, for example. There are other points here about adding TS support as a network; scalability is the biggest one.
14:05:09 <racosta> While it is possible to create the TS in Neutron itself (with a new RFE), this would mean having to add a new router in the tenant project and create static routes for that project. Don't you have problems with scaling on OpenStack? I mean, Neutron's networking backend does not have infinite resources.
14:05:23 <racosta> The big problem with this approach is that with large scale deployments (thousands of tenants), the SDN can reach the limit and we will face timeout problems. The most classic I've faced with OpenStack Ussuri (which has lower limits) is "network-vif-plugged", caused by the OVN's delay in transitioning the Port events from chassis to the OVSDB Southbound and consequently Nova ends up failing - (even with configured network-vif-plugged timeouts of 300 seconds).
14:06:16 <racosta> what do you think about this RFE?
14:07:13 <ralonsoh> first of all, is everyone aware of the goal of this RFE? Do you have questions about it?
14:08:04 <slaweq> I read today the thread on the ML
14:08:35 <slaweq> IIUC it's about interconnecting 2 different OpenStack clusters
14:08:43 <slaweq> using L3
14:08:53 <slaweq> is that correct?
14:08:57 <racosta> Yep, two or more OpenStack clusters.
14:09:02 <obondarev> not only OpenStack
14:09:06 <mlavalle> that's my understanding as well
14:09:09 <ralonsoh> using OVN IC, that creates an IC Transit Switch and requires OVN routers
14:09:15 <slaweq> ok
14:09:18 <mlavalle> and yes, connect more than OpenStack
14:09:20 <obondarev> also connect openstack to smth else, right?
14:09:32 <slaweq> and this reminded me of old things which we had proposed a long time ago:
14:09:38 <racosta> Yes, I tested it integrated with ovn-kube.
14:09:41 <slaweq> Old spec https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html
14:09:54 <slaweq> old API-REF https://review.opendev.org/c/openstack/neutron-lib/+/626871 - was later removed with https://review.opendev.org/c/openstack/neutron-lib/+/626871 when we archived this stadium project,
14:10:09 <slaweq> and stadium project https://github.com/openstack-archive/neutron-interconnection/tree/1c5fbf56ff05f503745f7de7041f3e3e258d4f73
14:10:30 <slaweq> did You maybe look at that spec and API? Maybe we could somehow base on that to implement it with ovn-ic now?
14:10:34 <felixhuettner[m]> if i got it correctly the point of the RFE here is to not interfere with the resources created by/for OVN-IC inside the local ovn cluster
14:10:53 <mlavalle> I have questions regarding ovn-kube: has this approach already been implemented with ovn-kube?
14:10:55 <felixhuettner[m]> i did not understand that ic should be integrated in neutron (or maybe just as a future topic)
14:11:03 <racosta> I saw that slaweq but it wasn't continued.
14:11:45 <slaweq> racosta yes, it was proposed by guys from Orange but later they stopped working on it, at least u/s and there was nobody interested in that later
14:12:31 <racosta> That's the point Felix, the goal is for Neutron not to interfere with the ovn-ic.
14:12:51 <slaweq> I'm just mentioning it now, as maybe it would be useful - making it somehow part of neutron would mean that neutron will know about those resources so there may not be an interference problem anymore :)
14:13:40 <felixhuettner[m]> that might also be an option (and would even make it more easily useable)
14:13:43 <racosta> mlavalle, to integrate with ovn-kube I used this doc: https://github.com/kubeovn/kube-ovn/blob/v1.11.0/docs/cluster-interconnection.md
14:13:59 <felixhuettner[m]> although i would only see the "usage" of this transit switch within neutron
14:14:04 <felixhuettner[m]> not the management of it
14:14:14 <felixhuettner[m]> otherwise we have one neutron ruling over other neutrons
14:14:30 <felixhuettner[m]> which at least in our case is not what we want
14:15:07 <ralonsoh> how is that? each openstack cloud will create its resources (TS, routers, etc)
14:15:28 <felixhuettner[m]> no the transit switch is owned by ovn-interconnect (in a separate database)
14:15:32 <racosta> Yes Felix, remote LSP learned via ovn-ic should not be managed by Neutron.
14:15:42 <felixhuettner[m]> and then "instantiated" in each individual ovn deployment
14:15:55 <felixhuettner[m]> and i meant the transit switch within the ic databases
14:16:23 <ralonsoh> hold on, this IC belongs to another OVN deployment?
14:16:42 <ralonsoh> the IC database belongs to each cloud/OVN cluster
14:16:53 <slaweq> I think I need to read more about ovn-ic because I never knew about this :) Because of that I may say totally stupid things regarding this old spec - sorry for that
14:16:55 <ralonsoh> each one has its own IC NB/IC SB database
14:17:01 <felixhuettner[m]> no
14:17:07 <felixhuettner[m]> there is one central IC NB+SB
14:17:19 <felixhuettner[m]> and all OVN deployments have their individual normal NB+SB
14:17:32 <felixhuettner[m]> and the ovn-ic daemon connects to the IC-NB+IC-SB and the normal NB+SB
14:17:52 <felixhuettner[m]> and these IC NB+SB contain the information about the existence of this transit switch and who is connected there
14:18:07 <ralonsoh> right right, that's correct
14:18:20 <racosta> no worries slaweq, I had already seen it and even opened a thread in March mentioning this spec, but the case here is a little different.
14:18:37 <ralonsoh> so let's go to the NB resources case, what is the problem here
14:18:42 <slaweq> racosta now I see :)
14:19:06 <felixhuettner[m]> so the ovn-ic daemon will sync information from the IC-NB+IC-SB to the normal NB
14:19:16 <felixhuettner[m]> this includes the transit switch (a logical switch)
14:19:31 <felixhuettner[m]> the routes to other routes (logical router static routes)
14:19:51 <felixhuettner[m]> and the ports of other devices on the transit switch (logical switch ports)
14:20:12 <racosta> local and the remote LSP's on the transit switch
14:20:15 <ralonsoh> and can these resources be tagged somehow from the IC controller?
14:20:25 <ralonsoh> I mean the local NB resources
14:20:35 <felixhuettner[m]> they are tagged iirc
14:20:45 <felixhuettner[m]> let me check
14:20:47 <ralonsoh> so we have a way to recognize them, right
14:21:22 <felixhuettner[m]> so logical switches have other_config:interconn-ts=***
14:21:38 <ralonsoh> and routers?
14:21:41 <racosta> If they are not natively tagged, you can add an interconnect tag
14:21:57 <ralonsoh> NAT rules in OVN
14:22:15 <racosta> The routers are Neutron Native router...
14:22:20 <ralonsoh> no
14:22:21 <ralonsoh> no
14:22:28 <felixhuettner[m]> static routes have external_ids:ic-learned-route=xxx
14:22:38 <ralonsoh> you said that you DON'T want to do resources from Neutron
14:22:52 <ralonsoh> felixhuettner[m], ok
14:23:07 <ralonsoh> and the problem is the learned static routes, right?
14:23:15 <felixhuettner[m]> and the logical switch ports
14:23:19 <ralonsoh> that you don't have a way to correlate to the routers
14:23:20 <felixhuettner[m]> and i think they have type=remote
14:23:23 <felixhuettner[m]> but i'm not sure
14:23:54 <felixhuettner[m]> but i guess we could exclude all resources with these tags from the sync
14:24:16 <racosta> Of course we can correlate ralonsoh, the LRP connected to the TS is linked with the Neutron router.
14:24:21 <ralonsoh> that's the point, if you can correlate all of them, you can create these rules in the sync tool
14:24:37 <felixhuettner[m]> +1
14:24:53 <ralonsoh> racosta, ok, I've been trying to have a reply for this question for a whole week
14:24:59 <ralonsoh> let me ask it again
14:25:18 <ralonsoh> you are proposing a strategy here where the resources are NOT created in Neutron
14:25:21 <ralonsoh> right?
14:26:44 <racosta> Yes, interconnect LRP only (and learned static routes). Everything else is from Neutron DB and will be linked with TS via LRP
14:28:08 <racosta> We can define a tag to identify this LRP and handle it in the sync tool.
14:28:25 <ralonsoh> ok, I have no idea what you are proposing, at all. I'm totally disoriented
14:28:35 <ralonsoh> I think you are mixing Neutron DB and OVN DB
14:28:53 <ralonsoh> in any case, at this point I'll stop writing and let other people collaborate
14:29:29 <felixhuettner[m]> aah, i missed that connection
14:29:36 <felixhuettner[m]> so we were above at
14:29:48 <felixhuettner[m]> there is a transit switch replicated to the normal northbound
14:29:59 <felixhuettner[m]> and there is a normal router created by neutron in the standard way
14:30:21 <ralonsoh> ^ that's what I was expecting, finally
14:30:28 <ralonsoh> so this must be very clear in the spec
14:30:37 <felixhuettner[m]> and the needed LRP and LSP port to connect the router to the transit switch was still missing
14:30:42 <ralonsoh> what is created from the IC controller
14:30:46 <felixhuettner[m]> and that is not created by ovn-ic
14:30:46 <ralonsoh> and what manually in Neutron
14:30:57 <felixhuettner[m]> and not created by neutron
14:31:04 <felixhuettner[m]> and i think this causes all this confusion
14:31:08 <ralonsoh> yes
14:31:13 <ralonsoh> ok, I'll stop now
14:31:14 <racosta> yep. TS - dynamic ovn managed. Logical_Router - Neutron managed. We need to connect this.
14:31:27 <slaweq> ralonsoh speaking about spec - I think that this needs spec with well described problem statement and proposed solution (maybe with examples)
14:31:44 <mlavalle> and a nice set of diagrams
14:31:50 <ralonsoh> +1
14:31:52 <slaweq> mlavalle++
14:32:02 <haleyb> yes, i like pictures too
14:32:18 <felixhuettner[m]> i agree as well, the amount of confusion it has created is quite large already :)
14:32:19 <obondarev> so after all, what are neutron changes? Are there a lot?
14:32:22 <mlavalle> depicting the components and who is responsible of managing each (i.e. OVN-IC vs Neutron)
14:33:15 <racosta> I like too haleyb, I tried to put it but the openstack list blocks images...
14:33:43 <obondarev> my impression was that the only change is to skip some resources (not delete) during db sync, is it correct?
14:34:01 <felixhuettner[m]> yes (from my perspective)
14:34:09 <felixhuettner[m]> just defining which is not easy
14:34:12 <mlavalle> what is really positive about this proposal is that we have a feature in the underlying sdn backend that we are not leveraging in Neutron and we should explore how to do it, especially since other CMSs are already doing it
14:34:24 <felixhuettner[m]> racosta: should we work together on this spec?
14:34:46 <felixhuettner[m]> (although i'll be on vacation soonish)
14:35:08 <ralonsoh> just a heads-up: any spec will be approved for the next release (C)
14:35:10 <racosta> Of course Felix, that would be nice.
14:35:13 <haleyb> from reading the bug and listening here, i'm still not sure how this IC thing learns routes, etc - is it just sync or getting neutron events? how does it keep up to date? just something to make clear
14:35:58 <felixhuettner[m]> it reads the routes from one cluster's NB and puts it to the central IC-SB
14:36:10 <felixhuettner[m]> then the other cluster takes it from the IC-SB and adds it to the cluster's NB
14:37:41 <racosta> Yes, there is an ovn-ic daemon that monitors the NB DB and replicates to other elements of the interconnect domain (another OpenStack, for example) - but this route "learned" is limited to the scope of TS.
14:38:58 <haleyb> ok, so it's watching OVN events after an initial sync
14:39:12 <felixhuettner[m]> +1
14:39:15 <racosta> In practice, a subnet behind an Openstack router can communicate to another subnet behind a router on another OpenStack (using L3).
14:40:18 <ralonsoh> (I would expect a good documentation of how to create this IC between clusters)
14:40:23 <ralonsoh> any other question?
14:40:41 <ralonsoh> so let's vote for this RFE
14:40:51 <ralonsoh> +1 (plus a spec)
14:41:29 <ralonsoh> folks?
14:41:35 <slaweq> +1 for RFE as I think that's a great thing to have the possibility to interconnect different clusters together
14:41:54 <mlavalle> let's go for a spec
14:42:01 <mlavalle> +1
14:42:12 <ralonsoh> obondarev, ?
14:42:16 <mlavalle> the good news is that we have time to be thorough with it
14:42:20 <obondarev> +1
14:42:33 <ralonsoh> so perfect, the RFE is approved. I'll comment that in the LP bug
14:42:34 <haleyb> +1
14:42:37 <ralonsoh> sorry
14:42:43 <ralonsoh> haleyb, I missed you!
14:42:44 <ralonsoh> sorry
14:42:57 <haleyb> i'll go back to my corner :)
14:43:02 <ralonsoh> my bad...
14:43:13 <ralonsoh> anything else you want to comment?
14:43:27 <haleyb> no, will wait for spec
14:43:42 <ralonsoh> thank you all for attending this meeting. Have a nice weekend
14:43:55 <ralonsoh> (I'll be on PTO next week, just a heads-up)
14:43:59 <felixhuettner[m]> thank you, have a nice weekend
14:44:01 <obondarev> o/
14:44:02 <ralonsoh> #endmeeting
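
Note appended to the log: the idea discussed above, having the db sync skip OVN NB rows that ovn-ic owns, could look roughly like the sketch below. It is only an illustration and not Neutron code. Rows are modeled as plain dicts, and the function name should_skip_in_sync is hypothetical. The other_config:interconn-ts and external_ids:ic-learned-route keys are the ones quoted in the meeting; the type=remote check for logical switch ports is an assumption, since it was mentioned as "i think ... but i'm not sure".

```python
from typing import Any, Mapping

# Keys quoted in the meeting; they are written by ovn-ic, not by Neutron.
IC_SWITCH_KEY = "interconn-ts"     # seen in Logical_Switch other_config
IC_ROUTE_KEY = "ic-learned-route"  # seen in static route external_ids


def should_skip_in_sync(table: str, row: Mapping[str, Any]) -> bool:
    """Return True if a sync pass should leave this NB row untouched.

    Hypothetical helper: `row` is a plain dict standing in for an OVN NB row.
    """
    if table == "Logical_Switch":
        # Transit switches replicated into the local NB by ovn-ic.
        return IC_SWITCH_KEY in row.get("other_config", {})
    if table == "Logical_Router_Static_Route":
        # Routes learned from other clusters through the transit switch.
        return IC_ROUTE_KEY in row.get("external_ids", {})
    if table == "Logical_Switch_Port":
        # Assumption: remote ports carry type=remote (unconfirmed in the log).
        return row.get("type") == "remote"
    return False
```

The LRP/LSP pair that connects the Neutron router to the transit switch is not covered here: per the discussion it is created by neither ovn-ic nor Neutron, so it would need its own tag, which the spec is expected to define.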