14:00:58 <haleyb> #startmeeting neutron_drivers
14:00:58 <opendevmeet> Meeting started Fri Dec 6 14:00:58 2024 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:58 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:58 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:01:01 <haleyb> Ping list: ykarel, mlavalle, mtomaska, slaweq, obondarev, tobias-urdin, lajoskatona, amotoki, haleyb, ralonsoh
14:01:06 <mlavalle> \o
14:01:11 <ralonsoh> hello
14:01:17 <slaweq> o/
14:01:19 <s3rj1k> hi all
14:02:11 <haleyb> alright, I see 4 items in the agenda, let's get started
14:02:18 <haleyb> ralonsoh: the first two are yours
14:02:28 <ralonsoh> I'm opening the agenda
14:02:38 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/2084782
14:02:43 <ralonsoh> [OVN] VLAN/flat LRP should be updated when the router networks are updated
14:03:24 <lajoskatona> o/
14:03:26 <ralonsoh> summarizing: the problem here is that the functionality right now doesn't work if we update the networks attached to a router
14:03:51 <ralonsoh> if we have mixed type networks (tunnelled and not tunnelled), the flat/vlan ones need to be centralized
14:04:04 <ralonsoh> due to issues in the MTU
14:04:26 <ralonsoh> so the point is that we have been fixing/reverting this functionality several times
14:04:32 <ralonsoh> my suggestion:
14:04:43 <ralonsoh> 1) always make the vlan/flat network traffic centralized
14:05:07 <ralonsoh> 2) propose a new feature to allow distributed vlan/flat traffic for routers with only this kind of networks
14:05:17 <ralonsoh> (a kind of network type segregation)
14:05:28 <ralonsoh> --> https://bugs.launchpad.net/neutron/+bug/2084782/comments/1
14:05:31 <obondarev> late o/
14:06:01 <ralonsoh> for (1) --> https://review.opendev.org/c/openstack/neutron/+/935652
14:06:25 <ralonsoh> so opinions here?
14:07:12 <lajoskatona> what does centralized mean in this case with OVN?
14:07:24 <ralonsoh> ovn allows distributed FIPs
14:07:39 <ralonsoh> that won't work for flat/vlan networks connected to a router
14:07:55 <ralonsoh> all traffic will cross the GW logical router port
14:08:05 <lajoskatona> ahh, ok, thanks
14:08:53 <slaweq> so in fact we would make the config option "enable_distributed_floating_ips" have no effect on the deployment, right?
14:09:05 <slaweq> for the tenant networks of type vlan/flat
14:09:09 <ralonsoh> for vlan/flat tenant networks
14:09:12 <ralonsoh> yes
14:09:15 <slaweq> for geneve networks it will still be distributed
14:09:18 <ralonsoh> yes
14:09:55 <slaweq> so basically it is a revert of all the attempts to make vlan tenant networks distributed
14:09:58 <ralonsoh> yes
14:10:15 <ralonsoh> and (2) make this possible but only for these networks connected to a router
14:10:21 <ralonsoh> not router type mixing
14:10:28 <ralonsoh> right now this is a problem in OVN
14:10:41 <ralonsoh> not network type mixing*
14:10:48 <slaweq> and you know my opinion - I am ok with this as I know there is no other option and basically now we have this functionality broken in some cases
14:10:58 <mlavalle> ah ok, so we are not ruling out distributed fips with vlan networks
14:11:00 <slaweq> and there's no way to fix it for all use cases
14:11:08 <opendevreview> Stanislav Dmitriev proposed openstack/neutron master: HA VRRP health check parameters https://review.opendev.org/c/openstack/neutron/+/932716
14:11:10 <mlavalle> it's just the mixing that we want to limit
14:11:32 <ralonsoh> yes but we still allow that, but if we mix them
14:11:32 <lajoskatona> if I understand well the 2 options there will be some limitation anyway so have to choose which one :-)
14:11:40 <ralonsoh> 1) tunnelled will have DVR
14:11:45 <ralonsoh> 2) flat/vlan won't
14:11:49 <mlavalle> yes, there's a limit in the proposal
14:12:07 <ralonsoh> so we are not going to prevent mixing networks
14:12:16 <ralonsoh> just disabling dvr for vlan
14:12:27 <mlavalle> right, but...
14:12:53 <mlavalle> if there are no tunneled networks, 2 will allow dvr for vlans, right?
14:13:01 <slaweq> lajoskatona: yes, I see it like choose between: 1) mix network types in routers as you want but traffic to/from vlan tenant networks will be going through chassis gw or 2) vlan tenant networks traffic will be distributed but you can't mix networks of different types in the same router
14:13:12 <ralonsoh> mlavalle, no
14:13:19 <ralonsoh> this is proposal (2)
14:13:24 <slaweq> depending on the use case, the operator will be able to choose what they need
14:13:29 <mlavalle> yes, that's what I meant
14:13:33 <slaweq> at least that's my understanding of the proposal
14:13:41 <ralonsoh> and this is because if in the router you add a 3rd network, that is tunnelled, the vlan traffic will be broken
14:15:34 <ralonsoh> mlavalle, right, I didn't read it correctly
14:16:02 <ralonsoh> maybe, with future OVN improvements, we'll be able to avoid these limitations
14:17:22 <opendevreview> Merged x/whitebox-neutron-tempest-plugin master: Bump hacking https://review.opendev.org/c/x/whitebox-neutron-tempest-plugin/+/936943
14:17:33 <mlavalle> is there a way we can fix this at the OVN level instead of at neutron's?
14:17:58 <ralonsoh> yes but there are other limitations, for example with QoS
14:18:00 <mlavalle> would a conversation with the ovn core team help?
14:18:29 <ralonsoh> there is some kind of fix but that removes the qos capability for vlan networks
14:18:37 <ralonsoh> so the solution introduces another issue
14:18:43 <haleyb> so what will be the impact to a tenant (besides it actually working correctly), just a possible performance penalty for centralizing provider traffic?
14:19:02 <haleyb> o
14:19:06 <ralonsoh> right now we have this limitation, with mixed networks
14:19:28 <ralonsoh> if we also implement (2) for vlan networks, this limitation will be gone
14:19:44 <ralonsoh> (but, of course, using only vlan networks in the router, not mixing)
14:20:04 <ralonsoh> right now we can't have everything
14:20:44 <haleyb> right, but the use case of having both is probably pretty small, so we can live with it
14:20:57 <ralonsoh> yes, that's the point
14:21:06 <ralonsoh> usually tenant networks are geneve
14:21:13 <ralonsoh> (that work much better in ovn)
14:21:32 <mlavalle> that's a valid consideration
14:22:09 <ralonsoh> if this is accepted, of course I'll document everything crystal clear
14:22:55 <lajoskatona> that is necessary for sure
14:23:07 <haleyb> with pictures or ascii art :) j/k doc is the key
14:23:24 <obondarev> "but you can't mix networks of different types in the same router" what does "you can't" mean? Is an error raised on an attempt to connect a different type of network to a router?
14:24:00 <ralonsoh> if (2) is implemented and you add vlan networks to a router
14:24:10 <ralonsoh> that will be configurable, in order to have DVR with vlan
14:24:24 <ralonsoh> but that will imply that you won't be able to mix network types
14:24:39 <ralonsoh> but this will be configurable (same as DVR, for example)
14:24:51 <mlavalle> would that be for the entire deployment or per router?
14:24:54 <haleyb> true, is that an API-type change? or ?
14:24:56 <ralonsoh> per router
14:25:12 <ralonsoh> haleyb, I don't think we can implement that at API level
14:25:12 <obondarev> yeah I'm just thinking about the upgrade scenario, when a user already has mixed type nets in a router
14:25:18 <mlavalle> seems less restrictive then
14:25:20 <ralonsoh> this is on the server, at run time
14:25:40 <ralonsoh> obondarev, by default, this config option will be disabled so the upgrade will be possible
14:25:53 <obondarev> ack
14:26:21 <ralonsoh> folks, I'm going to propose a spec
14:26:26 <haleyb> ralonsoh: oh, per-router to me implied you can change it via the API
14:26:28 <ralonsoh> this kind of change deserves it
14:26:42 <slaweq> obondarev we can add some upgrade check to warn the user about it during upgrade
14:26:43 <ralonsoh> it will be much better to describe everything there
14:26:56 <lajoskatona> +1 for spec
14:27:00 <ralonsoh> including upgrades or any other consideration
14:27:06 <ralonsoh> I'll add the RFE tag to the bug
14:27:18 <obondarev> +1
14:27:22 <haleyb> anyways, we should vote, and i was going to mention the RFE tag
14:27:34 <haleyb> +1
14:28:05 <mlavalle> +1
14:28:08 <lajoskatona> +1 for the whole idea and proposing a solution for it
14:28:27 <mlavalle> yes, my +1 assumes we implement (2)
14:28:33 <ralonsoh> thanks folks, I'll propose this spec next week
14:28:57 <haleyb> alright, great i marked it approved
14:29:03 <haleyb> you also had the next item
14:29:09 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/2032817
14:29:27 <ralonsoh> issue: in a router, we handle ext MTU < int MTU
14:29:39 <ralonsoh> but not the other way: ext MTU > int MTU
14:29:49 <ralonsoh> this last case is rare but possible (there we have the bug)
14:30:11 <ralonsoh> we have a flag in the GW LRP: options.gateway_mtu, that is set in the first case
14:30:31 <ralonsoh> this sends a frag message to the VM to fragment the traffic, and works fine
14:30:53 <ralonsoh> the proposal: to ALWAYS set this flag, regardless of the MTU relation
14:31:19 <ralonsoh> that means, in a router with several networks and MTUs, the minimum MTU will define this value
14:31:30 <ralonsoh> that makes sense if we want all networks to communicate
14:31:39 <ralonsoh> this is, of course, a limitation in the performance
14:31:57 <ralonsoh> but if you connect a network to a router you'll take this into consideration
14:32:34 <ralonsoh> and, something else to be considered, this is configurable: there is a neutron configuration flag that controls whether this flag is written or not
14:32:46 <ralonsoh> ovn_emit_need_to_frag
14:33:09 <ralonsoh> better than this, the POC patch
14:33:16 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/937026
14:33:27 <ralonsoh> (code speaks better than me)
14:33:59 <haleyb> ok, so we will only set the mtu if the config option is set
14:34:11 <ralonsoh> yes (this is True by default)
14:34:24 <ralonsoh> and makes sense if we don't want issues with different MTUs
14:34:35 <ralonsoh> this is just a heads-up because it is an important change in the GW LRP
14:34:47 <ralonsoh> so any comment can be made in the patch ^^^
14:34:59 <ralonsoh> thanks for listening (there are other topics in the agenda)
14:35:33 <lajoskatona> thanks ralonsoh, makes the review easier
14:35:38 <ralonsoh> thanks!
14:35:40 <mlavalle> +1
14:35:42 <haleyb> ralonsoh: i agree with fixing, might need to clean-up gaps document where we added this, will put in review
14:37:06 <haleyb> not sure we have to vote since it's technically a bug fix, but i'm +1 obviously
14:37:14 <ralonsoh> no no, just a heads-up
14:37:20 <ralonsoh> any review in the patch, thanks!!
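The gateway MTU rule ralonsoh describes above (always write options.gateway_mtu, using the smallest MTU among the router's networks, unless ovn_emit_need_to_frag is disabled) can be sketched roughly as below. This is only an illustration of the rule as stated in the meeting, not the code in the POC patch: the helper name gateway_lrp_mtu_options is hypothetical, the list of MTUs is assumed to cover every network attached to the router, and storing the value as a string is an assumption about how the option map is written.

```python
from typing import Dict, Iterable


def gateway_lrp_mtu_options(network_mtus: Iterable[int],
                            emit_need_to_frag: bool = True) -> Dict[str, str]:
    """Hypothetical helper: build the MTU-related options for a gateway LRP.

    network_mtus: MTUs of the networks attached to the router (assumption:
    the caller collects them; this sketch does not query Neutron or OVN).
    emit_need_to_frag: stands in for the ovn_emit_need_to_frag config flag.
    """
    mtus = list(network_mtus)
    # When the flag is off, or there is nothing to compare, write nothing.
    if not emit_need_to_frag or not mtus:
        return {}
    # The smallest MTU defines the value, regardless of which side is larger.
    return {"gateway_mtu": str(min(mtus))}


# Example: three networks on one router, the 1442 one sets the limit.
options = gateway_lrp_mtu_options([1500, 1442, 9000])
```

With the flag disabled the helper returns an empty mapping, which mirrors haleyb's summary that the MTU is only set when the config option is set.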
14:37:35 <opendevreview> Dmitriy Rabotyagov proposed openstack/neutron master: [OVN] Do not supply gateway_port if it's not binded to chassis https://review.opendev.org/c/openstack/neutron/+/931495
14:37:39 <mlavalle> I just typed +1 to agree that the clarification makes the reviews easier
14:37:52 <haleyb> ok, we have two more items
14:38:03 <haleyb> s3rj1k: since i noticed you here, you can go next
14:38:30 <haleyb> #link https://review.opendev.org/c/openstack/neutron-specs/+/935724
14:38:51 <s3rj1k> yea, so this is a continuation of the previous meeting
14:39:10 <mlavalle> I thought so
14:39:11 <s3rj1k> same toping, just prepared a spec for it
14:39:16 <s3rj1k> *topic
14:39:39 <ralonsoh> I'll check it next week, for sure
14:39:51 <ralonsoh> I think we already voted for this one
14:39:56 <mlavalle> we did
14:40:17 <s3rj1k> yea, just highlighting this as needs review
14:40:24 <ralonsoh> for sure, thanks!
14:40:27 <mlavalle> thanks for the reminder
14:40:30 <s3rj1k> we can continue on next items
14:41:52 <haleyb> ok, next one was from Amir, not sure of his nick
14:41:57 <ralonsoh> amnik,
14:42:18 <amnik> Yes I'm here
14:42:25 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2089726
14:42:58 <amnik> I get your points in the review. I want to share with you what I think now.
14:43:45 <amnik> I also agree that it is not reasonable to implement parallelism in ovsdbapp
14:44:06 <amnik> Because it affects all other projects that use ovsdbapp
14:44:33 <amnik> but in OVN agent I think order of events does not matter
14:44:51 <opendevreview> Serhii Ivanov proposed openstack/neutron-specs master: Proposes `Agent Startup State Tracking` https://review.opendev.org/c/openstack/neutron-specs/+/935724
14:45:10 <amnik> so maybe we can implement parallelism in OVN agent
14:45:37 <amnik> in OVN agent we always call provision_datapath in run function
14:45:44 <noonedeadpunk> but comment for port DOWN/UP was a good example which was applicable for OVN
14:45:56 <noonedeadpunk> in which state will the port end up if it's done in parallel?
14:46:42 <ralonsoh> noonedeadpunk, you can receive, as we are doing now in the CI, the port up and down events almost at the same time
14:46:46 <amnik> and in provision_datapath we get the latest change of database independent of events
14:47:21 <noonedeadpunk> ralonsoh: yes, but you would expect order to matter, right? so you can't process these in parallel I assume?
14:47:29 <ralonsoh> amnik, I'm totally ok with any improvement in the datapath provisioning. We had problems in the past because of this with beefy servers
14:47:44 <ralonsoh> noonedeadpunk, that was my point in the comment in the patch
14:47:48 <noonedeadpunk> yeah
14:47:51 <ralonsoh> amnik, but you need to consider:
14:47:57 <noonedeadpunk> I was just +1 it effectively here :)
14:48:14 <ralonsoh> 1) the datapath will be dependent on a network
14:48:39 <ralonsoh> 2) you can't process provisioning of a port in a datapath if another call is also doing the same for the same datapath
14:49:00 <ralonsoh> 3) creating threads in python to execute just python code doesn't improve performance
14:49:25 <ralonsoh> that's all
14:50:25 <lajoskatona> if we add parallelism for these low level operations do we gain anything or just have the extra headache with the complexity?
14:50:47 <amnik> can you please explain the third one more. If we provision the datapath in multiple threads, does that not improve performance?
14:51:06 <slaweq> lajoskatona: and new set of problems to debug in CI jobs 🙂
14:51:09 <ralonsoh> amnik, no, you are executing python code
14:51:34 <lajoskatona> and debug customers :-)
14:51:40 <lajoskatona> I mean their deployments
14:51:52 <ralonsoh> lajoskatona, the only parts of this code that can maybe be executed in parallel are the pyroute calls to create/set the devices
14:52:17 <noonedeadpunk> from what I saw, the main performance issue in neutron replies was actually the way policies are being processed...
14:52:30 <ralonsoh> if we really want to improve this, we should refactor the code and create a dispatcher for several workers (processes) attending one datapath at the same time
14:52:40 <ralonsoh> so you can receive events and send the operations to these workers
14:53:04 <amnik> ralonsoh: I think creating the ovnmeta namespaces in parallel could help
14:53:09 <ralonsoh> to prevent issues with parallel executions in the same datapath, only one worker should attend each datapath
14:53:19 <ralonsoh> amnik, that takes 0.1 seconds
14:54:01 <amnik> ralonsoh: I understand.
14:54:56 <haleyb> ralonsoh: that's what the l3-agent does with its work queues, indexed by network if my memory is good, but the GIL gets in the way
14:55:05 <ralonsoh> so, again, I'll help and actively participate in a good improvement design for the OVN agent
14:55:21 <ralonsoh> haleyb, kind of but the l3 agent and dhcp agent use threads
14:55:29 <ralonsoh> and right now I lowered the working threads to 1
14:55:47 <ralonsoh> and there is no performance impact (and much better debugging and fewer issues)
14:56:07 <ralonsoh> so if we want to do this, we need to create a server and spawn processes that will attend the requests
14:56:24 <haleyb> yes, understood
14:56:25 <ralonsoh> that is a HUGE refactor but I'll actively collaborate on this
14:57:39 <amnik> ralonsoh: thanks, I will review your idea in this meeting. and propose RFE if I can help.
14:57:48 <ralonsoh> perfect!
14:58:52 <lajoskatona> don't we need a spec for such a refactor?
14:59:04 <haleyb> ok, so based on that i will reject this current RFE
14:59:07 <ralonsoh> we need first an idea hehehe
14:59:38 <lajoskatona> ack
14:59:40 <haleyb> and we need data showing it helps
15:00:14 <haleyb> ok, we are out of time
15:00:23 <slaweq> +1 for having some data which will show what is actually improved and by how much
15:00:39 <amnik> Thanks all. I'll check it more. and contribute if I can help.
15:00:51 <haleyb> thanks everyone for attending
15:00:54 <haleyb> #endmeeting
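The dispatcher idea from the last topic (receive events, hand the operations to several workers, and let only one worker attend each datapath) could look roughly like the routing sketch below. Nothing here is taken from the OVN agent: worker_for and dispatch are hypothetical names, plain lists stand in for the per-worker queues, and the workers themselves are left out. In a real design each queue would presumably feed a separate process, since the discussion notes that threads running pure Python code do not help.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str]  # (datapath_id, operation), both illustrative


def worker_for(datapath_id: str, num_workers: int) -> int:
    """Map a datapath to one fixed worker index.

    crc32 is used instead of hash() because str hashes are randomized
    per Python process, and the mapping must be stable.
    """
    return zlib.crc32(datapath_id.encode("utf-8")) % num_workers


def dispatch(events: Iterable[Event],
             num_workers: int) -> Dict[int, List[Event]]:
    """Route events to per-worker queues, preserving arrival order.

    Every event of a given datapath lands in the same queue, so two
    operations on one datapath are never handled concurrently and a
    port up/down pair keeps its original order.
    """
    queues: Dict[int, List[Event]] = defaultdict(list)
    for datapath_id, operation in events:
        queues[worker_for(datapath_id, num_workers)].append(
            (datapath_id, operation))
    return dict(queues)


events = [("dp-a", "port up"), ("dp-b", "port up"), ("dp-a", "port down")]
queues = dispatch(events, num_workers=4)
```

Different datapaths may still share a worker when there are more datapaths than workers. That keeps the ordering guarantee and only limits how much runs in parallel, which is why the data haleyb and slaweq ask for would be needed to size such a pool.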