14:00:15 #startmeeting neutron_drivers
14:00:15 Meeting started Fri Jun 24 14:00:15 2022 UTC and is due to finish in 60 minutes. The chair is lajoskatona. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:15 The meeting name has been set to 'neutron_drivers'
14:00:17 o/
14:00:33 hi
14:00:42 o/
14:00:43 o/
14:01:13 hi
14:01:20 haleyb can't join today, and I think mlavalle is also ooo today (this week?)
14:01:28 yes
14:01:32 hi, but I was on PTO today so I did not check the topics in advance....
14:01:40 hi
14:02:06 amotoki: thanks for coming
14:02:21 amu
14:02:38 sorry, wrong window
14:02:45 I think we can start
14:02:47 lajoskatona: np. will try to catch up on the discussions :)
14:03:23 I tried to put our topics in time order, as we skipped a few meetings, and to avoid long-hanging topics
14:03:31 The first one is:
14:03:36 [RFE] Improve performance of bulk port create/delete with networking-generic-switch (#link https://bugs.launchpad.net/neutron/+bug/1976270)
14:04:52 there are 2 proposals in the RFE:
14:05:06 add a new "bind_port_bulk()" interface between Neutron and mechanism drivers
14:05:17 or add a way to bind ports asynchronously
14:06:29 can I ask what "bind ports asynchronously" means?
14:06:29 just a thought: maybe several API calls with smaller port chunks would perform better than bulk creation in this case?
14:06:53 meaning to use neutron server threads and processes
14:07:27 obondarev most likely, as then You would parallelize port creation :)
14:07:33 and binding
14:07:39 right
14:08:17 but on the other hand, I agree with the RFE that if we have bulk port creation, it should be the recommended and faster way to create many similar ports, rather than making many API calls
14:08:18 but this is more burden at a higher layer of course (the API caller/orchestrator)
14:08:22 IIUC the networking-generic-switch driver provisions hardware switches, so I am not sure parallelization would improve the performance if the driver provisions switches when the API is called
14:08:52 but can't you call this API from different threads?
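What obondarev and slaweq suggest above could be tried without any Neutron change; a minimal sketch, assuming openstacksdk with a hypothetical clouds.yaml entry "mycloud" and placeholder UUIDs:

```python
# Create ports with one API call per port, in parallel threads, instead
# of a single bulk POST -- the "several API calls with smaller port
# chunks" idea from the discussion. All identifiers are placeholders.
from concurrent.futures import ThreadPoolExecutor

import openstack

conn = openstack.connect(cloud="mycloud")  # hypothetical cloud name
NETWORK_ID = "NET_UUID"                    # placeholder

def create_one_port(index):
    # Each call is an independent POST /v2.0/ports, so the server can
    # bind the ports concurrently across its API workers.
    return conn.network.create_port(
        network_id=NETWORK_ID, name="test-port-%d" % index)

with ThreadPoolExecutor(max_workers=8) as pool:
    ports = list(pool.map(create_one_port, range(32)))
```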
14:09:11 that is my understanding also from the RFE, the hw is slow in this case so the bulk solution is not that good for it
14:09:44 "and networking-generic-switch takes between 5s and 30s to bind a port (depending on the hardware)"
14:09:48 my understanding is that "ssh connection to configure every single port is slow"
14:10:09 slaweq: my understanding is the same
14:10:19 but I also think that You can make more parallel connections to the switch and configure more ports at the same time
14:10:36 so the bulk port creation response would be much faster
14:10:44 if we parallelized it
14:10:59 I am not sure hardware switches allow configuring ports over multiple ssh connections
14:11:20 ok, that answers my previous question then
14:11:28 but the last comment from the rfe owner is that they are trying to move the hardware configuration from bind_port to the post-commit method
14:11:29 +1
14:11:33 seems testing separate port creation (API call per port) should show whether parallelism is a good way
14:12:03 for me it makes sense, and maybe, as he wrote in LP, it would be easier to optimize create_port_postcommit() rather than bind_port()
14:13:33 but the problem will be the same if the response from the HW takes 30 secs
14:14:21 what I think they need is an async method to update the port status
14:14:27 so, maybe they should simply improve the networking-generic-switch mech driver, and make the methods which connect to the switch somehow asynchronous
14:14:37 send the creation message and then receive the HW confirmation
14:14:46 so neutron would simply call that method, the driver would start some worker and do whatever it needs to do
14:15:02 but neutron would finish fast, even when ports aren't really configured on the switches yet
14:15:14 right ^
14:15:45 that is back to the async solution on the NGS side
14:16:07 so the agent will end its process loop without notifying "ports on agent are up"?
14:16:24 there is a BUILD status for neutron ports. Do neutron API consumers like nova honor it? If so, the neutron server can respond to the API in an async way
14:16:50 amotoki: that's a good question
14:17:23 I am not sure if nova just uses the binding info or the port status too
14:17:25 sorry, I mixed up OVS and generic switch
14:17:29 amotoki nova doesn't check the port's status, it just waits for a notification from neutron that the port is ACTIVE/DOWN
14:17:57 not active but plugged
14:18:32 true, we have notifiers for nova and ironic, my bad
14:20:29 so, even if the neutron API returns the port creation response before a port is actually created on the backend, it should work, right?
14:20:34 In this case the async mode can work for the RFE, as when the port is plugged nova receives the notification anyway.
14:20:48 if we take the idea of async update, we can use the port provisioning mechanism we already have
14:20:53 amotoki: that's how I understand it
14:20:56 we only inform Nova that a port is plugged
14:21:11 when DHCP/OVS provide this info (in ML2/OVS)
14:21:28 so we can wait until the agent provisions this port
14:22:04 but for that there is no need to change anything in Neutron, or are we missing something?
14:23:39 what I think should be done is a change in the networking-generic-switch mech driver not to wait for the backend, and the agent should provide this info (same as the OVS agent)
14:23:47 (this is what I think)
14:23:48 need to change the mech driver afaiu
14:23:56 agree with ralonsoh
14:24:25 +1
14:24:41 ok, but that is a change in networking-generic-switch, not in core Neutron
14:24:51 right
14:25:13 generally +1. if the rfe author still feels the need to improve the mech driver interface, they can request it more specifically
14:25:26 amotoki: +1
14:25:34 +1 to this
14:25:49 +1
14:25:51 +1
14:25:57 ok, I'll try to summarize this in the RFE and see if the reporter, based on their tests, can agree with it
14:26:12 +1
14:26:16 +1
14:26:35 +1
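The agreed direction -- do not block the API call on the hardware -- might look roughly like the following in a mechanism driver. This is a minimal sketch, not the actual networking-generic-switch code: _configure_switch_port() is a hypothetical stand-in for the real ssh-based configuration, and a real implementation would likely use neutron's worker and provisioning facilities rather than a bare thread:

```python
# Sketch: create_port_postcommit returns immediately; a background worker
# performs the slow switch configuration (5-30s per the bug report) and
# only then reports the port ACTIVE, which also triggers the nova notifier.
import threading

from neutron_lib import constants as n_const
from neutron_lib.plugins import directory
from neutron_lib.plugins.ml2 import api


class AsyncSwitchMechanismDriver(api.MechanismDriver):

    def initialize(self):
        pass

    def create_port_postcommit(self, context):
        # Hand the slow work off so the API thread is not blocked.
        # (_plugin_context is the request context the driver runs under.)
        threading.Thread(
            target=self._provision_port,
            args=(context._plugin_context, context.current),
            daemon=True).start()

    def _provision_port(self, plugin_context, port):
        # Slow part: ssh to the switch and configure the port.
        self._configure_switch_port(port)
        # Only now tell neutron (and, through its notifier, nova)
        # that the port is really up.
        directory.get_plugin().update_port_status(
            plugin_context, port['id'], n_const.PORT_STATUS_ACTIVE)

    def _configure_switch_port(self, port):
        raise NotImplementedError  # hypothetical backend-specific work
```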
14:27:08 Ok, next one:
14:27:16 (amorin): Neutron RBAC not sharing subnet (#link https://bugs.launchpad.net/neutron/+bug/1975603)
14:28:33 There is even a patch for this problem: https://review.opendev.org/c/openstack/neutron/+/843871
14:29:32 IMO, the LP bug sounds good. But the patch shows what the real problem of the submitter is. And this patch is re-introducing a previously fixed issue
14:29:59 this one: https://bugs.launchpad.net/neutron/+bug/1757482
14:30:04 exactly
14:30:19 and this older bug overrides any possible new RBAC implementation
14:30:41 because of the consequences of handling IP pools in an external network
14:31:38 isn't the root cause of the problem that subnets aren't shared through the RBAC mechanism, thus they aren't visible when a regular user makes a DB query?
14:31:50 I think we had/have a similar problem with SG rules
14:32:15 no no, he can't add an interface to a router
14:32:24 he can see it, but can't add it
14:32:58 subnet visibility should be determined by the visibility of the network which the subnets belong to
14:33:00 yes, that's what I understand also: the subnet is visible to the user, but Neutron throws an exception
14:33:04 so this check at L861 should be different
14:33:16 and should be aware of the networks shared through RBAC IMO
14:33:58 slaweq: +1
14:34:44 slaweq, agree
14:35:03 that's true
14:37:04 so IMO we should treat that one as a regular bug and "just" fix it :)
14:37:17 ok, so the proposal needs to be changed, but we can go on with the review
14:37:37 +1
14:37:44 +1
14:37:48 +1
14:37:51 +1
14:37:52 +1
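For reference, the failing scenario from the bug can be reproduced roughly as below; a minimal sketch, assuming openstacksdk with hypothetical clouds.yaml entries "owner" and "tenant" and placeholder UUIDs:

```python
# The network owner shares the network to another project via RBAC; a user
# of the target project can then see the subnet and, per the discussion
# above, should be able to attach it to their router -- the call that the
# current ownership check rejects. All identifiers are placeholders.
import openstack

owner = openstack.connect(cloud="owner")
tenant = openstack.connect(cloud="tenant")

NET_ID = "NET_UUID"
SUBNET_ID = "SUBNET_UUID"
TARGET_PROJECT_ID = "PROJECT_UUID"

# Share the network (and, effectively, its subnets) with the project.
owner.network.create_rbac_policy(
    object_type="network",
    object_id=NET_ID,
    action="access_as_shared",
    target_project_id=TARGET_PROJECT_ID,
)

# Expected to succeed once the check also considers RBAC-shared networks;
# per the bug it currently raises even though the subnet is visible.
router = tenant.network.create_router(name="test-router")
tenant.network.add_interface_to_router(router, subnet_id=SUBNET_ID)
```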
14:38:38 Next one:
14:38:43 [ovn]Floating IP adds distributed attributes (#link https://bugs.launchpad.net/neutron/+bug/1978039)
14:39:37 For this one the reporter even opened a Blueprint: #link https://blueprints.launchpad.net/neutron/+spec/custom-floatingip-distributed
14:40:02 I read that one and I'm not a big fan of this proposal really
14:40:13 we already have a "distributed" flag for the router
14:40:30 not in OVN, this flag does not apply there
14:40:34 and IMO we should use that one in OVN too and make all FIPs in one router distributed or not
14:40:54 the reason for that is that I don't want to have API differences depending on the backend
14:41:21 ralonsoh I know that it doesn't apply now, but IMO we should make OVN aware of this flag rather than adding "distributed" to the floating IPs
14:42:03 slaweq, but I think he wants to make this even more granular
14:42:18 or maybe combine the 2 approaches in the ovn case: 1. the router's "distributed" would be the default setting for FIPs in the router
14:42:19 I know, I know, the current API does not allow it
14:42:36 and the per-FIP "distributed" would allow overriding the router's setting
14:42:51 like we have e.g. with qos policies on networks and ports
14:43:09 that way the API will be more consistent IMO with ML2/OVS+DVR
14:43:11 wdyt?
14:43:16 so implement the router --distributed flag
14:43:23 and also allow the flag per FIP?
14:43:35 ralonsoh yeah
14:43:45 I have a question on "distributed" FIPs. A FIP is usually created by regular users. This allows users to control whether the FIP is distributed or not. is this what operators would like to allow?
14:43:57 good question...
14:43:57 then if someone wants to have all FIPs in a router distributed (or not), it can be done with one setting at the router's level
14:44:16 amotoki, maybe that should be controlled with a policy
14:44:24 amotoki that could be controlled by policies
14:44:32 and by default only admin would be able to do that
14:44:47 and again, that's why the router's distributed flag as the default would be good IMO
14:44:54 ralonsoh: slaweq: yes. we can allow operators to control them via policies
14:45:23 +1 to this proposal (with the two-step implementation + policies)
14:46:01 ralonsoh implementation can be "one step", it would be just something like:
14:46:20 is_fip_distributed = fip['distributed'] or router['distributed']
14:46:31 perfect
14:46:43 router['distributed'] is already there, but ovn doesn't respect it
14:46:45 +1
14:46:55 so that way it would start to use that parameter
14:47:14 and we need to add "fip['distributed']" (with an api extension and so on)
14:47:17 is there no case where a router is distributed but a FIP is not distributed?
14:47:25 and implement that "per fip" in ovn_l3
14:47:46 amotoki in the ovn case in fact You control that "per fip"
14:48:17 so You can have all fips distributed by default, but for some reason make one fip centralized
14:48:17 slaweq: so, 'fip['distributed'] or router['distributed']' does not work perhaps
14:48:43 amotoki yeah, that was kind of a shortcut
14:48:46 router['distributed'] can be the default for a FIP's "distributed"
14:48:56 it would need to check if it's None or not
14:48:58 etc.
14:49:05 I just wanted to be fast here :)
14:49:12 yeah
14:49:22 sorry if I wasn't clear :)
14:49:54 anyway, what we discussed is similar to what we do for router.distributed (and ha) now.
14:50:36 my honest impression is that the meaning of "distributed" is too backend-specific to have it as a generic attribute. (i had the same impression about the dvr one too.)
14:51:10 but maybe it's ok if it makes sense for ovn...
14:52:11 ok, so we need an API extension and the logic to make it work with OVN
14:53:00 lajoskatona: an API extension for fip.distributed, right?
14:53:11 amotoki: yes
14:53:16 +1
14:53:28 +1
14:53:29 +1
14:54:12 just for clarification, is the first step to honor router.distributed in the OVN backend?
14:54:56 amotoki IMO it can be, but doesn't have to be
14:55:12 as this setting will need to be checked for each FIP by the ovn_l3 plugin
14:55:47 but we can add it as a first step and then "extend" it with fip['distributed'] when that API extension is there
14:55:48 slaweq: thanks
14:55:51 so even if router=distributed, the fips must still be checked one-by-one? ok
14:56:14 lajoskatona I think so, as You can/need to set it per FIP in ovn
14:56:22 ok, I will add the summary to the RFE
14:56:26 slaweq: ok, thanks
14:56:29 but maybe I'm wrong on that, ralonsoh will for sure know it better
14:56:47 that's part of the implementation, we can review the patches later
14:56:53 +1
14:56:55 yeah
14:57:25 I did not vote yet. +1 for this proposal itself. I am okay with the direction.
14:58:05 +1 (as a parameter for a specific backend, similar to router.distributed for dvr)
14:58:17 ok, thanks for the discussion
14:58:39 we have one more RFE, but we have no more time, so let's postpone that one to next time
14:59:06 ok, have a great weekend all!
14:59:09 #endmeeting
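A closing note on the snippet at 14:46:20: with amotoki's caveat applied, the precedence rule the team converged on might look like this (a minimal sketch, not an actual neutron function):

```python
def is_fip_distributed(fip, router):
    """Return the effective 'distributed' value for a floating IP.

    Sketch of the precedence discussed above: the per-FIP flag only
    overrides the router-level default when it was set explicitly,
    i.e. is not None -- a plain `fip['distributed'] or
    router['distributed']` would ignore an explicit False on the FIP.
    """
    fip_distributed = fip.get('distributed')
    if fip_distributed is not None:
        # Explicit per-FIP setting wins, like a port-level QoS policy
        # overriding the network-level one.
        return fip_distributed
    # Fall back to the router-level default.
    return bool(router.get('distributed'))
```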