14:00:15 <lajoskatona> #startmeeting neutron_drivers
14:00:15 <opendevmeet> Meeting started Fri Jun 24 14:00:15 2022 UTC and is due to finish in 60 minutes.  The chair is lajoskatona. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:15 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:00:17 <lajoskatona> o/
14:00:33 <yamamoto> hi
14:00:42 <slaweq> o/
14:00:43 <mtomaska> o/
14:01:13 <ralonsoh> hi
14:01:20 <lajoskatona> haleyb can't join today, and I think mlavalle is also ooo today (this week?)
14:01:28 <ralonsoh> yes
14:01:32 <amotoki> hi, but I was on PTO today so I did not check the topics in advance...
14:01:40 <obondarev> hi
14:02:06 <lajoskatona> amotoki: thanks for coming
14:02:21 <yamamoto> amu
14:02:38 <yamamoto> sorry, wrong window
14:02:45 <lajoskatona> I think we can start
14:02:47 <amotoki> lajoskatona: np. will try to catch up discussions :)
14:03:23 <lajoskatona> I tried to put our topics in time order as we skipped a few meetings, and to avoid long-hanging topics
14:03:31 <lajoskatona> The first one is:
14:03:36 <lajoskatona> [RFE] Improve performance of bulk port create/delete with networking-generic-switch (#link https://bugs.launchpad.net/neutron/+bug/1976270)
14:04:52 <lajoskatona> there are 2 proposals in the RFE:
14:05:06 <lajoskatona> add a new "bind_port_bulk()" interface between Neutron and mechanism drivers
14:05:17 <lajoskatona> or add a way to bind ports asynchronously
14:06:29 <ralonsoh> can I ask what "bind ports asynchronously" means?
14:06:29 <obondarev> just a thought: maybe several API calls with smaller port chunks would perform better than bulk creation in this case?
14:06:53 <obondarev> meaning to use neutron server threads and processes
14:07:27 <slaweq> obondarev most likely, as then You would parallelize port creation :)
14:07:33 <slaweq> and binding
14:07:39 <obondarev> right
14:08:17 <slaweq> but on the other hand, I agree with the RFE that if we have bulk port creation, it should be the recommended and faster way to create many similar ports than making many API calls
14:08:18 <obondarev> but this puts more burden on a higher layer of course (the API caller/orchestrator)
14:08:22 <amotoki> IIUC the networking-generic-switch driver provisions hardware switches, so I am not sure parallelization would improve the performance if the driver provisions switches when the API is called
14:08:52 <ralonsoh> but can't you call this API from different threads?
14:09:11 <lajoskatona> that is my understanding also from the RFE, the hw is slow in this case so the bulk solution is not that good for it
14:09:44 <lajoskatona> "and networking-generic-switch takes between 5s and 30s to bind a port (depending on the hardware)"
14:09:48 <slaweq> my understanding is that "ssh connection to configure every single port is slow"
14:10:09 <amotoki> slaweq: my understanding is same
14:10:19 <slaweq> but I also think that You can make more parallel connections to the switch and configure more ports at the same time
14:10:36 <slaweq> so the bulk port creation response would be much faster
14:10:44 <slaweq> if we parallelized it
14:10:59 <amotoki> I am not sure hardware switches allow configuring ports over multiple ssh connections
14:11:20 <ralonsoh> ok, that answers my previous question then
14:11:28 <slaweq> but the last comment from the RFE owner is that they are trying to move the hardware configuration from bind_port to the post_commit method
14:11:29 <lajoskatona> +1
14:11:33 <obondarev> testing separate port creation (an API call per port) should show whether parallelism is a good way
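(A minimal sketch of obondarev's suggestion, assuming the openstacksdk client; the cloud name and network ID are placeholders. Each thread issues its own port-create call, so parallel neutron-server workers run the bindings concurrently instead of one worker serializing a bulk request.)

    import concurrent.futures

    import openstack

    # Placeholder cloud name; any clouds.yaml entry would do.
    conn = openstack.connect(cloud='mycloud')

    def create_port(network_id):
        # One API call per port, instead of a single bulk request.
        return conn.network.create_port(network_id=network_id)

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        # 50 ports on the same placeholder network, created in parallel.
        ports = list(pool.map(create_port, ['NETWORK_ID'] * 50))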
14:12:03 <slaweq> for me it makes sense and maybe, as he wrote in LP, it would be easier to optimize create_port_postcommit() rather than bind_port
14:13:33 <ralonsoh> but the problem will be the same, if the response from the HW takes 30secs
14:14:21 <ralonsoh> what I think they need is an async method to update the port status
14:14:27 <slaweq> so, maybe they should simply improve the networking-generic-switch mech driver, and make the methods which connect to the switch somehow asynchronous
14:14:37 <ralonsoh> send the creation message and then receive the HW confirmation
14:14:46 <slaweq> so neutron would simply call that method, the driver would start some worker and do whatever it needs to do
14:15:02 <slaweq> but neutron would finish fast, even when ports aren't really configured on switches yet
14:15:14 <ralonsoh> right ^
14:15:45 <lajoskatona> that is back to the async solution on the NGS side
14:16:07 <obondarev> so the agent will end its processing loop without notifying "ports on agent are up"?
14:16:24 <amotoki> there is a BUILD status for neutron ports. Do neutron API consumers like nova honor it? If so, the neutron server can respond to the API in an async way
14:16:50 <lajoskatona> amotoki: that's a good question
14:17:23 <lajoskatona> I am not sure if nova just uses the binding info or the port status also
14:17:25 <obondarev> sorry, I mixed up OVS and generic switch
14:17:29 <slaweq> amotoki nova doesn't check the port's status, it just waits for a notification from neutron that the port is ACTIVE/DOWN
14:17:57 <ralonsoh> not active but plugged
14:18:32 <lajoskatona> true, we have notifiers for nova and ironic, my bad
14:20:29 <amotoki> so, even if the neutron API returns the port creation response before a port is actually created on the backend, it should work, right?
14:20:34 <lajoskatona> In this case the async mode can work for the RFE as when the port is plugged nova receives the notification anyway.
14:20:48 <ralonsoh> if we take the idea of async update, we can use the port provisioning mechanism we already have
14:20:53 <lajoskatona> amotoki: that's how I understand it
14:20:56 <ralonsoh> we only inform Nova that a port is plugged
14:21:11 <ralonsoh> when DHCP/OVS provide this info (in ML2/OVS)
14:21:28 <ralonsoh> so we can wait until the agent provisions this port
14:22:04 <lajoskatona> but for that there is no need to change anything in Neutron, or are we missing something?
14:23:39 <ralonsoh> what I think should be done is a change in the networking-generic-switch mech driver not to wait for the backend, and the agent should provide this info (same as OVS agent)
14:23:47 <ralonsoh> (this is what I think)
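(A rough sketch of the direction ralonsoh describes, reusing neutron's existing provisioning-blocks mechanism; the driver class, the entity string, and the confirmation callback are illustrative, not the actual networking-generic-switch code. The postcommit hook returns immediately, and the port only becomes ACTIVE once the backend confirms and the block is lifted.)

    from concurrent import futures

    from neutron.db import provisioning_blocks
    from neutron_lib.callbacks import resources
    from neutron_lib.plugins.ml2 import api

    GENERIC_SWITCH_ENTITY = 'GENERIC_SWITCH_ENTITY'  # illustrative name

    class AsyncSwitchDriver(api.MechanismDriver):
        """Illustrative mech driver that does not block on the hardware."""

        def initialize(self):
            # Small thread pool standing in for a real background worker.
            self._pool = futures.ThreadPoolExecutor(max_workers=4)

        def create_port_precommit(self, context):
            # Keep the port in DOWN/BUILD until the switch confirms.
            provisioning_blocks.add_provisioning_component(
                context._plugin_context, context.current['id'],
                resources.PORT, GENERIC_SWITCH_ENTITY)

        def create_port_postcommit(self, context):
            # Queue the slow switch configuration instead of blocking.
            self._pool.submit(self._configure_switch,
                              context._plugin_context, context.current)

        def _configure_switch(self, plugin_context, port):
            self._ssh_configure(port)
            # Lifting the block lets neutron mark the port ACTIVE and
            # send the usual notification to nova.
            provisioning_blocks.provisioning_complete(
                plugin_context, port['id'], resources.PORT,
                GENERIC_SWITCH_ENTITY)

        def _ssh_configure(self, port):
            # Placeholder for the real per-port ssh session (the 5-30s
            # step mentioned in the RFE).
            pass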
14:23:48 <obondarev> need to change mech driver afaiu
14:23:56 <obondarev> agree with ralonsoh
14:24:25 <slaweq> +1
14:24:41 <lajoskatona> ok, but that is a change in networking-generic-switch, not in core Neutron
14:24:51 <ralonsoh> right
14:25:13 <amotoki> generally +1. if the RFE author still feels the need to improve the mech driver interface, they can request it more specifically
14:25:26 <lajoskatona> amotoki: +1
14:25:34 <ralonsoh> +1 to this
14:25:49 <slaweq> +1
14:25:51 <obondarev> +1
14:25:57 <lajoskatona> ok, I'll try to summarize this in the RFE and see if the reporter, based on their tests, can agree with it
14:26:12 <lajoskatona> +1
14:26:16 <amotoki> +1
14:26:35 <yamamoto> +1
14:27:08 <lajoskatona> Ok, next one:
14:27:16 <lajoskatona> (amorin): Neutron RBAC not sharing subnet (#link https://bugs.launchpad.net/neutron/+bug/1975603)
14:28:33 <lajoskatona> There is even a patch for this problem: https://review.opendev.org/c/openstack/neutron/+/843871
14:29:32 <ralonsoh> IMO, the LP bug sounds good. But the patch shows what the submitter's real problem is. And this patch is re-introducing a previously fixed issue
14:29:59 <lajoskatona> this one: https://bugs.launchpad.net/neutron/+bug/1757482
14:30:04 <ralonsoh> exactly
14:30:19 <ralonsoh> and this older bug overrides any possible new RBAC implementation
14:30:41 <ralonsoh> because of the consequences of handling IP pools in an external network
14:31:38 <slaweq> isn't the root cause of the problem that subnets aren't shared through the RBAC mechanism, thus they aren't visible when a regular user makes a DB query?
14:31:50 <slaweq> I think we had/have a similar problem with SG rules
14:32:15 <ralonsoh> no no, he can't add an interface to a router
14:32:24 <ralonsoh> he can see it, but can't add it
14:32:58 <amotoki> subnet visibility should be determined by the visibility of the network the subnet belongs to
14:33:00 <lajoskatona> yes, that's what I understand also, the subnet is visible to the user, but Neutron throws an exception
14:33:04 <slaweq> so this check at L861 should be different
14:33:16 <slaweq> and should be aware of the networks shared through RBAC IMO
14:33:58 <amotoki> slaweq: +1
14:34:44 <ralonsoh> slaweq, agree
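(A minimal sketch of the fix slaweq outlines, with illustrative names rather than the actual code from the patch under review: instead of only comparing project IDs, the validation also accepts subnets whose parent network is shared with the caller through an RBAC entry.)

    # Illustrative helper, not the actual neutron check being changed.
    def subnet_usable_by(context, subnet, network):
        # The subnet owner can always attach it to a router.
        if subnet['project_id'] == context.project_id:
            return True
        # Otherwise accept networks shared with this project (or with
        # everyone, '*') via an access_as_shared RBAC entry.
        return any(
            entry['action'] == 'access_as_shared' and
            entry['target_tenant'] in (context.project_id, '*')
            for entry in network.get('rbac_entries', []))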
14:35:03 <lajoskatona> that's true
14:37:04 <slaweq> so IMO we should treat that one as a regular bug and "just" fix it :)
14:37:17 <lajoskatona> ok, so the proposal needs to be changed, but we can go on with the review
14:37:37 <ralonsoh> +1
14:37:44 <amotoki> +1
14:37:48 <obondarev> +1
14:37:51 <yamamoto> +1
14:37:52 <lajoskatona> +1
14:38:38 <lajoskatona> Next one:
14:38:43 <lajoskatona> [ovn]Floating IP adds distributed attributes (#link https://bugs.launchpad.net/neutron/+bug/1978039)
14:39:37 <lajoskatona> For this one the reporter even opened a Blueprint: #link https://blueprints.launchpad.net/neutron/+spec/custom-floatingip-distributed
14:40:02 <slaweq> I read that one and I'm not a big fan of this proposal really
14:40:13 <slaweq> we already have "distributed" flag for router
14:40:30 <ralonsoh> not in OVN, this flag does not apply
14:40:34 <slaweq> and IMO we should use that one in OVN too and make all FIPs on one router distributed or not
14:40:54 <slaweq> the reason for that is that I don't want to have API differences depending on the backend
14:41:21 <slaweq> ralonsoh I know that it doesn't apply now but IMO we should make OVN aware of this flag rather than adding "distributed" to the floating IPs
14:42:03 <ralonsoh> slaweq, but I think he wants to make this even more granular
14:42:18 <slaweq> or maybe combine the 2 approaches in the OVN case: 1. the router's "distributed" would be the default setting for FIPs on the router
14:42:19 <ralonsoh> I know, I know, the current API does not allow it
14:42:36 <slaweq> and a "per FIP" "distributed" attribute would allow overriding that router's setting
14:42:51 <slaweq> like we have e.g. with qos policies on networks and ports
14:43:09 <slaweq> that way API will be more consistent IMO with ML2/OVS+DVR
14:43:11 <slaweq> wdyt?
14:43:16 <ralonsoh> so implement the router --distributed flag
14:43:23 <ralonsoh> and also allow the flag per FIP?
14:43:35 <slaweq> ralonsoh yeah
14:43:45 <amotoki> I have a question on "distributed" FIP. A FIP is usually created by regular users. It allows users to control whether a FIP is distributed or not. is this what operators would like to allow?
14:43:57 <ralonsoh> good question...
14:43:57 <slaweq> then if someone wants to have all FIPs on a router distributed (or not), it can be done with one setting at the router level
14:44:16 <ralonsoh> amotoki, maybe that should be controlled with a policy
14:44:24 <slaweq> amotoki that could be controlled by policies
14:44:32 <slaweq> and by default only admin would be able to do that
14:44:47 <slaweq> and again, that's why router's distributed flag as default would be good IMO
14:44:54 <amotoki> ralonsoh: slaweq: yes. we can allow operators to control them via policies
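(For illustration only, one way such a restriction could look in neutron's policy file; the policy names are hypothetical until the API extension actually defines the attribute.)

    # Hypothetical policy entries, following neutron's usual
    # "<action>:<attribute>" naming for extension attributes:
    "create_floatingip:distributed": "rule:admin_only"
    "update_floatingip:distributed": "rule:admin_only"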
14:45:23 <ralonsoh> +1 to this proposal (with the two steps implementation + policies)
14:46:01 <slaweq> ralonsoh implementation can be "one step", it would be just something like:
14:46:20 <slaweq> is_fip_distributed = fip['distributed'] or router['distributed']
14:46:31 <ralonsoh> perfect
14:46:43 <slaweq> router['distributed'] is already there but OVN doesn't respect it
14:46:45 <lajoskatona> +1
14:46:55 <slaweq> so that way it would start to use that parameter
14:47:14 <slaweq> and we need to add "fip['distributed']" (with api extension and so on)
14:47:17 <amotoki> is there no case where a router is distributed but a FIP is not distributed?
14:47:25 <slaweq> and implement that "per fip" in ovn_l3
14:47:46 <slaweq> amotoki in the OVN case in fact You control that "per FIP"
14:48:17 <slaweq> so You can have all FIPs distributed by default but for some reason make one FIP centralized
14:48:17 <amotoki> slaweq: so, ' fip['distributed'] or router['distributed']' does not work perhaps
14:48:43 <slaweq> amotoki yeah, that was kind of a shortcut
14:48:46 <amotoki> router['distributed'] can be the default for "distributed" for FIP
14:48:56 <slaweq> it would need to check if it's None or not
14:48:58 <slaweq> etc.
14:49:05 <slaweq> I just wanted to be fast here :)
14:49:12 <amotoki> yeah
14:49:22 <slaweq> sorry if I wasn't clear :)
14:49:54 <amotoki> anyway what we discussed is similar to what we do for router.distributed (and ha) now.
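(Putting the correction together, a minimal Python sketch of the per-FIP lookup; illustrative only. An explicit per-FIP value wins, None means "not set, inherit the router default", which also covers amotoki's case of a centralized FIP on a distributed router.)

    def is_fip_distributed(fip, router):
        # Explicit per-FIP setting (True or False) overrides the router;
        # None means the FIP inherits the router-level default.
        if fip.get('distributed') is not None:
            return fip['distributed']
        return bool(router.get('distributed'))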
14:50:36 <yamamoto> my honest impression is that the meaning of "distributed" is too backend-specific to have it as a generic attribute. (I had the same impression about the DVR one too.)
14:51:10 <yamamoto> but maybe it's ok if it makes sense for ovn...
14:52:11 <lajoskatona> ok, so we need an API extension and the logic to make it work with OVN
14:53:00 <amotoki> lajoskatona: an API extension for fip.distributed, right?
14:53:11 <lajoskatona> amotoki: yes
14:53:16 <slaweq> +1
14:53:28 <lajoskatona> +1
14:53:29 <ralonsoh> +1
14:54:12 <amotoki> just for clarification. is the first step to honor router.distributed in the OVN backend?
14:54:56 <slaweq> amotoki IMO it can be but doesn't have to be
14:55:12 <slaweq> as this setting will need to be checked for each FIP by the ovn_l3 plugin
14:55:47 <slaweq> but we can add it as a first step and then "extend" it with fip['distributed'] when that API extension is there
14:55:48 <amotoki> slaweq: thanks
14:55:51 <lajoskatona> so even if router=distributed, the FIPs must be checked one-by-one? ok
14:56:14 <slaweq> lajoskatona I think so as You can/need to set it per IP in ovn
14:56:22 <lajoskatona> ok, I will add the summary to the RFE
14:56:26 <lajoskatona> slaweq: ok, thanks
14:56:29 <slaweq> but maybe I'm wrong on that, ralonsoh will for sure know it better
14:56:47 <ralonsoh> that's part of the implementation, we can review the patches later
14:56:53 <lajoskatona> +1
14:56:55 <slaweq> yeah
14:57:25 <amotoki> I did not vote yet. +1 for this proposal itself. I am okay with the direction.
14:58:05 <yamamoto> +1 (as a parameter for specific backend, similarly to router.distributed for dvr)
14:58:17 <lajoskatona> ok, thanks for the discussion
14:58:39 <lajoskatona> we have one more RFE, but we have no more time, so let's postpone that one for next time
14:59:06 <slaweq> ok, have a great weekend all!
14:59:09 <lajoskatona> #endmeeting