14:00:30 <haleyb> #startmeeting neutron_drivers
14:00:30 <opendevmeet> Meeting started Fri Nov 10 14:00:30 2023 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:30 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:30 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:00:38 <slaweq> o/
14:00:53 <obondarev_> o/
14:00:59 <mlavalle> o/
14:01:14 <mlavalle> haleyb: veterans day?
14:01:34 <haleyb> mlavalle: yes, and i didn't even know until yesterday
14:02:00 <mlavalle> it's in our calendar, but it is not a holiday for us
14:02:41 <lajoskatona> o/
14:02:54 <haleyb> jlibosva, racosta_: are you around? you have the agenda items today
14:02:59 <jlibosva> o/
14:03:01 <jlibosva> I am
14:03:06 <racosta_> o/
14:03:43 <jlibosva> shall I start?
14:03:45 <haleyb> ok, i think we have quorum
14:03:57 <racosta_> please, go ahead jlibosva.
14:04:50 <jlibosva> I'm not familiar with the format of this meeting so I guess once we enter the "on demand agenda" I shall start?
14:05:09 <mlavalle> yeah go ahead
14:05:16 <mlavalle> we can improvise a bit
14:05:51 <haleyb> jlibosva: right, we can go right to on-demand
14:06:14 <haleyb> #topic on-demand
14:06:22 <haleyb> :)
14:07:00 <jlibosva> ok, thanks :) so I was told to bring this topic here, it's related to ports and their bindings
14:07:02 <jlibosva> #link https://review.opendev.org/c/openstack/neutron/+/892815
14:07:34 <jlibosva> so we can have multiple mech drivers configured and based on the vif type the driver binds the port on a host
14:08:00 <jlibosva> the vif type allows PUT actions (Updated) in the API layer so it's currently possible to change the type while a port is bound
14:09:05 <obondarev> but that will not be a correct change, right?
14:09:17 <jlibosva> now since the API uses stateless resources - i.e. we don't know if the updated port is bound or not until we look at the DB, which is the layer below the API - we can't forbid the update operation on the port. So I made the patch to fail changing the type if the port is bound, on the db layer
14:09:55 <jlibosva> obondarev: right
14:10:26 <jlibosva> there was a bug in Nova too that if the user changed the type, the networking went down. From the Nova perspective this should not be an allowed operation either. And I agree
14:10:45 <obondarev> so it's not an API change to me, just handling an incorrect request
14:10:47 <jlibosva> or - if the type is changed we would need to re-bind the port
14:11:26 <jlibosva> yes, I agree with obondarev. the state a port is in should not allow the change
14:11:49 <jlibosva> there are some concerns on the review that would be good to discuss here
14:12:58 <jlibosva> Rodolfo and Lajos had some concerns and I see Lajos just dropped and Rodolfo is on PTO
14:14:26 <obondarev> so what happens currently when someone updates vif_type of a bound port?
14:14:57 <haleyb> this is vnic_type right?
14:15:18 <slaweq> why do we want to allow changing that on unbound ports? Maybe it would be better (and easier) to simply forbid that on the API level for all types of ports, wdyt?
14:15:28 <jlibosva> yes, the attribute is vnic_type, sorry
14:16:09 <jlibosva> slaweq: the only scenario I could think of would be that someone wants to create a port (like SR-IOV) and forgets to set the type. so instead of deleting the resource and re-creating it, one can just update, if it's not bound
14:16:32 <obondarev> just like any other update I think
14:16:41 <slaweq> but in such a case the user can easily delete and then create the port again
14:16:42 <obondarev> not save 1 API call at least :)
14:16:48 <obondarev> to*
14:17:02 <jlibosva> as for what happens, I think there will be inconsistency when mechanism drivers or agents query that attribute
14:18:36 <lajoskatona> sorry seems I have some network issue
14:23:04 <haleyb> are there any other questions?
14:23:13 <mlavalle> so let me attempt to summarize the issue: over the years, we, the Neutron community, allowed a certain specific case of a port update and now we realize we shouldn't have
14:23:35 <jlibosva> so I guess there are two ways to approach it? 1) disallow updating the attribute on the API level 2) always check if the port is bound if the attribute is updated on the DB level (the linked patch)
14:24:08 <lajoskatona1> I think I have now working network, sorry for popping up and down
14:24:15 <obondarev> can 2 also include api-ref note?
14:24:24 <haleyb> and if 2) does it need to be discoverable with an extension
14:24:48 <lajoskatona1> yes my concern was to add an extension to show the users that hey the api behaves differently
14:24:52 <mlavalle> so we want to fix the behavior but we worry about changing the behavior. Let's make that behavior discoverable and maybe optional
14:25:08 <lajoskatona1> yes good summary
14:25:20 <obondarev> but could anyone have used it for any good reason?
14:25:23 <lajoskatona1> I am not sure to be that rigid and even for bug fix add extension
14:25:37 <lajoskatona1> perhaps not necessary and better to fix it
14:25:50 <mlavalle> obondarev: it doesn't seem logical, but we just don't know
14:26:04 <slaweq> I'm not sure we need an extension for that - we will just prevent users from doing something that can only lead to bad things in the end, so IMO a release note would be enough in this case
14:26:10 <lajoskatona1> so my goal was to discuss it and have consensus before just doing anything unchangeable
14:26:28 <obondarev> agree with slaweq
14:27:04 <mlavalle> I would be ok with just fixing it. I was not advocating a position, just weighing the alternatives
14:27:05 <lajoskatona1> as I remember from ralonsoh's comment he had the concern
14:27:21 <jlibosva> #link https://bugs.launchpad.net/nova/+bug/1981813
14:27:28 <jlibosva> there is the nova portion and how it was discovered
14:27:43 <haleyb> right, if the change results in an unusable resource i would think an error is appropriate
14:28:48 <haleyb> in my opinion
14:28:57 <obondarev> +
14:28:58 <lajoskatona1> +1
14:29:04 <mlavalle> +1
14:29:22 <haleyb> i was going to ask for a vote, but there it is
14:29:26 <lajoskatona1> and if there is no API extension we can backport it
14:29:41 <lajoskatona1> as I see from the nova bug the fix was backported
14:30:26 <mlavalle> teah, that's a plus
14:30:31 <mlavalle> yeah
14:31:05 <haleyb> so we proceed as a bug fix and don't require an extension
14:31:09 <haleyb> slaweq: opinion?
14:31:34 <slaweq> I'm good with that, no extension needed IMO
14:32:17 <haleyb> ok, i'll take the +1's as consensus on moving forward without an extension
14:32:20 <slaweq> regarding backport - I think we should maybe ask stable core team for opinion also
14:32:25 <slaweq> but personally I don't see reason why we shouldn't backport it
14:32:34 <haleyb> good discussion on the topic
14:32:38 <jlibosva> thanks everyone
14:32:57 <mlavalle> jlibosva: thanks for bringing it up :-)
14:33:06 <jlibosva> the credit goes to lajoskatona1 :)
14:33:30 <lajoskatona1> I stopped the thing, sorry for that
14:33:32 <mlavalle> yeah, but you also took the time to chat with us today
14:33:52 <jlibosva> that's always a pleasure
14:33:57 <haleyb> slaweq: it seems like the nova-related change was backported, so i would +1 a backport
14:34:08 <haleyb> of course that was a CVE
14:34:13 <slaweq> good for me then
14:34:23 <mlavalle> yeah, otherwise the whole thing is useless
14:34:36 <lajoskatona1> +1 for backport with re-no
14:35:00 <jlibosva> well, afaiu Nova doesn't need the Neutron patch. They basically just made Nova aware that the type can be changed in Neutron, which they didn't account for before
14:35:13 <jlibosva> just to make it clear :)
14:35:24 <haleyb> jlibosva: so just to confirm - change is good, but add release note, and you will update the api-ref?
14:35:32 <jlibosva> haleyb: yesir
14:36:16 <haleyb> ack, thanks
14:36:31 <haleyb> racosta_: ok, we can move on to your items
14:36:45 <racosta_> ok, thanks.
14:36:53 <racosta_> Although I added the same topics I presented at PTG, I believe a formal ack is required for RFEs in drivers meetings.
14:37:07 <racosta_> # RFE: BGP peer connect mode. I think we're on track in this one.
14:37:14 <racosta_> #link https://review.opendev.org/c/openstack/neutron-specs/+/899210
14:37:38 <racosta_> It is basically a new configurable parameter to keep the connection with the BGP peer 'active' and avoid unnecessary disconnections if it enters an idle state on the switch side.
14:37:59 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2006145
14:39:56 <racosta_> It's a trivial change on the n-d-r side, and will be configurable to allow compatibility with anyone who uses the default passive default.
14:40:06 <racosta_> *passive mode
14:40:21 <racosta_> Any questions or comments on this?
14:41:20 <lajoskatona1> nothing from me
14:42:09 <haleyb> not from me, i think it fixes a valid issue
14:42:16 <mlavalle> +1
14:42:30 <haleyb> +1
14:42:36 <obondarev> no questions
14:42:45 <obondarev> +1
14:42:46 <slaweq> IIRC obondarev had some concerns in the spec review, right?
14:43:07 <obondarev> ah, did I? Let me check
14:43:24 <slaweq> no, I think I mixed it up with some other spec, sorry :)
14:43:34 <obondarev> ok, no worries :)
14:44:32 <haleyb> slaweq, lajoskatona1: votes? to make it official
14:44:43 <slaweq> +1
14:44:46 <lajoskatona1> +1
14:45:07 <haleyb> ok, thanks, i've marked it rfe-approved
14:45:39 <racosta_> ok, thanks. we can move on to the next
14:45:49 <racosta_> #RFE: BGP speaker peer sessions resilient. This one is related to RMQ/Infra failures.
14:46:05 <racosta_> # LP: https://bugs.launchpad.net/neutron/+bug/2006145
14:46:09 <racosta_> #link https://review.opendev.org/c/openstack/neutron-specs/+/899209
14:46:30 <racosta_> It's a little more complex. The goal is to introduce a new speaker cache logic so the DRAgent can keep the speaker settings and the BGP peer sessions in case of RPC Exceptions, and/or reestablishment of communication via RPC. Basically: a new config option 'speaker_cache_timeout'.
14:47:18 <racosta_> In the RFE proposal, the cache timeout is configurable and can be adjusted according to the time it actually takes for the RMQ to respond correctly again (transient time after RMQ/Infra comes back - cluster convergence, mysql issues, etc.)
14:47:53 <racosta_> There would be another way to do it, as Felix suggested.
14:48:04 <racosta_> It could be implemented as long as we noticed when the RMQ was offline and online, but I didn't find how to obtain this information via oslo_messaging (it would only be possible to experience timeouts in this case).
14:49:38 <racosta_> I think the cache timeout solves transient RMQ issues.
14:49:50 <racosta_> Any questions or comments on this?
14:51:03 <haleyb> i think mine were answered in the bug
14:51:13 <racosta_> yeah
14:53:25 <lajoskatona1> If i understand well, in case of such an rmq issue the agent removes the bgp settings?
14:54:11 <obondarev> I'm a bit confused, https://bugs.launchpad.net/neutron/+bug/2006145 does not sound like an RFE, but rather like a bug description
14:54:29 <racosta_> yes, it can remove the complete BGP speaker config if the RPC return is an empty value
14:54:38 <obondarev> and it was already marked as approved
14:55:49 <lajoskatona1> I like the idea to keep these settings till the agent can connect again (if it is below the timeout)
14:55:58 <lajoskatona1> so +1 from me to the proposal
14:56:38 <mlavalle> +1 from me as well
14:57:05 <haleyb> obondarev: i maybe should have marked it rfe-triaged
14:57:40 <racosta_> this was an old thread in the original bug obondarev, issue or RFE? IMO this is a BUG, but as there was no consensus I proposed it as an RFE...
14:58:16 <haleyb> and since adding the config option was necessary, i believe that was the reason
14:58:30 <obondarev> I see, so there are 2 specs for a single RFE https://bugs.launchpad.net/neutron/+bug/2006145?
14:59:00 <obondarev> https://review.opendev.org/c/openstack/neutron-specs/+/899209 and https://review.opendev.org/c/openstack/neutron-specs/+/899210
14:59:47 <racosta_> no no, they are different cases.
15:00:13 <obondarev> ah, sorry, my bad
15:00:50 <racosta_> no worries, but the two are related to the n-d-r (BGP sessions).
15:01:19 <obondarev> I just had 2 same tabs opened :)
15:02:09 <obondarev> I'm ok with this RFE and spec, +1
15:02:19 <haleyb> +1 from me too
15:02:23 <slaweq> +1 from me too
15:02:28 <lajoskatona1> +1
15:02:33 <haleyb> ok, thanks
15:02:44 <mlavalle> +1
15:03:20 <haleyb> and since that was all on the agenda, and we're over time, we are done
15:03:31 <slaweq> o/
15:03:32 <haleyb> thanks everyone for attending, have a good weekend
15:03:38 <haleyb> #endmeeting
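
A minimal sketch of option 2 from the on-demand topic (reject a vnic_type change when the port is already bound, checked below the API layer), in Python. Every name here (Port, is_bound, update_port, VnicTypeChangeNotAllowed) is a hypothetical stand-in and not Neutron's actual code or the linked patch; the "unbound" and "binding_failed" values are assumed markers for a port that has no usable binding.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these vif_type values mean "not usefully bound".
UNBOUND_VIF_TYPES = {"unbound", "binding_failed"}


class VnicTypeChangeNotAllowed(Exception):
    """Hypothetical error for a vnic_type change on a bound port."""


@dataclass
class Port:
    # Hypothetical, simplified stand-in for a stored port record.
    id: str
    vnic_type: str = "normal"
    vif_type: str = "unbound"
    host: Optional[str] = None


def is_bound(port: Port) -> bool:
    # A port counts as bound when it has a host and a real vif_type.
    return bool(port.host) and port.vif_type not in UNBOUND_VIF_TYPES


def update_port(port: Port, changes: dict) -> Port:
    """Apply changes, refusing a vnic_type change on a bound port."""
    new_vnic_type = changes.get("vnic_type")
    if (
        new_vnic_type is not None
        and new_vnic_type != port.vnic_type
        and is_bound(port)
    ):
        raise VnicTypeChangeNotAllowed(
            f"port {port.id} is bound on {port.host}; "
            "unbind it before changing vnic_type"
        )
    for key, value in changes.items():
        setattr(port, key, value)
    return port
```

The check lives in the update path rather than in request validation because, as noted in the discussion, the bound state is only known once the stored port has been loaded. An unbound port can still have its vnic_type corrected, which covers the "forgot to set the type" case raised for SR-IOV ports.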