14:00:11 <lajoskatona> #startmeeting neutron_drivers
14:00:12 <opendevmeet> Meeting started Fri Aug 26 14:00:11 2022 UTC and is due to finish in 60 minutes.  The chair is lajoskatona. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:12 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:00:13 <lajoskatona> o/
14:00:16 <ralonsoh> hello
14:00:20 <amorin_> hello
14:00:40 <slaweq> hi
14:01:23 <njohnston_> o/
14:02:50 <lajoskatona> mlavalle can't join today
14:03:03 <mlavalle> o/
14:03:13 * mlavalle in another meeting
14:04:03 <lajoskatona> we have four drivers today, but we need at least 5 as I count
14:04:13 <obondarev_> hi
14:04:18 <lajoskatona> 5 :-)
14:04:29 <haleyb> hi
14:04:52 <lajoskatona> Let's start :-)
14:05:13 <lajoskatona> We have 2 RFEs which are related to each other as I see:
14:05:19 <lajoskatona> [RFE] Add DSCP mark 44 (#link https://bugs.launchpad.net/neutron/+bug/1987378 )
14:05:40 <ralonsoh> This one is easy, I think
14:05:41 <lajoskatona> In comment slaweq already added that perhaps no need for a formal RFE for this
14:05:47 <slaweq> TBH, for me it doesn't seems like real RFE even
14:05:58 <lajoskatona> +1
14:05:58 <ralonsoh> (I added the RFE just in case)
14:06:06 <slaweq> this seems like minor and obvious change (addition)
14:06:12 <njohnston> +1
14:06:22 <ralonsoh> I have just one question related, now you are here
14:06:34 <lajoskatona> ralonsoh: please
14:06:47 <ralonsoh> this feature implies to add a new DSCP mark, that implies changing the API
14:06:48 <ralonsoh> but
14:06:59 <ralonsoh> the API values can be consulted via API too
14:07:02 <ralonsoh> example
14:07:20 <ralonsoh> https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/854369/1/neutron_tempest_plugin/api/test_qos.py
14:07:57 <ralonsoh> so I don't think we need an extension, because those values can be consulted via API (these values = DSCP accepted values, per driver)
14:08:01 <ralonsoh> right?
14:08:16 <slaweq> for me this sounds good
14:08:36 <slaweq> we can discover values acceptable in the cloud using that existing API
14:08:42 <ralonsoh> exactly
14:08:47 <obondarev> makes sense
14:08:51 <njohnston> agreed
14:09:11 <ralonsoh> thanks a lot, that's all from me related to this feature
14:09:21 <lajoskatona> +1
14:09:40 <haleyb> +1 from me
14:10:04 <njohnston> I would just say that we should make sure that discoverabiluty method is in the api ref - I don't see it when I look at the DSCP api
14:10:30 <ralonsoh> it is in the rule-type API
14:10:37 <ralonsoh> I'll send you the link
14:11:24 <ralonsoh> https://docs.openstack.org/api-ref/network/v2/index.html?expanded=list-qos-rule-types-detail,show-qos-rule-type-details-detail#show-qos-rule-type-details
14:11:48 <njohnston> perhaps we add an example there that shows the DSCP details, none of the examples demonstrate that
14:12:05 <ralonsoh> njohnston, I'll add it
14:12:11 <njohnston> +1 thanks!
14:12:17 <lajoskatona> good idea
14:12:44 <lajoskatona> ok, I think we can move on
14:12:48 <lajoskatona> [RFE] Add a port extension to set/define the switchdev capabilities (#link https://bugs.launchpad.net/neutron/+bug/1987093 )
14:13:25 <ralonsoh> the goal of this RFE is to avoid exposing/modifying the port bindings from Neutron (or a Neutron user)
14:13:33 <ralonsoh> port binding should be read-only always
14:13:41 <ralonsoh> and only Nova should change that
14:14:04 <ralonsoh> so, in order to create a HW offloaded port, I'm proposing this new extension
14:14:31 <ralonsoh> why? a non-admin user, with the needed policies, can change a port binding PCI address
14:14:45 <ralonsoh> pointing, for example, to the compute node hard drive PCI address
14:14:56 <ralonsoh> that means when you boot the VM, this compute nodes dies
14:15:04 <ralonsoh> that's a possible security issue
14:15:08 <slaweq> for backward compatybility will this value be still exposed in binding profile, as it is now?
14:15:13 <ralonsoh> yes
14:15:21 <slaweq> so there will be no changes on nova required
14:15:28 <ralonsoh> it will
14:15:34 <slaweq> ok, sounds good for me
14:16:14 <lajoskatona> agree, as I see it is nova who has all the knowledge for the profile, let it do its best :-)
14:16:30 <ralonsoh> (and btw, that was something proposed by a Nova folk, sean-k-mooney )
14:16:50 <ralonsoh> so we'll have full support from Nova side
14:16:50 <lajoskatona> :-)
14:16:58 <obondarev> should we try make it consistent with direct ports (SRIOV)?
14:17:11 <ralonsoh> yes, that's the goal
14:17:24 <obondarev> ok cool
14:18:22 <lajoskatona> Ok, it was also easy
14:18:59 <lajoskatona> We have no more RFEs, but:
14:19:02 <lajoskatona> #topic On Demand Agenda
14:19:08 <lajoskatona> (amorin): Add new novaplug mechanism driver
14:19:14 <amorin> Hey
14:19:15 <lajoskatona> https://bugs.launchpad.net/neutron/+bug/1986969
14:19:21 <lajoskatona> https://review.opendev.org/c/openstack/neutron/+/854553
14:19:37 <amorin> I can explain a little bit the context
14:19:46 <sean-k-mooney> o/
14:19:58 <amorin> hey sean-k-mooney
14:20:03 <sean-k-mooney> this the trusted vf changes
14:20:08 <sean-k-mooney> or hardware offloaed ovs
14:20:33 <ralonsoh> sean-k-mooney, it is approved, yes
14:20:35 <sean-k-mooney> for the swtichdev capablity nova should set that on our side
14:20:44 <ralonsoh> we are talking about other bug now
14:20:49 <sean-k-mooney> fro trustedVF a new extstion on the neutron side
14:20:52 <sean-k-mooney> woudl be ideal
14:21:16 <lajoskatona> sean-k-mooney:  for this one: https://bugs.launchpad.net/neutron/+bug/1987093 ?
14:21:31 <sean-k-mooney> we should not  do that
14:21:46 <sean-k-mooney> at least not as the tital states
14:22:01 <sean-k-mooney> nova can detect if the VF has that capablity and set it
14:22:17 <sean-k-mooney> no if we want to allow user to schdule to a shost that has hardawr offloaed ovs
14:22:28 <sean-k-mooney> having an extion to request that via a triat or somthing would be nice
14:22:39 <sean-k-mooney> but setting the capablit is somethign we shoudl fix in nova
14:22:58 <sean-k-mooney> the trusted VF feature ahs a simiar problem however
14:23:04 <sean-k-mooney> that can only be fixed in neutorn
14:23:14 <sean-k-mooney> it also requires a vaule to be set in the profile to use
14:23:23 <sean-k-mooney> that shoudl be done differntly
14:23:37 <ralonsoh> but this is other feature, not related
14:23:43 <ralonsoh> we can address it in other bug
14:23:59 <sean-k-mooney> yes but i don tthink you shoudl try to adress https://bugs.launchpad.net/neutron/+bug/1987093 in nova
14:24:03 <sean-k-mooney> * in neutron
14:24:24 <ralonsoh> ok, now I'm lost
14:24:32 <lajoskatona> me too :-)
14:24:46 <sean-k-mooney> the binding profie is ment to be used to pass info form nova to the network backend
14:24:58 <lajoskatona> ok, let's go back to https://bugs.launchpad.net/neutron/+bug/1987093 ([RFE] Add a port extension to set/define the switchdev capabilities )
14:25:22 <sean-k-mooney> right so i am not conviced we shoudl do ^
14:25:49 <sean-k-mooney> nova does not use the  "{"capabilities": ["switchdev"]}" for anything today
14:26:00 <lajoskatona> do we need a spec on nova side also to be in sync?
14:26:02 <sean-k-mooney> and the enduer has no idea if the VF supports it
14:26:26 <ralonsoh> and how should we ask for a HWOL port?
14:26:37 <sean-k-mooney> just vic type = direct
14:26:43 <sean-k-mooney> no special request
14:27:00 <ralonsoh> and if we have SRIOV + OVS?
14:27:10 <ralonsoh> what driver should attend this request?
14:27:12 <sean-k-mooney> if we want to suppot that we shoudl have a custom trait
14:27:21 <sean-k-mooney> but just adding  "{"capabilities": ["switchdev"]}"
14:27:29 <sean-k-mooney> will not allow you to chosee today
14:27:43 <sean-k-mooney> the nova spec to implemnt the checking based on that was not finished
14:27:59 <sean-k-mooney> ralonsoh: you wote the code may years ago but we never merged it
14:28:03 <ralonsoh> this is the way we have now to discriminate if this port is for OVS (HW offload) or SRIOV (not HWOL)
14:28:11 <sean-k-mooney> nope
14:28:16 <ralonsoh> in Neutron yes
14:28:27 <sean-k-mooney> the bug fix that enforect that requriemtn was wrong
14:28:40 <sean-k-mooney> its check by ovs but that check is useless
14:28:46 <ralonsoh> if (vnic_type == portbindings.VNIC_DIRECT and
14:28:46 <ralonsoh> 'switchdev' in capabilities):
14:28:46 <ralonsoh> LOG.debug("Refusing to bind due to unsupported vnic_type: %s "
14:28:46 <ralonsoh> "with switchdev capability", portbindings.VNIC_DIRECT)
14:28:49 <ralonsoh> ^^ SRIOV
14:28:54 <ralonsoh> and the opposite in OVS
14:28:56 <sean-k-mooney> yep
14:29:01 <sean-k-mooney> that check is pointless
14:29:22 <sean-k-mooney> you have no idea if that port is conencted to a VF that is in swtichdev mode
14:29:24 <ralonsoh> Ok, let's do this
14:29:35 <ralonsoh> let's move this discussion out of this meeting
14:29:40 <sean-k-mooney> +1
14:29:42 <ralonsoh> between you and me
14:29:51 <ralonsoh> and we can continue with amorin bug
14:29:54 <sean-k-mooney> well with anyone that is interested
14:29:58 <amorin> ok
14:29:59 <sean-k-mooney> but sure
14:29:59 <ralonsoh> (I'll update the bug in LP)
14:30:02 <lajoskatona> thanks, let's do that
14:30:13 <lajoskatona> ralonsoh, sean-k-mooney: thanks
14:30:20 <amorin> we have customers using neutron api from terraform, they create port with a device_id, which endup NOT beeing attached to the instance
14:30:35 <amorin> so, to solve this on our side, we will add a custom mech-driver (novaplug)
14:30:38 <sean-k-mooney> ah this im glad im here :)
14:31:10 <amorin> after a discussion with sean-k-mooney
14:31:29 <amorin> it seems that the best would be to set device_id to be an admin only thing
14:31:49 <ralonsoh> (BTW, the goal of https://bugs.launchpad.net/neutron/+bug/1986969 was not to create a new mech driver, but to document the current behaviour)
14:31:55 <amorin> because nova will never approve plugging a port outside of a regular nova attache
14:32:08 <amorin> ralonsoh, yes true
14:32:19 <amorin> I just wanted to share what we will do downstream
14:32:38 <amorin> I dont want to bring our solution upstream, I just want to fix the issue at the end
14:32:57 <amorin> I created this in nova: https://review.opendev.org/c/openstack/nova/+/854627
14:33:06 <sean-k-mooney> yep so form a nova point of view detaching or attaching a port via neuton is not supported
14:33:15 <sean-k-mooney> and we dont want to supprot that in the future
14:33:26 <ralonsoh> right, you shouldn't
14:33:48 <sean-k-mooney> so we would prefer i fthe device_id and devie_oweer filed were admin/service user ownly for post/put
14:34:02 <sean-k-mooney> allowing normaly users readonly access is fine
14:34:14 <sean-k-mooney> but only nova/ironic/zun shoudl set them
14:34:26 <sean-k-mooney> as part of there internal interface attach workflows
14:34:52 <sean-k-mooney> well or neuton in the case of l3 router ports ecrta
14:35:06 <lajoskatona> sounds logical
14:35:12 <amorin> +1
14:35:42 <slaweq> ok, but if we will change that API and allow it only to the admin, we may break someone's usecase potentially if e.g. someone uses this field now for some custom ports
14:36:01 <sean-k-mooney> yes that is possible
14:36:07 <obondarev> for device_owner I think many users might use it
14:36:08 <amorin> maybe we can add an extra option in neutron.conf to change this behavior>
14:36:11 <amorin> ?
14:36:13 <slaweq> (I don't know about such uses cases in real world, just thinking loud now)
14:36:28 <sean-k-mooney> well it can be contoled by custom policy
14:36:34 <ralonsoh> this is part of the port binding
14:36:40 <sean-k-mooney> i know of one incorrect  use today
14:37:08 <amorin> custom policy is not an option here, if it's available, our customer expects to use it
14:37:12 <sean-k-mooney> the shfit on stack team were considering using the id to reserve ip for contianer
14:37:44 <ralonsoh> so another reason to make the port binding read only
14:38:08 <slaweq> ralonsoh port binding?
14:38:23 <slaweq> don't we talk about device_id and device_owner fields?
14:38:24 <obondarev> these are separate fields
14:38:33 <ralonsoh> sorry, no, this is not part of the port binding info
14:38:34 <sean-k-mooney> yes are seperate yes
14:38:35 <ralonsoh> my bad
14:38:36 <lajoskatona> slaweq: I thought the same
14:38:43 <slaweq> ok
14:38:51 <sean-k-mooney> nova sets them during port attch at the same time it binds the port
14:38:56 <sean-k-mooney> but they are seperate
14:39:21 <sean-k-mooney> but that is just use minimisng the api calls not
14:39:27 <slaweq> ok, so I would like to clarify one thing as I'm not sure I'm following all the discussion here
14:40:10 <slaweq> what is the proposal to discuss: document current behaviour on Neutron side or add new mechanism driver in neutron as amorin proposed already in gerrit or maybe something else?
14:40:23 <slaweq> like changing current default API policies?
14:40:27 <opendevreview> sean mooney proposed openstack/neutron master: Revert "ovs mech: bind only if user request switchdev"  https://review.opendev.org/c/openstack/neutron/+/854796
14:40:52 <sean-k-mooney> i was suggesting changing the default api policy
14:40:55 <lajoskatona> To document is the minimum as I see
14:41:03 <amorin> document +1
14:41:07 <sean-k-mooney> documenting is alwasy good
14:41:15 <amorin> changing it to admin only
14:41:23 <lajoskatona> but changing the policies that can be risky as you said with current unknown users
14:41:30 <sean-k-mooney> the issue is right now if you set the device-id to a nova server uuid you will start currptiong the nova db
14:41:55 <amorin> moreover, you can set it to a device_id which does not belong to the same tenant
14:41:59 <amorin> nova does nothing about it
14:42:12 <amorin> but openstack port list --server uuid
14:42:15 <amorin> as admin
14:42:19 <amorin> will print it
14:42:22 <sean-k-mooney> really thats not nice
14:42:22 <obondarev> should nova distinguish real interfaces from such manually created ones?
14:42:29 <amorin> very confusing for admins
14:43:01 <sean-k-mooney> the curent api contarct is that the device-id should be set to the entity that consumes the port
14:43:16 <sean-k-mooney> entiy being nova instnace, ironic server or zun contianer
14:43:49 <sean-k-mooney> the device-owner field is not used for filtering
14:43:50 <slaweq> so as sean-k-mooney said, we can change default API policies, document it and write "big" release note that this behaviour changed and if You want to use it in old way, You need to change Your API policies
14:44:06 <amorin> +1
14:44:11 <sean-k-mooney> yep
14:44:27 <lajoskatona> yeah, as it wa never an official "feature"
14:44:29 <lajoskatona> +1
14:44:30 <sean-k-mooney> by the way the usecase the opensfhit had was creating port to reservice ips for there use
14:44:48 <sean-k-mooney> so i also feel like that could be adress as its won feature another way
14:45:10 <sean-k-mooney> i.e. provide a way to reserve ips without having to create unattached ports
14:45:46 <amorin> but you can create ports without device_id, why do they need a device_id?
14:45:49 <sean-k-mooney> baicaly like creating flotating ip but form a normal subnet
14:45:56 <slaweq> ^^ that would be another, bigger RFE IMO
14:46:04 <sean-k-mooney> amorin: they wanted to track that the ip is used by that vm
14:46:16 <sean-k-mooney> or rahter a contianer on that vm
14:46:27 <amorin> ack
14:46:37 <sean-k-mooney> slaweq: yes it would
14:46:41 <amorin> were they aware that a reboot hard of the VM would actually attached it?
14:46:53 <sean-k-mooney> nope
14:47:00 <sean-k-mooney> and we were not either
14:47:06 <sean-k-mooney> (nova team)
14:47:09 <amorin> ok
14:47:14 <sean-k-mooney> i dont thin that used to happen
14:47:22 <ralonsoh> it works, yes
14:47:30 <sean-k-mooney> i think its a side effect of the network info cahce healing i added a few cycle ago
14:47:33 <lajoskatona> do we have to handle that case? I mean hard reboot ?
14:47:37 <amorin> true
14:47:42 <sean-k-mooney> its not ment ot work because it does not actully set thing up in our db properly
14:47:46 <amorin> (for the healing side effect)
14:47:47 <obondarev> so should we disallow device_id only or device_owner is also required?
14:48:03 <amorin> both I think
14:48:20 <sean-k-mooney> nova likely need to have its own bug for this and figure out hwo we want to move forward
14:48:59 <lajoskatona> perhaps a good discussion for the PTG :-)
14:49:09 <sean-k-mooney> yep perhaps
14:49:37 <sean-k-mooney> likely we shoudl jsut reject the external event and not populate it in the netowrk info cache and complain loadly
14:49:54 <sean-k-mooney> the probalem is to who
14:50:08 <lajoskatona> ok, so today and from Neutron perspective we agree that we have to document this behaviour and change the default policies for dev_id and dev_owner, am I right?
14:50:30 <ralonsoh> I think so, this should be the first step
14:50:43 <obondarev> well, the issue described is only related to device_id, so not sure device_owner is required here also
14:50:54 <lajoskatona> ok, thanks
14:51:49 <lajoskatona> no it mentions owner also, let's see it in the review, I would say :-)
14:52:08 <obondarev> less potential "broken" users is better I think
14:52:15 <ralonsoh> right
14:52:25 <slaweq> ++
14:52:26 <lajoskatona> that is a good point
14:53:04 <obondarev> + for continue in review
14:53:04 <lajoskatona> Ok, so documentation and policy change for device_id for now
14:53:08 <amorin> ok
14:53:33 <lajoskatona> Do you have anything more to discuss today?
14:54:35 <lajoskatona> If nothing we can close the meeting for today
14:54:43 <slaweq> nothing from me
14:55:24 <lajoskatona> #endmeeting