14:00:02 <slaweq> #startmeeting neutron_drivers
14:00:03 <openstack> Meeting started Fri Jan 31 14:00:02 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:04 <slaweq> hi
14:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:07 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:10 <njohnston> o/
14:00:25 <felixhuettner> o/
14:01:02 <ralonsoh> hi
14:01:14 <slaweq> I don't think we will have quorum today as yamamoto and haleyb sent me info that they can't be here today and amotoki is sick
14:01:37 <ralonsoh> hmmm you are right...
14:01:53 <ralonsoh> so what do you suggest?
14:02:26 <slaweq> let's wait a few more minutes for mlavalle and maybe we can discuss this min bw issue https://bugs.launchpad.net/neutron/+bug/1861442 at least
14:02:26 <openstack> Launchpad bug 1861442 in neutron "QOS minimum bandwidth rejection of non-physnet network and updates should be driver specific" [Undecided,New]
14:02:31 <slaweq> or we will cancel the meeting
14:02:50 <slaweq> ahh, and welcome our new member of the drivers team - njohnston :)
14:02:53 <ralonsoh> perfect, I would like to talk about this issue in particular
14:03:03 <ralonsoh> welcome!!
14:03:27 <njohnston> \o/ thanks all!
14:04:13 <mlavalle> o/
14:04:16 <slaweq> hi mlavalle
14:04:52 <njohnston> so is quorum 50% or >50%?
14:05:11 <slaweq> I would say >50% but I don't think it was written down anywhere
14:05:32 <slaweq> mlavalle: do You think we can say that we have quorum if we have 50% of the team?
14:06:08 <mlavalle> no, I think quorum should be simple majority
14:06:10 <njohnston> we always had an odd number of members so it did not come up, at least in the past 4 years or so if I remember correctly
14:06:37 <slaweq> I don't think we have any rush today
14:06:52 <mlavalle> it is better for the sake of the validity of any decision we make that we have a majority of the team
14:06:55 <slaweq> so let's skip discussing RFEs and go straight to open discussion
14:07:11 <slaweq> #topic open discussion
14:07:32 <slaweq> we have one bug to discuss: https://bugs.launchpad.net/neutron/+bug/1861442
14:07:32 <openstack> Launchpad bug 1861442 in neutron "QOS minimum bandwidth rejection of non-physnet network and updates should be driver specific" [Undecided,New]
14:07:40 <ralonsoh> TomStappaerts, hi ^^
14:07:41 <TomStappaerts> Hi
14:08:00 <slaweq> TomStappaerts: ralonsoh do You want to explain what You want to discuss exactly?
14:08:11 <njohnston> ralonsoh your question is on point
14:08:31 <ralonsoh> I'm just a viewer here
14:08:35 <ralonsoh> the bug is not mine
14:09:03 <slaweq> TomStappaerts: it's You who reported it, right?
14:09:06 <TomStappaerts> Yes
14:09:45 <TomStappaerts> My problem is that we block stuff in the qos_plugin without calling upon the qos_drivers to ask them whether they support it, with the message "this is unsupported regardless of driver"
14:10:11 <TomStappaerts> But in our SDN we can support minimum bandwidth without needing a physnet
14:10:59 <ralonsoh> sorry for interrupting, just a bit of context
14:11:05 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1819029
14:11:05 <openstack> Launchpad bug 1819029 in neutron "QoS policies with minimum-bandwidth rule should be rejected on non-physnet ports/networks" [Undecided,Fix released] - Assigned to Slawek Kaplonski (slaweq)
14:11:35 <ralonsoh> that was added so that, when enforcing a minBW rule, we have a reference for the available BW on the physnet
14:11:53 <TomStappaerts> Yes thx ralonsoh, so at the time it was blocked on purpose. My argument is that the plugin should ask the available drivers whether they support it instead of plainly blocking the possibility.
14:12:25 <slaweq> so how are You then reporting to placement what resources You have available?
14:13:02 <njohnston> Right, how does the driver indicate the bandwidth capacity of a non-physnet network?
14:13:37 <TomStappaerts> So it seems like I overlooked that part of the reference implementation. Our implementation is based on best effort
14:14:05 <TomStappaerts> Btw I totally understand if you say it makes no sense in normal world :)
14:14:21 <ralonsoh> but, eventually this minBW must be enforced in a physical device
14:14:42 <ralonsoh> (that's why we included this limitation, apart from the placement info population)
14:15:02 <mlavalle> yeah, this is independent of placement
14:15:13 <njohnston> So you are trying to get the effect of min BW by prioritization as opposed to quota enforcement?
14:16:02 <slaweq> ok, I see
14:16:27 <slaweq> I'm looking into the code now and IMO we simply made a mistake by merging this validation in https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L222
14:16:45 <slaweq> it should be in driver.is_rule_supported() method
14:17:04 <slaweq> and then it would be "per driver" to decide that
14:17:27 <slaweq> it is called in https://github.com/openstack/neutron/blob/bd26435b5bf42ade45af1059f57b423346663cf5/neutron/services/qos/drivers/manager.py#L137
14:17:46 <slaweq> and its base definition is in neutron-lib https://opendev.org/openstack/neutron-lib/src/branch/master/neutron_lib/services/qos/base.py#L71
14:18:40 <slaweq> IMO it is a simple bug which should be fixed by moving this validation to the driver
14:18:41 <TomStappaerts> That is what I was thinking yes
14:18:55 <slaweq> but I wonder what ralonsoh and maybe rubasov have to say about it
14:19:07 <ralonsoh> I'm reviewing the code right now
14:19:23 <TomStappaerts> But in is_rule_supported you do not have access to the network object as of now
14:20:13 <ralonsoh> ??
14:20:27 <ralonsoh> I don't understand the last comment
14:20:58 <TomStappaerts> https://opendev.org/openstack/neutron-lib/src/branch/master/neutron_lib/services/qos/base.py#L71 would then need to verify that a rule is not supported when the network does not have a physnet as its first segment, right?
14:21:01 <ralonsoh> ahhh you say you don't have the network object in this method
14:21:01 <slaweq> TomStappaerts: yes, now it only accepts the rule as an argument but IMO we can add additional argument(s) to it
14:21:11 <ralonsoh> exactly
14:21:16 <rubasov> hello
14:21:26 <slaweq> hi rubasov
14:21:29 <ralonsoh> rubasov, https://bugs.launchpad.net/neutron/+bug/1861442
14:21:29 <openstack> Launchpad bug 1861442 in neutron "QOS minimum bandwidth rejection of non-physnet network and updates should be driver specific" [Undecided,New]
14:21:48 <rubasov> looking at it
14:21:53 <ralonsoh> slaweq, you made a good point there
14:22:06 <ralonsoh> the rule should be checked in https://github.com/openstack/neutron/blob/bd26435b5bf42ade45af1059f57b423346663cf5/neutron/services/qos/drivers/manager.py#L137
14:22:22 <ralonsoh> there, each driver decides what to do
14:22:37 <ralonsoh> my concern is what is happening with the information reported to the placement
14:22:52 <ralonsoh> those ports won't have any BW assigned
14:22:56 <ralonsoh> just in the other way
14:23:05 <ralonsoh> when those ports are going to be assigned to a host
14:23:13 <ralonsoh> this host won't have (reported) any BW
14:23:52 <slaweq> ralonsoh: yes, but IMO it's the driver's problem, not neutron core's - if a driver says that it supports this kind of rule, it should be fine for neutron
14:24:05 <ralonsoh> yes
14:24:09 <rubasov> I may not be understanding the whole problem, but the comment in the bug report wanted to say that we were not considering support for min-bw qos for tunneled networks
14:24:20 <njohnston> that is a very good point. but what awareness does placement have of non-physnet networks?  I don’t totally understand the use case.
14:24:28 <rubasov> just to reduce the scope of what we were doing
14:25:05 <rubasov> so if TomStappaerts is doing something like that now then it may not be relevant anymore
14:25:55 <slaweq> rubasov: basically my question to You was: why are we doing the phys_net validation in https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L229 rather than in the driver.is_rule_supported() method
14:27:53 <rubasov> it seems to me I made a bug
14:28:25 <TomStappaerts> rubasov: You're saying I should not be trying to re-use reference qos_plugin for min bw on tunneled networks? (Sorry I don't quite understand :) )
14:28:52 <slaweq> rubasov: :) that's what I was thinking now when looking at it
14:29:01 <rubasov> TomStappaerts: I'm trying to say less
14:29:36 <rubasov> TomStappaerts: that we did not consider the consequences of doing that, therefore you may get surprises
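
The fix slaweq proposes above is to move the physnet check out of qos_plugin.py and let each QoS backend answer for itself. Below is a minimal standalone sketch of that flow, not actual neutron code: the two driver classes, the extra network argument and the validate_min_bw_rule() helper are hypothetical (today's neutron-lib is_rule_supported() only receives the rule), but it shows the shape of the per-driver decision being discussed.

    RULE_TYPE_MINIMUM_BANDWIDTH = 'minimum_bandwidth'


    class SdnQosDriver:
        """Hypothetical backend that can enforce min-bw without a physnet."""

        def is_rule_supported(self, rule_type, network=None):
            # This backend handles minimum bandwidth even on tunneled
            # networks, so it does not require a physnet segment.
            return True


    class ReferenceQosDriver:
        """Hypothetical reference-style backend that still needs a physnet."""

        def is_rule_supported(self, rule_type, network=None):
            if rule_type != RULE_TYPE_MINIMUM_BANDWIDTH:
                return True
            return bool(network and network.get('provider:physical_network'))


    def validate_min_bw_rule(drivers, network):
        # Plugin-side check: accept the rule if at least one loaded driver
        # supports it for this network, instead of hard-coding the physnet
        # requirement in qos_plugin.py.
        if not any(d.is_rule_supported(RULE_TYPE_MINIMUM_BANDWIDTH, network)
                   for d in drivers):
            raise ValueError('minimum_bandwidth is not supported by any '
                             'driver for this network')


    if __name__ == '__main__':
        tunneled_net = {'provider:physical_network': None}
        validate_min_bw_rule([SdnQosDriver(), ReferenceQosDriver()],
                             tunneled_net)
        print('rule accepted')
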
14:30:46 <ralonsoh> rubasov, what happens if you have ports with minBW and the placement does not have any host with BW reported?
14:31:05 <ralonsoh> you can't spawn a VM on those hosts
14:31:17 <rubasov> ralonsoh: vm scheduling should fail, shouldn't it?
14:31:24 <ralonsoh> that's the point
14:31:33 <ralonsoh> we can assign ports with minBW qos
14:31:48 <ralonsoh> if we remove this limitation (valid or not)
14:32:05 <ralonsoh> and then, if placement does not have BW reported, it won't schedule to those hosts
14:32:24 <TomStappaerts> Ah ok, yes that is something I haven't considered yet, as I am blocked at this point now. I am more than fine with you guys saying it doesn't make much sense
14:33:14 <slaweq> that's very good point, thx rubasov and ralonsoh
14:33:39 <ralonsoh> ??
14:33:50 <ralonsoh> bye...
14:34:03 <rubasov> so basically I think slaweq summed it up correctly in this: it is a simple bug which should be fixed by moving this validation to the driver
14:34:59 <ralonsoh> how do you disable this placement check?
14:35:11 <ralonsoh> when scheduling, rubasov
14:37:31 <rubasov> ralonsoh: by not adding a min-bw rule on the port's policy  (but I may be misunderstanding your question)
14:38:40 <ralonsoh> sorry, but at this point this is the perfect example of interlocking
14:39:31 <njohnston> Can we say that placement only tracks physnet networks, so if the network is not a physnet we treat it as if there is no min bw policy in placement interactions?
14:39:56 <slaweq> tbh if You spawn an instance without a qos_policy and then attach qos to the port then it should be fine, no?
14:40:04 <ralonsoh> slaweq, yes
14:40:18 <slaweq> but that's a "hack" to work around this limitation
14:40:18 <ralonsoh> njohnston, I don't know, I need to check that
14:40:34 <ralonsoh> placement is only for scheduling
14:40:47 <ralonsoh> then the port is responsibility of Neutron only
14:41:07 <TomStappaerts> https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L129 Seems to handle that case
14:42:05 <ralonsoh> TomStappaerts, this is temporary code until placement and Neutron fully support this feature
14:42:47 <rubasov> currently there's no way to represent bw of non-physnet networks in placement, we need to come up with a way to do that
14:43:17 <ralonsoh> exactly: some kind of manual report (or configured report)
14:44:19 <TomStappaerts> Is that something that fits into the overall scheme of things, or would it be better for me to write my own qos_plugin?
14:44:50 <slaweq> TomStappaerts: IMO this validation part should be fixed by moving it to the driver's code
14:44:58 <slaweq> that's a bug as rubasov confirmed also
14:45:06 <rubasov> this is implicit in the use of CUSTOM_PHYSNET_ traits
14:45:11 <slaweq> so that's how we should IMO treat this LP
14:45:41 <slaweq> and about reporting non-physnet networks' bw to placement - that's IMO a different issue and should be a new RFE
14:45:42 <rubasov> slaweq, TomStappaerts: yes, I'm okay with moving the check to the driver
14:45:57 <slaweq> what do other people think about it? Does it make sense?
14:46:15 <rubasov> slaweq: exactly
14:46:26 <ralonsoh> at least, for now, we should add documentation for this: if not physnet --> no scheduling
14:46:30 <ralonsoh> it makes sense
14:46:32 <ralonsoh> +1
14:46:32 <njohnston> makes sense
14:46:33 <njohnston> +1
14:46:49 <ralonsoh> thanks rubasov and TomStappaerts
14:47:16 <TomStappaerts> Thx guys
14:47:24 <rubasov> thanks
14:47:24 <slaweq> mlavalle: any thoughts?
14:47:32 <mlavalle> +1
14:47:43 <njohnston> so we can revisit this when we have quorum but hopefully this discussion will help set the stage for then
14:49:01 <slaweq> ok, I will sum up this discussion in a comment on the LP bug and will also ask the other drivers offline to check if they have any concerns about treating it as a regular bug
14:49:19 <slaweq> IMO in this case this should be enough
14:49:36 <njohnston> sounds good!
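
On the separate RFE slaweq mentions (reporting non-physnet bandwidth to placement): today a port with a min-bw rule only produces a resource_request tied to physnet traits, which is why placement has nothing to match against for tunneled networks. A rough, illustrative sketch of that request shape (values are examples from the guaranteed-minimum-bandwidth scheme, not taken from this bug):

    # Illustrative only: roughly what a min-bw port asks placement for today.
    # The CUSTOM_PHYSNET_* trait ties the request to a physnet resource
    # provider; a tunneled network has no such trait, so placement has no
    # bandwidth inventory to match, as ralonsoh and rubasov note above.
    port_resource_request = {
        'required': ['CUSTOM_PHYSNET_PHYSNET0', 'CUSTOM_VNIC_TYPE_NORMAL'],
        'resources': {
            'NET_BW_EGR_KILOBIT_PER_SEC': 1000,
            'NET_BW_IGR_KILOBIT_PER_SEC': 1000,
        },
    }
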
14:50:17 <slaweq> ok, we have a few more minutes and one more "not really an rfe" thing to discuss
14:50:19 <slaweq> https://review.opendev.org/#/c/702806/
14:50:34 <slaweq> and related bug https://bugs.launchpad.net/neutron/+bug/1855260
14:50:34 <openstack> Launchpad bug 1855260 in neutron "SmartNIC OVS representor port is not removed from OVS on instance tear down" [Undecided,In progress] - Assigned to waleed mousa (waleedm)
14:50:55 <slaweq> ralonsoh: can You elaborate on this a bit as You were involved in that already?
14:51:02 <ralonsoh> if we assume that the OVS agent is going to plug/unplug the interfaces
14:51:04 <ralonsoh> then OK
14:51:21 <ralonsoh> but as Sean pointed out, the L1 operations should be done by Nova/os-vif
14:51:22 <ralonsoh> but
14:51:30 <ralonsoh> because those are smartnic ports
14:51:41 <ralonsoh> and this is happening when the OVS agent is offloaded
14:51:54 <ralonsoh> then I'm not 100% sure about Sean's assertion
14:52:03 <ralonsoh> I need to check with Sean this last point
14:52:21 <ralonsoh> conclusion: I know nothing
14:52:28 <slaweq> ok :)
14:53:07 <slaweq> but the current solution, with the ovs agent responsible for plugging/unplugging interfaces, is already in neutron (iirc in Train at least), right?
14:53:17 <ralonsoh> yes for smartnic ports
14:53:37 <ralonsoh> so yes, the patch is needed since we assumed this role
14:53:59 <slaweq> yes, so IMO we should go with a fix for the reported bug now as a quick and backportable solution
14:54:05 <ralonsoh> sure
14:54:18 <slaweq> and later maybe propose a new rfe to change the logic
14:54:24 <ralonsoh> +1
14:54:47 <njohnston> +1
14:54:52 <mlavalle> =1
14:54:54 <slaweq> I know that sean-k-mooney and You had some opinions about it so I would like to ask what You think about such a solution
14:54:56 <mlavalle> +1
14:56:07 <ralonsoh> I'll review the patch assuming the current state: the ovs agent handles those ports
14:57:08 <slaweq> ok, thx a lot
14:57:32 <sean-k-mooney> just reading scrollback now
14:57:34 <slaweq> so I think we had a pretty productive meeting even without discussing any rfe :)
14:57:41 <slaweq> hi sean-k-mooney :)
14:57:51 <sean-k-mooney> o/
14:58:34 <sean-k-mooney> slaweq: the question is should the neutron l2 agent be managing l1
14:58:39 <ralonsoh> yes
14:59:14 <sean-k-mooney> IMO i dont think that design scales well and i dont think it should. that said im not going to block extending os-vif to support this
14:59:17 <slaweq> sean-k-mooney: yes, I understand Your point and IMO we should think about a better solution maybe (I'm not a smartnics expert so I may be wrong)
14:59:43 <sean-k-mooney> so the choice is really up to the neutron team to decide
14:59:46 <slaweq> but now I would like to go with the quicker solution to fix the existing bug and backport the fix to stable releases
14:59:53 <sean-k-mooney> well my main concern is upgrades
14:59:54 <slaweq> and later we can think about a redesign
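
As a rough illustration of what "the OVS agent unplugging the interface" means for the SmartNIC case (a sketch only, not the patch under review at https://review.opendev.org/#/c/702806/): on instance teardown the agent side would delete the leftover representor port from the integration bridge, roughly:

    import subprocess

    def remove_representor_port(bridge, port_name):
        # Hypothetical helper: remove a stale SmartNIC representor port from
        # the integration bridge; --if-exists keeps the call idempotent.
        subprocess.run(
            ['ovs-vsctl', '--if-exists', 'del-port', bridge, port_name],
            check=True)

    # Example usage (names illustrative):
    # remove_representor_port('br-int', 'pf0vf1')
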
15:00:22 <slaweq> ok, we are out of time now, thx for the meeting guys
15:00:27 <slaweq> and have a great weekend
15:00:29 <slaweq> o/
15:00:32 <slaweq> #endmeeting