14:00:02 <slaweq> #startmeeting neutron_drivers
14:00:03 <openstack> Meeting started Fri Jan 31 14:00:02 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:04 <slaweq> hi
14:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:07 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:10 <njohnston> o/
14:00:25 <felixhuettner> o/
14:01:02 <ralonsoh> hi
14:01:14 <slaweq> I don't think we will have quorum today as yamamoto and haleyb sent me info that they can't be here today and amotoki is sick
14:01:37 <ralonsoh> hmmm you are right...
14:01:53 <ralonsoh> so what do you suggest?
14:02:26 <slaweq> let's wait a few more minutes for mlavalle and maybe we can at least discuss this min bw issue https://bugs.launchpad.net/neutron/+bug/1861442
14:02:26 <openstack> Launchpad bug 1861442 in neutron "QOS minimum bandwidth rejection of non-physnet network and updates should be driver specific" [Undecided,New]
14:02:31 <slaweq> or we will cancel the meeting
14:02:50 <slaweq> ahh, and welcome our new member of the drivers team - njohnston :)
14:02:53 <ralonsoh> perfect, I would like to talk about this issue in particular
14:03:03 <ralonsoh> welcome!!
14:03:27 <njohnston> \o/ thanks all!
14:04:13 <mlavalle> o/
14:04:16 <slaweq> hi mlavalle
14:04:52 <njohnston> so is quorum 50% or >50%?
14:05:11 <slaweq> I would say >50% but I don't think it was written down anywhere
14:05:32 <slaweq> mlavalle: do You think we can say that we have quorum if we have 50% of the team?
14:06:08 <mlavalle> no, I think quorum should be a simple majority
14:06:10 <njohnston> we always had an odd number of members so it did not come up, at least in the past 4 years or so if I remember correctly
14:06:37 <slaweq> I don't think we have any rush today
14:06:52 <mlavalle> it is better for the sake of the validity of any decision we make that we have a majority of the team
14:06:55 <slaweq> so let's skip discussing RFEs and go straight to open discussion
14:07:11 <slaweq> #topic open discussion
14:07:32 <slaweq> we have one bug to discuss: https://bugs.launchpad.net/neutron/+bug/1861442
14:07:32 <openstack> Launchpad bug 1861442 in neutron "QOS minimum bandwidth rejection of non-physnet network and updates should be driver specific" [Undecided,New]
14:07:40 <ralonsoh> TomStappaerts, hi ^^
14:07:41 <TomStappaerts> Hi
14:08:00 <slaweq> TomStappaerts: ralonsoh: do You want to explain what You want to discuss exactly?
14:08:11 <njohnston> ralonsoh your question is on point
14:08:31 <ralonsoh> I'm just a viewer here
14:08:35 <ralonsoh> the bug is not mine
14:09:03 <slaweq> TomStappaerts: it's You who reported it, right?
14:09:06 <TomStappaerts> Yes
14:09:45 <TomStappaerts> My problem is that we block stuff in the qos_plugin without calling upon the qos_drivers to ask them whether they support it.
With the message "this is unsupported regardless of driver"
14:10:11 <TomStappaerts> But in our SDN we can support minimum bandwidth without needing a physnet
14:10:59 <ralonsoh> sorry for interrupting, just a bit of context
14:11:05 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1819029
14:11:05 <openstack> Launchpad bug 1819029 in neutron "QoS policies with minimum-bandwidth rule should be rejected on non-physnet ports/networks" [Undecided,Fix released] - Assigned to Slawek Kaplonski (slaweq)
14:11:35 <ralonsoh> that was added so that, when enforcing a minBW, we have a reference of the available BW in the physnet
14:11:53 <TomStappaerts> Yes thx ralonsoh, so at the time it was blocked on purpose. My argument is that the plugin should ask the available drivers whether they support it instead of plainly blocking the possibility.
14:12:25 <slaweq> so how are You then reporting to placement what resources You have available?
14:13:02 <njohnston> Right, how does the driver indicate the bandwidth capacity of a non-physnet network?
14:13:37 <TomStappaerts> So it seems like I overlooked that part of the reference implementation. Our implementation is based on best effort
14:14:05 <TomStappaerts> Btw I totally understand if you say it makes no sense in the normal world :)
14:14:21 <ralonsoh> but, eventually this minBW must be enforced in a physical device
14:14:42 <ralonsoh> (that's why we included this limitation, apart from the placement info population)
14:15:02 <mlavalle> yeah, this is independent of placement
14:15:13 <njohnston> So you are trying to get the effect of min BW by prioritization as opposed to quotas?
14:16:02 <slaweq> ok, I see
14:16:27 <slaweq> I'm looking into the code now and IMO we simply made a mistake by merging this validation in https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L222
14:16:45 <slaweq> it should be in the driver.is_rule_supported() method
14:17:04 <slaweq> and then it would be "per driver" to decide that
14:17:27 <slaweq> it is called in https://github.com/openstack/neutron/blob/bd26435b5bf42ade45af1059f57b423346663cf5/neutron/services/qos/drivers/manager.py#L137
14:17:46 <slaweq> and its base definition is in neutron-lib https://opendev.org/openstack/neutron-lib/src/branch/master/neutron_lib/services/qos/base.py#L71
14:18:40 <slaweq> IMO it is a simple bug which should be fixed by moving this validation to the driver
14:18:41 <TomStappaerts> That is what I was thinking, yes
14:18:55 <slaweq> but I wonder what ralonsoh and maybe rubasov have to say about it
14:19:07 <ralonsoh> I'm reviewing the code right now
14:19:23 <TomStappaerts> But in is_rule_supported you do not have access to the network object as of now
14:20:13 <ralonsoh> ??
14:20:27 <ralonsoh> I don't understand the last comment
14:20:58 <TomStappaerts> https://opendev.org/openstack/neutron-lib/src/branch/master/neutron_lib/services/qos/base.py#L71 would have to add verification that it does not support a rule when the network does not have a physnet as its first segment, right?
14:21:01 <ralonsoh> ahhh you say you don't have the network object in this method
14:21:01 <slaweq> TomStappaerts: yes, now it only accepts the rule as an argument but IMO we can add additional argument(s) to it
14:21:11 <ralonsoh> exactly
14:21:16 <rubasov> hello
14:21:26 <slaweq> hi rubasov
14:21:29 <ralonsoh> rubasov, https://bugs.launchpad.net/neutron/+bug/1861442
14:21:29 <openstack> Launchpad bug 1861442 in neutron "QOS minimum bandwidth rejection of non-physnet network and updates should be driver specific" [Undecided,New]
14:21:48 <rubasov> looking at it
14:21:53 <ralonsoh> slaweq, you made a good point there
14:22:06 <ralonsoh> the rule should be checked in https://github.com/openstack/neutron/blob/bd26435b5bf42ade45af1059f57b423346663cf5/neutron/services/qos/drivers/manager.py#L137
14:22:22 <ralonsoh> there, each driver decides what to do
14:22:37 <ralonsoh> my concern is what happens with the information reported to placement
14:22:52 <ralonsoh> those ports won't have any BW assigned
14:22:56 <ralonsoh> or rather, the other way around
14:23:05 <ralonsoh> when those ports are going to be assigned to a host
14:23:13 <ralonsoh> this host won't have any BW reported
14:23:52 <slaweq> ralonsoh: yes, but IMO it's the driver's problem, not neutron core's - if a driver says that it supports this kind of rule, it should be fine for neutron
14:24:05 <ralonsoh> yes
14:24:09 <rubasov> I may not be understanding the whole problem, but the comment in the bug report wanted to say that we were not considering support for min-bw qos for tunneled networks
14:24:20 <njohnston> that is a very good point. but what awareness does placement have of non-physnet networks? I don't totally understand the use case.
14:24:28 <rubasov> just to reduce the scope of what we were doing
14:25:05 <rubasov> so if TomStappaerts is doing something like that now then it may not be relevant anymore
14:25:55 <slaweq> rubasov: basically my question to You was: why are we doing the physnet validation in https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L229 rather than in the driver.is_rule_supported() method
14:27:53 <rubasov> it seems to me I made a bug
14:28:25 <TomStappaerts> rubasov: You're saying I should not be trying to re-use the reference qos_plugin for min bw on tunneled networks? (Sorry, I don't quite understand :) )
14:28:52 <slaweq> rubasov: :) that's what I was thinking now when looking at it
14:29:01 <rubasov> TomStappaerts: I'm trying to say less
14:29:36 <rubasov> TomStappaerts: that we did not consider the consequences of doing that, therefore you may get surprises
14:30:46 <ralonsoh> rubasov, what happens if you have ports with minBW and placement does not have any host with BW reported?
14:31:05 <ralonsoh> you can't spawn a VM on those hosts
14:31:17 <rubasov> ralonsoh: vm scheduling should fail, shouldn't it?
14:31:24 <ralonsoh> that's the point
14:31:33 <ralonsoh> we can assign ports with minBW qos
14:31:48 <ralonsoh> if we remove this limitation (valid or not)
14:32:05 <ralonsoh> and then, if placement does not have BW reported, it won't schedule to those hosts
14:32:24 <TomStappaerts> Ah ok, yes that is something I haven't considered yet, as I am blocked at this point now. I am more than fine with you guys saying it doesn't make much sense
14:33:14 <slaweq> that's a very good point, thx rubasov and ralonsoh
14:33:39 <ralonsoh> ??
14:33:50 <ralonsoh> bye...
14:34:03 <rubasov> so basically I think slaweq summed it up correctly: it is a simple bug which should be fixed by moving this validation to the driver
14:34:59 <ralonsoh> how do you disable this placement check?
14:35:11 <ralonsoh> when scheduling, rubasov
14:37:31 <rubasov> ralonsoh: by not adding a min-bw rule to the port's policy (but I may be misunderstanding your question)
14:38:40 <ralonsoh> sorry, but at this point this is the perfect example of interlocking
14:39:31 <njohnston> Can we say that placement only tracks physnet networks, so if the network is not a physnet we treat it as if there is no min bw policy in placement interactions?
14:39:56 <slaweq> tbh if You spawn an instance without a qos_policy and then attach qos to the port, it should be fine, no?
14:40:04 <ralonsoh> slaweq, yes
14:40:18 <slaweq> but that's a "hack" to work around this limitation
14:40:18 <ralonsoh> njohnston, I don't know, I need to check that
14:40:34 <ralonsoh> placement is only for scheduling
14:40:47 <ralonsoh> then the port is the responsibility of Neutron only
14:41:07 <TomStappaerts> https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L129 seems to handle that case
14:42:05 <ralonsoh> TomStappaerts, this is temporary code until placement and Neutron fully support this feature
14:42:47 <rubasov> currently there's no way to represent the bw of non-physnet networks in placement, we need to come up with a way to do that
14:43:17 <ralonsoh> exactly: some kind of manual report (or configured report)
14:44:19 <TomStappaerts> Is that something that fits into the overall scheme of things, or would it be better for me to write my own qos_plugin?
14:44:50 <slaweq> TomStappaerts: IMO this validation part should be fixed by moving it to the driver's code
14:44:58 <slaweq> that's a bug, as rubasov also confirmed
14:45:06 <rubasov> this is implicit in the use of CUSTOM_PHYSNET_ traits
14:45:11 <slaweq> so that's how we should IMO treat this LP
14:45:41 <slaweq> and about reporting non-physnet networks' bw to placement - that's IMO a different issue and it should be a new RFE
14:45:42 <rubasov> slaweq, TomStappaerts: yes, I'm okay with moving the check to the driver
14:45:57 <slaweq> what do other people think about it? Does it make sense?
14:46:15 <rubasov> slaweq: exactly
14:46:26 <ralonsoh> at least, for now, we should add documentation for this: if not physnet --> no scheduling
14:46:30 <ralonsoh> it makes sense
14:46:32 <ralonsoh> +1
14:46:32 <njohnston> makes sense
14:46:33 <njohnston> +1
14:46:49 <ralonsoh> thanks rubasov and TomStappaerts
14:47:16 <TomStappaerts> Thx guys
14:47:24 <rubasov> thanks
14:47:24 <slaweq> mlavalle: any thoughts?
14:47:32 <mlavalle> +1
14:47:43 <njohnston> so we can revisit this when we have quorum, but hopefully this discussion will help set the stage for then
14:49:01 <slaweq> ok, I will sum up this discussion in a comment on the LP bug and will also ask the other drivers offline to check if they have any concerns about treating it as a regular bug
14:49:19 <slaweq> IMO in this case this should be enough
14:49:36 <njohnston> sounds good!
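
To make the direction agreed above a bit more concrete, here is a minimal, self-contained Python sketch. It is not actual neutron or neutron-lib code: the class names, the `is_rule_supported_for_network()` hook, and the "accept if any driver supports it" semantics are illustrative assumptions only. The real change discussed in the meeting would extend `is_rule_supported()` in neutron-lib (neutron_lib/services/qos/base.py) and its call site in neutron/services/qos/drivers/manager.py so that a backend like TomStappaerts' SDN can accept minimum-bandwidth rules on tunneled networks while the reference driver keeps requiring a physnet.

```python
# Illustrative sketch only -- not neutron code. It shows the idea of
# delegating the "min-bw needs a physnet" check to each QoS driver
# instead of rejecting the rule unconditionally in the plugin.

RULE_TYPE_MIN_BW = 'minimum_bandwidth'


class QosDriverSketch:
    """Stand-in for a QoS backend driver (hypothetical base class)."""

    def is_rule_supported_for_network(self, rule_type, network):
        """Hypothetical context-aware variant of is_rule_supported()."""
        raise NotImplementedError


class ReferenceDriverSketch(QosDriverSketch):
    """Mimics the reference behaviour: min-bw needs a physnet segment,
    because available bandwidth is reported to placement per physnet."""

    def is_rule_supported_for_network(self, rule_type, network):
        if rule_type != RULE_TYPE_MIN_BW:
            return True
        segments = network.get('segments', [])
        return bool(segments and segments[0].get('physical_network'))


class SdnDriverSketch(QosDriverSketch):
    """Mimics a backend that can honour min-bw (best effort) on
    tunneled networks without a physnet."""

    def is_rule_supported_for_network(self, rule_type, network):
        return True


def validate_min_bw_rule(drivers, network):
    """Plugin-side check: accept the rule if any registered driver
    supports it, instead of rejecting non-physnet networks outright."""
    if not any(d.is_rule_supported_for_network(RULE_TYPE_MIN_BW, network)
               for d in drivers):
        raise ValueError('minimum_bandwidth is not supported by any '
                         'driver for this network')


if __name__ == '__main__':
    tunneled_net = {'segments': [{'network_type': 'vxlan',
                                  'physical_network': None}]}
    # With only the reference-style driver the rule is still rejected ...
    try:
        validate_min_bw_rule([ReferenceDriverSketch()], tunneled_net)
    except ValueError as exc:
        print('rejected:', exc)
    # ... but an SDN-style driver can accept it.
    validate_min_bw_rule([ReferenceDriverSketch(), SdnDriverSketch()],
                         tunneled_net)
    print('accepted by at least one driver')
```

As noted in the discussion, this only relocates the validation; how (or whether) bandwidth of non-physnet networks is reported to placement remains a separate question that would need its own RFE.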
14:50:17 <slaweq> ok, we have a few more minutes and one more "not really an rfe" thing to discuss
14:50:19 <slaweq> https://review.opendev.org/#/c/702806/
14:50:34 <slaweq> and related bug https://bugs.launchpad.net/neutron/+bug/1855260
14:50:34 <openstack> Launchpad bug 1855260 in neutron "SmartNIC OVS representor port is not removed from OVS on instance tear down" [Undecided,In progress] - Assigned to waleed mousa (waleedm)
14:50:55 <slaweq> ralonsoh: can You elaborate on this a bit as You were involved in that already?
14:51:02 <ralonsoh> if we assume that the OVS agent is going to plug/unplug the interfaces
14:51:04 <ralonsoh> then OK
14:51:21 <ralonsoh> but as Sean pointed out, the L1 operations should be done by Nova/os-vif
14:51:22 <ralonsoh> but
14:51:30 <ralonsoh> because those are smartnic ports
14:51:41 <ralonsoh> and this is happening when the OVS agent is offloaded
14:51:54 <ralonsoh> then I'm not 100% sure about Sean's statement
14:52:03 <ralonsoh> I need to check with Sean on this last point
14:52:21 <ralonsoh> conclusion: I know nothing
14:52:28 <slaweq> ok :)
14:53:07 <slaweq> but the current solution, with the ovs agent responsible for plugging/unplugging interfaces, is already in neutron (iirc in Train at least), right?
14:53:17 <ralonsoh> yes, for smartnic ports
14:53:37 <ralonsoh> so yes, the patch is needed given that we assumed this role
14:53:59 <slaweq> yes, so IMO we should go with a fix for the reported bug now as a quick and backportable solution
14:54:05 <ralonsoh> sure
14:54:18 <slaweq> and later maybe propose a new rfe to change the logic
14:54:24 <ralonsoh> +1
14:54:47 <njohnston> +1
14:54:52 <mlavalle> +1
14:54:54 <slaweq> I know that sean-k-mooney and You had some opinions about it so I would like to ask what You think about such a solution
14:56:07 <ralonsoh> I'll review the patch assuming the current state: the ovs agent handles those ports
14:57:08 <slaweq> ok, thx a lot
14:57:32 <sean-k-mooney> just reading scrollback now
14:57:34 <slaweq> so I think we had a pretty productive meeting even without discussing any rfe :)
14:57:41 <slaweq> hi sean-k-mooney :)
14:57:51 <sean-k-mooney> o/
14:58:34 <sean-k-mooney> slaweq: the question is should the neutron l2 agent be managing l1
14:58:39 <ralonsoh> yes
14:59:14 <sean-k-mooney> IMO i dont think that design scales well and i dont think it should. that said im not going to block extending os-vif to support this
14:59:17 <slaweq> sean-k-mooney: yes, I understand Your point and IMO we should maybe think about a better solution (I'm not a smartnics expert so I may be wrong)
14:59:43 <sean-k-mooney> so the choice is really up to the neutron team to decide
14:59:46 <slaweq> but for now I would like to go with the quicker solution to fix the existing bug and backport the fix to the stable releases
14:59:53 <sean-k-mooney> well my main concern is upgrades
14:59:54 <slaweq> and later we can think about a redesign
15:00:22 <slaweq> ok, we are out of time now, thx for the meeting guys
15:00:27 <slaweq> and have a great weekend
15:00:29 <slaweq> o/
15:00:32 <slaweq> #endmeeting