15:00:08 <slaweq> #startmeeting neutron_qos
15:00:09 <openstack> Meeting started Tue May 8 15:00:08 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:12 <openstack> The meeting name has been set to 'neutron_qos'
15:00:14 <slaweq> hi!
15:00:27 <mlavalle> hi
15:00:34 <rubasov> hi
15:01:16 <slaweq> ok, let's start
15:01:22 <slaweq> #topic RFEs
15:01:24 <lajoskatona_> hi
15:01:36 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1727578
15:01:37 <openstack> Launchpad bug 1727578 in neutron "[RFE]Support apply qos policy in VPN service" [Wishlist,Triaged]
15:01:56 <slaweq> just FYI: the spec for this one is merged already
15:01:57 <njohnston> hello
15:02:07 <slaweq> I hope zhaobo6 will start working on the implementation now
15:02:17 <mlavalle> I know he is working on it
15:02:34 <slaweq> super :)
15:02:44 <mlavalle> it is actually in his goals
15:02:59 <slaweq> is he from Your company?
15:03:03 <mlavalle> yeah
15:03:19 <slaweq> super - so we have good info about that :)
15:03:20 <slaweq> thx
15:03:43 <slaweq> next one on the list is:
15:03:43 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1560963
15:03:45 <openstack> Launchpad bug 1560963 in neutron "[RFE] Minimum bandwidth support (egress)" [Wishlist,In progress]
15:03:55 <slaweq> I have some idea of how it could maybe be done, but I didn't even test a PoC to see if it actually works
15:04:43 <slaweq> I'm thinking about some marking of packets (MARK in iptables) when a packet is going out from the tap interface, and then matching such marks on the physical interface to the proper class
15:05:17 <slaweq> but as I said, I haven't even checked with any PoC whether that could make sense
15:05:32 <slaweq> do You think that such a solution might work maybe?
15:05:46 <mlavalle> worth giving it a try
15:05:59 <mlavalle> did the paper I sent help?
15:06:08 <slaweq> yes, I read it
15:06:10 <reedip_> o/
15:06:40 <slaweq> and it is quite helpful
15:06:45 <slaweq> thx mlavalle for it
15:06:45 <mlavalle> cool
15:06:50 <slaweq> hi reedip_
15:06:51 <rubasov> I have a faint memory of somebody claiming iptables CLASSIFY is more efficient than MARK
15:07:03 <njohnston> hi reedip_!
15:07:31 <slaweq> rubasov: thx for the tip
15:07:40 <slaweq> I will try to explore it
15:08:07 <slaweq> also there is the case with openflow rules and without iptables, so this should somehow be covered also :)
15:09:36 <slaweq> I really want to make some PoC for it soon
15:09:40 <slaweq> :)
15:09:49 <slaweq> ok, moving to the next one
15:09:50 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1560963
15:09:52 <openstack> Launchpad bug 1560963 in neutron "[RFE] Minimum bandwidth support (egress)" [Wishlist,In progress]
15:10:13 <slaweq> AFAIK rubasov and mlavalle wanted to discuss something about that one today
15:10:16 <slaweq> right?
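(A minimal, untested sketch of the fwmark idea slaweq describes at 15:04, and of the CLASSIFY variant rubasov mentions at 15:06; it assumes an HTB qdisc on the physical interface. The interface names tap-vm1 and eth1, the mark value 0x2a, and the rates are made-up placeholders, not what a neutron agent would actually use.)

```
# mark egress traffic coming from one VM's tap device (placeholder names)
iptables -t mangle -A POSTROUTING -m physdev --physdev-in tap-vm1 \
    -j MARK --set-mark 0x2a

# HTB on the physical interface: class 1:10 carries the guaranteed rate,
# unmarked traffic falls into the default class 1:30
tc qdisc add dev eth1 root handle 1: htb default 30
tc class add dev eth1 parent 1: classid 1:10 htb rate 500mbit ceil 1gbit
tc class add dev eth1 parent 1: classid 1:30 htb rate 1mbit ceil 1gbit

# map the fwmark set above to the guaranteed class
tc filter add dev eth1 parent 1: protocol ip prio 1 handle 0x2a fw classid 1:10

# alternative (rubasov's tip): set the class directly and skip the tc filter
# iptables -t mangle -A POSTROUTING -m physdev --physdev-in tap-vm1 \
#     -j CLASSIFY --set-class 1:10
```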
15:10:23 <mlavalle> yeah
15:10:25 <rubasov> yep
15:10:32 <slaweq> so go on :)
15:10:52 <rubasov> let me shortly summarize the question
15:11:00 <rubasov> we have two options
15:11:51 <rubasov> option 1: one qos policy rule (minimum_bw as today) extended with two boolean parameters: enforce_data_plane, enforce_placement
15:12:09 <rubasov> option 1: two qos policy rules for data plane and placement
15:12:16 <rubasov> I mean option 2 :-)
15:12:55 <mlavalle> the point that I don't get is why we need to separate data plane enforcement and placement enforcement
15:12:56 <rubasov> option 1 is clearly better API design and better user experience (for the admin at least)
15:13:30 <rubasov> but I'm not sure about how it works with rule validation
15:13:33 <slaweq> I have the same question as mlavalle
15:14:01 <mlavalle> if a port has a policy with a minimum bandwidth rule, it should be enforced in the data plane and for placement purposes also
15:14:48 <rubasov> mlavalle, slaweq: since placement enforcement is agnostic of drivers, we may have users wanting placement enforcement but not yet having data plane enforcement
15:15:45 <mlavalle> but shouldn't those users be aware of what is available?
15:15:52 <mlavalle> and request what is available
15:15:57 <rubasov> also (at least during upgrade) the current behavior is enforce_data_plane=True but enforce_placement=False
15:16:50 <rubasov> mlavalle: er, I'm not sure I understand your last question
15:17:32 <slaweq> what about validating this rule type for all backends and documenting properly that it is supported by X, Y, Z backends, and if You are using a different one then min bandwidth will only be enforced by placement
15:18:00 <slaweq> without an additional switch in the API
15:18:16 <mlavalle> The way I see it, having a minimum bandwidth rule in a port means that it should be enforced in all the components that are necessary in the OpenStack system to make it effective
15:18:52 <mlavalle> we shouldn't separate data plane and placement
15:19:02 <rubasov> but the system will actually work differently in different openstack versions, shouldn't that be shown in the API?
15:19:19 <slaweq> yes, but (at least for now) we don't have a way to enforce it on the data plane yet
15:19:59 <slaweq> but for some private clouds it might be useful to use it only with a "guarantee" on the placement level
15:20:16 <mlavalle> so, if I can create a minimum bandwidth rule, that is the API telling me it is possible
15:20:17 <slaweq> rubasov: is that what You want to achieve?
15:20:57 <rubasov> slaweq: yes
15:21:14 <slaweq> ok, so I at least understand it properly :)
15:21:15 <mlavalle> the fact that I can create and assign a minimum bandwidth rule to a port should be the necessary condition for that rule to be enforced across all the OpenStack components necessary to make it effective
15:21:26 <gibi> keeping a single rule and enforcing that placement allocation is mandatory makes the upgrade harder. This will force us to create allocations for bound ports in placement during upgrade. However it is the same upgrade that will install the code that will report bandwidth inventory.
15:21:52 <gibi> feels like a chicken and egg problem
15:22:29 <mlavalle> so let's solve a migration problem
15:22:36 <rubasov> mlavalle: for example what about direction ingress?
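(A rough client-side sketch of the two options rubasov lists at 15:11. The minimum-bandwidth rule type and the --min-kbps/--egress options are existing CLI options; the --enforce-data-plane/--enforce-placement flags and the minimum-bandwidth-placement rule type are hypothetical illustrations of option 1 and option 2, not anything that exists today.)

```
# today: one minimum bandwidth rule, enforced on the data plane only
openstack network qos rule create --type minimum-bandwidth \
    --min-kbps 100000 --egress my-policy

# option 1 (hypothetical flags): the same rule extended with two booleans
openstack network qos rule create --type minimum-bandwidth \
    --min-kbps 100000 --egress \
    --enforce-data-plane --enforce-placement my-policy

# option 2 (hypothetical rule type): a separate rule for placement enforcement
openstack network qos rule create --type minimum-bandwidth-placement \
    --min-kbps 100000 --egress my-policy
```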
15:22:43 <mlavalle> but let's not create artificial concepts in the API
15:23:23 <slaweq> mlavalle++
15:23:30 <rubasov> as soon as we have placement enforcement, that will work for that direction (both directions in fact) for all drivers, but we may not have data plane enforcement for all drivers at that point
15:24:20 <slaweq> yes, and IMHO that should be properly documented so that users are aware of what is guaranteed and what is best effort only
15:25:04 <mlavalle> yes, isn't this a documentation issue?
15:26:57 <rubasov> but that documentation could not be global, it would depend on which drivers are loaded, right? so each deployer would have to document this again for each deployment
15:27:31 <slaweq> rubasov: but I was thinking about something like:
15:28:04 <mlavalle> we can give them the rules: if you have such and such drivers, your minimum bandwidth rules will be enforced
15:28:06 <slaweq> min bw limit is supported by the ovs and sr-iov backends, so if You use those backends, Your min bw will be guaranteed
15:28:34 * mlavalle tried to say the same thing as slaweq
15:28:37 <slaweq> if You use another backend, like linuxbridge, min bw is only enforced by placement and is not guaranteed
15:29:04 <slaweq> what do You think about something like that?
15:30:10 <rubasov> that's pretty much what we had in patch set 6
15:30:15 <rubasov> https://review.openstack.org/#/c/508149/6/specs/rocky/minimum-bandwidth-allocation-placement-api.rst@83
15:30:15 <patchbot> patch 508149 - neutron-specs - QoS minimum bandwidth allocation in Placement API
15:30:52 <rubasov> I'm trying to remember why we moved away from it (if there was a proper reason, I'm not sure)
15:31:21 <mlavalle> I am happy with that
15:31:46 <mlavalle> because what is stated there is a "temporary" situation
15:31:58 <mlavalle> that we can change as we add support for more drivers
15:32:18 <slaweq> at least it should be temporary, and we should add support for the ovs and lb backends for it :)
15:32:26 <mlavalle> in contrast, adding stuff to the API creates long term, permanent situations
15:32:37 <mlavalle> that are difficult to change in the future
15:33:06 <slaweq> mlavalle: in our case API changes are almost impossible to undo :)
15:33:08 <mlavalle> APIs create commitments
15:34:21 <rubasov> I'm just a bit afraid of the user being confused when asking for the same thing in the API and getting many different things depending on OpenStack version and/or backend drivers (both being things the end user should not know about)
15:35:27 <mlavalle> In that sense, we have an undesirable situation given how fast the community can add minimum bandwidth support to different drivers
15:35:29 <rubasov> but I can let that fear go if you prefer to handle this kind of support matrix by documentation
15:35:59 <slaweq> I would prefer the docs way for that
15:36:06 <mlavalle> but that undesirable situation is temporary, as we add support for more drivers
15:36:10 <slaweq> but I think that it's mlavalle and the drivers team's choice :)
15:36:44 <mlavalle> and we can mitigate it with proper documentation
15:37:25 <rubasov> I accept that choice
15:37:34 <gibi> can we get back to the upgrade issue?
15:37:34 <mlavalle> in fact, we use that matrix approach already in the QoS documentation, IIRC
15:37:43 <mlavalle> sure
15:37:49 <gibi> assume there is a bound SRIOV port with a minimum bandwidth rule today.
that is enforced on the data plane
15:38:36 <gibi> now the deployer upgrades neutron and nova to the version that supports the placement bandwidth rule and that
15:38:58 <gibi> requires the port to have a bandwidth allocation in placement
15:39:15 <gibi> during the upgrade we cannot make that allocation, as no bandwidth inventory will be in place
15:39:16 <slaweq> mlavalle: we have such a matrix here: https://docs.openstack.org/neutron/latest/contributor/internals/quality_of_service.html#agent-backends
15:39:49 <rubasov> slaweq: yep, I was thinking of the same
15:39:56 <gibi> so after such an upgrade we will have an inconsistent resource allocation situation in placement
15:40:27 <gibi> which means any new boot with a QoS bandwidth aware port will result in a not enforced resource limit
15:40:46 <mlavalle> can't we include in the upgrade process a script that creates the bandwidth inventories
15:40:48 <mlavalle> ?
15:40:58 <gibi> mlavalle: that inventory is agent dependent
15:41:02 <gibi> mlavalle: so I'm not sure
15:41:47 <gibi> moreover there are two cases
15:41:53 <mlavalle> if a host is running agents with minimum bandwidth support
15:42:10 <mlavalle> the script will create the RPs necessary
15:42:30 <gibi> mlavalle: so that script will duplicate the work of the given agent (ovs, sriov, etc)
15:42:40 <mlavalle> we can get that data in the controller from the agents' heartbeats
15:42:58 <mlavalle> only during the upgrades
15:43:00 <gibi> mlavalle: but that would require an upgraded and running agent
15:44:01 <gibi> mlavalle: and an upgraded and running controller that understands the new heartbeat structure
15:44:04 <slaweq> gibi: such a script would need to duplicate only the sriov agent's work, as only this backend supports min bandwidth now
15:44:12 <gibi> slaweq: good point
15:44:16 <gibi> slaweq: that helps
15:44:27 <gibi> still there are two cases
15:44:45 <gibi> i) we can do an in place allocation for the bound port
15:44:56 <gibi> ii) we cannot, as there is not enough bandwidth on that host
15:45:11 <rubasov> I think (ii) is the really hard one
15:45:18 <gibi> (ii) would require a live migration to resolve
15:45:38 <mlavalle> yeah, and the admin should be aware of it
15:45:49 <mlavalle> and take necessary actions
15:46:05 <slaweq> mlavalle++
15:46:17 <mlavalle> however unpleasant the upgrade process might be, it will happen once
15:46:19 <gibi> mlavalle: either we fail the upgrade or implement a pre-flight check for such a situation
15:46:55 <mlavalle> but that is preferable to adding things to the API that we will never be able to remove
15:47:21 <mlavalle> and that will create an unpleasant experience for users and admins for a loooong time
15:48:02 <gibi> mlavalle: your statement about API commitments makes sense to me.
I'm just pretty afraid about the unpleasantness of this upgrade
15:48:26 <mlavalle> me too, but it is only once per deployment
15:48:34 <gibi> mlavalle: maybe when we have some PoC code where I can play with the upgrade, that would help remove some of the fear
15:48:59 <mlavalle> let's focus on mitigating that unpleasantness as much as we can
15:49:14 * mlavalle not sure that word exists
15:50:02 <rubasov> IMHO even if gibi's upgrade concerns are valid (not 100% sure myself at the moment) we can get back to this if and when the upgrade proves to be too painful in a lab setting (but before removing the experimental flag from the overall feature)
15:50:31 <gibi> rubasov: I can live with that
15:50:36 <mlavalle> yeah, let's move ahead with the spec assuming documentation
15:50:55 <mlavalle> will help us address the migration problem
15:51:07 <mlavalle> we all understand that there are trade offs
15:51:30 <mlavalle> and let's not over commit to an API design that is not ideal
15:51:33 <slaweq> I agree, let's make something first and then deal with the upgrade problems :)
15:51:47 <rubasov> understood
15:52:02 <rubasov> I'll update the spec accordingly tomorrow
15:52:42 <mlavalle> rubasov, gibi, lajoskatona_: I want to commend your great work over the past few weeks on this topic
15:52:51 <mlavalle> you guys are the best!
15:52:58 <slaweq> mlavalle++
15:52:59 <gibi> mlavalle: thanks
15:53:03 <rubasov> mlavalle, slaweq: thank you for all the help
15:54:16 <mlavalle> ok, slaweq I think we can move on
15:54:26 <slaweq> ok, I think we can quickly move to the next topic now :)
15:54:38 <slaweq> or wait
15:54:41 <slaweq> there is also
15:54:42 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1505627
15:54:43 <openstack> Launchpad bug 1505627 in neutron "[RFE] QoS Explicit Congestion Notification (ECN) Support" [Wishlist,Triaged] - Assigned to Reedip (reedip-banerjee)
15:54:47 <slaweq> which is marked as postponed
15:55:03 <slaweq> but the spec is waiting for review: https://review.openstack.org/#/c/445762/
15:55:04 <patchbot> patch 445762 - neutron-specs - Spec for Explicit Congestion Notification
15:55:16 * mlavalle has that spec in his backlog of reviews
15:55:18 <slaweq> so please add it to Your queue maybe :)
15:55:23 <slaweq> thx mlavalle
15:55:28 <slaweq> ok, so next topic
15:55:33 <slaweq> #topic Bugs
15:55:52 <slaweq> I'm not aware of any new bugs related to QoS
15:56:01 <slaweq> so just a short sum up
15:56:06 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1758316
15:56:07 <openstack> Launchpad bug 1758316 in neutron "Floating IP QoS don't work in DVR router" [High,In progress] - Assigned to LIU Yulong (dragon889)
15:56:21 <slaweq> the patch for that is probably in the gate: https://review.openstack.org/#/c/558724/
15:56:22 <patchbot> patch 558724 - neutron - [L3][QoS] Cover mixed dvr_snat and compute node dv...
15:56:33 <slaweq> so it might be considered as fixed IMO :)
15:56:51 <slaweq> no, there is a -2 from zuul :/
15:56:52 <mlavalle> yeap
15:56:58 <slaweq> so we will have to recheck it
15:57:28 * mlavalle wonders who weizj is, he's been reviewing a lot of patches
15:57:40 <slaweq> mlavalle: I don't know
15:57:49 <slaweq> but I also saw a lot of reviews from him recently
15:57:54 <mlavalle> well, it is good to have one more reviewer
15:58:03 <slaweq> yes
15:58:11 <slaweq> two more bugs (very old):
15:58:12 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1639186
15:58:13 <openstack> Launchpad bug 1639186 in neutron "qos max bandwidth rules not working for neutron trunk ports" [Low,Confirmed]
15:58:16 <slaweq> #link https://bugs.launchpad.net/neutron/+bug/1732852
15:58:17 <openstack> Launchpad bug 1732852 in neutron "neutron don't support Router gateway rate limit " [Low,In progress]
15:58:32 <slaweq> there is no update about them
15:58:37 <slaweq> it's just a short reminder - maybe someone wants to hug them :)
15:59:05 <lajoskatona_> bye
16:00:18 <slaweq> ok, I think that we are almost out of time now
16:00:24 <slaweq> so thanks guys
16:00:25 <mlavalle> yeap
16:00:35 <slaweq> #endmeeting