14:03:42 <ajo> #startmeeting neutron_qos
14:03:43 <openstack> Meeting started Wed Feb 24 14:03:42 2016 UTC and is due to finish in 60 minutes. The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:47 <openstack> The meeting name has been set to 'neutron_qos'
14:03:47 <ajo> Hi everybody! ;)
14:03:48 <ihrachys> o/
14:03:53 <jschwarz> \o/
14:04:02 <irenab> hi
14:04:06 <davidsha> hi!
14:04:06 <ajo> #link http://eavesdrop.openstack.org/#Neutron_QoS_Meeting
14:04:12 <njohnston> o/
14:04:15 <ajo> o/ :)
14:04:46 <ajo> I wanted to start by raising the topic of our roadmap
14:05:14 <ajo> at the last drivers meeting there were concerns about our roadmap, our status, and the number of RFEs they were finding
14:05:28 <ajo> #link http://eavesdrop.openstack.org/meetings/neutron_drivers/2016/neutron_drivers.2016-02-18-22.01.log.html#l-52
14:05:51 <irenab> ajo: so qos features are in high demand
14:05:56 <ajo> So I thought we should clarify that
14:05:59 <ajo> #link http://lists.openstack.org/pipermail/openstack-dev/2016-February/087360.html
14:06:08 <ajo> I sent this email to the mailing list
14:06:24 <ihrachys> irenab: yeah, but it's not like you post an RFE and it magically happens
14:06:46 <ajo> and, to be fair, armax was partly right, because I haven't been doing a good job of reviewing new RFEs; I was focused on the mitaka bits
14:06:47 <ihrachys> we should consider available resources, the current roadmap...
14:07:12 <irenab> ihrachys: the intent should be approved so the one who proposes it can move on
14:07:18 <ajo> and I guess they felt overwhelmed by RFEs they didn't understand: how exactly do they fit into the architecture we designed?
14:07:24 <armax> ajo: I am always 100% right!
14:07:24 <armax> :)
14:07:30 <armax> ajo: jokes aside, I saw your email… but I haven't had the chance to reply yet… I'll do that today
14:07:30 <ajo> armax++
14:07:31 <ajo> lol
14:07:32 <ihrachys> irenab: you can't effectively move forward without having reviewers on board
14:07:50 <ihrachys> that's why we have approvers for blueprints
14:08:00 <irenab> ihrachys: so intent and review commitment, right?
14:08:02 <ihrachys> (we don't have them for RFEs, and I believe that's a bug)
14:08:19 <ajo> armax: I wanted to discuss the current status in the meeting, and then send a detailed report. I'm sorry I haven't been communicating with you properly or actively reviewing new RFEs; consider that changed from now on
14:08:19 <ihrachys> armax: btw do we plan to have approvers for RFEs?
14:08:58 <armax> ihrachys: something to consider/experiment with next cycle. We'll do a postmortem once mitaka is out of the way
14:09:11 <njohnston> So I think we take the list of all QoS features that could be implemented - BW limiting, DSCP, ECN, 802.1p/q, and minimum bandwidth guarantees - and the possible implementations for each - OVS, LB, SR-IOV - and we can provide a matrix of all the QoS items between us and full implementation. Some of them will be empty spots - DSCP on SR-IOV is an impossibility - but at least we can say "this is the complete roadmap", and show which 1-3 items we're targeting this cycle and next cycle
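[Editor's note: one possible rendering of the matrix njohnston proposes. The cell values below are inferred from statements made later in this meeting and are illustrative only, not an agreed roadmap.]

    Feature                | OVS                | Linux bridge | SR-IOV
    -----------------------+--------------------+--------------+---------------
    BW limiting (egress)   | done               | in review    | done
    BW limiting (ingress)  | RFE (patch exists) | ?            | ?
    DSCP marking           | in review          | ?            | n/a (impossible)
    802.1p VLAN marking    | RFE                | ?            | ?
    ECN                    | under study        | under study  | ?
    Min BW (best-effort)   | RFE                | RFE          | ?
    Min BW (strict)        | needs nova-scheduler coordination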
14:09:33 <ajo> yes,
14:09:40 <ajo> I have brought up this:
14:09:42 <ajo> #link https://etherpad.openstack.org/p/qos-roadmap
14:09:47 <ajo> to discuss during the meeting
14:09:48 <ihrachys> the matrix seems like a good idea
14:09:55 <ajo> And I was thinking of the same
14:10:08 <ajo> one important thing is that we don't need to file RFEs for specific ref-arch implementations
14:10:25 <ajo> probably a bug is enough
14:10:41 <ihrachys> +
14:10:57 <ajo> only if it's a huge change to the specific implementation could that be a matter of a spec/RFE/devref
14:11:14 <ajo> to have a better understanding of how it is going to be implemented
14:11:33 <njohnston> That sounds fair to me.
14:11:47 <ajo> So, in the tiny etherpad,
14:11:54 <ajo> I have detailed our current status,
14:11:57 <ajo> the documentation we have,
14:12:04 <ajo> and what we're doing for mitaka
14:12:23 <ajo> basically, we don't have a *lot* of things in mitaka because, to do things right, we need to cover a lot of related dependencies
14:12:38 <ihrachys> irenab: my understanding is that posting an RFE without having anyone to implement and approve the code is a waste of drivers' time
14:12:49 <ajo> yeah
14:12:49 * jschwarz thinks that aside from listing features, etc., you guys may want to assign them to people (so drivers will feel comfortable and will know who to ask when things go south)
14:12:51 <ajo> I agree too
14:12:57 <ajo> we can discuss new ideas in the meeting
14:13:18 <ihrachys> jschwarz: and that's where approvers for RFEs should come in to help
14:13:23 <ajo> but I'd say, let's only file RFEs if we have people willing, and with the ability, to implement them
14:13:47 <irenab> ajo: ihrachys: sounds reasonable
14:14:11 <ajo> We still won't have control over the RFEs people file, but I will monitor that
14:14:16 <ajo> on a weekly basis
14:14:37 <ajo> #action ajo sets a calendar reminder for himself before the drivers meeting to check any new QoS-related RFEs
14:15:24 <irenab> Initially I thought that RFEs were for users to express requirements
14:15:51 <ajo> yes, in fact I understand that's the idea
14:16:16 <ajo> but we can do some filtering ourselves here: we, as developers, are proposing features, so we can discuss them in advance
14:16:19 <ajo> to have more filtered RFEs
14:16:31 <irenab> +1
14:16:35 <ajo> or higher quality RFEs, knowing that we have backers to write the code, etc.
14:16:35 <ihrachys> irenab: well, kinda. are users posting the RFEs in question though? for the most part, it's people who are in the community, so a more informal means of tracking ideas could be less harsh on drivers. but maybe it's just me ranting and we should post more RFEs.
14:16:51 <ajo> I guess the general workflow is
14:17:16 <ajo> customer -> openstack-related-company -> developer -> RFE
14:17:20 <ajo> and in some cases
14:17:29 <ihrachys> the thing is, I see that some RFEs are actually closed in drivers meetings because there is no one to back the implementation up.
14:17:33 <ajo> openstack-user/contributor-company -> developer -> RFE
14:17:44 <moshele> hi
14:18:28 <ajo> njohnston and vhoward seem to be in a good TZ for the drivers meeting
14:18:41 <ajo> and they have helped so far by being there and answering :)
14:18:50 <njohnston> We're happy to represent :)
14:18:51 <ihrachys> I usually try to join too, but that time I was off
14:18:57 <ajo> so I guess we could pre-analyze here, and if they can represent us, that's great
14:19:23 <ajo> I used to join too, and I will try from now on, but I will be quite random
14:19:28 <ajo> njohnston++
14:19:30 <ajo> thanks
14:19:43 <ihrachys> yes, let's have US folks on board with representing the group there :)
14:20:04 <ajo> I guess that from now on, we could have a meeting section for qos-related RFEs
14:20:45 <njohnston> sounds good
14:20:55 <ajo> if you find anything missing or you believe something is wrong, please update https://etherpad.openstack.org/p/qos-roadmap when you have time
14:21:55 <ajo> I think VLAN marking and ingress QoS rate limiting are probably quite straightforward
14:22:20 <ajo> in fact, when we implemented the low levels of vm-egress, we did vm-ingress by mistake
14:22:31 <ajo> and had to switch the implementation
14:22:37 <ihrachys> :D we could close two features in one go
14:22:45 <davidsha> VLAN marking works the same way as dscp for openvswitch.
14:22:54 <ajo> gal-sagie's implementation is still there in gerrit
14:23:11 <ajo> davidsha, exactly, it's almost the same
14:23:26 <ajo> one tackles L3, and the other tackles L2
14:23:38 <ihrachys> still two separate rule types
14:23:42 <ajo> exactly
14:24:05 <ajo> then, we have the ECN RFE, which makes sense IMO, but there are a few things to clarify
14:24:36 <ajo> basically, ECN seems to be a mechanism that can be used in combination with TCP/IP to throttle the other end host dynamically
14:24:43 <ajo> if your ingress is getting congested
14:25:14 <ajo> but I believe we need to clarify how congestion is detected, and how we model the rules
14:25:28 <irenab> ajo: I have a suggestion
14:25:37 <davidsha> would that be something to use with traffic classification then?
14:25:39 <ajo> irenab, shoot :D
14:25:46 <njohnston> "Conventionally, TCP/IP networks signal congestion by dropping packets. When ECN is successfully negotiated, an ECN-aware router may set a mark in the IP header instead of dropping a packet in order to signal impending congestion. The receiver of the packet echoes the congestion indication to the sender, which reduces its transmission rate as if it detected a dropped packet."
14:25:53 <irenab> I think each RFE should present the relevant use case(s)
14:26:06 <ajo> davidsha, no, that's more related to your other RFE :)
14:26:19 <irenab> so it will be clear how the requested functionality is used
14:26:31 <ajo> irenab: +1
14:26:42 <davidsha> ajo: Ah ok.
14:26:50 <ajo> I see use cases for ECN now that I understand it,
14:26:56 <ajo> but, yes, that's not well addressed
14:27:10 <ajo> we should ask vikram and reedip about that
14:27:11 <irenab> I think the neutron implementation details are much less important and can be resolved later
14:27:30 <ihrachys> is there a case where you have ECN supported but you want to disable it?
14:27:30 <ajo> yes
14:27:47 <ajo> ihrachys, like filtering ECN flags?
14:28:07 <irenab> I believe this can be the case, since it should be supported across the fabric
14:28:09 <njohnston> My question is, ECN is negotiated, it isn't something that is supposed to be administratively enabled or disabled. And the thing doing the negotiation won't be neutron, it will be the TCP stack implementation itself.
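[Editor's note: to make the ECN terminology above concrete, here is a minimal Python sketch of the codepoints from RFC 3168 that njohnston quotes. It is illustrative only, not Neutron code.]

    # The ECN field is the low 2 bits of the IP TOS byte; DSCP is the high 6.
    ECN_NOT_ECT = 0b00  # sender does not support ECN
    ECN_ECT_1   = 0b01  # ECN-capable transport, codepoint ECT(1)
    ECN_ECT_0   = 0b10  # ECN-capable transport, codepoint ECT(0)
    ECN_CE      = 0b11  # "congestion experienced": set by an AQM router
                        # instead of dropping the packet

    def tos_byte(dscp, ecn):
        """Compose the TOS byte from a DSCP value (0-63) and an ECN codepoint."""
        assert 0 <= dscp < 64 and 0 <= ecn < 4
        return (dscp << 2) | ecn

    # e.g. DSCP AF41 (34) on an ECN-capable packet:
    assert tos_byte(34, ECN_ECT_0) == 0x8a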
14:28:46 <ajo> njohnston, as far as I understood, switches and mid-point network devices can modify the flags
14:28:48 <ajo> in flight
14:29:03 <ihrachys> ajo: yeah, but who's going to decide the flag to be set?
14:29:03 <ajo> so they're able to throttle traffic going through them
14:29:12 <ajo> but I'm not 100% sure, we may ask vikram and reedip
14:29:13 <njohnston> "When both endpoints support ECN they mark their packets with ECT(0) or ECT(1). If the packet traverses an active queue management (AQM) queue (e.g., a queue that uses random early detection (RED)) that is experiencing congestion and the corresponding router supports ECN, it may change the codepoint to CE instead of dropping the packet."
14:29:21 <ajo> I'm not an ECN expert by any means, it's totally new to me
14:29:44 <ajo> njohnston, ahh, exactly, I got it right then
14:29:55 <ajo> ihrachys: that's one of the questions I had for them
14:30:16 <ajo> ihrachys, it could be the agent, inspecting the ports' sustained BW, the host load, the host br-* interfaces' bw... etc.
14:30:28 <ihrachys> it's clearly not well understood in the team. let's do some homework reading docs before we decide anything on its feasibility.
14:30:33 <njohnston> +1
14:30:38 <davidsha> +1
14:30:39 <ajo> yeah,
14:30:48 <ajo> it's time to read, and to ask the RFE proposers
14:30:52 <ajo> I'm still in that process
14:31:03 <ajo> I see possible value in it
14:31:22 <ajo> as something softer/more effective than policing
14:31:29 <ajo> but, policing is fully automatic
14:31:33 <irenab> ajo: ihrachys: general question regarding RFEs. Let's say there is something that cannot be implemented by the ref implementation; should it not be proposed?
14:32:06 <ajo> irenab, my understanding is "no", but, well, we have things that are only implemented by cisco
14:32:12 <ajo> what was the name of it...
14:32:13 <ajo> hmm
14:32:26 <ihrachys> irenab: I think otherwise, I believe it can be proposed.
14:32:51 <ihrachys> though I really wonder what can't be implemented in ovs.
14:32:57 <ajo> I believe it could be proposed, if some SDN vendor implements it, but we'd have to discuss it with the drivers & core team
14:33:09 <irenab> got it, thanks
14:33:20 <ajo> ihrachys, I'm starting to grasp the limits sometimes,
14:33:21 <ihrachys> yes, that would require some exception process, but I believe it may have a place
14:33:50 <ajo> yes, if the use case is well founded, as you said, and it can be modeled, we may try
14:34:07 <ajo> Then
14:34:29 <ajo> we have davidsha's RFE #link https://etherpad.openstack.org/p/qos-roadmap about neutron QoS priority queuing rules
14:34:44 <ajo> davidsha, if I didn't get it wrong
14:34:46 <ihrachys> ajo: wrong link
14:34:50 <ajo> sorry
14:34:50 <ajo> :/
14:35:04 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1527671
14:35:04 <openstack> Launchpad bug 1527671 in neutron "[RFE]Neutron QoS Priority Queuing rule" [Wishlist,Triaged]
14:35:12 <ajo> If I didn't get it wrong,
14:35:27 <ajo> you propose to have filters for traffic, so different traffic can be limited in different ways
14:35:28 <ajo> right?
14:35:36 <davidsha> correct
14:35:39 <ajo> ok
14:35:52 <ihrachys> does it rely on the traffic classifier?
14:35:55 <ihrachys> I assume yes
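[Editor's note: a rough sketch of how a priority-queuing rule might hang off the existing QoS model, purely to illustrate the classifier dependency being discussed. The BandwidthLimitRule and DscpMarkingRule fields mirror the API discussed in this meeting; PriorityQueuingRule and its fields are hypothetical, not the merged Neutron models.]

    # Simplified, hypothetical model: a QoS policy holds typed rules, and a
    # priority-queuing rule would reference a traffic classifier selecting
    # which packets its limit applies to.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class QosRule:
        pass

    @dataclass
    class BandwidthLimitRule(QosRule):   # exists today (egress)
        max_kbps: int
        max_burst_kbps: int              # a burst *size* in kbits, not a rate
                                         # (the misleading name raised below)

    @dataclass
    class DscpMarkingRule(QosRule):      # in review this cycle
        dscp_mark: int                   # 0-63

    @dataclass
    class PriorityQueuingRule(QosRule):  # davidsha's RFE; shape is a guess
        classifier_id: Optional[str]     # would point at a neutron-classifier
        max_kbps: int                    # per-traffic-class limit

    @dataclass
    class QosPolicy:
        name: str
        rules: List[QosRule] = field(default_factory=list)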
14:36:05 <ajo> that was foreseen in our initial brainstorms
14:36:16 <ajo> and we thought we could model such a thing
14:36:20 <ihrachys> I don't see the dep mentioned there
14:36:23 <ajo> by attaching rules to traffic classifiers
14:36:30 <davidsha> it was originally going to use ovs flows, and then I was looking into tc
14:36:33 <ajo> ihrachys, I commented in #12
14:36:39 <ihrachys> oh I see
14:36:46 <ihrachys> I was looking at the original description
14:36:48 <ajo> I believe
14:36:54 <ajo> the use case is clear,
14:37:11 <ajo> and the modeling needs some eyes on this: http://git.openstack.org/cgit/openstack/neutron-classifier
14:37:14 <ajo> #link http://git.openstack.org/cgit/openstack/neutron-classifier
14:37:33 <ajo> the RFE should probably be refactored into something like that
14:37:35 <njohnston> davidsha: Can we talk after this meeting about this? I would like to understand how "the least likelihood of being rejected due to a queue reaching its maximum capacity" is different from DSCP. That's kind of what DSCP is all about.
14:38:08 <davidsha> njohnston: kk, I'm free to talk.
14:38:12 <njohnston> thanks
14:38:20 <ajo> njohnston, the idea is that you assign different bw limits to different kinds of traffic
14:38:28 <ajo> so you have different likelihoods
14:38:30 <ajo> but let's expand on that later :)
14:38:53 <irenab> ajo: what is the state of neutron-classifier?
14:39:03 <ajo> davidsha, does it seem reasonable to you to change that RFE into: integrating QoS rules with neutron-classifier?
14:39:10 <ajo> that is something we should investigate, definitely
14:39:21 <ajo> I thought we'd end up with some common REST API to manage the classifiers
14:39:25 <ajo> but I don't see that, just libs
14:39:26 <ihrachys> irenab: I believe it's on hold
14:39:29 <ajo> and DB models
14:39:42 <ajo> ok, maybe they need help with that
14:39:48 <ihrachys> irenab: probably starving for implementers
14:39:57 <davidsha> ajo: would it be ok if I looked into neutron-classifier a bit more first?
14:40:11 <ajo> davidsha, makes total sense
14:40:25 <ajo> #action davidsha to look into neutron-classifier's state
14:40:30 <irenab> I do not remember seeing anything on the mailing list or any dedicated sub-team
14:41:02 <ihrachys> irenab: I believe it was just an experiment from Sean Collins that never delivered much
14:41:08 <ajo> let's investigate, and bring up the topic to see how it is.
14:41:19 <ihrachys> I suspect Sean would appreciate help
14:41:27 <ajo> ihrachys, when it was proposed, the call was to make it a separate library
14:41:54 <ajo> yes
14:42:03 <ajo> ok
14:42:23 <ajo> and the last RFE(s) in place are for bandwidth guarantees
14:42:44 <ajo> when we talk about BW guarantees, it's about minimum bandwidth on ports
14:42:54 <ajo> we can have strict, or best-effort
14:43:11 <ajo> strict requires coordination with nova-scheduler, so no interface is oversubscribed...
14:43:24 <ajo> I'm trying to fight that battle, but... to be fair, I'm far from success
14:43:50 <ajo> there's a spec from jaypipes which could satisfy what we need in that regard, but it doesn't look to me as dynamic as I think it could be
14:44:06 <ajo> if we could use that mechanism they're designing, it would be awesome
14:44:11 <ajo> (generic resource pools)
14:44:13 * ajo looks for the link
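[Editor's note: a minimal sketch of the strict-guarantee accounting ajo describes: the scheduler would have to refuse placements whose summed minimums exceed a NIC's capacity. The function name and the numbers are made up for illustration.]

    # Strict minimum-bandwidth guarantees only hold if the sum of all
    # guaranteed minimums on a physical interface never exceeds its capacity.
    def can_place(existing_min_kbps, requested_min_kbps, nic_capacity_kbps):
        """Return True if a new port with requested_min_kbps fits on the NIC."""
        return sum(existing_min_kbps) + requested_min_kbps <= nic_capacity_kbps

    # e.g. a 10 Gb/s NIC already guaranteeing 3 Gb/s and 5 Gb/s cannot
    # accept another 4 Gb/s guarantee (all values in kbps):
    assert not can_place([3000000, 5000000], 4000000, 10000000)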
14:44:26 <irenab> ajo: meaning nova will manage BW counting?
14:44:27 <ajo> #link https://review.openstack.org/#/c/253187/
14:44:32 <ajo> irenab, nope
14:44:38 <ajo> not by itself
14:44:54 <ajo> I mean yes, sorry
14:45:02 <irenab> I refer to the counting done by a 3rd party (neutron?)
14:45:17 <ajo> but I'm unsure our dynamic way of modifying policies plays well with that
14:45:30 <ajo> we may need some sort of process to sync to that API
14:45:36 <ajo> any of our changes in policies
14:45:44 <ajo> so the nova database is always up to date
14:46:05 <ajo> what bugs me is that, to make that possible, we might need to create one resource pool (or several) per compute node
14:46:19 <ajo> because those resources are consumed in the compute nodes themselves
14:46:54 <ajo> I guess that could also help model TOR switch bandwidth, and things like that
14:47:03 <ajo> but ok, I'm trying to explore that
14:47:10 <ajo> ..
14:47:16 <ajo> On the other hand, and then I'll finish,
14:47:22 <ajo> there's best-effort
14:47:31 <ajo> that is, basically... do what we can within the hosts/hypervisors
14:47:33 <ajo> to guarantee that
14:47:42 <ajo> ovs and TC have mechanisms for that
14:48:01 <ajo> I explored them, and I think davidsha did too
14:48:11 <ajo> they seem to work
14:48:21 <irenab> ajo: any summary you can share on your findings?
14:48:25 <ajo> the OVS/OF ones require a total refactor of our openflow rules
14:48:48 <ajo> because NORMAL rules don't work to queue traffic (we need to use explicit queues)
14:48:49 <ajo> and...
14:49:19 <ajo> the TC approach mixes technologies for filtering traffic (TC and OF...) (a bit like mixing linuxbridge and iptables with openflow)
14:49:22 <ajo> so
14:49:26 <ajo> there's no golden path
14:49:28 <ajo> it can be done
14:49:51 <ajo> maybe we could start with TC, and then upgrade to something better in the future (OF only)
14:50:03 <ajo> it worked pretty well in my testing
14:50:13 <ajo> but ok
14:50:28 <ajo> I can dive into the details in another meeting, it's probably not important now
14:50:49 <ajo> I will switch to checking the status of the ongoing patches if there's no objection
14:51:00 <ihrachys> YES
14:51:05 <njohnston> no objection
14:51:07 <ajo> #topic status
14:51:21 <ajo> njohnston, how's DSCP? and the L2 api, any blockers?
14:51:46 <ajo> I made a comment on the RPC patch so you can test the upgrade mechanism
14:51:47 <njohnston> The L2 agent patch has Ihar's +2 and just needs another https://review.openstack.org/#/c/267591/
14:51:56 <ajo> I'm not sure if I fully clarified it
14:52:04 <ajo> #action ajo review L2 agent patch!!! :]
14:52:07 <ihrachys> njohnston: there is a concern from yamamoto there. are we going to handle that?
14:52:11 <njohnston> RPC rolling upgrades has some concerns https://review.openstack.org/#/c/268040/
14:52:32 <ajo> njohnston, yes, I will address it tonight I guess, I was focusing on the roadmap this morning :)
14:52:59 <irenab> ajo: just wanted to raise the discussion from the neutron channel today regarding the misleading name of the max_burst parameter in the bw_limit rule
14:53:03 <njohnston> ihrachys: I don't know how we can properly account for that
14:53:07 <davidsha> ihrachys: I don't think it's a problem, any project that used the agent_uuid_stamp was using it for flows.
14:53:22 <njohnston> +1 ^^
14:53:45 <ihrachys> davidsha: I am good. just wanted to clarify.
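[Editor's note: referring back to ajo's OVS findings above — a rough sketch, with made-up port names, bridge names and rates, of the queue mechanism he describes: HTB queues are created on the physical port, and traffic only enters a queue via an explicit set_queue OpenFlow action, which the NORMAL action alone never applies. This is illustrative only, not the team's implementation.]

    import subprocess

    def sh(cmd):
        subprocess.check_call(cmd, shell=True)

    # An HTB queue with a guaranteed minimum rate on the wire-side port:
    sh("ovs-vsctl set port eth1 qos=@q -- "
       "--id=@q create qos type=linux-htb other-config:max-rate=1000000000 "
       "queues:1=@q1 -- "
       "--id=@q1 create queue other-config:min-rate=100000000")

    # Packets must be steered into queue 1 explicitly before forwarding;
    # a plain actions=NORMAL flow would bypass the queue:
    sh('ovs-ofctl add-flow br-ex "in_port=2,actions=set_queue:1,normal"')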
14:54:02 <njohnston> So once those 2 patches get merged the main DSCP patch looks good, it only has one nit from Vikram https://review.openstack.org/#/c/251738
14:54:06 <ajo> irenab, I agree it's misleading :/ can we talk about it after the meeting? :)
14:54:30 <irenab> if it is agreed to be the bug, slawek mentioned he would like to fix it
14:54:46 <irenab> ajo: sure
14:54:48 <njohnston> and then the python-neutronclient change for DSCP also looks to be in good shape: https://review.openstack.org/#/c/254280
14:55:21 <ajo> ok, that's great :)
14:55:25 <njohnston> The documentation changes associated with DSCP already have 2 +2s, so I think they can go in as soon as the patch they depend on merges
14:55:39 <njohnston> https://review.openstack.org/#/c/273638
14:55:58 <ajo> njohnston, great. side note: the QoS API docs got re-injected, it seems a coauthor removed them by mistake
14:56:13 <ajo> https://review.openstack.org/#/c/284059/1
14:56:39 <ajo> njohnston, we need to contribute it to the common API guide ^
14:56:43 <njohnston> d'oh
14:56:58 <ajo> it's hell, XML :)
14:57:05 <njohnston> ajo: There is an API guide change as well: https://review.openstack.org/#/c/275253 with one +2 already
14:57:09 <ajo> I must admit that for QoS somebody from the doc team helped
14:57:24 <ajo> ohhh
14:57:29 <ajo> awesome njohnston !!!
14:57:33 <ajo> good work
14:57:46 <njohnston> ajo: All of the gaggle of DSCP changes are listed explicitly in the main patch's commit message: https://review.openstack.org/#/c/251738
14:57:56 <njohnston> and they all depend on the main patch
14:58:14 <ajo> yikes
14:58:15 <ihrachys> nice work
14:58:21 <ajo> 2 minutes to the end of the hour :/
14:58:26 <ajo> any other important updates?
14:58:38 <ajo> I saw the LB support was making good progress too
14:58:42 <ajo> we have fullstack tests now :)
14:58:46 <ihrachys> I believe it's ready to merge
14:58:50 <ajo> ok
14:58:54 <ajo> so another action for me
14:59:01 <ajo> #action ajo review Linux bridge related patches for QoS
14:59:09 <ihrachys> I wonder whether everyone is fine with two redhat cores merging stuff
14:59:21 * ajo tries to clone himself: raise CloneError()
14:59:30 <ajo> ihrachys, that's a good question
14:59:51 <ihrachys> ajo: maybe it's fine to review and then ask someone else to rubber-stamp it
14:59:51 <ajo> given that LB is not our main thing at redhat
14:59:58 <ajo> we're doing it for the community mostly :)
15:00:03 <njohnston> link for the LB change: https://review.openstack.org/236210
15:00:12 <ihrachys> ok we need to wrap up
15:00:15 <ajo> yeah, maybe asking for a third +2 and +W
15:00:16 <ajo> yes
15:00:31 <ajo> ok, wrapping up, thanks everybody
15:00:33 * njohnston has no religion on +2s from y'all
15:00:49 <ajo> let's keep discussing on #openstack-neutron (whoever can)
15:00:53 <ajo> #endmeeting