14:03:39 <ajo> #startmeeting neutron-qos
14:03:40 <openstack> Meeting started Wed Nov 25 14:03:39 2015 UTC and is due to finish in 60 minutes.  The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:43 <openstack> The meeting name has been set to 'neutron_qos'
14:03:49 <ajo> hi mrunge  :)
14:04:02 <ajo> can I ping you later to get updates on QoS/horizon integration?
14:04:12 <ajo> #topic agenda
14:04:18 <mrunge> ajo, I'm about to leave the house
14:04:22 <ajo> #link https://etherpad.openstack.org/p/qos-mitaka
14:04:40 <ajo> mrunge : ok, if you can send me a quick status update via query I will tell everybody later,
14:04:58 <ajo> if not possible, it's ok, we will see that next meeting then
14:04:59 <irenab> I am in a parallel meeting, sorry for the slow participation
14:05:34 <ajo> So, first thing on the agenda,
14:05:49 <ajo> well, first, welcome back to the meeting, and 2nd...
14:06:09 <ajo> does it seem right to move this meeting into "every two weeks mode" ?
14:06:27 <ajo> irenab , ihrachys , moshele , njohnston ?
14:06:33 <ihrachys> I don't mind if we do.
14:06:38 <moshele> fine by me
14:06:39 <irenab> +1
14:06:47 <ihrachys> I am not that involved though these days, so meh.
14:06:48 <ajo> I suspect we will have work, but not enough fuel for 1h meetings for now
14:06:48 <njohnston> I am all right with it, I don't think the updates will be coming fast and furious at this point.
14:07:02 <ajo> yeah njohnston  :)
14:07:09 <ajo> ok, so, I will send an update
14:07:18 <ihrachys> yeah, it was different during L when we were indeed rushing and stepping on each other's feet
14:07:26 <ajo> #action set the meeting for every two weeks, instead of weekly
14:07:45 <ajo> ok,
14:08:01 <ajo> let's track the current ongoing work :)
14:08:10 <ajo> #topic ongoing items
14:08:16 <mrunge> ajo, sent in separate query
14:08:24 <ajo> thanks mrunge : very appreciated
14:08:59 <ajo> First item is RBAC, hdaniel is just joining (pinged me via query), so he will update us
14:09:27 * ajo looks for the bug link
14:09:31 <ihrachys> I don't see him though
14:09:52 <ihrachys> can we move forward while he joins?
14:10:13 <ajo> yes
14:10:19 <ajo> njohnston , could you update on the DSCP status? :)
14:10:57 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1468353
14:10:57 <openstack> Launchpad bug 1468353 in neutron "QoS DSCP marking rule support" [Medium,New] - Assigned to Nate Johnston (nate-johnston)
14:11:30 <njohnston> So the way we left it is that you were thinking the RFE was not required
14:11:40 <njohnston> So I wasn't sure if I should abandon it, or not.
14:11:54 <ajo> You can use the bug as a tracker
14:12:01 <ajo> and start sending patches if you want
14:12:07 <njohnston> But we're working on code on our side to implement it using OVS flows
14:12:10 <ihrachys> well I believe RFE bug is good to track the feature in Mitaka. not sure about spec.
14:12:27 <ajo> as per my conversation with armax after the previous week's drivers meeting
14:12:39 <ihrachys> njohnston++ for starting on the code
14:12:40 <ajo> we don't need the #rfe or #rfe-approved tags for this
14:12:55 <ihrachys> agreed with no need for tags
14:13:07 <ihrachys> just make sure it's targeted for M
14:13:08 <ajo> but the bug is good as a tracker of completion
14:13:20 <njohnston> One thing we were wondering is: should we modify the existing QoS devref or create a new one?  I was thinking modify the existing...
14:13:52 <ihrachys> njohnston: existing
14:14:58 <njohnston> Good.  So currently we're working on our updates to the agent code, making sure that everything gets run properly through add_flow/mod_flow etc. as well as devref updates, starting with the unit tests and then working back to create working code.
14:15:46 <moshele> do you have code patches for review ?
14:17:26 <njohnston> No, not up yet; we have about half the unit tests done, and we're trying to think very hard about making sure we don't screw up the flow table.
14:17:47 <njohnston> We hope to have parts up for review in the coming weeks.
14:17:51 <ihrachys> njohnston: WIP patches could be of help
14:18:03 <ajo> ack, coming weeks sounds nice
14:18:12 <ihrachys> ajo: I don't like 's' in that 'weeks'
14:18:13 <njohnston> Great, I'll work hard to get something up ASAP
14:18:14 <ihrachys> :)
14:18:32 <njohnston> I say weeks because of vacation around the Thanksgiving holiday in the US
14:18:32 <slaweq_work> hello, is it qos meeting now?
14:18:44 <ajo> njohnston : please note that the flow tables could eventually get refactored into a different form, but I will keep the qos/dscp part in mind
14:18:48 <ihrachys> slaweq_work: right. we'll discuss lb bwlimit later
14:18:59 <slaweq_work> ihrachys: ok, thx
14:19:02 <ajo> njohnston: and it's not likely to happen this cycle unless required by the circumstances
14:19:02 <njohnston> ajo: Awesome
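For context, a minimal sketch of how DSCP marking can be expressed as an OVS flow; the bridge, port number, and DSCP value here are illustrative assumptions, not njohnston's actual patch:

    # mark IP traffic from a hypothetical VM port (ofport 42) with DSCP 26 (AF31);
    # mod_nw_tos takes the full ToS byte, i.e. DSCP << 2 = 104
    ovs-ofctl add-flow br-int "priority=65,in_port=42,ip,actions=mod_nw_tos:104,normal"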
14:19:22 <ajo> ok, so, let's move on into RBAC
14:19:25 <ajo> ping hdaniel
14:19:33 <hdaniel> ajo: pong
14:19:40 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1512587
14:19:40 <openstack> Launchpad bug 1512587 in neutron "[RFE] Role-based Access Control for QoS policies" [Wishlist,Triaged] - Assigned to Haim Daniel (hdaniel)
14:19:47 <ajo> hdaniel , could you update us on your findings? :)
14:20:19 <hdaniel> ajo: So there's a dilemma regarding the qos shared field
14:20:51 <ajo> correct,
14:21:19 <ajo> as far as I understood from our last talk, the networks code emulates the --shared flag to create rbac entries, or to pull rbac entries as "True/False" on that field
14:21:27 <ajo> over the api requests
14:21:35 <ajo> so here, we have two options:
14:21:35 <hdaniel> ajo: exactly.
14:21:51 <ajo> 1) we throw away the --shared flag on the policies, and solely rely on rbac
14:22:25 <ajo> 2) we introduce the compatibility layer (which is a bit of a PITA to keep), in kevinbenton words: if you can get rid of shared now, do it :)
14:22:52 <ajo> we would be introducing an incompatible change to integrations with qos,
14:23:05 <ihrachys> ajo: they already have the field, we can't just drop it
14:23:08 <ajo> but as it's an experimental API yet, it could be ok
14:23:09 <ajo> opinions?
14:23:20 <ihrachys> experimental? where did we claim it?
14:24:01 <ajo> ihrachys : that was my understanding , may be I got it wrong :)
14:24:23 <ihrachys> my belief is it's not experimental unless we make it clear
14:24:34 <ihrachys> with warnings, or docs
14:24:44 <ajo> ok
14:25:05 <ajo> any other thoughts?
14:25:12 <ihrachys> ajo: it's also not that much of a PITA since the object property will hide it
14:25:26 <ajo> hdaniel : I guess we may want to keep the shared flag then.
14:25:44 <hdaniel> ajo, ihrachys: I gambled on that, so the current patch behaves that way
14:25:46 <ajo> and what ihrachys suggests sounds like a good pattern, including a shared (settable) property on the object.
14:25:57 <ajo> ok hdaniel  :)
14:26:09 <ihrachys> hdaniel: wise boy ;)
14:26:11 <ajo> code_cleanups--
14:26:14 <ajo> compatibility++
14:26:23 <hdaniel> head_ache++
14:26:27 <ajo> lol
14:26:33 <ihrachys> hdaniel: rule#1: always select the most painful path
14:27:02 <ajo> ':)
14:27:23 <hdaniel> ihrachys, ajo: will write that (with blood)
14:27:36 * ihrachys hopes not his blood
14:27:52 * ajo runs scared
14:27:57 <ajo> ok :)
14:27:59 <ajo> next topic
14:28:02 <ajo> or
14:28:12 <ajo> hdaniel , any other thing related to RBAC that could be important?
14:28:40 <hdaniel> ajo: nope, but I'm 100% sure they'll appear after the submission
14:29:11 <ajo> ack :)
14:29:16 <ajo> devil is in the details...
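For reference, the existing network pattern the discussion leans on: shared=True is emulated as a wildcard RBAC entry over the rbac API, and the RFE would extend the same mechanism to QoS policies. A rough CLI sketch; the qos-policy object type and the placeholder IDs are assumptions, since that support is exactly what hdaniel's patch would add:

    # today: sharing a network is equivalent to a wildcard access_as_shared RBAC entry
    neutron rbac-create --type network --action access_as_shared --target-tenant '*' <network-id>
    # the RFE would follow the same pattern for QoS policies (hypothetical object type)
    neutron rbac-create --type qos-policy --action access_as_shared --target-tenant <tenant-id> <qos-policy-id>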
14:29:21 <ajo> ok so
14:29:25 <ajo> #topic horizon integration
14:29:28 <ajo> #link https://blueprints.launchpad.net/horizon/+spec/network-bandwidth-limiting-qos
14:29:48 <ajo> masco is making progress on that front: https://review.openstack.org/#/c/247997/
14:29:51 <ajo> thank you masco! ;)
14:30:26 <ihrachys> oh so cool
14:30:31 <ajo> wow +521 lines
14:30:43 <ajo> it's still not possible to create policies, but it's on its way
14:31:27 <ajo> I guess we should all eventually start testing the patch
14:31:38 <ihrachys> it's full of js magic. I bow before masco's greatness.
14:32:08 <ajo> "Masco Kaliyamoorthy"++
14:32:30 <ajo> ok
14:32:40 <ajo> after this moment of awesomeness,
14:32:43 <ajo> #topic slow moving things
14:32:55 <ajo> hmmm
14:32:56 <ajo> sorry
14:33:07 <ajo> slaweq_work , you wanted to update on LB integration?
14:33:14 <ajo> #undo
14:33:14 <openstack> Removing item from minutes: <ircmeeting.items.Topic object at 0x8bbb990>
14:33:25 <ajo> #topic LinuxBridge/qos integration
14:33:32 <slaweq_work> ajo: for now I'm working more on fullstack tests for linuxbridge
14:33:46 <slaweq_work> and then I will continue qos for linuxbridge
14:34:05 <ihrachys> slaweq_work: I guess we can move bwlimit part to Mitaka-2
14:34:26 <slaweq_work> ihrachys: if you say so :)
14:34:27 <ajo> yes, also, there's ongoing work to refactor the linux bridge agent, so I guess it makes sense
14:34:54 <ihrachys> ajo: fullstack does not really care about internal structure of the agent since it runs processes
14:35:17 <ajo> ihrachys : yes, but implementing qos cares about that :)
14:35:21 <slaweq_work> ihrachys: true, but I'm not an expert in fullstack tests so far
14:35:26 <ajo> so it's better not to mess with a moving target :)
14:35:26 <slaweq_work> and I'm still learning it
14:35:41 <slaweq_work> I hope that at the end of this week I will push something to review :)
14:35:50 <ajo> slaweq_work : I recommend you talk to jschwartz when available if you have questions
14:35:54 <ajo> or amuller
14:36:05 <ihrachys> ajo: it's better to get in before that other moving target has a chance ;)
14:36:08 <slaweq_work> yep, I was talking with amuller a few times
14:36:11 <ajo> ihrachys : lol
14:36:18 <ihrachys> slaweq_work: cool, I will review once smth reviewable is up
14:36:24 <ajo> different philosophies :D
14:36:34 <slaweq_work> k, thx ihrachys
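As background, a minimal sketch of tc-based limiting on a Linux bridge port, assuming a hypothetical tap device name and rate; note that a qdisc on the tap only shapes one direction of the instance's traffic, which is the limitation raised later in this meeting:

    # token bucket filter on a hypothetical instance tap device
    tc qdisc add dev tap0123abcd root tbf rate 10mbit burst 10kb latency 50ms
    # inspect the result
    tc -s qdisc show dev tap0123abcd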
14:36:37 <ajo> ok,
14:36:48 <ajo> #topic slow moving topics
14:37:01 <ajo> ping sc68cal (for traffic classification later)
14:37:13 <ajo> first, bandwidth guarantee support,
14:37:51 <ajo> I spent quite a bit of time investigating it,
14:38:03 <moshele> ajo: so for SR-IOV the NIC driver is not ready yet, at least for mellanox
14:38:03 <ajo> in the context of OVS & LB, and a bit on sr-iov
14:38:22 <ajo> moshele , ack, so the min-bw settings are still on the way, I guess
14:38:33 <ihrachys> ajo: is it scheduler thing?
14:38:35 <slaweq_work> for ovs I think that such things could be done quite easily with "tc"
14:38:45 <ajo> ihrachys : technical side on neutron & scheduler thing, yes
14:38:47 <moshele> ajo yes
14:39:12 <slaweq_work> especially when we are using the hybrid connection, then we can use tc with the htb qdisc on those interfaces
14:39:22 <ajo> On the technical side, it's not possible to manage bandwidth guarantees within a node for OVS solely based on openflow rules,
14:39:30 <ajo> yes slaweq_work: tc works, I tried that
14:39:34 <irenab> moshele: does any sr-iov nic support min_bw?
14:39:46 <ajo> but I'm not happy with adding another layer of filtering when openflow rules could do that
14:39:54 <moshele> irenab: I don't know
14:40:08 <ihrachys> ajo: do we want to look into using LB qos driver for OVS agent? :)
14:40:09 <slaweq_work> for linuxbridge it could also be done with tc, but it will only work in one direction as we only have the tap interface to apply rules to
14:40:25 <ajo> TL;DR: the arrangement of our openflow rules doesn't allow us to direct traffic flows through specific queues.
14:40:35 <ajo> slaweq_work: that's not exactly correct :)
14:40:39 <ajo> slaweq_work : tc is confusing
14:40:45 <ihrachys> slaweq_work: what more do you need? we could think of extending the agent API for extensions
14:40:47 <ajo> that was my initial understanding
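For illustration of what "openflow rules directing flows through queues" means in OVS terms, a sketch along the lines of the ovs-vsctl(8) QoS example; the port name, rates, and queue numbers are assumptions, and this is not something the agent does today:

    # attach an HTB qos with two queues to a hypothetical port
    ovs-vsctl set port eth1 qos=@newqos -- \
      --id=@newqos create qos type=linux-htb other-config:max-rate=1000000000 \
          queues:0=@q0 queues:1=@q1 -- \
      --id=@q0 create queue other-config:min-rate=100000000 -- \
      --id=@q1 create queue other-config:min-rate=500000000
    # a flow then steers a given VM port's traffic into queue 1
    ovs-ofctl add-flow br-int "priority=20,in_port=7,actions=set_queue:1,normal"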
14:41:08 <ajo> The issue with bandwidth guarantees
14:41:22 <slaweq_work> ihrachys: what do I need more for what?
14:41:35 <ajo> is that you need to build hierarchical queues over a single interface, and then make one queue for every single traffic flow
14:41:57 <ajo> in our case, the optimal point seems to be the connection between br-int and the external networks, or br-tun
14:41:57 <ihrachys> slaweq_work: you mentioned you have 'only' the tap device, so you're probably missing smth
14:42:15 <ajo> ihrachys , it's a technical thing with linux kernel, queues, and interfaces
14:42:26 <ajo> you can only "queue" ingress traffic (to a bridge)
14:42:30 <ajo> sorry
14:42:34 <ajo> egress traffic from a bridge
14:42:38 <slaweq_work> ajo: exactly
14:42:38 <ajo> I always change the direction :)
14:42:43 <ajo> but
14:42:52 <ajo> there's also a requirement to build queues hierarchically on a single port
14:42:58 <ajo> so for example
14:43:05 * ihrachys 's head just blew
14:43:08 <ajo> if we had the connection from br-int to br-tun
14:43:19 <ajo> we could create a top queue that indicates the maximum bandwidth of that link
14:43:34 <ajo> and then another queue under it to handle another flow,
14:43:36 <ajo> another queue,
14:43:37 <ajo> etc..
14:43:39 <ajo> yes
14:43:41 <ajo> it's mindblowing
14:44:02 <ajo> slaweq_work : if we take both sides (br-int to br-tun, and br-tun to br-int) you can effectively control both paths
14:44:06 <slaweq_work> ajo: good to know that
14:44:14 <ajo> and you also comply with having the queues as hierarchical
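Concretely, the hierarchy ajo describes could look roughly like this with tc on a hypothetical veth leg between br-int and br-tun; the device name, rates, and the fwmark-based classification are assumptions for the sketch:

    # root HTB qdisc on the hypothetical veth leg
    tc qdisc add dev veth-int2tun root handle 1: htb default 30
    # top class: the maximum bandwidth of the link
    tc class add dev veth-int2tun parent 1: classid 1:1 htb rate 10gbit
    # one child class per traffic flow needing a guarantee (rate = guarantee, ceil = link max)
    tc class add dev veth-int2tun parent 1:1 classid 1:10 htb rate 2gbit ceil 10gbit
    tc class add dev veth-int2tun parent 1:1 classid 1:30 htb rate 1gbit ceil 10gbit
    # classify a flow into its queue, e.g. by a fwmark set elsewhere
    tc filter add dev veth-int2tun parent 1: protocol ip handle 10 fw flowid 1:10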
14:44:21 <ajo> bad part is
14:44:24 <slaweq_work> for ovs yes
14:44:33 <ajo> yes, linuxbridge is a different story
14:44:39 <ajo> we need somebody to look at how to handle that
14:44:42 <slaweq_work> but I didn't know that it is possible for linuxbridge (where there are no such bridges)
14:44:53 <ajo> slaweq_work , probably you need another bridge
14:45:07 <ajo> and a veth pair
14:45:08 <ajo> deployed
14:45:24 <slaweq_work> as I said before: similar to the hybrid connection when you are using ovs bindings
14:45:27 <ihrachys> ajo: meaning you need to rewire network to enable it? that's kinda against rolling upgrade requirements.
14:45:36 <ajo> so OVS has its issues (the openflow rule arrangement is not optimal, and we may need to filter traffic again by mac/vlan),
14:45:43 <ajo> linux bridge has its issues too
14:45:56 <ajo> ihrachys , it's not a rolling upgrade in this case, it's installing a new service
14:46:08 <ajo> ihrachys , in that case operators could take it, or leave it
14:46:22 <slaweq_work> but still, IMHO, if we do it directly with tc then it could be done the same way for both agents
14:46:24 <ajo> ihrachys : I say it could be an optional thing for LB
14:46:38 <ajo> slaweq_work , yes, that's the good point of using TC
14:46:39 <slaweq_work> if there will be another veth-pair for lb
14:46:56 <ihrachys> ajo: it's not a new service if you had it enabled before
14:46:57 <ajo> sharing the implementation
14:47:05 <ihrachys> ajo: we have it in L
14:47:15 <ajo> ihrachys , hmm, true, but not for LB
14:47:44 <ajo> ihrachys , btw, that's only for bandwidth guarantees; maybe LB won't be able to support bandwidth guarantees without such a configuration change
14:47:54 <ihrachys> ajo: indeed not for LB. though once we merge slaweq_work's patch, it affects that too
14:48:02 <ajo> or maybe slaweq_work will be able to find a workaround for it
14:48:03 <ajo> :)
14:48:08 <ajo> I'm not pushing to make this now btw
14:48:09 <ajo> :)
14:48:14 <ihrachys> ajo: ok, I need to think it thru
14:48:22 <ajo> I'm just sharing the facts, and saying: this is not for now, we're not ready :)
14:48:27 <slaweq_work> me too
14:48:43 <ajo> I have a half-cooked post about the topic I never finished
14:48:54 <ajo> I guess I should finish it, and push the publish button
14:48:56 <ihrachys> ajo: looking fwd to the post
14:49:03 <ajo> slaweq_work , ihrachys , will ping you
14:49:04 <ajo> also
14:49:06 * njohnston too
14:49:09 <ihrachys> ok should we move?
14:49:20 <ajo> this technical discussion above ^ is for the in-compute-node bandwidth guarantees
14:49:29 <ajo> SR-IOV: no go, OVS: no go, LB: no go (yet)
14:49:38 <ajo> also, we have the scheduling bits
14:49:51 <ihrachys> ajo: it will be a long road for sure
14:49:59 <ajo> we should collaborate with nova to send information to the scheduler, and influence scheduling decisions
14:50:11 <ajo> because otherwise there will be ports which cannot be bound because we don't have enough BW
14:50:14 <ajo> on a compute node
14:50:19 <ajo> over a specific network
14:50:38 <ajo> I'm currently working on a spec to keep that scheduler discussion moving
14:51:12 <moshele> ajo: let me know if you need help with that
14:51:12 <ihrachys> ajo++ for taking the burden of working with nova project on that hard bite
14:51:14 <ajo> and irenab and I thought that it's probably a good thing to start, at least, collecting on our side the available link bandwidth related to each physical network on every compute host/network node
14:51:28 <ajo> ihrachys : my teeth are hurting :D
14:51:41 <ajo> moshele , ihrachys , will loop you in the spec
14:51:45 <irenab> ajo: :-)
14:51:51 <ajo> I will announce it next meeting
14:51:55 <ajo> in two weeks ;)
14:52:13 <ajo> we're tight on time
14:52:20 <ajo> let's move to next topic
14:52:28 <ajo> #topic traffic classification
14:52:52 <irenab> can we spend a few mins on bugs?
14:52:59 <ajo> I know that work was making progress, but it will probably live in a separate library for reuse by other projects
14:53:02 <ajo> sc68cal was leading that
14:53:11 <ajo> yep irenab , I think it's a good idea
14:53:13 <ajo> let's jump on that
14:53:18 <ajo> #topic Bugs
14:53:32 <ajo> We have Update network with New Qos-Policy isn't working with SR-IOV agent - https://bugs.launchpad.net/neutron/+bug/1504166
14:53:32 <openstack> Launchpad bug 1504166 in neutron "Update network with New Qos-Policy isn't working with SR-IOV agent" [Undecided,In progress] - Assigned to yalei wang (yalei-wang)
14:53:46 <ajo> moshele , ihrachys , you were handling it , right?
14:53:54 <ihrachys> was I? oh
14:54:00 <moshele> I will
14:54:05 <ajo> oh
14:54:09 <ajo> sorry
14:54:12 <ajo> Yalei Wang sent a patch: https://review.openstack.org/#/c/233499/
14:54:30 <ajo> let's make sure we get it reviewed
14:54:37 <ajo> #link  https://review.openstack.org/#/c/233499/
14:54:53 <moshele> it's WIP
14:54:56 <ajo> there's this one on me:
14:54:57 <ajo> https://review.openstack.org/#/c/233499/
14:55:12 <ajo> does anybody have bandwidth to make those API failures nicer?
14:55:33 <njohnston> isn't that the same link as you mentioned above re: Yalei Wang?
14:55:35 <ajo> I'm removing the assignee and letting other volunteers eventually take it
14:55:39 <ajo> since it's not realistic that I finish that
14:55:45 <ihrachys> ajo: link wrong?
14:55:49 <ajo> oh
14:55:49 <ajo> sorry
14:55:56 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1496787
14:55:56 <openstack> Launchpad bug 1496787 in neutron "If qos service_plugin is enabled, but ml2 extension driver is not, api requests attaching policies to ports or nets will fail with an ugly exception" [Low,Confirmed]
14:56:27 <ajo> we also have this one: https://bugs.launchpad.net/neutron/+bug/1486607
14:56:27 <openstack> Launchpad bug 1486607 in neutron "tenants seem like they were able to detach admin enforced QoS policies from ports or networks" [Low,In progress] - Assigned to yong sheng gong (gongysh)
14:56:39 <ihrachys> I see the core resource extension manager mentioned... I feel guilty now.
14:56:45 <ihrachys> totally forgot about that beast
14:57:01 <ajo> ihrachys , np, I know you're perfectly capable of handling it :)
14:57:12 <ihrachys> I will probably take that one for now
14:57:13 <ajo> and it's partly coupled to that objectization of neutron core resources
14:57:18 <slaweq_work> ajo: I can check https://bugs.launchpad.net/neutron/+bug/1496787 if it is not problem :)
14:57:18 <openstack> Launchpad bug 1496787 in neutron "If qos service_plugin is enabled, but ml2 extension driver is not, api requests attaching policies to ports or nets will fail with an ugly exception" [Low,Confirmed]
14:57:28 <ajo> slaweq_work : thanks a lot, that'd be great
14:57:33 <slaweq_work> ok
14:57:48 <ihrachys> slaweq_work: ok, keep me in the loop, I may have silly ideas about it
14:57:48 <slaweq_work> great, thx :)
14:57:57 <slaweq_work> ihrachys: ok
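For context on this bug: the ugly exception appears when the qos service plugin and the ml2 extension driver get out of sync; a working setup roughly needs all three of the following (paths and the co-existing values are illustrative):

    # /etc/neutron/neutron.conf
    [DEFAULT]
    service_plugins = router,qos

    # /etc/neutron/plugins/ml2/ml2_conf.ini  (the part the bug report is about forgetting)
    [ml2]
    extension_drivers = port_security,qos

    # L2 agent config, e.g. openvswitch_agent.ini
    [agent]
    extensions = qos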
14:58:06 <ajo> and we also have this other one:  https://bugs.launchpad.net/neutron/+bug/1509232
14:58:06 <openstack> Launchpad bug 1509232 in neutron "If we update a QoSPolicy description, the agents get notified and rules get rewired for nothing" [Medium,Confirmed] - Assigned to Irena Berezovsky (irenab)
14:58:35 <irenab> I checked this bug, need some advice regarding the level at which to filter the change
14:58:38 <ajo> it's not of high importance
14:58:42 <ihrachys> ajo: for 1486607 I believe the best way is adding tenant_id to qos rule models
14:58:50 <ajo> irenab : probably in the notification driver,
14:58:59 <ajo> hmmm
14:59:08 <irenab> ajo: ihrachys : will ping you on the channel to discuss the alternatives
14:59:14 <ajo> but the notification driver has no idea about what changed probably
14:59:21 <ajo> irenab , ping me and let's look at it together
14:59:33 <irenab> ajo: great, thanks
14:59:35 <ihrachys> ajo: yeah, it does not care. so it belongs to plugin
14:59:40 <ajo> ihrachys : probably I agree
14:59:47 <irenab> ihrachys: or to the agent
15:00:01 <ajo> ok, next time I should probably make better use of the meeting time :)
15:00:05 <ajo> #endmeeting