14:03:39 <ajo> #startmeeting neutron-qos
14:03:40 <openstack> Meeting started Wed Nov 25 14:03:39 2015 UTC and is due to finish in 60 minutes. The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:43 <openstack> The meeting name has been set to 'neutron_qos'
14:03:49 <ajo> hi mrunge :)
14:04:02 <ajo> can I ping you later to get updates on QoS/horizon integration?
14:04:12 <ajo> #topic agenda
14:04:18 <mrunge> ajo, I'm about to leave the house
14:04:22 <ajo> #link https://etherpad.openstack.org/p/qos-mitaka
14:04:40 <ajo> mrunge: ok, if you can send me a quick status update via query I will tell everybody later,
14:04:58 <ajo> if not possible, it's ok, we will see that next meeting then
14:04:59 <irenab> I am at a parallel meeting, sorry for slow participation
14:05:34 <ajo> So, first thing on the agenda,
14:05:49 <ajo> well, first, welcome back to the meeting, and 2nd...
14:06:09 <ajo> does it seem right to move this meeting into "every two weeks" mode?
14:06:27 <ajo> irenab, ihrachys, moshele, njohnston?
14:06:33 <ihrachys> I don't mind if we do.
14:06:38 <moshele> fine by me
14:06:39 <irenab> +1
14:06:47 <ihrachys> I am not that involved though these days, so meh.
14:06:48 <ajo> I suspect we will have work, but not enough fuel for 1h meetings for now
14:06:48 <njohnston> I am all right with it, I don't think the updates will be coming fast and furious at this point.
14:07:02 <ajo> yeah njohnston :)
14:07:09 <ajo> ok, so, I will send an update
14:07:18 <ihrachys> yeah, it was different during L when we were indeed rushing and stepping on each other's feet
14:07:26 <ajo> #action set the meeting for every two weeks, instead of weekly
14:07:45 <ajo> ok,
14:08:01 <ajo> let's track the current ongoing work :)
14:08:10 <ajo> #topic ongoing items
14:08:16 <mrunge> ajo, sent in separate query
14:08:24 <ajo> thanks mrunge: much appreciated
14:08:59 <ajo> First item is RBAC, hdaniel is just joining (pinged me via query), so he will update us
14:09:27 * ajo looks for the bug link
14:09:31 <ihrachys> I don't see him though
14:09:52 <ihrachys> can we move forward while he joins?
14:10:13 <ajo> yes
14:10:19 <ajo> njohnston, could you update us on the DSCP status? :)
14:10:57 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1468353
14:10:57 <openstack> Launchpad bug 1468353 in neutron "QoS DSCP marking rule support" [Medium,New] - Assigned to Nate Johnston (nate-johnston)
14:11:30 <njohnston> So the way we left it is that you were thinking the RFE was not required
14:11:40 <njohnston> So I wasn't sure if I should abandon it, or not.
14:11:54 <ajo> You can use the bug as a tracker
14:12:01 <ajo> and start sending patches if you want
14:12:07 <njohnston> But we're working on code on our side to implement it using OVS flows
14:12:10 <ihrachys> well I believe the RFE bug is good to track the feature in Mitaka. not sure about a spec.
14:12:27 <ajo> as per my conversation with armax after the previous week's drivers meeting
14:12:39 <ihrachys> njohnston++ for starting on the code
14:12:40 <ajo> we don't need the #rfe or #rfe-approved tags for this
14:12:55 <ihrachys> agreed with no need for tags
14:13:07 <ihrachys> just make sure it's targeted for M
14:13:08 <ajo> but the bug is good as a tracker of completion
14:13:20 <njohnston> One idea we were wondering about is, should we modify the existing QoS devref or create a new one? I was thinking modify the existing...
14:13:52 <ihrachys> njohnston: existing
14:14:58 <njohnston> Good. So currently we're working on our updates to the agent code, making sure that everything gets run properly through add_flow/mod_flow etc., as well as devref updates, starting with the unit tests and then working back to create working code.
14:15:46 <moshele> do you have code patches for review?
14:17:26 <njohnston> No, not up yet; we have about half the unit tests done, and we're trying to think very hard about making sure we don't screw up the flow table.
14:17:47 <njohnston> We hope to have parts up for review in the coming weeks.
14:17:51 <ihrachys> njohnston: WIP patches could be of help
14:18:03 <ajo> ack, coming weeks sounds nice
14:18:12 <ihrachys> ajo: I don't like the 's' in that 'weeks'
14:18:13 <njohnston> Great, I'll work hard to get something up ASAP
14:18:14 <ihrachys> :)
14:18:32 <njohnston> I say weeks because of vacation around the Thanksgiving holiday in the US
14:18:32 <slaweq_work> hello, is it qos meeting now?
14:18:44 <ajo> njohnston: please note that the flow tables could eventually get refactored into a different form, but I will keep the qos/dscp part in mind
14:18:48 <ihrachys> slaweq_work: right. we'll discuss lb bwlimit later
14:18:59 <slaweq_work> ihrachys: ok, thx
14:19:02 <ajo> njohnston: and not likely to happen this cycle unless required by the circumstances
14:19:02 <njohnston> ajo: Awesome
14:19:22 <ajo> ok, so, let's move on to RBAC
14:19:25 <ajo> ping hdaniel
14:19:33 <hdaniel> ajo: pong
14:19:40 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1512587
14:19:40 <openstack> Launchpad bug 1512587 in neutron "[RFE] Role-based Access Control for QoS policies" [Wishlist,Triaged] - Assigned to Haim Daniel (hdaniel)
14:19:47 <ajo> hdaniel, could you update us on your findings? :)
14:20:19 <hdaniel> ajo: So there's a dilemma regarding the qos shared field
14:20:51 <ajo> correct,
14:21:19 <ajo> as far as I understood from our last talk, the networks code emulates the --shared flag to create rbac entries, or to surface rbac entries as "True/False" on that field
14:21:27 <ajo> over the api requests
14:21:35 <ajo> so here, we have two options:
14:21:35 <hdaniel> ajo: exactly.
14:21:51 <ajo> 1) we throw away the --shared flag on the policies, and rely solely on rbac
14:22:25 <ajo> 2) we introduce the compatibility layer (which is a bit of a PITA to keep); in kevinbenton's words: if you can get rid of shared now, do it :)
14:22:52 <ajo> with option 1 we would be introducing an incompatible change to integrations with qos,
14:23:05 <ihrachys> ajo: they already have the field, we can't just drop it
14:23:08 <ajo> but as it's still an experimental API, it could be ok
14:23:09 <ajo> opinions?
14:23:20 <ihrachys> experimental? where did we claim that?
14:24:01 <ajo> ihrachys: that was my understanding, maybe I got it wrong :)
14:24:23 <ihrachys> my belief is it's not experimental unless we make it clear
14:24:34 <ihrachys> with warnings, or docs
14:24:44 <ajo> ok
14:25:05 <ajo> any other thoughts?
14:25:12 <ihrachys> ajo: it's also not that much of a PITA since the object property will hide it
14:25:26 <ajo> hdaniel: I guess we may want to keep the shared flag then.
14:25:44 <hdaniel> ajo, ihrachys: I gambled on that, so the current patch behaves that way
14:25:46 <ajo> and what ihrachys suggests sounds like a good pattern, including a shared (settable) property on the object.
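For the DSCP work njohnston describes above (everything driven through add_flow/mod_flow in the OVS agent), here is a minimal hypothetical sketch of a DSCP-marking flow. It assumes a bridge object with an add_flow(**kwargs) method in the style of neutron's ovs_lib.OVSBridge; the table and priority numbers are made up and this is not the actual patch under development.

```python
# Hypothetical sketch only -- not the patch njohnston's team is working on.
# Assumes a bridge wrapper whose add_flow(**kwargs) renders its keyword
# arguments into an ovs-ofctl add-flow call; table/priority are illustrative.

def install_dscp_marking_flow(bridge, vm_ofport, dscp_mark):
    """Mark IPv4 traffic entering from a VM port with the given DSCP value."""
    # OpenFlow's mod_nw_tos takes the full ToS byte with the ECN bits zeroed,
    # i.e. DSCP << 2, so DSCP 46 (EF) is written as 184.
    bridge.add_flow(
        table=0,
        priority=65,
        in_port=vm_ofport,
        proto='ip',
        actions='mod_nw_tos:%d,normal' % (dscp_mark << 2),
    )
```

The plain ovs-ofctl equivalent would be along the lines of: ovs-ofctl add-flow br-int "table=0,priority=65,in_port=5,ip,actions=mod_nw_tos:184,normal".

And for the RBAC point just discussed, a minimal sketch of the pattern ihrachys suggests: keep 'shared' in the API, but implement it as a settable property derived from (and translated into) RBAC entries. Class and attribute names here are illustrative, not the actual neutron objects.

```python
import collections

# Illustrative stand-in for a policy RBAC entry; '*' means "every tenant".
RbacEntry = collections.namedtuple('RbacEntry', ['target_tenant', 'action'])


class QosPolicySketch(object):
    """Policy object hiding RBAC entries behind a settable 'shared' property."""

    def __init__(self, rbac_entries=None):
        self.rbac_entries = list(rbac_entries or [])

    @property
    def shared(self):
        # shared == some RBAC entry grants access to all tenants
        return any(e.target_tenant == '*' for e in self.rbac_entries)

    @shared.setter
    def shared(self, value):
        if value and not self.shared:
            self.rbac_entries.append(RbacEntry('*', 'access_as_shared'))
        elif not value:
            self.rbac_entries = [e for e in self.rbac_entries
                                 if e.target_tenant != '*']
```

Whatever the API layer reads or writes on shared keeps working, while the database would only store RBAC rows.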
14:25:57 <ajo> ok hdaniel :)
14:26:09 <ihrachys> hdaniel: wise boy ;)
14:26:11 <ajo> code_cleanups--
14:26:14 <ajo> compatibility++
14:26:23 <hdaniel> head_ache++
14:26:27 <ajo> lol
14:26:33 <ihrachys> hdaniel: rule #1: always select the most painful path
14:27:02 <ajo> ':)
14:27:23 <hdaniel> ihrachys, ajo: will write that down (with blood)
14:27:36 * ihrachys hopes not his blood
14:27:52 * ajo runs scared
14:27:57 <ajo> ok :)
14:27:59 <ajo> next topic
14:28:02 <ajo> or
14:28:12 <ajo> hdaniel, any other thing related to RBAC that could be important?
14:28:40 <hdaniel> ajo: nope, but I'm 100% sure they'll appear after the submission -
14:29:11 <ajo> ack :)
14:29:16 <ajo> the devil is in the details...
14:29:21 <ajo> ok so
14:29:25 <ajo> #topic horizon integration
14:29:28 <ajo> #link https://blueprints.launchpad.net/horizon/+spec/network-bandwidth-limiting-qos
14:29:48 <ajo> masco is making progress on that front: https://review.openstack.org/#/c/247997/
14:29:51 <ajo> thank you masco! ;)
14:30:26 <ihrachys> oh so cool
14:30:31 <ajo> wow, +521 lines
14:30:43 <ajo> it's still not possible to create policies, but it's on its way
14:31:27 <ajo> I guess we should all eventually start testing the patch
14:31:38 <ihrachys> it's full of js magic. I bow before masco's greatness.
14:32:08 <ajo> "Masco Kaliyamoorthy"++
14:32:30 <ajo> ok
14:32:40 <ajo> after this moment of awesomeness,
14:32:43 <ajo> #topic slow moving things
14:32:55 <ajo> hmmm
14:32:56 <ajo> sorry
14:33:07 <ajo> slaweq_work, you wanted to update on LB integration?
14:33:14 <ajo> #undo
14:33:14 <openstack> Removing item from minutes: <ircmeeting.items.Topic object at 0x8bbb990>
14:33:25 <ajo> #topic LinuxBridge/qos integration
14:33:32 <slaweq_work> ajo: for now I'm working more on fullstack tests for linuxbridge
14:33:46 <slaweq_work> and then I will continue qos for linuxbridge
14:34:05 <ihrachys> slaweq_work: I guess we can move the bwlimit part to Mitaka-2
14:34:26 <slaweq_work> ihrachys: if you say so :)
14:34:27 <ajo> yes, also, there's ongoing work to refactor the linux bridge agent, so I guess it makes sense
14:34:54 <ihrachys> ajo: fullstack does not really care about the internal structure of the agent since it runs processes
14:35:17 <ajo> ihrachys: yes, but implementing qos cares about that :)
14:35:21 <slaweq_work> ihrachys: true, but I'm not an expert with fullstack tests so far
14:35:26 <ajo> so it's better not to mess with a moving target :)
14:35:26 <slaweq_work> and I'm still learning it
14:35:41 <slaweq_work> I hope that at the end of this week I will push something to review :)
14:35:50 <ajo> slaweq_work: I recommend you talk to jschwartz when available if you have questions
14:35:54 <ajo> or amuller
14:36:05 <ihrachys> ajo: it's better to get in before that other moving target has a chance ;)
14:36:08 <slaweq_work> yep, I was talking with amuller a few times
14:36:11 <ajo> ihrachys: lol
14:36:18 <ihrachys> slaweq_work: cool, I will review once smth reviewable is up
14:36:24 <ajo> different philosophies :D
14:36:34 <slaweq_work> k, thx ihrachys
14:36:37 <ajo> ok,
14:36:48 <ajo> #topic slow moving topics
14:37:01 <ajo> ping sc68cal (for traffic classification later)
14:37:13 <ajo> first, bandwidth guarantee support,
14:37:51 <ajo> I spent quite a bit of time investigating it,
14:38:03 <moshele> ajo: so for SR-IOV the NIC driver is not ready yet, at least for mellanox
14:38:03 <ajo> in the context of OVS & LB, and a bit on sr-iov
14:38:22 <ajo> moshele, ack, so the min-bw settings are still on the way, I guess
14:38:33 <ihrachys> ajo: is it a scheduler thing?
14:38:35 <slaweq_work> for ovs I think that such things could be done quite easily with "tc"
14:38:45 <ajo> ihrachys: there's the technical side in neutron & the scheduler thing, yes
14:38:47 <moshele> ajo: yes
14:39:12 <slaweq_work> especially when we are using the hybrid connection, then we can use tc with an htb qdisc on those interfaces
14:39:22 <ajo> On the technical side, it's not possible to manage bandwidth guarantees within a node for OVS solely based on openflow rules,
14:39:30 <ajo> yes slaweq_work: tc works, I tried that
14:39:34 <irenab> moshele: does any sr-iov nic support min_bw?
14:39:46 <ajo> but I'm not happy with adding another layer of filtering when openflow rules could do that
14:39:54 <moshele> irenab: I don't know
14:40:08 <ihrachys> ajo: do we want to look into using the LB qos driver for the OVS agent? :)
14:40:09 <slaweq_work> for linuxbridge it could also be done with tc, but it will only work in one direction as we only have the tap interface to apply rules to
14:40:25 <ajo> TL;DR: the arrangement of our openflow rules doesn't allow us to direct traffic flows through specific queues.
14:40:35 <ajo> slaweq_work: that's not exactly correct :)
14:40:39 <ajo> slaweq_work: tc is confusing
14:40:45 <ihrachys> slaweq_work: what more do you need? we could think of extending the agent API for extensions
14:40:47 <ajo> that was my initial understanding too
14:41:08 <ajo> The issue with bandwidth guarantees
14:41:22 <slaweq_work> ihrachys: what do I need more for what?
14:41:35 <ajo> is that you need to build hierarchical queues over a single interface, and then make one queue for every single traffic flow
14:41:57 <ajo> in our case, the optimal point seems to be the connection between br-int and the external networks, or br-tun
14:41:57 <ihrachys> slaweq_work: you mentioned you have 'only' the tap device, so you're probably missing smth
14:42:15 <ajo> ihrachys, it's a technical thing with the linux kernel, queues, and interfaces
14:42:26 <ajo> you can only "queue" ingress traffic (to a bridge)
14:42:30 <ajo> sorry
14:42:34 <ajo> egress traffic from a bridge
14:42:38 <slaweq_work> ajo: exactly
14:42:38 <ajo> I always mix up the direction :)
14:42:43 <ajo> but
14:42:52 <ajo> there's also a requirement to build the queues hierarchically on a single port
14:42:58 <ajo> so for example
14:43:05 * ihrachys 's head just blew
14:43:08 <ajo> if we take the connection from br-int to br-tun
14:43:19 <ajo> we could create a top queue that indicates the maximum bandwidth of that link
14:43:34 <ajo> and then another queue under it to handle one flow,
14:43:36 <ajo> another queue for the next,
14:43:37 <ajo> etc..
14:43:39 <ajo> yes
14:43:41 <ajo> it's mindblowing
14:44:02 <ajo> slaweq_work: if we take both sides (br-int to br-tun, and br-tun to br-int) you can effectively control both paths
14:44:06 <slaweq_work> ajo: good to know that
14:44:14 <ajo> and you also comply with having the queues be hierarchical
14:44:21 <ajo> the bad part is
14:44:24 <slaweq_work> for ovs yes
14:44:33 <ajo> yes, linuxbridge is a different story
14:44:39 <ajo> we need somebody to look at how to handle that
14:44:42 <slaweq_work> but I didn't know that it is possible for linuxbridge (where there are no such bridges)
14:44:53 <ajo> slaweq_work, probably you need another bridge
14:45:07 <ajo> and a veth pair
14:45:08 <ajo> deployed
14:45:24 <slaweq_work> as I said before: similar to the hybrid connection when you are using ovs bindings
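A minimal sketch of the hierarchy ajo describes, written as the tc commands a hypothetical helper (shared between the OVS and LB cases) could emit: one root HTB class capped at the link rate, plus one child class per flow carrying its guaranteed minimum. The interface name, rates and class ids are illustrative; real code would also need tc filters (or, on OVS, per-queue OpenFlow actions) to steer each flow into its class, which is exactly the part that clashes with the current openflow rule arrangement.

```python
# Hypothetical sketch of the hierarchical HTB layout discussed above.
# Interface name, rates and class ids are illustrative, not neutron code.

def htb_guarantee_cmds(dev, link_kbps, guarantees_kbps):
    """Return tc commands building min-bandwidth queues on one egress interface.

    dev             -- the port towards the other bridge (e.g. a patch/veth end)
    link_kbps       -- total capacity of that link (the top queue)
    guarantees_kbps -- {flow_name: min_kbps}, one child class per traffic flow
    """
    cmds = [
        # root qdisc plus a single top class capped at the link capacity
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:", "htb", "default", "1"],
        ["tc", "class", "add", "dev", dev, "parent", "1:", "classid", "1:1",
         "htb", "rate", "%skbit" % link_kbps],
    ]
    # one child class per flow: 'rate' is guaranteed, 'ceil' allows borrowing
    for i, min_kbps in enumerate(guarantees_kbps.values(), start=0x10):
        cmds.append(["tc", "class", "add", "dev", dev, "parent", "1:1",
                     "classid", "1:%x" % i, "htb",
                     "rate", "%skbit" % min_kbps, "ceil", "%skbit" % link_kbps])
    return cmds


# Example: a 10 Gbit link with two guaranteed flows (numbers purely illustrative).
for cmd in htb_guarantee_cmds("int-to-tun", 10000000, {"vm-a": 2000000, "vm-b": 1000000}):
    print(" ".join(cmd))  # replace print() with subprocess.run(cmd, check=True) to apply
```

On the OVS side the same shape can also be expressed with QoS/Queue rows in OVSDB plus a set_queue action per flow; on linuxbridge it needs the extra bridge/veth hop mentioned above so that there is an egress interface to hang the qdisc on.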
14:45:27 <ihrachys> ajo: meaning you need to rewire the network to enable it? that's kinda against rolling upgrade requirements.
14:45:36 <ajo> so OVS has its issues (the openflow rule arrangement is not optimal, and we may need to filter traffic again by mac/vlan),
14:45:43 <ajo> linux bridge has its issues too
14:45:56 <ajo> ihrachys, it's not a rolling upgrade in this case, it's installing a new service
14:46:08 <ajo> ihrachys, in that case operators could take it, or leave it
14:46:22 <slaweq_work> but still IMHO if we do it directly with tc then it could be done in the same way for both agents
14:46:24 <ajo> ihrachys: I say it could be an optional thing for LB
14:46:38 <ajo> slaweq_work, yes, that's the good point of using tc
14:46:39 <slaweq_work> if there is another veth pair for lb
14:46:56 <ihrachys> ajo: it's not a new service if you had it enabled before
14:46:57 <ajo> sharing the implementation
14:47:05 <ihrachys> ajo: we have it in L
14:47:15 <ajo> ihrachys, hmm, true, but not for LB
14:47:44 <ajo> ihrachys, btw, that's only for bandwidth guarantees; maybe LB won't be able to support bandwidth guarantees without such a configuration change
14:47:54 <ihrachys> ajo: indeed not for LB. though once we merge slaweq_work's patch, it affects that too
14:48:02 <ajo> or maybe slaweq_work is able to find a workaround for it
14:48:03 <ajo> :)
14:48:08 <ajo> I'm not pushing to do this now btw
14:48:09 <ajo> :)
14:48:14 <ihrachys> ajo: ok, I need to think it thru
14:48:22 <ajo> I'm just sharing the facts, and saying: this is not for now, we're not ready :)
14:48:27 <slaweq_work> me too
14:48:43 <ajo> I have a half-cooked post about the topic I never finished
14:48:54 <ajo> I guess I should finish it, and push the publish button
14:48:56 <ihrachys> ajo: looking fwd to the post
14:49:03 <ajo> slaweq_work, ihrachys, will ping you
14:49:04 <ajo> also
14:49:06 * njohnston too
14:49:09 <ihrachys> ok, should we move on?
14:49:20 <ajo> this technical discussion above ^ is about the in-compute-node bandwidth guarantees
14:49:29 <ajo> SR-IOV: no go, OVS: no go, LB: no go (yet)
14:49:38 <ajo> also, we have the scheduling bits
14:49:51 <ihrachys> ajo: it will be a long road for sure
14:49:59 <ajo> we should collaborate with nova to send information to the scheduler, and influence scheduling decisions
14:50:11 <ajo> because otherwise there will be ports which cannot be bound because we don't have enough BW
14:50:14 <ajo> on a compute node
14:50:19 <ajo> over a specific network
14:50:38 <ajo> I'm currently working on a spec to keep that scheduler discussion moving
14:51:12 <moshele> ajo: let me know if you need help with that
14:51:12 <ihrachys> ajo++ for taking the burden of working with the nova project on that hard bite
14:51:14 <ajo> and irenab and I thought that it's probably a good thing to start, at least, collecting on our side the available link bandwidth related to each physical network on every compute host/network node
14:51:28 <ajo> ihrachys: my teeth are hurting :D
14:51:41 <ajo> moshele, ihrachys, will loop you in on the spec
14:51:45 <irenab> ajo: :-)
14:51:51 <ajo> I will announce it next meeting
14:51:55 <ajo> in two weeks ;)
14:52:13 <ajo> we're tight on time
14:52:20 <ajo> let's move to the next topic
14:52:28 <ajo> #topic traffic classification
14:52:52 <irenab> can we spend a few mins on bugs?
14:52:59 <ajo> I know that work was making progress, but it will probably live in a separate library for reuse by other projects
14:53:02 <ajo> sc68cal was leading that
14:53:11 <ajo> yep irenab, I think it's a good idea
14:53:13 <ajo> let's jump on that
14:53:18 <ajo> #topic Bugs
14:53:32 <ajo> We have "Update network with New Qos-Policy isn't working with SR-IOV agent" - https://bugs.launchpad.net/neutron/+bug/1504166
14:53:32 <openstack> Launchpad bug 1504166 in neutron "Update network with New Qos-Policy isn't working with SR-IOV agent" [Undecided,In progress] - Assigned to yalei wang (yalei-wang)
14:53:46 <ajo> moshele, ihrachys, you were handling it, right?
14:53:54 <ihrachys> was I? oh
14:54:00 <moshele> I will
14:54:05 <ajo> oh
14:54:09 <ajo> sorry
14:54:12 <ajo> Yalei Wang sent a patch: https://review.openstack.org/#/c/233499/
14:54:30 <ajo> let's make sure we get it reviewed
14:54:37 <ajo> #link https://review.openstack.org/#/c/233499/
14:54:53 <moshele> it's WIP
14:54:56 <ajo> there's this one on me:
14:54:57 <ajo> https://review.openstack.org/#/c/233499/
14:55:12 <ajo> does anybody have bandwidth to make those API failures nicer?
14:55:33 <njohnston> isn't that the same link as you mentioned above re: Yalei Wang?
14:55:35 <ajo> I'm removing the assignee and letting other volunteers eventually take it
14:55:39 <ajo> since it's not realistic that I finish that
14:55:45 <ihrachys> ajo: link wrong?
14:55:49 <ajo> oh
14:55:49 <ajo> sorry
14:55:56 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1496787
14:55:56 <openstack> Launchpad bug 1496787 in neutron "If qos service_plugin is enabled, but ml2 extension driver is not, api requests attaching policies to ports or nets will fail with an ugly exception" [Low,Confirmed]
14:56:27 <ajo> we also have this one: https://bugs.launchpad.net/neutron/+bug/1486607
14:56:27 <openstack> Launchpad bug 1486607 in neutron "tenants seem like they were able to detach admin enforced QoS policies from ports or networks" [Low,In progress] - Assigned to yong sheng gong (gongysh)
14:56:39 <ihrachys> I see the core resource extension manager mentioned... I feel guilty now.
14:56:45 <ihrachys> totally forgot about that beast
14:57:01 <ajo> ihrachys, np, I know you're perfectly capable of handling it :)
14:57:12 <ihrachys> I will probably take that one for now
14:57:13 <ajo> and it's partly coupled to that objectization of neutron core resources
14:57:18 <slaweq_work> ajo: I can check https://bugs.launchpad.net/neutron/+bug/1496787 if it is not a problem :)
14:57:18 <openstack> Launchpad bug 1496787 in neutron "If qos service_plugin is enabled, but ml2 extension driver is not, api requests attaching policies to ports or nets will fail with an ugly exception" [Low,Confirmed]
14:57:28 <ajo> slaweq_work: thanks a lot, that'd be great
14:57:33 <slaweq_work> ok
14:57:48 <ihrachys> slaweq_work: ok, keep me in the loop, I may have silly ideas about it
14:57:48 <slaweq_work> great, thx :)
14:57:57 <slaweq_work> ihrachys: ok
14:58:06 <ajo> and we also have this other one: https://bugs.launchpad.net/neutron/+bug/1509232
14:58:06 <openstack> Launchpad bug 1509232 in neutron "If we update a QoSPolicy description, the agents get notified and rules get rewired for nothing" [Medium,Confirmed] - Assigned to Irena Berezovsky (irenab)
14:58:35 <irenab> I checked this bug, I need some advice regarding the level at which to filter the change
14:58:38 <ajo> it's not of high importance
14:58:42 <ihrachys> ajo: for 1486607 I believe the best way is adding tenant_id to the qos rule models
14:58:50 <ajo> irenab: probably in the notification driver,
14:58:59 <ajo> hmmm
14:59:08 <irenab> ajo: ihrachys: will ping you on the channel to discuss the alternatives
14:59:14 <ajo> but the notification driver probably has no idea about what changed
14:59:21 <ajo> irenab, ping me and let's look at it together
14:59:33 <irenab> ajo: great, thanks
14:59:35 <ihrachys> ajo: yeah, it does not care. so it belongs in the plugin
14:59:40 <ajo> ihrachys: I probably agree
14:59:47 <irenab> ihrachys: or in the agent
15:00:01 <ajo> ok, next time I should probably make better use of the meeting time :)
15:00:05 <ajo> #endmeeting
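A closing sketch for bug 1509232, following the plugin-level filtering ajo and ihrachys lean toward just before the end of the meeting: compare the old and new policy and only notify agents when a field they act on changes. All names are hypothetical, not the real neutron QoS plugin interfaces.

```python
# Hypothetical sketch for bug 1509232 -- not the actual neutron QoS plugin.
# The notification driver does not know what changed, so the plugin compares
# old vs. new and skips the agent fanout for cosmetic updates (description, name).

AGENT_RELEVANT_FIELDS = ('rules',)


class QosPluginSketch(object):
    def __init__(self, db, notification_driver):
        self.db = db
        self.notification_driver = notification_driver

    def update_policy(self, context, policy_id, policy_data):
        old = self.db.get_policy(context, policy_id)
        updated = self.db.update_policy(context, policy_id, policy_data)
        # Only fan out to agents if something they act on actually changed.
        if any(old.get(f) != updated.get(f) for f in AGENT_RELEVANT_FIELDS):
            self.notification_driver.update_policy(context, updated)
        return updated
```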