14:03:39 #startmeeting neutron-qos
14:03:40 Meeting started Wed Nov 25 14:03:39 2015 UTC and is due to finish in 60 minutes. The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:43 The meeting name has been set to 'neutron_qos'
14:03:49 hi mrunge :)
14:04:02 can I ping you later to get updates on QoS/horizon integration?
14:04:12 #topic agenda
14:04:18 ajo, I'm about to leave the house
14:04:22 #link https://etherpad.openstack.org/p/qos-mitaka
14:04:40 mrunge : ok, if you can send me a quick status update via query I will tell everybody later,
14:04:58 if not possible, it's ok, we will see that next meeting then
14:04:59 I am at a parallel meeting, sorry for slow participation
14:05:34 So, first thing on the agenda,
14:05:49 well, first, welcome back to the meeting, and 2nd...
14:06:09 does it seem right to move this meeting into "every two weeks" mode?
14:06:27 irenab , ihrachys , moshele , njohnston ?
14:06:33 I don't mind if we do.
14:06:38 fine by me
14:06:39 +1
14:06:47 I am not that involved though these days, so meh.
14:06:48 I suspect we will have work, but not enough fuel for 1h meetings for now
14:06:48 I am all right with it, I don't think the updates will be coming fast and furious at this point.
14:07:02 yeah njohnston :)
14:07:09 ok, so, I will send an update
14:07:18 yeah, it was different during L when we were indeed rushing and stepping on each other's feet
14:07:26 #action set the meeting for every two weeks, instead of weekly
14:07:45 ok,
14:08:01 let's track the current ongoing work :)
14:08:10 #topic ongoing items
14:08:16 ajo, sent it in a separate query
14:08:24 thanks mrunge : very appreciated
14:08:59 First item is RBAC, hdaniel is just joining (pinged me via query), so he will update us
14:09:27 * ajo looks for the bug link
14:09:31 I don't see him though
14:09:52 can we move forward while he joins?
14:10:13 yes
14:10:19 njohnston , could you update us on the DSCP status? :)
14:10:57 #link https://bugs.launchpad.net/neutron/+bug/1468353
14:10:57 Launchpad bug 1468353 in neutron "QoS DSCP marking rule support" [Medium,New] - Assigned to Nate Johnston (nate-johnston)
14:11:30 So the way we left it is that you were thinking the RFE was not required
14:11:40 So I wasn't sure if I should abandon it, or not.
14:11:54 You can use the bug as a tracker
14:12:01 and start sending patches if you want
14:12:07 But we're working on code on our side to implement it using OVS flows
14:12:10 well I believe the RFE bug is good to track the feature in Mitaka. not sure about the spec.
14:12:27 as per my conversation with armax after the previous week's drivers meeting
14:12:39 njohnston++ for starting on the code
14:12:40 we don't need the #rfe or #rfe-approved tags for this
14:12:55 agreed with no need for tags
14:13:07 just make sure it's targeted for M
14:13:08 but the bug is good as a tracker of completion
14:13:20 One idea we were wondering about is, should we modify the existing QoS devref or create a new one? I was thinking modify the existing...
14:13:52 njohnston: existing
14:14:58 Good. So currently we're working on our updates to the agent code, making sure that everything gets run properly through add_flow/mod_flow etc., as well as devref updates, starting with the unit tests and then working back to create working code.
14:15:46 do you have code patches for review?
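[editor's note on the DSCP discussion above: a minimal, hedged sketch of what marking DSCP with an OVS flow rule can look like, in the spirit of the add_flow/mod_flow approach njohnston mentions. This is not the patch under review; the bridge name, port number and DSCP value are illustrative assumptions.]

    # Hypothetical example: mark traffic entering br-int from one port with DSCP 26 (AF31).
    # mod_nw_tos takes the full ToS byte, i.e. DSCP << 2.
    import subprocess

    BRIDGE = "br-int"   # assumed integration bridge
    IN_PORT = 5         # assumed OpenFlow port number of the instance port
    DSCP = 26           # assumed DSCP value (AF31)

    flow = "table=0,priority=65,in_port=%d,actions=mod_nw_tos:%d,normal" % (IN_PORT, DSCP << 2)
    # equivalent to: ovs-ofctl add-flow br-int "table=0,priority=65,in_port=5,actions=mod_nw_tos:104,normal"
    subprocess.check_call(["ovs-ofctl", "add-flow", BRIDGE, flow])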
14:17:26 No, not up yet; we have about half the unit tests done, and we're trying to think very hard about making sure we don't screw up the flow table.
14:17:47 We hope to have parts up for review in the coming weeks.
14:17:51 njohnston: WIP patches could be of help
14:18:03 ack, coming weeks sounds nice
14:18:12 ajo: I don't like the 's' in that 'weeks'
14:18:13 Great, I'll work hard to get something up ASAP
14:18:14 :)
14:18:32 I say weeks because of vacation around the Thanksgiving holiday in the US
14:18:32 hello, is it the qos meeting now?
14:18:44 njohnston : please note that the flow tables could eventually get refactored into a different form, but I will be aware of the qos/dscp part
14:18:48 slaweq_work: right. we'll discuss lb bwlimit later
14:18:59 ihrachys: ok, thx
14:19:02 njohnston and not likely to happen this cycle unless required by the circumstances
14:19:02 ajo: Awesome
14:19:22 ok, so, let's move on to RBAC
14:19:25 ping hdaniel
14:19:33 ajo: pong
14:19:40 #link https://bugs.launchpad.net/neutron/+bug/1512587
14:19:40 Launchpad bug 1512587 in neutron "[RFE] Role-based Access Control for QoS policies" [Wishlist,Triaged] - Assigned to Haim Daniel (hdaniel)
14:19:47 hdaniel , could you update us on your findings? :)
14:20:19 ajo: So there's a dilemma regarding the qos shared field
14:20:51 correct,
14:21:19 as far as I understood from our last talk, the networks code emulates the --shared flag to create rbac entries, or to pull rbac entries as "True/False" on that field
14:21:27 over the api requests
14:21:35 so here, we have two options:
14:21:35 ajo: exactly.
14:21:51 1) we throw away the --shared flag on the policies, and solely rely on rbac
14:22:25 2) we introduce the compatibility layer (which is a bit of a PITA to keep), in kevinbenton's words: if you can get rid of shared now, do it :)
14:22:52 we would be introducing an incompatible change to integrations with qos,
14:23:05 ajo: they already have the field, we can't just drop it
14:23:08 but as it's still an experimental API, it could be ok
14:23:09 opinions?
14:23:20 experimental? where did we claim it?
14:24:01 ihrachys : that was my understanding , maybe I got it wrong :)
14:24:23 my belief is it's not experimental unless we make it clear
14:24:34 with warnings, or docs
14:24:44 ok
14:25:05 any other thoughts?
14:25:12 ajo: it's also not that much of a PITA since object properly will hide it
14:25:20 *property
14:25:26 hdaniel : I guess we may want to keep the shared flag then.
14:25:44 ajo, ihrachys: I gambled on that , so the current patch behaves that way
14:25:46 and what ihrachys suggests sounds like a good pattern, including a shared (settable) property on the object.
14:25:57 ok hdaniel :)
14:26:09 hdaniel: wise boy ;)
14:26:11 code_cleanups--
14:26:14 compatibility++
14:26:23 head_ache++
14:26:27 lol
14:26:33 hdaniel: rule#1: always select the most painful path
14:26:37 *select
14:27:02 ':)
14:27:23 ihrachys, ajo: will write that (with blood)
14:27:36 * ihrachys hopes not his blood
14:27:52 * ajo runs scared
14:27:57 ok :)
14:27:59 next topic
14:28:02 or
14:28:12 hdaniel , any other thing related to RBAC that could be important?
14:28:40 ajo: nope, but I'm 100% sure they'll appear after the submission -
14:29:11 ack :)
14:29:16 the devil is in the details...
14:29:21 ok so
14:29:25 #topic horizon integration
14:29:28 #link https://blueprints.launchpad.net/horizon/+spec/network-bandwidth-limiting-qos
14:29:48 masco is making progress on that front: https://review.openstack.org/#/c/247997/
14:29:51 thank you masco! ;)
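[editor's note on the --shared/RBAC discussion above: an illustrative sketch, not the actual Neutron object code, of how a policy can keep exposing a legacy 'shared' boolean while the information is stored as RBAC entries underneath, roughly the compatibility-layer option (2) that was kept. Class and field names here are assumptions.]

    class QosPolicyStub(object):
        """Toy object deriving the legacy 'shared' flag from RBAC entries."""

        _WILDCARD = {'action': 'access_as_shared', 'target_tenant': '*'}

        def __init__(self, rbac_entries=None):
            # each entry looks like {'action': ..., 'target_tenant': ...}
            self.rbac_entries = rbac_entries or []

        @property
        def shared(self):
            # shared == a wildcard "access_as_shared" RBAC entry exists
            return self._WILDCARD in self.rbac_entries

        @shared.setter
        def shared(self, value):
            if value and not self.shared:
                self.rbac_entries.append(dict(self._WILDCARD))
            elif not value:
                self.rbac_entries = [e for e in self.rbac_entries
                                     if e != self._WILDCARD]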
14:30:26 oh so cool
14:30:31 wow +521 lines
14:30:43 it's still not possible to create policies, but it's on its way
14:31:27 I guess we should all eventually start testing the patch
14:31:38 it's full of js magic. I bow before masco's greatness.
14:32:08 "Masco Kaliyamoorthy"++
14:32:30 ok
14:32:40 after this moment of awesomeness,
14:32:43 #topic slow moving things
14:32:55 hmmm
14:32:56 sorry
14:33:07 slaweq_work , you wanted to update on LB integration?
14:33:14 #undo
14:33:14 Removing item from minutes:
14:33:25 #topic LinuxBridge/qos integration
14:33:32 ajo: for now I'm working more on fullstack tests for linuxbridge
14:33:46 and then I will continue qos for linuxbridge
14:34:05 slaweq_work: I guess we can move the bwlimit part to Mitaka-2
14:34:26 ihrachys: if You said so :)
14:34:27 yes, also, there's ongoing work to refactor the linux bridge agent, so I guess it makes sense
14:34:54 ajo: fullstack does not really care about the internal structure of the agent since it runs processes
14:35:17 ihrachys : yes, but implementing qos cares about that :)
14:35:21 ihrachys: true, but I'm not an expert with fullstack tests so far
14:35:26 so it's better not to mess with a moving target :)
14:35:26 and I'm still learning it
14:35:41 I hope that at the end of this week I will push something to review :)
14:35:50 slaweq_work : I recommend you talk to jschwartz when available if you have questions
14:35:54 or amuller
14:36:05 ajo: it's better to get in before that other moving target has a chance ;)
14:36:08 yep, I was talking with amuller a few times
14:36:11 ihrachys : lol
14:36:18 slaweq_work: cool, I will review once smth reviewable is up
14:36:24 different philosophies :D
14:36:34 k, thx ihrachys
14:36:37 ok,
14:36:48 #topic slow moving topics
14:37:01 ping sc68cal (for traffic classification later)
14:37:13 first, bandwidth guarantee support,
14:37:51 I spent quite a bit of time investigating it,
14:38:03 ajo so for SR-IOV the NIC driver is not ready yet, at least for mellanox
14:38:03 in the context of OVS & LB, and a bit on sr-iov
14:38:22 moshele , ack, so the min-bw settings are still on the way, I guess
14:38:33 ajo: is it a scheduler thing?
14:38:35 for ovs I think that such things could be done quite easily with "tc"
14:38:45 ihrachys : technical side on neutron & scheduler thing, yes
14:38:47 ajo yes
14:39:12 especially when we are using a hybrid connection, then we can use tc with htb qdisc on those interfaces
14:39:22 On the technical side, it's not possible to manage bandwidth guarantees within a node for OVS solely based on openflow rules,
14:39:30 yes slaweq_work: tc works, I tried that
14:39:34 moshele: does any sr-iov nic support min_bw?
14:39:46 but I'm not happy with adding another layer of filtering when openflow rules could do that
14:39:54 irenab: I don't know
14:40:08 ajo: do we want to look into using the LB qos driver for the OVS agent? :)
14:40:09 for linuxbridge it could also be done with tc, but it will only work in one direction as we have only the tap interface to apply rules to
14:40:25 TL;DR: the arrangement of our openflow rules doesn't allow us to direct traffic flows through specific queues.
14:40:35 slaweq_work: that's not exactly correct :)
14:40:39 slaweq_work : tc is confusing
14:40:45 slaweq_work: what do you need more? we could think of extending the agent API for extensions
14:40:47 that was my initial understanding
14:41:08 The issue with bandwidth guarantees
14:41:22 ihrachys: what do I need more for what?
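[editor's note on the tc/htb idea mentioned above for the LinuxBridge case: a hedged sketch of shaping the traffic the host transmits on a tap device with an HTB qdisc, which as noted in the discussion covers one direction only. The device name and rate are assumptions.]

    # Hypothetical example: cap traffic on a tap device at 10 Mbit/s with HTB.
    import subprocess

    DEV = "tap0"        # assumed tap device of the instance
    RATE = "10mbit"     # assumed bandwidth limit

    subprocess.check_call(["tc", "qdisc", "replace", "dev", DEV,
                           "root", "handle", "1:", "htb", "default", "10"])
    subprocess.check_call(["tc", "class", "replace", "dev", DEV,
                           "parent", "1:", "classid", "1:10",
                           "htb", "rate", RATE, "ceil", RATE])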
14:41:35 is that you need to build hierarchical queues over a single interface, and then make one queue for every single traffic flow
14:41:57 in our case, the optimal point seems to be the connection between br-int and the external networks, or br-tun
14:41:57 slaweq_work: you mentioned you have 'only' the tap device, so you probably miss smth
14:42:15 ihrachys , it's a technical thing with the linux kernel, queues, and interfaces
14:42:26 you can only "queue" ingress traffic (to a bridge)
14:42:30 sorry
14:42:34 egress traffic from a bridge
14:42:38 ajo: exactly
14:42:38 I always change the direction :)
14:42:43 but
14:42:52 there's also a requirement to build queues hierarchically on a single port
14:42:58 so for example
14:43:05 * ihrachys 's head just blew
14:43:08 if we had the connection from br-int to br-tun
14:43:19 we could create a top queue that indicates the maximum bandwidth of that link
14:43:34 and then another queue under it to handle another flow,
14:43:36 another queue,
14:43:37 etc..
14:43:39 yes
14:43:41 it's mindblowing
14:44:02 slaweq_work : if we take both sides (br-int to br-tun , and br-tun to br-int) you can effectively control both paths
14:44:06 ajo: good to know that
14:44:14 and you also comply with having hierarchical queues
14:44:21 the bad part is
14:44:24 for ovs yes
14:44:33 yes, linuxbridge is a different story
14:44:39 we need somebody to look at how to handle that
14:44:42 but I didn't know that it is possible for linuxbridge (where there are no such bridges)
14:44:53 slaweq_work , probably you need another bridge
14:45:07 and a veth pair
14:45:08 deployed
14:45:24 as I said before: similar to the hybrid connection when You are using ovs bindings
14:45:27 ajo: meaning you need to rewire the network to enable it? that's kinda against rolling upgrade requirements.
14:45:36 so OVS has its issues (openflow rule arrangement is not optimal, and we may need to filter traffic again by mac/vlan),
14:45:43 linux bridge has its issues too
14:45:56 ihrachys , it's not a rolling upgrade in this case, it's installing a new service
14:46:08 ihrachys , in that case operators could take it, or leave it
14:46:22 but still IMHO if we do it directly with tc then it could be done in the same way for both agents
14:46:24 ihrachys : I say it could be an optional thing for LB
14:46:38 slaweq_work , yes, that's the good point of using TC
14:46:39 if there's another veth pair for lb
14:46:56 ajo: it's not a new service if you had it enabled before
14:46:57 sharing the implementation
14:47:05 ajo: we have it in L
14:47:15 ihrachys , hmm, true, but not for LB
14:47:44 ihrachys , btw, that's only for bandwidth guarantees, maybe LB won't be able to support bandwidth guarantees without such a configuration change
14:47:54 ajo: indeed not for LB. though once we merge slaweq_work's patch, it affects that too
14:48:02 or maybe slaweq_work is able to find a workaround for it
14:48:03 :)
14:48:08 I'm not pushing to make this now btw
14:48:09 :)
14:48:14 ajo: ok, I need to think it thru
14:48:22 I'm just sharing the facts, and saying: this is not for now, we're not ready :)
14:48:27 me too
14:48:43 I have a half-cooked post about the topic I never finished
14:48:54 I guess I should finish it, and push the publish button
14:48:56 ajo: looking fwd to the post
14:49:03 slaweq_work , ihrachys , will ping you
14:49:04 also
14:49:06 * njohnston too
14:49:09 ok should we move?
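[editor's note on the hierarchical-queue idea for OVS described above: a hedged sketch of creating an HTB QoS object with a top-level max-rate on the br-int/br-tun link and one queue with a min-rate guarantee, plus a flow that steers traffic into that queue. The port name, rates and the VLAN-based flow match are illustrative assumptions, not the eventual design.]

    # Hypothetical example: 1 Gbit/s link cap with a 200 Mbit/s guarantee for one flow.
    import subprocess

    PORT = "patch-tun"          # assumed br-int side of the br-int <-> br-tun link
    LINK_MAX = "1000000000"     # assumed link capacity (bits/s)
    MIN_RATE = "200000000"      # assumed guaranteed rate for one traffic flow (bits/s)

    subprocess.check_call([
        "ovs-vsctl",
        "set", "port", PORT, "qos=@qos", "--",
        "--id=@qos", "create", "qos", "type=linux-htb",
        "other-config:max-rate=" + LINK_MAX, "queues:1=@q1", "--",
        "--id=@q1", "create", "queue", "other-config:min-rate=" + MIN_RATE,
    ])
    # Steer the guaranteed traffic (matched here by VLAN tag, as an assumption) into queue 1.
    subprocess.check_call([
        "ovs-ofctl", "add-flow", "br-int",
        "table=0,priority=10,dl_vlan=101,actions=set_queue:1,normal",
    ])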
14:49:20 this technical discussion above ^ is for the in-compute-node bandwidth guarantees
14:49:29 SR-IOV: no go, OVS: no go , LB: no go (yet)
14:49:38 also, we have the scheduling bits
14:49:51 ajo: it will be a long road for sure
14:49:59 we should collaborate with nova to send information to the scheduler, and influence scheduling decisions
14:50:11 because otherwise there will be ports which cannot be bound because we don't have enough BW
14:50:14 on a compute node
14:50:19 over a specific network
14:50:38 I'm currently working on a spec to keep that scheduler discussion moving
14:51:12 ajo: let me know if you need help with that
14:51:12 ajo++ for taking the burden of working with the nova project on that hard bite
14:51:14 and irenab and I thought that it's probably a good thing to start, at least, collecting on our side the available link bandwidth related to each physical network on every compute host/network node
14:51:28 ihrachys : my teeth are hurting :D
14:51:41 moshele , ihrachys , will loop you in on the spec
14:51:45 ajo: :-)
14:51:51 I will announce it next meeting
14:51:55 in two weeks ;)
14:52:13 we're tight on time
14:52:20 let's move to the next topic
14:52:28 #topic traffic classification
14:52:52 can we spend a few mins on bugs?
14:52:59 I know that work was making progress, but it will probably live in a separate library for reuse from other projects
14:53:02 sc68cal was leading that
14:53:11 yep irenab , I think it's a good idea
14:53:13 let's jump on that
14:53:18 #topic Bugs
14:53:32 We have "Update network with New Qos-Policy isn't working with SR-IOV agent" - https://bugs.launchpad.net/neutron/+bug/1504166
14:53:32 Launchpad bug 1504166 in neutron "Update network with New Qos-Policy isn't working with SR-IOV agent" [Undecided,In progress] - Assigned to yalei wang (yalei-wang)
14:53:46 moshele , ihrachys , you were handling it , right?
14:53:54 was I? oh
14:54:00 I will
14:54:05 oh
14:54:09 sorry
14:54:12 Yalei Wang sent a patch: https://review.openstack.org/#/c/233499/
14:54:30 let's make sure we get it reviewed
14:54:37 #link https://review.openstack.org/#/c/233499/
14:54:53 it's WIP
14:54:56 there's this one on me:
14:54:57 https://review.openstack.org/#/c/233499/
14:55:12 does anybody have the bandwidth to make those API failures nicer?
14:55:33 isn't that the same link as you mentioned above re: Yalei Wang?
14:55:35 I'm removing the assignee and letting other volunteers eventually take it
14:55:39 since it's not realistic that I finish that
14:55:45 ajo: link wrong?
14:55:49 oh
14:55:49 sorry
14:55:56 #link https://bugs.launchpad.net/neutron/+bug/1496787
14:55:56 Launchpad bug 1496787 in neutron "If qos service_plugin is enabled, but ml2 extension driver is not, api requests attaching policies to ports or nets will fail with an ugly exception" [Low,Confirmed]
14:56:27 we also have this one: https://bugs.launchpad.net/neutron/+bug/1486607
14:56:27 Launchpad bug 1486607 in neutron "tenants seem like they were able to detach admin enforced QoS policies from ports or networks" [Low,In progress] - Assigned to yong sheng gong (gongysh)
14:56:39 I see core resource extension manager mentioned... I feel guilty now.
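[editor's note on the scheduling discussion above: a purely illustrative sketch of the constraint being described, i.e. a host can only take a port with a bandwidth guarantee if it still has enough spare bandwidth on the relevant physical network. All names and numbers are assumptions; no such Nova/Neutron API existed at the time of the meeting.]

    def host_can_fit(free_bw_by_physnet, physnet, requested_min_bps):
        """Return True if the host has enough spare bandwidth on the physnet."""
        return free_bw_by_physnet.get(physnet, 0) >= requested_min_bps

    # Example: a compute node reporting 10 Gbit/s still free on 'physnet1'
    reported = {"physnet1": 10 * 10**9}
    print(host_can_fit(reported, "physnet1", 2 * 10**9))   # True
    print(host_can_fit(reported, "physnet2", 1 * 10**9))   # False: unknown physnet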
14:56:45 totally forgot about that beast
14:57:01 ihrachys , np, I know you're perfectly capable of handling it :)
14:57:12 I will probably take that one for now
14:57:13 and it's partly coupled to that objectization of neutron core resources
14:57:18 ajo: I can check https://bugs.launchpad.net/neutron/+bug/1496787 if it is not a problem :)
14:57:18 Launchpad bug 1496787 in neutron "If qos service_plugin is enabled, but ml2 extension driver is not, api requests attaching policies to ports or nets will fail with an ugly exception" [Low,Confirmed]
14:57:28 slaweq_work : thanks a lot, that'd be great
14:57:33 ok
14:57:48 slaweq_work: ok, keep me in the loop, I may have silly ideas about it
14:57:48 great, thx :)
14:57:57 ihrachys: ok
14:58:06 and we also have this other one: https://bugs.launchpad.net/neutron/+bug/1509232
14:58:06 Launchpad bug 1509232 in neutron "If we update a QoSPolicy description, the agents get notified and rules get rewired for nothing" [Medium,Confirmed] - Assigned to Irena Berezovsky (irenab)
14:58:35 I checked this bug, need some advice regarding the level at which to filter the change
14:58:38 it's not of high importance
14:58:42 ajo: for 1486607 I believe the best way is adding tenant_id to the qos rule models
14:58:50 irenab : probably in the notification driver,
14:58:59 hmmm
14:59:08 ajo: ihrachys : will ping you on the channel to discuss the alternatives
14:59:14 but the notification driver probably has no idea about what changed
14:59:21 irenab , ping me and let's look at it together
14:59:33 ajo: great, thanks
14:59:35 ajo: yeah, it does not care. so it belongs in the plugin
14:59:40 ihrachys : probably, I agree
14:59:47 ihrachys: or in the agent
15:00:01 ok, next time I probably must make better use of the meeting time :)
15:00:05 #endmeeting