15:02:22 #startmeeting neutron_qos
15:02:22 hi
15:02:22 Meeting started Tue Jan 31 15:02:22 2017 UTC and is due to finish in 60 minutes. The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:23 Hello :)
15:02:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:25 The meeting name has been set to 'neutron_qos'
15:02:39 #topic Announcements
15:02:45 :q
15:02:49 Qos driver is finally in shape and refactor merged
15:02:53 great!!!
15:03:04 \o/
15:03:08 hi njohnston
15:03:21 #link https://review.openstack.org/#/c/396651/
15:03:22 sorry for being sooooo slow :/
15:03:30 We shall ping the relevant people on midokura (yamamoto), ovn (russelb),
15:03:30 nsx (garyk) to see if they need any help migrating to the new
15:03:30 model, the old notification_driver support will be removed in Pike
15:03:53 #action ajo ping all qos-driver implementers to update their drivers in pike, change shall be easy,
15:04:30 Now slaweq_ is working on the 2nd thing: Enhanced validation
15:04:30 #link https://review.openstack.org/#/c/426946/
15:04:35 for Pike
15:05:03 I will do my best to finish it asap
15:05:20 I'll review this patch tomorrow and I'll follow it
15:05:21 The structural changes made him reboot his original patch
15:05:25 But I hope the new interface will help
15:05:25 now it should be easier than it was before Your refactor
15:05:49 yes, before it was becoming a bit too complicated, let's see how it will look now
15:06:00 as I was looking into it yesterday it should be easier but we will see :)
15:06:37 My colleagues always fear me when I say it will be easy, it never is :) right dalvarez ? :P
15:07:02 I know, that's why I wrote "it should be" :P
15:07:02 so... I have another point for documentation
15:07:20 'XD
15:07:21 #topic Documentation
15:07:26 Networking guide needs to be updated about minimum egress bandwidth
15:07:26 support.
15:07:27 #link https://bugs.launchpad.net/neutron/+bug/1618769
15:07:28 #link https://bugs.launchpad.net/neutron/+bug/1618762
15:07:28 Launchpad bug 1618769 in openstack-manuals " SR-IOV: add agent QoS driver to support egress minimum bandwidth" [Low,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:07:30 Launchpad bug 1618762 in openstack-manuals " Add QoS minimum bandwidth rule for instance egress traffic" [Low,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:07:43 I planned to work on that
15:07:47 it shall be easy
15:08:00 but we must have it ready for the release, this was on Newton I think, but we forgot ':D
15:08:23 I can take one of this
15:08:25 ajo, lol im scared now
15:08:28 one of these
15:08:33 X) dalvarez
15:08:56 ralonsoh it's the same one I think, not sure if we need to do anything special for SR-IOV
15:09:05 No, nothing special
15:09:21 one was generated for the "api changes" and the other one for docimpact on the SR-IOV
15:09:27 ahhh I see
15:09:28 so one commit shall close it all, may by I can write, and I can ping you for reviews ralonsoh ?
15:09:34 yes
15:09:38 may by->may be
15:09:39 :)
15:09:49 o/
15:10:00 hi reedip_ I've got a point for you later ;)
15:10:15 :D ok ajo
15:10:21 #topic Bugs
15:10:32 #link https://bugs.launchpad.net/neutron/+bug/1649503
15:10:32 Launchpad bug 1649503 in neutron "Mechanism driver can't be notified with updated network" [High,In progress] - Assigned to Hong Hui Xiao (xiaohhui)
15:10:32 We need kevinbenton's help on this one
15:10:47 it's related to some recent change, now when you detach a policy from a network, that's not notified properly
15:11:24 this looks like a high priority bug (in the context of QoS)
15:11:41 but it's been waiting for action for quite a long time now
15:11:54 maybe You can ask kevinbenton to look at it? :)
15:12:00 yes, I will,
15:12:04 I started by adding him to the bug
15:12:08 I will ping him personally
15:12:48 #action ajo ping kevinbenton about bug 1649503
15:12:48 bug 1649503 in neutron "Mechanism driver can't be notified with updated network" [High,In progress] https://launchpad.net/bugs/1649503 - Assigned to Hong Hui Xiao (xiaohhui)
15:13:26 #link https://bugs.launchpad.net/neutron/+bug/1627749 better error handling
15:13:26 Launchpad bug 1627749 in neutron "qos driver api can have better error handling" [Medium,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:13:38 there was some work around that on the qos driver refactor, but not sure if enough
15:13:41 * ajo looks for the link
15:14:10 https://review.openstack.org/#/c/396651/29/neutron/services/qos/drivers/manager.py@69
15:14:15 may be this is not enough to close that bug
15:14:44 shall we probably, let all drivers be called, and then raise the exception after rpc push happens ?
15:15:02 thoughts?
15:15:22 I don't think so
15:15:28 we don't have yamamoto here
15:15:31 I prefer the way it is now implemented
15:15:53 ralonsoh the problem with current implementation, is that if one driver fails, the others aren't called at all
15:15:58 may be some are called, some are not
15:15:59 I know
15:16:01 depending on the order
15:16:04 but no driver should fail
15:16:07 it's very non-deterministic
15:16:16 that is true
15:16:29 if a driver fails it's its responsibility to retry later, and resync...
15:16:30 but
15:16:40 a bad implementation of one, should not affect the others, or the rpc
15:16:56 it's not bad if we are more robust in the face of external errors
15:17:20 unless we find a good reason why that would be problematic
15:17:38 ok, that's a new patch!
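Editor's sketch of the idea floated between 15:14:44 and 15:17:38 (call every driver, push the RPC, and only then raise). This is a minimal illustration under stated assumptions, not the code in the linked review: `call_all_drivers`, `DriverCallError`, the `name` attribute and the `push_rpc` callback are all hypothetical names invented for this example.

```python
class DriverCallError(Exception):
    """Hypothetical error raised once every driver has been tried."""

    def __init__(self, failures):
        # failures is a list of (driver_name, exception) pairs.
        self.failures = failures
        names = ", ".join(name for name, _ in failures)
        super().__init__("QoS drivers failed: " + names)


def call_all_drivers(drivers, method_name, context, policy, push_rpc):
    """Call method_name on every driver, push the RPC, then report failures.

    A failing driver does not prevent the remaining drivers or the RPC
    push from running; the collected errors are raised at the very end.
    """
    failures = []
    for driver in drivers:
        try:
            getattr(driver, method_name)(context, policy)
        except Exception as exc:  # assumption: any driver error is collected
            failures.append((driver.name, exc))
    # The RPC push happens regardless of individual driver failures.
    push_rpc(context, policy)
    if failures:
        raise DriverCallError(failures)
```

With this shape the outcome no longer depends on driver ordering, which is the non-determinism raised at 15:16:07; whether the extra complexity is worth it is the open question in the log.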
15:17:55 (code being more complex can be a good reason against it, but I believe it shouldn't be very complicated *warning on perceived complexity O:)* )
15:18:23 ralonsoh I will try to put a new patch for this, see how it looks, if code goes weird I'm ok to drop it
15:18:38 ok
15:19:21 * ajo looks at https://bugs.launchpad.net/neutron/+bugs?field.tag=qos
15:19:23 #link https://bugs.launchpad.net/neutron/+bug/1657381
15:19:23 Launchpad bug 1657381 in neutron "QoS drivers need to implement a precommit for the actions" [Medium,In progress] - Assigned to Miguel Angel Ajo (mangelajo)
15:19:32 This one, after a thought, is not very clear to me that it's necessary
15:19:56 since in the end, drivers are not supposed to fail on any policy modification
15:20:04 may be what's bad for them is good for other drivers
15:20:17 and that will be handled by the enhanced validation
15:20:35 bad for them = "the specific driver can't handle it"
15:21:00 so, the fact that ODL for example writes to a log, and then sends the log, is implementation detail, they can do that in the one existing call
15:21:20 and if they fail, it's their responsibility, again, to retry to sync it later
15:21:29 ralonsoh thoughts?
15:21:45 I wish I had an ODL'r here :D
15:21:57 the point is why they need those calls?
15:22:16 they don't implement any action for them
15:22:32 ralonsoh exactly, and, even if they need them, if we're not going to allow exceptions happening to stop the db transactions I see no benefit
15:22:32 I think that maybe we should make it work with any plugin, not only ML2 which doesn't use such precommit
15:22:34 but I'll review ODL code
15:22:47 so maybe it's worth doing it if some plugins need it
15:23:01 slaweq_ if there's good justification, right
15:23:21 but they still failed to justify the need
15:23:27 or may be they did, but I didn't get it
15:24:28 #link https://bugs.launchpad.net/neutron/+bug/1639186
15:24:28 Launchpad bug 1639186 in neutron "qos max bandwidth rules not working for neutron trunk ports" [Low,Confirmed] - Assigned to Luis Tomas Bolivar (ltomasbo)
15:24:30 ltomasbo ^
15:24:53 any advance on this? did you look at russellb comments here https://bugs.launchpad.net/neutron/+bug/1639186/comments/11 ?
15:25:24 I'm pinging him
15:25:45 ralonsoh russellb says they use queues, and they steer the traffic using OF rules
15:25:47 hi ajo
15:25:55 hi ltomasbo :)
15:25:59 no, I did not take any further action on that
15:26:20 ralonsoh : so may be that will be doable when we have min bw and queues in OVS. ?
15:26:41 ltomasbo may be I should detach you from the bug for now, just in case anyone else wants to step up for it?
15:26:49 but we still don't have min in OVS
15:27:08 ajo: sure! please do
15:27:20 ralonsoh correct, but, it would be similar to what you did in LB, and then we could get rid of this bug with trunk ports
15:27:52 ajo: I tried to do this, but we don't have the IFB like in Linux Bridge
15:28:13 ajo: I tried to figure out how to implement this, without any luck...
15:28:21 ralonsoh yes, we'd need to rely on queues, etc... all the things we talked about, *not easy*
15:28:30 ok
15:28:58 let's revisit it on pike and try to get it going. Will you be around on the PTG?
15:29:10 yes, we can talk about his in the PTG
15:29:23 s/his/this
15:30:14 ralonsoh we also have the one related to router GW ports
15:30:20 I can't find it now
15:30:51 https://review.openstack.org/#/c/425218/
15:30:53 this one?
15:31:02 #link https://review.openstack.org/#/c/425218/
15:31:27 oh, I must review it again
15:31:27 yes, I know. kevinbenton told us to implement it
15:32:00 and https://review.openstack.org/#/c/425280/
15:32:40 #link https://review.openstack.org/#/c/425280/
15:33:21 please folks review those two ^
15:33:22 thanks ralonsoh !!
15:33:23 any bug I'm missing?
15:34:01 3,
15:34:09 2,
15:34:17 1,
15:34:28 #topic RFEs
15:34:53 So, after the validation, for pike, we have in pipeline:
15:34:54 * instance ingress bw limiting,
15:35:11 yes, it waits for improved validation :)
15:35:22 * strict minimum bw (integration with nova placement api)
15:35:31 I believe bits are in place for those two
15:35:47 cool!
15:36:01 we can think of VLAN 802.1p if somebody is willing to take it, it shall be easy
15:36:29 ajo : link ?
15:37:10 do you mean ECN?
15:37:21 #link https://bugs.launchpad.net/neutron/+bug/1560961. instance-ingress
15:37:21 Launchpad bug 1560961 in neutron "[RFE] Allow instance-ingress bandwidth limiting" [Wishlist,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:37:22 #link https://bugs.launchpad.net/neutron/+bug/1578989 strict minimum bw
15:37:23 nevermind, got it : https://bugs.launchpad.net/neutron/+bug/1505631
15:37:23 Launchpad bug 1578989 in neutron "[RFE] Strict minimum bandwidth support (egress)" [Wishlist,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:37:24 Launchpad bug 1505631 in neutron "[RFE] QoS VLAN 802.1p Support" [Wishlist,Confirmed] - Assigned to Kannan Raman (kannanrc20)
15:37:33 #link https://bugs.launchpad.net/neutron/+bug/1505631
15:37:34 ralonsoh: There is another for mapping dscp to v-lan pcp
15:37:48 yes, we need to talk about that one
15:38:00 but I wanted to talk about reedip's ECN proposal again
15:38:09 he updated the etherpad and I couldn't look at it until today
15:38:17 #link https://etherpad.openstack.org/p/QoS_ECN
15:38:52 reedip_ : most details make sense, but we can't control the VM internal settings
15:39:04 ajo : hmm, okay
15:39:20 reedip_ it will be OS dependent, etc... and it's no business of neutron or nova to tweak anything inside the VMs
15:39:21 we can provide documentation, or heat templates for that
15:39:25 in linux I believe it's on by default
15:39:27 may be I'm wrong
15:39:54 ajo : in linux. no , ECN has to be enabled
15:40:14 aha, so we may want to provide documentation to let people do that
15:40:17 I think yes, we can provide heat templates for that / documentation if required
15:40:19 I thought It was on by default
15:40:27 so
15:40:46 the proposal, has an API that is a bit out of how we handle everything else in QoS plugin now
15:40:57 It'd be great if we can find a way to fit it in the current model
15:41:20 ajo : yes, I saw that comment, would try to integrate it , but need to look more into it .
15:41:21 may be, if the change to routers is not invasive,
15:41:22 (or changes performance)
15:41:23 we could just implement that in the l3 agent
15:41:24 and have it enabled by default
15:41:48 So if the ECN bit is set and it's not enabled, the packet isn't echo'ed with the receive bit set right??
15:42:46 I believe that what reedip_ proposes is that our routers will set the flag on the outgoing packet, that then will be echoed by the receiving machine back to the sender
15:42:50 davidsha : you mean if one router has detected congestion, but one of the end points ( a VM for example ) doesn't have ECN enabled, then YES, the ACK packet back to the sender won't have the ECN received bit set
15:43:21 VM1 sends packet
15:43:22 packet crosses router (congested one)
15:43:23 reedip_: ack, thanks!
15:43:23 router sets flag on packet
15:43:24 packet arrives VM2
15:43:35 VM2 sends the echo bit on next TCP frame to sender
15:43:43 so
15:43:48 missing details for this proposal are:
15:43:53 1) how to do it in the neutron routers
15:44:02 2) ways to fit this on the current model
15:44:24 for 2, may be we don't even need it, but if we need conditional enablement of ECN on the routers, we could, for example:
15:44:43 let admins set policies with an ECN rule in the external and internal networks of the router
15:44:55 and when at least one internal network, and the external network provide ECN, we ECN-enable the router
15:45:02 ajo : ack
15:45:20 this would require integration of QoS in l3, but now l3 has a framework to add extensions, it should be doable
15:45:21 thanks njohnston !! ;)
15:45:33 * njohnston bows
15:45:44 thanks njohnston :D
15:46:01 reedip_ I don't know about 1, if you show me a viable POC of how to do it in the low level (qrouter), then we can formalize this RFE properly
15:46:24 ajo : yep, this is an Action Item for me before the next meeting
15:46:47 #action reedip ECN RFE refinement :)
15:47:20 davidsha did you want to talk about the DSCP/VLAN mapping ?
15:47:31 may be we should cover first the basic VLAN rules
15:47:49 I asked the submitter for more details and never got an answer
15:47:50 ajo : meanwhile , I think I can volunteer on the 802.1p if it's available
15:47:51 ajo: kk, it wan't my rfe I'd just read it previously ;)
15:48:08 wasn't*
15:49:23 reedip_ https://bugs.launchpad.net/neutron/+bug/1505631 It seems like you could revive it in Pike, it's postponed
15:49:23 Launchpad bug 1505631 in neutron "[RFE] QoS VLAN 802.1p Support" [Wishlist,Confirmed] - Assigned to Kannan Raman (kannanrc20)
15:49:30 which means, it was approved, but we were waiting on other stuff, or we had no hands :)
15:49:42 Just for clarity on this going forward, this would mean we'd have the dscp marking rule, a pcp marking rule and a "traffic class" marking rule correct?
15:50:22 Yes, I will take it up ajo for Pike
15:50:30 davidsha what was pcp? %)
15:51:13 ajo: the vlan version of dscp, it stands for Priority Code Point
15:51:20 ahh,
15:51:28 that would be then 2 ?
15:51:31 1) DSCP
15:51:40 2) VLAN
15:52:08 There was another RFE for mapping over DSCP and PCP if I recall
15:52:44 so PCP is a 0-7 value and dscp is mapped to 0-7 traffic classes.
15:53:23 davidsha, let's look at that when we have vlan, but ye
15:53:41 at that point, DSCP marking rules would be incompatible with mapping rules
15:53:58 or am I getting it wrong?
15:53:59 do you mean
15:54:07 mapping the DSCP flags when going over vlan? or what ?
15:54:18 that RFE we mentioned before?
15:54:27 ajo: kk, I'm not sure who proposed it but I recall reading the RFE.
15:55:21 davidsha link, or bring it to next meeting :)
15:55:27 ajo: It was a while since I read it so I'm not entirely sure.
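Editor's sketch of the DSCP/PCP relationship mentioned at 15:52:44. The RFE itself was not located during the meeting, so this only illustrates one commonly used default, assumed here for the example: taking the top three bits of the 6-bit DSCP value (the class selector bits) as the 3-bit 802.1p PCP. The function name `dscp_to_pcp` is hypothetical and is not a Neutron API.

```python
def dscp_to_pcp(dscp):
    """Map a 6-bit DSCP value (0-63) to a 3-bit PCP value (0-7).

    Assumption: the PCP is taken from the three most significant DSCP
    bits, so every DSCP value inside one class selector range shares
    the same PCP.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp >> 3


# Expedited Forwarding (DSCP 46, binary 101110) lands in PCP 5.
print(dscp_to_pcp(46))
```

A real mapping rule would more likely be a configurable table than a fixed shift; the shift is only the simplest instance of "DSCP is mapped to 0-7 traffic classes".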
15:55:34 anyway, unless it is of special interest to you, or anyone is willing to code it, I suspect we have enough on our plate for Pike already :)
15:55:50 ack
15:56:06 we have the openflow pipeline in the queue too, which could be cool to fix for Pike too :)
15:56:08 one more: https://bugs.launchpad.net/neutron/+bug/1639220
15:56:08 Launchpad bug 1639220 in neutron "[RFE] Default action for RBAC" [Undecided,New] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:56:12 this one is very short
15:56:13 oh right
15:56:13 +1
15:56:25 just waiting your reply
15:57:32 * ajo reads
15:58:00 yes, having an api to set a per-tenant default could make sense
15:58:08 a lot of sense actually
15:58:47 I'm trying to add tags, but it ignores me
15:58:56 ralonsoh can you add #qos and #rfe tags?
15:59:20 ralonsoh we would need that triaged by the drivers meeting
15:59:28 but, makes sense to me, I'm going to comment
15:59:32 and close the meeting, we have 30 secs
15:59:32 ajo: done
15:59:54 #endmeeting