15:02:22 <ajo> #startmeeting neutron_qos
15:02:22 <davidsha> hi
15:02:22 <openstack> Meeting started Tue Jan 31 15:02:22 2017 UTC and is due to finish in 60 minutes. The chair is ajo. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:23 <ajo> Hello :)
15:02:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:25 <openstack> The meeting name has been set to 'neutron_qos'
15:02:39 <ajo> #topic Announcements
15:02:45 <njohnston> :q
15:02:49 <ajo> Qos driver is finally in shape and refactor merged
15:02:53 <ralonsoh> great!!!
15:03:04 <slaweq_> \o/
15:03:08 <davidsha> hi njohnston
15:03:21 <ajo> #link https://review.openstack.org/#/c/396651/
15:03:22 <ajo> sorry for being sooooo slow :/
15:03:30 <ajo> We shall ping the relevant people on midokura (yamamoto), ovn (russelb),
15:03:30 <ajo> nsx (garyk) to see if they need any help migrating to the new
15:03:30 <ajo> model, the old notification_driver support will be removed in Pike
15:03:53 <ajo> #action ajo ping all qos-driver implementers to update their drivers in pike, change shall be easy,
15:04:30 <ajo> Now slaweq_ is working on the 2nd thing: Enhanced validation
15:04:30 <ajo> #link https://review.openstack.org/#/c/426946/
15:04:35 <ajo> for Pike
15:05:03 <slaweq_> I will do my best to finish it asap
15:05:20 <ralonsoh> I'll review this patch tomorrow and I'll follow it
15:05:21 <ajo> The structural changes made him reboot his original patch
15:05:25 <ajo> But I hope the new interface will help
15:05:25 <slaweq_> now it should be easier than it was before Your refactor
15:05:49 <ajo> yes, before it was becoming a bit too complicated, let's see how it will look now
15:06:00 <slaweq_> as I was looking into it yesterday it should be easier but we will see :)
15:06:37 <ajo> My colleagues always fear me when I say it will be easy, it never is :) right dalvarez ? :P
15:07:02 <slaweq_> I know, that's why I wrote "it should be" :P
15:07:02 <ajo> so...
I have another point for documentation
15:07:20 <ajo> 'XD
15:07:21 <ajo> #topic Documentation
15:07:26 <ajo> Networking guide needs to be updated about minimum egress bandwidth
15:07:26 <ajo> support.
15:07:27 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1618769
15:07:28 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1618762
15:07:28 <openstack> Launchpad bug 1618769 in openstack-manuals " SR-IOV: add agent QoS driver to support egress minimum bandwidth" [Low,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:07:30 <openstack> Launchpad bug 1618762 in openstack-manuals " Add QoS minimum bandwidth rule for instance egress traffic" [Low,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:07:43 <ajo> I planned to work on that
15:07:47 <ajo> it shall be easy
15:08:00 <ajo> but we must have it ready for the release, this was on Newton I think, but we forgot ':D
15:08:23 <ralonsoh> I can take one of this
15:08:25 <dalvarez> ajo, lol im scared now
15:08:28 <ralonsoh> one of these
15:08:33 <ajo> X) dalvarez
15:08:56 <ajo> ralonsoh it's the same one I think, not sure if we need to do anything special for SR-IOV
15:09:05 <ralonsoh> No, nothing special
15:09:21 <ajo> one was generated for the "api changes" and the other one for docimpact on the SR-IOV
15:09:27 <ralonsoh> ahhh I see
15:09:28 <ajo> so one commit shall close it all, may by I can write, and I can ping you for reviews ralonsoh ?
15:09:34 <ralonsoh> yes
15:09:38 <ajo> may by->may be
15:09:39 <ajo> :)
15:09:49 <reedip_> o/
15:10:00 <ajo> hi reedip_ I've got a point for you later ;)
15:10:15 <reedip_> :D ok ajo
15:10:21 <ajo> #topic Bugs
15:10:32 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1649503
15:10:32 <openstack> Launchpad bug 1649503 in neutron "Mechanism driver can't be notified with updated network" [High,In progress] - Assigned to Hong Hui Xiao (xiaohhui)
15:10:32 <ajo> We need kevinbenton's help on this one
15:10:47 <ajo> it's related to some recent change, now when you detach a policy from a network, that's not notified properly
15:11:24 <ajo> this looks like a high priority bug (in the context of QoS)
15:11:41 <slaweq_> but it's waiting for action for quite long time now
15:11:54 <slaweq_> maybe You can ask kevinbenton to look on it? :)
15:12:00 <ajo> yes, I will,
15:12:04 <ajo> I started by adding him to the bug
15:12:08 <ajo> I will ping him personally
15:12:48 <ajo> #action ajo ping kevinbenton about bug 1649503
15:12:48 <openstack> bug 1649503 in neutron "Mechanism driver can't be notified with updated network" [High,In progress] https://launchpad.net/bugs/1649503 - Assigned to Hong Hui Xiao (xiaohhui)
15:13:26 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1627749 better error handling
15:13:26 <openstack> Launchpad bug 1627749 in neutron "qos driver api can have better error handling" [Medium,Confirmed] - Assigned to Miguel Angel Ajo (mangelajo)
15:13:38 <ajo> there was some work around that on the qos driver refactor, but not sure if enough
15:13:41 * ajo looks for the link
15:14:10 <ajo> https://review.openstack.org/#/c/396651/29/neutron/services/qos/drivers/manager.py@69
15:14:15 <ajo> may be this is not enough to close that bug
15:14:44 <ajo> shall we probably, let all drivers be called, and then raise the exception after rpc push happens ?
15:15:02 <ajo> thoughts?
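The option raised at 15:14:44 (call every driver, do the RPC push, and only then raise) could look roughly like the sketch below. This is not the code in the linked manager.py; `call_all_drivers`, `DriverCallError`, the `name` attribute, and the `rpc_push` callable are hypothetical names used only for illustration.

```python
class DriverCallError(Exception):
    """Raised once every driver has been called, listing those that failed."""

    def __init__(self, failures):
        # failures is a list of (driver_name, exception) pairs (hypothetical shape)
        self.failures = failures
        names = ", ".join(name for name, _ in failures)
        super().__init__(f"QoS drivers failed: {names}")


def call_all_drivers(drivers, method_name, policy, rpc_push):
    """Call method_name on every driver, push over RPC, then report failures.

    A failing driver no longer prevents later drivers or the RPC push from
    running, so the outcome does not depend on driver ordering.
    """
    failures = []
    for driver in drivers:
        try:
            getattr(driver, method_name)(policy)
        except Exception as exc:  # deliberately broad: isolate third-party drivers
            failures.append((driver.name, exc))

    # The push happens regardless of individual driver failures.
    rpc_push(policy)

    if failures:
        raise DriverCallError(failures)
```

The trade-off is the one noted in the discussion: the caller still sees an error, but a badly behaved driver cannot starve the other drivers or the RPC push.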
15:15:22 <ralonsoh> I don't think so
15:15:28 <ajo> we don't have yamamoto here
15:15:31 <ralonsoh> I prefer the way is now implemented
15:15:53 <ajo> ralonsoh the problem with current implementation, is that if one driver fails, the others aren't called at all
15:15:58 <ajo> may be some are called, some are not
15:15:59 <ralonsoh> I know
15:16:01 <ajo> depending on the order
15:16:04 <ralonsoh> but no driver should fail
15:16:07 <ajo> it's very undeterministic
15:16:16 <ajo> that is true
15:16:29 <ajo> if a driver fails it's it's responsibility to retry later, and resync...
15:16:30 <ajo> but
15:16:40 <ajo> a bad implementation of one, should not affect the others, or the rpc
15:16:56 <ajo> it's not bad if we are more robust in the face of external errors
15:17:20 <ajo> unless we find a good reason why that would be problematic
15:17:38 <ralonsoh> ok, that's a new patch!
15:17:55 <ajo> (code being more complex can be a good reason against it, but I believe it shouldn't be very complicated *warning on perceived complexity O:)* )
15:18:23 <ajo> ralonsoh I will try to put a new patch for this, see how it looks, code goes weird I'm ok to drop it
15:18:38 <ralonsoh> ok
15:19:21 * ajo looks at https://bugs.launchpad.net/neutron/+bugs?field.tag=qos
15:19:23 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1657381
15:19:23 <openstack> Launchpad bug 1657381 in neutron "QoS drivers need to implement a precommit for the actions" [Medium,In progress] - Assigned to Miguel Angel Ajo (mangelajo)
15:19:32 <ajo> This one, after a thought, is not very clear to me that it's necessary
15:19:56 <ajo> since in the end, drivers are not supposed to fail on any policy modification
15:20:04 <ajo> may be what's bad for them is good for other drivers
15:20:17 <ajo> and that will be handled by the enhanced validation
15:20:35 <ajo> bad for them = "the specific driver can't handle it"
15:21:00 <ajo> so, the fact that ODL for example writes to a log, and then sends the log, is implementation
detail, they can do that in the one existing call
15:21:20 <ajo> and if they fail, it's their responsibility, again, to retry to sync it later
15:21:29 <ajo> ralonsoh thoughts?
15:21:45 <ajo> I wish I had an ODL'r here :D
15:21:57 <ralonsoh> the point is why they need those calls?
15:22:16 <ralonsoh> they don't implement any action for them
15:22:32 <ajo> ralonsoh exactly, and, even if they need them, if we're not goint to allow exceptions happening to stop the db transactions I see no benefit
15:22:32 <slaweq_> I think that maybe we should make it working with any plugin, not only ML2 which not uses such precommit
15:22:34 <ralonsoh> buy I'll review ODL code
15:22:47 <slaweq_> so maybe it's worth to do it if some plugins needs it
15:23:01 <ajo> slaweq_ if there's good justification, right
15:23:21 <ajo> but they still failed to justify the need
15:23:27 <ajo> or may be they did, but I didn't got it
15:24:28 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1639186
15:24:28 <openstack> Launchpad bug 1639186 in neutron "qos max bandwidth rules not working for neutron trunk ports" [Low,Confirmed] - Assigned to Luis Tomas Bolivar (ltomasbo)
15:24:30 <ajo> ltomasbo ^
15:24:53 <ajo> any advance on this? did you look at russellb comments here https://bugs.launchpad.net/neutron/+bug/1639186/comments/11 ?
15:25:24 <ajo> I'm pinging him
15:25:45 <ajo> ralonsoh russellb says they use queues, and they steer the traffic using OF rules
15:25:47 <ltomasbo> hi ajo
15:25:55 <ajo> hi ltomasbo :)
15:25:59 <ltomasbo> no, I did not take any further action on that
15:26:20 <ajo> ralonsoh : so may be that will be doable when we have min bw and queues in OVS. ?
15:26:41 <ajo> ltomasbo may be I should detach you from the bug for now for just in case anyone else wants to step up for it?
15:26:49 <ralonsoh> but we still don't have min in OVS
15:27:08 <ltomasbo> ajo: sure!
please do
15:27:20 <ajo> ralonsoh correct, but, it would be similar to what you did in LB, and then we could get rid of this bug with trunk ports
15:27:52 <ralonsoh> ajo: I tried to do this, but we don't have the IFB like in Linux Bridge
15:28:13 <ralonsoh> ajo: I tried to figure out how to implement this, without any luck...
15:28:21 <ajo> ralonsoh yes, we'd need to rely on queues, etc... all the thing we talked about, *not easy*
15:28:30 <ralonsoh> ok
15:28:58 <ajo> let's revisit it on pike and try to get it going. Will you be around on the PTG?
15:29:10 <ralonsoh> yes, we can talk about his in the PTG
15:29:23 <ralonsoh> s/his/this
15:30:14 <ajo> ralonsoh we also have the one related to router GW ports
15:30:20 <ajo> I can't find it now
15:30:51 <ralonsoh> https://review.openstack.org/#/c/425218/
15:30:53 <ralonsoh> this one?
15:31:02 <ajo> #link https://review.openstack.org/#/c/425218/
15:31:27 <ajo> oh, I must review it again
15:31:27 <ralonsoh> yes, I know. kevinbenton told us to implement it
15:32:00 <ralonsoh> and https://review.openstack.org/#/c/425280/
15:32:40 <ajo> #link https://review.openstack.org/#/c/425280/
15:33:21 <ajo> please folks review those two ^
15:33:22 <ajo> thanks ralonsoh !!
15:33:23 <ajo> any bug I'm missing?
15:34:01 <ajo> 3,
15:34:09 <ajo> 2,
15:34:17 <ajo> 1,
15:34:28 <ajo> #topic RFEs
15:34:53 <ajo> So, after the validation, for pike, we have in pipeline:
15:34:54 <ajo> * instance ingress bw limiting,
15:35:11 <slaweq_> yes, it waits for improved validation :)
15:35:22 <ajo> * strict minimum bw (integration with nova placement api)
15:35:31 <ajo> I believe bits are in place for those two
15:35:47 <ralonsoh> cool!
15:36:01 <ajo> we can think of VLAN 802.1p if somebody is willing to take it, it shall be easy
15:36:29 <reedip_> ajo : link ?
15:37:10 <ralonsoh> do you mean ECN?
15:37:21 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1560961.
instance-ingress
15:37:21 <openstack> Launchpad bug 1560961 in neutron "[RFE] Allow instance-ingress bandwidth limiting" [Wishlist,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:37:22 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1578989 strict minimum bw
15:37:23 <reedip_> neverming, got it : https://bugs.launchpad.net/neutron/+bug/1505631
15:37:23 <openstack> Launchpad bug 1578989 in neutron "[RFE] Strict minimum bandwidth support (egress)" [Wishlist,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:37:24 <openstack> Launchpad bug 1505631 in neutron "[RFE] QoS VLAN 802.1p Support" [Wishlist,Confirmed] - Assigned to Kannan Raman (kannanrc20)
15:37:33 <ajo> #link https://bugs.launchpad.net/neutron/+bug/1505631
15:37:34 <davidsha> ralonsoh: There is another for mapping dscp to v-lan pcp
15:37:48 <ajo> yes, we need to talk about that one
15:38:00 <ajo> but I wanted to talk about reedip's ECN proposal again
15:38:09 <ajo> he updated the etherpad and I couldn't look at it until today
15:38:17 <ajo> #link https://etherpad.openstack.org/p/QoS_ECN
15:38:52 <ajo> reedip_ : most details make sense, but we can't control the VM internal settings
15:39:04 <reedip_> ajo : hmm, okay
15:39:20 <ajo> reedip_ it will be OS dependant, etc... and it's no business of neutron or nova to tweak anything inside the VMs
15:39:21 <ajo> we can provide documentation, or heat templates for that
15:39:25 <ajo> in linux I believe it's on by default
15:39:27 <ajo> may be I'm wrong
15:39:54 <reedip_> ajo : in linux.
no , ECN has to be enabled
15:40:14 <ajo> aha, so we may want to provide documentation to let people do that
15:40:17 <reedip_> I think yes, we can provide heat templates for that / documentation if required
15:40:19 <ajo> I thought It was on by default
15:40:27 <ajo> so
15:40:46 <ajo> the proposal, has an API that is a bit out of how we handle everything else in QoS plugin now
15:40:57 <ajo> It'd be great if we can find a way to fit it in the current model
15:41:20 <reedip_> ajo : yes, I saw that comment, would try to integrate it , but need to look more into it .
15:41:21 <ajo> may be, if the change to routers is not invasive,
15:41:22 <ajo> (or changes performance)
15:41:23 <ajo> we could just implement that in the l3 agent
15:41:24 <ajo> and have it enabled by default
15:41:48 <davidsha> So if the ECN bit is set and its not enabled, the packet isn't echo'ed with the receive bit set right??
15:42:46 <ajo> I believe that what reedip_ proposes is that our routers will set the flag on the outgoing packet, that then will be echoed by the receiving machine back to the sender
15:42:50 <reedip_> davidsha : you mean if one router has detected COngestion, but one of the End points ( a VM for example ) doesnt have ECN enabled, then YES, the ACK packet back to the sender wont have the ECN receievd bit set
15:43:21 <ajo> VM1 sends packet
15:43:22 <ajo> packet crosses router (congested one)
15:43:23 <davidsha> reedip_: ack, thanks!
15:43:23 <ajo> router sets flag on packet
15:43:24 <ajo> packet arrives VM2
15:43:35 <ajo> VM2 sends the echo bit on next TCP frame to sender
15:43:43 <ajo> so
15:43:48 <ajo> missing details for this proposal are:
15:43:53 <ajo> 1) how to do it in the neutron routers
15:44:02 <ajo> 2) ways to fit this on the current model
15:44:24 <ajo> for 2, may be we don't even need it, but if we need conditional enablement of ECN on the routers, we could, for example:
15:44:43 <ajo> let admins set policies with an ECN rule in the external and internal networks of the router
15:44:55 <ajo> and when at least one internal network, and the external network provide ECN, we ECN-enable the router
15:45:02 <reedip_> ajo : ack
15:45:20 <ajo> this would require integration of QoS in l3, but now l3 has a framework to add extensions, it should be doable
15:45:21 <ajo> thanks njohnston !! ;)
15:45:33 * njohnston bows
15:45:44 <reedip_> thanks njohnston :D
15:46:01 <ajo> reedip_ I don't know about 1, if you show me a viable POC of how to do it in the low level (qrouter), then we can formalize this RFE properly
15:46:24 <reedip_> ajo : yep, this is an Action Item for me before the next meeting
15:46:47 <ajo> #action reedip ECN RFE refinement :)
15:47:20 <ajo> davidsha did you want to talk about the DSCP/VLAN mapping ?
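The ECN flow walked through between 15:43:21 and 15:43:35 (VM1 sends, the congested router marks, VM2 echoes) can be modelled with the toy sketch below. The codepoint values follow RFC 3168; the `Packet` class and both functions are hypothetical, and the sketch skips the TCP handshake in which real endpoints negotiate ECN. It says nothing about how a neutron router would actually do the marking, which is the open point 1) above.

```python
from dataclasses import dataclass

# ECN codepoints: the two low-order bits of the IP TOS / traffic class byte.
NOT_ECT = 0b00  # sender is not ECN-capable
ECT_1 = 0b01    # ECN-capable transport, codepoint 1
ECT_0 = 0b10    # ECN-capable transport, codepoint 0
CE = 0b11       # congestion experienced


@dataclass
class Packet:
    ecn: int = NOT_ECT


def router_forward(packet: Packet, congested: bool) -> Packet:
    """A congested router marks ECN-capable packets CE instead of dropping them.

    Packets that are not ECN-capable are left untouched here; a real router
    would typically drop them under congestion.
    """
    if congested and packet.ecn in (ECT_0, ECT_1):
        packet.ecn = CE
    return packet


def receiver_echoes(received: Packet, ecn_enabled: bool) -> bool:
    """Return True if the receiver would set the TCP ECE flag on its next segment.

    With ECN disabled in the receiving OS, the mark is never echoed back,
    which matches the case described at 15:42:50.
    """
    return ecn_enabled and received.ecn == CE


# VM1 sends an ECN-capable packet through a congested router to VM2.
marked = router_forward(Packet(ecn=ECT_0), congested=True)
echo_when_enabled = receiver_echoes(marked, ecn_enabled=True)
echo_when_disabled = receiver_echoes(marked, ecn_enabled=False)
```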
15:47:31 <ajo> may be we should cover first the basic VLAN rules
15:47:49 <ajo> I asked the submitter for more details and never answered
15:47:50 <reedip_> ajo : meanwhile , I think I can voluteer on the 802.1p if its available
15:47:51 <davidsha> ajo: kk, it wan't my rfe I'd just read it previously ;)
15:48:08 <davidsha> wasn't*
15:49:23 <ajo> reedip_ https://bugs.launchpad.net/neutron/+bug/1505631 It seems like you could revive it in Pike, it's postponed
15:49:23 <openstack> Launchpad bug 1505631 in neutron "[RFE] QoS VLAN 802.1p Support" [Wishlist,Confirmed] - Assigned to Kannan Raman (kannanrc20)
15:49:30 <ajo> which means, it was approved, but we were waiting on other stuff, or we had no hands :)
15:49:42 <davidsha> Just for clarity on this going forward, this would mean we'd have the dscp marking rule, a pcp marking rule and a "traffic class" marking rule correct?
15:50:22 <reedip_> Yes, I will take it up ajo for Pike
15:50:30 <ajo> davidsha what was pcp? %)
15:51:13 <davidsha> ajo: the vlan version of dscp it stands for Priority Code Point
15:51:20 <ajo> ahh,
15:51:28 <ajo> that would be then 2 ?
15:51:31 <ajo> 1) DSCP
15:51:40 <ajo> 2) VLAN
15:52:08 <davidsha> There was another RFE for mapping over DSCP and PCP if I recall
15:52:44 <davidsha> so Pcp is a 0-7 values and dscp is mapped to 0-7 traffic classes.
15:53:23 <ajo> davidsha, let's look at that when we have vlan, but ye
15:53:41 <ajo> at that point, DSCP marking rules would be incompatible with mapping rules
15:53:58 <ajo> or am I getting it wrong?
15:53:59 <ajo> do you mean
15:54:07 <ajo> mapping the DSCP flags when going over vlan? or what ?
15:54:18 <ajo> that RFE we mentioned before?
15:54:27 <davidsha> ajo: kk, I'm not sure who proposed it but I recall reading the RFE.
15:55:21 <ajo> davidsha link, or bring it to next meeting :)
15:55:27 <davidsha> ajo: It was a while since I read it so I'm not entirely sure.
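For the DSCP/PCP relationship mentioned at 15:52:44, one common convention is to take the three high-order bits of the 6-bit DSCP value (the class selector) as the 3-bit 802.1p PCP. The sketch below assumes that convention purely for illustration; the RFE referred to in the discussion was not linked and may define a different or configurable mapping, and `dscp_to_pcp` is a hypothetical helper.

```python
def dscp_to_pcp(dscp: int) -> int:
    """Map a 6-bit DSCP value (0-63) to a 3-bit VLAN PCP value (0-7).

    Assumption: the PCP is the DSCP class selector, i.e. its top three bits.
    """
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP must be in 0..63, got {dscp}")
    return dscp >> 3


# Example: Expedited Forwarding (DSCP 46) lands in PCP 5 under this mapping.
ef_pcp = dscp_to_pcp(46)
```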
15:55:34 <ajo> anyway, unless of it being of special interest to you, or anyone willing to code it, I suspect we have enough in our plate for Pike already :)
15:55:50 <davidsha> ack
15:56:06 <ajo> we have the openflow pipeline in the queue too, which could be cool to fix for Pike too :)
15:56:08 <ralonsoh> one more: https://bugs.launchpad.net/neutron/+bug/1639220
15:56:08 <openstack> Launchpad bug 1639220 in neutron "[RFE] Default action for RBAC" [Undecided,New] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:56:12 <ralonsoh> this one is very short
15:56:13 <ajo> oh right
15:56:13 <davidsha> +1
15:56:25 <ralonsoh> just waiting your reply
15:57:32 * ajo reads
15:58:00 <ajo> yes, having an api to set a per-tenant default could make sense
15:58:08 <ajo> a lot of sense actually
15:58:47 <ajo> I'm trying to add tags, but it ignores me
15:58:56 <ajo> ralonsoh can you add #qos and #rfe tags?
15:59:20 <ajo> ralonsoh we would need that triaged by the drivers meeting
15:59:28 <ajo> but, makes sense to me, I'm going to comment
15:59:32 <ajo> and close the meeting, we have 30 secs
15:59:32 <ralonsoh> ajo: done
15:59:54 <ajo> #endmeeting