14:01:51 <ajo_> #startmeeting neutron_qos
14:01:52 <openstack> Meeting started Wed Oct 5 14:01:51 2016 UTC and is due to finish in 60 minutes. The chair is ajo_. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:56 <openstack> The meeting name has been set to 'neutron_qos'
14:02:06 <ajo_> let's leave some time for people to join
14:02:24 <njohnston> Hi! I am lurking today.
14:02:36 <ajo_> #chair njohnston
14:02:37 <openstack> Current chairs: ajo_ njohnston
14:02:43 <ajo_> ack njohnston :)
14:03:19 <ajo_> I see a few missing people, please say hi, whoever is around for this meeting :-)
14:03:28 <rnoriega> hi
14:03:33 <ajo_> hi rnoriega ;)
14:03:45 <ltomasbo> hi
14:03:50 <ajo_> hi ltomasbo !! :)
14:04:52 <ajo_> I heard that some people from Ericsson R&D would be joined because they're interested in contributing ;)
14:05:59 <ajo_> I miss slaweq, ralonso, and others hmmm
14:06:20 <ajo_> I wonder if it's worth having the meeting or if a email status update will be enough
14:06:36 <ajo_> rnoriega, ltomasbo thoughts? , shall we proceed, email, or wait?
14:07:00 * ajo_ got everyone bored of QoS :P
14:07:07 * njohnston is still here!
14:07:17 <ajo_> hi njohnston :=)
14:07:27 <ajo_> this was more or less the agenda I had ready:
14:07:28 <ajo_> #link https://etherpad.openstack.org/p/qos-meeting
14:09:52 <ajo_> ok, 10 minutes is enough, I'm going to summarize the status,
14:09:57 <ajo_> and wrap up this quickly
14:10:01 <rnoriega> ajo_, good!
14:10:09 <ajo_> so
14:10:11 <davidsha> hey, sorry for being late!
14:10:15 <ajo_> #topic RFEs-approved
14:10:26 <ajo_> hi davidsha :)
14:10:45 <ajo_> #links https://bugs.launchpad.net/neutron/+bugs?field.tag=qos+rfe-approved+&field.tags_combinator=ALL
14:10:50 <ajo_> one is missing on that link, let me fix it
14:11:20 <rnoriega> so we have three RFEs that have been approved, right?
14:12:33 <rnoriega> ajo_, it's fixed...thanks
14:12:44 <ajo_> Yeah, I've toggled strict min bw to rfe-approved (from rfe-postponed)
14:13:03 <ajo_> but I'm unsure I have the right to do that, I have commented it, and said that we should start by spec there.
14:13:29 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1586056 (extended validation)
14:13:31 <openstack> Launchpad bug 1586056 in neutron "[RFE] Improved validation mechanism for QoS rules with port types" [Wishlist,In progress] - Assigned to Slawek Kaplonski (slaweq)
14:13:40 <ajo_> that one is about fixing some technical debt we have,
14:13:56 <ajo_> to make sure any changes to policies are validated with the plugin or mechanism drivers,
14:14:10 <ajo_> or changes to ports or networks (attached policy changes)
14:14:31 <ajo_> since dataplane capabilities for QoS are quite heterogeneous
14:14:46 <ajo_> that depends on more tech-debt
14:15:04 <ajo_> #link https://review.openstack.org/#/c/351858/ (qos notification-driver to "driver")
14:15:12 <rnoriega> ok
14:15:31 <ajo_> Some refactor over the current "qos driver" to make it more consistent and be able to do the improved validation properly
14:16:13 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1560961 (instance ingress bw limit)
14:16:15 <openstack> Launchpad bug 1560961 in neutron "[RFE] Allow instance-ingress bandwidth limiting" [Wishlist,In progress] - Assigned to Slawek Kaplonski (slaweq)
14:16:25 <ajo_> this one depends on the two above ;)
14:16:30 <ajo_> this is like a train, sorry ;)
14:17:17 <ajo_> it does not physically depend on the others, but it's a constraint neutron-drivers has imposed on it, to make sure we fix the technical debt first,
14:17:21 <ajo_> which, makes sense IMHO
14:17:52 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1560963 (minimum bw egress -non-strict-)
14:17:54 <openstack> Launchpad bug 1560963 in neutron "[RFE] Minimum bandwidth support (egress)" [Wishlist,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:18:09 <ajo_> that one is being worked out by Rodolfo Alonso
14:18:23 <ajo_> partly merged for newton (SR-IOV support)
14:18:42 <ajo_> and OVS & LB efforts are being developed to be merged in Ocata
14:18:47 <ajo_> let me look for the links,
14:18:52 <ajo_> I know ltomasbo is interested in this :)
14:18:58 <rnoriega> ajo_, this last one is not strict bw support
14:19:01 <ltomasbo> yep, I was about to ask..
14:19:06 <rnoriega> right?
14:19:11 <ajo_> rnoriega, non-strict, right
14:19:34 <rnoriega> ok
14:20:04 <ltomasbo> non-stric in the sense that it is not "nova-aware"?
14:20:18 <ajo_> correct
14:20:22 <ajo_> non scheduling aware
14:20:34 <ltomasbo> ok
14:20:36 <ajo_> #link https://review.openstack.org/318531 QoS minimum egress for ovs-agent
14:20:54 <ajo_> #link https://review.openstack.org/357865 QoS minimum egress for LinuxBridge
14:21:17 <ajo_> buy the great Rodolfo Alonso Hernandez!, we should get him on some hall of fame (/me blinks eye on njohnston )
14:21:37 <ajo_> buy -> by :)
14:21:55 <rnoriega> ajo_, lol! I wanted to buy a Rodolfo for me as well
14:22:06 <ajo_> he's awesome
14:22:16 <njohnston> :-)
14:22:25 <ajo_> and now, the hot topic
14:22:29 <davidsha> +1
14:22:49 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1578989 minimum bandwidth egress strict (scheduling aware)
14:22:50 <openstack> Launchpad bug 1578989 in neutron "[RFE] Strict minimum bandwidth support (egress)" [Wishlist,Confirmed]
14:22:56 <ajo_> this feature builds on the other
14:23:22 <ajo_> and it's necessary to make sure no hypervisor is scheduled more "minimum bandwidth" than the specific nics can handle in total
14:23:48 <ltomasbo> so, it will be enforce at host level, right?
14:24:11 <ajo_> so, effectively enforcing SUM(port[i].min_bw) for port in hypervisor <= hypervisor.bw
14:24:13 <ajo_> more or less
14:24:18 <ltomasbo> if the bottleneck is at another point in the network, it will not be enforced, right
14:24:34 <ajo_> that's a simplified equation, as we need to consider the network over which the packets travel, and the direction too
14:24:49 <rnoriega> ajo_, sorry for my ignorant question. But is there anything that could be done in parallel to accelerate the development of the strict min bw support?
14:24:58 <rnoriega> like writing a spec
14:25:12 <rnoriega> helping with some patches on the previous work
14:25:14 <ajo_> rnoriega, yes, let me get into that :)
14:25:21 <rnoriega> ajo_, ok, thanks!
14:25:27 <ajo_> I will summarize now the steps to get there
14:25:33 <ajo_> ltomasbo, you're right
14:25:53 <ajo_> ltomasbo, we have no capability now to see how the network architecture physically is
14:26:03 <ajo_> I wonder if we could model that on a later step
14:26:13 <ajo_> and let the scheduler consume from there
14:26:21 <ajo_> for example, we have the switches dataplane capacity, etc...
14:26:25 <ajo_> interconnections between switches
14:26:31 <ajo_> that's a harder stone to chew
14:26:47 <ltomasbo> ajo_, any plans to tackle that? Perhaps in a VNF chain (as a first step)
14:26:54 <ajo_> IP connectivity /& routes for tunnels
14:27:22 <ajo_> ltomasbo, thinking of that problem causes me heartburn literally ;D
14:27:28 <ltomasbo> :D
14:27:44 <ajo_> ltomasbo, I'd be very happy if anybody wants to look at it
14:27:54 <ajo_> but we should build the foundations first :)
14:28:09 <ltomasbo> could be related to the EU project, but it will depend on the previous patches too
14:28:47 <ajo_> ltomasbo, is on a fancy NFV related project ;)
14:28:53 <ajo_> sorry, no comma :)
14:29:01 <ajo_> so
14:29:24 <ajo_> steps to get this RFE done (eventually)
14:29:44 <ajo_> I'd say it's an ocata & beyond effort, specially since Ocata cycle is shorter
14:29:55 <ajo_> and since we have dependencies on some extra work for nova
14:30:02 <ajo_> so
14:30:07 <ajo_> 0) Writing a neutron spec including all the next steps in detail
14:30:39 <ajo_> I plan on tackling that, but I'm super happy if anybody wants to step in to write the barebones, and have me as Co-Author
14:31:01 <ajo_> since I must first kill the technical debt we have or drivers will kill me :)
14:31:35 <ajo_> such spec could contain some of the next steps:
14:31:39 <ajo_> 1) Neutron collecting physnet/tunneling available bandwidth on every compute, with the option to override via config.
14:31:52 <ajo_> 2) Neutron reporting such available bandwidth to the new nova placement API inventories in the form of NIC_BW_<physnet>_{ingress,egress}
14:32:28 <ajo_> that'd mean that neutron tells nova "this hypervisor has 10Gb total of NIC_BW_tenant_egress , 10Gb total of NIC_BW_tenant_ingress" as an example
14:32:35 <ajo_> for every hypervisor
14:32:47 <ajo_> also for external networks attached to compute nodes, etc..
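[editor's note: the host-level check ajo_ summarized as SUM(port[i].min_bw) for port in hypervisor <= hypervisor.bw can be sketched in a few lines of Python. This is an illustrative sketch only; the function name and kbps units are invented here, and a real implementation would also track physnet and direction, as noted in the discussion.]

```python
# Illustrative sketch: host-level admission check for strict
# minimum-bandwidth scheduling. Names and units are hypothetical.

def can_place_port(existing_min_bws_kbps, new_port_min_bw_kbps,
                   hypervisor_nic_bw_kbps):
    """Return True if the hypervisor NIC can still honor all guarantees.

    Enforces SUM(port[i].min_bw) + new_port.min_bw <= hypervisor.bw,
    the simplified per-host equation from the discussion.
    """
    committed = sum(existing_min_bws_kbps)
    return committed + new_port_min_bw_kbps <= hypervisor_nic_bw_kbps


# Example: a 10 Gb/s NIC with 7 Gb/s already guaranteed can take a
# 2 Gb/s guarantee but not a 4 Gb/s one.
print(can_place_port([4000000, 3000000], 2000000, 10000000))  # True
print(can_place_port([4000000, 3000000], 4000000, 10000000))  # False
```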
14:33:19 <ajo_> We'd make use of the new nova placement API: #link http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/generic-resource-pools.html
14:33:28 <ajo_> but we'd be missing a key feature from nova
14:33:47 <ajo_> #link https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/custom-resource-classes
14:33:51 <ajo_> which is being worked out
14:34:01 <ajo_> Jaypipes++
14:34:36 <ajo_> that is
14:34:37 <ajo_> 3) Nova accepting custom resource types (NIC_BW_.....) now compute nodes report via this http api the CPU, DISK, RAM.
14:34:57 <ajo_> 4) Changes in how Nova handles ports, by creating/fetching them before doing any scheduling. (this seems to be planned for Cellsv2)
14:35:15 <ajo_> that way, nova can know the requirements of the port (in form of bandwidth) before trying to schedule the instance
14:35:30 <ltomasbo> ajo_, why neutron needs to tell nova about static information (total NIC_BW)?
14:35:40 <ltomasbo> or with total you meant total_in_use?
14:35:41 <ajo_> ltomasbo, the alternative would be
14:35:49 <ajo_> nova find the qos_policy_id on port, as it's now
14:36:00 <ajo_> but then has to fetch the policy, and rules, and make an interpretation of such rules
14:36:12 <ajo_> which could be complex as they grow
14:36:53 <ajo_> so after discussing with the nova team, it seems that the more reasonable option was to give a chewed output for nova (when nova creates, or fetches a port)
14:37:10 <ltomasbo> so, if there is no QoS policy, there will be no information about the bandwidth of the hosts, right?
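[editor's note: step 2, neutron reporting per-physnet bandwidth to the placement API as NIC_BW_<physnet>_{ingress,egress} custom resource classes, might produce inventory payloads like the sketch below. The inventory field names (total, reserved, min_unit, max_unit, step_size, allocation_ratio) follow the generic-resource-pools spec linked above, but the helper itself is hypothetical, not existing Neutron code.]

```python
# Hypothetical helper: build placement-style inventories for the
# NIC_BW_<physnet>_{ingress,egress} custom resource classes.

def build_bw_inventories(physnet_bw_kbps):
    """physnet_bw_kbps: {physnet: {"ingress": kbps, "egress": kbps}}."""
    inventories = {}
    for physnet, directions in physnet_bw_kbps.items():
        for direction, total in directions.items():
            rc = "NIC_BW_%s_%s" % (physnet, direction)
            inventories[rc] = {
                "total": total,           # static total, per the discussion
                "reserved": 0,
                "min_unit": 1,
                "max_unit": total,
                "step_size": 1,
                "allocation_ratio": 1.0,  # guarantees -> no overcommit
            }
    return inventories


# "this hypervisor has 10Gb total of NIC_BW_tenant_egress/ingress"
inv = build_bw_inventories({"tenant": {"ingress": 10000000,
                                       "egress": 10000000}})
print(sorted(inv))  # ['NIC_BW_tenant_egress', 'NIC_BW_tenant_ingress']
```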
14:37:45 <ajo_> ltomasbo, I'm talking about the ports
14:37:56 <ajo_> when nova does a GET or POST of a port to neutron
14:38:14 <ajo_> we'd provide the breakdowns of NIC_BW per net/direction
14:38:24 <ajo_> and that'd be empty if there is no qos policy attached to the port
14:38:57 <ltomasbo> ahh, ok, I thought you were talking about host_bw
14:39:32 <ajo_> ah, no no :), that's on step 2, when neutron reports to nova placement API,
14:39:54 <ajo_> such reporting is static (not changing as we add or remove ports, it's the total available)
14:40:10 <ajo_> nova will be responsible for counting down/up available traffic for this guarantee
14:40:26 <ajo_> and that's all
14:40:26 <ltomasbo> ok, now it is clear!
14:41:07 <ajo_> rnoriega, any question about all this? :)
14:41:32 <rnoriega> ajo_, not really! it was a very clear explanation of the current state...
14:41:35 <rnoriega> ajo_, thanks!!!
14:41:49 <rnoriega> ajo_++
14:41:51 <ajo_> ltomasbo, rnoriega , anyone, you're welcome to throw in an initial spec for this based on the steps :)
14:42:23 <rnoriega> ajo_, cool! let's see what we can do!
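[editor's note: the "chewed output" idea — neutron returning per-network/direction NIC_BW requirements on a port GET/POST, empty when no policy is attached — could look roughly like this. The dict shapes here ("qos_policy", "rules", and so on) are invented for illustration and are not the actual Neutron API schema.]

```python
# Sketch of deriving per-direction bandwidth requirements from a port.
# All dict layouts are hypothetical, for illustration only.

def port_bw_requirements(port):
    """Return {resource_class: kbps} needed by the port's minimum-bandwidth
    rules; an empty dict when no QoS policy is attached."""
    policy = port.get("qos_policy")
    if not policy:
        return {}
    physnet = port.get("physnet", "tenant")
    reqs = {}
    for rule in policy.get("rules", []):
        if rule.get("type") == "minimum_bandwidth":
            rc = "NIC_BW_%s_%s" % (physnet, rule["direction"])
            reqs[rc] = reqs.get(rc, 0) + rule["min_kbps"]
    return reqs


print(port_bw_requirements({"physnet": "tenant"}))  # {} - no policy attached
port = {"physnet": "tenant",
        "qos_policy": {"rules": [{"type": "minimum_bandwidth",
                                  "direction": "egress",
                                  "min_kbps": 1000000}]}}
print(port_bw_requirements(port))  # {'NIC_BW_tenant_egress': 1000000}
```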
14:42:32 <ltomasbo> ok
14:43:23 <ajo_> #topic other RFEs
14:43:24 <ajo_> https://bugs.launchpad.net/neutron/+bugs?field.tag=qos+rfe+&field.tags_combinator=ALL
14:43:36 <ajo_> we had this one too:
14:43:37 <ajo_> https://bugs.launchpad.net/neutron/+bug/1614728
14:43:39 <openstack> Launchpad bug 1614728 in neutron "REF: qos: rule list in policy is too difficult to use" [Undecided,Won't fix]
14:44:10 <ajo_> which I have moved to won't fix by now, since we can't make changes to the API, unless we get microversioning (one day)
14:44:41 <ajo_> I'm moving this other one to Won't fix for the same reason:
14:44:41 <ajo_> https://bugs.launchpad.net/neutron/+bug/1580149
14:44:43 <openstack> Launchpad bug 1580149 in neutron "[RFE] Rename API options related to QoS bandwidth limit rule" [Wishlist,Incomplete] - Assigned to Slawek Kaplonski (slaweq)
14:46:02 <ajo_> and, there's ECN which needs more maturing
14:46:23 <ajo_> I've heard non formal requests to provide pps limits, right rnoriega ?
14:46:44 <rnoriega> ajo_, yep
14:47:07 <ajo_> and also requests about, providing warnings on bandwidth usage
14:47:16 <rnoriega> ajo_, right too
14:47:21 <ajo_> but that's more likely to fit in https://bugs.launchpad.net/neutron/+bug/1592918
14:47:22 <openstack> Launchpad bug 1592918 in neutron "[RFE] Adding Port Statistics to Neutron Metering Agent" [Wishlist,In progress] - Assigned to Sana Khan (sana.khan)
14:47:36 <ajo_> we can look at it when we don't have so much on our shoulders
14:47:59 <rnoriega> ajo_, agreed. Thanks
14:48:08 <ajo_> #topic bugs
14:48:16 <ajo_> #link http://bit.ly/1WhXlzm BUGS
14:48:23 <ajo_> let's see if we have anything there to be tackled
14:48:48 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1627749 qos driver api can have better error handling
14:48:48 <openstack> Launchpad bug 1627749 in neutron "qos driver api can have better error handling" [Medium,Confirmed]
14:48:55 <ajo_> I agree, this one has been filled by yamamoto
14:49:09 <ajo_> basically, we could have other backends, like midonet in this case
14:49:26 <ajo_> which could fail when we ask to modify a policy
14:49:34 <ajo_> and we need to handle that properly
14:49:39 <ajo_> now the opreation is just stopped
14:51:23 <ajo_> #link https://bugs.launchpad.net/python-neutronclient/+bug/1587291
14:51:26 <openstack> Launchpad bug 1587291 in python-neutronclient "Specifying '-F' or '--field' parameter in the qos related commands, returns abnormal result" [Low,In progress] - Assigned to Yan Songming (songmingyan)
14:51:28 <ajo_> this one is in progress, probably needs reviews
14:51:45 <ajo_> #link https://review.openstack.org/#/c/326902/
14:51:48 <ajo_> ohh, it's merged
14:53:08 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1625570 testing DSCP in fullstack via packet inspection
14:53:09 <openstack> Launchpad bug 1625570 in neutron "fullstack : should add test of ensure traffic is using DSCP marks outbound " [Wishlist,New]
14:53:27 <ajo_> that one is wishlist, but important too, because now we only make sure rules are set properly, etc,
14:53:36 <ajo_> but we never check the outgoing traffic for DSCP marks
14:53:40 <njohnston> so in that case we're really testing ovs flows
14:53:42 <ajo_> njohnston, ^:)
14:53:56 <njohnston> not testing the neutron code
14:53:58 <ajo_> yes, we have some proposals to use pcap / tcpdump to check the real packets coming out the VM
14:54:12 <ajo_> yes, but since those are our flows, we may want to test them
14:54:16 <ajo_> they can become broken
14:54:32 <ajo_> (incompatibilities to ovs-fw, or the new L2 flow pipeline (eventually))
14:54:40 <ajo_> the intent is... avoiding the feature from being silently broken
14:54:50 <njohnston> Understood. Is there any other packet capture code in fullstack?
14:55:04 <ajo_> njohnston, we have some experiments with tcpdump
14:55:15 <ajo_> and I discussed some options with jlibosva but not sure what's the status
14:55:19 <njohnston> Sounds fun! And it frightens me to my core.
14:55:29 <ajo_> nah, it shall just be fun
14:55:33 <ajo_> 1) setup rules
14:55:49 <ajo_> 2) ssh to the VM to send packets to tempest controller
14:55:52 <ajo_> sorry
14:55:56 <ajo_> not even that, no VM :)
14:56:06 <ajo_> just send packets from the first port to the 2nd
14:56:10 <ajo_> (this is fullstack, no tempest
14:56:20 <ajo_> 3) capture packets on the other side with a filter for the dscp flag
14:56:24 <ajo_> no packets -> rules don't work
14:56:28 <ajo_> packets -> rule works
14:57:47 <ajo_> and
14:58:04 <ajo_> I guess we can wrap up the meeting for today, .... wasn't it going to be short? ':]
14:58:11 <njohnston> :-D
14:58:21 <ltomasbo> :D
14:58:38 <ajo_> njohnston, ltomasbo , davidsha , njohnston , see u around
14:58:44 <njohnston> thanks ajo_
14:58:46 <davidsha> Thanks! see ya!
14:58:54 <ajo_> thank you sirs!
14:58:56 <ltomasbo> Thanks! See you!
14:59:00 <ajo_> #endmeeting
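
[editor's note: the tcpdump-based DSCP check sketched in steps 1-3 of the meeting hinges on filtering for the DSCP bits, which occupy the upper six bits of the IPv4 ToS byte. A helper to build such a capture filter might look like the sketch below; the helper itself is hypothetical, but the BPF expression uses standard tcpdump packet-filter syntax.]

```python
# Hypothetical helper: build a tcpdump/BPF filter matching packets that
# carry a given DSCP mark. DSCP is the top 6 bits of the IPv4 ToS byte,
# which BPF addresses as ip[1].

def dscp_capture_filter(dscp):
    """Return a tcpdump filter string matching the given DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return "ip and (ip[1] & 0xfc) == %d" % (dscp << 2)


# e.g. run tcpdump on the receiving port with this filter;
# packets seen -> rule works, no packets -> rules don't work.
print(dscp_capture_filter(16))  # ip and (ip[1] & 0xfc) == 64
```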