14:01:51 <ajo_> #startmeeting neutron_qos
14:01:52 <openstack> Meeting started Wed Oct  5 14:01:51 2016 UTC and is due to finish in 60 minutes.  The chair is ajo_. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:56 <openstack> The meeting name has been set to 'neutron_qos'
14:02:06 <ajo_> let's leave some time for people to join
14:02:24 <njohnston> Hi!  I am lurking today.
14:02:36 <ajo_> #chair njohnston
14:02:37 <openstack> Current chairs: ajo_ njohnston
14:02:43 <ajo_> ack njohnston  :)
14:03:19 <ajo_> I see a few missing people, please say hi, whoever is around for this meeting :-)
14:03:28 <rnoriega> hi
14:03:33 <ajo_> hi rnoriega  ;)
14:03:45 <ltomasbo> hi
14:03:50 <ajo_> hi ltomasbo !! :)
14:04:52 <ajo_> I heard that some people from Ericsson R&D would be joining because they're interested in contributing ;)
14:05:59 <ajo_> I miss slaweq, ralonso, and others hmmm
14:06:20 <ajo_> I wonder if it's worth having the meeting or if an email status update would be enough
14:06:36 <ajo_> rnoriega, ltomasbo thoughts? , shall we proceed, email, or wait?
14:07:00 * ajo_ got everyone bored of QoS :P
14:07:07 * njohnston is still here!
14:07:17 <ajo_> hi njohnston  :=)
14:07:27 <ajo_> this was more or less the agenda I had ready:
14:07:28 <ajo_> #link https://etherpad.openstack.org/p/qos-meeting
14:09:52 <ajo_> ok, 10 minutes is enough, I'm going to summarize the status,
14:09:57 <ajo_> and wrap up this quickly
14:10:01 <rnoriega> ajo_, good!
14:10:09 <ajo_> so
14:10:11 <davidsha> hey, sorry for being late!
14:10:15 <ajo_> #topic RFEs-approved
14:10:26 <ajo_> hi davidsha  :)
14:10:45 <ajo_> #link https://bugs.launchpad.net/neutron/+bugs?field.tag=qos+rfe-approved+&field.tags_combinator=ALL
14:10:50 <ajo_> one is missing on that link, let me fix it
14:11:20 <rnoriega> so we have three RFEs that have been approved, right?
14:12:33 <rnoriega> ajo_, it's fixed...thanks
14:12:44 <ajo_> Yeah, I've toggled strict min bw to rfe-approved (from rfe-postponed)
14:13:03 <ajo_> but I'm not sure I have the right to do that; I've commented on it, saying we should start with a spec there.
14:13:29 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1586056  (extended validation)
14:13:31 <openstack> Launchpad bug 1586056 in neutron "[RFE] Improved validation mechanism for QoS rules with port types" [Wishlist,In progress] - Assigned to Slawek Kaplonski (slaweq)
14:13:40 <ajo_> that one is about fixing some technical debt we have,
14:13:56 <ajo_> to make sure any changes to policies are validated with the plugin or mechanism drivers,
14:14:10 <ajo_> or changes to ports or networks (attached policy changes)
14:14:31 <ajo_> since dataplane capabilities for QoS are quite heterogeneous
14:14:46 <ajo_> that depends on more tech-debt
14:15:04 <ajo_> #link https://review.openstack.org/#/c/351858/  (qos notification-driver  to  "driver")
14:15:12 <rnoriega> ok
14:15:31 <ajo_> Some refactoring of the current "qos driver" to make it more consistent and able to do the improved validation properly
14:16:13 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1560961  (instance ingress bw limit)
14:16:15 <openstack> Launchpad bug 1560961 in neutron "[RFE] Allow instance-ingress bandwidth limiting" [Wishlist,In progress] - Assigned to Slawek Kaplonski (slaweq)
14:16:25 <ajo_> this one depends on the two above ;)
14:16:30 <ajo_> this is like a train, sorry ;)
14:17:17 <ajo_> it does not technically depend on the others, but it's a constraint neutron-drivers has imposed on it, to make sure we fix the technical debt first,
14:17:21 <ajo_> which makes sense IMHO
14:17:52 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1560963 (minimum bw egress -non-strict-)
14:17:54 <openstack> Launchpad bug 1560963 in neutron "[RFE] Minimum bandwidth support (egress)" [Wishlist,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:18:09 <ajo_> that one is being worked out by Rodolfo Alonso
14:18:23 <ajo_> partly merged for newton (SR-IOV support)
14:18:42 <ajo_> and OVS & LB efforts are being developed to be merged in Ocata
14:18:47 <ajo_> let me look for the links,
14:18:52 <ajo_> I know ltomasbo is interested in this :)
14:18:58 <rnoriega> ajo_, this last one is not strict bw support
14:19:01 <ltomasbo> yep, I was about to ask..
14:19:06 <rnoriega> right?
14:19:11 <ajo_> rnoriega, non-strict, right
14:19:34 <rnoriega> ok
14:20:04 <ltomasbo> non-strict in the sense that it is not "nova-aware"?
14:20:18 <ajo_> correct
14:20:22 <ajo_> non scheduling aware
14:20:34 <ltomasbo> ok
14:20:36 <ajo_> #link https://review.openstack.org/318531  QoS minimum egress for ovs-agent
14:20:54 <ajo_> #link https://review.openstack.org/357865 QoS minimum egress for LinuxBridge
14:21:17 <ajo_> buy the great Rodolfo Alonso Hernandez!, we should get him on some hall of fame (/me blinks eye on njohnston )
14:21:37 <ajo_> buy -> by :)
14:21:55 <rnoriega> ajo_, lol! I wanted to buy a Rodolfo for me as well
14:22:06 <ajo_> he's awesome
14:22:16 <njohnston> :-)
14:22:25 <ajo_> and now, the hot topic
14:22:29 <davidsha> +1
14:22:49 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1578989 minimum bandwidth egress strict (scheduling aware)
14:22:50 <openstack> Launchpad bug 1578989 in neutron "[RFE] Strict minimum bandwidth support (egress)" [Wishlist,Confirmed]
14:22:56 <ajo_> this feature builds on the other
14:23:22 <ajo_> and it's necessary to make sure no hypervisor is scheduled more "minimum bandwidth" than its NICs can handle in total
14:23:48 <ltomasbo> so, it will be enforced at the host level, right?
14:24:11 <ajo_> so, effectively enforcing SUM(port[i].min_bw) for port in hypervisor   <= hypervisor.bw
14:24:13 <ajo_> more or less
14:24:18 <ltomasbo> if the bottleneck is at another point in the network, it will not be enforced, right?
14:24:34 <ajo_> that's a simplified equation, as we need to consider the network over which the packets travel, and the direction too
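The admission constraint ajo_ sketches above (the sum of port minimums on a hypervisor must fit in its NIC bandwidth) can be written as a minimal Python check. This is an illustrative sketch only: the names are made up, the real check would live in the nova scheduler, and the per-physnet/per-direction bookkeeping the discussion mentions is omitted.

```python
def can_place_port(new_port_min_bws, committed_min_bw, nic_total_bw):
    """Host-level admission check for strict minimum bandwidth.

    new_port_min_bws: min-bw guarantees (kbps) requested by the new port(s)
    committed_min_bw: sum of guarantees already placed on this NIC (kbps)
    nic_total_bw:     total bandwidth the NIC can handle (kbps)

    Returns True when SUM(port[i].min_bw) for all ports on the
    hypervisor stays <= the NIC's total bandwidth.
    """
    return committed_min_bw + sum(new_port_min_bws) <= nic_total_bw
```

For example, placing ports asking for 2 and 3 Mbps on a 10 Mbps NIC with 4 Mbps already committed fits (9 <= 10), while 6 Mbps committed would not (11 > 10).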
14:24:49 <rnoriega> ajo_, sorry for my ignorant question. But is there anything that could be done in parallel to accelerate the development of the strict min bw support?
14:24:58 <rnoriega> like writing a spec
14:25:12 <rnoriega> helping with some patches on the previous work
14:25:14 <ajo_> rnoriega, yes, let me get into that :)
14:25:21 <rnoriega> ajo_, ok, thanks!
14:25:27 <ajo_> I will summarize now the steps to get there
14:25:33 <ajo_> ltomasbo, you're right
14:25:53 <ajo_> ltomasbo, we have no capability now to model how the network is physically laid out
14:26:03 <ajo_> I wonder if we could model that on a later step
14:26:13 <ajo_> and let the scheduler consume from there
14:26:21 <ajo_> for example, we have the switches dataplane capacity, etc...
14:26:25 <ajo_> interconnections between switches
14:26:31 <ajo_> that's a harder stone to chew
14:26:47 <ltomasbo> ajo_, any plans to tackle that? Perhaps in a VNF chain (as a first step)
14:26:54 <ajo_> IP connectivity /& routes for tunnels
14:27:22 <ajo_> ltomasbo, thinking of that problem causes me heartburn literally ;D
14:27:28 <ltomasbo> :D
14:27:44 <ajo_> ltomasbo, I'd be very happy if anybody wants to look at it
14:27:54 <ajo_> but we should build the foundations first :)
14:28:09 <ltomasbo> could be related to the EU project, but it will depend on the previous patches too
14:28:47 <ajo_> ltomasbo is on a fancy NFV-related project ;)
14:29:01 <ajo_> so
14:29:24 <ajo_> steps to get this RFE done (eventually)
14:29:44 <ajo_> I'd say it's an Ocata & beyond effort, especially since the Ocata cycle is shorter
14:29:55 <ajo_> and since we have dependencies on some extra work for nova
14:30:02 <ajo_> so
14:30:07 <ajo_> 0) Writing a neutron spec including all the next steps in detail
14:30:39 <ajo_> I plan on tackling that, but I'm super happy if anybody wants to step in to write the barebones, and have me as Co-Author
14:31:01 <ajo_> since I must first kill the technical debt we have or drivers will kill me :)
14:31:35 <ajo_> such spec could contain some of the next steps:
14:31:39 <ajo_> 1) Neutron collecting physnet/tunneling available bandwidth on every compute, with the option to override via config.
14:31:52 <ajo_> 2) Neutron reporting such available bandwidth to the new nova placement API inventories, in the form of NIC_BW_<physnet>_{ingress,egress}
14:32:28 <ajo_> that'd mean that neutron tells nova "this hypervisor has  10Gb total of NIC_BW_tenant_egress , 10Gb total of NIC_BW_tenant_ingress" as an example
14:32:35 <ajo_> for every hypervisor
14:32:47 <ajo_> also for external networks attached to compute nodes, etc..
14:33:19 <ajo_> We'd make use of the new nova placement API: #link  http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/generic-resource-pools.html
14:33:28 <ajo_> but we'd be missing a key feature from nova
14:33:47 <ajo_> #link https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/custom-resource-classes
14:33:51 <ajo_> which is being worked out
14:34:01 <ajo_> Jaypipes++
14:34:36 <ajo_> that is
14:34:37 <ajo_> 3) Nova accepting custom resource types (NIC_BW_.....), the way compute nodes already report CPU, DISK and RAM via this HTTP API.
14:34:57 <ajo_> 4) Changes in how Nova handles ports, by creating/fetching them before doing any scheduling. (this seems to be planned for Cellsv2)
14:35:15 <ajo_> that way, nova can know the requirements of the port (in form of bandwidth) before trying to schedule the instance
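Steps 1–3 above amount to neutron publishing a per-hypervisor bandwidth inventory under a custom resource class. A rough sketch of what such an inventory entry could look like, using the NIC_BW_<physnet>_{ingress,egress} naming from step 2 and field names mirroring the generic-resource-pools inventory schema — treat both as assumptions, since the custom-resource-classes work was still in review at the time:

```python
def nic_bw_inventory(physnet, direction, total_kbps):
    """Build a placement-style inventory entry for a hypervisor NIC.

    Hypothetical helper: the resource-class name follows the
    NIC_BW_<physnet>_{ingress,egress} scheme discussed in the meeting,
    and the dict layout imitates the placement API inventory schema.
    """
    resource_class = "NIC_BW_%s_%s" % (physnet, direction)
    return {
        resource_class: {
            "total": total_kbps,       # total bandwidth neutron detected/configured
            "reserved": 0,             # nothing held back for the host itself
            "min_unit": 1,
            "max_unit": total_kbps,    # a single port may claim up to the whole NIC
            "step_size": 1,
            "allocation_ratio": 1.0,   # no overcommit for guarantees
        }
    }
```

So "this hypervisor has 10Gb total of NIC_BW_tenant_egress" becomes `nic_bw_inventory("tenant", "egress", 10_000_000)`, reported once per hypervisor (static, as clarified later: nova counts allocations against it).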
14:35:30 <ltomasbo> ajo_, why neutron needs to tell nova about static information (total NIC_BW)?
14:35:40 <ltomasbo> or with total you meant total_in_use?
14:35:41 <ajo_> ltomasbo, the alternative would be
14:35:49 <ajo_> nova finds the qos_policy_id on the port, as it does now
14:36:00 <ajo_> but then it has to fetch the policy and rules, and make its own interpretation of those rules
14:36:12 <ajo_> which could be complex as they grow
14:36:53 <ajo_> so after discussing with the nova team, it seems the more reasonable option was to give nova a pre-digested output (when nova creates, or fetches, a port)
14:37:10 <ltomasbo> so, if there is no QoS policy, there will be no information about the bandwidth of the hosts, right?
14:37:45 <ajo_> ltomasbo, I'm talking about the ports
14:37:56 <ajo_> when nova does a GET or POST of a port to neutron
14:38:14 <ajo_> we'd provide the breakdowns of NIC_BW per net/direction
14:38:24 <ajo_> and that'd be empty if there is no qos policy attached to the port
14:38:57 <ltomasbo> ahh, ok, I thought you were talking about host_bw
14:39:32 <ajo_> ah, no no :), that's on step 2, when neutron reports to nova placement API,
14:39:54 <ajo_> such reporting is static (it doesn't change as we add or remove ports; it's the total available)
14:40:10 <ajo_> nova will be responsible for counting down/up available traffic for this guarantee
14:40:26 <ajo_> and that's all
14:40:26 <ltomasbo> ok, now it is clear!
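The per-port side of the exchange — the "breakdowns of NIC_BW per net/direction" that a GET/POST of a port would return, empty when no QoS policy is attached — could be summarized like this. Purely illustrative: the function name and the rule-dict keys are assumptions, not the neutron API.

```python
def port_bw_requirements(min_bw_rules):
    """Aggregate a port's minimum-bandwidth rules into the pre-digested
    per-physnet/per-direction summary nova would consume.

    min_bw_rules: list of dicts with hypothetical keys
                  "physnet", "direction", "min_kbps".
    Returns {} for a port with no QoS policy attached, matching the
    behaviour described in the meeting.
    """
    breakdown = {}
    for rule in min_bw_rules:
        key = "NIC_BW_%s_%s" % (rule["physnet"], rule["direction"])
        breakdown[key] = breakdown.get(key, 0) + rule["min_kbps"]
    return breakdown
```

Nova then only has to subtract these numbers from the static inventory the hypervisor reported, without interpreting QoS rules itself.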
14:41:07 <ajo_> rnoriega, any question about all this? :)
14:41:32 <rnoriega> ajo_, not really! it was a very clear explanation of the current state...
14:41:35 <rnoriega> ajo_, thanks!!!
14:41:49 <rnoriega> ajo_++
14:41:51 <ajo_> ltomasbo, rnoriega , anyone, you're welcome to throw in an initial spec for this based on the steps :)
14:42:23 <rnoriega> ajo_, cool! let's see what we can do!
14:42:32 <ltomasbo> ok
14:43:23 <ajo_> #topic other RFEs
14:43:24 <ajo_> https://bugs.launchpad.net/neutron/+bugs?field.tag=qos+rfe+&field.tags_combinator=ALL
14:43:36 <ajo_> we had this one too:
14:43:37 <ajo_> https://bugs.launchpad.net/neutron/+bug/1614728
14:43:39 <openstack> Launchpad bug 1614728 in neutron "REF: qos: rule list in policy is too difficult to use" [Undecided,Won't fix]
14:44:10 <ajo_> which I have moved to Won't Fix for now, since we can't make changes to the API unless we get microversioning (one day)
14:44:41 <ajo_> I'm moving this other one to Won't fix for the same reason:
14:44:41 <ajo_> https://bugs.launchpad.net/neutron/+bug/1580149
14:44:43 <openstack> Launchpad bug 1580149 in neutron "[RFE] Rename API options related to QoS bandwidth limit rule" [Wishlist,Incomplete] - Assigned to Slawek Kaplonski (slaweq)
14:46:02 <ajo_> and, there's ECN which needs more maturing
14:46:23 <ajo_> I've heard informal requests to provide pps limits, right rnoriega?
14:46:44 <rnoriega> ajo_, yep
14:47:07 <ajo_> and also requests about, providing warnings on bandwidth usage
14:47:16 <rnoriega> ajo_, right too
14:47:21 <ajo_> but that's more likely to fit in https://bugs.launchpad.net/neutron/+bug/1592918
14:47:22 <openstack> Launchpad bug 1592918 in neutron "[RFE] Adding Port Statistics to Neutron Metering Agent" [Wishlist,In progress] - Assigned to Sana Khan (sana.khan)
14:47:36 <ajo_> we can look at it when we don't have so much on our shoulders
14:47:59 <rnoriega> ajo_, agreed. Thanks
14:48:08 <ajo_> #topic bugs
14:48:16 <ajo_> #link http://bit.ly/1WhXlzm BUGS
14:48:23 <ajo_> let's see if we have anything there to be tackled
14:48:48 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1627749 qos driver api can have better error handling
14:48:48 <openstack> Launchpad bug 1627749 in neutron "qos driver api can have better error handling" [Medium,Confirmed]
14:48:55 <ajo_> I agree; this one was filed by yamamoto
14:49:09 <ajo_> basically, we could have other backends, like midonet in this case
14:49:26 <ajo_> which could fail when we ask to modify a policy
14:49:34 <ajo_> and we need to handle that properly
14:49:39 <ajo_> right now the operation is just stopped
14:51:23 <ajo_> #link https://bugs.launchpad.net/python-neutronclient/+bug/1587291
14:51:26 <openstack> Launchpad bug 1587291 in python-neutronclient "Specifying '-F' or '--field' parameter in the qos related commands, returns abnormal result" [Low,In progress] - Assigned to Yan Songming (songmingyan)
14:51:28 <ajo_> this one is in progress, probably needs reviews
14:51:45 <ajo_> #link https://review.openstack.org/#/c/326902/
14:51:48 <ajo_> ohh, it's merged
14:53:08 <ajo_> #link https://bugs.launchpad.net/neutron/+bug/1625570 testing DSCP in fullstack via packet inspection
14:53:09 <openstack> Launchpad bug 1625570 in neutron "fullstack : should add test of ensure traffic is using DSCP marks outbound " [Wishlist,New]
14:53:27 <ajo_> that one is wishlist, but important too, because now we only make sure rules are set properly, etc,
14:53:36 <ajo_> but we never check the outgoing traffic for DSCP marks
14:53:40 <njohnston> so in that case we're really testing ovs flows
14:53:42 <ajo_> njohnston, ^:)
14:53:56 <njohnston> not testing the neutron code
14:53:58 <ajo_> yes, we have some proposals to use pcap / tcpdump to check the real packets coming out of the VM
14:54:12 <ajo_> yes, but since those are our flows, we may want to test them
14:54:16 <ajo_> they can become broken
14:54:32 <ajo_> (incompatibilities with ovs-fw, or the new L2 flow pipeline (eventually))
14:54:40 <ajo_> the intent is... preventing the feature from being silently broken
14:54:50 <njohnston> Understood.  Is there any other packet capture code in fullstack?
14:55:04 <ajo_> njohnston, we have some experiments with tcpdump
14:55:15 <ajo_> and I discussed some options with jlibosva but not sure what's the status
14:55:19 <njohnston> Sounds fun!  And it frightens me to my core.
14:55:29 <ajo_> nah, it shall just be fun
14:55:33 <ajo_> 1) setup rules
14:55:49 <ajo_> 2) send packets from the first port to the 2nd (this is fullstack: no VM, no tempest)
14:56:20 <ajo_> 3) capture packets on the other side with a filter for the dscp flag
14:56:24 <ajo_> no packets -> rules don't work
14:56:28 <ajo_> packets -> rule works
14:57:47 <ajo_> and
14:58:04 <ajo_> I guess we can wrap up the meeting for today, .... wasn't it going to be short? ':]
14:58:11 <njohnston> :-D
14:58:21 <ltomasbo> :D
14:58:38 <ajo_> njohnston, ltomasbo, davidsha, see u around
14:58:44 <njohnston> thanks ajo_
14:58:46 <davidsha> Thanks! see ya!
14:58:54 <ajo_> thank you sirs!
14:58:56 <ltomasbo> Thanks! See you!
14:59:00 <ajo_> #endmeeting