13:05:01 <alexpilotti> #startmeeting hyper-v
13:05:02 <openstack> Meeting started Wed Jan 20 13:05:01 2016 UTC and is due to finish in 60 minutes. The chair is alexpilotti. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:05:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:05:06 <openstack> The meeting name has been set to 'hyper_v'
13:05:10 <claudiub> o/
13:05:22 <itoader> o/
13:05:22 * alexpilotti keeps on forgetting :-)
13:05:31 <abalutoiu> o/
13:05:35 <alexpilotti> next time I'll try #deplymeeting
13:06:05 <alexpilotti> #topy networking-hyperv parallel ACL execution
13:06:12 <alexpilotti> #topic networking-hyperv parallel ACL execution
13:06:27 * alexpilotti reconnects fingers to brain
13:06:56 <alexpilotti> claudiub: would you like to share some thoughts on our recent benchmarks and results?
13:07:09 <claudiub> yeees, just a second, i have some numbers
13:07:39 <primeministerp> o/
13:07:41 <alexpilotti> Sonu__, kvinod: are you guys with us today?
13:07:52 <kvinod> I would also like to discuss the following from the neutron front
13:07:56 <claudiub> https://docs.google.com/spreadsheets/d/1DJtBFFan72HUOjXpg4_ojOD592OgQQiuXzarRtjPbjg/edit#gid=0
13:08:00 <Sonu__> I am here.
13:08:02 <kvinod> Microsoft certification for OVS.
13:08:10 <kvinod> Help on using VXLAN with OVS on Hyper-V
13:08:18 <kvinod> Discuss PyMI results
13:08:42 <alexpilotti> kvinod: ok, adding all to the agenda
13:08:47 <claudiub> ok, so, I've run a couple of neutron-hyperv-agent scenarios: WMI, PyMI, PyMI + native threads.
13:08:49 <kvinod> thanks
13:09:13 <claudiub> and at the given link there are some graphs, for a simpler visualization of the results. :)
13:09:32 <alexpilotti> drumroll...
13:09:46 <claudiub> first of all, there seems to be about a ~2.5x execution time difference between WMI and PyMI.
13:09:56 <claudiub> PyMI being the faster option.
13:10:08 <sagar_nikam> claudiub: on the graph, which line is PyMI and which one is WMI?
13:10:29 <claudiub> see the legend on the right. :) red is PyMI.
13:10:48 <Sonu__> for processing the 75th port, does it mean it takes 70 sec with PyMI and 32 native threads?
13:10:50 <alexpilotti> also: lower is better
13:11:00 <Sonu__> can I interpret it like that?
13:11:31 <Sonu__> Basically, what is the execution time on the Y-axis? Sorry.
13:11:37 <claudiub> anyways, as for native threads + PyMI, there is another ~6x improvement compared to origin/master + PyMI, so, in total, around a ~13x improvement.
13:12:21 <claudiub> as for the number of threads used, it seems to me that 10 native threads is the most optimal solution
13:12:52 <claudiub> as there isn't a lot of performance gain between 10 native threads and 20 or 32.
13:13:25 <Sonu__> Sorry for asking again: what is execution time?
13:13:40 <Sonu__> is it the time the VM started pinging?
13:13:56 <alexpilotti> Sonu__: this is a specific networking-hyperv test
13:14:14 <alexpilotti> Sonu__: so the time it takes to perform the operation
13:14:23 <claudiub> it is the time when the Nth port was processed: bound to the vSwitch, VLAN tagged, SG ACLs added.
13:14:24 <alexpilotti> Sonu__: this is not a Rally test
13:15:12 <Sonu__> claudiub: Thanks for the answer
13:15:28 <claudiub> also, from what I've seen, having a higher worker count can lead to a higher number of random 32775 Hyper-V exceptions
13:15:40 <claudiub> which means that the port will have to be reprocessed
13:15:46 <Sonu__> We faced it too. And that happened in a single process context
13:16:12 <claudiub> this basically means that vmms can't handle the amount of things we're trying to push into it. :)
13:16:14 <Sonu__> In our case, we had spawned independent processes, so no such exceptions were seen.
13:16:31 <Sonu__> within a single process, the VMM handle will have a limit.
13:17:13 <claudiub> Sonu__: in the multiprocessing patch you had, I couldn't see the logging output from the workers.
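The native-thread worker model benchmarked above (a fixed pool of OS threads processing ports concurrently, with ~10 workers reported as the sweet spot) can be sketched roughly as follows. This is an illustrative sketch, not the actual networking-hyperv implementation: `bind_port` is a hypothetical stand-in for the real per-port work (vSwitch binding, VLAN tagging, SG ACLs).

```python
# Sketch of a fixed native-thread pool draining per-port work, assuming a
# callable `bind_port` that does the real processing (illustrative only).
from concurrent.futures import ThreadPoolExecutor


def process_ports(ports, bind_port, workers=10):
    """Process ports concurrently on native threads; results keep input order.

    10 workers matched the benchmark's sweet spot: beyond that, the gain
    between 10, 20, and 32 threads was reported as marginal.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(bind_port, ports))
```

Because PyMI releases the GIL around MI calls, native threads like these can actually overlap the WMI round-trips instead of serializing on the interpreter lock.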
13:17:44 <alexpilotti> Sonu__: processes or threads have nothing to do with 32775 exceptions
13:17:49 <claudiub> there could have been exceptions, but they were only visible in stdout, not in the log file.
13:17:50 <Sonu__> I know, but a retry would have happened if such an exception occurred.
13:18:10 <Sonu__> we minimized retries to almost 0.
13:18:35 <Sonu__> but the improvement I see using PyMI is great
13:19:03 <alexpilotti> Sonu__: thanks
13:19:14 <Sonu__> and with native threads, you seem to get closer to the benchmark required to scale up.
13:19:23 <alexpilotti> this one (networking-hyperv) is just a part of the whole picture
13:19:31 <Sonu__> yes.
13:19:44 <Sonu__> the enhanced RPC patch I posted will improve this further.
13:20:01 <Sonu__> because the neutron server has to do a lot less work with enhanced RPC.
13:20:04 <alexpilotti> claudiub: did you include Sonu__'s RPC patch in those tests?
13:20:12 <Sonu__> no.
13:20:14 <claudiub> alexpilotti: haven't.
13:20:19 <sagar_nikam> alexpilotti: are these improvements only implemented now in networking-hyperv?
13:20:24 <alexpilotti> Sonu__: BTW merging your RPC patch is next on the todo list
13:20:45 <alexpilotti> sagar_nikam: not sure what you mean
13:21:00 <sagar_nikam> i meant, are these changes in os-win as well?
13:21:13 <claudiub> sagar_nikam: so, PyMI can be used on any other branch, the 2.5x benefit will still be the same.
13:21:26 <sagar_nikam> ok
13:21:30 <alexpilotti> sagar_nikam: they work on both os-win and networking-hyperv
13:21:35 <claudiub> sagar_nikam: as for the native threads, they are only on master at the moment.
13:21:53 <alexpilotti> while the threading patch is in networking-hyperv
13:21:59 <sagar_nikam> so the native threads change still needs to be implemented in os-win?
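The reprocessing behavior discussed above (a port operation is simply retried when VMMS raises the transient 32775 error under load) can be sketched as a small retry loop. This is a hedged illustration: `process_port`, `VMMSBusyError`, and the retry parameters are hypothetical names, not networking-hyperv's actual API.

```python
# Sketch of the retry-on-32775 pattern discussed above. The error code is the
# one mentioned in the meeting; everything else is an illustrative placeholder.
import time

HYPERV_VMMS_BUSY = 32775  # transient Hyper-V/VMMS error seen under high concurrency


class VMMSBusyError(Exception):
    """Raised (hypothetically) when VMMS returns error 32775."""

    def __init__(self, code=HYPERV_VMMS_BUSY):
        super().__init__("VMMS returned error %d" % code)
        self.code = code


def process_port_with_retry(process_port, port_id, retries=3, delay=0.0):
    """Run a port operation, reprocessing it when VMMS reports the busy error."""
    for attempt in range(1, retries + 1):
        try:
            return process_port(port_id)
        except VMMSBusyError:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay)  # optional back-off before reprocessing the port
```

As noted in the discussion, higher worker counts make this error more frequent, so the retry budget and the pool size trade off against each other.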
13:22:13 <alexpilotti> nope, because it's one layer above
13:22:27 <alexpilotti> os-win is agnostic on this
13:22:40 <sagar_nikam> ok
13:23:03 <sagar_nikam> so we need to do these changes in nova, cinder, networking-hyperv etc
13:23:11 <sagar_nikam> i mean the nova driver
13:23:14 <sagar_nikam> for hyperv
13:23:25 <alexpilotti> the nova driver case is now under investigation
13:23:32 <alexpilotti> the operations there are parallel
13:23:39 <sagar_nikam> ok
13:23:40 <alexpilotti> thanks to eventlet
13:24:06 <alexpilotti> PyMI allows parallelism even with greenlets by releasing the GIL before doing MI calls
13:24:33 <alexpilotti> we need some extra testing to ensure that we don't need some extra monkey_patching
13:25:21 <alexpilotti> that's what's usually referred to as "greening" a module
13:25:51 <alexpilotti> but so far parallelism has already improved a lot, even in Nova
13:27:15 <alexpilotti> thanks to PyMI, we managed to identify with a profiler where other bottlenecks are
13:27:26 <alexpilotti> most of them are in "associator" calls
13:28:01 <alexpilotti> so abalutoiu is currently working on a patch
13:28:17 <alexpilotti> that gives another good boost
13:28:48 <alexpilotti> based on Rally tests we're now much closer to KVM times
13:29:00 <alexpilotti> anything else to add on this topic?
13:29:04 <claudiub> anyways, if you are going to test the native threads on networking-hyperv for yourselves, make sure you have the latest PyMI version.
13:29:29 <alexpilotti> pip install -U pymi
13:29:57 <alexpilotti> I'm moving on
13:30:08 <alexpilotti> #topic rally and benchmarks
13:30:54 <alexpilotti> is Thalabathy with us?
13:31:13 <Thala> alexpilotti: Yes
13:31:17 <alexpilotti> cool
13:31:38 <alexpilotti> so first, thanks for sharing your results
13:31:53 <alexpilotti> what is the primary objective of your test runs?
13:32:10 <alexpilotti> e.g.: stability, benchmarking, etc
13:32:51 <Thala> alexpilotti: to check how much concurrency i can ...
so that load gets introduced on the openstack components
13:32:59 <Sonu__> alexpilotti: I would assume stability, long hours of operations, concurrency (private cloud)
13:33:28 <Sonu__> benchmarking is definitely an outcome that we wish to publish, and rally can be quite useful.
13:33:30 <Thala> alexpilotti: based on this I can conclude how many users can log in and create their objects
13:33:45 <alexpilotti> Thala: for that, in order to have some data to support the results, some improvements are needed
13:34:42 <alexpilotti> the first scenario is to validate how many VM operations can be sustained by the environment
13:35:42 <alexpilotti> an ideal curve shows a correlation between the number of VMs and time that is at most linear
13:36:13 <alexpilotti> anything above that means that there are some bottlenecks to deal with
13:36:54 <Thala> alexpilotti: I can sometimes see that the egress packets are not going out
13:37:15 <alexpilotti> Thala: out from the VMs?
13:37:29 <Thala> alexpilotti: because of this, dhcp requests are not reaching the openstack network node
13:37:30 <alexpilotti> or from the host?
13:37:53 <alexpilotti> when this happens, are the VLANs properly set on the ports?
13:37:57 <Thala> correct, when i refer to egress from the VM out
13:38:06 <Thala> Yes
13:38:30 <alexpilotti> we found a Hyper-V bug where, even when WMI reports that the VLAN tag has been properly applied, it's not
13:38:38 <Thala> I used to see this issue even with older releases of openstack
13:38:47 <alexpilotti> this happens in 3-5% of the cases
13:38:54 <kvinod> Thala: I feel our requirement was to get the same results from Hyper-V as we got from KVM
13:39:42 <alexpilotti> we have a patch in progress for this, as it's a very annoying bug
13:40:11 <kvinod> and we carried out the same test on KVM and Hyper-V, in which we observed Hyper-V not giving the same result
13:40:15 <Thala> kvinod: agreed, but for these kinds of issues we do not have a workaround
13:40:17 <alexpilotti> basically we wait for the WMI event telling us that the port actually got set and repeat the operation if not
13:40:44 <sagar_nikam> alexpilotti: is the patch in networking-hyperv?
13:41:04 <alexpilotti> sagar_nikam: it's under development now, will be in networking-hyperv
13:41:24 <sagar_nikam> ok
13:42:01 <alexpilotti> another thing is that the tests require some extra work on the hosts before running them
13:42:03 <sagar_nikam> thala: do your tests show the improvement which claudiub's tests achieved?
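The workaround described above (don't trust WMI's first acknowledgment that the VLAN tag was applied; verify, and repeat the operation until it actually takes) can be sketched as a verify-and-retry loop. This is a hedged sketch of the pattern only: `set_vlan` and `vlan_is_applied` are hypothetical placeholders for the real os-win/networking-hyperv calls, not their actual names.

```python
# Sketch of the verify-and-repeat workaround for the Hyper-V bug where WMI
# reports a VLAN tag as applied (~3-5% of the time) when it is not.
# `set_vlan` and `vlan_is_applied` are illustrative placeholders.
def ensure_vlan(port_id, vlan_id, set_vlan, vlan_is_applied, attempts=5):
    """Apply a VLAN tag and re-apply until verification confirms it took."""
    for _ in range(attempts):
        set_vlan(port_id, vlan_id)
        if vlan_is_applied(port_id, vlan_id):  # don't trust the first WMI ack
            return True
    return False  # give up; caller should log and reschedule the port
```

In the actual patch being discussed, the confirmation comes from a WMI event rather than a polling check, but the control flow is the same: apply, confirm, repeat on failure.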
13:42:35 <alexpilotti> as a minimum: full Windows updates, "high performance" power scheme
13:42:42 <sagar_nikam> alexpilotti: can you share the extra work on the hosts that needs to be done?
13:43:12 <alexpilotti> sagar_nikam: it's all in the email thread
13:43:20 <sagar_nikam> ok
13:44:07 <alexpilotti> for the performance scheme: PowerCfg.exe /S 8C5E7FDA-E8BF-4A96-9A85-A6E23A8C635C
13:44:30 <alexpilotti> this alone gives roughly a 10% boost
13:44:51 <alexpilotti> Windows updates are definitely necessary
13:45:13 <alexpilotti> especially on an old OS, like 2012 R2
13:45:36 <Thala> alexpilotti: agreed, will be taken care of in the next execution...
13:45:49 <Thala> power management too
13:46:09 <alexpilotti> Thala: thanks
13:46:39 <kvinod> alexpilotti: one question
13:46:45 <alexpilotti> kvinod: sure
13:47:20 <kvinod> we will take care of doing all those and try to test again
13:48:03 <alexpilotti> great, thanks
13:48:08 <kvinod> but what we saw in our previous run was that things start failing after we reach a certain no. of VMs
13:48:18 <Sonu__> and the native thread fixes too.
13:48:34 <claudiub> and update PyMI. :)
13:48:41 <alexpilotti> kvinod: we need to see the logs for that
13:48:44 <kvinod> so, my question is: have you guys tried booting more than 1000 VMs
13:49:08 <alexpilotti> we booted more than 200 per host
13:49:09 <kvinod> and seen if you get 100% results
13:49:47 <kvinod> ok, with how many users and how many computes
13:49:54 <kvinod> ?
13:50:01 <alexpilotti> the only errors that we see are due to the Hyper-V bug mentioned above
13:50:24 <alexpilotti> around 3%, you can see it in the link attached to the last email thread as an example
13:50:38 <alexpilotti> we ran the tests using 2 compute nodes
13:50:49 <kvinod> so 400 VMs in total
13:51:10 <alexpilotti> we don't care about the number of users, as that's not relevant for Hyper-V or the Nova driver
13:51:28 <alexpilotti> the only thing that matters at the driver level is the number of concurrent operations
13:51:30 <claudiub> kvinod: from what I've seen from the logs I've sent, the VMs couldn't be spawned because the neutron port could not be created
13:51:32 <kvinod> I was interested in knowing how your test environment behaves with 1000 VMs with all fixes included
13:51:53 <claudiub> kvinod: wondering if you have a certain quota set on neutron?
13:52:05 <alexpilotti> kvinod: again, the number of ports that matters is the number of VMs per node
13:52:17 <alexpilotti> if you have 1000 VMs on 10 compute nodes
13:52:45 <alexpilotti> assuming a uniform distribution you'll end up with 100 VMs per host
13:53:18 <Sonu__> yes, but the number of ACLs will be different when you have 1000 VMs in a security group.
13:53:18 <alexpilotti> the tests that you are doing are more significant for stressing the controller
13:53:27 <kvinod> yes
13:53:44 <Sonu__> so on a compute node, you may have to handle cyclic rules for all 1000 VMs.
13:53:49 <kvinod> not only the controller but the agent
13:53:54 <kvinod> also
13:53:59 <alexpilotti> Sonu__: not really: the ACLs applied on a port depend on the number of rules in the SG
13:54:09 <alexpilotti> kvinod: the agent won't care
13:54:10 <kvinod> in terms of applying rules
13:54:10 <Sonu__> we don't try 1000, but we have 250 VMs in a single security group.
13:54:49 <Sonu__> the default security group is what we use in our cloud deployment. That becomes a real challenge.
13:54:50 <alexpilotti> how many rules are in the SG?
13:55:21 <sagar_nikam> thala: after your next tests, will you be able to plot a graph of WMI vs PyMI? in the same way as claudiub's graphs
13:55:52 <Thala> sagar_nikam: as of now, not planned
13:55:57 <alexpilotti> if you have an SG with e.g. TCP SSH and RDP ingress with 0.0.0.0/0, that's just 4 ACLs
13:56:04 <kvinod> around 150 to 200 rules
13:56:44 <alexpilotti> kvinod: can you share how you create the SG?
13:56:45 <Thala> kvinod: the default security group works in a different way than a normal custom security group
13:56:55 <Sonu__> we use the default security group. And we have cyclic rules for each member by protocol.
13:57:09 <Sonu__> last 3 minutes.
13:57:52 <alexpilotti> Sonu__: ok, we'll reproduce your same config!
13:58:01 <Sonu__> thanks
13:58:05 <Thala> with the default security group, one additional rule is introduced on all the VM ports when a new VM gets spawned
13:58:30 <alexpilotti> 2 minutes to go
13:58:33 <kvinod> alexpilotti: would be good if you can give us some update on OVS certification from Microsoft
13:58:37 <Thala> it's bound to the openstack projects
13:58:48 <alexpilotti> quick answer on that
13:58:57 <alexpilotti> #topic OVS certification
13:59:16 <alexpilotti> we are going to get the 2.5 OVS release signed by MSFT (WHQL)
13:59:38 <Sonu__> when will OVS 2.5 be released?
13:59:47 <alexpilotti> it's due in a few weeks (unless the OVS TPL changes plans)
13:59:53 <Sonu__> great. thanks
14:00:02 <kvinod> thanks
14:00:12 <alexpilotti> time's up, thanks guys!
14:00:16 <alexpilotti> #endmeeting