13:05:01 <alexpilotti> #startmeeting hyper-v
13:05:02 <openstack> Meeting started Wed Jan 20 13:05:01 2016 UTC and is due to finish in 60 minutes. The chair is alexpilotti. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:05:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:05:06 <openstack> The meeting name has been set to 'hyper_v'
13:05:10 <claudiub> o/
13:05:22 <itoader> o/
13:05:22 * alexpilotti keeps on forgetting :-)
13:05:31 <abalutoiu> o/
13:05:35 <alexpilotti> next time I'll try #deplymeeting
13:06:05 <alexpilotti> #topy networking-hyperv parallel ACL execution
13:06:12 <alexpilotti> #topic networking-hyperv parallel ACL execution
13:06:27 * alexpilotti reconnects fingers to brain
13:06:56 <alexpilotti> claudiub: would you like to share some thoughts on our recent benchmarks and results?
13:07:09 <claudiub> yeees, just a second, i have some numbers
13:07:39 <primeministerp> o/
13:07:41 <alexpilotti> Sonu__, kvinod: are you guys with us today?
13:07:52 <kvinod> I would also like to discuss the following from the neutron front
13:07:56 <claudiub> https://docs.google.com/spreadsheets/d/1DJtBFFan72HUOjXpg4_ojOD592OgQQiuXzarRtjPbjg/edit#gid=0
13:08:00 <Sonu__> I am here.
13:08:02 <kvinod> Microsoft certification for OVS.
13:08:10 <kvinod> Help on using VXLAN with OVS on Hyper-V
13:08:18 <kvinod> Discuss PyMI results
13:08:42 <alexpilotti> kvinod: ok, adding all to the agenda
13:08:47 <claudiub> ok, so, I've run a couple of neutron-hyperv-agent scenarios: WMI, PyMI, PyMI + native threads.
13:08:49 <kvinod> thanks
13:09:13 <claudiub> and at the given link there are some graphs, for a simpler visualization of the results. :)
13:09:32 <alexpilotti> drumroll...
13:09:46 <claudiub> first of all, there seems to be about a ~2.5x execution time difference between WMI and PyMI.
13:09:56 <claudiub> PyMI being the faster option.
13:10:08 <sagar_nikam> claudiub: on the graph, which line is PyMI and which one is WMI?
13:10:29 <claudiub> see the legend on the right. :) red is PyMI.
13:10:48 <Sonu__> for processing the 75th port, does it mean it takes 70 sec with PyMI and 32 native threads?
13:10:50 <alexpilotti> also: lower is better
13:11:00 <Sonu__> can I interpret it like that?
13:11:31 <Sonu__> Basically, what is the execution time on the Y-axis? Sorry.
13:11:37 <claudiub> anyways, as for native threads + PyMI, there is another ~6x improvement compared to origin/master + PyMI, so, in total, around a ~13x improvement.
13:12:21 <claudiub> as for the number of threads used, it seems to me that 10 native threads is the most optimal solution
13:12:52 <claudiub> as there isn't a lot of performance gain between 10 native threads and 20 or 32.
13:13:25 <Sonu__> Sorry for asking again: what is execution time?
13:13:40 <Sonu__> is it the time the VM started pinging?
13:13:56 <alexpilotti> Sonu__: this is a specific networking-hyperv test
13:14:14 <alexpilotti> Sonu__: so the time it takes to perform the operation
13:14:23 <claudiub> it is the time when the Nth port was processed: bound to the vSwitch, VLAN tagged, SG ACLs added.
13:14:24 <alexpilotti> Sonu__: this is not a Rally test
13:15:12 <Sonu__> claudiub: Thanks for the answer
13:15:28 <claudiub> also, from what I've seen, having a higher worker count can lead to a higher number of random 32775 Hyper-V exceptions
13:15:40 <claudiub> which means that the port will have to be reprocessed
13:15:46 <Sonu__> We faced it too. And that happened in a single process context
13:16:12 <claudiub> this basically means that vmms can't handle the amount of things we're trying to push into it. :)
13:16:14 <Sonu__> In our case, we had spawned independent processes, so no such exceptions were seen.
13:16:31 <Sonu__> within a single process, the VMM handle will have a limit.
13:17:13 <claudiub> Sonu__: in the multiprocessing patch you had, I couldn't see the logging output from the workers.
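The native-thread worker model benchmarked above (a fixed pool of OS threads processing ports concurrently, with ~10 workers reported as the sweet spot) can be sketched roughly as follows. This is an illustrative sketch, not the actual networking-hyperv implementation: `bind_port` is a hypothetical stand-in for the real per-port work (vSwitch binding, VLAN tagging, SG ACLs).

```python
# Sketch of a fixed native-thread pool draining per-port work, assuming a
# callable `bind_port` that does the real processing (illustrative only).
from concurrent.futures import ThreadPoolExecutor


def process_ports(ports, bind_port, workers=10):
    """Process ports concurrently on native threads; results keep input order.

    10 workers matched the benchmark's sweet spot: beyond that, the gain
    between 10, 20, and 32 threads was reported as marginal.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(bind_port, ports))
```

Because PyMI releases the GIL around MI calls, native threads like these can actually overlap the WMI round-trips instead of serializing on the interpreter lock.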
13:17:44 <alexpilotti> Sonu__: processes or threads have nothing to do with 32775 exceptions
13:17:49 <claudiub> there could have been exceptions, but they were only visible in stdout, not in the log file.
13:17:50 <Sonu__> I know, but a retry would have happened if such an exception occurred.
13:18:10 <Sonu__> we minimized retries to almost 0.
13:18:35 <Sonu__> but the improvement I see using PyMI is great
13:19:03 <alexpilotti> Sonu__: thanks
13:19:14 <Sonu__> and with native threads, you seem to get closer to the benchmark required to scale up.
13:19:23 <alexpilotti> this one (networking-hyperv) is just a part of the whole picture
13:19:31 <Sonu__> yes.
13:19:44 <Sonu__> the enhanced RPC patch I posted will improve this further.
13:20:01 <Sonu__> because the neutron server has to do a lot less work with enhanced RPC.
13:20:04 <alexpilotti> claudiub: did you include Sonu__'s RPC patch in those tests?
13:20:12 <Sonu__> no.
13:20:14 <claudiub> alexpilotti: haven't.
13:20:19 <sagar_nikam> alexpilotti: are these improvements only implemented now in networking-hyperv?
13:20:24 <alexpilotti> Sonu__: BTW merging your RPC patch is next on the todo list
13:20:45 <alexpilotti> sagar_nikam: not sure what you mean
13:21:00 <sagar_nikam> i meant, are these changes in os-win as well?
13:21:13 <claudiub> sagar_nikam: so, PyMI can be used on any other branch, the 2.5x benefit will still be the same.
13:21:26 <sagar_nikam> ok
13:21:30 <alexpilotti> sagar_nikam: they work on both os-win and networking-hyperv
13:21:35 <claudiub> sagar_nikam: as for the native threads, they are only on master at the moment.
13:21:53 <alexpilotti> while the threading patch is in networking-hyperv
13:21:59 <sagar_nikam> so the native threads change still needs to be implemented in os-win?
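The reprocessing behavior discussed above (a port operation is simply retried when VMMS raises the transient 32775 error under load) can be sketched as a small retry loop. This is a hedged illustration: `process_port`, `VMMSBusyError`, and the retry parameters are hypothetical names, not networking-hyperv's actual API.

```python
# Sketch of the retry-on-32775 pattern discussed above. The error code is the
# one mentioned in the meeting; everything else is an illustrative placeholder.
import time

HYPERV_VMMS_BUSY = 32775  # transient Hyper-V/VMMS error seen under high concurrency


class VMMSBusyError(Exception):
    """Raised (hypothetically) when VMMS returns error 32775."""

    def __init__(self, code=HYPERV_VMMS_BUSY):
        super().__init__("VMMS returned error %d" % code)
        self.code = code


def process_port_with_retry(process_port, port_id, retries=3, delay=0.0):
    """Run a port operation, reprocessing it when VMMS reports the busy error."""
    for attempt in range(1, retries + 1):
        try:
            return process_port(port_id)
        except VMMSBusyError:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay)  # optional back-off before reprocessing the port
```

As noted in the discussion, higher worker counts make this error more frequent, so the retry budget and the pool size trade off against each other.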
13:22:13 <alexpilotti> nope, because it's one layer above
13:22:27 <alexpilotti> os-win is agnostic on this
13:22:40 <sagar_nikam> ok
13:23:03 <sagar_nikam> so we need to do these changes in nova, cinder, networking-hyperv etc
13:23:11 <sagar_nikam> i mean the nova driver
13:23:14 <sagar_nikam> for hyperv
13:23:25 <alexpilotti> the nova driver case is now under investigation
13:23:32 <alexpilotti> the operations there are parallel
13:23:39 <sagar_nikam> ok
13:23:40 <alexpilotti> thanks to eventlet
13:24:06 <alexpilotti> PyMI allows parallelism even with greenlets by releasing the GIL before doing MI calls
13:24:33 <alexpilotti> we need some extra testing to ensure that we don't need some extra monkey_patching
13:25:21 <alexpilotti> that's what's usually referred to as "greening" a module
13:25:51 <alexpilotti> but so far parallelism has already improved a lot, even in Nova
13:27:15 <alexpilotti> thanks to PyMI, we managed to identify with a profiler where other bottlenecks are
13:27:26 <alexpilotti> most of them are in "associator" calls
13:28:01 <alexpilotti> so abalutoiu is currently working on a patch
13:28:17 <alexpilotti> that gives another good boost
13:28:48 <alexpilotti> based on Rally tests we're now much closer to KVM times
13:29:00 <alexpilotti> anything else to add on this topic?
13:29:04 <claudiub> anyways, if you are going to test the native threads on networking-hyperv for yourselves, make sure you have the latest PyMI version.
13:29:29 <alexpilotti> pip install -U pymi
13:29:57 <alexpilotti> I'm moving on
13:30:08 <alexpilotti> #topic rally and benchmarks
13:30:54 <alexpilotti> is Thalabathy with us?
13:31:13 <Thala> alexpilotti: Yes
13:31:17 <alexpilotti> cool
13:31:38 <alexpilotti> so first, thanks for sharing your results
13:31:53 <alexpilotti> what is the primary objective of your test runs?
13:32:10 <alexpilotti> e.g.: stability, benchmarking, etc
13:32:51 <Thala> alexpilotti: to check how much concurrency i can ...
so that load gets introduced on the openstack components
13:32:59 <Sonu__> alexpilotti: I would assume stability, long hours of operations, concurrency (private cloud)
13:33:28 <Sonu__> benchmarking is definitely an outcome that we wish to publish, and rally can be quite useful.
13:33:30 <Thala> alexpilotti: based on this I can conclude how many users can log in and create their objects
13:33:45 <alexpilotti> Thala: for that, in order to have some data to support the results, some improvements are needed
13:34:42 <alexpilotti> the first scenario is to validate how many VM operations can be sustained by the environment
13:35:42 <alexpilotti> an ideal curve shows a correlation between the number of VMs and time that is at most linear
13:36:13 <alexpilotti> anything above that means that there are some bottlenecks to deal with
13:36:54 <Thala> alexpilotti: I can sometimes see that the egress packets are not going out
13:37:15 <alexpilotti> Thala: out from the VMs?
13:37:29 <Thala> alexpilotti: because of this, dhcp requests are not reaching the openstack network node
13:37:30 <alexpilotti> or from the host?
13:37:53 <alexpilotti> when this happens, are the VLANs properly set on the ports?
13:37:57 <Thala> correct, when i refer to egress from the VM out
13:38:06 <Thala> Yes
13:38:30 <alexpilotti> we found a Hyper-V bug where, even when WMI reports that the VLAN tag has been properly applied, it's not
13:38:38 <Thala> I used to see this issue even with older releases of openstack
13:38:47 <alexpilotti> this happens in 3-5% of the cases
13:38:54 <kvinod> Thala: I feel our requirement was to get the same results from Hyper-V as we got from KVM
13:39:42 <alexpilotti> we have a patch in progress for this, as it's a very annoying bug
13:40:11 <kvinod> and we carried out the same test on KVM and Hyper-V, in which we observed Hyper-V not giving the same result
13:40:15 <Thala> kvinod: agreed, but for these kinds of issues we do not have a workaround
13:40:17 <alexpilotti> basically we wait for the WMI event telling us that the port actually got set and repeat the operation if not
13:40:44 <sagar_nikam> alexpilotti: is the patch in networking-hyperv?
13:41:04 <alexpilotti> sagar_nikam: it's under development now, will be in networking-hyperv
13:41:24 <sagar_nikam> ok
13:42:01 <alexpilotti> another thing is that the tests require some extra work on the hosts before running them
13:42:03 <sagar_nikam> thala: do your tests show the improvement which claudiub's tests achieved?
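The workaround described above (don't trust WMI's first acknowledgment that the VLAN tag was applied; verify, and repeat the operation until it actually takes) can be sketched as a verify-and-retry loop. This is a hedged sketch of the pattern only: `set_vlan` and `vlan_is_applied` are hypothetical placeholders for the real os-win/networking-hyperv calls, not their actual names.

```python
# Sketch of the verify-and-repeat workaround for the Hyper-V bug where WMI
# reports a VLAN tag as applied (~3-5% of the time) when it is not.
# `set_vlan` and `vlan_is_applied` are illustrative placeholders.
def ensure_vlan(port_id, vlan_id, set_vlan, vlan_is_applied, attempts=5):
    """Apply a VLAN tag and re-apply until verification confirms it took."""
    for _ in range(attempts):
        set_vlan(port_id, vlan_id)
        if vlan_is_applied(port_id, vlan_id):  # don't trust the first WMI ack
            return True
    return False  # give up; caller should log and reschedule the port
```

In the actual patch being discussed, the confirmation comes from a WMI event rather than a polling check, but the control flow is the same: apply, confirm, repeat on failure.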
13:42:35 <alexpilotti> as a minimum: full Windows updates, "high performance" power scheme
13:42:42 <sagar_nikam> alexpilotti: can you share the extra work on the hosts that needs to be done?
13:43:12 <alexpilotti> sagar_nikam: it's all in the email thread
13:43:20 <sagar_nikam> ok
13:44:07 <alexpilotti> for the performance scheme: PowerCfg.exe /S 8C5E7FDA-E8BF-4A96-9A85-A6E23A8C635C
13:44:30 <alexpilotti> this alone gives roughly a 10% boost
13:44:51 <alexpilotti> Windows updates are definitely necessary
13:45:13 <alexpilotti> especially on an old OS, like 2012 R2
13:45:36 <Thala> alexpilotti: agreed, will be taken care of in the next execution...
13:45:49 <Thala> power management too
13:46:09 <alexpilotti> Thala: thanks
13:46:39 <kvinod> alexpilotti: one question
13:46:45 <alexpilotti> kvinod: sure
13:47:20 <kvinod> we will take care of doing all those and try to test again
13:48:03 <alexpilotti> great, thanks
13:48:08 <kvinod> but what we saw in our previous run was that things start failing after we reach a certain no. of VMs
13:48:18 <Sonu__> and the native thread fixes too.
13:48:34 <claudiub> and update PyMI. :)
13:48:41 <alexpilotti> kvinod: we need to see the logs for that
13:48:44 <kvinod> so, my question is: have you guys tried booting more than 1000 VMs
13:49:08 <alexpilotti> we booted more than 200 per host
13:49:09 <kvinod> and seen if you get 100% results
13:49:47 <kvinod> ok, with how many users and how many computes
13:49:54 <kvinod> ?
13:50:01 <alexpilotti> the only errors that we see are due to the Hyper-V bug mentioned above
13:50:24 <alexpilotti> around 3%, you can see it in the link attached to the last email thread as an example
13:50:38 <alexpilotti> we ran the tests using 2 compute nodes
13:50:49 <kvinod> so 400 VMs in total
13:51:10 <alexpilotti> we don't care about the number of users, as that's not relevant for Hyper-V or the Nova driver
13:51:28 <alexpilotti> the only thing that matters at the driver level is the number of concurrent operations
13:51:30 <claudiub> kvinod: from what I've seen from the logs I've sent, the VMs couldn't be spawned because the neutron port could not be created
13:51:32 <kvinod> I was interested in knowing how your test environment behaves with 1000 VMs with all fixes included
13:51:53 <claudiub> kvinod: wondering if you have a certain quota set on neutron?
13:52:05 <alexpilotti> kvinod: again, the number of ports that matters is the number of VMs per node
13:52:17 <alexpilotti> if you have 1000 VMs on 10 compute nodes
13:52:45 <alexpilotti> assuming a uniform distribution you'll end up with 100 VMs per host
13:53:18 <Sonu__> yes, but the number of ACLs will be different when you have 1000 VMs in a security group.
13:53:18 <alexpilotti> the tests that you are doing are more significant for stressing the controller
13:53:27 <kvinod> yes
13:53:44 <Sonu__> so on a compute node, you may have to handle cyclic rules for all 1000 VMs.
13:53:49 <kvinod> not only the controller but the agent
13:53:54 <kvinod> also
13:53:59 <alexpilotti> Sonu__: not really: the ACLs applied on a port depend on the number of rules in the SG
13:54:09 <alexpilotti> kvinod: the agent won't care
13:54:10 <kvinod> in terms of applying rules
13:54:10 <Sonu__> we don't try 1000, but we have 250 VMs in a single security group.
13:54:49 <Sonu__> the default security group is what we use in our cloud deployment. That becomes a real challenge.
13:54:50 <alexpilotti> how many rules are in the SG?
13:55:21 <sagar_nikam> thala: after your next tests, will you be able to plot a graph of WMI vs PyMI? in the same way as claudiub's graphs
13:55:52 <Thala> sagar_nikam: as of now, not planned
13:55:57 <alexpilotti> if you have an SG with e.g. TCP SSH and RDP ingress with 0.0.0.0/0, that's just 4 ACLs
13:56:04 <kvinod> around 150 to 200 rules
13:56:44 <alexpilotti> kvinod: can you share how you create the SG?
13:56:45 <Thala> kvinod: the default security group works in a different way than a normal custom security group
13:56:55 <Sonu__> we use the default security group. And we have cyclic rules for each member by protocol.
13:57:09 <Sonu__> last 3 minutes.
13:57:52 <alexpilotti> Sonu__: ok, we'll reproduce your same config!
13:58:01 <Sonu__> thanks
13:58:05 <Thala> with the default security group, one additional rule is introduced on all the VM ports when a new VM gets spawned
13:58:30 <alexpilotti> 2 minutes to go
13:58:33 <kvinod> alexpilotti: would be good if you can give us some update on OVS certification from Microsoft
13:58:37 <Thala> it's bound to the openstack projects
13:58:48 <alexpilotti> quick answer on that
13:58:57 <alexpilotti> #topic OVS certification
13:59:16 <alexpilotti> we are going to get the 2.5 OVS release signed by MSFT (WHQL)
13:59:38 <Sonu__> when will OVS 2.5 be released?
13:59:47 <alexpilotti> it's due in a few weeks (unless the OVS TPL changes plans)
13:59:53 <Sonu__> great. thanks
14:00:02 <kvinod> thanks
14:00:12 <alexpilotti> time's up, thanks guys!
14:00:16 <alexpilotti> #endmeeting