15:30:02 #startmeeting Performance Team
15:30:03 Meeting started Tue Sep 13 15:30:02 2016 UTC and is due to finish in 60 minutes. The chair is DinaBelova. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:30:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:30:07 The meeting name has been set to 'performance_team'
15:30:14 hey folks o/
15:30:20 o/
15:30:20 o/
15:30:24 o/
15:30:37 rohanion, rcherrueau o/
15:30:38 Hi,
15:30:44 hello
15:30:49 o/
15:31:04 JorgeCardoso nice to see you sir
15:31:11 ilyashakhat are you around today? :)
15:31:13 likewise :)
15:31:39 ok, so good evening / good morning :)
15:31:50 let's start with action items from the last meeting
15:31:57 #topic Action Items
15:32:07 msimonin rcherrueau - one of them was on you folks
15:32:12 add information about what monitoring tools were used to the https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:32:40 yep sorry we are doing it right now
15:32:54 msimonin ack, lemme keep this action item
15:33:00 hey everyone!
15:33:08 #action msimonin rcherrueau add information about what monitoring tools were used to the https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:33:17 rohanion o/
15:33:35 rohanion second action item is about you
15:33:46 publish osprofiler config patcher
15:33:55 rohanion afaik you pushed it somewhere
15:34:01 may you please share?
15:34:12 yes
15:34:29 https://github.com/rohanion/osprofiler-cfg
15:34:33 here we go
15:34:36 #link https://github.com/rohanion/osprofiler-cfg
15:34:54 is the readme good enough or should I change it?
15:35:27 rohanion I'd propose to add info that this needs to be run on all OpenStack nodes
15:35:40 rohanion and mention that services need to be restarted separately
15:35:46 as this depends much on the deployment
15:36:04 rohanion what do you think?
15:36:23 I guess it's pretty obvious... but will do, thanks
15:36:35 rohanion it's obvious for you :)
15:36:46 rohanion thanks
15:37:00 that's all about our action items for today
15:37:11 #topic Current progress on the planned tests
15:37:23 okay msimonin ad_rien_ rcherrueau - the floor is yours
15:37:29 thanks
15:37:35 I see some new details added to the etherpad
15:37:41 actually we have a lot of questions
15:37:47 :-)
15:37:51 #link https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:37:53 ad_rien_ sure
15:38:12 msimonin will start
15:38:14 so,
15:38:16 :)
15:39:02 first set of experiments: collect metrics of an "idle" openstack with different numbers of computes
15:39:28 this one is done
15:39:34 we are analysing the results
15:39:35 msimonin may you please clarify what "idle" means here?
15:39:44 DinaBelova: sure
15:40:11 For us idle means an openstack without any incoming requests
15:40:32 the goal is to measure the noise generated by openstack services
15:40:43 and maybe find some patterns
15:40:48 msimonin ack, gotcha
15:41:31 second set of experiments: collect metrics of a "non-idle" openstack with different numbers of computes
15:41:56 basically during rally benchmarks
15:42:13 msimonin judging by what I see in the etherpad - some timeouts happen?
15:42:21 which line ?
15:42:30 376
15:42:52 yep right this has to be explained
15:43:17 msimonin is this happening for boot VM I suppose?
15:43:39 the fact is that with a bigger deployment (1000 computes) we got some timeouts in the rally reports
15:44:03 msimonin was rally timing out waiting for servers to become active?
15:45:06 DinaBelova: Here is an example of an error we got: 'Timed out waiting for a reply to message ID b0628f97f1f942c1962859a3c290a731'
15:45:20 rcherrueau oh, interesting
15:45:52 Actually, we are working on these experiments now so we can dive into the details. I'm not sure whether it is the best time to discuss such points right now ;) Maybe we can just raise all our questions and see which ones need deeper investigation.
15:46:12 ad_rien_ probably yes
15:46:15 ok thanks
15:46:23 rcherrueau btw - where are you observing these logs?
15:46:28 the final question :)
15:46:36 :)
15:46:37 From the rally report
15:46:38 in rally? or in one of the services?
15:46:41 rcherrueau ack
15:47:04 Yes I have to dig into this and take a look at the logs of the other services
15:47:08 maybe rabbitmq
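[Editor's note: the error quoted above is the signature of oslo.messaging's MessagingTimeout - an RPC caller gave up waiting for a reply, a wait that is bounded by the rpc_response_timeout option (60 seconds by default). A minimal sketch for finding which service logs these timeouts cluster in; the log layout under /var/log is an assumption about the deployment:]

```python
# Count oslo.messaging reply timeouts per log file to see which service
# (nova-api, nova-scheduler, neutron, ...) hits them most often.
# The /var/log/**/*.log layout is an assumption about the deployment.
import collections
import glob
import re

pattern = re.compile(r"Timed out waiting for a reply to message ID (\w+)")
counts = collections.Counter()
for path in glob.glob("/var/log/**/*.log", recursive=True):
    with open(path, errors="ignore") as f:
        counts[path] = sum(1 for line in f if pattern.search(line))
for path, n in counts.most_common(10):
    if n:
        print(f"{n:6d}  {path}")
```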
15:47:25 ad_rien_ rcherrueau if you won't be able to find the root cause quickly, may you please send me rally logs / rally report?
15:47:31 sure
15:47:53 Can we switch to the next questions ;) ?
15:47:54 I'd jump to our rally folks :) they have observed all types of issues :)
15:47:58 ad_rien_ sure
15:48:35 DinaBelova: cool :)
15:48:45 OK we observed another issue. If we do not redeploy our testbed between scenarios, we get weird behaviors
15:49:09 For that point we need to better understand what happens….
15:49:19 maybe you can just tell us if on your side you saw such issues ?
15:49:51 We plan to launch the same scenario 10 times on a small deployment to see whether we get the same results
15:50:28 So DinaBelova others ? did you see such issues ? Maybe we should ask the rally team ?
15:50:29 folks, Dina's laptop froze to death, she's rebooting
15:50:37 lol
15:50:40 please wait for a couple of minutes
15:50:45 sorry
15:50:55 she's already tired of our questions
15:51:11 ;)
15:52:07 meanwhile AugieMena may I ask you if you received our email regarding the IBM scheduling presentation ?
15:52:37 folks, I'm back, sorry, my laptop went crazy
15:52:40 we are definitely interested in this work
15:52:44 ok
15:52:48 ad_rien_ may you please repeat everything I missed? :)
15:53:00 the last msg I see is about the second question
15:53:15 ad_rien_ Yes, got the email. I believe Chris K replied to that
15:53:52 We observed another issue. If we do not redeploy our testbed between scenarios, we get weird behaviors
15:53:52 For this point we need to better understand what happens….
15:53:52 Maybe you can just tell us if on your side you saw such issues ?
15:53:52 We plan to launch the same scenario 10 times on a small deployment to see whether we get the same results
15:53:52 So DinaBelova others ? did you see such issues ? Maybe we should ask the rally team ?
15:54:41 ad_rien_ usually we do not observe behavior changing - the rally clean up is usually enough
15:54:50 I mean the clean up done after every scenario
15:55:09 we also did that (actually Rally is doing that :()
15:55:18 ok we will confirm that for the next meeting
15:55:22 ad_rien_ - what kind of changes? andreykurilin - may you please try to help?
15:55:48 ad_rien_ ack
15:55:49 increase in the memory of the DB
15:55:54 DinaBelova ad_rien_: hi hi
15:56:06 Hi andreykurilin
15:56:22 ad_rien_ I'm a rally core and I'm ready to help
15:56:23 )
15:56:28 thanks
15:56:34 that's good news
15:57:07 Actually we would like to be able to reproduce such weird behaviors
15:57:12 ad_rien_ what DB? if Nova - rows for the deleted VMs will still be there :) with deleted == true
15:57:22 soft-delete ?
15:57:24 so we can come back to you with a concrete example
15:57:31 ok
15:57:31 ad_rien_ sure
15:57:52 ad_rien_ anything else?
15:58:05 so the state of openstack isn't the same when running a benchmark a second time ?
15:58:06 Since the DB is not in the same state, the question is whether there will be an impact on the performance we measured
15:58:45 Memcached is also not cleaned
15:59:33 msimonin ad_rien_ well, that's a complex system - if you always run tests on a purely new env, you'll always observe OpenStack warming up - with all caches not yet filled, etc.
15:59:35 so I think we need time to identify all aspects and get back to andreykurilin
15:59:50 ad_rien_: feel free to ping me here or at #openstack-rally (there you can find more friendly rally cores) :)
16:00:09 DinaBelova: yes that's why we would like to see whether we can ensure reproducibility criteria between experiments
16:00:11 so the ideal test case is to run every case twice (each pair on a 100% clean env) and count only the second attempt
16:00:22 andreykurilin: thanks
16:00:43 ad_rien_ yes, that's tricky
16:00:49 Ok
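[Editor's note: one likely source of the DB growth mentioned above is Nova's soft-deletion - deleted instances stay in the instances table with a non-zero `deleted` column rather than being removed, so the database grows across runs even when Rally's cleanup succeeds. A minimal sketch for checking this; host and credentials are placeholders, and `nova-manage db archive_deleted_rows` can move such rows into the shadow tables:]

```python
# Count soft-deleted instance rows left behind after benchmark runs.
# Host/credentials are placeholders for the deployment's Nova database.
import pymysql

conn = pymysql.connect(host="controller", user="nova",
                       password="secret", database="nova")
with conn.cursor() as cur:
    # Nova keeps deleted instances around with deleted != 0
    cur.execute("SELECT COUNT(*) FROM instances WHERE deleted != 0")
    print("soft-deleted instance rows:", cur.fetchone()[0])
conn.close()
```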
16:01:01 so the last question for today is related to the scheduling process on top of the fake driver
16:01:11 ad_rien_ sure
16:01:19 actually the fake driver presents 1000 vcpus and a lot of memory
16:01:36 we are wondering how 1000 VM instances are scheduled on top of 1000 fake drivers
16:02:25 Can all VMs be scheduled on the same fake driver ?
16:02:34 What prevents getting such a placement ?
16:03:54 ad_rien_ afaik the whole process is the same as for usual runs - the filter scheduler uses filters to schedule servers to the compute nodes; some of the filters define whether one node should be filled as a priority or all nodes should be filled in a more or less even fashion (nova has info about the resources already "eaten" on each hypervisor)
16:04:18 ad_rien_ I guess you need to take a look at the scheduler filters used
16:04:29 and see what kind of configurations are used
16:05:00 DinaBelova: i have a few networking questions, let me know when i can go
16:05:09 sai sure
16:05:19 Ok we will perform a small test to answer this question
16:05:28 thanks DinaBelova this is ok from our side
16:05:30 sai we'll jump soon to the open discussion
16:05:35 ad_rien_ ack, thanks
16:05:39 please keep us updated
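[Editor's note: an illustrative toy version (not Nova source) of the filter-scheduler mechanics described above - filters prune hosts that cannot fit the flavor, then a weigher decides whether load is spread across hosts or packed onto one, which is what determines placement across identical fake-driver hosts; all names and numbers are made up:]

```python
# Toy filter scheduler: filters prune hosts, then a weigher ranks survivors.
# A spreading weigher (prefer most free RAM) distributes the instances;
# a packing weigher (prefer least free RAM) would stack them on one host.
hosts = [{"name": f"fake-{i}", "free_ram_mb": 1 << 20} for i in range(5)]

def ram_filter(host, flavor, allocation_ratio=1.5):
    """Keep hosts whose available RAM (with overcommit) fits the flavor."""
    return host["free_ram_mb"] * allocation_ratio >= flavor["ram_mb"]

def schedule(flavor):
    candidates = [h for h in hosts if ram_filter(h, flavor)]
    best = max(candidates, key=lambda h: h["free_ram_mb"])  # spread
    best["free_ram_mb"] -= flavor["ram_mb"]                 # claim resources
    return best["name"]

flavor = {"ram_mb": 2048}
print([schedule(flavor) for _ in range(10)])  # spreads across fake hosts
```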
16:05:51 rohanion are you still around? ready for osprofiler?
16:06:01 always am!
16:06:04 #topic OSProfiler weekly update
16:06:19 rohanion please proceed
16:07:20 5 sec..
16:07:59 We're ready to merge the neutron patch https://review.openstack.org/#/c/342505/, waiting for CR from the neutron core team
16:08:44 cinder https://review.openstack.org/#/c/315676/ - one test to add and it's ready to merge too.
16:09:13 #link https://review.openstack.org/#/c/342505/
16:09:19 #link https://review.openstack.org/#/c/315676/
16:09:29 rohanion ack, thanks
16:09:37 it's not everything :)
16:09:52 it's just thanks :) you can proceed :D
16:09:56 glance https://review.openstack.org/#/c/316799/ - work in progress, debugging failed tests
16:10:07 #link https://review.openstack.org/#/c/316799/
16:10:30 sahara https://review.openstack.org/#/c/344370/ - nothing, no active development. will ask the sahara team for support
16:10:47 internal osprofiler stuff:
16:10:56 #link https://review.openstack.org/#/c/344370/
16:11:16 elasticsearch https://review.openstack.org/#/c/340936/ - on review, almost ready to merge. will finish reviewing that tomorrow
16:12:01 redis driver https://review.openstack.org/#/c/364487/ - I had no time to review it, will work on that tomorrow as well.
16:12:10 rohanion ack
16:12:46 influxdb driver - no active development. if the performance team decides that we have to implement it asap, I'll code it in an hour.
16:12:55 rohanion :D
16:13:03 you'll be quick :D
16:13:14 anything else sir?
16:13:23 1 hour writing - 1 week testing :)
16:13:34 no, that's everything regarding OSProfiler
16:13:43 rohanion thanks for the extended update
16:13:46 #topic Open Discussion
16:13:52 sai please go ahead
16:13:55 sure
16:14:19 so DinaBelova was wondering what kind of networking tests you are running currently and I could update with some of the network perf work we are doing at red hat
16:14:55 sai so mostly we're running data plane tests using shaker
16:15:05 ok shaker is what we are using too
16:15:13 sai lemme share a test plan
16:15:29 #link http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html
16:15:33 sai here it is ^^
16:15:38 cool
16:15:44 and the related report http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/index.html#tenant-networking-report-vxlan-dvr-200-nodes
16:15:45 any specific neutron features u r targeting?
16:15:53 we r working on DVR currently
16:16:03 sai for neutron features we have separate tests being run
16:16:08 and had some previous work looking at control plane performance tuning neutron workers
16:16:18 sai http://docs.openstack.org/developer/performance-docs/test_plans/hardware_features/index.html
16:16:30 as for DVR - we usually run with it, yes
16:16:45 sai some time ago we ran both DVR and non-DVR topologies
16:17:05 right now mostly DVR (as requested by Mirantis)
16:17:07 cool, i have some of those results, i can share upstream and contribute to the repo
16:17:13 the comparison
16:17:17 sai this will be perfect
16:17:40 so please submit a lab description here as well http://docs.openstack.org/developer/performance-docs/labs/index.html
16:17:43 sai ^^
16:17:51 so we can compare the test beds as well
16:18:31 DinaBelova: what im really curious about is the tunings that u put into the network tests
16:18:37 threads of netperf/iperf
16:18:40 and TCP packet sizes
16:18:43 or u run defaults
16:18:48 sai sorry I added a wrong link when targeting the neutron features - here is the right one http://docs.openstack.org/developer/performance-docs/test_plans/neutron_features/index.html
16:19:41 sai you can take a look here http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/index.html#tenant-networking-report-vxlan-dvr-200-nodes - if you click on each scenario - e.g. http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/perf_l2_dense/index.html - you'll see the full scenario
16:19:41 DinaBelova: because we are running uperf with different IP packet sizes (outside of shaker) and seeing better throughput at higher packet sizes
16:20:00 cool thanks DinaBelova
16:20:11 sai: we ran with defaults, but our networking guys tried different settings, mostly the number of threads
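[Editor's note: a minimal sketch of the kind of sweep discussed here - varying message size and stream count and recording throughput. It uses iperf3 rather than uperf/netperf; the target address and sweep values are illustrative:]

```python
# Sweep TCP buffer sizes and parallel streams with iperf3, report Gbit/s.
# Assumes 10.0.0.2 runs `iperf3 -s`; sizes/stream counts are examples only.
import json
import subprocess

for size in ("1K", "8K", "64K"):          # -l: length of read/write buffer
    for streams in (1, 4, 8):             # -P: number of parallel streams
        out = subprocess.run(
            ["iperf3", "-c", "10.0.0.2", "-l", size, "-P", str(streams), "-J"],
            capture_output=True, text=True, check=True).stdout
        gbps = json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e9
        print(f"size={size:>3} streams={streams} -> {gbps:.2f} Gbit/s")
```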
16:20:19 sai np, you're really welcome
16:20:22 hopefully i will contribute to performance-docs
16:20:37 and do i need to have a test plan or will just test results with a scenario help?
16:20:49 ilyashakhat_mobi: ack
16:20:59 sai: it would be interesting to run tests with the same parameters as you do (meaning packet size)
16:21:17 you can follow the existing plan or update it
16:21:38 it's not restricted :)
16:21:50 sai if your tests are a subset of http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html (just with different params) you can just push test results with other configs and explanations - if the scenarios are different, you can add yours to the test plan - http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html
16:22:14 DinaBelova: ilyashakhat_mobi https://github.com/openstack/browbeat/tree/master/rally/rally-plugins/nova-create-pbench-uperf
16:22:36 #link https://github.com/openstack/browbeat/tree/master/rally/rally-plugins/nova-create-pbench-uperf
16:22:43 that is the rally plugin we use combined with pbench (in-house but open-sourced)
16:22:53 kicks off uperf with varying packet sizes and threads
16:23:39 prints a rally summary with throughput results as well
16:23:46 so interesting data plane perf through rally
16:23:52 sai ok, so the tool is different - so you can add a new section to the test plan I believe
16:24:11 as it's still the same testing field (tenant networking) but really a different type of tool used
16:24:12 DinaBelova: my test results are with shaker
16:24:17 sai ah
16:24:19 I got it
16:24:25 but i have some other results too with the tool i mentioned
16:24:27 :)
16:24:50 i think thats all the network perf questions i have
16:25:10 sai :D ack :) so the shaker stuff can just be published as a test report - against your env with your configs; for the browbeat stuff I'd propose adding one more section to the test plan
16:25:30 and publish results regarding it
16:25:37 sai you're welcome
16:25:40 sure, we index shaker results into elastic
16:25:48 i can pull up the graphs thru kibana
16:25:50 :)
16:25:53 cool
16:25:57 that way we average out results
16:26:05 because im seeing varying results due to NUMA
16:26:12 when a vm is numa local vs non local
16:26:29 so we figured the best way is to rerun the same test multiple times and average out using elasticsearch/kibana
16:26:41 so we can get something really representative
16:26:41 sai I believe our networking folks were looking at this as well, but not sure
16:26:55 sai that's a good approach
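[Editor's note: the rerun-and-average idea in numbers - repeated samples of the same scenario can be bimodal (NUMA-local vs remote VM placement), so a single run is not representative. The values below are fabricated for illustration:]

```python
# Repeated throughput samples for one scenario, in Gbit/s (made-up numbers):
# fast runs are NUMA-local placements, slow ones cross NUMA nodes.
import statistics

runs = [9.4, 9.6, 5.1, 9.5, 5.0, 9.3]
print(f"mean={statistics.mean(runs):.2f}  stdev={statistics.stdev(runs):.2f}  "
      f"min={min(runs)}  max={max(runs)}")
```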
16:27:18 cool, is there a specific subset of openstack-network-perf somewhere on irc
16:27:23 or is this the best channel
16:27:30 i know ilyashakhat_mobi does networking perf
16:27:34 not sure who else
16:27:35 sai I believe that's the best channel
16:27:42 awesome
16:27:43 and ilyashakhat_mobi is probably the best person to ping
16:27:45 ill stick to this then
16:27:49 sure
16:28:12 ok, anything else here?
16:28:17 msimonin rcherrueau ad_rien_ ?
16:28:30 nope
16:28:31 nothing on my side
16:28:32 thanks
16:28:35 nope
16:28:37 thanks folks!
16:28:39 bye!
16:28:41 #endmeeting