15:30:02 #startmeeting Performance Team
15:30:03 Meeting started Tue Aug 30 15:30:02 2016 UTC and is due to finish in 60 minutes. The chair is DinaBelova. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:30:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:30:06 The meeting name has been set to 'performance_team'
15:30:12 hey folks :)
15:30:18 hey
15:30:21 hi
15:30:22 hope new time will be more comfortable :)
15:30:37 rohanion msimonin o/
15:30:58 let's wait for a few moments :)
15:31:09 Hey, hello :)
15:31:46 so let's get started with action items
15:31:47 #topic Action Items
15:32:01 last time we had only one in fact: find out what's slowing down OpenStack REST APIs in the Inria 1000 nodes experiment
15:32:24 yes
15:32:26 msimonin rcherrueau afaiu the issue was in DB settings?
15:32:35 yes it was
15:32:52 kolla deployment configures the mariadb with galera replication out of the box
15:33:12 in our primary tests we only have 1 mariadb instance
15:33:30 hiii
15:33:40 msimonin yep, to have comparable stuff
15:33:55 ok, so you found what was the difference even without any help :)
15:33:57 rohanion o/
15:34:10 so we can jump to the nice part :)
15:34:13 #topic Upcoming summit
15:34:28 o/
15:34:34 so from what I know: 1000 nodes emulation testing session was approved
15:34:40 lemme find a link
15:35:10 #link https://www.openstack.org/summit/barcelona-2016/summit-schedule/events/15977/chasing-1000-nodes-scale
15:35:30 so this means msimonin and Alex and I are going to have funny time preparing :)
15:35:42 yes :)
15:35:59 rcherrueau should contribute too
15:36:06 ad_rien_ ack
15:36:06 (even if his name does not appear ;) )
15:36:47 regarding the separated time slot for performance team discussions: WG sessions were not yet approved, although as technically we're part of LDT we may not capture separated slot as we were lucky to do last summit
15:36:48 Is there any other talk accepted for the performance WG ?
15:37:43 not that I know about
15:37:46 :)
15:38:19 so going back to the WG stuff: as we're part of LDT, we may not have separated slot
15:38:36 What do you mean ?
15:38:37 so I wonder if LDT will give us some time to present the results and collect feedback
15:38:41 klindgren ? ^^
15:38:59 ok
15:38:59 so
15:39:06 klindgren do you think it's possible?
15:39:12 I think our schedule is fairly open right now.
15:39:34 What is your idea DinaBelova ? you would like to present twice ?
15:39:52 klindgren ok, so I just want to have some backup variant if separated session/time slot won't be available
15:40:02 Or you plan to give a more synthetic presentation just for the LDT WG?
15:40:07 ad_rien_ the thing is that official talk != design summit
15:40:11 and usual discussions
15:40:15 yes
15:40:23 + 1000 nodes is only one set of tests
15:40:27 from what would be done
15:40:29 but we can expect that LDT key persons will attend the ''official'' presentation ?
15:40:39 so sharing the whole status will be nice I think
15:40:53 I think that should be fine. I haven't looked at the schedule to see the LDT slotted times.
15:41:00 ad_rien_ this depends much on the schedule (that's not defined yet :))
15:41:14 klindgren I did not see these slots being added and defined yet
15:41:19 klindgren that's the issue :)
15:42:12 klindgren so I have 0 idea on if my slot separated request for performance team was accepted or not :)
15:42:19 and that's also an issue :D
15:43:19 Ah I - C
15:43:40 #info right now schedule is not yet finalized, not clear if performance team will gather the separated time slot for discussions, or we can present our results using small part of LDT time slot: LDT agenda is fairly open right now
15:44:10 ok, so that's pretty all I know about upcoming summit in terms of performance team :)
15:44:21 does someone know something else?
15:45:03 it looks like nope :)
15:45:05 fyi, someone from my team (Chris Kirkland) had a performance presentation accepted
15:45:06 rook AugieMena ?
15:45:20 #link https://www.openstack.org/summit/barcelona-2016/summit-schedule/global-search?t=nova+scheduler
15:45:34 We have a Performance talk accepted as well /me gets link
15:45:43 rook ack
15:45:54 #link https://www.openstack.org/summit/barcelona-2016/summit-schedule/events/15424/a-nova-scheduler-for-public-cloud-scale
15:46:28 #link https://www.openstack.org/summit/barcelona-2016/summit-schedule/events/15279
15:46:45 rook ack, thanks :)
15:46:57 so we have some summit agenda already defined :)
15:47:25 btw, folks, who's going to attend the summit?
15:47:33 o/
15:47:42 msimonin I suppose you'll, and what's about rcherrueau ?
15:47:50 ad_rien_ ;)
15:47:50 should be there too
15:47:57 ok, cool, that's nice
15:48:00 AugieMena, rook ?
15:48:02 yep
15:48:05 I won't be there, but Chris will
15:48:12 xnegative
15:48:22 AugieMena ack, thanks for letting know
15:48:27 I do have a bunch of people from my team attending
15:48:32 Too close to the birth of my second son :)
15:48:46 rook wow, congratulations on upcoming EVENT
15:49:04 thanks!
15:49:18 congrats rook... also close to the birth of my first grandson :)
15:49:57 ahaha, it's an epidemic :)
15:50:14 ok, so it looks like we may proceed
15:50:15 #topic Current progress on the planned tests
15:50:27 msimonin rcherrueau may you please start?
15:50:42 yes sure
15:51:08 so with rcherrueau we deployed a 1000 compute nodes openstack
15:51:23 to make some preliminary tests and validate our experiment workflow
15:51:59 as we said before with the patch on the mariadb configuration, things were better
15:52:07 ok, so it looks like you're having nice progress
15:52:12 that's really cool
15:52:30 msimonin thanks
15:52:38 moreover :)
15:52:54 Next week we will try different topologies
15:53:07 with this 1000 computes (fake driver)
15:53:21 like adding more controllers
15:53:50 maybe we should discuss what could be relevant ?
15:53:58 msimonin good idea
15:54:27 because actually I discussed briefly with Steve Dake
15:54:40 from Kolla
15:55:10 and they are making some experimentation as well using a kolla deployed openstack and a set of rally benchmarks
15:55:52 msimonin ok, any insights from them?
15:56:16 let me share a link
15:56:26 #link https://etherpad.openstack.org/p/kolla-N-midcycle-osic
15:57:23 * DinaBelova trying to find the deployment topology in the doc
15:57:36 line 35
15:57:41 for the first scenario
15:57:58 a-ha, I see
15:58:07 and what's meant by controller in this case btw?
15:58:51 in the kolla terminology
15:59:29 control = nova-[api|scheduler], horizon, rabbitmq, mariadb, keystone
15:59:40 I'm asking as personally from what we've seen several control plane components need to be removed from controller nodes to separated ones if we're going up with scale (rabbitmq, mysql, conductor, neutron server, keystone)
15:59:56 so for their case (130 nodes) just all-in-one controller should be ok
16:00:15 but for 1000+ nodes scale I believe something needs to be separated
16:00:29 yes 1 control should be able to handle 100 compute node scale
16:00:49 yes actually that's the purpose of the experiment I'd like to conduct next week
16:01:23 having separated DB, rabbitmq, conductor, …
16:01:44 and see how it'll go :)
16:01:52 that's nice, thanks for doing this :)
16:02:18 To your knowledge is there some similar evaluation in the Openstack community ,
16:02:18 ?
16:02:50 msimonin I know operators were discussing the deployment topologies. LDT in particular
16:03:12 and everyone is having its own opinion on this :)
16:03:30 I can imagine easily :)
16:03:47 so we can just measure the numbers and present them and create some recommendations - but HOW to do it it's still up to operator
16:04:10 and sometimes there are internal reasons not to follow these recommendations :)
16:04:15 Maybe we can add such a point in the agenda of the working session in Barcelona
16:04:25 ad_rien_ good point
16:04:31 we have different scenarios in mind @Inria
16:04:56 the one msimonin explained and also some that include regions/multi sites deployments
16:05:13 #info let's go through 1000 nodes experiment result on Barcelona summit and present the deployment recommendations we'll come up with as a result of it
16:05:31 ad_rien_ ack
16:06:06 ok, so from Mirantis side: we have finished 400 nodes control plane / dataplane test runs, currently collecting the results
16:06:21 the thing is that in fact due to several hw issues it were 378 nodes, not 400 :(
16:06:37 but we decided not to wait till they will be fixed and run tests now
16:06:50 if we'll be able, we'll rerun them on 500 nodes at the end of sept
16:07:40 so what's upcoming: k8s + fuel-ccp (containerized control plane) evaluation, 1000 nodes emulation on 250 nodes
16:08:33 as for the first item: it's something requested by Mirantis folks, who are working on fuel-ccp and k8s, so we're interested on how k8s itself and openstack on top of it can scale
16:08:54 could you please clarify
16:08:58 I'm bit lost:
16:09:08 k8s + fuel ccp
16:09:23 are you deploying VMs or containers to emulate your 1000 nodes ?
16:09:25 ad_rien_ there is much effort here in Mirantis is spent on containerized OpenStack development
16:09:39 ok
16:09:42 and we (as scale team inside Mirantis) got a request to try it
16:09:47 ok
16:10:02 I'm just trying to understand
16:10:15 how are you emulating your 1000 nodes on top of 250 physical servers
16:10:37 ad_rien_ so that's something I was going to talk about now
16:10:47 cool
16:10:48 last time on small lab we used containers for this purpose
16:11:05 this time we really wanted to have non-fake driver
16:11:52 so we have several options now: use kolla/ccp for this purpose (on top of 1000 vms run against 250 nodes) or just fuel (again, on top of 1000 vms)
16:12:32 due to the fact we're pushed to use ccp, we're currently evaluating it on small scale
16:12:57 and we'll see what will be chosen in next 2 weeks or so
16:13:53 ad_rien_ this is much about politics and what's more useful for Mirantis as a company, but I really hope we'll grad these 1000 nodes cluster :D
16:14:01 grab*
16:14:32 so the main moment: we think there will be enough HW to run these 1000 nodes with usual libvirt Nova driver, not fake one
16:14:48 ok
16:14:56 it's clear thanks
16:15:14 and see how the picture will change comparing with what we've done previously on small lab with fake driver
16:15:19 ok, cool
16:15:27 so it looks like we can jump to the OSprofiler
16:15:32 #topic OSProfiler weekly update
16:15:38 rohanion the floor is yours :)
16:15:49 ok cool
16:16:41 I'm still working on a script that changes the config files and restarts the services
16:16:51 will finish it by the end of today
16:17:03 nothing besides that, unfortunately :(
16:17:19 rohanion you are about automation of osprofiler usage on Fuel-installed clouds?
16:17:33 yes
16:17:53 rohanion ack, thanks
16:18:00 but it will work with vanilla OS too
16:18:18 rohanion ok, thanks for mentioning this :)
16:18:20 I decided not to work with hiera and detect the role based on the services
16:19:15 rohanion any updates from Alex? (I know he could not attend today meeting)
16:19:34 No, he switched to another project
16:19:38 afair
16:20:03 rohanion ack
16:20:10 #topic Open Discussion
16:20:21 so do we have something else to cover?
16:21:15 it looks like nope :)
16:21:22 thanks everyone for participating :)
16:21:24 bye!
16:21:26 bye
16:21:28 #endmeeting