15:00:49 #startmeeting gantt
15:00:50 Meeting started Tue Jan 14 15:00:49 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 The meeting name has been set to 'gantt'
15:01:03 anyone here for the gantt/scheduler meeting?
15:01:09 yes
15:01:17 hi
15:01:24 hi
15:01:47 hi all
15:02:58 I think boris-42 normally gets here a little late so let's talk about the code forklift first
15:03:07 #topic scheduler code forklift
15:03:41 Don't know if you saw my email but I've got about 24 patches to bring the gantt tree synced up to recent nova changes
15:04:15 feelings seem to be that we should still review those sync up patches so I'll push them soon but that means there will be a lot of reviews needed
15:04:36 * coolsvap is here
15:04:38 just a warning that we will need to do those reviews
15:04:47 n0ano we are still fixing bugs & so on
15:05:02 n0ano and we started preparing demo with Rally
15:05:04 on second thought, let's switch topics
15:05:12 #topic no_db scheduler
15:05:23 n0ano so quick update
15:05:40 n0ano we are fixing unit tests that are related to DB code
15:05:56 n0ano and in parallel working around benchmarking it at scale
15:06:12 so the bugs are not that critical i guess
15:06:12 n0ano we will use Rally and some new functionality of it that is not yet merged
15:06:25 n0ano yep it works on home devstack installation
15:06:34 n0ano but need to pass all jenkins stuff
15:06:40 for sure
15:06:41 n0ano and verify results using Rally
15:06:53 n0ano so Rally is able to deploy multinode OpenStack deployment
15:07:07 n0ano and to deploy compute_nodes inside LXC containers
15:07:27 n0ano so we will deploy it probably on 1 controller + 1000 compute node
15:07:27 is there a lot of functionality that is unmerged? sounds like there will be a big change when your stuff goes in
15:07:59 n0ano where in Rally?
15:08:05 boris-42: what do you use for nova driver inside LXC?
15:08:20 toan-tran we will use virt fake
15:08:31 that will allow us to avoid usage of resources
15:08:44 toan-tran and run 1 compute node per 100mb of RAM
15:09:11 boris-42 ok
15:09:32 toan-tran it will be actually really simple to repeat on your pc
15:09:48 toan-tran so that's all
15:09:56 n0ano ^
15:10:12 not sure what your question was, I didn't mention rally
15:10:12 we've also tried LXC actually :)
15:10:29 but failed to use libvirt or lxc as driver
15:10:56 toan-tran don't try to do it=)
15:11:05 n0ano I mean all new scheduler is not merged
15:11:14 n0ano all patches are still on review
15:11:33 n0ano what we need is to pass all tests in gate
15:11:52 n0ano and to test how performance changed
15:12:08 then the `not merged functionality' is stuff that needs to be added to your no_db code, right?
15:14:28 boris-42, you there?
15:14:52 n0ano nope
15:15:06 n0ano it's functionality that is not merged in master, and is on review
15:15:37 n0ano these patches will add new scheduler https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/no-db-scheduler,n,z
15:15:44 n0ano when they will be merged
15:16:13 boris-42: i have been over the code - there are some cases where there are no test cases in the patch sets and they are added in sets after that. would it be possible to address that? it may speed things up a little
15:17:18 garyk could you just put this on review?
15:17:29 garyk yep sure we will address all such stuff
15:18:33 boris-42: sure. have already in some that i have been over, will do again
15:18:34 so, to be clear, these 5 patches implement the no_db scheduler, is that right?
15:18:50 n0ano yep
15:19:06 n0ano but the work is not ended
15:19:35 n0ano we should refactor code a bit + add data from cinder
15:19:44 cool, that will be a big accomplishment
15:20:01 sure, there's always more work but getting it in will be a big start
15:20:08 n0ano
15:20:10 yep
15:20:15 and we will make benchmarks as well
15:20:23 to prove that new approach is better
15:21:29 yeah, benchmarks will be needed, I think it's the right approach but you have to be able to measure it to prove it's the right way to go
15:21:41 boris-42 after the merge will all the functions nova.db.xxx still work?
15:22:15 toan-tran that's the part of refactoring
15:22:23 toan-tran we will remove them
15:22:24 for instance, does availability-zone filter's call for DB work or we have to modify the call?
15:22:56 I think that we will need to modify all this stuff
15:23:19 it's important to remove all DB-related stuff from scheduler logic
15:23:36 then we will be able to implement Gantt
15:23:45 can you make a proxy call from Synchronizer?
15:23:47 scheduler as a service
15:24:14 toan-tran seems like we will need for all such data (or to store it in scheduler)
15:24:51 e.g. some functions for compute still use nova DB , some host-state functions will be redirected by synchronizer
15:24:56 ?
15:24:57 two approaches available 1. store all on scheduler 2. make calls to data provider
15:26:07 toan-tran at this moment we moved compute_node/compute_node_stats tables to scheduler
15:26:47 toan-tran and we will remove compute_node* table and all db.api.calls that are related to it
15:27:00 boris-42 what about aggregate-hosts tables ?
15:28:38 toan-tran at the first moment we won't touch them
15:28:46 toan-tran then we will refactor
15:29:17 boris-42, funny, that's what I thought you'd say :-)
15:30:03 sounds good, any other questions for boris-42 ?
15:30:22 boris-42 my concern is that some of the tables, like aggregates - hosts - metadata are used in more than one nova service
15:30:54 as I understand, you will make others call nova-scheduler to consult them
15:30:58 is that correct?
15:31:10 toan-tran, which means the refactoring to deal with that will be a little tricky, including new calls to the scheduler
15:31:22 n0ano yep
15:31:40 n0ano because current approach with AMOUNT_OF_SERVERS calls will not scale at all
15:31:48 toan-tran ^
15:32:37 toan-tran so by refactoring I am thinking about redesigning it
15:32:45 toan-tran to work without N calls
15:32:58 boris-42 understood
15:33:17 toan-tran to explain why it doesn't scale you can just calculate
15:33:37 toan-tran 2k servers * 100 (instance to run) == 200k db calls
15:34:46 boris-42 each call = compute_node join metadata join service ..., yeah, i see the picture
15:35:04 toan-tran yep that makes picture even worse
15:35:16 toan-tran and our goal is to make OpenStack working at least on 10k servers
15:35:31 out of box
15:36:09 boris-42, tnx for the update, good work
15:36:25 n0ano np
15:36:27 moving back
15:36:31 n0ano hope to show some results on next week
15:36:36 #scheduler code forklift
15:37:02 as I said, gantt tree up & I will be posting about 24 syncup patches
15:37:15 sorry to jump in ... but have people tried to measure against mysql running on ramdisk ... 1TB RAM is quite a lot
15:37:18 I'll probably get a little obnoxious about getting reviews for those
15:37:42 ddutta: :)
15:38:01 ddutta, can you hold off a bit, I think we can close the current topic quickly
15:38:52 I see some patches against the gantt client tree so people are starting to look at things but there's a lot of work that needs to be done yet
15:39:28 my biggest concern is getting nova to use the gantt tree, I'd like to see some progress there
15:39:45 n0ano: I agree, it was just initial code I am trying to add to ganttclient
15:40:10 coolsvap, NP, very necessary to get things started, now the real work begins :-)
15:40:20 I would like to see the patches you will be submitting
15:41:15 coolsvap, I hope to push them this afternoon (lots of meetings this morning) but they're all pretty clean updates from the nova tree, should be simple to review
15:42:16 coolsvap, btw, nothing for the ganttclient tree yet, I'll sync that tree later this week.
15:42:52 n0ano: I have submitted https://review.openstack.org/#/c/66263/
15:43:06 coolsvap, tnx, I'll go look at it.
15:43:20 i am sorry but i need to leave. ddutta can provide an update for the instance groups. the scheduling patch has been in review since june :).
15:43:26 have a good day/evening
15:43:39 garyk, sorry we took so long, have a good day
15:43:57 anyway, I think that's all for the forklift so...
15:44:07 #topic opens
15:44:15 ddutta, you had a question?
15:44:41 regarding the performance of the scheduler etc - wondering if someone had tried mysql on ramdisk
15:44:55 ddutta it is not so big problem with mysql
15:45:04 ddutta, how would that work with multiple schedulers?
15:45:07 sorry I jumped in late and saw back of the envelope calculation ...
15:45:17 ddutta e.g. for 10k nodes it takes about 2sec to get all data from mysql
15:45:26 ddutta on not so super upper powerful server
15:45:52 well how is the multiple scheduler use case supposed to work
15:46:14 * n0ano ignore my comment on multiple schedulers, it doesn't apply
15:46:17 ddutta the most part of time is taken to create python object from mysql result
15:46:24 for example we extracted data from nova and put it to a constraint solver and even that is fast
15:46:28 and we are doing math optimization
15:46:54 well i think the bottleneck is in all the filter scheduler ... you are solving for constraints in python
15:47:19 ddutta not only in filter
15:47:23 we demo-ed it last nov in a talk and have it for review
15:47:30 ddutta for example for 10k servers filter work takes about 1 sec
15:47:42 ddutta and getting data about 10
15:47:43 https://blueprints.launchpad.net/nova/+spec/solver-scheduler
15:48:03 ddutta we made some performance testing
15:48:11 ddutta even more we already improved speed of scheduler
15:48:14 ddutta in havana
15:48:15 interesting... do u have a doc?
15:48:47 with the breakdown of numbers
15:48:51 ddutta: I took a look at solver-scheduler, it requires much more time than filter scheduler
15:49:05 well depends on what constraints you use
15:49:21 with filter scheduler, it's mysql access that costs much
15:49:23 ddutta https://review.openstack.org/#/c/43151/
15:49:52 ddutta as I see it, your constraint depends on the filters you use
15:49:55 toan-tran in case of no-db-scheduler you are accessing local python object
15:50:27 ddutta using the same filters , it requires more treatment than simply applying the filters
15:50:32 as in filter scheduler
15:50:56 but basically you'll need host state access, too
15:51:11 not really ... if you have a complex set of filters ... you will compete against optimized C++ code in the optimization backend (we use apache licensed solvers from google)
15:51:13 I think that no-db would profit you, too
15:51:25 yeah it's orthogonal to the solving ... I agree ....
15:51:43 if the bottleneck is only in the mysql -> py objects, then fine ....
15:51:57 ddutta it's not
15:52:05 ddutta but it's the biggest one
15:52:29 ddutta btw do you have server with 1TB RAM?
15:52:40 all that a solver (or any filter) needs is a simple matrix from the mysql calls which IMO is much easier than assembling from py objects
15:53:12 yeah I have access to servers with 1TB and 768GB RAM ... 1TB is not common but doable ... 768GB is very common
15:53:20 I can run a benchmark if you want :)
15:53:49 ddutta okay I will introduce you later
15:53:55 * n0ano considers the statement `768GB is very common'
15:54:18 n0ano: common in the lab I have access to :)
15:54:26 * coolsvap too n0ano :)
15:54:30 ddutta, still :-)
15:55:17 sounds like ddutta and boris-42 need to email each other and maybe report out next week, I'll keep the fire on you guys
15:55:23 sure!
15:55:27 n0ano yep
15:55:40 getting close to the hour, any last minute opens?
15:55:48 quick update on instance groups
15:55:53 ddutta, go for it
15:56:15 v2 API under review for a while ... the gating item was v3 which is 80% done
15:56:40 hopefully we will commit it in a few days and start bugging people for reviews
15:56:48
15:57:05 cool, tnx (lots of reviews coming up, people get ready)
15:57:16 last call?
15:57:35 OK, tnx everyone, talk to you next week.
15:57:38 #endmeeting
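Note on the scaling arithmetic from the no_db scheduler topic: the "2k servers * 100 (instance to run) == 200k db calls" estimate can be reproduced with a toy model. The sketch below is illustrative only and is not nova or gantt code. CountingHostStore, schedule_with_per_host_calls, schedule_with_local_state and the free_ram_mb field are hypothetical names made up for this note. It assumes one lookup per host per instance request in the current approach, and a single load of host state into local Python objects in the no-db style approach.

```python
class CountingHostStore:
    """Hypothetical stand-in for a database; counts every lookup made."""

    def __init__(self, num_hosts):
        # Made-up host records; only the host count matters for the estimate.
        self.hosts = {f"host-{i}": {"free_ram_mb": 1024} for i in range(num_hosts)}
        self.calls = 0

    def get_host(self, name):
        self.calls += 1
        return self.hosts[name]


def schedule_with_per_host_calls(store, num_instances):
    """Assumed current behaviour: one lookup per host for every instance."""
    start = store.calls
    for _ in range(num_instances):
        for name in list(store.hosts):
            store.get_host(name)
    return store.calls - start


def schedule_with_local_state(store, num_instances, min_ram_mb=512):
    """Assumed no-db style: load host state once, then filter local objects."""
    start = store.calls
    local_state = {name: store.get_host(name) for name in list(store.hosts)}
    for _ in range(num_instances):
        # Filtering touches only in-memory data, so no further lookups occur.
        _candidates = [n for n, s in local_state.items()
                       if s["free_ram_mb"] >= min_ram_mb]
    return store.calls - start


if __name__ == "__main__":
    print(schedule_with_per_host_calls(CountingHostStore(2000), 100))
    print(schedule_with_local_state(CountingHostStore(2000), 100))
```

In this model the per-host variant grows with hosts times instances, while the local-state variant grows only with the number of hosts. It says nothing about the cost of keeping the local state in sync, which the log leaves to the planned benchmarks.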