15:01:05 <n0ano> #startmeeting scheduler
15:01:06 <openstack> Meeting started Tue Aug 13 15:01:05 2013 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:09 <openstack> The meeting name has been set to 'scheduler'
15:01:29 <n0ano> anyone here for the scheduler meeting?
15:01:47 <PhilDay> Yep (but I have to drop at half past)
15:02:13 <jog0> o/
15:03:12 <n0ano> let's wait a minute or two and then go...
15:03:16 <jgallard> hi all
15:04:11 <n0ano> #topic Perspective for nova scheduler
15:04:25 <n0ano> I hope everyone has had a chance to look at Boris' paper
15:04:30 <n0ano> #link https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit#heading=h.6ixj0ctv4rwu
15:05:29 <garyk> hi guys
15:05:29 <n0ano> my read is that basically he is saying updating state info through the DB is a scaling problem, doing RPC calls to the scheduler would solve this problem
15:06:25 <n0ano> my intuition is to agree with that but I believe there is a significant group that feels the DB is a more scalable solution
15:06:35 <n0ano> how do we resolve this dichotomy
15:07:35 <PhilDay> I think there were also some open questions on how does a newly started scheduler get a full set of state
15:07:56 <jog0> wasn't a great deal of this hashed out on the ML?
15:08:26 <PhilDay> And if all of this is now held just in memory by the scheduler(s) how do we get visibility into that state
15:08:51 <n0ano> jog0, discussed on the ML - yes, resolved - I don't think so
15:09:07 <PhilDay> Could be that I missed that mlist discussion (had way too much in my inbox when I came back - must stop taking holidays)
15:09:31 <n0ano> PhilDay, good points but are those implementation details or architectural problems
15:10:09 <jog0> PhilDay: here is the thread http://lists.openstack.org/pipermail/openstack-dev/2013-July/012221.html
15:10:24 <PhilDay> If we're moving to a model of not persisting the scheduling related state in the DB, then I'd say they are architectural
15:10:39 <PhilDay> @jog0 - thanks
15:12:45 <PhilDay> I guess what I'm thinking is that in addition to "time to schedule a new VM" I'd like to see "time for a new scheduler to retrieve its state" as an explicit metric
15:12:47 <n0ano> PhilDay, would providing an API to the scheduler to access this info be sufficient or would somehow periodically syncing to the DB work
15:13:13 <jog0> there were some good ideas at the end of that thread
15:13:20 <PhilDay> I guess either
15:13:41 <PhilDay> Ok - sounds like I have some more reading to do before I can contribute intelligently ;-)
15:14:54 <PhilDay> <lurk mode on>
15:15:05 <jog0> also boris-42's paper didn't clearly show the actual issue (IE not sure how to reproduce their results)
15:15:17 <boris-42> jog0 hi
15:15:26 <boris-42> jog0 whatsup?
15:15:51 <jog0> I think we all agree the current scheduler has limitations, the questions are at what point exactly and are there any good short term fixes we can do for now, until Icehouse dev is open
15:16:07 <jog0> boris-42: see backlog
15:16:14 <boris-42> jog0 yeah you are doing great work
15:16:21 <boris-42> jog0 around removing fanout
15:17:03 <boris-42> n0ano hi
15:17:13 <n0ano> boris-42, welcome
15:17:44 <boris-42> n0ano we updated today our document
15:17:54 <boris-42> n0ano so we are not doing fanout call to scheduler
15:18:08 <PhilDay> As I understood last week's discussion, any short term fix would keep the DB in place but the path for updates would change from "comp->conductor->DB" to "comp->Sched->DB"
15:18:13 <n0ano> must read new doc
15:18:31 <boris-42> n0ano there is not so much change
15:18:43 <boris-42> PhilDay not only this change
15:18:48 <PhilDay> Do we really think there is time to move to a non-DB model still in Havana ?
15:19:00 <boris-42> PhilDay no
15:19:07 <boris-42> PhilDay this should be done in I cycle
15:19:34 <jog0> PhilDay: things like more optimized DB queries or caching are options right now
15:19:43 <PhilDay> Ok, so if it's all to be done in I - shouldn't this be a topic to be bottomed out in HK ?
15:19:57 <boris-42> PhilDay yes
15:20:04 <n0ano> PhilDay, absolutely, but doing some prep work beforehand is good
15:20:38 <boris-42> PhilDay yeah we would like to prepare 1) all code 2) benchmark results on real deployments before summit
15:20:59 <PhilDay> OK - sorry I got the wrong end of the stick from the start of this then. I thought you were trying to drive to a conclusion on the architecture in here
15:21:12 <boris-42> nonon
15:21:13 <boris-42> =)
15:21:18 <jog0> PhilDay: that's what I thought too
15:21:18 <n0ano> boris-42, have you considered PhilDay's concern that you need a way to look at the compute node states, moving from the DB makes that hard
15:21:47 <jog0> so I liked some of Clint's ideas for scheduling
15:21:50 <boris-42> n0ano we will put all data about HOST into DB
15:22:03 <boris-42> scheduler DB
15:22:08 <boris-42> not only compute_node table
15:22:19 <boris-42> but also data from compute_node_stats and probably from cinder
15:22:31 <boris-42> to be able to use different data from different project in our scheduler
15:22:44 <PhilDay> Just to be clear I want any query on data to be behind an API - so I'm not wedded to it being in the DB, I just want to be sure I don't lose any visibility
15:23:04 <boris-42> PhilDay visibility about what?
15:23:22 <PhilDay> The data the scheduler is using (i.e. host states, etc.)
15:23:32 <boris-42> PhilDay and?
15:23:38 <jog0> boris-42: any cross project DB stuff makes things much harder
15:23:44 <boris-42> jog0 no
15:23:49 <boris-42> jog0 it don't make
15:23:52 <jog0> politically
15:24:01 <boris-42> jog0 our goal is to have one scheduler
15:24:07 <boris-42> that keeps all data about hosts
15:24:12 <n0ano> personally, I like an API and a back channel (for debugging when the API server fails)
15:24:17 <jog0> it becomes another contractual API to maintain
15:24:39 <boris-42> jog0 it will be much easier
15:24:42 <PhilDay> A single scheduler that can also know about Network locality (from Quantum) and Volume locality (from Cinder) ?
15:24:48 <boris-42> yeah
15:24:52 <boris-42> PhilDay yes
15:24:58 <boris-42> PhilDay and is actually scalable
15:25:18 <boris-42> PhilDay it is very useful in a lot of cases
15:25:36 <boris-42> PhilDay for example you are running cinder and nova on each host
15:26:11 <boris-42> PhilDay and would like to schedule your instance with block device with size of 200GB and ensure that on that host you have enough of free disk in cinder=)
15:26:36 <jog0> boris-42: I like the idea too, but doing it requires careful consideration to make sure it doesn't couple the assorted projects too much.
15:26:44 <jog0> also is this in the new document?
15:26:53 <boris-42> jog0 sorry not ready yet
15:27:02 <PhilDay> One other thing that's at the back of my mind (but I haven't done much thinking about it) is what it would take to plug in a third party scheduler (like say MOAB) - having only an RPC interface might make that simpler I guess
15:27:16 <boris-42> jog0 but our goal is to finish all this things before summit and doc also
15:27:47 <n0ano> PhilDay, is a 3rd party scheduler really that necessary?
15:27:51 <jog0> PhilDay: that is a good question, most of this discussion is around we only have one scheduler
15:28:01 <boris-42> jog0 PhilDay it is really huge change (not in lines) but in approach. So I agree that we should really carefully discuss all this things
15:28:42 <jog0> n0ano: some people may want to use other information to schedule on, and simpler scheduler etc
15:29:07 <boris-42> PhilDay n0ano I don't see where is complexity of our approach?
15:29:18 <n0ano> jog0, I would hope that the extensibility we've already built into the scheduler is sufficient for 99% of the users
15:29:42 <PhilDay> Not necessary, and I wouldn't do that in favour of having all of these features in Openstack - but it is something that comes up from time to time in conversation with customers wanting to build their own clouds.
15:29:48 <boris-42> PhilDay one simple scheduler that have small amount of methods (run_instance, migrate, cinder scheduler methods)
15:29:56 <boris-42> and one another method
15:30:01 <boris-42> that update host_state
15:30:12 <boris-42> and could be called from different services
15:30:33 <PhilDay> Got to dive for another call - sorry
15:30:40 <boris-42> PhilDay good luck
15:30:41 <jog0> PhilDay: bye
15:30:54 <jog0> boris-42: in short this is a big change, huge in fact
15:30:58 <n0ano> all ARs go to PhilDay :-)
15:31:04 <debo_os> sorry for joining late but just like we decided to do a separate network service, I don't see why we can't have a pluggable scheduler service
15:31:26 <n0ano> it already is, you can select from multiple schedulers right now
15:31:28 <boris-42> jog0 I agree that it is big change in approach, but small in LOCs=)
15:31:34 <boris-42> jog0 and could be done step by step
15:31:43 <jog0> LOCs don't matter in this
15:31:44 <boris-42> jog0 but first step should be done only in I cycle
15:32:01 <boris-42> jog0 I find your current job great
15:32:04 <boris-42> jog0 for H cycle
15:32:06 <jog0> there was a BP to do this a while back but it got stalled
15:32:15 <debo_os> n0ano: however the state management is not pluggable yet
15:32:27 <jog0> boris-42: I do like this proposal, I am just saying it is tricky
15:32:33 <n0ano> debo_os, hence the discussion here
15:32:49 <jog0> I would recommend drafting up an early idea and putting it to the ML along with an outline of what you think
15:32:54 <debo_os> n0ano: apologies for joining late hence might sound repetitive :)
15:32:58 <jog0> along with any history of why tried it before
15:33:02 <boris-42> jog0 https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
15:33:03 <n0ano> debo_os, NP
15:33:20 <boris-42> jog0 Ok will be done soon
15:33:44 <debo_os> in addition to the discussion ... one of my colleagues had written up a doc for the last summit and socialized it .. https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit#
15:34:13 <n0ano> boris-42, looks like your BP is mainly just a link to your doc
15:34:20 <debo_os> there was some good feedback and folks told him to get back a little later
15:34:22 <jog0> the main question is the mechanics of adding a new contractual API for all projects that wires to the scheduler
15:34:30 <boris-42> n0ano yes
15:34:33 <debo_os> it's a little like boris's doc
15:34:50 <boris-42> n0ano because in doc is described a lot of
15:34:54 <debo_os> boris-42: should we try to merge the 2 proposals
15:35:07 <boris-42> debo_os as I say in email yes of course
15:35:14 <boris-42> debo_os they are really close
15:36:46 <jog0> debo_os: don't use the word orchestrator in your proposal, it's an overloaded word
15:36:52 <boris-42> =))
15:36:53 <jgallard> sorry for joining the conversation late, but, I like the idea of having a kind of Scheduler as a Service
15:37:28 <jog0> debo_os: also the doc needs an abstract/summary
15:37:34 <debo_os> jog0: agreed. I need to clean it up since my colleague wrote most of it and now left the OS world ...
15:37:37 <jog0> it's TL;DR for me, skimming the slides
15:37:37 <n0ano> ignoring current proposals I still don't see how to resolve the question - which is more scalable DB vs. RPM?
15:37:50 <debo_os> gr8 feedback
15:37:52 <boris-42> n0ano RPM
15:37:54 <n0ano> s/RPM/RPC
15:37:58 <boris-42> rpc*
15:38:10 <boris-42> for example
15:38:15 <boris-42> we have 10k nodes
15:38:29 <boris-42> we need to produce only 150req/sec
15:38:32 <boris-42> to all schedulers
15:38:46 <boris-42> so 150/SCHEDULER_AMOUNT
15:38:47 <boris-42> in sec
15:38:55 <jog0> n0ano: IMHO neither
15:39:07 <boris-42> even if you have only 3 schedulers for 10k nodes
15:39:09 <debo_os> boris-42
15:39:22 <boris-42> you will have to process only 50 req/sec
15:39:26 <boris-42> and it is nothing
15:39:35 <jog0> but then again we don't need to hash this out right now
15:39:44 <debo_os> boris-42: let's work to merge the 2 proposals. For starters we can add this doc for reference too
15:40:14 <boris-42> debo_os Ok it will be easier to merge it through emails than IRC chat =)
15:40:15 <n0ano> we don't have to answer now but I would like to know `how` to come to an answer, right now we're kind of in a `he said, she said' situation
15:40:30 <debo_os> boris-42: agreed! let's work to merge the 2 proposals over emails
15:40:32 <boris-42> n0ano we will make real benchmarks
15:40:38 <boris-42> n0ano on real deployments
15:40:48 <boris-42> n0ano will be it enough for you?
15:41:02 <boris-42> debo_os As I said "nods"
15:41:07 <n0ano> boris-42, I think we need that, measurable and reproducible would be great
15:41:39 <jog0> n0ano: ++ and any proposed idea has to show it's better than the existing and why it's better than other options
15:41:40 <boris-42> n0ano yes we are going to create some new project so everybody will be able to reproduce these things
15:42:02 <n0ano> Well, I'm hearing some actions out of all of this:
15:42:16 <n0ano> #action boris-42 & debo_os to merge proposals
15:42:42 <n0ano> #action come up with benchmark to measure DB vs. RPC scalability
15:42:42 <boris-42> yeah
15:42:50 <boris-42> n0ano
15:42:51 <boris-42> no
15:42:57 <boris-42> we are building whole system
15:43:01 <boris-42> to test real openstack
15:43:10 <boris-42> not just this case
15:43:44 <doron> guys, if I may jump in for a sec. Is there a reason to rule out an in-memory solution?
15:43:53 <n0ano> hmmm, notice `reproducible', if not then we're just providing anecdotal input
15:44:08 <debo_os> doron: not at all
15:44:11 <doron> ie- I agree db is problematic, but RPM will have its price.
15:44:18 <debo_os> that's why we need to define APIs 1st instead of implementation
15:44:21 <doron> (RPC)
15:44:29 <debo_os> hence boris-42 and I need to merge the proposals ....
15:44:43 <boris-42> doron didn't understood question
15:45:05 <debo_os> doron: ideally if you run the scheduler as a service you could swap out the implementation and have an in memory solution for the state management
15:45:22 <doron> boris-42: did you consider of storing the needed data in-memory instead of a DB?
15:45:30 <boris-42> doron we will use in memory key-value storage
15:45:38 <boris-42> doron to avoid scheduler fanout
15:45:50 <doron> boris-42: gr8. this is what I had in mind
15:46:12 <boris-42> doron so each request to update host state will be processed only by one scheduler
15:46:35 <boris-42> doron this allows us to solve problem with too much for one scheduler rpc =)
15:46:48 <boris-42> just adding another schedulers in system
15:46:55 <doron> makes sense. I'll go over your merged doc
15:46:56 <debo_os> boris-42: I guess if we define the crisp update APIs etc ... the implementation could be separated and we will have all the scaling features you want to do .....
15:47:16 <boris-42> debo_os>
15:47:27 <boris-42> debo_os we already implemented this part of our scheduler
15:47:46 <boris-42> debo_os switch from Nova.DB to scheduler.DB
15:48:06 <doron> I'll take a look. sorry for the noise.
15:48:17 <boris-42> doron ok I think we will publish soon code
15:48:25 <boris-42> so I will add you as reviewer
15:48:33 <doron> thanks!
15:48:50 <debo_os> boris-42: gr8
15:49:13 <n0ano> boris-42, now I'm confused, are you proposing that the scheduler implement its own private DB
15:49:26 <boris-42> n0ano yes
15:49:32 <n0ano> I thought it was just maintaining state in its memory
15:49:49 <boris-42> n0ano that produce fanout
15:50:00 <boris-42> n0ano and we spoke with Mike from BlueHost
15:50:32 <boris-42> n0ano and he said that it will be better to use fast key-value storage such as memcached
15:51:09 <n0ano> then it's not really a DB, it's just a backup for the internal memory storage
15:51:23 <boris-42> n0ano as I said we haven't enough time to update our docs, and they are updated today
15:51:33 <boris-42> n0ano we don't need "real" DB
15:51:40 <boris-42> n0ano for temp data
15:51:48 <doron> boris-42: +1 on no need for real db.
15:51:55 <jog0> if this isn't ready for review/discussion why are we here?
15:52:15 <boris-42> jog0 I am just answering on question
15:52:29 <boris-42> s/question/questions
15:52:42 <n0ano> jog0, I thought we were farther along and I wanted the answer to how do we decide DB vs. RPC
15:53:02 <boris-42> =)
15:53:42 <boris-42> sorry guys for misunderstanding =)
15:53:51 <n0ano> so, it's getting late, looks like boris-42 & debo_os need to update the doc, when that is done we can re-visit this
15:54:04 <boris-42> n0ano it is already updated
15:54:17 <n0ano> boris-42, what about merge with debo_os ?
15:54:29 <boris-42> n0ano I mean about using DB in scheduler
15:54:34 <boris-42> n0ano sorry=)
15:54:48 <boris-42> n0ano ok due next session we will update new combined doc
15:54:56 <n0ano> boris-42, NP
15:55:02 <boris-42> And I agree with jog0 that we should discuss in this moment about Havana
15:55:03 <boris-42> work
15:55:06 <boris-42> not I
15:55:29 <n0ano> let's see where we are next week, especially with an eye to what do we need/want to do for Havana
15:55:56 <n0ano> #topic opens
15:56:07 <n0ano> any opens in the remaining few minutes?
15:57:12 <n0ano> hearing silence, I'll thank everyone
15:57:20 <n0ano> #endmeeting
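
[Editor's note] The load estimate boris-42 gives between 15:38:10 and 15:39:26 (10k nodes, 150 req/sec in total, so 50 req/sec each with 3 schedulers) can be checked with a few lines of arithmetic. The sketch below is only an illustration and is not code from the meeting, the linked documents, or Nova: the function names `per_scheduler_rate` and `implied_update_interval` are hypothetical, and the even split across schedulers is an assumption (the log says each update is processed by only one scheduler, but gives no measurements of how the load is balanced).

```python
def per_scheduler_rate(total_updates_per_sec: float, scheduler_count: int) -> float:
    """Average host-state updates per second handled by each scheduler.

    Assumes every update is processed by exactly one scheduler and that
    updates are spread evenly (an assumption, not a measured result).
    """
    if scheduler_count < 1:
        raise ValueError("need at least one scheduler")
    return total_updates_per_sec / scheduler_count


def implied_update_interval(node_count: int, total_updates_per_sec: float) -> float:
    """Seconds between updates from one node that the quoted totals imply."""
    if total_updates_per_sec <= 0:
        raise ValueError("update rate must be positive")
    return node_count / total_updates_per_sec


if __name__ == "__main__":
    nodes = 10_000       # node count quoted in the meeting
    total_rate = 150.0   # total req/sec quoted in the meeting

    # Per-scheduler load for a few hypothetical scheduler counts.
    for schedulers in (1, 3, 10):
        rate = per_scheduler_rate(total_rate, schedulers)
        print(f"{schedulers} scheduler(s): {rate} req/sec each")

    # How often each node would have to report to produce that total.
    interval = implied_update_interval(nodes, total_rate)
    print(f"implied per-node update interval: {interval:.1f} s")
```

With the meeting's figures this gives 50 req/sec per scheduler for 3 schedulers, and implies each node reports roughly once every 67 seconds. Whether 50 req/sec is really "nothing" for an RPC-fed scheduler is exactly what the benchmark #action is meant to establish.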