15:00:26 #startmeeting scheduler
15:00:27 Meeting started Tue Dec 3 15:00:26 2013 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:31 The meeting name has been set to 'scheduler'
15:00:34 hello all, anyone here for the scheduler meeting?
15:00:38 hi
15:00:41 hello
15:00:45 hi
15:00:59 hi
15:01:05 hi
15:01:47 are alaski and boris-42 around?
15:01:57 boris-42, alaski - are you here?
15:02:03 hi
15:02:27 garyk I'd like to get a mention in for my blueprint later
15:02:37 hi
15:02:42 all
15:02:46 just wanted to be sure because a lot of the discussion on the ML about the scheduler forklift is related to items that they are working on.
15:03:00 PaulMurray: n0ano is leading :)
15:03:12 sorry - n0ano
15:03:25 PaulMurray: np, I am just heckling from the sidelines now
15:03:31 you just seem too authoritative :)
15:04:10 nah.
15:04:11 sorry guys, you can't do much when the battery on your mouse dies :-(
15:04:20 * n0ano remembers when a mouse was just a rodent
15:04:36 garyk hi
15:04:45 boris-42, excellent
15:04:52 #topic memcached based scheduler
15:04:54 hi
15:05:12 boris-42: just in time :)
15:05:17 yep
15:05:28 boris-42, so, do you have any update on the status of your scheduler changes
15:06:15 n0ano yep I have some
15:06:49 n0ano so we are now working on a bunch of patches
15:06:57 n0ano let me find the links, sorry
15:07:18 n0ano here it is https://review.openstack.org/#/c/45867/
15:07:36 n0ano so there were a lot of changes from the start of the work
15:07:53 boris-42: are there any core guys signed up to review this?
15:08:02 n0ano first of all we are trying to add a garbage collector (because previously memcached objects weren't deleted)
15:08:43 i really think that if we are serious about the forklift and this is one of the blocking items then we need russellb to try and ensure that there are cores involved here.
15:08:43 it's worth setting a milestone on that BP, and we can get people signed up and approved
15:08:44 n0ano the second change is that we will soon finish work around the sqlalchemy backend
15:08:46 garyk: no, the bp isn't sponsored at this point
15:08:48 what do you guys think?
15:09:09 alaski: thanks for the clarification
15:09:34 I think it's a great idea but I'm not core so how do we get someone like that on board?
15:09:38 johnthetubaguy done
15:09:42 as long as we keep the old and new present I am happy with the idea
15:09:48 boris-42: cool
15:10:01 ndipanov is looking at this patch
15:10:27 johnthetubaguy, since the basic idea is to remove references to the DB I'm not sure we can keep the old and new present
15:10:58 johnthetubaguy yep there should be a switch between the 2 approaches
15:11:10 johnthetubaguy at some point the new scheduler should use its own DB
15:11:11 n0ano: but we just need a "scheduler" update driver framework, and swap between the two
15:11:33 johnthetubaguy we just need to run the new scheduler code and it will collect all the data
15:11:36 boris-42: exactly, the pull out of the scheduler needs the same entry points (I think)
15:11:49 ah, OK, that works
15:11:52 johnthetubaguy hmm not sure that I understand =)
15:12:01 boris-42, which patch?
15:12:06 ndipanov scheduler
15:12:09 I wouldn't think any of the APIs change, just the internal implementation
15:12:11 ndipanov with memcached
15:12:27 boris-42: which blueprint is this, I don't see the no-db-scheduler one updated
15:12:32 boris-42, well... among others
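A minimal sketch of the kind of garbage collection boris-42 describes here, assuming the python-memcached client; the namespace, key layout, and TTL value are illustrative assumptions, not details taken from the patch under review:

    # Hypothetical sketch: give every stored host state an expiry time so
    # that entries which stop being refreshed are reclaimed by memcached
    # itself instead of accumulating forever.
    import json
    import time

    import memcache  # python-memcached

    NAMESPACE = "scheduler"   # assumed key namespace
    STATE_TTL = 120           # seconds before an unrefreshed entry expires

    client = memcache.Client(["127.0.0.1:11211"])

    def _key(host):
        return "%s:host_state:%s" % (NAMESPACE, host)

    def update_host_state(host, state):
        # Refreshing an entry resets its TTL; a host that goes silent
        # simply ages out rather than leaving a stale object behind.
        state = dict(state, updated_at=time.time())
        client.set(_key(host), json.dumps(state), time=STATE_TTL)

    def get_host_state(host):
        raw = client.get(_key(host))
        return json.loads(raw) if raw else None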
15:12:37 johnthetubaguy https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
15:12:52 boris-42, but yeah - will follow up
15:13:03 boris-42: need to set the milestone target to trigger the process
15:13:10 ndipanov thanks, we will fix all the issues soon I hope
15:13:22 ndipanov there has been a lot of refactoring since the first solution
15:13:34 ndipanov name_spaces, garbage collector, mysql backend..
15:13:38 boris-42, yeah I saw that there was good work being done
15:13:53 boris-42, but couldn't manage to catch up :(
15:14:45 ndipanov we will call you when we address all that stuff
15:14:54 back to johnthetubaguy's concerns, can we easily set this up to support both the old & new schemes
15:14:55 ndipanov so you will save a lot of review time =)
15:15:12 n0ano actually yes
15:15:16 boris-42, I'll check it out to stay informed
15:15:31 n0ano we could update nova.db and scheduler.db at the same time
15:15:35 johnthetubaguy ^
15:15:52 n0ano so we could move step by step from one scheduler to another
15:16:18 n0ano but I am not sure that it will be super easy to move from one scheduler to another without these small steps (patch by patch)
15:16:25 so you'd make it switchable/configurable to update memcached or the DB or both
15:16:49 n0ano no I mean that the first patch (where we are going to switch to the new scheduler)
15:17:03 will still have the nova.db tables being updated
15:17:18 and after it we will remove the tables altogether and use the data from the scheduler
15:17:21 yeah, I kinda would expect patches to add the entry points the new scheduler needs, then adding the alternatives for the new scheduler, and then extra tests in the gate for the new scheduler?
15:18:04 johnthetubaguy it will be a bit hard to organize support for the two schemas but we could try
15:18:27 johnthetubaguy but at the end when we totally switch from one scheduler to the other (in code)
15:18:51 johnthetubaguy updating from, for example, havana to icehouse will require some arch changes
15:19:13 johnthetubaguy like switching from nova-network to neutron (but not so big, hard and complex)
15:19:23 so we have to be aware that the first version will actually be a little slower (it modifies both the DB & memcached) but will be better in the future
15:19:38 hmm, I see, but why not create a brand new DB for the new stuff?
15:19:44 boris-42: will you merge the code after the forklift?
15:20:02 shane-wang it will be easier for all of us to make this change before
15:20:21 shane-wang because then it will be much easier to grab the code from nova
15:20:46 johnthetubaguy I mean you have all the data in nova.db and then the data will be stored in another place
15:21:03 johnthetubaguy if you turn off the scheduler and run the new version
15:21:21 boris-42 can't it be configurable instead of doing both
15:21:22 johnthetubaguy after a 60 sec delay you will have all the updated data in your new scheduler
15:21:22 ?
15:21:32 PaulMurray what exactly?
15:21:42 I'm not sure if I follow exactly
15:21:47 johnthetubaguy let me try to describe the order of patches
15:21:51 PaulMurray &
15:21:53 what I have in mind is that we follow trunk
15:22:02 boris-42: Yeah, that seems fine, I am just wanting to have both schedulers as an option, the data can be in different places
15:22:06 PaulMurray: are you concerned about migrating a running system, or about retaining the ability to run the old code in case the new code has a problem?
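To make the transitional step concrete, a rough sketch of the dual-update idea being discussed: every compute-node write goes to both nova.db and the new scheduler store, so either scheduler can run while the new one proves itself. The flag and the injected interfaces are hypothetical stand-ins, not Nova's actual ones:

    # Hypothetical transition flag; real code would use a config option.
    UPDATE_NEW_SCHEDULER = True

    def compute_node_update(context, node_id, values, db_api, scheduler_rpc):
        # Old path: nova.db stays authoritative for now, which is why the
        # first version is expected to be a little slower (double write).
        db_api.compute_node_update(context, node_id, values)

        # New path: mirror the same values into the new scheduler so it
        # builds up its own view of the world before the tables are dropped.
        if UPDATE_NEW_SCHEDULER:
            scheduler_rpc.update_host_state(context, node_id, values)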
15:22:08 it would be good to keep up to date without having to switch
15:22:14 while testing out the new version
15:22:18 PaulMurray: +1
15:22:23 +1
15:22:28 1 sec
15:22:32 then when we are happy, we switch
15:22:43 eventually support for the old way can be removed
15:22:45 +1
15:22:47 we need to have the scheduler pluggable, so people can test and switch as required
15:23:00 we don't need data migration, assuming it self-heals
15:23:01 Guys what I think is that there will be duplication of code
15:23:07 Paul: by switching you mean in the configuration first?
15:23:12 Let me explain step by step
15:23:17 in code
15:23:19 then restructure the code, so it shares common libs
15:23:26 patch 1: add the synchronizer
15:23:59 patch 2: call not only db.compute_node_update but also the new scheduler rpc method host_update
15:24:25 instead of 2, add a driver for "send_update" with two implementations?
15:24:52 johnthetubaguy, let him finish, I think there's more
15:24:55 patch 3: drop the compute_nodes tables and the db.compute_node_update call, and get the db.api data from the scheduler
15:25:13 patch 4: remove the db.api calls and call the scheduler directly for host states
15:25:44 boris-42: I see, you're describing an update process, yes?
15:25:48 but you could have a "get_stats" driver too, with one from memcache, the other from the DB, right?
15:26:01 johnthetubaguy not sure
15:26:21 johnthetubaguy, what would be the advantage of that
15:26:23 johnthetubaguy what do you mean by get_stats
15:26:46 n0ano: we have the old and new behind a common interface, just flick a switch between them
15:26:58 johnthetubaguy I mean we could just change the scheduler backend step by step without tracking anything
15:27:23 the current scheme is to have both at the same time, no need for a switch
15:27:23 johnthetubaguy: are you thinking because stats is high overhead? but the rest is the same as usual?
15:27:29 or because of new features?
15:28:06 PaulMurray johnthetubaguy there won't be any more stats actually
15:28:19 PaulMurray johnthetubaguy we will store JSON-like objects
15:28:35 yes - but I think he is suggesting two paths
15:28:38 to the scheduler
15:28:43 wondering why
15:28:49 PaulMurray johnthetubaguy the first structure will be the same as now: services, compute_node, compute_node_stats
15:28:58 boris: if we create a synchronizer inside nova, then after switching we have to remove it?
15:29:10 toan-tran it is not in nova
15:29:17 toan-tran it is inside the scheduler
15:29:21 maybe it's just that I don't want two schedulers running at once, but maybe that's OK
15:29:26 toan-tran and the goal is to grab all the code
15:29:45 johnthetubaguy I am not sure that there will be 2 schedulers at the same time
15:29:54 boris: then who calls the synchronizer? nova-conductor? and how?
15:29:58 johnthetubaguy, I don't think there's 2 schedulers, one scheduler storing data in 2 places
15:30:24 n0ano yep one scheduler that could store all the data from all services
15:30:27 boris: do we have to create a new interface for this synchronizer?
15:30:40 toan-tran the synchronizer is an internal part of the scheduler
15:30:57 toan-tran the scheduler has a special RPC method update_host_state that will call the synchronizer
15:31:21 toan-tran this RPC method could be called from anywhere: compute nodes, cinder nodes
15:31:22 well they sound like two implementations of the same thing, we generally encapsulate those inside driver objects, we could run both, that's just a multi-driver driver.
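A sketch of the driver alternative johnthetubaguy suggests, with the old and new paths behind one interface; all class and method names here are illustrative, chosen only to mirror the "send_update"/"get_stats" and "multi driver driver" wording above:

    class HostStateDriver(object):
        """Common interface; pick the implementation from configuration."""
        def send_update(self, host, state):
            raise NotImplementedError()

        def get_stats(self):
            raise NotImplementedError()

    class DbHostStateDriver(HostStateDriver):
        """Old behaviour: read and write the compute_nodes tables."""
        def __init__(self, db_api):
            self.db_api = db_api

        def send_update(self, host, state):
            self.db_api.compute_node_update(host, state)

        def get_stats(self):
            return self.db_api.compute_node_get_all()

    class MemcachedHostStateDriver(HostStateDriver):
        """New behaviour: JSON-like objects synced via memcached."""
        def __init__(self, store):
            self.store = store

        def send_update(self, host, state):
            self.store.update_host_state(host, state)

        def get_stats(self):
            return self.store.get_all_host_states()

    class MultiDriver(HostStateDriver):
        """The 'multi driver driver': fan each update out to both paths."""
        def __init__(self, drivers):
            self.drivers = drivers

        def send_update(self, host, state):
            for driver in self.drivers:
                driver.send_update(host, state)

        def get_stats(self):
            # Read from whichever driver is currently authoritative.
            return self.drivers[0].get_stats()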
15:31:24 correct me if I'm wrong, but the synchronizer keeps multiple schedulers in sync (needed since there will no longer be a DB to do that)
15:31:45 n0ano yep
15:31:58 n0ano all data is stored locally on each scheduler
15:32:06 n0ano plus a mechanism that syncs the states
15:32:31 n0ano one scheduler processes one host_update message
15:32:41 n0ano and then all are synced using the synchronizer
15:33:03 avoids the fan-out problem with scheduler update messages
15:33:05 n0ano much more effective than just grabbing all the data from the DB on each request
15:33:15 n0ano yep exactly
15:33:29 n0ano add more schedulers and process more messages =)
15:33:46 this is all good, my only request is to have it optional in Icehouse
15:33:58 johnthetubaguy I am not a PTL
15:34:05 boris: ok so all schedulers will use this synchronizer to update their db
15:34:10 johnthetubaguy so I can't guarantee =)
15:34:23 toan-tran there is no DB
15:34:39 toan-tran, update its internal data, not a database
15:34:42 boris-42 sure, but code wise, that's all I mean
15:34:42 toan-tran there is just a local object in each scheduler that contains the state of the world
15:34:55 johnthetubaguy we will try to implement it ASAP
15:34:58 =)
15:35:09 johnthetubaguy and then let the community/cores/PTL make the decision
15:35:38 toan-tran and then the synchronizer is used to update these local objects
15:35:54 boris-42: agreed, I just want to keep the options open
15:36:05 johnthetubaguy, how optional does this have to be - is `keep the current DB tables updated but don't use them' acceptable
15:36:27 I don't think they need updating
15:36:38 unless you have to read from them
15:36:40 n0ano it is just to make a baby-step implementation of the new scheduler
15:36:55 note that we're not talking about changing any APIs, this is all internal implementation inside the scheduler
15:37:13 n0ano actually we are adding new RPC methods in the scheduler
15:37:16 boris: ok, that's why it is absolutely necessary that your blueprint is merged before the forking
15:37:32 toan-tran yep it will totally simplify the work
15:37:32 boris-42, but that's just scheduler to scheduler, not seen by any other entity
15:37:41 toan-tran actually we already tried to make oslo.scheduler
15:38:00 toan-tran and we failed because all the internal stuff is deeply connected with the project logic
15:38:06 toan-tran: or build it in the fork, maybe
15:38:26 johnthetubaguy it will be hard, just believe me =)
15:38:37 johnthetubaguy we already tried to make this stuff, and failed =(
15:38:51 n0ano actually no, there is the new RPC method that is called from compute nodes
15:38:57 n0ano or cinder or any other place
15:39:02 n0ano to update host_state
15:39:29 boris-42, hmmm, that will require careful consideration then, that's a significant change
15:39:30 n0ano it is not API, it's internal stuff (like the changes in the conductor)
15:40:07 n0ano I don't think that such changes are that big, we don't change the public API of services (like nova boot)
15:40:08 but there is already an RPC to do that, why do you need a new one
15:41:00 boris-42: ok for compute_node_update, then what does nova do to get a compute's info?
15:41:03 n0ano where https://github.com/openstack/nova/blob/master/nova/scheduler/rpcapi.py ?
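Piecing together what boris-42 describes, a very rough sketch of the synchronizer: the "state of the world" lives in a local dict on each scheduler and is reconciled through memcached. The key names, the host index, and the timer-based sync are all assumptions, and production code would need memcached's CAS operations to avoid races on the index:

    import json
    import threading

    import memcache  # python-memcached

    class Synchronizer(object):
        """Keeps each scheduler's local view in sync via memcached."""

        def __init__(self, servers, period=60):
            self.client = memcache.Client(servers)
            self.local = {}          # host -> state, read by the filters
            self.period = period

        def host_update(self, host, state):
            # Whichever scheduler receives the host_update message writes
            # the state through to memcached; no fan-out RPCs are needed.
            self.client.set("host_state:%s" % host, json.dumps(state))
            known = set(json.loads(self.client.get("hosts") or "[]"))
            if host not in known:
                known.add(host)
                self.client.set("hosts", json.dumps(sorted(known)))
            self.local[host] = state

        def sync(self):
            # Periodic task: pull every known host's state so that all
            # scheduler instances converge on the same local view.
            for host in json.loads(self.client.get("hosts") or "[]"):
                raw = self.client.get("host_state:%s" % host)
                if raw:
                    self.local[host] = json.loads(raw)
            threading.Timer(self.period, self.sync).start()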
15:41:21 Some of this stuff will go behind objects - the compute manager won't call the conductor directly
15:41:23 toan-tran nova will ask the scheduler for info about all the compute_nodes
15:41:45 PaulMurray the compute manager won't call the conductor at all
15:41:47 the objects are remotable
15:42:00 PaulMurray the conductor is nova stuff
15:42:18 PaulMurray I just don't understand why we need the conductor or objects here
15:42:29 PaulMurray objects are nova.db, the conductor is nova stuff
15:42:36 What I mean is the db calls are behind objects
15:42:50 boris-42: that would be another challenge to consider if we want to maintain 2 versions of the scheduler: nova's and the new one, during the transition
15:42:52 PaulMurray there are no DB calls in our new approach
15:42:54 you can switch the implementation of objects pretty easily without changing the main code
15:43:07 boris-42 currently nova does not need the scheduler
15:43:08 boris-42: the compute manager will call objects
15:43:10 yeah, the conductor calls the scheduler for which host to pick, are we changing that call? that seems bad?
15:43:16 toan-tran ?
15:43:19 boris-42 to search for data
15:43:33 toan-tran yep but it needs the scheduler to schedule =)
15:43:49 toan-tran so it should be just a bit of refactoring of "where" to get the data
15:44:03 johnthetubaguy there will be no conductor
15:44:13 johnthetubaguy the compute node calls the scheduler directly
15:44:23 johnthetubaguy makes an RPC call
15:44:40 I think we're thinking at two levels here
15:44:43 boris-42: which RPC call are you talking about on the compute node
15:44:43 boris-42 as a matter of fact, nova gets direct access to its DB, so if we want it to call the scheduler, we have to introduce a new method
15:44:47 the compute node will deal with objects
15:44:47 boris-42: right, I agree with that, I think, but there are two points here though
15:44:54 lol)
15:44:58 4 questions / sec =)
15:44:58 how data moves around is an implementation detail
15:45:00 for instance, migration with a destination does not need the scheduler
15:45:04 honestly there are 2 issues … forklifting the decision making (on where to place the resources) and forklifting the code to make that process clean… During the last summit a bunch of us already demoed how the 1st problem has been solved …
15:45:17 out here we should spend some time talking about both IMO
15:45:26 nova verifies directly if the destination can host the VM
15:45:37 by consulting its DB
15:46:03 toan-tran yep the nova scheduler will respond with all the statuses of the compute nodes
15:46:09 toan-tran so it should have methods
15:46:15 toan-tran RPC methods for that
15:46:21 toan-tran: the newer code uses an RPC call in the conductor to the scheduler to pick which node to use, the scheduler reads it from the nova db currently
15:46:22 boris-42: agreed
15:46:41 alaski We are going to add a few RPC methods in the scheduler
15:46:56 alaski so as to store and keep all the data about compute nodes inside the scheduler instead of nova.db
15:47:27 boris-42 … could we make the store generic and not pegged to memcached
15:47:29 alaski this will allow us to make the scheduler independent of project data
15:47:30 we might want a different set of queues for the updates and the other scheduler requests?
15:47:38 boris-42: ok, just making sure. johnthetubaguy is talking about a call to the scheduler from the conductor, but you're talking about two different things
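For the direct compute-to-scheduler path being described, a hypothetical sketch of both sides of the new update_host_state RPC; the RPC plumbing here is a stand-in rather than Nova's actual rpcapi, and the method signature is assumed from the discussion:

    class SchedulerRPCAPI(object):
        """Client side, callable from nova-compute, cinder, etc."""

        def __init__(self, rpc_client):
            self.rpc = rpc_client    # stand-in for the real RPC client

        def update_host_state(self, context, service, host, state):
            # Fire-and-forget cast straight to the scheduler -- no
            # conductor and no nova.db write on this path.
            self.rpc.cast(context, "update_host_state",
                          service=service, host=host, state=state)

    class SchedulerManager(object):
        """Server side: the new RPC method on the scheduler."""

        def __init__(self, synchronizer):
            self.synchronizer = synchronizer

        def update_host_state(self, context, service, host, state):
            # Feed the synchronizer (see the sketch above), which then
            # propagates the state to the other scheduler instances.
            self.synchronizer.host_update("%s:%s" % (service, host), state)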
15:48:04 alaski I am actually saying that the conductor is unnecessary here
15:48:04 alaski: yeah, updates vs host picking, I am getting confused
15:48:15 boris-42: right, I agree
15:48:21 alaski we are able to call the scheduler directly from compute.manager
15:48:27 +1
15:48:31 okay nice
15:48:41 debo_os we are not going to store all the data inside memcached
15:49:05 debo_os memcached is used just to effectively sync states across the different schedulers, and avoid fan-outs
15:49:20 boris-42: how will you deal with live-upgrade?
15:49:30 boris-42: thx … it would be nice to have a query-able state layer that can be used by anyone …
15:49:32 boris-42: in future
15:50:18 boris-42: you probably need objects between the computes and schedulers?
15:50:48 shane-wang, that's why the stepwise progression is to keep the data in both memcache & the DB to begin with, and eventually deprecate/remove the DB
15:51:00 boris-42: does your design document have all of the stages that you plan to implement? it would be nice if it did, then we could use it as a reference. the bigger picture would be nice here
15:51:09 +1
15:51:17 garyk +1
15:51:30 things just seem very fluid at the moment and there seem to be a ton of edge cases that we are discussing
15:51:42 garyk we are working on it guys
15:51:55 great. please post an update when you have it.
15:52:02 garyk will be done
15:52:14 boris-42: got it … am guessing the doc will have the clean API to query the state engine (whatever is behind it) …
15:52:16 I haven't seen any gaping holes, closing the edge cases is always the hard part
15:52:17 sorry to be a stick in the mud but with something complex like this a little extra detail helps
15:52:18 garyk I think that it could make understanding our approach much simpler
15:52:36 boris-42: agreed
15:52:39 +1
15:52:41 yep yep
15:52:48 +1
15:52:54 I hope that we will publish some stuff this week
15:52:59 request: could we please have a slightly higher level picture in the document too
15:53:08 debo_os sure
15:53:19 probably more to discuss on this but it's getting late, let's move on
15:53:20 you still have a few more days for a hanukkah miracle :)
15:53:21 that will help a lot
15:53:27 document
15:53:44 I wanted to talk about the forklifting proposal from the mailing list but I don't know if we have time
15:53:52 #topic forklifting code
15:54:22 seems to be a lot of talk about this, has it been decided to move the scheduler code to a separate repo?
15:54:41 n0ano: i think that is the general idea
15:54:48 yep, pretty much
15:55:17 so, seems like the question is timing, who's going to do the work and when
15:55:30 and how much effort do you expect?
15:55:31 i think that it is going to be non trivial without an api - but others seem to differ
15:55:39 I think we have a list of volunteers on the google doc page
15:55:55 A lot of people have signed up for the work. I believe it's just currently waiting on one of my blueprints to be finished, which I'm hoping will be by this week
15:56:08 garyk: I think it's best to let people come to the conclusion on their own :) I agree 10000%
15:56:11 https://etherpad.openstack.org/p/icehouse-external-scheduler has a list of volunteers
15:56:28 do we want to get boris-42's changes in before or after (or does it matter)
15:56:44 Before we start the coding, I think we should have an approved doc
15:56:54 where we all agree on the plan … else it's going to be messy
15:57:03 n0ano: honestly, I don't think it matters. It would be nice to get it in before, but it's probably not going to happen
15:57:23 alaski: +1
15:57:38 I just don't want all changes to the scheduler to stop for an indeterminate time waiting for the restructure
15:57:43 could we please have the doc as a gate before the coding fest begins
15:57:52 debo_os, +1
15:57:57 +1
15:58:09 n0ano: definitely a concern, but the plan is for work to continue, it may just have to happen in two places
15:58:10 +1
15:58:20 and reviewed/approved :)
15:58:31 can we add a plan for the work & some milestones to the doc? we know that we must have boris' blueprint merged first,
15:58:43 thus during that time we need to accomplish something
15:58:43 debo_os: the etherpad I linked above has the plan, and there's a bp up for it
15:58:43 alaski, good answer, I can live with that (even if it does require a little more work)
15:59:11 alaski: has it been reviewed and approved?
15:59:25 by this subgroup
15:59:59 does everyone agree it's going to be the right thing :)
16:00:19 if we avoid this formal gate we will spend more time later
16:00:28 debo_os, well, I haven't heard anyone complain so it must be correct :-)
16:00:33 a blueprint review seems like the right way to agree on the approach
16:00:43 everyone doesn't agree, but that's never going to happen anyways
16:00:44 I think we should just review the etherpad in one meeting
16:00:52 sorry guys but we're out of time, let's continue on the email list
16:00:59 and say we are done
16:01:11 time :)
16:01:13 if people don't disagree by a deadline
16:01:18 we can't complain later
16:01:21 #endmeeting