15:00:29 <n0ano> #startmeeting scheduler
15:00:30 <openstack> Meeting started Tue Jan 7 15:00:29 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:33 <openstack> The meeting name has been set to 'scheduler'
15:00:41 <n0ano> anyone here for the scheduler meeting?
15:00:53 <cloudon1> yes
15:00:53 <PaulMurray> hi n0ano, I'm here
15:01:26 * coolsvap : yes, for first time here
15:01:38 <n0ano> coolsvap, welcome (we don't bite)
15:01:51 <coolsvap> n0ano: :)
15:02:32 <garyk> hi, i am here
15:02:33 <toan-tran> hi all
15:02:40 <alaski> hi
15:03:23 <n0ano> I wanted to talk about the no-db scheduler but Boris doesn't seem to be around, let's change the order a bit
15:03:48 <n0ano> #topic Scheduler code forklift
15:04:30 <n0ano> if you've been following the mail list you should see that I think we have the new gantt tree up to where it is publicly usable
15:04:59 <n0ano> Jenkins passes (with non-voting test failures) so we can follow the normal procedures to approve and commit changes
15:05:18 <cloudon1> Can I please ask how the Gantt work implementing (I think) nova-oslo-scheduler relates to the forklift-scheduler-breakout BP proposed by Rob Collins and discussed before Christmas?
15:05:42 <n0ano> for anyone interested in working on it there's an etherpad with lots of details...
15:05:50 <n0ano> #link https://etherpad.openstack.org/p/icehouse-external-scheduler
15:06:24 <garyk> n0ano: i have an idea which may help us move forward with integration into other modules
15:06:35 <n0ano> cloudon1, I believe that gantt is that proposal, Rob Collins has been heavily involved in getting the tree up and running
15:06:38 <n0ano> garyk, go ahead
15:06:53 <cloudon1> Ah, OK, thanks, that wasn't clear
15:06:55 <garyk> would it be worthwhile for us to move the scheduling code to be objects. that is, we will have an object end point that will work via the db
15:07:31 <garyk> then the gantt could speak with a nova object module and say a cinder one and gather all of the data without having to worry about database migrations.
15:07:45 <n0ano> garyk, as a second step yes, the first step is to use all of the current APIs, exactly as they exist, and just get nova to call the code in the new tree
15:08:17 <garyk> why is it not part of the first step, that is, it will save us a huge migration from nova afterwards
15:08:41 <PaulMurray> n0ano there are patches in progress moving compute_node to objects
15:09:02 <PaulMurray> don't really understand how these and forklift coordinate?
15:09:05 <garyk> i think that if the scheduler code was objectified (sorry no idea how to describe it)
15:09:12 <n0ano> how would that save us, a migration would have to happen to move to objects no matter when we do it, using the current APIs just gets things going now
15:09:22 <garyk> then it will be easier to 'pluck' out and maintain moving forwards
15:09:46 <garyk> instead of talking object version 1.0, we will talk object version 2.0
15:10:03 <garyk> i think that it is in line with the way that objects work with versions, say havana and icehouse
15:10:08 <n0ano> garyk, not seeing it, my plan is to track scheduler changes to the nova tree (small set of files, easily automated) and push those changes to gantt
15:10:45 <garyk> i understand that. but i think that the tracking should start from the time that we do the object support
15:11:07 <garyk> if you guys want i can do the object support in the up and coming week
15:11:20 <PaulMurray> at the scheduler
15:11:34 <garyk> yes. at the scheduler.
15:11:42 <n0ano> garyk, if you do the object support to the nova scheduler code then I intend to just pull that into gantt
15:11:46 <garyk> at the moment it talks directly with the db. maybe it should talk objects
15:11:50 <PaulMurray> ok - we are working on compute node end along with intel guys
15:12:18 <garyk> does it sound logical and reasonable to do the scheduler code as objects?
15:12:52 <garyk> PaulMurray: and this will be based on or done in conjunction with what you guys are doing
15:13:04 <coolsvap> garyk: I think it does
15:13:29 <n0ano> garyk, ah, that's a question, I'm not opposed to that so making it an object would be OK
15:13:37 <PaulMurray> garyk: good - we are trying to get metrics/extra_resources/pci working
15:13:39 <alaski> garyk: I think it's reasonable to use objects
15:13:58 <garyk> ok, cool. i'll give it a bash to move it to objects.
15:14:15 <PaulMurray> garyk - great
15:14:40 <n0ano> anyway, as I see it the four main tasks for the gantt tree right now are...
15:14:50 <n0ano> 1) Get the unit tests working
15:15:02 <n0ano> 2) Get nova to call into the gantt tree
15:15:21 <n0ano> 3) Get the documentation working
15:15:52 <n0ano> 4) Start working on futures (RESTful API to make gantt a separate service usable by others than Nova)
15:16:18 <n0ano> The first 3 are critical and help on those areas would be greatly appreciated
15:16:40 <n0ano> we can coordinate efforts through the etherpad
15:18:16 <coolsvap> I can work on #3 to start with and co-ordinate someone in #1 & #2
15:19:01 <n0ano> coolsvap, great, I'd suggest you just tack a note onto the end of the etherpad about what you're doing and go for it
15:19:41 <garyk> i think that prior to the REST definitions we need to do some serious thinking
15:20:04 <garyk> neutron and cinder services both break with heavy load. do we want a similar model here?
15:20:28 <garyk> hope i am not stepping on people's toes but we are getting a lot of flak in neutron about this
15:20:48 <n0ano> garyk, +1, I don't want to solve neutron and cinder now but I would like to address them with futures
15:21:15 <n0ano> garyk, neutron doesn't like the idea of a common scheduler, they'd prefer to do that work themselves?
15:21:42 <garyk> no, i am not talking about adding scheduling for cinder and neutron resources. i am talking about learning about their pain points and see how we can have a service that is interfaced from nova and will not be the Achilles heel.
15:22:07 <n0ano> ah, agreed, no argument at all
15:23:06 <n0ano> I think making gantt a scheduler for nova, even moving to objects and separate service, is relatively easy, making it general enough for everyone will be a lot harder
15:24:23 <n0ano> anyway, the tree is open and there's lots to do, we can continue on the mailing list
15:24:37 <n0ano> moving on, is boris-42 or glikson here?
15:24:48 <toan-tran> just a question, what does neutron do with scheduler?
15:25:23 <n0ano> toan-tran, I believe they have their own, rudimentary scheduler, they need that capability
15:26:13 <n0ano> if a common service can satisfy their needs then fine, otherwise they would just stick with their own code
15:27:01 <n0ano> anyway, I wanted to talk no-db and multiple scheduler drivers but the people involved aren't here
15:27:35 <n0ano> #topic opens
15:28:42 <n0ano> I'm hearing silence
15:29:03 <alaski> I have a thought I can throw out there
15:29:20 <n0ano> alaski, throw away
15:30:06 <alaski> this is just some brainstorming from yesterday, but I was considering the idea of having a different approach to scheduling where there's a precalculated set of slots that can be filled and a scheduling request reserves a slot to send a build to
15:30:39 <alaski> so the heavy calculations for resource allocation become a background process basically
15:31:17 <alaski> well not allocation exactly, but defining what can fit where
15:31:36 <n0ano> hmm, my immediate thought is deadlock and latency, have you thought about those issues?
15:32:35 <garyk> alaski: that could work if the resources that you have are predictable
15:32:45 <alaski> latency could certainly be an issue, but maybe not enough to have an effect, would need some testing
15:32:58 <alaski> n0ano: I'm not sure where a deadlock would occur
15:33:21 <garyk> kind of like a memory pool. i think that things like ensembles and taking other services into account complicate the issue a tad
15:33:24 <alaski> garyk: sure. I'm thinking in terms of flavors right now, but maybe that's not enough to quantify it all
15:33:32 <toan-tran> alaski: which kind of slot are you referring to?
15:33:44 <PaulMurray> I guess you could maintain pools - like memory allocation - and free space that's allocated as usual if no pre-defined fits
15:33:46 <n0ano> you have a `set of slots', that means a limited set of resources
15:34:17 <PaulMurray> could catch a large percentage of requests
15:34:19 <alaski> a slot would basically equate to a flavor, an allocatable resource
15:34:23 <garyk> once you start to add things like affinity, anti affinity, and other constraints then it becomes more challenging
15:34:26 <alaski> set of resources
15:34:41 <garyk> if the types scheduled are homogeneous then it is a very good idea
15:34:59 <garyk> you can do prefetching etc. and know ahead of time the placement strategy
15:36:08 <garyk> not sure if mike is around. i am sure that he understands the placement cost and complexity very well
15:36:24 <n0ano> alaski, sounds interesting, are you at the point of creating a blueprint on this yet?
15:37:24 <alaski> so there are definitely some holes in the idea, but I was curious how it might play out with some work.
15:37:45 <alaski> n0ano: not at the bp stage. But I may prototype something to see how badly it falls apart, or doesn't
15:38:28 <n0ano> I'm not hearing any serious arguments against it (although things like affinity/anti-affinity need to be considered) so a prototype would be very interesting
15:38:49 <alaski> Mainly I was trying to see if some precalculations could help with speed, and this was my first concept
15:39:20 <toan-tran> alaski does it help if we have some kind of request histogram?
15:39:26 <alaski> cool, if/when I get something together I'll throw it up for additional eyes
15:39:55 <PaulMurray> very interesting alaski - thanks
15:39:58 <alaski> toan-tran: I can see some uses for that, but it may be a bit before it can be used
15:40:10 <toan-tran> because we have some ideas on the precalculation too
15:40:25 <toan-tran> our idea is to create some VMs beforehand
15:40:38 <toan-tran> mainly based on popular flavors
15:41:11 <toan-tran> the VMs will be deployed beforehand in NFS
15:41:25 <toan-tran> so when a user requests a VM
15:41:44 <toan-tran> these VMs can be directly "transferred" into the hands of the user
15:41:49 <n0ano> toan-tran, not sure how you're going to pre-create a VM, what image would you start?
15:42:13 <toan-tran> n0ano: based on popular image
15:42:40 <toan-tran> for instance, that would be images for trial offer
15:42:50 <n0ano> toan-tran, maybe, I have to say I'm skeptical
15:43:00 <alaski> it's a very interesting idea, and has been lightly discussed before but there are some hurdles to overcome
15:43:47 <alaski> toan-tran: the tenant transfer of the vm is one of the bigger road blocks that needs to be solved
15:44:10 <toan-tran> yeah, that's where we're stuck here :)
15:45:22 <boris-42> n0ano hi
15:45:43 <n0ano> boris-42, great you're here
15:45:52 <n0ano> #topic no-db scheduler update
15:45:57 <n0ano> boris-42, any news?
15:46:09 <boris-42> n0ano happy new year=) and merry christmas=)
15:46:24 <boris-42> n0ano actually there are no updates, because of holidays
15:46:36 <boris-42> n0ano but we are going in 2-3 days to make some kind of live demo
15:46:44 <boris-42> n0ano and benchmark openstack at scale using Rally
15:46:58 <boris-42> n0ano old VS new scheduler
15:47:10 <n0ano> boris-42, NP, I've been blaming the holidays for all my delays, it's a common issue
15:47:25 <boris-42> n0ano yep but seems like Rally
15:47:32 <boris-42> n0ano is ready for such kinds of benchmarks
15:47:49 <boris-42> n0ano so will try to get some interesting results
15:48:24 <n0ano> will be interesting to see those results, are you holding off on the patches for nova until after that?
15:48:58 <boris-42> n0ano ?
15:49:23 <boris-42> n0ano there is a lot of work around, especially we will try to put data from cinder to nova scheduler
15:49:24 <n0ano> I thought you had changes for nova that were ready to be reviewed
15:49:48 <boris-42> n0ano https://review.openstack.org/#/c/45867/
15:49:58 <boris-42> https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/no-db-scheduler,n,z
15:50:03 <boris-42> But there will be more
15:50:39 <boris-42> n0ano it works but doesn't pass all tests=)
15:50:53 <boris-42> n0ano I think we will quick all them
15:50:57 <boris-42> n0ano after 9 Jan
15:51:34 <n0ano> ah, that is clearly an issue, so when you fix the test failure you should be mergeable into the tree, right?
15:51:45 <boris-42> n0ano yep
15:52:14 <boris-42> n0ano but there will be more patches, to show how it will work with two data sources (cinder/nova)
15:52:29 <boris-42> n0ano and cleanup of old code (compute_nodes tables and db.api)
15:52:56 <boris-42> n0ano and refactoring of nova.api to get compute_nodes through scheduler
15:53:06 <n0ano> that's OK, cleanup of the old code is only to be expected
15:53:23 <boris-42> n0ano why only?)
15:53:38 <boris-42> n0ano how about merging cinder and nova schedulers?)
15:53:45 <n0ano> s/only to be/to be
15:53:51 * n0ano bad english
15:53:58 <boris-42> ah
15:56:31 <n0ano> well, we're approaching the top of the hour, unless there are any last minute issues
15:57:46 <n0ano> hearing silence, I'll thank everyone and we'll talk again next week
15:57:53 <n0ano> #endmeeting
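
Editor's note: a minimal sketch of the "precalculated slots" idea alaski raised under #topic opens, where a background step works out what can fit where and a scheduling request only reserves a ready slot. Everything here is an assumption for illustration: the SlotPool class, its method names, and the RAM-only capacity model are hypothetical and are not taken from nova or gantt. Affinity, anti-affinity and other constraints mentioned in the discussion are deliberately left out.

```python
class SlotPool:
    """Hypothetical pool of precalculated slots, keyed by flavor.

    A "slot" here means: this host can currently fit one instance of
    this flavor. Capacity is modelled as free RAM only (an assumption).
    """

    def __init__(self, hosts, flavors):
        self.hosts = dict(hosts)      # host name -> free RAM in MB
        self.flavors = dict(flavors)  # flavor name -> required RAM in MB
        self.slots = {}
        self.precalculate()

    def precalculate(self):
        """The heavy step: define what can fit where.

        In the idea discussed this would run as a background process;
        the sketch simply calls it inline.
        """
        self.slots = {
            flavor: [host for host, free in sorted(self.hosts.items())
                     if free >= ram]
            for flavor, ram in self.flavors.items()
        }

    def reserve(self, flavor):
        """The request path: take a ready slot, or None if none is left."""
        candidates = self.slots.get(flavor)
        if not candidates:
            return None
        host = candidates[0]
        # Claim the capacity, then refresh the slots because slots for
        # different flavors on the same host overlap.
        self.hosts[host] -= self.flavors[flavor]
        self.precalculate()
        return host
```

For example, with hosts {"node1": 4096, "node2": 2048} and flavors {"small": 2048, "large": 4096}, reserving "large" takes node1, the next "small" request lands on node2, and a further "small" request finds no slot. The refresh after each reservation is where the latency concern raised by n0ano would show up in a real deployment.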