#openstack-meeting log

15:00:29 <n0ano> #startmeeting scheduler
15:00:30 <openstack> Meeting started Tue Jan  7 15:00:29 2014 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:33 <openstack> The meeting name has been set to 'scheduler'
15:00:41 <n0ano> anyone here for the scheduler meeting?
15:00:53 <cloudon1> yes
15:00:53 <PaulMurray> hi n0ano, I'm here
15:01:26 * coolsvap : yes, for first time here
15:01:38 <n0ano> coolsvap, welcome (we don't bite)
15:01:51 <coolsvap> n0ano:  :)
15:02:32 <garyk> hi, i am here
15:02:33 <toan-tran> hi all
15:02:40 <alaski> hi
15:03:23 <n0ano> I wanted to talk about the no-db scheduler but Boris doesn't seem to be around, let's change the order a bit
15:03:48 <n0ano> #topic Scheduler code forklift
15:04:30 <n0ano> if you've been following the mail list you should see that I think we have the new gantt tree up to where it is publicly usable
15:04:59 <n0ano> Jenkins passes (with non-voting test failures) so we can follow the normal procedures to approve and commit changes
15:05:18 <cloudon1> Can I please ask how the Gantt work implementing (I think) nova-oslo-scheduler relates to the forklift-scheduler-breakout BP proposed by Rob Collins and discussed before Christmas?
15:05:42 <n0ano> for anyone interested in working on it there's an etherpad with lots of details...
15:05:50 <n0ano> #link https://etherpad.openstack.org/p/icehouse-external-scheduler
15:06:24 <garyk> n0ano: i have an idea which may help us move forward with integration into other modules
15:06:35 <n0ano> cloudon1, I believe that gantt is that proposal, Rob Collins has been heavily involved in getting the tree up and running
15:06:38 <n0ano> garyk, go ahead
15:06:53 <cloudon1> Ah, OK, thanks, that wasn't clear
15:06:55 <garyk> would it be worthwhile for us to move the scheduling code to be objects. that is, we will have an object end point that will work via the db
15:07:31 <garyk> then the gantt could speak with a nova object module and, say, a cinder one and gather all of the data without having to worry about database migrations.
15:07:45 <n0ano> garyk, as a second step yes, the first step is to use all of the current APIs, exactly as they exist, and just get nova to call the code in the new tree
15:08:17 <garyk> why is it not part of the first step, that is, it will save us a huge migration from nova afterwards
15:08:41 <PaulMurray> n0ano there are patches in progress moving compute_node to objects
15:09:02 <PaulMurray> don't really understand how these and forklift coordinate?
15:09:05 <garyk> i think that if the scheduler code was objectified (sorry no idea how to describe it)
15:09:12 <n0ano> how would that save us, a migration would have to happen to move to objects no matter when we do it, using the current APIs just gets things going now
15:09:22 <garyk> then it will be easier to 'pluck' out and maintain moving forwards
15:09:46 <garyk> instead of talking object version 1.0, we will talk object version 2.0
15:10:03 <garyk> i think that it is in line with the way that objects work with versions, say havana and icehouse
15:10:08 <n0ano> garyk, not seeing it, my plan is to track scheduler changes to the nova tree (small set of files, easily automated) and push those changes to gantt
15:10:45 <garyk> i understand that. but i think that the tracking should start from the time that we do the object support
15:11:07 <garyk> if you guys want i can do the object support in the up and coming week
15:11:20 <PaulMurray> at the scheduler
15:11:34 <garyk> yes. at the scheduler.
15:11:42 <n0ano> garyk, if you do the object support to the nova scheduler code then I intend to just pull that into gantt
15:11:46 <garyk> at the moment it talks directly with the db. maybe it should talk objects
15:11:50 <PaulMurray> ok - we are working on compute node end along with intel guys
15:12:18 <garyk> does it sound logical and reasonable to do the scheduler code as objects?
15:12:52 <garyk> PaulMurray: and this will be based on or done in conjunction with what you guys are doing
15:13:04 <coolsvap> garyk:  I think it does
15:13:29 <n0ano> garyk, ah, that's a question, I'm not opposed to that so making it an object would be OK
15:13:37 <PaulMurray> garyk: good - we are trying to get metrics/extra_resources/pci working
15:13:39 <alaski> garyk: I think it's reasonable to use objects
15:13:58 <garyk> ok, cool. i'll give it a bash to move it to objects.
15:14:15 <PaulMurray> garyk - great
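[editor's sketch] garyk's suggestion above is that the scheduler should read host state through a versioned object interface rather than querying the database directly. A minimal illustration of that separation follows. It is not Nova's objects API: `ComputeNodeView`, `NodeSource`, `InMemoryNodeSource` and `hosts_with_ram` are hypothetical names invented for this note, and the version string is a placeholder.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class ComputeNodeView:
    """Hypothetical versioned view of a compute node (not a real Nova class)."""
    VERSION = "1.0"  # plain class attribute, bumped when the fields change
    host: str
    free_ram_mb: int
    free_vcpus: int


class NodeSource(Protocol):
    """Anything that can hand the scheduler node objects (db, RPC, cache...)."""
    def list_nodes(self) -> Iterable[ComputeNodeView]: ...


class InMemoryNodeSource:
    """Stand-in backend; a real one would hide the database behind this call."""
    def __init__(self, rows):
        self._rows = list(rows)

    def list_nodes(self):
        return [ComputeNodeView(**row) for row in self._rows]


def hosts_with_ram(source: NodeSource, ram_mb: int) -> List[str]:
    """Scheduler-side code only sees objects, never table rows or SQL."""
    return [n.host for n in source.list_nodes() if n.free_ram_mb >= ram_mb]
```

With this shape the scheduler code depends only on the object fields and their version, so the storage behind `list_nodes` could change without touching the filtering logic.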
15:14:40 <n0ano> anyway, as I see it the four main tasks for the gantt tree right now are...
15:14:50 <n0ano> 1) Get the unit tests working
15:15:02 <n0ano> 2) Get nova to call into the gantt tree
15:15:21 <n0ano> 3) Get the documentation working
15:15:52 <n0ano> 4) Start working on futures (RESTful API to make gantt a separate service usable by others than Nova)
15:16:18 <n0ano> The first 3 are critical and help on those areas would be greatly appreciated
15:16:40 <n0ano> we can coordinate efforts through the etherpad
15:18:16 <coolsvap> I can work on #3 to start with and co-ordinate with someone on #1 & #2
15:19:01 <n0ano> coolsvap, great, I'd suggest you just tack a note onto the end of the etherpad about what you're doing and go for it
15:19:41 <garyk> i think that prior to the REST definitions we need to do some serious thinking
15:20:04 <garyk> neutron and cinder services both break with heavy load. do we want a similar model here?
15:20:28 <garyk> hope i am not stepping on people's toes but we are getting a lot of flak in neutron about this
15:20:48 <n0ano> garyk, +1, I don't want to solve neutron and cinder now but I would like to address them with futures
15:21:15 <n0ano> garyk, neutron doesn't like the idea of a common scheduler, they'd prefer to do that work themselves?
15:21:42 <garyk> no, i am not talking about adding scheduling for cinder and neutron resources. i am talking about learning about their pain points and seeing how we can have a service that is interfaced from nova and will not be the achilles heel.
15:22:07 <n0ano> ah, agreed, no argument at all
15:23:06 <n0ano> I think making gantt a scheduler for nova, even moving to objects and separate service, is relatively easy, making it general enough for everyone will be a lot harder
15:24:23 <n0ano> anyway, the tree is open and there's lots to do, we can continue on the mailing list
15:24:37 <n0ano> moving on, is boris-42 or glikson here?
15:24:48 <toan-tran> just a question, what does neutron do with scheduler?
15:25:23 <n0ano> toan-tran, I believe they have their own, rudimentary scheduler, they need that capability
15:26:13 <n0ano> if a common service can satisfy their needs then fine, otherwise they would just stick with their own code
15:27:01 <n0ano> anyway, I wanted to talk no-db and multiple scheduler drivers but the people involved aren't here
15:27:35 <n0ano> #topic opens
15:28:42 <n0ano> I'm hearing silence
15:29:03 <alaski> I have a thought I can throw out there
15:29:20 <n0ano> alaski, throw away
15:30:06 <alaski> this is just some brainstorming from yesterday, but I was considering the idea of having a different approach to scheduling where there's a precalculated set of slots that can be filled and a scheduling request reserves a slot to send a build to
15:30:39 <alaski> so the heavy calculations for resource allocation become a background process basically
15:31:17 <alaski> well not allocation exactly, but defining what can fit where
15:31:36 <n0ano> hmm, my immediate thought is deadlock and latency, have you thought about those issues?
15:32:35 <garyk> alaski: that could work if the resources that you have are predictable
15:32:45 <alaski> latency could certainly be an issue, but maybe not enough to have an effect, would need some testing
15:32:58 <alaski> n0ano: I'm not sure where a deadlock would occur
15:33:21 <garyk> kind of like a memory pool. i think that things like ensembles and taking other services into account complicate the issue a tad
15:33:24 <alaski> garyk: sure.  I'm thinking in terms of flavors right now, but maybe that's not enough to quantify it all
15:33:32 <toan-tran> alaski: which kind of slot are you referring to?
15:33:44 <PaulMurray> I guess you could maintain pools - like memory allocation - and free space that's allocated as usual if no pre-defined fits
15:33:46 <n0ano> you have a `set of slots', that means a limited set of resources
15:34:17 <PaulMurray> could catch a large percentage of requests
15:34:19 <alaski> a slot would basically equate to a flavor, an allocatable resource
15:34:23 <garyk> once you start to add things like affinity, anti affinity, and other constraints then it becomes more challenging
15:34:26 <alaski> set of resources
15:34:41 <garyk> if the types scheduled are homogeneous then it is a very good idea
15:34:59 <garyk> you can do prefetching etc. and know ahead of time the placement strategy
15:36:08 <garyk> not sure if mike is around. i am sure that he understands the placement cost and complexity very well
15:36:24 <n0ano> alaski, sounds interesting, are you at the point of creating a blueprint on this yet?
15:37:24 <alaski> so there are definitely some holes in the idea, but I was curious how it might play out with some work.
15:37:45 <alaski> n0ano: not at the bp stage.  But I may prototype something to see how badly it falls apart, or doesn't
15:38:28 <n0ano> I'm not hearing any serious arguments against it (although things like affinity/anti-affinity need to be considered) so a prototype would be very interesting
15:38:49 <alaski> Mainly I was trying to see if some precalculations could help with speed, and this was my first concept
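[editor's sketch] The idea alaski describes above (slots precalculated per flavor in the background, with a scheduling request simply reserving one) and PaulMurray's suggestion to fall back to ordinary allocation when no predefined slot fits could be prototyped roughly as follows. Everything here is an assumption made for illustration: `Flavor`, `Host`, `SlotPool` and the `per_flavor` parameter are hypothetical, only vCPU and RAM are modelled, and affinity/anti-affinity constraints, which the discussion flags as the hard part, are ignored.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpus: int
    ram_mb: int


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int

    def fits(self, flavor: Flavor) -> bool:
        return (self.free_vcpus >= flavor.vcpus
                and self.free_ram_mb >= flavor.ram_mb)

    def claim(self, flavor: Flavor) -> None:
        self.free_vcpus -= flavor.vcpus
        self.free_ram_mb -= flavor.ram_mb


class SlotPool:
    def __init__(self, hosts):
        self.hosts = {h.name: h for h in hosts}
        self.slots = defaultdict(deque)  # flavor name -> queue of host names

    def precalculate(self, flavors, per_flavor=2):
        """Background step: set aside capacity for up to per_flavor slots."""
        for flavor in flavors:
            for host in self.hosts.values():
                while (len(self.slots[flavor.name]) < per_flavor
                       and host.fits(flavor)):
                    host.claim(flavor)  # capacity is reserved for the slot
                    self.slots[flavor.name].append(host.name)

    def schedule(self, flavor: Flavor) -> Optional[str]:
        """Fast path: pop a ready slot. Slow path: scan hosts as usual."""
        if self.slots[flavor.name]:
            return self.slots[flavor.name].popleft()
        for host in self.hosts.values():
            if host.fits(flavor):
                host.claim(flavor)
                return host.name
        return None  # nothing fits anywhere
```

In this sketch the expensive "what fits where" work happens in `precalculate`, so a request for a popular flavor is a queue pop. Capacity held by unused slots is unavailable to other flavors, which is one form of the limited-resources concern n0ano raises.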
15:39:20 <toan-tran> alaski does it help if we have some kind of request histogram?
15:39:26 <alaski> cool, if/when I get something together I'll throw it up for additional eyes
15:39:55 <PaulMurray> very interesting alaski - thanks
15:39:58 <alaski> toan-tran: I can see some uses for that, but it may be a bit before it can be used
15:40:10 <toan-tran> because we have some ideas on the precalculation too
15:40:25 <toan-tran> our idea is to create some VM beforehand
15:40:38 <toan-tran> mainly based on popular flavors
15:41:11 <toan-tran> the VMs will be deployed beforehand in NFS
15:41:25 <toan-tran> so when a user requests a VM
15:41:44 <toan-tran> these VMs can be directly "transferred" into hand of the user
15:41:49 <n0ano> toan-tran, not sure how you're going to pre-create a VM, what image would you start?
15:42:13 <toan-tran> n0ano: based on popular image
15:42:40 <toan-tran> for instance, that would be images for trial offer
15:42:50 <n0ano> toan-tran, maybe, I have to say I'm skeptical
15:43:00 <alaski> it's a very interesting idea, and has been lightly discussed before but there are some hurdles to overcome
15:43:47 <alaski> toan-tran: the tenant transfer of the vm is one of the bigger road blocks that needs to be solved
15:44:10 <toan-tran> yeah, that's where we're stuck here :)
15:45:22 <boris-42> n0ano hi
15:45:43 <n0ano> boris-42, great you're here
15:45:52 <n0ano> #topic no-db scheduler update
15:45:57 <n0ano> boris-42, any news?
15:46:09 <boris-42> n0ano happy new year=) and merry christmas=)
15:46:24 <boris-42> n0ano actually there are no updates, because of holidays
15:46:36 <boris-42> n0ano but we are going in 2-3 days to make some kind of live demo
15:46:44 <boris-42> n0ano and benchmark openstack at scale using Rally
15:46:58 <boris-42> n0ano old VS new scheduler
15:47:10 <n0ano> boris-42, NP, I've been blaming the holidays for all my delays, it's a common issue
15:47:25 <boris-42> n0ano yep but seems like Rally
15:47:32 <boris-42> n0ano is ready for such kinds of benchmarks
15:47:49 <boris-42> n0ano so will try to get some interesting results
15:48:24 <n0ano> will be interesting to see those results, are you holding off on the patches for nova until after that?
15:48:58 <boris-42> n0ano ?
15:49:23 <boris-42> n0ano there is a lot of work around, especially we will try to put data from cinder to nova scheduler
15:49:24 <n0ano> I thought you had changes for nova that were ready to be reviewed
15:49:48 <boris-42> n0ano https://review.openstack.org/#/c/45867/
15:49:58 <boris-42> https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/no-db-scheduler,n,z
15:50:03 <boris-42> But there will be more
15:50:39 <boris-42> n0ano it works but doesn't pass all tests=)
15:50:53 <boris-42> n0ano I think we will fix all of them
15:50:57 <boris-42> n0ano after 9 Jan
15:51:34 <n0ano> ah, that is clearly an issue, so when you fix the test failure you should be mergeable into the tree, right?
15:51:45 <boris-42> n0ano yep
15:52:14 <boris-42> n0ano but there will be more patches, to show how it will work with two data sources (cinder/nova)
15:52:29 <boris-42> n0ano and cleanup of old code (compute_nodes tables and db.api)
15:52:56 <boris-42> n0ano and refactoring of nova.api to get compute_nodes through scheduler
15:53:06 <n0ano> that's OK, cleanup of the old code is only to be expected
15:53:23 <boris-42> n0ano why only?)
15:53:38 <boris-42> n0ano how about merging cinder and nova schedulers?)
15:53:45 <n0ano> s/only to be/to be
15:53:51 * n0ano bad english
15:53:58 <boris-42> ah
15:56:31 <n0ano> well, we're approaching the top of the hour, unless there are any last minute issues
15:57:46 <n0ano> hearing silence, I'll thank everyone and we'll talk again next week
15:57:53 <n0ano> #endmeeting