14:59:45 <n0ano> #startmeeting scheduler
14:59:46 <openstack> Meeting started Tue Jun 11 14:59:45 2013 UTC.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:59:50 <openstack> The meeting name has been set to 'scheduler'
15:00:01 <n0ano> show of hands, anyone here for the scheduler meeting?
15:01:13 <senhuang> hi guys
15:02:38 <n0ano> hmm, slow start today, I'm getting a lack of enthusiasm from the crowd :-)
15:03:05 <PhilDay> I'm here - what do you want to talk about?
15:03:29 <n0ano> I was hoping to talk about the scaling issues that jog0 brought up last week.
15:04:04 <pmurray> Hi I'm new to this group but I'll be joining in
15:04:23 <n0ano> pmurray, NP welcome
15:04:42 <PhilDay> Did jog0 have specific issues that he'd seen - or was it a general question ?
15:04:55 <n0ano> #topic scheduler scaling
15:05:15 <n0ano> unfortunately, he brought it up in the last 5 min. so we don't have a lot of detail...
15:05:56 <PhilDay> Are you here jog0 ?
15:05:59 <n0ano> the basic issue was that BlueHost created a 16K node cluster, discovered the scheduler was not working, and replaced it with a totally random scheduler
15:06:48 <PhilDay> Bluehost have a very specific use case though - they are in effect a trad. hosting company, so they can hand place their VMs
15:06:51 <n0ano> I don't believe they did a thorough analysis of what was wrong, but the guess is that the scheduler was swamped by all the compute node updates.
15:06:56 <senhuang> this is interesting. it will be great if we can get more details on the bottleneck
15:07:35 <n0ano> senhuang, that was my thought, I'd like to know what is wrong to see if there's an implementation issue or something needs to be re-architected
15:07:44 <PhilDay> So they didn't need a rich scheduler.  I didn't get the impression that they spent long trying to work out the issues
15:07:55 <senhuang> n0ano: yes. agreed
15:08:00 <PhilDay> Most of their effort went into DB access
15:08:24 <n0ano> PhilDay, I'm not that concerned with BH's specific use case, if there is a scaling issue I'd like to analyze it and see what can be done.
15:08:47 <PhilDay> So I think it would be wrong to conclude from Bluehost that there is a specific scale issue
15:09:41 <n0ano> PhilDay, possibly, but it is a data point, do we know of any more traditional cloud use cases that have scaled beyond what the scheduler can handle?
15:09:42 <senhuang> PhilDay: what is special about a trad. hosting company in terms of scheduling?
15:10:04 <PhilDay> I did find one issue this week that I'm working on a fix for - which is that some filters really don't need to run for each instance in a request - e.g. no need to evaluate the AZ filter 100 times for 100 instances - especially as it currently makes a DB query for each host
15:10:37 <PhilDay> So I'm looking at making filters able to declare if they need to be run for each instance or just once
15:11:17 <PhilDay> compute_filter is another that doesn't make sense to run multiple times
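
(A minimal sketch of the kind of change PhilDay describes: letting a filter declare that one evaluation covers every instance in a request, so the scheduler only runs it for the first instance. The class and attribute names below, e.g. run_filter_once_per_request and run_filter_for_index, are illustrative assumptions rather than the exact Nova code.)

    class BaseHostFilter(object):
        # If True, the result of the first evaluation applies to every
        # instance in the request, so the scheduler can skip repeat runs.
        # Illustrative attribute name only.
        run_filter_once_per_request = False

        def host_passes(self, host_state, filter_properties):
            raise NotImplementedError()

        def run_filter_for_index(self, index):
            """Return True if the filter should run for this instance index."""
            return not (self.run_filter_once_per_request and index > 0)


    class AvailabilityZoneFilter(BaseHostFilter):
        # A host's AZ does not change between instances in one request, so
        # one evaluation is enough - this also avoids the per-host DB query
        # PhilDay mentions being repeated for every instance.
        run_filter_once_per_request = True

        def host_passes(self, host_state, filter_properties):
            requested_az = filter_properties.get('availability_zone')
            return requested_az is None or host_state.availability_zone == requested_az
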
15:11:18 <n0ano> PhilDay, good idea but I'd be more concerned about compute node updates, seems like that would be an on-going overhead that could cause scaling issues.
15:12:00 <n0ano> also, I thought there was an effort to create a DB free compute node, is that still a goal or have we dropped that idea?
15:12:42 <PhilDay> As I understand it there are two update paths (not sure why).  The hosts send updates on capabilities via messages to the scheduler, but the resource counts are still updated via the DB.
15:13:10 <n0ano> we should really consolidate one or the other, two paths seem silly
15:13:29 <PhilDay> Not clear to me that there is value in the capability update messages as they stand, as they are pretty much fixed data.  You can filter the rate at which they send the updates.
15:13:32 <garyk> hi, sorry for joining late
15:13:44 <n0ano> also, not sure why capabilities are periodically sent since they are static
15:13:49 <senhuang> PhilDay: maybe the updates also serve as a heartbeat?
15:13:53 <PhilDay> It's two different sets of data: capabilities and capacity
15:14:08 <n0ano> I'd prefer to remove the DB update and put everything in the message to the scheduler
15:14:09 <PhilDay> Maybe, but it's not used as such as far as I can see
15:14:28 <senhuang> agreed. basically qualitative and quantitative capabilities
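
(For readers following along, a rough sketch of the two update paths being discussed. Method and API names here are illustrative assumptions rather than exact Nova code; the point is that one path carries the mostly static "qualitative" capabilities over the message bus while the other writes the "quantitative" capacity numbers to the DB.)

    def periodic_resource_update(context, scheduler_rpcapi, conductor_api,
                                 host, driver):
        # Path 1: capabilities (hypervisor type, CPU features, ...) are
        # pushed to the scheduler(s) over the message bus, even though the
        # data rarely changes.
        capabilities = driver.get_host_capabilities()
        scheduler_rpcapi.update_service_capabilities(context, host, capabilities)

        # Path 2: capacity (free RAM, free disk, vcpus used, ...) is written
        # to the compute node record in the DB; the scheduler's host manager
        # reads it back at the start of each request.
        resources = driver.get_available_resource()
        conductor_api.compute_node_update(context, host, resources)
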
15:15:26 <PhilDay> If you do it all in messages then you need some way for a scheduler to know at start up when it has all of the data
15:15:27 <n0ano> but it does create an obvious scale issue, having all compute nodes send a message with static data to the scheduler seems a little silly
15:16:06 <n0ano> PhilDay, how would that be different from getting the same data (possibly incomplete) from the DB?
15:16:19 <PhilDay> I could be wrong - but that was my reading of the code.  For sure the host manager view of capacity is read from the DB at the start of each request
15:17:15 <n0ano> wouldn't be that hard to have the scheduler ignore nodes that haven't reported capacity yet
15:17:53 <PhilDay> But when you're trying to stack to the most loaded host you could get some very wrong results during that start-up stage
15:18:35 <PhilDay> At least with the DB you get the full scope of hosts, even if the data is stale.  And stale data is handled by the retry
15:18:41 <n0ano> PhilDay, but is a non-optimal scheduling decision, only during startup, that big a problem?
15:19:14 <PhilDay> Depends on your use case I guess ;-)  I'm sure there will be someone it's a problem for
15:19:56 <PhilDay> I think we have to be wary of trying to design out scale issues that we don't know for sure exist
15:20:09 <n0ano> I don't know, stale (i.e. incorrect) data in some sense is even worse to me than no data at all.
15:21:18 <n0ano> bottom line, 2 mechanisms (message & DB) seem wrong, we should pick one and use that
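
(A hypothetical sketch of the "messages only" option n0ano is arguing for: the scheduler keeps an in-memory view fed entirely by compute node updates and simply ignores hosts that have not reported recently, which is also where PhilDay's start-up concern shows up. All names here are made up for illustration.)

    import time

    class InMemoryHostManager(object):
        def __init__(self, report_interval=60):
            self._hosts = {}  # host name -> (state dict, last_seen timestamp)
            self._report_interval = report_interval

        def update_host(self, host, state):
            """Record a capability/capacity update received from a compute node."""
            self._hosts[host] = (state, time.time())

        def get_candidate_hosts(self):
            """Return hosts with a reasonably fresh report.

            Hosts that have not reported yet are skipped, so scheduling
            right after start-up only sees a partial view of the cloud.
            """
            cutoff = time.time() - 3 * self._report_interval
            return dict((host, state)
                        for host, (state, last_seen) in self._hosts.items()
                        if last_seen >= cutoff)
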
15:22:58 <PhilDay> Probably need to start with a post in openstack.dev to see if someone can explain why capabilities are sent by messages
15:23:14 <n0ano> PhilDay, good idea
15:23:37 <n0ano> #action n0ano to start thread on openstack-dev about messages to scheduler
15:24:16 <PhilDay> Perhaps the generic host state BP is the right place to mop up any changes around this ?
15:25:02 <n0ano> potentially, I'm interested in the area so I can look at that BP and see what's appropriate, do you have a specific link to it?
15:25:21 <PhilDay> https://blueprints.launchpad.net/nova/+spec/generic-host-state-for-scheduler
15:25:34 <n0ano> tnx, I'll check it out
15:26:15 <n0ano> we've talked about compute node capacity updates, are there any other obvious scaling issues in the scheduler we can think of?
15:28:14 <n0ano> then, without any empirical data (like what BH was seeing), we'll have to accept what the scheduler is doing so far.
15:29:04 <n0ano> #topic DB free compute node
15:29:18 * n0ano this is fun, being the chair means I can set the topics :-)
15:29:42 <n0ano> I thought there was a goal for this at one point in time, is that still true?
15:31:45 <n0ano> hmm, hearing silence, I guess I'll have to bring this up on the mailing list
15:31:53 <n0ano> #topic opens
15:32:05 <n0ano> Anyone have anything new they want to bring up today?
15:32:11 <haomaiwang> join #openstack-cinder
15:32:39 <n0ano> haomaiwang, forgot the `/' on that join :-)
15:32:51 <garyk> n0ano: the instance groups is coming along nicely and would be nice if we can get some help with the reviews
15:33:22 <n0ano> garyk, sure, have you got a pointer I can add to ping people?
15:33:47 <garyk> n0ano: that would be great - https://blueprints.launchpad.net/openstack/?searchtext=instance-group-api-extension . hopefully by the end of the week we'll have the cli support too
15:34:23 <n0ano> all - if you got the time, try and give a review on this
15:34:53 <garyk> n0ano: thanks
15:36:09 <n0ano> I'm hearing silence so I think we'll close a little early this week (more time for reviews :-)
15:36:44 <garyk> i need to run to swap the babysitter. sorry
15:37:07 <n0ano> OK, tnx everyone and we'll type at each other next week
15:37:12 <n0ano> #endmeeting