14:00:21 <edleafe> #startmeeting nova_scheduler
14:00:21 <openstack> Meeting started Mon May 16 14:00:21 2016 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:24 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:26 <edleafe> Anyone around?
14:00:27 <jaypipes> o/
14:00:29 <Yingxin> o/
14:00:30 <mlavalle> o/
14:00:38 <mriedem> o/
14:01:03 <cdent> o/
14:01:15 <edleafe> #topic Specs
14:01:16 <doffm> o/
14:01:33 <edleafe> I listed several on the agenda
14:01:41 <edleafe> https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:01:59 <edleafe> Anyone want to discuss anything in particular about them?
14:02:11 <mriedem> on the logging one https://review.openstack.org/#/c/306647/
14:02:12 <edleafe> (except "please review!!")
14:02:27 <jaypipes> would be great to get agreement on the g-r-p spec.
14:02:34 <mriedem> i see a comment was made, something like, if we're going to do x, let's go all out and make it super complicated!
14:02:41 <jaypipes> edleafe: cdent is pushing a new rev on that shortly.
14:02:44 <edleafe> mriedem: that would have been me
14:02:48 <cdent> I can push it now
14:02:52 <mriedem> i'd prefer to keep the logging spec simple to start
14:02:59 <mriedem> and build on it
14:03:01 <jaypipes> mriedem: ++
14:03:02 * alex_xu waves late
14:03:07 <mriedem> rather than 10 new config options for logging
14:03:14 <edleafe> I was thinking one new config vs. two
14:03:33 <johnthetubaguy> I have pushed up a WIP spec for the distinct-subset-shard-scheduler spec (partly as requested by doffm) https://review.openstack.org/#/c/313519/
14:03:45 <doffm> Awesome.
14:05:03 <Yingxin> mriedem: maybe the recursive filters can help keep the logging simple, but they make the code harder to understand, as jaypipes says.
14:05:04 <johnthetubaguy> I tried to answer questions on the ordered filter scheduler spec (https://review.openstack.org/#/c/256323/) but not sure if I managed to answer the questions there
14:05:04 <edleafe> So on the logging, we're definitely not going to offer extra logging except for NoValidHost?
14:05:18 <johnthetubaguy> mriedem: +1 on keeping that logging simple to start with
14:05:27 <cdent> okay, new version of g-r-p just pushed https://review.openstack.org/300176 was a bit rushed, so may need some tweaks but I believe it represents the latest agreements
14:05:40 <Yingxin> cdent: great
14:05:51 <johnthetubaguy> edleafe: it just seems like we should improve NoValidHost first, then see where we are
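A minimal sketch of the "one new config option" idea for the logging spec, assuming a hypothetical oslo.config option name and help text (neither is from the spec): a single boolean gates extra NoValidHost diagnostics instead of several new logging knobs.

    from oslo_config import cfg

    CONF = cfg.CONF

    # Hypothetical single option: extra detail only when scheduling
    # fails with NoValidHost, rather than many new logging settings.
    novalidhost_opts = [
        cfg.BoolOpt('log_no_valid_host_reasons',
                    default=False,
                    help='If True, log which filters eliminated the '
                         'remaining candidate hosts when a request '
                         'ends in NoValidHost.'),
    ]

    CONF.register_opts(novalidhost_opts, group='scheduler')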
14:05:55 <edleafe> cdent: cool - will review after the meeting
14:06:05 <mriedem> edleafe: i'd need to read back through it in detail to get it in my head again, but it started out pretty simple and then there were some asks for additional things, and it seems to have grown into a bit of a monster
14:06:18 <edleafe> johnthetubaguy: sure
14:06:40 <edleafe> mriedem: originally it was to scratch a single itch. Other itches were then revealed
14:06:53 <mriedem> sure, but
14:07:03 <mriedem> that first itch is the bloody one
14:07:10 <mriedem> all scabby and gooey
14:07:22 <johnthetubaguy> ewww
14:07:27 * edleafe is getting hungry for breakfast all of a sudden...
14:07:30 <mriedem> heh
14:07:42 <mriedem> anyway, my 2 cents, i can read through it again, but might be a few hours
14:08:20 <edleafe> My only concern is that if we plan on adding the "success" case, we should lay the groundwork for that, rather than rip things up later to fit it
14:08:42 <edleafe> We don't have to implement it in one shot, of course
14:08:43 <johnthetubaguy> I hate the idea of the success case, as I have seen that suck performance out of the system
14:09:04 <johnthetubaguy> I assume we did something dumb to cause that, but either way, it worries me
14:09:14 <edleafe> johnthetubaguy: heh, yeah
14:09:22 <mriedem> yeah, idk, generally i'm not in favor of building for future requirements
14:09:27 <edleafe> But it was an ops guy asking for it.
14:09:51 <mriedem> sure, but i don't see that as a recurring ask from several ops people over time
14:09:56 <mriedem> like figuring out the novalidhost thing
14:10:10 <mriedem> let's put it this way,
14:10:19 <mriedem> the more complicated this is, the less chance of anything getting done in newton
14:10:27 <mriedem> because if it's too complicated, it probably won't get review attention
14:10:29 <edleafe> sure, makes sense
14:11:09 <mriedem> just something to keep in mind - i don't know how easy it would be to add the success stuff later because i don't know what cfriesen is implementing
14:11:34 <edleafe> jaypipes: cdent: anything we need to discuss on g-r-p?
14:11:49 <edleafe> i.e., any points of contention?
14:12:02 <cdent> edleafe: we've been round several blocks on the form of the api
14:12:10 <jaypipes> edleafe: I'm reviewing it now. cdent just pushed a new rev.
14:12:38 <cdent> the current form is supposed to reflect the latest discussions between jay and me
14:12:52 <edleafe> Alright, anything else to discuss for specs?
14:13:07 <cdent> I think what needs to happen there is that we agree on the spirit and just get on with it...
14:13:34 <edleafe> cdent: but blue would be much prettier!
14:13:44 <edleafe> anyways
14:13:49 * cdent bombs edleafe's shed
14:13:54 <edleafe> #topic Reviews
14:14:05 <edleafe> Nobody added any to the schedule
14:14:11 <johnthetubaguy> so I guess I worry about the lack of attention the non-priority specs are getting, but we need to focus on the other stuff, so that's fair enough
14:14:28 <edleafe> johnthetubaguy: anything in particular?
14:15:28 <johnthetubaguy> edleafe: well my two specs, obviously ;) but one is to make things easier to configure, the other is needed for us to delete cells v1 (at least in my head)
14:15:47 <jaypipes> johnthetubaguy: which one makes things easier to configure?
14:15:52 <mriedem> cdent: jaypipes: keep in mind on the grp spec no one replied to alaski's question in ps12 https://review.openstack.org/#/c/300176/12/specs/newton/approved/generic-resource-pools.rst@214
14:16:03 <johnthetubaguy> jaypipes: the one you don't like: https://review.openstack.org/#/c/256323/7
14:16:35 <johnthetubaguy> jaypipes: its intent is to make this easier to configure, I just can't ever get the weighers to do what I want
14:17:09 <jaypipes> johnthetubaguy: yeah, the whole weigher system is a turd.
14:17:40 <jaypipes> johnthetubaguy: the problem I see with your two proposals is that they build *more* complexity into the existing system. I don't see them simplifying anything.
14:17:43 <johnthetubaguy> jaypipes: this seems like a way we could deprecate and remove that, but I don't want it to slow down other progress either
14:18:20 <doffm> johnthetubaguy: Longer discussion... but I think we could make this simpler if we separated out packing and spreading.
14:18:23 <johnthetubaguy> jaypipes: so the idea is you have a list of yes/no decisions, some are required, some are just a preference, in a defined order
14:18:28 <doffm> Rather than negative weights everywhere.
14:18:47 <doffm> Use your routing spec to take different requests to a packing scheduler if needed.
14:18:55 <johnthetubaguy> doffm: yeah, I like that, I added in alternatives for a follow up
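A rough sketch of the "ordered list of yes/no decisions" idea described above, with invented host dicts and predicates (nothing here comes from the spec): required checks eliminate hosts, preferred checks only order the survivors, and earlier entries take precedence over later ones.

    def order_hosts(hosts, checks):
        # checks: ordered list of (predicate, required) pairs.
        candidates = list(hosts)
        for predicate, required in checks:
            if required:
                # A required "no" removes the host outright.
                candidates = [h for h in candidates if predicate(h)]
        # Preferences never remove hosts; they only sort the survivors,
        # with earlier preferences dominating later ones.
        preferences = [p for p, required in checks if not required]
        return sorted(candidates,
                      key=lambda h: [0 if p(h) else 1 for p in preferences])

    checks = [
        (lambda h: h['free_ram_mb'] >= 2048, True),   # required
        (lambda h: h.get('has_ssd', False), False),   # preferred
    ]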
14:19:07 <jaypipes> johnthetubaguy: on the sharded scheduler one, I'm all for partitioning (I proposed code for the scheduler over 2 years ago that added support for a scheduler process to only consider a subset of hosts). The issue is that your spec adds more complexity to the existing scheduler design by making placement policies and decision making different per subset of hosts.
14:19:49 <edleafe> jaypipes: wouldn't that be preferable? Each scheduler does its own thing its own way?
14:19:59 <johnthetubaguy> jaypipes: right, it's what cells v1 gives us today
14:20:00 <jaypipes> edleafe: no.
14:20:30 <jaypipes> johnthetubaguy: it's a hack today, and basing scheduler decisions on the flavor would be a hack again.
14:20:33 <johnthetubaguy> so on metal, ssd and non-ssd get different rules, each one gets its own nova-scheduler process (or set of processes)
14:21:23 <johnthetubaguy> jaypipes: it seems to follow the pattern of how people think about their fleet, so it seems natural, granted there may be a better way to do it
14:21:48 <johnthetubaguy> it does conflate two things here though
14:21:58 <johnthetubaguy> different sets of hosts having different scheduling policies
14:22:05 <jaypipes> johnthetubaguy: the *process by which a placement decision is reached* is the same, though. It's just "find me hosts that have X resources available and Y capabilities". The same scheduler process/code should handle all scheduling decisions. Use partitioning/sharding to reduce the number of compute hosts that each scheduler process considers in order to increase the scale of the scheduler.
14:22:10 <johnthetubaguy> sharding of the hosts, along lines you already shard your capacity planning
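A toy illustration of the partitioning point above: the placement check ("X resources, Y capabilities") is identical for every scheduler process; only the shard of compute hosts each process considers differs. The host fields, the crc32-based shard assignment, and the set-valued capabilities are all invented for the example.

    import zlib

    def my_shard(all_hosts, shard_id, num_shards):
        # Stable assignment of hosts to scheduler processes.
        return [h for h in all_hosts
                if zlib.crc32(h['name'].encode()) % num_shards == shard_id]

    def select_hosts(shard, request):
        # The same resource/capability check, whatever the shard holds;
        # 'capabilities' are plain sets here, so <= is a subset test.
        return [h for h in shard
                if h['free_vcpus'] >= request['vcpus']
                and request['capabilities'] <= h['capabilities']]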
14:22:38 <tpepper> there's ops interest in simplified/unified scheduling.  people may think of their fleet in disjoint terms because the different openstack schedulers force them to.
14:22:41 <tpepper> jaypipes: +1
14:22:43 <johnthetubaguy> jaypipes: yeah, thats true
14:22:55 <doffm> jaypipes: So are you OK with different scheduler algorithms for different host subsets? But not OK with different processes?
14:22:55 <johnthetubaguy> jaypipes: let me add that comment and re-work it
14:23:35 <johnthetubaguy> jaypipes: I actually like the separation, but it does seem wrong to require it
14:23:55 <jaypipes> johnthetubaguy: now... for some kinds of *requests*, there may be a case for having totally different schedulers. For instance, if one request required a bunch of network and/or storage affinity policies vs. another request that just wants a simple X resources Y capabilities, you could make the case for routing the two requests to two different schedulers entirely.
14:24:12 <jaypipes> doffm: ^^
14:24:24 <edleafe> jaypipes: so all schedulers have to have the same filters/weighers/whatever enabled?
14:24:28 <johnthetubaguy> jaypipes: I am thinking onmetal vs virt, FWIW
14:24:45 <doffm> Gotta go, meeting. Will read backlog later and leave comments on relevent specs. :)
14:25:13 <jaypipes> johnthetubaguy: on metal vs. virt is the EXACT same request. It's a request for a simple amount of X resources having Y capabilities. There's nothing special about the onmetal request vs. the virt one
14:25:58 <jaypipes> johnthetubaguy: just because the Nova compute_nodes table was hacked together with a (host, node) tuple just for Ironic/baremetal doesn't mean that the placement request is really any different.
14:26:10 <johnthetubaguy> jaypipes: we have different sets of weighting preferences for each of those, but that's quite a specific use case
14:26:30 <jaypipes> johnthetubaguy: please elaborate on those differences.
14:26:34 <edleafe> johnthetubaguy: that's what I was thinking - different packing/spread needs
14:26:38 <jaypipes> it would greatly help me understand the use case.
14:26:55 <jaypipes> edleafe: you can't "pack" or "spread" bare metal nodes.
14:27:09 <jaypipes> edleafe: the resource is indivisible.
14:27:19 <edleafe> jaypipes: that's just one type of sharding
14:27:32 <jaypipes> sorry, not understanding you...
14:27:33 <edleafe> I meant in general
14:27:59 <tpepper> edleafe: are you thinking in terms of chassis packing? e.g. I've rented onmetal nodes from y'all that are little blade form factors which arguably share chassis resources...
14:28:00 <edleafe> jaypipes: different host groups might require different approaches
14:28:12 <jaypipes> mriedem: on alaski's question, I will answer him on IRC. It's a smple misunderstanding it looks like.
14:28:16 <johnthetubaguy> jaypipes: well baremetal is 1:1, so most of the virt worries are irrelevant, I would have to double check if there are things the other way around
14:28:48 <tpepper> err meant johnthetubaguy ^^ not edleafe
14:29:34 <jaypipes> johnthetubaguy: sorry, I'm not following you. The fact that Ironic nodes are indivisible resources doesn't change the fact that a scheduling decision for virt vs. bare metal flavors is made the same way.
14:29:38 <johnthetubaguy> jaypipes: in reality we have different teams controlling the config right now, which is handy, so I would have to go digging to work out what the other folks do
14:29:45 <edleafe> jaypipes: so the only gain would be that each scheduler has to consider fewer hosts?
14:29:58 <johnthetubaguy> jaypipes: well if you keep windows and linux VMs on separate boxes, it doesn't matter for onmetal
14:30:19 <johnthetubaguy> keeping tenants apart from each other, similarly irrelevant
14:30:39 <jaypipes> edleafe: "only gain"? :) err, yes, the gain from partitioning is dramatically increased concurrency/scale due to allowing different scheduler processes to operate on fewer hosts.
14:33:11 <edleafe> jaypipes: it seems that the DB filtering would do that more simply and with reduced complexity
14:33:34 <edleafe> i.e., a request for bare metal would return drastically fewer hosts
14:33:56 <jaypipes> edleafe: a combination of DB side filtering and partitioning each scheduler process to only look at some shard of hosts gets the best scale in my placement-bench harness results.
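To make the "DB-side filtering plus partitioning" combination concrete, a hypothetical sketch (table and column names invented, not the real Nova schema): the database drops obviously unusable hosts first, and each scheduler process then applies its shard and the full placement check to the much smaller result.

    # Hypothetical coarse DB-side cut before any in-process filtering.
    COARSE_FILTER = (
        "SELECT name, free_vcpus, free_ram_mb, capabilities "
        "FROM compute_nodes "
        "WHERE free_ram_mb >= :ram AND free_vcpus >= :vcpus"
    )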
14:34:17 <edleafe> ok, let's continue the discussion on the spec
14:34:23 <johnthetubaguy> that's back to the one process vs multiple processes question; for me the big thing is where to shard, not how
14:34:45 <edleafe> Are there any reviews we need to discuss?
14:35:16 <johnthetubaguy> jaypipes: the idea with my suggested shard is it has no additional capacity planning worries, which is tricky when you run mostly full
14:35:33 <johnthetubaguy> but anyways, let's take this back to the spec review
14:36:16 <edleafe> #topic Opens
14:36:29 <edleafe> So anything else before we get back to reviewing specs?
14:36:56 <cdent> no sir
14:37:18 <edleafe> OK, then, here are 23 minutes of your time back
14:37:25 <edleafe> #endmeeting