14:00:21 <edleafe> #startmeeting nova_scheduler
14:00:21 <openstack> Meeting started Mon May 16 14:00:21 2016 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:24 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:26 <edleafe> Anyone around?
14:00:27 <jaypipes> o/
14:00:29 <Yingxin> o/
14:00:30 <mlavalle> o/
14:00:38 <mriedem> o/
14:01:03 <cdent> o/
14:01:15 <edleafe> #topic Specs
14:01:16 <doffm> o/
14:01:33 <edleafe> I listed several on the agenda
14:01:41 <edleafe> https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:01:59 <edleafe> Anyone want to discuss anything in particular about them?
14:02:11 <mriedem> on the logging one https://review.openstack.org/#/c/306647/
14:02:12 <edleafe> (except "please review!!")
14:02:27 <jaypipes> would be great to get agreement on the g-r-p spec.
14:02:34 <mriedem> i see a comment was made, something like, if we're going to do x, let's go all out and making it super complicated!
14:02:41 <jaypipes> edleafe: cdent is pushing a new rev on that shortly.
14:02:44 <edleafe> mriedem: that would have been me
14:02:48 <cdent> I can push it now
14:02:52 <mriedem> i'd prefer to keep the logging spec simple to start
14:02:59 <mriedem> and build on it
14:03:01 <jaypipes> mriedem: ++
14:03:02 * alex_xu waves late
14:03:07 <mriedem> rather than 10 new config options for logging
14:03:14 <edleafe> I was thinking one new config vs. two
14:03:33 <johnthetubaguy> I have pushed up a WIP spec for the distinct-subset-shard-scheduler spec (partly as requested by doffm) https://review.openstack.org/#/c/313519/
14:03:45 <doffm> Awesome.
14:05:03 <Yingxin> mriedem: maybe the recursive filters can help the logging simple, but it makes code harder to be understand as jaypipes says.
14:05:04 <johnthetubaguy> I tried to answer questions on the ordered filter scheduler spec (https://review.openstack.org/#/c/256323/) but not sure if I managed to answer the questions there
14:05:04 <edleafe> So on the logging, we're definitely not going to offer extra logging except for NoValidHost?
14:05:18 <johnthetubaguy> mriedem: +1 on keeping that logging simple to start with
14:05:27 <cdent> okay, new version of g-r-p just pushed https://review.openstack.org/300176 was a bit rushed, so many need some tweaks but I believe it represents the latest agreements
14:05:40 <Yingxin> cdent: great
14:05:51 <johnthetubaguy> edleafe: it just seems like we should improve NoValidHost first, then see where we are
14:05:55 <edleafe> cdent: cool - will review after the meeting
14:06:05 <mriedem> edleafe: i'd need to read back through it in detail to get it in my head again, but it started out pretty simple and then there were some asks for additional things, and it seems to have grown into a bit of a monster
14:06:18 <edleafe> johnthetubaguy: sure
14:06:40 <edleafe> mriedem: originally it was to scratch a single itch. Other itches were then revealed
14:06:53 <mriedem> sure, but
14:07:03 <mriedem> that first itch is the bloody one
14:07:10 <mriedem> all scabby and gooey
14:07:22 <johnthetubaguy> ewww
14:07:27 * edleafe is getting hungry for breakfast all of a sudden...
14:07:30 <mriedem> heh
14:07:42 <mriedem> anyway, my 2 cents, i can read through it again, but might be a few hours
14:08:20 <edleafe> My only concern is that if we plan on adding the "success" case, we should lay the groundwork for that, rather than rip things up later to fit it
14:08:42 <edleafe> We don't have to implement it in one shot, of course
14:08:43 <johnthetubaguy> I hate the idea of the success case, as I have seen that suck performance out of the system
14:09:04 <johnthetubaguy> I assume we did something dumb to cause that, but either way, it worries me
14:09:14 <edleafe> johnthetubaguy: heh, yeah
14:09:22 <mriedem> yeah, idk, generally i'm not in favor or building for future requirements
14:09:27 <edleafe> But it was an ops guy asking for it.
14:09:28 <mriedem> *of building
14:09:51 <mriedem> sure, but i don't see that as a recurring ask from several ops people over time
14:09:56 <mriedem> like figuring out the novalidhost thing
14:10:10 <mriedem> let's put it this way,
14:10:19 <mriedem> the more complicated this is, the less chance of anything getting done in newton
14:10:27 <mriedem> because if it's too complicated, it probably won't get review attention
14:10:29 <edleafe> sure, makes sense
14:11:09 <mriedem> just something to keep in mind - i don't know how easy it would be to add the success stuff later because i don't know what cfriesen is implementing
14:11:34 <edleafe> jaypipes: cdent: anything we need to discuss on g-r-p?
14:11:49 <edleafe> i.e., any points of contention?
14:12:02 <cdent> edleafe: we've been round several blocks on the form of the api
14:12:10 <jaypipes> edleafe: I'm reviewing it now. cdent just pushed a new rev.
14:12:38 <cdent> the current form is supposed to reflect the latest discussions between jay and me
14:12:52 <edleafe> Alright, anything else to discuss for specs?
14:13:07 <cdent> I think what needs to happen there is that we agree on the spirit and just get on with it...
14:13:34 <edleafe> cdent: but blue would be much prettier!
14:13:44 <edleafe> anyways
14:13:49 * cdent bombs edleafe's shed
14:13:54 <edleafe> #topic Reviews
14:14:05 <edleafe> Nobody added any to the schedule
14:14:11 <johnthetubaguy> so I guess I worry about the lack of attention the non-priority specs are getting, but we need to focus on the other stuff, so thats fair enough
14:14:28 <edleafe> johnthetubaguy: anything in particular?
14:15:28 <johnthetubaguy> edleafe: well my two specs, obviously ;) but one is to make things easier to configure, the other is needed for us to delete cells v1 (at least in my head)
14:15:47 <jaypipes> johnthetubaguy: which one makes things easier to configure?
14:15:52 <mriedem> cdent: jaypipes: keep in mind on the grp spec no one replied to alaski's question in ps12 https://review.openstack.org/#/c/300176/12/specs/newton/approved/generic-resource-pools.rst@214
14:16:03 <johnthetubaguy> jaypipes: the one you don't like: https://review.openstack.org/#/c/256323/7
14:16:35 <johnthetubaguy> jaypipes: its intent is to make this easier to configure, I just can't ever get the weighers to do what I want
14:17:09 <jaypipes> johnthetubaguy: yeah, the whole weigher system is a turd.
14:17:40 <jaypipes> johnthetubaguy: the problem I see with your two proposals is that they build *more* complexity into the existing system. I don't see them simplifying anything.
14:17:43 <johnthetubaguy> jaypipes: this seems like a way we could deprecate and remove that, but I don't want it to slow down other progress either
14:18:20 <doffm> johnthetubaguy: Longer discussion... but I think we could make this simpler if we separated out packing and spreading.
14:18:23 <johnthetubaguy> jaypipes: so the idea is you have a list of yes/no decisions, some are required, some are just a preference, in a defined order
14:18:28 <doffm> Rather than negative weights everywhere.
14:18:47 <doffm> Use your routing spec to take different requests to a packing scheduler if needed.
14:18:55 <johnthetubaguy> doffm: yeah, I like that, I added in alternatives for a follow up
14:19:07 <jaypipes> johnthetubaguy: on the sharded scheduler one, I'm all for partitioning (I proposed code for the scheduler over 2 years ago that added support for a scheduler process to only consider a subset of hosts). The issue is that your spec adds more complexity to the existing scheduler design by making placement policies and decision making different per subset of hosts.
14:19:49 <edleafe> jaypipes: wouldn't that be preferable? Each scheduler does its own thing its own way?
14:19:59 <johnthetubaguy> jaypipes: right, its what cells v1 gives us today
14:20:00 <jaypipes> edleafe: no.
14:20:30 <jaypipes> johnthetubaguy: it's a hack today, and basing scheduler decisions on the flavor would be a hack again.
14:20:33 <johnthetubaguy> so on metal, ssd and non-ssd get different rules, each one gets its own nova-scheduler process (or set of processes)
14:21:23 <johnthetubaguy> jaypipes: it seems to follow the pattern of how people think about their fleet, so it seems natural, granted their may be a better way to do it
14:21:48 <johnthetubaguy> it does conflate two things here though
14:21:58 <johnthetubaguy> different sets of hosts having different scheduling policies
14:22:05 <jaypipes> johnthetubaguy: the *process by which a placement decision is reached* is the same, though. It's just "find me hosts that have X resources available and Y capabilities". The same scheduler process/code should handle all scheduling decisions. Use partitioning/sharding to reduce the number of compute hosts that each scheduler process considers in order to increase the scale of the scheduler.
14:22:10 <johnthetubaguy> sharding of the hosts, along lines you already shard your capacity planning
14:22:38 <tpepper> there's ops interest in simplified/unified scheduling. people may think of their fleet in disjoint terms because the different openstack schedulers force them to.
14:22:41 <tpepper> jaypipes: +1
14:22:43 <johnthetubaguy> jaypipes: yeah, thats true
14:22:55 <doffm> jaypipes: So are you OK with different scheduler algorithms for different host subsets? But not OK with different processes?
14:22:55 <johnthetubaguy> jaypipes: let me add that comment and re-work it
14:23:35 <johnthetubaguy> jaypipes: I actually like the separation, but it does seem wrong to require it
14:23:55 <jaypipes> johnthetubaguy: now... for some kinds of *requests*, there may be a case for having totally different schedulers. For instance, if one request required a bunch of network and/or storage affinity policies vs. another request that just wants a simple X resources Y capabilities, you could make the case for routing the two requests to two different schedulers entirely.
14:24:12 <jaypipes> doffm: ^^
14:24:24 <edleafe> jaypipes: so all schedulers have to have the same filters/weighers/whatever enabled?
14:24:28 <johnthetubaguy> jaypipes: I am thinking onmetal vs virt, FWIW
14:24:45 <doffm> Gotta go, meeting. Will read backlog later and leave comments on relevent specs. :)
14:25:13 <jaypipes> johnthetubaguy: on metal vs. virt is the EXACT same request. It's a request for a simple amount of X resources having Y capabilities. There's nothing special about the onmetal request vs. the virt one
14:25:58 <jaypipes> johnthetubaguy: just because the Nova compute_nodes table was hacked together with a (host, node) tuple just for Ironic/baremetal doesn't mean that the placement request is really any different.
14:26:10 <johnthetubaguy> jaypipes: we have different sets of weighting preferences for each of those, but thats quite a specific use case
14:26:30 <jaypipes> johnthetubaguy: please elaborate on those differences.
14:26:34 <edleafe> johnthetubaguy: that's what I was thinking - different packing/spread needs
14:26:38 <jaypipes> it would greatly help me understand the use case.
14:26:55 <jaypipes> edleafe: you can't "pack" or "spread" bare metal nodes.
14:27:09 <jaypipes> edleafe: the resource is indivisible.
14:27:19 <edleafe> jaypipes: that's just one type of sharding
14:27:32 <jaypipes> sorry, not understanding you...
14:27:33 <edleafe> I meant in general
14:27:59 <tpepper> edleafe: are you thinking in terms of chassis packing? eg: I've rented onmetal nodes from ya'll that are little blade form factors which arguably share chassis resources...
14:28:00 <edleafe> jaypipes: different host groups might require different approaches
14:28:12 <jaypipes> mriedem: on alaski's question, I will answer him on IRC. It's a smple misunderstanding it looks like.
14:28:16 <johnthetubaguy> jaypipes: well baremetal is 1:1, so most of the virt worries are irrelevant, I would have to double check if there are things the other way around
14:28:48 <tpepper> err meant johnthetubaguy ^^ not edleafe
14:29:34 <jaypipes> johnthetubaguy: sorry, I'm not following you. The fact that Ironic nodes are indivisible resources doesn't change the fact that a scheduling decision for virt vs. bare metal flavors is made the same way.
14:29:38 <johnthetubaguy> jaypipes: in reality we have different teams controlling the config right now, which is handy, so I would have to go digging to work out what the other folks do
14:29:45 <edleafe> jaypipes: so the only gain would be that each scheduler has to consider fewer hosts?
14:29:58 <johnthetubaguy> jaypipes: well if you keep windows and linux VMs on separate boxes, it doesn't matter for onmetal
14:30:19 <johnthetubaguy> keeping tenants apart from each other, similarly irrelevant
14:30:39 <jaypipes> edleafe: "only gain"? :) err, yes, the gain from partitioning is dramatically increased concurrency/scale due to allowing different scheduler processes to operate on fewer hosts.
14:33:11 <edleafe> jaypipes: it seems that the DB filtering would do that more simply and with reduced complexity
14:33:34 <edleafe> i.e., a request for bare metal would return drastically fewer hosts
14:33:56 <jaypipes> edleafe: a combination of DB side filtering and partitioning each scheduler process to only look at some shard of hosts gets the best scale in my placement-bench harness results.
14:34:17 <edleafe> ok, let's continue the discussion on the spec
14:34:23 <johnthetubaguy> thats back to the one process vs multiple process, for me the big thing is where to shard, not how
14:34:45 <edleafe> Are there any reviews we need to discuss?
14:35:16 <johnthetubaguy> jaypipes: the idea with my suggested shard, is it has no capacity planning worries, which is tricky when you run mostly full
14:35:33 <johnthetubaguy> but anyways, lets take this back to the spec review
14:35:49 <johnthetubaguy> s/no/no additional/
14:36:16 <edleafe> #topic Opens
14:36:29 <edleafe> So anything else before we get back to reviewing specs?
14:36:56 <cdent> no sir
14:37:18 <edleafe> OK, then, here are 23 minutes of your time back
14:37:25 <edleafe> #endmeeting
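
The ordered filter idea johnthetubaguy describes at 14:18:23 ("a list of yes/no decisions, some are required, some are just a preference, in a defined order") could look roughly like the sketch below. This is only an illustration under assumptions, not code from the spec or from Nova: the `Decision` class, the `order_hosts` function, the host dictionary keys and the choice to rank preferences lexicographically (earlier preferences outweigh later ones) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Host = Dict[str, object]  # hypothetical host record, not a Nova object


@dataclass
class Decision:
    """One yes/no question asked about a host (hypothetical type)."""
    name: str
    check: Callable[[Host], bool]
    required: bool  # True: failing hosts are dropped; False: just a preference


def order_hosts(hosts: List[Host], decisions: List[Decision]) -> List[Host]:
    """Drop hosts failing any required decision, then rank the rest.

    Assumption: preferences are compared in list order, so a host that
    satisfies an earlier preference always ranks ahead of one that does not.
    """
    required = [d for d in decisions if d.required]
    preferred = [d for d in decisions if not d.required]
    candidates = [h for h in hosts if all(d.check(h) for d in required)]
    # False sorts before True, so "not check" puts passing hosts first.
    return sorted(candidates,
                  key=lambda h: tuple(not d.check(h) for d in preferred))


hosts = [
    {"name": "a", "free_ram_mb": 2048, "ssd": True, "empty": True},
    {"name": "b", "free_ram_mb": 8192, "ssd": False, "empty": True},
    {"name": "c", "free_ram_mb": 8192, "ssd": True, "empty": False},
    {"name": "d", "free_ram_mb": 8192, "ssd": True, "empty": True},
]
decisions = [
    Decision("enough_ram", lambda h: h["free_ram_mb"] >= 4096, required=True),
    Decision("has_ssd", lambda h: bool(h["ssd"]), required=False),
    Decision("is_empty", lambda h: bool(h["empty"]), required=False),
]
ranked = [h["name"] for h in order_hosts(hosts, decisions)]
print(ranked)  # ['d', 'c', 'b']
```

Host "a" is removed by the required RAM check; the remaining hosts are ordered by the SSD preference first and the empty-host preference second, with no numeric weights involved.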