15:00:25 #startmeeting gantt
15:00:26 Meeting started Tue Jul 22 15:00:25 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 The meeting name has been set to 'gantt'
15:00:35 anyone here to talk about the scheduler?
15:01:21 \o
15:01:49 (doing different things at the same time is definitely not a good option)
15:02:04 bauzas, but more exciting
15:02:27 well, concurrency is better than parallelism, eh ?
15:02:48 there's a difference between the two?
15:02:57 n0ano: damn!
15:03:18 n0ano: apples are oranges now ? :)
15:03:34 maybe you meant serialism is better than parallelism
15:03:58 o/
15:04:22 n0ano: google it ;)
15:04:25 the fun one is conference calls; you know when someone says `can you repeat that' they were reading their email and not listening
15:05:18 anyway, let's try and get started
15:05:32 #topic use cases for a separated scheduler
15:05:43 can you repeat that ? :D
15:05:53 bauzas, no :-)
15:06:25 #link https://etherpad.openstack.org/p/SchedulerUseCases
15:06:32 I looked over the etherpad and the common thread is that there are valid use cases where the scheduler needs to consider info from multiple subsystems
15:06:53 seems reasonable
15:07:04 ping jaypipes, Yathi
15:07:27 indeed, to me it seems like there are valid use cases that we don't serve now but should serve in the future
15:07:29 what are we trying to decide from the use cases?
15:07:35 johnthetubaguy: +1
15:07:38 there was also an outstanding challenge from jaypipes about the need for making a simultaneous decision
15:07:57 Hi
15:07:59 what mspreitz said
15:08:05 simultaneous, as in multiple service stats all together?
15:08:18 simultaneous is ortho to cross-service, actually
15:08:20 live nova, cinder and neutron stats together
15:08:28 As you may recall from the last meeting, this effort was to show why a separate scheduler is needed
15:08:29 johnthetubaguy: IIRC, the question was "do we need a global scheduler ?"
15:08:29 you can do simultaneous or serial placement even within one service
15:08:50 a separate scheduler will allow for global scheduling
15:09:10 s/global/cross-services
15:09:13 Yathi, I would phrase it as `more easily allow'
15:09:31 and also make it 'more' easy for more complex cross-service scenarios
15:09:35 Yathi, but yes, to me there's clear justification for a separate/global scheduler
15:09:40 Yes, I think the cross-service use cases are the most compelling for separating the scheduler
15:09:42 n0ano +1
15:09:43 mspreitz: you're thinking more about placing multiple VMs together?
15:09:44 yeah, the alternative would be to hit each service's scheduler sequentially, as the filters are doing
15:10:14 from my point of view, even if we add no features, moving the code out of nova is very useful
15:10:24 johnthetubaguy: not sure I understand the sense of your question; if it is which use cases are more important for separation, I answered at the same time you asked
15:10:26 johnthetubaguy, +1
15:10:44 OK, not ortho
15:10:46 mspreitz: sorry, the question was what you meant by simultaneous
15:11:26 By simultaneous I mean making one decision that covers placing several things. They might be all in one service, or they might not
15:11:38 mspreitz: cool, thanks
15:11:40 johnthetubaguy: moving the code is very useful as a first step, I agree.. then I hope we can get more flexibility in allowing complex scenarios for resource placement
15:11:58 Yathi: the moving code is more about getting better review capacity and things
15:12:23 agreed, that too.
15:12:43 I think we're all in `violent' agreement here so, absent an opposing view, I'd like to say we agree a split is good and let's move on
15:12:49 the rationale was also to correctly define what a subcomponent is
15:12:59 n0ano: +1
15:13:14 n0ano: +1
15:13:23 n0ano: +1
15:13:32 some of the use cases you see in https://etherpad.openstack.org/p/SchedulerUseCases - are these future complex scenarios - cross-services, placing groups of VMs, etc.
15:13:38 the biggest holdout was jaypipes, if I recall correctly
15:13:40 n0ano +1 on agreement here
15:14:08 mspreitz: yeah, that doesn't prevent us from rediscussing that later, provided it's non-blocking
15:14:16 OK, I count it as unanimous; we shouldn't forget about future use cases, they are important, but let's move on
15:14:26 #forklift status
15:14:29 yeah, since he's not here, there is nothing productive to do but move on
15:14:35 #topic forklift status
15:14:49 so
15:14:51 bauzas, looks like progress on the scheduler client library?
15:14:57 n0ano: indeed
15:15:08 n0ano: we got one patch merged last week
15:15:14 n0ano: https://review.openstack.org/103503
15:15:25 scheduler no longer calls computes
15:15:55 about the client, there is also https://review.openstack.org/82778 with a +2 (thanks johnthetubaguy)
15:16:02 chasing another -core
15:16:37 anyway, other reviews are good, so people can still review that one
15:17:08 the other big issue is the BP to isolate DB access, johnthetubaguy has graciously agreed to sponsor the exception request, what else do we need to do about that
15:17:18 still about the client, there is another patch for porting select_destinations
15:17:21 https://review.openstack.org/104556
15:17:22 n0ano: need another sponsor I think
15:17:36 johnthetubaguy: there is already ndipanov who volunteered
15:17:40 ah, cool
15:18:05 mikal should then send a mail tomorrow morning saying the exception is granted
15:18:10 johnthetubaguy: btw. sounds like the etherpad is not up-to-date
15:18:15 so we need to agree what we want to do I guess
15:18:20 johnthetubaguy: +2
15:18:22 that's good but it still means we only have this week to finalize the BP
15:18:30 n0ano: +1
15:18:38 bauzas: correct, mikal is going to do that in the morning, the ML is the definitive place for the decision
15:18:55 n0ano: yes, do we agree on the approach
15:19:13 yeah, the approach is to make use of the client for updating status for aggregates
15:19:15 good news is I have people who will do the code once the BP is approved, so if we work on that quickly we should be OK
15:19:27 if someone disagrees, could he maybe provide an alternative ?
15:19:44 n0ano: I can volunteer too
15:19:55 the problem is how to deal with the aggregate stuff, jay has a nice approach to fix quite a few of the filters
15:19:56 n0ano: I was quite busy due to sched-lib
15:20:13 is the spec ready for review right now?
15:20:13 johnthetubaguy: as said in the spec, jay's approach doesn't cover all the things
15:20:20 johnthetubaguy: yep
15:20:35 johnthetubaguy: I proposed another approach, and gave all the details
15:20:48 johnthetubaguy: I'm listening to alternatives too
15:20:50 bauzas: I don't disagree, just wondering if we agree on your current approach, there was some code up, and I didn't really agree with the code
15:21:06 johnthetubaguy: yeah, that's why I proposed another way in the spec
15:21:16 johnthetubaguy: the spec diverged from the PoC
15:21:18 johnthetubaguy, I'd like to ignore the code for the moment and get the BP approved, code can come later
15:21:24 n0ano: +1
15:21:44 n0ano: but the approach needs to be agreed in the spec, I mean more, do we agree on the approach
15:22:11 n0ano: your comment was about how to notify the scheduler that an aggregate has been deleted
15:22:13 like, what extra goes into select_destinations and what goes into the host stats call
15:22:27 johnthetubaguy: that's in the spec
15:22:40 let's all review the BP - https://review.openstack.org/#/c/89893 and go from there.
15:22:55 n0ano: let's take the opportunity here to cover the deletion case
15:23:09 bauzas: n0ano: OK, it's just we will not have another meeting before we miss the BP freeze
15:23:13 n0ano: I was assuming that updating a resource with None means we can delete the resource
15:23:16 I was trying to see if we all agree with the spec
15:23:31 I'll try to summarize here
15:24:11 bauzas, that might work, as long as the None update is guaranteed to happen
15:24:13 we update the scheduler with aggregate ids
15:24:32 so the compute knows the list of aggs it's in
15:24:53 we also update the list of aggs in the sched thanks to update_resource_stats
15:25:33 warning, paste bomb
15:25:55 - For each aggregate creation/update in the Compute API, call scheduler.client.update_resource_stats(name, values) where name is a tuple (aggregate, 'id'), id being the id of the aggregate, and where values is the metadata of the aggregate
15:26:11 yeah, still feels like we should get compute to report the AZ zone name, rather than the aggregate id, but not sure
15:26:13 - amend scheduler.client.update_resource_stats so if name is (agg, id), do nothing (will be honored for Gantt, not for Nova)
15:26:39 johnthetubaguy: AZs are just aggregate metadata, no ?
15:26:44 right ?
15:27:08 johnthetubaguy: the only difference is that a host can only be part of one AZ, while it can be part of multiple aggs
15:27:19 bauzas: but if you report the id, then select_destinations needs to give all the aggregate metadata on every call, which is quite wasteful, and not that clean
15:27:34 johnthetubaguy: nope, see my proposal
15:27:43 johnthetubaguy: I changed it that way
15:27:51 johnthetubaguy: we update the scheduler view
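A minimal sketch of the call pattern bauzas pastes above, assuming the spec's reading of update_resource_stats(name, values) with name identifying the aggregate and values carrying its metadata; the SchedulerReportClient stand-in and the aggregate_changed hook below are hypothetical illustrations, not nova code:

# Illustrative sketch only -- not the actual nova implementation. It assumes
# the proposal above: on aggregate create/update, the API-side handler pushes
# the aggregate identity and metadata through the scheduler client library.

class SchedulerReportClient(object):
    """Stand-in for the scheduler client library discussed in the meeting."""

    def update_resource_stats(self, name, values):
        # With today's nova this would effectively be a no-op for aggregates
        # (filters still read the nova DB); once Gantt is split out it would
        # update the scheduler's own view of aggregates instead.
        print('update_resource_stats(%r, %r)' % (name, values))


def aggregate_changed(client, aggregate_id, metadata):
    """Hypothetical hook the nova-api aggregate handler would call on C/U/D."""
    client.update_resource_stats(('aggregate', aggregate_id), metadata)


if __name__ == '__main__':
    client = SchedulerReportClient()
    aggregate_changed(client, 42, {'availability_zone': 'az-east'})
    aggregate_changed(client, 42, None)  # assumed convention: None signals deletion

Passing None as the metadata is one possible way to express the deletion case raised at 15:23, assuming the scheduler side treats a None update as a removal; that convention is an assumption here, not something settled in the meeting.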
15:28:00 bauzas: who updates the scheduler view?
15:28:18 johnthetubaguy: probably API
15:28:26 I mean, nova-api
15:28:33 johnthetubaguy: IIRC
15:28:46 johnthetubaguy: when creating/updating an agg
15:28:59 johnthetubaguy: within the handler, to be precise
15:29:20 bauzas: but how does it get the first bit of information when you switch from nova-scheduler to gantt, for example, it seems a big worry
15:29:21 johnthetubaguy: I'm considering an aggregate as another type of resource for the scheduler
15:29:46 bauzas: yes, but in that case, you need someone to own the aggregate, and we don't really have an owner of that right now
15:30:13 johnthetubaguy: nope, I'm just saying that a call has to be made through the lib
15:30:26 johnthetubaguy: with nova, it would be a no-op
15:30:30 bauzas: if the compute node just reports its own stats up, you avoid all the confusion
15:30:45 johnthetubaguy: including metadata, then ?
15:30:51 johnthetubaguy: that's another alternative
15:30:58 johnthetubaguy: not just the ids
15:30:58 bauzas: erm, the point is to isolate the DB, so nova will have to start reporting the new stats
15:31:14 bauzas: then before the split, the filters no longer access the nova db
15:31:21 johnthetubaguy: right, and that's why I'm considering aggregates as another kind of resource
15:31:30 they only access the little bit of the nova db they are allowed to, I mean
15:31:42 johnthetubaguy: oh, I see your worries
15:31:57 johnthetubaguy: you missed my last 2 bullets in the spec
15:32:06 - modify HostManager so it builds aggregates info in HostState by querying all Aggregate objects.
15:32:10 - update scheduler filters so that they look into HostState instead of aggregates
15:32:19 Later, when Gantt is created, the sched.client.update will update another table in the Gantt DB so HostManager will be able to query it instead of Aggregate objects.
15:33:11 as I said in the spec, there are also the instancegroups objects to take care of
15:33:12 bauzas: that's worse; if we are not careful, now all deployments have bad performance, not just the ones using bad filters/weighers
15:33:56 * jaypipes reads back... sorry for being late :)
15:34:10 * johnthetubaguy thinks my IRC is running a bit slow, taking a while to see people's comments
15:34:31 johnthetubaguy: because the Manager will call aggregates each time it needs to be updated ?
15:35:49 johnthetubaguy: I mean, I can understand we could have some problems, I'm just trying to find the best tradeoff
15:35:56 bauzas: well, those aggregate calls are expensive, but only happen when required now, at least; I just don't want to change that
15:36:16 johnthetubaguy: so, we agree that we need to update the scheduler
15:36:21 bauzas: that's why I am wondering why the host can't just report, directly, the stats that the scheduler wants to see
15:36:54 bauzas: except for where it needs user-based info, which must come in select_destinations, that's fine
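To illustrate the two spec bullets quoted at 15:32 (HostManager building aggregate info into HostState, and filters reading HostState instead of hitting the DB), here is a toy sketch; the HostState and filter classes below are simplified, hypothetical versions and not the actual nova classes:

# Illustration only: a HostState that already carries the aggregate metadata
# the HostManager would have assembled, and a filter that reads only that.

class HostState(object):
    def __init__(self, host, aggregates=None):
        self.host = host
        # list of metadata dicts for the aggregates this host belongs to,
        # e.g. [{'availability_zone': 'az-east'}, {'ssd': 'true'}]
        self.aggregates = aggregates or []


class AvailabilityZoneFilterSketch(object):
    """Filter that only looks at HostState, never at the nova DB."""

    def host_passes(self, host_state, requested_az):
        if requested_az is None:
            return True
        return any(meta.get('availability_zone') == requested_az
                   for meta in host_state.aggregates)


if __name__ == '__main__':
    f = AvailabilityZoneFilterSketch()
    h = HostState('compute-1', [{'availability_zone': 'az-east'}])
    print(f.host_passes(h, 'az-east'))  # True
    print(f.host_passes(h, 'az-west'))  # False

Under this reading, only the source of HostState.aggregates changes at split time (nova Aggregate objects now, a Gantt DB table later); the filters themselves stay untouched.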
15:37:01 johnthetubaguy: so, to be precise, you mean that the periodic task reports all aggregates that the host wants to see ?
15:37:15 s/wants to see/is part of (really tired tonight)
15:37:30 can't see the whole story
15:37:58 because we can't pass the full list of aggregates within select_destinations
15:38:11 johnthetubaguy: ++
15:38:14 johnthetubaguy: I mean, we need to update the scheduler with aggregates
15:38:22 bauzas: more that when the aggregate changes, computes are notified, like today, and only then update the local cache of the aggregate state, so the host just reports "az_zone:zoneB" or something like that
15:38:56 how the scheduler is able to scheduler if there are aggregates with no hosts yet ?
15:39:00 dammit
15:39:05 schedule
15:39:08 johnthetubaguy: I would think that when an agg is updated, all things that would be interested in the change would be notified, including a call to a scheduler RPC API.
15:39:13 or cast...
15:39:29 jaypipes: that was the idea of my proposal
15:39:33 bauzas: cool.
15:39:43 jaypipes: yeah, we have bits of that already today, but yeah, we could just call the scheduler
15:39:47 jaypipes: but the difference is that it would be a no-op thing now
15:40:02 bauzas: sure, understood
15:40:07 so, are we arguing architecture or implementation right now?
15:40:16 impl
15:40:26 n0ano: we're arguing spec proposal details :)
15:40:29 bauzas: confused, I don't get why it's a no-op now, given we really need this change now, for performance improvements?
15:40:33 n0ano: because we need to provide those
15:40:57 OK, I want to keep everyone focused on the `BP` right now, not the path to implement the BP
15:41:05 s/path/patch
15:41:06 johnthetubaguy: lemme try to summarize the thing
15:41:28 n0ano: that's really a design discussion
15:41:35 johnthetubaguy: so the proposal would be
15:42:10 johnthetubaguy: each time an aggregate is modified (C/U/D), a call is made to the scheduler saying 'hey, I'm an agg with this metadata'
15:42:36 the proposal would be to make use of the existing update_resource_stats method
15:44:02 johnthetubaguy: in the nova world, that would mean that within update_resource_stats, it would update system_metadata
15:44:12 or I don't know which other one
15:44:22 bauzas: it just doesn't feel quite right at the moment
15:44:58 bauzas: in case things get out of sync, there is no clear "owner" to fix that, but maybe I am overthinking it
15:45:11 johnthetubaguy: the real problem is that :
15:45:14 bauzas: I just like the node being responsible for reporting all its stats
15:45:37 johnthetubaguy: yeah, I like it too, but there are cases where aggregates have no hosts
15:45:47 bauzas: that does create other issues, but it works nicely for the move from nova-scheduler to gantt and such like, as a compute restart gives you fresh data to start the day with
15:46:00 johnthetubaguy: so when filtering, you wouldn't have a view of all the aggregates
15:46:09 bauzas: if there are no hosts in an aggregate, the scheduler doesn't need to know about the aggregate
15:46:21 johnthetubaguy: nope, I disagree
15:46:28 johnthetubaguy: take the AZ filter
15:46:37 bauzas: the scheduler is picking hosts, it just uses extra data from aggregates to filter some out
15:46:53 johnthetubaguy: if no hosts are in the aggregate with the wanted AZ, the scheduler won't know that this AZ exists
15:47:16 bauzas: if you want AZ 7 and there are no hosts in AZ 7, then there is nothing to pick, it doesn't matter if you even know what AZ 7 is
15:48:02 johnthetubaguy: I'm sorry, but that's a chicken-and-egg problem :)
15:48:25 johnthetubaguy: if I'm creating a new AZ with no hosts, I can possibly still ask nova to boot one host to this AZ
15:48:26 bauzas: it's not though, you are trying to pick a host, if there are none, there are none?
15:48:53 johnthetubaguy: oh I see
15:48:56 bauzas: if all the hosts have a stat saying which AZ they are in, you just filter on that, right?
15:49:33 johnthetubaguy: right
15:49:43 johnthetubaguy: that sounds reasonable
15:50:09 johnthetubaguy: that only implies that computes report each aggregate they are in, including metadata
15:50:22 bauzas: yep, that was my proposal
15:50:23 I have to say, I'm with johnthetubaguy on this, if there are no hosts in an agg or AZ I don't see why the scheduler would need to know about it.
15:50:51 bauzas: well, actually not quite
15:51:01 bauzas: each host reports the AZ zone it thinks it's in
15:51:07 n0ano: yeah, that's logical
15:51:31 bauzas: it doesn't say anything about its aggregate, or any metadata, it reports what AZ zone it is in
15:51:46 johnthetubaguy: we need to have aggregate metadata for AggregateImagePropertiesIsolation, for example
15:51:55 bauzas: for the tenant filter, the host reports what tenants it allows, and what tenants it disallows
15:52:03 or AggregateInstanceExtraSpecsFilter
15:52:39 bauzas: for the extra specs filter, you probably want to report the extra specs you allow on each host
15:52:50 sorry guys, we'll have to continue this discussion on the nova channel, there's one more topic we should cover today
15:52:58 n0ano: right
15:52:58 bauzas: the nice property is the scheduler view of the world is never ahead of what the compute node thinks it should be doing
15:53:25 #topic mid-cycle meetup
15:53:31 johnthetubaguy: we need to follow up on that one :)
15:53:32 bauzas: OK, it will have to be later, I have a meeting straight after this one I am afraid
15:53:44 johnthetubaguy: tomorrow morning, you're free ?
15:54:00 bauzas: OK
15:54:04 I know we'll talk about forklift status/process, are there other scheduling issues we want to raise at the meetup?
15:54:07 johnthetubaguy: cool, thanks
15:54:42 n0ano: I want to discuss the Solver Scheduler blueprint.. and its request for a spec freeze exception
15:54:44 n0ano: there is the proposal from jaypipes about global claiming
15:54:54 n0ano: if we include the resource tracker move to the scheduler, along with the current progress, that's probably key
15:55:00 bauzas, BTW, I think we'll be setting up a Google+ conference so you can join in (with difficulty)
15:55:08 n0ano: that would be awesome
15:55:12 well some of these other things are more like gantt mid-cycle things though, right?
15:55:36 johnthetubaguy: +1
15:55:39 there might be some time for a gantt session in a breakout room I guess
15:55:39 johnthetubaguy, not sure what you mean by `gantt mid-cycle'
15:55:52 well nova will not let these features in
15:55:57 so it's quite a quick nova discussion
15:55:59 ahh, gantt specific vs. nova issues
15:56:03 yeah
15:56:07 sorry, I was unclear
15:56:20 johnthetubaguy: nova will not let what features in?
15:56:36 johnthetubaguy, NP but that's a good point, we should be focusing on nova issues at the nova meetup
15:57:04 and, fer sur, I can arrange a breakout room for a gantt specific session
15:57:08 n0ano: one topic for the mid-cycle meetup: the scheduler subgroup believes in the complex scheduling scenarios shown in the use cases etherpad.. our solver scheduler BP tries to address that complexity.. and will fit in Gantt..
15:57:57 Yathi, I haven't forgotten you, we can address that but I think that is a scheduler specific topic, not a nova one
15:58:04 n0ano: +1
15:58:26 n0ano: sure I agree.. it is scheduler specific..
15:58:31 if you want, we can arrange a time for discussing gantt specifics
15:58:36 so I could join
15:58:41 Yathi, I'm not against your solver proposal, it's just that I want to focus on the gantt split for the moment
15:58:52 that would allow us to discuss more easily
15:59:00 if we have a separate room
15:59:04 bauzas, +1
15:59:11 n0ano, you'll be my voice during the split status meeting
15:59:19 n0ano: that's great to know. I can imagine.. the priorities.. totally agree.. but just want to get the basic framework code in..
15:59:24 jaypipes: we kinda pushed back at the summit, and elsewhere on lots of these, saying please split out gantt first
15:59:25 * n0ano has to work on my French accent
15:59:30 I'm still discussing with russellb to see if he can help too
16:00:25 n0ano: we have been trying to push the basic framework in (non-disruptive to gantt or nova) since HKG.. hence the push now..
16:00:29 * bauzas would love to give his accent for free
16:00:37 it's approaching the top of the hour, we'll cancel this meeting next week (for obvious reasons), hope to see most of you in Oregon and we'll talk here in 2 weeks
16:00:49 Yathi, understood
16:00:56 #endmeeting
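For reference, a sketch of the alternative johnthetubaguy outlined between roughly 15:38 and 15:52, where each compute flattens the aggregate data it cares about (AZ name, allowed tenants, extra specs) into its own stats report and the scheduler filters only on those per-host stats; apart from the quoted "az_zone" key, all names here are assumptions made for illustration, not nova code:

# Illustration of the "host reports its own derived stats" alternative.

def build_host_stats(az_name, allowed_tenants, extra_specs):
    """What a compute's periodic update_resource_stats payload might carry."""
    return {
        'az_zone': az_name,                       # e.g. 'zoneB'
        'allowed_tenants': set(allowed_tenants),  # tenant isolation case
        'extra_specs': dict(extra_specs),         # extra specs filter case
    }


def host_matches_request(stats, requested_az=None, tenant=None):
    """Scheduler-side check using only the stats the host reported."""
    if requested_az is not None and stats['az_zone'] != requested_az:
        return False
    if (tenant is not None and stats['allowed_tenants']
            and tenant not in stats['allowed_tenants']):
        return False
    return True


if __name__ == '__main__':
    stats = build_host_stats('zoneB', ['tenant-a'], {'ssd': 'true'})
    print(host_matches_request(stats, requested_az='zoneB', tenant='tenant-a'))  # True
    print(host_matches_request(stats, requested_az='zone7'))                     # False: no host reports zone7

As johnthetubaguy argues at 15:47, a request for a zone with no hosts simply matches nothing under this model, so the scheduler never needs a standalone view of empty aggregates; bauzas's counterpoint is that filters such as AggregateImagePropertiesIsolation still need the aggregate metadata itself, which is where the discussion was left for the nova channel.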