15:00:25 <n0ano> #startmeeting gantt
15:00:26 <openstack> Meeting started Tue Jul 22 15:00:25 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 <openstack> The meeting name has been set to 'gantt'
15:00:35 <n0ano> anyone here to talk about the scheduler?
15:01:21 <bauzas> \o
15:01:49 <bauzas> (doing different things at the same time is definitely not a good option)
15:02:04 <n0ano> bauzas, but more exciting
15:02:27 <bauzas> well, concurrency is better than parallelism, eh?
15:02:48 <n0ano> there's a difference between the two?
15:02:57 <bauzas> n0ano: damn!
15:03:18 <bauzas> n0ano: apples are oranges now? :)
15:03:34 <n0ano> maybe you meant serialism is better than parallelism
15:03:58 <mspreitz> o/
15:04:22 <bauzas> n0ano: google it ;)
15:04:25 <n0ano> the fun one is conference calls, you know when someone says `can you repeat that' they were reading their email and not listening
15:05:18 <n0ano> anyway, let's try and get started
15:05:32 <n0ano> #topic use cases for a separated scheduler
15:05:43 <bauzas> can you repeat that? :D
15:05:53 <n0ano> bauzas, no :-)
15:06:25 <bauzas> #link https://etherpad.openstack.org/p/SchedulerUseCases
15:06:32 <n0ano> I looked over the etherpad and the common thread is there are valid use cases where the scheduler needs to consider info from multiple subsystems
15:06:53 <johnthetubaguy> seems reasonable
15:07:04 <bauzas> ping jaypipes, Yathi
15:07:27 <n0ano> indeed, to me it seems like there are valid use cases that we don't serve now but should serve in the future
15:07:29 <johnthetubaguy> what are we trying to decide from the use cases?
15:07:35 <bauzas> johnthetubaguy: +1
15:07:38 <mspreitz> there was also an outstanding challenge from jaypipes about the need for making a simultaneous decision
15:07:57 <Yathi> Hi
15:07:59 <n0ano> what mspreitz said
15:08:05 <johnthetubaguy> simultaneous, as in multiple service stats all together?
15:08:18 <mspreitz> simultaneous is orthogonal to cross-service, actually
15:08:20 <johnthetubaguy> like nova, cinder and neutron stats together
15:08:28 <Yathi> As you may recall from the last meeting, this effort was to show why we need a separate scheduler
15:08:29 <bauzas> johnthetubaguy: IIRC, the question was "do we need a global scheduler?"
15:08:29 <mspreitz> you can do simultaneous or serial placement even within one service
15:08:50 <Yathi> a separate scheduler will allow for global scheduling
15:09:10 <bauzas> s/global/cross-services
15:09:13 <n0ano> Yathi, I would phrase it as `more easily allow'
15:09:31 <Yathi> and also make it 'more' easy for more complex cross-service scenarios
15:09:35 <n0ano> Yathi, but yes, to me there's clear justification for a separate/global scheduler
15:09:40 <mspreitz> Yes, I think the cross-service use cases are the most compelling for separating the scheduler
15:09:42 <Yathi> n0ano +1
15:09:43 <johnthetubaguy> mspreitz: you think more about placing multiple VMs together?
15:09:44 <bauzas> yeah, the alternative would be to hit each service's scheduler sequentially, as the filters do
15:10:14 <johnthetubaguy> from my point of view, even if we add no features, moving the code out of nova is very useful
15:10:24 <mspreitz> johnthetubaguy: not sure I understand the sense of your question; if it is which use cases are more important for separation, I answered at the same time you asked
15:10:26 <n0ano> johnthetubaguy, +1
15:10:44 <mspreitz> OK, not orthogonal
15:10:46 <johnthetubaguy> mspreitz: sorry, the question was what you meant by simultaneous
15:11:26 <mspreitz> By simultaneous I mean making one decision that covers placing several things. They might be all in one service, or they might not
15:11:38 <johnthetubaguy> mspreitz: cool, thanks
15:11:40 <Yathi> johnthetubaguy: moving the code is very useful as a first step, I agree.. then I hope we can get more flexibility in allowing complex scenarios for resource placement
15:11:58 <johnthetubaguy> Yathi: moving the code is more about getting better review capacity and things
15:12:23 <Yathi> agreed, that too.
15:12:43 <n0ano> I think we're all in `violent' agreement here so, absent an opposing view, I'd like to say we agree a split is good and let's move on
15:12:49 <bauzas> the rationale was also to correctly define what a subcomponent is
15:12:59 <bauzas> n0ano: +1
15:13:14 <johnthetubaguy> n0ano: +1
15:13:23 <mspreitz> n0ano: +1
15:13:32 <Yathi> some of the use cases you see in https://etherpad.openstack.org/p/SchedulerUseCases are future complex scenarios: cross-services, placing groups of VMs, etc.
15:13:38 <mspreitz> the biggest holdout was jaypipes, if I recall correctly
15:13:40 <Yathi> n0ano +1 on agreement here
15:14:08 <bauzas> mspreitz: yeah, that doesn't prevent us rediscussing that later, provided it's non-blocking
15:14:16 <n0ano> OK, I count it as unanimous; we shouldn't forget about future use cases, they are important, but let's move on
15:14:26 <n0ano> #forklift status
15:14:29 <mspreitz> yeah, since he's not here, there is nothing productive to do but move on
15:14:35 <n0ano> #topic forklift status
15:14:49 <bauzas> so
15:14:51 <n0ano> bauzas, looks like progress on the scheduler client library?
15:14:57 <bauzas> n0ano: indeed
15:15:08 <bauzas> n0ano: we got one patch merged last week
15:15:14 <bauzas> n0ano: https://review.openstack.org/103503
15:15:25 <bauzas> scheduler no longer calls computes
15:15:55 <bauzas> about the client, there is also https://review.openstack.org/82778 with a +2 (thanks johnthetubaguy)
15:16:02 <bauzas> chasing another -core
15:16:37 <bauzas> anyway, other reviews are good, so people can still review that one
15:17:08 <n0ano> the other big issue is the BP to isolate DB access; johnthetubaguy has graciously agreed to sponsor the exception request, what else do we need to do about that?
15:17:18 <bauzas> still about the client, there is another patch for porting select_destinations
15:17:21 <bauzas> https://review.openstack.org/104556
15:17:22 <johnthetubaguy> n0ano: need another sponsor I think
15:17:36 <bauzas> johnthetubaguy: there is already ndipanov who volunteered
15:17:40 <johnthetubaguy> ah, cool
15:18:05 <johnthetubaguy> mikal should then send a mail tomorrow morning saying the exception is granted
15:18:10 <bauzas> johnthetubaguy: btw. sounds like the etherpad is not up-to-date
15:18:15 <johnthetubaguy> so we need to agree what we want to do I guess
15:18:20 <bauzas> johnthetubaguy: +2
15:18:22 <n0ano> that's good but it still means we only have this week to finalize the BP
15:18:30 <bauzas> n0ano: +1
15:18:38 <johnthetubaguy> bauzas: correct, mikal is going to do that in the morning, the ML is the definitive place for the decision
15:18:55 <johnthetubaguy> n0ano: yes, do we agree on the approach?
15:19:13 <bauzas> yeah, the approach is to make use of the client for updating status for aggregates
15:19:15 <n0ano> good news is I have people who will do the code once the BP is approved, so if we work on that quickly we should be OK
15:19:27 <bauzas> if someone disagrees, could he maybe provide an alternative?
15:19:44 <bauzas> n0ano: I can volunteer too
15:19:55 <johnthetubaguy> the problem is how to deal with the aggregate stuff; jay has a nice approach to fix quite a few of the filters
15:19:56 <bauzas> n0ano: I was quite busy due to sched-lib
15:20:13 <johnthetubaguy> is the spec ready for review right now?
15:20:13 <bauzas> johnthetubaguy: as said in the spec, jay's approach doesn't cover all the things
15:20:20 <bauzas> johnthetubaguy: yep
15:20:35 <bauzas> johnthetubaguy: I proposed another approach, and gave all the details
15:20:48 <bauzas> johnthetubaguy: I'm listening to alternatives too
15:20:50 <johnthetubaguy> bauzas: I don't disagree, just wondering if we agree on your current approach; there was some code up, and I didn't really agree with the code
15:21:06 <bauzas> johnthetubaguy: yeah, that's why I proposed another way in the spec
15:21:16 <bauzas> johnthetubaguy: the spec diverged from the PoC
15:21:18 <n0ano> johnthetubaguy, I'd like to ignore the code for the moment and get the BP approved, code can come later
15:21:24 <bauzas> n0ano: +1
15:21:44 <johnthetubaguy> n0ano: but the approach needs to be agreed in the spec; what I mean is, do we agree on the approach?
15:22:11 <bauzas> n0ano: your comment was about how to notify the scheduler that an aggregate has been deleted
15:22:13 <johnthetubaguy> like, what extra goes into select_destinations and what goes into the host stats call
15:22:27 <bauzas> johnthetubaguy: that's in the spec
15:22:40 <n0ano> let's all review the BP - https://review.openstack.org/#/c/89893 - and go from there.
15:22:55 <bauzas> n0ano: let's take the opportunity here to cover the deletion case
15:23:09 <johnthetubaguy> bauzas: n0ano: OK, it's just we will not have another meeting before we hit the BP freeze
15:23:13 <bauzas> n0ano: I was assuming that updating a resource with None means we can delete the resource
15:23:16 <johnthetubaguy> I was trying to see if we all agree with the spec
15:23:31 <bauzas> I'll try to summarize here
15:24:11 <n0ano> bauzas, that might work, as long as the None update is guaranteed to happen
15:24:13 <bauzas> we update the scheduler with aggregate ids
15:24:32 <bauzas> so the compute knows the list of aggs it's in
15:24:53 <bauzas> we also update the list of aggs in the sched thanks to update_resource_stats
15:25:33 <bauzas> warning, paste bomb
15:25:55 <bauzas> - For each aggregate creation/update in the Compute API, call scheduler.client.update_resource_stats(name, values), where name is a tuple (aggregate, 'id') with id being the id of the aggregate, and where values is the metadata of the aggregate
15:26:11 <johnthetubaguy> yeah, still feels like we should get compute to report the AZ name, rather than the aggregate id, but not sure
15:26:13 <bauzas> - amend scheduler.client.update_resource_stats so if name is (agg, id), do nothing (will be honored for Gantt, not for Nova)
15:26:39 <bauzas> johnthetubaguy: AZs are just aggregate metadata, no?
15:26:44 <bauzas> right?
15:27:08 <bauzas> johnthetubaguy: the only difference is that a host can only be part of one AZ, while it can be part of multiple aggs
15:27:19 <johnthetubaguy> bauzas: but if you report the id, then select_destinations needs to give all the aggregate metadata on every call, which is quite wasteful, and not that clean
15:27:34 <bauzas> johnthetubaguy: nope, see my proposal
15:27:43 <bauzas> johnthetubaguy: I changed it that way
15:27:51 <bauzas> johnthetubaguy: we update the scheduler view
15:28:00 <johnthetubaguy> bauzas: who updates the scheduler view?
15:28:18 <bauzas> johnthetubaguy: probably API
15:28:26 <bauzas> I mean, nova-api
15:28:33 <bauzas> johnthetubaguy: IIRC
15:28:46 <bauzas> johnthetubaguy: when creating/updating an agg
15:28:59 <bauzas> johnthetubaguy: within the handler, to be precise
15:29:20 <johnthetubaguy> bauzas: but how does it get the first bit of information when you switch from nova-scheduler to gantt, for example? it seems a bit worrying
15:29:21 <bauzas> johnthetubaguy: I'm considering an aggregate as another type of resource for the scheduler
15:29:46 <johnthetubaguy> bauzas: yes, but in that case, you need someone to own the aggregate, and we don't really have an owner of that right now
15:30:13 <bauzas> johnthetubaguy: nope, I'm just saying that a call has to be made through the lib
15:30:26 <bauzas> johnthetubaguy: with nova, it would be a no-op
15:30:30 <johnthetubaguy> bauzas: if the compute node just reports its own stats up, you avoid all the confusion
15:30:45 <bauzas> johnthetubaguy: including metadata, then?
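
[Editor's note: a minimal Python sketch of the flow bauzas pastes above: aggregate create/update handlers calling update_resource_stats with an (aggregate, id) name tuple, and a None update signalling deletion. The SchedulerClient class and handler names are illustrative assumptions, not the actual sched-lib API under review.]

```python
# Sketch of the proposed aggregate-update flow, under the assumptions
# stated above. In Nova this call would be a no-op for aggregates;
# once Gantt exists it would update a table in the Gantt DB.

class SchedulerClient:
    """Illustrative stand-in for the scheduler client library."""

    def update_resource_stats(self, name, values):
        # name identifies the resource, e.g. ('aggregate', 42);
        # values carries its metadata, or None to signal deletion.
        pass

def on_aggregate_updated(client, aggregate):
    # Called from the Compute API handler on aggregate create/update.
    name = ('aggregate', aggregate['id'])
    client.update_resource_stats(name, aggregate['metadata'])

def on_aggregate_deleted(client, aggregate):
    # Per the convention discussed above, a None update means delete.
    client.update_resource_stats(('aggregate', aggregate['id']), None)
```
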
15:30:51 <bauzas> johnthetubaguy: that's another alternative
15:30:58 <bauzas> johnthetubaguy: not just the ids
15:30:58 <johnthetubaguy> bauzas: erm, the point is to isolate the DB, so nova will have to start reporting the new stats
15:31:14 <johnthetubaguy> bauzas: then before the split, the filters no longer access the nova db
15:31:21 <bauzas> johnthetubaguy: right, and that's why I'm considering aggregates as another kind of resource
15:31:30 <johnthetubaguy> they only access the little bit of the nova db they are allowed to, I mean
15:31:42 <bauzas> johnthetubaguy: oh, I see your worries
15:31:57 <bauzas> johnthetubaguy: you missed my last 2 bullets in the spec
15:32:06 <bauzas> - modify HostManager so it builds aggregates info in HostState by querying all Aggregate objects.
15:32:10 <bauzas> - update scheduler filters so that they look into HostState instead of aggregates
15:32:19 <bauzas> Later, when Gantt is created, the sched.client.update will update another table in the Gantt DB so HostManager will be able to query it instead of Aggregate objects.
15:33:11 <bauzas> as I said in the spec, there are also the instancegroups objects to take care of
15:33:12 <johnthetubaguy> bauzas: that's worse; if we are not careful, now all deployments have bad performance, not just the ones using bad filters/weighers
15:33:56 * jaypipes reads back... sorry for being late :)
15:34:10 * johnthetubaguy thinks my IRC is running a bit slow, taking a while to see people's comments
15:34:31 <bauzas> johnthetubaguy: because the HostManager will query aggregates each time it needs to be updated?
15:35:49 <bauzas> johnthetubaguy: I mean, I can understand we could have some problems, I'm just trying to find the best tradeoff
15:35:56 <johnthetubaguy> bauzas: well, those aggregate calls are expensive, but at least they only happen when required now; I just don't want to change that
15:36:16 <bauzas> johnthetubaguy: so, we agree that we need to update the scheduler
15:36:21 <johnthetubaguy> bauzas: that's why I am wondering why the host can't just report the stats, directly, that the scheduler wants to see
15:36:54 <johnthetubaguy> bauzas: except for where it needs user-based info, which must come in select_destinations, that's fine
15:37:01 <bauzas> johnthetubaguy: so, to be precise, you mean that the periodic task reports all aggregates that the host wants to see?
15:37:15 <bauzas> s/wants to see/is part of (really tired tonight)
15:37:30 <bauzas> can't see the whole story
15:37:58 <bauzas> because we can't pass the full list of aggregates within select_destinations
15:38:11 <jaypipes> johnthetubaguy: ++
15:38:14 <bauzas> johnthetubaguy: I mean, we need to update the scheduler with aggregates
15:38:22 <johnthetubaguy> bauzas: more that when the aggregate changes, computes are notified, like today, and only then update the local cache of the aggregate state, so the host just reports "az_zone:zoneB" or something like that
15:38:56 <bauzas> how is the scheduler able to scheduler if there are aggregates with no hosts yet?
15:39:00 <bauzas> dammit
15:39:05 <bauzas> schedule
15:39:08 <jaypipes> johnthetubaguy: I would think that when an agg is updated, all things that would be interested in the change would be notified, including a call to a scheduler RPC API.
15:39:13 <jaypipes> or a cast...
15:39:29 <bauzas> jaypipes: that was the idea of my proposal
15:39:33 <jaypipes> bauzas: cool.
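
[Editor's note: a sketch of the two spec bullets bauzas quotes above: the HostManager caches aggregate info on each HostState so filters read it there instead of querying the Nova DB. The class shapes are simplified assumptions, not the real Nova objects.]

```python
# Sketch of the spec's last two bullets: HostManager builds aggregate
# info into each HostState, and filters consult HostState rather than
# querying Aggregate objects (i.e. the Nova DB) directly.

class HostState:
    def __init__(self, host):
        self.host = host
        self.aggregates = []  # metadata dicts of aggregates this host is in

class HostManager:
    def __init__(self, aggregates):
        # 'aggregates' stands in for the Aggregate object query; once
        # Gantt exists, it would be a Gantt DB table fed via the
        # scheduler client updates instead.
        self._aggregates = aggregates

    def get_host_state(self, host):
        state = HostState(host)
        state.aggregates = [agg['metadata'] for agg in self._aggregates
                            if host in agg['hosts']]
        return state

class AvailabilityZoneFilter:
    """Example filter reading HostState instead of the DB."""

    def host_passes(self, host_state, requested_az):
        azs = {md.get('availability_zone') for md in host_state.aggregates}
        return requested_az in azs
```

This is the design johnthetubaguy pushes back on next: every scheduling pass pays the aggregate-query cost, whether or not any aggregate-aware filter is enabled.
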
15:39:43 <johnthetubaguy> jaypipes: yeah, we have bits of that already today, but yeah, we could just call the scheduler
15:39:47 <bauzas> jaypipes: but the difference is that it would be a no-op thing for now
15:40:02 <jaypipes> bauzas: sure, understood
15:40:07 <n0ano> so, are we arguing architecture or implementation right now?
15:40:16 <jaypipes> impl
15:40:26 <bauzas> n0ano: we're arguing spec proposal details :)
15:40:29 <johnthetubaguy> bauzas: confused, I don't get why it's a no-op now, given we really need this change now, for performance improvements?
15:40:33 <bauzas> n0ano: because we need to provide those
15:40:57 <n0ano> OK, I want to keep everyone focused on the `BP' right now, not the path to implement the BP
15:41:05 <n0ano> s/path/patch
15:41:06 <bauzas> johnthetubaguy: lemme try to summarize the thing
15:41:28 <bauzas> n0ano: that's really a design discussion
15:41:35 <bauzas> johnthetubaguy: so the proposal would be
15:42:10 <bauzas> johnthetubaguy: each time an aggregate is modified (C/U/D), a call is made to the scheduler saying 'eh, I'm an agg with this metadata'
15:42:36 <bauzas> the proposal would be to make use of the existing update_resource_stats method
15:44:02 <bauzas> johnthetubaguy: in the nova world, that would mean that within update_resource_stats, it would update system_metadata
15:44:12 <bauzas> or I don't know which other one
15:44:22 <johnthetubaguy> bauzas: it just doesn't feel quite right at the moment
15:44:58 <johnthetubaguy> bauzas: in case things get out of sync, there's no clear "owner" to fix that, but maybe I am overthinking it
15:45:11 <bauzas> johnthetubaguy: the real problem is that:
15:45:14 <johnthetubaguy> bauzas: I just like the node being responsible for reporting all its stats
15:45:37 <bauzas> johnthetubaguy: yeah, I like it too, but there are cases where aggregates have no hosts
15:45:47 <johnthetubaguy> bauzas: that does create other issues, but it works nicely for the move from nova-scheduler to gantt, and such like, as a compute restart gives you fresh data to start the day with
15:46:00 <bauzas> johnthetubaguy: so when filtering, you wouldn't have a view of all the aggregates
15:46:09 <johnthetubaguy> bauzas: if there are no hosts in an aggregate, the scheduler doesn't need to know about the aggregate
15:46:21 <bauzas> johnthetubaguy: nope, I disagree
15:46:28 <bauzas> johnthetubaguy: take the AZ filter
15:46:37 <johnthetubaguy> bauzas: the scheduler is picking hosts, it just uses extra data from aggregates to filter some out
15:46:53 <bauzas> johnthetubaguy: if no hosts are in the aggregate having the wanted AZ, the scheduler won't know that this AZ exists
15:47:16 <johnthetubaguy> bauzas: if you want AZ 7 and there are no hosts in AZ 7, then there's nothing to pick; doesn't matter if you even know what AZ 7 is
15:48:02 <bauzas> johnthetubaguy: I'm sorry, but that's a chicken-and-egg problem :)
15:48:25 <bauzas> johnthetubaguy: if I'm creating a new AZ with no hosts, I can possibly still ask nova to boot an instance into this AZ
15:48:26 <johnthetubaguy> bauzas: it's not though, you are trying to pick a host; if there are none, there are none?
15:48:53 <bauzas> johnthetubaguy: oh I see
15:48:56 <johnthetubaguy> bauzas: if all the hosts have a stat saying which AZ they are in, you just filter on that, right?
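
[Editor's note: a contrasting sketch of johnthetubaguy's alternative: each compute node derives the scheduler-relevant facts from its local aggregate cache and reports them as plain stats such as "az_zone:zoneB", so the scheduler filters on host stats alone and never needs the aggregate objects. The stat key and function names are assumptions for illustration.]

```python
# Sketch of the host-reported-stats alternative: the compute node,
# when notified of an aggregate change, refreshes its local cache and
# republishes the derived facts; the scheduler just matches on stats.

def build_host_stats(local_aggregate_cache):
    stats = {}
    for agg in local_aggregate_cache:
        az = agg['metadata'].get('availability_zone')
        if az:
            stats['az_zone'] = az  # e.g. reported as "az_zone:zoneB"
    return stats

def az_filter_passes(host_stats, requested_az):
    # An AZ with no hosts simply never appears in any host's stats,
    # which is fine: there would be nothing to pick in it anyway.
    return host_stats.get('az_zone') == requested_az

# Example: a host in an aggregate with AZ metadata 'zoneB' passes a
# request for zoneB and fails one for zoneA.
cache = [{'metadata': {'availability_zone': 'zoneB'}}]
stats = build_host_stats(cache)
assert az_filter_passes(stats, 'zoneB')
assert not az_filter_passes(stats, 'zoneA')
```
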
15:49:33 <bauzas> johnthetubaguy: right
15:49:43 <bauzas> johnthetubaguy: that sounds reasonable
15:50:09 <bauzas> johnthetubaguy: that only implies that computes report each aggregate they are in, including metadata
15:50:22 <johnthetubaguy> bauzas: yep, that was my proposal
15:50:23 <n0ano> I have to say, I'm with johnthetubaguy on this; if there are no hosts in an agg or AZ I don't see why the scheduler would need to know about it.
15:50:51 <johnthetubaguy> bauzas: well, actually not quite
15:51:01 <johnthetubaguy> bauzas: each host reports the AZ it thinks it's in
15:51:07 <bauzas> n0ano: yeah, that's logical
15:51:31 <johnthetubaguy> bauzas: it doesn't report its aggregate, or any metadata, it reports what AZ it is in
15:51:46 <bauzas> johnthetubaguy: we need to have aggregate metadata for AggregateImagePropertiesIsolation, for example
15:51:55 <johnthetubaguy> bauzas: for the tenant filter, the host reports what tenants it allows, and what tenants it disallows
15:52:03 <bauzas> or AggregateInstanceExtraSpecsFilter
15:52:39 <johnthetubaguy> bauzas: for the extra specs filter, you probably want to report the extra specs you allow on each host
15:52:50 <n0ano> sorry guys, we'll have to continue this discussion on the nova channel, there's one more topic we should cover today
15:52:58 <bauzas> n0ano: right
15:52:58 <johnthetubaguy> bauzas: the nice property is the scheduler view of the world is never ahead of what the compute node thinks it should be doing
15:53:25 <n0ano> #topic mid-cycle meetup
15:53:31 <bauzas> johnthetubaguy: we need to follow up on that one :)
15:53:32 <johnthetubaguy> bauzas: OK, it will have to be later, I have a meeting straight after this one I am afraid
15:53:44 <bauzas> johnthetubaguy: tomorrow morning, are you free?
15:54:00 <johnthetubaguy> bauzas: OK
15:54:04 <n0ano> I know we'll talk about forklift status/process, are there other scheduling issues we want to raise at the meetup?
15:54:07 <bauzas> johnthetubaguy: cool, thanks
15:54:42 <Yathi> n0ano: I want to discuss the Solver Scheduler blueprint.. and its request for a spec freeze exception
15:54:44 <bauzas> n0ano: there is the proposal from jaypipes about global claiming
15:54:54 <johnthetubaguy> n0ano: if we include the resource tracker move to the scheduler, along with the current progress, that's probably key
15:55:00 <n0ano> bauzas, BTW, I think we'll be setting up a Google+ conference so you can join in (with difficulty)
15:55:08 <bauzas> n0ano: that would be awesome
15:55:12 <johnthetubaguy> well, some of these other things are more like gantt mid-cycle things though, right?
15:55:36 <bauzas> johnthetubaguy: +1
15:55:39 <johnthetubaguy> there might be some time for a gantt session in a break-out room I guess
15:55:39 <n0ano> johnthetubaguy, not sure what you mean by `gantt mid-cycle'
15:55:52 <johnthetubaguy> well, nova will not let these features in
15:55:57 <johnthetubaguy> so it's quite a quick nova discussion
15:55:59 <n0ano> ahh, gantt specific vs. nova issues
15:56:03 <johnthetubaguy> yeah
15:56:07 <johnthetubaguy> sorry, I was unclear
15:56:20 <jaypipes> johnthetubaguy: nova will not let what features in?
15:56:36 <n0ano> johnthetubaguy, NP, but that's a good point, we should be focusing on nova issues at the nova meetup
15:57:04 <n0ano> and, fer sur, I can arrange a breakout room for a gantt specific session
15:57:08 <Yathi> n0ano: one topic for the midcycle meetup: the scheduler subgroup believes in the complex scheduling scenarios shown in the use cases etherpad.. our solver scheduler BP tries to address that complexity.. and will fit in Gantt..
15:57:57 <n0ano> Yathi, I haven't forgotten you, we can address that, but I think that is a scheduler specific topic, not a nova one
15:58:04 <bauzas> n0ano: +1
15:58:26 <Yathi> n0ano: sure, I agree.. it is scheduler specific..
15:58:31 <bauzas> if you want, we can arrange a time for discussing gantt specifics
15:58:36 <bauzas> so I could join
15:58:41 <n0ano> Yathi, I'm not against your solver proposal, it's just I want to focus on the gantt split for the moment
15:58:52 <bauzas> that would allow us to discuss more easily
15:59:00 <bauzas> if we have a separate room
15:59:04 <n0ano> bauzas, +1
15:59:11 <bauzas> n0ano, you'll be my voice during the split status meeting
15:59:19 <Yathi> n0ano: that's great to know. I can imagine.. the priorities.. totally agree.. but I just want to get the basic framework code in..
15:59:24 <johnthetubaguy> jaypipes: we kinda pushed back at the summit, and elsewhere, on lots of these, saying please split out gantt first
15:59:25 * n0ano has to work on my french accent
15:59:30 <bauzas> I'm still discussing with russellb to see if he can help too
16:00:25 <Yathi> n0ano: we have been trying to push the basic framework in (non-disruptive to gantt or nova) since HKG.. hence the push now..
16:00:29 * bauzas would love to give his accent for free
16:00:37 <n0ano> it's approaching the top of the hour; we'll cancel this meeting next week (for obvious reasons), hope to see most of you in Oregon, and we'll talk here in 2 weeks
16:00:49 <n0ano> Yathi, understood
16:00:56 <n0ano> #endmeeting