15:00:25 <n0ano> #startmeeting gantt
15:00:26 <openstack> Meeting started Tue Jul 22 15:00:25 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 <openstack> The meeting name has been set to 'gantt'
15:00:35 <n0ano> anyone here to talk about the scheduler?
15:01:21 <bauzas> \o
15:01:49 <bauzas> (doing different things at the same time is definitely not a good option)
15:02:04 <n0ano> bauzas, but more exciting
15:02:27 <bauzas> well, concurrency is better than parallelism, eh?
15:02:48 <n0ano> there's a difference between the two?
15:02:57 <bauzas> n0ano: damn!
15:03:18 <bauzas> n0ano: apples are oranges now? :)
15:03:34 <n0ano> maybe you meant serialism is better than parallelism
15:03:58 <mspreitz> o/
15:04:22 <bauzas> n0ano: google it ;)
15:04:25 <n0ano> the fun one is conference calls, you know when someone says `can you repeat that' they were reading their email and not listening
15:05:18 <n0ano> anyway, let's try and get started
15:05:32 <n0ano> #topic use cases for a separated scheduler
15:05:43 <bauzas> can you repeat that? :D
15:05:53 <n0ano> bauzas, no :-)
15:06:25 <bauzas> #link https://etherpad.openstack.org/p/SchedulerUseCases
15:06:32 <n0ano> I looked over the etherpad and the common thread is there are valid use cases where the scheduler needs to consider info from multiple subsystems
15:06:53 <johnthetubaguy> seems reasonable
15:07:04 <bauzas> ping jaypipes, Yathi
15:07:27 <n0ano> indeed, to me it seems like there are valid use cases that we don't serve now but should serve in the future
15:07:29 <johnthetubaguy> what are we trying to decide from the use cases?
15:07:35 <bauzas> johnthetubaguy: +1
15:07:38 <mspreitz> there was also an outstanding challenge from jaypipes about the need for making a simultaneous decision
15:07:57 <Yathi> Hi
15:07:59 <n0ano> what mspreitz said
15:08:05 <johnthetubaguy> simultaneous, as in multiple service stats all together?
15:08:18 <mspreitz> simultaneous is orthogonal to cross-service, actually
15:08:20 <johnthetubaguy> like nova, cinder and neutron stats together
15:08:28 <Yathi> As you may recall from the last meeting, this effort was to show why we need a separate scheduler
15:08:29 <bauzas> johnthetubaguy: IIRC, the question was "do we need a global scheduler?"
15:08:29 <mspreitz> you can do simultaneous or serial placement even within one service
15:08:50 <Yathi> a separate scheduler will allow for global scheduling
15:09:10 <bauzas> s/global/cross-services
15:09:13 <n0ano> Yathi, I would phrase it as `more easily allow'
15:09:31 <Yathi> and also make it 'more' easy for more complex cross-service scenarios
15:09:35 <n0ano> Yathi, but yes, to me there's clear justification for a separate/global scheduler
15:09:40 <mspreitz> Yes, I think the cross-service use cases are the most compelling for separating the scheduler
15:09:42 <Yathi> n0ano +1
15:09:43 <johnthetubaguy> mspreitz: you think more about placing multiple VMs together?
15:09:44 <bauzas> yeah, the alternative would be to hit each service's scheduler sequentially, as the filters do
15:10:14 <johnthetubaguy> from my point of view, even if we add no features, moving the code out of nova is very useful
15:10:24 <mspreitz> johnthetubaguy: not sure I understand the sense of your question; if it is which use cases are more important for separation, I answered at the same time you asked
15:10:26 <n0ano> johnthetubaguy, +1
15:10:44 <mspreitz> OK, not orthogonal
15:10:46 <johnthetubaguy> mspreitz: sorry, the question was what you meant by simultaneous
15:11:26 <mspreitz> By simultaneous I mean making one decision that covers placing several things. They might be all in one service, or they might not
15:11:38 <johnthetubaguy> mspreitz: cool, thanks
15:11:40 <Yathi> johnthetubaguy: moving the code is very useful as a first step, I agree.. then I hope we can get more flexibility in allowing complex scenarios for resource placement
15:11:58 <johnthetubaguy> Yathi: moving the code is more about getting better review capacity and things
15:12:23 <Yathi> agreed, that too.
15:12:43 <n0ano> I think we're all in `violent' agreement here so, absent an opposing view, I'd like to say we agree a split is good and let's move on
15:12:49 <bauzas> the rationale was also to correctly define what a subcomponent is
15:12:59 <bauzas> n0ano: +1
15:13:14 <johnthetubaguy> n0ano: +1
15:13:23 <mspreitz> n0ano: +1
15:13:32 <Yathi> some of the use cases you see in https://etherpad.openstack.org/p/SchedulerUseCases are future complex scenarios: cross-services, placing groups of VMs, etc.
15:13:38 <mspreitz> the biggest holdout was jaypipes, if I recall correctly
15:13:40 <Yathi> n0ano +1 on agreement here
15:14:08 <bauzas> mspreitz: yeah, that doesn't prevent us rediscussing that later, provided it's non-blocking
15:14:16 <n0ano> OK, I count it as unanimous; we shouldn't forget about future use cases, they are important, but let's move on
15:14:26 <n0ano> #forklift status
15:14:29 <mspreitz> yeah, since he's not here, there is nothing productive to do but move on
15:14:35 <n0ano> #topic forklift status
15:14:49 <bauzas> so
15:14:51 <n0ano> bauzas, looks like progress on the scheduler client library?
15:14:57 <bauzas> n0ano: indeed
15:15:08 <bauzas> n0ano: we got one patch merged last week
15:15:14 <bauzas> n0ano: https://review.openstack.org/103503
15:15:25 <bauzas> scheduler no longer calls computes
15:15:55 <bauzas> about the client, there is also https://review.openstack.org/82778 with a +2 (thanks johnthetubaguy)
15:16:02 <bauzas> chasing another -core
15:16:37 <bauzas> anyway, other reviews are good, so people can still review that one
15:17:08 <n0ano> the other big issue is the BP to isolate DB access; johnthetubaguy has graciously agreed to sponsor the exception request, what else do we need to do about that?
15:17:18 <bauzas> still about the client, there is another patch for porting select_destinations
15:17:21 <bauzas> https://review.openstack.org/104556
15:17:22 <johnthetubaguy> n0ano: need another sponsor I think
15:17:36 <bauzas> johnthetubaguy: there is already ndipanov who volunteered
15:17:40 <johnthetubaguy> ah, cool
15:18:05 <johnthetubaguy> mikal should then send a mail tomorrow morning saying the exception is granted
15:18:10 <bauzas> johnthetubaguy: btw. sounds like the etherpad is not up-to-date
15:18:15 <johnthetubaguy> so we need to agree what we want to do I guess
15:18:20 <bauzas> johnthetubaguy: +2
15:18:22 <n0ano> that's good but it still means we only have this week to finalize the BP
15:18:30 <bauzas> n0ano: +1
15:18:38 <johnthetubaguy> bauzas: correct, mikal is going to do that in the morning, the ML is the definitive place for the decision
15:18:55 <johnthetubaguy> n0ano: yes, do we agree on the approach?
15:19:13 <bauzas> yeah, the approach is to make use of the client for updating status for aggregates
15:19:15 <n0ano> good news is I have people who will do the code once the BP is approved, so if we work on that quickly we should be OK
15:19:27 <bauzas> if someone disagrees, could he maybe provide an alternative?
15:19:44 <bauzas> n0ano: I can volunteer too
15:19:55 <johnthetubaguy> the problem is how to deal with the aggregate stuff; jay has a nice approach to fix quite a few of the filters
15:19:56 <bauzas> n0ano: I was quite busy due to sched-lib
15:20:13 <johnthetubaguy> is the spec ready for review right now?
15:20:13 <bauzas> johnthetubaguy: as said in the spec, jay's approach doesn't cover all the things
15:20:20 <bauzas> johnthetubaguy: yep
15:20:35 <bauzas> johnthetubaguy: I proposed another approach, and gave all the details
15:20:48 <bauzas> johnthetubaguy: I'm listening to alternatives too
15:20:50 <johnthetubaguy> bauzas: I don't disagree, just wondering if we agree on your current approach; there was some code up, and I didn't really agree with the code
15:21:06 <bauzas> johnthetubaguy: yeah, that's why I proposed another way in the spec
15:21:16 <bauzas> johnthetubaguy: the spec diverged from the PoC
15:21:18 <n0ano> johnthetubaguy, I'd like to ignore the code for the moment and get the BP approved, code can come later
15:21:24 <bauzas> n0ano: +1
15:21:44 <johnthetubaguy> n0ano: but the approach needs to be agreed in the spec; what I mean is, do we agree on the approach?
15:22:11 <bauzas> n0ano: your comment was about how to notify the scheduler that an aggregate has been deleted
15:22:13 <johnthetubaguy> like, what extra goes into select_destinations and what goes into the host stats call
15:22:27 <bauzas> johnthetubaguy: that's in the spec
15:22:40 <n0ano> let's all review the BP - https://review.openstack.org/#/c/89893 - and go from there.
15:22:55 <bauzas> n0ano: let's take the opportunity here to cover the deletion case
15:23:09 <johnthetubaguy> bauzas: n0ano: OK, it's just we will not have another meeting before we hit the BP freeze
15:23:13 <bauzas> n0ano: I was assuming that updating a resource with None means we can delete the resource
15:23:16 <johnthetubaguy> I was trying to see if we all agree with the spec
15:23:31 <bauzas> I'll try to summarize here
15:24:11 <n0ano> bauzas, that might work, as long as the None update is guaranteed to happen
15:24:13 <bauzas> we update the scheduler with aggregate ids
15:24:32 <bauzas> so the compute knows the list of aggs it's in
15:24:53 <bauzas> we also update the list of aggs in the sched thanks to update_resource_stats
15:25:33 <bauzas> warning, paste bomb
15:25:55 <bauzas> - For each aggregate creation/update in the Compute API, call scheduler.client.update_resource_stats(name, values), where name is a tuple (aggregate, 'id') with id being the id of the aggregate, and where values is the metadata of the aggregate
15:26:11 <johnthetubaguy> yeah, still feels like we should get compute to report the AZ name, rather than the aggregate id, but not sure
15:26:13 <bauzas> - amend scheduler.client.update_resource_stats so if name is (agg, id), do nothing (will be honored for Gantt, not for Nova)
15:26:39 <bauzas> johnthetubaguy: AZs are just aggregate metadata, no?
15:26:44 <bauzas> right?
15:27:08 <bauzas> johnthetubaguy: the only difference is that a host can only be part of one AZ, while it can be part of multiple aggs
15:27:19 <johnthetubaguy> bauzas: but if you report the id, then select_destinations needs to give all the aggregate metadata on every call, which is quite wasteful, and not that clean
15:27:34 <bauzas> johnthetubaguy: nope, see my proposal
15:27:43 <bauzas> johnthetubaguy: I changed it that way
15:27:51 <bauzas> johnthetubaguy: we update the scheduler view
15:28:00 <johnthetubaguy> bauzas: who updates the scheduler view?
15:28:18 <bauzas> johnthetubaguy: probably API
15:28:26 <bauzas> I mean, nova-api
15:28:33 <bauzas> johnthetubaguy: IIRC
15:28:46 <bauzas> johnthetubaguy: when creating/updating an agg
15:28:59 <bauzas> johnthetubaguy: within the handler, to be precise
15:29:20 <johnthetubaguy> bauzas: but how does it get the first bit of information when you switch from nova-scheduler to gantt, for example? it seems a bit worrying
15:29:21 <bauzas> johnthetubaguy: I'm considering an aggregate as another type of resource for the scheduler
15:29:46 <johnthetubaguy> bauzas: yes, but in that case, you need someone to own the aggregate, and we don't really have an owner of that right now
15:30:13 <bauzas> johnthetubaguy: nope, I'm just saying that a call has to be made through the lib
15:30:26 <bauzas> johnthetubaguy: with nova, it would be a no-op
15:30:30 <johnthetubaguy> bauzas: if the compute node just reports its own stats up, you avoid all the confusion
15:30:45 <bauzas> johnthetubaguy: including metadata, then?
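
[Editor's note: a minimal Python sketch of the flow bauzas pastes above: aggregate create/update handlers calling update_resource_stats with an (aggregate, id) name tuple, and a None update signalling deletion. The SchedulerClient class and handler names are illustrative assumptions, not the actual sched-lib API under review.]

```python
# Sketch of the proposed aggregate-update flow, under the assumptions
# stated above. In Nova this call would be a no-op for aggregates;
# once Gantt exists it would update a table in the Gantt DB.

class SchedulerClient:
    """Illustrative stand-in for the scheduler client library."""

    def update_resource_stats(self, name, values):
        # name identifies the resource, e.g. ('aggregate', 42);
        # values carries its metadata, or None to signal deletion.
        pass

def on_aggregate_updated(client, aggregate):
    # Called from the Compute API handler on aggregate create/update.
    name = ('aggregate', aggregate['id'])
    client.update_resource_stats(name, aggregate['metadata'])

def on_aggregate_deleted(client, aggregate):
    # Per the convention discussed above, a None update means delete.
    client.update_resource_stats(('aggregate', aggregate['id']), None)
```
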
15:30:51 <bauzas> johnthetubaguy: that's another alternative
15:30:58 <bauzas> johnthetubaguy: not just the ids
15:30:58 <johnthetubaguy> bauzas: erm, the point is to isolate the DB, so nova will have to start reporting the new stats
15:31:14 <johnthetubaguy> bauzas: then before the split, the filters no longer access the nova db
15:31:21 <bauzas> johnthetubaguy: right, and that's why I'm considering aggregates as another kind of resource
15:31:30 <johnthetubaguy> they only access the little bit of the nova db they are allowed to, I mean
15:31:42 <bauzas> johnthetubaguy: oh, I see your worries
15:31:57 <bauzas> johnthetubaguy: you missed my last 2 bullets in the spec
15:32:06 <bauzas> - modify HostManager so it builds aggregates info in HostState by querying all Aggregate objects.
15:32:10 <bauzas> - update scheduler filters so that they look into HostState instead of aggregates
15:32:19 <bauzas> Later, when Gantt is created, the sched.client.update will update another table in the Gantt DB so HostManager will be able to query it instead of Aggregate objects.
15:33:11 <bauzas> as I said in the spec, there are also the instancegroups objects to take care of
15:33:12 <johnthetubaguy> bauzas: that's worse; if we are not careful, now all deployments have bad performance, not just the ones using bad filters/weighers
15:33:56 * jaypipes reads back... sorry for being late :)
15:34:10 * johnthetubaguy thinks my IRC is running a bit slow, taking a while to see people's comments
15:34:31 <bauzas> johnthetubaguy: because the HostManager will query aggregates each time it needs to be updated?
15:35:49 <bauzas> johnthetubaguy: I mean, I can understand we could have some problems, I'm just trying to find the best tradeoff
15:35:56 <johnthetubaguy> bauzas: well, those aggregate calls are expensive, but at least they only happen when required now; I just don't want to change that
15:36:16 <bauzas> johnthetubaguy: so, we agree that we need to update the scheduler
15:36:21 <johnthetubaguy> bauzas: that's why I am wondering why the host can't just report the stats, directly, that the scheduler wants to see
15:36:54 <johnthetubaguy> bauzas: except for where it needs user-based info, which must come in select_destinations, that's fine
15:37:01 <bauzas> johnthetubaguy: so, to be precise, you mean that the periodic task reports all aggregates that the host wants to see?
15:37:15 <bauzas> s/wants to see/is part of (really tired tonight)
15:37:30 <bauzas> can't see the whole story
15:37:58 <bauzas> because we can't pass the full list of aggregates within select_destinations
15:38:11 <jaypipes> johnthetubaguy: ++
15:38:14 <bauzas> johnthetubaguy: I mean, we need to update the scheduler with aggregates
15:38:22 <johnthetubaguy> bauzas: more that when the aggregate changes, computes are notified, like today, and only then update the local cache of the aggregate state, so the host just reports "az_zone:zoneB" or something like that
15:38:56 <bauzas> how is the scheduler able to scheduler if there are aggregates with no hosts yet?
15:39:00 <bauzas> dammit
15:39:05 <bauzas> schedule
15:39:08 <jaypipes> johnthetubaguy: I would think that when an agg is updated, all things that would be interested in the change would be notified, including a call to a scheduler RPC API.
15:39:13 <jaypipes> or a cast...
15:39:29 <bauzas> jaypipes: that was the idea of my proposal
15:39:33 <jaypipes> bauzas: cool.
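
[Editor's note: a sketch of the two spec bullets bauzas quotes above: the HostManager caches aggregate info on each HostState so filters read it there instead of querying the Nova DB. The class shapes are simplified assumptions, not the real Nova objects.]

```python
# Sketch of the spec's last two bullets: HostManager builds aggregate
# info into each HostState, and filters consult HostState rather than
# querying Aggregate objects (i.e. the Nova DB) directly.

class HostState:
    def __init__(self, host):
        self.host = host
        self.aggregates = []  # metadata dicts of aggregates this host is in

class HostManager:
    def __init__(self, aggregates):
        # 'aggregates' stands in for the Aggregate object query; once
        # Gantt exists, it would be a Gantt DB table fed via the
        # scheduler client updates instead.
        self._aggregates = aggregates

    def get_host_state(self, host):
        state = HostState(host)
        state.aggregates = [agg['metadata'] for agg in self._aggregates
                            if host in agg['hosts']]
        return state

class AvailabilityZoneFilter:
    """Example filter reading HostState instead of the DB."""

    def host_passes(self, host_state, requested_az):
        azs = {md.get('availability_zone') for md in host_state.aggregates}
        return requested_az in azs
```

This is the design johnthetubaguy pushes back on next: every scheduling pass pays the aggregate-query cost, whether or not any aggregate-aware filter is enabled.
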
15:39:43 <johnthetubaguy> jaypipes: yeah, we have bits of that already today, but yeah, we could just call the scheduler
15:39:47 <bauzas> jaypipes: but the difference is that it would be a no-op thing for now
15:40:02 <jaypipes> bauzas: sure, understood
15:40:07 <n0ano> so, are we arguing architecture or implementation right now?
15:40:16 <jaypipes> impl
15:40:26 <bauzas> n0ano: we're arguing spec proposal details :)
15:40:29 <johnthetubaguy> bauzas: confused, I don't get why it's a no-op now, given we really need this change now, for performance improvements?
15:40:33 <bauzas> n0ano: because we need to provide those
15:40:57 <n0ano> OK, I want to keep everyone focused on the `BP' right now, not the path to implement the BP
15:41:05 <n0ano> s/path/patch
15:41:06 <bauzas> johnthetubaguy: lemme try to summarize the thing
15:41:28 <bauzas> n0ano: that's really a design discussion
15:41:35 <bauzas> johnthetubaguy: so the proposal would be
15:42:10 <bauzas> johnthetubaguy: each time an aggregate is modified (C/U/D), a call is made to the scheduler saying 'eh, I'm an agg with this metadata'
15:42:36 <bauzas> the proposal would be to make use of the existing update_resource_stats method
15:44:02 <bauzas> johnthetubaguy: in the nova world, that would mean that within update_resource_stats, it would update system_metadata
15:44:12 <bauzas> or I don't know which other one
15:44:22 <johnthetubaguy> bauzas: it just doesn't feel quite right at the moment
15:44:58 <johnthetubaguy> bauzas: in case things get out of sync, there's no clear "owner" to fix that, but maybe I am overthinking it
15:45:11 <bauzas> johnthetubaguy: the real problem is that:
15:45:14 <johnthetubaguy> bauzas: I just like the node being responsible for reporting all its stats
15:45:37 <bauzas> johnthetubaguy: yeah, I like it too, but there are cases where aggregates have no hosts
15:45:47 <johnthetubaguy> bauzas: that does create other issues, but it works nicely for the move from nova-scheduler to gantt, and such like, as a compute restart gives you fresh data to start the day with
15:46:00 <bauzas> johnthetubaguy: so when filtering, you wouldn't have a view of all the aggregates
15:46:09 <johnthetubaguy> bauzas: if there are no hosts in an aggregate, the scheduler doesn't need to know about the aggregate
15:46:21 <bauzas> johnthetubaguy: nope, I disagree
15:46:28 <bauzas> johnthetubaguy: take the AZ filter
15:46:37 <johnthetubaguy> bauzas: the scheduler is picking hosts, it just uses extra data from aggregates to filter some out
15:46:53 <bauzas> johnthetubaguy: if no hosts are in the aggregate having the wanted AZ, the scheduler won't know that this AZ exists
15:47:16 <johnthetubaguy> bauzas: if you want AZ 7 and there are no hosts in AZ 7, then there's nothing to pick; doesn't matter if you even know what AZ 7 is
15:48:02 <bauzas> johnthetubaguy: I'm sorry, but that's a chicken-and-egg problem :)
15:48:25 <bauzas> johnthetubaguy: if I'm creating a new AZ with no hosts, I can possibly still ask nova to boot an instance into this AZ
15:48:26 <johnthetubaguy> bauzas: it's not though, you are trying to pick a host; if there are none, there are none?
15:48:53 <bauzas> johnthetubaguy: oh I see
15:48:56 <johnthetubaguy> bauzas: if all the hosts have a stat saying which AZ they are in, you just filter on that, right?
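
[Editor's note: a contrasting sketch of johnthetubaguy's alternative: each compute node derives the scheduler-relevant facts from its local aggregate cache and reports them as plain stats such as "az_zone:zoneB", so the scheduler filters on host stats alone and never needs the aggregate objects. The stat key and function names are assumptions for illustration.]

```python
# Sketch of the host-reported-stats alternative: the compute node,
# when notified of an aggregate change, refreshes its local cache and
# republishes the derived facts; the scheduler just matches on stats.

def build_host_stats(local_aggregate_cache):
    stats = {}
    for agg in local_aggregate_cache:
        az = agg['metadata'].get('availability_zone')
        if az:
            stats['az_zone'] = az  # e.g. reported as "az_zone:zoneB"
    return stats

def az_filter_passes(host_stats, requested_az):
    # An AZ with no hosts simply never appears in any host's stats,
    # which is fine: there would be nothing to pick in it anyway.
    return host_stats.get('az_zone') == requested_az

# Example: a host in an aggregate with AZ metadata 'zoneB' passes a
# request for zoneB and fails one for zoneA.
cache = [{'metadata': {'availability_zone': 'zoneB'}}]
stats = build_host_stats(cache)
assert az_filter_passes(stats, 'zoneB')
assert not az_filter_passes(stats, 'zoneA')
```
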
15:49:33 <bauzas> johnthetubaguy: right
15:49:43 <bauzas> johnthetubaguy: that sounds reasonable
15:50:09 <bauzas> johnthetubaguy: that only implies that computes report each aggregate they are in, including metadata
15:50:22 <johnthetubaguy> bauzas: yep, that was my proposal
15:50:23 <n0ano> I have to say, I'm with johnthetubaguy on this; if there are no hosts in an agg or AZ I don't see why the scheduler would need to know about it.
15:50:51 <johnthetubaguy> bauzas: well, actually not quite
15:51:01 <johnthetubaguy> bauzas: each host reports the AZ it thinks it's in
15:51:07 <bauzas> n0ano: yeah, that's logical
15:51:31 <johnthetubaguy> bauzas: it doesn't report its aggregate, or any metadata, it reports what AZ it is in
15:51:46 <bauzas> johnthetubaguy: we need to have aggregate metadata for AggregateImagePropertiesIsolation, for example
15:51:55 <johnthetubaguy> bauzas: for the tenant filter, the host reports what tenants it allows, and what tenants it disallows
15:52:03 <bauzas> or AggregateInstanceExtraSpecsFilter
15:52:39 <johnthetubaguy> bauzas: for the extra specs filter, you probably want to report the extra specs you allow on each host
15:52:50 <n0ano> sorry guys, we'll have to continue this discussion on the nova channel, there's one more topic we should cover today
15:52:58 <bauzas> n0ano: right
15:52:58 <johnthetubaguy> bauzas: the nice property is the scheduler view of the world is never ahead of what the compute node thinks it should be doing
15:53:25 <n0ano> #topic mid-cycle meetup
15:53:31 <bauzas> johnthetubaguy: we need to follow up on that one :)
15:53:32 <johnthetubaguy> bauzas: OK, it will have to be later, I have a meeting straight after this one I am afraid
15:53:44 <bauzas> johnthetubaguy: tomorrow morning, are you free?
15:54:00 <johnthetubaguy> bauzas: OK
15:54:04 <n0ano> I know we'll talk about forklift status/process, are there other scheduling issues we want to raise at the meetup?
15:54:07 <bauzas> johnthetubaguy: cool, thanks
15:54:42 <Yathi> n0ano: I want to discuss the Solver Scheduler blueprint.. and its request for a spec freeze exception
15:54:44 <bauzas> n0ano: there is the proposal from jaypipes about global claiming
15:54:54 <johnthetubaguy> n0ano: if we include the resource tracker move to the scheduler, along with the current progress, that's probably key
15:55:00 <n0ano> bauzas, BTW, I think we'll be setting up a Google+ conference so you can join in (with difficulty)
15:55:08 <bauzas> n0ano: that would be awesome
15:55:12 <johnthetubaguy> well, some of these other things are more like gantt mid-cycle things though, right?
15:55:36 <bauzas> johnthetubaguy: +1
15:55:39 <johnthetubaguy> there might be some time for a gantt session in a break-out room I guess
15:55:39 <n0ano> johnthetubaguy, not sure what you mean by `gantt mid-cycle'
15:55:52 <johnthetubaguy> well, nova will not let these features in
15:55:57 <johnthetubaguy> so it's quite a quick nova discussion
15:55:59 <n0ano> ahh, gantt specific vs. nova issues
15:56:03 <johnthetubaguy> yeah
15:56:07 <johnthetubaguy> sorry, I was unclear
15:56:20 <jaypipes> johnthetubaguy: nova will not let what features in?
15:56:36 <n0ano> johnthetubaguy, NP, but that's a good point, we should be focusing on nova issues at the nova meetup
15:57:04 <n0ano> and, fer sur, I can arrange a breakout room for a gantt specific session
15:57:08 <Yathi> n0ano: one topic for the midcycle meetup: the scheduler subgroup believes in the complex scheduling scenarios shown in the use cases etherpad.. our solver scheduler BP tries to address that complexity.. and will fit in Gantt..
15:57:57 <n0ano> Yathi, I haven't forgotten you, we can address that, but I think that is a scheduler specific topic, not a nova one
15:58:04 <bauzas> n0ano: +1
15:58:26 <Yathi> n0ano: sure, I agree.. it is scheduler specific..
15:58:31 <bauzas> if you want, we can arrange a time for discussing gantt specifics
15:58:36 <bauzas> so I could join
15:58:41 <n0ano> Yathi, I'm not against your solver proposal, it's just I want to focus on the gantt split for the moment
15:58:52 <bauzas> that would allow us to discuss more easily
15:59:00 <bauzas> if we have a separate room
15:59:04 <n0ano> bauzas, +1
15:59:11 <bauzas> n0ano, you'll be my voice during the split status meeting
15:59:19 <Yathi> n0ano: that's great to know. I can imagine.. the priorities.. totally agree.. but I just want to get the basic framework code in..
15:59:24 <johnthetubaguy> jaypipes: we kinda pushed back at the summit, and elsewhere, on lots of these, saying please split out gantt first
15:59:25 * n0ano has to work on my french accent
15:59:30 <bauzas> I'm still discussing with russellb to see if he can help too
16:00:25 <Yathi> n0ano: we have been trying to push the basic framework in (non-disruptive to gantt or nova) since HKG.. hence the push now..
16:00:29 * bauzas would love to give his accent for free
16:00:37 <n0ano> it's approaching the top of the hour; we'll cancel this meeting next week (for obvious reasons), hope to see most of you in Oregon, and we'll talk here in 2 weeks
16:00:49 <n0ano> Yathi, understood
16:00:56 <n0ano> #endmeeting