14:00:05 <edleafe> #startmeeting nova_scheduler
14:00:07 <openstack> Meeting started Mon Jun  5 14:00:05 2017 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:10 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:14 <edleafe> Who's here today?
14:00:21 <alex_xu> o/
14:00:34 <lei-zh> o/
14:01:01 <jaypipes> o/
14:01:16 <cdent> o/
14:02:15 <edleafe> Looks like there is room enough for everyone to stretch their legs!
14:02:34 <jaypipes> indeed
14:02:43 <edleafe> #topic Specs and Reviews
14:03:08 <edleafe> Claims in the Scheduler/Conductor
14:03:12 <edleafe> #link Claims in the Scheduler/Conductor series: https://review.openstack.org/#/c/465171/
14:03:51 <edleafe> This seems to be moving forward
14:04:05 <edleafe> Are there any overriding concerns about the direction?
14:04:17 <jaypipes> not from me
14:04:56 <cdent> nor me
14:04:58 <edleafe> Jay has a +2 on the first in that series, so if we can get another core to agree..
14:05:12 <jaypipes> bauzas: you in today?
14:05:34 <edleafe> IAC, please review the next few in that series. I have cycles to update as needed
14:05:40 <edleafe> Moving on...
14:05:40 <edleafe> Nested Resource Providers
14:05:41 <edleafe> #link Nested RPs: https://review.openstack.org/#/c/415920
14:05:54 <dansmith> edleafe: uber holiday today in .eu
14:06:03 <diga> o/
14:06:18 <edleafe> dansmith: no ride sharing?
14:06:32 <dansmith> uuuuuuber holiday
14:06:32 <jaypipes> edleafe: been getting some good reviews from efried on that series.
14:06:35 <dansmith> big holiday
14:06:40 * cdent is no longer in the eu :(
14:06:40 <jaypipes> edleafe: plan to comment on the patches please.
14:07:01 <edleafe> jaypipes: sure thing - have them open in tabs with the latest updates
14:07:20 <jaypipes> edleafe: I also have a question on scheduler claims and shared resource providers to discuss if we have time today
14:08:05 <edleafe> I'm free after this meeting
14:08:18 <edleafe> We can either do it during Opens, or anytime after
14:08:21 <jaypipes> kk
14:08:28 <alex_xu> jaypipes: a question, will the 'GET /resource_providers?resources=...' API change in the future for nested resource providers?
14:09:06 <jaypipes> alex_xu: great question, and the answer is related to what I want to discuss about shared resource providers :)
14:09:18 <jaypipes> alex_xu: so let's discuss that together
14:09:28 <alex_xu> jaypipes: ok
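
For reference, a minimal Python sketch of the request form being asked about here; the base URL, microversion header, and resource amounts are illustrative assumptions, not taken from the discussion:

    import requests  # a real deployment would use an authenticated keystone session

    # Illustrative only: ask placement for providers able to satisfy a set of
    # resource amounts; the classes and amounts below are made up.
    resp = requests.get(
        "http://placement.example/resource_providers",
        params={"resources": "VCPU:1,MEMORY_MB:2048,DISK_GB:100"},
        headers={"OpenStack-API-Version": "placement 1.4"},  # microversion assumed
    )
    providers = resp.json()["resource_providers"]
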
14:09:44 <edleafe> Let's do it in Opens then. I don't want to keep alex_xu up too late :)
14:09:50 <jaypipes> there's another patch series: https://review.openstack.org/#/q/topic:cd/placement-api-ref+status:open
14:09:58 <alex_xu> edleafe: thanks :)
14:09:59 <jaypipes> for avolkov's api-refs
14:10:30 <edleafe> jaypipes: stop stealing my thunder!
14:10:32 <edleafe> :)
14:10:35 <jaypipes> sorry!
14:10:39 <edleafe> that's later in the list
14:11:02 <jaypipes> k
14:11:05 <edleafe> Well, since you brought it up...
14:11:06 <edleafe> Placement API ref docs
14:11:07 <edleafe> #link Placement API ref docs: https://review.openstack.org/#/q/topic:cd/placement-api-ref+status:open
14:11:10 <jaypipes> hehe
14:11:23 * edleafe tries to run a tight meeting
14:11:53 <edleafe> Anything to add about that series?
14:12:32 <cdent> only that we'll need to start thinking about the publishing job
14:12:37 <cdent> right now the job makes drafts
14:12:53 <cdent> but last I talked to andrey it wasn't ready to publish
14:14:22 <edleafe> that's always the fun part, no?
14:14:30 <jaypipes> cdent: ++
14:14:56 <edleafe> Moving on
14:14:57 <edleafe> Add project_id and user_id to placement DB
14:14:57 <edleafe> #link project_id and user_id to placement DB: https://review.openstack.org/#/c/470645/
14:15:11 <jaypipes> hold up on that one.
14:15:13 <jaypipes> :)
14:15:14 <edleafe> Still kind of WIPpy
14:15:27 <jaypipes> I'm just fixing a PostgreSQL oddity locally.
14:15:35 <jaypipes> finishing running tests now and should be pushed shortly.
14:15:44 <edleafe> cool
14:15:45 <jaypipes> spent most of the weekend working on that
14:15:55 <jaypipes> I wanted to normalize the consumers table properly.
14:16:03 <edleafe> I spent most of my weekend not working :)
14:16:12 <jaypipes> instead of endlessly repeating VARCHAR(255) columns everywhere
14:16:50 * edleafe finds it funny to see jaypipes and "normal" in the same sentence
14:17:11 <jaypipes> edleafe: normalized != normal :)
14:17:23 <edleafe> heh
14:17:40 <edleafe> Next up is: Delete all inventory
14:17:40 <edleafe> #link Delete all inventory: https://review.openstack.org/#/c/460147/
14:17:49 <cdent> did we know that mr date has the same initials as me? I'm sure that means something..
14:17:58 <edleafe> That's getting pretty close. We should be able to push that through this week
14:18:01 <jaypipes> I did indeed know that.
14:18:13 <jaypipes> I remember speaking to him around 2006 or so.
14:18:33 <jaypipes> he wanted $9000 to come do a keynote for the MySQL conference. I politely declined.
14:18:48 <cdent> hawt
14:19:09 <jaypipes> now... if his name was CJ Timestamp, I might have considered.
14:19:36 <edleafe> <groan>
14:19:44 <jaypipes> yes, I have turned into my father.
14:19:57 <jaypipes> ok, meeting guru, what's up next?
14:20:12 <edleafe> So we have a leaf, a dent, and some pipes making fun of a date
14:20:22 <jaypipes> indeed.
14:20:30 <jaypipes> Indy, bad dates.
14:20:30 <edleafe> Sync os-traits to DB
14:20:30 <edleafe> #link Sync os-traits: https://review.openstack.org/#/c/469578/
14:20:43 <edleafe> This is cdent's patch
14:20:53 <jaypipes> cdent: can you fix up that thing edleafe pointed to on ^^
14:21:03 <jaypipes> cdent: help a brother out? :)
14:21:37 <cdent> i thought that was already done, and now it was a thing that alex pointed out?
14:21:39 <edleafe> jaypipes: you mean the thing that alex_xu pointed to?
14:21:43 <edleafe> jinx
14:22:34 <jaypipes> doh, yeah, sorry, was alex_xu :)
14:22:47 <cdent> if he's correct, then sure, I can get that after this meeting
14:22:56 <cdent> I was waiting for confirmation from folk
14:23:08 <edleafe> cdent: ok, I can take a look too
14:23:28 <edleafe> Let
14:23:33 <alex_xu> cdent: I did a test in local :)
14:23:33 <edleafe> Let's move on
14:23:35 <cdent> letrec
14:23:44 * edleafe can't type an apostrophe
14:24:25 <edleafe> #topic Bugs
14:24:36 <edleafe> #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:24:50 <cdent> I made a few last week and then fixed most of them
14:24:52 <edleafe> A few new ones
14:25:03 <edleafe> ah, good
14:25:29 <edleafe> Then we're up to:
14:25:30 <cdent> was gonna do this one asap: https://bugs.launchpad.net/nova/+bug/1695356
14:25:31 <openstack> Launchpad bug 1695356 in OpenStack Compute (nova) "placement allocations handler module does not use util.extract_json" [Low,Triaged]
14:25:33 <edleafe> #topic Open discussion
14:25:41 * cdent sits comfortable
14:26:00 <jaypipes> cdent: yes, alex_xu is correct. I'd forgotten oslo.db wraps that exception.
14:26:00 <edleafe> sorry to step on your bug, cdent
14:26:22 * edleafe pictures bug guts on his shoe
14:26:34 <jaypipes> OK, so we ready to discuss shared and nested providers in the claims?
14:26:39 <cdent> yes
14:26:40 <edleafe> go for it!
14:26:44 <jaypipes> awesome.
14:26:56 <jaypipes> alright, so here is the way I've been thinking about it.
14:27:14 <jaypipes> please come along this journey with me and tell me if I'm smoking some crazypants.
14:27:33 * cdent gets bong
14:28:05 <jaypipes> alright, so the scheduler, after edleafe's placement claims series is merged, will be doing the POST /allocations, passing in a consumer ID (the instance UUID) and the resource provider UUID it selected from the list of providers returned from placement.
14:28:30 <edleafe> so far so good...
14:29:04 <jaypipes> Now, if the resource provider selected by the scheduler uses a shared provider for one of the requested resources (say, DISK_GB), then we need to allocate against that particular resource provider (the shared one) and obviously not the compute node resource provider.
14:29:18 * cdent nods
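
For context, a minimal Python sketch of the kind of allocation write jaypipes is describing, with DISK_GB pointed at a shared storage provider instead of the compute node; the payload shape, route, and UUIDs are illustrative assumptions rather than the settled API:

    import uuid

    # Illustrative only: all UUIDs here are generated for the example.
    compute_node_rp = str(uuid.uuid4())    # provider the scheduler selected
    shared_storage_rp = str(uuid.uuid4())  # sharing provider in the same aggregate
    instance_uuid = str(uuid.uuid4())      # the consumer of the resources

    allocation_request = {
        "allocations": [
            # CPU and RAM come from the compute node itself...
            {"resource_provider": {"uuid": compute_node_rp},
             "resources": {"VCPU": 2, "MEMORY_MB": 4096}},
            # ...but the disk is claimed against the shared storage provider.
            {"resource_provider": {"uuid": shared_storage_rp},
             "resources": {"DISK_GB": 100}},
        ],
    }
    # The open question in this thread is whether the scheduler builds this split
    # itself or placement works it out when the claim is posted.
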
14:29:52 * edleafe hopes jaypipes is going to suggest that that logic go in the placement code
14:30:04 * cdent hopes not :)
14:30:09 <jaypipes> So, I'm thinking that the response from POST /allocations should include a list of resource providers that were consumed from, *including* the shared provider that the placement engine consumed from.
14:30:09 <cdent> oops
14:30:16 <dansmith> hope not
14:30:20 <cdent> argh
14:31:13 <jaypipes> The alternative is that placement change the response for GET /resource_providers that it returns to the scheduler to include the shared resource providers that the scheduler should include in its POST /allocations call.
14:31:40 * dansmith is confused
14:31:44 <edleafe> why is it important for POST /allocations to return anything?
14:31:49 <cdent> I thought the point of the aggregates cache that the report client (or was it the resource tracker?) was going to maintain was to know who is being shared with
14:32:02 <cdent> and thus we could write explicit allocations
14:32:03 <jaypipes> edleafe: good point. technically, it's not.
14:32:03 <dansmith> jaypipes: scheduler picks the things it wants to claim against, and tries in its POST.. either it works or doesn't.. no decision-making on the placement side right?
14:32:23 <edleafe> dansmith: it just gets back *root* providers
14:32:32 <edleafe> e.g., compute nodes
14:32:38 <dansmith> oh the wrinkle is nested here?
14:32:46 <edleafe> yes
14:32:49 <cdent> also shared
14:32:52 <cdent> ?
14:32:56 <jaypipes> dansmith: no, not nested, though that will be a similar wrinkle.
14:33:24 <dansmith> jaypipes: okay, but it sounds like you're describing some non-deterministic behavior on the placement side,
14:33:25 <edleafe> ok, good point
14:33:28 <dansmith> if it matters what the post response is
14:33:50 <jaypipes> cdent: the idea of the aggregate cache in the report client was indeed to know what things were shared with the compute node and be able to claim against those shared providers.
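
A rough sketch of how such a cached provider-to-aggregate map could be used to find the sharing providers for a compute node; the data structure and helper name are hypothetical, not the actual report client code:

    # Hypothetical helper: provider_aggregate_map maps a provider UUID to the set
    # of aggregate UUIDs it belongs to.
    def find_sharing_providers(provider_aggregate_map, compute_node_uuid):
        node_aggs = provider_aggregate_map.get(compute_node_uuid, set())
        return [
            rp_uuid
            for rp_uuid, aggs in provider_aggregate_map.items()
            if rp_uuid != compute_node_uuid and aggs & node_aggs
        ]
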
14:34:29 <edleafe> dansmith: it shouldn't matter. POST should just return 204 if successful
14:34:50 <dansmith> edleafe: right, that's my thought
14:35:19 <edleafe> dansmith: allocating disk resources on a compute node that uses shared storage should be transparent to scheduler
14:35:19 <jaypipes> cdent: the thing is, we need some way of telling the scheduler ("hey, this resource provider doesn't actually have room for X resource class. for that, you need to add a shared provider to the allocation")
14:35:20 <dansmith> which is why I'm wondering what jaypipes' point/question is about returning the providers that got consumed from
14:35:58 <edleafe> jaypipes: wait - that's not true
14:36:05 <jaypipes> edleafe: how so?
14:36:23 <edleafe> jaypipes: scheduler asks placement for RPs that can satisfy a set of resource requirements
14:36:31 <edleafe> placement returns that list
14:36:42 <cdent> I disagree with this statement: "allocating disk resources on a compute node that uses shared storage should be transparent to scheduler"
14:36:56 <jaypipes> no, it returns the list of providers that can satisfy that request OR are associated with a provider that shares.
14:36:59 <edleafe> when scheduler then tries to claim those resources against a RP, placement shouldn't reply "Oh, it can't satisfy that"
14:37:03 <dansmith> right, it's not transparent to the scheduler
14:37:18 <dansmith> scheduler is the thing that knows to claim against the shared provider in the same aggregate as the compute
14:37:30 <dansmith> (if we're doing it in scheduler vs. conductor)
14:37:49 <edleafe> dansmith: placement already determined that the compute node had shared storage in the same agg
14:37:56 <jaypipes> dansmith: technically, no, the placement API is the thing that knows that stuff.
14:38:22 <dansmith> jaypipes: you mean because it returned the shared provider in the initial GET right?
14:38:44 <jaypipes> dansmith: no, we don't (currently) return shared providers. we only return providers that are shared *with*.
14:38:59 <edleafe> Think of it this way: scheduler asks for resources. Placement has to have the logic to handle shared resources.
14:39:07 <dansmith> um.
14:39:14 <jaypipes> dansmith: now, I could change the return response of GET /resource_providers (my question above) if the scheduler does need to know that information.
14:39:23 <edleafe> Scheduler then claims resources. Placement should have the logic to handle shared resources there, too
14:39:32 <edleafe> jaypipes: god no!
14:39:56 <jaypipes> edleafe: having the placement API do the claim against the shared provider is what I think is the cleanest solution.
14:40:13 <cdent> i'm not sure if I agree or disagree, edleafe, but why so strenuous?
14:40:22 <dansmith> jaypipes: that is entirely contrary to what I've been thinking the plan is this whole time
14:40:27 <edleafe> Because it's a placement detail
14:40:38 <jaypipes> perhaps it's worth doing a hangout on this. it's a complex topic and IRC threads are a pain for this..
14:40:40 * dansmith is getting rather frustrated with the lack of actual vs. apparent consensus on all this
14:40:49 <edleafe> scheduler, or anything else talking to placement, shouldn't have to incorporate that nesting/shared logic
14:41:10 <jaypipes> dansmith: please be patient :) we never discussed this in the specs or at summits.
14:41:25 <dansmith> huh? then I must be missing something even bigger
14:41:35 <edleafe> cdent: let me turn the question around
14:41:53 <jaypipes> would everyone be ok with a hangout?
14:42:02 <edleafe> cdent: why not have the scheduler determine shared resources when getting a list of hosts?
14:42:03 <cdent> i can listen, but will struggle to talk
14:42:11 * alex_xu is ok
14:42:21 * edleafe would need a few minutes to set up
14:42:23 <cdent> but I'd prefer to listen at this stage anyway, so is cool
14:42:31 * alex_xu will struggle to listen
14:42:49 <jaypipes> ok, we can continue on IRC then I suppose.
14:43:20 <jaypipes> dansmith: so, to be clear, the GET /resource_providers currently returns resource providers that either have the inventory themselves or are associated with a provider that shares some resources with it.
14:43:35 <jaypipes> dansmith: it does not currently return the sharing providers themselves.
14:43:43 <jaypipes> dansmith: only the shared-with providers.
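
Put concretely (provider names invented for illustration): if a compute node with no local disk shares an aggregate with a storage pool that provides DISK_GB, the current response looks roughly like this:

    # Illustrative only: a request filtered on DISK_GB returns the compute node
    # (the shared-with provider) but not the storage pool (the sharing provider),
    # which is the gap under discussion.
    response_body = {
        "resource_providers": [
            {"uuid": "cn1-uuid", "name": "cn1"},
            # no entry for "shared-storage-pool", even though the DISK_GB would
            # actually be consumed from it
        ],
    }
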
14:43:47 <edleafe> the logic to determine the sharing is in placement
14:44:01 <dansmith> jaypipes: okay, that's not what I was thinking we were doing
14:44:23 <dansmith> jaypipes: because that precludes scheduler having filters for complex choosing of which shared thing to use
14:44:23 <jaypipes> dansmith: did you think we were returning the sharing providers in addition to the shared-with providers?
14:44:39 <dansmith> yeah
14:45:31 <dansmith> I think it also maybe prevents scheduler from being able to say "okay I got three compute nodes, one with local disk, two with shared, and the user prefers not shared, so choose cn1"
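
A toy sketch of the scheduler-side choice dansmith is describing; the host attributes and preference flag are hypothetical:

    # Hypothetical filter logic: prefer compute nodes with local disk when the
    # request asks for non-shared storage, otherwise take whatever is available.
    def pick_host(hosts, prefer_local_disk=True):
        if prefer_local_disk:
            local = [h for h in hosts if not h.get("uses_shared_disk")]
            if local:
                return local[0]
        return hosts[0] if hosts else None
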
14:45:43 <jaypipes> dansmith: we have the infrastructure in place (via the provider_aggregate_map in the scheduler reporting client) to have the scheduler do some additional filtering logic, if that's what we want to do...
14:47:02 <jaypipes> dansmith: ok, fair point. would you want me to change the response of GET /resource_providers to indicate the shared-with relationship? or do you think it's appropriate for the scheduler to use the provider_aggregate_map stuff and "figure that out" for itself?
14:48:16 <dansmith> that seems like a lot of stabbing in the dark by the scheduler
14:48:18 <jaypipes> dansmith: a similar question is going to need to be answered for nested providers. Do we modify the response of GET /resource_providers to indicate the tree relationship between providers? or do we rely on the reporting client to keep that tree information in memory?
14:48:56 <cdent> GET /rps quickly gets very confused and confusing if we start trying to represent (and potentially optionally) shared and nested in the response
14:49:02 <dansmith> I dunno, I thought a major tenet here was that the scheduler was picking "which one" and placement was keeping track of "how much"
14:49:09 <cdent> on the other hand, the lack of explicitness is ... disarming
14:49:51 <jaypipes> well, this is complex stuff. no way around that.
14:49:53 <edleafe> dansmith: if the scheduler has a compute node, can't it tell if it is part of a shared resource agg?
14:50:16 <dansmith> edleafe: it can by stabbing in the dark yes
14:50:28 <jaypipes> edleafe: it can, but it's a lot more work for the scheduler (reporting client) to do that than it is for the placement API.
14:51:06 <edleafe> it seems that a better request for GET RPs would be the solution
14:51:22 <edleafe> like specifying a trait for shared/unshared
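
A sketch of what such a request might look like; neither the query parameter nor the trait name exists at this point in the discussion, so both are hypothetical:

    # Hypothetical request shape for the shared/unshared idea above; "required"
    # and "STORAGE_SHARED" are invented here for illustration.
    params = {
        "resources": "DISK_GB:100",
        "required": "STORAGE_SHARED",
    }
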
14:51:31 * dansmith checks local fast food chains for openings
14:51:47 <jaypipes> look, there's tradeoffs in simplicity/complexity in each of the approaches here. we probably need to choose the one that has the placement API returning the most useful information to the scheduler in a single response. or at least, that's my opinion...
14:52:30 <edleafe> The only way to deal with messy details is to localize them in one place, rather than try to make every part of the application have to understand that
14:52:41 <cdent> returning everything was my original opinion
14:52:46 <edleafe> RP relationships are messy
14:52:47 <cdent> (months ago)
14:53:04 <jaypipes> cdent: not helping.
14:53:15 <cdent> jaypipes: ? I was agreeing with you
14:53:18 <edleafe> Passing details of the messiness around will lead to more crap code all over the place
14:53:40 <dansmith> these details seem completely material to scheduler's day job
14:53:42 <jaypipes> edleafe: could you be explicit about what "details of the messiness" means to you?
14:53:42 <dansmith> to me at least
14:54:09 <edleafe> jaypipes: sure
14:54:59 <edleafe> The relationship between RPs, whether with shared aggs or nested providers, is something we've struggled with to get to a halfway-decent representation inside placement
14:55:38 <edleafe> Now we're thinking about adding that same logic to the scheduler? And possibly also the conductor?
14:57:03 <edleafe> (BTW, 3 minutes left)
14:57:52 <jaypipes> edleafe: what about adding a separate GET /<something> API endpoint that would return all the things that the scheduler needs to make a decision?
14:58:05 <jaypipes> edleafe: instead of hacking further the GET /resource_providers endpoint?
14:58:15 * jaypipes just throwing out ideas
14:58:23 <edleafe> jaypipes: well, that would certainly suck a whole lot less
14:58:26 <cdent> do you mean one that would represent the structure more than the current representation does?
14:58:39 <jaypipes> cdent: yes
14:59:06 <edleafe> cdent: maybe like GET /servers vs. GET /servers?uuid=AAAA...
14:59:09 <jaypipes> cdent: and possibly including things like traits and inventory/usage information
14:59:11 <edleafe> list vs. detail
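
A very rough sketch of the kind of "detail" response being floated, with provider relationships spelled out next to inventory; the endpoint and structure are entirely hypothetical:

    # Entirely hypothetical shape: one entry per candidate compute node, listing
    # the sharing providers it could draw from alongside its own inventory.
    detail_response = {
        "providers": [
            {"uuid": "cn1-uuid",
             "inventories": {"VCPU": 16, "MEMORY_MB": 32768},
             "shares_with": [
                 {"uuid": "shared-storage-uuid",
                  "inventories": {"DISK_GB": 2000}},
             ]},
        ],
    }
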
14:59:44 <dansmith> this would effectively cement novaisms into placement?
15:00:00 <edleafe> dansmith: why?
15:00:07 <manjeets__> hello
15:00:16 <edleafe> Oh, wait, time's up. Let's move this to -nova
15:00:17 <dansmith> edleafe: servers?
15:00:22 <edleafe> #endmeeting