14:00:05 <edleafe> #startmeeting nova_scheduler
14:00:07 <openstack> Meeting started Mon Jun 5 14:00:05 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:10 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:14 <edleafe> Who's here today?
14:00:21 <alex_xu> o/
14:00:34 <lei-zh> o/
14:01:01 <jaypipes> o/
14:01:16 <cdent> o/
14:02:15 <edleafe> Looks like there is room enough for everyone to stretch their legs!
14:02:34 <jaypipes> indeed
14:02:43 <edleafe> #topic Specs and Reviews
14:02:51 <edleafe> ** Claims in the Scheduler/Conductor
14:02:51 <edleafe> *** #link Claims in the Scheduler/Conductor series: https://review.openstack.org/#/c/465171/
14:02:57 <edleafe> crap
14:03:02 <edleafe> copy paste fail!
14:03:08 <edleafe> Claims in the Scheduler/Conductor
14:03:12 <edleafe> #link Claims in the Scheduler/Conductor series: https://review.openstack.org/#/c/465171/
14:03:51 <edleafe> This seems to be moving forward
14:04:05 <edleafe> Are there any overriding concerns about the direction?
14:04:17 <jaypipes> not from me
14:04:56 <cdent> nor me
14:04:58 <edleafe> Jay has a +2 on the first in that series, so if we can get another core to agree..
14:05:12 <jaypipes> bauzas: you in today?
14:05:34 <edleafe> IAC, please review the next few in that series. I have cycles to update as needed
14:05:40 <edleafe> Moving on...
14:05:40 <edleafe> Nested Resource Providers
14:05:41 <edleafe> #link Nested RPs: https://review.openstack.org/#/c/415920
14:05:54 <dansmith> edleafe: uber holiday today in .ey
14:05:56 <dansmith> *eu
14:06:03 <diga> o/
14:06:18 <edleafe> dansmith: no ride sharing?
14:06:32 <dansmith> uuuuuuber holiday
14:06:32 <jaypipes> edleafe: been getting some good reviews from efried on that series.
14:06:35 <dansmith> big holiday
14:06:40 * cdent is no longer in the eu :(
14:06:40 <jaypipes> edleafe: plan to comment on the patches please.
14:07:01 <edleafe> jaypipes: sure thing - have them open in tabs with the latest updates
14:07:20 <jaypipes> edleafe: I also have a question on scheduler claims and shared resource providers to discuss if we have time today
14:08:05 <edleafe> I'm free after this meeting
14:08:18 <edleafe> We can either do it during Opens, or anytime after
14:08:21 <jaypipes> kk
14:08:28 <alex_xu> jaypipes: a question, will the API 'GET /resource_providers?resources=...' change in the future for nested resource providers?
14:09:06 <jaypipes> alex_xu: great question, and the answer is related to what I want to discuss about shared resource providers :)
14:09:18 <jaypipes> alex_xu: so let's discuss that together
14:09:28 <alex_xu> jaypipes: ok
14:09:44 <edleafe> Let's do it in Opens then. I don't want to keep alex_xu up too late :)
14:09:50 <jaypipes> there's another patch series: https://review.openstack.org/#/q/topic:cd/placement-api-ref+status:open
14:09:58 <alex_xu> edleafe: thanks :)
14:09:59 <jaypipes> for avolkov's api-refs
14:10:30 <edleafe> jaypipes: stop stealing my thunder!
14:10:32 <edleafe> :)
14:10:35 <jaypipes> sorry!
14:10:39 <edleafe> that's later in the list
14:11:02 <jaypipes> k
14:11:05 <edleafe> Well, since you brought it up...
14:11:06 <edleafe> Placement API ref docs
14:11:07 <edleafe> #link Placement API ref docs: https://review.openstack.org/#/q/topic:cd/placement-api-ref+status:open
14:11:10 <jaypipes> hehe
14:11:23 * edleafe tries to run a tight meeting
14:11:53 <edleafe> Anything to add about that series?
14:12:32 <cdent> only that we'll need to start thinking about the publishing job
14:12:37 <cdent> right now the job makes drafts
14:12:53 <cdent> but last I talked to andrey it wasn't ready to publish
14:14:22 <edleafe> that's always the fun part, no?
14:14:30 <jaypipes> cdent: ++
14:14:56 <edleafe> Moving on
14:14:57 <edleafe> Add project_id and user_id to placement DB
14:14:57 <edleafe> #link project_id and user_id to placement DB: https://review.openstack.org/#/c/470645/
14:15:11 <jaypipes> hold up on that one.
14:15:13 <jaypipes> :)
14:15:14 <edleafe> Still kind of WIPpy
14:15:27 <jaypipes> I'm just fixing locally a postgreSQL oddity.
14:15:35 <jaypipes> finishing running tests now and should be pushed shortly.
14:15:44 <edleafe> cool
14:15:45 <jaypipes> spent most of the weekend working on that
14:15:55 <jaypipes> I wanted to normalize the consumers table properly.
14:16:03 <edleafe> I spent most of my weekend not working :)
14:16:12 <jaypipes> instead of endlessly repeating VARCHAR(255) columns everywhere
14:16:50 * edleafe finds it funny to see jaypipes and "normal" in the same sentence
14:17:11 <jaypipes> edleafe: normalized != normal :)
14:17:23 <edleafe> heh
14:17:40 <edleafe> Next up is: Delete all inventory
14:17:40 <edleafe> #link Delete all inventory: https://review.openstack.org/#/c/460147/
14:17:49 <cdent> did we know that mr date has the same initials as me? I'm sure that means something..
14:17:58 <edleafe> That's getting pretty close. We should be able to push that through this week
14:18:01 <jaypipes> I did indeed know that.
14:18:13 <jaypipes> I remember speaking to him around 2006 or so.
14:18:33 <jaypipes> he wanted $9000 to come do a keynote for the MySQL conference. I politely declined.
14:18:48 <cdent> hawt
14:19:09 <jaypipes> now... if his name was CJ Timestamp, I might have considered.
14:19:36 <edleafe> <groan>
14:19:44 <jaypipes> yes, I have turned into my father.
14:19:57 <jaypipes> ok, meeting guru, what's up next?
14:20:12 <edleafe> So we have a leaf, a dent, and some pipes making fun of a date
14:20:22 <jaypipes> indeed.
14:20:30 <jaypipes> Indy, bad dates.
14:20:30 <edleafe> Sync os-traits to DB
14:20:30 <edleafe> #link Sync os-traits: https://review.openstack.org/#/c/469578/
14:20:43 <edleafe> This is cdent's patch
14:20:53 <jaypipes> cdent: can you fix up that thing edleafe pointed to on ^^
14:21:03 <jaypipes> cdent: help a brother out? :)
14:21:37 <cdent> i thought that was already done, and now it was a thing that alex pointed out?
14:21:39 <edleafe> jaypipes: you mean the thing that alex_xu pointed to?
14:21:43 <edleafe> jinx
14:22:34 <jaypipes> doh, yeah, sorry, was alex_xu :)
14:22:47 <cdent> if he's correct, then sure, I can get that after this meeting
14:22:56 <cdent> I was waiting for confirmation from folk
14:23:08 <edleafe> cdent: ok, I can take a look too
14:23:28 <edleafe> Let
14:23:33 <alex_xu> cdent: I did a test in local :)
14:23:33 <edleafe> Let's move on
14:23:35 <cdent> letrec
14:23:44 * edleafe can't type an apostrophe
14:24:25 <edleafe> #topic Bugs
14:24:36 <edleafe> #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:24:50 <cdent> I made a few last week and then fixed most of them
14:24:52 <edleafe> A few new ones
14:25:03 <edleafe> ah, good
14:25:29 <edleafe> Then we're up to:
14:25:30 <cdent> was gonna do this one asap: https://bugs.launchpad.net/nova/+bug/1695356
14:25:31 <openstack> Launchpad bug 1695356 in OpenStack Compute (nova) "placement allocations handler module does not use util.extract_json" [Low,Triaged]
14:25:33 <edleafe> #topic Open discussion
14:25:41 * cdent sits comfortable
14:26:00 <jaypipes> cdent: yes, alex_xu is correct. I'd forgotten oslo.db wraps that exception.
14:26:00 <edleafe> sorry to step on your bug, cdent
14:26:22 * edleafe picture bug guts on my show
14:26:25 <edleafe> shoe, even
14:26:34 <jaypipes> OK, so we ready to discuss shared and nested providers in the claims?
14:26:39 <cdent> yes
14:26:40 <edleafe> go for it!
14:26:44 <jaypipes> awesome.
14:26:56 <jaypipes> alright, so here is the way I've been thinking about it.
14:27:14 <jaypipes> please come along this journey with me and tell me if I'm smoking some crazypants.
14:27:33 * cdent gets bong
14:28:05 <jaypipes> alright, so the scheduler, after edleafe's placement claims series is merged, will be doing the POST /allocations, passing in a consumer ID (the instance UUID) and the resource provider UUID it selected from the list of providers returned from placement.
14:28:30 <edleafe> so far so good...
14:29:04 <jaypipes> Now, if the resource provider selected by the scheduler uses a shared provider for one of the requested resources (say, DISK_GB), then we need to allocate against that particular resource provider (the shared one) and obviously not the compute node resource provider.
14:29:18 * cdent nods
14:29:52 * edleafe hopes jaypipes is going to suggest that that logic go in the placement code
14:30:04 * cdent cdent hopes not :)
14:30:09 <jaypipes> So, I'm thinking that the response from POST /allocations should include a list of resource providers that were consumed from, *including* the shared provider that the placement engine consumed from.
14:30:09 <cdent> oops
14:30:16 <dansmith> hope not
14:30:20 <cdent> argh
14:31:13 <jaypipes> The alternative is that placement change the response for GET /resource_providers that it returns to the scheduler to include the shared resource providers that the scheduler should include in its POST /allocations call.
14:31:40 * dansmith is confused
14:31:44 <edleafe> why is it important for POST /allocations to return anything?
14:31:49 <cdent> I thought the point of the aggregates cache that the report client (or was it resource tracker) was going to maintain was going to be used to know who is being shared with
14:32:02 <cdent> and thus we could write explicit allocations
14:32:03 <jaypipes> edleafe: good point. technically, it's not.
14:32:03 <dansmith> jaypipes: scheduler picks the things it wants to claim against, and tries in its POST.. either it works or doesn't.. no decision-making on the placement side right?
14:32:23 <edleafe> dansmith: it just gets back *root* providers
14:32:32 <edleafe> e.g., compute nodes
14:32:38 <dansmith> oh the wrinkle is nested here?
14:32:46 <edleafe> yes
14:32:49 <cdent> also shared
14:32:52 <cdent> ?
14:32:56 <jaypipes> dansmith: no, not nested, though that will be a similar wrinkle.
14:33:24 <dansmith> jaypipes: okay, but it sounds like you're describing some non-deterministic behavior on the placement side,
14:33:25 <edleafe> ok, good point
14:33:28 <dansmith> if it matters what the post response is
14:33:50 <jaypipes> cdent: the idea of the aggregate cache in the report client was indeed to know what things were shared with the compute node and be able to claim against those shared providers.
14:34:29 <edleafe> dansmith: it shouldn't matter. POST should just return 204 if successful
14:34:50 <dansmith> edleafe: right, that's my thought
14:35:19 <edleafe> dansmith: allocating disk resources on a compute node that uses shared storage should be transparent to scheduler
14:35:19 <jaypipes> cdent: the thing is, we need some way of telling the scheduler ("hey, this resource provider doesn't actually have room for X resource class. for that, you need to add a shared provider to the allocation")
14:35:20 <dansmith> which is why I'm wondering what jaypipes' point/question is about returning the providers that got consumed from
14:35:58 <edleafe> jaypipes: wait - that's not true
14:36:05 <jaypipes> edleafe: how so?
14:36:23 <edleafe> jaypipes: scheduler asks placement for RPs that can satisfy a set of resource requirements
14:36:31 <edleafe> placement returns that list
14:36:42 <cdent> I disagree with this statement: "allocating disk resources on a compute node that uses shared storage should be transparent to scheduler"
14:36:56 <jaypipes> no, it returns the list of providers that can satisfy that request OR are associated with a provider that shares.
14:36:59 <edleafe> when scheduler then tries to claim those resources against a RP, placement shouldn't reply "Oh, it can't satisfy that"
14:37:03 <dansmith> right, it's not transparent to the scheduler
14:37:18 <dansmith> scheduler is the thing that knows to claim against the shared provider in the same aggregate as the compute
14:37:30 <dansmith> (if we're doing it in scheduler vs. conductor)
14:37:49 <edleafe> dansmith: placement already determined that the compute node had shared storage in the same agg
14:37:56 <jaypipes> dansmith: technically, no, the placement API is the thing that knows that stuff.
14:38:22 <dansmith> jaypipes: you mean because it returned the shared provider in the initial GET right?
14:38:44 <jaypipes> dansmith: no, we don't (currently) return shared providers. we only return providers that are shared *with*.
14:38:59 <edleafe> Think of it this way: scheduler asks for resources. Placement has to have the logic to handle shared resources.
14:39:07 <dansmith> um.
14:39:14 <jaypipes> dansmith: now, I could change the return response of GET /resource_providers (my question above) if the scheduler does need to know that information.
14:39:23 <edleafe> Scheduler then claims resources. Placement should have the logic to handle shared resources there, too
14:39:32 <edleafe> jaypipes: god no!
14:39:56 <jaypipes> edleafe: having the placement API do the claim against the shared provider is what I think is the cleanest solution.
14:40:13 <cdent> i'm not sure if I agree or disagree, edleafe, but why so strenuous?
14:40:22 <dansmith> jaypipes: that is entirely contrary to what I've been thinking the plan is this whole time
14:40:27 <edleafe> Because it's a placement detail
14:40:38 <jaypipes> perhaps it's worth doing a hangout on this. it's a complex topic and IRC threads are a pain for this..
14:40:40 * dansmith is getting rather frustrated with the lack of actual vs. apparent consensus on all this
14:40:49 <edleafe> scheduler, or anything else talking to placement, shouldn't have to incorporate that nesting/shared logic
14:41:10 <jaypipes> dansmith: please be patient :) we never discussed this in the specs or at summits.
14:41:25 <dansmith> huh? then I must be missing something even bigger
14:41:35 <edleafe> cdent: let me turn the question around
14:41:53 <jaypipes> would everyone be ok with a hangout?
14:42:02 <edleafe> cdent: why not have the scheduler determine shared resources when getting a list of hosts?
14:42:03 <cdent> i can listen, but will struggle to talk
14:42:11 * alex_xu is ok
14:42:21 * edleafe would need a few minutes to set up
14:42:23 <cdent> but I'd prefer to listen at this stage anyway, so is cool
14:42:31 * alex_xu will struggle to listen
14:42:49 <jaypipes> ok, we can continue on IRC then I suppose.
14:43:20 <jaypipes> dansmith: so, to be clear, the GET /resource_providers currently returns resource providers that either have the inventory themselves or are associated with a provider that shares some resources with it.
14:43:35 <jaypipes> dansmith: it does not currently return the sharing providers themselves.
14:43:43 <jaypipes> dansmith: only the shared-with providers.
14:43:47 <edleafe> the logic to determine the sharing is in placement
14:44:01 <dansmith> jaypipes: okay, that's not what I was thinking we were doing
14:44:23 <dansmith> jaypipes: because that precludes scheduler having filters for complex choosing of which shared thing to use
14:44:23 <jaypipes> dansmith: did you think we were returning the sharing providers in addition to the shared-with providers?
14:44:39 <dansmith> yeah
14:45:31 <dansmith> I think it also maybe prevents scheduler from being able to say "okay I got three compute nodes, one with local disk, two with shared, and the user prefers not shared, so choose cn1"
14:45:43 <jaypipes> dansmith: we have the infrastructure in place (via the provider_aggregate_map in the scheduler reporting client) to have the scheduler do some additional filtering logic. If that's what we want ti di,
14:45:48 <jaypipes> to do...
14:47:02 <jaypipes> dansmith: ok, fair point. would you want me to change the response of GET /resource_providers to indicate the shared-with relationship? or do you think it's appropriate for the scheduler to use the provider_aggregate_map stuff and "figure that out" for itself?
14:48:16 <dansmith> that seems like a lot of stabbing in the dark by the scheduler
14:48:18 <jaypipes> dansmith: a similar question is going to need answered for nested providers. Do we modify the response of GET /resource_providers to indicate the tree relationship between providers? or do we rely on the reporting client to keep that tree information in memory?
14:48:56 <cdent> GET /rps quickly gets very confused and confusing if we start trying to represent (and potentially optionally) shared and nested in the response
14:49:02 <dansmith> I dunno, I thought a major tenet here was that the scheduler was picking "which one" and placement was keeping track of "how much"
14:49:09 <cdent> on the other hand, the lack of explicitness is ... disarming
14:49:51 <jaypipes> well, this is complex stuff. no way around that.
14:49:53 <edleafe> dansmith: if the scheduler has a compute node, can't it tell if it is part of a shared resource agg?
14:50:16 <dansmith> edleafe: it can by stabbing in the dark yes
14:50:28 <jaypipes> edleafe: it can, but it's a lot more work for the scheduler (reporting client) to do that than it is for the placement API.
14:51:06 <edleafe> it seems that a better request for GET RPs would be the solution
14:51:22 <edleafe> like specifying a trait for shared/unshared
14:51:31 * dansmith checks local fast food chains for openings
14:51:47 <jaypipes> look, there's tradeoffs in simplicity/complexity in each of the approaches here. we need to probably choose the one that has the placement API returning the most useful information to the scheduler in a single response. or at least, that's my opinion...
14:52:30 <edleafe> The only way to deal with messy details is to localize them in one place, rather than try to make every part of the application have to understand that
14:52:41 <cdent> returning everything was my original opinion
14:52:46 <edleafe> RP relationships are messy
14:52:47 <cdent> (months ago)
14:53:04 <jaypipes> cdent: not helping.
14:53:15 <cdent> jaypipes: ? I was agreeing with you
14:53:18 <edleafe> Passing details of the messiness around will lead to more crap code all over the place
14:53:40 <dansmith> these details seem completely material to scheduler's day job
14:53:42 <jaypipes> edleafe: could you be explicit about what "details of the messiness" means to you?
14:53:42 <dansmith> to me at least
14:54:09 <edleafe> jaypipes: sure
14:54:59 <edleafe> The relationship between RPs, whether with shared aggs or nested providers, is something we've struggled with to get to a halfway-decent representation inside placement
14:55:38 <edleafe> Now we're thinking about adding that same logic to the scheduler? And possibly also the conductor?
14:57:03 <edleafe> (BTW, 3 minutes left)
14:57:52 <jaypipes> edleafe: what about adding a separate GET /<something> API endpoint that would return all the things that the scheduler needs to make a decision?
14:58:05 <jaypipes> edleafe: instead of hacking further the GET /resource_providers endpoint?
14:58:15 * jaypipes just throwing out ideas
14:58:23 <edleafe> jaypipes: well, that would certainly suck a whole lot less
14:58:26 <cdent> do you mean one that would represent the structure more that the current representation does?
14:58:33 <cdent> than
14:58:39 <jaypipes> cdent: yes
14:59:06 <edleafe> cdent: maybe like /GET servers vs. /GET servers?uuid-AAAA...
14:59:09 <jaypipes> cdent: and possibly including things like traits and inventory/usage information
14:59:11 <edleafe> list vs. detail
14:59:44 <dansmith> this would effectively cement novaisms into placement?
15:00:00 <edleafe> dansmith: why?
15:00:07 <manjeets__> hello
15:00:16 <edleafe> Oh, wait, time's up. Let's move this to -nova
15:00:17 <dansmith> edleafe: servers?
15:00:22 <edleafe> #endmeeting
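
Editor's note: the open discussion turns on who decides that DISK_GB should be claimed against a sharing provider rather than the compute node. As a rough illustration of the "scheduler writes explicit allocations" option that cdent and dansmith describe, here is a minimal Python sketch. Everything in it is hypothetical: the function name, the dictionary shapes, and the provider identifiers are invented for illustration and are not the placement API payload or any nova code.

```python
def build_allocations(requested, compute_uuid, local_inventory, sharing_providers):
    """Split a resource request between a compute node and its sharing providers.

    All arguments are hypothetical shapes chosen for this sketch:
      requested:         resource class -> amount wanted, e.g. {"DISK_GB": 20}
      compute_uuid:      identifier of the compute node the scheduler picked
      local_inventory:   resource class -> free amount on that compute node
      sharing_providers: provider id -> {resource class: free amount} for
                         providers in the same aggregate as the compute node
    Returns provider id -> {resource class: amount}.
    """
    allocations = {}
    for resource_class, amount in requested.items():
        if local_inventory.get(resource_class, 0) >= amount:
            # The compute node can satisfy this class itself.
            target = compute_uuid
        else:
            # Otherwise take the first sharing provider (in sorted order)
            # with enough room for the whole amount.
            target = next(
                (
                    provider
                    for provider, free in sorted(sharing_providers.items())
                    if free.get(resource_class, 0) >= amount
                ),
                None,
            )
            if target is None:
                raise ValueError(
                    f"no provider can satisfy {amount} of {resource_class}"
                )
        allocations.setdefault(target, {})[resource_class] = amount
    return allocations


# A compute node with no local disk, in an aggregate with one shared pool.
example = build_allocations(
    requested={"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20},
    compute_uuid="cn1",
    local_inventory={"VCPU": 8, "MEMORY_MB": 16384},
    sharing_providers={"shared-storage-1": {"DISK_GB": 500}},
)
```

In this sketch `example` places VCPU and MEMORY_MB on "cn1" and DISK_GB on "shared-storage-1". The alternative edleafe argues for would keep this split inside placement, so that the caller names only the compute node and never sees the sharing provider.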