14:00:12 <edleafe> #startmeeting nova_scheduler
14:00:13 <openstack> Meeting started Mon Apr 2 14:00:12 2018 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:20 <edleafe> Who's here?
14:00:27 <alex_xu_> o/
14:00:29 <takashin> o/
14:00:51 <fried_bunny> ö/
14:01:12 <edleafe> Might be a light crowd, since many have off for the fried_bunny holiday
14:01:25 <jaypipes> o/
14:02:32 * alex_xu_ just found that it is not only Chinese eat bunny
14:02:56 <edleafe> :)
14:03:33 <fried_bunny> "Mississippi Fried Bunny to compete with KFC"
14:04:42 <edleafe> alex_xu_: https://www.youtube.com/watch?v=Yxiv3CBMS4M
14:05:10 <edleafe> OK, let's start
14:05:18 <alex_xu_> haha
14:05:40 <edleafe> cdent put together a frighteningly long list of what's on our plate in his last placement update email
14:05:59 <edleafe> So I'm gonna paste 'em here
14:06:01 <fried_bunny> +1 to frightening
14:06:04 <edleafe> #topic Specs
14:06:13 <edleafe> #link VMware: place instances on resource pool (using update_provider_tree) https://review.openstack.org/#/c/549067/
14:06:16 <edleafe> #link Provide error codes for placement API https://review.openstack.org/#/c/418393/
14:06:19 <edleafe> #link mirror nova host aggregates to placement API https://review.openstack.org/#/c/545057/
14:06:22 <edleafe> #link Proposes NUMA topology with RPs https://review.openstack.org/#/c/552924/
14:06:25 <edleafe> #link Account for host agg allocation ratio in placement https://review.openstack.org/#/c/544683/
14:06:28 <edleafe> #link Spec for isolating configuration of placement database https://review.openstack.org/#/c/552927/
14:06:31 <edleafe> #link Support default allocation ratios https://review.openstack.org/#/c/552105/
14:06:34 <edleafe> #link Spec on preemptible servers https://review.openstack.org/#/c/438640/
14:06:37 <edleafe> #link Handle nested providers for allocation candidates https://review.openstack.org/#/c/556873/
14:06:40 <edleafe> #link Add Generation to Consumers https://review.openstack.org/#/c/556971/
14:06:43 <edleafe> #link Mention (no) granular support for image traits https://review.openstack.org/#/c/554305/
14:06:46 <edleafe> #link Update the vGPU spec https://review.openstack.org/#/c/557912/
14:06:49 <edleafe> #link Proposes Multiple GPU types https://review.openstack.org/#/c/557065/
14:06:52 <edleafe> #link Standardize CPU resource tracking https://review.openstack.org/#/c/555081/
14:06:55 <edleafe> #link NUMA-aware live migration https://review.openstack.org/#/c/552722/
14:06:58 <edleafe> #link Network bandwidth resource provider https://review.openstack.org/#/c/502306/
14:07:01 <edleafe> #link Propose counting quota usage from placement https://review.openstack.org/#/c/509042/
14:07:04 <edleafe> #link Fix endpoint URI /allocation_requests (reality fix) https://review.openstack.org/#/c/557580/
14:07:07 <edleafe> (whew!)
14:07:12 <fried_bunny> TBC, this doesn't include specs we've *already approved* (and thus slated for Rocky).
14:07:23 <edleafe> We couldn't go through them 1-by-1 if we wanted to. So are there any that we need to discuss?
14:07:30 <edleafe> fried_bunny: true dat
14:09:16 <edleafe> OK, I guess nothing too controversial in specs.
14:09:26 <edleafe> #topic reviews
14:09:28 <fried_bunny> except for there being too many for us to possibly handle in Rocky
14:09:48 <edleafe> fried_bunny: yeah, that will be discussed in Opens
14:10:24 <edleafe> Once again, we can't handle them individually, so here are all of them:
14:10:27 <edleafe> #link Update Provider Tree https://review.openstack.org/#/q/topic:bp/update-provider-tree
14:10:30 <edleafe> #link Nested resource providers https://review.openstack.org/#/q/topic:bp/nested-resource-providers
14:10:33 <edleafe> #link Nested providers in allocation candidates https://review.openstack.org/#/q/topic:bp/nested-resource-providers-allocation-candidates
14:10:36 <edleafe> #link Request Filters https://review.openstack.org/#/q/topic:bp/placement-req-filter
14:10:39 <edleafe> #link Mirror nova host aggregates to placement https://review.openstack.org/#/q/topic:bp/placement-mirror-host-aggregates
14:10:42 <edleafe> #link Forbidden Traits https://review.openstack.org/#/q/topic:bp/placement-forbidden-traits
14:10:45 <edleafe> #link Consumer Generations https://review.openstack.org/#/q/topic:bp/add-consumer-generation
14:10:48 <edleafe> #link Extraction https://review.openstack.org/#/q/topic:bp/placement-extract
14:10:51 <edleafe> #link Optional Placement DB https://review.openstack.org/#/c/552927/
14:10:54 <edleafe> #link Purge comp_node and res_prvdr records during deletion of cells/hosts https://review.openstack.org/#/c/546660/
14:10:57 <edleafe> #link A huge pile of improvements to osc-placement https://review.openstack.org/#/q/topic:bp/placement-osc-plugin-rocky
14:11:00 <edleafe> #link Add compute capabilities traits (to os-traits) https://review.openstack.org/#/c/546713/
14:11:03 <edleafe> #link General policy sample file for placement https://review.openstack.org/#/c/524425/
14:11:06 <edleafe> #link Provide framework for setting placement error codes https://review.openstack.org/#/c/546177/
14:11:09 <edleafe> #link Get resource provider by uuid or name (osc-placement) https://review.openstack.org/#/c/527791/
14:11:12 <edleafe> #link Fix comments in get_all_with_shared() https://review.openstack.org/#/c/533195/
14:11:15 <edleafe> #link Fixes related to shared providers https://review.openstack.org/#/q/topic:bug/1732731
14:11:18 <edleafe> #link placement: Make API history doc more consistent https://review.openstack.org/#/c/477478/
14:11:21 <edleafe> #link Add to contributor docs about handler testing https://review.openstack.org/#/c/557355/
14:11:24 <edleafe> #link doc: Upgrade placement first https://review.openstack.org/#/c/556631/
14:11:27 <edleafe> #link Handle agg generation conflict in report client https://review.openstack.org/#/c/556669/
14:11:30 <edleafe> #link Slugification utilities for placement names https://review.openstack.org/#/c/556628/
14:11:33 <edleafe> #link Remove usage of [placement]os_region_name https://review.openstack.org/#/c/557086/
14:11:36 <edleafe> #link Get rid of 406 paths in report client https://review.openstack.org/#/c/556633/
14:11:39 <edleafe> #link Add unit test for non-placement resize https://review.openstack.org/#/c/537614/
14:11:42 <edleafe> #link Address issues raised in adding member_of to GET /a-c https://review.openstack.org/#/c/554357/
14:11:45 <edleafe> #link Fix allocation_candidates not to ignore shared RPs https://review.openstack.org/#/c/533396/
14:11:48 <edleafe> #link Sharing-related bug fixes https://review.openstack.org/#/q/topic:bug/1724613
14:11:51 <edleafe> #link More sharing related bug fixes https://review.openstack.org/#/q/topic:bug/1732731
14:11:54 <edleafe> #link Cover migration cases with functional tests https://review.openstack.org/#/c/493865/
14:11:57 <edleafe> Are there any of these reviews we need to discuss individually here?
14:12:03 <fried_bunny> I'd like to discuss the series starting at https://review.openstack.org/#/c/558044/
14:12:13 <fried_bunny> Mainly to get advice on where microversions are or aren't needed.
14:12:21 <fried_bunny> This has three patches that change behavior.
14:12:53 <fried_bunny> https://review.openstack.org/#/c/558045/ changes the provider_summaries to include *all* the resources in a provider, not just the ones relating to the request.
14:13:00 <fried_bunny> jaypipes: We decided we wanted to do that, right?
14:13:38 <edleafe> fried_bunny: what was the reason for that change?
14:14:18 <fried_bunny> I don't remember the rationale, to be honest. I imagine it could be useful for a weigher.
14:15:00 <edleafe> OIC. Well, if we do make that change, it's definitely a new microversion
14:15:01 <fried_bunny> I know that it helped me with something later in the series. But that doesn't mean we couldn't filter 'em back out before returning, I suppose.
14:15:09 <alex_xu_> yes, I also think about that
14:15:20 <fried_bunny> okay, that's what I thought for that one. And tetsuro has done so (since last night). So cool there.
14:15:22 <fried_bunny> Next
14:15:39 <fried_bunny> https://review.openstack.org/#/c/533437/ now allows you to have an "anchor" provider in play. I sent email about this.
14:16:08 <alex_xu_> fried_bunny: we required spec for new microversion
14:16:25 <fried_bunny> TL;DR: if you have a CN that doesn't provide resources itself, but is agg-associated with sharing providers that do provide resource to the request, you should be able to get results there.
14:16:35 <fried_bunny> Today, you get no results.
14:16:44 <fried_bunny> That patch fixes it so you do get results.
14:16:55 <fried_bunny> IMO, this one's fixing broken behavior, so doesn't need a microversion.
14:17:11 <fried_bunny> but... thoughts?
14:17:51 <fried_bunny> alex_xu_: That's a good point. I'd like to come back to whether we actually want/need that first behavior at all.
14:17:59 <edleafe> This also touches on the notion that every API change must be a separate microversion. I don't agree with that approach, since microversions are made on a per-call basis
14:18:15 <edleafe> I'd rather see related changes in a single microversion
14:18:24 <alex_xu_> there are a lot of breaking case in the shared rp
14:18:42 <alex_xu_> I also remember we fixed few cases for shared rp, and we didn't bump microversion last release
14:18:54 <jaypipes> alex_xu_: example?
14:19:16 <jaypipes> fried_bunny: I still haven't gotten through your anchor providers ML post.
14:19:17 <edleafe> were they bug fixes, or just behavior changes?
14:19:18 * alex_xu_ find the link
14:19:43 <alex_xu_> jaypipes: https://review.openstack.org/480379
14:20:05 <alex_xu_> fried_bunny told me that already fixed by the refactor you done last release
14:20:29 <alex_xu_> the refactor of shared and non-shared code in the allocation candidates
14:22:43 <fried_bunny> edleafe: That's often debatable. As is the case here, I think.
14:23:30 <alex_xu_> I also remember we said we didn't support shared rp officially
14:23:54 <alex_xu_> maybe we can bump a microversion when we fix all the cases, that microversion is signal the shared rp works now
14:24:03 <jaypipes> alex_xu_: well, a) we haven't merged any of that code and b) I don't see how any code that has merged already has affected the public API?
14:25:38 <fried_bunny> Finally, https://review.openstack.org/#/c/558014/ makes it so that, in the "anchor" case, the anchor RP actually shows up in the allocation request (with empty resources) and provider summaries (with, because of that first patch, *all* of its resources, even though none were used in the request).
14:25:46 <alex_xu_> ok, actually I didn't check that code again
14:26:19 <alex_xu_> I should verify the testcase fried_bunny mentioned, but didn't get a chance yet
14:26:44 <fried_bunny> In this case, if we go for that second one as it is, we would arguably need a microversion for this third one. BUT if we squash 2 & 3 together, we can probably get away with *no* microversion, because the behavior change as a whole is enabling a broken behavior.
14:27:33 <fried_bunny> Now, as alex_xu_ suggests, it's possible we should roll 2&3 into a microversion where we declare support for sharing providers. And basically we can do whatever we want up until then with sharing providers because we don't claim to support them at all.
14:27:53 * fried_bunny steps back
14:28:08 <fried_bunny> It's also possible we just want to punt this whole series to Stein...
14:28:32 <fried_bunny> ...and focus on nrp, granular, upt, forbidden, member_of, etc.
14:29:10 <edleafe> fried_bunny: that's kind of my thinking. Sure, it would be great to get this, but we keep finding more important things to work on
14:29:15 <fried_bunny> basically continue declaring no official support for sharing providers.
14:30:15 <fried_bunny> jaypipes: What's your take on the first part - including all resources in provider summaries? Do you remember why we wanted to do that?
14:30:32 <fried_bunny> ...or even IF we wanted to do that? Maybe tetsuro and I just made it up in our heads.
14:30:37 <jaypipes> fried_bunny: you mean why we *didn't* do that, right?
14:31:05 <fried_bunny> shrug, I suspect the reason it is the way it is is more of a side effect of the impl than a conscious design decision.
14:31:34 <jaypipes> fried_bunny: not sure there was a deliberate thought behind that. probably just that we already had the resources dict representing the resources that are requested and we simply just put a WHERE condition on the keys in that dict.
14:31:43 <fried_bunny> exactly
14:32:03 <fried_bunny> I think perhaps I extrapolated something we said about NRP, which was that we want p_s to show all the RPs in the tree, even the ones not providing resource.
14:32:13 <jaypipes> fried_bunny: "Give me all your things that have oranges. OK, here they are, and some also have apples"
14:32:43 <fried_bunny> jaypipes: IMO, provider_summaries answers "what's in the grove?".
14:32:43 <jaypipes> fried_bunny: is it useful information? maybe... is it a priority right now? not sure about that.
14:33:28 <edleafe> fried_bunny: was there a use case that precipitated this work?
14:33:56 <fried_bunny> edleafe: I was reviewing tetsuro's work on anchor providers, which I think he was doing in response to open bugs.
14:34:26 <fried_bunny> It was complicated code he was modifying in complicated ways, so I got really deep into it and I guess lost perspective on whether it meshed with our current priorities.
14:34:55 <fried_bunny> Having had this discussion now, given that one of the things we want to get out of this meeting is, "how do we manage scope?" I can be convinced that we should table the series for Stein.
14:35:11 <jaypipes> fried_bunny: ++
14:35:15 <fried_bunny> Perhaps a procedural -2 jaypipes?
14:35:23 <edleafe> makes sense
14:35:23 <jaypipes> we have plenty of other things to focus on.
14:35:26 <fried_bunny> ...with a link to this discussion
14:35:39 <jaypipes> fried_bunny: after I fully consume your ML post about it, sure.
14:36:01 <fried_bunny> Bit of a shame since it's pretty much working at this point. But for sake of reviewer bandwidth, microversion noise, spec requirements, etc...
14:36:43 <fried_bunny> Let's move on.
14:36:50 <fried_bunny> I have one other thing I'd like to call attention to.
14:37:00 <edleafe> go for it
14:37:08 <fried_bunny> bug 1760322
14:37:09 <openstack> bug 1760322 in OpenStack Compute (nova) "Traits not synced if first retrieval fails" [Undecided,In progress] https://launchpad.net/bugs/1760322 - Assigned to Eric Fried (efried)
14:37:23 <fried_bunny> test case: https://review.openstack.org/#/c/558066/
14:37:38 <fried_bunny> (bad, horrible) fix: https://review.openstack.org/#/c/558068/
14:37:47 <jaypipes> currently reviewing that...
14:38:04 <fried_bunny> jaypipes: I think the right fix is to make _ensure_trait_sync run in its own transaction, regardless of the context around it. Is that possible?
14:38:29 <fried_bunny> (btw, I agree with tetsuro's comment on the test case patch, and got that all fixed up only to mess up the rebase locally, so Ima have to redo it)
14:39:09 <jaypipes> fried_bunny: not sure that is possible. at least, not if we want to trigger that being run from our context-decorated staticmethods...
14:40:01 <fried_bunny> okay. Well, consider your attention done broughten to it. We can finish the discussion in the review. And it's probably something cdent will want to look at.
14:40:33 <edleafe> fried_bunny: why wouldn't you do the sync check in _ensure_trait_sync, and set the global there accordingly?
14:40:35 <jaypipes> alex_xu_: do NOT copy fried_bunny's use of wrong English above.
14:40:51 <jaypipes> alex_xu_: "broughten" is not a word.
14:41:13 <fried_bunny> edleafe: That's what's being done. And it turns out wrong when the encompassing method raises an exception, because the db transaction gets rolled back, but the global doesn't get unset.
14:41:17 <fried_bunny> At least I think that's what's going on.
14:42:01 <alex_xu_> jaypipes: ok...
14:42:14 <edleafe> I'd have to look at it more closely, but it seems like the check is in the wrong place.
14:42:15 <fried_bunny> Perhaps there's a way to set a flag to say "commit the transaction even when I raise an exception from here"?
14:42:44 <jaypipes> fried_bunny: I'll comment on the review.
14:42:49 <edleafe> me too
14:42:52 <fried_bunny> A provably correct way to do it would be to have the _TRAITS_SYNCED flag in the database itself. So it's part of the transaction. But that's a tad icky.
14:43:02 <edleafe> ick ick ick
14:43:13 <fried_bunny> okay, triple icky.
14:43:22 <fried_bunny> cool, moving on.
14:43:33 <fried_bunny> thanks for indulging me, y'all.
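
The failure mode fried_bunny describes for bug 1760322 (the transaction is rolled back, but the process-wide flag stays set) can be sketched roughly as below. This is only an illustration under stated assumptions, not the nova code: it uses an in-memory sqlite3 database in place of the placement DB, and `ensure_trait_sync`, `get_traits`, `demo`, and the two-entry `STANDARD_TRAITS` list are hypothetical stand-ins. Only the `_TRAITS_SYNCED` name comes from the discussion.

```python
import sqlite3

# Process-wide flag, analogous to the _TRAITS_SYNCED global mentioned above.
_TRAITS_SYNCED = False

# Illustrative subset; the real list comes from os-traits.
STANDARD_TRAITS = ["HW_CPU_X86_AVX", "STORAGE_DISK_SSD"]


def ensure_trait_sync(conn):
    """Insert the standard traits once per process (hypothetical helper)."""
    global _TRAITS_SYNCED
    if _TRAITS_SYNCED:
        return
    conn.executemany(
        "INSERT INTO traits (name) VALUES (?)",
        [(name,) for name in STANDARD_TRAITS],
    )
    # The flag is flipped while the enclosing transaction is still open.
    _TRAITS_SYNCED = True


def get_traits(conn, fail=False):
    """Caller that owns the transaction, like a context-decorated method."""
    try:
        ensure_trait_sync(conn)
        if fail:
            raise RuntimeError("simulated failure in the enclosing method")
        rows = conn.execute("SELECT name FROM traits ORDER BY name").fetchall()
        conn.commit()
        return [row[0] for row in rows]
    except Exception:
        # The inserted rows are undone here, but the global is not.
        conn.rollback()
        raise


def demo():
    """Fail the first retrieval, then retry; return (flag, traits seen)."""
    global _TRAITS_SYNCED
    _TRAITS_SYNCED = False
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE traits (name TEXT PRIMARY KEY)")
    conn.commit()
    try:
        get_traits(conn, fail=True)
    except RuntimeError:
        pass
    # The sync is skipped on retry because the flag is stale.
    return _TRAITS_SYNCED, get_traits(conn)
```

In this sketch the second call finds the flag still set and an empty table, which matches the bug title "Traits not synced if first retrieval fails". The options raised in the meeting (running the sync in its own transaction, or storing the flag in the database) would both keep the flag and the rows consistent; which of them is feasible in nova is left to the review.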
14:43:42 * alex_xu_ still can stop google https://en.wiktionary.org/wiki/broughten
14:43:47 <edleafe> Any other reviews to discuss?
14:44:43 <fried_bunny> https://www.youtube.com/watch?v=tq08yOneY_0
14:44:54 <edleafe> alex_xu_: it's considered "substandard": https://www.merriam-webster.com/dictionary/broughten
14:45:41 <edleafe> ok, let's move on
14:45:42 <edleafe> #topic Bugs
14:45:44 <edleafe> #link Placement Bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:45:51 <edleafe> 3 new ones this week
14:46:13 <edleafe> we've already discussed 2 of 'em
14:46:26 <edleafe> the third has to do with some doc errors
14:46:59 <edleafe> So let's move on to the main topic
14:47:04 <edleafe> #topic Open Discussion
14:47:34 <edleafe> We're taking on a *lot* of stuff
14:47:48 <edleafe> cdent mentioned this in his last placement update
14:48:02 <edleafe> #link cdent's email on Placement Overload http://lists.openstack.org/pipermail/openstack-dev/2018-March/128924.html
14:48:18 <fried_bunny> One thing we can do is start looking at specs from a perspective of "Can we do this in Rocky" as opposed to "Do we want this and is it workable". I think we've been doing pretty much exclusively the latter thus for.
14:48:20 <fried_bunny> far
14:49:03 <edleafe> I don't know about that. Having a spec approved and then punting the development makes it simpler to revisit in the next cycle
14:49:11 <fried_bunny> We may even consider re-evaluating blueprints we've already approved in that light.
14:49:38 <edleafe> Perhaps we should have some sort of "priorities list"
14:49:40 <fried_bunny> edleafe: Yeah, we can mark a blueprint approved for next cycle.
14:49:40 <edleafe> :)
14:49:59 <fried_bunny> but not sure how the paperwork works out for the spec being in specs/rocky/approved
14:50:10 <fried_bunny> Perhaps we want to introduce a specs/future/approved
14:50:24 <fried_bunny> and then they move into specs/<currentrelease>/approved when the bp gets approved for that release.
14:50:26 <edleafe> fried_bunny: it has to be re-proposed in Stein, taking into account any changes that will be needed
14:50:50 <fried_bunny> okay; specs/rocky/approved-but-not-for-this-release/ ?
14:51:01 <fried_bunny> something to separate 'em out.
14:51:04 <edleafe> IOW, I propose to do X by changing Y in a certain way, but before the next cycle, other things have changed Y.
14:51:34 <edleafe> fried_bunny: that's
14:51:35 <fried_bunny> yeah, I get it.
14:51:40 <edleafe> that's "backlog"
14:52:02 <fried_bunny> Sure. So what I'm saying is that right now we don't have a good demarcation of backlog vs nowlog
14:52:09 <edleafe> :)
14:52:13 <fried_bunny> which makes it hard to understand the content of a release.
14:52:30 <fried_bunny> and makes it look like we don't get done what we say we're going to get done. Which we *also* don't.
14:53:41 <fried_bunny> This is probably something to discuss in the nova meeting, with more cores and the PTL at hand.
14:54:01 <fried_bunny> do we have anything specifically in the scheduler area that we can do to limit scope?
14:54:26 <edleafe> But that's not unexpected. I don't think we have ever finished all our specs before the end of the cycle, and then sat around wondering how we will occupy our time
14:54:41 <fried_bunny> I assert that's a problem.
14:54:56 <fried_bunny> Not that we don't have spare time, but that we *never* finish the specs we approved for a cycle.
14:55:01 <fried_bunny> or even come close.
14:55:08 <fried_bunny> It's become the expectation, the status quo.
14:55:24 <edleafe> fried_bunny: yes, we have an etherpad
14:55:35 <edleafe> #link Rocky priority etherpad https://etherpad.openstack.org/p/rocky-nova-priorities-tracking
14:55:57 <edleafe> We need to get better about updating that sucker
14:56:45 <fried_bunny> 4 minutes.
14:58:09 <edleafe> ok, I'll put an item on next week's agenda to work on the etherpad. In the meantime, let's add the specs/reviews we feel are important to the etherpad, and we can order them next week
14:59:15 <edleafe> Any last-minute (literally) items?
14:59:40 <edleafe> #endmeeting