14:00:35 <edleafe> #startmeeting nova_scheduler
14:00:36 <openstack> Meeting started Mon Jul 10 14:00:35 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:40 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:44 <edleafe> Good UGT morning!
14:00:48 <edleafe> Who's here?
14:01:08 <dtantsur> o/
14:01:58 <cdent> o/
14:02:10 <edleafe> jaypipes, alex_xu, bauzas - around?
14:02:16 <alex_xu> o/
14:02:17 <jaypipes> yuppers.
14:03:14 <edleafe> Guess we'll start
14:03:15 <edleafe> #topic Specs and Reviews
14:03:27 <edleafe> There is a new spec
14:03:33 <edleafe> or rather an amendment to one
14:03:35 <edleafe> #link Amend spec for Custom Resource Classes in Flavors: https://review.openstack.org/#/c/481748/
14:04:01 <edleafe> This was going to be done by jroll
14:04:08 <edleafe> Looks like it's now on me
14:04:33 <cdent> i can probably be your off hours buddy on that?
14:04:36 <jaypipes> edleafe: didn't you already have code for that?
14:04:43 <jaypipes> I thought I remember reviewing that already?
14:04:46 <edleafe> jaypipes: for my half, yes
14:05:02 <edleafe> jroll was going to handle what needed to happen for migration
14:05:16 <edleafe> so that when Pike starts up, the correct resources are allocated
14:05:16 <jaypipes> ah
14:05:31 <dtantsur> FYI I've tried it with a devstack change, and still cannot make the tests pass: https://review.openstack.org/#/c/476968/. It may be my mistake, of course, or it may be this missing migration
14:05:47 <edleafe> dtantsur: tried what?
14:06:00 <dtantsur> edleafe: sorry :) using resource classes for scheduling ironic instances
14:06:29 <edleafe> OK, I haven't looked at that patch.
14:06:36 <edleafe> I'll take a look at it later
14:07:09 <jaypipes> I will as well.
14:07:25 <jaypipes> both the spec and the patch
14:08:18 <edleafe> jaypipes: do we have the code merged to use the custom RC?
14:08:31 <edleafe> I know it was mine, but I thought there was another piece needed
14:08:37 <jaypipes> edleafe: oh yes, since Ocata.
14:08:51 <jaypipes> edleafe: oh, sorry, you're talking about the flavor thing
14:09:10 <jaypipes> edleafe: not sure on the flavor thing... need to check
14:09:12 <edleafe> jaypipes: yeah, the patch I wrote grabbed the custom RC from extra_specs
14:09:22 <edleafe> and added it to the 'resources' dict.
14:09:31 <jaypipes> right
14:10:49 <edleafe> Well, I'll be digging into what's needed for the migration. And I'd be happy to have cdent's help (and anyone else's)
14:11:04 <bauzas> oh snap, forgot meeting \o
14:11:07 <cdent> you know where to find me and I'll look for you
14:11:21 * edleafe waves to bauzas
14:11:21 <jaypipes> stalker alert!
14:11:32 * bauzas bows to edleafe
14:11:32 <edleafe> :)
14:11:44 <edleafe> OK, next up...
14:11:47 <edleafe> #link Claims in the Scheduler: https://review.openstack.org/#/c/476632/
14:12:06 <edleafe> The first part is +W'd, so this is the only active one
14:12:56 <edleafe> jaypipes: anything to note?
14:13:09 <jaypipes> edleafe: I'll respond to mriedem's comments on there.
14:13:16 <edleafe> ok
14:13:17 <jaypipes> edleafe: did you have further comments on it?
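As background for the flavor discussion above (grabbing a custom resource class from a flavor's extra_specs and adding it to the 'resources' dict), the sketch below shows roughly what that translation looks like. It is a minimal illustration, not the actual nova patch: the function name, the argument names, and the zero-valued override handling are assumptions made for the example.

```python
# Minimal sketch (not the actual nova code) of folding `resources:*`
# extra_specs from a flavor into the resources dict sent to placement.
RESOURCES_PREFIX = 'resources:'


def resources_from_flavor(vcpus, memory_mb, root_gb, flavor_extra_specs):
    """Build a resources dict from standard flavor fields plus any
    resources:<CLASS> overrides found in the flavor's extra_specs."""
    resources = {
        'VCPU': vcpus,
        'MEMORY_MB': memory_mb,
        'DISK_GB': root_gb,
    }
    for key, value in flavor_extra_specs.items():
        if not key.startswith(RESOURCES_PREFIX):
            continue
        resource_class = key[len(RESOURCES_PREFIX):]
        amount = int(value)
        if amount == 0:
            # An explicit zero removes a standard resource, which is how an
            # Ironic flavor can end up scheduling purely on a custom class.
            resources.pop(resource_class, None)
        else:
            resources[resource_class] = amount
    return resources


if __name__ == '__main__':
    extra_specs = {'resources:CUSTOM_BAREMETAL_GOLD': '1',
                   'resources:VCPU': '0',
                   'resources:MEMORY_MB': '0',
                   'resources:DISK_GB': '0'}
    print(resources_from_flavor(4, 8192, 40, extra_specs))
    # -> {'CUSTOM_BAREMETAL_GOLD': 1}
```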
14:13:57 <bauzas> technically, we have not yet merged the bottom patch but okay
14:14:36 <edleafe> jaypipes: I haven't looked at it since Friday morning, so when I do I'll respond on the patch
14:14:55 <jaypipes> k
14:14:59 <edleafe> Oh, I almost forgot to note:
14:15:01 <edleafe> #link Devstack to use resource classes by default https://review.openstack.org/#/c/476968/
14:15:22 * edleafe wants to keep the record up-to-date
14:15:45 <edleafe> Moving on...
14:15:46 <edleafe> #link Nested Resource Providers: series starting with https://review.openstack.org/#/c/470575/
14:15:59 <edleafe> This is still pretty much on hold, right?
14:17:08 * edleafe pokes jaypipes
14:17:16 <jaypipes> edleafe: yeah
14:17:27 <jaypipes> edleafe: it will pick up steam once claims are in.
14:17:30 <edleafe> ok, just making sure
14:17:35 * mriedem joins late
14:17:44 <jaypipes> edleafe: and I add some more functional testing around the scheduler -> conductor -> compute interactions.
14:17:44 <edleafe> Finally...
14:17:47 <edleafe> #link Placement api-ref docs https://review.openstack.org/#/q/topic:cd/placement-api-ref+status:open
14:18:18 <edleafe> jaypipes: let us know how we can help (besides reviews, of course)
14:19:24 <edleafe> Anything else for specs/reviews?
14:19:39 <alex_xu> the traits support in the allocation candidates are submitted
14:19:58 <alex_xu> #link the first patch https://review.openstack.org/478464
14:20:10 <alex_xu> #link the last one https://review.openstack.org/#/c/479776/
14:21:08 <jaypipes> mriedem: responded to your comments on ^
14:21:20 <jaypipes> mriedem: sorry, on https://review.openstack.org/#/c/476632/
14:21:33 <edleafe> OK, thanks alex_xu - added to my review list
14:21:45 <alex_xu> edleafe: I also remember there is one patch from you for 'GET /resources' with traits
14:21:53 <alex_xu> edleafe: thanks
14:22:25 <mriedem> jaypipes: ok, i guess i'm missing something then because when originally planning this all out,
14:22:39 <mriedem> i thought we were going for some minimum nova-compute service version check before doing allocations in the scheduler
14:22:48 <mriedem> such that we would no longer do the claim in the compute
14:23:51 <mriedem> once we do the allocation in the scheduler, the claim in the compute is at best redundant but not a problem,
14:24:05 <mriedem> at worst the claim fails because of something like the overhead calculation
14:24:19 <mriedem> or pci or whatever we don
14:24:24 <mriedem> *don't handle yet in the scheduler
14:25:46 <jaypipes> mriedem: we can do the *removal of the claim on the compute node* once we know all computes are upgraded. but that's a different patch to what's up there now, which just does the claim in the scheduler.
14:27:48 <edleafe> jaypipes: so if the scheduler starts doing claims, will that cause a problem with older computes?
14:27:54 <jaypipes> edleafe: no.
14:28:06 <edleafe> Or will the compute claim just be a duplicate
14:28:19 <mriedem> it's a duplicate
14:28:21 <jaypipes> edleafe: not even duplicate. it just won't be done.
14:28:35 <mriedem> what do you mean it won't be done?
14:28:39 <jaypipes> edleafe: b/c the report client only writes allocations that are not already existing.
14:28:39 <edleafe> jaypipes: even on an old compute?
14:28:53 <jaypipes> edleafe: yes. on ocata computes, we already do this.
14:29:03 <mriedem> writing the allocations is part of the claim process that happens on the compute *today* yes?
14:29:08 <edleafe> jaypipes: ok, I'll have to re-read that code
14:29:15 <jaypipes> mriedem: yes, and the periodic audit job.
14:29:23 <mriedem> but before we have the RT call the report client to write allocations, we're doing pci and overhead calculations
14:29:41 <jaypipes> mriedem: correct.
14:29:41 <mriedem> so we are still going to go through the same old claim process
14:29:47 <mriedem> which may fail, and trigger a retry
14:30:06 <jaypipes> mriedem: correct. if that happens, the allocations are deleted from the placement API.
14:30:12 <mriedem> where?
14:30:18 <jaypipes> in the periodic audit job.
14:30:25 <jaypipes> update_available_resource()
14:30:29 <jaypipes> will pick that up.
14:31:12 <mriedem> when does the alternates stuff for retries come in?
14:31:24 <mriedem> on top of https://review.openstack.org/#/c/476632/ ?
14:31:44 <cdent> even if something writes allocations for the same instance multiple times, it is a replace action
14:31:59 <cdent> PUT /allocations/consumer_uuid is replace
14:32:02 <jaypipes> mriedem: yes, the alternatives stuff needs to come after this.
14:32:58 <jaypipes> cdent: right, but we look up existing allocations first and do nothing if nothing changed: https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L863
14:33:20 <cdent> jaypipes: yeah, I know, I was just saying that it's safe even if that wasn't happening
14:33:26 <jaypipes> gotcha
14:33:36 <mriedem> ok so if we leave the allocation cleanup to the periodic task,
14:33:59 <mriedem> there is a chance you could "fill up" allocations for a compute node after a couple of failed attempts within a minute or something,
14:34:09 <jaypipes> mriedem: yep.
14:34:14 <mriedem> which if you've got a lot of compute nodes and a busy cloud, should be ok...
14:34:42 <jaypipes> mriedem: and I wrote in that comment that I could try and "undo" successful allocations in the scheduler _claim_resources() method, but that meh, eventually it'll get cleaned up by the periodic audit task on the compute
14:35:31 <mriedem> i have a bad feeling about relying on that
14:35:44 <mriedem> especially when someone does nova boot with min-count 100
14:36:04 <mriedem> e.g. you get to 99 and novalidhost, and we don't cleanup the allocations for the first 98
14:37:02 <jaypipes> mriedem: I'm happy to take a go at that cleanup if you'd like.
14:37:06 <dansmith> the retry part of conductor could accelerate that
14:37:08 <jaypipes> mriedem: just say the word.
14:37:10 <mriedem> will needing to undo allocations in the scheduler slow it down for other incoming requests? we're still single worker right?
14:37:34 <mriedem> dansmith: in this case we wouldn't get to conductor,
14:37:35 <dansmith> mriedem: single worker but we yield when making a call to placement
14:37:36 <mriedem> it's novalidhost
14:37:42 <jaypipes> mriedem: there's no reason at all why the scheduler needs to be single process.
14:38:05 <dansmith> mriedem: you mean for a failed boot that never gets retried?
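cdent's point that PUT /allocations/&lt;consumer_uuid&gt; is a replace, together with jaypipes' note that the report client looks up existing allocations first and does nothing if nothing changed, roughly amounts to the pattern sketched below. This is a simplified illustration, not nova's SchedulerReportClient: `session` stands in for a placement-aware HTTP session, and the exact request body shape depends on the placement microversion in use.

```python
# Sketch of the idempotent allocation write discussed above, under the
# assumption of a requests-style `session` rooted at the placement endpoint.
def put_allocations_if_changed(session, consumer_uuid, allocations):
    """allocations: {rp_uuid: {'resources': {'VCPU': 1, ...}}, ...}

    Returns True if placement ends up holding `allocations` for the consumer.
    """
    # Look up what placement already has for this consumer.
    resp = session.get('/allocations/%s' % consumer_uuid)
    current = resp.json().get('allocations', {}) if resp.status_code == 200 else {}

    if current == allocations:
        # Nothing changed, so skip the PUT entirely. This is why an Ocata
        # compute "re-claiming" what the scheduler already wrote is a no-op.
        return True

    # PUT is a full replace, so even re-sending the same body would be safe;
    # the real request body format varies by placement microversion.
    resp = session.put('/allocations/%s' % consumer_uuid,
                       json={'allocations': allocations})
    return resp.status_code == 204
```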
14:38:53 <mriedem> dansmith: yes
14:38:57 <mriedem> scheduler raises NoValidHost
14:39:21 <dansmith> okay I'm confused about why we'd still have stale allocations in that case
14:39:29 <dansmith> but we can discuss outside of the meeting
14:39:31 <jaypipes> dansmith: he's talking about this code:
14:39:38 <mriedem> https://review.openstack.org/#/c/476632/19/nova/scheduler/manager.py@128
14:39:43 <jaypipes> ya
14:39:46 <jaypipes> danke mriedem
14:40:08 <dansmith> oh I see, just in the n-instances case, I gotcha
14:40:30 <jaypipes> mriedem: like I said, I'm happy to give a go at cleaning up already-successful allocations in that block.
14:40:38 <jaypipes> mriedem: just say the word.
14:40:41 <dansmith> cleanup there would be easy I think, yeah
14:40:52 <mriedem> in general i think we should cleanup when we can
14:40:59 <jaypipes> yeah, I'll just keep track of the instance UUIDs that succeeded.
14:41:03 <dansmith> yep
14:41:04 <mriedem> including when we retry from the compute to the conductor with the alternates
14:41:26 <jaypipes> mriedem: well, and we'll eventually want to be retrying *within* the scheduler.
14:41:29 <dansmith> mriedem: yeah that's the case I was thinking of and have always described it as "cleanup the old, claim the next alternate"
14:41:36 <jaypipes> but whatevs, I hear ya, I'll fix that section up.
14:41:44 <dansmith> jaypipes: no, we can't retry in the scheduler once we've failed on the compute node
14:41:58 <jaypipes> dansmith: retry on the allocation_request...
14:42:00 <mriedem> i think jay is talking about pre-compute
14:42:05 <mriedem> yeah
14:42:07 <dansmith> that, yes
14:42:07 <jaypipes> right.
14:42:19 <dansmith> figured he meant: [07:41:04] <mriedem> including when we retry from the compute to the conductor with the alternates
14:42:30 <jaypipes> yeah, sorry, no I mean the allocation candidates thing.
14:42:37 <mriedem> retrying within the scheduler is the whole reason we decided to do it in the scheduler and not conductor
14:42:41 <dansmith> ack
14:42:44 <mriedem> so yeah we should do that :)
14:42:45 <edleafe> well, that's not really a retry when the scheduler can't claim
14:42:46 <dansmith> yeah
14:42:52 <edleafe> just validating the host
14:43:06 <jaypipes> anyway, mriedem, besides the cleaning up successful allocations in that failure block, is there anything big you want changed on the patch? if not, I'll go and work on this.
14:43:24 <mriedem> jaypipes: i think you already replied on my other things
14:43:29 <jaypipes> the other little nits I'll get, yep
14:43:39 <edleafe> Let's continue this in -nova
14:43:43 <mriedem> btw, we create the allocations after the filters right?
14:43:44 <edleafe> #topic Bugs
14:43:57 <edleafe> #undo
14:43:58 <openstack> Removing item from minutes: #topic Bugs
14:44:20 <jaypipes> mriedem: yes.
14:44:24 <jaypipes> mriedem: and the weighers.
14:44:26 <bauzas> sorry was a bit afk
14:44:36 <bauzas> but I have a point about the above
14:44:47 <edleafe> Let's keep it quick
14:45:27 <bauzas> given the time we still have for Pike, do folks agree with me about possibly not having the conductor passing alternatives for Pike?
14:45:45 <dansmith> no I don't agree
14:45:54 <jaypipes> bauzas: no, I think it's absolutely doable for Pike to have the alternatives done.
14:46:14 <edleafe> me too
14:46:17 <bauzas> would it be a problem not having that for Pike?
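jaypipes' offer above to "keep track of the instance UUIDs that succeeded", so a NoValidHost partway through a multi-create request can undo the earlier allocations instead of waiting for the compute's periodic audit, might look something like this sketch. The `report_client.claim_resources` and `report_client.delete_allocation_for_instance` calls are hypothetical stand-ins for the real report client interface, not the code in the patch under review.

```python
# Sketch of rolling back already-written allocations when a later instance in
# the same request gets NoValidHost, under assumed report-client method names.
class NoValidHost(Exception):
    pass


def claim_for_instances(report_client, requested):
    """requested: list of (instance_uuid, host) pairs in scheduling order.

    Returns the pairs whose allocations were written to placement. If any
    claim fails, the allocations already written for this request are deleted
    before NoValidHost is re-raised.
    """
    claimed = []
    try:
        for instance_uuid, host in requested:
            if not report_client.claim_resources(instance_uuid, host):
                raise NoValidHost('claim failed for %s' % instance_uuid)
            claimed.append((instance_uuid, host))
    except NoValidHost:
        # Undo the successful claims ourselves rather than leaving them for
        # update_available_resource() on the computes to reconcile later.
        for instance_uuid, _host in claimed:
            report_client.delete_allocation_for_instance(instance_uuid)
        raise
    return claimed
```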
14:46:20 <jaypipes> bauzas: I think we can have claims merged and ready by Wednesday and patches up for alternatives by EOW
14:46:38 <bauzas> while I agree with all of us about why it's important, I'm just trying to be pragmatic
14:46:41 <dansmith> bauzas: yes, without that we're toast for the proper cellsv2 arrangement
14:47:01 <cdent> yeah, we pretty much have to do it
14:47:01 <dansmith> bauzas: we can be pragmatic when we're out of time, but we're not there, IMHO
14:47:12 <bauzas> okay
14:47:14 <jaypipes> we need to get alternatives done, flavors for resource classes complete, and claims done.
14:47:19 <dansmith> ack
14:47:21 <jaypipes> those are absolutes for Pike.
14:47:37 <jaypipes> nested stuff is nice to have, and we've made a bit of progress on it already.
14:47:39 <bauzas> and shared-RP, and custom-RP? :)
14:47:47 <bauzas> yeah, that's my point
14:47:48 <mriedem> shared is done
14:48:02 <bauzas> well, agreed
14:48:02 <mriedem> allocation candidates takes care of shared, at least for disk
14:48:16 <edleafe> mriedem: well, not completely done
14:48:29 <edleafe> mriedem: we don't handle complex RPs
14:48:35 <jaypipes> mriedem: well, almost... still need a way to trigger the compute node to not want to claim the disk when shared provider is used...
14:48:37 <bauzas> okay, tbc, I don't disagree with the direction, I'm just trying to see what is left for Pike
14:48:39 <edleafe> mriedem: like a compute with both local and shared
14:48:41 * alex_xu puts the trait's priority low, focus on review the priority stuff
14:48:50 <jaypipes> mriedem: but that is a short patch that all the plumbing is ready for.
14:49:04 <jaypipes> edleafe: we don't *currently* handle that.
14:49:15 <jaypipes> edleafe: so that's not something I'm worried about yet
14:49:24 <edleafe> jaypipes: exactly - which was going to be the subject I wanted to discuss in Opens
14:49:29 <jaypipes> kk
14:49:33 <edleafe> but we are quickly running out of time
14:49:46 <jaypipes> there is always #openstack-nova, ed :)
14:50:08 * edleafe blinks
14:50:12 <edleafe> Really??
14:50:15 <edleafe> :)
14:50:31 <bauzas> anyway
14:50:38 <bauzas> I don't want to confuse people
14:50:47 <edleafe> Let's try to move on again...
14:50:48 <edleafe> #topic Bugs
14:50:48 <edleafe> #link https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:50:58 <edleafe> Only one new bug:
14:50:58 <edleafe> #link The AllocationCandidates.get_by_filters returned wrong combination of AllocationRequests https://bugs.launchpad.net/nova/+bug/1702420
14:51:00 <openstack> Launchpad bug 1702420 in OpenStack Compute (nova) "The AllocationCandidates.get_by_filters returned wrong combination of AllocationRequests" [High,In progress] - Assigned to Alex Xu (xuhj)
14:51:01 <edleafe> alex_xu reported this one, and is working on it.
14:51:08 <edleafe> alex_xu: any problems with that?
14:51:21 <alex_xu> edleafe: no, just waiting for review
14:51:31 <edleafe> great
14:51:39 <edleafe> Anything else on bugs?
14:52:16 <edleafe> #topic Open Discussion
14:52:28 <edleafe> I had one concern: the change to return a list of HostState objects from the scheduler driver to the manager. IMO, we really need the host to be associated with its Allocation object so that a proper claim can be made. The current design just returns hosts, and then picks the first allocation that matches the host's RP id.
14:52:38 <edleafe> In the case of a host that has both local and shared storage, there will be two allocation candidates for that host. The current design will choose one of those more or less at random.
14:52:45 <edleafe> Jay has said that when we begin to support such complex RPs, we will make the change then. Since we are changing the interface between manager and driver now, wouldn't it be best to do it so that when we add complex RPs, we don't have to change it again?
14:53:07 <dansmith> if you haven't requested a trait of shared or not-shared, then at-random is fine right?
14:53:30 <edleafe> dansmith: in that case, yes
14:53:51 <edleafe> but in the case of local vs. public net for PCI, probably not
14:54:09 <jaypipes> to be clear, the code just selects the first allocation request containing the host's RP ID. so yeah, there's no order to it.
14:54:41 <mkucia> Hi. I am wondering how the driver will be handling ResourceProviders? Will there be a dedicated class (ResourceProviderDriver) for each provider type?
14:54:41 <dansmith> in the case of network, if your flavors say "give me a pci net device but I don't care which kind" then you're asking for at random, no?
14:54:43 <edleafe> jaypipes: you can keep the randomness for now
14:54:44 <dansmith> agree it would be a dumb thing to do, but..
14:55:00 <edleafe> jaypipes: I was concerned about having to change the interface yet again in Queens
14:55:23 <edleafe> dansmith: again, in that particular case, you would be correct
14:55:32 <edleafe> but that's not my point
14:55:40 <jaypipes> edleafe: this is an internal interface. I'm not concerned at all about that.
14:55:48 <mriedem> me neither
14:55:55 <mriedem> and this is no worse than what we have today right?
14:56:09 <jaypipes> edleafe: I mean, we need to change the RPC interface for alternatives support, and that's major surgery. This stuff was just a botox injection compared to that.
14:56:12 <mriedem> i'm more concerned about the <3 weeks to FF
14:56:18 <dansmith> mriedem: ++
14:56:23 <bauzas> botox, heh
14:56:26 <edleafe> ok, fine.
14:56:37 <bauzas> mriedem: me too, hence my previous point
14:56:40 <jaypipes> edleafe: you agree the RPC change is much more yes?
14:56:45 <edleafe> It just wasn't what we had originally discussed, and it raised a flag for me
14:56:52 <edleafe> jaypipes: of course
14:56:54 <jaypipes> understood.
14:57:17 <jaypipes> understood, edleafe and I appreciate your concerns on it. As you saw, I went through a bunch of iterations on thinking about those internal changes
14:57:50 <jaypipes> edleafe: but returning the HostState objects instead of the host,node tuples allowed us to isolate pretty effectively the claims code in the manager without affecting the drivers at all.
14:58:17 <edleafe> As long as we all realize that this will have to change yet again in Queens, sure
14:58:35 <cdent> change is and always will be inevitable
14:58:42 <jaypipes> edleafe: certainly it may. but again, I'm less concerned about internal interfaces than the RPC ones.
14:58:43 <edleafe> how trite
14:58:48 * cdent is trite
14:58:55 <cdent> always has been, always will be
14:59:31 <edleafe> jaypipes: I was more concerned about saying we will do X, and finding Y
14:59:32 <bauzas> 1 min left
14:59:51 <edleafe> As long as we get to X eventually
15:00:05 <edleafe> That's it - thanks everyone!
15:00:07 <edleafe> #endmeeting
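For reference, the "first allocation request containing the host's RP ID" selection that edleafe raised in Open Discussion reduces to something like the sketch below. The data shapes are simplified illustrations, not nova's AllocationRequest objects, and the function name is made up for the example.

```python
# Sketch of the arbitrary first-match selection discussed above: given the
# chosen host and the candidates from GET /allocation_candidates, take the
# first candidate that involves that host's resource provider. A host with
# both local and shared storage has two such candidates, so which one is
# used is effectively random.
def allocation_request_for_host(alloc_requests, host_rp_uuid):
    """alloc_requests: list of dicts keyed by resource provider UUID, e.g.
    [{cn_uuid: {'DISK_GB': 40, 'VCPU': 1}},
     {cn_uuid: {'VCPU': 1}, shared_storage_uuid: {'DISK_GB': 40}}]
    """
    for request in alloc_requests:
        if host_rp_uuid in request:
            return request  # first match wins; no ordering guarantee
    return None
```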