14:00:16 #startmeeting nova_scheduler
14:00:17 Meeting started Mon Oct 2 14:00:16 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:20 The meeting name has been set to 'nova_scheduler'
14:00:26 Good UGT morning! Who's here?
14:00:27 o/
14:00:35 o/
14:00:38 o/
14:00:41 \o
14:00:46 (at least for 25 mins)
14:02:22 Let's get started
14:02:23 #topic Specs
14:02:36 #link Return Alternate Hosts https://review.openstack.org/504275/
14:02:36 #link Return Selection objects https://review.openstack.org/498830/
14:02:44 i had comments on the first
14:02:46 These two are mine
14:02:51 mriedem: yeah, saw them
14:03:11 was waiting until the caffeine fully kicked in before jumping into that
14:03:21 the two main things i had left were,
14:03:47 1. you said you don't expect changes to the compute rpc, but then if we're not persisting the alternate hosts, how do you propose to pass the alternate hosts through for the reschedule loop?
14:04:04 superconductor -> scheduler -> superconductor -> compute -> cell conductor -> compute2 -> cell conductor -> compute3
14:04:18 the cell conductor / compute dance is going to need the alternate hosts passed through
14:04:29 yup, that was my point in the 2nd spec
14:04:36 I need to review the first one then
14:04:55 i haven't read the 2nd, i thought it was just about the object modeling
14:05:05 mriedem: yeah, I got that. Like I said, I'll get to it later
14:05:05 there are 2 possibilities, but AFAIR we said to pass by a parameter
14:05:15 the alternative was to use the ReqSpec object
14:05:16 Yes, the second is just the object.
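The reschedule loop above — scheduler returns a primary host plus alternates, and a failed build falls through to the next alternate without another scheduler call — can be sketched roughly as follows. This is a hedged illustration only: `build_instance` and `try_build_on` are hypothetical names, not nova's real RPC methods.

```python
# Illustrative sketch of the "pass alternates as an RPC parameter" idea:
# hosts[0] is the primary selection, the rest are the alternates the
# scheduler returned. All names here are hypothetical, not nova's real
# RPC signatures.

def build_instance(hosts):
    """Try each selected host in order until one build succeeds."""
    for host in hosts:
        if try_build_on(host):
            return host
    raise RuntimeError("all selected hosts failed, no alternates left")

def try_build_on(host):
    # Stand-in for the compute RPC call; here only 'compute3' succeeds,
    # mimicking the compute -> compute2 -> compute3 chain above.
    return host == "compute3"
```

The point of the sketch is that the alternates list travels with the request, so the cell conductor never has to re-query the scheduler on a reschedule.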
14:05:29 i don't want to use the request spec
14:05:40 so yeah pass as an rpc parameter
14:05:50 if the compute is queens, it will get it and send it to conductor,
14:06:02 mriedem: yeah that was the consensus
14:06:04 if conductor gets called from a pike compute, it won't have alternate hosts and you're done
14:06:34 (even if I think we should still pass the ReqSpec record anyway :)
14:06:36 ok the only other thing on the first was i disagreed with the assertion that notifications will change
14:06:44 but that's pretty minor
14:07:39 Next up
14:07:39 #link nova-spec add traits to /allocation_candidates https://review.openstack.org/497713
14:07:41 edleafe: any reason why we have 2 specs ?
14:07:42 Still trying to get best name for the required traits query param
14:08:11 i have comments in ^ too,
14:08:13 bauzas: We really only needed the alternate hosts as a spec
14:08:24 agreed
14:08:28 i've given up on the name of the parameter, 'required' is fine i guess
14:08:34 it's been discussed at length
14:08:43 my -1 was on the behavior of passing ?required=
14:08:44 bauzas: but as there was some disagreement about what the selection object should look like, I wrote a spec so that we could get agreement
14:09:13 i did some testing and if you pass GET /servers?status='' to the compute api, we take that literally
14:09:13 mriedem: my objection to 'requires' was wearing my API-SIG hat
14:09:19 meaning give me all instances with status=''
14:09:32 and i don't think ^ is what we want
14:09:42 computers do tend to take things literally :)
14:10:08 so i was saying, we either do the same and ?required='' means give me all allocation candidates that specifically don't have any traits
14:10:11 which is weird,
14:10:25 or we just require that if you specify "required" it must be length > 0
14:10:55 since 'required' is a list of traits, sending '' means an empty list, no?
14:11:24 as a user, how do you expect that to be interpreted?
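The stricter behavior being argued for — rejecting an explicitly empty `required=` instead of ignoring it — can be sketched minimally like this. The helper name and error handling are made up for illustration; this is not the placement API's actual parsing code.

```python
def parse_required_traits(query_params):
    """Parse a hypothetical ?required=TRAIT1,TRAIT2 query parameter.

    Absent parameter  -> no trait filtering (returns None).
    Present but empty -> an error (a 400 in HTTP terms), rather than the
    ambiguous "ignore it" or "match providers with no traits" readings
    debated above.
    """
    if "required" not in query_params:
        return None
    raw = query_params["required"]
    traits = [t for t in raw.split(",") if t]
    if not traits:
        raise ValueError("'required' must contain at least one trait")
    return traits
```

With this shape, a client that doesn't care about traits simply omits the parameter, and an empty value is never silently reinterpreted.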
14:11:27 whereas 'status' is a single string
14:11:48 the spec says required='' would be ignored
14:12:01 "required=''" means "I don't require any traits"
14:12:29 then don't specify the parameter...
14:12:41 it just seems like very confusing API behavior
14:12:42 that would be preferable, sure
14:13:01 I’m confused on how this ever became an issue?
14:13:18 why was empty param ever proposed in the first place?
14:13:22 i said from early in the spec i didn't like treating ?required='' as ignoring it
14:13:22 'status' is saying: match this status. 'required' is saying: "only give me results with these traits"
14:13:32 because alex said it would be easier to code client-side that way
14:14:06 edleafe: sure, and i think it could be reasonable for a user to think '' means give me RPs with *no* traits whatsoever
14:14:07 I don’t think that’s worth it
14:14:14 cdent: i don't either, it's ambiguous behavior
14:14:18 and inconsistent with the compute api
14:14:46 https://review.openstack.org/#/c/497713/2/specs/queens/approved/add-trait-support-in-allocation-candidates.rst@67
14:14:49 Since alex is away I can fix it up tonight or tomorrow morn
14:14:51 was alex's response to me about it
14:15:15 well i'd like to make sure we all either agree, or at least don't care
14:15:25 including jaypipes and johnthetubaguy since they are +2 on the spec
14:15:51 * cdent looks around
14:16:08 we can follow up after the meeting
14:16:13 i'll just leave the -1 for now
14:16:17 yeah, lets catch up after
14:16:22 next up are 2 from cdent
14:16:23 #link POST /allocations https://review.openstack.org/#/c/499259/
14:16:23 Has 1 +2
14:16:23 #link Minimal cache headers in Placement https://review.openstack.org/#/c/496853/
14:16:26 Also has 1 +2
14:17:00 first one requires https://review.openstack.org/#/c/508164/
14:17:35 You beat me to it :)
14:17:37 #link Add spec for symmetric GET and PUT of allocations https://review.openstack.org/#/c/508164/
14:18:25 I have a 4th spec in progress which I think is getting in the weeds about details without agreement on the goal: (limiting /allocation_candidates) https://review.openstack.org/#/c/504540/
14:18:26 mriedem: I'd prefer it be an error to provide required=
14:18:35 jaypipes: ok same here
14:18:43 mriedem: but I don't feel it enough to hold up the spec.
14:18:46 as I say on there I’m going to rewrite the spec to be more limited unless there are objections
14:18:55 cdent: no, that sounds good
14:19:32 jaypipes, mriedem: give me a signal if you want me to update alex’s spec
14:19:43 let's confirm with john after the meeting
14:20:25 ok then - last spec that I have:
14:20:26 #link Re-propose nested resource providers spec https://review.openstack.org/#/c/505209/
14:20:29 This is going down the rabbit hole of NUMA and NFV permutations
14:20:44 yeah i saw that....
14:20:51 * cdent passes edleafe the semaphore flags
14:20:52 i was happyish when it was just a re-proposal,
14:20:58 cdent: go for it if it'll make mriedem happy.
14:21:01 and then it turned into the generic device mgmt thing
14:21:12 * cdent is all about making mriedem happy
14:21:35 would i be correct in saying the nested rp's spec is less of a re-proposal now and requires a new read through?
14:21:48 mriedem: no.
14:21:49 didn't we decide a simple first step on nested, I can't remember what that was, bandwidth monitoring?
14:21:50 yeah
14:21:53 mriedem: +1
14:21:54 mriedem: it's a re-proposal.
14:22:10 we said no numa use cases to start at the ptg
14:22:18 johnthetubaguy: no, PCI devices.
14:22:24 I added a little clarifying language to placate some comments, but nothing substantial changed
14:22:26 johnthetubaguy: without NUMA
14:22:36 jaypipes: ++
14:22:47 edleafe: ok. the amount of commenting back and forth made me think it was changing quite a bit
14:22:54 will re-review
14:22:57 mriedem: no, it has not.
14:23:01 (changed much)
14:23:09 it's more about the usage of what we could do
14:24:18 Anyone else have any specs to discuss that I missed?
14:24:29 I have that ironic one (finds link)
14:24:38 * bauzas needs to disappear for 20 mins
14:24:55 https://review.openstack.org/#/c/507052/
14:25:05 basically using traits with ironic
14:25:28 I think the deeper question, is who should set the traits? second, how should they do that?
14:25:35 I mean generally
14:25:43 #link Support traits in the Ironic driver #link Re-propose nested resource providers spec https://review.openstack.org/#/c/505209/
14:25:50 is the answer: Operator + PlacementClient
14:25:55 edleafe: thanks
14:25:57 #undo
14:25:58 Removing item from minutes: #link https://review.openstack.org/#/c/505209/
14:26:25 #link Support traits in the Ironic driver https://review.openstack.org/#/c/507052/
14:26:32 * edleafe can't copy/paste
14:27:25 Moving on then...
14:27:26 #topic Reviews
14:27:29 #link Add traits to GET /allocation_candidates https://review.openstack.org/479776
14:27:30 so at the PTG we were talking more about copying what we do with ResourceClasses, which is to set them in Ironic (using inspector, etc) then push up to placement in the ironic virt driver
14:27:58 OK, no response on that one then, for the moment?
14:28:30 in general,
14:28:35 johnthetubaguy: I'll take a look at the spec later
14:28:38 i'd like nova to not be a proxy for ironic
14:28:57 mriedem: yeah, I am ++ that
14:29:02 i haven't reviewed either spec, but it sounds like you might be proposing that ironic-inspector could create the traits?
14:29:36 so that is how the resource class is set, but it's only talking to the ironic API today
14:29:37 which could get us into some split brain scenario, but maybe not terrible if the rp generation changes when traits are added/removed
14:29:40 do we know if ^ is true?
14:30:27 AFAIK, no, traits don't change generation. cdent?
14:30:32 thinking about this, inspector can't talk to placement, as it will race nova-compute creating the RP
14:30:34 i need to look, one sec
14:30:59 generation is updated on set trait
14:31:26 johnthetubaguy: well, one would just get a 409
14:31:28 which could be ignored
14:31:49 mriedem: inspector only really runs once when you are commissioning the node
14:31:54 cdent: ah, you're right. I was looking at the trait creation code
14:31:58 but that's why i wondered about the generation, because we don't want the user to change traits on the node via ironic api, and then have inspector and nova-compute racing to update the traits
14:31:59 johnthetubaguy: well the ironic virt driver would need to do the same thing with traits as it currently does with node.resource_class
14:32:03 mriedem: it can go in a retry loop, but that's nasty
14:32:23 why can't ironic itself just call placement?
14:32:25 jaypipes: that's the current proposal, I think it's the safest, but nasty approach
14:32:37 johnthetubaguy: whatever method Ironic wants to use to expose those traits, fine by us. The Ironic virt driver will just need to ask Ironic for that information the way it needs to.
14:32:53 johnthetubaguy: in other words, decouple any expectations between Nova and Ironic :)
14:33:17 don't forget about mogan :)
14:33:17 jaypipes: that makes the virt driver a bit of a proxy though, which is what we don't really want
14:33:26 anything you do as a proxy in the nova ironic driver, they will copy in mogan
14:33:30 It sounds like the question is in part about whether ironic will talk to placement itself, or nova will talk to ironic to talk to placement
14:33:39 I think we should encourage the former
14:33:41 johnthetubaguy: that's exactly how we handle libvirt, hyper-v, etc, though
14:34:01 jaypipes: they report traits already?
14:34:05 no
14:34:14 but they will for some standard types i think
14:34:17 johnthetubaguy: libvirt does a get_devices('pci') and looks at capabilities crap and jams it into the resources dict returned from get_available_resource(). hyperv and xen do similar but different underlying calls.
14:34:20 like vgpus
14:34:34 johnthetubaguy: they put them in extra specs now, but yes.
14:34:55 johnthetubaguy: and I mean cpu_info
14:35:08 I mean it's like ironic and capabilities now, which they want to drop once traits is working
14:35:09 johnthetubaguy: which is matched with the extra specs for compute capabilities filter.
14:35:12 the stuff an operator would put on an ironic node for traits would be mostly custom traits, yes?
14:35:29 mriedem: not all, but most
14:35:39 gotta dash, will catch up on log later
14:35:39 mriedem: could be. or could be standardized if the hardware folks decide on some standard traits.
14:35:40 * cdent waves
14:36:01 i'm just trying to compare what we would be doing for libvirt/hyperv wrt custom traits, and i guess the answer is flavor extra specs?
14:36:27 well, so ironic will use the same flavor extra specs for the request
14:36:44 mriedem: well, right now all the virt drivers "inspect" the host in their own way and jam capabilities into the cpu_info key of the resource dict returned by get_available_resource()
14:37:37 mriedem: we'd like to change that to have the virt drivers passed a ProviderTree and have them associate traits with the providers in that tree and then have the RT and scheduler report client do the needful as far as communicating with placement.
14:37:41 mriedem: make sense?
14:37:45 and we're going to translate those capabilities to traits on the compute node resource provider?
14:38:03 mriedem: no, the virt drivers would tag a provider with a set of traits
14:38:33 jaypipes: you mean in get_inventory() ?
14:38:35 mriedem: which is what they do today only instead of tagging a provider with traits, they just return a JSON blob of randomness in the resources dict returned from get_available_resource()
14:38:41 johnthetubaguy: yep.
14:38:59 jaypipes: right and then ComputeCapabilitiesFilter processes that json bloc
14:39:00 *blob
14:39:06 johnthetubaguy: though with the nested resource providers series, we're changing that to update_inventory() but yeah, same concept
14:39:07 or whatever other scheduler filter cares
14:39:12 mriedem: yup
14:39:20 mriedem: and we'd love to get rid of that awfulness.
14:39:37 so I think jaypipes might (mostly) like the spec, I think mriedem might (mostly) hate the spec, in its current state
14:39:40 mriedem: nope, it's just the ComputeCapabilitiesFilter.
14:39:53 jaypipes: and whatever out of tree filters i mean :)
14:39:57 jaypipes: that is good to know, these ironic patches would sit behind that I guess
14:39:59 mriedem: ah, yeah
14:40:01 johnthetubaguy: well, given ^ i might hate it less
14:40:06 johnthetubaguy: sorry, which spec are we referring to?
14:40:18 https://review.openstack.org/#/c/507052
14:40:19 johnthetubaguy: the ironic driver being consistent with the other drivers is ok by me
14:40:19 I've lost track of which spec... :(
14:40:26 ah!
14:40:38 mriedem: it sounds more consistent than I expected, which is nice
14:40:42 johnthetubaguy: well I had not even seen that spec yet! :)
14:40:50 johnthetubaguy: so I don't know if I like it or not ;)
14:40:58 * johnthetubaguy mission accomplished.
14:41:00 :)
14:41:04 * jaypipes reviews now
14:41:32 cool, thanks, I think that helps me feel better about plan A for the spec
14:43:48 Well now... we're running a bit long today, so rather than go through the reviews one-by-one, let me dump them all and if any tickle your interest, we can discuss
14:43:51 #link Add traits to get RPs with shared https://review.openstack.org/478464/
14:43:54 #link Allow _set_allocations to delete allocations https://review.openstack.org/#/c/501051/
14:43:57 #link WIP - POST /allocations for >1 consumer https://review.openstack.org/#/c/500073/
14:44:00 #link Use ksa adapter for placement https://review.openstack.org/#/c/492247/
14:44:03 #link Migration allocation fixes https://review.openstack.org/#/q/topic:bp/migration-allocations+status:open
14:44:06 #link Add Alternate Hosts https://review.openstack.org/#/c/486215/
14:44:09 #link Add Selection objects https://review.openstack.org/#/c/499239/
14:44:12 #link Return Selection objects from the scheduler driver https://review.openstack.org/#/c/495854/
14:44:24 well, since it's spec sprint today, should we be primarily focused on those?
14:44:53 jaypipes: tomorrow is the spec review sprint
14:44:59 oh...
14:45:04 posted late to the ML this morning, sorry
14:45:09 * jaypipes steps back, suitably reprimanded.
14:45:17 feel free to review specs all day :)
14:47:50 No interest in discussing any of these reviews?
14:48:15 Moving on then...
14:48:15 #topic Bugs
14:48:17 #link placement server needs to retry allocations, server-side https://bugs.launchpad.net/nova/+bug/1719933
14:48:18 Launchpad bug 1719933 in OpenStack Compute (nova) "placement server needs to retry allocations, server-side" [Medium,Triaged]
14:48:43 Just the one new one that arose from mriedem's tests with large numbers of builds
14:49:05 which i'm trying to recreate here https://review.openstack.org/#/c/507918/
14:49:43 edleafe: that TODO is not really relevant any more.
14:50:01 jaypipes: why?
14:50:12 edleafe: we now *do* attempt 3 retries when claiming resources.
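The bounded retry-on-conflict loop discussed here — retry a few times when a concurrent writer bumps the resource provider generation, then give up — can be sketched roughly as follows. `ConflictError` and the in-memory `store` are stand-ins for an HTTP 409 and the placement database, not real placement client calls.

```python
# Illustrative sketch of a generation-guarded write with bounded retries.

class ConflictError(Exception):
    """Stand-in for an HTTP 409 (concurrent update detected)."""

def set_traits_with_retry(store, rp, traits, max_attempts=3):
    """Refetch the generation and retry on conflict, up to max_attempts."""
    for _ in range(max_attempts):
        generation = store[rp]["generation"]
        try:
            _update(store, rp, traits, generation)
            return True
        except ConflictError:
            continue  # another writer bumped the generation; refetch, retry
    return False  # this is the failure mode seen under heavy multi-create

def _update(store, rp, traits, generation):
    # Stand-in for the server-side check: reject stale generations.
    if store[rp]["generation"] != generation:
        raise ConflictError()
    store[rp]["traits"] = set(traits)
    store[rp]["generation"] += 1
```

Under enough concurrency, all three attempts can lose the race — which is exactly why the bug proposes moving the retry server-side.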
14:50:21 jaypipes: client side,
14:50:24 and we've seen that failing
14:50:25 edleafe: handling automatically the concurrent update detected.
14:50:27 hence the bug
14:50:38 ah, sorry...
14:50:52 * jaypipes steps back, suitably reprimanded.
14:50:56 dansmith and i were doing large(r) scale multi-create testing last week
14:51:10 if you try to create >100 instances at once, you start hitting the limits of that client side retry
14:51:27 mriedem: gotcha.
14:51:31 https://review.openstack.org/#/c/507918/ is trying to recreate in devstack so we can get logs to see where the conflict is coming from,
14:51:39 because it's using the fake driver which shouldn't be updating inventory
14:51:39 mriedem: if OK with you, I'd like to take that bug then.
14:51:41 * bauzas is back
14:51:43 go nuts
14:51:55 mriedem: since I have a series that has cleaned up DB stuff in the resource_provider.py module
14:53:03 Well, we have 7 minutes left for:
14:53:06 #topic Open Discussion
14:53:17 Anything that we haven't covered?
14:53:17 i have another bug fix
14:53:21 https://review.openstack.org/#/c/507687/
14:53:23 #undo
14:53:24 Removing item from minutes: #link https://review.openstack.org/#/c/507687/
14:53:25 ^ needs to go to pike
14:53:39 fixes 2 bugs
14:53:56 * edleafe is curious that #undo doesn't undo topics
14:54:51 #link Remove dest node allocations during live migration rollback https://review.openstack.org/#/c/507687/
14:55:25 Anything else to discuss?
14:56:03 OK, thanks everyone!
14:56:05 #endmeeting
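The ProviderTree idea jaypipes described during the meeting — virt drivers tagging resource providers with traits instead of returning a JSON blob of capabilities from get_available_resource() — has roughly this shape. This is a toy sketch that only mimics the idea; the class and function names are not nova's actual ProviderTree API.

```python
# Toy sketch: the driver is handed a tree of resource providers and
# annotates each with traits, instead of returning a capabilities blob
# for ComputeCapabilitiesFilter to parse.

class Provider:
    def __init__(self, name):
        self.name = name
        self.traits = set()   # traits tagged by the virt driver
        self.children = []    # nested providers (e.g. PCI devices)

def driver_update_traits(root):
    """Hypothetical virt driver hook: tag the compute node provider
    with traits derived from host inspection."""
    root.traits.update({"HW_CPU_X86_AVX2", "CUSTOM_RAID"})
    return root

tree = driver_update_traits(Provider("compute1"))
```

The resource tracker and scheduler report client would then push these tags to placement, so scheduling filters query traits instead of parsing per-driver capability blobs.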