14:02:52 <cdent> #startmeeting nova_scheduler
14:02:53 <openstack> Meeting started Mon Jun 25 14:02:52 2018 UTC and is due to finish in 60 minutes. The chair is cdent. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:56 <openstack> The meeting name has been set to 'nova_scheduler'
14:02:59 <tssurya> o/
14:03:00 <takashin> o/
14:03:11 <cdent> #chair efried edleafe tssurya takashin
14:03:12 <openstack> Current chairs: cdent edleafe efried takashin tssurya
14:03:22 <efried> ō/
14:03:39 <cdent> #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:04:15 <cdent> #link last meeting http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-06-18-14.00.html
14:04:23 <bauzas> \o
14:04:28 <edleafe> \o
14:04:31 <cdent> #topic last meeting
14:04:43 <cdent> anything anyone want to address from the last meeting?
14:05:07 <cdent> #topic specs and review
14:05:07 <cdent> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131752.html
14:05:19 <cdent> any pending work that people need to discuss in person?
14:05:23 <cdent> "in person"
14:06:45 <cdent> I guess that's a "no"?
14:06:51 <mriedem> o/ (late)
14:07:03 <cdent> you got something mriedem ?
14:07:07 <mriedem> no
14:07:09 <mriedem> just here
14:07:24 <cdent> cool
14:07:24 <cdent> #topic bugs
14:07:25 <cdent> #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:09:13 <mriedem> https://bugs.launchpad.net/nova/+bug/1777591 is interesting
14:09:13 <openstack> Launchpad bug 1777591 in OpenStack Compute (nova) "‘limit’ in allocation_candidates where sometimes make fore_hosts invalid" [Undecided,In progress] - Assigned to xulei (605423512-j)
14:09:45 <mriedem> and is a similar problem i think to the rebuild + image-defined traits issue
14:10:10 <mriedem> https://review.openstack.org/#/c/569498/
14:13:26 * bauzas raises fist at force_hosts
14:14:11 <efried> Before you put it in that context, my best stab at fixing this (actually I think it was cdent's idea) was to add a queryparam to GET /allocation_candidates letting you restrict to particular provider UUID(s).
14:14:55 <efried> mriedem: Do you have a different idea?
14:15:13 <mriedem> efried: i saw cdent's comment on the bug, and just said the same on the patch
14:15:17 <mriedem> and yes that's what we should do imo
14:15:23 <mriedem> we already have that on GET /resource_providers
14:15:51 <mriedem> i.e. we likely need the same in_tree param in GET /allocation_candidates
14:16:12 <efried> okey dokey. Stein, I'm assuming.
14:16:26 <mriedem> maybe, it is a bug fix
14:16:38 <mriedem> for what i think is a pretty common scenario (for admins)
14:16:57 <efried> I was thinking less from a procedural-bp-approval point of view and more of a we've-already-got-a-shit-ton-on-our-plate perspective.
14:16:58 <mriedem> there might be a backportable workaround...
14:17:12 <mriedem> yeah i understand, but we can't backport microversions
14:17:33 <efried> The workaround is "set your limit higher".
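[editor's note] A minimal sketch of the query efried and mriedem propose above, assuming GET /allocation_candidates eventually accepts an in_tree query parameter analogous to the one GET /resource_providers already supports. The endpoint URL, token, microversion, provider UUID, and resource string are placeholders, not an agreed-on API.

    # Hedged sketch: "in_tree" on GET /allocation_candidates is the *proposed*
    # parameter discussed above, not something that exists at the time of this
    # meeting. Endpoint, token, and microversion are placeholders.
    import requests

    PLACEMENT = 'http://placement.example.com/placement'   # placeholder endpoint
    HEADERS = {
        'X-Auth-Token': '<admin-token>',                    # placeholder token
        'OpenStack-API-Version': 'placement 1.XX',          # microversion TBD
    }

    # Restrict candidates to the tree rooted at the forced host's resource
    # provider, so "limit" can no longer trim that host out of the results.
    resp = requests.get(
        PLACEMENT + '/allocation_candidates',
        params={
            'resources': 'VCPU:1,MEMORY_MB:2048,DISK_GB:20',
            'in_tree': '<forced-host-rp-uuid>',             # proposed new param
            'limit': 10,
        },
        headers=HEADERS,
    )
    resp.raise_for_status()
    candidates = resp.json()['allocation_requests']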
14:17:40 <mriedem> so unless we have a workaround we can hack into the scheduler, it's kind of a big deal - alternatively your workaround is just set the limit to -1
14:17:42 <bauzas> MHO is maybe we should just not ask for a limit if operators use force_hosts
14:17:54 <bauzas> that's their responsibility
14:19:54 <mriedem> so for rocky,
14:20:02 <efried> Are we suggesting in Rocky (and backportable) to override limit to -1 if force_hosts is set?
14:20:02 <efried> Or to document as a workaround that the operator should set a higher limit overall?
14:20:23 <mriedem> we could (1) update the max_placement_results option help for the known issue (2) add a known issues release note and (3) update the docs for force hosts to mention this as well https://docs.openstack.org/nova/latest/admin/availability-zones.html
14:20:27 <bauzas> force_hosts is only used by admins
14:20:37 <bauzas> so they know about the number of compute nodes they have
14:20:38 <mriedem> efried: i think it's a docs bug in rocky
14:20:46 <efried> bah, I will *never* get the operator/admin terminology right.
14:20:47 <mriedem> and we could fix in stein with a pre-request placement filter
14:20:58 <mriedem> i left comments on https://review.openstack.org/#/c/576693/
14:21:32 <efried> fwiw, I don't think it's a terrible workaround to disable limit when force_hosts is used.
14:21:53 <edleafe> the two options seem exclusive
14:22:07 <efried> yes
14:22:20 <efried> One is automatic
14:22:32 <efried> And the manual one - isn't the limit done by a conf option?
14:23:04 <mriedem> it's still shitty for performance
14:23:10 <efried> So you would have to set it globally for the sake of a single spawn?
14:23:14 <mriedem> you could pull back 10K candidates just to find the one you want
14:23:21 <edleafe> you could add a host_id argument to limit the amount of returned a/cs to either 1 (if it has the resources) or 0 (if it doesn't)
14:23:27 <mriedem> yeah max_placement_results is config
14:23:48 <efried> Yes, you could, but that's better to do for that single call than to have to set it globally so it happens for *every* request.
14:23:56 <mriedem> efried: true
14:24:13 <edleafe> Just got a message from cdent: his network is down, and is trying to get back online
14:24:29 <efried> figured something like that, thx edleafe
14:25:22 <mriedem> efried: also left that alternative on https://review.openstack.org/#/c/576693/
14:25:55 <mriedem> best thing probably is to start with a functional recreate
14:26:00 <mriedem> and then we can tease out that idea
14:26:31 <mriedem> 2 hosts, limit=1 and force to one of them
14:26:33 <efried> functional recreate is going to be tough, considering we can't predict the order in which results come back from the db.
14:26:42 <mriedem> default order is based on id isn't it?
14:26:50 <efried> I don't think it's that predictable.
14:27:10 <efried> Especially the way the code is set up now, where we shuffle alloc cands around in dicts all over the place while we're filtering/merging them.
14:27:34 <efried> It's definitely not *documented* as being based on ID if that's what you're asknig.
14:27:36 <efried> asking
14:27:40 <mriedem> so i guess randomize_allocation_candidates isn't really useful?
14:27:44 <efried> it is
14:27:45 <mriedem> "If False, allocation candidates
14:27:45 <mriedem> are returned in a deterministic but undefined order."
14:28:03 <efried> Right, meaning for the same env you would get back the same order every time.
14:28:22 <efried> but you can't rely on what that order would be.
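[editor's note] A hedged sketch of the Rocky-era workaround bauzas and efried describe (drop the limit entirely when hosts are forced), kept as a standalone helper because where it would live in the scheduler is not settled here. The name pick_ac_limit is illustrative, not an existing nova function; the real knobs it refers to are RequestSpec.force_hosts/force_nodes and the [scheduler]max_placement_results config option.

    def pick_ac_limit(force_hosts, force_nodes, max_placement_results):
        """Choose the allocation-candidate limit to send to placement.

        Hypothetical helper: when the request forces specific hosts or
        nodes, return None so no limit is passed and the forced host
        cannot be filtered out of the candidate set; otherwise honour
        the configured [scheduler]max_placement_results value.
        """
        if force_hosts or force_nodes:
            return None
        return max_placement_results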
14:28:33 <efried> And if one teensy thing about the env changes, the whole thing could be different.
14:29:49 <efried> also keep in mind that 2 hosts doesn't necessarily == 2 candidates IRL - because sharing & nested.
14:30:13 <efried> not that we have to do that in test
14:30:36 <mriedem> i wouldn't use sharing or nested in the functional test
14:30:44 <mriedem> but if it's a worry then i guess we just unit test it, but that kind of sucks
14:31:03 <mriedem> i don't know what else to do though
14:31:06 <efried> Not sure if we need a specific recreate for "NoValidHosts when using force_host".
14:31:31 <efried> We can have a functional test where the conf is set up for a lower limit and just prove we get all results (higher than that limit) when force_hosts is set.
14:32:07 <efried> I think I get what you're saying, though, that that test would then not be reusable for the "real" fix.
14:32:10 <mriedem> meh, that's probably fine for a unit test - if force_hosts, assert get allocation candidates was called w/o a limit
14:32:14 <efried> But I don't see an alternative offhand.
14:32:23 <mriedem> anyway, i think we can move on
14:32:56 <mriedem> since dansmith added max_placement_results he might have opinions too
14:33:11 <mriedem> but can take that to -nova after the meeting
14:33:21 <dansmith> makes sense
14:34:45 <efried> Any other bugs to bring up?
14:34:56 <efried> #topic open discussion
14:35:10 <efried> Bueller? Bueller?
14:35:55 <efried> Okay then. Keep calm and carry on.
14:35:58 <efried> #endmeeting
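[editor's note] The unit-test shape mriedem sketches above ("if force_hosts, assert get allocation candidates was called w/o a limit") could look roughly like the following. It is written against the hypothetical pick_ac_limit helper from the earlier sketch rather than real nova code, since the eventual patch location is left open in the meeting.

    import unittest

    # Repeat of the hypothetical helper from the earlier sketch, so this
    # snippet is self-contained; it is not an existing nova function.
    def pick_ac_limit(force_hosts, force_nodes, max_placement_results):
        if force_hosts or force_nodes:
            return None
        return max_placement_results

    class TestForceHostsLimit(unittest.TestCase):
        def test_limit_dropped_when_host_forced(self):
            # 2 hosts, limit=1, force one of them: no limit should be sent.
            self.assertIsNone(pick_ac_limit(['host2'], [], 1))

        def test_limit_kept_when_not_forced(self):
            self.assertEqual(1, pick_ac_limit([], [], 1))

    if __name__ == '__main__':
        unittest.main()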