13:59:59 <efried> #startmeeting nova_scheduler
14:00:00 <openstack> Meeting started Mon Jul 16 13:59:59 2018 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:03 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:14 <cdent> o_
14:00:17 <takashin> o/
14:00:23 <gibi> o/
14:00:24 <efried> Get up cdent!
14:00:24 <alex_xu> o/
14:00:32 * cdent is tired
14:00:34 <edleafe> \o
14:01:26 <efried> #topic last meeting
14:01:38 <efried> #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-07-09-14.01.html
14:01:52 <efried> Any old business to bring up?
14:02:51 <jaypipes> o/
14:02:54 <tssurya> o/
14:02:56 <efried> #topic specs and review
14:03:03 <efried> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-July/132252.html (Thanks for covering this, Jay)
14:03:27 <efried> Highest priority at the moment is the
14:03:27 <efried> #link reshaper series: https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open
14:04:08 <efried> cdent: jaypipes and I are working through stuff this morning.
14:04:15 <efried> s/:/,/
14:04:47 <cdent> I _almost_ managed to reshape something. but not quite
14:05:04 <jaypipes> cdent: what did you run into?
14:05:20 <jaypipes> cdent: you mean reshaping with gabbits? or something else?
14:05:23 <cdent> resource provider generation conflict in set_allocations
14:05:35 <cdent> yes, reshaping over http
14:05:47 <cdent> I noted it in my latest push. it's the last test in reshaper.yaml
14:05:52 <jaypipes> ack
14:05:56 <jaypipes> ok, will look shortly.
14:06:06 <efried> Last week we stuffed all of the patches into a single series; we'll see if multiple-authors-on-a-series is more or less painful than maintaining dependencies in other ways.
14:07:11 <efried> Note that there's a
14:07:11 <efried> #link reshaper spec update: https://review.openstack.org/582350
14:07:11 <efried> open to collect design tweaks as needed for implementation.
14:07:27 * efried re -Ws that...
14:08:19 <efried> Going to hold that open until we're pretty much done. Folks should feel free to submit patch sets there.
14:08:46 <efried> Any other specs or reviews people would like to highlight at this time?
14:08:57 <efried> or discussion needed on the above?
14:09:02 <jaypipes> not from me, no
14:09:10 <cdent> carry on
14:09:25 <efried> #topic bugs
14:09:25 <efried> #link placement bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id
14:10:34 <efried> We've closed the more urgent consumer gen related bugs, which was probably the most important/urgent thing.
14:10:56 <jaypipes> https://bugs.launchpad.net/nova/+bug/1781439 should be an easy one for people looking for low-hanging-fruit
14:10:57 <openstack> Launchpad bug 1781439 in OpenStack Compute (nova) "Test & document 1.28 (consumer gen) changes for /resource_providers/{u}/allocations" [Undecided,New]
14:11:00 * jaypipes adds tag in LP
14:11:34 <jaypipes> I'll shoot a note to the ML
14:11:51 <efried> I'd like to point out that bug #1781430 should still be fixed, but is no longer in the critical path since latest tweaks to the reshaper db patch.
14:11:51 <openstack> bug 1781430 in OpenStack Compute (nova) "AllocationList.delete_all() incorrectly assumes a single consumer" [High,In progress] https://launchpad.net/bugs/1781430 - Assigned to Jay Pipes (jaypipes)
14:12:46 <efried> Any other bugs anyone would like to highlight or discuss at this time?
14:13:33 <jaypipes> efried: err, isn't that bug fixed in the bottom patch of the reshaper series?
14:13:46 <jaypipes> i.e. https://review.openstack.org/#/c/582382/
14:13:59 <efried> jaypipes: Yes it is, but not yet merged, and no longer essential for the series.
14:13:59 <jaypipes> efried: or were you thinking of a different bug?
14:14:13 <efried> jaypipes: I don't anticipate it coming to this, but if necessary, it could be yanked out of the series.
14:14:23 <gibi> efried: that fix is on the gate now :)
14:14:29 <jaypipes> k
14:14:32 <efried> gibi: Thanks. That's the easiest thing :)
14:15:08 <efried> moving on...
14:15:15 <efried> #topic opens
14:15:15 <efried> Planning/Doing support in nova/report client for: consumer generation handling
14:15:56 <cdent> I added that because I figured we needed the reminder, but I don't have much to say beyond that
14:16:30 <efried> This entails scouring the rt/reportclient for stuff related to allocations, bumping the calls to microversions that handle consumer generations, and possibly handling retries/races accordingly.
14:17:12 <efried> I'm guessing cdent jaypipes efried will probably not have the bandwidth to look at this until reshaper winds down. If anyone else wishes to jump in here, that help would be welcomed.
14:17:50 <gibi> efried, cdent: I will try to keep that on my radar
14:17:59 <efried> Thanks gibi
14:18:14 <efried> next up:
14:18:14 <efried> Planning/Doing support in nova/report client for: nested and shared providers when modifying migration (and other?) allocations
14:18:48 <efried> Similar in spirit to the above, though probably considerably more complicated.
14:19:31 <gibi> efried: also I think a bit depends on the former as the nested ac is a higher microversion than the consumer gen
14:20:14 <cdent> yeah, that too was "we needed the reminder"
14:20:44 <efried> gibi: Well, the nested microversion (1.29) only affects GET /a_c; whereas the consumer gen (1.28) affects the [resource_providers/{u}]/allocations[/{u}] paths.
14:20:49 <efried> So they can still be done mutually exclusively.
14:20:53 <efried> However,
14:21:27 <efried> I think we should broaden this topic (or add a precursor) for making sure our initial allocations work with nested + shared in the first place.
14:21:49 <efried> We have one example where we've proven shared works on initial allocations - that's libvirt with shared DISK_GB.
14:22:28 <efried> I believe gibi is working on a patch for some func tests along these lines with nrp. Finding...
14:22:49 <gibi> efried: here is the patch https://review.openstack.org/#/c/527728/ it shows that we need 1.29 for nested a_C
14:23:02 <efried> beaut.
14:24:34 <efried> next up:
14:24:34 <efried> Planning/Doing support in nova/report client for: whatever else is not being remembered right now
14:24:34 <efried> Can anyone think of more things we should do on the rt/reportclient side to exploit work we've done in placement lately?
14:25:07 <cdent> I think we need to be clear to separate what we must do from what we'd like to do. I'm not clear where that boundary currently is
14:25:22 <efried> Agree with that.
14:25:59 <efried> We know we need reshaper before we can support nrp for vgpu or numa. That's an easy one.
14:26:23 * cdent nods
14:27:26 <gibi> also we need 1.28 and 1.29 support in the report client to support nrp at all outside of reshaping situations
14:27:40 * mriedem forgot the meeting started
14:27:43 <efried> Do we need 1.28 though?
14:28:02 <efried> Do we actually need consumer gens for anything from nova right now, considering the big lock in the rt?
14:28:28 <gibi> efried: I feel dangerous getting back allocation candidates from 1.29 and passing it back those to /allocations < 1.28
14:29:08 <efried> That's fair.
14:30:35 <efried> So that reminds me: generation support for aggregate operations.
14:31:10 <efried> #link Check provider generation and retry on conflict: https://review.openstack.org/#/c/556669/
14:31:44 <efried> This *is* crucial because we *do* have a race on aggregates, since we're mirroring host aggs in the api service.
14:32:10 <efried> This patch has been through the wringer a fair bit, but I think it's ready to be reviewed now.
14:32:29 <efried> cdent, jaypipes, mriedem: you have had eyes on this previously; would you mind having another look please?
14:32:38 <cdent> yessir
14:33:11 <mriedem> currently working on fixing https://bugs.launchpad.net/nova/+bug/1781710
14:33:11 <openstack> Launchpad bug 1781710 in OpenStack Compute (nova) "ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity failing with "Servers are on the same host"" [High,Triaged] - Assigned to Matt Riedemann (mriedem)
14:33:15 <mriedem> but yeah for later
14:34:24 <efried> So I think making sure both initial and migrating allocs work for shared+nested <= this is in the critical path for nrp support, possibly even more important than reshaper because it affects even initial nrp impls (those not requiring reshaper to get started).
14:35:17 <efried> cdent: Sounds like the answer is: It's all "must do" o_O
14:35:34 <cdent> yay?
14:35:50 <gibi> business as usual
14:36:47 <efried> If nobody else has volunteered by the time reshaper winds down, I'll probably hit that last one, because I'm going to be implementing such a driver (initial nrp not requiring reshaper).
14:37:40 <cdent> i'll have a more clear picture in a few days
14:38:41 <efried> Okay; any other open discussion topics?
14:39:59 <efried> Here's one then: anyone feel like (co-)proposing a placement (or other scheduler) topic for Berlin?
14:40:59 <cdent> I'm trying really hard to avoid presenting at summit.
14:41:32 <cdent> jaypipes: you continuing your plan of not going?
14:42:08 <gibi> efried: I and others proposed one for bandwidth
14:44:39 <efried> Okay. I'm about 20% motivated to propose another placement update like the last one. It went well, but not sure the value:effort ratio is high enough.
14:45:07 <cdent> efried: check with me about joining for one wherever the post berlin one is. I should be off the tc by then
14:45:18 <efried> roger wilco.
14:45:28 <efried> Okay, any other topics before we close?
14:45:48 <cdent> let's call it
14:46:15 <efried> Thanks all.
14:46:15 <efried> #endmeeting
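
Illustrative note on the "check generation and retry on conflict" idea raised at 14:16:30 and 14:31:10: a minimal sketch, assuming a service that rejects a write whenever the caller's generation is stale. Everything below (FakePlacement, GenerationConflict, add_aggregate, the before_write hook) is hypothetical and in-memory. It is not the code under review in the linked patches, and it makes no real placement API calls.

```python
class GenerationConflict(Exception):
    """Hypothetical error: the caller's generation is stale."""


class FakePlacement:
    """In-memory stand-in for a service that guards writes with a generation."""

    def __init__(self):
        self.generation = 0
        self.aggregates = set()

    def get_aggregates(self):
        # Return a snapshot of the data plus the generation it was read at.
        return set(self.aggregates), self.generation

    def put_aggregates(self, aggregates, generation):
        # Reject the write if someone else has written since the caller read.
        if generation != self.generation:
            raise GenerationConflict()
        self.aggregates = set(aggregates)
        self.generation += 1


def add_aggregate(placement, agg_uuid, max_attempts=3, before_write=None):
    """Read-modify-write; re-read and retry when the generation is stale.

    before_write is a hypothetical hook used only to simulate a racing writer.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        current, generation = placement.get_aggregates()
        if before_write is not None:
            before_write(attempt)
        try:
            placement.put_aggregates(current | {agg_uuid}, generation)
            return attempt
        except GenerationConflict:
            continue  # Stale view: loop to re-read and try again.
    raise GenerationConflict("gave up after %d attempts" % max_attempts)


placement = FakePlacement()


def racing_writer(attempt):
    # Simulate another service writing between our read and our first write.
    if attempt == 1:
        aggs, gen = placement.get_aggregates()
        placement.put_aggregates(aggs | {"agg-other"}, gen)


attempts_used = add_aggregate(placement, "agg-mine", before_write=racing_writer)
print(attempts_used, sorted(placement.aggregates), placement.generation)
```

The point of re-reading inside the loop, rather than resending the same payload, is that the second attempt merges with the racing writer's change instead of overwriting it.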