13:59:59 <jaypipes> #startmeeting scheduler
14:00:00 <openstack> Meeting started Mon Sep 17 13:59:59 2018 UTC and is due to finish in 60 minutes. The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:03 <openstack> The meeting name has been set to 'scheduler'
14:00:15 <jaypipes> good morning/evening all.
14:01:04 * bauzas yawns
14:01:08 <jaypipes> tetsuro, edleafe, mriedem, dansmith, bauzas, gibi: hi
14:01:09 <gibi> o/
14:01:11 <wkite> o/
14:01:31 <mriedem> o/
14:01:54 <jaypipes> #topic quick recap of placement/scheduler topics from PTG
14:02:26 * dansmith snorts
14:02:43 <jaypipes> #link https://etherpad.openstack.org/p/nova-ptg-stein
14:02:55 * bauzas gently reminds that he has to leave for 20 mins at 1420UTC
14:03:07 <jaypipes> there were a number of placement-related topics (as always) at the PTG
14:03:41 <jaypipes> along with a fairly lengthy discussion on the status and milestones related to placement extraction
14:03:59 <jaypipes> edleafe: would you like to summarize the extraction bits?
14:05:25 <jaypipes> Ed may be on his way to the office, so let me try
14:06:28 <jaypipes> melwitt summarized the decisions regarding the governance items nicely in a ML post:
14:06:31 <jaypipes> #link http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html
14:07:50 <jaypipes> That ML post lists the items that we're aiming to focus on to finalize the path for final extraction of placement. The items revolve around testing of the upgrade paths and implementing support for reshaper for the vGPU use cases
14:08:21 <jaypipes> bauzas is responsible for the libvirt vGPU reshaper efforts and Naichuan Sun is responsible for the vGPU efforts for the Xen virt driver
14:08:57 <jaypipes> gibi: perhaps you might give a quick status report on the extraction patch series since I'm not familiar with the progress there?
14:09:31 <gibi> honestly I also need to catch up on what is happening on the placement side
14:09:50 <gibi> what I know is that we see green test results with the new repo
14:09:53 <mriedem> we need to do the grenade stuff
14:10:05 <gibi> yeah, next step is grenade I guess
14:10:07 <mriedem> first step is writing the db table copy and dump script
14:10:12 <mriedem> and then integrate that into grenade
14:10:40 <mriedem> i've got a patch up to grenade for adding a postgresql grenade job to the experimental queue as well so anyone adding pg support for the upgrade script can test it
14:10:59 <gibi> In parallel I would like to make the nova functional tests run with the extracted placement repo
14:11:02 <jaypipes> #link latest etherpad on placement extraction bits: https://etherpad.openstack.org/p/placement-extract-stein-3
14:11:24 <mriedem> #link the grenade postgresql job patch https://review.openstack.org/#/c/602124/
14:12:23 <jaypipes> ok, thanks gibi and mriedem
14:12:47 <jaypipes> #topic placement and scheduler blueprints for Stein
14:13:36 <wkite> Hi, I am working on a joint scheduler for nova and zun based on NUMA and pinned CPUs, could anyone give me some advice?
14:14:11 <jaypipes> wkite: sure, in a little bit. let me get through the status parts of the meeting?
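For context on the "db table copy and dump script" step mriedem mentions above, here is a minimal sketch of what that step might look like, assuming a mysqldump-based approach. The real migration script had not been written at the time of this meeting; the table list, database names, and tooling below are assumptions for illustration only.

```python
#!/usr/bin/env python3
"""Sketch: copy the placement-related tables out of the nova_api DB.

NOT the real migration script; table names and database names are assumed.
"""
import subprocess

# Placement-related tables living in the nova_api database (assumed list).
PLACEMENT_TABLES = [
    "resource_providers", "inventories", "allocations", "consumers",
    "projects", "users", "resource_classes", "traits",
    "resource_provider_traits", "resource_provider_aggregates",
    "placement_aggregates",
]


def copy_placement_tables(src_db="nova_api", dst_db="placement"):
    """Dump the placement tables from the source DB and load them into the
    new standalone placement database."""
    dump = subprocess.run(
        ["mysqldump", src_db] + PLACEMENT_TABLES,
        check=True, capture_output=True)
    subprocess.run(["mysql", dst_db], input=dump.stdout, check=True)


if __name__ == "__main__":
    copy_placement_tables()
```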
14:14:46 <gibi> I thought that we are pretty frozen at the moment regarding new features in placement
14:15:19 <jaypipes> gibi: there are plenty of blueprints targeting the placement and scheduler services in Stein, though
14:15:46 <mriedem> like https://blueprints.launchpad.net/nova/+spec/use-nested-allocation-candidates !
14:16:01 <gibi> which I'm working on :)
14:16:02 <mriedem> that's just all nova-scheduler side stuff
14:16:16 <jaypipes> right. this meeting is still the scheduler meeting is it not? :)
14:16:24 <mriedem> i guess
14:16:32 <mriedem> can someone summarize the consumer gen thread?
14:16:37 <gibi> sure
14:16:55 <gibi> so placement 1.28 added the consumer generation for allocations
14:17:13 <gibi> to use this the scheduler report client needs some change
14:17:34 <bauzas> what's unfun is that we merged 1.30 (reshape) and used it before nova used 1.29
14:17:42 <bauzas> (nested alloc candidates)
14:17:52 <jaypipes> bauzas: I don't see why that matters.
14:17:56 <gibi> in general, either nova creates a new consumer, and then nova is sure that the generation is None
14:17:57 <bauzas> but 1.30 requires 1.29 to be implemented on the client side
14:18:17 <gibi> or nova updates an existing consumer, and then nova asks placement about the generation of the consumer to be updated
14:18:29 <bauzas> jaypipes: because reshaping implies that a boot will fail unless nova speaks nested alloc candidates
14:18:32 <mriedem> because now we have more than just nova doing things, like in the bw providers case
14:19:11 <gibi> if in any case placement returns a consumer generation conflict, nova will fail the instance workflow operation
14:19:13 <bauzas> because resources can be on children
14:19:17 <mriedem> even though nova and neutron are working with the same consumer right? the instance uuid.
14:19:47 <gibi> neutron does not manipulate allocations
14:19:51 <gibi> just reporting inventories
14:19:52 <gibi> at the moment
14:20:04 <jaypipes> right. all allocation is done via claim_resources()
14:20:23 <gibi> somewhere in the future when the bandwidth of a port needs to be resized neutron might want to touch allocations
14:20:26 <gibi> but not now
14:21:09 <bauzas> ... and I need to disappear
14:21:32 <jaypipes> mriedem: are you asking gibi to summarize the entire consumer generation patch series? or something else?
14:21:34 <gibi> the implementation to support consumer generation is basically ready for review
14:22:04 <mriedem> i was asking to summarize the ML thread which contributes to the code series i assume
14:22:04 <gibi> the patch series starts here https://review.openstack.org/#/c/591597
14:22:22 <mriedem> i was wondering why we have such a big ML thread about this and what the big changes are to nova before i actually review any of this
14:22:42 <mriedem> if it's just, "this makes the nova client side bits (SchedulerReportClient) aware of consumer generations" that's fine
14:23:07 <gibi> mriedem: my concern was what to do if placement returns a consumer generation conflict. a) retry, b) fail hard, c) fail soft, let the user retry
14:23:14 <jaypipes> mriedem: it makes the nova client side *safe* for multiple things to operate on an allocation.
14:23:29 <gibi> the answer was mostly b) fail hard
14:23:32 <jaypipes> yep
14:23:46 <gibi> so the patch series now makes a consumer conflict a hard failure, with the instance ending up in ERROR state
14:23:49 <mriedem> what besides the scheduler during the initial claim is messing with the allocations created by the scheduler?
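A minimal sketch of the report-client change gibi describes above: writing allocations with a consumer generation (placement microversion 1.28+) and failing hard on a conflict, per option b). The endpoint URL, the omitted auth handling, and the exception name are assumptions; this is not the actual SchedulerReportClient code.

```python
"""Sketch: PUT allocations with a consumer generation, fail hard on 409."""
import requests

PLACEMENT = "http://placement.example.com"             # assumed endpoint
HEADERS = {"OpenStack-API-Version": "placement 1.28"}  # consumer generations
# (a real client would also send an auth token header)


class ConsumerGenerationConflict(Exception):
    """Someone else updated this consumer's allocations concurrently."""


def put_allocations(consumer_uuid, allocations, generation,
                    project_id, user_id):
    """Write allocations for a consumer.

    ``generation`` is None for a brand-new consumer, or the value placement
    returned for the existing consumer being updated -- the two cases gibi
    describes above.
    """
    payload = {
        "allocations": allocations,   # {rp_uuid: {"resources": {...}}}
        "consumer_generation": generation,
        "project_id": project_id,
        "user_id": user_id,
    }
    resp = requests.put(
        "%s/allocations/%s" % (PLACEMENT, consumer_uuid),
        json=payload, headers=HEADERS)
    if resp.status_code == 409:
        # Option b) from the discussion: fail hard and let the instance
        # go to ERROR rather than retrying silently.
        raise ConsumerGenerationConflict(resp.text)
    resp.raise_for_status()
```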
14:23:49 <jaypipes> gibi: which was the safest choice.
14:24:07 <jaypipes> mriedem: reshaper.
14:24:20 <gibi> mriedem: all the instance move operations, by moving allocations from instance.uuid to migration.uuid and back on revert
14:24:31 <jaypipes> mriedem: along with anything that does migrations or resizes.
14:24:45 <mriedem> ok
14:25:12 <gibi> mriedem: the nasty things are force evacuate and force migrate
14:25:19 <jaypipes> as always.
14:25:22 <gibi> mriedem: they allocate outside of the scheduler
14:25:44 <mriedem> yeah they still do the allocations though
14:25:46 <mriedem> like the scheduler
14:25:59 <mriedem> but with a todo, from me, since i think pike
14:26:18 <gibi> mriedem: yes, they do, just via a different code path
14:26:22 <mriedem> yup
14:26:30 <mriedem> ok we can probably move on - i just needed to get caught up
14:26:44 <jaypipes> mriedem: right, but they don't currently handle failures due to consumer generation mismatch, which is what Gibi's patch series does (sets instances to ERROR if >1 thing tries updating allocations for the same instance at the same time)
14:27:17 <jaypipes> ok, yes, let's move on.
14:27:32 <jaypipes> #topic open discussion
14:27:47 <jaypipes> #action all to review gibi's consumer generation patch series
14:27:59 <gibi> \o/
14:28:20 <jaypipes> #link Gibi's consumer generation patch series: https://review.openstack.org/#/q/topic:bp/use-nested-allocation-candidates+(status:open+OR+status:merged)
14:28:32 <jaypipes> ok, open discussion now
14:28:38 <jaypipes> wkite: hi
14:28:55 <jaypipes> wkite: can you give us a brief summary of what you are trying to do?
14:29:07 <wkite> ok
14:31:11 <wkite> I am trying to use placement to store the NUMA topology for nova and zun, so that the scheduler can get the NUMA topology from placement and then do the scheduling work
14:31:21 <wkite> for both nova and zun
14:32:17 <jaypipes> wkite: a NUMA topology isn't a low-level resource. It's not possible to "consume a NUMA topology" from placement because a NUMA topology is a complex, non-integer resource.
14:32:43 <wkite> jaypipes: yes
14:33:00 <jaypipes> wkite: now, if you were to consume some CPU resources or memory resources from a NUMA cell, now that is something we could model in placement.
14:33:02 <wkite> a json object
14:33:16 <jaypipes> wkite: we have no plans to allow resources in placement to be JSON objects.
14:33:21 <dansmith> WUT
14:33:25 <dansmith> but I wanna!
14:33:29 <jaypipes> dansmith: stop. :)
14:34:32 <jaypipes> wkite: the solution that we've discussed is to keep the NUMATopologyFilter inside the nova-scheduler to handle placement of a virtual guest CPU topology on top of a host NUMA topology.
14:35:29 <jaypipes> wkite: while keeping placement focused on atomic, integer resources. basically, placement is complementary to the nova-scheduler. for simple integer-based capacity calculations, placement is used. for complex placement/topology decisions, the nova/virt/hardware.py functions are called from within the nova-scheduler's NUMATopologyFilter
14:35:59 <jaypipes> wkite: if you are creating a separate scheduler service for Zun, my advice would be to follow the same strategy.
14:37:43 <jaypipes> wkite: if you'd like to discuss this further, let's move the conversation to #openstack-placement and I can fill you in on how the nova and placement services interact with regards to NUMA topology decisions.
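A small illustration of the modelling jaypipes describes above: NUMA cells become child resource providers carrying plain integer inventories, while the complex topology-fitting logic stays in the nova-scheduler's NUMATopologyFilter. The provider names and inventory sizes below are made up for the example.

```python
"""Sketch: a compute node's NUMA cells as a placement-style provider tree.

Names and numbers are illustrative only.
"""

provider_tree = {
    "name": "compute-node-1",
    "inventories": {},              # no consumable resources on the root
    "children": [
        {
            "name": "compute-node-1_numa0",
            "inventories": {"VCPU": 8, "MEMORY_MB": 16384},
        },
        {
            "name": "compute-node-1_numa1",
            "inventories": {"VCPU": 8, "MEMORY_MB": 16384},
        },
    ],
}

# Placement can then answer "which cells have 4 VCPU and 4096 MB free?" as a
# simple integer-capacity question, while deciding how a guest's virtual
# topology maps onto those cells remains a scheduler-side (hardware.py) job.
```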
14:38:07 <wkite> I want to run both nova and zun on one host by sharing the pinned CPUs, but what should I do?
14:39:11 <jaypipes> wkite: if you share a pinned CPU, it's no longer pinned is it? :)
14:40:17 <jaypipes> wkite: I mean... a pinned CPU is dedicated to a particular workload. if you then have another workload pinned to that same CPU, then the CPU is shared among workloads and is no longer dedicated.
14:40:52 <wkite> both of them use their own pinned CPUs
14:41:21 <mriedem> so a vm uses [1,2] and a container uses [3,4]?
14:41:25 <mriedem> on the same host
14:41:29 <wkite> yes
14:41:45 <jaypipes> wkite: here is a spec you read that discusses dedicated and shared CPU resource tracking and our plans for this in Stein: https://review.openstack.org/#/c/555081/
14:41:56 <jaypipes> that you should read...
14:42:28 <wkite> jaypipes: ok, thank you
14:42:47 <jaypipes> np. like I said, if you'd like to discuss this further, join us on #openstack-placement and we can discuss there.
14:42:51 <jaypipes> wkite: ^
14:43:55 <jaypipes> ok, in other open discussion items... I still need to write the "negative member_of" spec. I'll do that today or tomorrow and get with sean-k-mooney on his nova-side specs for the placement request filters that will use negative member_of.
14:44:05 <wkite> jaypipes: ok
14:44:59 <jaypipes> does anyone have any other items to discuss? otherwise, I'll wrap up the meeting.
14:45:24 <tetsuro> jaypipes: I was wondering if I can take that negative member_of spec.
14:45:47 <jaypipes> tetsuro: sure, if you'd like it, I have not started it, so go for it.
14:45:57 <tetsuro> Thanks
14:46:20 <jaypipes> tetsuro: thank YOU. :)
14:46:27 <tetsuro> Since I have sorted out how the existing member_of param works on nested/shared in https://review.openstack.org/#/c/602638/
14:47:06 <tetsuro> I'd like to clarify how the negative member_of param should work as well
14:47:36 <tetsuro> And that should relate to the bug I opened^
14:47:55 <tetsuro> np
14:49:23 <jaypipes> tetsuro: well, the negative member_of is just "make sure the provider trees that match an allocation candidate request group are NOT associated with one or more provider aggregates"
14:50:40 <jaypipes> tetsuro: and the bug you reference above is making sure that member_of refers to the aggregate associations of the entire provider tree instead of just a single provider, right?
14:50:57 <tetsuro> Yup, so my question is does an aggregate on the root provider span the whole tree as well?
14:51:11 * bauzas is back
14:51:18 <tetsuro> for negative aggregate cases
14:51:33 <tetsuro> s/negative member_of cases
14:52:30 <tetsuro> if the root is in aggregate A and a user specifies !member_of=aggA, non of the nested rps under the root can be picked up.
14:52:45 <jaypipes> tetsuro: I believe in Denver we agreed that for the non-numbered request group, a member_of query parameter means that the provider *tree*'s associated aggregates are considered (in other words, look for the provider aggregates associated with the root_provider_id of providers). And for numbered request groups, the single provider_id (not root_provider_id) would be used for the member associations.
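A sketch of the member_of semantics recalled above, expressed as allocation-candidate queries. The negative form had no spec yet at the time of this meeting, so the "!" prefix is purely illustrative; the endpoint, microversion, aggregate UUID, and omitted auth are also assumptions.

```python
"""Sketch: member_of on unnumbered vs. numbered request groups."""
import requests

PLACEMENT = "http://placement.example.com"             # assumed endpoint
HEADERS = {"OpenStack-API-Version": "placement 1.30"}  # assumed version
AGG_A = "11111111-2222-3333-4444-555555555555"         # made-up aggregate

# Unnumbered group: member_of is evaluated against the aggregates of the
# whole provider *tree* (the root provider's associations).
requests.get("%s/allocation_candidates" % PLACEMENT, headers=HEADERS,
             params={"resources": "VCPU:2,MEMORY_MB:2048",
                     "member_of": AGG_A})

# Numbered group: member_of1 applies to the specific provider satisfying
# resources1, not to the whole tree.
requests.get("%s/allocation_candidates" % PLACEMENT, headers=HEADERS,
             params={"resources1": "VCPU:2", "member_of1": AGG_A})

# Hypothetical negative form ("candidates NOT in aggregate A"); the real
# syntax was left to the forthcoming negative member_of spec.
requests.get("%s/allocation_candidates" % PLACEMENT, headers=HEADERS,
             params={"resources": "VCPU:2", "member_of": "!" + AGG_A})
```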
14:52:46 <tetsuro> s/non/none
14:53:31 <jaypipes> gibi, dansmith, mriedem: if you remember that conversation, could you verify my recollection is accurate for ^^
14:54:00 <dansmith> aye
14:54:04 * bauzas nods
14:54:06 <dansmith> because the numbered ones are more specific
14:54:17 <dansmith> or more prescriptive
14:54:44 <bauzas> and I remember we said it's a bugfix
14:55:03 <tetsuro> That's my understanding, too.
14:55:06 <gibi> seems correct. this means a negative member_of on a root_rp in an unnumbered group means nothing from that tree
14:55:33 <tetsuro> Okay, thanks.
14:55:52 <jaypipes> cool. I actually remembered something from a PTG correctly.
14:56:02 <jaypipes> alright, tetsuro, you good to go on that then?
14:56:10 <jaypipes> tetsuro: I'll review your bug patch series today.
14:56:26 <tetsuro> Yes, I am good to go.
14:56:31 <jaypipes> awesome, thanks.
14:56:55 <jaypipes> OK, well, I'm going to wrap the meeting up then. thanks everyone. :) see you on #openstack-nova and #openstack-placement.
14:56:58 <jaypipes> #endmeeting