13:59:59 <jaypipes> #startmeeting scheduler
14:00:00 <openstack> Meeting started Mon Sep 17 13:59:59 2018 UTC and is due to finish in 60 minutes.  The chair is jaypipes. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:03 <openstack> The meeting name has been set to 'scheduler'
14:00:15 <jaypipes> good morning/evening all.
14:01:04 * bauzas yawns
14:01:08 <jaypipes> tetsuro, edleafe, mriedem, dansmith, bauzas, gibi: hi
14:01:09 <gibi> o/
14:01:11 <wkite> o/
14:01:31 <mriedem> o/
14:01:54 <jaypipes> #topic quick recap of placement/scheduler topics from PTG
14:02:26 * dansmith snorts
14:02:43 <jaypipes> #link https://etherpad.openstack.org/p/nova-ptg-stein
14:02:55 * bauzas gently reminds that he has to leave for 20 mins at 1420 UTC
14:03:07 <jaypipes> there were a number of placement-related topics (as always) at the PTG
14:03:41 <jaypipes> along with a fairly lengthy discussion on the status and milestones related to placement extraction
14:03:59 <jaypipes> edleafe: would you like to summarize the extraction bits?
14:05:25 <jaypipes> Ed may be on his way to the office, so let me try
14:06:28 <jaypipes> melwitt summarized the decisions regarding the governance items nicely in an ML post:
14:06:31 <jaypipes> #link http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html
14:07:50 <jaypipes> That ML post lists the items we're aiming to focus on to finalize the path to full extraction of placement. The items revolve around testing the upgrade paths and implementing reshaper support for the vGPU use cases
14:08:21 <jaypipes> bauzas is responsible for the libvirt vGPU reshaper efforts and Naichuan Sun is responsible for the vGPU efforts for the Xen virt driver
14:08:57 <jaypipes> gibi: perhaps you might give a quick status report on the extraction patch series since I'm not familiar with the progress there?
14:09:31 <gibi> honestly I also need to catch up on what is happening on the placement side
14:09:50 <gibi> what I know is that we see green test results with the new repo
14:09:53 <mriedem> we need to do the grenade stuff
14:10:05 <gibi> yeah, next step is grenade I guess
14:10:07 <mriedem> first step is writing the db table copy and dump script
14:10:12 <mriedem> and then integrate that into grenade
14:10:40 <mriedem> i've got a patch up to grenade for adding a postgresql grenade job to the experimental queue as well so anyone adding pg support for the upgrade script can test it
14:10:59 <gibi> In parallel I would like to make the nova functional tests run with the extracted placement repo
14:11:02 <jaypipes> #link latest etherpad on placement extraction bits: https://etherpad.openstack.org/p/placement-extract-stein-3
14:11:24 <mriedem> #link the grenade postgresql job patch https://review.openstack.org/#/c/602124/
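A minimal sketch of the kind of table copy/dump step mriedem mentions above, not the actual grenade tooling: the table list, connection URLs and the SQLAlchemy (1.4+) approach are all assumptions for illustration.

```python
# Illustrative only: copy the placement tables out of the nova_api database into a
# fresh placement database ahead of the grenade upgrade step.
from sqlalchemy import create_engine, MetaData, Table

PLACEMENT_TABLES = [
    'resource_classes', 'resource_providers', 'inventories', 'allocations',
    'consumers', 'projects', 'users', 'traits',
    'resource_provider_traits', 'resource_provider_aggregates',
    'placement_aggregates',
]

def copy_placement_tables(src_url, dst_url):
    src = create_engine(src_url)
    dst = create_engine(dst_url)
    meta = MetaData()
    for name in PLACEMENT_TABLES:
        Table(name, meta, autoload_with=src)   # reflect the schema from nova_api
    meta.create_all(dst)                       # create the same tables in the target DB
    with src.connect() as s, dst.begin() as d:
        for table in meta.sorted_tables:       # foreign-key-safe insert order
            rows = [dict(r._mapping) for r in s.execute(table.select())]
            if rows:
                d.execute(table.insert(), rows)

copy_placement_tables('mysql+pymysql://nova:secret@db/nova_api',
                      'mysql+pymysql://placement:secret@db/placement')
```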
14:12:23 <jaypipes> ok, thanks gibi and mriedem
14:12:47 <jaypipes> #topic placement and scheduler blueprints for Stein
14:13:36 <wkite> Hi, I am working on a joint scheduler for nova and zun based on NUMA and pinned CPUs, could anyone give me some advice?
14:14:11 <jaypipes> wkite: sure, in a little bit. let me get through the status parts of the meeting?
14:14:46 <gibi> I thought we were pretty frozen at the moment regarding new features in placement
14:15:19 <jaypipes> gibi: there are plenty of blueprints targeting the placement and scheduler services in Stein, though
14:15:46 <mriedem> like https://blueprints.launchpad.net/nova/+spec/use-nested-allocation-candidates !
14:16:01 <gibi> which I'm working on :)
14:16:02 <mriedem> that's just all nova-scheduler side stuff
14:16:16 <jaypipes> right. this meeting is still the scheduler meeting is it not? :)
14:16:24 <mriedem> i guess
14:16:32 <mriedem> can someone summarize the consumer gen thread?
14:16:37 <gibi> sure
14:16:55 <gibi> so placement 1.28 added the consumer generation for allocations
14:17:13 <gibi> to use this the scheduler report client needs some change
14:17:34 <bauzas> what's unfun is that we merged 1.30 (reshape) and used it before nova used 1.29
14:17:42 <bauzas> (nested alloc candidates)
14:17:52 <jaypipes> bauzas: I don't see why that matters.
14:17:56 <gibi> in general, either nova creates a new consumer, in which case nova knows the generation is None
14:17:57 <bauzas> but 1.30 requires 1.29 to be implemented on the client side
14:18:17 <gibi> or nova updates an existing consumer, in which case nova asks placement for the generation of the consumer being updated
14:18:29 <bauzas> jaypipes: because reshaping implies that a boot will fail unless nova speaks nested alloc candidates
14:18:32 <mriedem> because now we have more than just nova doing things, like in the bw providers case
14:19:11 <gibi> if placement returns a consumer generation conflict in any of these cases, nova will fail the instance workflow operation
14:19:13 <bauzas> because resources can be on children
14:19:17 <mriedem> even though nova and neutron are working with the same consumer right? the instance uuid.
14:19:47 <gibi> neutron does not manipulate allocations
14:19:51 <gibi> it just reports inventories
14:19:52 <gibi> at the moment
14:20:04 <jaypipes> right. all allocation is done via claim_resources()
14:20:23 <gibi> somewhere in the future, when the bandwidth of a port needs to be resized, neutron might want to touch allocations
14:20:26 <gibi> but not now
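For reference, a minimal sketch of the allocation write gibi describes, going straight at the placement REST API at microversion 1.28 with python-requests; the endpoint URL, token and UUIDs are placeholders.

```python
# Sketch only: PUT allocations with a consumer generation (placement >= 1.28).
import requests

PLACEMENT = 'http://placement.example.com/placement'     # hypothetical endpoint
HEADERS = {'OpenStack-API-Version': 'placement 1.28',
           'X-Auth-Token': 'ADMIN_TOKEN'}                 # placeholder auth

def put_allocations(consumer_uuid, rp_uuid, generation):
    # generation is None for a brand-new consumer; for an existing consumer it is
    # whatever placement last reported (e.g. via GET /allocations/{consumer_uuid}).
    body = {
        'allocations': {rp_uuid: {'resources': {'VCPU': 1, 'MEMORY_MB': 512}}},
        'project_id': 'PROJECT_UUID',
        'user_id': 'USER_UUID',
        'consumer_generation': generation,
    }
    resp = requests.put('%s/allocations/%s' % (PLACEMENT, consumer_uuid),
                        json=body, headers=HEADERS)
    if resp.status_code == 409:
        # Someone else updated this consumer concurrently; per the ML thread the
        # series fails hard here rather than retrying.
        raise RuntimeError('consumer generation conflict: %s' % resp.text)
    resp.raise_for_status()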
14:21:09 <bauzas> ... and I need to disappear
14:21:32 <jaypipes> mriedem: are you asking gibi to summarize the entire consumer generation patch series? or something else?
14:21:34 <gibi> the implementation to support consumer generation is basically ready for review
14:22:04 <mriedem> i was asking to summarize the ML thread which contributes to the code series i assume
14:22:04 <gibi> the patch series starts here https://review.openstack.org/#/c/591597
14:22:22 <mriedem> i was wondering why we have such a big ML thread about this and what the big changes are to nova before i actually review any of this
14:22:42 <mriedem> if it's just, "this makes the nova client side bits (SchedulerReportClient) aware of consumer generations" that's fine
14:23:07 <gibi> mriedem: my concern was what to do if placement returns a consumer generation conflict: a) retry, b) fail hard, c) fail soft and let the user retry
14:23:14 <jaypipes> mriedem: it makes the nova client side *safe* for multiple things to operate on an allocation.
14:23:29 <gibi> the answer was mostly b) fail hard
14:23:32 <jaypipes> yep
14:23:46 <gibi> so the patch series now makes a consumer conflict a hard failure, with the instance ending up in ERROR state
14:23:49 <mriedem> what besides the scheduler during the initial claim is messing with the allocations created by the scheduler?
14:23:49 <jaypipes> gibi: which was the safest choice.
14:24:07 <jaypipes> mriedem: reshaper.
14:24:20 <gibi> mriedem: all the instance move operations, which move allocations from instance.uuid to migration.uuid and back on revert
14:24:31 <jaypipes> mriedem: along with anything that does migrations or resizes.
14:24:45 <mriedem> ok
14:25:12 <gibi> mriedem: the nasty things are force evacuate and force migrate
14:25:19 <jaypipes> as always.
14:25:22 <gibi> mriedem: they allocate outside of scheduler
14:25:44 <mriedem> yeah they still do the allocations though
14:25:46 <mriedem> like the scheduler
14:25:59 <mriedem> but with a todo, from me, since i think pike
14:26:18 <gibi> mriedem: yes, they do, just via a different code path
14:26:22 <mriedem> yup
14:26:30 <mriedem> ok we can probably move on - i just needed to get caught up
14:26:44 <jaypipes> mriedem: right, but they don't currently handle failures due to consumer generation mismatch, which is what Gibi's patch series does (sets instances to ERROR if >1 thing tries updating allocations for the same instance at the same time)
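A hedged sketch of the move-operation case gibi mentions: placement's POST /allocations (combined with 1.28 consumer generations) can empty the instance's allocations and create the migration's allocations in one atomic request. `client` is assumed to be something like a keystoneauth1 Adapter pointed at the placement service; project/user IDs are placeholders.

```python
# Sketch under the assumptions above, not the actual report client code.
def move_allocations_to_migration(client, instance_uuid, migration_uuid,
                                  current_allocs, instance_generation):
    payload = {
        instance_uuid: {
            'allocations': {},                  # drop the instance's allocations
            'project_id': 'PROJECT_UUID',
            'user_id': 'USER_UUID',
            'consumer_generation': instance_generation,
        },
        migration_uuid: {
            'allocations': current_allocs,      # same resources, new consumer
            'project_id': 'PROJECT_UUID',
            'user_id': 'USER_UUID',
            'consumer_generation': None,        # the migration consumer is new
        },
    }
    resp = client.post('/allocations', json=payload,
                       headers={'OpenStack-API-Version': 'placement 1.28'})
    if resp.status_code == 409:
        # Another writer touched one of the consumers: fail hard, as agreed above.
        raise RuntimeError('consumer generation conflict during move')
    resp.raise_for_status()
```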
14:27:17 <jaypipes> ok, yes, let's move on.
14:27:32 <jaypipes> #topic open discussion
14:27:47 <jaypipes> #action all to review gibi's consumer generation patch series
14:27:59 <gibi> \o/
14:28:20 <jaypipes> #link Gibi's consumer generation patch series: https://review.openstack.org/#/q/topic:bp/use-nested-allocation-candidates+(status:open+OR+status:merged)
14:28:32 <jaypipes> ok, open discussion now
14:28:38 <jaypipes> wkite: hi
14:28:55 <jaypipes> wkite: can you give us a brief summary of what you are trying to do?
14:29:07 <wkite> ok
14:31:11 <wkite> I am trying to use placement to save the NUMA topology for nova and zun, so that the scheduler can get the NUMA topology from placement and then do the scheduling work
14:31:21 <wkite> for both nova and zun
14:32:17 <jaypipes> wkite: a NUMA topology isn't a low-level resource. It's not possible to "consume a NUMA topology" from placement because a NUMA topology is a complex, non-integer resource.
14:32:43 <wkite> jaypipes: yes
14:33:00 <jaypipes> wkite: now, if you were to consume some CPU resources or memory resources from a NUMA cell, now that is something we could model in placement.
14:33:02 <wkite> a json object
14:33:16 <jaypipes> wkite: we have no plans to allow resources in placement to be JSON objects.
14:33:21 <dansmith> WUT
14:33:25 <dansmith> but I wanna!
14:33:29 <jaypipes> dansmith: stop. :)
14:34:32 <jaypipes> wkite: the solution that we've discussed is to keep the NUMATopologyFilter inside the nova-scheduler to handle placement of a virtual guest CPU topology on top of a host NUMA topology.
14:35:29 <jaypipes> wkite: while keeping placement focused on atomic, integer resources. basically, placement is complementary to the nova-scheduler: for simple integer-based capacity calculations, placement is used; for complex placement/topology decisions, the nova/virt/hardware.py functions are called from within the nova-scheduler's NUMATopologyFilter
14:35:59 <jaypipes> wkite: if you are creating a separate scheduler service for Zun, my advice would be to follow the same strategy.
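A rough sketch of the split jaypipes describes; modelling NUMA cells as child providers is an assumption about future modelling, and the filter body below is only a placeholder for what NUMATopologyFilter/nova.virt.hardware already do.

```python
import requests

def candidate_providers(placement_url, headers):
    # Placement only answers the integer-capacity question: which providers
    # (e.g. hypothetical NUMA-cell child providers) can supply these amounts.
    resp = requests.get('%s/allocation_candidates' % placement_url,
                        params={'resources': 'VCPU:4,MEMORY_MB:4096'},
                        headers=headers)
    resp.raise_for_status()
    return resp.json()['allocation_requests']

def numa_topology_filter(host_topology, guest_topology):
    # Placeholder: in nova this is NUMATopologyFilter calling
    # nova.virt.hardware.numa_fit_instance_to_host(); CPU pinning, hugepages and
    # other topology fitting stay here in the scheduler, not in placement.
    raise NotImplementedError('complex topology fitting stays in the scheduler')
```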
14:37:43 <jaypipes> wkite: if you'd like to discuss this further, let's move the conversation to #openstack-placement and I can fill you in on how the nova and placement services interact with regards to NUMA topology decisions.
14:38:07 <wkite> I want to run both nova and zun on one host by sharing the pinned CPUs, but what should I do?
14:39:11 <jaypipes> wkite: if you share a pinned CPU, it's no longer pinned is it? :)
14:40:17 <jaypipes> wkite: I mean... a pinned CPU is dedicated to a particular workload. if you then have another workload pinned to that same CPU, then the CPU is shared among workloads and is no longer dedicated.
14:40:52 <wkite> both of them use their own pinned CPUs
14:41:21 <mriedem> so a vm uses [1,2] and a container uses [3,4]?
14:41:25 <mriedem> on the same host
14:41:29 <wkite> yes
14:41:45 <jaypipes> wkite: here is a spec that discusses dedicated and shared CPU resource tracking and our plans for this in Stein: https://review.openstack.org/#/c/555081/
14:41:56 <jaypipes> that you should read...
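To make the suggestion concrete (resource class name tentative, since the spec above was still under review at this point): dedicated host CPUs would be tracked as their own integer resource class, so a nova VM and a zun container on the same host become two consumers claiming from the same inventory.

```python
# Illustration only, based on the cpu-resources spec; 'PCPU' is the proposed name.
vm_request = {'resources': 'PCPU:2'}          # nova instance wants 2 dedicated CPUs
container_request = {'resources': 'PCPU:2'}   # zun container wants 2 dedicated CPUs

# If the compute-node provider reports an inventory like
#   {'PCPU': {'total': 8, 'reserved': 0, 'allocation_ratio': 1.0}}
# placement's accounting guarantees the combined claims never exceed 8, while the
# choice of *which* host CPUs ([1,2] vs [3,4]) each workload is pinned to remains a
# host-side decision (libvirt driver / zun agent), not something placement tracks.
```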
14:42:28 <wkite> jaypipes: ok, thank you
14:42:47 <jaypipes> np. like I said, if you'd like to discuss this further, join us on #openstack-placement and we can discuss there.
14:42:51 <jaypipes> wkite: ^
14:43:55 <jaypipes> ok, in other open discussion items... I still need to write the "negative member_of" spec. I'll do that today or tomorrow and get with sean-k-mooney on his nova-side specs for the placement request filters that will use negative member_of.
14:44:05 <wkite> jaypipes: ok
14:44:59 <jaypipes> does anyone have any other items to discuss? otherwise, I'll wrap up the meeting.
14:45:24 <tetsuro> jaypipes: I was wondering if I can take that negative member_of spec.
14:45:47 <jaypipes> tetsuro: sure, if you'd like it, I have not started it, so go for it.
14:45:57 <tetsuro> Thanks
14:46:20 <jaypipes> tetsuro: thank YOU. :)
14:46:27 <tetsuro> Since I have sorted out how the existing member_of param works on nested/shared in https://review.openstack.org/#/c/602638/
14:47:06 <tetsuro> I'd like to clarify how the negative member_of param should work as well
14:47:36 <tetsuro> And that should relate to the bug I opened^
14:47:55 <tetsuro> np
14:49:23 <jaypipes> tetsuro: well, the negative member_of is just "make sure the provider trees that match an allocation candidate request group are NOT associated with one or more provider aggregates"
14:50:40 <jaypipes> tetsuro: and the bug you reference above is making sure that member_of refers to the aggregate associations of the entire provider tree instead of just a single provider, right?
14:50:57 <tetsuro> Yup, so my question is: does an aggregate on the root provider span the whole tree as well?
14:51:11 * bauzas is back
14:51:18 <tetsuro> for negative aggregate cases
14:51:33 <tetsuro> s/negative member_of cases
14:52:30 <tetsuro> if the root is on aggregate A and a user specifies !member_of=aggA, non of the nested rps under the root can be picked up.
14:52:45 <jaypipes> tetsuro: I believe in Denver we agreed that for the non-numbered request group, a member_of query parameter means that the provider *tree*'s associated aggregates are considered (in other words, look for the provider aggregates associated with the root_provider_id of providers). And for numbered request groups, the single provider_id (not root_provider_id) would be used for the member associations.
14:52:46 <tetsuro> s/non/none
14:53:31 <jaypipes> gibi, dansmith, mriedem: if you remember that conversation, could you verify my recollection is accurate for ^^
14:54:00 <dansmith> aye
14:54:04 * bauzas nods
14:54:06 <dansmith> because the numbered ones are more specific
14:54:17 <dansmith> or more prescriptive
14:54:44 <bauzas> and I remember we said it's a bugfix
14:55:03 <tetsuro> That's my understanding, too.
14:55:06 <gibi> seems correct. this means a negative member_of matching a root_rp in an unnumbered group means nothing from that tree can be returned
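An illustration of the semantics just agreed; the '!' query syntax is an assumption (the spec had not been written yet), the point is only where the aggregate check applies.

```python
# Unnumbered group: aggregates are checked against the provider *tree* (its root).
unnumbered = {
    'resources': 'VCPU:1,MEMORY_MB:512',
    'member_of': '!AGG_A_UUID',   # excludes any tree whose root is in agg A
}
# Numbered group: aggregates are checked against the specific provider only.
numbered = {
    'resources1': 'SRIOV_NET_VF:1',
    'member_of1': '!AGG_A_UUID',  # excludes only providers themselves in agg A
}
# So, as gibi says: if the root compute node is in agg A, the unnumbered form rules
# out every nested provider under it; the numbered form does not.
```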
14:55:33 <tetsuro> Okay, thanks.
14:55:52 <jaypipes> cool. I actually remembered something from a PTG correctly.
14:56:02 <jaypipes> alright, tetsuro, you good to go on that then?
14:56:10 <jaypipes> tetsuro: I'll review your bug patch series today.
14:56:26 <tetsuro> Yes, I am good to go.
14:56:31 <jaypipes> awesome, thanks.
14:56:55 <jaypipes> OK, well, I'm going to wrap the meeting up then. thanks everyone. :) see you on #openstack-nova and #openstack-placement.
14:56:58 <jaypipes> #endmeeting