21:00:44 <efried> #startmeeting nova
21:00:45 <openstack> Meeting started Thu Jul 18 21:00:44 2019 UTC and is due to finish in 60 minutes.  The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:46 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:48 <openstack> The meeting name has been set to 'nova'
21:01:10 <takashin> o/
21:01:19 <dustinc_OSCON> o/
21:01:24 <efried> https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
21:02:07 <dustinc_OSCON> Only half here, on my phone at a conference.
21:02:48 <artom> Here, but will cowardly flee halfway through to pick up kids
21:03:04 <efried> foreshadowing of PTG
21:03:31 <efried> okay, might as well rip through this
21:03:32 <efried> #topic Last meeting
21:03:32 <efried> #link Minutes from last meeting: http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-07-11-14.01.html
21:03:36 <efried> any old bidniss?
21:04:21 <efried> #topic Release News
21:04:21 <efried> Spec freeze in one week (July 25th)
21:04:21 <efried> #link spec review etherpad (I'm still tracking this) https://etherpad.openstack.org/p/nova-spec-review-day
21:04:21 <efried> Look for things that aren't Action: author -- these are our responsibility until spec freeze
21:04:45 <efried> #topic Bugs (stuck/critical)
21:04:45 <efried> No Critical bugs
21:04:45 <efried> #link 69 new untriaged bugs (+2 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
21:04:45 <efried> #link 4 untagged untriaged bugs (-1 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
21:04:59 <efried> #topic Gate status
21:04:59 <efried> #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
21:04:59 <efried> #link 3rd party CI status TBD
21:05:28 <efried> You may see infra node counts crest 700 now (+100 ish) thanks to donnyd and fortnebula
21:05:30 <artom> Gate status reminds me, need to hunt for a second +2 for https://review.opendev.org/#/c/670848/
21:05:58 <donnyd> efried: still working on getting to 100
21:06:02 <donnyd> at 60 ATM
21:06:11 <efried> artom: can we get that passing zuul?
21:06:44 <artom> efried, yeah, doh :( But I'm at the mercy of rechecks
21:07:31 <efried> btw, donnyd I've noticed a fair number of timeouts on jobs being run on fortnebula. That ^ is one of them. Here's another: https://review.opendev.org/#/c/671341/
21:07:43 <efried> no idea if that's coincidence or anything you can tweak or what.
21:08:00 <efried> but thought you would want to know
21:08:10 <efried> #topic Reminders
21:08:10 <efried> any?
21:08:30 <efried> #topic Stable branch status
21:08:30 <efried> #link stable/stein: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/stein
21:08:30 <efried> #link stable/rocky: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/rocky
21:08:30 <efried> #link stable/queens: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/queens
21:08:41 * melwitt irc client weirdly finally showed the messages in this channel
21:08:55 <efried> o/ melwitt, anything to add on topics so far?
21:09:05 <melwitt> nay
21:09:11 <efried> #topic Sub/related team Highlights
21:09:11 <efried> Placement (cdent)
21:09:11 <efried> #link latest pupdate http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007716.html
21:10:02 <efried> not a lot of change since last week. Looking for update to the
21:10:02 <efried> #link NUMA topology in placement spec https://review.opendev.org/#/c/552924/
21:10:19 <efried> ...to make use of recent placement features.
21:10:36 <efried> API (gmann)
21:10:36 <efried> Updates on ML: #link http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007877.html
21:10:43 <melwitt> so that spec will be the one for consuming placement things in order to implement vGPU with affinity?
21:11:11 <melwitt> ok, I think so
21:11:41 <efried> melwitt: more broadly, nova modeling more things, including *CPU and MEMORY_MB, in placement, so we can do affinity at GET /a_c time rather than late in the NUMATopologyFilter.
21:11:56 <efried> s/more things/more NUMA things/
21:12:22 <donnyd> efried: yea it's from the storage being too slow, still waiting on gear to fix it
21:12:28 <efried> ack, thanks donnyd
21:12:33 <melwitt> efried: ok, so is a new/different spec expected for the vGPU stuff specifically?
21:13:01 <efried> melwitt: you refer to
21:13:01 <efried> #link NUMA affinity with VGPU https://review.opendev.org/#/c/650963/
21:13:05 <melwitt> yes
21:13:17 <efried> I've been recalcitrant on that one because I don't see it moving us forward.
21:13:38 <efried> bauzas has convinced me that there are pieces of it that would make sense to do anyway
21:13:48 <efried> so if he wanted to pare it down to just those pieces, I would get out of the way.
21:14:11 <melwitt> I understand. I'm trying to determine what is the way forward with consuming the placement stuff for the vGPU affinity
21:14:41 <melwitt> like do we draw up a spec for that? or is the spec you mentioned earlier what would cover it? or other?
21:15:42 <efried> I could see the first spec covering VCPU, PCPU, MEMORY_MB, and VGPU
21:15:48 <efried> we have all the placement tools to make that work.
21:15:58 <artom> Maybe one thing at a time?
21:16:08 <efried> that --^ is "one thing" IMO
21:16:29 <efried> we can't really reshape proc/mem resources into child providers without also dealing with the existing PGPU child provider
21:16:35 <artom> I get that they're all related, and that you can't do affinity until you have them all
21:17:11 <efried> well, it makes no sense to do VCPU without MEMORY_MB or vice versa, and the incremental cost of both vs either is negligible really.
21:17:23 <artom> Yeah, so cpus and mem together
21:17:33 <efried> incremental cost of putting VGPU in the mix is also less than if we tried to do it separately IMO.
21:17:34 <artom> But GPUs can stay on the compute node, no?
21:17:42 <efried> VGPUs are not on the compute RP.
21:17:49 <efried> they're already in a child PGPU provider.
21:18:07 <artom> Sorry, right, but that child is a child of the compute node, not a NUMA node
21:18:16 <efried> correct, because we don't have numa nodes yet
21:18:44 <artom> So that can stay as is until we move VCPUs and MEM?
21:18:46 <efried> so we could leave it alone, and it would be sitting there as a peer of numa nodes, and we couldn't support affinity with it, and we would have to do another reshape later to put it under the right NUMA node.
21:18:59 <artom> We don't support affinity for it now...
21:19:21 <artom> Are reshapes expensive?
21:19:23 <efried> but IMO that's more painful than just moving the PGPU provider at the same time as we create & reshape to the NUMA providers.
21:19:46 <efried> in the sense that you can only do them on release boundaries (as the current rules are written)
21:19:49 <artom> My gut is that many less-complex reshapes are better than 1 big one
21:21:13 <efried> the two efforts are going to be intertwined no matter what. Whichever effort you implement second is going to have to accommodate changes made by the first.
21:22:38 <efried> If the PGPU provider sits parallel with the NUMA providers, then the PGPU affinity spec still needs to deal with the fact that the procs/mem it's looking at are no longer on the root RP.
21:22:38 <efried> And if you implement the VGPU affinity thing first, you'll have to change that code likewise ^ when you do the reshape & affinity work for the proc/mem splitup
21:22:48 <efried> but sure, we could do that.
21:22:53 <efried> to what advantage?
21:23:15 <efried> anyway, this is probably moot until we have actual proposals to look at.
21:23:16 <artom> Ah, you're assuming that GPUs would get reshaped first
21:23:26 <efried> no, I cited either direction above.
21:23:31 <artom> (which makes sense, given that melwitt is pushing for it)
21:24:02 <artom> Yeah, it's moot. I'll be only too happy to be proven wrong
21:24:23 <artom> I'm still scarred by that GPU reshape I reviewed a while ago
21:24:52 <efried> Realistically, considering this would not be a trivial spec, and considering it effectively hasn't been started yet, I don't see proc/mem split landing in Train.
21:25:15 <melwitt> I don't think I'm pushing for anything in particular, just that we (RH) would like to deliver NUMA affinity with vGPU and I wanted to find out what are the next steps as far as which spec we need
21:26:28 <efried> At this point I don't even remember which bit of the vgpu affinity spec bauzas convinced me was still useful.
21:27:41 <melwitt> yeah, I don't know about that. but I was wondering should we be updating that spec to be about consuming placement features for vGPU affinity or is there an already existing spec we need to update or do we need a new spec
21:29:07 <efried> Okay, I will try to answer this specific question.
21:29:07 <efried> Without splitting proc/mem into numa RPs, there's nothing that can be done placement-wise for vgpu affinity, so https://review.opendev.org/#/c/650963/ is the only way vgpu affinity is happening.
21:29:43 <melwitt> it sounds like it would be either the "all the things" spec or a new spec that is only about the vGPU part, with the latter being of questionable easier-ness
21:29:51 <melwitt> I see, ok
21:30:09 <melwitt> thanks
21:31:52 <efried> The general numa-modeling-in-placement spec (https://review.opendev.org/#/c/552924/) could be taken in two different directions: a) proc/mem; or b) proc/mem/vgpu. If a), then we would eventually need a third spec to describe VGPU affinity via placement-isms
21:32:46 <efried> but if b) then we could do away with https://review.opendev.org/#/c/650963/ entirely
21:32:46 <efried> or
21:32:46 <efried> implement https://review.opendev.org/#/c/650963/ and then have to retrofit it when we do b)
21:33:01 <efried> dead horse at this point?
21:33:17 <efried> #topic Stuck Reviews
21:33:17 <efried> any?
21:33:27 <efried> (other than perhaps the vgpu affinity one :P )
21:33:57 <efried> #topic Review status page
21:33:57 <efried> #link http://status.openstack.org/reviews/#nova
21:33:57 <efried> Count: 461 (-0); Top score: 1433 (+21)
21:33:57 <efried> #help Pick a patch near the top, shepherd it to closure
21:34:07 <efried> #topic Open discussion
21:34:34 <efried> any?
21:34:40 <efried> Anything else to discuss?
21:35:24 <efried> Thanks all
21:35:24 <efried> o/
21:35:24 <efried> #endmeeting