14:00:19 <efried> #startmeeting nova_scheduler
14:00:19 <openstack> Meeting started Mon Oct 8 14:00:19 2018 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:23 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:33 <takashin> o/
14:00:37 <jaypipes> o/
14:01:07 * gibi cannot join this time
14:01:19 * efried strikes gibi from agenda
14:02:21 <efried> Bueller? Bueller?
14:02:41 <efried> guess we'll get started and let people wander in.
14:02:47 <efried> #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:03:00 <efried> #topic last meeting
14:03:00 <efried> #link last minutes: http://eavesdrop.openstack.org/meetings/nova_scheduler/2018/nova_scheduler.2018-10-01-14.00.html
14:03:00 <efried> Any old business?
14:03:08 <mriedem> o/
14:03:48 <efried> #topic specs and review
14:03:48 <efried> #link latest pupdate: http://lists.openstack.org/pipermail/openstack-dev/2018-October/135475.html
14:03:59 * bauzas waves a bit late
14:04:20 <efried> Anything to call out from the pupdate? (Will talk about extraction a bit later)
14:04:58 <efried> #link Consumer generation & nrp use in nova: Series now starting at https://review.openstack.org/#/c/605785/
14:04:58 <efried> No longer in runway. Was gonna ask gibi the status of the series, but he's not attending today.
14:05:10 <efried> Bottom patch has some minor fixups required
14:05:18 <efried> an interesting issue raised by tetsuro
14:05:54 <mriedem> i need to look at that one,
14:06:02 <mriedem> since i talked with gibi about the design before he wrote it
14:06:13 <efried> which is that we can't tell whether the destination is nested in many cases until after we've already decided to schedule to it.
14:06:31 <bauzas> also some concern about how the filters could check the computes in case allocation candidates are only about nested RPs
14:06:33 <efried> which means we don't know whether we need to run the scheduler until... we've run the scheduler.
14:06:50 <mriedem> i have the same problem with the resize to same host bug
14:07:20 <efried> well, if you know the host already, you can go query that host to see if he's nested. And if so, you have to run the scheduler.
14:07:24 <bauzas> I had a concern on changing the behaviour in https://review.openstack.org/#/c/605785/9/nova/compute/api.py@4375
14:07:56 <bauzas> if we want to call the scheduler anyway, we should have a new microversion IMHO
14:08:19 <mriedem> bauzas: gibi and i had talked about the behavioral changes, but we didn't think a new microversion would be needed here,
14:08:31 <mriedem> but it's messy i agree,
14:08:43 <bauzas> heh
14:08:47 <bauzas> huh* even
14:08:48 <mriedem> we already broke the force behavior in pike when we made sure we could claim allocations for vcpu/disk/ram on force
14:08:59 <mriedem> here we're breaking that if nested
14:09:11 <mriedem> the more we depend on claims in the scheduler, the less we can honor force
14:09:21 <bauzas> if we want to stop forcing a target (wrt I'm fine with), I just think we should still signal it for operators
14:09:41 <efried> Can we add a column to the hosts table caching whether the host uses nested/sharing?
14:09:52 <bauzas> like, you wanna still not call the scheduler ? fair enough, just don't ask for 2.XX microversion
14:10:05 <bauzas> >2.XX even
14:10:11 <jaypipes> why does it matter if we go from a non-nested host to a nested host? I mean, if the nested host supports the original requested resources and traits, who cares?
14:10:17 <mriedem> i don't think we want to allow people to opt into breaking themselves
14:10:19 <efried> bauzas: But if we don't call the scheduler, we literally *can't* schedule to a nested host
14:10:43 <bauzas> efried: how can I target a nested resource provider ?
14:10:49 <bauzas> could someone give me examples ?
14:10:52 <efried> jaypipes: a) How would you know if it does? b) if any of the resources are in child providers, you need GET /a_c to give you a proper allocation request.
14:11:03 <bauzas> operators target compute services
14:11:05 <jaypipes> efried: and?
14:11:24 <efried> and that (calling GET /a_c rather than just cloning the alloc onto the dest) is a behavior change.
14:11:35 <mriedem> we should probably table this until gibi is around to talk about it, because i know he and i talked about a bit of this before he started this code
14:11:36 <jaypipes> efried: if the scheduler returns a destination, we use it. who cares if the resources ended up being provided by child providers or not.
14:12:01 <efried> that's the point. The scheduler returns a destination if we call the scheduler.
14:12:10 <efried> We're talking about a code path where previously we *didn't* call the scheduler.
14:12:12 <efried> IIUC.
14:12:15 <mriedem> jaypipes: the question is when you force and bypass the scheduler
14:12:32 <jaypipes> ah... force_host rears its ugly-ass head yet again.
14:12:37 <mriedem> yes
14:12:41 <bauzas> not force_hosts
14:12:48 <mriedem> same idea
14:12:51 <bauzas> force_hosts is only for boot
14:12:53 <mriedem> i think we should table until gibi is around
14:12:57 <bauzas> but it's calling the scheduler
14:12:58 <efried> yeah.
14:13:20 <mriedem> i could try to dig up our irc conversation but it'd be hard probably
14:13:21 <efried> or we could just proceed, and make a big decision that affects his whole world for the next six months.
14:13:25 <bauzas> compared to livemigrate/evacuate where you literally can bypass scheduler
14:13:27 <jaypipes> I guess I still don't see why we care. If the destination host (forced or not) supports the original request, why do we care?
14:13:49 <efried> chicken/egg. We don't know if it supports the original request unless we call the scheduler algo to find that out.
14:14:04 <mriedem> well, we claim outside of the scheduler
14:14:05 <efried> I'm not sure to what extent ops expect "force" to mean "don't call the scheduler" though.
14:14:06 <mriedem> today
14:14:07 <bauzas> I still don't get why we're concerned by nested resource providers being targets
14:14:08 <jaypipes> efried: why can't we ask the destination host in pre-live-migrate?
14:14:23 <mriedem> like i said, we already sort of broke the live migration 'force' parameter in pike,
14:14:31 <mriedem> when conductor started claiming
14:14:34 <bauzas> efried: since live-migrate API is existing AFAIK
14:14:54 <bauzas> mriedem: shit, I missed that then
14:14:59 <efried> bauzas: If any of the resources that we need come from nested providers, we must use GET /a_c to come up with a proper allocation request.
14:15:31 <bauzas> efried: isn't that a bit related to the concern I had about candidates be only on nested resource providers ?
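[Editor's illustration, not from the meeting: efried's points at 14:07:20 and 14:14:59 boil down to a check like the sketch below. Given a forced destination host, ask placement whether that compute node's provider tree has child providers; if it does, the old shortcut of copying the source allocations onto the destination root no longer works and a real GET /allocation_candidates pass is needed. The endpoint URL, token handling, and exact microversion here are assumptions for illustration only, not details from gibi's series.]

    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # hypothetical endpoint
    HEADERS = {
        'X-Auth-Token': '<admin-token>',              # hypothetical auth token
        'OpenStack-API-Version': 'placement 1.29',    # in_tree needs >= 1.14
    }

    def destination_is_nested(root_provider_uuid):
        """Return True if the destination compute node's provider tree
        contains more providers than just the root (i.e. it is nested)."""
        resp = requests.get(
            PLACEMENT + '/resource_providers',
            params={'in_tree': root_provider_uuid},
            headers=HEADERS,
        )
        resp.raise_for_status()
        providers = resp.json()['resource_providers']
        # A single entry means the flat "clone the source allocations onto the
        # destination root" approach still works; more than one means a
        # GET /allocation_candidates call is needed to build a proper
        # allocation request spanning the child providers.
        return len(providers) > 1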
14:15:44 <bauzas> we somehow need to know which root RP we're talking about
14:16:12 <mriedem> bauzas: see https://review.openstack.org/#/c/605785/9/nova/conductor/tasks/live_migrate.py@132 and scheduler_utils.claim_resources_on_destination for history
14:16:44 <efried> so, tabling until we can involve gibi. Moving on.
14:16:49 <mriedem> +1
14:16:59 <efried> Extraction
14:16:59 <efried> Info in the pupdate ---^
14:16:59 <efried> cdent out this week. edleafe, mriedem, status?
14:17:14 <efried> Oh, Ed isn't around either. It's all on you mriedem
14:17:34 <mriedem> umm
14:17:40 * mriedem looks
14:17:59 <mriedem> https://review.openstack.org/#/c/604454/ is the grenade patch which is passing,
14:18:13 <mriedem> cdent updated that with the proper code to create the uwsgi placement-api config
14:18:31 <efried> #link https://review.openstack.org/#/c/604454/ is the grenade patch which is passing
14:18:31 <mriedem> the devstack change that depends on it is still failing though https://review.openstack.org/#/c/600162/
14:18:46 <efried> #link the devstack change that depends on it https://review.openstack.org/#/c/600162/
14:18:55 <efried> This is the $PROJECTS issue?
14:18:59 <mriedem> tl;dr there are other jobs that devstack runs which aren't cloning the placement repo yet,
14:19:09 <mriedem> i have patches up for that, but they aren't passing and i haven't dug into why yet
14:19:23 <mriedem> yeah https://review.openstack.org/#/c/606853/ and https://review.openstack.org/#/c/608266/
14:19:35 <bauzas> I have good news for extraction
14:20:04 <bauzas> https://review.openstack.org/#/c/599208/ has been tested and works on a physical machine with pGPUs
14:20:22 <mriedem> efried: looks like my d-g patch for updating $PROJECTS passed, just failed one test in tempest
14:20:26 <mriedem> so just rechecks it looks like
14:20:27 <bauzas> next step will be to write some functional test mocking this ^
14:20:42 <efried> nice
14:20:58 <jaypipes> bauzas: nice.
14:21:11 <efried> bauzas: That's more of a reshape nugget than extraction, though?
14:21:32 <jaypipes> efried: we agreed that that was a requirement for extraction.
14:21:36 <bauzas> efried: well, I thought we agreed on this being a priority for the extraction :)
14:21:36 <efried> oh, I guess we said we were going to want ... yeah
14:21:53 <efried> I forget why, actually.
14:22:02 <bauzas> anyway
14:22:09 <jaypipes> let's not rehash that.
14:22:12 <efried> oh, right, it was a requirement for the governance split
14:22:22 <efried> not for getting extracted placement working.
14:22:26 <efried> cool cool
14:22:30 <jaypipes> I have a spec topic...
14:22:37 <efried> anything else on extraction?
14:22:44 <mriedem> tl;dr it's close
14:22:49 <efried> sweet
14:22:50 <mriedem> for the grenade/devstack ci/infra bits
14:23:31 <bauzas> I need to disappear, taxi driving my kids from school
14:23:42 <efried> jaypipes: Want to go now or after the other spec/review topics?
14:24:02 <jaypipes> so I have repeatedly stated I am not remotely interested in pursuing either https://review.openstack.org/#/c/544683/ or https://review.openstack.org/#/c/552105/. I was under the impression that someone (Yikun maybe?) who *was* interested in continuing that work was going to get https://review.openstack.org/#/c/552105/ into a state where people agreed on it (good luck with that), but as of now, I've seen little action on it other than
14:24:03 <jaypipes> negative reviews.
14:24:21 * efried click click click
14:24:35 <mriedem> jaypipes: yeah yikun has been busy with some internal stuff after a re-org,
14:24:36 <jaypipes> so my question is should I just abandon both of the specs and force the issue?
14:24:48 <mriedem> i can send an email to see what's going on and if we still care about those
14:24:53 <jaypipes> k, thx
14:25:54 <efried> This could relate to the next-next topic on the agenda actually.
14:26:39 <efried> we were talking about using the file format proposal embedded in the
14:26:39 <efried> #link device passthrough spec https://review.openstack.org/#/c/591037/
14:26:39 <efried> as a mechanism to customize provider attributes (prompted by the belmoreira min_unit discussion)
14:26:59 <efried> jaypipes agreed to review ^ with that in mind
14:27:03 <jaypipes> efried: yes.
14:27:26 <jaypipes> efried: I have found it very difficult to review. will give it another go this morning.
14:27:29 <efried> The "initial defaults" thing is still weird.
14:27:48 <efried> and not addressed in there (yet)
14:28:03 <efried> bauzas suggested to split out the part of the spec that talks about the file format, and do the device passthrough aspect on its own.
14:28:24 <efried> Which sounds like a good idea to me, considering the various ways we've talked about using it.
14:30:08 <efried> okay, moving on.
14:30:24 <efried> last week, the
14:30:25 <efried> #link HPET discussion http://lists.openstack.org/pipermail/openstack-dev/2018-October/135446.html
14:30:25 <efried> led to an interesting precedent on using traits for config
14:30:31 <jaypipes> another spec ... I pushed a new rev on https://review.openstack.org/#/c/555081/
14:30:38 <jaypipes> (cpu resource tracking)
14:30:59 <efried> #link CPU resource tracking spec https://review.openstack.org/#/c/555081/
14:33:03 <efried> any discussion on traits-for-config or CPU resource tracking?
14:33:25 <efried> any other specs or reviews to discuss?
14:33:29 <mriedem> i personally hope that cpu resource tracking is not something we pursue for stein
14:33:35 <mriedem> while we're still trying to land reshaper et al
14:34:13 <mriedem> reshaping all instances on all compute nodes is going to be rough during upgrade
14:34:16 <mriedem> unless we can do that offline
14:35:56 <jaypipes> mriedem: so let's hold off getting new clean functionality so that upgrades can be prolonged even longer until end of 2019?
14:36:21 <mriedem> yes?
14:36:26 <dansmith> I feel like we've been putting off numa topo in placement a while now
14:36:47 <mriedem> i think getting reshaper and bw-aware scheduling and all that stuff has been around long enough that we need to get those done first
14:36:50 <dansmith> so I don't disagree that it's going to be a big reshape, but.. dang, we've been working towards it for a while now and..
14:37:01 <jaypipes> mriedem: I don't get the argument that adding another data migration (reshape action) makes upgrades harder than having one to do in a release cycle.
14:37:04 <mriedem> i would just like fewer things to worry about
14:37:24 <dansmith> if we end up with something for gpus that requires compute nodes to be online,
14:37:33 <dansmith> it'd be massively better for FFU to have both of those in the same release
14:37:45 <dansmith> vs. two different (especially back-to-back) releases
14:38:21 <mriedem> do we need the computes online for the cpu resource tracking upgrade?
14:38:27 <dansmith> yes
14:38:50 <dansmith> they have to do it themselves, I think, because only they know where and what the topology is
14:39:18 <jaypipes> dansmith: right, unless we go with a real inventory/provider descriptor file format.
14:39:34 <dansmith> jaypipes: well, that just pushes the problem elsewhere.. you still have to collect that info from somewhere
14:39:50 <jaypipes> dansmith: it's already in the vast majority of inventory management systems.
14:39:57 <efried> waitwait, the *admin* is going to be responsible for describing NUMA topology? It's not something the driver can discover?
14:40:14 <dansmith> efried: we should have the driver do it for sure
14:40:22 <jaypipes> efried: the operator is ultimately responsible for *whether* a compute node should expose providers as a tree.
14:40:22 <efried> phew
14:40:28 <dansmith> jaypipes: but we can't just require the operator to have that and build such a mapping, IMHO
14:40:30 <dansmith> but even still,
14:40:54 <mriedem> why would operators care how we model things internally?
14:41:01 <jaypipes> efried: a lot of operators don't want or need to deal with NUMA. they just have needs for dedicated CPU and shared CPU resources and don't care about NUMA.
14:41:14 <efried> Yeah, I can live with a "use numa or not" switch.
14:41:18 <dansmith> the driver is the only one that can decide how existing allocations map to that information, IMHO, so unless you want to run the driver against the DB from a central node... even still, there are numa pinnings that the driver has done already we need to know about
14:41:36 <efried> I was just afraid you were talking about requiring the op to say "and this CPU is in NUMA node 1, and this CPU is in NUMA node 2 and..."
14:41:45 <dansmith> mriedem: they don't, that's why making them write a topo description for each (type of) compute node to do this migration would be mega-suck
14:41:58 <dansmith> efried: I think that's what jaypipes is saying
14:42:03 <dansmith> and I think that's not reasonable
14:42:17 <dansmith> efried: I don't think a "numa or not" switch is reasonable either, FWIW
14:42:20 <jaypipes> dansmith: ops *already* have that. they all have hardware profiles which describe the different types of hardware they provide to guests.
14:42:21 <dansmith> they just want it to work
14:42:45 <dansmith> jaypipes: all ops do not have that
14:42:51 <dansmith> jaypipes: but even still, they don't have the information about what numa allocations we've already done for existing instances
14:43:08 <jaypipes> dansmith: agreed completely with that last statement.
14:43:34 <efried> With a generic inventory/provider descriptor file, you could allow the op to override/customize. But I would think we would want the default to be automatic detection/configuration resulting in at least a sane setup.
14:43:42 <jaypipes> it's a shame the guest NUMA topology and CPU pinning were implemented as such a tightly coupled blobject mess.
14:43:50 <mriedem> while i agree it would be best if we can do all the reshapes we know we need to do in the same release to ease the pain, i just wanted to state that i'm worried about trying to bite this off in stein with everything else that's going on
14:44:11 <dansmith> mriedem: there's risk there for sure, no doubt
14:44:29 <efried> We also still can't do generic affinity without a placement API change, just to bring that up again.
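[Editor's illustration, not from the spec or from nova: a loose sketch of why the compute driver is the natural place to do the NUMA reshape dansmith describes at 14:38:50 and 14:41:18. In nova this kind of logic would sit in the virt driver's update_provider_tree() hook; the ProviderTree calls (exists / new_child / update_inventory), the provider names, the example totals, and the proposed PCPU resource class are all assumptions here, not settled design.]

    def add_numa_children(provider_tree, nodename):
        """Hang per-NUMA-node child providers off the root compute provider."""
        # Hypothetical topology; a real driver would discover this from the
        # host (e.g. libvirt capabilities), not from an operator-written file.
        numa_nodes = {
            0: {'pcpus': 8, 'mem_mb': 65536},
            1: {'pcpus': 8, 'mem_mb': 65536},
        }
        for node_id, topo in numa_nodes.items():
            child = '%s_NUMA%d' % (nodename, node_id)
            if not provider_tree.exists(child):
                provider_tree.new_child(child, nodename)
            provider_tree.update_inventory(child, {
                # PCPU (dedicated CPU) is the class the cpu-resources spec proposes.
                'PCPU': {'total': topo['pcpus'], 'reserved': 0, 'min_unit': 1,
                         'max_unit': topo['pcpus'], 'step_size': 1,
                         'allocation_ratio': 1.0},
                'MEMORY_MB': {'total': topo['mem_mb'], 'reserved': 512,
                              'min_unit': 1, 'max_unit': topo['mem_mb'],
                              'step_size': 1, 'allocation_ratio': 1.5},
            })
        # The hard part dansmith points at: existing instances already hold
        # flat VCPU/MEMORY_MB allocations against the root provider, and only
        # the driver knows which NUMA node each pinned guest landed on, so
        # moving those allocations under the children (the reshape) cannot be
        # driven by an operator-supplied descriptor file alone -- the computes
        # have to be online to do it.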
14:44:31 <dansmith> I'm not saying it's critical, I'm just saying writing it off now seems like a bad idea to me
14:45:02 <mriedem> i'll admit the only part of that spec i've read is the upgrade impact
14:45:10 <mriedem> then i had to go change my drawers
14:45:22 <dansmith> mriedem: I guess I'm not sure why that's a surprise at this point,
14:45:27 <mriedem> will artom's stuff depend on this?
14:45:29 <dansmith> but maybe I have just done more thinking about it
14:45:40 <mriedem> artom's stuff = numa aware live migration
14:45:52 <dansmith> mriedem: artom's stuff kinda conflicts with this.. if this was done his stuff would be easier I think
14:45:55 <mriedem> dansmith: yeah i've just avoided thinking about this
14:46:40 <mriedem> ok, i need to get updated on what he plans to do with that as well
14:46:56 <mriedem> anyway, i'll be quiet now
14:48:35 <efried> dansmith, jaypipes: any last words?
14:48:46 <dansmith> no.
14:49:36 <efried> Home stretch
14:49:36 <efried> #topic bugs
14:49:36 <efried> #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:49:41 <efried> any bugs to highlight?
14:49:43 <jaypipes> efried: go Browns?
14:50:12 <mriedem> ugliest FG ever
14:51:43 <efried> Horns to 5-1 by a toenail. Khabib face cranks Conor to a tap, then attacks his training team. Derek Lewis's balls are hot. Other weekend sports news?
14:51:56 <efried> I guess we're really in
14:51:56 <efried> #topic opens
14:51:58 <dansmith> is that real sports news?
14:52:02 <mriedem> yes
14:52:09 <dansmith> hah. okay, sounded made up
14:52:12 <mriedem> https://deadspin.com/khabib-nurmagomedov-taps-out-conor-mcgregor-attacks-co-1829580622
14:52:14 <dansmith> shows how much I know
14:52:51 * bauzas waves again
14:53:15 <efried> and https://www.youtube.com/watch?v=F_E6jXHMPs4
14:53:18 <efried> okay, anything else?
14:53:26 * edleafe arrives super-late
14:53:57 <efried> edleafe: Anything to bring up before we close?
14:54:48 <bauzas> we also had https://www.youtube.com/watch?v=KgwmPhAu0tc
14:55:31 <efried> Thanks y'all
14:55:31 <efried> #endmeeting