14:00:10 <edleafe> #startmeeting nova_scheduler
14:00:11 <openstack> Meeting started Mon Nov 27 14:00:10 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:14 <cdent> o/
14:00:18 <efried> \o
14:00:18 <ttsiouts> o/
14:00:19 <takashin> o/
14:00:25 <edleafe> Good UGT morning! Who's here?
14:00:28 <jaypipes> o/
14:00:54 <alex_xu> o/
14:01:42 <cdent> Over the week I learned that jaypipes is running for the senate
14:01:52 <jaypipes> quoi?
14:01:58 <cdent> “child bride"
14:02:06 <jaypipes> ahahaha
14:02:17 * edleafe missed the reference
14:02:21 * efried too
14:02:31 <jaypipes> over the weekend I learned edleafe got really high.
14:02:39 <edleafe> that I did!
14:02:40 <cdent> https://twitter.com/jaypipes/status/934490883478773762
14:03:04 <edleafe> jaypipes: you little devil
14:03:07 * bauzas waves
14:03:26 <jaypipes> so... on to scheduler things :)
14:03:35 <edleafe> yeah
14:03:48 <edleafe> #topic Tell me what's going on
14:04:01 <cdent> #link what’s going on http://lists.openstack.org/pipermail/openstack-dev/2017-November/124886.html
14:04:10 <edleafe> I got back from 10 days away from the 'puter, and have nothing prepared
14:04:26 <edleafe> So raise your hand if you want to discuss something
14:04:45 <efried> I think the topic accumulated_nits accidentally got sprayed around a whole series by git restack.
14:04:48 <efried> mahbad
14:05:01 <efried> Most of that should have been bp/nested-resource-providers probably.
14:05:13 <efried> uh, no.
14:05:27 <efried> but not accumulated_nits
14:05:57 <efried> Anyway, IMO the most pressing need right now is getting core reviews on the n-r-p series and starting to get that merged.
14:06:20 <efried> It's going to be prerequisite to a couple of other critical things.
14:06:26 <cdent> i agree!
14:06:39 <cdent> (damn, I just missed an “I concur” opportunity)
14:06:40 <edleafe> yeah, these long-running series become a pain when they get stale
14:06:49 <edleafe> I concur!
14:06:57 <jaypipes> that and the refactor series which also has a bunch of func tests and traits handling
14:06:59 <bauzas> well, I'll look
14:07:29 <efried> #link bottom of the n-r-p series: https://review.openstack.org/#/c/377138/
14:07:39 <bauzas> (well, once I'll done with my internal problems :p )
14:07:50 <efried> jaypipes ++
14:08:11 <edleafe> I haven't looked at the alternate host series yet, but I assume it's still being held back while details are being picked over
14:08:32 * edleafe probably has several more weeks of rebase fun with that
14:08:37 <efried> #link current bottom of the refactor series: https://review.openstack.org/#/c/516782/11
14:09:02 <efried> Actually ^ is the last patch in the refactor series proper.
14:09:09 <efried> But it's anchoring a pile of other stuff.
14:09:20 <efried> including tests and traits handling, as jaypipes said.
14:10:29 <efried> I think the biggest to-do on top of the n-r-p series is getting n-r-p affordance for allocation candidates.
14:10:59 * cdent concurs
14:11:01 <efried> jaypipes or alex_xu Do you have that cued up anywhere (locally) yet?
14:11:13 <jaypipes> efried: yes, and then settling on the update_provider_tree() implementation
14:11:27 <edleafe> Are we still aiming for "merge shit early" like we said at PTG? It sure doesn't feel like it
14:11:40 <alex_xu> i don't have anything
14:11:44 <jaypipes> edleafe: we've merged quite a bit so far, actually.
14:11:56 <efried> Agree, though the summit + thanksgiving really put a crimp in things.
14:12:03 <jaypipes> edleafe: personally, I'd like to see the alternate hosts stuff make progress this week.
14:12:14 <cdent> here comes christmas too :(
14:12:17 <efried> yeah
14:12:28 <edleafe> jaypipes: it isn't as critical path as n-r-p, though
14:13:04 <efried> So I'll follow edleafe's "merge shit early" with: The next 2-3 weeks we should have some kind of official focussed review-and-merge-shit push.
14:13:14 <jaypipes> edleafe: it's still priority over n-r-p, though, according to my last recollection of priorities (alternate hosts, traits handling and move operation cleanup)
14:13:43 <efried> jaypipes I didn't think those three were in any particular priority order.
14:13:43 <jaypipes> efried: let's make it official then.
14:14:03 <edleafe> efried: yeah, I didn't get a sense of ordering either
14:14:05 <jaypipes> efried: they're not. they're just the 3 priorities. n-r-p isn't actually a priority for Queens...
14:14:50 <cdent> that’s not my recollection? I thought we said at ptg that nrp was pre-req for traits and RT cleanup
14:15:09 <cdent> or was that just “before shared”?
14:15:15 <jaypipes> cdent: right. before shared.
14:15:38 * jaypipes notes that shared stuff has been the source of most bugs in the allocation candidates code so far...
14:15:47 <jaypipes> in any case, it all is related.
14:15:58 <jaypipes> so let's make this week the official "push this shit" week.
14:16:05 <efried> https://etherpad.openstack.org/p/nova-ptg-queens-placement L47-55
14:16:19 * cdent wonders if we can make every week officially push this shit?
14:16:21 <efried> Right - "shared" was deferred out of Q.
14:16:28 <efried> cdent Baby steps
14:16:53 <alex_xu> should I rebase this patch https://review.openstack.org/480379?
14:17:34 <efried> alex_xu I would say that's probably a lower priority than reviews, if you had to pick one or the other.
14:17:39 <efried> Though you've been pretty well on top of reviews.
14:17:54 <jaypipes> efried: can you check to see if alex_xu's tests from the dependent patch on above is included in the refactor suite (from you or gibi)?
14:18:02 <efried> Yeah, given how we're not doing shared RPs for Q, we sure have spent a lot of time on it...
14:18:34 <efried> jaypipes You talking about https://review.openstack.org/#/c/480379/22/nova/tests/functional/db/test_resource_provider.py ?
14:18:50 <jaypipes> yeah
14:18:57 <efried> jaypipes Will do.
14:19:37 <alex_xu> For this patch https://review.openstack.org/517119, we still needs to ensure the traits work well for shared case, right?
14:20:14 <jaypipes> gawd, maybe I was confusing the generic device manager with n-r-p w.r.t priorities in Queens. guh, I must be getting old. :(
14:20:36 <jaypipes> either that or everything is running into everything else in my mind at this point.
14:20:48 <cdent> it does all run together
14:20:48 <efried> I feel like we still have some architectural work to do on sharing + traits.
14:20:57 <edleafe> jaypipes: that's the definition of getting old
14:21:01 * edleafe knows
14:21:09 <efried> In general, we should be looking for opportunities to defer work on sharing RPs.
14:21:26 <efried> Which we haven't been doing especially well, tbh. I'm probably the most guilty.
14:21:26 * cdent concurs
14:21:52 <cdent> when it feature freeze?
14:21:54 <cdent> s/it/is/
14:22:07 <bauzas> Milestone-3
14:22:20 <cdent> what is the date of Milestone-3 ?
14:22:26 <alex_xu> Jan 22
14:22:31 <alex_xu> #link https://releases.openstack.org/queens/schedule.html
14:22:33 <bauzas> https://releases.openstack.org/queens/schedule.html
14:22:37 <bauzas> dammit, burned
14:22:49 <cdent> thanks was nearly there
14:22:50 <jaypipes> so basically 6 weeks, if you count western holiday week a goner.
14:23:07 <bauzas> correct
14:23:12 <bauzas> R-5
14:23:25 <bauzas> and we are R-13
14:23:51 <efried> Jan 25 according to https://wiki.openstack.org/wiki/Nova/Queens_Release_Schedule -- a whole three extra days!
14:24:10 <bauzas> yeah, because we do that by Thursdays
14:24:16 <efried> phew
14:24:20 <bauzas> some projects do that by Tuesdays
14:24:37 <bauzas> but the relmanagement team accepts all the milestones during those 3 days
14:24:54 <bauzas> hope that's clearer
14:25:07 <bauzas> the real crux is : do we still need to discuss design problems now ?
14:25:21 <efried> I really hope not.
14:25:40 <bauzas> tbh, n-r-p is the top prio
14:25:41 <efried> Hopefully the design discussions are minor enough to be contained within reviews.
14:25:48 <bauzas> if we need to discuss problems during the implementation
14:25:53 <efried> E.g. the update_provider_tree stuff
14:25:56 <bauzas> yeah
14:26:02 <bauzas> that's my point
14:26:17 <efried> So far that effort has been doing well on that count.
14:26:18 <bauzas> we had a long series of discussions when we reviewed the scheduler claims
14:26:40 <bauzas> if we need to discuss like we did for that, then huh
14:26:51 <efried> Course it was pretty straightforward when it was just me & jaypipes having the discussions :)
14:28:08 <jaypipes> efried: I've always maintained that there will need to be kinks worked out once real clients were using the n-r-p stuff. and now that XenAPI has a proposed patch up, that's shaking that implementation detail tree a bit. which is a good thing.
14:28:20 <efried> ++
14:28:35 <edleafe> all the more reason to merge and then update in subsequent patches
14:28:41 <efried> +++
14:28:46 * cdent concurs
14:29:06 * efried looks around...
14:29:20 <efried> There's three cores in attendance, and two of them are authors of the patches in question.
14:29:24 <bauzas> jaypipes: yeah, the Xen team is having 3 people working on the VGPU feature, so they can help
14:29:31 <bauzas> (while only me for libvirt :p )
14:29:36 <edleafe> efried: we can nag the others later in -nova
14:29:59 <efried> Yuh, we need to socialize our "official" review push week to the likes of dansmith, stephenfin, etc.
14:30:06 <bauzas> efried: here I definitely want to help but struggle with time
14:30:20 <bauzas> NFV FTW
14:30:20 * gibi can help with some of the reviews in n-r-p
14:30:21 <efried> Right. It can't all be on the shoulders of one or two cores.
14:30:51 <edleafe> We seem to agree on this. Anything else to discuss?
14:31:05 * edleafe has to dig out from a ton of backlog emails
14:32:10 <cdent> my pending placement stuff is pretty ready for whoever would like to look
14:32:18 <jaypipes> cdent: link pls
14:32:42 <cdent> #link some gabbit clean up https://review.openstack.org/#/c/513057/
14:32:57 <cdent> #link symmetric get and put and post to /allocations: https://review.openstack.org/#/c/510626/
14:33:14 <cdent> #link cache headers and requisite changes to objects: https://review.openstack.org/#/c/521639/
14:33:33 <jaypipes> danke
14:33:35 <cdent> #link fixup to get proper formatted errors: https://review.openstack.org/#/c/518223/
14:34:56 <cdent> in other news I’ve started messing around with a containered placement service to see what breaks when it is isolated and/or used in parallel. one fun bug so far:
14:35:22 <cdent> #link config for middleware bug: https://bugs.launchpad.net/nova/+bug/1734491
14:35:22 <openstack> Launchpad bug 1734491 in OpenStack Compute (nova) "placement keystonemiddleware_authtoken ignores OS_PLACEMENT_CONFIG_DIR" [Undecided,In progress] - Assigned to Chris Dent (cdent)
14:35:51 <cdent> Also there’s a new bug related to request ids, requiring some clean up on nova side:
14:35:59 <cdent> #link request id bug: https://bugs.launchpad.net/nova/+bug/1734625
14:36:00 <openstack> Launchpad bug 1734625 in OpenStack Compute (nova) "placement: Request IDs are not passed to placement service" [Undecided,In progress] - Assigned to Takashi NATSUME (natsume-takashi)
14:36:32 <jaypipes> cool, thanks for investigating that cdent
14:36:59 <edleafe> why would placement care about nova-isms like request id?
14:37:14 <efried> request IDs are openstack-isms now.
14:37:19 <cdent> edleafe: see: https://review.openstack.org/#/c/523007/
14:37:34 <mriedem> this is a bug that's affecting the gate, which only efried and i have looked at so far i think https://bugs.launchpad.net/nova/+bug/1731668
14:37:34 <openstack> Launchpad bug 1731668 in OpenStack Compute (nova) "placement: claim allocations fails with IndexError in _ensure_lookup_table_entry" [High,Confirmed]
14:37:34 <cdent> the incoming request id ought to be respected in the logs
14:38:49 <cdent> mriedem, jaypipes : is that code subject to the big refactoring that jay and eric have done?
14:39:20 <efried> cdent Which, the bug?
14:39:22 <mriedem> there was a refactor that hit this code recently
14:39:35 <cdent> yeah, the ensure look up table
14:39:36 <mriedem> note the bug is a few weeks old at this point
14:41:29 <jaypipes> mriedem: no, I don't think that code was touched.
14:43:02 * efried concurs
14:43:29 <jaypipes> mriedem: I will take that bug
14:43:32 <mriedem> it may be a latent bug, idk
14:43:33 <efried> The refactor was all about GET /allocation_candidates. That bug is in stuff that touches project/user IDs.
14:43:34 * cdent writes efried a cheque
14:43:49 * efried makes fun of cdent's British spelling of cheque.
14:43:55 <jaypipes> mriedem: yeah, I think I know what it is.
14:44:24 <efried> jaypipes Note the "related fix" in the bug report - some discussion of the race condition there (and a possible "fix")
14:44:27 <jaypipes> mriedem: we're not catching the right exception and it's dropping out of the except block without running that fetchall() again.
14:44:37 <efried> Ah, or that.
14:44:59 <jaypipes> efried: almost guarantee we're getting a DBDeadlock error, not a duplicate error.
14:45:05 <jaypipes> efried: and that's causing this.
14:45:06 <efried> cool
14:45:27 <efried> Yeah, cause that race should be way too hard to hit for the frequency with which we're seeing the error.
14:46:33 <efried> mriedem jaypipes I could put up an investigative patch that logs the exception.
14:46:45 <jaypipes> efried: I'll do it.
14:46:48 <efried> ack
14:48:32 <jaypipes> I think we just need to catch sa.exc.IntegrityError instead... but I'll check it out.
14:48:32 <edleafe> ok, anything else?
14:49:37 <efried> Who's got the #action to socialize the review push?
14:50:09 * edleafe thinks efried just volunteered
14:50:28 * efried thinks it should be someone with more clout.
14:52:46 <efried> I can compose an email for the dev list
14:52:53 * edleafe thinks it should be someone with social skills, leaving himself out
14:54:11 <edleafe> #action efried to socialize review priorities to other nova cores
14:54:23 <efried> ack
14:54:29 <edleafe> Anything else? Or shall we call it a day?
14:54:35 <cdent> call it
14:54:46 * edleafe just confused non-native English speakers
14:55:00 <edleafe> OK, thanks everyone!
14:55:03 <edleafe> #endmeeting
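Bug 1731668 concerns a "get or create" lookup-table helper. The helper first SELECTs a row and INSERTs it if missing. If the INSERT loses a race, it is expected to re-read the row. The sketch below shows that general pattern only. It is not nova's `_ensure_lookup_table_entry`, and the `projects` table, its columns, and the helper names are hypothetical.

It uses the stdlib `sqlite3` module. Here `sqlite3.IntegrityError` stands in for the SQLAlchemy `sa.exc.IntegrityError` that jaypipes suggests catching. It illustrates the two points raised in the meeting:

- The `except` clause must name the exception class the driver actually raises when the INSERT collides. Otherwise the re-read never runs.
- Indexing an empty `fetchall()` result raises `IndexError`.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects ("
        "id INTEGER PRIMARY KEY, external_id TEXT NOT NULL UNIQUE)")
    return conn


def ensure_lookup_entry(conn: sqlite3.Connection, external_id: str) -> int:
    """Return the internal id for external_id, inserting a row if needed.

    Hypothetical get-or-create helper; table and column names are
    illustrative, not nova's schema.
    """
    select_sql = "SELECT id FROM projects WHERE external_id = ?"
    rows = conn.execute(select_sql, (external_id,)).fetchall()
    if rows:
        return rows[0][0]
    try:
        # Another writer may insert the same external_id between the SELECT
        # above and this INSERT; the UNIQUE constraint then rejects ours.
        cur = conn.execute(
            "INSERT INTO projects (external_id) VALUES (?)", (external_id,))
        conn.commit()
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Lost the race: discard our failed transaction and re-read the row
        # the other writer created, instead of indexing the earlier, empty
        # result (which would raise IndexError).
        conn.rollback()
        rows = conn.execute(select_sql, (external_id,)).fetchall()
        if not rows:
            raise LookupError(f"no row for {external_id!r} after retry")
        return rows[0][0]
```

Re-reading after the rollback, rather than retrying the INSERT, keeps the helper idempotent: every caller ends up with the id of the single row that won. If the driver raises some other class on a collision (for example a deadlock error), this `except` clause will not catch it, and the re-read is skipped.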