14:00:10 #startmeeting nova_scheduler
14:00:11 Meeting started Mon Nov 27 14:00:10 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 The meeting name has been set to 'nova_scheduler'
14:00:14 o/
14:00:18 \o
14:00:18 o/
14:00:19 o/
14:00:25 Good UGT morning! Who's here?
14:00:28 o/
14:00:54 o/
14:01:42 Over the week I learned that jaypipes is running for the senate
14:01:52 quoi?
14:01:58 “child bride"
14:02:06 ahahaha
14:02:17 * edleafe missed the reference
14:02:21 * efried too
14:02:31 over the weekend I learned edleafe got really high.
14:02:39 that I did!
14:02:40 https://twitter.com/jaypipes/status/934490883478773762
14:03:04 jaypipes: you little devil
14:03:07 * bauzas waves
14:03:26 so... on to scheduler things :)
14:03:35 yeah
14:03:48 #topic Tell me what's going on
14:04:01 #link what’s going on http://lists.openstack.org/pipermail/openstack-dev/2017-November/124886.html
14:04:10 I got back from 10 days away from the 'puter, and have nothing prepared
14:04:26 So raise your hand if you want to discuss something
14:04:45 I think the topic accumulated_nits accidentally got sprayed around a whole series by git restack.
14:04:48 mahbad
14:05:01 Most of that should have been bp/nested-resource-providers probably.
14:05:13 uh, no.
14:05:27 but not accumulated_nits
14:05:57 Anyway, IMO the most pressing need right now is getting core reviews on the n-r-p series and starting to get that merged.
14:06:20 It's going to be prerequisite to a couple of other critical things.
14:06:26 i agree!
14:06:39 (damn, I just missed an “I concur” opportunity)
14:06:40 yeah, these long-running series become a pain when they get stale
14:06:49 I concur!
14:06:57 that and the refactor series which also has a bunch of func tests and traits handling
14:06:59 well, I'll look
14:07:29 #link bottom of the n-r-p series: https://review.openstack.org/#/c/377138/
14:07:39 (well, once I'll done with my internal problems :p )
14:07:50 jaypipes ++
14:08:11 I haven't looked at the alternate host series yet, but I assume it's still being held back while details are being picked over
14:08:32 * edleafe probably has several more weeks of rebase fun with that
14:08:37 #link current bottom of the refactor series: https://review.openstack.org/#/c/516782/11
14:09:02 Actually ^ is the last patch in the refactor series proper.
14:09:09 But it's anchoring a pile of other stuff.
14:09:20 including tests and traits handling, as jaypipes said.
14:10:29 I think the biggest to-do on top of the n-r-p series is getting n-r-p affordance for allocation candidates.
14:10:59 * cdent concurs
14:11:01 jaypipes or alex_xu Do you have that cued up anywhere (locally) yet?
14:11:13 efried: yes, and then settling on the update_provider_tree() implementation
14:11:27 Are we still aiming for "merge shit early" like we said at PTG? It sure doesn't feel like it
14:11:40 i don't have anything
14:11:44 edleafe: we've merged quite a bit so far, actually.
14:11:56 Agree, though the summit + thanksgiving really put a crimp in things.
14:12:03 edleafe: personally, I'd like to see the alternate hosts stuff make progress this week.
14:12:14 here comes christmas too :(
14:12:17 yeah
14:12:28 jaypipes: it isn't as critical path as n-r-p, though
14:13:04 So I'll follow edleafe's "merge shit early" with: The next 2-3 weeks we should have some kind of official focussed review-and-merge-shit push.
14:13:14 edleafe: it's still priority over n-r-p, though, according to my last recollection of priorities (alternate hosts, traits handling and move operation cleanup)
14:13:43 jaypipes I didn't think those three were in any particular priority order.
14:13:43 efried: let's make it official then.
14:14:03 efried: yeah, I didn't get a sense of ordering either
14:14:05 efried: they're not. they're just the 3 priorities. n-r-p isn't actually a priority for Queens...
14:14:50 that’s not my recollection? I thought we said at ptg that nrp was pre-req for traits and RT cleanup
14:15:09 or was that just “before shared”?
14:15:15 cdent: right. before shared.
14:15:38 * jaypipes notes that shared stuff has been the source of most bugs in the allocation candidates code so far...
14:15:47 in any case, it all is related.
14:15:58 so let's make this week the official "push this shit" week.
14:16:05 https://etherpad.openstack.org/p/nova-ptg-queens-placement L47-55
14:16:19 * cdent wonders if we can make every week officially push this shit?
14:16:21 Right - "shared" was deferred out of Q.
14:16:28 cdent Baby steps
14:16:53 should I rebase this patch https://review.openstack.org/480379?
14:17:34 alex_xu I would say that's probably a lower priority than reviews, if you had to pick one or the other.
14:17:39 Though you've been pretty well on top of reviews.
14:17:54 efried: can you check to see if alex_xu's tests from the dependent patch on above is included in the refactor suite (from you or gibi)?
14:18:02 Yeah, given how we're not doing shared RPs for Q, we sure have spent a lot of time on it...
14:18:34 jaypipes You talking about https://review.openstack.org/#/c/480379/22/nova/tests/functional/db/test_resource_provider.py ?
14:18:50 yeah
14:18:57 jaypipes Will do.
14:19:37 For this patch https://review.openstack.org/517119, we still needs to ensure the traits work well for shared case, right?
14:20:14 gawd, maybe I was confusing the generic device manager with n-r-p w.r.t priorities in Queens. guh, I must be getting old. :(
14:20:36 either that or everything is running into everything else in my mind at this point.
14:20:48 it does all run together
14:20:48 I feel like we still have some architectural work to do on sharing + traits.
14:20:57 jaypipes: that's the definition of getting old
14:21:01 * edleafe knows
14:21:09 In general, we should be looking for opportunities to defer work on sharing RPs.
14:21:26 Which we haven't been doing especially well, tbh. I'm probably the most guilty.
14:21:26 * cdent concurs
14:21:52 when it feature freeze?
14:21:54 s/it/is/
14:22:07 Milestone-3
14:22:20 what is the date of Milestone-3 ?
14:22:26 Jan 22
14:22:31 #link https://releases.openstack.org/queens/schedule.html
14:22:33 https://releases.openstack.org/queens/schedule.html
14:22:37 dammit, burned
14:22:49 thanks was nearly there
14:22:50 so basically 6 weeks, if you count western holiday week a goner.
14:23:07 correct
14:23:12 R-5
14:23:25 and we are R-13
14:23:51 Jan 25 according to https://wiki.openstack.org/wiki/Nova/Queens_Release_Schedule -- a whole three extra days!
14:24:10 yeah, because we do that by Thursdays
14:24:16 phew
14:24:20 some projects do that by Tuesdays
14:24:37 but the relmanagement team accepts all the milestones during those 3 days
14:24:54 hope that's clearer
14:25:07 the real crux is : do we still need to discuss design problems now ?
14:25:21 I really hope not.
14:25:40 tbh, n-r-p is the top prio
14:25:41 Hopefully the design discussions are minor enough to be contained within reviews.
14:25:48 if we need to discuss problems during the implementation
14:25:53 E.g. the update_provider_tree stuff
14:25:56 yeah
14:26:02 that's my point
14:26:17 So far that effort has been doing well on that count.
14:26:18 we had a long series of discussions when we reviewed the scheduler claims
14:26:40 if we need to discuss like we did for that, then huh
14:26:51 Course it was pretty straightforward when it was just me & jaypipes having the discussions :)
14:28:08 efried: I've always maintained that there will need to be kinks worked out once real clients were using the n-r-p stuff. and now that XenAPI has a proposed patch up, that's shaking that implementation detail tree a bit. which is a good thing.
14:28:20 ++
14:28:35 all the more reason to merge and then update in subsequent patches
14:28:41 +++
14:28:46 * cdent concurs
14:29:06 * efried looks around...
14:29:20 There's three cores in attendance, and two of them are authors of the patches in question.
14:29:24 jaypipes: yeah, the Xen team is having 3 people working on the VGPU feature, so they can help
14:29:31 (while only me for libvirt :p )
14:29:36 efried: we can nag the others later in -nova
14:29:59 Yuh, we need to socialize our "official" review push week to the likes of dansmith, stephenfin, etc.
14:30:06 efried: here I definitely want to help but struggle with time
14:30:20 NFV FTW
14:30:20 * gibi can help with some of the reviews in n-r-p
14:30:21 Right. It can't all be on the shoulders of one or two cores.
14:30:51 We seem to agree on this. Anything else to discuss?
14:31:05 * edleafe has to dig out from a ton of backlog emails
14:32:10 my pending placement stuff is pretty ready for whoever would like to look
14:32:18 cdent: link pls
14:32:42 #link some gabbit clean up https://review.openstack.org/#/c/513057/
14:32:57 #link symmetric get and put and post to /allocations: https://review.openstack.org/#/c/510626/
14:33:14 #link cache headers and requisite changes to objects: https://review.openstack.org/#/c/521639/
14:33:33 danke
14:33:35 #link fixup to get proper formatted errors: https://review.openstack.org/#/c/518223/
14:34:56 in other news I’ve started messing around with a containered placement service to see what breaks when it is isolated and/or used in parallel. one fun bug so far:
14:35:22 #link config for middleware bug: https://bugs.launchpad.net/nova/+bug/1734491
14:35:22 Launchpad bug 1734491 in OpenStack Compute (nova) "placement keystonemiddleware_authtoken ignores OS_PLACEMENT_CONFIG_DIR" [Undecided,In progress] - Assigned to Chris Dent (cdent)
14:35:51 Also there’s a new bug related to request ids, requiring some clean up on nova side:
14:35:59 #link request id bug: https://bugs.launchpad.net/nova/+bug/1734625
14:36:00 Launchpad bug 1734625 in OpenStack Compute (nova) "placement: Request IDs are not passed to placement service" [Undecided,In progress] - Assigned to Takashi NATSUME (natsume-takashi)
14:36:32 cool, thanks for investigating that cdent
14:36:59 why would placement care about nova-isms like request id?
14:37:14 request IDs are openstack-isms now.
14:37:19 edleafe: see: https://review.openstack.org/#/c/523007/
14:37:34 this is a bug that's affecting the gate, which only efried and i have looked at so far i think https://bugs.launchpad.net/nova/+bug/1731668
14:37:34 Launchpad bug 1731668 in OpenStack Compute (nova) "placement: claim allocations fails with IndexError in _ensure_lookup_table_entry" [High,Confirmed]
14:37:34 the incoming request id ought to be respected in the logs
14:38:49 mriedem, jaypipes : is that code subject to the big refactoring that jay and eric have done?
14:39:20 cdent Which, the bug?
14:39:22 there was a refactor that hit this code recently
14:39:35 yeah, the ensure look up table
14:39:36 note the bug is a few weeks old at this point
14:41:29 mriedem: no, I don't think that code was touched.
14:43:02 * efried concurs
14:43:29 mriedem: I will take that bug
14:43:32 it may be a latent bug, idk
14:43:33 The refactor was all about GET /allocation_candidates. That bug is in stuff that touches project/user IDs.
14:43:34 * cdent writes efried a cheque
14:43:49 * efried makes fun of cdent's British spelling of cheque.
14:43:55 mriedem: yeah, I think I know what it is.
14:44:24 jaypipes Note the "related fix" in the bug report - some discussion of the race condition there (and a possible "fix")
14:44:27 mriedem: we're not catching the right exception and it's dropping out of the except block without running that fetchall() again.
14:44:37 Ah, or that.
14:44:59 efried: almost guarantee we're getting a DBDeadlock error, not a duplicate error.
14:45:05 efried: and that's causing this.
14:45:06 cool
14:45:27 Yeah, cause that race should be way too hard to hit for the frequency with which we're seeing the error.
14:46:33 mriedem jaypipes I could put up an investigative patch that logs the exception.
14:46:45 efried: I'll do it.
14:46:48 ack
14:48:32 I think we just need to catch sa.exc.IntegrityError instead... but I'll check it out.
14:48:32 ok, anything else?
14:49:37 Who's got the #action to socialize the review push?
14:50:09 * edleafe thinks efried just volunteered
14:50:28 * efried thinks it should be someone with more clout.
14:52:46 I can compose an email for the dev list
14:52:53 * edleafe thinks it should be someone with social skills, leaving himself out
14:54:11 #action efried to socialize review priorities to other nova cores
14:54:23 ack
14:54:29 Anything else? Or shall we call it a day?
14:54:35 call it
14:54:46 * edleafe just confused non-native English speakers
14:55:00 OK, thanks everyone!
14:55:03 #endmeeting
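
Illustration for the 14:43-14:48 exchange on bug 1731668. This is a minimal sketch of the general get-or-create pattern being described there, not nova's actual _ensure_lookup_table_entry code. It assumes a hypothetical "projects" table with a unique external_id column and uses the standard-library sqlite3 module in place of SQLAlchemy and oslo.db. The point it shows: if the INSERT loses a race to another writer, the SELECT has to be run again before the result is indexed, otherwise the lookup works on an empty result. The before_insert hook is a test-only assumption that simulates the concurrent writer. A DBDeadlock, as suspected at 14:44:59, would presumably need a retry of the whole operation instead, which this sketch does not model.

```python
import sqlite3

# Hypothetical table and column names, chosen only for this illustration.
SELECT_SQL = "SELECT id FROM projects WHERE external_id = ?"
INSERT_SQL = "INSERT INTO projects (external_id) VALUES (?)"


def make_db():
    """Create an in-memory database with a unique external_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects ("
        "id INTEGER PRIMARY KEY, "
        "external_id TEXT NOT NULL UNIQUE)"
    )
    return conn


def ensure_lookup_entry(conn, external_id, before_insert=None):
    """Return the internal id for external_id, creating the row if missing."""
    row = conn.execute(SELECT_SQL, (external_id,)).fetchone()
    if row is None:
        if before_insert is not None:
            # Test-only hook: lets a caller simulate a concurrent writer.
            before_insert()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(INSERT_SQL, (external_id,))
        except sqlite3.IntegrityError:
            # Another writer created the row first; fall through and re-read.
            pass
        # Re-run the SELECT on both paths so the result is never empty here.
        row = conn.execute(SELECT_SQL, (external_id,)).fetchone()
    return row[0]
```

Usage: call make_db() once, then ensure_lookup_entry(conn, "some-project-id") as often as needed; repeated calls with the same external id return the same internal id.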