14:00:04 #startmeeting nova
14:00:06 Meeting started Thu May 3 14:00:04 2018 UTC and is due to finish in 60 minutes. The chair is melwitt. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:11 The meeting name has been set to 'nova'
14:00:32 o/
14:00:32 hi everyone
14:00:33 o/
14:00:33 o/
14:00:36 o/
14:00:38 o/
14:01:03 ō/
14:01:14 #topic Release News
14:01:20 #link Rocky release schedule: https://wiki.openstack.org/wiki/Nova/Rocky_Release_Schedule
14:01:34 we have the summit coming up in a few weeks
14:02:06 we have some nova-related topics approved for the forum
14:02:12 o/
14:02:23 the forum schedule has been sent to the dev ML but I don't have a link handy
14:02:23 \o
14:02:52 but do check that out to see what we have coming up at the forum
14:02:54 #link Rocky review runways: https://etherpad.openstack.org/p/nova-runways-rocky
14:03:03 current runways:
14:03:11 #link runway #1: XenAPI: Support a new image handler for non-FS based SRs [END DATE: 2018-05-11] series starting at https://review.openstack.org/#/c/497201
14:03:13 patch 497201 - nova - XenAPI: deprecate the config for image handler cla...
14:03:26 #link runway #2: Add z/VM driver [END DATE: 2018-05-15] spec amendment needed at https://review.openstack.org/562154 and implementation starting at https://review.openstack.org/523387
14:03:27 patch 562154 - nova-specs - Add additional information for z/VM spec.
14:03:28 patch 523387 - nova - z/VM Driver: Initial change set of z/VM driver
14:03:34 #link runway #3: Local disk serial numbers [END DATE: 2018-05-16] series starting at https://review.openstack.org/526346
14:03:34 patch 526346 - nova - Give volume DriverBlockDevice classes a common prefix
14:04:21 please take some time to review the patches in runways as a priority ^
14:04:46 anything else for release news or runways?
14:04:49 * bauzas waves
14:05:42 #topic Bugs (stuck/critical)
14:05:55 no critical bugs
14:06:04 #link 31 new untriaged bugs (same since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:06:27 the untriaged bug count hasn't gone up, so thanks to everyone who's been helping with triage
14:06:38 #link 3 untagged untriaged bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:06:45 #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
14:07:02 please lend a helping hand with bug triage if you can. this is a good how-to guide ^
14:07:03 * johnthetubaguy nods at the XenAPI runway item
14:07:27 yep, that's a good one for you :)
14:07:37 Gate status:
14:07:42 #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
14:08:05 gate's been ... good I think
14:08:14 3rd party CI:
14:08:16 except ceph!
14:08:21 yup, I saw
14:08:26 some timeouts too
14:08:52 yes, except ceph. but there's a patch for that, there's been a cinder API change that has caused a call that used to work to fail
14:09:13 oh that merged
14:09:16 the cinder one
14:09:27 it actually affects more than ceph in that any delete of a BFV instance would fail to detach the volume because of it
14:09:31 oh, good
14:09:51 powervm CI was recently updated to use a queens undercloud, and is now experiencing connection issues. Working to get that fixed
14:10:05 thanks for the heads up edmondsw
14:10:13 #link 3rd party CI status http://ci-watch.tintri.com/project?project=nova&time=7+days
14:10:32 virtuozzo CI has been broken for awhile and now it's returning a 404 on its test result links
14:11:26 I sent a message to the dev ML asking if anyone from virtuozzo could reply and nothing so far
14:11:55 anything else for bugs, gate status, or third party CI?
14:12:23 PowerVM CI has been hitting connection issues from jenkins to nodepool nodes
14:12:39 https://wiki.jenkins.io/display/JENKINS/Remoting+issue
14:12:44 if anyone has ideas on that...
14:12:49 we're all ears :)
14:12:58 Causing pretty much all runs to fail at the moment
14:13:04 ask in -infra
14:13:09 yeah
14:14:03 topic #Reminders
14:14:10 #topic Reminders
14:14:19 #link Rocky Review Priorities https://etherpad.openstack.org/p/rocky-nova-priorities-tracking
14:14:34 subteam and bug etherpad ^ it is there
14:14:50 does anyone have any other reminders to highlight?
14:16:04 #topic Stable branch status
14:16:11 #link stable/queens: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens,n,z
14:16:26 there have been a lot of backports proposed
14:16:46 stable cores, please take a look at some reviews when you can
14:16:53 #link stable/pike: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/pike,n,z
14:17:05 same for pike, the list is growing
14:17:11 #link stable/ocata: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/ocata,n,z
14:17:23 and ocata doesn't have too many
14:17:38 anything else for stable branch status?
14:17:53 ocata should taper off
14:17:56 oh, also we released ocata 15.1.1 recently
14:18:31 #link ocata 15.1.1 released on 2018-05-02 https://review.openstack.org/564044
14:18:32 patch 564044 - releases - nova: release ocata 15.1.1 (MERGED)
14:18:52 #topic Subteam Highlights
14:18:56 no
14:19:01 lol
14:19:24 yeah, so for cells v2 we skipped the meeting again bc we didn't need to have a meeting
14:19:36 because we're efficient like that
14:19:48 :P
14:19:53 i'm following up on a bug with tssurya in -nova
14:20:01 sounds like cern is upgrading and putting out fires, recap to come
14:20:22 plus a talk in vancouver about their cells v2 upgrade
14:20:23 right?
14:20:27 Yes
14:20:36 yep. CERN peeps are in the middle of an upgrade to multi-cell cells v2
14:20:42 exciting stuff
14:20:45 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/20667/moving-from-cellsv1-to-cellsv2-at-cern
14:20:49 Totally :)
14:21:14 really looking forward to that talk
14:21:23 be careful what you wish for
14:21:30 might be a bunch of "the nova team sucks because" talk
14:21:32 exactly!
14:21:37 heh
14:21:43 "nova cellsv2 ate the higgs boson data"
14:21:43 :(
14:21:43 etc
14:21:49 Hehe ,
14:22:15 okay, anything else for cells?
14:22:45 * bauzas needs to disappear for babytaxiing
14:22:49 edleafe: scheduler?
14:22:54 Clarified the status of Nested Resource Providers for the Cyborg team. They were under the impression that since jaypipes's patch for handling requests for resources when nested providers are involved had merged, that NRP was functional.
14:22:58 #link Resource Requests for Nested Providers https://review.openstack.org/#/c/554529/
14:22:59 patch 554529 - nova - placement: resource requests for nested providers (MERGED)
14:23:01 Discussed the two approaches I had proposed for the consumer generation issue. No decision was made, which eventually didn't matter, since jaypipes ended up redoing the entire series on his own.
14:23:05 Discussed the bug reported by the CERN folks that the association_refresh should be configurable. The current hard-coded interval of 300 seconds was a huge drag on their deployment. When they lengthened the interval, they saw a "big time improvement".
14:23:09 We discussed whether it was OK to backport a config option to Queens. Decided that it would be ok as long as the default for the option didn't change behavior.
14:23:12 That's it.
14:23:32 that change is merged on master,
14:23:34 needs a backport
14:23:50 oh nvm https://review.openstack.org/#/c/565526/
14:23:50 patch 565526 - nova - Make association_refresh configurable
14:24:20 mriedem : will out up the backport soon
14:24:26 Put*
14:24:41 okay, so needs another +2
14:24:48 yeah i'll review it after the meeting
14:24:54 coolness
14:24:56 Thanks!
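For context on the change discussed above: https://review.openstack.org/#/c/565526/ turns the previously hard-coded 300-second refresh of cached resource provider aggregate and trait associations into a config option. A minimal sketch of what relaxing it could look like in nova.conf on a compute node; the section and option name are as best understood from that review and should be confirmed against the merged patch, and 1800 is only an illustrative value:

    [compute]
    # How often (in seconds) the resource tracker re-fetches aggregate and
    # trait associations for its resource providers from placement.
    # 300 matches the old hard-coded behavior; larger values reduce
    # placement load on large deployments such as CERN's.
    resource_provider_association_refresh = 1800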
14:25:09 edleafe: cool. I was thinking about the cyborg stuff yesterday and was wondering where things are. is it that once NRP is functional, they can start the implementation on their side?
14:25:48 melwitt: they are starting the implementation already. We just had to explain that we aren't able to return nested RPs yet
14:26:03 and that we probably won't be able to in Rocky
14:26:13 return from what? GET /allocation_candidates?
14:26:20 yes
14:26:26 isn't that tetsuro's spec?
14:26:31 yes again
14:26:40 so why won't we be able to in rocky?
14:26:43 just too much other stuff?
14:27:09 that was the status I got - it *might* make it in, but that it isn't a sure thing
14:27:22 tetsuro's patches are coming along nicely...
14:27:24 okay, so is it that returning nested RPs is required for them to complete the implementation on their side? trying to get an idea of when we can expect the FPGA stuff to be live
14:27:51 they need ^ to schedule in nova properly
14:27:56 melwitt: they can code to the specs now
14:28:11 Yeah, the series is up and ready for review
14:28:16 I've got +2s on the first three or four patches.
14:28:18 it just won't work until the rest is merged
14:28:20 okay, so nothing is blocking them and once they're done, then the last piece will be the nova integration part
14:28:26 The final patch (with the microversion) is proposed now.
14:28:31 melwitt: cyborg still needs to implement their update_provider_tree() work. we still need to implement the scheduler side of things (which is the GET /allocation_candidates stuff tetsuro is working on and the granular request stuff efried is working on, both of which have active reviews for patches)
14:29:01 jaypipes: yes, the point was that they can start working on it. We aren't blocking them
14:29:06 I wasn't sure about it on Monday, because that one wasn't there, but now that it's up, I'm confident we can make Rocky with this stuff.
14:29:14 edleafe: correct, no disagreement from me on that
14:29:19 kewl
14:29:21 so cyborg puts fpga traits or providers in the tree, and then you can request that via flavor extra specs using granular request syntax right?
14:29:43 yup
14:30:10 all sounds good, thanks y'all
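To illustrate the request side mentioned just above: once a driver exposes accelerator traits on a (nested) resource provider, a flavor asks for them through extra specs. A rough sketch assuming the plain trait syntax plus the numbered-group granular syntax that was still under review at the time; the trait name CUSTOM_FPGA_XILINX, the FPGA resource class, and the flavor name are illustrative placeholders, not names from the meeting:

    # Require the trait anywhere in the allocation (plain syntax):
    openstack flavor set fpga.small --property trait:CUSTOM_FPGA_XILINX=required

    # Granular syntax: request one unit of an FPGA-ish resource class and
    # require the trait on the same provider that supplies it:
    openstack flavor set fpga.small \
      --property resources1:FPGA=1 \
      --property trait1:CUSTOM_FPGA_XILINX=required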
14:30:36 last on subteams, notifications, there are no notes from gibi in the agenda
14:30:47 No notification meeting this week due to public holiday
14:30:56 There will be a meeting next week and I will try to send a status mail as well.
14:31:05 ah, right. I thought I knew that and then forgot
14:31:10 cool gibi
14:31:26 anything else for subteams?
14:31:50 #topic Stuck Reviews
14:32:02 there's one thing in the agenda, the rebuild thing again
14:32:21 http://lists.openstack.org/pipermail/openstack-dev/2018-April/129726.html
14:32:42 I thought I saw chatter last night, we're going with the "check allocations to find our RPs and compare those traits" approach, yes?
14:32:51 there's been more replies to the ML thread
14:33:22 yeah, I mean, I don't think there's any other "right" way to handle it. that's just MHO
14:34:10 the other option was to just punt in the API and only allow old image-traits == new image-traits on a rebuild, or rather kick out any request that has image-traits that did not exist in the original image
14:34:52 and arvind isn't here
14:35:09 is this really stuck?
14:35:12 by the definition?
14:35:16 yeah i think it is
14:35:29 it's been a while since I looked,
14:35:35 but what is the blocking disagreement?
14:35:48 how to handle rebuild with a new image that has required traits
14:35:54 whether or not to ask placement to verify the image-traits on a rebuild request
14:35:56 it's in that ML thread
14:36:05 I've been ignoring it
14:36:27 the question is whether or not we block rebuild to same host if the traits have changed?
14:36:28 my opinion is, we have to ask placement if we want to check whether the image-traits are okay
14:37:05 should it just match live-migrate kind of things, where the host is specified?
14:37:13 and the debate is whether we should check them with placement or just do a flavor compare, old vs new and only allow same traits. and not ask placement anything
14:37:23 ah
14:37:39 just comparing flavors is way too naive I think
14:37:40 s/flavor/image/
14:37:48 image, sorry
14:37:49 that'll work in a contrived functional test,
14:37:56 but not in real life I think
14:38:19 melwitt: yeah I know, same deal.. "without asking placement" I mean
14:38:38 right the current proposal in the spec from arvind is the api checks if the new image for rebuild has traits which are a subset of the original image traits used to create the instance
14:38:45 which might or might not work with the current state of the host,
14:38:53 assuming no external service has changed the traits on the compute node provider
14:38:58 I can't imagine that working the way you expect in practice
14:39:31 i think this is basically the same thing as running the ImagePropertiesFilter on rebuild for a new image
14:39:33 and it artificially limits a rebuild if the compute host has trait A and B and you originally booted with trait A and want to rebuild with trait B and then it says NO
14:39:35 i don't really see how this is different
14:39:50 yup, agreed, I see it the same way
14:39:56 I shall commentificate upon the threadage and reviewage
14:40:03 thanks dansmith
14:40:34 okay, any other comments on that before we move to open discussion?
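To make the trade-off debated above concrete, here is a rough sketch of the two checks in illustrative Python; this is not nova code and the argument names are invented for the example. Option A is the subset comparison from the spec proposal; Option B checks the new image's required traits against the traits of the providers the instance is already allocated to, i.e. the "ask placement" approach:

    def rebuild_trait_checks(new_image_traits, original_image_traits,
                             current_provider_traits):
        """Compare the two proposed ways to validate required image traits
        on rebuild. All arguments are sets of trait name strings.
        """
        # Option A (spec proposal): only accept the rebuild if the new image
        # requires no traits beyond those of the original image. This rejects
        # e.g. rebuilding with trait B on a host that exposes both A and B
        # but whose instance was originally booted requiring only A.
        ok_by_image_subset = new_image_traits <= original_image_traits

        # Option B (ask placement): accept the rebuild if the providers the
        # instance is currently allocated against expose every trait the new
        # image requires.
        ok_by_provider_check = new_image_traits <= current_provider_traits

        return ok_by_image_subset, ok_by_provider_check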
14:41:00 #topic Open discussion
14:41:05 Do we have a bug open for the func test failure in test_parallel_evacuate_with_server_group (AssertionError: 'host3' == 'host3')? Been seeing this pretty frequently, seems like a race.
14:41:15 we do
14:41:29 cfriesen had some ideas about it. lemme see if I can find it
14:41:44 Not that I plan to debug the thing. Just want to be able to start saying "recheck bug #xxxx" instead of "recheck" :)
14:41:49 #link https://bugs.launchpad.net/nova/+bug/1763181
14:41:50 Launchpad bug 1763181 in OpenStack Compute (nova) "test_parallel_evacuate_with_server_group intermittently fails" [Medium,Confirmed]
14:41:55 Thanks melwitt
14:42:11 yes it's been open for a couple of weeks
14:42:22 http://status.openstack.org/elastic-recheck/#1763181
14:42:24 yeah, so from what cfriesen mentioned on there, it sounds like it's a race in the actual code, not just the test
14:42:30 which sucks
14:42:39 because that'll be harder to fix
14:43:16 so if anyone wants to help with the solution to that, it would be really cool ;)
14:43:28 anything else for open discussion?
14:44:20 okay, we'll wrap up. thanks everybody
14:44:23 #endmeeting