14:00:07 #startmeeting nova
14:00:08 Meeting started Thu Jun 28 14:00:07 2018 UTC and is due to finish in 60 minutes. The chair is melwitt. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:11 The meeting name has been set to 'nova'
14:00:19 howdy everybody
14:00:26 hello :)
14:00:35 o/
14:00:36 o/
14:00:46 ō/
14:01:17 \o
14:01:22 #topic Release News
14:01:34 #link Rocky release schedule: https://wiki.openstack.org/wiki/Nova/Rocky_Release_Schedule
14:01:47 #link Rocky review runways: https://etherpad.openstack.org/p/nova-runways-rocky
14:01:55 #link runway #1: Report CPU features as traits - https://blueprints.launchpad.net/nova/+spec/report-cpu-features-as-traits [END DATE: 2018-07-06] https://review.openstack.org/560317
14:01:56 patch 560317 - nova - Add method to get cpu traits
14:02:02 #link runway #2: NUMA aware vSwitches - https://blueprints.launchpad.net/nova/+spec/numa-aware-vswitches (stephenfin) [END DATE: 2018-07-06] https://review.openstack.org/#/q/topic:bp/numa-aware-vswitches
14:02:08 #link runway #3: Complex Anti Affinity Policies: https://blueprints.launchpad.net/nova/+spec/complex-anti-affinity-policies (yikun) [END DATE: 2018-07-06] https://review.openstack.org/#/q/topic:bp/complex-anti-affinity-policies
14:02:16 o/
14:02:33 r-3 is July 26, about a month from now
14:03:04 I don't have anything else for release news or runways. anyone have anything they'd like to add?
14:03:59 seems like all 3 are getting decent review
14:04:43 I noticed that too. kudos to everyone pitching in there
14:05:05 * bauzas waves a bit late
14:05:16 #topic Bugs (stuck/critical)
14:05:27 no bugs in the critical bugs link
14:05:33 #link 49 new untriaged bugs (up 2 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:05:39 #link 11 untagged untriaged bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:05:57 * tssurya waves late
14:06:02 untriaged bug count still creeping up. thanks to those who have been helping, and we need more help
14:06:09 #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
14:06:15 #help need help with bug triage
14:06:25 Gate status:
14:06:30 #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
14:07:11 i have a new cells v1 skip patch
14:07:17 https://review.openstack.org/#/c/578125/
14:07:18 patch 578125 - nova - Skip ServerShowV247Test.test_update_rebuild_list_s...
14:07:21 another rebuild test for cells v1
14:07:40 i think that's all i've noticed for new gate issues this week
14:07:57 a-ha, cool, thanks. will review that one
14:08:21 3rd party CI:
14:08:27 #link 3rd party CI status http://ci-watch.tintri.com/project?project=nova&time=7+days
14:09:09 does anyone have anything else for bugs, gate status or third party CI?
14:09:30 random note, when do we plan to remove cells v1?
14:09:45 after nova-net
14:09:49 which at this point is at least stein
14:10:05 yep
14:10:08 cool, thanks
14:10:23 johnthetubaguy: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131247.html
14:10:43 thanks for that link
14:10:56 any other questions or comments on bugs, gate status or third party CI?
14:10:58 ah, thanks
14:11:30 #topic Reminders
14:11:39 #link Rocky Subteam Patches n Bugs https://etherpad.openstack.org/p/rocky-nova-priorities-tracking
14:12:15 that's all I have for reminders. does anyone have any other reminders to highlight?
14:13:15 #topic Stable branch status
14:13:20 #link stable/queens: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens,n,z
14:13:51 have 6 queens backports proposed needing stable reviews
14:13:59 #link stable/pike: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/pike,n,z
14:14:18 7 backports for pike
14:14:30 #link stable/ocata: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/ocata,n,z
14:14:49 6 backports for ocata
14:15:37 does anyone have anything else for stable branch status?
14:16:34 #topic Subteam Highlights
14:16:59 * efried scrambles
14:17:02 cells v2, we skipped the weekly meeting yesterday
14:17:15 we chatted about the "handling a down cell" spec
14:17:25 added comments on the review
14:17:33 anything else I'm missing dansmith or mriedem?
14:17:41 i had that marker bug fix patch
14:17:47 and dan has his sub url patch
14:17:53 * mriedem finds links
14:18:03 oh right. those are on my review list
14:18:05 #link handling a down cell spec https://review.openstack.org/#/c/557369/
14:18:06 patch 557369 - nova-specs - Handling a down cell
14:18:18 #link https://review.openstack.org/#/c/576161/
14:18:19 patch 576161 - nova - Fix regression when listing build_requests with ma...
14:18:33 #link https://review.openstack.org/#/c/578163/
14:18:33 patch 578163 - nova - Allow templated cell_mapping URLs
14:18:56 * bauzas needs to disappear for family reasons
14:18:58 cool, thanks
14:20:00 scheduler, cdent or efried or jaypipes?
14:20:08 We discussed https://bugs.launchpad.net/nova/+bug/1777591
14:20:08 Launchpad bug 1777591 in OpenStack Compute (nova) "‘limit' in allocation_candidates where sometimes make force_hosts invalid" [High,In progress] - Assigned to xulei (605423512-j)
14:20:08 TL;DR: if you're force_host'ing, the limit on GET /a_c can make you come up dry.
14:20:20 i fell off the net midway through
14:20:54 We agreed on the long-term approach, which is to make GET /a_c accept UUID(s) to limit the results.
14:21:00 looks like they've updated the patch to not pass a limit when forcing a host or node https://review.openstack.org/#/c/576693/
14:21:00 patch 576693 - nova - Disable limits if force_hosts or force_nodes is set
14:21:09 And the backportable fix, which is to disable the limit qparam when forcing... yeah, what mriedem said.
14:21:28 we also talked about functional ways to test this, but it's not easy
14:21:46 We also agreed that the long-term approach should be punted to Stein, because we've got too much going on in Rocky to try to add it.
14:22:26 And cdent and I spent some time starting down the rabbit hole of what it really means to say GET /a_c?rp_uuids=in:X,Y,Z
14:23:12 Hint: it's not going to be as simple as you think. Handling nested/shared gets hairy, unless we start exposing the concept of "anchor" (or "target", whatever you want to call it). Which we have a hard enough time explaining internally in code.
14:23:16 That's it.
14:23:56 lots of good info there, thanks for the update efried
14:23:57 notifications subteam, gibi?
14:24:11 melwitt: there was no meeting this week
14:24:20 melwitt: here is the current status mail #link http://lists.openstack.org/pipermail/openstack-dev/2018-June/131827.html
14:24:38 melwitt: many closed blueprints, so happiness :)
14:24:46 melwitt: that is all
14:24:52 woot
14:25:03 thanks gibi
14:25:32 anything else for subteam highlights before we move on?
14:26:27 #topic Stuck Reviews
14:26:38 nothing on the agenda.
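As a rough illustration of the backportable fix mentioned in the scheduler update above (disable the limit when forcing a host or node): the scheduler normally caps GET /allocation_candidates results, and a forced host can fall outside that capped set, so the scheduler "comes up dry". The helper below is only a hypothetical sketch of the idea, not the actual nova patch; request_spec and max_results stand in for nova's RequestSpec object and the [scheduler]max_placement_results option.

    def placement_limit(request_spec, max_results=1000):
        """Return the 'limit' to send to GET /allocation_candidates,
        or None to disable limiting entirely.
        """
        # If the user forced a specific host/node, a limited candidate set
        # might not include that host even though it could fit the instance,
        # so skip the limit query parameter entirely in that case.
        forcing = bool(getattr(request_spec, 'force_hosts', None) or
                       getattr(request_spec, 'force_nodes', None))
        return None if forcing else max_results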
does anyone in the room have anything for stuck reviews?
14:27:28 #topic Open discussion
14:27:44 we have a couple of items on the agenda, specless blueprints looking for approval
14:27:58 #link specless blueprint approval wanted (jangutter): https://blueprints.launchpad.net/nova/+spec/vrouter-os-vif-conversion
14:28:08 Thanks!
14:28:12 similar to this change that merged earlier this cycle: https://review.openstack.org/534371
14:28:12 patch 534371 - nova - remove IVS plug/unplug as they're moved to separat... (MERGED)
14:28:47 from what I understand, this is just converting existing code to leverage os-vif, and fairly simple
14:28:54 famous last words
14:29:04 yeah, true
14:29:13 Yep. I can go into some funzies if you want....
14:29:42 But, the worst seems to be that this is going to move pain out of Nova.
14:29:56 I think my main question about it is, are there any migration issues to consider here?
14:30:13 upgrade/migration
14:30:17 There has to be an upgrade step, yeah:
14:30:20 since we're past spec freeze, i think it would be good to say that anything new coming in as an exception requires at least 2 cores signing up to review it, like we used to do in days of old with FFEs
14:30:35 I'm also 100% on board with that.
14:31:00 that's fair, mriedem
14:31:02 So during the upgrade, Contrail will have to pull in a new os-vif plugin.
14:31:30 However: most Contrail installs lock the version of Contrail/Tungsten Fabric 1:1 with OpenStack.
14:31:36 which is the same os-vif plugin repo that provides the contrail_vrouter vif type support yes?
14:31:45 mriedem: correct.
14:31:53 so chances are it's already there
14:32:01 but you'd need the new version that supports the vrouter vif type
14:32:34 mriedem: correct. And since the next version of Contrail is only targeting Queens, there's going to be at least one cycle of overlap.
14:32:41 since the contrail plugin reviews aren't in our gerrit - can we still depend on them using zuulv3?
14:33:24 in other words, i think the nova change shouldn't land until either the contrail plugin series is merged (if we can depends-on with zuul externally) or those are released (even though they aren't released to pypi....)
14:33:47 mriedem: I think that's a fair ask.
14:33:48 btw, kind of sucks those don't get released to pypi
14:34:17 mriedem: I'll definitely pass along the request to TF/Contrail
14:34:49 so who is signing up to review this?
14:34:54 so there's a possibility that it will be "ok" to merge nova changes after contrail plugin changes merge, _without_ a release?
14:35:11 of the plugin?
14:35:12 melwitt: idk, i thought zuulv3 can depends-on external changes b/c of urls
14:35:20 if so, then that's probably good enough
14:35:31 i've never tried it though, like a nova change depending on a github change
14:35:34 what I mean is, do consumers of the plugin pull it from source?
14:35:44 they get it from their big vendor product install
14:35:57 okay
14:35:58 melwitt: consumers of the plugin grab it from the massive containers published by TF/Contrail
14:36:09 https://github.com/Juniper/contrail-nova-vif-driver
14:36:16 that does get tagged at least
14:36:17 melwitt: they're pretty far downstream from OpenStack
14:36:25 I see.
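On the cross-repo dependency question above: Zuul v3 Depends-On footers reference change URLs, and whether a change hosted outside the OpenStack gerrit (e.g. a GitHub pull request) is actually honored depends on the Zuul deployment having a connection configured for that system, which is exactly the "i've never tried it" uncertainty in the discussion. A hypothetical commit-message footer, with placeholder change numbers, would look like:

    Convert vrouter VIF plugging to os-vif

    ...commit message body...

    Depends-On: https://review.openstack.org/#/c/999999/
    Depends-On: https://github.com/Juniper/contrail-nova-vif-driver/pull/999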
thanks
14:36:47 normally with dependent libraries we'd say you need at least version x.y of something in the reno
14:36:49 is kind of my point
14:37:02 yeah, that is what I'm used to
14:37:24 so we could hold the nova change until the https://github.com/Juniper/contrail-nova-vif-driver patches are merged and tagged
14:37:47 * melwitt nods
14:37:53 hopefully jangutter is actually testing all of this end to end to make sure it works...
14:37:54 (facepalm) yes, that is a very good point. I'll engage with the TF community to ensure that happens.
14:37:55 since there is no CI
14:38:32 anyway, i'll leave a comment on the nova change and we can move on
14:38:33 mriedem: oh yes, but the coverage is very very tiny :-(
14:38:47 Thanks for indulging!
14:38:54 can i (1) install it and (2) create a server with a port that plugs in properly
14:39:21 mriedem: that's tricky, since Contrail historically has problems with OpenStack master.
14:39:44 mriedem: Tungsten Fabric is trying to get better, but it's going to take a bit of time.
14:39:45 so this nova change is going to be backported internally to queens?
14:40:04 anyway, i don't need to know that
14:40:07 mriedem: this change doesn't need backports to queens.
14:40:11 nor do i expect 3rd party CI
14:40:19 jangutter: i know it doesn't upstream :)
14:40:26 anyway, better move on
14:40:32 okay. so jangutter needs to find 2 cores that can commit to reviewing the series, if so, we'll approve the bp
14:40:46 thanks!
14:40:47 stephenfin might be a good person to ask, he has knowledge in os-vif
14:40:57 okay, next one on the agenda:
14:41:08 #link specless blueprint approval wanted (gmann): https://blueprints.launchpad.net/nova/+spec/api-extensions-merge-rocky
14:41:39 continuation of queens work to merge API extensions in controller and schema code
14:41:57 seems okay to me. other opinions?
14:42:27 it was approved for rocky until about a week ago when i deferred it due to no activity,
14:42:30 since then gmann went nuts
14:42:33 it had gotten untargeted from rocky because there were no changes proposed and now it has several changes proposed
14:42:35 so might as well throw it back into the mix
14:42:54 ++ okay, will approve
14:43:19 okay, next, mnaser has a review he wanted to discuss
14:43:32 hi!
14:43:33 #link Add hostId to metadata service https://review.openstack.org/577933
14:43:34 patch 577933 - nova - Add hostId to metadata service
14:43:49 i was just wondering what people thought about that, with maybe more eyes to discuss it
14:44:03 i would be fine with that *except* for the issues with config drive not changing during live migration and the hostId will be stale
14:44:17 well isn't 'normal' metadata published in config drive?
14:44:24 i don't think we have other fields in the config drive that suffer from that - we have the AZ but you can't live migrate out of the AZ unless you force it
14:44:37 like user published metadata
14:44:49 mnaser: as in boot time metadata and then add more later?
14:44:52 yeah
14:45:18 yes but i'd kind of think those are used differently
14:46:16 * gibi needs to drop off
14:46:21 does sound a bit like setting you up for a facepalm
14:46:22 if you create a guest and require the ability to be able to change metadata later and have it reflected in the guest, you can't rely on config drive - but i'm not sure that we have a way to say 'definitely no config drive, metadata api or bust'
14:46:35 so hostId would create an inconsistency between config drive metadata vs metadata API metadata?
14:46:50 hostId is hashed instance.host + project_id
14:47:07 if the guest is live migrated, the config drive hostId hash would be from the initial host, not the current host
14:47:14 and resize, etc
14:47:14 so it can't really be trusted in a public cloud that does a lot of live migration
14:47:28 i'm honestly not sure in which cases we rebuild the config drive
14:47:32 besides rebuild and unshelve
14:47:33 I guess it's a non-user-triggered action that changes a property you might not expect
14:47:36 yeah, understood. and this would be the only piece of metadata that would be like that
14:47:57 melwitt: i haven't done a full audit but a quick glance i think so yes
14:48:00 as noted about AZ
14:48:09 server metadata is updated by the user, so that feels different somehow
14:48:17 right agree
14:48:31 as a user/guest, i won't/shouldn't know if i've been live migrated
14:48:39 i guess you'd know from the compute REST API on the outside though
14:48:53 if that hostId doesn't match what's in the guest, you know the guest was moved
14:49:14 there's been a want to rebuild config drive in more cases in the past, so maybe doing something like that could create consistency? I can't remember if it was ever proposed to rebuild it upon live migration
14:49:18 mnaser: so i know this use case is for infra right now, but would you support this at vexxxhost?
14:49:25 yes
14:49:33 oops i made that too sexy
14:49:37 ha
14:49:45 lol
14:49:48 honestly
14:50:00 i would be okay with us maybe putting a warning saying that this *could* change in configdrive
14:50:14 but in general anything config drive is not 'total source of truth'
14:50:24 sure, that's why i'm not against the idea
14:50:26 just caveats
14:50:33 so that warning would apply to anything config drive.. metadata api service is still giving the best up to date info
14:50:33 and we don't document the metadata api responses at all
14:50:47 right, but some clouds don't even run the metadata api
14:51:07 that's why i was trying to figure out if there is a way to say i only want metadata api for my guest or nothing at all
14:51:29 you can say no config drive when creating a server (and that's the default from the api) but i don't think we have any way from the compute service to know that the metadata api isn't running
14:51:40 you can --config-drive false
14:51:42 ah, right
14:51:47 yeah it's not the same
14:51:57 yeah
14:51:59 it would be interesting to know if there are other fields that can be stale for config drive. maybe this is no different than the current state
14:52:13 server metadata was a good example
14:52:19 all the network state would be stale
14:52:27 don't regenerate on vif plug
14:52:33 sure any ports or volumes you attach after the server is created wouldn't be in config drive
14:52:35 yeah if you unplug or plug new vifs
14:52:37 ... real answer is we need to document that stuff
14:53:06 so while i agree that my stuff can be stale, other stuff can be too, so this might be a two part problem where i just volunteered myself to have some sort of warning saying
14:53:16 config drive data don't get regenerated and therefore might be stale
14:53:26 s/don't/doesn't/
14:53:33 maybe we just need some docs in https://docs.openstack.org/nova/latest/user/config-drive.html about config drive being stale and ways to refresh it (server actions that is)
14:53:48 if we have that, i'd be more comfortable with adding this
14:53:56 like, if you need something fresh, do a rebuild?
14:53:58 yeah, I was trying to highlight whether this would be the _first_ stale thing or if there are already stale things and it's a known landscape
14:54:20 this might be the first stale thing not initiated by a user is the wrinkle i think
14:54:23 I thought config drive wasn't even needed/used after initial boot. Pretty sure in Power we remove the thing and never reattach it.
14:54:24 so it sounds like, this is probably okay provided that docs patches are landed with it to be clear about config drive
14:54:26 * stephenfin wonders why we still care about config drive if the metadata service seems better in every way
14:54:36 * mnaser can add document update to config drive being stale in a patch under mine
14:54:37 stephenfin: see above
14:54:40 mriedem: I see
14:54:44 stephenfin: some deployments don't run the metadata api
14:55:00 mnaser: ok do that and i'm ok with your change
14:55:01 yeah, Oath don't run the metadata API, for one
14:55:02 but i don't have the time/resources to add an extra patch that implements rebuilding config drive somehow
14:55:13 heh i'm not asking for that
14:55:18 i think SpamapS wants it though
14:55:22 but he knows how to code i think
14:55:24 :)
14:55:42 ok so i will push up a patch under mine as a doc update to let users know that configdrive could have stale data as it is built when the instance is first created
14:55:49 also i'm pretty sure that only happens if you use shared storage
14:55:55 (i think?)
14:56:21 maybe just on live migrations would be better to summarize it
14:56:28 okay, sounds like we agree this is okay with some doc updates to go with it
14:56:40 we have 4 minutes left
14:56:55 thanks everyone (for comments)
14:57:04 does anyone in the room have anything for open discussion before we wrap up?
14:58:14 okay, let's call it. thanks everyone
14:58:18 #endmeeting
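To illustrate why the hostId discussed above goes stale in the config drive: it is derived only from the instance's project and its current host, so any operation that moves the instance changes the value the API reports, while a config drive generated at boot keeps the old one. A minimal sketch, assuming the SHA-224 of project_id + host scheme nova uses for the servers API; the function name and host names below are made up for illustration.

    import hashlib

    def sketch_host_id(project_id, host):
        # Obfuscate the hypervisor hostname by hashing it together with the
        # project, so different tenants on the same host see different values.
        # (Sketch of "hashed instance.host + project_id" from the discussion.)
        if not host:
            return ''
        return hashlib.sha224((project_id + host).encode('utf-8')).hexdigest()

    # After a live migration the API reports the hash of the new host, but a
    # config drive built at boot still carries the old one:
    at_boot = sketch_host_id('my-project', 'compute-1')  # baked into config drive
    now = sketch_host_id('my-project', 'compute-2')      # what the API would show
    assert at_boot != now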