21:00:07 #startmeeting nova
21:00:08 Meeting started Thu Feb 28 21:00:07 2019 UTC and is due to finish in 60 minutes. The chair is melwitt. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:12 The meeting name has been set to 'nova'
21:00:14 o/
21:00:15 hi everyone, welcome to the nova meeting
21:00:18 o/
21:00:19 o/
21:00:21 agenda https://wiki.openstack.org/wiki/Meetings/Nova
21:00:21 ~o~
21:00:33 that's my move
21:00:46 let's make a start
21:00:47 \o
21:00:49 #topic Release News
21:00:56 #link Stein release schedule: https://wiki.openstack.org/wiki/Nova/Stein_Release_Schedule
21:01:03 #info non-client library freeze is today Feb 28, os-vif 1.15.1 was released, os-resource-classes 0.3.0 was released. os-traits did not have anything new to release since the last version.
21:01:19 so all of our non-client library releases are done
21:01:26 #info s-3 feature freeze is March 7
21:01:29 one week away
21:01:37 #link Stein blueprint status tracking: https://etherpad.openstack.org/p/nova-stein-blueprint-status
21:01:53 we're tracking progress here ^
21:02:00 o/
21:02:01 #link Stein RC potential changes tracking: https://etherpad.openstack.org/p/nova-stein-rc-potential
21:02:25 RC potential blocker bugs and other related RC stuff goes here ^
21:02:39 #link Stein runway etherpad: https://etherpad.openstack.org/p/nova-runways-stein
21:02:47 #link runway #1: https://blueprints.launchpad.net/nova/+spec/flavor-extra-spec-image-property-validation (jackding) [END 2019-03-06] https://review.openstack.org/#/c/620706/ Flavor extra spec and image properties validation
21:02:53 #link runway #2: https://blueprints.launchpad.net/nova/+spec/ironic-conductor-groups (jroll) [END 2019-03-06] https://review.openstack.org/#/c/635006/ ironic: partition compute services by conductor group
21:03:00 this is merged and bp marked as complete today ^
21:03:08 #link runway #3: https://blueprints.launchpad.net/nova/+spec/enable-rebuild-for-instances-in-cell0 (ttsiouts) [END 2019-03-07 - feature freeze] https://review.openstack.org/570201
21:03:34 does anyone have anything else to mention for release news? or questions?
21:03:55 ok, moving on
21:03:57 #topic Bugs (stuck/critical)
21:04:02 no critical bugs
21:04:09 #link 69 new untriaged bugs (up 5 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
21:04:16 #link 9 untagged untriaged bugs (up 3 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
21:04:25 #link bug triage how-to: https://wiki.openstack.org/wiki/Nova/BugTriage#Tags
21:04:31 #help need help with bug triage
21:05:08 when doing bug triage, use the nova-stein-rc-potential bug tag for potential RC blockers
21:05:47 #link ML post http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003343.html
21:05:59 Gate status
21:06:04 #link check queue gate status http://status.openstack.org/elastic-recheck/index.html
21:06:10 3rd party CI
21:06:15 #link 3rd party CI status http://ciwatch.mmedvede.net/project?project=nova&time=7+days
21:06:37 anything else to mention for bugs or gate/CI?
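(For readers following along: the untriaged-bug queries linked above can also be pulled programmatically. Below is a minimal sketch using launchpadlib; the consumer name and the 10-bug printout cutoff are illustrative choices, not anything from the meeting.)

```python
# Sketch only: reproduce the "new untriaged bugs" query linked above with
# launchpadlib. Status "New" is what the triage how-to calls untriaged.
from launchpadlib.launchpad import Launchpad

# Anonymous, read-only access is enough for searching bugs.
lp = Launchpad.login_anonymously("nova-bug-triage", "production", version="devel")
nova = lp.projects["nova"]

# Fetch all nova bug tasks still in status New.
untriaged = nova.searchTasks(status=["New"])
print("new untriaged bugs:", len(untriaged))
for task in untriaged[:10]:
    print(task.bug.id, "-", task.bug.title)
```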
21:06:56 ok, continuing
21:07:05 #topic Stable branch status
21:07:13 #link stable/rocky: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/rocky,n,z
21:07:31 very few rocky backports proposed
21:07:38 #link stable/queens: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens,n,z
21:07:42 That must mean rocky was perfect
21:07:56 yeah that's what I assume
21:07:56 * artom prefers Rocky II personally
21:08:08 queens backports could use some review help
21:08:14 lots o backports
21:08:20 #link stable/pike: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/pike,n,z
21:08:31 something something Stallone can beat up Freddy Mercury
21:08:32 bunch for pike too. help wanted
21:08:58 I'll propose stable releases next week on s-3 day
21:09:24 since we usually aim for doing stable releases at the milestone
21:09:40 maybe we should do that a week later because of FF, actually
21:10:19 so maybe the week after FF we aim to flush stable reviews and release
21:10:35 anything else for stable branches before we move on?
21:10:46 ok
21:10:50 #topic Subteam Highlights
21:11:01 efried: any updates for scheduler?
21:11:07 you know it
21:11:14 #link n-sch minutes http://eavesdrop.openstack.org/meetings/nova_scheduler/2019/nova_scheduler.2019-02-25-14.00.html
21:11:22 We discussed
21:11:22 #link alloc cands in_tree series starting at https://review.openstack.org/#/c/638929/
21:11:22 ...which has since merged \o/ (microversion 1.31)
21:11:38 We discussed
21:11:38 #link the OVO-ectomy https://review.openstack.org/#/q/topic:cd/less-ovo+(status:open+OR+status:merged)
21:11:38 ...all of which has since merged. There is a continuation of
21:11:38 #link refactors and cleanup currently starting at https://review.openstack.org/#/c/637325/
21:11:51 We discussed
21:11:51 #link libvirt reshaper (new bottom of series) https://review.openstack.org/#/c/636591/
21:11:51 That bottom patch has merged, and the rest of the series is mostly green except for one issue noted at
21:11:52 #link what happens to mdevs on reboot? https://review.openstack.org/#/c/636591/5/nova/virt/libvirt/driver.py@586
21:12:02 We discussed
21:12:02 #link ML thread about placement & related bug/bp tracking http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003102.html
21:12:02 As well as another couple of operational things that should be hashed out on the ML, possibly initiated there by the PTL (old or new):
21:12:02 - Format/fate of the n-sch meeting
21:12:02 - Placement team logistics at the PTG
21:12:15 END
21:12:34 cool, so based on that I will mark the in_tree bp as complete
21:12:42 there is a spec update pending
21:13:07 I think that's ok. I'll bug people to review the spec update
21:13:14 #link in-tree alloc candidates spec update https://review.openstack.org/#/c/639033/
21:13:24 yeah, self.review_that_sucker()
21:13:39 yeah, me too
21:13:41 ok
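(Context for the in_tree item above: placement microversion 1.31 added an `in_tree` query parameter to `GET /allocation_candidates`, limiting candidates to resource providers in the tree of the given provider. Below is a minimal sketch of calling it with plain `requests`; the endpoint, token, and UUID are placeholders for a real deployment.)

```python
# Minimal sketch of the in_tree filter added in placement microversion 1.31.
# PLACEMENT_URL, TOKEN and RP_UUID are placeholders, not real values.
import requests

PLACEMENT_URL = "http://placement.example.com"
TOKEN = "<keystone-token>"
RP_UUID = "4e8e5957-649f-477b-9e5b-f1f75b21c03c"  # e.g. a compute node root provider

resp = requests.get(
    f"{PLACEMENT_URL}/allocation_candidates",
    params={
        "resources": "VCPU:1,MEMORY_MB:512",
        # New in 1.31: only return candidates from this provider's tree.
        "in_tree": RP_UUID,
    },
    headers={
        "X-Auth-Token": TOKEN,
        # Opt in to the microversion that understands in_tree.
        "OpenStack-API-Version": "placement 1.31",
    },
)
resp.raise_for_status()
for cand in resp.json()["allocation_requests"]:
    print(cand["allocations"])
```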
21:13:48 no updates for API from gmann on the agenda
21:13:59 so we'll move on to...
21:14:05 #topic Stuck Reviews
21:14:25 (mriedem): Decide what to do about attaching volumes with tags to shelved offloaded servers for https://review.openstack.org/#/c/623981
21:14:35 #link ML thread with options: http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003356.html
21:14:50 you want me to just go?
21:14:57 yeah, sure
21:14:57 the options are in the email
21:15:06 the way the root detach/attach code is written today,
21:15:14 when detaching a root volume, the tag is reset to None,
21:15:25 with the idea that when you attach a new root volume, you could specify a new tag,
21:15:33 the problem is, root detach/attach is only allowed on shelved offloaded instances,
21:15:47 but the api does not allow you to attach a volume with a tag to a shelved offloaded instance
21:15:50 the tag part specifically
21:16:03 the original thinking was because when we unshelve, we don't know if the compute will support tags
21:16:05 and honor them
21:16:08 however,
21:16:27 that's already a latent bug because i can create a server with device tags, shelve it and then unshelve it, and if i land on a host that does not support device tags, it passes but my tags aren't exposed to the guest
21:16:46 that's recorded with bug 1817927
21:16:47 bug 1817927 in OpenStack Compute (nova) "device tagging support is not checked during move operations" [Undecided,New] https://launchpad.net/bugs/1817927
21:16:51 same is true for any move operation actually,
21:16:59 because we don't consider the user-requested device tags during scheduling at all, not even at create
21:17:05 so,
21:17:23 i think we're restricting attaching volumes with tags to shelved offloaded servers for really no good reason
21:17:46 I guess realistically, how many people are running heterogeneous clouds with the potential to hit bug 1817927? It was reported by mriedem, not an end user/operator...
21:17:46 bug 1817927 in OpenStack Compute (nova) "device tagging support is not checked during move operations" [Undecided,New] https://launchpad.net/bugs/1817927
21:17:46 the question is what to do about it in the context of the root volume detach/attach series
21:17:57 artom: i would say slim
21:18:05 mriedem, yeah, so I'm partial to 1, with you
21:18:13 also, looking back,
21:18:28 we probably should have put a policy rule in the api for device tags if your deployment doesn't support them
21:18:29 IIUC, when you create an instance, tags are not guaranteed to be supported by the compute host the server lands on?
21:18:32 like we have for trusted certs
21:18:47 melwitt: correct, and if they land on a compute that doesn't support them during create, it aborts
21:18:54 no reschedule, nothing - you're dead
21:19:00 melwitt, this was more true in the past with the possibility of older versions, but now it's just about running a supported hypervisor
21:19:09 ok. then I guess I don't see why to restrict it for shelve/unshelve
21:19:10 again correct
21:19:13 hyperv and xen and libvirt have them (for boot time)
21:19:30 yeah so if you're running VIO and your users try to specify tags during server create, kaboom
21:19:46 we could policy rule that out of existence if we wanted, but it hasn't come up
21:19:58 yeah. it seems like the existing restriction doesn't make sense given that there's not even a restriction for create
21:20:05 also, with the compute-driven capabilities traits stuff that aspiers is working on,
21:20:27 we can modify the request spec in train to say, "the user wants tags, so make sure you give them a compute which supports tags"
21:21:01 yeah, that would be nice
21:21:12 so if we're leaning to option 1, we would lift that restriction in the same microversion Kevin_Zheng is adding for the root attach/detach support
21:21:17 i assume anyway
21:21:36 we can't really just remove the restriction and say 'oops' for interop reasons
21:21:56 yeah. I can't immediately think of how a separate microversion would help
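(To make the catch-22 concrete: a self-contained sketch of the behavior mriedem describes above. Simplified, not Nova's actual code; the function and exception names are made up for illustration.)

```python
# Simplified illustration of the catch-22 in the root detach/attach series
# (not Nova's actual code; names are invented for this sketch).
class TaggedAttachNotSupported(Exception):
    pass

def detach_root_volume(bdm):
    # The proposed series resets the tag on root detach, on the theory
    # that a new tag arrives with the replacement root volume...
    bdm["volume_id"] = None
    bdm["tag"] = None

def attach_root_volume(server, bdm, volume_id, tag=None):
    # ...but root attach/detach is only allowed while shelved offloaded,
    # and that is exactly the state where the API rejects tags today,
    # so the reset tag can never be replaced.
    assert server["vm_state"] == "shelved_offloaded"
    if tag is not None:
        raise TaggedAttachNotSupported(
            "cannot attach a tagged volume to a shelved offloaded server")
    bdm["volume_id"] = volume_id
```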
21:22:08 this does make his change more complicated
21:22:14 yeah :(
21:22:21 but i think it needs to happen this way, i don't want to half-ass around with multiple microversions for this
21:22:39 mriedem, actually, hold up
21:23:04 IIRC one of the reasons we outright refused tagged attach to shelved is because we had to communicate with the compute manager
21:23:20 Which we didn't know at the time of attach
21:23:30 Has this been "solved" by Kevin's work?
21:23:48 when attaching a volume to a not-shelved server, we call down to compute to reserve a device name
21:23:58 when attaching a volume to a shelved offloaded server, we just create the bdm in the api
21:24:21 in the case of your tagged attach code, it will also check the compute capabilities to see if it supports tagged attach and blow up if not
21:24:32 so we wouldn't have ^ in the case of shelved offloaded attach
21:24:43 however, as noted, we're already not honoring device tags on unshelve anyway
21:24:50 so....who cares?
21:25:16 the long-term fix for doing that properly is the scheduling based on required traits stuff
21:25:31 i don't think we can just start exploding unshelve because servers have tags with them now
21:25:46 until the scheduler piece is worked in
21:26:23 artom: you're thinking of this https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L5480
21:26:31 ^ happens during an attach to a non-shelved server
21:27:07 multiattach volumes are kind of broken in the same way wrt unshelve
21:27:22 you can boot from a multiattach volume, shelve and then unshelve elsewhere on something that doesn't support it
21:27:59 the api kicks out trying to attach multiattach volumes to shelved servers as well
21:28:26 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L4199
21:28:48 I assume volume attach is the only time you can add device tags to something
21:29:01 create and attach
21:29:15 otherwise a workaround would be to set them after attaching sans tags
21:29:17 got it
21:29:33 we don't have that today
21:29:38 mriedem, hah, found it https://review.openstack.org/#/c/391941/50/nova/compute/api.py
21:29:39 right
21:29:47 And yeah, it was only checking for compute host support
21:30:37 so that means a server create could reschedule to land on a host with support?
21:31:00 server create aborts if it lands on a host that doesn't support tags
21:31:04 it does not reschedule
21:31:18 oh that's in manager
21:31:19 I see. ok
21:31:26 https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898
21:31:48 so another option is,
21:31:59 land kevin's code as-is (option 2 in my email),
21:32:28 and when we have smarter scheduling to take device tags and multiattach volumes into account, we could add a microversion to drop the tag/multiattach restriction on attaching volumes to shelved offloaded instances
21:32:52 which is these 2 checks https://github.com/openstack/nova/blob/master/nova/compute/api.py#L4191
21:33:11 It'd be kinda weird to just disappear a tag without warning
21:33:36 that also sounds reasonable, and would make it easier on Kevin for landing this series. the only potential pitfall is that ^ you could lose your tags and be unable to re-add them
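(A sketch of what option 2's eventual lift could look like: the two API-side checks mriedem links above, gated behind a hypothetical future microversion. Simplified, not Nova's actual code; the microversion value is a placeholder, since the real number would only be assigned when the restriction is dropped.)

```python
# Sketch of option 2 (simplified, not Nova's actual code): keep the two
# API-side checks for now, and lift them later behind a new microversion
# once scheduling can honor device tags. (2, 90) is a placeholder number.
TAGGED_SHELVED_ATTACH_MICROVERSION = (2, 90)

class ShelvedAttachNotSupported(Exception):
    pass

def check_attach_to_shelved_offloaded(request_version, volume, tag):
    # Future requests that opt in to the new microversion skip the checks.
    if request_version >= TAGGED_SHELVED_ATTACH_MICROVERSION:
        return
    # Current behavior: reject multiattach and tagged attach up front,
    # since there is no compute host yet to validate support.
    if volume.get("multiattach", False):
        raise ShelvedAttachNotSupported(
            "multiattach volumes cannot be attached to a shelved "
            "offloaded server")
    if tag is not None:
        raise ShelvedAttachNotSupported(
            "tagged volume attach is not supported for shelved "
            "offloaded servers")
```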
21:33:41 well, we'd probably put something in the api reference saying 'at this microversion you can detach a root volume but note that the tag will be gone with it and you cannot provide a new tag when attaching a new root volume'
21:34:13 We'll need a big warning regardless
21:34:16 ++
21:34:22 Hrmm, so actually, the API is user-facing, right?
21:34:36 our api is meant to be used by users yes...
21:34:37 So if we're going to warn about stuff in the API, it should be about what users can change/control
21:34:48 Ie, telling them their tag will disappear is fair game
21:34:51 I think the ideal is option 1, the restriction seems artificial based on what's been explained. but can we even get option 1 done within a week
21:35:08 Telling them their unshelve might blow up because behind the scenes the operator is running different HVs isn't fair
21:35:12 Because they can't do anything about that
21:35:38 So with that reasoning I'm leaning more 2 now
21:36:05 artom: as in do what we have now, and in the future when we don't suck at scheduling, lift the api restriction
21:36:16 mriedem, yeah
21:36:19 (Heh, "when")
21:36:21 I guess I could see how unshelve is worse than rejecting a create in a mixed-HV env, because you haven't invested much into your server yet
21:36:42 well, unshelve just fails,
21:36:46 we don't delete your snapshot
21:37:08 actually unshelve doesn't even fail if you have device tags
21:37:12 that's that bug from earlier
21:37:17 bug 1817927
21:37:18 bug 1817927 in OpenStack Compute (nova) "device tagging support is not checked during move operations" [Undecided,New] https://launchpad.net/bugs/1817927
21:37:20 oh yeah, right.
21:37:26 (That'd be hilarious if the FaultWrapper just deleted a random instance)
21:37:27 but nor does evacuate, resize or live migrate
21:38:00 So what does actually happen? The tag is just ignored?
21:38:19 yes
21:38:25 That's harmless
21:38:36 we don't honor the user request
21:38:42 So really 1 and 2 are the same in that sense
21:38:47 You end up with a tagless server
21:39:06 ok, this is pretty complicated to reason about but I think the problem has been adequately explained. so we could continue discussing in #openstack-nova and/or the ML
21:39:06 In 1 it's ignored by the unshelve
21:39:12 In 2 it's removed by the detach
21:39:41 ok we can move on, people can dump opinions on the ML
21:39:46 I have to bounce to pick up kids anyways
21:39:48 o/
21:39:52 ok, cool
21:39:58 last thing, open discussion
21:40:15 #topic Open discussion
21:40:26 anyone have anything for open discussion before we wrap up?
21:40:52 going
21:41:00 going
21:41:18 ok, guess that's it, thank you all
21:41:19 #endmeeting