16:00:24 <bauzas> #startmeeting nova
16:00:24 <opendevmeet> Meeting started Tue May 24 16:00:24 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:24 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:24 <opendevmeet> The meeting name has been set to 'nova'
16:00:32 <bauzas> hey folks
16:00:37 <whoami-rajat> Hi
16:00:53 <bauzas> #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:15 <elodilles> o/
16:01:59 <opendevreview> Kashyap Chamarthy proposed openstack/nova master: libvirt: Add a workaround to skip compareCPU() on destination https://review.opendev.org/c/openstack/nova/+/838926
16:02:08 <gibi> o/
16:02:33 <bauzas> ok, let's start
16:02:38 <bauzas> #topic Bugs (stuck/critical)
16:02:43 <bauzas> #info No Critical bug
16:02:47 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 15 new untriaged bugs (+1 since the last meeting)
16:02:56 <bauzas> no worries sean, I saw you worked hard on it
16:03:04 <bauzas> #link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (0 since the last meeting) in Storyboard for Placement
16:03:09 <bauzas> #info Add yourself to the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:03:22 <bauzas> sean-k-mooney: any bugs you want to discuss for triage?
16:03:40 <sean-k-mooney> nothing pressing
16:03:42 <sean-k-mooney> https://etherpad.opendev.org/p/nova-bug-triage-2022-05-17
16:03:46 <sean-k-mooney> we had one feature request
16:03:49 <sean-k-mooney> for NUMA in placement
16:03:56 <sean-k-mooney> which I marked as invalid
16:04:07 <sean-k-mooney> and one duplicate of an
16:04:14 <sean-k-mooney> oslo.messaging bug
16:04:26 <sean-k-mooney> the rest did not have enough info to really triage
16:04:31 <sean-k-mooney> so I marked them incomplete
16:04:42 <sean-k-mooney> I also checked some of the incomplete ones from last week
16:04:46 <sean-k-mooney> but no change really
16:05:13 <sean-k-mooney> one fixed bug from stephen https://bugs.launchpad.net/nova/+bug/1974173
16:05:20 <sean-k-mooney> that's about it
16:06:34 <bauzas> ok thanks
16:06:40 <bauzas> and thanks again for triaging
16:06:57 <bauzas> elodilles: are you okay with taking the baton for this week?
16:07:44 <elodilles> bauzas: yep o7
16:07:54 <bauzas> thanks
16:08:00 <bauzas> #info Next bug baton is passed to elodilles
16:08:20 <bauzas> #topic Gate status
16:08:59 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:09:03 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status
16:09:08 <bauzas> #link https://zuul.opendev.org/t/openstack/builds?job_name=nova-emulation&pipeline=periodic-weekly&skip=0 Emulation periodic job runs
16:09:14 <bauzas> as you can see ^ nothing to report
16:09:45 <bauzas> both jobs and pipelines work
16:09:52 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:09:57 <bauzas> #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:10:05 <bauzas> as a reminder for everyone ^ :)
16:10:30 <gibi> please note that we are still playing whack-a-mole with the volume detach issue. There are still open tempest patches adding more SSHABLE waiters
16:10:49 <bauzas> gibi: yup, we'll discuss this under the stable topic
16:11:03 <gibi> ack, but this is still affecting master :)
16:11:05 <sean-k-mooney> bauzas: well, it affects master too
16:11:10 <sean-k-mooney> but sure
16:11:30 <bauzas> yep, I know, thanks for the reminder that it also impacts master
16:12:14 <bauzas> #topic Release Planning
16:12:19 <bauzas> #link https://releases.openstack.org/zed/schedule.html
16:12:22 <bauzas> #info Zed-1 was last week
16:12:48 <bauzas> thanks sean-k-mooney for accepting the zed-1 releases for the projects
16:13:03 <bauzas> oh, actually elodilles
16:13:12 <sean-k-mooney> yeah, it was elodilles
16:13:25 <sean-k-mooney> I replied on one after the fact
16:13:27 <bauzas> #link https://review.opendev.org/c/openstack/releases/+/841851 novaclient release for zed-1
16:13:34 <elodilles> well, it had a deadline, so it needed a review & merge o:)
16:13:43 <sean-k-mooney> we discussed it but I forgot to do it before the deadline
16:13:51 <bauzas> #link https://review.opendev.org/c/openstack/releases/+/841845 os-vif release for zed-1
16:13:56 <bauzas> sean-k-mooney: me too
16:14:05 <sean-k-mooney> elodilles: strictly speaking we don't have to do it by m1
16:14:08 <bauzas> and I was off this Friday, which didn't help
16:14:17 <sean-k-mooney> that is just the convention the release team is following
16:14:30 <sean-k-mooney> but it's not required by the release model
16:14:34 <bauzas> elodilles: don't be afraid to ping me if you need me to review some release change
16:14:38 <sean-k-mooney> we just need an intermediary release :)
16:14:43 <bauzas> correct
16:15:06 <elodilles> bauzas: ack :)
16:16:36 <elodilles> sean-k-mooney: not necessary, yes, but if there is no -1 from the team, then the release managers merge the generated patches at the deadlines o:)
16:16:55 <sean-k-mooney> elodilles: right, the deadline is actually m3
16:17:04 <sean-k-mooney> the docs don't mention m1 at all
16:17:21 <sean-k-mooney> that is just a holdover from the release-with-milestones model
16:17:31 <sean-k-mooney> but thanks for taking care of it in any case
16:18:48 <sean-k-mooney> https://github.com/openstack/releases/blob/61f891ddd7bd3b28ac7b5e7e9e1d9203fbbe297d/doc/source/reference/release_models.rst#cycle-with-intermediary=
16:19:18 <elodilles> sean-k-mooney: see #2, and its last chapter: https://releases.openstack.org/reference/process.html#milestone-1
16:19:34 <bauzas> elodilles: how can I see whether, for example, os-vif uses the cycle-with-rc model or the cycle-with-intermediary one?
16:19:55 <sean-k-mooney> elodilles: yep, that is not in line with the governance doc
16:19:57 <elodilles> bauzas: in the yaml file under deliverables/zed
16:20:05 <sean-k-mooney> anyway, it's not important now
16:20:18 <bauzas> elodilles: ok, because https://releases.openstack.org/teams/nova.html doesn't show it
16:20:31 <bauzas> anyway, moving on
16:20:42 <elodilles> ++
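[editor's note: the release model bauzas asks about is recorded in the per-deliverable file elodilles points to in the openstack/releases repo. A minimal sketch of what deliverables/zed/os-vif.yaml might look like — the version and hash values are illustrative placeholders, not the real ones:

    launchpad: os-vif
    release-model: cycle-with-intermediary
    team: nova
    type: library
    repository-settings:
      openstack/os-vif: {}
    releases:
      - version: 3.0.0
        projects:
          - repo: openstack/os-vif
            hash: <commit sha>

the release-model field is what distinguishes a cycle-with-intermediary deliverable from a cycle-with-rc one.]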
16:20:49 <bauzas> #topic Review priorities
16:20:55 <bauzas> #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B1
16:21:00 <bauzas> #link https://review.opendev.org/c/openstack/project-config/+/837595 Gerrit policy for the Review-Priority contributors flag. Naming bikeshed in there.
16:21:06 <bauzas> #link https://docs.openstack.org/nova/latest/contributor/process.html#what-the-review-priority-label-in-gerrit-are-use-for Documentation we already have
16:21:39 <bauzas> I provided a comment on https://review.opendev.org/c/openstack/project-config/+/837595
16:21:43 <bauzas> please review it
16:21:58 <gibi> done :)
16:22:38 <bauzas> thanks
16:22:50 <bauzas> I'm French, so in general I'm not good at naming things
16:23:06 <bauzas> but at least I try to find a consensus
16:23:14 <gibi> thank you for that
16:23:39 <bauzas> I think all contributors know what nova-core means
16:23:52 <bauzas> hopefully
16:24:15 <gibi> that is a fair assumption
16:24:35 <bauzas> for other repos, we could of course name the label differently, like 'osvif-core', if this is named by gerrit
16:25:21 <bauzas> i.e. a nova-specs-core review promise
16:25:31 <bauzas> os-vif-core, etc.
16:25:41 <bauzas> but this is a naming bikeshed
16:26:24 <bauzas> anyway, moving on
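[editor's note: for readers unfamiliar with the change under review — per-project Gerrit labels are declared in the project ACL files in openstack/project-config. A hypothetical sketch of what such a label stanza could look like; the label and value names are precisely the part being bikeshedded, so treat them as placeholders:

    [label "Review-Priority"]
      function = NoBlock
      value = 0 No priority
      value = +1 Review priority

a NoBlock label records the vote without gating submission, which fits a "review promise" flag rather than a blocking vote.]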
16:26:32 <bauzas> #topic Stable Branches
16:26:40 <bauzas> in general I ask elodilles
16:26:46 <bauzas> but this time, let me do it
16:26:52 <bauzas> #info ussuri and older branches are still blocked, newer branches should be OK
16:27:03 <bauzas> melwitt had a point
16:27:33 <elodilles> just an update for that ^^^ I think ussuri is blocked, but the older branches are not blocked anymore
16:27:47 <bauzas> #link https://etherpad.opendev.org/p/nova-stable-branch-ci stable branch CI issues tracking, feel free to update it with stable branch CI issues
16:27:58 <bauzas> elodilles: woah
16:28:11 <bauzas> kudos to the team then
16:28:13 <elodilles> bauzas: the l-c (lower-constraints) patches were merged
16:28:32 <elodilles> bauzas: I don't say they don't have intermittent failures though o:)
16:28:43 <bauzas> elodilles: I thought most of the issues were related to volume detach things, which are unrelated to l-c
16:28:44 <elodilles> but at least they are not blocked
16:28:48 <bauzas> ah
16:29:23 <bauzas> elodilles: but then, why is ussuri blocked while the older branches are not?
16:29:47 <elodilles> ussuri and train were where tempest was not pinned,
16:29:58 <elodilles> and where tempest is running with py36
16:30:08 <elodilles> if I'm not mistaken, that's it
16:30:21 <elodilles> and gmann's train fix has landed
16:31:04 <bauzas> ok thanks
16:31:05 <elodilles> originally we thought that ussuri does not need a fix as it already has zuulv3 jobs, but that's unfortunately not true
16:31:21 <bauzas> gmann told me he couldn't attend this meeting, so let's discuss this again next week
16:31:33 <elodilles> I mean, it has zuulv3 jobs, but we are still facing the same issue
16:31:42 <elodilles> bauzas: ++
16:31:43 <gibi> so I think the next step is still to gather the intermittent failures and try to fix them
16:32:02 <bauzas> gibi: yeah, we'll track those on a weekly basis thanks to the etherpad
16:32:12 <gibi> ack
16:32:31 <elodilles> thanks melwitt for starting the etherpad \o/
16:32:37 <bauzas> yup, melwitt++
16:34:29 <bauzas> anything to discuss about those intermittent issues btw?
16:35:30 <elodilles> I guess we still need to collect them to have the full picture
16:35:36 <bauzas> yup
16:36:32 <gibi> yepp
16:37:11 <elodilles> maybe one note: for placement we don't have periodic-stable on wallaby and older
16:37:45 <bauzas> :/
16:37:49 <gibi> elodilles: do you suspect some instability in placement?
16:38:14 <elodilles> gibi: nope, but the placement gate is broken on wallaby and older
16:38:15 <gibi> or is this just proactively running some jobs
16:38:25 <gibi> broken?!
16:38:27 <elodilles> gibi: see melwitt's etherpad
16:38:28 <gibi> that is bad :/
16:38:46 <elodilles> though they are probably some known issues to fix
16:39:09 <gibi> I agree we should add some periodic jobs there then
16:39:25 <elodilles> gibi: ack, I can backport the patch that added the periodic
16:39:39 <elodilles> * periodic-stable
16:40:18 <bauzas> gibi: agreed too
16:41:04 <bauzas> moving on?
16:41:14 <elodilles> bauzas: ++
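[editor's note: the backport elodilles volunteers for is a small Zuul config change on each stable branch of placement. A sketch of the kind of stanza involved, assuming the standard openstack-zuul-jobs template is named periodic-stable-jobs — check the master-branch patch for the exact template and job names:

    - project:
        templates:
          - periodic-stable-jobs

adding the template makes the branch's jobs run in the periodic-stable pipeline, so breakage gets noticed without waiting for a human-submitted change to fail.]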
16:41:24 <bauzas> #topic Open discussion
16:41:29 <bauzas> (whoami-rajat) Discussion regarding the design of the rebuild volume-backed instance feature
16:41:35 <bauzas> whoami-rajat: your turn
16:41:35 <whoami-rajat> Hi
16:41:39 <whoami-rajat> thanks
16:41:52 <whoami-rajat> #link https://review.opendev.org/c/openstack/nova-specs/+/840155
16:42:20 <whoami-rajat> So I started working on this feature in yoga (this was proposed/reproposed several times before) and the spec got approved
16:42:41 <whoami-rajat> now while reproposing it, sean-k-mooney has some concerns regarding the new parameter we are introducing, ``reimage_boot_volume``
16:43:01 <whoami-rajat> it's a request parameter to tell the API that we are performing a rebuild on a volume-backed instance and not on an ephemeral disk
16:43:13 <sean-k-mooney> yep
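[editor's note: for context, rebuild is a server action in the compute API. A sketch of what a rebuild request carrying the proposed flag might look like — reimage_boot_volume is only a spec proposal at the time of this meeting, not part of the released API:

    POST /v2.1/servers/{server_id}/action
    {
        "rebuild": {
            "imageRef": "<image uuid>",
            "reimage_boot_volume": true
        }
    }

the debate below is whether such an explicit opt-in flag should exist at all, or whether a microversion bump alone should change the behaviour.]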
16:43:28 <whoami-rajat> initially the idea was not to have feature parity between both workflows, but later there were many concerns about this operation being destructive
16:43:38 <whoami-rajat> even if you follow the past specs, the concern has been discussed
16:44:15 <whoami-rajat> so lyarwood suggested adding this ``reimage_boot_volume`` parameter so that only users who opt in (as it has a data loss risk) would be able to do it
16:44:22 <sean-k-mooney> I really think that having feature parity between bfv=True|false is important
16:44:38 <sean-k-mooney> I don't think the data loss argument holds
16:44:57 <sean-k-mooney> my reason is that this is a deliberate instance action to rebuild the root disk
16:45:06 <gibi> rebuild is destructive for image-based instances too
16:45:11 <sean-k-mooney> yep
16:45:30 <sean-k-mooney> and rebuild is not the same as evacuate
16:45:31 <whoami-rajat> yes, but in this case the destructive operation is performed by cinder, since the volume resides on the cinder side
16:45:45 <sean-k-mooney> for evacuate we should preserve the data
16:45:48 <bauzas> that's the whole purpose of this spec
16:45:58 <sean-k-mooney> for rebuild via the api we should reimage the root volume
16:46:01 <bauzas> rebuild on BFV wasn't destructive, right?
16:46:14 <sean-k-mooney> rebuild was rejected
16:46:16 <sean-k-mooney> for bfv
16:46:17 <whoami-rajat> we didn't support rebuild on BFV
16:46:34 <sean-k-mooney> so the whole point is to allow rebuild with bfv
16:46:50 <bauzas> if so, there is a clear implication of what rebuild means for the root disk
16:47:13 <bauzas> we blocked it because we were unable to rebuild the root disk if bfv
16:47:19 <sean-k-mooney> and technically extra ephemeral disks
16:47:28 <sean-k-mooney> bauzas: correct
16:48:05 <bauzas> then I don't see a need for differentiating BFV and non-BFV from an API pov
16:48:20 <bauzas> both will be destructive for the root disk
16:48:33 <sean-k-mooney> if so, we also do not need an api microversion, correct
16:48:39 <sean-k-mooney> and no api change at all
16:48:44 <sean-k-mooney> we just remove the block
16:48:50 <whoami-rajat> the destructive nature of this operation was the concern from many folks, I can't name everyone but this was approved in yoga so you can see
16:48:51 <bauzas> good question
16:48:53 <sean-k-mooney> when cinder is new enough
16:49:10 <whoami-rajat> dansmith has been actively reviewing the changes I proposed last cycle, so maybe he can weigh in
16:49:49 <bauzas> whoami-rajat: frankly, if we were to add some parameter, it would rather be for *not* recreating the volume
16:50:10 <dansmith> bauzas: the point of the spec/effort is to rebuild the root volume
16:50:17 <dansmith> i.e. to reimage it, but let cinder do the reimaging
16:50:29 <bauzas> dansmith: that's what I understand
16:50:38 <bauzas> so...
16:51:37 <bauzas> tbc, I don't see a need for an API param that'd say "yes, I want to rebuild by reimaging"
16:51:56 <bauzas> which would imply that the default would be "rebuild by not reimaging"
16:52:25 <sean-k-mooney> bauzas: no, the default would reject
16:52:49 <sean-k-mooney> bauzas: that was the behavior that I think lee suggested, but I don't think I reviewed the previous iteration
16:52:57 <dansmith> I think a user-initiated rebuild where we don't reimage root is pointless, right?
16:53:06 <dansmith> as long as we don't rebuild on evacuate then we're good,
16:53:06 <sean-k-mooney> correct
16:53:13 <sean-k-mooney> yeah
16:53:13 <bauzas> I agree
16:53:19 <dansmith> but this is specifically to make BFV behave like regular instances
16:53:48 <sean-k-mooney> right, so evacuate should continue to preserve the root disk if it's on shared storage
16:53:50 <bauzas> correct me if I'm wrong, but I feel we are on the same page
16:53:57 <bauzas> evacuate should differ
16:54:06 <sean-k-mooney> and rebuild will always reimage it, provided cinder is new enough
16:54:06 <whoami-rajat> Since the main destruction is performed on the cinder side, I know a lot of folks on the cinder side that won't agree with the idea of not adding this additional precautionary measure to avoid it
16:54:13 <bauzas> but rebuild should behave like on a regular instance, i.e. reimage
16:54:17 <dansmith> bauzas: okay, I guess I thought you were arguing for a special param
16:54:20 <whoami-rajat> that's where the initial concern started ^
16:54:35 <bauzas> dansmith: I was arguing in exactly the other direction, see above :)
16:54:36 <sean-k-mooney> I really don't like the idea of making bfv special in the nova api
16:54:45 <bauzas> me neither
16:54:51 <dansmith> bauzas: ack, sorry, I'm double-meeting-ing
16:55:04 <bauzas> from an API point of view, this is clear
16:55:11 <sean-k-mooney> whoami-rajat: if we want to prevent this from the cinder side
16:55:21 <sean-k-mooney> I think cinder needs a way to block the reimage, not nova
16:55:31 <bauzas> of course, since we share the same internal methods for evacuate and rebuild, we should make them differ based on some conditional
16:55:32 <sean-k-mooney> like locking the volume or similar
16:55:47 <bauzas> but this conditional doesn't have to be exposed at the API level
16:55:57 <sean-k-mooney> bauzas: I think we pass a flag to rebuild to signal if it's an evacuate, right
16:55:58 <dansmith> bauzas: we already have a flag to pass,
16:56:04 <dansmith> bauzas: because we have to honor the old microversion,
16:56:11 <dansmith> so we can just make sure it's ==false for the evac case
16:56:16 <bauzas> dansmith: yeah, I know, that's the conditional I thought of
16:56:30 <dansmith> conditional at the rpc layer, but the only conditional in the api is "old or new microversion"
16:56:44 <dansmith> the only conditional *should* be version, I mean
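[editor's note: a loose Python sketch of the shape dansmith describes — illustrative only; the function and helper names are made up and this is not actual nova code:

    def rebuild_instance(instance, image_ref, evacuate, reimage_boot_volume):
        # reimage_boot_volume is only ever True for a user-initiated rebuild
        # on the new microversion; the API forces it to False for evacuate.
        if instance_is_volume_backed(instance):
            if evacuate or not reimage_boot_volume:
                # evacuate (or an old-microversion rebuild): leave the
                # root volume's data untouched
                preserve_root_volume(instance)
            else:
                # user asked for a rebuild: let cinder reimage the volume
                ask_cinder_to_reimage(instance, image_ref)
        else:
            # regular instances: rebuild has always recreated the root disk
            recreate_root_disk(instance, image_ref)

the only API-level conditional is the microversion; everything else is decided at the RPC layer.]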
16:56:46 <bauzas> dansmith: correct, that being said, there was an open question
16:56:50 <sean-k-mooney> dansmith: well, do we need a microversion
16:56:57 <bauzas> about whether we would even need a microversion
16:57:01 <sean-k-mooney> there is no api request change
16:57:08 <bauzas> if we just unblock
16:57:10 <dansmith> I think we absolutely do,
16:57:17 <sean-k-mooney> I really think we should not
16:57:19 <dansmith> because right now rebuild does not destroy data and after this, it would
16:57:30 <sean-k-mooney> right now it rejects the request
16:57:35 <whoami-rajat> sean-k-mooney: if the operation is initiated from the nova side, I'm not sure how we can provide a user input from the cinder side to block this
16:57:44 <dansmith> sean-k-mooney: only if the image is different
16:58:00 <sean-k-mooney> dansmith: no, it's always rejected, I thought
16:58:06 <dansmith> sean-k-mooney: if the image is the same, it allows it
16:58:13 <sean-k-mooney> ...
16:58:13 <dansmith> whoami-rajat: right?
16:58:15 <whoami-rajat> but maybe I'm the only one defending the proposal
16:58:43 <whoami-rajat> dansmith: yes, for the same image it does allow the rebuild
16:58:54 <dansmith> sean-k-mooney: ^
16:59:14 <bauzas> hah
16:59:18 <sean-k-mooney> that seems like a bug
16:59:23 <sean-k-mooney> since that also destroys data
16:59:34 <bauzas> (18:46:01) bauzas: rebuild on BFV wasn't destructive, right?
16:59:45 <sean-k-mooney> there is no difference from a data perspective if you use the same image or a different one
16:59:47 <bauzas> damn, we're almost at the end of our time
16:59:49 <dansmith> sean-k-mooney: it doesn't on BFV but does on regular instances
17:00:13 <dansmith> sean-k-mooney: on BFV, if the image is the same, it will just rebuild the ports or whatever, but no change to the disk
17:00:20 <dansmith> but it will destroy the disk with the same image on a regular instance
17:00:23 <bauzas> I'll close this meeting, but I beg the people here to continue discussing this topic after
17:00:24 <sean-k-mooney> that's the same as a hard reboot
17:00:37 <dansmith> sean-k-mooney: alas, it's api behavior we have had for YEARS
17:00:44 <sean-k-mooney> rebuild is not a move op
17:00:49 <dansmith> so changing it to now destroy data is a Bad Plan (tm)
17:00:51 <sean-k-mooney> and it should not really update the port either
17:00:55 <dansmith> well understood :)
17:00:59 <bauzas> thanks all, and for people interested in this bfv rebuild discussion, please stay around
17:01:04 <bauzas> #endmeeting