16:00:24 #startmeeting nova
16:00:24 Meeting started Tue May 24 16:00:24 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:24 The meeting name has been set to 'nova'
16:00:32 hey folks
16:00:37 Hi
16:00:53 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:15 o/
16:01:59 Kashyap Chamarthy proposed openstack/nova master: libvirt: Add a workaround to skip compareCPU() on destination https://review.opendev.org/c/openstack/nova/+/838926
16:02:08 o/
16:02:33 ok, let's start
16:02:38 #topic Bugs (stuck/critical)
16:02:43 #info No Critical bug
16:02:47 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 15 new untriaged bugs (+1 since the last meeting)
16:02:56 no worries sean, I saw you did a lot of hard work
16:03:04 #link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (0 since the last meeting) in Storyboard for Placement
16:03:09 #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:03:22 sean-k-mooney: any bug you wanted to discuss for triage ?
16:03:40 no, nothing pressing
16:03:42 https://etherpad.opendev.org/p/nova-bug-triage-2022-05-17
16:03:46 we had one feature request
16:03:49 for numa in placement
16:03:56 which i marked as invalid
16:04:07 and one duplicate of an
16:04:14 oslo messaging bug
16:04:26 the rest did not have enough info to triage really
16:04:31 so i marked them incomplete
16:04:42 i also checked some of the incomplete ones from last week
16:04:46 but no change really
16:05:13 one fixed bug from stephen https://bugs.launchpad.net/nova/+bug/1974173
16:05:20 that's about it
16:06:34 ok thanks
16:06:40 and thanks again for triaging
16:06:57 elodilles: are you okay for getting the baton for this week ?
16:07:44 bauzas: yepp o7
16:07:54 thanks
16:08:00 #info Next bug baton is passed to elodilles
16:08:20 #topic Gate status
16:08:59 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:09:03 #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status
16:09:08 #link https://zuul.opendev.org/t/openstack/builds?job_name=nova-emulation&pipeline=periodic-weekly&skip=0 Emulation periodic job runs
16:09:14 as you can see ^ nothing to tell
16:09:45 both jobs and pipelines work
16:09:52 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:09:57 #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:10:05 as a reminder for everyone ^ :)
16:10:30 please note that we are still playing whack-a-mole with the volume detach issue. There are still open tempest patches adding more SSHABLE waiters
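For context, the idea behind those SSHABLE waiters: wait until the guest actually answers on its SSH port before detaching a volume, so the detach cannot race with boot-time device activity in the guest. A minimal sketch of that idea in Python, purely illustrative rather than the actual tempest patches under review:

    import socket
    import time

    def wait_until_sshable(host, port=22, timeout=300, interval=5):
        """Poll until a TCP connection to the guest's SSH port succeeds."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                with socket.create_connection((host, port), timeout=interval):
                    return  # guest is reachable; safer to detach now
            except OSError:
                time.sleep(interval)  # not up yet; retry until the deadline
        raise TimeoutError("guest never became SSHABLE")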
16:10:49 gibi: yup, we'll discuss this for the stable topic
16:11:03 ack, but this is affecting master still :)
16:11:05 bauzas: well it affects master too
16:11:10 but sure
16:11:30 yep, I know, thanks for the reminder that it also impacts master
16:12:14 #topic Release Planning
16:12:19 #link https://releases.openstack.org/zed/schedule.html
16:12:22 #info Zed-1 was last week
16:12:48 thanks sean-k-mooney for accepting the rc1 releases for the projects
16:13:03 oh, actually elodilles
16:13:12 ya it was elodilles
16:13:25 i replied on one after the fact
16:13:27 #link https://review.opendev.org/c/openstack/releases/+/841851 novaclient release for zed-1
16:13:34 well, it had a deadline, so needed a review & merge o:)
16:13:43 we discussed it but i forgot to do it before the deadline
16:13:51 #link https://review.opendev.org/c/openstack/releases/+/841845 os-vif release for zed-1
16:13:56 sean-k-mooney: me too
16:14:05 elodilles: strictly speaking we don't have to do it by m1
16:14:08 and i was off this friday, didn't help
16:14:17 that is just the convention that the release team is following
16:14:30 but it's not required by the release model
16:14:34 elodilles: don't be afraid to ping me if you need me to review some release change
16:14:38 we just need an intermediary release :)
16:14:43 correct
16:15:06 bauzas: ack :)
16:16:36 sean-k-mooney: not necessary, yes, but if there is no -1 from the team, then the release managers merge the generated patches at deadlines o:)
16:16:55 elodilles: right, the deadline is actually m3
16:17:04 the docs don't mention m1 at all
16:17:21 that is just a holdover from the release-with-milestones model
16:17:31 but thanks for taking care of it in any case
16:18:48 https://github.com/openstack/releases/blob/61f891ddd7bd3b28ac7b5e7e9e1d9203fbbe297d/doc/source/reference/release_models.rst#cycle-with-intermediary
16:19:18 sean-k-mooney: see #2, and its last chapter: https://releases.openstack.org/reference/process.html#milestone-1
16:19:34 elodilles: how can I see whether for example os-vif is either using a cycle-with-rc model or a cycle-with-intermediary one ?
16:19:55 elodilles: yep that is not in line with the governance doc
16:19:57 bauzas: in the yaml file under deliverables/zed
16:20:05 anyway it's not important now
16:20:18 elodilles: ok b/c https://releases.openstack.org/teams/nova.html doesn't tell it
16:20:31 anyway, moving on
16:20:42 ++
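To answer the release-model question above concretely: each deliverable file in the openstack/releases repo carries a release-model key, so with a local checkout of that repo it can be read directly. A small sketch (PyYAML assumed installed; file layout per that repo):

    import yaml  # PyYAML

    # Path relative to a checkout of openstack/releases.
    with open("deliverables/zed/os-vif.yaml") as f:
        deliverable = yaml.safe_load(f)

    # Prints e.g. 'cycle-with-intermediary'.
    print(deliverable["release-model"])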
16:20:49 #topic Review priorities
16:20:55 #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B1
16:21:00 #link https://review.opendev.org/c/openstack/project-config/+/837595 Gerrit policy for Review-prio contributors flag. Naming bikeshed in there.
16:21:06 #link https://docs.openstack.org/nova/latest/contributor/process.html#what-the-review-priority-label-in-gerrit-are-use-for Documentation we already have
16:21:39 I provided a comment for https://review.opendev.org/c/openstack/project-config/+/837595
16:21:43 please review it
16:21:58 done :)
16:22:38 thanks
16:22:50 at least I'm French so in general I'm not good at naming things
16:23:06 but at least I try to find a consensus
16:23:14 thank you for that
16:23:39 I think all contributors know what nova-core means
16:23:52 hopefully
16:24:15 that is a fair assumption
16:24:35 for other repos, we could name the label differently of course, like 'osvif-core' if this is named by gerrit
16:25:21 ie. a nova-specs-core review promise
16:25:31 os-vif-core etc.
16:25:41 but this is a naming bikeshed
16:26:24 anyway, moving on
16:26:32 #topic Stable Branches
16:26:40 in general I ask elodilles
16:26:46 but this time, let me do it
16:26:52 #info ussuri and older branches are still blocked, newer branches should be OK
16:27:03 melwitt had a point
16:27:33 just an update for that ^^^ i think ussuri is blocked but the older branches are not blocked anymore
16:27:47 #link https://etherpad.opendev.org/p/nova-stable-branch-ci stable branches CI issues tracking, feel free to update with stable branch CI issues
16:27:58 elodilles: woah
16:28:11 kudos to the team then
16:28:13 bauzas: the l-c patches were merged
16:28:32 bauzas: i don't say they don't have intermittent failures though o:)
16:28:43 elodilles: I thought most of the issues were related to volume detach things, which are unrelated to l-c
16:28:44 but at least they are not blocked
16:28:48 ah
16:29:23 elodilles: but then, why is ussuri blocked while older branches are not ?
16:29:47 ussuri and train were where tempest was not pinned,
16:29:58 and where tempest is running with py36
16:30:08 if i'm not mistaken that's it
16:30:21 and gmann's train fix has landed
16:31:04 ok thanks
16:31:05 originally we thought that ussuri does not need a fix as it has zuulv3 jobs already, but that's not true unfortunately
16:31:21 gmann told me he couldn't attend this meeting, so let's discuss this again next week
16:31:33 i mean, it has zuulv3 jobs, but still we are facing the same issue
16:31:42 bauzas: ++
16:31:43 so I think the next step is still to gather the intermittent failures and try to fix them
16:32:02 gibi: yeah, we'll track those on a weekly basis thanks to the etherpad
16:32:12 ack
16:32:31 thanks melwitt for starting the etherpad \o/
16:32:37 yup, melwitt++
16:34:29 anything to discuss about those intermittent issues btw ?
16:35:30 i guess we still need to collect them to have the full picture
16:35:36 yup
16:36:32 yepp
16:37:11 maybe one note: for placement we don't have periodic-stable on wallaby and older
16:37:45 :/
16:37:49 elodilles: do you suspect some instability in placement?
16:38:14 gibi: nope, but the gate is broken in wallaby and older in placement
16:38:15 or is this just proactively running some jobs
16:38:25 broken?!
16:38:27 gibi: see melwitt's etherpad
16:38:28 that is bad :/
16:38:46 though they are probably some known issues to fix
16:39:09 I agree to add some periodic jobs there then
16:39:25 gibi: ack, i can backport the patch that added the periodic
16:39:39 * periodic-stable
16:40:18 gibi: agreed too
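As a side note, the periodic(-stable) results discussed here can also be checked without the dashboard, via the JSON API behind the Zuul links used earlier in this meeting. A sketch, with the endpoint shape assumed from the zuul.opendev.org web UI (adjust the parameters if the API differs):

    import json
    from urllib.request import urlopen

    # Same filters as the dashboard links: project + pipeline.
    URL = ("https://zuul.opendev.org/api/tenant/openstack/builds"
           "?project=openstack/placement&pipeline=periodic-stable&limit=10")

    with urlopen(URL) as resp:
        for build in json.load(resp):
            print(build["end_time"], build["job_name"], build["result"])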
16:41:04 moving on ?
16:41:14 bauzas: ++
16:41:24 #topic Open discussion
16:41:29 (whoami-rajat) Discussion regarding the design of the rebuild volume-backed instance feature
16:41:35 whoami-rajat: your turn
16:41:35 Hi
16:41:39 thanks
16:41:52 #link https://review.opendev.org/c/openstack/nova-specs/+/840155
16:42:20 So I started working on this feature in yoga (this was proposed/reproposed several times before) and the spec got approved
16:42:41 now while reproposing it, sean-k-mooney has some concerns regarding the new parameter we are introducing, ``reimage_boot_volume``
16:43:01 it's a request parameter to tell the API we are performing a rebuild on a volume backed instance and not an ephemeral disk
16:43:13 yep
16:43:28 initially the idea was not to have feature parity between both workflows, but later there were many concerns with this operation being destructive
16:43:38 even if you follow past specs, the concern has been discussed
16:44:15 so lyarwood suggested to add this parameter ``reimage_boot_volume`` so that only users who opt in for this (as it has data loss risk) would be able to do it
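To make the proposal concrete, this is roughly what an opt-in rebuild request could look like under the spec's proposed ``reimage_boot_volume`` flag. Everything here is subject to the outcome of the discussion below: the flag may be dropped in favour of a bare microversion bump, and the microversion number shown is a placeholder:

    import json
    import urllib.request

    compute_url = "https://cloud.example.com/compute/v2.1"  # placeholder
    body = {"rebuild": {"imageRef": "IMAGE_UUID",            # placeholder
                        "reimage_boot_volume": True}}        # proposed flag

    req = urllib.request.Request(
        f"{compute_url}/servers/SERVER_UUID/action",         # placeholder id
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": "TOKEN",                 # placeholder token
            "X-OpenStack-Nova-API-Version": "2.93",  # illustrative only
        },
        method="POST",
    )
    # urllib.request.urlopen(req)  # would submit the rebuild action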
16:44:22 i really think that having feature parity between bfv=True|False is important
16:44:38 i don't think the data loss argument holds
16:44:57 my reason is that this is a deliberate instance action to rebuild the root disk
16:45:06 rebuild is destructive for image based instances too
16:45:11 yep
16:45:30 and rebuild is not the same as evacuate
16:45:31 yes, but the destructive operation is performed by cinder in this case, where the volume resides on the cinder side
16:45:45 for evacuate we should preserve the data
16:45:48 that's the whole purpose of this spec
16:45:58 for rebuild via the api we should reimage the root volume
16:46:01 rebuild on BFV wasn't destructive, right?
16:46:14 rebuild was rejected
16:46:16 for bfv
16:46:17 we didn't support rebuild on BFV
16:46:34 so the whole point is to allow rebuild with bfv
16:46:50 if so, there is a clear implication of what rebuild means for the root disk
16:47:13 we blocked it because we were unable to rebuild the root disk if bfv
16:47:19 and technically extra ephemeral disks
16:47:28 bauzas: correct
16:48:05 then, I don't see a need for differentiating BFV and non-BFV from an API pov
16:48:20 both will be destructive for the root disk
16:48:33 if so, we also do not need an api microversion, correct
16:48:39 and no api change at all
16:48:44 we just remove the block
16:48:50 the destructive nature of this operation was the concern from many folks, I can't name everyone but this was approved in yoga so you can see
16:48:51 good question
16:48:53 when cinder is new enough
16:49:10 dansmith has been actively reviewing the changes I proposed last cycle so maybe he can weigh in
16:49:49 whoami-rajat: frankly, if we were about adding some parameter, it would be more for *not* recreating the volume
16:50:10 bauzas: the point of the spec/effort is to rebuild the root volume
16:50:17 i.e. to reimage it, but let cinder do the reimaging
16:50:29 dansmith: that's what I understand
16:50:38 so...
16:51:37 tbc, I don't see a need for an API param that'd say "yes, I want to rebuild by reimaging"
16:51:56 which would imply that the default would be "rebuild by not reimaging"
16:52:25 bauzas: no, the default would reject
16:52:49 bauzas: that was the behavior that i think lee suggested, but i don't think i reviewed the previous iteration
16:52:57 I think user-initiated rebuild where we don't reimage root is pointless right?
16:53:06 as long as we don't rebuild on evacuate then we're good,
16:53:06 correct
16:53:13 ya
16:53:13 I agree
16:53:19 but this is specifically to make BFV behave like regular instances
16:53:48 right, so evacuate should continue to preserve the root disk if it's on shared storage
16:53:50 correct me if I'm wrong, but I feel we are on the same page
16:53:57 evacuate should differ
16:54:06 and rebuild will always reimage it provided cinder is new enough
16:54:06 Since the main destruction is performed on the cinder side, I know a lot of folks on the cinder side that won't agree to the idea of not adding this additional precautionary measure to avoid it
16:54:13 but rebuild should behave like a regular instance, ie. reimage
16:54:17 bauzas: okay I guess I thought you were arguing for a special param
16:54:20 as where the initial concern started ^
16:54:35 dansmith: I was absolutely on the other direction, see above :)
16:54:36 i really don't like the idea of making bfv special in the nova api
16:54:45 me too
16:54:51 bauzas: ack, sorry, I'm double-meeting-ing
16:55:04 from an API point of view, this is clear
16:55:11 whoami-rajat: if we want to prevent this from the cinder side
16:55:21 i think cinder needs a way to block the reimage, not nova
16:55:31 of course, since we share the same internal methods for evacuate and rebuild, we should make them differ based on some conditional
16:55:32 like locking the volume or similar
16:55:47 but this conditional doesn't have to be exposed at the API level
16:55:57 bauzas: i think we pass a flag to rebuild to signal if it's an evacuate, right
16:55:58 bauzas: we already have a flag to pass,
16:56:04 bauzas: because we have to honor the old microversion,
16:56:11 so we can just make sure it's ==false for the evac case
16:56:16 dansmith: yeah, I know, that's the conditional I thought
16:56:30 conditional at the rpc layer, but the only conditional in the api is "old or new microversion"
16:56:44 the only conditional *should* be version, I mean
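The conditional being described, sketched with illustrative names (this is not nova's real internals, just the shape of the idea: one shared code path, one flag, and evacuate stays non-destructive):

    from dataclasses import dataclass

    @dataclass
    class Instance:
        uuid: str
        is_volume_backed: bool
        root_volume_id: str = ""

    def rebuild_root_disk(instance, image_ref, evacuate=False):
        if not instance.is_volume_backed:
            # Regular instance: rebuild has always recreated the local disk.
            return f"recreate local disk of {instance.uuid} from {image_ref}"
        if evacuate:
            # Evacuate: never touch the volume, just reattach it.
            return f"reattach volume {instance.root_volume_id} untouched"
        # User-initiated rebuild: destructive on purpose, delegated to cinder.
        return f"ask cinder to reimage {instance.root_volume_id} with {image_ref}"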
16:56:46 dansmith: correct, that being said, there was an open question
16:56:50 dansmith: well, do we need a microversion
16:56:57 about even whether we would need a microversion
16:57:01 there is no api request change
16:57:08 if we just unblock
16:57:10 I think we absolutely do,
16:57:17 i really think we should not
16:57:19 because right now rebuild does not destroy data and after this, it would
16:57:30 right now it rejects the request
16:57:35 sean-k-mooney, if the operation is initiated from the nova side, I'm not sure how from the cinder side we can provide a user input to block this
16:57:44 sean-k-mooney: only if the image is different
16:58:00 dansmith: no, it's rejected always i thought
16:58:06 sean-k-mooney: if the image is the same, it allows it
16:58:13 ...
16:58:13 whoami-rajat: right?
16:58:15 but maybe I'm the only one defending the proposal
16:58:43 dansmith, yes, for the same image it does allow the rebuild
16:58:54 sean-k-mooney: ^
16:59:14 hah
16:59:18 that seems like a bug
16:59:23 since that also destroys data
16:59:34 (18:46:01) bauzas: rebuild on BFV wasn't destructive, right?
16:59:45 there is no difference from a data perspective if you use the same image or a different one
16:59:47 damn, we're about at the end of time
16:59:49 sean-k-mooney: it doesn't on BFV but does on regular instances
17:00:13 sean-k-mooney: on BFV if the image is the same, it will just rebuild the ports or whatever, but no change to the disk
17:00:20 but it will destroy the disk with the same image on a regular instance
17:00:23 I'll close this meeting, but I beg the people here to continue discussing this topic after
17:00:24 that's the same as a hard reboot
17:00:37 sean-k-mooney: alas, it's api behavior we have had for YEARS
17:00:44 rebuild is not a move op
17:00:49 so changing it to now destroy data is a Bad Plan (tm)
17:00:51 and it should not really update the port either
17:00:55 well understood :)
17:00:59 thanks all, and for people interested in this bfv rebuild discussion, please stay around
17:01:04 #endmeeting