16:00:04 #startmeeting nova
16:00:04 Meeting started Tue Apr 26 16:00:04 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:04 The meeting name has been set to 'nova'
16:00:11 o/
16:00:29 ehlo compute-ers
16:00:53 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:14 o/
16:01:37 o/
16:03:04 okay, let's start; people can join once they're freed from some internal big meeting :)
16:03:24 o/
16:03:26 #topic Bugs (stuck/critical)
16:03:28 damn
16:03:30 #topic Bugs (stuck/critical)
16:03:35 #info No Critical bug
16:03:40 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (-7 since the last meeting)
16:03:46 kudos to gibi for this hard work
16:03:59 #link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (0 since the last meeting) in Storyboard for Placement
16:04:06 #info Add yourself to the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:04:25 gibi: want to discuss the bugs you triaged before we look at next week's bug baton owner?
16:04:31 just a small link
16:04:33 https://etherpad.opendev.org/p/nova-bug-triage-20220419
16:04:41 I collected the triaged bugs here
16:04:55 I think the only interesting information is the bugs in triaged state without an assignee
16:05:02 as I said before, this is a good idea to keep some references, mostly about the incomplete ones
16:05:02 we have one this week
16:05:14 [Needs Assignee] Concurrent migration of vms with the same multiattach volume fails https://bugs.launchpad.net/nova/+bug/1968645
16:05:20 well, we have around 950 open bugs
16:05:29 yeah, but at least this is a fresh one :)
16:05:37 probably the reporter is still around if we have questions :)
16:06:02 gibi: what we could agree on is to find some volunteer only for High bugs that just got triaged
16:06:36 I think it is worth advertising these bugs but I agree that we don't have to forcefully assign them
16:06:49 sounds like low-hanging fruit
16:07:03 if this is about adding a retry loop
16:07:14 yeah, it does not seem super hard
16:07:21 yep, it is a retry loop
16:07:26 for cinder attachment create
16:07:38 anyhow, we can move on :)
16:08:16 I'm ready to pass the baton
16:08:51 gibi: I'll then add the low-hanging-fruit tag
16:08:57 bauzas: works of me
16:08:58 this may help
16:09:01 *for
16:10:15 gibi: nah, it worked *of* you as you proposed the solution :)
16:10:47 anyway, this leads to the last point
16:10:49 melwitt: around ?
16:11:03 yes
16:11:15 melwitt: you're next in the bug triage roster
16:11:29 melwitt: do you feel brave enough to take the bug baton ?
16:12:01 bauzas: haha, yes.
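[Editor's note: the retry-loop fix discussed above for bug 1968645 could look roughly like the sketch below. The names `volume_api.attachment_create` and `AttachmentConflict` are placeholders standing in for the real cinderclient call and its conflict error, not Nova's actual code.]

```python
import time

class AttachmentConflict(Exception):
    """Placeholder for the conflict error raised when two concurrent
    attachment-create calls race on the same multiattach volume."""

def attachment_create_with_retry(volume_api, ctxt, volume_id, instance_id,
                                 attempts=5, delay=1.0):
    """Retry attachment create when concurrent multiattach requests collide.

    volume_api.attachment_create is a stand-in for the real Cinder call;
    the retry simply re-issues it after a short pause, re-raising once the
    attempt budget is exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return volume_api.attachment_create(ctxt, volume_id, instance_id)
        except AttachmentConflict:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

As discussed in the meeting, this is intentionally small, which is why the bug was tagged low-hanging-fruit.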
sounds cool
16:12:28 as a reminder for everyone, this baton doesn't imply any kind of ownership or responsibility
16:12:47 anyone wanting to help is welcome, based on their free time
16:13:00 and others may help if they want
16:13:16 melwitt: which leads to me saying you can ping me if you need help with upstream triage
16:13:29 cool, thanks
16:13:38 #info Next bug baton is passed to melwitt
16:13:46 melwitt: thanks, very much appreciated
16:13:57 next topic then,
16:14:01 #topic Gate status
16:14:06 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:14:26 #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status
16:14:30 #link https://zuul.opendev.org/t/openstack/builds?job_name=nova-emulation&pipeline=periodic-weekly&skip=0 Emulation periodic job runs
16:14:34 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:14:51 it's been a while since we got a new gate failure
16:15:05 I saw some intermittent ones but had no time to dig in
16:15:19 for centos-8-stream, the non-voting job is failing 100% now.
16:15:23 As the testing runtime for Zed is centos-9-stream and in QA we agreed to drop c8s support in devstack as well as in Tempest, I have proposed the changes there and will notify on the ML as well.
also moving the c8s job to c9s, please review it - https://review.opendev.org/c/openstack/nova/+/839275
16:15:44 thanks gmann
16:15:48 this series #link https://review.opendev.org/q/topic:drop-c8s-testing
16:16:09 gmann: on my list, will vote on it later today or tomorrow
16:16:26 thanks, meanwhile I will get the tempest depends-on merged
16:17:03 cool
16:17:24 I have to sit down and consider all the implications but your patch seems good to me
16:17:35 and either way, that ship has sailed
16:17:52 centos9 is targeted for zed
16:17:57 yeah
16:18:29 do we have any tracking LP bug for the centos8 failures ?
16:18:58 ideally, I'd want those to be wontfix if we decide to move on
16:19:15 no bug as of now, we are going as per the testing runtime. and as we drop py36 from projects it breaks, like for nova as of now
16:19:29 ok
16:19:37 then no paperwork to fill in
16:19:45 it is failing because in nova we made nova require >=py3.8, and other projects will do the same
16:19:45 :)
16:19:52 oh
16:19:55 right
16:20:51 anyway, I guess we can continue
16:20:58 yeah
16:21:10 #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:21:30 just as a weekly periodic reminder (which includes me, stupid bias)
16:21:54 next topic,
16:21:59 #topic Release Planning
16:22:03 #link https://releases.openstack.org/zed/schedule.html
16:22:07 #info Zed-1 is due in 3 weeks
16:22:10 tick-tock
16:22:27 ♫ on the clock ♫
16:22:50 * melwitt knows the reference and feels embarrassed :)
16:23:10 I said last week I should ask this week about a spec review day, but do people feel good about considering it for two weeks from now ?
16:23:16 bah, I have to allocate some time to the placement PCI tracking spec
16:23:19 No shame, those pop songs are earworms by design
16:23:52 artom: but the party don't stop
16:23:52 gibi, you have sean-k-mooney's original spec to work from, so not starting from scratch
16:24:00 artom: true true
16:24:15 * bauzas appreciates the english grammar mistake, btw.
16:24:15 bauzas: I'm OK to have a review day next week or the week after
16:24:33 :)
16:24:35 gibi: I have to write some spec for deprecating keypair generation, you know
16:24:46 so
16:24:47 :)
16:25:12 maybe a spec review day not next tuesday, but the tuesday after that ?
16:25:22 ie. May 10th ?
16:25:37 the 10th, yeah, that would be ok I think
16:26:13 I see no objections
16:26:23 good for me
16:26:41 #agreed first spec review day will happen on May 10th, bauzas to communicate it through the mailing list
16:27:09 this leaves 2 weeks for people writing specs, you are warned
16:27:20 (again, this includes me)
16:27:32 * gibi feels warned
16:27:55 * bauzas feels gibi overfeels more than he should :)
16:28:17 we'll have another round of spec reviews either way, as we agreed at the last PTG
16:28:51 ok, next
16:29:00 #topic Review priorities
16:29:04 #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B1
16:29:52 the vIOMMU change probably needs paperwork
16:30:41 tl;dr: https://review.opendev.org/c/openstack/nova/+/830646 requires at least a blueprint, and maybe a spec
16:30:42 yeah, I started to review it; I kind of think it should have a mini spec
16:30:50 +1
16:31:09 I can use my hammer
16:31:21 but I could soften it
16:31:51 explaining we require a blueprint and some discussion at a nova meeting before we can continue reviewing it
16:31:59 ricolin: are you around ?
16:32:18 I'm not against the proposal; in fact I have wanted to add support for a while, but we need to agree on the extra_specs/image properties and define the scope
16:32:28 I was recommending keeping it small for now
16:32:31 yeah, and that's why we need a debate
16:32:55 not about which paper stamp we should use
16:33:02 there is vIOMMU support with just the limited change required for their accelerator to work
16:33:08 but whether we all agree on the design
16:33:13 and then there is full support with security and isolation
16:33:38 correct, that's why we need to discuss this properly and address the design scope
16:33:49 I would suggest we split it like that and only do the former this cycle
16:34:11 sean-k-mooney: well, we need an owner, first of all :)
16:34:35 even if we agree on the direction, we need gears
16:34:37 yep, mnaser also expressed interest. I'm not sure that stephenfin will work on it this cycle
16:35:05 I'm sure stephenfin said he was okay to leave it for others to continue :)
16:35:27 hence the gerrit hammer
16:35:47 this may help people to react and allocate some time for this
16:36:43 I'll also drop the review-prio flag, which is meaningless in this case as we can't merge it as it is
16:37:43 ok, moving on if nobody yells
16:38:10 #topic Stable Branches
16:38:14 elodilles: your turn
16:38:20 #info ussuri and older branches are blocked until 'l-c drop' patches merge - https://review.opendev.org/q/I514f6b337ffefef90a0ce9ab0b4afd083caa277e
16:38:30 #info other branches should be OK
16:38:36 #info nova projects' stable/victoria transitioned to Extended Maintenance - no further releases will be produced from victoria, but the branch remains open to accept bug fixes
16:38:44 and that's all I think ^^^
16:39:15 we need a second core, wink wink
16:39:24 :]
16:39:41 I would be happy to approve...
;)
16:39:52 (has some blocked train backports :D)
16:40:03 I can do things
16:40:12 my brain fsck'd me
16:40:12 bauzas: ask elodilles to add me into the stable-core group
16:40:20 then I can help
16:40:53 gibi: I think we said at the PTG I should propose your name for the stable team
16:41:01 yeah I think so
16:41:05 so let's do it :D
16:41:05 so,
16:41:19 #1 I'll do my homework and review such l-c patches
16:41:53 #2 I'll do my duty and start discussions about reconciling the nova team and the nova-stable team in some intelligent manner
16:43:18 last topic on the agenda,
16:43:21 #topic Open discussion
16:43:27 (gibi) Allow claiming PCI PF if child VF is unavailable https://review.opendev.org/c/openstack/nova/+/838555
16:43:33 gibi: take the mic
16:43:38 thanks
16:43:45 so
16:43:51 it is a bug
16:43:58 we saw DB inconsistencies at customers
16:44:23 the pci_devices table contains available PF and unavailable children VF rows
16:44:39 this is basically an impossible situation
16:44:54 the VF should be available if the PF is available
16:45:04 or the VF should be allocated
16:45:12 anyhow
16:45:26 I proposed a fix https://review.opendev.org/c/openstack/nova/+/838555
16:45:40 it removes some of the strictness of the state check during the PCI claim
16:45:56 basically it allows allocating the available PF even if the children VFs are unavailable
16:46:06 this would heal the inconsistent DB state
16:46:13 artom had a good point in the review
16:46:32 that we tend to handle DB healing via the nova-manage CLI instead
16:46:45 not true
16:46:59 we heal some things on object load from the db
16:47:02 we had db healing done through data migrations
16:47:51 Right, but those fixes are from "we had thing X a long time ago, and it might still be in the DB, so now when we load it we convert to thing Y"
16:47:59 we do have the heal-allocation type commands but I think healing this on agent start is the right thing
16:48:01 we only heal things through the CLI if this is, for example, something due to some relationship between two DBs
16:48:08 vs examples like placement-audit and Lee's connection_info update
16:48:09 as we are also fixing the in-memory representation
16:48:18 sean-k-mooney: correct, because two DBs were involved
16:48:52 My other point was - the DB somehow got into an inconsistent state, wouldn't it be wiser to at least let the operator know, vs silently fixing it?
16:48:58 in the case of placement audit, this was about reconciling two datastores kept by two different projects
16:49:05 artom: I don't think so
16:49:17 I agree with sean-k-mooney
16:49:25 we could log such a thing
16:49:32 artom: I think this happened because of how the customer recreated the compute node after the hdd died
16:49:32 but no need to claim it loudly
16:49:34 OK :) Not a hill I want to die on, but wanted to at least raise the question
16:49:35 I've added a WARNING log in the patch
16:49:52 I don't think this is something that most operators would hit
16:49:52 Seems like I'm outnumbered :)
16:50:12 artom: I won't ask you how many divisions you have
16:50:19 sean-k-mooney: one more thing, you said it should be fixed at agent restart. Now my patch fixes it during the PCI claim
16:50:31 bauzas, divisions o_O?
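[Editor's note: for context on the invariant being discussed, a parent PF row in pci_devices should never be 'available' while one of its child VF rows is 'unavailable'. A toy check over a simplified data model (plain dicts, not Nova's PciDevice objects) might look like this:]

```python
# Simplified sketch of the PF/VF invariant discussed above; the real logic
# lives in Nova's PCI tracker, and these field names ('address', 'dev_type',
# 'status', 'parent_addr') only mirror the pci_devices columns loosely.
def find_inconsistent_pfs(devices):
    """Return addresses of PFs marked available whose child VFs are not."""
    bad = []
    for dev in devices:
        if dev['dev_type'] != 'type-PF' or dev['status'] != 'available':
            continue
        children = [d for d in devices
                    if d.get('parent_addr') == dev['address']]
        if any(c['status'] == 'unavailable' for c in children):
            # An available PF with unavailable children is the "impossible
            # situation" gibi described; the proposed fix relaxes the claim
            # check so claiming the PF heals the children as a side effect.
            bad.append(dev['address'])
    return bad
```

Such a check is where the WARNING log mentioned below would fire, letting the operator know the state was healed silently rather than rejected.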
16:50:51 gibi: I like the pci claim approach
16:51:05 I have not reviewed the third patch yet
16:51:08 sean-k-mooney: and I have a separate patch for the agent restart + remove VF + inconsistent state case
16:51:24 sean-k-mooney: ahh, OK, let me know your opinion once you've reviewed it
16:51:27 artom: sorry, I'll DM you the reference :)
16:51:40 ack, we can proceed on the gerrit review
16:51:47 It's going to be an obscure French thing, isn't it :P
16:52:28 artom, sean-k-mooney, bauzas: thanks, we can move on
16:52:37 ++
16:52:56 sounds like we have a consensus: review gibi's patch
16:52:56 I will comment on the patch linking to the meeting logs
16:53:28 #agreed let's continue to review gibi's work on pci claims fixing the inconsistency
16:53:45 that's all we had for the meeting
16:53:55 any last-minute items people wanna raise ?
16:54:32 looks like not,
16:54:37 -
16:54:37 thanks all !
16:54:39 thanks!
16:54:46 #endmeeting