16:00:31 #startmeeting nova
16:00:31 Meeting started Tue Aug 26 16:00:31 2025 UTC and is due to finish in 60 minutes. The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:31 The meeting name has been set to 'nova'
16:00:37 o/
16:00:47 o/
16:00:56 Hello everyone
16:02:07 o/
16:02:21 Let's start.
16:02:30 #topic Bugs (stuck/critical)
16:02:41 #info No Critical bug
16:02:49 #topic Gate status
16:02:58 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:03:06 #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:03:10 o/
16:03:19 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:03:31 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:03:36 #info Please try to provide a meaningful comment when you recheck
16:03:39 o/
16:04:01 skipping next point as gmaan is in another meeting.
16:04:16 #topic Release Planning
16:04:28 #link https://releases.openstack.org/flamingo/schedule.html
16:04:34 #info Nova deadlines are set in the above schedule
16:04:48 #info Feature freeze is Thursday.
16:05:06 #topic Review priorities
16:05:14 #link https://etherpad.opendev.org/p/nova-2025.2-status
16:05:57 Uggla: Am I correct that FF is on Thursday EOD, and whatever is approved can still be rebased / rechecked afterwards to land? Do we plan any FF exception process?
16:05:58 First I need to thank bauzas, who helped me to better highlight status in the board and sync correctly within launchpad.
16:06:38 well, I should have given you some help way before
16:07:19 gibi, I think so. I don't think an exception will be needed but we might discuss it here.
16:07:44 Before that here are the review priorities we identified:
16:07:52 #link AMD SEV series (tkajinam) needs review https://review.opendev.org/q/topic:%22bp/amd-sev-es-libvirt-support%22
16:07:59 #link https://blueprints.launchpad.net/nova/+spec/copy-applied-provider-yaml https://review.opendev.org/c/openstack/nova/+/957578
16:08:06 #link https://blueprints.launchpad.net/nova/+spec/libvirt-migrate-parallel https://review.opendev.org/c/openstack/nova/+/950667
16:08:19 #link https://blueprints.launchpad.net/nova/+spec/policy-service-role-default one patch to review https://review.opendev.org/c/openstack/nova/+/957578
16:09:32 I discussed the latest one with gmaan and it might land if we review it. (only one patch)
16:10:08 bauzas and I are currently reviewing the AMD SEV series.
16:10:48 o/
16:11:15 Uggla: I'm currently trying to squeeze in https://review.opendev.org/c/openstack/nova/+/957088 before FF
16:11:27 the rest of the eventlet series can wait post FF / G
16:11:57 my only concern is that AMD SEV-ES series requires a reshape
16:12:03 gibi, thanks that was one of the questions I had.
16:12:14 gibi: you said you were fine with the content on that one, but...no unit tests?
16:12:17 this requires a lot of caution
16:12:18 are those coming?
16:12:26 Uggla: techihnaly teh parla clodu migration chnag eis for 2026.1
16:12:45 the what ?
16:12:47 or did we approve it late for this cycle
16:12:50 dansmith: it is a refactor like the scatter gather change
16:13:01 dansmith: I don't see what extra unit test we need
16:14:19 well, yes and no
16:14:21 okay I guess I thought something has to change with those mechanics, but I'll go look at what is testing this to see
16:14:37 there is a refactor but there is also a slight behavioral change
16:15:47 the lock obviously isn't verified, but if it's being run over then maybe that's okay
16:17:02 the change is the lock and I want to verify that locally. If you need a functional test to ensure that lock is valid then sure we can do that too
16:17:22 I don't see any other change in logic
16:18:58 I guess having such an invasive change without any single line of nova/tests made me afraid and I missed the point of the refactor
16:19:12 invasive? you mean a lock?
16:19:30 I'm confused
16:19:56 I thought the goal of the refactor is not to change behavior so if the existing test does not need to be touched that is actually good
16:20:23 as it proves no behavior change happened
16:21:23 gibi: I guess I expected to see at least something running this code in both modes, but perhaps I'm jumping the gun there. I know we don't have that for everything we're thread-ifying, but this is also sort of a self-contained example
16:21:49 so it would be nice to have the zuul change to enable it in the conductor in the patch
16:21:49 gibi: I agree with the fact that until some env var is set, nothing changes
16:22:10 but i dont consider that an invasive change either for what its worth
16:22:11 dansmith: yeah we did not transform the unit tests to both modes as part of the changes but we did it separately.
16:22:24 but yeah, I'd have appreciated some guideline to temper my guts
16:22:32 its adding a lock that should technically have been there even under eventlet + swapping the executor we use
16:22:34 unlike general thread spawning, this is a self-contained create/destroy of a single executor pool, which means we could test both modes here and now which is kinda what I was expecting
16:23:01 but that's fine
16:23:22 dansmith: I will check my unit test enabling series if it enables the related tests already...
16:23:42 with threading in a separate tox target
16:24:37 what was that job called by the way
16:24:45 i thought we already enabled that in ci
16:24:53 is that still pending?
16:24:57 nova-tox-py312-threading
16:25:00 it is pending
16:25:04 https://review.opendev.org/c/openstack/nova/+/955791/9
16:25:06 ah ok
16:25:26 sorry i thought that was earlier in the series
16:25:36 https://review.opendev.org/c/openstack/nova/+/953475/20 that one right
16:25:45 assuming gibi is working on the eventlet refactor any chance to land something in the above list ? Can we define some priorities/owners about them ?
16:26:21 there are a number of the ci/testing changes that would be nice to also land
16:26:25 dansmith: I see multiple image cache unit tests passing in https://4d426b9917a9f7962308-d3c6167007a6d8477c4cec901604a1c7.ssl.cf1.rackcdn.com/openstack/4beb6e364cb74aaeb2b8ff0b475d966f/testr_results.html with threading. So if we move the refactor under the unit test series we can verify the change
16:26:37 but i wonder if we would be ok with some of those happening post FF
16:27:01 gibi: ack
16:27:17 sean-k-mooney: https://review.opendev.org/c/openstack/nova/+/953475/20 adds the job, but later patches enable more and more tests
16:27:21 the 6 patches including and up to https://review.opendev.org/c/openstack/nova/+/955791/9 are stretch goals for me
16:27:42 yeah the unit test enablement was planned post FF as stretch for Flamingo
16:28:13 but if we block the nova-conductor refactor on it then I will at least make an arrangement to have test results
16:28:31 I mean unit test results with image cache tests running in threading mode
16:28:42 do you want me to do that?
16:29:47 I agree with the statement that I feel that the patches above n-api seem more important for now
16:29:58 Max proposed openstack/nova master: db: add indexes for source/dest migrations https://review.opendev.org/c/openstack/nova/+/958556
16:30:36 bauzas: so you say lets prioritize the unit test enablement over the nova-conductor work.
16:30:39 ?
16:30:58 (I have no problem with it if that is what the team wants)
16:31:00 for me, at least, I'd prefer reviewing those patches
16:31:47 i was going to suggest rebasing the 6 patches on top of the conductor one
16:31:55 and I think those would be more beneficial for the rest of the work we have to do
16:32:09 sean-k-mooney: you mean, on the bottom ?
16:32:13 no
16:32:18 sean-k-mooney: yeah that's what I meant by arranging to get test results
16:32:36 because i disagree that having the ability to run the conductor in threading mode is less valuable
16:33:03 sambork: Are you around? ^^ :)
16:33:04 if we have the conductor patch we can test it and ask operators to test it and report bugs
16:33:08 if we're not able to run it in CI, what's the benefit ?
16:33:17 we can trivially
16:33:26 its literally a 1 line change to the zuul job
16:33:37 to set the var in devstack to enable threading mode
16:33:44 but we won't execute that code, right?
16:33:50 sean-k-mooney: I think what bauzas implies is that we don't have tempest coverage for the changed code path
16:34:07 ok that should not block the code change
16:34:07 I mean, the zuul config change will be meaningless
16:34:23 if we don't have zuul jobs that call image caching
16:34:25 it wont be, it makes sure everything else in the conductor works
16:34:39 i.e. that we are not implicitly relying on eventlet indirectly
16:34:53 that is a good point
16:35:12 we could have missed other eventlet dependencies
16:35:25 I then agree,
16:35:27 by the way i think we could land both if we are willing to merge the unit test change up to RC1
16:35:51 my suggestion is for gibi to rebase them so we get more results today/tomorrow
16:36:01 agreed, I like that approach
16:36:03 and then we can decide if we are comfortable with merging the conductor changes
16:36:05 I will do the rebase today to get unit test results with threading on image cache
16:36:11 thanks gibi
16:36:14 sambork: ^^ FYI
16:36:29 gibi: can we add the devstack job change too if you or sambork have time
16:36:32 I will do local testing of the lock explicitly
16:36:37 sean-k-mooney: I will do so
16:36:41 +1
16:37:41 I have nothing further
16:38:05 asking again, assuming gibi is working on the eventlet refactor any chance to land something in the above list ? Can we define some priorities/owners about them ?
16:38:37 about the other things to review, I want to speak about SEV-ES
16:38:46 but tkajinam is not here I guess (and hope)
16:38:56 I am
16:38:59 wow
16:39:01 okay
16:39:01 :-)
16:39:27 stephenfin: and i both were interested in the sev one as well
16:39:38 or at least started looking at it
16:39:46 Uggla: land...in the next two days?
16:39:46 so i think that would be good to try and land
16:40:10 my concern is that we're on a non-SLURP release
16:40:20 but the main problem with that series is reviewing the reshape
16:40:43 tkajinam: I assume you know that you'll have to continue to handle the reshape case for at least two releases, right?
16:40:49 dansmith yes if possible.
16:40:58 or at least try.
16:41:03 bauzas, yes
16:41:23 that seems like a big ask to me,
16:41:28 I agree with the concern about reshape but I'm hoping that I at least get any feedback to hear how far it is from merge.
16:41:35 but I'm not really positioned to review a reshape
16:42:01 we can discuss if that can be a subject to FFE or I'm even ok if that is eventually postponed to 2026.1
16:42:10 I feel reasonable being able to review a reshape since I wrote two of them
16:42:26 my main concern is that we've been saying the same for recent 2~3 cycles (if that's too much for this cycle and then put it to next)
16:42:36 but that's a very old knowledge part of myself I need to go with
16:42:41 bauzas: that would help because the rest of the series is much more straightforward
16:43:00 sean-k-mooney: I can help with providing pointers to how the reshape mechanism works
16:43:18 from the very first glance I made on tkajinam's series, he perfectly managed it
16:43:36 I mean, he's triggering the signal, and then using the signal to reshape
16:43:43 well thats encouraging at least.
16:43:44 I added functional tests coverage according to the existing example so I hope that helps review... if any additional scenario may be needed I can try adding it
16:44:14 tkajinam: do you have any excerpts of a real environment upgrade scenario that you tested locally ?
16:44:22 I mean, pastebins
16:44:41 that would ease the review
16:45:19 you basically mean deploy devstack without the patch and boot an sev instance then apply the patch and restart nova right
16:45:32 to see it properly migrate the existing allocations
16:45:44 that yes
16:45:57 that's how I tested VGPU reshapes
16:46:57 hmm I'll check but I'm afraid I didn't keep the test result logs, and it might take some time to spin up the env again (due to machine availability)
16:47:24 tkajinam: but I assume you tested such upgrade scenario, right?
16:47:42 as an aside there is an existing bug that at least on rhel prevents sev from working. i also dont know if our internal amd host is free to do any testing
16:47:44 like, you had an env, you patched it, you saw the allocations moved
16:48:08 yes, when I wrote the series initially.
16:48:11 excellent
16:48:18 and yeah, functional tests can prove it works
16:48:36 so if you already wrote them (I haven't reviewed them yet), that's excellent
16:48:57 tkajinam: I assume you found how to do it using the existing reshape methods?
16:49:22 yeah
16:49:23 https://review.opendev.org/c/openstack/nova/+/921814/32/nova/tests/functional/libvirt/test_reshape.py#233
16:49:28 perfect
16:49:33 I followed the existing reshape test for mdev, IIRC
16:49:41 OK, I'll definitely commit myself to the review then
16:51:47 ok shall we move on ?
16:51:58 yup, we'll see what we can do for the other bits
16:52:02 like every cycle
16:53:13 sure, what we have today is already cool.
16:53:37 so moving on.
16:53:40 #topic OpenAPI
16:53:47 #link https://review.opendev.org/q/topic:%22openapi%22+(project:openstack/nova+OR+project:openstack/placement)+-status:merged+-status:abandoned
16:53:58 #info still 32 remaining atm.
16:54:24 so this will probably not fully land in this cycle, but we made good progress.
16:54:40 #topic Stable Branches
16:54:55 elodilles, please go ahead.
16:55:01 #info stable branches (stable/2025.1 and stable/2024.*) seem to be in OK state
16:55:09 #info nova stable versions released: 29.3.0 (2024.1 Caracal); 30.1.0 (2024.2 Dalmatian); 31.1.0 (2025.1 Epoxy)
16:55:20 with the sec related bug fix from last week
16:55:35 and that's all from me for now, back to you Uggla
16:55:42 thx elodilles
16:55:46 np
16:56:17 skipping next point due to fwiesel pto's
16:57:03 I also guess we can skip gibi's point as I think he already covered previously.
16:57:20 yepp nothing further for FF. I have things for post FF but we can discuss that next week
16:58:18 Yep we are at the top of hour. I know jssfr wanted to discuss, but I think we can do that next week.
16:59:08 did the UEFI NVRAM preservation thing also count as bugfix?
16:59:20 yes
16:59:25 okay, thanks
16:59:42 its not merged but we said we would treat the nvram and tpm data the same both as bug fixes
16:59:43 (even if not, it looks like y'all are busy enough getting all the other features over the line)
16:59:56 thanks, then we can certainly defer to next week from my side
17:00:05 jssfr thx.
17:00:11 So I'm gonna close, thanks all.
17:00:17 #endmeeting