16:00:47 <gibi> #startmeeting nova
16:00:47 <opendevmeet> Meeting started Tue Jul 22 16:00:47 2025 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:47 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:47 <opendevmeet> The meeting name has been set to 'nova'
16:00:54 <fwiesel> o/
16:01:08 <sp-bmilanov> o/
16:01:31 <dansmith> o/
16:01:33 <gibi> Uggla is on a well deserved PTO and his stand-in cannot be here today. So I agreed to run something like a nova meeting but I'm not prepared.
16:02:13 <gibi> let's wait a bit to see if other cores join but so far we are low on quorum
16:02:19 <elodilles> o/
16:02:42 <gmaan> o/
16:03:09 <gibi> (and I have a dinner invitation that helps me prioritize)
16:03:35 <gibi> let's get rolling
16:03:36 <gibi> #topic Bugs (stuck/critical)
16:03:50 <gibi> any fresh critical bug we need to look at?
16:04:28 <gibi> the one on the agenda https://bugs.launchpad.net/nova/+bug/2116852 is not critical any more as we disabled the single tempest test that caused the blockade
16:05:05 <gibi> I filed the upstream bug to Ceph https://tracker.ceph.com/issues/72203 they are silent so far. I have a way to ping them downstream which I will use next week if there is no reaction on the upstream tracker
16:05:18 <gibi> any other critical adjacent bug?
16:05:45 <gibi> #topic Gate status
16:05:51 <gibi> any issues with our gate?
16:06:01 <gibi> I'm not tracking anything major on my side at least
16:06:45 <gibi> #topic tempest-with-latest-microversion job status
16:06:58 <gibi> it is red
16:07:01 <gibi> :)
16:07:14 <gibi> - Failed: 27
16:07:28 <gibi> I have no other info. Anybody want to comment?
16:07:32 <gmaan> yeah, my last fix is still not merged but that makes 6 more tests green and 21 still failing
16:07:42 <gmaan> I did not get a chance to continue this one.
16:07:48 <gibi> gmaan: cool. Thanks
16:08:01 <gibi> #topic Release Planning
16:08:05 <gibi> #link https://releases.openstack.org/flamingo/schedule.html
16:08:17 <gibi> anybody have any comment here?
16:08:54 <gibi> #topic Review priorities
16:08:58 <gibi> #link https://etherpad.opendev.org/p/nova-2025.2-status
16:09:05 <gibi> any comments?
16:09:44 <gibi> #topic OpenAPI
16:09:47 <gibi> #link: https://review.opendev.org/q/topic:%22openapi%22+(project:openstack/nova+OR+project:openstack/placement)+-status:merged+-status:abandoned
16:10:00 <gibi> 16 open patches mostly stuck on gate or waiting for rebase
16:10:24 <gibi> any comments?
16:11:06 <gibi> #topic Stable Branches
16:11:17 <gibi> elodilles: give us what you have!
16:11:23 <elodilles> ACK :)
16:11:36 <elodilles> actually the state is pretty much the same as last week
16:11:46 <elodilles> stable branches seem healthy
16:11:59 <elodilles> and stable releases are waiting for release liaisons
16:12:05 <elodilles> gibi: back to you
16:12:51 <gibi> who are our liaisons to ping?
16:13:00 * gibi feels bad not knowing
16:13:14 <elodilles> Uggla, Amit and Sylvain
16:13:55 <gibi> bauzas: ^^ auniyal ^^
16:14:02 <gibi> please look at the stable release requests
16:14:17 <gibi> #topic vmwareapi 3rd-party CI efforts Highlights
16:14:24 <gibi> fwiesel: any news?
16:14:26 <fwiesel> Hi, nothing from my side.
16:14:35 <gibi> fwiesel: OK cool
16:14:39 <gibi> #topic Gibi's news about eventlet removal.
16:14:44 <gibi> hey that's me :)
16:15:17 <gibi> we are slowly landing patches from the scheduler series
16:15:45 <gibi> I got a nice set of reviews from bauzas on the doc patch. I have to go back there and touch up the doc
16:16:50 <gibi> sambork's patch https://review.opendev.org/c/openstack/nova/+/949754 logic looks good to me but I found some extra cleanup pieces and a few test issues
16:17:35 <gibi> and I'm following Dan's series starting at https://review.opendev.org/c/openstack/nova/+/954990/4
16:18:16 <gibi> I still have the intention to go back to making our unit tests run with threading
16:18:19 <gibi> that is it
16:18:31 <gibi> #topic Open discussion
16:18:34 <gibi> (sp-bmilanov) Bug #2092391: duplication instances when nova compute service restart: https://bugs.launchpad.net/nova/+bug/2092391
16:18:46 <sp-bmilanov> hi :)
16:19:07 <gibi> I guess this is a review request for https://review.opendev.org/c/openstack/nova/+/938223
16:19:10 <gibi> am I correct?
16:19:16 <sp-bmilanov> not exactly
16:19:21 <gibi> ohh
16:19:25 <gibi> then tell us :)
16:20:00 <sp-bmilanov> I wonder if it would be better to bring this up when more core people are around but still -- we hit this bug recently and it was not due to a graceful Nova agent shutdown
16:20:24 <gibi> what was the trigger?
16:20:55 <sp-bmilanov> the tldr; is that during a migration, if a nova-agent crashes at the correct moment, it is possible to have the same VM running on the source and destination hypervisor
16:21:22 <dansmith> I think it has already been noted on that bug that nova-compute doesn't really have any graceful shutdown support, and what the bug describes during a live migration is pretty much expected at the moment
16:22:14 <gibi> even with graceful shutdown a crash would not be handled
16:22:29 <sp-bmilanov> dansmith: right, I read Sean's comment as "it is not supported to ask nova-compute to shutdown during live migration"
16:22:31 <dansmith> gibi: I think the problem is likely that on restart we re-activate the instance
16:22:36 <gibi> so I guess we need a solution where a compute starting up can fix the situation
16:22:46 <gibi> dansmith: yeah
16:22:54 <sp-bmilanov> yes, as gibi said, it's about when it crashes
16:23:08 <dansmith> and the review mentioned above would only be the non-crash situation
16:23:24 <gibi> yepp
16:23:45 <gibi> I feel that nova-compute during startup can be smarter about this to remove the VM duplication
16:24:03 <sp-bmilanov> the VM was seen in an error state after the nova-compute got back up because of a mismatch in what libvirt was reporting and the contents of the nova DB
16:25:51 <sp-bmilanov> a teammate suggested it would be better to have this as a separate error state which has more obstacles to get around until you are able to start the VM again
16:26:43 <sp-bmilanov> else nova-compute recreates the libvirt domain on VM start on the source hypervisor
16:26:53 <gibi> whichever compute puts the VM to error could be smarter and try to abort the live migration I guess
16:26:55 <dansmith> any solution for this is going to be something we need to document as a spec I think, because there are a lot of factors in play.. it's hard to know what to do when a live migration fails and in the past, we've basically said "lean on the operator to clean it up"
16:27:14 <gibi> dansmith: makes sense
16:27:18 <gibi> it is complicated
16:27:46 <dansmith> preventing nova from re-starting on startup if it's not sure is good, but barriers to prevent the user from doing something bad are part of the complexity
16:28:21 <gibi> I agree
16:28:45 <gibi> Also having a spec would force us to load context around this codepath (I don't have it loaded)
16:29:25 <gibi> sp-bmilanov: could you draft a spec even if it is just the problem statement with more details about what exactly is happening and why
16:29:46 <dansmith> +1
16:29:55 <gibi> I think that would help us brainstorm a list of potential solutions
16:30:11 <sp-bmilanov> sure can
16:30:19 <gibi> cool. thanks.
16:31:06 <sp-bmilanov> thanks gibi dansmith!
16:31:08 <gibi> Is there anything else to discuss?
16:32:32 <gibi> then thanks for joining today. Next week we will have Uggla back.
16:32:35 <gibi> #endmeeting