16:00:47 <gibi> #startmeeting nova
16:00:47 <opendevmeet> Meeting started Tue Jul 22 16:00:47 2025 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:47 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:47 <opendevmeet> The meeting name has been set to 'nova'
16:00:54 <fwiesel> o/
16:01:08 <sp-bmilanov> o/
16:01:31 <dansmith> o/
16:01:33 <gibi> Uggla is on a well-deserved PTO and his stand-in cannot be here today. So I agreed to run something like a nova meeting but I'm not prepared.
16:02:13 <gibi> let's wait a bit to see if other cores join but so far we are low on quorum
16:02:19 <elodilles> o/
16:02:42 <gmaan> o/
16:03:09 <gibi> (and I have a dinner invitation that helps me prioritize)
16:03:35 <gibi> let's get rolling
16:03:36 <gibi> #topic Bugs (stuck/critical)
16:03:50 <gibi> any fresh critical bug we need to look at?
16:04:28 <gibi> the one on the agenda https://bugs.launchpad.net/nova/+bug/2116852 is not critical any more as we disabled the single tempest test that caused the blockade
16:05:05 <gibi> I filed the upstream bug to Ceph https://tracker.ceph.com/issues/72203 they are silent so far. I have a way to ping them downstream which I will use next week if there is no reaction on the upstream tracker
16:05:18 <gibi> any other critical-adjacent bug?
16:05:45 <gibi> #topic Gate status
16:05:51 <gibi> any issues with our gate?
16:06:01 <gibi> I'm not tracking anything major on my side at least
16:06:45 <gibi> #topic tempest-with-latest-microversion job status
16:06:58 <gibi> it is red
16:07:01 <gibi> :)
16:07:14 <gibi> - Failed: 27
16:07:28 <gibi> I have no other info. Anybody wants to comment?
16:07:32 <gmaan> yeah, my last fix is still not merged but that makes 6 more tests green and 21 still failing
16:07:42 <gmaan> I did not get a chance to continue this one.
16:07:48 <gibi> gmaan: cool. Thanks
16:08:01 <gibi> #topic Release Planning
16:08:05 <gibi> #link https://releases.openstack.org/flamingo/schedule.html
16:08:17 <gibi> anybody has any comment here?
16:08:54 <gibi> #topic Review priorities
16:08:58 <gibi> #link https://etherpad.opendev.org/p/nova-2025.2-status
16:09:05 <gibi> any comments?
16:09:44 <gibi> #topic OpenAPI
16:09:47 <gibi> #link https://review.opendev.org/q/topic:%22openapi%22+(project:openstack/nova+OR+project:openstack/placement)+-status:merged+-status:abandoned
16:10:00 <gibi> 16 open patches, mostly stuck on gate or waiting for rebase
16:10:24 <gibi> any comments?
16:11:06 <gibi> #topic Stable Branches
16:11:17 <gibi> elodilles: give us what you have!
16:11:23 <elodilles> ACK :)
16:11:36 <elodilles> actually the state is pretty much the same as last week
16:11:46 <elodilles> stable branches seem healthy
16:11:59 <elodilles> and stable releases are waiting for release liaisons
16:12:05 <elodilles> gibi: back to you
16:12:51 <gibi> who are our liaisons to ping?
16:13:00 * gibi feels bad not knowing
16:13:14 <elodilles> Uggla, Amit and Sylvain
16:13:55 <gibi> bauzas: ^^ auniyal ^^
16:14:02 <gibi> please look at the stable release requests
16:14:17 <gibi> #topic vmwareapi 3rd-party CI efforts Highlights
16:14:24 <gibi> fwiesel: any news?
16:14:26 <fwiesel> Hi, nothing from my side.
16:14:35 <gibi> fwiesel: OK cool
16:14:39 <gibi> #topic Gibi's news about eventlet removal.
16:14:44 <gibi> hey that's me :)
16:15:17 <gibi> we are slowly landing patches from the scheduler series
16:15:45 <gibi> I got a nice set of reviews from bauzas on the doc patch. I have to go back there and touch up the doc
16:16:50 <gibi> the logic of sambork's patch https://review.opendev.org/c/openstack/nova/+/949754 looks good to me but I found some extra cleanup pieces and a few test issues
16:17:35 <gibi> and I'm following Dan's series starting https://review.opendev.org/c/openstack/nova/+/954990/4
16:18:16 <gibi> I still have the intention to go back to making our unit tests run with threading
16:18:19 <gibi> that is it
16:18:31 <gibi> #topic Open discussion
16:18:34 <gibi> (sp-bmilanov) Bug #2092391: duplication instances when nova compute service restart: https://bugs.launchpad.net/nova/+bug/2092391
16:18:46 <sp-bmilanov> hi :)
16:19:07 <gibi> I guess this is a review request for https://review.opendev.org/c/openstack/nova/+/938223
16:19:10 <gibi> am I correct?
16:19:16 <sp-bmilanov> not exactly
16:19:21 <gibi> ohh
16:19:25 <gibi> then tell us :)
16:20:00 <sp-bmilanov> I wonder if it would be better to bring this up when more core people are around but still -- we hit this bug recently and it was not due to a graceful Nova agent shutdown
16:20:24 <gibi> what was the trigger?
16:20:55 <sp-bmilanov> the tl;dr is that during a migration, if a nova-agent crashes at the correct moment, it is possible to have the same VM running on the source and destination hypervisor
16:21:22 <dansmith> I think it has already been noted on that bug that nova-compute doesn't really have any graceful shutdown support, and what the bug describes during a live migration is pretty much expected at the moment
16:22:14 <gibi> even with graceful shutdown a crash would not be handled
16:22:29 <sp-bmilanov> dansmith: right, I read Sean's comment as "it is not supported to ask nova-compute to shut down during live migration"
16:22:31 <dansmith> gibi: I think the problem is likely on restart we re-activate the instance
16:22:36 <gibi> so I guess we need a solution where a compute starting up can fix the situation
16:22:46 <gibi> dansmith: yeah
16:22:54 <sp-bmilanov> yes, as gibi said, it's about when it crashes
16:23:08 <dansmith> and the review mentioned above would only be the non-crash situation
16:23:24 <gibi> yepp
16:23:45 <gibi> I feel that nova-compute during startup can be smarter about this to remove the VM duplication
16:24:03 <sp-bmilanov> the VM was seen in an error state after the nova-compute got back up because of a mismatch in what libvirt was reporting and the contents of the nova DB
16:25:51 <sp-bmilanov> a teammate suggested it would be better to have this as a separate error state which has more obstacles to get around until you are able to start the VM again
16:26:43 <sp-bmilanov> else nova-compute recreates the libvirt domain on VM start on the source hypervisor
16:26:53 <gibi> whichever compute puts the VM to error could be smarter and try to abort the live migration I guess
16:26:55 <dansmith> any solution for this is going to be something we need to document as a spec I think, because there are a lot of factors in play.. it's hard to know what to do when a live migration fails and in the past, we've basically said "lean on the operator to clean it up"
16:27:14 <gibi> dansmith: makes sense
16:27:18 <gibi> it is complicated
16:27:46 <dansmith> preventing nova from re-starting on startup if it's not sure is good, but barriers to prevent the user from doing something bad are part of the complexity
16:28:21 <gibi> I agree
16:28:45 <gibi> Also having a spec would force us to load context around this codepath (I don't have it loaded)
16:29:25 <gibi> sp-bmilanov: could you draft a spec, even if it is just the problem statement with more details about what exactly is happening and why?
16:29:46 <dansmith> +1
16:29:55 <gibi> I think that would help us brainstorm a list of potential solutions
16:30:11 <sp-bmilanov> sure can
16:30:19 <gibi> cool. thanks.
16:31:06 <sp-bmilanov> thanks gibi dansmith!
16:31:08 <gibi> Is there anything else to discuss?
16:32:32 <gibi> then thanks for joining today. Next week we will have Uggla back.
16:32:35 <gibi> #endmeeting