16:02:54 <Uggla> #startmeeting nova
16:02:54 <opendevmeet> Meeting started Tue Jul 29 16:02:54 2025 UTC and is due to finish in 60 minutes. The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:54 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:54 <opendevmeet> The meeting name has been set to 'nova'
16:03:05 <elodilles> o/
16:03:06 <Uggla> Hello everyone
16:03:16 <gmaan> o/
16:03:30 <sp-bmilanov> hi!
16:03:43 <opendevreview> Stephen Finucane proposed openstack/nova master: api: Add response body schemas for simple tenant usage APIs https://review.opendev.org/c/openstack/nova/+/956096
16:04:05 <Uggla> Let's start smoothly so people can join
16:04:28 <Uggla> #topic Bugs (stuck/critical)
16:04:39 <Uggla> #info No critical bugs
16:05:06 <Uggla> #topic Gate status
16:05:14 <Uggla> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:05:21 <Uggla> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:05:30 <Uggla> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:05:38 <Uggla> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:05:47 <Uggla> #info Please try to provide a meaningful comment when you recheck
16:06:14 <Uggla> #info (gibi): The ceph regression is now tracked in https://tracker.ceph.com/issues/72203 and I pinged two of the maintainers to let them know we need help.
16:06:20 * bauzas waves late, was in a meeting
16:06:24 <Uggla> #info (gibi): https://review.opendev.org/c/openstack/grenade/+/955865 is merged. If you still see grenade failures due to timeouts during DB dumping, please let me know in the bug.
16:06:59 <gmaan> ++
16:07:00 <Uggla> gibi, I think ^ is related to the issues we had last week. Any news about them?
16:07:26 <gibi> I added this to the agenda today as FYI :)
16:07:32 <gibi> s/this/these/
16:07:45 <gibi> so no news since two hours ago :)
16:08:29 <Uggla> ok thanks, so this is still in progress.
16:08:43 <gibi> yepp
16:09:36 <Uggla> #topic tempest-with-latest-microversion job status
16:09:43 <Uggla> #link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-with-latest-microversion&skip=0
16:09:56 <gmaan> no update on this. I need to schedule some time for this
16:10:05 <Uggla> no worries, thanks
16:10:23 <Uggla> #topic Release Planning
16:10:31 <Uggla> #link https://releases.openstack.org/flamingo/schedule.html
16:10:40 <Uggla> #info Nova deadlines are set in the above schedule
16:11:06 <Uggla> We are ~1 month before feature freeze.
16:11:28 <Uggla> #topic Review priorities
16:11:37 <Uggla> #link https://etherpad.opendev.org/p/nova-2025.2-status
16:12:03 <Uggla> I'm not sure it is fully up to date, I will have a look tomorrow.
16:12:59 <Uggla> I think the main priorities are eventlet removal and vTPM live migration.
16:13:28 <Uggla> Both features have patches to review, unless I'm wrong.
16:14:10 <Uggla> #topic OpenAPI
16:14:20 <Uggla> #link: https://review.opendev.org/q/topic:%22openapi%22+(project:openstack/nova+OR+project:openstack/placement)+-status:merged+-status:abandoned
16:14:27 <Uggla> #info 13 remaining.
16:15:05 <Uggla> I guess gmaan and sean-k-mooney managed to do some reviews. Thanks
16:16:02 <Uggla> #topic Stable Branches
16:16:11 <Uggla> elodilles, the floor is yours
16:16:17 <elodilles> #info stable branches (stable/2025.1 and stable/2024.*) seem to be in an OK state
16:16:27 <elodilles> release nova stable versions:
16:16:33 <elodilles> 31.0.1 (2025.1 Epoxy) https://review.opendev.org/c/openstack/releases/+/955058
16:16:36 <elodilles> 30.0.2 (2024.2 Dalmatian) https://review.opendev.org/c/openstack/releases/+/947847
16:16:42 <elodilles> 29.2.2 (2024.1 Caracal) https://review.opendev.org/c/openstack/releases/+/954474
16:16:56 <elodilles> still waiting for the release liaison's review o:)
16:17:03 <elodilles> (the latter is preparation for 2024.1 Caracal to move to Unmaintained in October)
16:17:09 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:17:27 <elodilles> that's all from me -> back to you Uggla
16:17:36 <Uggla> elodilles, I will review them tomorrow.
16:17:45 <elodilles> Uggla: thanks in advance o/
16:18:20 <Uggla> #topic vmwareapi 3rd-party CI efforts Highlights
16:18:29 <fwiesel> Hi, no update from my side.
16:18:44 <Uggla> fwiesel, thanks.
16:18:47 <fwiesel> Uggla: back to you
16:18:59 <Uggla> #topic Gibi's news about eventlet removal.
16:19:03 <gibi> o/
16:19:07 <Uggla> #link Blog: https://gibizer.github.io/categories/eventlet/
16:19:13 <Uggla> #link nova-scheduler series is ready for core review, starting at https://review.opendev.org/c/openstack/nova/+/947966
16:19:33 <gibi> Thanks to the reviews we have Kamil's patch approved https://review.opendev.org/c/openstack/nova/+/949754
16:20:07 <gibi> and we are too commits from https://review.opendev.org/c/openstack/nova/+/948450/35 that will enable n-sch in threading mode in nova-next
16:20:13 <gibi> s/too/two/
16:21:11 <gibi> I recently pushed a series of unit test re-enabling changes to the top of the series, mostly Tpool and threading.Lock copying fixes
16:21:52 <gibi> ahh and we have an SSL socket wrapping change as well; that is an interesting one and is still failing in the gate after my initial fix
16:22:01 <gibi> https://review.opendev.org/c/openstack/nova/+/955915
16:22:55 <gibi> anyhow my goal is to land https://review.opendev.org/c/openstack/nova/+/948450 before FF (n-sch), or as a stretch goal https://review.opendev.org/c/openstack/nova/+/951957/17 (n-api)
16:22:59 <gibi> that is it.
16:23:32 <gibi> there is nothing major in between n-sch and n-api so I'm hopeful about the stretch goal
16:24:08 <gibi> back to you Uggla if no questions
16:24:32 <Uggla> That would be great. So next cycle you may focus on compute.
16:24:46 <gibi> vncproxy and compute yepp
16:24:58 <gibi> Kamil has the n-cond on his plate
16:26:02 <Uggla> thanks gibi, moving on
16:26:09 <Uggla> #topic Open discussion
16:26:32 <sean-k-mooney> sorry, was distracted, but here now o/
16:27:10 <Uggla> gibi, I think you want to discuss --> just want to socialize a small independent test speed improvement series: https://review.opendev.org/q/topic:%22unit-test-speedup%22 Easy to review and low-risk patches.
16:27:24 <Uggla> Hi sean-k-mooney
16:27:48 <sean-k-mooney> hum that sounds interesting
16:27:57 <gibi> yeah just a heads up that I run a lot of unit tests these days so I spent an afternoon speeding up some of the slow unit test
16:28:00 <gibi> s
16:28:28 <gibi> mostly changing timeout values or mocking time.sleep in tests
16:28:42 <sean-k-mooney> cool. mainly unit tests, not functional, at this point? will any of it carry over or are they all targeted
16:28:51 <gmaan> nice
16:28:55 <gibi> they are all targeted
16:28:59 <sean-k-mooney> ack
16:29:08 <sean-k-mooney> i'm happy to review those
16:29:19 <gibi> when I start enabling threading in func tests you can expect a similar speed-up series there as well
16:29:32 <gmaan> I will check
16:29:41 <gibi> thanks
16:30:14 <sean-k-mooney> i currently cheat: when i want to run unit tests a lot i move to a system with 48 cores, so it would be nice if adding mocking of time.sleep etc. could give us this in the gate too
16:30:42 <sean-k-mooney> the unit tests i don't find too bad, but the functional tests are slower.
16:31:16 <gmaan> if you run the api sample tests too in functional then yes it is even slower. I usually separate tgem
16:31:19 <sean-k-mooney> gibi: have you added this to the nova status etherpad
16:31:20 <gmaan> them
16:31:33 <gibi> sean-k-mooney: not yet, doing it now...
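gibi's speed-ups mostly change timeout values or mock time.sleep in tests. A minimal sketch of the mocking pattern, assuming a hypothetical polling helper wait_for_ready (not nova code) that sleeps between retries: patching time.sleep with unittest.mock lets the test exercise the retry logic without actually waiting.

```python
import time
import unittest
from unittest import mock


def wait_for_ready(check, retries=5, interval=2.0):
    """Hypothetical helper: poll check() up to `retries` times, sleeping between tries."""
    for attempt in range(retries):
        if check():
            return attempt + 1  # number of attempts it took
        time.sleep(interval)
    raise TimeoutError("resource never became ready")


class WaitForReadyTest(unittest.TestCase):
    # time.sleep is replaced by a Mock for the duration of each test,
    # so the 2-second interval is never actually waited out.
    @mock.patch("time.sleep")
    def test_succeeds_on_third_attempt(self, mock_sleep):
        check = mock.Mock(side_effect=[False, False, True])
        self.assertEqual(3, wait_for_ready(check))
        # Two failed polls -> two (mocked) sleeps of `interval` seconds each.
        self.assertEqual([mock.call(2.0), mock.call(2.0)],
                         mock_sleep.call_args_list)

    @mock.patch("time.sleep")
    def test_times_out(self, mock_sleep):
        check = mock.Mock(return_value=False)
        with self.assertRaises(TimeoutError):
            wait_for_ready(check, retries=3)
        self.assertEqual(3, mock_sleep.call_count)
```

Run it with python -m unittest; with the patch in place both tests finish immediately instead of spending several seconds in real sleeps. Patching "time.sleep" works here because the helper looks up time.sleep at call time; code that does `from time import sleep` would need the patch target changed to that module's own name.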
16:32:01 <sean-k-mooney> cool, i'll add myself to the bottom patch and try to do a pass on all of them this week
16:32:54 <sean-k-mooney> i want to check them out and run them locally but skimming them they all seem to make sense
16:33:07 <gibi> nothing fancy :)
16:33:57 <sean-k-mooney> before we move on, did you have a strategy to find these slow tests
16:34:08 <sean-k-mooney> or were you just looking at the slow test list and going "hum, why is that so slow"
16:34:13 <gibi> tox prints the slow tests
16:34:28 <gibi> I just checked those
16:34:44 <sean-k-mooney> ack, it's been a while since i looked at that list
16:35:04 <sean-k-mooney> i see it but i don't dig into it often. thanks for taking the time
16:35:23 <Uggla> gibi, 'tox prints the slow tests' with a specific command?
16:35:48 <gibi> Uggla: after each successful target run :)
16:36:03 <gmaan> yeah, at the end you can see those
16:36:43 <Uggla> Shame on me, I did not notice that...
16:36:45 <sean-k-mooney> Uggla: https://github.com/openstack/nova/blob/master/tox.ini#L91
16:36:53 <gibi> -> flag = self._event.wait(timeout)
16:37:00 <gibi> ahh wrong buffer
16:37:06 <gibi> https://github.com/openstack/nova/blob/master/tox.ini#L54
16:37:18 <sean-k-mooney> so for context, we use stestr, which uses subunit internally as a db
16:37:19 <gibi> so you can use stestr to print it
16:37:35 <sean-k-mooney> and there are some tools to extract things like the slowest tests and print them
16:37:46 <sean-k-mooney> that's what stestr slowest is doing
16:38:00 <Uggla> good to know thanks
16:38:00 <sean-k-mooney> it's looking at the last test run in the subunit db and listing them
16:38:45 <Uggla> sean-k-mooney, I'm just wondering if the 48-core machine is part of your homelab? Sorry, I'm curious
16:39:17 <sean-k-mooney> yes, it's part of the hardware i'm considering replacing, it's pretty old at this point
16:39:29 * gibi has a 16-core/32-thread desktop CPU
16:39:55 <sp-bmilanov> sean-k-mooney: sounds like a dual-Xeon workstation from the mid-2010s
16:40:13 <sean-k-mooney> yep
16:40:27 <sean-k-mooney> Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
16:41:03 <gmaan> I recently bought two 16-core Dell PowerEdge R630s but need to set those up
16:41:43 <Uggla> gmaan, for homelab too?
16:42:00 <sean-k-mooney> honestly you can get much more power-efficient systems now
16:42:00 <gmaan> yeah, both my old desktops are dead
16:42:25 <sean-k-mooney> but it's my main compute node in my home openstack; i just pop onto the host if i want to run them fast
16:43:20 <sean-k-mooney> our unit tests are very, very parallelisable
16:43:28 <sean-k-mooney> anyway i think we are a little off topic
16:43:35 <Uggla> yepp
16:43:55 <Uggla> any other topics?
16:44:08 <sp-bmilanov> yep
16:44:45 <Uggla> oh sp-bmilanov, please go ahead. And sorry, I might have removed your topic thinking it was covered last time
16:44:54 <sp-bmilanov> Uggla: no probs
16:45:28 <sp-bmilanov> yeah, so from last time, gibi, dansmith, you might remember the nova-compute forced restart bug, I drafted a spec: https://review.opendev.org/c/openstack/nova-specs/+/955812
16:45:53 <sp-bmilanov> does it make sense to discuss it in light of Sean's comments? (thanks sean-k-mooney!)
16:46:29 <sp-bmilanov> as I understand it, it would be best if we add some extra RPC functionality for the source to ask the destination if the instance is running there
16:46:42 <sean-k-mooney> just opening that now
16:46:54 <sean-k-mooney> ah, the restart during live migration issue
16:47:18 <sean-k-mooney> so there are probably a few different ways to address that, but there are trade-offs for all of them
16:47:34 <sp-bmilanov> IMO admin-locking it would be a compromise; what you suggested with the self-heal sounds best
16:48:14 <sean-k-mooney> so i think there are different mitigations for different use cases. where we can shut down gracefully i think we likely should abort the live migration at the libvirt level
16:48:24 <sean-k-mooney> and then clean it up at the nova level on startup
16:48:45 <sean-k-mooney> putting the instance into error and locking it i think is a step we could take to improve the situation
16:49:16 <sean-k-mooney> but the ultimate fix would be to be able to resume the management of the migration and allow it to succeed if in fact it did
16:49:51 <sean-k-mooney> i feel like an incremental approach is what we should try to do but i don't know what others think
16:50:11 <sean-k-mooney> sp-bmilanov: is this something you would have time to work on, or are you raising it as a pain point that you think we should prioritize?
16:50:19 <gibi> last week we discussed that sp-bmilanov is interested in the non-graceful restart case
16:50:33 <gibi> so we need something in the startup code path that prevents the original problem
16:51:36 <sp-bmilanov> gibi: yes, the ungraceful restart case is what we experienced; sean-k-mooney: pain point first, I am not sure I can help implement something that you've triaged as so complex
16:51:37 <gibi> i.e. the duplication of instances reported in https://bugs.launchpad.net/nova/+bug/2092391
16:52:02 * sp-bmilanov should maybe link the bug to the spec
16:52:43 <gibi> I believe we should be able to detect during compute startup that we have a duplication, based on the fact that one of the domains is on a host that is not equal to the instance.host
16:53:12 <gibi> or based on the fact that we have an unfinished migration
16:53:16 <gibi> in the DB
16:53:33 <sp-bmilanov> just an aside, I guess that is not a good approach since you haven't brought it up, but why not try using a flag in the database to check if the migration is at a point where the instance is running at the destination
16:53:39 <sean-k-mooney> so for the duplication i think we will need to do an RPC to the dest compute
16:53:59 <sean-k-mooney> and determine if we should revert or run post-live-migration
16:54:02 <gibi> sp-bmilanov: I hope we have a migration state that signals that already
16:55:50 <opendevreview> Stephen Finucane proposed openstack/nova master: api: Address issues with images APIs https://review.opendev.org/c/openstack/nova/+/956102
16:56:37 <sean-k-mooney> sp-bmilanov: so resuming the guest is not done by nova
16:56:40 <gibi> anyhow I think it makes sense to review a spec and prepare a PTG discussion about it
16:56:43 <sean-k-mooney> it's done by libvirt
16:56:55 <sean-k-mooney> and the source compute node will update the db state if it's still running
16:57:15 <sean-k-mooney> the problem currently is if you restart it, the thread we have monitoring the libvirt job will be killed
16:57:25 <sean-k-mooney> and we won't do that update.
16:57:57 <sean-k-mooney> but on startup we can see if the job is still running and either resume waiting, or make an RPC to see if it succeeded, or check if the vm is running locally
16:58:14 <sean-k-mooney> if it's running locally and there is no migration job active in libvirt, it means the migration failed
16:58:25 <sean-k-mooney> if the vm is not local it means either it crashed or it moved
16:58:38 <sean-k-mooney> so we could be more intelligent on startup
16:58:48 <sean-k-mooney> but someone needs to spend time documenting this behaviour
16:58:50 <sean-k-mooney> and writing it up
16:59:10 <sean-k-mooney> so yes i think a spec/PTG discussion would be good
16:59:11 <gibi> yepp I agree
17:00:25 * sp-bmilanov is still wondering why "make an RPC to see if it succeeded" cannot be checked from the DB, but he needs to make sense of sean-k-mooney's messages offline
17:00:58 <sp-bmilanov> s/checked from the DB/set from the dst and then checked from the src from the DB/
17:01:31 <sean-k-mooney> well
17:01:45 <sean-k-mooney> the compute agent does not have direct db access, for one
17:01:55 <sean-k-mooney> so just checking the db state is already an RPC to the conductor
17:02:06 <sean-k-mooney> but the responsibility for updating the host of the vm is on the source node
17:02:14 <sean-k-mooney> as part of post-live-migrate
17:02:33 <sean-k-mooney> so on startup, if we need to check whether it's on the dest node, the source node needs to call the dest to do that
17:02:52 <sean-k-mooney> i.e. assuming the vm is not still running locally
17:03:39 <sean-k-mooney> we need to know if it crashed or if it's actually active on the other host. and we can do that with only local knowledge or by checking the db.
17:04:50 <Uggla> We are at the top of the hour, so I propose to close the meeting. The ^ discussion can continue right after if needed.
17:05:06 <sp-bmilanov> thanks Uggla
17:05:20 <Uggla> Thanks all.
17:05:22 <Uggla> #endmeeting
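The startup checks sean-k-mooney outlined for a live migration interrupted by a forced nova-compute restart can be written down as a small decision table. This is a hypothetical sketch, not nova code: the enum, the function name and its three inputs (whether a libvirt migration job is still active, whether the domain is running locally, and a callable standing in for an assumed new RPC asking the destination whether the guest runs there) are all illustrative assumptions. It also assumes the detection gibi described (an unfinished migration in the DB, or a domain on a host other than instance.host) has already flagged the instance.

```python
from enum import Enum
from typing import Callable


class StartupAction(Enum):
    # Hypothetical outcomes, mirroring the cases raised in the meeting.
    RESUME_MONITORING = "libvirt job still running: resume waiting on it"
    ROLLBACK = "guest still local, no job: migration failed, clean up"
    POST_LIVE_MIGRATION = "guest runs on dest: migration succeeded"
    SET_ERROR = "guest gone everywhere: put instance in ERROR and lock it"


def decide_startup_action(
    libvirt_job_active: bool,
    running_locally: bool,
    ask_dest_is_running: Callable[[], bool],
) -> StartupAction:
    """Hypothetical reconciliation of an in-progress live migration at startup.

    ask_dest_is_running stands in for an assumed RPC to the destination
    compute; it is only invoked when local knowledge is not enough.
    """
    if libvirt_job_active:
        # libvirt is still migrating; only our monitoring thread was lost.
        return StartupAction.RESUME_MONITORING
    if running_locally:
        # Guest still on the source and no active job: the migration failed.
        return StartupAction.ROLLBACK
    # Guest is neither migrating nor local: it either moved or crashed,
    # and the source cannot tell which without asking the destination.
    if ask_dest_is_running():
        return StartupAction.POST_LIVE_MIGRATION
    return StartupAction.SET_ERROR


print(decide_startup_action(False, False, lambda: True).name)  # POST_LIVE_MIGRATION
```

Passing the destination check as a callable keeps the RPC lazy: it only happens in the one case where, as discussed above, the source compute cannot decide from local state alone.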