Thursday, 2025-12-04

gmaansean-k-mooney: gibi updated the graceful shurdown spec, please check -  https://review.opendev.org/c/openstack/nova-specs/+/96929900:10
opendevreviewmelanie witt proposed openstack/nova master: WIP Make QEMU_IMG_LIMITS process limits configurable  https://review.opendev.org/c/openstack/nova/+/96953800:34
sean-k-mooneygmaan: ack ill take a look in my morning00:44
opendevreviewsean mooney proposed openstack/nova-specs master: add spec for resource tracker notifications  https://review.opendev.org/c/openstack/nova-specs/+/96771201:04
opendevreviewMerged openstack/nova stable/2025.2: [ironic] Ensure unprovision happens for new states  https://review.opendev.org/c/openstack/nova/+/96820802:17
*** mhen_ is now known as mhen02:17
opendevreviewJay Faulkner proposed openstack/nova stable/2025.1: [ironic] Ensure unprovision happens for new states  https://review.opendev.org/c/openstack/nova/+/96954702:36
opendevreviewzhou zhong proposed openstack/nova master: Create ephemeral and private secret when attaching encryptor  https://review.opendev.org/c/openstack/nova/+/96901902:50
opendevreviewMerged openstack/nova master: ensure correct cleanup of multi-attach volumes  https://review.opendev.org/c/openstack/nova/+/91632207:00
ratailorgibi, sean-k-mooney bauzas could you please check this schema validation failure https://e88fada250d77d396ef5-a605dbf95134d478b50bec2dfa092555.ssl.cf1.rackcdn.com/openstack/146bb31174804bc5bebdba2b537c97db/testr_results.html related to https://review.opendev.org/q/topic:%22bp/show-instance-action-finish-time%22 07:59
* bauzas can look briefly08:00
bauzasbut I want to give a swing on the left specs I didn't read yet08:00
ratailorIn my local devstack env, I could get the instance action finish_time for create and attach_volume actions here https://paste.openstack.org/show/bWzmPfwfOZ9FQpfCpVPm/08:00
ratailorbauzas, ack.08:01
ratailorgmaan, ^^08:01
bauzasratailor: I'm not really a tempest expert about schema validations08:06
bauzasI saw the patch that got -1 review.opendev.org/c/openstack/tempest/+/953331 but honestly I don't know why08:06
ratailorbauzas, ack. I think stephenfin can help here.  ^^08:07
bauzasI see you're depending on the nova patches08:07
bauzasso everything looks correct to me at a first glance08:07
ratailorbauzas, could you please review nova-spec and implementation and suggest if anything is missing.08:08
bauzasas I said, I'll do another round of open specs today08:08
gibiDetails: HTTP response body is invalid (Additional properties are not allowed ('finish_time' was unexpected)08:09
gibiso I assume your nova patch added a new field to the response08:09
gibithat was not there before08:09
ratailorgibi, yes08:09
gibiand the tempest schema wasn't updated to accept that field in the new microvresion08:09
ratailorgibi, this is what I have done to update tempest schema08:21
ratailorhttps://review.opendev.org/c/openstack/tempest/+/953331/7/tempest/lib/api_schema/response/compute/v2_101/servers.py08:21
jkulikratailor: looks like you're overwriting your changes to show_instance_action in line 19 again in line 4009:00
ratailorjkulik, nice catch, thanks!09:13
zigoHi there!09:36
zigoLooks like I have 2 unit tests failing in Nova's autopkgtest because our ppc64el servers are too fast, no ?09:36
zigohttps://ci.debian.net/packages/n/nova/testing/ppc64el/66782181/09:36
zigogibi: ^ Your thoughts?09:39
zigoShould we add a sleep() in these unit tests?09:40
opendevreviewThomas Goirand proposed openstack/nova master: Fix race condition in DB updates  https://review.opendev.org/c/openstack/nova/+/96974510:57
opendevreviewDominik proposed openstack/nova master: Regression test for Placement allocations remaining during failed schedule (bug #2132020)  https://review.opendev.org/c/openstack/nova/+/96925111:12
opendevreviewDominik proposed openstack/nova master: Regression test for Placement allocations remaining during failed schedule  https://review.opendev.org/c/openstack/nova/+/96925111:19
sean-k-mooneyzigo: we try not ot add sleeps in gerneal so ideally no11:39
sean-k-mooneyzigo: we normally contol time in these test11:40
sean-k-mooneyso it looks like the test has a race condtion. if we add a sleep it woudl be somethign liks sleep(0.001)11:41
sean-k-mooneyzigo: i think gibi fixed a few cases likt this before when they started workign on the eventlet removal11:42
stephenfinratailor: bauzas: you need to define a new schema for tempest before the nova patch can merge. gmaan is a better person again to ask https://github.com/openstack/tempest/tree/8e2e854051b247ce914982ab0c3c0ed5e2f01b09/tempest/lib/api_schema/response/compute12:02
stephenfinhopefully the duplication can be removed once the final patches merge against nova (hint hint)12:02
stephenfin*final OpenAPI patches12:03
opendevreviewThomas Goirand proposed openstack/nova master: Fix race condition in DB updates  https://review.opendev.org/c/openstack/nova/+/96974512:06
ratailorstephenfin, ack.12:12
zigosean-k-mooney: You're wrong, IMO.12:19
zigosean-k-mooney: The issue is getting 2 timestamps in less than 1 second.12:19
zigoThen they are equal, and later on, the test checks if they are NOT equal.12:19
sean-k-mooneyzigo: what im saying is we normlaly try to decouple the unit test entirly form teh infra12:23
sean-k-mooneyfor example by mockign time.now() so that each call woudl increment by 1 second or some other way12:23
sean-k-mooneyzigo: for functional test we might uses sleeps as we try to avoid mockign if we cna btu for unit test we usually mock otu the tiem functions12:24
opendevreviewMerged openstack/nova stable/2025.1: [ironic] Ensure unprovision happens for new states  https://review.opendev.org/c/openstack/nova/+/96954712:37
zigosean-k-mooney: Oh ok, I get it now. Then I can just do that (ie: do timestamp2 = timestamp+1)12:40
zigoThough it may be harder than just mocking the timestamp function, no? I don't know how the updated_at is generated.12:44
sean-k-mooneywe can likely mock time.now() and that shoudl be enough i think but i woudl have to look at the specific tests12:51
gibizigo: sean-k-mooney: we should have a fixture for the time.now() already in our tests maybe it is not used in those specific tests12:55
gibibut overall I agree to make the test independent of the value of time.now12:55
zigoI'd say: please, feel free to takeover my patch and make it better.12:56
zigoI don't have enough available time to do so.12:56
gibiI don't have time for that at the moment too. sorry12:59
sean-k-mooneygibi: we do have a fixture for that yes but i didnt check if it was used either14:14
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/test.py#L82814:14
sean-k-mooneythat works a littel defently then i descrbed making all calsl return a fixed time14:15
sean-k-mooneyin this case the test logic is either incorect or we need one to monotmicly march forward to fit this test14:16
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/tests/functional/test_nova_manage.py#L2172-L217514:16
opendevreviewDominik proposed openstack/nova master: Regression test for Placement allocations remaining during failed schedule  https://review.opendev.org/c/openstack/nova/+/96925114:24
opendevreviewDominik proposed openstack/nova master: Remove Placement allocations in the broken build cleanup  https://review.opendev.org/c/openstack/nova/+/96844614:39
opendevreviewMerged openstack/nova-specs master: Propose Spec 1 of Graceful shutodwn for 2026.1  https://review.opendev.org/c/openstack/nova-specs/+/96929914:45
gibidansmith: sorry no progress yet with the utils.spawn issue. I was distracted with other things15:08
dansmithgibi: no worries, I found it while doing other things, so once I found the issue I went back to those things15:08
gibidansmith: but now I found the change I had a hunch about https://review.opendev.org/c/openstack/nova/+/953121/2/nova/tests/fixtures/nova.py#b1248 so before this change the utils.spawn in untit test executed the passed in function and *raised back* any exception. But that is semantically wrong as only utils.spawn(f).wait() should raise it. So I changed the unit test fixture semantics to not raise from 15:18
gibi__init__15:18
gibiso that is way in the past the test got the exception without the code under test ever waiting for the result of the spawned thread15:19
gibia/way/why/15:19
gibis/way/why/15:19
dansmithgibi: okay but we don't wait() on the result of the spawn there, that's the point right?15:20
gibiright. the live migration code never wait for that spawned thread to finish hence it cannot get the exception from it15:21
dansmithwe should have wait()ed on the result of the greenthread before, so the exception showed up at that point, so it was probably not semantically right before, but we still need a change 15:21
sean-k-mooneydansmith: gibi thanks for the review. ill remove the storage stuff and try and adress the other comments15:21
gibidansmith: I don't know yet if the livemigration code needs to wait or not. In the past it did not waited. 15:22
dansmithgibi: but if it doesn't we would never notice if we failed to call guest.migrate() or raised any exception within15:23
dansmiththis example was a bad mock, but it stands for any failure within the thread we spin up to do the actual migration15:25
gibiyeah I got you. The errors in the thread are either handled within that thread or lost and the migration_monitor just detects the failed migration independetly15:27
dansmithwill it? it will eventually time out or something, but I'm not sure that's really the same15:27
dansmithbut yeah, either way.. I think we need to collect the result regardless15:27
dansmithand certainly by not doing it here, any unit testing we do will always pass no matter what happens inside that "thread"15:28
gibiyeah I'm not against collecting the result15:28
gibiI will do some trials locally what happens in devstack if I break the call to guest.migrate() 15:28
dansmithokay15:29
gibido I understand correctly that such result collection is anyhow after the monitor already finished the monitoring so the migration either happened or failed and we just want to get better logs by knowing any exceptions from the thread?15:31
dansmithsean-k-mooney: quick before you revise, I think storage_fsid is fine.. we could even make that support(able) on other things by saying "put a .fsid file with a uuid in the root and we'll report it".. it's the local|shared I think is a lie :)15:32
dansmithgibi: I'll have to go back and look and the monitor behavior if nothing ever starts15:32
gibiOK. No need to jump back yet. I will verify that locally. I just wanted to see that my general understading is not totally wrong15:33
dansmithkinda looks like it will wait forever if DOMAIN_JOB_NONE right?15:33
opendevreviewMerged openstack/nova master: pre-commit: Bump versions  https://review.opendev.org/c/openstack/nova/+/96608915:33
gibiyeah there is a loop15:33
dansmithwe can probe the future in that loop if we pass it right? and stop waiting if the thread has ended for any reason?15:34
gibifuture.done() is non blocking so yes we can probe15:35
dansmithright15:36
dansmithif future.done(): future.result()15:36
dansmith^ will raise where we're already try..excepting for migration failure15:36
dansmithif done inside the loop15:36
gibiyeah I think so15:38
sean-k-mooneydansmith: ok so if i removed storage_type and kept stroage_fsid you woudl be fine with that15:39
sean-k-mooneyand sure we culd make that declarbale in some way vai a file or config option where we cant dicusver it easily.15:39
dansmithsean-k-mooney: yes, I called out storage_type specifically because we can't know.. we _can_ know storage_fsid15:40
sean-k-mooneyack15:40
dansmithman, I do not get it..15:41
sean-k-mooneyoh you mean each compute cant know that its shared even if its nfs15:41
sean-k-mooneyoh ya your right15:41
opendevreviewTakashi Kajinami proposed openstack/nova master: libvirt: Use firmware auto-selection by libvirt  https://review.opendev.org/c/openstack/nova/+/96913215:41
dansmithcorrect15:41
dansmithsometimes I load the spec and gerrit shows me tons of unresolved comments and other times, not.15:41
sean-k-mooneyya...15:41
sean-k-mooneyi noticed that too for the last few days15:41
sean-k-mooneyi dont know if its a recent regression or browswer caching15:42
gibidansmith: heh we have a convoluted way already in place to detect early end of the thread. The code adds a callback to the future to execute when it finishes. That callback sets finish_event (which is an Event) that is passed into the monitoring loop and checked there https://github.com/openstack/nova/blob/5d3d0c870a1c37dcd6ce9cd0d542282c7c7994f3/nova/virt/libvirt/driver.py#L1140515:45
gibiso the monitoring loop detects the early end of the thread via that event15:46
dansmithgibi: that detects when the future ends yes, but doesn't care why15:46
gibi(still it does not collect the resutl)15:46
dansmithright15:46
dansmithyep15:46
dansmithgibi: fwiw, I wasn't actually seeing that callback be called when I was pdb'ing the unit test15:47
dansmithbut that might be specific to the unit test environment or something15:47
gibithe unit test fixture calles the callback right when it is registered :/15:47
dansmithah15:47
gibithat fixture is suboptimal to serve the unit test needs15:48
gibiOK. I'm convinced enough now that collecting the result after the monitor make sense15:48
dansmithdo we actually need the convoluted done event? is that maybe to give the same behavior in both models or something, instead of just checking future.done?15:49
gibias far as I think we don't need the convoluted way, we can just use future.done instead within the monitor15:50
dansmithokay that might be better anyway then15:51
gibiusing the future directly  removes the callback which is a nice symplification15:51
dansmithyup15:51
gibiI have context now should I just pick you patch up and update it? or you would like to do it?15:51
dansmithyou're welcome to, but if you need to do other stuff I can do it also15:53
gibiI take it. I need to write some code to keep my sanity :)15:53
dansmithyeah, I've been just reviewing for weeks now.. sanity gone :(15:54
gmaandansmith: gibi: bauzas: I did one ammended in graceful shutdown backlog spec as spec3, please check. it is for backlog only and to keep it out of scope from the scope of this cycle spec https://review.opendev.org/c/openstack/nova-specs/+/96954316:21
gibigmaan: make sense to me thanks.16:22
opendevreviewsean mooney proposed openstack/nova-specs master: add spec for resource tracker notifications  https://review.opendev.org/c/openstack/nova-specs/+/96771216:30
opendevreviewGhanshyam proposed openstack/nova-specs master: Follow up fixes for graceful shutdown spec  https://review.opendev.org/c/openstack/nova-specs/+/96981316:48
gmaangibi: bauzas ^^ quick follow up for fixing some comments16:50
gmaanand thanks dansmith sean-k-mooney gibi bauzas for review and help on the shaping graceful shutdown specs. 16:51
gmaanratailor: most probably it is missing schema but let me review todaty16:52
opendevreviewMerged openstack/nova-specs master: Follow up fixes for graceful shutdown spec  https://review.opendev.org/c/openstack/nova-specs/+/96981317:07
opendevreviewsean mooney proposed openstack/nova-specs master: Repropose and update cyborg vGPU (mdev) support  https://review.opendev.org/c/openstack/nova-specs/+/96751517:18
sean-k-mooneybauzas: gibi i adresss bogdans feedback and the other open comments if you have time to review that too.17:22
opendevreviewBalazs Gibizer proposed openstack/nova master: Collect result of _live_migration_operation  https://review.opendev.org/c/openstack/nova/+/96950117:51
opendevreviewsean mooney proposed openstack/nova-specs master: Repropose and update cyborg vGPU (mdev) support  https://review.opendev.org/c/openstack/nova-specs/+/96751518:08
opendevreviewJay Faulkner proposed openstack/nova stable/2024.2: [ironic] Ensure unprovision happens for new states  https://review.opendev.org/c/openstack/nova/+/96983318:17
JayFThe next step backport for the ironic bugfix, for your consideration ^^^^ :) thank you!18:17
melwittdansmith: EZ io=native config patch that you have expressed some interest before 😇 https://review.opendev.org/c/openstack/nova/+/96484819:06
dansmithoh yeah I literally have it open already19:12
dansmithI shall hit for sure eventually19:12
sean-k-mooneythe only concern i have with that19:13
melwitt:)19:13
sean-k-mooneyis live migration btween host with diffent values19:13
sean-k-mooneyi assume we need to make sure we do not change the mode during a live mifation19:13
sean-k-mooney*migration19:13
sean-k-mooneyits ok if it change on hard reboot or cold migraion19:14
sean-k-mooneymelwitt: did you try that19:15
sean-k-mooneywe coudl also tweak one of the zuul jobs to enable this19:15
melwittno, I did not19:16
sean-k-mooneyfor example the normal nova-live-migration job which uses iscsi19:16
melwittyeah that sounds like it would be easy to try, just configure the subnode different and run the job?19:17
sean-k-mooneyyep19:18
sean-k-mooneywe have live_migration_back_and_forth enabled in that job19:18
melwittin general I wasn't thinking config like this would be different than two computes having various config not matching but maybe none of the others affect xml19:18
sean-k-mooneyso if you enable it on one or both host it will give us interesting data19:19
melwittok, I'll try it19:19
sean-k-mooneywell libvirt might not care im currnetly tying to see if this is confire a backend option or not19:20
sean-k-mooneysome thing it will happly allow change in a live migrtion and others it wont an i have never played with seting io="threads" vs io="native" ectra difently on a migration19:22
sean-k-mooneymelwitt: related question do we just let libvit choose for local disks or is that also hardcoded to native today19:23
melwittsean-k-mooney: yes local disks are left to default. this io=native hard-coded thing is only for iSCSI, NFS, and FC Cinder volumes19:24
sean-k-mooneyack19:25
melwittit comes from this spec from mitaka https://specs.openstack.org/openstack/nova-specs/specs/mitaka/implemented/libvirt-aio-mode.html19:25
sean-k-mooneyya i was asking ai about libvirt io modes and that actullythe firt link it found19:27
sean-k-mooneyso it seam like we did adopt the best practices that were recommend at the tiem and optied into native but we pessimsed some backend so the change makes sens19:27
melwittoh heh19:28
sean-k-mooneynumber 2 was https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-blockio-io_mode which is a littel more useful but im actully surpised this is not coverd in the libvirt domain xml docs19:29
sean-k-mooneythey are useable pretty good at docuemtning the difent values19:29
melwittyeah I guess it only says "The optional io attribute controls specific policies on I/O; qemu guests support "threads" and "native" Since 0.8.8, io_uring Since 6.3.0 (QEMU 5.0)." in https://libvirt.org/formatdomain.html#hard-drives-floppy-disks-cdroms19:31
sean-k-mooneyah yep just roudn it deep in the driver section19:32
sean-k-mooneyunfortuetly you cant deep link that far into the doc19:33
sean-k-mooneyio_uring support could be very interesting in teh future19:33
melwittyeah I wish their doc pages had more fine grain links19:34
sean-k-mooneyhttps://kb.blockbridge.com/technote/proxmox-aio-vs-iouring/ is interesting but i have not had time to read i tyet19:39
sean-k-mooneyfinding 1 "IOThreads significantly improve performance for most workloads" well i guess its a good thing we are finally going to turn them on19:39
melwitthaha yeah. I saw similar things when i first started reading up on io=native vs io=threads19:42
opendevreviewmelanie witt proposed openstack/nova master: DNM test io=threads => io=native live migration  https://review.opendev.org/c/openstack/nova/+/96984419:42
gmaanratailor if you see it when online. I commented in tempest change. maybe you can run the same tests which is executing same action on isntance in your devstack end and see why finsih_time is None - https://review.opendev.org/c/openstack/tempest/+/95333119:47
opendevreviewmelanie witt proposed openstack/nova master: DNM test io=threads => io=native live migration  https://review.opendev.org/c/openstack/nova/+/96984419:47
gmaanthere should be difference between API, action you call via tempest test or osc but somewhere it is missing the recording of finish time. 19:48
sean-k-mooneygmaan: os that comment thread on the swap voluem one only show up on v2 for me19:52
sean-k-mooneyi didnt see that in the latest revsion at all when i went to revie it19:52
gmaansean-k-mooney: yeah, there are multiple way to do lile task state as melwitt suggested early or checking migration status or based on request (your latest comment). but we should not make RPC call whenever we have GET volume attachment call19:53
sean-k-mooneyya i agree19:54
gmaanand that is something we need to agree as design in spec and should not delay to implementation.  19:54
sean-k-mooneyi just didnt see that rad on the spec yestready because the commens are not propagating the way i expect19:54
melwittgmaan: yeah, thanks for commenting on that spec. it seemed crazy to me to do the RPC call about swap progress every time when 99% of the time no swap would be in progress19:54
gmaantrue19:55
gmaansean-k-mooney: comment propagation in gerrit is really weird. i hate that many times :) 19:56
gmaanlast spec I am going through today is rajesh finish_time  one, and previous comments are all lost there either due to Gerrit or they are marked resolved before agreement was reached. that is adding more time to review specs19:58
sean-k-mooneyya i normlly try not to mark thing resovled unlless i made the change that was requested20:02
sean-k-mooneyeven then i generaly prefer for non trivial thing if the orginal review acks it intesad of me20:02
gmaanyeah20:04
opendevreviewmelanie witt proposed openstack/nova master: DNM test io=threads => io=native live migration  https://review.opendev.org/c/openstack/nova/+/96984421:26
opendevreviewmelanie witt proposed openstack/nova master: DNM test io=threads => io=native live migration  https://review.opendev.org/c/openstack/nova/+/96984421:27
opendevreviewmelanie witt proposed openstack/nova master: DNM test io=threads => io=native live migration  https://review.opendev.org/c/openstack/nova/+/96984421:31
opendevreviewmelanie witt proposed openstack/nova master: Make QEMU_IMG_LIMITS process limits configurable  https://review.opendev.org/c/openstack/nova/+/96953822:10
melwittsean-k-mooney: fyi the DNM patch test run finished -- nova-live-migration passes if [libvirt]use_default_aio_mode_for_volumes = true on the subnode only https://review.opendev.org/c/openstack/nova/+/96984422:30
melwittlooks like live_migrate_back_and_forth was not enabled though22:33
opendevreviewmelanie witt proposed openstack/nova master: DNM test io=threads => io=native live migration  https://review.opendev.org/c/openstack/nova/+/96984422:35
melwittrunning again with live_migrate_back_and_forth = true ^22:35
opendevreviewmelanie witt proposed openstack/nova master: Reproducer for bug 2016173  https://review.opendev.org/c/openstack/nova/+/94622222:37
opendevreviewmelanie witt proposed openstack/nova master: Call volume detach rollback API if detach fails  https://review.opendev.org/c/openstack/nova/+/88039922:37
opendevreviewmelanie witt proposed openstack/nova master: DNM test io=threads => io=native live migration  https://review.opendev.org/c/openstack/nova/+/96984423:31

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!