Friday, 2025-09-12

*** jonnyb0 is now known as jonnyb01:13
*** mhen_ is now known as mhen01:56
opendevreviewLajos Katona proposed openstack/nova master: Use SDK for Neutron networks  https://review.opendev.org/c/openstack/nova/+/92802206:59
jlejeunesean-k-mooney: hello, yeah I know that catching 'Exception' is not necessarily a good idea, but in that case if I only put the try/except statement around the 'check_can_live_migrate_source' call, it will catch every exception that may happen during that specific rpc call07:05
ratailorelodilles, could you please review when you have time https://review.opendev.org/c/openstack/nova/+/958834 and https://review.opendev.org/q/owner:ratailor@redhat.com+branch:stable/2024.207:09
opendevreviewJulien LE JEUNE proposed openstack/nova master: nova-condutor puts instance in error state  https://review.opendev.org/c/openstack/nova/+/90165508:17
elodillesratailor: ACK, added them to my TODO list08:37
opendevreviewMerged openstack/placement stable/2025.2: Update .gitreview for stable/2025.2  https://review.opendev.org/c/openstack/placement/+/96050708:41
opendevreviewMerged openstack/placement stable/2025.2: Update TOX_CONSTRAINTS_FILE for stable/2025.2  https://review.opendev.org/c/openstack/placement/+/96050808:41
opendevreviewOpenStack Release Bot proposed openstack/nova stable/2025.2: Update .gitreview for stable/2025.2  https://review.opendev.org/c/openstack/nova/+/96074308:54
opendevreviewOpenStack Release Bot proposed openstack/nova stable/2025.2: Update TOX_CONSTRAINTS_FILE for stable/2025.2  https://review.opendev.org/c/openstack/nova/+/96074408:54
opendevreviewOpenStack Release Bot proposed openstack/nova master: Update master for stable/2025.2  https://review.opendev.org/c/openstack/nova/+/96074508:54
gibisean-k-mooney: dansmith: I have the first comparable rally results between eventlet and native threading. See my top level comment in https://review.opendev.org/c/openstack/nova/+/960130/16#message-e6a461d92172aea570e8f18405d47ee00b9c300a It links to both rally results with timing and I manually pulled out the pool usage and the memory usage of the scheduler (I have the intention to script some of it 09:01
gibiin CI)09:01
ratailorelodilles, Thanks!09:49
opendevreviewTaketani Ryo proposed openstack/nova-specs master: Add a spec for 2026.1 for libvirt launching Arm CCA instances  https://review.opendev.org/c/openstack/nova-specs/+/96077709:52
sean-k-mooney gibi  do you happen to know if they ran on the same provider09:54
sean-k-mooneygibi: i.e. are the host vms comparable09:54
gibithe two rally executions happen within the same CI job on the same CI worker. So it is always comparable :)09:58
gibithe base job runs rally then in the post-run I reconfigure to native threading and run rally again09:59
sean-k-mooneyyep just reading the comment on the patch09:59
sean-k-mooneyso that is good because it eliminates the biggest variable in our ci10:00
sean-k-mooneyrun to run hardware deltas10:00
sean-k-mooneylooking at the results this is very promising10:00
sean-k-mooneygibi: the only metric i would have liked to see that i have not seen yet is the total cpu time of the nova process10:01
sean-k-mooneyi.e. is there any delta on cpu load with eventlet vs threading10:01
gibisean-k-mooney: how can we collect that?10:02
sean-k-mooneyin theory there should not be since the eventlet scheduler vs kernel scheduler selecting the next thread should be more or less a wash10:02
sean-k-mooneywe could do it with perf but that might be overkill10:02
gibiI guess somewhere in the /proc it is recorded10:03
gibias top knows it10:03
sean-k-mooneytop would also print it. yes its available in proc, we can turn on atop in ci10:03
sean-k-mooneyit was added not too long ago but i have not seen a job using it10:03
sean-k-mooneyim just not sure if it will show cumulative cpu time 10:04
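A minimal sketch of the "/proc" route mentioned above, assuming a Linux host. Fields 14 (utime) and 15 (stime) of /proc/<pid>/stat hold the cumulative user and system CPU time of a process in clock ticks. The function names here are hypothetical and this is not what the CI job in the patch does.

```python
import os


def cpu_seconds_from_stat(stat_line: str, ticks_per_second: int) -> float:
    """Return cumulative user + system CPU seconds from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # 'fields' now starts at field 3 (state), so utime (field 14) is
    # index 11 and stime (field 15) is index 12.
    utime_ticks = int(fields[11])
    stime_ticks = int(fields[12])
    return (utime_ticks + stime_ticks) / ticks_per_second


def process_cpu_seconds(pid: int) -> float:
    """Read /proc/<pid>/stat (Linux only) and convert clock ticks to seconds."""
    with open(f"/proc/{pid}/stat") as stat_file:
        stat_line = stat_file.read()
    return cpu_seconds_from_stat(stat_line, os.sysconf("SC_CLK_TCK"))
```

Sampling process_cpu_seconds() for the scheduler PID before and after a rally run and subtracting the two values would give the CPU time consumed during that run.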
gibiI will check I see podman login  --username gibi --password10:04
gibiahh not that10:04
gibiI see https://github.com/openstack/devstack/commit/1aac81ee881534276fd7d6540ed356a85d064a1310:04
sean-k-mooneyyep so i was looking at that patch a few times when looking at odd gate failures and debated if it would help but i have not tried it yet10:06
gibiit should be enabled by default but I don't see the logs in the run so I need to dig10:06
sean-k-mooneyi dont think its on by default10:06
sean-k-mooneyhttps://github.com/openstack/devstack/commit/1aac81ee881534276fd7d6540ed356a85d064a13#diff-3fedb5d16b14e2fe731944ed0819eabefe91f42d456377e5ce6f0a1874232269R109710:07
sean-k-mooneyshould guard against it10:07
sean-k-mooneywe start the base devstack jobs with disable all services10:07
sean-k-mooneybut even without that i dont see this patch adding it to the default service list10:07
sean-k-mooneygibi: but it should just require adding `enable_service atop` 10:08
sean-k-mooneyor in ci i guess thats `atop: true`10:08
sean-k-mooneygibi: oh i see you are adding post-run: playbooks/nova-rally-fake-virt-threading/post.yaml and i guess that also runs rally a second time, correct?10:10
gibiyes, it calls rally task restart that triggers rally to re-run the previous task10:11
gibihttps://review.opendev.org/c/openstack/nova/+/960130/16/roles/rerun-rally/tasks/main.yaml#1110:11
gibiit would be nice to re-use the rally roles but they are not stored in the rally-openstack repo in a re-usable way10:12
gibialso they hardcode a bunch of things that do not help reuse, like the report generation not handling multiple rally runs10:12
* gibi needs to go get some foof10:13
gibifood even :)10:13
sean-k-mooney enjoy. while we should do more testing this is all very promising congrats10:14
opendevreviewMerged openstack/nova master: Update master for stable/2025.2  https://review.opendev.org/c/openstack/nova/+/96074510:54
opendevreviewMerged openstack/nova master: reno: Update master for unmaintained/2023.1  https://review.opendev.org/c/openstack/nova/+/93511710:55
opendevreviewMerged openstack/placement master: Update master for stable/2025.2  https://review.opendev.org/c/openstack/placement/+/96050910:55
opendevreviewMerged openstack/nova stable/2025.2: Update .gitreview for stable/2025.2  https://review.opendev.org/c/openstack/nova/+/96074310:55
opendevreviewMerged openstack/nova stable/2025.2: Update TOX_CONSTRAINTS_FILE for stable/2025.2  https://review.opendev.org/c/openstack/nova/+/96074410:55
opendevreviewBalazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal  https://review.opendev.org/c/openstack/nova/+/96013012:22
opendevreviewBalazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal  https://review.opendev.org/c/openstack/nova/+/96013012:40
opendevreviewBalazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal  https://review.opendev.org/c/openstack/nova/+/96013013:30
dansmithgibi: just trying to sanity check some of the stats there..13:55
dansmithit seems like for the majority of the run everything is idle all the time, with the exception of a short thing at the front where we have idle_workers < max_workers, and I wonder if that's just because we're starting up13:56
opendevreviewJulien LE JEUNE proposed openstack/nova master: nova-condutor puts instance in error state  https://review.opendev.org/c/openstack/nova/+/90165513:56
dansmithI went looking for a comparison of how busy the workers were between the two.. like if the threaded mode was able to keep up with 5 workers...but only just, or if they both were taxed, but keeping up13:57
dansmithjust wondering if maybe that stats log line is not giving us accurate results (I assume the wall time is, and that looks good of course)13:57
gibidansmith: I think given the limited number of 5 computes the scheduler can keep up with the requests as scheduling them is fairly easy13:58
gibiI'm trying to crank up both the number of computes and the number of parallel VM boot requests to see if we can make the scheduler more busy13:59
dansmithokay yeah, my point is just that while the numbers seem great, feels like maybe the load is too low to really measure a difference14:03
sean-k-mooneygibi: you could create a custom zuul nodeset in the job with up to 5 nodes14:04
sean-k-mooneythats the node limit in our tenant. i also suspect at least on the subnodes you could push to 10-20 fake agents14:05
sean-k-mooneyso you might be able to simulate something in the 85 compute range14:05
gibidansmith: I agree with your observation and I will try to crank up the numbers14:06
gibisean-k-mooney: yeah I just doubled the compute agents to 1014:06
sean-k-mooneyim just looking at the mem low point to see how close you were and the swap usage14:07
gibisean-k-mooney: also I found that the memory_tracker collects cpu time data as well per process so I can pull RSS and time there14:07
sean-k-mooneyto see if i can find anything to indicate how many more we might be able to run14:07
sean-k-mooneywe can also request vms with 16GB of ram now14:07
sean-k-mooneyvia a custom node set14:08
sean-k-mooneygibi: oh i looked briefly but must have missed that before14:08
gibisoo many options:)14:08
sean-k-mooneyoh is time process time?14:09
gibias far as I see yes14:09
sean-k-mooneyas in wall clock time or the active (user) time of the process14:09
sean-k-mooney memory_tracker low_point: 190096014:09
gibiit is cputime from ps14:09
gibihttps://github.com/openstack/devstack/blob/f6d8dab0e885b8de8c0f44388d538da7d4f9b7ec/tools/memory_tracker.sh#L88C66-L88C7314:10
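A small sketch for turning the ps "cputime" column into seconds so that two samples can be compared. It assumes the procps "[DD-]HH:MM:SS" format documented for that column. The helper name is hypothetical and is not part of memory_tracker.sh.

```python
def cputime_to_seconds(cputime: str) -> int:
    """Convert a ps cputime string in [DD-]HH:MM:SS form to whole seconds."""
    days = 0
    # An optional day count is separated from the clock part by a dash.
    if "-" in cputime:
        day_part, cputime = cputime.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in cputime.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Because cputime is cumulative, the CPU time a process spent during a rally run would be the difference between the value sampled at the end of the run and the value sampled at its start.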
sean-k-mooneyso the low point was almost 2G of available ram14:10
opendevreviewJulien LE JEUNE proposed openstack/nova master: nova-condutor puts instance in error state  https://review.opendev.org/c/openstack/nova/+/90165514:11
sean-k-mooneythe resident memory of nova-compute seems to be around 165mb so if we round up to say 200MB14:11
sean-k-mooneywe can probably run 20-25 fake agents on the controller and i would guess 50+ on the 8G subnodes based on ram14:12
sean-k-mooneymy question would then be what does the cpu load look like but you probably can simulate a 200 node cluster if needed14:13
sean-k-mooneyif you were to try that i would probably do 4 dedicated computes and disable the nova-computes on the controller entirely14:14
sean-k-mooney gibi  interesting i wonder if we should make the collected parameters configurable via a devstack environment var14:15
gibisoo many options:)14:16
sean-k-mooneyill stop suggesting them14:16
sean-k-mooneyyour current patch has times 400 and concurrency 40 + 10 fake computes14:18
sean-k-mooneyand you turned on atop14:18
sean-k-mooneyso lets see what that shows14:18
sean-k-mooney... post failure14:19
sean-k-mooneyrally: error: unrecognized arguments: -n 2 | xargs rally task trends --out /opt/stack/.rally/results/trends.html --tasks14:19
opendevreviewSylvain Bauza proposed openstack/nova master: Support multiple allocations for vGPUs  https://review.opendev.org/c/openstack/nova/+/84575714:20
gibiahh probably ansible command vs shell. I use pipes14:22
sean-k-mooneyah right probably14:27
sean-k-mooneyya you're using command14:27
sean-k-mooneyso you need shell for that to work14:27
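The command-vs-shell difference above can be reproduced outside Ansible. This is a hedged sketch assuming a POSIX system with echo and tr on the PATH. The pipeline string and function names are illustrative and are not the ones from the playbook. Without a shell, the "|" and everything after it reach the first program as plain arguments, which is why rally reported them as unrecognized arguments.

```python
import shlex
import subprocess

# Illustrative pipeline only; the real playbook pipes rally output into xargs.
PIPELINE = "echo hello | tr a-z A-Z"


def run_without_shell(command_line: str) -> str:
    """Run like Ansible's 'command' module: argv is split, no shell parses it."""
    # '|', 'tr', 'a-z' and 'A-Z' are all passed to echo as ordinary arguments.
    result = subprocess.run(
        shlex.split(command_line), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def run_with_shell(command_line: str) -> str:
    """Run like Ansible's 'shell' module: /bin/sh interprets the pipe."""
    result = subprocess.run(
        command_line, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Here run_without_shell(PIPELINE) returns the literal text "hello | tr a-z A-Z", while run_with_shell(PIPELINE) returns "HELLO".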
opendevreviewBalazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal  https://review.opendev.org/c/openstack/nova/+/96013014:29
opendevreviewFlorian proposed openstack/nova master: Add check for PCIe devices attach limit for volume and ports  https://review.opendev.org/c/openstack/nova/+/95558415:06
gibisean-k-mooney: dansmith: 20 fake fits in one node https://zuul.opendev.org/t/openstack/build/d865972fcf374ebb813496cec36be473/logs memory low point is around 700MB15:11
sean-k-mooneygibi: that is for a controller right, dedicated computes could run more. mind if i create a separate patch just to play with that aspect and a custom nodeset15:21
sean-k-mooneyi dont want to modify yours but i was thinking of putting one on top and tweaking the nodeset def and subnodes to see if we could get to the 100-200 fake node range15:22
sean-k-mooney@gibi with the increased computes and iterations and concurrency 16:07
sean-k-mooneythe tests look like they took about the same amount of time16:07
sean-k-mooneywell no16:07
sean-k-mooneyit went from around 207 to 270 seconds16:08
sean-k-mooneythe median time is shockingly consistent across all 4 runs16:11
sean-k-mooneythe max time doubles as we doubled the number of instances and the number of hosts.16:13
sean-k-mooneythat i do find a little interesting as i was expecting that to cancel out especially since you doubled the concurrency as well16:13
sean-k-mooneygibi: you could consider upping the max concurrent builds per host from the default 10 to see if that reduces that but it seems to be linear16:14
sean-k-mooneyso if you change that from 10 to say 100 i wonder if that would affect anything16:15
zigoGot this building Nova Flamingo: https://paste.opendev.org/show/bpVRJbZbVFoY5mieellm/16:26
zigoAny idea what's going on?16:26
zigoOh, should be my patch... :P16:27
sean-k-mooneywe have been removing the disparate imports of eventlet from the different modules and centralising them16:28
zigoIt's my fault, no worries.16:28
zigoI had a patch adding greenthread.sleep(0), from past fix in Epoxy, removed it, should be ok this time.16:29
sean-k-mooneygibi: https://gist.github.com/SeanMooney/27bb304653173a393c72d4cf4eca98e2 i got gemini to do some basic analysis of the raw memory tracker data16:33
sean-k-mooneyi actually need to check if i gave it the right data16:37
opendevreviewTobias Urdin proposed openstack/nova master: Prevent leaking host info when HostMappingNotFound  https://review.opendev.org/c/openstack/nova/+/95929616:39
opendevreviewTobias Urdin proposed openstack/nova master: Prevent leaking host info when HostMappingNotFound  https://review.opendev.org/c/openstack/nova/+/95929616:39
sean-k-mooneythe memory tracker data seems to end at Sep 12 14:44:5216:39
sean-k-mooneywhereas the second rally run starts at  2025-09-12T14:48:5416:39
sean-k-mooneyso i dont think we actually have good data for that16:39
opendevreviewMerged openstack/nova stable/2025.1: Fix 'nova-manage image_property set' command  https://review.opendev.org/c/openstack/nova/+/95883418:17
sean-k-mooneygibi: i gave up on gemini and got claude to write a script to process the atop data by converting it to json and then processing that21:14
sean-k-mooneythis is the initial output https://paste.opendev.org/show/brIoumAiGGb9GHx2eXzv/21:15
sean-k-mooneyim still not sure that is correct either but ill push it somewhere on monday after i have checked if it is sane21:15
