| *** jonnyb0 is now known as jonnyb | 01:13 | |
| *** mhen_ is now known as mhen | 01:56 | |
| opendevreview | Lajos Katona proposed openstack/nova master: Use SDK for Neutron networks https://review.opendev.org/c/openstack/nova/+/928022 | 06:59 |
|---|---|---|
| jlejeune | sean-k-mooney: hello, yeah I know that catching 'Exception' is not necessarily a good idea, but in that case if I only put the try/except statement around the 'check_can_live_migrate_source' call, it will catch every exception which may happen during that specific RPC call | 07:05 |
| ratailor | elodilles, could you please review when you have time https://review.opendev.org/c/openstack/nova/+/958834 and https://review.opendev.org/q/owner:ratailor@redhat.com+branch:stable/2024.2 | 07:09 |
| opendevreview | Julien LE JEUNE proposed openstack/nova master: nova-condutor puts instance in error state https://review.opendev.org/c/openstack/nova/+/901655 | 08:17 |
| elodilles | ratailor: ACK, added them to my TODO list | 08:37 |
| opendevreview | Merged openstack/placement stable/2025.2: Update .gitreview for stable/2025.2 https://review.opendev.org/c/openstack/placement/+/960507 | 08:41 |
| opendevreview | Merged openstack/placement stable/2025.2: Update TOX_CONSTRAINTS_FILE for stable/2025.2 https://review.opendev.org/c/openstack/placement/+/960508 | 08:41 |
| opendevreview | OpenStack Release Bot proposed openstack/nova stable/2025.2: Update .gitreview for stable/2025.2 https://review.opendev.org/c/openstack/nova/+/960743 | 08:54 |
| opendevreview | OpenStack Release Bot proposed openstack/nova stable/2025.2: Update TOX_CONSTRAINTS_FILE for stable/2025.2 https://review.opendev.org/c/openstack/nova/+/960744 | 08:54 |
| opendevreview | OpenStack Release Bot proposed openstack/nova master: Update master for stable/2025.2 https://review.opendev.org/c/openstack/nova/+/960745 | 08:54 |
| gibi | sean-k-mooney: dansmith: I have the first comparable rally results between eventlet and native threading. See my top level comment in https://review.opendev.org/c/openstack/nova/+/960130/16#message-e6a461d92172aea570e8f18405d47ee00b9c300a It links to both rally results with timing and I manually pulled out the pool usage and the memory usage of the scheduler (I have the intention to script some of it | 09:01 |
| gibi | in CI) | 09:01 |
| ratailor | elodilles, Thanks! | 09:49 |
| opendevreview | Taketani Ryo proposed openstack/nova-specs master: Add a spec for 2026.1 for libvirt launching Arm CCA instances https://review.opendev.org/c/openstack/nova-specs/+/960777 | 09:52 |
| sean-k-mooney | gibi do you happen to know if they ran on the same provider | 09:54 |
| sean-k-mooney | gibi: i.e. are the host vms comparable | 09:54 |
| gibi | the two rally executions happen within the same CI job on the same CI worker. So it is always comparable :) | 09:58 |
| gibi | the base job runs rally then in the post-run I reconfigure to native threading and run rally again | 09:59 |
| sean-k-mooney | yep just reading the comment on the patch | 09:59 |
| sean-k-mooney | so that is good because it eliminates the biggest variable in our ci | 10:00 |
| sean-k-mooney | run to run hardware deltas | 10:00 |
| sean-k-mooney | looking at the results this is very promising | 10:00 |
| sean-k-mooney | gibi: the only metric i would have liked to see that i have not seen yet is the total cpu time of the nova process | 10:01 |
| sean-k-mooney | i.e. is there any delta on cpu load with eventlet vs threading | 10:01 |
| gibi | sean-k-mooney: how can we collect that? | 10:02 |
| sean-k-mooney | in theory there should not be since the eventlet scheduler vs kernel scheduler selecting the next thread should be more or less a wash | 10:02 |
| sean-k-mooney | we could do it with perf but that might be overkill | 10:02 |
| gibi | I guess somewhere in the /proc it is recorded | 10:03 |
| gibi | as top knows it | 10:03 |
| sean-k-mooney | top would also print it. yes its available in proc, we can turn on atop in ci | 10:03 |
| sean-k-mooney | it was added not too long ago but i have not seen jobs using it | 10:03 |
| sean-k-mooney | im just not sure if it will show cumulative cpu time | 10:04 |
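The cumulative CPU time being discussed here is exposed per process in `/proc/<pid>/stat` as the `utime` and `stime` fields (measured in clock ticks). A minimal parsing sketch; the helper name is mine, and on a live system the tick rate comes from `os.sysconf("SC_CLK_TCK")`:

```python
def cpu_seconds_from_stat(stat_text, ticks_per_sec=100):
    """Return cumulative user+system CPU seconds from /proc/<pid>/stat content.

    Fields 14 (utime) and 15 (stime) are in clock ticks. The comm field
    (field 2) can contain spaces, so split on the last closing paren first.
    """
    rest = stat_text.rsplit(')', 1)[1].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11]
    # and stime (field 15) is rest[12]
    utime = int(rest[11])
    stime = int(rest[12])
    return (utime + stime) / ticks_per_sec
```

To sample a real process, read `open(f"/proc/{pid}/stat").read()` and pass `os.sysconf("SC_CLK_TCK")` as the tick rate.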
| gibi | I will check I see podman login --username gibi --password | 10:04 |
| gibi | ahh not that | 10:04 |
| gibi | I see https://github.com/openstack/devstack/commit/1aac81ee881534276fd7d6540ed356a85d064a13 | 10:04 |
| sean-k-mooney | yep so i was looking at that patch a few times when looking at odd gate failures and debated if it would help but i have not tried it yet | 10:06 |
| gibi | it should be enabled by default but I don't see the logs in the run so I need to dig | 10:06 |
| sean-k-mooney | i dont think its on by default | 10:06 |
| sean-k-mooney | https://github.com/openstack/devstack/commit/1aac81ee881534276fd7d6540ed356a85d064a13#diff-3fedb5d16b14e2fe731944ed0819eabefe91f42d456377e5ce6f0a1874232269R1097 | 10:07 |
| sean-k-mooney | should guard against it | 10:07 |
| sean-k-mooney | we start the base devstack jobs with all services disabled | 10:07 |
| sean-k-mooney | but even without that i dont see this patch adding it to the default service list | 10:07 |
| sean-k-mooney | gibi: but it should just require adding `enable_service atop` | 10:08 |
| sean-k-mooney | or in ci i guess thats `atop: true` | 10:08 |
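For context, the `enable_service atop` suggestion maps onto a devstack local.conf entry. A minimal sketch, assuming atop is wired up as a regular devstack service per the devstack commit linked above:

```shell
# local.conf fragment -- assumes atop registers as an ordinary
# devstack service (per the linked devstack commit)
[[local|localrc]]
enable_service atop
```

In a Zuul job definition the same toggle would presumably be expressed through the job's `devstack_services` variable, i.e. the `atop: true` form mentioned above.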
| sean-k-mooney | gibi: oh i see you adding post-run: playbooks/nova-rally-fake-virt-threading/post.yaml and i guess that also runs rally a second time correct | 10:10 |
| gibi | yes, it calls rally task restart which triggers rally to re-run the previous task | 10:11 |
| gibi | https://review.opendev.org/c/openstack/nova/+/960130/16/roles/rerun-rally/tasks/main.yaml#11 | 10:11 |
| gibi | it would be nice to re-use the rally roles but it is not stored in the rally-openstack repo in a re-usable way | 10:12 |
| gibi | also they hardcode a bunch of things that do not help reuse, e.g. the report generation does not handle multiple rally runs | 10:12 |
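The rerun flow gibi describes (replay the previous task after reconfiguring, then build one report over both runs) can be sketched as a shell sequence. The `rally task restart` command is stated above; the list/xargs pipeline is an assumption modelled on the command visible in the failure output later in this log:

```shell
# Replay the most recently executed rally task against the
# (now reconfigured) deployment.
rally task restart

# Build a single trends report over the last two task runs so the
# eventlet and native-threading results sit side by side
# (exact flags are assumptions).
rally task list --uuids-only | tail -n 2 \
  | xargs rally task trends --out /opt/stack/.rally/results/trends.html --tasks
```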
| * gibi needs to go get some foof | 10:13 | |
| gibi | food even :) | 10:13 |
| sean-k-mooney | enjoy. while we should do more testing this is all very promising congrats | 10:14 |
| opendevreview | Merged openstack/nova master: Update master for stable/2025.2 https://review.opendev.org/c/openstack/nova/+/960745 | 10:54 |
| opendevreview | Merged openstack/nova master: reno: Update master for unmaintained/2023.1 https://review.opendev.org/c/openstack/nova/+/935117 | 10:55 |
| opendevreview | Merged openstack/placement master: Update master for stable/2025.2 https://review.opendev.org/c/openstack/placement/+/960509 | 10:55 |
| opendevreview | Merged openstack/nova stable/2025.2: Update .gitreview for stable/2025.2 https://review.opendev.org/c/openstack/nova/+/960743 | 10:55 |
| opendevreview | Merged openstack/nova stable/2025.2: Update TOX_CONSTRAINTS_FILE for stable/2025.2 https://review.opendev.org/c/openstack/nova/+/960744 | 10:55 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal https://review.opendev.org/c/openstack/nova/+/960130 | 12:22 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal https://review.opendev.org/c/openstack/nova/+/960130 | 12:40 |
| opendevreview | Balazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal https://review.opendev.org/c/openstack/nova/+/960130 | 13:30 |
| dansmith | gibi: just trying to sanity check some of the stats there.. | 13:55 |
| dansmith | it seems like for the majority of the run everything is idle all the time, with the exception of a short thing at the front where we have idle_workers < max_workers, and I wonder if that's just because we're starting up | 13:56 |
| opendevreview | Julien LE JEUNE proposed openstack/nova master: nova-condutor puts instance in error state https://review.opendev.org/c/openstack/nova/+/901655 | 13:56 |
| dansmith | I went looking for a comparison of how busy the workers were between the two.. like if the threaded mode was able to keep up with 5 workers...but only just, or if they both were taxed, but keeping up | 13:57 |
| dansmith | just wondering if maybe that stats log line is not giving us accurate results (I assume the wall time is, and that looks good of course) | 13:57 |
| gibi | dansmith: I think given the limited number of 5 computes the scheduler can keep up with the requests as scheduling them is fairly easy | 13:58 |
| gibi | I'm trying to crank up both the number of computes and the number of parallel VM boot requests to see if we can make the scheduler more busy | 13:59 |
| dansmith | okay yeah, my point is just that while the numbers seem great, feels like maybe the load is too low to really measure a difference | 14:03 |
| sean-k-mooney | gibi: you could create a custom zuul nodeset in the job with up to 5 nodes | 14:04 |
| sean-k-mooney | thats the node limit in our tenant. i also suspect at least on the subnodes you could push to 10-20 fake agents | 14:05 |
| sean-k-mooney | so you might be able to simulate something in the 85 compute range | 14:05 |
| gibi | dansmith: I agree with your observation and I will try to crank up the numbers | 14:06 |
| gibi | sean-k-mooney: yeah I just doubled the compute agents to 10 | 14:06 |
| sean-k-mooney | im just looking at the mem low point to see how close you were and the swap usage | 14:07 |
| gibi | sean-k-mooney: also I found that the memory_tracker collects cpu time data as well per process so I can pull RSS and time there | 14:07 |
| sean-k-mooney | to see if i can find anything to indicate how many more we might be able to run | 14:07 |
| sean-k-mooney | we can also request vms with 16GB of ram now | 14:07 |
| sean-k-mooney | via a custom node set | 14:08 |
| sean-k-mooney | gibi: oh i looked briefly but must have missed that before | 14:08 |
| gibi | soo many options:) | 14:08 |
| sean-k-mooney | oh is time process time? | 14:09 |
| gibi | as far as I see yes | 14:09 |
| sean-k-mooney | as in wall clock time of the active (user) time of the process | 14:09 |
| sean-k-mooney | memory_tracker low_point: 1900960 | 14:09 |
| gibi | it is cputime from ps | 14:09 |
| gibi | https://github.com/openstack/devstack/blob/f6d8dab0e885b8de8c0f44388d538da7d4f9b7ec/tools/memory_tracker.sh#L88C66-L88C73 | 14:10 |
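memory_tracker takes the `cputime` column straight from `ps`, which formats cumulative CPU time as `[[dd-]hh:]mm:ss`. A small sketch for converting that to seconds for scripting/plotting; the helper name is mine:

```python
def cputime_to_seconds(cputime):
    """Convert a ps(1) cumulative cputime string ([[dd-]hh:]mm:ss) to seconds."""
    # An optional leading "dd-" prefix carries whole days.
    days, _, rest = cputime.partition('-') if '-' in cputime else ('0', '', cputime)
    parts = [int(p) for p in rest.split(':')]
    # Left-pad to hh:mm:ss (ps may omit the hours field).
    while len(parts) < 3:
        parts.insert(0, 0)
    h, m, s = parts
    return int(days) * 86400 + h * 3600 + m * 60 + s
```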
| sean-k-mooney | so the low point was almost 2G of available ram | 14:10 |
| opendevreview | Julien LE JEUNE proposed openstack/nova master: nova-condutor puts instance in error state https://review.opendev.org/c/openstack/nova/+/901655 | 14:11 |
| sean-k-mooney | the resident memory of nova-compute seems to be around 165mb so if we round up to say 200MB | 14:11 |
| sean-k-mooney | we can probably run 20-25 fake agents on the controller and i would guess 50+ on the 8G subnodes based on ram | 14:12 |
| sean-k-mooney | my question would then be what does the cpu load look like but you probably can simulate a 200 node cluster if needed | 14:13 |
| sean-k-mooney | if you were to try that i would probably do 4 dedicated computes and disable the nova-computes on the controller entirely | 14:14 |
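The sizing estimate here is simple arithmetic; a sketch using the ~200MB-per-agent figure from the discussion plus a headroom value I picked arbitrarily (both are assumptions, not measured constants):

```python
def fake_computes_that_fit(free_mem_mb, per_agent_mb=200, headroom_mb=1024):
    """Back-of-envelope: how many fake nova-compute agents fit in free RAM,
    reserving some headroom for the rest of the node."""
    return max(0, (free_mem_mb - headroom_mb) // per_agent_mb)
```

For example, an 8GB subnode with 1GB of headroom gives `(8192 - 1024) // 200 = 35` agents under these assumptions; real capacity also depends on CPU load, which is exactly the open question in the discussion.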
| sean-k-mooney | gibi interesting i wonder if we should make the collected parameters configurable via a devstack environment var | 14:15 |
| gibi | soo many options:) | 14:16 |
| sean-k-mooney | ill stop suggesting them | 14:16 |
| sean-k-mooney | your current patch has times 400 and concurrency 40 + 10 fake computes | 14:18 |
| sean-k-mooney | and you turned on atop | 14:18 |
| sean-k-mooney | so lets see what that shows | 14:18 |
| sean-k-mooney | ... post failure | 14:19 |
| sean-k-mooney | rally: error: unrecognized arguments: -n 2 | xargs rally task trends --out /opt/stack/.rally/results/trends.html --tasks | 14:19 |
| opendevreview | Sylvain Bauza proposed openstack/nova master: Support multiple allocations for vGPUs https://review.opendev.org/c/openstack/nova/+/845757 | 14:20 |
| gibi | ahh probably ansible command vs shell. I use pipes | 14:22 |
| sean-k-mooney | ah right probably | 14:27 |
| sean-k-mooney | ya you're using command | 14:27 |
| sean-k-mooney | so you need shell for that to work | 14:27 |
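The failure above is the classic ansible `command` vs `shell` distinction: `command` execs the program directly, so `|`, `xargs`, and everything after them arrive as literal arguments to rally. A sketch of the fix (the task name and the exact pipeline are assumptions reconstructed from the error message):

```yaml
# ansible.builtin.command passes everything after the program name as
# literal argv entries -- no pipes, redirects, or expansions.
# ansible.builtin.shell runs the line through /bin/sh, so the pipe works.
- name: Generate rally trends report (pipeline needs a shell)
  ansible.builtin.shell: >
    rally task list --uuids-only | tail -n 2 |
    xargs rally task trends --out /opt/stack/.rally/results/trends.html --tasks
```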
| opendevreview | Balazs Gibizer proposed openstack/nova master: Rally job for eventlet-removal https://review.opendev.org/c/openstack/nova/+/960130 | 14:29 |
| opendevreview | Florian proposed openstack/nova master: Add check for PCIe devices attach limit for volume and ports https://review.opendev.org/c/openstack/nova/+/955584 | 15:06 |
| gibi | sean-k-mooney: dansmith: 20 fake fits in one node https://zuul.opendev.org/t/openstack/build/d865972fcf374ebb813496cec36be473/logs memory low point is around 700MB | 15:11 |
| sean-k-mooney | gibi: that is for a controller right, dedicated computes could run more. mind if i create a separate patch just to play with that aspect and a custom nodeset | 15:21 |
| sean-k-mooney | i dont want to modify yours but i was thinking of putting one on top and tweaking the nodeset def and subnodes to see if we could get to the 100-200 fake node range | 15:22 |
| sean-k-mooney | @gibi with the increased computes and iterations and concurrency | 16:07 |
| sean-k-mooney | the tests look like they took about the same amount of time | 16:07 |
| sean-k-mooney | well no | 16:07 |
| sean-k-mooney | it went from around 207 to 270 seconds | 16:08 |
| sean-k-mooney | the median time is shockingly consistent across all 4 runs | 16:11 |
| sean-k-mooney | the max time doubles as we doubled the number of instances and the number of hosts. | 16:13 |
| sean-k-mooney | that i do find a little interesting as i was expecting that to cancel out especially since you doubled the concurrency as well | 16:13 |
| sean-k-mooney | gibi: you could consider upping the max concurrent builds per host from the default 10 to see if that reduces that but it seems to be linear | 16:14 |
| sean-k-mooney | so if you change that from 10 to say 100 i wonder if that would affect anything | 16:15 |
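The knob being suggested here is nova's `max_concurrent_builds` option, which caps parallel instance builds per nova-compute host (default 10). A nova.conf sketch:

```ini
[DEFAULT]
# Cap on parallel instance builds per nova-compute host.
# The default is 10; 0 disables the limit entirely.
max_concurrent_builds = 100
```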
| zigo | Got this building Nova Flamingo: https://paste.opendev.org/show/bpVRJbZbVFoY5mieellm/ | 16:26 |
| zigo | Any idea what's going on? | 16:26 |
| zigo | Oh, should be my patch... :P | 16:27 |
| sean-k-mooney | we have been removing the disparate imports of eventlet from the different modules and centralising them | 16:28 |
| zigo | It's my fault, no worries. | 16:28 |
| zigo | I had a patch adding greenthread.sleep(0), from past fix in Epoxy, removed it, should be ok this time. | 16:29 |
| sean-k-mooney | gibi: https://gist.github.com/SeanMooney/27bb304653173a393c72d4cf4eca98e2 i got gemini to do some basic analysis of the raw memory tracker data | 16:33 |
| sean-k-mooney | i actually need to check if i gave it the right data | 16:37 |
| opendevreview | Tobias Urdin proposed openstack/nova master: Prevent leaking host info when HostMappingNotFound https://review.opendev.org/c/openstack/nova/+/959296 | 16:39 |
| opendevreview | Tobias Urdin proposed openstack/nova master: Prevent leaking host info when HostMappingNotFound https://review.opendev.org/c/openstack/nova/+/959296 | 16:39 |
| sean-k-mooney | the memory tracker data seems to end at Sep 12 14:44:52 | 16:39 |
| sean-k-mooney | whereas the second rally run starts at 2025-09-12T14:48:54 | 16:39 |
| sean-k-mooney | so i dont think we actually have good data for that | 16:39 |
| opendevreview | Merged openstack/nova stable/2025.1: Fix 'nova-manage image_property set' command https://review.opendev.org/c/openstack/nova/+/958834 | 18:17 |
| sean-k-mooney | gibi: i gave up on gemini and got claude to write a script to process the atop data by converting it to json and then processing that | 21:14 |
| sean-k-mooney | this is the initial output https://paste.opendev.org/show/brIoumAiGGb9GHx2eXzv/ | 21:15 |
| sean-k-mooney | im still not sure that is correct either but ill push it somewhere on monday after i have checked if it is sane | 21:15 |
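atop raw logs can also be dumped in a line-oriented form via `atop -r <file> -P PRC`, which avoids the JSON conversion step. A heavily hedged sketch of summing per-process CPU time from such output; the field offsets here are my assumptions about the PRC record layout and must be checked against the PARSEABLE OUTPUT section of `man atop` before trusting any numbers:

```python
import collections

# Assumed PRC layout: 6 common fields (label, host, epoch, date, time,
# interval), then pid, (name), state, clock-ticks-per-second, utime, stime.
# These offsets are assumptions -- verify against `man atop`.
F_PID, F_NAME, F_HERTZ, F_UTIME, F_STIME = 6, 7, 9, 10, 11

def cpu_seconds_per_process(prc_lines):
    """Sum user+system CPU seconds per process name across PRC samples.

    Each PRC sample reports consumption during its interval, so summing
    the samples gives a cumulative total. Process names containing
    spaces are not handled by this naive split.
    """
    totals = collections.Counter()
    for line in prc_lines:
        f = line.split()
        if not f or f[0] != 'PRC':
            continue
        hertz = int(f[F_HERTZ])
        name = f[F_NAME].strip('()')
        totals[name] += (int(f[F_UTIME]) + int(f[F_STIME])) / hertz
    return totals
```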
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!