opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 00:52 |
opendevreview | Merged openstack/devstack master: Disable waiting forever for connpool workers https://review.opendev.org/c/openstack/devstack/+/890526 | 03:06 |
opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 03:15 |
opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 07:05 |
opendevreview | Ke Niu proposed openstack/devstack master: remove unicode prefix from code https://review.opendev.org/c/openstack/devstack/+/890887 | 07:41 |
opendevreview | Katarina Strenkova proposed openstack/tempest master: Skip failing tests affected by minimum password age https://review.opendev.org/c/openstack/tempest/+/890653 | 09:12 |
opendevreview | Katarina Strenkova proposed openstack/tempest master: Skip failing tests affected by minimum password age https://review.opendev.org/c/openstack/tempest/+/890653 | 09:14 |
opendevreview | Merged openstack/devstack stable/zed: Use RDO official CloudSIG mirrors for C9S deployments https://review.opendev.org/c/openstack/devstack/+/890221 | 10:53 |
opendevreview | Takashi Kajinami proposed openstack/tempest master: Do not retry immediately after server fault https://review.opendev.org/c/openstack/tempest/+/890911 | 12:28 |
*** haleyb_ is now known as haleyb | 12:51 |
opendevreview | Merged openstack/devstack stable/yoga: Use RDO official CloudSIG mirrors for C9S deployments https://review.opendev.org/c/openstack/devstack/+/890222 | 13:35 |
dansmith | sean-k-mooney: so I just saw another oom in a job with concurrency=4 and I noticed that there were 14 qemu processes running at the time | 13:54 |
dansmith | I wonder if in some cases we're waiting for DELETED before we exit and run another test, | 13:55 |
dansmith | but the instance is still running and trying to be destroyed in compute | 13:55 |
dansmith | such that we end up with a few stacked up like that | 13:55 |
sean-k-mooney | i thought we did not mark it as deleted in the api until it was actually deleted | 14:00 |
sean-k-mooney | could we be calling delete and expecting it to be synchronous in some places | 14:00 |
dansmith | um, well, I was just looking at that and thought we did once we fire off the rpc | 14:01 |
dansmith | but I was looking at local delete | 14:01 |
sean-k-mooney | i think in the local delete case we stay in deleting or something like that until it's actually deleted, then it disappears from the server list, but i haven't looked at that in a while | 14:02 |
dansmith | hmm, okay yeah we actually set it in compute manager after we've called shutdown | 14:02 |
dansmith | maybe tempest doesn't always wait for servers to be deleted when it asks, like during cleanup? | 14:03 |
* dansmith checks | 14:03 |
sean-k-mooney | what the hell is instance.disable_terminate, since when do we have a way to prevent deleting an instance other than lock | 14:04 |
dansmith | yeah I saw that :) | 14:06 |
dansmith | so it looks like we're pretty much always doing wait_for_server_termination | 14:06 |
dansmith | but that actually waits for the show to return 404 | 14:06 |
sean-k-mooney | we pass in task_state=task_states.DELETING here https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2715 so on the first instance.save we should go to that | 14:07 |
dansmith | the 404 will happen once we've marked the instance as deleted | 14:07 |
dansmith | which happens after we set vm_state=DELETED so I guess that's okay | 14:07 |
sean-k-mooney | ya so that should not happen until the vm is actually deleted | 14:08 |
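A minimal sketch of the client-side wait being described here, assuming a hypothetical `client.show_server()` that raises `NotFound` once nova has marked the instance deleted; tempest's real `wait_for_server_termination` (mentioned above) does more, but the polling pattern is the point:

```python
import time


class NotFound(Exception):
    """Stand-in for the 404 the compute API returns once the
    instance has been marked deleted (vm_state=DELETED)."""


def wait_for_server_termination(client, server_id, timeout=300, interval=1):
    """Poll GET /servers/{id} until it 404s.

    While nova-compute is still tearing the guest down, the show
    call keeps succeeding (task_state=deleting), so the caller does
    not move on while a qemu process may still be running.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            server = client.show_server(server_id)
        except NotFound:
            return  # really gone; safe to start the next test
        if server.get('status') == 'ERROR':
            raise RuntimeError(f'server {server_id} errored during delete')
        time.sleep(interval)
    raise TimeoutError(f'server {server_id} not deleted within {timeout}s')
```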
dansmith | unless the oom dump is showing multiple kprocs per actual qemu or something, I'm not sure how we're ending up with 14 | 14:08 |
sean-k-mooney | could this be related to soft_delete | 14:08 |
sean-k-mooney | actually even with soft_delete | 14:08 |
sean-k-mooney | the vm will be stopped | 14:08 |
sean-k-mooney | so it should not affect OOM issues | 14:09 |
dansmith | yeah | 14:09 |
dansmith | it's definitely a lot better than it was at n=6 and we had OOMs before, but it'd be nice to get to the bottom of that | 14:11 |
sean-k-mooney | i wonder if it would make sense to see if the zswap helps | 14:14 |
sean-k-mooney | and moving to 8G by default | 14:14 |
dansmith | yeah, both probably | 14:15 |
sean-k-mooney | what job was OOMing | 14:15 |
dansmith | let's see what gmann thinks | 14:15 |
sean-k-mooney | ceph? | 14:15 |
dansmith | it was tempest-integrated-compute-rbac-old-defaults but it's basically just tempest-integrated-compute | 14:15 |
sean-k-mooney | ack i'll push up a dnm with just that enabled that depends on the zram and bumps the swap size, with concurrency 4? 6? | 14:16 |
dansmith | don't mess with the concurrency, just the swap, IMHO | 14:16 |
opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 15:17 |
gmann | dansmith: sean-k-mooney: you mean for zswap change or delete server in tests ? | 17:57 |
gmann | I have not looked into the zswap change yet | 17:57 |
dansmith | gmann: more swap for all jobs, like the ceph job uses 8G, and also maybe zswap to make more use of the 8G we give it | 17:58 |
dansmith | gmann: note I've only seen the one OOM, so this is not a huge problem like it was, I'm just saying.. we had some OOMs before too | 17:59 |
JayF | zswap has a side effect of reducing I/O when swapping, so OOM or not it'd be interesting to see | 18:00 |
sean-k-mooney | yep | 18:01 |
sean-k-mooney | i was pretty conservative by staying with lz4 in the patch, but we could push it further by changing the compressor to zstd | 18:01 |
sean-k-mooney | https://review.opendev.org/c/openstack/devstack/+/890693 is the patch | 18:02 |
dansmith | lz4 is good I think because it's fast | 18:02 |
sean-k-mooney | yep, fast but still provides a benefit | 18:02 |
dansmith | yeah | 18:02 |
sean-k-mooney | we have some slow hosts but we are not that disk-io starved | 18:03 |
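For reference, the kernel exposes the zswap knobs under /sys/module/zswap/parameters. A hedged sketch of the kind of setup being discussed (the real change is devstack shell in the review linked above; the values here are illustrative):

```python
from pathlib import Path

# Kernel zswap knobs; values are illustrative, the actual change is
# the devstack shell patch linked above.
ZSWAP = Path('/sys/module/zswap/parameters')


def enable_zswap(compressor='lz4', max_pool_percent=20):
    """Enable zswap with a fast compressor (needs root).

    lz4 trades compression ratio for speed; zstd would squeeze more
    pages into the pool at a higher CPU cost.
    """
    (ZSWAP / 'compressor').write_text(compressor)
    (ZSWAP / 'max_pool_percent').write_text(str(max_pool_percent))
    (ZSWAP / 'enabled').write_text('1')


if __name__ == '__main__':
    enable_zswap()
```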
gmann | ok, I think going with 8gb makes sense to try | 18:03 |
dansmith | we very much are in some workers, but it seems very spotty | 18:04 |
sean-k-mooney | i know we had issues in the past with some workers having less than 100 iops | 18:04 |
sean-k-mooney | to the disks | 18:04 |
sean-k-mooney | i have tried to mitigate that in the patch too | 18:04 |
sean-k-mooney | with this sysctl tuning https://review.opendev.org/c/openstack/devstack/+/890693/7/lib/host#54 | 18:05 |
sean-k-mooney | we could probably do more but i didn't want to do too much in one patch | 18:06 |
sean-k-mooney | i haven't actually tuned the filesystem/io scheduler in that patch but i did tweak the caching/swapping behavior via vm.dirty to defer bursty writes | 18:10 |
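A sketch of that kind of vm.dirty writeback tuning; the values below are illustrative examples, not the ones in the patch:

```python
import subprocess

# Writeback tuning for io-starved CI workers: let bursty writes
# accumulate in the page cache and flush them gradually in the
# background. Illustrative values, not the ones in the patch.
DIRTY_SYSCTLS = {
    'vm.dirty_background_ratio': '5',     # kick off background writeback early
    'vm.dirty_ratio': '30',               # let bursts buffer before writers block
    'vm.dirty_expire_centisecs': '6000',  # dirty pages may sit for up to 60s
    'vm.dirty_writeback_centisecs': '500',
}


def apply_sysctls(settings):
    """Apply each setting via sysctl -w (needs root)."""
    for key, value in settings.items():
        subprocess.run(['sysctl', '-w', f'{key}={value}'], check=True)


if __name__ == '__main__':
    apply_sysctls(DIRTY_SYSCTLS)
```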
gmann | lpiwowar: replied on this: check whether the container created by the cinder backup test can be deleted; if not, I commented one suggestion to fix that test https://review.opendev.org/c/openstack/tempest/+/890798 | 19:31 |
gmann | lpiwowar: please let me know if that makes sense to you | 19:32 |
opendevreview | Ghanshyam proposed openstack/tempest master: Improve rebuild tests in test_server_actions https://review.opendev.org/c/openstack/tempest/+/890821 | 23:44 |