Wednesday, 2023-08-09

<opendevreview> melanie witt proposed openstack/tempest master: DNM test rbd retype  https://review.opendev.org/c/openstack/tempest/+/890360 [00:52]
<opendevreview> Merged openstack/devstack master: Disable waiting forever for connpool workers  https://review.opendev.org/c/openstack/devstack/+/890526 [03:06]
<opendevreview> melanie witt proposed openstack/tempest master: DNM test rbd retype  https://review.opendev.org/c/openstack/tempest/+/890360 [03:15]
<opendevreview> melanie witt proposed openstack/tempest master: DNM test rbd retype  https://review.opendev.org/c/openstack/tempest/+/890360 [07:05]
<opendevreview> Ke Niu proposed openstack/devstack master: remove unicode prefix from code  https://review.opendev.org/c/openstack/devstack/+/890887 [07:41]
<opendevreview> Katarina Strenkova proposed openstack/tempest master: Skip failing tests affected by minimum password age  https://review.opendev.org/c/openstack/tempest/+/890653 [09:12]
<opendevreview> Katarina Strenkova proposed openstack/tempest master: Skip failing tests affected by minimum password age  https://review.opendev.org/c/openstack/tempest/+/890653 [09:14]
<opendevreview> Merged openstack/devstack stable/zed: Use RDO official CloudSIG mirrors for C9S deployments  https://review.opendev.org/c/openstack/devstack/+/890221 [10:53]
<opendevreview> Takashi Kajinami proposed openstack/tempest master: Do not retry immediately after server fault  https://review.opendev.org/c/openstack/tempest/+/890911 [12:28]
*** haleyb_ is now known as haleyb [12:51]
<opendevreview> Merged openstack/devstack stable/yoga: Use RDO official CloudSIG mirrors for C9S deployments  https://review.opendev.org/c/openstack/devstack/+/890222 [13:35]
<dansmith> sean-k-mooney: so I just saw another OOM in a job with concurrency=4 and I noticed that there were 14 qemu processes running at the time [13:54]
<dansmith> I wonder if in some cases we're waiting for DELETED before we exit and run another test, [13:55]
<dansmith> but the instance is still running and trying to be destroyed in compute [13:55]
<dansmith> such that we end up with a few stacked up like that [13:55]
<sean-k-mooney> I thought we did not mark it as deleted in the API until it was actually deleted [14:00]
<sean-k-mooney> could we be calling delete and expecting it to be synchronous in some places? [14:00]
<dansmith> um, well, I was just looking at that and thought we did once we fire off the RPC [14:01]
<dansmith> but I was looking at local delete [14:01]
<sean-k-mooney> I think in the local delete case we stay in deleting or something like that until it's actually deleted, then it disappears from the server list, but I haven't looked at that in a while [14:02]
<dansmith> hmm, okay yeah we actually set it in compute manager after we've called shutdown [14:02]
<dansmith> maybe tempest doesn't always wait for servers to be deleted when it asks, like during cleanup? [14:03]
* dansmith checks [14:03]
<sean-k-mooney> what the hell is instance.disable_terminate? since when do we have a way to prevent deleting an instance other than lock? [14:04]
<dansmith> yeah I saw that :) [14:06]
<dansmith> so it looks like we're pretty much always doing wait_for_server_termination [14:06]
<dansmith> but that actually waits for the show to return 404 [14:06]
<sean-k-mooney> we pass in task_state=task_states.DELETING here https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2715 so on the first instance.save we should go to that [14:07]
<dansmith> the 404 will happen once we've marked the instance as deleted [14:07]
<dansmith> which happens after we set vm_state=DELETED so I guess that's okay [14:07]
<sean-k-mooney> ya, so that should not happen until the VM is actually deleted [14:08]
<dansmith> unless the oom dump is showing multiple kprocs per actual qemu or something, I'm not sure how we're ending up with 14 [14:08]
<sean-k-mooney> could this be related to soft_delete? [14:08]
<sean-k-mooney> actually, even with soft_delete [14:08]
<sean-k-mooney> the VM will be stopped [14:08]
<sean-k-mooney> so it should not affect OOM issues [14:09]
<dansmith> yeah [14:09]
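
The wait described in this exchange (keep calling the server show API until it returns 404) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not tempest's actual wait_for_server_termination: NotFound, TerminationTimeout, FakeCompute and wait_for_termination are hypothetical names, and the fake API simply keeps the server visible for a fixed number of show calls.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 404."""


class TerminationTimeout(Exception):
    """Raised when the server is still visible after the timeout."""


def wait_for_termination(show_server, server_id, timeout=60.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll show_server(server_id) until it raises NotFound.

    Returns the number of show calls made. Raises TerminationTimeout if
    the server can still be shown once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        try:
            show_server(server_id)
        except NotFound:
            # The API no longer knows the server: treat it as deleted.
            return polls
        if clock() >= deadline:
            raise TerminationTimeout(server_id)
        sleep(interval)


class FakeCompute:
    """Fake API (assumption): visible for `visible_polls` show calls."""

    def __init__(self, visible_polls):
        self.remaining = visible_polls

    def show_server(self, server_id):
        if self.remaining <= 0:
            raise NotFound(server_id)
        self.remaining -= 1
        return {"id": server_id, "task_state": "deleting"}


fake = FakeCompute(visible_polls=3)
polls = wait_for_termination(fake.show_server, "abc", interval=0.0)
print("show calls before 404:", polls)
```

A wait like this only returns once the API stops showing the server, which is why the ordering discussed above (the record is marked deleted only after the guest is actually gone) matters for whether qemu processes can pile up between tests.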
<dansmith> it's definitely a lot better than it was at n=6 and we had OOMs before, but it'd be nice to get to the bottom of that [14:11]
<sean-k-mooney> I wonder if it would make sense to see if the zswap helps [14:14]
<sean-k-mooney> and moving to 8G by default [14:14]
<dansmith> yeah, both probably [14:15]
<sean-k-mooney> what job was OOMing? [14:15]
<dansmith> let's see what gmann thinks [14:15]
<sean-k-mooney> ceph? [14:15]
<dansmith> it was tempest-integrated-compute-rbac-old-defaults but it's basically just tempest-integrated-compute [14:15]
<sean-k-mooney> ack, I'll push up a DNM with just that enabled that depends on the zram and bumps the swap size, with concurrency 4? 6? [14:16]
<dansmith> don't mess with the concurrency, just the swap, IMHO [14:16]
<opendevreview> melanie witt proposed openstack/tempest master: DNM test rbd retype  https://review.opendev.org/c/openstack/tempest/+/890360 [15:17]
<gmann> dansmith: sean-k-mooney: you mean for the zswap change or delete server in tests? [17:57]
<gmann> I have not looked into the zswap change yet [17:57]
<dansmith> gmann: more swap for all jobs, like the ceph job uses 8G, and also maybe zswap to make more use of the 8G we give it [17:58]
<dansmith> gmann: note I've only seen the one OOM, so this is not a huge problem like it was, I'm just saying.. we had some OOMs before too [17:59]
<JayF> zswap has a side effect of reducing I/O when swapping, so OOM or not it'd be interesting to see [18:00]
<sean-k-mooney> yep [18:01]
<sean-k-mooney> I was pretty conservative by staying with lz4 in the patch, but we could go further by changing the compressor to zstd [18:01]
<sean-k-mooney> https://review.opendev.org/c/openstack/devstack/+/890693 is the patch [18:02]
<dansmith> lz4 is good I think because it's fast [18:02]
<sean-k-mooney> yep, fast but still provides a benefit [18:02]
<dansmith> yeah [18:02]
<sean-k-mooney> we have some slow hosts but we are not that disk I/O starved [18:03]
<gmann> ok, I think going with 8GB makes sense to try [18:03]
<dansmith> we very much are in some workers, but it seems very spotty [18:04]
<sean-k-mooney> I know we had issues in the past with some workers having less than 100 IOPS [18:04]
<sean-k-mooney> to the disks [18:04]
<sean-k-mooney> I have tried to mitigate that in the patch too [18:04]
<sean-k-mooney> with this sysctl tuning https://review.opendev.org/c/openstack/devstack/+/890693/7/lib/host#54 [18:05]
<sean-k-mooney> we could do more probably but I didn't want to do too much in one patch [18:06]
<sean-k-mooney> I haven't actually tuned the filesystem/IO scheduler in that patch but I did tweak the caching/swapping behavior to defer bursty writes via vm.dirty [18:10]
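
For anyone wanting to see which vm.dirty writeback settings a worker currently has, a rough sketch follows. It only reads values and does not reproduce what the devstack patch sets. It assumes a Linux host exposing sysctls under /proc/sys/vm; read_vm_dirty is a hypothetical helper name, not part of devstack or tempest.

```python
from pathlib import Path


def read_vm_dirty(vm_dir="/proc/sys/vm"):
    """Return the vm.dirty* sysctl values found under vm_dir.

    Keys are sysctl-style names (for example "vm.dirty_ratio") and values
    are the raw strings read from the matching files. A missing directory
    or an unreadable entry simply yields fewer results.
    """
    settings = {}
    for entry in sorted(Path(vm_dir).glob("dirty*")):
        if not entry.is_file():
            continue
        try:
            settings["vm." + entry.name] = entry.read_text().strip()
        except OSError:
            # Some entries may not be readable by the current user.
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_vm_dirty().items():
        print(f"{name} = {value}")
```

Comparing this output between a fast and a slow worker is one way to confirm that the tuning was applied before attributing any difference in OOM behavior to it.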
<gmann> lpiwowar: replied in this, check if the container created by the cinder backup test can be deleted; if not, I commented one suggestion to solve that test https://review.opendev.org/c/openstack/tempest/+/890798 [19:31]
<gmann> lpiwowar: please let me know if that makes sense to you [19:32]
<opendevreview> Ghanshyam proposed openstack/tempest master: Improve rebuild tests in test_server_actions  https://review.opendev.org/c/openstack/tempest/+/890821 [23:44]
