opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 00:52 |
opendevreview | Merged openstack/devstack master: Disable waiting forever for connpool workers https://review.opendev.org/c/openstack/devstack/+/890526 | 03:06 |
opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 03:15 |
opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 07:05 |
opendevreview | Ke Niu proposed openstack/devstack master: remove unicode prefix from code https://review.opendev.org/c/openstack/devstack/+/890887 | 07:41 |
opendevreview | Katarina Strenkova proposed openstack/tempest master: Skip failing tests affected by minimum password age https://review.opendev.org/c/openstack/tempest/+/890653 | 09:12 |
opendevreview | Katarina Strenkova proposed openstack/tempest master: Skip failing tests affected by minimum password age https://review.opendev.org/c/openstack/tempest/+/890653 | 09:14 |
opendevreview | Merged openstack/devstack stable/zed: Use RDO official CloudSIG mirrors for C9S deployments https://review.opendev.org/c/openstack/devstack/+/890221 | 10:53 |
opendevreview | Takashi Kajinami proposed openstack/tempest master: Do not retry immediately after server fault https://review.opendev.org/c/openstack/tempest/+/890911 | 12:28 |
*** haleyb_ is now known as haleyb | 12:51 |
opendevreview | Merged openstack/devstack stable/yoga: Use RDO official CloudSIG mirrors for C9S deployments https://review.opendev.org/c/openstack/devstack/+/890222 | 13:35 |
dansmith | sean-k-mooney: so I just saw another oom in a job with concurrency=4 and I noticed that there were 14 qemu processes running at the time | 13:54 |
dansmith | I wonder if in some cases we're waiting for DELETED before we exit and run another test, | 13:55 |
dansmith | but the instance is still running and trying to be destroyed in compute | 13:55 |
dansmith | such that we end up with a few stacked up like that | 13:55 |
sean-k-mooney | i thought we did not mark it as deleted in the api until it was actually deleted | 14:00 |
sean-k-mooney | could we be calling delete and expecting it to be synchronous in some places | 14:00 |
dansmith | um, well, I was just looking at that and thought we did once we fire off the rpc | 14:01 |
dansmith | but I was looking at local delete | 14:01 |
sean-k-mooney | i think in the local delete case we stay in deleting or something like that until it's actually deleted, then it disappears from the server list, but i haven't looked at that in a while | 14:02 |
dansmith | hmm, okay yeah we actually set it in compute manager after we've called shutdown | 14:02 |
dansmith | maybe tempest doesn't always wait for servers to be deleted when it asks, like during cleanup? | 14:03 |
* dansmith checks | 14:03 |
sean-k-mooney | what the hell is instance.disable_terminate, since when do we have a way to prevent deleting an instance other than lock | 14:04 |
dansmith | yeah I saw that :) | 14:06 |
dansmith | so it looks like we're pretty much always doing wait_for_server_termination | 14:06 |
dansmith | but that actually waits for the show to return 404 | 14:06 |
sean-k-mooney | we pass in task_state=task_states.DELETING here https://github.com/openstack/nova/blob/master/nova/compute/api.py#L2715 so on the first instance.save we should go to that | 14:07 |
dansmith | the 404 will happen once we've marked the instance as deleted | 14:07 |
dansmith | which happens after we set vm_state=DELETED so I guess that's okay | 14:07 |
sean-k-mooney | ya so that should not happen until the vm is actually deleted | 14:08 |
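A minimal sketch of the client-side wait being described here, assuming a hypothetical `client.show_server()` that raises `NotFound` once nova has marked the instance deleted; tempest's real `wait_for_server_termination` (mentioned above) does more, but the polling pattern is the point:

```python
import time


class NotFound(Exception):
    """Stand-in for the 404 the compute API returns once the
    instance has been marked deleted (vm_state=DELETED)."""


def wait_for_server_termination(client, server_id, timeout=300, interval=1):
    """Poll GET /servers/{id} until it 404s.

    While nova-compute is still tearing the guest down, the show
    call keeps succeeding (task_state=deleting), so the caller does
    not move on while a qemu process may still be running.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            server = client.show_server(server_id)
        except NotFound:
            return  # really gone; safe to start the next test
        if server.get('status') == 'ERROR':
            raise RuntimeError(f'server {server_id} errored during delete')
        time.sleep(interval)
    raise TimeoutError(f'server {server_id} not deleted within {timeout}s')
```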
dansmith | unless the oom dump is showing multiple kprocs per actual qemu or something, I'm not sure how we're ending up with 14 | 14:08 |
sean-k-mooney | could this be related to soft_delete | 14:08 |
sean-k-mooney | actually even with soft_delete | 14:08 |
sean-k-mooney | the vm will be stopped | 14:08 |
sean-k-mooney | so it should not affect OOM issues | 14:09 |
dansmith | yeah | 14:09 |
dansmith | it's definitely a lot better than it was at n=6 and we had OOMs before, but it'd be nice to get to the bottom of that | 14:11 |
sean-k-mooney | i wonder if it would make sense to see if the zswap helps | 14:14 |
sean-k-mooney | and moving to 8G by default | 14:14 |
dansmith | yeah, both probably | 14:15 |
sean-k-mooney | what job was OOMing | 14:15 |
dansmith | let's see what gmann thinks | 14:15 |
sean-k-mooney | ceph? | 14:15 |
dansmith | it was tempest-integrated-compute-rbac-old-defaults but it's basically just tempest-integrated-compute | 14:15 |
sean-k-mooney | ack i'll push up a dnm with just that enabled that depends on the zram and bumps the swap size, with concurrency 4? 6? | 14:16 |
dansmith | don't mess with the concurrency, just the swap, IMHO | 14:16 |
opendevreview | melanie witt proposed openstack/tempest master: DNM test rbd retype https://review.opendev.org/c/openstack/tempest/+/890360 | 15:17 |
gmann | dansmith: sean-k-mooney: you mean for zswap change or delete server in tests ? | 17:57 |
gmann | I have not looked into the zswap change yet | 17:57 |
dansmith | gmann: more swap for all jobs, like the ceph job uses 8G, and also maybe zswap to make more use of the 8G we give it | 17:58 |
dansmith | gmann: note I've only seen the one OOM, so this is not a huge problem like it was, I'm just saying.. we had some OOMs before too | 17:59 |
JayF | zswap has a side effect of reducing I/O when swapping, so OOM or not it'd be interesting to see | 18:00 |
sean-k-mooney | yep | 18:01 |
sean-k-mooney | i was pretty conservative by staying with lz4 in the patch, but we could push it further by changing the compressor to zstd | 18:01 |
sean-k-mooney | https://review.opendev.org/c/openstack/devstack/+/890693 is the patch | 18:02 |
dansmith | lz4 is good I think because it's fast | 18:02 |
sean-k-mooney | yep, fast but still provides a benefit | 18:02 |
dansmith | yeah | 18:02 |
sean-k-mooney | we have some slow hosts but we are not that disk-io starved | 18:03 |
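For reference, the kernel exposes the zswap knobs under /sys/module/zswap/parameters. A hedged sketch of the kind of setup being discussed (the real change is devstack shell in the review linked above; the values here are illustrative):

```python
from pathlib import Path

# Kernel zswap knobs; values are illustrative, the actual change is
# the devstack shell patch linked above.
ZSWAP = Path('/sys/module/zswap/parameters')


def enable_zswap(compressor='lz4', max_pool_percent=20):
    """Enable zswap with a fast compressor (needs root).

    lz4 trades compression ratio for speed; zstd would squeeze more
    pages into the pool at a higher CPU cost.
    """
    (ZSWAP / 'compressor').write_text(compressor)
    (ZSWAP / 'max_pool_percent').write_text(str(max_pool_percent))
    (ZSWAP / 'enabled').write_text('1')


if __name__ == '__main__':
    enable_zswap()
```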
gmann | ok, I think going with 8gb makes sense to try | 18:03 |
dansmith | we very much are in some workers, but it seems very spotty | 18:04 |
sean-k-mooney | i know we had issues in the past with some workers having less than 100 iops | 18:04 |
sean-k-mooney | to the disks | 18:04 |
sean-k-mooney | i have tried to mitigate that in the patch too | 18:04 |
sean-k-mooney | with this sysctl tuning https://review.opendev.org/c/openstack/devstack/+/890693/7/lib/host#54 | 18:05 |
sean-k-mooney | we could probably do more but i didn't want to do too much in one patch | 18:06 |
sean-k-mooney | i haven't actually tuned the filesystem/io scheduler in that patch but i did tweak the caching/swapping behavior via vm.dirty to defer bursty writes | 18:10 |
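A sketch of that kind of vm.dirty writeback tuning; the values below are illustrative examples, not the ones in the patch:

```python
import subprocess

# Writeback tuning for io-starved CI workers: let bursty writes
# accumulate in the page cache and flush them gradually in the
# background. Illustrative values, not the ones in the patch.
DIRTY_SYSCTLS = {
    'vm.dirty_background_ratio': '5',     # kick off background writeback early
    'vm.dirty_ratio': '30',               # let bursts buffer before writers block
    'vm.dirty_expire_centisecs': '6000',  # dirty pages may sit for up to 60s
    'vm.dirty_writeback_centisecs': '500',
}


def apply_sysctls(settings):
    """Apply each setting via sysctl -w (needs root)."""
    for key, value in settings.items():
        subprocess.run(['sysctl', '-w', f'{key}={value}'], check=True)


if __name__ == '__main__':
    apply_sysctls(DIRTY_SYSCTLS)
```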
gmann | lpiwowar: replied on this: check whether the container created by the cinder backup test can be deleted; if not, I commented one suggestion to fix that test https://review.opendev.org/c/openstack/tempest/+/890798 | 19:31 |
gmann | lpiwowar: please let me know if that makes sense to you | 19:32 |
opendevreview | Ghanshyam proposed openstack/tempest master: Improve rebuild tests in test_server_actions https://review.opendev.org/c/openstack/tempest/+/890821 | 23:44 |