Friday, 2022-05-13

*** dviroel|afk is now known as dviroel|out  [01:26]
<opendevreview> Merged openstack/cinder master: mypy: set no_implicit_optional
<opendevreview> Merged openstack/cinder master: Don't destroy existing backup by mistake on import
*** dviroel|out is now known as dviroel  [11:00]
<opendevreview> Stephen Finucane proposed openstack/python-cinderclient master: Deprecate cinder CLI
<opendevreview> Stephen Finucane proposed openstack/python-cinderclient master: docs: Update docs to reflect deprecation status
<geguileo> rosmaita: whoami-rajat__ I see many commands missing in the table
<geguileo> There is no support for Active-Active, no support for dynamically getting or setting log levels  [12:57]
<geguileo> no way to get manageable lists  [12:57]
<geguileo> 2 quota commands missing  [12:57]
<rosmaita> geguileo: whoami-rajat__: guess we need to update the parity matrix doc  [12:57]
<geguileo> rosmaita: I mean, they are there, there is just no OSC command for them  [12:58]
<geguileo> no revert-to-snapshot support  [12:58]
<geguileo> we would need to go and make sure that we have 100% parity  [12:58]
<geguileo> and document that new features need to be added to OSC  [12:58]
<rosmaita> and i forgot the parity doc was stored on ethercalc, which has recently been decommissioned  [12:58]
<rosmaita> well, looks like we need a new parity doc, anyway  [12:59]
<geguileo> that page looks pretty good to me  [13:00]
<rosmaita> yes, but we need a gap document to keep track of what needs to be completed  [13:00]
<rosmaita> or however the PTL wants to do it, i'm flexible  [13:01]
<geguileo> we can probably copy/paste that table into a google sheet without any effort  [13:01]
<opendevreview> Stephen Finucane proposed openstack/cinder master: api-ref: Add docs for clusters
<opendevreview> Stephen Finucane proposed openstack/cinder master: Add Python 3.10 functional jobs
<opendevreview> Stephen Finucane proposed openstack/cinder master: WIP: tests: Add functional tests for cluster API
<stephenfin> rosmaita: I split the functional tests out of that patch. They're miles from complete and I'd really like to see _something_ merged here, particularly given we just added support for this stuff to OSC ^  [13:03]
<stephenfin> geguileo: Are these new commands? I updated that doc only last year
<rosmaita> stephenfin: yeah, i got sidetracked while reviewing the doc patch and didn't get back to it ... i hope i have notes somewhere about what i was worried about  [13:05]
<rosmaita> stephenfin: the ones geguileo is talking about are not very new; only new thing added in yoga was volume reimage  [13:06]
<stephenfin> Maybe I missed something but iirc I just ran 'OS_VOLUME_API_VERSION=3.X cinder --help' (whatever X was at the time) and reformatted the output so I could cross-reference  [13:06]
<stephenfin> Hmm, I must have missed something so  [13:07]
<geguileo> stephenfin: the commands are in the doc  [13:07]
<geguileo> stephenfin: what's missing is the OSC command equivalent  [13:07]
<geguileo> rosmaita had said that he thought there was parity between the clients, and I was disagreeing  [13:08]
<stephenfin> Ah, gotcha. I see...  [13:08]
<stephenfin> group-create-from-src,,Creates a group from a group snapshot or a source group.  [13:08]
<stephenfin> manageable-list,,Lists all manageable volumes.  [13:08]
<stephenfin> quota-delete,,Delete the quotas for a tenant.  [13:08]
<stephenfin> quota-usage,,Lists quota usage for a tenant.  [13:08]
<stephenfin> and about 6 others  [13:08]
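[Editor's note: the cross-reference stephenfin describes can be sketched as a set difference between cinder CLI commands and those with a known OSC equivalent. The command lists below are illustrative sample data taken from the lines pasted above, not the real parity matrix.]

```python
# Sample of cinder CLI commands (a few from the pasted doc rows above).
cinder_commands = {
    "create", "delete", "list", "revert-to-snapshot",
    "group-create-from-src", "manageable-list",
    "quota-delete", "quota-usage",
}

# Hypothetical cinder-command -> OSC-equivalent mapping; absence marks a gap.
osc_equivalents = {
    "create": "openstack volume create",
    "delete": "openstack volume delete",
    "list": "openstack volume list",
}

# Commands with no OSC equivalent yet — the parity gap.
gaps = sorted(cmd for cmd in cinder_commands if cmd not in osc_equivalents)
for cmd in gaps:
    print(f"no OSC equivalent for: cinder {cmd}")
```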
<stephenfin> geguileo: It's not parity, but it's _really_ close, especially now that the cluster stuff is merging. Minimal effort is needed to close the gap  [13:10]
<geguileo> stephenfin: yes, definitely we are a lot closer than I thought  [13:10]
<geguileo> I would say it justifies merging the cinder client deprecation patch  [13:11]
<geguileo> although I still don't like that OSC is slower when loading, though that's something we can work on  [13:11]
<stephenfin> The stuff that's remaining is mostly stuff I either don't understand (what's a "manageable" volume?) or stuff I think might be covered by other commands though I'm not sure (again because I don't understand)  [13:11]
<geguileo> stephenfin: manageable volumes and snapshots is the concept of how we can bring volumes that already exist into Cinder  [13:12]
<geguileo> or make cinder "forget" about them  [13:12]
<stephenfin> Yeah, that one's complicated. It's partially because we use entrypoints (though that's got faster thanks to importlib_metadata) and partially because of cmd2 (via cliff) which loads a load of other crap like the GTK stuff for clipboard support  [13:12]
<geguileo> so if you unmanage a volume then cinder doesn't delete the actual volume in the backend, but marks the row in the DB as deleted  [13:13]
<stephenfin> oh, TIL  [13:13]
<geguileo> and if you manage it, you tell cinder that there's a volume in the array that you want to start managing (be the owner of the volume)  [13:14]
<geguileo> and the listing is so that it's easier to know which volumes are in the array (usually the specific pool) that are not yet managed by Cinder  [13:14]
<geguileo> same thing for snapshots  [13:14]
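[Editor's note: a minimal toy model of the manage/unmanage semantics geguileo describes — unmanage is a soft delete of Cinder's DB row while the backend volume survives. Class and method names here are hypothetical illustrations, not Cinder's actual code.]

```python
# Toy model: the backend array keeps its volumes; Cinder only tracks
# (and soft-deletes) its own DB rows about them.
class Backend:
    def __init__(self):
        self.volumes = set()  # volumes that physically exist on the array

class CinderDB:
    def __init__(self):
        self.rows = {}  # volume name -> {"deleted": bool}

    def manage(self, backend, name):
        # Start owning an existing backend volume: add a DB row,
        # touch nothing on the array.
        assert name in backend.volumes
        self.rows[name] = {"deleted": False}

    def unmanage(self, backend, name):
        # "Forget" the volume: mark the row deleted,
        # leave the actual backend volume alone.
        self.rows[name]["deleted"] = True

    def manageable_list(self, backend):
        # Backend volumes not currently managed by Cinder.
        managed = {n for n, r in self.rows.items() if not r["deleted"]}
        return sorted(backend.volumes - managed)

backend = Backend()
backend.volumes = {"vol-a", "vol-b"}
db = CinderDB()
db.manage(backend, "vol-a")
print(db.manageable_list(backend))  # only vol-b is still unmanaged
db.unmanage(backend, "vol-a")
print("vol-a" in backend.volumes)   # the actual volume was not deleted
```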
<rosmaita> geguileo: take a look at , and hit 'recheck' if you feel lucky  [13:33]
<geguileo> rosmaita: I'm feeling lucky  [13:34]
<rosmaita> go for it, man!!!  [13:34]
<rosmaita> geguileo: lmk when you want to lay some CI job improvement ideas on me  [13:38]
<geguileo> rosmaita: we can start now  [13:41]
<geguileo> I believe we have at least 3 different problems:  [13:42]
<geguileo> 1- OOM kill of backup service  [13:42]
<geguileo> 2- timeout of backups  [13:42]
<geguileo> 3- no host found on scheduler  [13:42]
<geguileo> for #1 I believe there is something going on either with Python or we somehow are holding buffers in variables for too long (though I thought I had fixed that)  [13:43]
<geguileo> I wanted to deploy cinder-backup to do a memory profile, but I've been having trouble with the deployment  [13:43]
<geguileo> so we should probably wait a bit to do the analysis, because the backup driver is using swift, and the size of the chunks is 50MB  [13:45]
<geguileo> and there should be no reason why cinder-backup ends up using 4GB before it gets killed  [13:45]
<geguileo> so changing backup_swift_object_size config option probably won't help :-(  [13:46]
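[Editor's note: for reference, the chunking option under discussion lives in cinder.conf; the value shown is the 50MB default mentioned above. Raising it would not address the ~4GB RSS growth geguileo describes.]

```ini
# cinder.conf — swift backup driver chunking (defaults shown)
[DEFAULT]
backup_driver = cinder.backup.drivers.swift.SwiftBackupDriver
# 52428800 bytes = the 50MB chunk size mentioned in the discussion
backup_swift_object_size = 52428800
```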
<rosmaita> right, i think in the tests it's only doing one backup at a time  [13:46]
<rosmaita> or at most 5  [13:46]
<geguileo> I've seen a couple happening concurrently  [13:46]
<geguileo> because tempest is running multiple workers in parallel  [13:46]
<rosmaita> nothing that accounts for 4G though at 50MB chunks  [13:46]
<geguileo> so I have to properly investigate it  [13:47]
<geguileo> memory profiling, object relationship analysis, garbage collection status, etc  [13:47]
<rosmaita> or somebody does ... maybe we can convince the dev who posted the patch to change object size to help  [13:47]
<geguileo> oh, is there a patch to change the object size?  [13:47]
* geguileo didn't know  [13:48]
<rosmaita> i may be confusing cinder with glance, i think there's something posted  [13:48]
<rosmaita> i will look later  [13:48]
<geguileo> in any case, the only default that is big is backup_file_size, but afaik that was not used in the job I saw get OOM killed  [13:48]
<geguileo> for #2 I believe that one of the zuul jobs has changed a default from 300 to 196 or something like that  [13:49]
<geguileo> and it's crazy that it times out at 196 and then at 200 seconds the backup completes...  [13:50]
<rosmaita> yeah, that's a killer  [13:50]
<geguileo> the default for tempest build_timeout configuration option in the volume group is 300  [13:51]
<geguileo> so I don't know where that is being changed  [13:51]
<rosmaita> ok, i can look into that  [13:51]
<geguileo> then let me give you some additional info...  [13:51]
<rosmaita> you don't happen to have a link to that job?  [13:51]
<geguileo> on this patch
<geguileo> this job
<geguileo> this test cinder_tempest_plugin.api.volume.test_volume_backup.VolumesBackupsTest.test_volume_snapshot_backup  [13:52]
<geguileo> backup id 7bae1276-1488-442b-a521-9785c58c75fd  [13:52]
<geguileo> error: backup 7bae1276-1488-442b-a521-9785c58c75fd failed to reach available status (current creating) within the required time (196 s).  [13:52]
<geguileo> start of the request in the backup service
<geguileo> I looked into the test, the create_backup method, the client, and it uses client.build_timeout as the timeout  [13:54]
<geguileo> I looked at the default and it's 300, so somewhere this has been changed for our tempest jobs  [13:54]
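[Editor's note: the option in question is tempest's [volume] build_timeout, whose upstream default is the 300 seconds geguileo mentions; a job producing the 196 s failure would have to be overriding it somewhere in its devstack/zuul configuration, along the lines of:]

```ini
# tempest.conf — the [volume] group's build_timeout defaults to 300 seconds;
# the failing job is evidently running with a lower effective value (~196 s)
[volume]
build_timeout = 300
```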
<rosmaita> iirc, there may be another fixtures timeout that is kind of hidden  [13:55]
<rosmaita> anyway, this is good info to trace through the job definitions and look for something  [13:56]
<geguileo> but for backups I think I would either increase the timeout or make backup tests execute serially  [13:56]
<geguileo> because if they are executed concurrently they may be too slow  [13:56]
<rosmaita> i think i'd be in favor of increasing the timeout, would rather see us test in parallel  [13:57]
<geguileo> me too  [13:58]
<rosmaita> ok, we'll take that approach first  [13:58]
<geguileo> 196 seconds timeout, I just don't know where that is being set  [13:59]
<rosmaita> cool, i should be able to track that down  [14:00]
<rosmaita> (it's still early in my time zone, i am optimistic this morning)  [14:01]
<geguileo> that's a good way to do Friday  [14:01]
<geguileo> finally #3 scheduling host not found issues  [14:03]
<rosmaita> you made some suggestions about this but i have not followed up  [14:04]
<geguileo> yes, I would try increasing LVM's max_over_subscription_ratio to 40 or something like that  [14:06]
<geguileo> in the driver itself  [14:07]
<geguileo> and then in the defaults change backend_stats_polling_interval and periodic_interval both to something like 7 seconds  [14:07]
<rosmaita> which would be a good job to test this out?  [14:07]
<geguileo> I saw #1 and #2 happen on the same job  [14:08]
<rosmaita> i saw #3 in one of the ceph jobs, i thought  [14:10]
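[Editor's note: geguileo's suggested tuning for the "no host found" failures, as a cinder.conf sketch. The 40 and 7-second values are the ballpark numbers from the conversation, not recommended defaults, and the per-backend section name is deployment-specific (devstack's LVM backend is typically lvmdriver-1).]

```ini
# cinder.conf — sketch of the CI tuning suggested above
[DEFAULT]
# report backend stats / run periodic tasks more often so the
# scheduler has fresher capacity data
backend_stats_polling_interval = 7
periodic_interval = 7

[lvmdriver-1]
# allow heavier thin-provisioning oversubscription on the LVM backend
max_over_subscription_ratio = 40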
<raghavendrat> hi, it would be great if someone can have a look at
<raghavendrat> It has one +2. Thanks.  [14:19]
<opendevreview> Eric Harney proposed openstack/cinder master: Use modern type annotation format for collections
*** dviroel is now known as dviroel|lunch  [15:12]
*** dviroel|lunch is now known as dviroel  [16:00]
<opendevreview> Brian Rosmaita proposed openstack/cinder master: Increase swap size to 4GB
<opendevreview> Rico Lin proposed openstack/cinder master: Add image_conversion_disable config
<ricolin> rosmaita: updated accordingly, thanks for the nice wording :)  [17:01]
<opendevreview> Merged openstack/cinder master: Replace distutils with packaging in 3rd party drivers
*** dviroel is now known as dviroel|out  [21:57]

Generated by 2.17.3 by Marius Gedminas