Thursday, 2025-05-22

01:32 *** mhen_ is now known as mhen
14:00 <croelandt> o/
14:00 <croelandt> #startmeeting glance
14:00 <opendevmeet> Meeting started Thu May 22 14:00:39 2025 UTC and is due to finish in 60 minutes.  The chair is croelandt. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00 <opendevmeet> The meeting name has been set to 'glance'
14:00 <croelandt> #topic roll call
14:00 <croelandt> o/
14:00 <abhishekk_> o/
14:01 <mhen> o/
14:01 <croelandt> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:02 <croelandt> #topic Release/periodic job updates
14:02 <croelandt> glance-multistore-cinder-import-fip is still failing with the same nova-related packaging error as last week
14:02 <croelandt> do we know whether anybody is looking at this?
14:02 <abhishekk_> Not me
14:04 <croelandt> is there a channel where we can get help with stuff like this?
14:04 <croelandt> I'm wondering if maybe something is wrong with devstack-single-node-centos-9-stream
14:05 <abhishekk_> Maybe infra?
14:05 <abhishekk_> Or rosmaita
14:05 <croelandt> ok, I'll try #openstack-infra
14:05 <dansmith> I thought all the centos jobs were broken
14:05 <croelandt> oh
14:05 <dansmith> (it's their normal state I think, I'm surprised when they work)
14:05 <croelandt> has there been an email about this?
14:05 <croelandt> haha
14:05 <croelandt> ok, I see what you mean
14:05 <dansmith> I dunno, I heard someone else talking about it
14:06 <croelandt> ok, I'll dig around on #openstack-infra and try to figure it out
14:06 <croelandt> moving on
14:06 <croelandt> #topic cheroot to replace use of eventlet.wsgi
14:06 <croelandt> abhishekk_: ^
14:07 <abhishekk_> Yeah, do we want it?
14:07 <abhishekk_> I think someone suggested to us that we can use cheroot if we want
14:07 <croelandt> so how many ways do we have to deploy Glance right now?
14:07 <croelandt> eventlet (to be removed soon) and uwsgi?
14:07 <dansmith> um what
14:08 <abhishekk_> I think uwsgi and mod_wsgi?
14:08 <dansmith> what is "cheroot"?
14:08 <abhishekk_> Cheroot is a pure-Python HTTP server, used as the underlying server component for web frameworks like CherryPy.
14:09 <dansmith> ah, so not a direct replacement but yet another deployment mechanism, I see
14:09 <abhishekk_> It's WSGI-compliant
14:09 <abhishekk_> Yes
14:09 <croelandt> so why would we want that if we've got uwsgi and mod_wsgi?
14:10 <dansmith> right, and also, if we're wsgi compliant we don't really need to worry (much) about which wsgi container people use
14:10 <abhishekk_> If we still want to use a WSGI based server, I think
14:10 <croelandt> honestly I'm lost here
14:10 <dansmith> yeah
14:10 <croelandt> Aren't we using WSGI already?
14:10 <dansmith> yes
14:11 <croelandt> so why add yet another deployment mechanism?
14:11 <abhishekk_> i don't recall, but someone suggested that if we want then we can think of cheroot
14:11 <dansmith> well, if cheroot is just a wsgi container, then we're not really adding anything, and it should work in this one, and all the many others that are out there
14:12 <dansmith> but I don't know why we would focus on cheroot any more than gunicorn or mod_wsgi, etc etc
14:12 <abhishekk_> Maybe we can discuss this at the next PTG
14:12 <dansmith> okay but.. there shouldn't be anything we need to do
14:12 <dansmith> if someone wants to use cheroot with glance, they probably can
14:12 <croelandt> I'd expect that if WSGI is a standard, we can plug in anything
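
For reference, dansmith's point — any WSGI container can host glance-api without glance-side changes — can be illustrated with a minimal cheroot sketch. This is not a recommendation over uwsgi/mod_wsgi; the app factory path is an assumption taken from glance's shipped wsgi entry point and should be verified locally.

    # Minimal sketch: running a WSGI app (here, glance-api) under cheroot.
    # Assumes glance.common.wsgi_app.init_app() is the WSGI application
    # factory (the one the shipped wsgi script points at) -- verify locally.
    from cheroot import wsgi

    from glance.common import wsgi_app  # assumed entry point, see note above

    application = wsgi_app.init_app()

    server = wsgi.Server(('0.0.0.0', 9292), application)
    try:
        server.start()   # blocks and serves requests
    except KeyboardInterrupt:
        server.stop()
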
14:12 <abhishekk_> So right now can we remove the wsgi code which is using eventlet?
14:13 <croelandt> I would keep it for a few cycles in case things go south
14:13 <croelandt> so users can easily switch back to eventlet if needed
14:13 <dansmith> abhishekk_: in pure wsgi mode we're not really using eventlet
14:14 <abhishekk_> Right, but the code is still there, and as we are migrating our functional tests there are some files which are strictly using that
14:14 <abhishekk_> So should we not migrate those and keep it there in the repo?
14:14 <dansmith> right, is your suggestion to use cheroot for the functional test harness?
14:14 <dansmith> i.e. not in production, but for the test spinup?
14:15 <abhishekk_> Kind of
14:15 <croelandt> how does that work since we're migrating away from using a "real" server?
14:16 <abhishekk_> Ok, maybe during the next meeting i will come up with one example
14:16 <dansmith> maybe abhishekk_ is suggesting using cheroot to maintain some amount of standalone-like runtime?
14:17 <dansmith> otherwise I'm definitely confused about the overlap between eventlet, wsgi, and cheroot
14:17 <abhishekk_> Yeah
14:18 <croelandt> Now I'm confused whether this is related to deployment or functional testing
14:18 <abhishekk_> Ok, let's revisit this next week
14:18 <croelandt> also we're in the middle of migrating all the functional tests away from using a real server :D
14:18 <dansmith> I think I'd prefer not to have (and maintain) any sort of standalone thing in parallel to the way we expect glance to be run in production, unless it's very specifically more for just the test environment or something
14:18 <croelandt> yeah, maybe if you could put up a document on how we deploy, and what you want to change - something that we could review before the next meeting
14:19 <croelandt> dansmith: +1
14:19 <dansmith> but either way, many of the other api projects do functional testing without standing up a full server, and thus have a lot less complexity for doing this in general
14:19 <abhishekk_> Yes we are, and there are still some tests where we need an actual server unless we mock the usage of it
14:19 <croelandt> dansmith: I think we should also observe what stephenfin does for Devstack and make sure we make it easy for him
14:19 <croelandt> ok, maybe let's identify those tests
14:19 <croelandt> and see why we cannot do like all the other projects
14:19 <dansmith> yeah
14:19 <abhishekk_> Ack
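
For context on the "functional testing without standing up a full server" approach dansmith mentions, the usual in-process pattern looks roughly like the sketch below; the app factory and URL are illustrative assumptions, not glance's actual test harness.

    # Sketch of driving a WSGI app in-process (no listening socket),
    # the style other API projects use for functional tests.
    from webtest import TestApp

    from glance.common import wsgi_app  # assumed factory, as in the note above

    app = TestApp(wsgi_app.init_app())

    # webtest calls the app directly through the WSGI interface;
    # status='*' accepts any status code so we can inspect it.
    resp = app.get('/versions', status='*')
    print(resp.status, resp.body[:200])
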
14:20 <croelandt> ok, moving on
14:20 <croelandt> #topic RBD Delete issue while hash calculation is in progress
14:20 <croelandt> abhishekk_: still you :)
14:20 <abhishekk_> test_wsgi for example
14:21 <abhishekk_> What should we do about this case
14:21 <abhishekk_> Unless we solve this issue, the new location api is of no use to consumers of glance
14:22 <croelandt> ok so this is the issue with turning on do_secure_hash by default, right?
14:22 <abhishekk_> I am still inclined towards adding a new method in the glance_store rbd driver to move the image to trash, and then from the glance delete call we can check if any task is in progress for that image and then move it to trash
14:23 <dansmith> it is on by default, as it should be
14:23 <abhishekk_> Yes
14:23 <croelandt> Is this only an issue with RBD?
14:23 <abhishekk_> Yes
14:23 <dansmith> abhishekk: is this not a problem also with cinder?
14:24 <abhishekk_> I don't think we have encountered this with cinder as glance backend
14:24 <dansmith> but, shouldn't it be? If the hashing is running you can't delete and unmount right?
14:24 <abhishekk_> InUseByStore is only raised by rbd
14:25 <croelandt> we should probably check with Rajat/Brian
14:25 <dansmith> I understand that the exact problem and trace is rbd specific, I'm talking about the general problem of delete while hash is running
14:26 <abhishekk_> I am not sure about it, but as per my understanding only rbd restricts us from deleting the image (but actually it deletes it)
14:26 <croelandt> so if we send it to trash
14:26 <croelandt> will RBD allow us to do that while it's computing the hash?
14:26 <dansmith> but the cinder driver will try to unmount and delete the volume during an image delete right?
14:26 <abhishekk_> I will check with cinder as glance backend
14:26 <dansmith> I'm saying, let's spend a minute to find out if there are other backends that might also be affected, so we can make sure we fix it right
14:27 <abhishekk_> Yes, and it will delete it successfully imo
14:27 <abhishekk_> Also we have code at location import in the hash calculation task to catch not found and log a warning and continue rather than failing
14:27 <croelandt> so 1) the image goes to trash 2) the hash is still being computed 3) the image is "really" deleted?
14:28 <dansmith> croelandt: that's the expectation yes
14:28 <abhishekk> https://github.com/openstack/glance/blob/master/glance/async_/flows/location_import.py#L94
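
The code abhishekk links to is roughly of this shape (a paraphrased sketch with illustrative names, not the literal glance source): the hash task treats an image that disappears mid-calculation as a benign race with delete rather than a task failure.

    # Paraphrased sketch only -- function and variable names are illustrative.
    import logging

    from glance.common import exception

    LOG = logging.getLogger(__name__)

    def _calculate_hash(image_repo, image_id):
        try:
            image = image_repo.get(image_id)
            # ... stream the image data and update os_hash_value ...
        except exception.ImageNotFound:
            # image was deleted while hashing was in progress; warn and stop
            LOG.warning("Image %s no longer exists; aborting hash "
                        "calculation task.", image_id)
            return
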
14:28 <dansmith> croelandt: I think rosmaita told me that he thought that might not work, that we can only trash something that just has other clones but isn't actually open for reading
14:29 <dansmith> croelandt: I also feel like this must be an existing issue with download..
14:29 <croelandt> This might sound stupid, but when Glance deletes an image, can't it start by cancelling the hash calculation?
14:29 <dansmith> croelandt: if I go to download an image and then slow-walk the data stream so it takes an hour, can't I block delete of that image?
14:29 <dansmith> croelandt: I've already suggested that too - we can, but it takes more work
14:29 <abhishekk> AFAIK pranali has tested this with Octopus version of ceph and it does not have that issue
14:29 <croelandt> dansmith: oh really?
14:29 <dansmith> croelandt: just like the import-from-worker, we need to know which worker and call to *that* one to stop the task
14:30 <dansmith> croelandt: if we recorded that, then yes, we could call to cancel and that would be better, IMHO
14:30 <croelandt> how hard is it to keep a list of workers and tasks?
14:30 <dansmith> croelandt: we don't even need to do that,
14:30 <croelandt> isn't that something we can log at some point?
14:30 <dansmith> croelandt: like the distributed import, we just record *which* one is hashing an image, on the image
14:30 <dansmith> we already do this for distributed import
14:31 <dansmith> we record on the image "hey it's me $conf.self_ref_url who staged this image, let me know if you want me to import it"
14:31 <abhishekk> so we should add another property?
14:31 <dansmith> we could do the same for the hash task
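
To make that concrete, a purely hypothetical sketch of "record who is hashing on the image, so delete can ask that worker to cancel", modeled loosely on how distributed import records the staging worker's self-reference URL; the property name, URL and helpers below are invented for illustration.

    # Hypothetical sketch -- property name, URL and helpers are invented;
    # the real mechanism would mirror distributed import's use of the
    # worker self-reference URL recorded on the image.
    import requests
    from oslo_config import cfg

    CONF = cfg.CONF
    HASH_WORKER_PROP = 'os_glance_hash_worker'   # invented property name

    def record_hash_worker(image_repo, image):
        # remember which API worker owns the running hash task
        image.extra_properties[HASH_WORKER_PROP] = CONF.worker_self_reference_url
        image_repo.save(image)

    def cancel_remote_hash(image):
        worker = image.extra_properties.get(HASH_WORKER_PROP)
        if worker:
            # ask the owning worker to stop hashing before we delete
            requests.delete('%s/v2/internal/hash_tasks/%s'
                            % (worker, image.image_id), timeout=10)
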
14:31 <croelandt> I mean
14:32 <croelandt> it seems easier to say "hey, let's forget about the hash" than to say "let's compute this hash we'll never use because the user changed their mind and decided to delete the image"
14:32 <dansmith> this is why I want to know if cinder suffers here, because the cinder backend has no "trash" like rbd does
14:32 <croelandt> cancelling the hash calculation would be backend-agnostic
14:32 <dansmith> exactly
14:32 <croelandt> which is a huge plus imo
14:32 <croelandt> I don't want to find out in 2 years that "lol s3 changed something and now we have s3-specific code"
14:32 <dansmith> not likely for s3 but substitute anything else in there, agreed :D
14:33 <croelandt> yeah
14:33 <croelandt> or $Shiny_new_backend
14:33 <croelandt> or $ai_backend_that_hallucinates_your_data
14:33 <croelandt> abhishekk_: you say this used to work, does Ceph know about this?
14:33 <croelandt> Is this a change that appears in the release notes?
14:34 <abhishekk> I don't know about ceph, but as per pranali it was working with octopus
14:34 <croelandt> ok, I'll talk to Francesco maybe
14:34 <abhishekk> we can confirm that if we manage to add a job which deploys octopus for us?
14:35 <croelandt> hm, it would be nice to do that
14:35 <croelandt> can we easily specify a ceph version for our -ceph jobs?
14:35 <dansmith> I want to know about cinder :)
14:35 <abhishekk> I think I will check the dnm patches of pranali, she might have added the octopus related job at that time
14:35 <croelandt> dansmith: writing that down as well
14:36 <abhishekk> maybe rajat can help us with the cinder case,
14:36 <croelandt> but truly avoiding driver-specific code would be nice
14:36 <abhishekk> we have a cinder multistore job which does not break imo
14:36 <abhishekk> ack,
14:36 <croelandt> easier to understand & debug if all drivers behave similarly
14:36 <abhishekk> so as per dan we should add one more property to record the worker which is calculating the hash
14:37 <dansmith> the ceph job isn't actually failing either right?
14:37 <croelandt> not sure how hard that is and maybe we'll find it impractical but that sounds good
14:37 <dansmith> I mean, reliably
14:37 <croelandt> dansmith: the Tempest tests were failing, weren't they?
14:37 <croelandt> https://review.opendev.org/c/openstack/tempest/+/949595
14:37 <dansmith> they must have been flaky
14:37 <croelandt> #link https://review.opendev.org/c/openstack/tempest/+/949595
14:37 <abhishekk> the ceph nova job is failing intermittently for snapshot, backup tests
14:37 <croelandt> wasn't this patch a way to work around the issue?
14:38 <dansmith> right, so intermittent means the cinder one not being broken doesn't prove anything to me
14:38 <abhishekk> ack
14:38 <dansmith> looking at the cinder driver real quick,
14:39 <dansmith> I don't see how it could not be failing the same way
14:39 <dansmith> also remember that upstream we use basically zero-length images, so the hash can complete faster than we can delete a server...
14:39 <abhishekk> Maybe I will deploy glance with cinder and test this manually
14:39 <croelandt> so maybe we're lucky and the hash calculation is just fast enough?
14:39 <croelandt> yeah ok
14:40 <croelandt> in real life, it's also probably best not to compute the hash for no reason, right?
14:40 <abhishekk> is it ok to test with lvm as backend, or should we test with glance and cinder both using rbd?
14:40 <dansmith> you mean keep hashing it after delete?
14:40 <dansmith> croelandt: ^
14:40 <croelandt> dansmith: yeah, if a user deletes an image and we keep computing the hash, it's basically useless
14:40 <croelandt> and wasted resources
14:40 <dansmith> croelandt: it's also a DoS opportunity for them
14:41 <dansmith> abhishekk: I don't think it matters, but you need to (a) make sure the hashing is working for cinder and (b) that it's actually running when you try to delete the image
14:41 <croelandt> by doing this N times simultaneously?
14:41 <dansmith> croelandt: I can create and delete images all day long yeah
14:42 <abhishekk> AFAIK this new location API will use service credentials, so that's less possible?
14:42 <dansmith> croelandt: especially if I use web-download to source the material it doesn't even cost me bandwidth :)
14:42 <croelandt> yeah, and for one image the CPU load may not be high, but if you do that enough...
14:42 <croelandt> so yeah, maybe let's kill the hash computation task
14:42 <dansmith> abhishekk: right, but I can create lots of instances and snapshot them
14:42 <abhishekk> :D
14:42 <dansmith> abhishekk: I'm not saying it's an acute problem, but croelandt is right that it's a wasted resource
14:43 <abhishekk> agree
14:43 <croelandt> yeah, and we think it might not be triggered, but some smart ass is going to find a workaround
14:43 <croelandt> so might as well not do it
14:43 <abhishekk> OK so the action plan is
14:43 <abhishekk> 1. test with cinder
14:44 <abhishekk> 2. kill the hash computation during delete
14:44 <abhishekk> anything else?
14:44 <dansmith> I would still experiment with the rbd trash solution
14:44 <dansmith> rosmaita wasn't sure you could trash an active in-use image, but let's figure out if that's an option
14:45 <croelandt> also I'm worried this might change at some point in Ceph's life :)
14:45 <abhishekk> No, we will not trash active, we will mark it as deleted in glance and then move it to trash
14:45 <dansmith> #2 seems like a good idea to me, but it will be more work too and thus will take longer
14:45 <dansmith> croelandt: because it already has?
14:45 <croelandt> dansmith: I'm confused, didn't you like the driver-agnostic solution better?
14:46 <dansmith> croelandt: absolutely
14:46 <dansmith> croelandt: it's not just something abhishekk can write up in an afternoon (I suspect)
14:46 <abhishekk> maybe 3-4 afternoons?
14:46 <dansmith> and I just want to know what the trash solution is, although "marking as deleted in glance" is not really a thing AFAIK, so I'm curious about that
14:47 <abhishekk> that will be a patchy solution :/
14:47 <dansmith> abhishekk: did you mean do what I suggested before, delete in glance, ignore the InUseByStore and make the hash task do the delete when it finds out the image is deleted in glance?
14:48 <abhishekk> yeah
14:48 <dansmith> that's good too, although it does have a leaky hole
14:48 <dansmith> if we mark as deleted and then the hash task crashes or stops because an operator kills it,
14:48 <dansmith> then we leak the image on rbd with no real good way to clean up (AFAIK)
14:49 <abhishekk> I think rbd deletes the image, it does not keep it
14:49 <dansmith> why would it?
14:49 <abhishekk> it deletes the image and then raises the InUse exception :/
14:49 <dansmith> oh right, that bug
14:49 <abhishekk> yeah
14:49 <dansmith> that's the buggy behavior,
14:50 <dansmith> but we can't depend on that forever,
14:50 <dansmith> so if they fix that behavior we start leaking
14:50 <abhishekk> exactly
14:50 <abhishekk> So should we stick with cancelling the hash computation before the delete call goes to the actual backend?
14:51 <dansmith> so, I was hoping that we could make delete move the image to trash only (if rosmaita is wrong). if it's unused it will go away immediately, and if it is in use, it will go away when finished (hash cancel aside)
14:51 <dansmith> so _that_ is what I was saying we should still investigate :)
14:52 <abhishekk> OK, I will write a POC for that
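
For that POC, the RBD python bindings already expose trash_move(); a minimal standalone sketch follows (the ceph.conf path, pool and image names are placeholders, and whether the call succeeds while the hash task still has the image open is exactly what the POC needs to answer).

    # Standalone sketch of "move to trash instead of delete" using the
    # rbd python bindings. All values are placeholders.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('images')   # pool name is a placeholder
    try:
        # delay=0: eligible for purge as soon as nothing uses it anymore
        rbd.RBD().trash_move(ioctx, 'IMAGE_ID', delay=0)
    except rbd.ImageBusy:
        # the case rosmaita was unsure about: trash may be refused while
        # the image is still open for reading
        raise
    finally:
        ioctx.close()
        cluster.shutdown()
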
14:52 <abhishekk> So the modified action plan:
14:52 <abhishekk> 1. test with cinder
14:53 <abhishekk> 2. POC for moving the image to trash
14:53 <abhishekk> 3. hash calculation cancellation before deleting the image
14:53 <dansmith> yep
14:53 <dansmith> the distributed import should be a good example for the hash cancellation thing
14:53 <abhishekk> If 2 works then we can skip 3?
14:53 <croelandt> Ideally if 3 works can we skip 2? :)
14:53 <abhishekk> That depends on 1 I guess :P
14:54 <dansmith> idk, I think 3 is still worthwhile, but yeah.. only if 1 :)
14:54 <dansmith> croelandt: yeah, maybe we should just do 1, 3, and then 2 if 3 looks harder than we thought or something
14:54 <abhishekk> Ok, maybe next Thursday we will have more data to decide on
14:54 <croelandt> yeah
14:54 <croelandt> let's move on
14:55 <croelandt> #topic Specs
14:55 <croelandt> On Monday I will merge https://review.opendev.org/c/openstack/glance-specs/+/947423
14:55 <croelandt> unless I see a -1 there :)
14:55 <dansmith> ugh, I should go review that
14:55 <croelandt> #topic One easy patch per core dev
14:55 <croelandt> #link https://review.opendev.org/c/openstack/glance/+/936319
14:56 <dansmith> I'm just really behind and swamped
14:56 <croelandt> ^ this is a simple patch by Takashi to remove a bunch of duplicated hacking checks
14:56 <croelandt> dansmith: yeah :-(
14:56 <abhishekk> We have mhen here I think, do you want to discuss something?
14:56 <mhen> hi
14:56 <croelandt> I see there was a lengthy discussion about the spec
14:57 <croelandt> so feel free to go -1 if this has not been resolved
14:57 <croelandt> mhen: oh yeah, did you have something? I don't see a topic in the agenda
14:57 <mhen> no, just a quick update for now
14:57 <mhen> I'm working on the image encryption again, currently looking into image import cases
14:57 <mhen> glance-direct from staging seems to be working fine with no impact, will look into cross-backend import of encrypted images next
14:57 <abhishekk> cool
14:58 <abhishekk> let us know if you need anything
14:58 <mhen> btw, noticed that `openstack image import --method glance-direct` has pretty bad UX: if the Glance API returns any Conflict or BadRequest (in glance/glance/api/v2/images.py there are a lot of cases for this!), the client simply ignores it and shows a GET output of the image stuck in "uploading" state, which can be repeated indefinitely
14:58 <mhen> even with `--debug` it only briefly shows the 409 but not the message
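
Until the bug is filed, one way to see the error body the client hides is to call the image import API directly; a sketch with python-requests (the endpoint, token and image id are placeholders).

    # Sketch: hit the image import call directly to read the 409/400 body
    # that `openstack image import` swallows. All values are placeholders.
    import requests

    GLANCE = 'http://controller:9292'
    IMAGE_ID = '11111111-2222-3333-4444-555555555555'
    TOKEN = '<keystone token>'

    resp = requests.post(
        '%s/v2/images/%s/import' % (GLANCE, IMAGE_ID),
        json={'method': {'name': 'glance-direct'}},
        headers={'X-Auth-Token': TOKEN})

    print(resp.status_code)
    print(resp.text)   # the Conflict/BadRequest message the CLI does not show
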
14:58 <croelandt> interesting
14:58 <croelandt> can you file a bug for that?
14:58 <abhishekk> maybe we need to look into that
14:59 <mhen> yes, I will put it on my todo list to file a bug
14:59 <abhishekk> could you possibly check with glance image-create-via-import as well?
14:59 <mhen> will try, noted
14:59 <abhishekk> cool, thank you!!
15:00 <croelandt> #topic Open Discussion
15:00 <croelandt> I won't be there for the next 2 Thursdays
15:00 <croelandt> so it's up to all of y'all whether there will be meetings :)
15:01 <abhishekk> Ok, I will chair the next meeting
15:01 <abhishekk> we will decide for the next one later
15:02 <croelandt> perfect
15:02 <croelandt> It's been a long one
15:02 <croelandt> see you on #openstack-glance :)
15:02 <croelandt> Thanks for joining
15:02 <abhishekk> thank you!
15:03 <croelandt> #endmeeting
15:03 <opendevmeet> Meeting ended Thu May 22 15:03:33 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
15:03 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.html
15:03 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.txt
15:03 <opendevmeet> Log:            https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.log.html
