Thursday, 2025-05-22

01:32 *** mhen_ is now known as mhen
14:00 <croelandt> o/
14:00 <croelandt> #startmeeting glance
14:00 <opendevmeet> Meeting started Thu May 22 14:00:39 2025 UTC and is due to finish in 60 minutes.  The chair is croelandt. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00 <opendevmeet> The meeting name has been set to 'glance'
14:00 <croelandt> #topic roll call
14:00 <croelandt> o/
14:00 <abhishekk_> o/
14:01 <mhen> o/
14:01 <croelandt> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:02 <croelandt> #topic Release/periodic job updates
14:02 <croelandt> glance-multistore-cinder-import-fip is still failing with the same nova-related packaging error as last week
14:02 <croelandt> do we know whether anybody is looking at this?
14:02 <abhishekk_> Not me
14:04 <croelandt> is there a channel where we can get help with stuff like this?
14:04 <croelandt> I'm wondering if maybe something is wrong with devstack-single-node-centos-9-stream
14:05 <abhishekk_> Maybe infra?
14:05 <abhishekk_> Or rosmaita
14:05 <croelandt> ok, I'll try #openstack-infra
14:05 <dansmith> I thought all the centos jobs were broken
14:05 <croelandt> oh
14:05 <dansmith> (it's their normal state I think, I'm surprised when they work)
14:05 <croelandt> has there been an email about this?
14:05 <croelandt> haha
14:05 <croelandt> ok, I see what you mean
14:05 <dansmith> I dunno, I heard someone else talking about it
14:06 <croelandt> ok, I'll dig around on #openstack-infra and try to figure it out
14:06 <croelandt> moving on
14:06 <croelandt> #topic cheroot to replace use of eventlet.wsgi
14:06 <croelandt> abhishekk_: ^
14:07 <abhishekk_> Yeah, do we want it?
14:07 <abhishekk_> I think someone suggested to us that we can use cheroot if we want
14:07 <croelandt> so how many ways do we have to deploy Glance right now?
14:07 <croelandt> eventlet (to be removed soon) and uwsgi?
14:07 <dansmith> um what
14:08 <abhishekk_> I think uwsgi and mod_wsgi?
14:08 <dansmith> what is "cheroot"?
14:08 <abhishekk_> Cheroot is a pure-Python HTTP server, used as the underlying server component for web frameworks like CherryPy.
14:09 <dansmith> ah, so not a direct replacement but yet another deployment mechanism, I see
14:09 <abhishekk_> It's WSGI-compliant
14:09 <abhishekk_> Yes
14:09 <croelandt> so why would we want that if we've got uwsgi and mod_wsgi?
14:10 <dansmith> right, and also, if we're wsgi compliant we don't really need to worry (much) about which wsgi container people use
14:10 <abhishekk_> If we still want to use a WSGI based server, I think
14:10 <croelandt> honestly I'm lost here
14:10 <dansmith> yeah
14:10 <croelandt> Aren't we using WSGI already?
14:10 <dansmith> yes
14:11 <croelandt> so why add yet another deployment mechanism?
14:11 <abhishekk_> i don't recall, but someone suggested that if we want then we can think of cheroot
14:11 <dansmith> well, if cheroot is just a wsgi container, then we're not really adding anything, and it should work in this one, and all the many others that are out there
14:12 <dansmith> but I don't know why we would focus on cheroot any more than gunicorn or mod_wsgi, etc etc
14:12 <abhishekk_> Maybe we can discuss this at the next PTG
14:12 <dansmith> okay but.. there shouldn't be anything we need to do
14:12 <dansmith> if someone wants to use cheroot with glance, they probably can
14:12 <croelandt> I'd expect that if WSGI is a standard, we can plug in anything
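
For reference, dansmith's point — any WSGI container can host glance-api without glance-side changes — can be illustrated with a minimal cheroot sketch. This is not a recommendation over uwsgi/mod_wsgi; the app factory path is an assumption taken from glance's shipped wsgi entry point and should be verified locally.

    # Minimal sketch: running a WSGI app (here, glance-api) under cheroot.
    # Assumes glance.common.wsgi_app.init_app() is the WSGI application
    # factory (the one the shipped wsgi script points at) -- verify locally.
    from cheroot import wsgi

    from glance.common import wsgi_app  # assumed entry point, see note above

    application = wsgi_app.init_app()

    server = wsgi.Server(('0.0.0.0', 9292), application)
    try:
        server.start()   # blocks and serves requests
    except KeyboardInterrupt:
        server.stop()
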
14:12 <abhishekk_> So right now can we remove the wsgi code which is using eventlet?
14:13 <croelandt> I would keep it for a few cycles in case things go south
14:13 <croelandt> so users can easily switch back to eventlet if needed
14:13 <dansmith> abhishekk_: in pure wsgi mode we're not really using eventlet
14:14 <abhishekk_> Right, but the code is still there, and as we are migrating our functional tests there are some files which are strictly using that
14:14 <abhishekk_> So should we not migrate those and keep it there in the repo?
14:14 <dansmith> right, is your suggestion to use cheroot for the functional test harness?
14:14 <dansmith> i.e. not in production, but for the test spinup?
14:15 <abhishekk_> Kind of
14:15 <croelandt> how does that work since we're migrating away from using a "real" server?
14:16 <abhishekk_> Ok, maybe during the next meeting i will come up with one example
14:16 <dansmith> maybe abhishekk_ is suggesting using cheroot to maintain some amount of standalone-like runtime?
14:17 <dansmith> otherwise I'm definitely confused about the overlap between eventlet, wsgi, and cheroot
14:17 <abhishekk_> Yeah
14:18 <croelandt> Now I'm confused whether this is related to deployment or functional testing
14:18 <abhishekk_> Ok, let's revisit this next week
14:18 <croelandt> also we're in the middle of migrating all the functional tests away from using a real server :D
14:18 <dansmith> I think I'd prefer not to have (and maintain) any sort of standalone thing in parallel to the way we expect glance to be run in production, unless it's very specifically more for just the test environment or something
14:18 <croelandt> yeah, maybe if you could put up a document on how we deploy, and what you want to change - something that we could review before the next meeting
14:19 <croelandt> dansmith: +1
14:19 <dansmith> but either way, many of the other api projects do functional testing without standing up a full server, and thus have a lot less complexity for doing this in general
14:19 <abhishekk_> Yes we are, and there are still some tests where we need an actual server unless we mock the usage of it
14:19 <croelandt> dansmith: I think we should also observe what stephenfin does for Devstack and make sure we make it easy for him
14:19 <croelandt> ok, maybe let's identify those tests
14:19 <croelandt> and see why we cannot do like all the other projects
14:19 <dansmith> yeah
14:19 <abhishekk_> Ack
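
For context on the "functional testing without standing up a full server" approach dansmith mentions, the usual in-process pattern looks roughly like the sketch below; the app factory and URL are illustrative assumptions, not glance's actual test harness.

    # Sketch of driving a WSGI app in-process (no listening socket),
    # the style other API projects use for functional tests.
    from webtest import TestApp

    from glance.common import wsgi_app  # assumed factory, as in the note above

    app = TestApp(wsgi_app.init_app())

    # webtest calls the app directly through the WSGI interface;
    # status='*' accepts any status code so we can inspect it.
    resp = app.get('/versions', status='*')
    print(resp.status, resp.body[:200])
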
14:20 <croelandt> ok, moving on
14:20 <croelandt> #topic RBD Delete issue while hash calculation is in progress
14:20 <croelandt> abhishekk_: still you :)
14:20 <abhishekk_> test_wsgi for example
14:21 <abhishekk_> What should we do about this case
14:21 <abhishekk_> Unless we solve this issue, the new location api is of no use to consumers of glance
14:22 <croelandt> ok so this is the issue with turning on do_secure_hash by default, right?
14:22 <abhishekk_> I am still inclined towards adding a new method in the glance_store rbd driver to move the image to trash, and then from the glance delete call we can check if any task is in progress for that image and then move it to trash
14:23 <dansmith> it is on by default, as it should be
14:23 <abhishekk_> Yes
14:23 <croelandt> Is this only an issue with RBD?
14:23 <abhishekk_> Yes
14:23 <dansmith> abhishekk: is this not a problem also with cinder?
14:24 <abhishekk_> I don't think we have encountered this with cinder as glance backend
14:24 <dansmith> but, shouldn't it be? If the hashing is running you can't delete and unmount right?
14:24 <abhishekk_> InUseByStore is only raised by rbd
14:25 <croelandt> we should probably check with Rajat/Brian
14:25 <dansmith> I understand that the exact problem and trace is rbd specific, I'm talking about the general problem of delete while hash is running
14:26 <abhishekk_> I am not sure about it, but as per my understanding only rbd restricts us from deleting the image (but actually it deletes it)
14:26 <croelandt> so if we send it to trash
14:26 <croelandt> will RBD allow us to do that while it's computing the hash?
14:26 <dansmith> but the cinder driver will try to unmount and delete the volume during an image delete right?
14:26 <abhishekk_> I will check with cinder as glance backend
14:26 <dansmith> I'm saying, let's spend a minute to find out if there are other backends that might also be affected, so we can make sure we fix it right
14:27 <abhishekk_> Yes, and it will delete it successfully imo
14:27 <abhishekk_> Also we have code at location import in the hash calculation task to catch not found and log a warning and continue rather than failing
14:27 <croelandt> so 1) the image goes to trash 2) the hash is still being computed 3) the image is "really" deleted?
14:28 <dansmith> croelandt: that's the expectation yes
14:28 <abhishekk> https://github.com/openstack/glance/blob/master/glance/async_/flows/location_import.py#L94
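
The code abhishekk links to is roughly of this shape (a paraphrased sketch with illustrative names, not the literal glance source): the hash task treats an image that disappears mid-calculation as a benign race with delete rather than a task failure.

    # Paraphrased sketch only -- function and variable names are illustrative.
    import logging

    from glance.common import exception

    LOG = logging.getLogger(__name__)

    def _calculate_hash(image_repo, image_id):
        try:
            image = image_repo.get(image_id)
            # ... stream the image data and update os_hash_value ...
        except exception.ImageNotFound:
            # image was deleted while hashing was in progress; warn and stop
            LOG.warning("Image %s no longer exists; aborting hash "
                        "calculation task.", image_id)
            return
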
14:28 <dansmith> croelandt: I think rosmaita told me that he thought that might not work, that we can only trash something that just has other clones but isn't actually open for reading
14:29 <dansmith> croelandt: I also feel like this must be an existing issue with download..
14:29 <croelandt> This might sound stupid, but when Glance deletes an image, can't it start by cancelling the hash calculation?
14:29 <dansmith> croelandt: if I go to download an image and then slow-walk the data stream so it takes an hour, can't I block delete of that image?
14:29 <dansmith> croelandt: I've already suggested that too - we can, but it takes more work
14:29 <abhishekk> AFAIK pranali has tested this with Octopus version of ceph and it does not have that issue
14:29 <croelandt> dansmith: oh really?
14:29 <dansmith> croelandt: just like the import-from-worker, we need to know which worker and call to *that* one to stop the task
14:30 <dansmith> croelandt: if we recorded that, then yes, we could call to cancel and that would be better, IMHO
14:30 <croelandt> how hard is it to keep a list of workers and tasks?
14:30 <dansmith> croelandt: we don't even need to do that,
14:30 <croelandt> isn't that something we can log at some point?
14:30 <dansmith> croelandt: like the distributed import, we just record *which* one is hashing an image, on the image
14:30 <dansmith> we already do this for distributed import
14:31 <dansmith> we record on the image "hey it's me $conf.self_ref_url who staged this image, let me know if you want me to import it"
14:31 <abhishekk> so we should add another property?
14:31 <dansmith> we could do the same for the hash task
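
To make that concrete, a purely hypothetical sketch of "record who is hashing on the image, so delete can ask that worker to cancel", modeled loosely on how distributed import records the staging worker's self-reference URL; the property name, URL and helpers below are invented for illustration.

    # Hypothetical sketch -- property name, URL and helpers are invented;
    # the real mechanism would mirror distributed import's use of the
    # worker self-reference URL recorded on the image.
    import requests
    from oslo_config import cfg

    CONF = cfg.CONF
    HASH_WORKER_PROP = 'os_glance_hash_worker'   # invented property name

    def record_hash_worker(image_repo, image):
        # remember which API worker owns the running hash task
        image.extra_properties[HASH_WORKER_PROP] = CONF.worker_self_reference_url
        image_repo.save(image)

    def cancel_remote_hash(image):
        worker = image.extra_properties.get(HASH_WORKER_PROP)
        if worker:
            # ask the owning worker to stop hashing before we delete
            requests.delete('%s/v2/internal/hash_tasks/%s'
                            % (worker, image.image_id), timeout=10)
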
14:31 <croelandt> I mean
14:32 <croelandt> it seems easier to say "hey, let's forget about the hash" than to say "let's compute this hash we'll never use because the user changed their mind and decided to delete the image"
14:32 <dansmith> this is why I want to know if cinder suffers here, because the cinder backend has no "trash" like rbd does
14:32 <croelandt> cancelling the hash calculation would be backend-agnostic
14:32 <dansmith> exactly
14:32 <croelandt> which is a huge plus imo
14:32 <croelandt> I don't want to find out in 2 years that "lol s3 changed something and now we have s3-specific code"
14:32 <dansmith> not likely for s3 but substitute anything else in there, agreed :D
14:33 <croelandt> yeah
14:33 <croelandt> or $Shiny_new_backend
14:33 <croelandt> or $ai_backend_that_hallucinates_your_data
14:33 <croelandt> abhishekk_: you say this used to work, does Ceph know about this?
14:33 <croelandt> Is this a change that appears in the release notes?
14:34 <abhishekk> I don't know about ceph, but as per pranali it was working with octopus
14:34 <croelandt> ok, I'll talk to Francesco maybe
14:34 <abhishekk> we can confirm that if we manage to add a job which deploys octopus for us?
14:35 <croelandt> hm, it would be nice to do that
14:35 <croelandt> can we easily specify a ceph version for our -ceph jobs?
14:35 <dansmith> I want to know about cinder :)
14:35 <abhishekk> I think I will check the dnm patches of pranali, she might have added the octopus related job at that time
14:35 <croelandt> dansmith: writing that down as well
14:36 <abhishekk> maybe rajat can help us with the cinder case,
14:36 <croelandt> but truly avoiding driver-specific code would be nice
14:36 <abhishekk> we have a cinder multistore job which does not break imo
14:36 <abhishekk> ack,
14:36 <croelandt> easier to understand & debug if all drivers behave similarly
14:36 <abhishekk> so as per dan we should add one more property to record the worker which is calculating the hash
14:37 <dansmith> the ceph job isn't actually failing either right?
14:37 <croelandt> not sure how hard that is and maybe we'll find it impractical but that sounds good
14:37 <dansmith> I mean, reliably
14:37 <croelandt> dansmith: the Tempest tests were failing, weren't they?
14:37 <croelandt> https://review.opendev.org/c/openstack/tempest/+/949595
14:37 <dansmith> they must have been flaky
14:37 <croelandt> #link https://review.opendev.org/c/openstack/tempest/+/949595
14:37 <abhishekk> the ceph nova job is failing intermittently for snapshot, backup tests
14:37 <croelandt> wasn't this patch a way to work around the issue?
14:38 <dansmith> right, so intermittent means the cinder one not being broken doesn't prove anything to me
14:38 <abhishekk> ack
14:38 <dansmith> looking at the cinder driver real quick,
14:39 <dansmith> I don't see how it could not be failing the same way
14:39 <dansmith> also remember that upstream we use basically zero-length images, so the hash can complete faster than we can delete a server...
14:39 <abhishekk> Maybe I will deploy glance with cinder and test this manually
14:39 <croelandt> so maybe we're lucky and the hash calculation is just fast enough?
14:39 <croelandt> yeah ok
14:40 <croelandt> in real life, it's also probably best not to compute the hash for no reason, right?
14:40 <abhishekk> is it ok to test with lvm as backend, or should we test with glance and cinder both using rbd?
14:40 <dansmith> you mean keep hashing it after delete?
14:40 <dansmith> croelandt: ^
14:40 <croelandt> dansmith: yeah, if a user deletes an image and we keep computing the hash, it's basically useless
14:40 <croelandt> and wasted resources
14:40 <dansmith> croelandt: it's also a DoS opportunity for them
14:41 <dansmith> abhishekk: I don't think it matters, but you need to (a) make sure the hashing is working for cinder and (b) that it's actually running when you try to delete the image
14:41 <croelandt> by doing this N times simultaneously?
14:41 <dansmith> croelandt: I can create and delete images all day long yeah
14:42 <abhishekk> AFAIK this new location API will use service credentials, so that's less possible?
14:42 <dansmith> croelandt: especially if I use web-download to source the material it doesn't even cost me bandwidth :)
14:42 <croelandt> yeah, and for one image the CPU load may not be high, but if you do that enough...
14:42 <croelandt> so yeah, maybe let's kill the hash computation task
14:42 <dansmith> abhishekk: right, but I can create lots of instances and snapshot them
14:42 <abhishekk> :D
14:42 <dansmith> abhishekk: I'm not saying it's an acute problem, but croelandt is right that it's a wasted resource
14:43 <abhishekk> agree
14:43 <croelandt> yeah, and we think it might not be triggered, but some smart ass is going to find a workaround
14:43 <croelandt> so might as well not do it
14:43 <abhishekk> OK so the action plan is
14:43 <abhishekk> 1. test with cinder
14:44 <abhishekk> 2. kill the hash computation during delete
14:44 <abhishekk> anything else?
14:44 <dansmith> I would still experiment with the rbd trash solution
14:44 <dansmith> rosmaita wasn't sure you could trash an active in-use image, but let's figure out if that's an option
14:45 <croelandt> also I'm worried this might change at some point in Ceph's life :)
14:45 <abhishekk> No, we will not trash active, we will mark it as deleted in glance and then move it to trash
14:45 <dansmith> #2 seems like a good idea to me, but it will be more work too and thus will take longer
14:45 <dansmith> croelandt: because it already has?
14:45 <croelandt> dansmith: I'm confused, didn't you like the driver-agnostic solution better?
14:46 <dansmith> croelandt: absolutely
14:46 <dansmith> croelandt: it's not just something abhishekk can write up in an afternoon (I suspect)
14:46 <abhishekk> maybe 3-4 afternoons?
14:46 <dansmith> and I just want to know what the trash solution is, although "marking as deleted in glance" is not really a thing AFAIK, so I'm curious about that
14:47 <abhishekk> that will be a patchy solution :/
14:47 <dansmith> abhishekk: did you mean do what I suggested before, delete in glance, ignore the InUseByStore and make the hash task do the delete when it finds out the image is deleted in glance?
14:48 <abhishekk> yeah
14:48 <dansmith> that's good too, although it does have a leaky hole
14:48 <dansmith> if we mark as deleted and then the hash task crashes or stops because an operator kills it,
14:48 <dansmith> then we leak the image on rbd with no real good way to clean up (AFAIK)
14:49 <abhishekk> I think rbd deletes the image, it does not keep it
14:49 <dansmith> why would it?
14:49 <abhishekk> it deletes the image and then raises the InUse exception :/
14:49 <dansmith> oh right, that bug
14:49 <abhishekk> yeah
14:49 <dansmith> that's the buggy behavior,
14:50 <dansmith> but we can't depend on that forever,
14:50 <dansmith> so if they fix that behavior we start leaking
14:50 <abhishekk> exactly
14:50 <abhishekk> So should we stick with cancelling the hash computation before the delete call goes to the actual backend?
14:51 <dansmith> so, I was hoping that we could make delete move the image to trash only (if rosmaita is wrong). if it's unused it will go away immediately, and if it is in use, it will go away when finished (hash cancel aside)
14:51 <dansmith> so _that_ is what I was saying we should still investigate :)
14:52 <abhishekk> OK, I will write a POC for that
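
For that POC, the RBD python bindings already expose trash_move(); a minimal standalone sketch follows (the ceph.conf path, pool and image names are placeholders, and whether the call succeeds while the hash task still has the image open is exactly what the POC needs to answer).

    # Standalone sketch of "move to trash instead of delete" using the
    # rbd python bindings. All values are placeholders.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('images')   # pool name is a placeholder
    try:
        # delay=0: eligible for purge as soon as nothing uses it anymore
        rbd.RBD().trash_move(ioctx, 'IMAGE_ID', delay=0)
    except rbd.ImageBusy:
        # the case rosmaita was unsure about: trash may be refused while
        # the image is still open for reading
        raise
    finally:
        ioctx.close()
        cluster.shutdown()
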
14:52 <abhishekk> So the modified action plan:
14:52 <abhishekk> 1. test with cinder
14:53 <abhishekk> 2. POC for moving the image to trash
14:53 <abhishekk> 3. hash calculation cancellation before deleting the image
14:53 <dansmith> yep
14:53 <dansmith> the distributed import should be a good example for the hash cancellation thing
14:53 <abhishekk> If 2 works then we can skip 3?
14:53 <croelandt> Ideally if 3 works can we skip 2? :)
14:53 <abhishekk> That depends on 1 I guess :P
14:54 <dansmith> idk, I think 3 is still worthwhile, but yeah.. only if 1 :)
14:54 <dansmith> croelandt: yeah, maybe we should just do 1, 3, and then 2 if 3 looks harder than we thought or something
14:54 <abhishekk> Ok, maybe next Thursday we will have more data to decide on
14:54 <croelandt> yeah
14:54 <croelandt> let's move on
14:55 <croelandt> #topic Specs
14:55 <croelandt> On Monday I will merge https://review.opendev.org/c/openstack/glance-specs/+/947423
14:55 <croelandt> unless I see a -1 there :)
14:55 <dansmith> ugh, I should go review that
14:55 <croelandt> #topic One easy patch per core dev
14:55 <croelandt> #link https://review.opendev.org/c/openstack/glance/+/936319
14:56 <dansmith> I'm just really behind and swamped
14:56 <croelandt> ^ this is a simple patch by Takashi to remove a bunch of duplicated hacking checks
14:56 <croelandt> dansmith: yeah :-(
14:56 <abhishekk> We have mhen here I think, do you want to discuss something?
14:56 <mhen> hi
14:56 <croelandt> I see there was a lengthy discussion about the spec
14:57 <croelandt> so feel free to go -1 if this has not been resolved
14:57 <croelandt> mhen: oh yeah, did you have something? I don't see a topic in the agenda
14:57 <mhen> no, just a quick update for now
14:57 <mhen> I'm working on the image encryption again, currently looking into image import cases
14:57 <mhen> glance-direct from staging seems to be working fine with no impact, will look into cross-backend import of encrypted images next
14:57 <abhishekk> cool
14:58 <abhishekk> let us know if you need anything
14:58 <mhen> btw, noticed that `openstack image import --method glance-direct` has pretty bad UX: if the Glance API returns any Conflict or BadRequest (in glance/glance/api/v2/images.py there are a lot of cases for this!), the client simply ignores it and shows a GET output of the image stuck in "uploading" state, which can be repeated indefinitely
14:58 <mhen> even with `--debug` it only briefly shows the 409 but not the message
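
Until the bug is filed, one way to see the error body the client hides is to call the image import API directly; a sketch with python-requests (the endpoint, token and image id are placeholders).

    # Sketch: hit the image import call directly to read the 409/400 body
    # that `openstack image import` swallows. All values are placeholders.
    import requests

    GLANCE = 'http://controller:9292'
    IMAGE_ID = '11111111-2222-3333-4444-555555555555'
    TOKEN = '<keystone token>'

    resp = requests.post(
        '%s/v2/images/%s/import' % (GLANCE, IMAGE_ID),
        json={'method': {'name': 'glance-direct'}},
        headers={'X-Auth-Token': TOKEN})

    print(resp.status_code)
    print(resp.text)   # the Conflict/BadRequest message the CLI does not show
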
14:58 <croelandt> interesting
14:58 <croelandt> can you file a bug for that?
14:58 <abhishekk> maybe we need to look into that
14:59 <mhen> yes, I will put it on my todo list to file a bug
14:59 <abhishekk> could you possibly check with glance image-create-via-import as well?
14:59 <mhen> will try, noted
14:59 <abhishekk> cool, thank you!!
15:00 <croelandt> #topic Open Discussion
15:00 <croelandt> I won't be there for the next 2 Thursdays
15:00 <croelandt> so it's up to all of y'all whether there will be meetings :)
15:01 <abhishekk> Ok, I will chair the next meeting
15:01 <abhishekk> we will decide for the next one later
15:02 <croelandt> perfect
15:02 <croelandt> It's been a long one
15:02 <croelandt> see you on #openstack-glance :)
15:02 <croelandt> Thanks for joining
15:02 <abhishekk> thank you!
15:03 <croelandt> #endmeeting
15:03 <opendevmeet> Meeting ended Thu May 22 15:03:33 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
15:03 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.html
15:03 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.txt
15:03 <opendevmeet> Log:            https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.log.html
