*** mhen_ is now known as mhen | 01:32 | |
croelandt | o/ | 14:00 |
croelandt | #startmeeting glance | 14:00 |
opendevmeet | Meeting started Thu May 22 14:00:39 2025 UTC and is due to finish in 60 minutes. The chair is croelandt. Information about MeetBot at http://wiki.debian.org/MeetBot. | 14:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 14:00 |
opendevmeet | The meeting name has been set to 'glance' | 14:00 |
croelandt | #topic roll call | 14:00 |
croelandt | o/ | 14:00 |
abhishekk_ | o/ | 14:00 |
mhen | o/ | 14:01 |
croelandt | #link https://etherpad.openstack.org/p/glance-team-meeting-agenda | 14:01 |
croelandt | #topic Release/periodic job updates | 14:02 |
croelandt | glance-multistore-cinder-import-fip is still failing with the same nova-related packaging error as last week | 14:02 |
croelandt | do we know whether anybody is looking at this? | 14:02 |
abhishekk_ | Not me | 14:02 |
croelandt | is there a channel where we can get help with stuff like this? | 14:04 |
croelandt | I'm wondering if maybe something is wrong with devstack-single-node-centos-9-stream | 14:04 |
abhishekk_ | Maybe infra? | 14:05 |
abhishekk_ | Or rosmaita | 14:05 |
croelandt | ok I'll try #openstack-infra | 14:05 |
dansmith | I thought all the centos jobs were broken | 14:05 |
croelandt | oh | 14:05 |
dansmith | (it's their normal state I think, I'm surprised when they work) | 14:05 |
croelandt | has there been an email about this? | 14:05 |
croelandt | haha | 14:05 |
croelandt | ok I see what you mean | 14:05 |
dansmith | I dunno I heard someone else talking about it | 14:05 |
croelandt | ok I'll dig around on #openstack-infra and try to figure it out | 14:06 |
croelandt | moving on | 14:06 |
croelandt | #topic cheroot to replace use of eventlet.wsgi | 14:06 |
croelandt | abhishekk_: ^ | 14:06 |
abhishekk_ | Yeah, do we want it? | 14:07 |
abhishekk_ | I think someone suggested that we could use cheroot if we want | 14:07 |
croelandt | so how many ways do we have to deploy Glance right now? | 14:07 |
croelandt | eventlet (to be removed soon) and uwsgi? | 14:07 |
dansmith | um what | 14:07 |
abhishekk_ | I think uwsgi and mod_wsgi? | 14:08 |
dansmith | what is "cheroot" ? | 14:08 |
abhishekk_ | Cheroot is a pure-Python HTTP server, used as the underlying server component for web frameworks like CherryPy. | 14:08 |
dansmith | ah, so not a direct replacement but yet another deployment mechanism I see | 14:09 |
abhishekk_ | It’s WSGI-compliant | 14:09 |
abhishekk_ | Yes | 14:09 |
croelandt | so why would we want that if we got uwsgi and mod_wsgi? | 14:09 |
dansmith | right, and also, if we're wsgi compliant we don't really need to worry (much) about which wsgi container people use | 14:10 |
abhishekk_ | If we still want to use a WSGI-based server, I think | 14:10 |
croelandt | honestly I'm lost here | 14:10 |
dansmith | yeah | 14:10 |
croelandt | Aren't we using WSGI already? | 14:10 |
dansmith | yes | 14:10 |
croelandt | so why add yet another deployment mechanism? | 14:11 |
abhishekk_ | I don't recall who, but someone suggested that if we want, we can consider cheroot | 14:11 |
dansmith | well, if cheroot is just a wsgi container, then we're not tied to any particular one and it should work in this one, and all the many others that are out there | 14:11 |
dansmith | but I don't know why we would focus on cheroot any more than gunicorn or mod_wsgi, etc etc | 14:12 |
abhishekk_ | Maybe we can discuss this at the next PTG | 14:12 |
dansmith | okay but.. there shouldn't be anything we need to do | 14:12 |
dansmith | if someone wants to use cheroot with glance, they probably can | 14:12 |
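For reference, a minimal sketch of what running a WSGI app under cheroot looks like; the placeholder app and port below are assumptions for illustration, not anything glance provides today.

```python
# Hypothetical sketch: any WSGI callable can be served by cheroot's pure-Python
# HTTP server; the trivial app below stands in for glance's real WSGI app.
from cheroot import wsgi


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'placeholder for the glance API\n']


server = wsgi.Server(('127.0.0.1', 9292), app)

if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()
```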
croelandt | I'd expect that if WSGI is a standard, we can plug anything | 14:12 |
abhishekk_ | So right now can we remove wsgi which is using eventlet? | 14:12 |
croelandt | I would keep it for a few cycles in case things go south | 14:13 |
croelandt | so users can easily switch back to eventlet if needed | 14:13 |
dansmith | abhishekk_: in pure wsgi mode we're not really using eventlet | 14:13 |
abhishekk_ | Right, but code is still there and as we are migrating our functional tests there are some files which are strictly using that | 14:14 |
abhishekk_ | So should we not migrate those and keep it there in repo? | 14:14 |
dansmith | right, is your suggestion to use cheroot for the functional test harness? | 14:14 |
dansmith | i.e. not in production, but for the test spinup? | 14:14 |
abhishekk_ | Kind of | 14:15 |
croelandt | how does that work since we're migrating away from using a "real" server? | 14:15 |
abhishekk_ | Ok, maybe during the next meeting I will come up with an example | 14:16 |
dansmith | maybe abhishekk_ is suggesting using cheroot to maintain some amount of standalone-like runtime? | 14:16 |
dansmith | otherwise I'm definitely confused about the overlap between eventlet, wsgi, and cheroot | 14:17 |
abhishekk_ | Yeah | 14:17 |
croelandt | Now I'm confused whether this is related to deployment or functional testing | 14:18 |
abhishekk_ | Ok, let's revisit this next week | 14:18 |
croelandt | also we're in the middle of migrating all the functional tests away from using a real server :D | 14:18 |
dansmith | I think I'd prefer not to have (and maintain) any sort of standalone thing in parallel to the way we expect glance to be run in production, unless it's very specifically more for just in the test environment or something | 14:18 |
croelandt | yeah maybe if you could put up a document on how we deploy, and what you want to change - something that we could review before the next meeting | 14:18 |
croelandt | dansmith: +1 | 14:19 |
dansmith | but either way, many of the other api projects do functional testing without standing up a full server, and thus have a lot less complexity for doing this in general | 14:19 |
abhishekk_ | Yes we are and there are still some tests where we need actual server unless we mock the usage of it | 14:19 |
croelandt | dansmith: I think we should also observe what stephenfin does for Devstack and make sure we make it easy for him | 14:19 |
croelandt | ok maybe let's identify those tests | 14:19 |
croelandt | and see why we cannot do like all the other projects | 14:19 |
dansmith | yeah | 14:19 |
abhishekk_ | Ack | 14:19 |
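As a rough illustration of functional testing without standing up a real server, something like the following exercises a WSGI app entirely in-process; the fake app and the /v2/images route are placeholders, not the actual glance test setup.

```python
# Hedged sketch: drive a WSGI callable in-process with webtest instead of
# spawning a real HTTP server; the fake app is a stand-in for the real
# glance application pipeline.
from webtest import TestApp


def fake_glance_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [b'{"images": []}']


def test_image_list():
    client = TestApp(fake_glance_app)
    resp = client.get('/v2/images')
    assert resp.status_int == 200
    assert resp.json == {'images': []}
```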
croelandt | ok moving on | 14:20 |
croelandt | #topic RBD Delete issue while hash calculation is in progress | 14:20 |
croelandt | abhishekk_: still you :) | 14:20 |
abhishekk_ | test_wsgi for example | 14:20 |
abhishekk_ | What should we do about this case? | 14:21 |
abhishekk_ | Unless we solve this issue, new location api is of no use to consumers of glance | 14:21 |
croelandt | ok so this is the issue with turning on do_secure_hash by default, right? | 14:22 |
abhishekk_ | I am still inclined towards adding a new method in the glance_store rbd driver to move the image to trash; then, from the glance delete call, we can check whether any task is in progress for that image and move it to trash | 14:22 |
dansmith | it is on by default, as it should be | 14:23 |
abhishekk_ | Yes | 14:23 |
croelandt | Is this only an issue with RBD? | 14:23 |
abhishekk_ | Yes | 14:23 |
dansmith | abhishekk: is this not a problem also with cinder? | 14:23 |
abhishekk_ | I don’t think we have encountered this with cinder as glance backend | 14:24 |
dansmith | but, shouldn't it be? If the hashing is running you can't delete and unmount right? | 14:24 |
abhishekk_ | InUseByStore is only raised by rbd | 14:24 |
croelandt | we should probably check with Rajat/Brian | 14:25 |
dansmith | I understand that the exact problem and trace is rbd specific, I'm talking about the general problem of delete while hash is running | 14:25 |
abhishekk_ | I am not sure about it, but as per my understanding only rbd restricts us from deleting the image (but actually it deletes it) | 14:26 |
croelandt | so if we send it to trash | 14:26 |
croelandt | will RBD allow us to do that while it's computing the hash? | 14:26 |
dansmith | but the cinder driver will try to unmount and delete the volume during an image delete right? | 14:26 |
abhishekk_ | I will check with cinder as glance backend | 14:26 |
dansmith | I'm saying, let's spend a minute to find out if there are other backends that might also be affected, so we can make sure we fix it right | 14:26 |
abhishekk_ | Yes and it will delete it successfully imo | 14:27 |
abhishekk_ | Also we have code in the location import hash calculation task to catch NotFound, log a warning, and continue rather than failing | 14:27 |
croelandt | so 1) the image goes to trash 2) the hash is still being computed 3) the image is "really" deleted? | 14:27 |
dansmith | croelandt: that's the expectation yes | 14:28 |
abhishekk | https://github.com/openstack/glance/blob/master/glance/async_/flows/location_import.py#L94 | 14:28 |
dansmith | croelandt: I think rosmaita told me that he thought that might not work, that we can only trash something that just has other clones but isn't actually open for reading | 14:28 |
dansmith | croelandt: I also feel like this must be an existing issue with download.. | 14:29 |
croelandt | This might sound stupid, but when Glance deletes an image, can't it start by cancelling the hash calculation? | 14:29 |
dansmith | croelandt: if I go to download an image and then slow-walk the data stream so it takes an hour, can't I block delete of that image? | 14:29 |
dansmith | croelandt: I've already suggested that too - we can, but it takes more work | 14:29 |
abhishekk | AFAIK pranali has tested this with Octopus version of ceph and it does not have that issue | 14:29 |
croelandt | dansmith: oh really? | 14:29 |
dansmith | croelandt: just like the import-from-worker, we need to know which worker and call to *that* one to stop the task | 14:29 |
dansmith | croelandt: if we recorded that, then yes, we could call to cancel and that would be better, IMHO | 14:30 |
croelandt | how hard is it to keep a list of workers and tasks? | 14:30 |
dansmith | croelandt: we don't even need to do that, | 14:30 |
croelandt | isn't that something we can log at some point? | 14:30 |
dansmith | croelandt: like the distributed import, we just record *which* one is hashing an image, on the image | 14:30 |
dansmith | we already do this for distributed import | 14:30 |
dansmith | we record on the image "hey it's me $conf.self_ref_url who staged this image, let me know if you want me to import it" | 14:31 |
abhishekk | so we should add another property? | 14:31 |
dansmith | we could do the same for the hash task | 14:31 |
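A very rough sketch of that idea, assuming glance's existing worker_self_reference_url option; the os_glance_hash_host property name and the proxy_client helper are invented here for illustration only.

```python
# Hypothetical sketch only: record which worker is computing the hash on the
# image itself, mirroring how distributed import records the staging host.
from oslo_config import cfg

CONF = cfg.CONF
# glance already registers this option for distributed import; registered
# here only so the sketch is self-contained.
CONF.register_opts([cfg.StrOpt('worker_self_reference_url',
                               default='http://127.0.0.1:9292')])


def start_hash_task(image, image_repo):
    # Remember who owns the hash task so a later delete can be routed to it.
    image.extra_properties['os_glance_hash_host'] = \
        CONF.worker_self_reference_url
    image_repo.save(image)


def cancel_hash_if_remote(image, proxy_client):
    owner = image.extra_properties.get('os_glance_hash_host')
    if owner and owner != CONF.worker_self_reference_url:
        # Ask the worker that is actually hashing to stop (proxy_client and
        # its cancel_hash call are assumptions, not an existing API).
        proxy_client.cancel_hash(owner, image.image_id)
```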
croelandt | I mean | 14:31 |
croelandt | that seems easier to say "hey, let's forget about the hash" than to say "let's compute this hash we'll never use because the user changed their mind and decided to delete the image" | 14:32 |
dansmith | this is why I want to know if cinder suffers here, because the cinder backend has no "trash" like rbd does | 14:32 |
croelandt | cancelling the hash calculation would be backend-agnostic | 14:32 |
dansmith | exactly | 14:32 |
croelandt | which is a huge plus imo | 14:32 |
croelandt | I don't want to find out in 2 years that "lol s3 changed something and now we have s3-specific code" | 14:32 |
dansmith | not likely for s3 but substitute anything else in there, agreed :D | 14:32 |
croelandt | yeah | 14:33 |
croelandt | or $Shiny_new_backend | 14:33 |
croelandt | or $ai_backend_that_hallucinates_your_data | 14:33 |
croelandt | abhishekk_: you say this used to work, does Ceph know about this? | 14:33 |
croelandt | Is this a change that appears in the release notes? | 14:33 |
abhishekk | I don't know about ceph, but as per pranali it was working with octopus | 14:34 |
croelandt | ok I'll talk to Francesco maybe | 14:34 |
abhishekk | we can confirm that if we manage to add a job which deploys octopus for us? | 14:34 |
croelandt | hm it would be nice to do that | 14:35 |
croelandt | can we easily specify a ceph version for our -ceph jobs? | 14:35 |
dansmith | I want to know about cinder :) | 14:35 |
abhishekk | I think I will check pranali's DNM patches; she might have added the octopus-related job at that time | 14:35 |
croelandt | dansmith: writing that down as well | 14:35 |
abhishekk | may be rajat can help us with cinder case, | 14:36 |
croelandt | but truly avoiding driver-specific code would be nice | 14:36 |
abhishekk | we have cinder multistore job which does not break imo | 14:36 |
abhishekk | ack, | 14:36 |
croelandt | easier to understand & debug if all drivers behave similarly | 14:36 |
abhishekk | so as per dan we should add one more property to record the worker which is calculating the hash | 14:36 |
dansmith | the ceph job isn't actually failing either right? | 14:37 |
croelandt | not sure how hard that is and maybe we'll find it impractical but that sounds good | 14:37 |
dansmith | I mean, reliably | 14:37 |
croelandt | dansmith: the Tempest tests were failing, weren't they? | 14:37 |
croelandt | https://review.opendev.org/c/openstack/tempest/+/949595 | 14:37 |
dansmith | they must have been flaky | 14:37 |
croelandt | #link https://review.opendev.org/c/openstack/tempest/+/949595 | 14:37 |
abhishekk | the ceph nova job is failing intermittently for snapshot and backup tests | 14:37 |
croelandt | wasn't this patch a way to workaround the issue? | 14:37 |
dansmith | right, so intermittent means the cinder one not being broken doesn't prove anything to me | 14:38 |
abhishekk | ack | 14:38 |
dansmith | looking at the cinder driver real quick, | 14:38 |
dansmith | I don't see how it could not be failing the same way | 14:39 |
dansmith | also remember that upstream we use basically zero-length images so the hash can complete faster than we can delete a server... | 14:39 |
abhishekk | May be I will deploy glance with cinder and test this manually | 14:39 |
croelandt | so maybe we're lucky and the hash calculation is just fast enough? | 14:39 |
croelandt | yeah ok | 14:39 |
croelandt | in real life, it's also probably best not to compute the hash for no reason, right? | 14:40 |
abhishekk | is it ok to test with lvm as backend or should I test with glance and cinder both using rbd? | 14:40 |
dansmith | you mean keep hashing it after delete? | 14:40 |
dansmith | croelandt: ^ | 14:40 |
croelandt | dansmith: yeah, if a user deletes an image and we keep computing the hash, it's basically useless | 14:40 |
croelandt | and wasted resources | 14:40 |
dansmith | croelandt: its also a DoS opportunity for them | 14:40 |
dansmith | abhishekk: I don't think it matters, but you need to (a) make sure the hashing is working for cinder and (b) that it's actually running when you try to delete the image | 14:41 |
croelandt | by doing this N times simultaneously? | 14:41 |
dansmith | croelandt: I can create and delete images all day long yeah | 14:41 |
abhishekk | AFAIK this new location API will use service credentials so that's less possible | 14:42 |
dansmith | croelandt: especially if I use web-download to source the material it doesn't even cost me bandwidth :) | 14:42 |
croelandt | yeah and for one image the CPU load may not be high but if you do that enough... | 14:42 |
croelandt | so yeah maybe let's kill the hash computation task | 14:42 |
dansmith | abhishekk: right but I can create lots of instances and snapshot them | 14:42 |
abhishekk | :D | 14:42 |
dansmith | abhishekk: I'm not saying it's an acute problem, but croelandt is right that it's wasted resource | 14:42 |
abhishekk | agree | 14:43 |
croelandt | yeah and we think it might not be triggered, but some smart ass is going to find a workaround | 14:43 |
croelandt | so might as well not do it | 14:43 |
abhishekk | OK so action plan is | 14:43 |
abhishekk | 1, test with cinder | 14:43 |
abhishekk | 2. kill the hash computation during delete | 14:44 |
abhishekk | anything else? | 14:44 |
dansmith | I would still experiment with the rbd trash solution | 14:44 |
dansmith | rosmaita wasn't sure you could trash an active in-use image, but lets figure out if that's an option | 14:44 |
croelandt | also I'm worried this might change at some point in Ceph's life :) | 14:45 |
abhishekk | No, we will not trash an active image; we will mark it as deleted in glance and then move it to trash | 14:45 |
dansmith | #2 seems like a good idea to me, but it will be more work too and thus will take longer | 14:45 |
dansmith | croelandt: because it already has? | 14:45 |
croelandt | dansmith: I'm confused, didn't you like the driver-agnostic solution better? | 14:45 |
dansmith | croelandt: absolutely | 14:46 |
dansmith | croelandt: it's not just something abhishekk can write up in an afternoon (I suspect) | 14:46 |
abhishekk | maybe 3-4 afternoons? | 14:46 |
dansmith | and I just want to know what the trash solution is, although "marking as deleted in glance" is not really a thing AFAIK, so I'm curious about that | 14:46 |
abhishekk | that will be a patchy solution :/ | 14:47 |
dansmith | abhishekk: did you mean do what I suggested before, delete in glance, ignore the InUseByStore and make the hash task delete in glance when it finds out the image is deleted in glance? | 14:47 |
abhishekk | yeah | 14:48 |
dansmith | that's good too although it does have a leaky hole | 14:48 |
dansmith | if we mark as deleted and then the hash task crashes or stops because an operator kills it, | 14:48 |
dansmith | then we leak the image on rbd with no real good way to clean up (AFAIK) | 14:48 |
abhishekk | I think rbd deletes the image it does not keep it | 14:49 |
dansmith | why would it? | 14:49 |
abhishekk | it deletes the image and then raises InUse exception :/ | 14:49 |
dansmith | oh right, that bug | 14:49 |
abhishekk | yeah | 14:49 |
dansmith | that's the buggy behavior, | 14:49 |
dansmith | but we can't depend on that forever, | 14:50 |
dansmith | so if they fix that behavior we start leaking | 14:50 |
abhishekk | exactly | 14:50 |
abhishekk | So should we stick with cancelling the hash computation before delete call goes to actual backend? | 14:50 |
dansmith | so, I was hoping that we could make delete move the image to trash only (if rosmaita is wrong). if it's unused it will go away immediately, and if it is in use, it will go away when the hash task finishes (hash cancel aside) | 14:51 |
dansmith | so _that_ is what I was saying we should still investigate :) | 14:51 |
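To make that option concrete, here is a minimal sketch of a trash-based delete using the rbd/rados Python bindings; the pool name and ceph.conf path are placeholders, and whether trash_move succeeds while a hash task still has the image open is exactly the thing to verify.

```python
# Hedged sketch: defer actual removal by moving the RBD image to the trash
# instead of calling remove() directly.
import rados
import rbd


def trash_image(pool, image_name, delay=0):
    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        with cluster.open_ioctx(pool) as ioctx:
            # trash_move() marks the image for deferred deletion; ceph purges
            # it later, so an in-flight reader would not need to block delete
            # (assuming ceph permits trashing an image that is open).
            rbd.RBD().trash_move(ioctx, image_name, delay)
```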
abhishekk | OK, I will write a POC for that | 14:52 |
abhishekk | So modified action plan | 14:52 |
abhishekk | 1, test with cinder | 14:52 |
abhishekk | 2, POC for moving image to trash | 14:53 |
abhishekk | 3, hash calculation cancellation before deleting the image | 14:53 |
dansmith | yep | 14:53 |
dansmith | the distributed import should be a good example for the hash cancellation thing | 14:53 |
abhishekk | IF 2 works then we can skip 3? | 14:53 |
croelandt | Ideally if 3 works can we skip 2? :) | 14:53 |
abhishekk | That depends on 1 I guess :P | 14:53 |
dansmith | idk, I think 3 is still worthwhile, but yeah.. only if 1 :) | 14:54 |
dansmith | croelandt: yeah, maybe we should just do 1, 3, and then 2 if 3 looks harder than we thought or something | 14:54 |
abhishekk | Ok, maybe next Thursday we will have more data to decide on | 14:54 |
croelandt | yeah | 14:54 |
croelandt | let's move on | 14:54 |
croelandt | #topic Specs | 14:55 |
croelandt | On Monday I will merge https://review.opendev.org/c/openstack/glance-specs/+/947423 | 14:55 |
croelandt | unless I see a -1 there :) | 14:55 |
dansmith | ugh, I should go review that | 14:55 |
croelandt | #topic One easy patch per core dev | 14:55 |
croelandt | #link https://review.opendev.org/c/openstack/glance/+/936319 | 14:55 |
dansmith | I'm just really behind and swamped | 14:56 |
croelandt | ^ this is a simple patch by Takashi to remove a bunch of duplicated hacking checks | 14:56 |
croelandt | dansmith: yeah :-( | 14:56 |
abhishekk | We have mhen here I think, do you want to discuss something? | 14:56 |
mhen | hi | 14:56 |
croelandt | I see there was a lengthy discussion about the spec | 14:56 |
croelandt | so feel free to go -1 if this has not been resolved | 14:57 |
croelandt | mhen: oh yeah, did you have something? I don't see a topic in the agenda | 14:57 |
mhen | no, just a quick update for now | 14:57 |
mhen | I'm working on the image encryption again, currently looking into image import cases | 14:57 |
mhen | glance-direct from staging seems to be working fine and no impact, will look into cross-backend import of encrypted images next | 14:57 |
abhishekk | cool | 14:57 |
abhishekk | let us know if you need anything | 14:58 |
mhen | btw, noticed that `openstack image import --method glance-direct` has pretty bad UX: if the Glance API returns any Conflict or BadRequest (in glance/glance/api/v2/images.py there are a lot of cases for this!), the client simply ignores it and shows a GET output of the image stuck in "uploading" state, which can be repeated indefinitely | 14:58 |
mhen | even with `--debug` it only briefly shows the 409 but not the message | 14:58 |
croelandt | interesting | 14:58 |
croelandt | can you file a bug for that? | 14:58 |
abhishekk | may be we need to look into that | 14:58 |
mhen | yes, I will put it on my todo list to file a bug | 14:59 |
abhishekk | could you possibly check with glance image-create-via-import as well? | 14:59 |
mhen | will try, noted | 14:59 |
abhishekk | cool, thank you!! | 14:59 |
croelandt | #topic Open Discussion | 15:00 |
croelandt | I won't be there for the next 2 Thursdays | 15:00 |
croelandt | so it's up to all of y'all whether there will be meetings :) | 15:00 |
abhishekk | Ok, I will chair the next meeting | 15:01 |
abhishekk | we will decide for the next one later | 15:01 |
croelandt | perfect | 15:02 |
croelandt | It's been a long one | 15:02 |
croelandt | see you on #openstack-glance :) | 15:02 |
croelandt | Thanks for joining | 15:02 |
abhishekk | thank you! | 15:02 |
croelandt | #endmeeting | 15:03 |
opendevmeet | Meeting ended Thu May 22 15:03:33 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:03 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.html | 15:03 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.txt | 15:03 |
opendevmeet | Log: https://meetings.opendev.org/meetings/glance/2025/glance.2025-05-22-14.00.log.html | 15:03 |