14:00:28 <pranali> #startmeeting glance
14:00:28 <opendevmeet> Meeting started Thu Dec  7 14:00:28 2023 UTC and is due to finish in 60 minutes.  The chair is pranali. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:28 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:28 <opendevmeet> The meeting name has been set to 'glance'
14:00:28 <pranali> #topic roll call
14:00:28 <pranali> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:33 <pranali> o/
14:00:33 <mrjoshi> o/
14:01:12 <pranali> ok, so assuming everyone is back here let's start :)
14:01:15 <rosmaita> o/
14:01:19 <pranali> #topic release/periodic jobs updates
14:01:43 <pranali> M2 is 5 weeks from now which will be spec freeze for us as well
14:02:10 <pranali> Periodic jobs are all green except intermittent TIME_OUTs on fips jobs
14:02:46 <pranali> moving to next
14:02:50 <pranali> #topic RBD deletion Issue
14:02:59 <pranali> #link https://bugs.launchpad.net/glance/+bug/2045769 - Image remains in active state even image data is deleted from the rbd store
14:03:59 <pranali> so i've observed this issue while testing the new add location api, when a delete is attempted while hash calculation is still ongoing after the image has been set to active
14:04:04 <rosmaita> you asked me to look at this yesterday but i forgot
14:04:18 <pranali> yeah np
14:05:09 <pranali> I just thought you might have an idea about this, bcz i have seen one of your old patches where the commit msg mentions that when the store throws an in-use exception it deletes the data as well
14:05:26 <pranali> #link https://github.com/openstack/glance/commit/f267bd6cde0e2b3ef5d08ae7c91831e1c88ed990
14:05:33 <pranali> this one ^
14:06:04 <rosmaita> ok, i will claim that i co-authored the part that doesn't have a bug
14:06:46 <pranali> ohh ok
14:07:00 <rosmaita> (just kidding)
14:07:17 <pranali> :D
14:08:55 <pranali> I've tried to fix that in my current location import patch by marking the image as deleted after catching the exception
14:08:58 <pranali> #link https://review.opendev.org/c/openstack/glance/+/886749/31/glance/async_/flows/location_import.py#83
14:09:33 <pranali> but after noticing this same issue for image download as well i think it should be handled in the delete operation itself, right?
14:10:41 <rosmaita> sorry, i'm still trying to figure out the context (looking at the bug https://bugs.launchpad.net/glance/+bug/2045769 )
14:10:57 <rosmaita> with that bug, for step #1
14:11:37 <rosmaita> the image has been uploaded and gone active before you go to step #2, is that right?
14:11:44 <pranali> yes
14:12:33 <rosmaita> ok, and since that was a regular 'glance image-create', the hash would be done during the upload (not later? or have we changed that?)
14:13:20 <pranali> yeah i think so
14:13:56 <rosmaita> ok, what i'm getting at is that i don't think the hash computation is involved in this issue
14:14:15 <rosmaita> the error in step #2 i'm pretty sure is coming from the client
14:15:20 <pranali> hmm need to check that but download has got NotFound error since the data was lost
14:16:06 <pranali> #link https://paste.opendev.org/show/bg8hJ7kF7CYJVM4lZMe2/
14:16:06 <rosmaita> right
14:17:37 <pranali> I'm just not sure why store raises InUseByStore exception if it deletes the data
14:17:51 <rosmaita> right
14:18:03 <rosmaita> i wonder if the image cache has anything to do with this
14:18:25 <abhishekk> rosmaita, let me explain the issue here
14:18:27 <rosmaita> have you tried it without the cache (or do we always cache these days)
14:18:35 <rosmaita> please!
14:19:02 <abhishekk> I created image A of 5 GB (hash is calculated) and the image is active now
14:19:33 <abhishekk> I sent image download request, download started and in 2nd window I sent delete image request
14:20:02 <abhishekk> Now what happens is the download is interrupted as the data is deleted, but the delete call fails saying the image is in use
14:20:09 <abhishekk> and the image remains in active state
14:20:26 <abhishekk> on 2nd download call we get error that image has no data
14:20:46 <abhishekk> Problem is the store reports the image as busy but it also deletes the data from the store
14:21:21 <abhishekk> And the user gets a failed delete call and sees the image is still active
14:21:53 <rosmaita> are all the locations gone at that point?
14:22:24 <abhishekk> (assume) There is only one location, store deletes the data and returns Busy exception to glance
14:22:42 <abhishekk> glance does not delete the location and keeps the image in active state
14:22:51 <rosmaita> but does it leave the location on the image
14:22:54 <abhishekk> yes
14:22:57 <rosmaita> ok
14:23:36 <abhishekk> I think this is a serious issue
14:23:57 <abhishekk> There are two possibilities,
14:24:03 <abhishekk> regression in ceph
14:24:15 <abhishekk> or store code is wrong
14:24:30 <rosmaita> or both!
14:24:35 <abhishekk> :D
14:25:15 <abhishekk> my suggestion to pranali is to deploy quincy and check this scenario
14:25:19 <rosmaita> ok, on the plus side, though, the user deleted the image and the data is gone, so they will be annoyed that it still shows active, but shouldn't be too annoyed because they deleted it
14:25:35 <abhishekk> correct
14:26:53 <rosmaita> do we have debug logs from the first image delete in this scenario?
14:27:30 <pranali> abhishekk, i've tried to change the ceph version in nova-ceph-multistore job but it's failing
14:27:33 <pranali> #link https://zuul.opendev.org/t/openstack/build/e62d4a18b87f4be1872c84c0560f61d3
14:28:03 <abhishekk> pranali, might have it
14:28:19 * pranali is checking
14:28:37 <abhishekk> find out the error, and try, because we need to rule out the possibilities
14:29:45 <abhishekk> this issue is easily reproducible, so we can get logs again
14:29:53 <rosmaita> so basically, glance_store rbd driver asks ceph to delete the data, it gets back an is-busy-error, but ceph deletes the data anyway
14:30:07 <rosmaita> glance thinks that the delete failed, so it keeps the image in 'active'
14:30:25 <rosmaita> and doesn't remove the location where it thinks the data is
14:30:40 <abhishekk> correct
14:30:46 <rosmaita> but since ceph deleted the data, all downloads fail
14:30:59 <abhishekk> yes
14:31:26 <rosmaita> and this is with current master code, and which ceph?
14:32:23 <pranali> i think the latest ceph, Reef
14:32:44 <rosmaita> ok
14:32:58 <pranali> we have not yet confirmed whether it's there with previous versions of ceph as well
14:33:20 <pranali> #link https://etherpad.opendev.org/p/image-delete-from-rbd-issue
14:33:31 <pranali> I have these logs atm
14:34:10 <rosmaita> ok, thanks
14:34:38 <abhishekk> etherpad is empty?
14:34:49 <pranali> :O
14:35:54 <pranali> I can see the logs in that etherpad
14:36:16 <abhishekk> it's empty for me
14:36:50 <mrjoshi> It's empty for me too
14:37:07 <abhishekk> \o
14:37:42 <abhishekk> \o/ voodoo
14:38:38 * croeland1 sees nothing
14:39:17 <pranali> hmm not sure why it's now showing me reconnecting continuously :/
14:39:25 * abhishekk it's magic, issue does not want us to solve it except pranali
14:39:39 <pranali> LOL
14:41:29 <pranali> ok, I think we should move ahead and can continue this discussion on glance channel
14:42:08 <pranali> #link https://paste.opendev.org/show/b8Lt6CgF5Sjd7Sd3g8SV/, tried to add the logs here
14:42:33 <rosmaita> ok, i can see that one
14:42:43 <abhishekk> I think your logs broke etherpad :P
14:43:59 <pranali> plz ignore the above link, #link https://paste.opendev.org/show/b8sruRYp2tcRcJ9sWwqB/
14:45:12 <abhishekk> I think you can explore above possibilities to isolate the problem
14:45:22 <abhishekk> let's move ahead,
14:45:26 <pranali> yeah
14:45:46 <pranali> #topic Specs
14:46:09 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899367 - Use Centralized database for cache operations
14:46:13 <abhishekk> also do one check, upload large image and during upload delete the image and see what happens
14:46:17 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/900267 - New API to restore image
14:46:31 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899804 - [Spec Lite] Deprecate location strategy
14:46:41 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899805 - [Spec Lite] Deprecate cachemanage middleware
14:46:51 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899857 - Caracal project priorities
14:47:17 <pranali> abhishekk, yeah that also should be tried, I will do that
14:48:39 <pranali> please have a look at these specs; I've sent emails about the deprecation specs to the ML, so we can wait on those till the end of this month in case anyone has any objections
14:49:26 <abhishekk> please review the centralized db spec, that is the most important one this cycle
14:49:55 <pranali> yes
14:50:29 <pranali> that's it from me
14:51:07 <pranali> let's move to open discussions
14:51:15 <pranali> #topic Open Discussions
14:51:16 <abhishekk> I don't have anything else
14:51:33 <mrjoshi> I would like to highlight
14:51:39 <rosmaita> ok, somebody please bug me tomorrow about reviewing specs
14:52:06 <pranali> rosmaita, ack :)
14:52:10 <abhishekk> haha
14:52:19 * abhishekk signing out
14:52:22 <abhishekk> thank you all
14:52:37 <pranali> Thanks everyone for joining !!
14:52:59 <pranali> #endmeeting