14:00:28 <pranali> #startmeeting glance
14:00:28 <opendevmeet> Meeting started Thu Dec 7 14:00:28 2023 UTC and is due to finish in 60 minutes. The chair is pranali. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:28 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:28 <opendevmeet> The meeting name has been set to 'glance'
14:00:28 <pranali> #topic roll call
14:00:28 <pranali> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:33 <pranali> o/
14:00:33 <mrjoshi> o/
14:01:12 <pranali> ok, so assuming everyone is back here let's start :)
14:01:15 <rosmaita> o/
14:01:19 <pranali> #topic release/periodic jobs updates
14:01:43 <pranali> M2 is 5 weeks from now which will be spec freeze for us as well
14:02:10 <pranali> Periodic jobs are all green except intermittent TIME_OUTs on fips jobs
14:02:46 <pranali> moving to next
14:02:50 <pranali> #topic RBD deletion Issue
14:02:59 <pranali> #link https://bugs.launchpad.net/glance/+bug/2045769 - Image remains in active state even image data is deleted from the rbd store
14:03:59 <pranali> so i've observed this issue during the new add location api testing, when delete is attempted while hash calculation is ongoing after the image has been set to active
14:04:04 <rosmaita> you asked me to look at this yesterday but i forgot
14:04:18 <pranali> yeah np
14:05:09 <pranali> I just thought you must be having an idea on this bcz i have seen one of your old patches where in the commit msg it's mentioned that when store throws in use exception it deleted the data as well
14:05:26 <pranali> #link https://github.com/openstack/glance/commit/f267bd6cde0e2b3ef5d08ae7c91831e1c88ed990
14:05:33 <pranali> this one ^
14:06:04 <rosmaita> ok, i will claim that i co-authored the part that doesn't have a bug
14:06:46 <pranali> ohh ok
14:07:00 <rosmaita> (just kidding)
14:07:17 <pranali> :D
14:08:55 <pranali> I've tried to fix that in my current location import patch by marking the image as deleted after catching the exception
14:08:58 <pranali> #link https://review.opendev.org/c/openstack/glance/+/886749/31/glance/async_/flows/location_import.py#83
14:09:33 <pranali> but after noticing this same issue for image download as well i think it should be handled in the delete operation itself, right?
14:10:41 <rosmaita> sorry, i'm still trying to figure out the context (looking at the bug https://bugs.launchpad.net/glance/+bug/2045769 )
14:10:57 <rosmaita> with that bug, for step #1
14:11:37 <rosmaita> the image has been uploaded and gone active before you go to step #2, is that right?
14:11:44 <pranali> yes
14:12:33 <rosmaita> ok, and since that was a regular 'glance image-create', the hash would be done during the upload (not later? or have we changed that?)
14:13:20 <pranali> yeah i think so
14:13:56 <rosmaita> ok, what i'm getting at is that i don't think the hash computation is involved in this issue
14:14:15 <rosmaita> the error in step #2 i'm pretty sure is coming from the client
14:15:20 <pranali> hmm need to check that but download has got NotFound error since the data was lost
14:16:06 <pranali> #link https://paste.opendev.org/show/bg8hJ7kF7CYJVM4lZMe2/
14:16:06 <rosmaita> right
14:17:37 <pranali> I'm just not sure why store raises InUseByStore exception if it deletes the data
14:17:51 <rosmaita> right
14:18:03 <rosmaita> i wonder if the image cache has anything to do with this
14:18:25 <abhishekk> rosmaita, let me explain the issue here
14:18:27 <rosmaita> have you tried it without the cache (or do we always cache these days)
14:18:35 <rosmaita> please!
14:19:02 <abhishekk> I created the image A of 5 gb (hash is calculated) and image is active now
14:19:33 <abhishekk> I sent image download request, download started and in 2nd window I sent delete image request
14:20:02 <abhishekk> Now what happens is download interrupts as data is deleted but delete call fails by saying image in use
14:20:09 <abhishekk> and image remains in active state
14:20:26 <abhishekk> on 2nd download call we get error that image has no data
14:20:46 <abhishekk> Problem is store returns image is busy but it also deletes the data from the store
14:21:21 <abhishekk> And user gets delete call failed and he sees image is still active
14:21:53 <rosmaita> are all the locations gone at that point?
14:22:24 <abhishekk> (assume) There is only one location, store deletes the data and returns Busy exception to glance
14:22:42 <abhishekk> glance does not delete the location and keeps image in active state
14:22:51 <rosmaita> but does it leave the location on the image
14:22:54 <abhishekk> yes
14:22:57 <rosmaita> ok
14:23:36 <abhishekk> I think this is a serious issue
14:23:57 <abhishekk> There are two possibilities,
14:24:03 <abhishekk> regression in ceph
14:24:15 <abhishekk> or store code is wrong
14:24:30 <rosmaita> or both!
14:24:35 <abhishekk> :D
14:25:15 <abhishekk> my suggestion to pranali is deploy quincy and check this scenario
14:25:19 <rosmaita> ok, on the plus side, though, the user deleted the image and the data is gone, so they will be annoyed that it still shows active, but shouldn't be too annoyed because they deleted it
14:25:35 <abhishekk> correct
14:26:53 <rosmaita> do we have debug logs from the first image delete in this scenario?
14:27:30 <pranali> abhishekk, i've tried to change the ceph version in nova-ceph-multistore job but it's failing
14:27:33 <pranali> #link https://zuul.opendev.org/t/openstack/build/e62d4a18b87f4be1872c84c0560f61d3
14:28:03 <abhishekk> pranali, might have it
14:28:19 * pranali is checking
14:28:37 <abhishekk> find out the error, and try, because we need to rule out the possibilities
14:29:45 <abhishekk> this issue can be easily reproducible, so we can get logs again
14:29:53 <rosmaita> so basically, glance_store rbd driver asks ceph to delete the data, it gets back an is-busy-error, but ceph deletes the data anyway
14:30:07 <rosmaita> glance thinks that the delete failed, so it keeps the image in 'active'
14:30:25 <rosmaita> and doesn't remove the location where it thinks the data is
14:30:40 <abhishekk> correct
14:30:46 <rosmaita> but since ceph deleted the data, all downloads fail
14:30:59 <abhishekk> yes
14:31:26 <rosmaita> and this is with current master code, and which ceph?
14:32:23 <pranali> i think the latest ceph, Reef
14:32:44 <rosmaita> ok
14:32:58 <pranali> we have not yet confirmed whether it's there with previous version of ceph as well
14:33:20 <pranali> #link https://etherpad.opendev.org/p/image-delete-from-rbd-issue
14:33:31 <pranali> I've these logs atm
14:34:10 <rosmaita> ok, thanks
14:34:38 <abhishekk> etherpad is empty?
14:34:49 <pranali> :O
14:35:54 <pranali> I can see the logs in that etherpad
14:36:16 <abhishekk> it's empty for me
14:36:50 <mrjoshi> It's empty for me too
14:37:07 <abhishekk> \o
14:37:42 <abhishekk> \o/ voodoo
14:38:38 * croeland1 sees nothing
14:39:17 <pranali> hmm not sure why it's showing me now reconnecting continuously :/
14:39:25 * abhishekk it's magic, issue does not want us to solve it except pranali
14:39:39 <pranali> LOL
14:41:29 <pranali> ok, I think we should move ahead and can continue this discussion on glance channel
14:42:08 <pranali> #link https://paste.opendev.org/show/b8Lt6CgF5Sjd7Sd3g8SV/, tried to add the logs here
14:42:33 <rosmaita> ok, i can see that one
14:42:43 <abhishekk> I think your logs broke etherpad :P
14:43:59 <pranali> plz ignore the above link, #link https://paste.opendev.org/show/b8sruRYp2tcRcJ9sWwqB/
14:45:12 <abhishekk> I think you can explore above possibilities to isolate the problem
14:45:22 <abhishekk> let's move ahead,
14:45:26 <pranali> yeah
14:45:46 <pranali> #topic Specs
14:46:09 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899367 - Use Centralized database for cache operations
14:46:13 <abhishekk> also do one check, upload large image and during upload delete the image and see what happens
14:46:17 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/900267 - New API to restore image
14:46:31 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899804 - [Spec Lite] Deprecate location strategy
14:46:41 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899805 - [Spec Lite] Deprecate cachemanage middleware
14:46:51 <pranali> #link https://review.opendev.org/c/openstack/glance-specs/+/899857 - Caracal project priorities
14:47:17 <pranali> abhishekk, yeah that also should be tried, I will do that
14:48:39 <pranali> kindly please have a look at these specs, I've sent emails on the ML for the deprecation specs, so we can wait on those till the end of this month in case anyone has any objection
14:49:26 <abhishekk> please review centralized db spec, that is most important this cycle
14:49:55 <pranali> yes
14:50:29 <pranali> that's it from me
14:51:07 <pranali> let's move to open discussions
14:51:15 <pranali> #topic Open Discussions
14:51:16 <abhishekk> I don't have anything else
14:51:33 <mrjoshi> I would like to highlight
14:51:39 <rosmaita> ok, somebody please bug me tomorrow about reviewing specs
14:52:06 <pranali> rosmaita, ack :)
14:52:10 <abhishekk> haha
14:52:19 * abhishekk signing out
14:52:22 <abhishekk> thank you all
14:52:37 <pranali> Thanks everyone for joining !!
14:52:59 <pranali> #endmeeting
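
Note on the RBD deletion topic: the failure mode described in the meeting (store removes the data, then reports "in use", so glance keeps the image active with its location) can be mimicked with a toy model. This is only a sketch under stated assumptions, not Glance or glance_store code: `FakeRbdStore`, `Image`, `delete_image`, the local `InUseByStore` class and the `rbd://pool/image-a` location are all hypothetical stand-ins invented for illustration.

```python
class InUseByStore(Exception):
    """Hypothetical stand-in for the store's 'image in use' error."""


class FakeRbdStore:
    """Toy store that mimics the reported symptom, not the real rbd driver."""

    def __init__(self):
        self.data = {}

    def delete(self, location):
        # Reported symptom: the data is removed even though the call fails.
        self.data.pop(location, None)
        raise InUseByStore(location)


class Image:
    """Minimal image record: a status plus a list of locations."""

    def __init__(self, location):
        self.status = "active"
        self.locations = [location]


def delete_image(image, store):
    """Mimics the behavior described in the meeting (assumed, simplified)."""
    for location in list(image.locations):
        try:
            store.delete(location)
        except InUseByStore:
            # The delete is treated as failed: status and location are kept.
            return False
        image.locations.remove(location)
    image.status = "deleted"
    return True


store = FakeRbdStore()
store.data["rbd://pool/image-a"] = b"payload"
image = Image("rbd://pool/image-a")

deleted = delete_image(image, store)
print(deleted, image.status, image.locations, "rbd://pool/image-a" in store.data)
# prints: False active ['rbd://pool/image-a'] False
```

The final state matches what was reported: the delete call fails, the image is still active and still lists its location, but the data is gone, so any later download would fail. Whether the real cause is a ceph regression or the store code was left open in the meeting.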