14:00:01 <abhishekk> #startmeeting glance
14:00:02 <opendevmeet> Meeting started Thu Jan 20 14:00:01 2022 UTC and is due to finish in 60 minutes.  The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:02 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:02 <opendevmeet> The meeting name has been set to 'glance'
14:00:04 <abhishekk> #topic roll call
14:00:09 <abhishekk> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:10 <abhishekk> o/
14:02:33 <abhishekk> Waiting for others to show
14:03:27 <rosmaita> o/
14:03:37 <rajiv> Hey
14:04:12 <abhishekk> cool, let's start
14:04:21 <abhishekk> maybe others will show up in between
14:04:37 <abhishekk> #topic release/periodic jobs update
14:04:46 <abhishekk> Milestone 3 is 6 weeks from now
14:04:53 <abhishekk> Possible targets for M3
14:05:03 <abhishekk> Cache API
14:05:04 <abhishekk> Stores detail API
14:05:04 <abhishekk> Unified limits usage API
14:05:04 <abhishekk> Append existing metadef tags
14:05:28 <abhishekk> So these are some of the important work items we are targeting for M3
14:05:51 <abhishekk> Will ping for reviews as and when they are up
14:05:57 <abhishekk> Non-Client library release - 5 weeks
14:06:20 <abhishekk> We do need to release glance-store by next week with V2 clone fix
14:06:35 <abhishekk> Periodic jobs all green
14:06:47 <abhishekk> #topic Cache API
14:07:14 <abhishekk> Cache API base patch is up for review; there are a couple of suggestions from dansmith, which I will fix
14:07:25 <abhishekk> Tempest coverage is in progress
14:07:35 <abhishekk> #link https://review.opendev.org/c/openstack/glance/+/825115
14:08:14 <abhishekk> I am planning to cover more cache APIs and scenarios; it will be open for reviews before next meeting
14:08:29 <abhishekk> #topic Devstack CephAdmin plugin
14:08:38 <abhishekk> #link http://lists.openstack.org/pipermail/openstack-discuss/2022-January/026778.html
14:09:24 <abhishekk> There will be efforts to create a new cephadm devstack plugin
14:09:45 <abhishekk> I will sync with Victoria for more information
14:10:19 <abhishekk> from a glance perspective, we need to make sure that this plugin can deploy ceph in single-store as well as multi-store configurations
14:10:40 <abhishekk> that's it from me for today
14:11:00 <abhishekk> rosmaita, do you have any inputs to add about cephadm plugin?
14:11:38 <rosmaita> no, i think sean mooney's response to vkmc's initial email is basically correct
14:12:09 <rosmaita> that is, do the work in the current devstack-plugin-ceph, don't make a new one
14:13:02 <abhishekk> yes, I went through it
14:13:56 <abhishekk> lets see how it goes
14:14:02 <abhishekk> #topic Open discussion
14:14:12 <abhishekk> I don't have anything to add
14:14:16 <jokke_> I guess it's just a matter of changing devstack to deploy with the new tooling Ceph introduced
14:14:27 <jokke_> not sure if there's anything else really to it for now
14:14:57 <abhishekk> likely
14:16:16 <abhishekk> anything else to discuss or we should wrap this up?
14:16:24 <jokke_> abhishekk: I saw you had revived the cache management API patch but didn't see any of the negative tests you held it back from merging for last cycle ... are we still expecting a new PS for that?
14:16:48 <abhishekk> jokke_, yes, I am working on those
14:17:04 <jokke_> I still have no idea what you meant by that, so I can't tell if I just missed them, but there was nothing added
14:17:36 <jokke_> kk
14:17:47 <abhishekk> Nope, I haven't pushed those yet as I am facing some issues
14:18:30 <abhishekk> Like one scenario for example
14:18:47 <abhishekk> create an image without any data (queued status)
14:19:07 <abhishekk> add that image to the cache queue, and it gets queued successfully
14:19:41 <abhishekk> So I am wondering whether we should add some validation there (e.g. non-active images should not be added to the cache queue)
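A minimal sketch of the validation abhishekk describes, assuming a hypothetical guard helper in the cache API handler; the exception class and attribute names are illustrative, not the actual Glance change:

    # Hypothetical guard in the cache "queue image" path (illustrative only).
    from glance.common import exception

    def _validate_image_for_caching(image):
        # Only images that already have data (status 'active') are worth
        # prefetching; queued/saving images have nothing to cache yet.
        if image.status != 'active':
            raise exception.Invalid(
                "Image %s is in status '%s'; only active images can be "
                "queued for caching" % (image.image_id, image.status))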
14:21:02 <jokke_> up to you ... I tried to get the API entry point moved last cycle and was very clear that I had no interest in changing the actual logic that happens in the caching module ... IMHO those things should be bugfixes and changed in their own patches
14:21:11 <jokke_> but you do as you wish with them
14:21:22 <abhishekk> ack
14:21:50 <abhishekk> sounds good
14:22:00 <abhishekk> anything else to add ?
14:22:12 <abhishekk> croelandt, ^
14:22:19 <jokke_> it makes sense to fix issues like that, and the bug I filed, asap for the new API endpoints so we're not breaking them right after release ;)
14:22:44 <jokke_> but IMO they are not related to moving the endpoints from the middleware to actual api
14:23:44 <croelandt> abhishekk: nope :D
14:24:02 <abhishekk> yes, they are not, but I am thinking of doing it at this point anyway
14:24:39 <abhishekk> croelandt, ack
14:24:45 <dansmith> o/
14:24:54 <abhishekk> hey
14:25:09 <abhishekk> we are done for today
14:25:16 <dansmith> sweet :)
14:25:21 <abhishekk> dansmith, do you have anything to add ?
14:25:25 <rajiv> hi, i would like to follow up on this bug : https://bugs.launchpad.net/python-swiftclient/+bug/1899495
14:25:26 <dansmith> nope
14:25:52 <abhishekk> I have the cache tempest base work up; if you have time, please have a look
14:26:19 <rosmaita> i must say, it is nice to see all this tempest work for glance happening
14:26:19 <dansmith> I saw it yesterday, yep
14:26:24 <dansmith> rosmaita: ++
14:27:19 <abhishekk> rajiv, unfortunately I didn't get time to go through it much
14:27:49 <jokke_> rajiv: I just read Tim's last comment on it
14:28:15 <jokke_> rajiv: have you actually confirmed the scenario that it happens when there are other images in the container?
14:28:25 <abhishekk> I just need input on whether we wait for the default cache periodic time (5 minutes) or set it to a shorter time in zuul.yaml
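Whichever interval is chosen, the tempest side can poll with a bounded timeout instead of sleeping for the full period; a rough sketch, where list_cached_images() is a placeholder for whatever client call the test ends up using:

    import time

    def wait_for_image_cached(client, image_id, timeout=300, interval=5):
        # Poll until image_id shows up as cached, or give up after timeout seconds.
        start = time.time()
        while time.time() - start < timeout:
            cached = client.list_cached_images()  # placeholder client call
            if image_id in cached:
                return
            time.sleep(interval)
        raise RuntimeError("Image %s was not cached within %ss" % (image_id, timeout))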
14:28:28 <rajiv> jokke_: yes, i replied to the comment, we have already implemented it but it didn't help
14:28:59 <jokke_> rajiv: ok, so the 500 is coming from swift, not from Glance?
14:30:31 <rajiv> since i have nginx in the middle in my containerised setup, i am unable to validate the source
14:31:35 <jokke_> kk, I'll try to give it another look and see if I can come up with something that could work based on Tim's comment
14:32:04 <rosmaita> rajiv: looking at your last comment in the bug, i think it's always possible to get a 5xx response even though we didn't list them in the api-ref
14:33:08 <rajiv> the 409 for sure comes from swift/client.py, but the 500 is from glance
14:33:40 <jokke_> Ok, that's what I was asking, so the 500 is coming from glance, swift correctly returns 409
14:33:50 <rajiv> 2022-01-20 02:02:01,536.536 23 INFO eventlet.wsgi.server [req-7cd63508-bed1-4c5f-b2cc-7f0e93907813 60d12fe738fe73aeea4219a0b3b9e55c8435b55455e7c9f144eece379d88f252 a2caa84313704823b7321b3fb0fc1763 - ec213443e8834473b579f7bea9e8c194 ec213443e8834473b579f7bea9e8c194] 10.236.203.62,100.65.1.96 - - [20/Jan/2022 02:02:01] "DELETE /v2/images/5f3c87fd-9a0e-4d61-88f9-301e3f01309d HTTP/1.1" 500 430 28.849376
14:34:10 <abhishekk> rajiv, any stack trace ?
14:34:45 <rajiv> abhishekk: not more than this :(
14:34:52 <abhishekk> ack
14:35:04 <rajiv> 2022-01-20 02:02:01,469.469 23 ERROR glance.common.wsgi [req-7cd63508-bed1-4c5f-b2cc-7f0e93907813 60d12fe738fe73aeea4219a0b3b9e55c8435b55455e7c9f144eece379d88f252 a2caa84313704823b7321b3fb0fc1763 - ec213443e8834473b579f7bea9e8c194 ec213443e8834473b579f7bea9e8c194] Caught error: Container DELETE failed: https://objectstore-3.eu-de-1.cloud.sap:443/v1/AUTH_a2caa84313704823b7321b3fb0fc1763/glance_5f3c87fd-9a0e-4d61-88f9-301e3f01309d 409 Conflict  [
14:35:27 <jokke_> so we do always expect to whack the container. I'm wondering if we really do store one image per container and it doesn't get properly deleted, or if there is a chance of having multiple images in that one container and it's really just cleanup we fail to catch
14:35:56 <rajiv> its 1 container per image
14:36:09 <rajiv> and segments of 200MB inside the container
14:36:14 <jokke_> I thought it should
14:36:27 <jokke_> so it's really a problem of the segments not getting deleted
14:36:51 <rajiv> yes, our custom code retries deletion 5 times in case of a conflict
14:37:12 <rajiv> and the wait time was increased from 1 to 5 seconds, but we had no luck
14:37:37 <rajiv> code : https://github.com/sapcc/glance_store/blob/stable/xena-m3/glance_store/_drivers/swift/store.py#L1617-L1639
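For reference, the retry pattern rajiv describes boils down to roughly the following, using python-swiftclient's Connection.delete_container; the retry count and wait are hard-coded here for illustration:

    import time

    from swiftclient import client as swift_client

    def delete_container_with_retry(connection, container, retries=5, wait=5):
        # Retry container deletion when swift returns 409 (segments still present).
        for _ in range(retries):
            try:
                connection.delete_container(container)
                return
            except swift_client.ClientException as e:
                if e.http_status != 409:
                    raise
                time.sleep(wait)  # give swift time to finish removing segments
        raise swift_client.ClientException(
            "Container %s still not empty after %d attempts" % (container, retries))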
14:38:47 <jokke_> I wonder what would happen if, instead of trying to delete the objects and then the container, we just asked swiftclient to delete the container recursively
14:39:05 <jokke_> and let it deal with it; would the result be the same
14:39:17 <rajiv> yes, i tried this as well but had same results
14:39:24 <jokke_> ok, thanks
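The recursive-delete approach being discussed would look roughly like this with python-swiftclient's service API (a sketch; whether it copes with the leftover segments any better is exactly the open question):

    from swiftclient.service import SwiftService

    def delete_container_recursively(container):
        # With no object list given, delete() removes every object in the
        # container and then the container itself, yielding per-item results.
        with SwiftService() as swift:
            for result in swift.delete(container=container):
                if not result['success']:
                    raise RuntimeError("Failed to delete %s: %s"
                                       % (result.get('object', container),
                                          result.get('error')))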
14:39:48 <rajiv> does the code need to be time.sleep(self.container_delete_timeout)? https://github.com/sapcc/glance_store/blob/stable/xena-m3/glance_store/_drivers/swift/store.py#L1637
14:39:55 <abhishekk> no
14:40:25 <abhishekk> https://github.com/sapcc/glance_store/blob/2cb722c22a085ee9cdf77d39e37d2955f48811c3/glance_store/_drivers/swift/store.py#L37
14:40:33 <rajiv> i see a similar spec in cinder, hence i asked : https://github.com/sapcc/glance_store/blob/stable/xena-m3/glance_store/_drivers/cinder.py#L659
14:40:34 <jokke_> let's try to get on the next swift weekly and see if they have any better ideas why this happens and how to get around it, now that we know it's for sure a 1:1 relation and it's really swift not deleting the segments
14:41:16 <rajiv> abhishekk: ack
14:41:23 <abhishekk> wait
14:42:19 <abhishekk> this is bad coding practice but it will work
14:42:53 <abhishekk> Let's move this to the glance IRC channel
14:43:35 <rajiv> sure
14:43:41 <abhishekk> thank you all
14:43:50 <abhishekk> #endmeeting