14:00:16 <abhishekk> #startmeeting glance
14:00:16 <opendevmeet> Meeting started Thu Jul 29 14:00:16 2021 UTC and is due to finish in 60 minutes.  The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 <opendevmeet> The meeting name has been set to 'glance'
14:00:20 <abhishekk> #topic roll call
14:00:25 <jokke_> o/
14:00:26 <abhishekk> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:28 <abhishekk> o/
14:00:37 <dansmith> o/
14:00:43 <croelandt> o/
14:01:13 <abhishekk> Cool, I guess we have enough of an audience today, let's start; rosmaita will join shortly
14:01:22 <rosmaita> o/
14:01:50 <abhishekk> #topic release/periodic jobs update
14:01:56 <amorin> hello
14:02:19 <abhishekk> M3 is 5 weeks away and the glance-store release is 3 weeks away
14:02:32 <abhishekk> We are good on store release front at the moment
14:03:24 <abhishekk> Periodic jobs: we started hitting timeouts again, at least one job is failing daily with a timeout issue
14:03:39 <abhishekk> Same goes with our patches in gate
14:03:53 <dansmith> where is the timeout?
14:04:03 <abhishekk> wait a minute
14:04:18 <abhishekk> https://zuul.opendev.org/t/openstack/build/e3e617817bce4d7b8fe332ee9a528610
14:04:33 <dansmith> oh,
14:05:07 <abhishekk> I haven't had much time to go through the logs, will do it after the meeting
14:05:12 <dansmith> hmm, yeah okay
14:05:24 <dansmith> interesting that it was doing some metadef namespace tests right then
14:05:54 <abhishekk> ok
14:06:31 <abhishekk> The important part is we have 5 weeks from now for the policy refactoring and cache API work
14:07:14 <abhishekk> Implementing project scope for metadefs is in good shape and does not have much work left
14:07:24 <abhishekk> Moving ahead
14:07:36 <abhishekk> #topic M3 targets
14:07:52 <abhishekk> Policy refactoring work
14:08:05 <abhishekk> You can get an entire overview of this work in one spreadsheet
14:08:12 <abhishekk> #link https://docs.google.com/spreadsheets/d/1SWBq0CsHw8jofHxmOG8QeZEX6veDE4eU0QHItOu8uQs/edit?pli=1#gid=73773117
14:08:18 <dansmith> (most impressive spreadsheet I've seen in a long time)
14:08:27 <abhishekk> Around 50% of the patches are up for review
14:08:39 <abhishekk> :D
14:09:10 <abhishekk> croelandt is already doing a great job for us on reviews, thank you for that
14:09:24 <dansmith> I don't really expect we're going to fully finish the refactor in xena, but I do think we're making more progress than I thought we would
14:10:07 <abhishekk> hmm, we will assess the situation next week again
14:10:53 <abhishekk> All I will say is: please review patches, there are plenty in the tree other than policy as well
14:11:39 <abhishekk> I am hoping once Lance is back the work will pick up speed
14:11:46 <abhishekk> Moving to Cache API
14:11:57 <abhishekk> jokke_, anything to update?
14:12:09 <abhishekk> I guess testing and documentation are still pending for it
14:12:32 <jokke_> I will hopefully get back to it (the tests) next week. Had a couple of days of PTO and have been busy with some other stuff
14:13:03 <abhishekk> ack, please let me know if there is anything to do
14:13:05 <jokke_> Which should be sorted this week ;)
14:13:34 <abhishekk> great
14:13:40 <abhishekk> Metadef project persona integration
14:13:52 <abhishekk> #link https://review.opendev.org/c/openstack/glance/+/798700/
14:14:12 <abhishekk> We have tempest-plugin tests up and under review for the same as well
14:14:56 <abhishekk> That's it from me for today
14:15:23 <abhishekk> Just to note, I have filed one bug against glanceclient
14:15:42 <abhishekk> Our client does not support showing member details
14:16:06 <abhishekk> Moving to Open discussion
14:16:11 <abhishekk> #topic Open discussion
14:16:27 <abhishekk> CI issue on stable/ussuri?
14:16:37 <abhishekk> #link https://review.opendev.org/c/openstack/glance/+/785552
14:17:06 <abhishekk> This has definitely started to fail; earlier it was passing
14:17:12 <croelandt> yes
14:17:17 <croelandt> the logs are weird
14:17:22 <croelandt> I'm not sure exactly what to do about it
14:17:29 <croelandt> nor how to find out the root cause of the issue :/
14:17:35 <croelandt> IOW: I'm stuck :D
14:17:52 <abhishekk> Me too
14:17:58 <jokke_> I'll have a quick look after the meeting
14:18:01 <abhishekk> #link glance-code-constants-check https://zuul.opendev.org/t/openstack/build/22fc30ffbb0b400f87f4261d7397fec0
14:18:09 <abhishekk> great, thank you
14:18:13 <croelandt> I'm not even sure what this job is, tbh
14:18:31 <abhishekk> I will explain it to you after the meeting
14:18:37 <dansmith> I would like to know too :)
14:18:37 <jokke_> it's sheet
14:18:49 <jokke_> :P
14:19:03 <croelandt> jokke_: always here to make things easy to understand :D
14:19:29 <abhishekk> It has something to do with database migration
14:20:05 <abhishekk> next on Open discussion is we have review request for this bug;
14:20:07 <alistarle> Hello guys, it's been a long time :)
14:20:09 <abhishekk> #link https://bugs.launchpad.net/glance/+bug/1938265
14:20:10 <alistarle> we just saw a bug with multi-store when taking a snapshot with the rbd driver (so using direct-snapshot): https://bugs.launchpad.net/glance/+bug/1938265
14:20:16 <abhishekk> alistarle, welcome back
14:20:41 <alistarle> Yup, I always come back when I find a new bug ><'
14:21:16 <abhishekk> :D
14:21:41 <abhishekk> I think as suggested by Dan we need some tests there
14:21:58 <alistarle> We are currently writing it, it should be submitted today :)
14:22:17 <alistarle> It seems it totally prevents nova from performing snapshots with this configuration
14:22:31 <jokke_> alistarle: thanks for the patch too, not just a bug :D
14:22:43 <jokke_> I was peeking into it earlier today
14:23:03 <alistarle> It's not actually my patch, but amorin's :)
14:23:15 <alistarle> But yeah, cool to fix stuff
14:23:22 <amorin> we did it together, but you're the boss for the tests for sure :)
14:23:57 <amorin> do you have any clue what kind of tests to write for this?
14:24:01 <abhishekk> glad to see that multistore is finally in use
14:24:24 <amorin> FYI, we hit this bug in Stein downstream
14:24:51 <amorin> but code base has not moved so much
14:25:07 <abhishekk> no I guess
14:25:29 <abhishekk> what happens when rbd is not used in nova?
14:25:48 <amorin> then nova is not doing any location update
14:25:53 <amorin> it's done differently AFAIK
14:25:58 <dansmith> yup
14:26:04 <amorin> only direct snapshot is triggering this
14:26:07 <alistarle> It calls the standard glance workflow, so POST /images
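For context, here is a rough sketch of the two snapshot paths just described, seen from the glance API side, using python-glanceclient. The endpoint, token, image names and RBD location URL are purely illustrative, and nova drives this through its own internal glance client rather than code like this:

```python
# Sketch only -- illustrates the two snapshot flows discussed above, not nova's code.
from glanceclient import Client

glance = Client('2', endpoint='http://controller/image', token='<token>')  # assumed auth

# Non-rbd path: the standard workflow, image bytes are uploaded to glance.
image = glance.images.create(name='instance-snap', disk_format='qcow2',
                             container_format='bare')
with open('/tmp/snap.qcow2', 'rb') as data:
    glance.images.upload(image['id'], data)

# rbd "direct snapshot" path: nova clones the disk inside ceph itself, then only
# registers the resulting RBD location on a queued image -- no data upload.
image = glance.images.create(name='instance-snap', disk_format='raw',
                             container_format='bare')
glance.images.add_location(
    image['id'],
    'rbd://<fsid>/<pool>/%s/snap' % image['id'],  # illustrative location URL
    {})  # with multi-store enabled, this location-add is roughly the call that hits the 400
```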
14:26:24 <alistarle> And what about the solution, does it seem suitable to you?
14:26:45 <jokke_> I wonder if that issue actually persists in Train, or since we implemented the lazy update, abhishekk. That should at least in theory solve it, or do we fail the actual add location call?
14:26:52 <alistarle> I think guessing the store is acceptable, as we are already doing it for the lazy update
14:27:12 <amorin> I tested this against a small victoria deployment in my dev environment
14:27:17 <amorin> and I triggered the same bug
14:27:24 <alistarle> No, because the lazy update is called for get and list calls, not for the patch adding a new location
14:27:46 <jokke_> kk, so it's the actual location add validation that fails. Interesting
14:27:58 <alistarle> So we are still hitting this enigmatic "Invalid location" 400 error
14:28:05 <abhishekk> jokke_, lazy update is for list or get calls, once we have the actual image in action
14:28:06 <jokke_> croelandt: add to your list :D
14:28:44 <alistarle> And something interesting is we are allowing "backend=None" in glance, but in glance_store it is a mandatory parameter
14:28:48 <jokke_> abhishekk: yeah, but we do a get right after the add, so if it allowed adding it, we would have updated it right after, I think
14:29:13 <alistarle> And as far as I can see, there is no way "backend=None" can produce a workable result, it will always end in a KeyError somewhere
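A minimal stand-alone illustration of the failure mode alistarle describes; this is not glance_store's actual location code, and the store names and lookup are made up for the sketch:

```python
# Hypothetical store resolution, mimicking a multi-store lookup keyed by store id.
enabled_backends = {'ceph': 'rbd', 'cheap': 'file'}  # store id -> driver (made up)

def resolve_store(location_metadata):
    # the add-location request from nova carries no store hint, so this is None ...
    backend = location_metadata.get('store')
    # ... and a plain lookup with None can only end in a KeyError, which the API
    # layer then surfaces as the opaque 400 "Invalid location" mentioned above
    return enabled_backends[backend]

resolve_store({})  # KeyError: None
```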
14:29:23 <abhishekk> I think we have one job running for nova-ceph-glance
14:29:38 <abhishekk> Does that mean we have no test covering this workflow?
14:30:08 <amorin> :(
14:30:32 <abhishekk> alistarle, could you verify the nova-ceph-multistore job and check whether we run this scenario or not?
14:30:49 <jokke_> abhishekk: also means that nova doesn't test snapshotting with Ceph either
14:31:18 <dansmith> unless the ceph-multistore job isn't running full tempest, then we certainly are
14:31:26 <abhishekk> Otherwise, for a start I guess we can add this scenario as a reproducer and then consider this fix
14:32:50 <abhishekk> I do see 923 tests running in that job
14:32:58 <abhishekk> and 80 skips
14:33:09 <dansmith> hmm, looks like maybe the ceph-plugin job we inherit from might be missing some of full tempest
14:33:21 <dansmith> we do volume snap tests, but might be skipping the compute ones
14:33:30 <dansmith> not us, but that plugin's job def, which we inherit from
14:34:26 <jokke_> it's likely the storage-scoped parent job. A few cycles back we stopped running full tempest 20 times on all possible scenarios
14:34:49 <abhishekk> hmm
14:35:32 <abhishekk> So I guess we need to enable those
14:35:58 <dansmith> https://github.com/openstack/devstack-plugin-ceph/blob/master/.zuul.yaml#L31
14:36:07 <dansmith> looks like that's focused mostly on volume tests
14:36:17 <jokke_> or we probably should finally default to multistore configs in gate so the tests would actually catch multi-store issues
14:36:37 <jokke_> as we deprecated the single store configs quite a while ago
14:36:39 <dansmith> we've been running that in nova as our ceph job for a long time, before multistore :/
14:37:29 <abhishekk> jokke_, we don't have support to configure multiple stores other than the file store in devstack
14:37:50 <dansmith> ...and ceph?
14:38:06 <abhishekk> I have patches up for swift and ceph but didn't get time to relook at those
14:38:24 <dansmith> I'm not sure what you mean.. the multistore job is file+ceph
14:38:57 <abhishekk> #link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/741801
14:39:35 <dansmith> right, but you know that my multistore job is ceph+file, right?
14:39:37 <abhishekk> you have done it via yaml file by adding some local configs there
14:39:42 <dansmith> right
14:39:45 <abhishekk> yep
14:39:50 <dansmith> you just mean there's no easy flag in devstack, okay
14:39:55 <abhishekk> yeah
14:40:19 <abhishekk> I was responding to jokke's comment about defaulting to multistore in the gate
14:40:52 <abhishekk> this for swift + file, https://review.opendev.org/c/openstack/devstack/+/741654
14:40:58 <alistarle> That said, I think this bug can also be valid with a multi-store configuration that has only a single backend
14:41:03 <dansmith> ack, okay
14:41:09 <alistarle> No need to have two backends actually configured
14:41:32 <abhishekk> alistarle, right,
14:42:16 <abhishekk> So as I said, and as Dan rightly commented on the patch, we need a reproducer first and then the fix
14:42:29 <jokke_> alistarle: correct, that was my point. By now we should not be testing the old single store configs by default but actually configuring the store(s), even a single one, with multi-store enabled
14:43:04 <dansmith> okay, hold up
14:43:04 <dansmith> test_create_image_from_paused_server[id-71bcb732-0261-11e7-9086-fa163e4fa634]
14:43:04 <dansmith> pass
14:43:19 <jokke_> as the old way of configuring should have been removed already, but obviously that's years away if all gating is still relying on it
14:43:26 <dansmith> that's from our multistore job, so it should be creating images from servers
14:43:47 <dansmith> and maybe better:
14:43:52 <dansmith> test_create_delete_image[id-3731d080-d4c5-4872-b41a-64d0d0021314] pass
14:44:30 <dansmith> so we should dig into a job run and see if those are really working, and if so, figure out why
14:45:23 <abhishekk> dansmith, those might be using the POST call and not copy-on-write?
14:45:45 <alistarle> Yes, but it fails only with the rbd backend, so with direct-snapshot enabled
14:45:54 <alistarle> Maybe this job is using the file backend?
14:45:55 <dansmith> abhishekk: in the ceph job they should be rbd-backed, which means it'll do the cow snapshot
14:46:23 <dansmith> if these tests are actually on file backend, then the whole premise of the job is broken
14:46:34 <abhishekk> may be
14:47:05 <abhishekk> I think default backend is ceph
14:47:14 <abhishekk> as defined in job
14:47:44 <dansmith> that's the whole point of the job yeah
14:48:21 <jokke_> Well Nova is clearly not doing direct snapshots in that job 'cause it would have been broken
14:48:33 <alistarle> Yes, I can double check, but even in the case of the default backend, this line will fail https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111 if backend = None
14:48:48 <jokke_> so even if glance is using ceph as the backend, nova might still be uploading the image
14:48:51 <abhishekk> the default is the file backend
14:48:53 <alistarle> And the backend comes directly from the metadata of the "add_location" call
14:49:08 <dansmith> jokke_: all of our customers use rbd and direct snapshots.. surely if this was completely broken someone would have mentioned it right?
14:49:42 <jokke_> dansmith: OSP 16.1+ DCN only.
14:49:46 <amorin> are they using multi store?
14:50:06 <dansmith> amorin: I thought the assertion is that this is broken in non-multistore as well?
14:50:25 <dansmith> jokke_: you mean multistore is only 16.1+, right?
14:50:32 <jokke_> dansmith: but that's why I told croelandt to add it to his list of work to do, as we need to fix this for downstream customers too
14:50:46 <abhishekk> https://github.com/openstack/nova/blob/master/.zuul.yaml#L480
14:51:11 <amorin> good point, I haven't checked in non-multistore
14:51:19 <jokke_> dansmith: only 16.1+ DCN (distributed/edge) with storage. Otherwise we still config the old way
14:51:49 <amorin> it may not be broken because we are not calling the same function
14:51:56 <amorin> we are calling https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L55 AFAIK
14:51:57 <dansmith> ah, alistarle said "single multistore configuration" above.. I took that to mean "non-multistore" but I see
14:51:59 <abhishekk> dansmith, what does images_rbd_glance_store_name do?
14:52:14 <dansmith> abhishekk: it's just for the auto-copy from another store
14:52:27 <abhishekk> ack
14:52:45 <abhishekk> so the job is running for file store I guess
14:53:21 <abhishekk> Last 7 minutes
14:54:02 <dansmith> I dunno what "running for file store" means
14:54:19 <abhishekk> dansmith, default backend in the job is file backend
14:54:44 <abhishekk> so any request coming from nova to glance will be using file backend
14:55:02 <dansmith> abhishekk: nova will always copy it to rbd first when booting, so it should be on rbd when snapshot happens
14:55:29 <abhishekk> because of that flag I just mentioned earlier ?
14:55:41 <jokke_> dansmith: but nova doesn't do direct snapshot if it had to copy it into ceph
14:55:53 <dansmith> abhishekk: yes
14:55:57 <dansmith> jokke_: huh?
14:56:39 <jokke_> dansmith: if nova pulls the image over http from glance and writes it to ceph, it will not do a direct snapshot in ceph. It will upload the snapshot image back over http too
14:56:49 <dansmith> we're calling set image location clearly in the tests: tempest-ImagesTestJSON-1022511244-project] enforce: rule="set_image_location"
14:57:04 <jokke_> hmm-m, interesting
14:57:24 <abhishekk> last 3 minutes
14:57:34 <abhishekk> we can move to glance channel for discussion
14:57:42 <dansmith> jokke_: it's not doing that though. it's asking glance to copy the cirros image from file to rbd before it ever boots the instance, and then boots the instance from the rbd copy
14:57:56 <abhishekk> jokke_, you need to look at the stable/ussuri job as well
14:58:46 <abhishekk> let's move to our channel
14:58:50 <abhishekk> Thank you all
14:58:59 <abhishekk> See you next week
14:59:05 <abhishekk> Keep reviewing
14:59:21 <abhishekk> #endmeeting