14:00:16 <abhishekk> #startmeeting glance
14:00:16 <opendevmeet> Meeting started Thu Jul 29 14:00:16 2021 UTC and is due to finish in 60 minutes. The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 <opendevmeet> The meeting name has been set to 'glance'
14:00:20 <abhishekk> #topic roll call
14:00:25 <jokke_> o/
14:00:26 <abhishekk> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:28 <abhishekk> o/
14:00:37 <dansmith> o/
14:00:43 <croelandt> o/
14:01:13 <abhishekk> Cool, I guess we have enough of an audience today, let's start; rosmaita will join shortly
14:01:22 <rosmaita> o/
14:01:50 <abhishekk> #topic release/periodic jobs update
14:01:56 <amorin> hello
14:02:19 <abhishekk> M3 is 5 weeks away and the glance-store release is 3 weeks away
14:02:32 <abhishekk> We are good on the store release front at the moment
14:03:24 <abhishekk> On the periodic jobs, we started hitting timeouts again; daily at least one job is failing with a timeout
14:03:39 <abhishekk> Same goes for our patches in the gate
14:03:53 <dansmith> where is the timeout?
14:04:03 <abhishekk> wait a minute
14:04:18 <abhishekk> https://zuul.opendev.org/t/openstack/build/e3e617817bce4d7b8fe332ee9a528610
14:04:33 <dansmith> oh,
14:05:07 <abhishekk> I haven't had much time to go through the logs, will do it after the meeting
14:05:12 <dansmith> hmm, yeah okay
14:05:24 <dansmith> interesting that it was doing some metadef namespace tests right then
14:05:54 <abhishekk> ok
14:06:31 <abhishekk> The important part is we have 5 weeks from now for the policy refactoring and cache API work
14:07:14 <abhishekk> Implementing project scope for metadefs is in good shape and does not have much work left
14:07:24 <abhishekk> Moving ahead
14:07:36 <abhishekk> #topic M3 targets
14:07:52 <abhishekk> Policy refactoring work
14:08:05 <abhishekk> You can get an entire overview of this work in one spreadsheet
14:08:12 <abhishekk> #link https://docs.google.com/spreadsheets/d/1SWBq0CsHw8jofHxmOG8QeZEX6veDE4eU0QHItOu8uQs/edit?pli=1#gid=73773117
14:08:18 <dansmith> (most impressive spreadsheet I've seen in a long time)
14:08:27 <abhishekk> Around 50% of the patches are up for review
14:08:39 <abhishekk> :D
14:09:10 <abhishekk> croelandt is already doing a great job for us in reviews, thank you for that
14:09:24 <dansmith> I don't really expect we're going to fully finish the refactor in xena, but I do think we're making more progress than I thought we would
14:10:07 <abhishekk> hmm, we will assess the situation next week again
14:10:53 <abhishekk> All I will say is: guys, please review patches, there are plenty in the tree other than policy as well
14:11:39 <abhishekk> I am hoping once Lance is back the work will pick up speed
14:11:46 <abhishekk> Moving to Cache API
14:11:57 <abhishekk> jokke_, anything to update?
14:12:09 <abhishekk> I guess testing and documentation are pending for the same
14:12:32 <jokke_> I will hopefully get back to it (the tests) next week.
Had a couple of days of PTO and have been busy with some other stuff
14:13:03 <abhishekk> ack, please let me know if there is anything to do
14:13:05 <jokke_> Which should be sorted this week ;)
14:13:34 <abhishekk> great
14:13:40 <abhishekk> Metadef project persona integration
14:13:52 <abhishekk> #link https://review.opendev.org/c/openstack/glance/+/798700/
14:14:12 <abhishekk> We have tempest-plugin tests up and under review for the same as well
14:14:56 <abhishekk> That's it from me for today
14:15:23 <abhishekk> Just to note, I have filed one bug against glanceclient
14:15:42 <abhishekk> Our client does not have support to show member details
14:16:06 <abhishekk> Moving to Open discussion
14:16:11 <abhishekk> #topic Open discussion
14:16:27 <abhishekk> CI issue on stable/ussuri?
14:16:37 <abhishekk> #link https://review.opendev.org/c/openstack/glance/+/785552
14:17:06 <abhishekk> This recently started to fail; earlier it was passing
14:17:12 <croelandt> yes
14:17:17 <croelandt> the logs are weird
14:17:22 <croelandt> I'm not sure exactly what to do about it
14:17:29 <croelandt> nor how to find the root cause of the issue :/
14:17:35 <croelandt> IOW: I'm stuck :D
14:17:52 <abhishekk> Me too
14:17:58 <jokke_> I'll have a quick look after the meeting
14:18:01 <abhishekk> #link glance-code-constants-check https://zuul.opendev.org/t/openstack/build/22fc30ffbb0b400f87f4261d7397fec0
14:18:09 <abhishekk> great, thank you
14:18:13 <croelandt> I'm not even sure what this job is, tbh
14:18:31 <abhishekk> I will explain it to you after the meeting
14:18:37 <dansmith> I would like to know too :)
14:18:37 <jokke_> it's sheet
14:18:49 <jokke_> :P
14:19:03 <croelandt> jokke_: always here to make things easy to understand :D
14:19:29 <abhishekk> It has something to do with database migration
14:20:05 <abhishekk> next on Open discussion, we have a review request for this bug:
14:20:07 <alistarle> Hello guys, it's been a long time :)
14:20:09 <abhishekk> #link https://bugs.launchpad.net/glance/+bug/1938265
14:20:10 <alistarle> we just saw a bug with multi-store when taking a snapshot with the rbd driver (so using direct-snapshot): https://bugs.launchpad.net/glance/+bug/1938265
14:20:16 <abhishekk> alistarle, welcome back
14:20:41 <alistarle> Yup, I always come back when I find a new bug ><'
14:21:16 <abhishekk> :D
14:21:41 <abhishekk> I think, as suggested by Dan, we need some tests there
14:21:58 <alistarle> We are currently writing them, they should be submitted today :)
14:22:17 <alistarle> It seems it totally prevents nova from performing a snapshot with this configuration
14:22:31 <jokke_> alistarle: thanks for the patch too, not just a bug :D
14:22:43 <jokke_> I was peeking into it earlier today
14:23:03 <alistarle> It's not actually my patch, but amorin's :)
14:23:15 <alistarle> But yeah, cool to fix stuff
14:23:22 <amorin> we did it together, but you're the boss for the tests for sure :)
14:23:57 <amorin> do you have any clue about the kind of tests for this?
14:24:01 <abhishekk> glad to see that multistore is finally in use
14:24:24 <amorin> FYI, we hit this bug in Stein downstream
14:24:51 <amorin> but the code base has not moved much
14:25:07 <abhishekk> no, I guess not
14:25:29 <abhishekk> what happens when rbd is not used in nova?
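For reference, the direct-snapshot path being discussed skips the image data upload: nova keeps the snapshot on the ceph side and registers it with glance as an image location. A minimal sketch of that location-add request, with the URL and ids as illustrative placeholders:

    # Rough shape of the body nova's rbd direct snapshot sends to glance,
    # as a JSON patch on the image (illustrative values only):
    #   PATCH /v2/images/{image_id}
    #   Content-Type: application/openstack-images-v2.1-json-patch
    add_location_patch = [
        {
            "op": "add",
            "path": "/locations/-",
            "value": {
                "url": "rbd://<fsid>/<pool>/<image-id>/snap",
                # No backend/store hint here -- in a multi-store deployment
                # glance has to work out which backend this location belongs to.
                "metadata": {},
            },
        }
    ]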
14:25:48 <amorin> then nova is not doing any location update
14:25:53 <amorin> it's done differently AFAIK
14:25:58 <dansmith> yup
14:26:04 <amorin> only direct snapshot is triggering this
14:26:07 <alistarle> It calls the standard glance workflow, so POST /images
14:26:24 <alistarle> And what about the solution, does it seem suitable to you?
14:26:45 <jokke_> I wonder if that issue actually persists in Train, or when we implemented the lazy update, abhishekk. That should at least in theory solve it, or do we fail the actual add location call?
14:26:52 <alistarle> I think guessing the store is acceptable, as we are already doing it for the lazy update
14:27:12 <amorin> I tested this against a small Victoria deployment in my dev environment
14:27:17 <amorin> and I triggered the same bug
14:27:24 <alistarle> No, because the lazy update happens on get and list calls, not on the PATCH adding a new location
14:27:46 <jokke_> kk, so it's the actual location add validation that fails. Interesting
14:27:58 <alistarle> So we are still hitting this enigmatic "Invalid location" 400 error
14:28:05 <abhishekk> jokke_, the lazy update is for list or get calls, once we have the actual image in action
14:28:06 <jokke_> croelandt: add to your list :D
14:28:44 <alistarle> And something interesting is we are allowing "backend=None" in glance, but in glance_store it is a mandatory parameter
14:28:48 <jokke_> abhishekk: yeah, but we do a get right after the add, so if it would allow adding it, we would have updated it right after, I think
14:29:13 <alistarle> And as far as I see, there is no way "backend=None" can produce a workable result, it will always end in a KeyError somewhere
14:29:23 <abhishekk> I think we have one job running for nova-ceph-glance
14:29:38 <abhishekk> Does it mean we have no test to cover this workflow?
14:30:08 <amorin> :(
14:30:32 <abhishekk> alistarle, could you verify the nova-ceph-multistore job and check whether we run this scenario or not?
14:30:49 <jokke_> abhishekk: it also means that nova doesn't test snapshotting with Ceph either
14:31:18 <dansmith> unless the ceph-multistore job isn't running full tempest, we certainly are
14:31:26 <abhishekk> Else, for a start, I guess we can add this scenario as a reproducer and then consider this fix
14:32:50 <abhishekk> I do see 923 tests running in that job
14:32:58 <abhishekk> and 80 skips
14:33:09 <dansmith> hmm, looks like maybe the ceph-plugin job we inherit from might be missing some of full tempest
14:33:21 <dansmith> we do volume snap tests, but might be skipping the compute ones
14:33:30 <dansmith> not us, but that plugin's job def, which we inherit from
14:34:26 <jokke_> it's likely the storage-scoped parent job. A few cycles back we stopped running full tempest 20 times on all possible scenarios
14:34:49 <abhishekk> hmm
14:35:32 <abhishekk> So I guess we need to enable those
14:35:58 <dansmith> https://github.com/openstack/devstack-plugin-ceph/blob/master/.zuul.yaml#L31
14:36:07 <dansmith> looks like that's focused mostly on volume tests
14:36:17 <jokke_> or we probably should finally default to multistore configs in the gate so the tests would actually catch multi-store issues
14:36:37 <jokke_> as we deprecated the single store configs quite a while ago
14:36:39 <dansmith> we've been running that in nova as our ceph job for a long time, before multistore :/
14:37:29 <abhishekk> jokke_, we don't have support to configure multiple stores other than the file store in devstack
14:37:50 <dansmith> ...and ceph?
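For context, the multi-store setup jokke_ argues should be the gate default is configured in glance-api.conf. A minimal sketch, assuming illustrative store names "ceph" and "local" and typical driver options (exact values are examples, not the job's actual settings):

    [DEFAULT]
    enabled_backends = ceph:rbd, local:file

    [glance_store]
    default_backend = ceph

    [ceph]
    rbd_store_pool = images
    rbd_store_user = glance
    rbd_store_ceph_conf = /etc/ceph/ceph.conf

    [local]
    filesystem_store_datadir = /opt/stack/data/glance/images/

This enabled_backends style replaces the deprecated single-store stores/default_store options; the point above is that devstack only wires it up for the file store out of the box.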
14:38:06 <abhishekk> I have patches up for swift and ceph but didn't get time to take another look at those
14:38:24 <dansmith> I'm not sure what you mean.. the multistore job is file+ceph
14:38:57 <abhishekk> #link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/741801
14:39:35 <dansmith> right, but you know that my multistore job is ceph+file, right?
14:39:37 <abhishekk> you have done it via the yaml file by adding some local configs there
14:39:42 <dansmith> right
14:39:45 <abhishekk> yep
14:39:50 <dansmith> you just mean there's no easy flag in devstack, okay
14:39:55 <abhishekk> yeah
14:40:19 <abhishekk> I was replying to jokke's comment about defaulting to multistore in the gate
14:40:52 <abhishekk> this one is for swift + file, https://review.opendev.org/c/openstack/devstack/+/741654
14:40:58 <alistarle> Despite that, I think this bug can also be valid with a single multi-store configuration
14:41:03 <dansmith> ack, okay
14:41:09 <alistarle> No need to have two backends actually configured
14:41:32 <abhishekk> alistarle, right,
14:42:16 <abhishekk> So as I said, and as rightly commented by Dan on the patch, we need a reproducer first and then the fix
14:42:29 <jokke_> alistarle: correct, that was my point. By now we should not be testing the old single store configs by default, but actually configuring the store(s), even a single one, with multi-store enabled
14:43:04 <dansmith> okay, hold up
14:43:04 <dansmith> test_create_image_from_paused_server[id-71bcb732-0261-11e7-9086-fa163e4fa634]
14:43:04 <dansmith> pass
14:43:19 <jokke_> as the old way of configuring should have been removed already, but obviously that's years away if all gating is still relying on it
14:43:26 <dansmith> that's from our multistore job, so it should be creating images from servers
14:43:47 <dansmith> and maybe better:
14:43:52 <dansmith> test_create_delete_image[id-3731d080-d4c5-4872-b41a-64d0d0021314] pass
14:44:30 <dansmith> so we should dig into a job run and see if those are really working, and if so, figure out why
14:45:23 <abhishekk> dansmith, those might be using the POST call and not copy-on-write?
14:45:45 <alistarle> Yes, but it fails only with the rbd backend, so with direct-snapshot enabled
14:45:54 <alistarle> Maybe this job is using the file backend?
14:45:55 <dansmith> abhishekk: in the ceph job they should be rbd-backed, which means it'll do the cow snapshot
14:46:23 <dansmith> if these tests are actually on the file backend, then the whole premise of the job is broken
14:46:34 <abhishekk> maybe
14:47:05 <abhishekk> I think the default backend is ceph
14:47:14 <abhishekk> as defined in the job
14:47:44 <dansmith> that's the whole point of the job, yeah
14:48:21 <jokke_> Well, Nova is clearly not doing direct snapshots in that job 'cause it would have been broken
14:48:33 <alistarle> Yes, I can double check, but even in the case of the default backend, this line will fail if backend = None: https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111
14:48:48 <jokke_> so even if glance is using ceph as the backend, nova might still be uploading the image
14:48:51 <abhishekk> default is the file backend
14:48:53 <alistarle> And the backend comes directly from the metadata of the "add_location" call
14:49:08 <dansmith> jokke_: all of our customers use rbd and direct snapshots.. surely if this was completely broken someone would have mentioned it, right?
14:49:42 <jokke_> dansmith: OSP 16.1+ DCN only.
14:49:46 <amorin> are they using multi store?
14:50:06 <dansmith> amorin: I thought the assertion is that this is broken in non-multistore as well?
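To make the "backend=None" point concrete, here is an illustrative Python sketch (not the actual glance_store code; the dict and the "store" metadata key are stand-ins) of why a location added without a backend hint can never resolve against a multi-store lookup:

    # Stand-in for the per-backend store map glance_store keeps when
    # multiple backends are enabled (illustrative only).
    stores = {
        "ceph": "<rbd store driver>",
        "local": "<filesystem store driver>",
    }

    def store_for_location(location):
        # A multi-store lookup needs a backend name. Nova's rbd direct
        # snapshot adds the location without one, so the key is None and
        # the lookup always fails -- the KeyError alistarle mentions.
        backend = location.get("metadata", {}).get("store")  # None here
        return stores[backend]

    try:
        store_for_location({"url": "rbd://fsid/pool/image-id/snap", "metadata": {}})
    except KeyError as exc:
        # Per the discussion, this surfaces to the client as the
        # "Invalid location" 400 error.
        print(f"lookup failed: KeyError({exc})")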
14:50:25 <dansmith> jokke_: you mean multistore is only 16.1+, right?
14:50:32 <jokke_> dansmith: but that's why I told croelandt to add it to his list of work to do, as we need to fix this for downstream customers too
14:50:46 <abhishekk> https://github.com/openstack/nova/blob/master/.zuul.yaml#L480
14:51:11 <amorin> good point, I haven't checked in non-multistore
14:51:19 <jokke_> dansmith: only 16.1+ DCN (distributed/edge) with storage. Otherwise we still configure the old way
14:51:49 <amorin> it may not be broken, because we are not calling the same function
14:51:56 <amorin> we are calling https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L55 AFAIK
14:51:57 <dansmith> ah, alistarle said "single multistore configuration" above.. I took that to mean "non-multistore", but I see
14:51:59 <abhishekk> dansmith, images_rbd_glance_store_name, what does this do?
14:52:14 <dansmith> abhishekk: it's just for the auto-copy from another store
14:52:27 <abhishekk> ack
14:52:45 <abhishekk> so the job is running for file store, I guess
14:53:21 <abhishekk> Last 7 minutes
14:54:02 <dansmith> I dunno what "running for file store" means
14:54:19 <abhishekk> dansmith, the default backend in the job is the file backend
14:54:44 <abhishekk> so any request coming from nova to glance will be using the file backend
14:55:02 <dansmith> abhishekk: nova will always copy it to rbd first when booting, so it should be on rbd when the snapshot happens
14:55:29 <abhishekk> because of that flag I just mentioned earlier?
14:55:41 <jokke_> dansmith: but nova doesn't do a direct snapshot if it had to copy it into ceph
14:55:53 <dansmith> abhishekk: yes
14:55:57 <dansmith> jokke_: huh?
14:56:39 <jokke_> dansmith: if nova pulls the image over http from glance and writes it to ceph, it will not do a direct snapshot in ceph. It will upload the snapshot image back over http too
14:56:49 <dansmith> we're clearly calling set image location in the tests: tempest-ImagesTestJSON-1022511244-project] enforce: rule="set_image_location"
14:57:04 <jokke_> hmm-m, interesting
14:57:24 <abhishekk> last 3 minutes
14:57:34 <abhishekk> we can move to the glance channel for discussion
14:57:42 <dansmith> jokke_: it's not doing that though. it's asking glance to copy the cirros image from file to rbd before it ever boots the instance, and then boots the instance from the rbd copy
14:57:56 <abhishekk> jokke_, you need to look at the stable/ussuri job as well
14:58:46 <abhishekk> let's move to our channel
14:58:50 <abhishekk> Thank you all
14:58:59 <abhishekk> See you next week
14:59:05 <abhishekk> Keep reviewing
14:59:21 <abhishekk> #endmeeting