abhishekk | #startmeeting glance | 14:00 |
opendevmeet | Meeting started Thu Jul 29 14:00:16 2021 UTC and is due to finish in 60 minutes. The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 14:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 14:00 |
opendevmeet | The meeting name has been set to 'glance' | 14:00 |
abhishekk | #topic roll call | 14:00 |
jokke_ | o/ | 14:00 |
abhishekk | #link https://etherpad.openstack.org/p/glance-team-meeting-agenda | 14:00 |
abhishekk | o/ | 14:00 |
dansmith | o/ | 14:00 |
croelandt | o/ | 14:00 |
abhishekk | Cool, I guess we have enough audience today, let's start; rosmaita will join shortly | 14:01 |
rosmaita | o/ | 14:01 |
abhishekk | #topic release/periodic jobs update | 14:01 |
amorin | helo | 14:01 |
abhishekk | M3 is 5 weeks away and the glance-store release is 3 weeks away | 14:02 |
abhishekk | We are good on store release front at the moment | 14:02 |
abhishekk | Periodic jobs: we started hitting timeouts again, at least one job is failing daily with a timeout issue | 14:03 |
abhishekk | Same goes with our patches in gate | 14:03 |
dansmith | where is the timeout? | 14:03 |
abhishekk | wait a minute | 14:04 |
abhishekk | https://zuul.opendev.org/t/openstack/build/e3e617817bce4d7b8fe332ee9a528610 | 14:04 |
dansmith | oh, | 14:04 |
abhishekk | I haven't got much time to go through logs, will do it after the meeting | 14:05 |
dansmith | hmm, yeah okay | 14:05 |
dansmith | interesting that it was doing some metadef namespace tests right then | 14:05 |
abhishekk | ok | 14:05 |
abhishekk | Important part is we have 5 weeks from now for policy refactoring and cache API work | 14:06 |
abhishekk | Implementing project scope for metadefs is in good shape and does not have much work left | 14:07 |
abhishekk | Moving ahead | 14:07 |
abhishekk | #topic M3 targets | 14:07 |
abhishekk | Policy refactoring work | 14:07 |
abhishekk | You can get an entire overview of this work in one spreadsheet | 14:08 |
abhishekk | #link https://docs.google.com/spreadsheets/d/1SWBq0CsHw8jofHxmOG8QeZEX6veDE4eU0QHItOu8uQs/edit?pli=1#gid=73773117 | 14:08 |
dansmith | (most impressive spreadsheet I've seen in a long time) | 14:08 |
abhishekk | Around 50% patches are up for reviews | 14:08 |
abhishekk | :D | 14:08 |
abhishekk | croelandt is already doing a great job for us in reviews, thank you for that | 14:09 |
dansmith | I don't really expect we're going to fully finish the refactor in xena, but I do think we're making more progress than I thought we would | 14:09 |
abhishekk | hmm, we will assess the situation next week again | 14:10 |
abhishekk | All I will say is: guys, please review patches, there are plenty in the tree other than policy as well | 14:10 |
abhishekk | I am hoping once lance is back the work will pick up speed | 14:11 |
abhishekk | Moving to Cache API | 14:11 |
abhishekk | jokke_, anything to update? | 14:11 |
abhishekk | I guess testing and documentation are pending for the same | 14:12 |
jokke_ | I will hopefully get back to it (the tests) next week. Had a couple of days of PTO and have been busy with some other stuff | 14:12 |
abhishekk | ack, please let me know if there is anything to do | 14:13 |
jokke_ | Which should be sorted this week ;) | 14:13 |
abhishekk | great | 14:13 |
abhishekk | Metadef project persona integration | 14:13 |
abhishekk | #link https://review.opendev.org/c/openstack/glance/+/798700/ | 14:13 |
abhishekk | We have tempest-plugin tests up and under review for the same as well | 14:14 |
abhishekk | That's it from me for today | 14:14 |
abhishekk | Just to note, I have filed one bug against glanceclient | 14:15 |
abhishekk | Our client does not support showing member details | 14:15 |
abhishekk | Moving to Open discussion | 14:16 |
abhishekk | #topic Open discussion | 14:16 |
abhishekk | CI issue on stable/ussuri? | 14:16 |
abhishekk | #link https://review.opendev.org/c/openstack/glance/+/785552 | 14:16 |
abhishekk | This recently started to fail; earlier it was passing | 14:17 |
croelandt | yes | 14:17 |
croelandt | the logs are weird | 14:17 |
croelandt | I'm not sure exactly what to do about it | 14:17 |
croelandt | nor how to find out the root cause of the issue :/ | 14:17 |
croelandt | IOW: I'm stuck :D | 14:17 |
abhishekk | Me too | 14:17 |
jokke_ | I'll have a quick look after the meeting | 14:17 |
abhishekk | #link glance-code-constants-check https://zuul.opendev.org/t/openstack/build/22fc30ffbb0b400f87f4261d7397fec0 | 14:18 |
abhishekk | great, thank you | 14:18 |
croelandt | I'm not even sure what this job is, tbh | 14:18 |
abhishekk | I will explain it to you after the meeting | 14:18 |
dansmith | I would like to know too :) | 14:18 |
jokke_ | it's sheet | 14:18 |
jokke_ | :P | 14:18 |
croelandt | jokke_: always here to make things easy to understand :D | 14:19 |
abhishekk | It has something to do with database migration | 14:19 |
abhishekk | next on Open discussion is we have review request for this bug; | 14:20 |
alistarle | Hello guys, it's been a long time :) | 14:20 |
abhishekk | #link https://bugs.launchpad.net/glance/+bug/1938265 | 14:20 |
alistarle | we just saw a bug with multi-store when taking a snapshot with the rbd driver (so using direct-snapshot): https://bugs.launchpad.net/glance/+bug/1938265 | 14:20 |
abhishekk | alistarle, welcome back | 14:20 |
alistarle | Yup, I always come back when I find a new bug ><' | 14:20 |
abhishekk | :D | 14:21 |
abhishekk | I think as suggested by Dan we need some tests there | 14:21 |
alistarle | We are currently writing it, it should be submitted today :) | 14:21 |
alistarle | It seems it totally prevents nova from performing snapshots with this configuration | 14:22 |
jokke_ | alistarle: thanks for the patch too, not just a bug :D | 14:22 |
jokke_ | I was peeking into it earlier today | 14:22 |
alistarle | It's not actually my patch, but amorin's :) | 14:23 |
alistarle | But yeah, cool to fix stuff | 14:23 |
amorin | we did it together, but you're the boss for the tests for sure :) | 14:23 |
amorin | do you have any clue what kind of tests to write for this? | 14:23 |
abhishekk | glad to see that multistore is finally in use | 14:24 |
amorin | FYI, we hit this bug in Stein downstream | 14:24 |
amorin | but the code base has not moved much | 14:24 |
abhishekk | no I guess | 14:25 |
abhishekk | what happens when rbd is not used in nova? | 14:25 |
amorin | then nova is not doing any location update | 14:25 |
amorin | it's done differently AFAIK | 14:25 |
dansmith | yup | 14:25 |
amorin | only direct snapshot is triggering this | 14:26 |
alistarle | It calls the standard glance workflow, so POST /images | 14:26 |
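(Editor's note: a minimal sketch of the two snapshot paths under discussion. The function and string labels are illustrative stand-ins, not nova's actual code; the real decision lives in nova's libvirt rbd image backend.)

```python
# Simplified model of how nova picks a snapshot path.
# Names here are illustrative, not nova's real API.

def choose_snapshot_path(disk_format, glance_backend, nova_uses_rbd):
    """Return which snapshot path a compute node would take.

    A direct (copy-on-write) snapshot is only possible when the
    instance disk already lives in the same ceph cluster as the
    glance image, i.e. both glance and nova are rbd-backed.
    """
    if glance_backend == "rbd" and nova_uses_rbd and disk_format == "raw":
        # clone in ceph, then register the new rbd:// location on
        # the image via the locations API (the failing call here)
        return "direct-snapshot (add location)"
    # generic path: create an image record, then upload the bytes
    return "standard upload (POST /images + PUT file)"

print(choose_snapshot_path("raw", "rbd", True))
print(choose_snapshot_path("qcow2", "file", True))
```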
alistarle | And what about the solution, seems suitable for you ? | 14:26 |
jokke_ | I wonder if that issue actually persists in Train, or since we implemented the lazy update, abhishekk. That should at least in theory solve it, or do we fail the actual add location call? | 14:26 |
alistarle | I think guessing the store is acceptable, as we are already doing it for lazy updates | 14:26 |
amorin | I tested this against a small victoria deployment in my dev environment | 14:27 |
amorin | and I triggered the same bug | 14:27 |
alistarle | No, because lazy update is called for get and list calls, not for the patch adding a new location | 14:27 |
jokke_ | kk, so it's the actual location add validation that fails. Interesting | 14:27 |
alistarle | So we are still hitting this enigmatic "Invalid location" 400 error | 14:27 |
abhishekk | jokke_, lazy update is for list or get call, once we have actual image in action | 14:28 |
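(Editor's note: a self-contained sketch of the "lazy update" store guessing mentioned above — on a GET/LIST, a pre-multi-store location with no store id gets one backfilled from its URI prefix. The mapping and function are simplified stand-ins, not glance's real code.)

```python
# Toy model of lazily assigning a store id to a legacy location.
# CONFIGURED_STORES is a hypothetical store-id -> URI-prefix map.

CONFIGURED_STORES = {
    "ceph": "rbd://",
    "local": "file://",
}

def lazily_assign_store(location):
    """Backfill location['metadata']['store'] from the URI prefix."""
    meta = location.setdefault("metadata", {})
    if meta.get("store"):
        return location  # already migrated, nothing to do
    for store_id, prefix in CONFIGURED_STORES.items():
        if location["url"].startswith(prefix):
            meta["store"] = store_id
            break
    return location

loc = lazily_assign_store({"url": "rbd://pool/img/snap"})
print(loc["metadata"]["store"])  # the rbd-backed store id
```

This only runs on read paths (get/list), which is alistarle's point: the PATCH that adds a brand-new location never goes through it.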
jokke_ | croelandt: add to your list :D | 14:28 |
alistarle | And something interesting is we are allowing "backend=None" in glance, but in glance_store it is a mandatory parameter | 14:28 |
jokke_ | abhishekk: yeah, but we do get right after the add, so if it would allow adding it, we would have updated it right after I think | 14:28 |
alistarle | And as far as I see, there is no way "backend=None" can produce a workable result; it will always end in a KeyError somewhere | 14:29 |
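(Editor's note: a minimal sketch of the failure mode alistarle describes, modeled loosely on the glance_store location lookup linked later in the discussion. The backend map and function body are simplified stand-ins, not glance_store's real implementation.)

```python
# Toy model of glance_store resolving a location against the
# enabled multi-store backends. Passing backend=None blows up
# with a KeyError instead of a clean validation error, which
# surfaces to the caller as the opaque "Invalid location" 400.

ENABLED_BACKENDS = {
    "ceph": {"scheme": "rbd"},
    "local": {"scheme": "file"},
}

def get_location_from_uri_and_backend(uri, backend_id):
    """Resolve a location URI for a specific, mandatory backend id."""
    store_conf = ENABLED_BACKENDS[backend_id]  # KeyError if backend_id is None
    if not uri.startswith(store_conf["scheme"] + "://"):
        raise ValueError("Invalid location: %s" % uri)
    return (uri, backend_id)

print(get_location_from_uri_and_backend("rbd://pool/image/snap", "ceph"))
try:
    get_location_from_uri_and_backend("rbd://pool/image/snap", None)
except KeyError as e:
    print("KeyError:", e)
```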
abhishekk | I think we have one job running for nova-ceph-glance | 14:29 |
abhishekk | Does it mean we have no test covering this workflow? | 14:29 |
amorin | :( | 14:30 |
abhishekk | alistarle, could you verify the nova-ceph-multistore job and check whether we run this scenario or not? | 14:30 |
jokke_ | abhishekk: also means that nova doesn't test snapshotting with Ceph either | 14:30 |
dansmith | unless the ceph-multistore job isn't running full tempest, we certainly are | 14:31 |
abhishekk | Else, for a start, I guess we can add this scenario as a reproducer and then consider this fix | 14:31 |
abhishekk | I do see 923 tests running in that job | 14:32 |
abhishekk | and 80 skips | 14:32 |
dansmith | hmm, looks like maybe the ceph-plugin job we inherit from might be missing some of full tempest | 14:33 |
dansmith | we do volume snap tests, but might be skipping the compute ones | 14:33 |
dansmith | not us, but that plugin's job def, which we inherit from | 14:33 |
jokke_ | it's likely the storage scoped parent job. A few cycles back we stopped running full tempest 20 times on all possible scenarios | 14:34 |
abhishekk | hmm | 14:34 |
abhishekk | So I guess we need to enable those | 14:35 |
dansmith | https://github.com/openstack/devstack-plugin-ceph/blob/master/.zuul.yaml#L31 | 14:35 |
dansmith | looks like that's focused mostly on volume tests | 14:36 |
jokke_ | or we probably should finally default to multistore configs in gate so the tests would actually catch multi-store issues | 14:36 |
jokke_ | as we deprecated the single store configs quite a while ago | 14:36 |
dansmith | we've been running that in nova as our ceph job for a long time, before multistore :/ | 14:36 |
abhishekk | jokke_, we don't have support to configure multiple stores other than the file store in devstack | 14:37 |
dansmith | ...and ceph? | 14:37 |
abhishekk | I have patches up for swift and ceph but didn't get time to look at those again | 14:38 |
dansmith | I'm not sure what you mean.. the multistore job is file+ceph | 14:38 |
abhishekk | #link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/741801 | 14:38 |
dansmith | right, but you know that my multistore job is ceph+file, right? | 14:39 |
abhishekk | you have done it via yaml file by adding some local configs there | 14:39 |
dansmith | right | 14:39 |
abhishekk | yep | 14:39 |
dansmith | you just mean there's no easy flag in devstack, okay | 14:39 |
abhishekk | yeah | 14:39 |
abhishekk | I was responding to jokke's comment about defaulting to multistore in the gate | 14:40 |
abhishekk | this for swift + file, https://review.opendev.org/c/openstack/devstack/+/741654 | 14:40 |
alistarle | That said, I think this bug can also be valid with a multi-store configuration that has a single backend | 14:40 |
dansmith | ack, okay | 14:41 |
alistarle | No need to have two backends actually configured | 14:41 |
abhishekk | alistarle, right, | 14:41 |
abhishekk | So as I said and rightly commented by Dan on the patch, we need one reproducer and then the fix | 14:42 |
jokke_ | alistarle: correct, that was my point. By now we should not be testing the old single store configs by default, but actually configuring the store(s), even a single one, with multi-store enabled | 14:42 |
dansmith | okay, hold up | 14:43 |
dansmith | test_create_image_from_paused_server[id-71bcb732-0261-11e7-9086-fa163e4fa634] | 14:43 |
dansmith | pass | 14:43 |
jokke_ | as the old way of configuring should have been removed already, but obviously that's years away if all gating is still relying on it | 14:43 |
dansmith | that's from our multistore job, so it should be creating images from servers | 14:43 |
dansmith | and maybe better: | 14:43 |
dansmith | test_create_delete_image[id-3731d080-d4c5-4872-b41a-64d0d0021314] pass | 14:43 |
dansmith | so we should dig into a job run and see if those are really working, and if so, figure out why | 14:44 |
abhishekk | dansmith, those might be using the post call and not copy-on-write? | 14:45 |
alistarle | Yes, but it fails only with the rbd backend, so with direct-snapshot enabled | 14:45 |
alistarle | Maybe these jobs are using the file backend? | 14:45 |
dansmith | abhishekk: in the ceph job they should be rbd-backed, which means it'll do the cow snapshot | 14:45 |
dansmith | if these tests are actually on file backend, then the whole premise of the job is broken | 14:46 |
abhishekk | may be | 14:46 |
abhishekk | I think default backend is ceph | 14:47 |
abhishekk | as defined in job | 14:47 |
dansmith | that's the whole point of the job yeah | 14:47 |
jokke_ | Well Nova is clearly not doing direct snapshots in that job 'cause it would have been broken | 14:48 |
alistarle | Yes I can double check, but even in the case of the default backend, this line will fail https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111 if backend = None | 14:48 |
jokke_ | so even if glance is using ceph as backend, nova might still be uploading the image | 14:48 |
abhishekk | default is file backend | 14:48 |
alistarle | And the backend comes directly from the metadata of the "add_location" call | 14:48 |
dansmith | jokke_: all of our customers use rbd and direct snapshots.. surely if this was completely broken someone would have mentioned it right? | 14:49 |
jokke_ | dansmith: OSP 16.1+ DCN only. | 14:49 |
amorin | are they using multi store? | 14:49 |
dansmith | amorin: I thought the assertion is that this is broken in non-multistore as well? | 14:50 |
dansmith | jokke_: you mean multistore is only 16.1+, right? | 14:50 |
jokke_ | dansmith: but that's why I told croelandt to add it to his list of work to do, as we need to fix this for downstream customers too | 14:50 |
abhishekk | https://github.com/openstack/nova/blob/master/.zuul.yaml#L480 | 14:50 |
amorin | good point, I haven't checked in non-multistore | 14:51 |
jokke_ | dansmith: only 16.1+ DCN (distributed/edge) with storage. Otherwise we still config the old way | 14:51 |
amorin | it may not be broken because we are not calling the same function | 14:51 |
amorin | we are calling https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L55 AFAIK | 14:51 |
dansmith | ah, alistarle said "single multistore configuration" above.. I took that to mean "non-multistore" but I see | 14:51 |
abhishekk | dansmith, what does images_rbd_glance_store_name do? | 14:51 |
dansmith | abhishekk: it's just for the auto-copy from another store | 14:52 |
abhishekk | ack | 14:52 |
abhishekk | so the job is running for file store I guess | 14:52 |
abhishekk | Last 7 minutes | 14:53 |
dansmith | I dunno what "running for file store" means | 14:54 |
abhishekk | dansmith, default backend in the job is file backend | 14:54 |
abhishekk | so any request coming from nova to glance will be using file backend | 14:54 |
dansmith | abhishekk: nova will always copy it to rbd first when booting, so it should be on rbd when snapshot happens | 14:55 |
abhishekk | because of that flag I just mentioned earlier ? | 14:55 |
jokke_ | dansmith: but nova doesn't do direct snapshot if it had to copy it into ceph | 14:55 |
dansmith | abhishekk: yes | 14:55 |
dansmith | jokke_: huh? | 14:55 |
jokke_ | dansmith: if nova pulls the image over http from glance and writes it to ceph, it will not do a direct snapshot in ceph. It will upload the snapshot image back over http too | 14:56 |
dansmith | we're calling set image location clearly in the tests: tempest-ImagesTestJSON-1022511244-project] enforce: rule="set_image_location" | 14:56 |
jokke_ | hmm-m, interesting | 14:57 |
abhishekk | last 3 minutes | 14:57 |
abhishekk | we can move to glance channel for discussion | 14:57 |
dansmith | jokke_: it's not doing that though. it's asking glance to copy the cirros image from file to rbd before it ever boots the instance, and then boots the instance from the rbd copy | 14:57 |
abhishekk | jokke_, you need to look at stable/ussuri job as well | 14:57 |
abhishekk | lets move to our channel | 14:58 |
abhishekk | Thank you all | 14:58 |
abhishekk | See you next week | 14:58 |
abhishekk | Keep reviewing | 14:59 |
abhishekk | #endmeeting | 14:59 |
opendevmeet | Meeting ended Thu Jul 29 14:59:21 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 14:59 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/glance/2021/glance.2021-07-29-14.00.html | 14:59 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/glance/2021/glance.2021-07-29-14.00.txt | 14:59 |
opendevmeet | Log: https://meetings.opendev.org/meetings/glance/2021/glance.2021-07-29-14.00.log.html | 14:59 |