14:00:09 <abhishekk> #startmeeting glance
14:00:09 <opendevmeet> Meeting started Thu Aug 26 14:00:09 2021 UTC and is due to finish in 60 minutes.  The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:09 <opendevmeet> The meeting name has been set to 'glance'
14:00:15 <abhishekk> #topic roll call
14:00:20 <abhishekk> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:21 <dansmith> o/
14:00:25 <abhishekk> o/
14:01:01 <abhishekk> let's wait a couple of minutes for others to show
14:01:41 <abhishekk> I doubt rosmaita will join us today
14:01:55 <jokke_> o/
14:02:01 <abhishekk> Let's start
14:02:09 <abhishekk> #topic release/periodic jobs update
14:02:19 <abhishekk> M3 next week, but we will tag it a week after M3
14:02:29 <abhishekk> i.e. the week after next
14:02:48 <abhishekk> so we still have around 6 working days to get things done
14:03:11 <abhishekk> python-glanceclient needs to be tagged next week though
14:03:33 <abhishekk> I will put a release patch around Sept 01 for the same
14:03:36 <pdeore> o/
14:04:02 <abhishekk> Surprisingly, the periodic jobs have had no timeouts for the last 3 days
14:04:06 <abhishekk> all green at the moment
14:04:36 <abhishekk> #topic M3 targets
14:04:42 <abhishekk> Glance Xena 3 review dashboard - https://tinyurl.com/glance-xena-3
14:04:57 <abhishekk> Most of the policy patches are merged and remaining are approved
14:05:21 <abhishekk> due to heavy traffic in the gate we are facing some unusual failures; I will keep a watch on them
14:05:55 <jokke_> I'd say usual milestone failures :)
14:05:55 <abhishekk> Thank you croelandt and dansmith and lance for reviewing these patches on priority
14:06:18 <jokke_> Every cycle the same thing. Just like every year the winter surprises the Finns :D
14:06:19 <abhishekk> again and again at crucial times
14:06:38 <abhishekk> Cache API - Still under review - FFE required?
14:07:01 <abhishekk> There are some comments on tests and some doc changes need to be done
14:07:22 <abhishekk> need to mention the header which we are using to clear the cached and queued images in the docs and api reference as well
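(A minimal sketch of the cache-clearing call referred to above, assuming the endpoint and header land as proposed in the cache API patch under review: DELETE /v2/cache with an x-image-cache-clear-target header accepting 'cache', 'queue', or empty for both. The endpoint URL and token below are placeholders, not confirmed API.)

```python
# Hedged sketch of the proposed cache-clearing call; not the final, confirmed API.
# Assumes DELETE /v2/cache plus the x-image-cache-clear-target header
# ('cache', 'queue', or '' for both); GLANCE_URL and TOKEN are placeholders.
import requests

GLANCE_URL = "http://controller:9292"   # placeholder glance-api endpoint
TOKEN = "<keystone-token-with-cache-admin-rights>"

resp = requests.delete(
    f"{GLANCE_URL}/v2/cache",
    headers={
        "X-Auth-Token": TOKEN,
        # Clear only the queued images; use 'cache' for cached images,
        # or leave the value empty to clear both.
        "x-image-cache-clear-target": "queue",
    },
)
resp.raise_for_status()
```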
14:07:40 <dansmith> I have some more draft comments I'm still mulling, I will move those to the latest PS or drop them accordingly
14:07:53 <jokke_> I fixed the comments and the output of clear_cache as Dan kindly pointed out that it was very silly behaviour
14:08:04 <abhishekk> ack
14:08:31 <abhishekk> We will revisit the progress next week and decide on FFE grant for the same
14:09:18 <abhishekk> Any questions ?
14:09:30 <abhishekk> Same goes for metadef project persona
14:09:53 <jokke_> Just FYI I'll be on PTO next week. I'd say the debate on the tests is a great opportunity for a follow-up patch after the FF, as that API change needs to merge so we can get the client patch in before it needs to be released
14:09:54 <abhishekk> Patches are now open for review and we have good functional test coverage there to ensure the new RBAC behavior
14:10:32 <jokke_> so FFE for that specific work is not great as it needs the client side
14:10:59 <abhishekk> ack, I haven't had enough time to look at the new patch set or the other review comments
14:11:47 <abhishekk> In your absence I will work on that
14:12:43 <abhishekk> Coming back to RBAC metadef, we are working on glance-tempest-plugin protection testing and that will be up and complete by tomorrow
14:13:28 <abhishekk> But I think the functional coverage on the glance side is good and we can consider those changes for M3
14:14:38 <abhishekk> I will apply for the FFE for the same if this is not merged before M3
14:14:42 <abhishekk> Moving ahead
14:15:01 <abhishekk> #topic Wallaby backports
14:15:20 <croelandt> whoami-rajat: ^
14:15:40 <croelandt> So, these 2 backports in the agenda are part of a huge bug fix that includes Cinder patches as well
14:15:46 <whoami-rajat> hi
14:15:55 <croelandt> We were under the impression it was ok to backport in Wallaby
14:16:04 <croelandt> the first patch is a new feature but also provides support for the bug fix
14:16:22 <croelandt> I think Rajat has users affected by this in upstream Cinder, am I right?
14:16:30 <whoami-rajat> yes
14:16:59 <whoami-rajat> so most of the fixes on the glance cinder side, like multi-store support and format info support, are all dependent on this attachment API code
14:17:09 <abhishekk> I am also under the impression that if we have a crucial bug fix then we can backport supporting patches for it to stable branches
14:18:08 <abhishekk> and, I have seen some similar kinds of backports upstream in the past (not for glance though)
14:18:32 <croelandt> jokke_: what do you think?
14:18:54 <whoami-rajat> I've already backported cinder side changes and they're already +2ed, so we won't have any issues on code side as far as I'm aware
14:19:34 <abhishekk> Can we also have opinion from some other stable maintainers ?
14:20:06 <jokke_> I think I already pointed out in the early phase of fixing these bugs that we should not have made the prevent-qcow2-on-NFS fix depend on the attachment change, as that is not really backportable under the policy.
14:21:16 <whoami-rajat> we can't do the qcow2 change without adding the new attachment API changes, it depends on the attachments get command
14:21:28 <whoami-rajat> s/command/api
14:21:40 <abhishekk> hmm, the policy suggests some corner cases as well
14:23:11 <jokke_> And I do know and understand that we will be backporting these downstream anyway, but that's a totally different story. As for the upstream backport, all the refactoring, new dependencies etc. of that attachment API make it a very dodgy backport
14:23:59 <croelandt> Until when are we gonna be backporting stuff in wallaby?
14:24:03 <croelandt> This might not be an issue for long :D
14:24:28 <jokke_> whoami-rajat: I think we could have done that by looking at the volume connection we get and looking at the image file we have. There was no need for the attachment API to figure out that we have a qcow2+NFS combo that we cannot support
14:25:53 <jokke_> croelandt: wallaby is in active stable maintenance still for another ~8months
14:26:13 <abhishekk> I think we should get opinion from stable team as well
14:26:33 <croelandt> jokke_: is it likely that backporting these patches is going to be an issue in the next 8 months?
14:26:53 <whoami-rajat> jokke_, the initialize_connection call doesn't return the right format; the feature I implemented on the cinder side was discussed during the PTG and it was decided to include the format in connection_info in the new attachment API response
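(A rough illustration of the dependency being debated, assuming python-cinderclient's attachment API (attachments.show, available with microversion 3.27+) and a 'format' key in connection_info as per the PTG decision mentioned above. This is not the actual glance_store code; the helper name and the 3.54 microversion are only illustrative.)

```python
# Illustrative only: how a qcow2-on-NFS check could rely on the new
# attachment API instead of initialize_connection. Assumes the backend
# reports 'format' in connection_info (per the PTG decision above);
# the helper name and chosen microversion are hypothetical.
from cinderclient import client as cinder_client

def needs_qcow2_on_nfs_workaround(session, attachment_id):
    """Return True if the attached volume is qcow2 sitting on an NFS backend."""
    cinder = cinder_client.Client('3.54', session=session)
    attachment = cinder.attachments.show(attachment_id)
    conn_info = attachment.connection_info or {}
    return (conn_info.get('driver_volume_type') == 'nfs'
            and conn_info.get('format') == 'qcow2')
```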
14:26:55 <abhishekk> Unfortunately we have a couple of requests for it from other customers, but we will stick to the policy if we need to
14:27:17 <abhishekk> we had lengthy discussion for that
14:28:04 <croelandt> abhishekk: so, how do we make our decision?
14:28:11 <dansmith> I'm not on glance stable,
14:28:37 <dansmith> but I definitely opt for less backporting in general, and definitely extreme caution over anything complex unless it's absolutely necessary
14:29:42 <dansmith> I think I looked at this before briefly, and I don't remember all the details, but anything that requires glance and cinder things to be backported is high risk for breaking people that don't upgrade both services in lockstep unless both sides are fully tolerant (and tested that way) of one happening before the other
14:30:03 <dansmith> downstream we can handle that testing and risk (and support if it breaks) but it's not really appropriate material for upstream stable in general, IMHO
14:31:00 <abhishekk> croelandt, I think we need some opinion from other stable maintainers as well
14:31:20 <abhishekk> but what dansmith has said now, this might be problematic in case of upgrade
14:31:43 <rosmaita> sorry i'm late
14:31:44 <jokke_> dansmith: yeah, that kind of can be flagged in the requirements, which we don't currently do. But in general there are just too many red flags. It's not just one or two of our stable rules that this specific case is crossing
14:31:50 <jokke_> hi rosmaita \o
14:32:03 <jokke_> just in time for the attachment api backport discussion
14:32:08 <rosmaita> ah
14:32:28 <dansmith> jokke_: requirements.txt you mean? that has nothing to do with what is installed on other servers in a cluster, and certainly no direct impact on distro packages
14:32:30 <abhishekk> I think rosmaita has been a stable member for a very long time
14:32:39 <rosmaita> too long
14:32:42 <rosmaita> and not very stable
14:32:49 <abhishekk> :D
14:32:52 <croelandt> *badum tss*
14:33:03 <rosmaita> is there a place i can read the scrollback?
14:33:08 <abhishekk> So just to give you short overview
14:33:15 <rosmaita> i think the logs don't get published until the meeting ends
14:33:23 <rosmaita> ok, short overview is good
14:33:28 <jokke_> dansmith: well sure, not the service. I was more thinking of cinderclient and os_brick needing to be able to do the right thing anyways
14:33:33 <abhishekk> we have one bug fix to backport which depends on a patch that was implemented as a feature
14:33:53 <abhishekk> #link https://review.opendev.org/c/openstack/glance_store/+/805927
14:33:57 <abhishekk> this is actual bug fix
14:34:05 <abhishekk> #link https://review.opendev.org/c/openstack/glance_store/+/805926
14:34:22 <abhishekk> this is the dependent patch which is needed for the above backport
14:34:43 <abhishekk> I am pro this backport because I thought:
14:35:00 <abhishekk> the change is related to the cinder driver and will not affect any other glance backend drivers
14:35:03 <jokke_> rosmaita: basically the qcow2+NFS fix was implemented in a way that depends on the attachment API support, which makes it a problematic backport due to it introducing a new dependency, depending on cinder-side backports, and refactoring a significant amount of the driver code
14:35:45 <abhishekk> and in the past for some other projects I have seen these kinds of backports being supported
14:36:32 <rosmaita> well, rajat described the cinder attitude to driver backports very clearly in his last comment on https://review.opendev.org/c/openstack/glance_store/+/805926
14:37:49 <abhishekk> yes
14:37:54 <rosmaita> our view is that people actually use the drivers, and it's a big ask to make them upgrade their entire cloud to a new release, rather than update within their current release
14:38:39 <dansmith> you could apply that reasoning to any feature backport for a buddy right?
14:38:53 <rosmaita> not really
14:38:59 <dansmith> "My buddy doesn't want to upgrade but does want this one feature, so we're going to backport so he doesn't have to upgrade?"
14:39:00 <rosmaita> if it impacts main cinder code, we don't do it
14:39:30 <rosmaita> the difference is that it's isolated to a single driver
14:39:31 <jokke_> dansmith: that's why we backport in downstream like there is no tomorrow
14:39:34 <dansmith> you can apply this to a driver or the main code, either yields the same result for me :)
14:39:38 <dansmith> jokke_: exactly
14:39:54 <croelandt> jokke_: true that
14:40:08 <abhishekk> croelandt, I think downstream it is then
14:40:10 <croelandt> jokke_: I'm gonna start backporting every patch, at this rate
14:40:54 <abhishekk> I can understand the feeling
14:41:16 <jokke_> And I personally think that downstream that is a business decision with an attached commitment to support any problems it brings. Upstream we should be very cautious about what we backport as we do not have similar control of the environment
14:41:35 <dansmith> this ^
14:41:58 <abhishekk> ack
14:42:37 <abhishekk> any counter arguments to this?
14:43:02 <abhishekk> ok, moving ahead then
14:43:14 <abhishekk> #topic Holiday plans
14:43:37 <abhishekk> croelandt is going on a 2-week holiday from Monday
14:43:43 <abhishekk> and jokke_ for 1 week
14:44:09 <rosmaita> slackers!
14:44:12 <abhishekk> is any other core member planning to take time off during the same period?
14:44:24 <croelandt> rosmaita: hey, people died so that I could have PTO
14:44:32 <croelandt> I'm glad they did not die so I could eat Brussel sprouts
14:44:59 <jokke_> rosmaita: I have an excuse. HR and the Irish gov will rain proper shaitstorm on me and my manager if I don't use my holidays. so there's that :P
14:45:02 <abhishekk> those two weeks we are going to have ninja time I guess
14:45:13 <rosmaita> i'm just jealous, that's all
14:45:18 <abhishekk> ++
14:45:34 <dansmith> abhishekk: I have no such plans at the moment
14:45:43 <abhishekk> great
14:45:59 <abhishekk> me neither
14:46:02 <croelandt> rosmaita: Unionize, comrade
14:46:13 <jokke_> LOL
14:46:27 <rosmaita> is RH unionized in France?
14:46:37 <abhishekk> and on top of that jokke_ will send me pictures of his tent and beer
14:47:10 <abhishekk> I guess that's it from me for today
14:47:12 <jokke_> Tovarich Cyril :D
14:47:17 <abhishekk> moving to open discussion
14:47:22 <abhishekk> #topic Open discussion
14:47:27 <abhishekk> Nothing from me
14:47:32 <rajiv> Hi, sorry to divert from the holiday mood. Firstly, thanks for merging/commenting on a few of my bugs. This one is still pending: https://bugs.launchpad.net/swift/+bug/1899495 -- any update on this?
14:47:36 <jokke_> I can be inclusive and send the pictures to rest of ye too!
14:47:53 <abhishekk> Happy holidays croelandt and jokke_
14:48:10 <rosmaita> jokke_: what's your current beer total on that app?
14:48:32 <abhishekk> rajiv, I think everyone is busy at the moment on M3 priorities as it is just around the corner
14:48:38 <jokke_> rajiv: like I mentioned that is tricky one from Glance point of view. And I have couple of others under works still too
14:49:11 <rajiv> okay, I would like to understand how the glance-api process consumes memory. For example, different image types and sizes result in different glance-api memory consumption.
14:49:12 <jokke_> rajiv: I'll be back chasing those once I'm back after next week.
14:49:13 <croelandt> rosmaita: some employees are in a union
14:49:18 * croelandt is not :-(
14:49:29 <abhishekk> we can make ours
14:49:37 <rosmaita> rajiv: i think you get a 409 on a container delete if it's not empty
14:49:40 <abhishekk> glance union
14:50:13 <rosmaita> not sure that's a helpful operation
14:50:18 <rajiv> rosmaita: the swift container isn't empty; since the deletion goes in parallel, a conflict occurs.
14:50:23 <rosmaita> i mean, "observation"
14:50:40 <rajiv> we introduced retries but it didn't help either.
14:50:47 <abhishekk> rosmaita, is cinder hitting any tempest failures in gate at the moment ?
14:50:59 <rosmaita> abhishekk: i sure hope not
14:51:02 <rajiv> today, i had an user upload 20 images in parallel and glance-api crashed.
14:51:06 <rosmaita> i will look
14:51:15 <abhishekk> rosmaita, ack, please let me know
14:52:09 <rajiv> rosmaita: jokke_ abhishekk any suggestion on my second question ?
14:52:11 <jokke_> rajiv: I saw your question in #os-glance ... so the memory consumption is very tricky to predict. There is quite a bit of buffering involved as the data actually passes through the API service, and like you saw there might be lots of data in buffers and caches when you have lots of concurrent connections in flight
14:52:46 <rajiv> jokke_: yes, i raised it there as well but had no response, hence i asked here.
14:53:01 <jokke_> rajiv: yeah, you were gone by the time I saw it :D
14:53:40 <rajiv> initially I set 3GB as the limit, but setting 5GB for 20 image uploads still chokes the glance-api process and sometimes the process gets killed
14:54:11 <rosmaita> abhishekk: just got an tempest-integrated-storage failure on tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive
14:54:15 <jokke_> In general this is one of those things where we've seen busy production clouds having a set of decent dedicated servers for glance-api alone, as it can be quite taxing
14:54:18 <rajiv> hence the upload terminates and sends back an HTTP 502, and we have to manually delete the chunks in the swift container as well.
14:54:27 <rosmaita> just that one test, though
14:54:46 <abhishekk> rosmaita, yeah, I am hitting that three times since morning
14:54:58 <rosmaita> that same test?
14:54:58 <abhishekk> s/I am/I hit
14:55:06 <abhishekk> yeah
14:55:27 <jokke_> rajiv: yeah, I don't think we ever designed any part of the service with real memory capping in mind. So I think setting limits for it will eventually lead to that same situation again
14:55:47 <rajiv> jokke_: okay, is there a doc or code i can refer ?
14:56:18 <rajiv> to understand how memory consumption works ? or a pattern ?
14:56:30 <jokke_> rajiv: it's just a matter of whether it's 5 concurrent operations, 20 or 30. But you will eventually hit your artificial limit and get the service killed
14:56:37 <rosmaita> abhishekk: it did pass on a patch that depended on the one with the failure (which of course got a -2 from Zuul because the dependency didn't merge)
14:56:39 <abhishekk> rosmaita, not same test, mine is resize related
14:56:51 <rajiv> the image being uploaded was ~900GB; out of the 20, only 3 images were created.
14:57:02 <abhishekk> https://52b5fef6b4a63a70ea73-b7be325c2c973618eb7074df9913ea2c.ssl.cf5.rackcdn.com/799636/21/check/tempest-integrated-storage/91789c0/testr_results.html
14:57:10 <jokke_> rajiv: I can't recall us having any documentation about that.
14:57:11 <rosmaita> ok, mine was a server-failed-to-delete problem
14:57:27 <jokke_> rajiv: the main limiting factor is chunk size
14:57:33 <abhishekk> rosmaita, ack
14:57:40 <abhishekk> thank you
14:57:54 <rajiv> the chunk size is 200MB, and enabling buffering did not help either.
14:58:01 <abhishekk> 3 minutes to go
14:58:05 <abhishekk> 2
14:58:46 <jokke_> so each of the greenlet worker threads will eat some memory, and while you have data transfers in flight there are obviously network buffers involved, but really the chunking is the main limiting factor that keeps the API from just caching your whole 900 gigs into memory if the storage is slow :D
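(A back-of-the-envelope sketch of the point jokke_ is making: peak memory scales roughly with concurrent uploads times chunk size plus per-worker overhead. The 200MB chunk size and 20 uploads come from the discussion above; the per-worker overhead figure is a made-up placeholder.)

```python
# Rough, illustrative estimate only; the per-worker overhead value is a guess.
CHUNK_SIZE_MB = 200          # swift chunk size mentioned by rajiv
CONCURRENT_UPLOADS = 20      # parallel image uploads in rajiv's scenario
PER_WORKER_OVERHEAD_MB = 50  # hypothetical baseline per greenlet/worker

peak_mb = CONCURRENT_UPLOADS * (CHUNK_SIZE_MB + PER_WORKER_OVERHEAD_MB)
print(f"~{peak_mb / 1024:.1f} GiB peak just for in-flight upload buffers")
# => ~4.9 GiB, in the same ballpark as the 5GB limit that is being hit.
```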
14:59:09 <rosmaita> hmmm, looks like it never got to the point where it could attach a volume, couldn't ssh into the vm
14:59:23 <jokke_> rajiv: we can continue on #os-glance as we're running out of time if you prefer
14:59:33 <rajiv> sure, switching over.
14:59:36 <abhishekk> rosmaita, yes
14:59:44 <abhishekk> thank you all
14:59:48 <abhishekk> have a nice weekend
14:59:54 <rosmaita> i think it's just one of those random failures
14:59:55 <jokke_> thanks everyone
15:00:13 <abhishekk> #endmeeting