14:00:09 #startmeeting glance
14:00:09 Meeting started Thu Aug 26 14:00:09 2021 UTC and is due to finish in 60 minutes. The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:09 The meeting name has been set to 'glance'
14:00:15 #topic roll call
14:00:20 #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:21 o/
14:00:25 o/
14:01:01 let's wait a couple of minutes for others to show
14:01:41 I doubt rosmaita will join us today
14:01:55 o/
14:02:01 Let's start
14:02:09 #topic release/periodic jobs update
14:02:19 M3 next week, but we will tag it a week after M3
14:02:29 i.e. the week after next
14:02:48 so we still have around 6 working days to get things done
14:03:11 python-glanceclient needs to be tagged next week though
14:03:33 I will put up a release patch around Sept 01 for the same
14:03:36 o/
14:04:02 Surprisingly, no periodic job timeouts for the last 3 days
14:04:06 all green at the moment
14:04:36 #topic M3 targets
14:04:42 Glance Xena 3 review dashboard - https://tinyurl.com/glance-xena-3
14:04:57 Most of the policy patches are merged and the remaining ones are approved
14:05:21 due to heavy traffic in the gate we are facing some unusual failures; I will keep a watch on them
14:05:55 I'd say usual milestone failures :)
14:05:55 Thank you croelandt and dansmith and lance for reviewing these patches on priority
14:06:18 Every cycle the same thing. Just like every year the winter surprises the Finns :D
14:06:19 again and again at a crucial time
14:06:38 Cache API - Still under review - FFE required?
14:07:01 There are some comments on tests and some doc changes need to be made
14:07:22 we need to mention the header which we are using to clear the cached and queued images in the docs and API reference as well
14:07:40 I have some more draft comments I'm still mulling, I will move those to the latest PS or drop them accordingly
14:07:53 I fixed the comments and the output of clear_cache as Dan kindly pointed out that it was very silly behaviour
14:08:04 ack
14:08:31 We will revisit the progress next week and decide on granting the FFE for the same
14:09:18 Any questions ?
14:09:30 Same goes for metadef project persona
14:09:53 Just FYI I'll be on PTO next week.
I'd say the debate on the tests is a great opportunity for a follow-up patch after FF, as that API change needs to merge so we can get the client patch in before it needs to be released
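[A minimal sketch of driving the cache API discussed above with plain HTTP. This is not the merged implementation: the /v2/cache path and the x-image-cache-clear-target header name are taken from the patch under review and may change, and the "omit header clears both" semantics is an assumption that the pending doc/api-ref change should pin down.]

    # Sketch only; endpoint, token and header name are assumptions.
    import requests

    GLANCE = "http://glance-api:9292"      # assumed glance-api endpoint
    HEADERS = {"X-Auth-Token": "<token>"}  # placeholder keystone token

    # queue an image for caching
    requests.put(f"{GLANCE}/v2/cache/<image-id>", headers=HEADERS).raise_for_status()

    # clear only the queued images; omitting the header would clear both
    # cached and queued images (assumed semantics from the review)
    clear_headers = dict(HEADERS, **{"x-image-cache-clear-target": "queue"})
    requests.delete(f"{GLANCE}/v2/cache", headers=clear_headers).raise_for_status()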
14:09:54 Patches are now open for review and we have good functional test coverage there to ensure the new RBAC behavior
14:10:32 so an FFE for that specific work is not great as it needs the client side
14:10:59 ack, I haven't had enough time to have a look at the new patch set or the other review comments
14:11:47 In your absence I will work on that
14:12:43 Coming back to RBAC metadef, we are working on glance-tempest-plugin protection testing and that will be up and complete by tomorrow
14:13:28 But I think the functional coverage on the glance side is good and we can consider those changes for M3
14:14:38 I will apply for an FFE for the same if this is not merged before M3
14:14:42 Moving ahead
14:15:01 #topic Wallaby backports
14:15:20 whoami-rajat: ^
14:15:40 So, these 2 backports in the agenda are part of a huge bug fix that includes Cinder patches as well
14:15:46 hi
14:15:55 We were under the impression it was ok to backport in Wallaby
14:16:04 the first patch is a new feature but also support for the bug fix
14:16:22 I think Rajat has users affected by this in upstream Cinder, am I right?
14:16:30 yes
14:16:59 so most of the fixes on the glance cinder side, like multi-store support and format info support, are all dependent on this attachment API code
14:17:09 I am also under the impression that if we have a crucial bug fix then we can backport supporting patches for it to stable branches
14:18:08 and, I have seen some similar kinds of backports upstream in the past (not for glance though)
14:18:32 jokke_: what do you think?
14:18:54 I've already backported the cinder side changes and they're already +2ed, so we won't have any issues on the code side as far as I'm aware
14:19:34 Can we also get an opinion from some other stable maintainers ?
14:20:06 I think I already pointed out in the early phase of fixing these bugs that we should not have made the prevent-qcow2-on-NFS fix depend on the attachment change, as that is not really backportable per the policy.
14:21:16 we can't do the qcow2 change without adding the new attachment API changes, it depends on the attachments get command
14:21:28 s/command/api
14:21:40 hmm, the policy suggests some corner cases as well
14:23:11 And I do know and understand that we will be backporting these downstream anyway, but that's a totally different story. As for the upstream backport, all the refactoring, new dependencies etc. of that attachment API make it a very dodgy backport
14:23:59 Until when are we gonna be backporting stuff in wallaby?
14:24:03 This might not be an issue for long :D
14:24:28 whoami-rajat: I think we could have done that by looking at the volume connection we get and looking at the image file we have. There was no need for the attachment API to figure out that we have a combo of qcow2+NFS that we cannot support
14:25:53 croelandt: wallaby is in active stable maintenance still for another ~8 months
14:26:13 I think we should get an opinion from the stable team as well
14:26:33 jokke_: is it likely that backporting these patches is going to be an issue in the next 8 months?
14:26:53 jokke_, the initialize_connection call doesn't return the right format; the feature i implemented on the cinder side was discussed during the PTG and it was decided to include the format in connection_info in the new attachment API response
14:26:55 Unfortunately we have a couple of requests for it from other customers but we will stick to the policy if we need to
14:27:17 we had a lengthy discussion about that
14:28:04 abhishekk: so, how do we make our decision?
14:28:11 I'm not on glance stable,
14:28:37 but I definitely opt for less backporting in general, and definitely extreme caution over anything complex unless it's absolutely necessary
14:29:42 I think I looked at this before briefly, and I don't remember all the details, but anything that requires glance and cinder things to be backported is high risk for breaking people that don't upgrade both services in lockstep unless both sides are fully tolerant (and tested that way) of one happening before the other
14:30:03 downstream we can handle that testing and risk (and support if it breaks) but it's not really appropriate material for upstream stable in general, IMHO
14:31:00 croelandt, I think we need some opinion from other stable maintainers as well
14:31:20 but given what dansmith has said now, this might be problematic in case of an upgrade
14:31:43 sorry i'm late
14:31:44 dansmith: yeah, that kind of thing can be flagged in the requirements, which we don't currently do. But in general there are just too many red flags. It's not just one or two of our stable rules that this specific case is crossing
14:31:50 hi rosmaita \o
14:32:03 just in time for the attachment api backport discussion
14:32:08 ah
14:32:28 jokke_: requirements.txt you mean? that has nothing to do with what is installed on other servers in a cluster, and certainly no direct impact on distro packages
14:32:30 I think rosmaita has been a stable member for a very long time
14:32:39 too long
14:32:42 and not very stable
14:32:49 :D
14:32:52 *badum tss*
14:33:03 is there a place i can read the scrollback?
14:33:08 So just to give you a short overview
14:33:15 i think the logs don't get published until the meeting ends
14:33:23 ok, short overview is good
14:33:28 dansmith: well sure, not the service. I was more thinking of cinderclient and os_brick needing to be able to do the right thing anyway
14:33:33 we have one bug fix to backport which depends on a patch that was implemented as a feature
14:33:53 #link https://review.opendev.org/c/openstack/glance_store/+/805927
14:33:57 this is the actual bug fix
14:34:05 #link https://review.opendev.org/c/openstack/glance_store/+/805926
14:34:22 this is the dependent patch which is needed for the above backport
14:34:43 I am pro this backport because I thought:
14:35:00 the change is related to the cinder driver and will not affect any other glance backend drivers
14:35:03 rosmaita: basically the qcow2+NFS was implemented in a way that it depends on the attachment API support. Which is a problematic backport due to it introducing a new dependency, depending on cinder side backports, and refactoring a significant amount of the driver code
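[For context on why the qcow2-on-NFS fix leans on the attachment API: per the exchange above, the new attachment response carries the volume format in connection_info, which initialize_connection does not expose. A purely illustrative sketch of that check follows; it is not the glance_store patch itself, and the "format" key name is an assumption.]

    # Illustrative only -- not the actual glance_store code. Once the
    # attachment API response exposes the volume format in connection_info,
    # the cinder store can refuse the unsupported qcow2-on-NFS combination
    # before writing any image data.
    def reject_qcow2_on_nfs(connection_info):
        vol_type = connection_info.get("driver_volume_type")
        vol_format = connection_info.get("format", "raw")  # assumed key name
        if vol_type == "nfs" and vol_format == "qcow2":
            raise RuntimeError("qcow2 volumes on an NFS backend are not "
                               "supported by the glance cinder store")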
14:35:45 and in the past, for some other projects, I have seen these kinds of backports being supported
14:36:32 well, rajat described the cinder attitude to driver backports very clearly in his last comment on https://review.opendev.org/c/openstack/glance_store/+/805926
14:37:49 yes
14:37:54 our view is that people actually use the drivers, and it's a big ask to make them upgrade their entire cloud to a new release, rather than update within their current release
14:38:39 you could apply that reasoning to any feature backport for a buddy right?
14:38:53 not really
14:38:59 "My buddy doesn't want to upgrade but does want this one feature, so we're going to backport so he doesn't have to upgrade?"
14:39:00 if it impacts main cinder code, we don't do it
14:39:30 the difference is that it's isolated to a single driver
14:39:31 dansmith: that's why we backport in downstream like there is no tomorrow
14:39:34 you can apply this to a driver or the main code, either yields the same result for me :)
14:39:38 jokke_: exactly
14:39:54 jokke_: true that
14:40:08 croelandt, I think downstream it is then
14:40:10 jokke_: I'm gonna start backporting every patch, at this rate
14:40:54 I can understand the feeling
14:41:16 And I personally think that downstream that is a business decision with an attached commitment to support any problems it brings. Upstream we should be very cautious about what we backport as we do not have similar control of the environment
14:41:35 this ^
14:41:58 ack
14:42:37 any counter arguments to this?
14:43:02 ok, moving ahead then
14:43:14 #topic Holiday plans
14:43:37 croelandt is going on a 2 week holiday from Monday
14:43:43 and jokke_ is off for 1 week
14:44:09 slackers!
14:44:12 is any other core member planning to take time off during the same period ?
14:44:24 rosmaita: hey, people died so that I could have PTO
14:44:32 I'm glad they did not die so I could eat Brussels sprouts
14:44:59 rosmaita: I have an excuse. HR and the Irish gov will rain a proper shaitstorm on me and my manager if I don't use my holidays. so there's that :P
14:45:02 those two weeks we are going to have ninja time I guess
14:45:13 i'm just jealous, that's all
14:45:18 ++
14:45:34 abhishekk: I have no such plans at the moment
14:45:43 great
14:45:59 me neither
14:46:02 rosmaita: Unionize, comrade
14:46:13 LOL
14:46:27 is RH unionized in France?
14:46:37 and on top of that jokke_ will send me pictures of his tent and beer
14:47:10 I guess that's it from me for today
14:47:12 Tovarich Cyril :D
14:47:17 moving to open discussion
14:47:22 #topic Open discussion
14:47:27 Nothing from me
14:47:32 Hi, sorry to divert from the holiday mood; firstly, thanks for merging/commenting on a few of my bugs. The one below is still pending: https://bugs.launchpad.net/swift/+bug/1899495 any update on this ?
14:47:36 I can be inclusive and send the pictures to the rest of ye too!
14:47:53 Happy holidays croelandt and jokke_
14:48:10 jokke_: what's your current beer total on that app?
14:48:32 rajiv, I think everyone is busy at the moment on M3 priorities as it is just around the corner
14:48:38 rajiv: like I mentioned, that is a tricky one from the Glance point of view. And I have a couple of others still in the works too
14:49:11 okay, i would like to understand how the glance-api process consumes memory ? for example, different image types and sizes result in different glance-api process memory consumption.
14:49:12 rajiv: I'll be back chasing those once I'm back after next week.
14:49:13 rosmaita: some employees are in a union
14:49:18 * croelandt is not :-(
14:49:29 we can make our own
14:49:37 rajiv: i think you get a 409 on a container delete if it's not empty
14:49:40 glance union
14:50:13 not sure that's a helpful operation
14:50:18 rosmaita: the swift container isn't empty; since the deletion goes in parallel, a conflict occurs.
14:50:23 i mean, "observation"
14:50:40 we introduced retries but it didn't help either.
14:50:47 rosmaita, is cinder hitting any tempest failures in the gate at the moment ?
14:50:59 abhishekk: i sure hope not
14:51:02 today, i had a user upload 20 images in parallel and glance-api crashed.
14:51:06 i will look
14:51:15 rosmaita, ack, please let me know
14:52:09 rosmaita: jokke_ abhishekk any suggestion on my second question ?
14:52:11 rajiv: I saw your question in #os-glance ... so the memory consumption is very tricky to predict. There is quite a bit of buffering involved as the data actually passes through the API service, and like you saw there might be lots of data in buffers and caches when you have lots of concurrent connections in flight
14:52:46 jokke_: yes, i raised it there as well but had no response, hence i asked here.
14:53:01 rajiv: yeah, you were gone by the time I saw it :D
14:53:40 initially i set 3GB as the limit, but setting 5GB for 20 image uploads still chokes the glance-api process and sometimes the process gets killed
14:54:11 abhishekk: just got a tempest-integrated-storage failure on tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive
14:54:15 In general this is one of those things we've seen in busy production clouds having a set of decent dedicated servers for g-api alone, as it can be quite taxing
14:54:18 hence the upload terminates and sends back an HTTP 502, and we have to manually delete the chunks in the swift container as well.
14:54:27 just that one test, though
14:54:46 rosmaita, yeah, I am hitting that three times since morning
14:54:58 that same test?
14:54:58 s/I am/I hit
14:55:06 yeah
14:55:27 rajiv: yeah, I don't think we ever designed any part of the service for real capping. So I think you setting limits for it will eventually lead to that same situation again
14:55:47 jokke_: okay, is there a doc or code i can refer to ?
14:56:18 to understand how memory consumption works ? or a pattern ?
14:56:30 rajiv: it's just a matter of whether it's 5 concurrent operations, 20 or 30. But you will eventually hit your artificial limit and get the service killed
14:56:37 abhishekk: it did pass on a patch that depended on the one with the failure (which of course got a -2 from Zuul because the dependency didn't merge)
14:56:39 rosmaita, not the same test, mine is resize related
14:56:51 the image being uploaded was ~900GB; among 20, only 3 images were created.
14:57:02 https://52b5fef6b4a63a70ea73-b7be325c2c973618eb7074df9913ea2c.ssl.cf5.rackcdn.com/799636/21/check/tempest-integrated-storage/91789c0/testr_results.html
14:57:10 rajiv: I can't recall us having any documentation about that.
14:57:11 ok, mine was a server-failed-to-delete problem
14:57:27 rajiv: the main limiting factor is chunk size
14:57:33 rosmaita, ack
14:57:40 thank you
14:57:54 the chunk size is 200MB, and enabling buffering did not help either.
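[To put rough numbers on the buffering described above: with a 200MB chunk size and 20 parallel uploads, the in-flight chunks alone approach the 5GB cap, before counting Python overhead, greenlet stacks or any extra buffering. A back-of-envelope sketch; the one-buffered-chunk-per-upload factor is an assumption, not a measured value.]

    # Back-of-envelope estimate only; BUFFERS_PER_UPLOAD is an assumption.
    CHUNK_SIZE_MB = 200        # store chunk size mentioned above
    CONCURRENT_UPLOADS = 20    # the parallel-upload scenario that killed glance-api
    BUFFERS_PER_UPLOAD = 1     # at least one chunk buffered per in-flight upload

    peak_gb = CHUNK_SIZE_MB * CONCURRENT_UPLOADS * BUFFERS_PER_UPLOAD / 1024.0
    print(f"~{peak_gb:.1f} GB just for in-flight chunks")   # ~3.9 GB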
14:58:01 3 minutes to go
14:58:05 2
14:58:46 so each of the greenlet worker threads will eat some memory, and while you have data transfers in flight there are obviously the network buffers involved, but really the chunking is the main limiting factor keeping the API from just caching your whole 900 gigs into memory if the storage is slow :D
14:59:09 hmmm, looks like it never got to the point where it could attach a volume, couldn't ssh into the vm
14:59:23 rajiv: we can continue on #os-glance as we're running out of time, if you prefer
14:59:33 sure, switching over.
14:59:36 rosmaita, yes
14:59:44 thank you all
14:59:48 have a nice weekend
14:59:54 i think it's just one of those random failures
14:59:55 thanks everyone
15:00:13 #endmeeting