14:01:11 #startmeeting glance
14:01:12 Meeting started Thu Jun 25 14:01:11 2020 UTC and is due to finish in 60 minutes. The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:14 #topic roll call
14:01:15 The meeting name has been set to 'glance'
14:01:20 #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:01:23 o/
14:01:26 o/
14:01:36 o/
14:01:50 lets wait 2 minutes for others
14:02:31 o/
14:02:36 lets start
14:02:45 We have dansmith today :D
14:02:55 #topic release/periodic jobs update
14:03:04 glance_store and python-glanceclient released for V1 milestone
14:03:13 * mordred waves
14:03:14 thanks to jokke and smcginnis for taking care of this
14:03:28 mordred, as well
14:04:16 V2 is 4 weeks away from now and we have a handful of specs to get reviewed
14:04:37 Kindly focus on reviewing the specs
14:05:10 Regarding periodic job, we have 1 time out this week, else were green
14:05:39 Moving ahead
14:05:45 #topic Specs reviews
14:05:51 sparse image upload - https://review.opendev.org/733157
14:05:51 Unified limits - https://review.opendev.org/729187
14:05:51 Image encryption - https://review.opendev.org/609667
14:05:51 Cinder store multiple stores support - https://review.opendev.org/695152
14:06:06 Apart from these we have Duplicate downloads spec as well
14:06:29 #link https://review.opendev.org/734683
14:07:14 Please review at top priority
14:07:42 Moving to next topic
14:07:54 ok
14:07:55 Hi, any news about the next steps about this spec for sparse upload ? https://review.opendev.org/#/c/733157/
14:08:21 alistarle, there are some comments on the specs, please go through those
14:08:40 jokke, has added one suggestion on it
14:08:41 Yes I see the one of erno, but I didn't understand yours
14:09:09 because I have added all the things we discussed during the PTG
14:09:12 You had provided two ways to do sparse upload in the specs under proposed section
14:09:49 I am saying keep what we agreed on in Proposed section and another one in Alternative section
14:10:08 Yes because there are two successive optimisations, do you want me to split it in two different specs ?
14:10:19 the two proposals will be implemented
14:10:37 abhishekk: There is read part of it and write part of it
14:11:00 jokke, ack
14:11:31 alistarle, I will revisit my comment and update accordingly
14:11:37 abhishekk: the read part is if we read everything from staging or if we try to take the advantage of the FS call that gives us holes directly
14:12:16 And Erno, do you want me to rename the option to "enable_thin_provisionning" ?
14:12:19 jokke, got it
14:12:54 In my understanding of your comment, this flip should enable all optimizations, not only sparse upload right ?
14:14:00 alistarle: if it's not too much trouble that would be great. Based on the feedback from the Ceph engineering, it would make sense to avoid sending all the 0s over the wire in either case so calling it thin provisioning would be more descriptive for the deployers/admins
14:15:08 alistarle: so for your spec it will be the sparse upload, what we should look at as followup is to use the ceph rewrite buffer for those who don't want the images being thin provisioned
14:15:41 which allows us to send like few kB of zeros over the line and tell ceph "Write this 200 000 times"
14:16:22 alistarle, jokke should we continue this in Open Discussion?
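[editor's note] The sparse upload idea discussed above (do not send runs of zeros over the wire) can be illustrated with a minimal Python sketch. This is not the code from the spec under review: the function name iter_nonzero_chunks and the 4-byte chunk size in the example are hypothetical, chosen only for illustration. It reads a stream in fixed-size chunks and yields only the chunks that contain data, together with their offsets, so a writer could place them at the right position and leave the gaps as holes.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_nonzero_chunks(
    stream: BinaryIO, chunk_size: int
) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) for every chunk that is not entirely zeros.

    Hypothetical helper: all-zero chunks are skipped, so the caller never
    transfers them and the destination can stay thin provisioned.
    """
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # bytes.count(0) counts zero bytes; equal to len() means "all zeros".
        if chunk.count(0) != len(chunk):
            yield offset, chunk
        offset += len(chunk)


# Example: 4 bytes of data, 8 bytes of zeros, 4 bytes of data.
image = io.BytesIO(b"AAAA" + b"\x00" * 8 + b"BBBB")
extents = list(iter_nonzero_chunks(image, chunk_size=4))
```

With this input only two of the four chunks are yielded. A real store driver would write each yielded chunk at its offset and set the final image size separately so that trailing zeros are not lost.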
14:16:32 abhishekk: sure, sorry
14:16:41 jokke, no problem
14:17:00 moving ahead
14:17:12 #topic cross tenant/user copy image authorization
14:17:40 So, I'm working on making nova able to use the copy-image functionality to make sure a user's image is copied to a local rbd store for remote compute nodes,
14:17:59 and one of the big things that I think doesn't quite fit is that only images the user owns directly can be copied
14:18:35 which is fine for some circumstances, but not for others.. there's one pretty serious bug, which is that we get no indication that we were disallowed from copying, which is fixed by this early auth check, which I think we need to merge regardless: https://review.opendev.org/#/c/737548/
14:19:17 however, I would also like some way to grant a user the ability to copy images that aren't theirs, i.e. public images that aren't charged to any specific paying user and that just need to be copied when they are deployed to a store at a remote site with computes for the first time
14:19:43 one way to do that is a special property on the image, that either glance sees as relaxing the auth check, or that nova sees and knows to use admin credentials to do the copy
14:20:06 however, I think the better plan would be to do this in policy somehow, so that users of a specific role or relationship can copy images
14:20:24 so something like "if it is shared with me via member list, then I'm allowed to copy it" or "if this image is public, then allow it to be copied" etc
14:20:56 jokke, rosmaita AFAIK we have policy checks at different layer, right?
14:21:00 so I'm looking for thoughts on the policy approach vs. the property hack
14:22:32 it seems to me that through some refactoring, we could spawn the actual copy thread with an admin context, if the policy allows the user to copy, which would cut down on the refactoring, but I know that may be a little less palatable than refactoring any such checks further down
14:22:36 abhishekk: mostly, though there is that tasks-api policy that is at the api layer
14:22:56 If we limit this to public images I think the policy approach would be great. Then deployer could specify user or role which those copies would be allowed
14:23:13 if that needs to be happening in shared images as well, that gets messy quickly
14:23:39 jokke: what are the mechanical differences there?
14:25:55 dansmith: Making image public is behind policy wall already and what I have heard in most cases restricted to admins and image maintainers. Which likely is like you pointed out, not charged customer. Sharing image is much more flexed and then it might need indeed the owners approval for consuming extra storage rather than just admin call
14:26:48 but is the proposal that the user who initiated the copy would own the copy?
14:26:50 jokke: I understand the semantic differences, I meant.. if it's just policy checks in glance, are the shared images somehow more complex than the public ones?
14:27:06 rosmaita: the copy is just an additional location in the image, so no
14:27:50 If we start with just "owner or public" I think that's a major step forward, so I'm happy with that to start... I can just imagine someone being confused about why they can grant people copy access to public images, but not any image a user can see
14:28:05 dansmith: Just the fact that you need to check that the user is in the shared users list as in has actually right to see and consume the image
14:28:47 jokke: okay, but presumably that is already a routine somewhere since we have to do that check in order to show them the image?
14:29:18 can shared images be modified (i.e. metadata) by a share-ee or just downloaded?
14:29:40 dansmith: I think the question is at least as difficult politically if not more than technically. So the politics of public image are likely easier and cleaner to deal with when documented well
14:29:54 definitely understand that :)
14:29:57 dansmith: just consumed
14:30:27 jokke: okay, well, that's the same as the public case then, in terms of the effect you can have on an image
14:30:36 yup
14:30:37 what about community images?
14:31:13 rosmaita: personally, I would expect that I should be able to grant this functionality to any kind of image that a user can already see
14:31:31 so who pays for the storage?
14:31:42 rosmaita: I think politically we should put them into the basket of shared images in a sense that they are more likely to be charged on the owner for storage consumption
14:31:44 if you're a private cloud and not charging for image space, then you really want to grant a user the ability to use an image, which includes using that image in whatever remote edge site they can spin up instances in
14:32:51 public is good enough for a lot of those people, but if they want only some users to be able to spin up a sensitive image, then losing the ability to do the copy is unfortunate,
14:33:07 and may lead to people downloading and re-uploading the image so they have control over the copy, just using more space
14:33:30 dansmith: one thing to take into account with shared images is that it requires out of band communication between those two users already
14:33:58 Which means that the collaboration where that image should be located is not much overhead on that case
14:34:24 rosmaita: in case it's not clear, the case here is a cloud spread across a central DC and several edge sites, the edge sites having their own ceph. with this, nova can (if needed) copy your image from the central store to the appropriate edge store before booting your instance so you get fast clone, snapshot, etc ceph goodness
14:34:34 rosmaita: which right now is only possible for images you own
14:34:57 rosmaita: and if you have multiple things doing that in different tenants, you have to upload the image multiple times, one for each tenant in order for this to work, which sucks
14:35:17 i guess my worry is filling up the edge storage (which is likely to be smaller) if you leave this up to tenants to pick what should be at the edge if they don't have to pay for it
14:35:36 jokke: yep, understand, but if there's no technical way to grant that access then making an actual copy of the image and diverging is the only user-consumable solution
14:36:30 rosmaita: I agree, that's why I think we cannot just by default let anyone do this. And why I think policy where admin can grant that permission for the tenant trusted to behave and having need for it can do it
14:36:37 rosmaita: at the expense of duplicating every image for every tenant in the central site though
14:37:19 that's what we assumed while designing this
14:37:21 and of course, if that image duplication happens,
14:37:35 and there is that ^^ we have no per store policy either so anyone who can create an image, can do it to any store available
14:37:40 then the edge site gets even full-er because two tenants that should share the same base image, don't, since they're "different" images
14:39:10 anyway, if we can start with "public or owner" that's a big step forward
14:39:17 How about we start with limiting it to public images ?
14:39:30 and if the demand for finer grained sharing comes, then we can do that and I can collect beers :)
14:39:40 dansmith: I glanced through your early fail bug, which was very nice catch. I think controlled public image copy will get us much closer to decent user experience for both sides consumers and owners of those images
14:39:50 it will
14:39:55 And we can refine these in future if store user cases arise
14:40:02 yep, I'm good with that
14:40:21 s/store/strong/
14:40:31 We need a spec for this
14:40:34 if abhishekk and rosmaita are on board with this approach
14:40:49 I am fine with this approach
14:41:04 i'm not against it, but i have not been following along too closely
14:41:23 we still need to merge the early auth check patch of mine rather soon because right now nova will wait until timeout not knowing it wasn't allowed to copy an image
14:41:37 +1
14:41:50 and extend it with the expanded public stuff
14:42:09 dansmith: yep agreed, I just targeted it to Victoria and Ussuri as high priority ... that is not by any means Nova specific fault
14:42:22 Moving ahead as we have less time
14:42:31 thanks
14:42:42 #topic Copy Image race condition
14:42:57 #link https://bugs.launchpad.net/glance/+bug/1884596
14:42:57 Launchpad bug 1884596 in Glance "image import copy-to-store will start multiple importing threads due to race condition" [Critical,New] - Assigned to Abhishek Kekane (abhishek-kekane)
14:43:22 this is another bug we had found around copy-image operation
14:44:26 And just for the record this is just "copy-image" method specific as the image state transition catches it in the actual import jobs
14:44:58 This is valid race condition and needs to be addressed, as we have common staging area, two different API calls to same image will provide unexpected results
14:45:34 I am working with dansmith to solve this issue
14:45:40 it will also cause all kind of mess when they get to the point of location handling and try to add duplicate locations
14:45:55 yes
14:46:18 dansmith: thanks for good catch on this too
14:46:25 dansmith, has added one patch to update image property for image only once
14:46:35 #link https://review.opendev.org/737868
14:46:57 just got a couple tweaks from mike, which I will make, but otherwise that'll do it
14:47:09 please have a look as well, I need to build my solution around this patch
14:47:54 moving ahead
14:48:10 #topic openstackclient/sdk patches
14:48:18 rosmaita, this is you?
14:48:27 mordred: I think
14:48:49 ^^
14:48:52 i added them, but it's mordred's issue
14:48:59 these are patches to make osc able to do the import flow
14:49:08 which I've modified the devstack patch to use instead of glanceclient
14:49:22 still working on getting them tested against a real deployment, but latest rev is probably good
14:49:26 we'll know in a few hours
14:49:26 So this is again came to pace due to Dan's work
14:49:36 \o/
14:49:37 heya - yeah
14:50:03 mostly just wanted to let folks know they're there - they worked in my local testing, so hopefully dan's patch goes green now
14:50:15 will have a look at those patches
14:50:45 #topic Open discussion
14:50:46 it won't go green for other reasons like the image sharing thing, but.. it should go red later :)
14:51:05 :)
14:51:05 please do - we also discovered that the api-ref docs and reality don't match on all_stores_must_succeed
14:51:21 mordred: oh?
14:51:27 please do tell
14:51:30 i filed a bug
14:51:35 ok, cool
14:51:45 there is one location where it states default as False
14:51:46 yeah. the actual default behavior is true in the glance code - the api-ref says defaults to false in one place and is silent in the other place
14:51:49 yeah
14:52:06 will put a correction patch soon
14:52:12 Then there is 3rd bug which we found yesterday
14:52:20 https://bugs.launchpad.net/glance/+bug/1884996
14:52:20 Launchpad bug 1884996 in Glance "default value for all_stores_must_succeed is not stated" [Low,Triaged]
14:52:27 #link https://bugs.launchpad.net/glance/+bug/1885003
14:52:28 Launchpad bug 1885003 in Glance "Interrupted copy-image may break a subsequent operation or worse" [High,In progress]
14:52:37 ah, nice catch then, lets make sure that is corrected. IIRC the spec clearly defined it should be true so docs need changing
14:52:40 "or worse" sounds bad
14:53:22 agree
14:53:33 alistarle: did you have something to continue around the sparse upload stuff
14:53:49 I have a fix up for above bug
14:54:10 I need to change bug title as well
14:54:42 #link https://review.opendev.org/#/c/737867/
14:55:20 Yes about the flag name, you expect to have another flag for optimization from ceph engineering or keeping the same "thin provisioning" one ?
14:55:23 Last 5 minutes
14:56:00 because seems ceph engineering optimization don't relate to thin provisioning
14:56:17 alistarle: so that's why I wanted to change the name. I'd like to keep it only as one option
14:56:51 alistarle: that way if "thin provisioning" is enabled we use the sparse upload and if it's disabled we use the rewrite buffer mechanism
14:56:56 Oh ok I get it, you don't want to be called "thin provisioning" like
14:57:03 I thought it was your recommendation
14:57:12 both ways we don't send majority of the zeros across but the end result in the storage changes ;)
14:57:44 So everybody wins :)
14:58:29 Ok understood
14:58:33 I asked rosmaita about this briefly yesterday ... he said that there is no current support for having import staging space be in a backing store like ceph
14:58:34 I will update that
14:58:42 are there any future plans for that or blueprint already?
14:59:11 mordred: I'm working on it, but haven't had time to set it up yet to test
14:59:15 jokke: cool!
14:59:24 mnaser: ^^
15:00:33 time's up
15:00:38 we made the change a while back so that it uses a local filesystem store for the stage, so what jokke is working on is the next step
15:00:52 mordred: The (first) approach would be utilizing cephfs so stuff like qemu-img will work. and second would be to look into using actual rbd driver for it if none of the file operations are needed
15:00:59 shift to #openstack-glance for any further discussion
15:01:07 Thank you all
15:01:09 bye!
15:01:13 thanks!
15:01:18 #endmeeting
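
[editor's note] The "owner or public" rule agreed under the cross tenant/user copy image authorization topic, combined with the early auth check, could look roughly like the Python sketch below. Everything here is an assumption made for illustration: RequestContext, Image, CopyForbidden, can_copy_image, start_copy and the "edge_copier" role are hypothetical names, not Glance code or Glance policy names. The point is only that the check runs before any copy work starts, so the caller gets an immediate refusal instead of waiting for a timeout.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical stand-in for the caller's identity."""
    project_id: str
    roles: FrozenSet[str] = frozenset()
    is_admin: bool = False


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for an image record."""
    owner: str
    visibility: str  # "private", "shared", "community" or "public"


class CopyForbidden(Exception):
    """Raised up front when the caller may not copy the image."""


def can_copy_image(ctx: RequestContext, image: Image,
                   public_copy_roles: FrozenSet[str]) -> bool:
    # Admins and owners may always copy.
    if ctx.is_admin or ctx.project_id == image.owner:
        return True
    # Non-owners may copy public images only with a deployer-granted role.
    if image.visibility == "public":
        return bool(ctx.roles & public_copy_roles)
    # Shared and community images stay owner-only in this first step.
    return False


def start_copy(ctx: RequestContext, image: Image,
               public_copy_roles: FrozenSet[str]) -> str:
    # Early auth check: fail before any task or thread is created.
    if not can_copy_image(ctx, image, public_copy_roles):
        raise CopyForbidden("caller may not copy this image")
    return "copy task accepted"
```

Keeping shared and community images out of the rule mirrors the decision in the meeting to start with public images and refine later if strong use cases arise.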