14:01:11 <abhishekk> #startmeeting glance
14:01:12 <openstack> Meeting started Thu Jun 25 14:01:11 2020 UTC and is due to finish in 60 minutes.  The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:14 <abhishekk> #topic roll call
14:01:15 <openstack> The meeting name has been set to 'glance'
14:01:20 <abhishekk> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:01:23 <abhishekk> o/
14:01:26 <dansmith> o/
14:01:36 <jokke> o/
14:01:50 <abhishekk> lets wait 2 minutes for others
14:02:31 <rosmaita> o/
14:02:36 <abhishekk> lets start
14:02:45 <abhishekk> We have dansmith today :D
14:02:55 <abhishekk> #topic release/periodic jobs update
14:03:04 <abhishekk> glance_store and python-glanceclient released for V1 milestone
14:03:13 * mordred waves
14:03:14 <abhishekk> thanks to jokke and smcginnis for taking care of this
14:03:28 <abhishekk> mordred, as well
14:04:16 <abhishekk> V2 is 4 weeks away from now and we have a handful of specs to get reviewed
14:04:37 <abhishekk> Kindly focus on reviewing the specs
14:05:10 <abhishekk> Regarding periodic jobs, we had 1 timeout this week, otherwise all were green
14:05:39 <abhishekk> Moving ahead
14:05:45 <abhishekk> #topic Specs reviews
14:05:51 <abhishekk> sparse image upload - https://review.opendev.org/733157
14:05:51 <abhishekk> Unified limits - https://review.opendev.org/729187
14:05:51 <abhishekk> Image encryption - https://review.opendev.org/609667
14:05:51 <abhishekk> Cinder store multiple stores support - https://review.opendev.org/695152
14:06:06 <abhishekk> Apart from these we have Duplicate downloads spec as well
14:06:29 <abhishekk> #link https://review.opendev.org/734683
14:07:14 <abhishekk> Please review at top priority
14:07:42 <abhishekk> Moving to next topic
14:07:54 <rosmaita> ok
14:07:55 <alistarle> Hi, any news on the next steps for the sparse upload spec? https://review.opendev.org/#/c/733157/
14:08:21 <abhishekk> alistarle, there are some comments on the specs, please go through those
14:08:40 <abhishekk> jokke, has added one suggestion on it
14:08:41 <alistarle> Yes, I saw Erno's, but I didn't understand yours
14:09:09 <alistarle> because I have added all the things we discussed during the PTG
14:09:12 <abhishekk> You had provided two ways to do sparse upload in the specs under proposed section
14:09:49 <abhishekk> I am saying keep what we agreed on in Proposed section and another one in Alternative section
14:10:08 <alistarle> Yes, because there are two successive optimisations; do you want me to split it into two different specs?
14:10:19 <alistarle> both proposals will be implemented
14:10:37 <jokke> abhishekk: There is a read part and a write part to it
14:11:00 <abhishekk> jokke, ack
14:11:31 <abhishekk> alistarle, I will revisit my comment and update accordingly
14:11:37 <jokke> abhishekk: the read part is whether we read everything from staging or we try to take advantage of the FS call that gives us the holes directly
14:12:16 <alistarle> And Erno, do you want me to rename the option to "enable_thin_provisioning"?
14:12:19 <abhishekk> jokke, got it
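The FS call jokke refers to for the read part can be sketched with `os.lseek` and the `SEEK_DATA`/`SEEK_HOLE` whence values (Python stdlib; Linux and similar OSes only). This is an illustrative sketch of the hole-skipping read, not the spec's actual implementation:

```python
import os

def data_extents(path):
    """Return (offset, length) pairs for the data regions of a file,
    skipping holes, via lseek() with SEEK_DATA/SEEK_HOLE."""
    extents = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        fd = f.fileno()
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError:
                break  # no data past offset: the rest is one big hole
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            offset = end
    return extents
```

An uploader could then send only these extents to the backend and leave the holes unallocated. On filesystems without SEEK_DATA/SEEK_HOLE support, Linux falls back to treating the whole file as one data extent.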
14:12:54 <alistarle> In my understanding of your comment, this flag should enable all optimizations, not only sparse upload, right?
14:14:00 <jokke> alistarle: if it's not too much trouble that would be great. Based on the feedback from the Ceph engineering team, it would make sense to avoid sending all the 0s over the wire in either case, so calling it thin provisioning would be more descriptive for the deployers/admins
14:15:08 <jokke> alistarle: so for your spec it will be the sparse upload, what we should look as followup is to use the ceph rewrite buffer for those who don't want the images being thin provisioned
14:15:41 <jokke> which allows us to send a few kB of zeros over the line and tell Ceph "Write this 200 000 times"
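The rewrite-buffer idea jokke describes can be sketched generically: ship one small zero buffer and have the backend re-write it until the range is covered. Here `store_write` is a hypothetical backend write callback standing in for the Ceph call, not a real glance_store API:

```python
ZERO_CHUNK = b"\0" * 4096  # one small zero buffer shipped once over the wire

def write_zeros(store_write, offset, length, chunk=ZERO_CHUNK):
    """Cover `length` zero bytes by repeatedly issuing writes of the
    same small buffer, instead of streaming all the zeros from the
    client.  store_write(offset, data) is a hypothetical backend call."""
    written = 0
    while written < length:
        n = min(len(chunk), length - written)
        store_write(offset + written, chunk[:n])
        written += n
    return written
```

The bytes crossing the wire stay tiny either way; the difference from sparse upload is that here the zeros end up fully allocated in the store, which is what deployers who don't want thin-provisioned images would pick.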
14:16:22 <abhishekk> alistarle, jokke should we continue this in Open Discussion?
14:16:32 <jokke> abhishekk: sure, sorry
14:16:41 <abhishekk> jokke,  no problem
14:17:00 <abhishekk> moving ahead
14:17:12 <abhishekk> #topic cross tenant/user copy image authorization
14:17:40 <dansmith> So, I'm working on making nova able to use the copy-image functionality to make sure a user's image is copied to a local rbd store for remote compute nodes,
14:17:59 <dansmith> and one of the big things that I think doesn't quite fit is that only images the user owns directly can be copied
14:18:35 <dansmith> which is fine for some circumstances, but not for others.. there's one pretty serious bug, which is that we get no indication that we were disallowed from copying, which is fixed by this early auth check, which I think we need to merge regardless: https://review.opendev.org/#/c/737548/
14:19:17 <dansmith> however, I would also like some way to grant a user the ability to copy images that aren't theirs, i.e. public images that aren't charged to any specific paying user and that just need to be copied when they are deployed to a store at a remote site with computes for the first time
14:19:43 <dansmith> one way to do that is a special property on the image, that either glance sees as relaxing the auth check, or that nova sees and knows to use admin credentials to do the copy
14:20:06 <dansmith> however, I think the better plan would be to do this in policy somehow, so that users with a specific role or relationship can copy images
14:20:24 <dansmith> so something like "if it is shared with me via member list, then I'm allowed to copy it" or "if this image is public, then allow it to be copied" etc
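The policy approach dansmith describes could look something like this in oslo.policy syntax; note that the rule name, target, and attributes here are illustrative assumptions for the discussion, not glance's actual policy file:

```yaml
# Hypothetical policy.yaml sketch: allow copy-image for admins,
# for the image owner, or for anyone when the image is public.
"copy_image": "rule:context_is_admin or project_id:%(owner)s or 'public':%(visibility)s"
```

Extending this to shared images would mean adding a check against the image's member list, which is where the discussion below gets into the political and technical complications.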
14:20:56 <abhishekk> jokke, rosmaita AFAIK we have policy checks at different layers, right?
14:21:00 <dansmith> so I'm looking for thoughts on the policy approach vs. the property hack
14:22:32 <dansmith> it seems to me that through some refactoring, we could spawn the actual copy thread with an admin context if the policy allows the user to copy, which would cut down on the refactoring, but I know that may be a little less palatable than pushing any such checks further down
14:22:36 <rosmaita> abhishekk: mostly, though there is that tasks-api policy that is at the api layer
14:22:56 <jokke> If we limit this to public images I think the policy approach would be great. Then the deployer could specify the user or role for which those copies would be allowed
14:23:13 <jokke> if that needs to happen for shared images as well, it gets messy quickly
14:23:39 <dansmith> jokke: what are the mechanical differences there?
14:25:55 <jokke> dansmith: Making an image public is behind a policy wall already and, from what I have heard, in most cases restricted to admins and image maintainers. Which is likely, as you pointed out, not a charged customer. Sharing an image is much more flexible, and it might indeed need the owner's approval for consuming extra storage rather than just an admin call
14:26:48 <rosmaita> but is the proposal that the user who initiated the copy would own the copy?
14:26:50 <dansmith> jokke: I understand the semantic differences, I meant.. if it's just policy checks in glance, are the shared images somehow more complex than the public ones?
14:27:06 <dansmith> rosmaita: the copy is just an additional location in the image, so no
14:27:50 <dansmith> If we start with just "owner or public" I think that's a major step forward, so I'm happy with that to start... I can just imagine someone being confused about why they can grant people copy access to public images, but not any image a user can see
14:28:05 <jokke> dansmith: Just the fact that you need to check that the user is in the shared users list, as in actually has the right to see and consume the image
14:28:47 <dansmith> jokke: okay, but presumably that is already a routine somewhere since we have to do that check in order to show them the image?
14:29:18 <dansmith> can shared images be modified (i.e. metadata) by a share-ee or just downloaded?
14:29:40 <jokke> dansmith: I think the question is at least as difficult politically if not more than technically. So the politics of public image are likely easier and cleaner to deal with when documented well
14:29:54 <dansmith> definitely understand that :)
14:29:57 <jokke> dansmith: just consumed
14:30:27 <dansmith> jokke: okay, well, that's the same as the public case then, in terms of the effect you can have on an image
14:30:36 <jokke> yup
14:30:37 <rosmaita> what about community images?
14:31:13 <dansmith> rosmaita: personally, I would expect that I should be able to grant this functionality to any kind of image that a user can already see
14:31:31 <rosmaita> so who pays for the storage?
14:31:42 <jokke> rosmaita: I think politically we should put them into the basket of shared images, in the sense that the owner is more likely to be charged for the storage consumption
14:31:44 <dansmith> if you're a private cloud and not charging for image space, then you really want to grant a user the ability to use an image, which includes using that image in whatever remote edge site they can spin up instances in
14:32:51 <dansmith> public is good enough for a lot of those people, but if they want only some users to be able to spin up a sensitive image, then losing the ability to do the copy is unfortunate,
14:33:07 <dansmith> and may lead to people downloading and re-uploading the image so they have control over the copy, just using more space
14:33:30 <jokke> dansmith: one thing to take into account with shared images is that they already require out-of-band communication between the two users
14:33:58 <jokke> Which means that collaborating on where that image should be located is not much overhead in that case
14:34:24 <dansmith> rosmaita: in case it's not clear, the case here is a cloud spread across a central DC and several edge sites, the edge sites having their own ceph. with this, nova can (if needed) copy your image from the central store to the appropriate edge store before booting your instance so you get fast clone, snapshot, etc ceph goodness
14:34:34 <dansmith> rosmaita: which right now is only possible for images you own
14:34:57 <dansmith> rosmaita: and if you have multiple things doing that in different tenants, you have to upload the image multiple times, one for each tenant in order for this to work, which sucks
14:35:17 <rosmaita> i guess my worry is filling up the edge storage (which is likely to be smaller) if you leave this up to tenants to pick what should be at the edge when they don't have to pay for it
14:35:36 <dansmith> jokke: yep, understand, but if there's no technical way to grant that access then making an actual copy of the image and diverging is the only user-consumable solution
14:36:30 <jokke> rosmaita: I agree, that's why I think we cannot just let anyone do this by default, and why I think a policy where the admin can grant that permission to a tenant that is trusted to behave and has a need for it is the way to go
14:36:37 <dansmith> rosmaita: at the expense of duplicating every image for every tenant in the central site though
14:37:19 <abhishekk> that's what we assumed while designing this
14:37:21 <dansmith> and of course, if that image duplication happens,
14:37:35 <jokke> and there is that ^^ we have no per-store policy either, so anyone who can create an image can do it to any store available
14:37:40 <dansmith> then the edge site gets even fuller because two tenants that should share the same base image don't, since they're "different" images
14:39:10 <dansmith> anyway, if we can start with "public or owner" that's a big step forward
14:39:17 <abhishekk> How about we start with limiting it to public images ?
14:39:30 <dansmith> and if the demand for finer grained sharing comes, then we can do that and I can collect beers :)
14:39:40 <jokke> dansmith: I glanced through your early-fail bug, which was a very nice catch. I think controlled public image copy will get us much closer to a decent user experience for both sides, consumers and owners of those images
14:39:50 <dansmith> it will
14:39:55 <jokke> And we can refine these in the future if strong use cases arise
14:40:02 <dansmith> yep, I'm good with that
14:40:31 <abhishekk> We need a spec for this
14:40:34 <jokke> if abhishekk and rosmaita are on board with this approach
14:40:49 <abhishekk> I am fine with this approach
14:41:04 <rosmaita> i'm not against it, but i have not been following along too closely
14:41:23 <dansmith> we still need to merge the early auth check patch of mine rather soon because right now nova will wait until timeout not knowing it wasn't allowed to copy an image
14:41:37 <abhishekk> +1
14:41:50 <dansmith> and extend it with the expanded public stuff
14:42:09 <jokke> dansmith: yep agreed, I just targeted it to Victoria and Ussuri as high priority ... that is not by any means a Nova-specific fault
14:42:22 <abhishekk> Moving ahead as we are short on time
14:42:31 <dansmith> thanks
14:42:42 <abhishekk> #topic Copy Image race condition
14:42:57 <abhishekk> #link https://bugs.launchpad.net/glance/+bug/1884596
14:42:57 <openstack> Launchpad bug 1884596 in Glance "image import copy-to-store will start multiple importing threads due to race condition" [Critical,New] - Assigned to Abhishek Kekane (abhishek-kekane)
14:43:22 <abhishekk> this is another bug we had found around copy-image operation
14:44:26 <jokke> And just for the record, this is "copy-image" method specific, as the image state transition catches it in the actual import jobs
14:44:58 <abhishekk> This is a valid race condition and needs to be addressed; as we have a common staging area, two different API calls to the same image will produce unexpected results
14:45:34 <abhishekk> I am working with dansmith to solve this issue
14:45:40 <jokke> it will also cause all kinds of mess when they get to the point of location handling and try to add duplicate locations
14:45:55 <abhishekk> yes
14:46:18 <jokke> dansmith: thanks for good catch on this too
14:46:25 <abhishekk> dansmith has added a patch to update the image property only once
14:46:35 <abhishekk> #link https://review.opendev.org/737868
14:46:57 <dansmith> just got a couple tweaks from mike, which I will make, but otherwise that'll do it
14:47:09 <abhishekk> please have a look as well, I need to build my solution around this patch
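The "update the property only once" approach amounts to an atomic test-and-set at the DB layer: the state transition and the check happen in a single UPDATE, so only one of the racing callers wins. A minimal sketch, assuming an illustrative table and property layout rather than glance's actual schema:

```python
import sqlite3

def claim_import(conn, image_id, prop):
    """Atomic test-and-set: flip the property from 'queued' to
    'importing' in one UPDATE.  Only the first racing caller sees
    rowcount == 1 and is allowed to start the copy thread."""
    cur = conn.execute(
        "UPDATE image_properties SET value = 'importing' "
        "WHERE image_id = ? AND name = ? AND value = 'queued'",
        (image_id, prop))
    conn.commit()
    return cur.rowcount == 1
```

The second caller's UPDATE matches zero rows, so instead of two import threads racing on the shared staging area, the race collapses to a single winner and the loser can return an error immediately.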
14:47:54 <abhishekk> moving ahead
14:48:10 <abhishekk> #topic openstackclient/sdk patches
14:48:18 <abhishekk> rosmaita, this is you?
14:48:27 <dansmith> mordred: I think
14:48:49 <jokke> ^^
14:48:52 <rosmaita> i added them, but it's mordred's issue
14:48:59 <dansmith> these are patches to make osc able to do the import flow
14:49:08 <dansmith> which I've modified the devstack patch to use instead of glanceclient
14:49:22 <dansmith> still working on getting them tested against a real deployment, but latest rev is probably good
14:49:26 <dansmith> we'll know in a few hours
14:49:26 <abhishekk> So this again picked up pace due to Dan's work
14:49:36 <dansmith> \o/
14:49:37 <mordred> heya - yeah
14:50:03 <mordred> mostly just wanted to let folks know they're there - they worked in my local testing, so hopefully dan's patch goes green now
14:50:15 <abhishekk> will have a look at those patches
14:50:45 <abhishekk> #topic Open discussion
14:50:46 <dansmith> it won't go green for other reasons like the image sharing thing, but.. it should go red later :)
14:51:05 <jokke> :)
14:51:05 <mordred> please do - we also discovered that the api-ref docs and reality don't match on all_stores_must_succeed
14:51:21 <jokke> mordred: oh?
14:51:27 <jokke> please do tell
14:51:30 <rosmaita> i filed a bug
14:51:35 <jokke> ok, cool
14:51:45 <abhishekk> there is one location where it states the default as False
14:51:46 <mordred> yeah. the actual default behavior in the glance code is true - the api-ref says it defaults to false in one place and is silent in the other
14:51:49 <mordred> yeah
14:52:06 <abhishekk> will put a correction patch soon
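For reference, the parameter under discussion lives in the body of `POST /v2/images/{image_id}/import`; a request like the following (store names are illustrative) aborts the whole import if any listed store fails, since the server-side default for `all_stores_must_succeed` is true:

```json
{
  "method": {"name": "glance-direct"},
  "stores": ["ceph-central", "ceph-edge1"],
  "all_stores_must_succeed": true
}
```

Setting it to `false` instead lets the import succeed as long as at least one store accepts the image.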
14:52:12 <abhishekk> Then there is 3rd bug which we found yesterday
14:52:20 <rosmaita> https://bugs.launchpad.net/glance/+bug/1884996
14:52:20 <openstack> Launchpad bug 1884996 in Glance "default value for all_stores_must_succeed is not stated" [Low,Triaged]
14:52:27 <abhishekk> #link https://bugs.launchpad.net/glance/+bug/1885003
14:52:28 <openstack> Launchpad bug 1885003 in Glance "Interrupted copy-image may break a subsequent operation or worse" [High,In progress]
14:52:37 <jokke> ah, nice catch then, let's make sure that is corrected. IIRC the spec clearly defined it should be true, so the docs need changing
14:52:40 <rosmaita> "or worse" sounds bad
14:53:22 <abhishekk> agree
14:53:33 <jokke> alistarle: did you have anything to continue with around the sparse upload stuff?
14:53:49 <abhishekk> I have a fix up for above bug
14:54:10 <abhishekk> I need to change bug title as well
14:54:42 <abhishekk> #link https://review.opendev.org/#/c/737867/
14:55:20 <alistarle> Yes, about the flag name: do you expect to have another flag for the optimization from Ceph engineering, or keep the same "thin provisioning" one?
14:55:23 <abhishekk> Last 5 minutes
14:56:00 <alistarle> because it seems the Ceph engineering optimization doesn't relate to thin provisioning
14:56:17 <jokke> alistarle: so that's why I wanted to change the name. I'd like to keep it as only one option
14:56:51 <jokke> alistarle: that way if "thin provisioning" is enabled we use the sparse upload, and if it's disabled we use the rewrite buffer mechanism
14:56:56 <alistarle> Oh ok, I get it, you don't want it to be called "thin provisioning"-like
14:57:03 <alistarle> I thought it was your recommendation
14:57:12 <jokke> both ways we don't send the majority of the zeros across, but the end result in the storage changes ;)
14:57:44 <jokke> So everybody wins :)
14:58:29 <alistarle> Ok understood
14:58:33 <mordred> I asked rosmaita about this briefly yesterday ... he said that there is no current support for having the import staging space be in a backing store like ceph
14:58:34 <alistarle> I will update that
14:58:42 <mordred> are there any future plans for that, or a blueprint already?
14:59:11 <jokke> mordred: I'm working on it, but haven't had time to set it up yet to test
14:59:15 <mordred> jokke: cool!
14:59:24 <mordred> mnaser: ^^
15:00:33 <abhishekk> time's up
15:00:38 <rosmaita> we made the change a while back so that it uses a local filesystem store for the stage, so what jokke is working on is the next step
15:00:52 <jokke> mordred: The (first) approach would be utilizing CephFS so stuff like qemu-img will work, and the second would be to look into using the actual rbd driver for it if none of the file operations are needed
15:00:59 <abhishekk> shift to #openstack-glance for any further discussion
15:01:07 <abhishekk> Thank you all
15:01:09 <rosmaita> bye!
15:01:13 <jokke> thanks!
15:01:18 <abhishekk> #endmeeting