14:00:03 <abhishekk> #startmeeting glance
14:00:03 <openstack> Meeting started Thu Nov 21 14:00:03 2019 UTC and is due to finish in 60 minutes.  The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:07 <abhishekk> #topic roll call
14:00:07 <openstack> The meeting name has been set to 'glance'
14:00:10 <jokke_> o/
14:00:13 <abhishekk> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:14 <davee_> o/
14:00:18 <abhishekk> o/
14:01:06 <rosmaita> o/
14:01:17 <yebinama> o/
14:01:17 <abhishekk> cool, nice turnout today
14:01:23 <abhishekk> Let's start
14:01:39 <abhishekk> Skipping the updates part as I don't have any updates to share
14:01:58 <abhishekk> #topic release/periodic jobs update
14:02:15 <abhishekk> So, milestone 1 is just two weeks away
14:02:28 <abhishekk> and we have plenty on our plate to finish
14:02:55 <abhishekk> Need reviews on specs
14:03:19 <jokke_> yes, will get on that today or tomorrow
14:03:26 <abhishekk> On periodic jobs front everything is good
14:03:27 <jokke_> hopefully both
14:03:36 <abhishekk> jokke_, thank you
14:03:40 <tosky> o/
14:03:44 <abhishekk> both are interrelated
14:04:17 <abhishekk> enough from this topic
14:04:32 <abhishekk> #topic import image in multi stores
14:04:43 <abhishekk> #link https://review.opendev.org/#/c/669201/11
14:05:09 <abhishekk> The spec looks in good shape and covers everything we discussed at the PTG
14:05:28 <abhishekk> thanks to yebinama for that
14:05:44 <yebinama> Thanks for the reviews :)
14:06:01 <yebinama> There is just one point left
14:06:07 <abhishekk> We just had one question: are we only targeting the import API with this, and not the old upload API?
14:06:25 <abhishekk> yebinama, do you have anything else in mind?
14:06:26 <jokke_> ok, I put this as priority review on top of my queue
14:06:40 <abhishekk> great
14:06:49 <yebinama> abhishekk No that was the same point
14:06:54 <abhishekk> rosmaita, if you get some time, please have a look as well
14:07:04 <rosmaita> ack
14:07:13 <rosmaita> i'll make some time today
14:07:23 <abhishekk> yebinama, jokke_ will address that on specs, lets move ahead
14:07:30 <jokke_> abhishekk: yes, only image import. As the image upload is synchronous, keeping the client hanging there for 3 hours waiting for glance to upload the image to 10 stores sounds like a horrible idea
14:07:31 <abhishekk> rosmaita, \o/
14:08:20 <abhishekk> jokke_, yes, makes sense
14:08:38 <yebinama> jokke_ ok
14:09:06 <jokke_> we would also need to make the image upload cache the data locally instead of streaming it right through, etc.; too many things can go wrong
14:09:40 <abhishekk> jokke_, right
14:10:22 <abhishekk> moving ahead
14:10:27 <abhishekk> #topic copy existing image in multiple stores
14:10:36 <abhishekk> #link https://review.opendev.org/#/c/694724/2
14:10:55 <abhishekk> This design is dependent on importing image in multiple stores
14:11:14 <abhishekk> I have tried to cover all the discussion from the Shanghai PTG
14:11:35 <abhishekk> Would like to have some reviews before I could start with the PoC
14:12:23 <abhishekk> The initial idea is that I will introduce one new task which will copy the existing image into the staging area and link that task to the regular import flow
14:12:41 <jokke_> abhishekk: I've been toying around with this already. I'll put this second in the queue, right after the import one, and look at how they align for the easiest way to get them done
14:12:58 <abhishekk> jokke_, great
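(As a rough illustration of the idea abhishekk describes above: a sketch of a taskflow task that pulls an existing image's data into the local staging directory so the regular import flow can then distribute it to the requested stores. The class name, constructor arguments and staging handling are assumptions for discussion, not the actual implementation.)

    import os

    import glance_store as store_api
    from taskflow import task


    class _HypotheticalCopyToStaging(task.Task):
        """Stage an existing image's data locally for re-import.

        Reads the image bits back from one of its current locations
        into the staging directory, so the regular import-to-store
        flow can treat the copy like any freshly imported image.
        """

        def __init__(self, task_id, image_repo, image_id, staging_dir):
            self.image_repo = image_repo
            self.image_id = image_id
            self.staging_dir = staging_dir
            super(_HypotheticalCopyToStaging, self).__init__(
                name='CopyToStaging-%s' % task_id)

        def execute(self):
            image = self.image_repo.get(self.image_id)
            # Assumed call: stream the data from the first existing location.
            data, size = store_api.get_from_backend(
                image.locations[0]['url'], context=None)

            staging_path = os.path.join(self.staging_dir, self.image_id)
            with open(staging_path, 'wb') as f:
                for chunk in data:
                    f.write(chunk)
            # The multi-store import task(s) would then pick the data up
            # from staging_path and upload it to each requested store.
            return staging_path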
14:13:47 <abhishekk> that's it, moving to Open discussion
14:13:59 <abhishekk> #topic Open Discussion
14:14:09 <jokke_> Just on the copy import job.
14:14:19 <abhishekk> go ahead
14:15:26 <jokke_> The way I've been looking into it is to have its own flow, first doing some verifications (like having access to all the stores etc.) and then pulling it down (likely) to staging and ...
14:15:42 <jokke_> yebinama: this next part involves you as well
14:16:35 <abhishekk> jokke_, that is what I have covered in the specs (hopefully I have covered verification part as well)
14:16:38 <jokke_> we'd like to have that multi store upload loop task in a place where it can be easily imported into the copy job as well, so it's not an integrated part of the current glance-direct flow
14:17:10 <jokke_> so we don't need to duplicate it all over the place
14:17:23 <abhishekk> makes sense
14:17:30 <yebinama> yes sure
14:17:51 <jokke_> best case scenario would be to import the tasks into those 2 and just inject them into the correct part of the flow if needed
14:18:06 <abhishekk> +1
14:18:48 <jokke_> there are lots of tasks in those flows I've been hoping to do the same with, but I just haven't had time to refactor them
14:19:32 <abhishekk> I can do that once the PoC or yebinama's patch is up for reference
14:19:48 <jokke_> yebinama: so if you write the task itself in a way that it can handle uploading to one store or to multiple, that would be best, as then we can just replace the current upload task with that in all of our flows and get the benefit everywhere
14:20:12 <yebinama> Yep that's what I've done
14:20:21 <jokke_> yebinama: amazing!!!!
14:20:31 <yebinama> I've modified set_data to handle a list of stores
14:20:56 <yebinama> (or one if using old configuration)
14:21:04 <jokke_> gr8 ... I'll have a look at your patch after I've reviewed the spec and let's get those moving
14:21:13 <yebinama> I think I'll just have to slightly change it
14:21:13 <abhishekk> ++
14:21:27 <yebinama> since I set image.locations at the end
14:21:42 <yebinama> I will have to update it instead
14:21:44 <jokke_> yeah, we need to look into that for failure control
14:21:53 <abhishekk> yebinama, you need to check before every import whether the image is still available or not
14:22:23 <yebinama> abhishekk yes I need to change that too
14:22:52 <jokke_> yebinama: yeah, that was one thing we were discussing at the PTG, we want to check between every store that the user did not delete the image in the meantime, so we don't just keep wasting resources
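(A minimal sketch of the loop under discussion, assuming the usual image_repo proxy and an `upload_one` callable standing in for the existing per-store upload code; none of these names are from the actual patch. The point is the re-check of the image before each store so a mid-copy delete stops the loop instead of wasting resources, and locations being updated as each store finishes rather than only at the end.)

    def upload_to_stores(image_repo, image, data_path, stores, upload_one):
        """Illustrative multi-store upload loop with a delete check.

        ``upload_one(image, data_path, backend)`` stands in for the
        existing per-store upload code and returns the new location
        entry for that backend.
        """
        for backend in stores:
            # Re-fetch the image before touching each store so we notice
            # if the user deleted it while earlier uploads were running.
            current = image_repo.get(image.image_id)
            if current.status in ('deleted', 'deactivated'):
                # Placeholder handling; the real task would raise the
                # appropriate glance exception and revert cleanly.
                raise RuntimeError('image %s is gone, aborting remaining '
                                   'stores' % image.image_id)

            location = upload_one(current, data_path, backend)

            # Record the new location as each store completes, rather
            # than only setting image.locations once at the very end.
            current.locations.append(location)
            image_repo.save(current)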
14:23:28 <yebinama> jokke_ if you have time to take a look at the code I've already uploaded, maybe you could say if this feels right or not
14:23:40 <jokke_> Hopefully I'll also have a test environment soon so I can play around with how this actually behaves :D
14:23:49 <jokke_> yebinama: yes I will do
14:23:50 <yebinama> great :)
14:24:01 <abhishekk> jokke_, o/
14:24:18 <yebinama> It's based on the first version of the spec but the essentials are there
14:24:40 <abhishekk> anything else on this
14:25:08 <jokke_> yebinama: cool, np lets get this hammered down asap
14:25:18 <jokke_> I think I'm covered on this for now
14:25:25 <abhishekk> cool
14:25:56 <abhishekk> tosky, thank you for your backport patch, the stable/stein job is passing now after making it the base patch
14:26:18 <abhishekk> #link https://review.opendev.org/#/c/695176/
14:26:21 <tosky> oh, I missed the changed topic
14:26:41 <abhishekk> jokke_, rosmaita kindly have a look and lets get this rolling
14:26:57 <tosky> I explained everything in my last comment on https://review.opendev.org/#/c/691308/
14:27:30 <tosky> so I needed another patch to make the usage of python2 explicit (a backport of one of jokke_'s patches) and another one to satisfy the request to remove py35
14:27:34 <tosky> for a grand total of 5 patches :)
14:27:43 <abhishekk> tosky, yes thank you for following up with infra team
14:27:52 <tosky> also, if you can please merge this grenade patch: https://review.opendev.org/#/c/695102/
14:28:36 <jokke_> tosky: thanks for taking care of that :)
14:29:10 <abhishekk> tosky, ack
14:29:40 <rosmaita> tox is weird
14:30:10 <abhishekk> +2, :D
14:31:04 <abhishekk> cool, anything else
14:31:11 <abhishekk> we have 30 minutes left :D
14:31:15 <davee_> Did I read correctly that a new goal is to rewrite the new developer and PTL getting-started docs for all projects, to make them more consistent and in the same location for each project?
14:31:26 <jokke_> ok, so I have + vote on all of them that are left :P
14:32:36 <jokke_> davee_: I'm not sure; I heard that there is a community goal to have per-project contributor and PTL guides, which we do have
14:33:06 <jokke_> I'm sure if people are interested in moving/rewriting them they can allocate some bodies to do that :P
14:33:15 <abhishekk> :P
14:34:51 <abhishekk> shall we wrap up early to utilize the remaining time?
14:35:23 <davee_> well if it does drop, I will volunteer to work on that one
14:35:39 <jokke_> I don't have anything else apart from that cluster awareness approach if we want to discuss that here
14:35:51 <abhishekk> jokke_, we can
14:36:17 <jokke_> davee_: feel free to have a look at what the community goal is actually after and close any gaps we might have, if you have spare time for it
14:37:08 <jokke_> So I got some BBQ, wine and beer into me on Sunday night with a good friend of mine and we were discussing message queues, as you would in that state
14:37:51 <jokke_> And we were talking about the problem of having fanout rpc calls that need to be somehow coordinated and only run once
14:39:32 <abhishekk> I could also use some BBQ and beer while working on Copy tasks :P
14:39:48 <abhishekk> jokke_, any workaround you found?
14:40:15 <jokke_> So most likely the easiest and smartest way to do this is actually to define "dry runs" for those tasks we want to be done on some other node than the one receiving the original request, and when we get responses of either success or failure from the dry run, we then use a more traditional rpc call to actually request the operation to be performed by only one node
14:42:09 <jokke_> the dry run doesn't need to be an actual non-changing run of the function called, but we would need to define those dry run functions that check the prerequisites. Like for delete: "can this node access all the locations?" - if yes, respond success; if no, respond failure
14:42:51 <jokke_> and then the original requestor filters the successes from the response queue, picks one randomly and sends the actual call to that host
14:43:03 <abhishekk> if that node returns failure, will we then divert that call to another node?
14:44:05 <jokke_> well the node has already checked that it meets the prerequisites, so if it fails then we need to log an error, as something more drastic is going on if we can't fulfil a request we thought we could
14:44:26 <abhishekk> ok
14:44:42 <jokke_> so at that point we just report failure to the client and log an error for it
14:44:57 <abhishekk> got it
14:45:21 <abhishekk> sounds like a good plan
14:45:38 <jokke_> so for example delete: if the client is connected to a node that can't reach all locations, the client will just see the request taking a bit longer due to this rpc dance
14:45:55 <jokke_> we always process the calls locally if we can to optimize the time used
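(A rough sketch of the requestor side of that dance; glance has no such RPC layer today, so every name here, including the rpc helper object, is hypothetical. It just mirrors the flow described above: try locally, otherwise fan out a dry run, keep the nodes that said yes, pick one at random and send it the real call.)

    import random


    def run_on_capable_node(rpc, operation, payload, local_node):
        """Hypothetical cluster-aware dispatch built on a dry-run round."""
        # Prefer handling the call locally to keep the common case fast.
        if rpc.dry_run(local_node, operation, payload):
            return rpc.call(local_node, operation, payload)

        # Fan out the dry run and keep only the nodes that reported they
        # meet the prerequisites (e.g. can reach all image locations).
        responses = rpc.fanout_dry_run(operation, payload)
        capable = [node for node, ok in responses.items() if ok]
        if not capable:
            raise RuntimeError('no node can fulfil %s' % operation)

        # Pick one capable node at random and send it the real call. If
        # it still fails, we log an error and report the failure to the
        # client rather than retrying, since that node already claimed
        # it met the prerequisites and something more drastic is wrong.
        return rpc.call(random.choice(capable), operation, payload)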
14:46:38 <abhishekk> right
14:46:57 <jokke_> so for the end user the difference would be that the calls that previously would just fail might now take longer but succeed ... the responses etc. will be the same
14:47:36 <abhishekk> yes
14:48:00 <jokke_> but by doing this "dry run" we avoid all kinds of locking and race condition hell we would have had to tackle otherwise
14:48:15 <abhishekk> So we need dry run logic for each plugin/task
14:48:47 <jokke_> for each call we plan to do over the cluster awareness rpc
14:48:56 <abhishekk> yes
14:49:22 <abhishekk> delete, import, copying and caching (so far)
14:49:50 <jokke_> which is great in the sense that we don't need to know everything and have it defined. We make the framework work and we introduce those things based on where we see the need to utilize the cluster awareness
14:49:57 <jokke_> mhm
14:50:08 <abhishekk> agree
14:50:35 <abhishekk> cool, I will spend some time after milestone 1 on this
14:50:37 <jokke_> I think we can literally start with delete as it's the simplest use case and expand from there
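(And the kind of per-call dry-run check each node would register, sketched here for delete; the `can_access` callable is a stand-in for whatever store reachability check ends up being used, not an existing helper.)

    def dry_run_delete(image, can_access):
        """Hypothetical dry-run handler for the delete call.

        Answers the fanout request with success only if this node can
        reach every location the image has, i.e. it could actually
        perform the delete itself.
        """
        return all(can_access(loc['url']) for loc in image.locations)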
14:50:52 <abhishekk> you stole my words :D
14:51:40 <jokke_> _BUT_ we need to get the framework to work first ... which is causing the grey hairs, and hopefully I'll have a debugging environment for it soon
14:52:16 <jokke_> If this sounds reasonable, I'll get back to work and modify the approach accordingly
14:52:19 <abhishekk> jokke_, yes, once we have an environment it will be pretty easy to debug and analyze
14:52:49 <abhishekk> sounds good to me
14:52:50 <jokke_> that's all from me unless anyone has questions about this
14:53:43 <abhishekk> I don't have any, will ping you if something pops up
14:54:28 <abhishekk> Cool, thank you guys, see you next week
14:54:32 <jokke_> thanks all
14:54:40 <yebinama> bye
14:54:56 <abhishekk> #endmeeting