14:00:03 #startmeeting glance
14:00:03 Meeting started Thu Nov 21 14:00:03 2019 UTC and is due to finish in 60 minutes. The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:07 #topic roll call
14:00:07 The meeting name has been set to 'glance'
14:00:10 o/
14:00:13 #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00:14 o/
14:00:18 o/
14:01:06 o/
14:01:17 o/
14:01:17 cool, nice turnout today
14:01:23 Let's start
14:01:39 Skipping the updates part as I don't have any updates to share
14:01:58 #topic release/periodic jobs update
14:02:15 So, milestone 1 is just two weeks away
14:02:28 and we have plenty on the plate to finish
14:02:55 Need reviews on specs
14:03:19 yes, will get to that today or tomorrow
14:03:26 On the periodic jobs front everything is good
14:03:27 hopefully both
14:03:36 jokke_, thank you
14:03:40 o/
14:03:44 both are interrelated
14:04:17 enough on this topic
14:04:32 #topic import image in multi stores
14:04:43 #link https://review.opendev.org/#/c/669201/11
14:05:09 The spec looks in good shape and covers everything we discussed at the PTG
14:05:28 thanks to yebinama for that
14:05:44 Thanks for the reviews :)
14:06:01 There is just one point left
14:06:07 We just had one question: are we only targeting this to change the import API and not the old upload API?
14:06:25 yebinama, do you have anything else in mind?
14:06:26 ok, I'll put this as a priority review on top of my queue
14:06:40 great
14:06:49 abhishekk No, that was the same point
14:06:54 rosmaita, if you get some time, please have a look as well
14:07:04 ack
14:07:13 i'll make some time today
14:07:23 yebinama, jokke_ will address that in the spec, let's move ahead
14:07:30 abhishekk: yes, only image import. As the image upload is asynchronous, keeping the client hanging there for 3 hours waiting for glance to upload the image to 10 stores sounds like a horrible idea
14:07:31 rosmaita, \o/
14:07:50 sorry, image upload is synchronous, not async
14:08:20 jokke_, yes, makes sense
14:08:38 jokke_ ok
14:09:06 we would also need to make the image upload cache the data locally instead of streaming it right through, etc. Too many things can go wrong
14:09:40 jokke_, right
14:10:22 moving ahead
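[For context: the spec linked above proposes extending the asynchronous interoperable image import call to take a list of target stores. A minimal sketch of what such a request might look like, assuming a "stores" field as in the draft spec; the field name, store identifiers, and endpoint here are illustrative, not a released API:]

    # Illustrative only: the "stores" field follows the draft spec under
    # review (https://review.opendev.org/#/c/669201/), not a released API.
    import requests

    GLANCE = "http://glance.example.com"  # hypothetical endpoint
    image_id = "11111111-2222-3333-4444-555555555555"

    body = {
        "method": {"name": "glance-direct"},
        # Ask the asynchronous import taskflow to upload the staged
        # data to several stores in one request.
        "stores": ["ceph-1", "swift-1", "file-1"],
    }
    resp = requests.post(
        "%s/v2/images/%s/import" % (GLANCE, image_id),
        json=body,
        headers={"X-Auth-Token": "..."},
    )
    resp.raise_for_status()
    # The call returns immediately; the uploads run server-side, which is
    # why the old synchronous upload API is left out of this change.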
14:10:27 #topic copy existing image in multiple stores
14:10:36 #link https://review.opendev.org/#/c/694724/2
14:10:55 This design is dependent on importing an image in multiple stores
14:11:14 I have tried to cover all the discussion from the Shanghai PTG
14:11:35 Would like to have some reviews before I start with the PoC
14:12:23 The initial idea is: I will introduce one new task which will copy the existing image into the staging area and link that task to the regular import flow
14:12:41 abhishekk: I've been toying around with this already. I'll put this second in the queue, right after the import, and look at how they align to find the easiest way to get them done
14:12:58 jokke_, great
14:13:47 that's it, moving to Open Discussion
14:13:59 #topic Open Discussion
14:14:09 Just on the copy import job.
14:14:19 go ahead
14:15:26 The way I've been looking into it is to have its own flow, first doing some verifications (like having access to all the stores etc.) and then pulling it down (likely) to staging and ...
14:15:42 yebinama: this next part involves you as well
14:16:35 jokke_, that is what I have covered in the spec (hopefully I have covered the verification part as well)
14:16:38 we'd like to have that multi-store upload loop task in a place where it can be easily imported into the copy job as well, so not an integrated part of the current glance-direct flow
14:17:10 so we don't need to duplicate it all over the place
14:17:23 makes sense
14:17:30 yes sure
14:17:51 the best case scenario would be to import the tasks into those 2 and just inject them into the correct part of the flow if needed
14:18:02 +1
14:18:06 +1
14:18:48 there are lots of tasks in those flows I've been hoping to do the same with, but I just haven't had time to refactor them
14:19:32 I can do that once the PoC or yebinama's patch is up for reference
14:19:48 yebinama: so if you write the task itself in a way that it can handle uploading to 1 store or to multiple, that would be best, as then we can just replace the current upload task with that in all of our flows and get the benefit everywhere
14:20:12 Yep that's what I've done
14:20:21 yebinama: amazing!!!!
14:20:31 I've modified set_data to handle a list of stores
14:20:56 (or one if using the old configuration)
14:21:04 gr8 ... I'll have a look at your patch after I've reviewed the spec and let's get those moving
14:21:13 I think I'll just have to slightly change it
14:21:13 ++
14:21:27 since I set image.locations at the end
14:21:42 I will have to update it instead
14:21:44 yeah, we need to look into that for failure control
14:21:53 yebinama, you need to check before every import whether the image is still available or not
14:22:23 abhishekk yes I need to change that too
14:22:52 yebinama: yeah, that was one thing we were discussing at the PTG, we want to check in between every store that the user did not delete the image in the meantime, so we don't just keep wasting resources
14:23:28 jokke_ if you have time to take a look at the code I've already uploaded, maybe you could say if this feels right or not
14:23:40 I hopefully will also have a test environment soon so I can play around with how this actually behaves :D
14:23:49 yebinama: yes I will do
14:23:50 great :)
14:24:01 jokke_, o/
14:24:18 It's based on the first version of the spec but the essentials are there
14:24:40 anything else on this
14:25:08 yebinama: cool, np, let's get this hammered down asap
14:25:18 I think I'm covered on this for now
14:25:25 cool
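[Pulling together the points above — a store-agnostic upload task, set_data taking a list of stores, updating image.locations incrementally rather than at the end, and re-checking the image between stores — a minimal sketch of that loop. All names here (image_repo, store_api and their methods) are hypothetical stand-ins, not glance's actual internals:]

    class ImageDeletedError(Exception):
        """The image disappeared while the import was still running."""


    def upload_to_stores(image_id, data_path, stores, store_api, image_repo):
        """Upload staged data to each requested store in turn."""
        for store in stores:
            # Re-fetch between every store so we notice if the user
            # deleted the image in the meantime and stop wasting
            # resources on the remaining stores.
            image = image_repo.get(image_id)
            if image is None or image.status == 'deleted':
                raise ImageDeletedError(image_id)
            location = store_api.add_to_store(store, data_path)
            # Record each successful upload in image.locations right
            # away instead of setting them all at the end, so partial
            # progress survives a later failure.
            image.locations.append({'url': location,
                                    'metadata': {'store': store}})
            image_repo.save(image)

[Written this way, the same task can replace the single-store upload in the current glance-direct flow (a one-element stores list) and be injected into the copy-existing-image flow, so nothing is duplicated.]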
14:25:56 tosky, thank you for your backport patch, the stable/stein job is passing now after making it the base patch
14:26:18 #link https://review.opendev.org/#/c/695176/
14:26:21 oh, I missed the changed topic
14:26:41 jokke_, rosmaita kindly have a look and let's get this rolling
14:26:57 I explained everything in my last comment on https://review.opendev.org/#/c/691308/
14:27:30 so I needed another patch to make the usage of python2 explicit (a backport of a patch by jokke_) and another one to satisfy the request to remove py35
14:27:34 for a grand total of 5 patches :)
14:27:43 tosky, yes, thank you for following up with the infra team
14:27:52 also, if you can, please merge this grenade patch: https://review.opendev.org/#/c/695102/
14:28:36 tosky: thanks for taking care of that :)
14:29:10 tosky, ack
14:29:40 tox is weird
14:30:10 +2, :D
14:31:04 cool, anything else
14:31:11 we have 30 minutes left :D
14:31:15 Did I read correctly that a new goal is to rewrite the new developer and PTL getting-started docs for all projects, to make them more consistent and in the same location for each project?
14:31:26 ok, so I have a + vote on all of them that are left :P
14:32:36 davee_: I'm not sure; I heard that there is a community goal to have per-project contributor and PTL guides, which we do have
14:33:06 I'm sure if people are interested in moving/rewriting them they can allocate some bodies to do that :P
14:33:15 :P
14:34:51 shall we wrap up early to utilize the remaining time?
14:35:23 well, if it does drop, I will volunteer to work on that one
14:35:39 I don't have anything else apart from that cluster awareness approach, if we want to discuss that here
14:35:51 jokke_, we can
14:36:17 davee_: feel free to have a look at what the community goal is actually after and close any gaps we might have, if you have spare time for it
14:37:08 So I got some BBQ, wine and beer into me on Sunday night with a good friend of mine and we were discussing message queues, as you would in that state
14:37:51 And we were talking about the problem of having fanout rpc calls that need to be somehow coordinated and only run once
14:39:32 I could also use some BBQ and beer while working on the copy tasks :P
14:39:48 jokke_, any workaround you found?
14:40:15 So most likely the easiest and smartest way to do this is to actually define "dry runs" for those tasks we want done on some node other than the one receiving the original request, and when we get responses of either success or failure from the dry run we then use a more traditional rpc call to actually request the operation to be performed by only one node
14:42:09 the dry run doesn't need to be an actual non-changing run of the function called, but we would need to define those dry run functions that check the prerequisites. Like for delete: "can this node access all the locations? if yes, respond success; if no, respond failure"
14:42:51 and then the original requestor filters the successes from the response queue, picks one randomly and sends the actual call to that host
14:43:03 if that node returns failure, will we then divert that call to another node?
14:44:05 well, the node has already checked that it meets the prerequisites, so if it fails then we need to log an error, as something more drastic is going on if we can't fulfil a request we thought we could
14:44:26 ok
14:44:42 so at that point we just report failure to the client and log an error for it
14:44:57 got it
14:45:21 sounds like a good plan
14:45:38 so for example delete: if the client is connected to a node that can't reach all locations, the client will just see the request taking a bit longer due to this rpc dance
14:45:55 we always process the calls locally if we can, to optimize the time used
14:46:38 right
14:46:57 so for the end user the difference would be that the calls that previously would just fail might now take longer but succeed ... the responses etc. will be the same
14:47:36 yes
14:48:00 but by doing this "dry run" we avoid all kinds of locking and race condition hell we would have had to tackle otherwise
14:48:15 So we need dry run logic for each plugin/task
14:48:47 for each call we plan to do over the cluster awareness rpc
14:48:56 yes
14:49:22 delete, import, copying and caching (so far)
14:49:50 which is great in the sense that we don't need to know everything and have it defined. We make the framework work and we introduce those things based on where we see the need to utilize the cluster awareness
14:49:57 mhm
14:50:08 agree
14:50:35 cool, I will spend some time on this after milestone 1
14:50:37 I think we can literally start with delete, as it's the simplest use case, and expand from there
14:50:52 you stole my words :D
14:51:40 _BUT_ we need to get the framework to work first ... which is what's causing the grey hair, and hopefully I'll have a debugging environment for it soon
14:52:16 If this sounds reasonable, I'll get back to work and modify the approach accordingly
14:52:19 jokke_, yes, once we have the environment it will be pretty easy to debug and analyze
14:52:49 sounds good to me
14:52:50 that's all from me unless anyone has questions about this
14:53:43 I don't have any, will ping you if something pops up
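[To make the dry-run pattern above concrete, a minimal sketch of the requestor side, using delete as the example. The rpc object and its fanout_call/call methods are hypothetical placeholders for whatever the cluster-awareness framework ends up providing, not real oslo.messaging APIs:]

    import logging
    import random

    LOG = logging.getLogger(__name__)


    def run_on_capable_node(rpc, operation, payload):
        """Fan out a dry run, then send the real call to one capable node."""
        # 1. Fanout dry run: every node checks its prerequisites without
        #    changing anything (for delete: "can I access all of this
        #    image's locations?") and replies success or failure.
        replies = rpc.fanout_call('dry_run_' + operation, payload)

        # 2. Filter the successes out of the response queue.
        capable = [node for node, ok in replies.items() if ok]
        if not capable:
            raise RuntimeError('no node can perform %s' % operation)

        # 3. Pick one capable node at random and send the actual call.
        chosen = random.choice(capable)
        try:
            return rpc.call(chosen, operation, payload)
        except Exception:
            # The node already claimed it met the prerequisites, so a
            # failure here means something more drastic is going on:
            # log an error and report the failure to the client.
            LOG.error('%s failed on %s after a successful dry run',
                      operation, chosen)
            raise

[Because the dry run never mutates state and only the single chosen node performs the operation, this avoids the locking and race conditions a fanout of the real call would create; the client just sees the request take a bit longer.]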
14:54:28 Cool, thank you guys, see you next week
14:54:32 thanks all
14:54:40 bye
14:54:56 #endmeeting