09:00:03 <flwang1> #startmeeting magnum
09:00:04 <openstack> Meeting started Wed May 27 09:00:03 2020 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:07 <openstack> The meeting name has been set to 'magnum'
09:00:12 <flwang1> strigazi: hey, how are you
09:00:44 <strigazi> all good
09:01:07 <flwang1> brtknr: ping
09:01:14 <flwang1> strigazi: let's wait brtknr a while
09:01:28 <strigazi> sure thing
09:01:59 <flwang1> strigazi: i saw you proposed a topic about health status :)
09:03:18 <strigazi> yeap, now that we can control it more we can enhance?
09:03:53 <flwang1> i'm super happy to see you start to look into this
09:04:34 <flwang1> strigazi: sure
09:04:48 <flwang1> strigazi: should we start or wait a bit longer?
09:05:39 <strigazi> We can start from light things
09:05:49 <strigazi> eg the worklist bullet I added
09:06:16 <flwang1> strigazi: no problem
09:07:05 <flwang1> strigazi: which one you want to bring first?
09:07:35 <strigazi> #topic using storyboard
09:07:51 <strigazi> you need to do the topic thing
09:08:18 <strigazi> flwang1: ^^
09:08:26 <brtknr> hello sorry was dealing with an emergency
09:08:36 <strigazi> brtknr: all good now?
09:08:38 <flwang1> #topic using storboard
09:08:42 <brtknr> ye kind of
09:09:27 <flwang1> strigazi: i'm happy to use storyboard, but i'd like to understand if there is any rule/policy/process we need to follow
09:09:38 <strigazi> To be able to track what we do and what others ask us to do I started adding the magnum-victoria tag
09:10:01 <strigazi> The active stories at the moment are 71
09:10:20 <strigazi> we can easily review them now
09:10:28 <strigazi> We have two options IMO
09:10:46 <strigazi> either let users add the tag (we can't control this)
09:11:12 <strigazi> or we add the stories be hand in a worklist (there are ACLs for this, I think)
09:11:32 <strigazi> for example I created this worklist https://storyboard.openstack.org/#!/worklist/865
09:11:56 <strigazi> that included all stories that that have the magnum-victoria tag
09:12:07 <strigazi> and they are active
09:12:29 <brtknr> strigazi: flwang1 sorry guys i might have to drop out of the meeting today, i will catch up on the discussion later
09:12:43 <strigazi> brtknr sure
09:12:45 <flwang1> brtknr: no worries, take care
09:12:49 <brtknr> have a good day
09:12:59 <strigazi> brtknr: cheers
09:13:25 <flwang1> strigazi: can any user add story to that list?
09:13:37 <flwang1> or only people got the permission?
09:14:17 <strigazi> flwang1 if we use the tag, anyone
09:14:41 <strigazi> flwang1: if we add stories manually to the worklist (only people with permission)
09:14:43 <flwang1> ok, that's alright, i don't think much people will do that
09:15:15 <flwang1> strigazi: and I think we can remove story by removing the tag from stories?
09:15:28 <strigazi> flwang1: and here is a board with more lists https://storyboard.openstack.org/#!/board/212
09:15:28 <flwang1> which one you prefer?
09:15:37 <strigazi> flwang1 I think the tag makes sense at the moment
09:15:44 <flwang1> agree
09:15:55 <flwang1> it's easy to manage
09:16:23 <strigazi> we can review what we want to do and add the tag, then it will appear in the lists of the board
09:16:26 <flwang1> at this stage, are you suggesting we use the magnum-victoria to track all the work for V?
09:16:31 <strigazi> yes
09:16:35 <flwang1> sounds good
09:17:07 <strigazi> I have added you both in the board https://storyboard.openstack.org/#!/board/212
09:17:18 <flwang1> can we do that individually and then we can go through on virtual PTG?
09:18:04 <strigazi> we can do it individually i think
09:18:07 <strigazi> it is not much
09:18:11 <flwang1> cool
09:18:14 <strigazi> we can present it at the PTF
09:18:16 <strigazi> we can present it at the PTG
09:18:23 <flwang1> right, agree
09:18:35 <flwang1> thanks for working on this
09:19:26 <strigazi> ideally we need go through the open stories here: https://storyboard.openstack.org/#!/project/openstack/magnum and add the tag if we want
09:19:50 <flwang1> move on?
09:19:50 <flwang1> anything else?
09:20:12 <strigazi> let's move on
09:20:24 <flwang1> yep, given you have closed a lot, so i assume that's not much
09:20:49 <strigazi> yeah it is only recent things
09:20:58 <flwang1> strigazi: which one you want to discuss next? node/API version in health_status_reason?
09:21:21 <strigazi> let's do health first, it is trivial more or less
09:22:07 <flwang1> #topic node/API version in health_status_reason
09:22:14 <flwang1> strigazi: tell me more
09:22:29 <flwang1> why do you want to add version? for NG
09:22:45 <strigazi> not for NG
09:23:08 <strigazi> We want to have a view of alive clusters and which version are they running
09:23:32 <flwang1> hmm... can't you get from the coe_version?
09:24:05 <strigazi> what is what magnum expects to have in the cluster
09:24:25 <strigazi> or what it tried to have
09:24:41 <flwang1> you mean coe_version or the health_status_reason?
09:24:52 <strigazi> coe_version is the desired
09:25:02 <strigazi> health_status_reason? will be the "current"
09:25:48 <flwang1> firstly, i think anything can help admin/user understand the health status can be put into the dict
09:25:58 <flwang1> so i'm totally ok with that
09:26:08 <flwang1> i'm just trying to understand the user case
09:26:40 <strigazi> the biggest use case is old clusters that we don't know what is going on
09:26:41 <flwang1> do you mean master and worker may run different version?
09:26:49 <flwang1> ah, i see :D
09:27:02 <strigazi> and clusters that the user sshed and did things
09:27:07 <flwang1> because it has been upgraded and we lost the versions?
09:27:14 <strigazi> yeah, bith
09:27:17 <strigazi> yeah, both
09:27:22 <flwang1> right
09:27:25 <flwang1> i'm ok with that
09:27:44 <flwang1> throw a patch and i'm happy to review
09:28:01 <flwang1> move on?
09:28:05 <strigazi> wait
09:28:15 <strigazi> I have the content in the etherpad
09:28:20 <strigazi> and there is a follow up
09:28:54 <strigazi> the dict we have
09:28:58 <strigazi> it is not nested
09:28:59 <flwang1> i'm ok with that format
09:29:16 <flwang1> strigazi: we touched that topic before
09:29:21 <strigazi> yeap
09:29:23 <flwang1> we can do nested dict
09:29:33 <strigazi> but we can't really do it
09:29:39 <flwang1> why?
09:29:50 <strigazi> it will be string, no?
09:30:20 <flwang1> do you mean it will be saved in db as string?
09:30:35 <strigazi> in db it will be string anyway
09:30:44 <strigazi> I mean the type in he API
09:30:59 <flwang1> right
09:31:03 <flwang1> i see your point
09:31:17 <flwang1> i need to do some test to double confirm
09:31:31 <flwang1> but we probably can't do nested IIRC
09:31:47 <flwang1> we can live with flat dict until we figure out a better way
09:31:55 <strigazi> yeap, we can't do arbitrary depth or mix list and dict
09:32:54 <strigazi> so what we do with the depth?
09:33:10 <strigazi> strings that are escaped json?
09:33:37 <flwang1> strigazi: hmm... that's ugly :(
09:34:20 <flwang1> can we just use flat dict for now? i don't have a good answer tbh
09:34:28 <strigazi> flwang1: and we have the issue for helm-config (there base64 makes some sense)
09:34:46 <strigazi> flwang1 for health_status we can do flat
09:34:50 <flwang1> ahhhhh
09:34:58 <strigazi> for helm, we can not
09:35:15 <strigazi> so to wrap the health_status subject
09:35:25 <strigazi> you are OK with having the version
09:35:34 <strigazi> and have a flat dict for now, correct?
09:35:38 <flwang1> yes
09:35:48 <strigazi> let's switch to helm-config?
09:36:31 <flwang1> for helm config, can we just read it as a escapsed string? and do the magic on server side?
09:36:39 <flwang1> #topic helm-config
09:37:05 <strigazi> #action striazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:12 <strigazi> #undo
09:37:20 <strigazi> #action strigazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:39 <flwang1> #action strigazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:47 <strigazi> flwang1: for helm-config, I don't know, some parsing will happen
09:38:03 <strigazi> either to escape the JSON
09:38:18 <flwang1> strigazi: can we use json.dumps and json.loads
09:38:23 <strigazi> or encode it ot base64 and pass it as is
09:38:34 <flwang1> on the two sides, to make sure they're compatible
09:38:37 <strigazi> flwang1: can we?
09:38:53 <flwang1> don't know, just thinking aloud
09:39:19 <strigazi> I don't know either
09:39:40 <flwang1> for this piece, you guys probably need to a small PoC
09:39:58 <strigazi> the relevant part is here: https://review.opendev.org/#/c/727756/4/specs/victoria/helm-config.rst@87
09:40:05 <flwang1> i don't like the idea of base64, TBH
09:41:08 <strigazi> can you leave a comment with preference? I guess you propose to try escaped json
09:42:22 <flwang1> sure, will do
09:43:19 <strigazi> move on?
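[Editor's note] The two helm-config options weighed above (escaped JSON via json.dumps/json.loads, or base64) can be compared with a small proof of concept. This is only a sketch under stated assumptions: the nested values and the key names `helm_values_json` and `helm_values_b64` are hypothetical and do not come from the spec or from Magnum's API. It shows that either encoding carries a nested structure through a flat dict whose values are all strings.

```python
import base64
import json

# Hypothetical nested helm values, invented for illustration only.
nested = {"ingress": {"enabled": True, "hosts": ["a.example.org"]}}


def to_escaped_json(value):
    """Serialize a nested value into a single JSON string."""
    return json.dumps(value, sort_keys=True)


def from_escaped_json(text):
    """Recover the nested value from its JSON string."""
    return json.loads(text)


def to_base64(value):
    """Serialize to JSON, then base64-encode so no escaping is needed."""
    raw = json.dumps(value, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def from_base64(text):
    """Decode base64, then parse the JSON payload."""
    return json.loads(base64.b64decode(text).decode("utf-8"))


# A flat dict with string values only; the key names are assumptions.
flat = {
    "helm_values_json": to_escaped_json(nested),
    "helm_values_b64": to_base64(nested),
}

restored_json = from_escaped_json(flat["helm_values_json"])
restored_b64 = from_base64(flat["helm_values_b64"])
```

Both round trips return the original structure. The escaped-JSON form stays human-readable in the flat dict, while the base64 form is opaque but avoids quoting problems when the string passes through other layers.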
09:43:41 <flwang1> comments added
09:43:46 <flwang1> which next?
09:44:12 <strigazi> #topic resize: Send only nodes_to_remove and node_count
09:44:22 <flwang1> #topic resize: Send only nodes_to_remove and node_count
09:44:30 <strigazi> https://review.opendev.org/#/c/730868/
09:44:30 <flwang1> i didn't get the issue
09:45:05 <strigazi> resize need to do only resize
09:45:19 <strigazi> so do only resize we need send only the node_count
09:45:26 <strigazi> and which nodes to drop
09:45:42 <strigazi> after train, stein cluster were breaking
09:46:05 <strigazi> something similar can happen again (doesn't happen now)
09:46:59 <flwang1> hmmm... that's the kind of patch i don't want to review :D
09:47:15 <flwang1> it's too dangerous
09:47:28 <strigazi> what is dangerous is what we have now
09:47:40 <strigazi> see the commit message
09:48:08 <flwang1> i understand.
09:48:24 <flwang1> please give me some time to review it
09:48:25 <strigazi> now we have this: https://review.opendev.org/#/c/642009/7/magnum/drivers/heat/driver.py@176
09:48:55 <flwang1> strigazi: can you please add a steps how to reproduce the issue?
09:49:28 <strigazi> flwang1: it now reproducible now (at least the same thing that I had issues with)
09:50:01 <strigazi> flwang1: it not reproducible now (at least the same thing that I had issues with)
09:50:18 <strigazi> but it was catastrophic i.e. it was replacing all nodes
09:50:34 <flwang1> do you have mean i have to have a very old cluster to reproduce this?
09:50:47 <strigazi> stein
09:51:02 <strigazi> not very old
09:51:12 <strigazi> you upgraded recently, no?
09:51:16 <flwang1> yes
09:51:21 <strigazi> so not that old
09:51:56 <flwang1> let me ask more questions
09:52:04 <strigazi> check this part of the code you had:
09:52:31 <flwang1> even if user only have the default-master and default-worker NGs, when doing resize, all nodes will be replaced?
09:53:09 <strigazi> https://github.com/openstack/magnum/blob/stable/stein/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L705
09:53:24 <strigazi> it really depends
09:53:39 <openstackgerrit> Merged openstack/magnum stable/ussuri: atomic: Do not install control-plane on minions https://review.opendev.org/731092
09:54:03 <strigazi> it depends on what is here: https://github.com/openstack/magnum/blob/stable/stein/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L510
09:54:23 <strigazi> I don't know about your cherry-picks
09:55:16 <strigazi> flwang1: take your time and review this method: https://review.opendev.org/#/c/730868/1/magnum/drivers/heat/driver.py@264
09:55:27 <strigazi> it is very small and very very clear
09:56:05 <strigazi> instead of sending many things in this dict:
09:56:05 <flwang1> i see the issue now
09:56:11 <strigazi> heat_params.update(scale_params)
09:56:16 <strigazi> 'parameters': heat_params,
09:56:42 <flwang1> you mean it will rebuild all the nodes of that NG?
09:56:50 <strigazi> guarantee to end only two parameters
09:57:09 <strigazi> depenfing on the the parameter both default NGS
09:57:29 <strigazi> maybe it won't rebuild anything
09:57:38 <strigazi> but maybe is not enough
09:58:54 <flwang1> ok, i see. thanks for the heads up
09:59:02 <flwang1> i will review it tomorrow
09:59:55 <strigazi> it is not urgnet for master branch but we nee it avoid breaking things
10:00:56 <strigazi> I think out time is up
10:01:01 <strigazi> I think our time is up
10:01:03 <flwang1> strigazi: right.
10:01:13 <flwang1> it's a critical fix
10:01:22 <flwang1> let's get it done asap
10:01:26 <flwang1> #endmeeting
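[Editor's note] The resize idea discussed in the last topic, sending only the node count and the nodes to drop instead of merging them into the full parameter set, can be sketched as below. This is not the code from the patch under review: the function name, the key names `node_count` and `nodes_to_remove` (taken from the topic title), and the sample template parameters are all assumptions for illustration, and the parameter names the real Heat templates expect may differ.

```python
def build_resize_parameters(node_count, nodes_to_remove=None):
    """Return only the two values a resize is meant to change.

    Hypothetical helper; the key names follow the meeting topic,
    not necessarily the actual Heat template parameters.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return {
        "node_count": node_count,
        "nodes_to_remove": list(nodes_to_remove or []),
    }


# The pattern quoted in the log: scale values merged into the full
# parameter set, so every other parameter is resent on resize.
heat_params = {"kube_tag": "v1.15.7", "flavor": "m1.small"}  # hypothetical
scale_params = build_resize_parameters(3, ["node-2"])
merged_fields = {"parameters": {**heat_params, **scale_params}}

# The narrower alternative: the update carries the two resize values only.
resize_fields = {"parameters": build_resize_parameters(3, ["node-2"])}
```

With the narrower form, a parameter whose value or default changed between releases is never resent by a resize, which is the kind of unintended node replacement the discussion was concerned about.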