09:00:03 #startmeeting magnum
09:00:04 Meeting started Wed May 27 09:00:03 2020 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:07 The meeting name has been set to 'magnum'
09:00:12 strigazi: hey, how are you
09:00:44 all good
09:01:07 brtknr: ping
09:01:14 strigazi: let's wait for brtknr a while
09:01:28 sure thing
09:01:59 strigazi: i saw you proposed a topic about health status :)
09:03:18 yeap, now that we can control it more we can enhance it?
09:03:53 i'm super happy to see you start to look into this
09:04:34 strigazi: sure
09:04:48 strigazi: should we start or wait a bit longer?
09:05:39 We can start from light things
09:05:49 eg the worklist bullet I added
09:06:16 strigazi: no problem
09:07:05 strigazi: which one do you want to bring first?
09:07:35 #topic using storyboard
09:07:51 you need to do the topic thing
09:08:18 flwang1: ^^
09:08:26 hello sorry was dealing with an emergency
09:08:36 brtknr: all good now?
09:08:38 #topic using storyboard
09:08:42 ye kind of
09:09:27 strigazi: i'm happy to use storyboard, but i'd like to understand if there is any rule/policy/process we need to follow
09:09:38 To be able to track what we do and what others ask us to do I started adding the magnum-victoria tag
09:10:01 The active stories at the moment are 71
09:10:20 we can easily review them now
09:10:28 We have two options IMO
09:10:46 either let users add the tag (we can't control this)
09:11:12 or we add the stories by hand in a worklist (there are ACLs for this, I think)
09:11:32 for example I created this worklist https://storyboard.openstack.org/#!/worklist/865
09:11:56 that includes all stories that have the magnum-victoria tag
09:12:07 and they are active
09:12:29 strigazi: flwang1 sorry guys i might have to drop out of the meeting today, i will catch up on the discussion later
09:12:43 brtknr sure
09:12:45 brtknr: no worries, take care
09:12:49 have a good day
09:12:59 brtknr: cheers
09:13:25 strigazi: can any user add a story to that list?
09:13:37 or only people who got the permission?
09:14:17 flwang1 if we use the tag, anyone
09:14:41 flwang1: if we add stories manually to the worklist (only people with permission)
09:14:43 ok, that's alright, i don't think many people will do that
09:15:15 strigazi: and I think we can remove a story by removing the tag from it?
09:15:28 flwang1: and here is a board with more lists https://storyboard.openstack.org/#!/board/212
09:15:28 which one you prefer?
09:15:37 flwang1 I think the tag makes sense at the moment
09:15:44 agree
09:15:55 it's easy to manage
09:16:23 we can review what we want to do and add the tag, then it will appear in the lists of the board
09:16:26 at this stage, are you suggesting we use the magnum-victoria tag to track all the work for V?
09:16:31 yes
09:16:35 sounds good
09:17:07 I have added you both in the board https://storyboard.openstack.org/#!/board/212
09:17:18 can we do that individually and then we can go through on virtual PTG?
09:18:04 we can do it individually i think
09:18:07 it is not much
09:18:11 cool
09:18:14 we can present it at the PTF
09:18:16 we can present it at the PTG
09:18:23 right, agree
09:18:35 thanks for working on this
09:19:26 ideally we need to go through the open stories here: https://storyboard.openstack.org/#!/project/openstack/magnum and add the tag if we want
09:19:50 move on?
09:19:50 anything else?
09:20:12 let's move on
09:20:24 yep, given you have closed a lot, so i assume that's not much
09:20:49 yeah it is only recent things
09:20:58 strigazi: which one do you want to discuss next? node/API version in health_status_reason?
09:21:21 let's do health first, it is trivial more or less
09:22:07 #topic node/API version in health_status_reason
09:22:14 strigazi: tell me more
09:22:29 why do you want to add version? for NG
09:22:45 not for NG
09:23:08 We want to have a view of alive clusters and which version they are running
09:23:32 hmm... can't you get it from the coe_version?
09:24:05 that is what magnum expects to have in the cluster
09:24:25 or what it tried to have
09:24:41 you mean coe_version or the health_status_reason?
09:24:52 coe_version is the desired
09:25:02 health_status_reason? will be the "current"
09:25:48 firstly, i think anything that can help admin/user understand the health status can be put into the dict
09:25:58 so i'm totally ok with that
09:26:08 i'm just trying to understand the use case
09:26:40 the biggest use case is old clusters that we don't know what is going on
09:26:41 do you mean master and worker may run different versions?
09:26:49 ah, i see :D
09:27:02 and clusters that the user sshed and did things
09:27:07 because it has been upgraded and we lost the versions?
09:27:14 yeah, bith
09:27:17 yeah, both
09:27:22 right
09:27:25 i'm ok with that
09:27:44 throw a patch and i'm happy to review
09:28:01 move on?
09:28:05 wait
09:28:15 I have the content in the etherpad
09:28:20 and there is a follow up
09:28:54 the dict we have
09:28:58 it is not nested
09:28:59 i'm ok with that format
09:29:16 strigazi: we touched that topic before
09:29:21 yeap
09:29:23 we can do nested dict
09:29:33 but we can't really do it
09:29:39 why?
09:29:50 it will be string, no?
09:30:20 do you mean it will be saved in db as string?
09:30:35 in db it will be string anyway
09:30:44 I mean the type in the API
09:30:59 right
09:31:03 i see your point
09:31:17 i need to do some test to double confirm
09:31:31 but we probably can't do nested IIRC
09:31:47 we can live with flat dict until we figure out a better way
09:31:55 yeap, we can't do arbitrary depth or mix list and dict
09:32:54 so what do we do with the depth?
09:33:10 strings that are escaped json?
09:33:37 strigazi: hmm... that's ugly :(
09:34:20 can we just use flat dict for now? i don't have a good answer tbh
09:34:28 flwang1: and we have the issue for helm-config (there base64 makes some sense)
09:34:46 flwang1 for health_status we can do flat
09:34:50 ahhhhh
09:34:58 for helm, we can not
09:35:15 so to wrap the health_status subject
09:35:25 you are OK with having the version
09:35:34 and have a flat dict for now, correct?
09:35:38 yes
09:35:48 let's switch to helm-config?
09:36:31 for helm config, can we just read it as an escaped string? and do the magic on server side?
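
The trade-off discussed above (a flat dict for health_status_reason, versus escaped JSON or base64 for nested values such as helm-config) can be sketched roughly as follows. This is only an illustration, not Magnum code: the key names ("api.version", "<node>.Ready", "<node>.version"), the input shape, and all function names are hypothetical, since the actual content lives in the etherpad and is not shown in this log.

```python
import base64
import json


def flatten_health_status(nodes, api_version):
    """Build a flat dict whose values are all strings (no nested dicts or lists).

    The key layout is a hypothetical illustration, not Magnum's real format.
    """
    reason = {"api.version": api_version}
    for name, info in nodes.items():
        reason[f"{name}.Ready"] = str(info["ready"])
        reason[f"{name}.version"] = info["version"]
    return reason


def to_escaped_json(value):
    """Option 1: carry a nested value as one escaped JSON string."""
    return json.dumps(value, sort_keys=True)


def from_escaped_json(text):
    """Server-side counterpart of to_escaped_json."""
    return json.loads(text)


def to_base64(value):
    """Option 2: carry the same JSON text base64-encoded, passed through as is."""
    raw = json.dumps(value, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def from_base64(text):
    """Server-side counterpart of to_base64."""
    return json.loads(base64.b64decode(text).decode("utf-8"))
```

Both encodings round-trip arbitrary depth and mixed lists and dicts through a plain string field; the escaped JSON form stays readable in API output, while the base64 form avoids quoting issues at the cost of being opaque.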
09:36:39 #topic helm-config
09:37:05 #action striazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:12 #undo
09:37:20 #action strigazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:39 #action strigazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:47 flwang1: for helm-config, I don't know, some parsing will happen
09:38:03 either to escape the JSON
09:38:18 strigazi: can we use json.dumps and json.loads
09:38:23 or encode it to base64 and pass it as is
09:38:34 on the two sides, to make sure they're compatible
09:38:37 flwang1: can we?
09:38:53 don't know, just thinking aloud
09:39:19 I don't know either
09:39:40 for this piece, you guys probably need to do a small PoC
09:39:58 the relevant part is here: https://review.opendev.org/#/c/727756/4/specs/victoria/helm-config.rst@87
09:40:05 i don't like the idea of base64, TBH
09:41:08 can you leave a comment with preference? I guess you propose to try escaped json
09:42:22 sure, will do
09:43:19 move on?
09:43:41 comments added
09:43:46 which next?
09:44:12 #topic resize: Send only nodes_to_remove and node_count
09:44:22 #topic resize: Send only nodes_to_remove and node_count
09:44:30 https://review.opendev.org/#/c/730868/
09:44:30 i didn't get the issue
09:45:05 resize needs to do only resize
09:45:19 so to do only resize we need to send only the node_count
09:45:26 and which nodes to drop
09:45:42 after train, stein clusters were breaking
09:46:05 something similar can happen again (doesn't happen now)
09:46:59 hmmm... that's the kind of patch i don't want to review :D
09:47:15 it's too dangerous
09:47:28 what is dangerous is what we have now
09:47:40 see the commit message
09:48:08 i understand.
09:48:24 please give me some time to review it
09:48:25 now we have this: https://review.opendev.org/#/c/642009/7/magnum/drivers/heat/driver.py@176
09:48:55 strigazi: can you please add steps on how to reproduce the issue?
09:49:28 flwang1: it now reproducible now (at least the same thing that I had issues with)
09:50:01 flwang1: it is not reproducible now (at least the same thing that I had issues with)
09:50:18 but it was catastrophic i.e. it was replacing all nodes
09:50:34 do you mean i have to have a very old cluster to reproduce this?
09:50:47 stein
09:51:02 not very old
09:51:12 you upgraded recently, no?
09:51:16 yes
09:51:21 so not that old
09:51:56 let me ask more questions
09:52:04 check this part of the code you had:
09:52:31 even if the user only has the default-master and default-worker NGs, when doing resize, all nodes will be replaced?
09:53:09 https://github.com/openstack/magnum/blob/stable/stein/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L705
09:53:24 it really depends
09:53:39 Merged openstack/magnum stable/ussuri: atomic: Do not install control-plane on minions https://review.opendev.org/731092
09:54:03 it depends on what is here: https://github.com/openstack/magnum/blob/stable/stein/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L510
09:54:23 I don't know about your cherry-picks
09:55:16 flwang1: take your time and review this method: https://review.opendev.org/#/c/730868/1/magnum/drivers/heat/driver.py@264
09:55:27 it is very small and very very clear
09:56:05 instead of sending many things in this dict:
09:56:05 i see the issue now
09:56:11 heat_params.update(scale_params)
09:56:16 'parameters': heat_params,
09:56:42 you mean it will rebuild all the nodes of that NG?
09:56:50 guarantee to send only two parameters
09:57:09 depending on the parameter both default NGs
09:57:29 maybe it won't rebuild anything
09:57:38 but maybe is not enough
09:58:54 ok, i see. thanks for the heads up
09:59:02 i will review it tomorrow
09:59:55 it is not urgent for master branch but we need it to avoid breaking things
10:00:56 I think out time is up
10:01:01 I think our time is up
10:01:03 strigazi: right.
10:01:13 it's a critical fix
10:01:22 let's get it done asap
10:01:26 #endmeeting
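
For readers following the resize topic, the idea of sending only two parameters instead of the full heat_params dict can be sketched as below. This is a minimal illustration under stated assumptions, not the code from review 730868: the function name, the parameter key names (taken from the topic title), and the use of an "existing" flag to keep all other stack parameters unchanged are hypothetical.

```python
def build_resize_update(node_count, nodes_to_remove, full_params=None):
    """Return stack-update fields that carry only the two resize inputs.

    full_params stands in for the large parameter dict that used to be sent;
    it is inspected only to report which keys are deliberately left out.
    """
    # Only the resize inputs are sent; nothing else can change by accident.
    scale_params = {
        "node_count": node_count,
        "nodes_to_remove": list(nodes_to_remove),
    }
    # Keys that the old approach would have re-sent and that are now skipped.
    ignored = sorted(set(full_params or {}) - set(scale_params))
    # Assumption: "existing" asks the orchestration layer to keep the current
    # values of every parameter that is not listed explicitly.
    fields = {"parameters": scale_params, "existing": True}
    return fields, ignored
```

With this shape, a template or default that changed between releases (the Stein to Train case mentioned above) cannot be re-applied as a side effect of a resize, because no parameter other than the count and the removal list is part of the update.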