09:00:03 <flwang1> #startmeeting magnum
09:00:04 <openstack> Meeting started Wed May 27 09:00:03 2020 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:07 <openstack> The meeting name has been set to 'magnum'
09:00:12 <flwang1> strigazi: hey, how are you
09:00:44 <strigazi> all good
09:01:07 <flwang1> brtknr: ping
09:01:14 <flwang1> strigazi: let's wait for brtknr a while
09:01:28 <strigazi> sure thing
09:01:59 <flwang1> strigazi: i saw you proposed a topic about health status :)
09:03:18 <strigazi> yeap, now that we can control it more, we can enhance it?
09:03:53 <flwang1> i'm super happy to see you start to look into this
09:04:34 <flwang1> strigazi: sure
09:04:48 <flwang1> strigazi: should we start or wait a bit longer?
09:05:39 <strigazi> We can start with the light things
09:05:49 <strigazi> e.g. the worklist bullet I added
09:06:16 <flwang1> strigazi: no problem
09:07:05 <flwang1> strigazi: which one do you want to bring up first?
09:07:35 <strigazi> #topic using storyboard
09:07:51 <strigazi> you need to do the topic thing
09:08:18 <strigazi> flwang1: ^^
09:08:26 <brtknr> hello sorry was dealing with an emergency
09:08:36 <strigazi> brtknr: all good now?
09:08:38 <flwang1> #topic using storyboard
09:08:42 <brtknr> ye kind of
09:09:27 <flwang1> strigazi: i'm happy to use storyboard, but i'd like to understand if there is any rule/policy/process we need to follow
09:09:38 <strigazi> To be able to track what we do and what others ask us to do I started adding the magnum-victoria tag
09:10:01 <strigazi> There are 71 active stories at the moment
09:10:20 <strigazi> we can easily review them now
09:10:28 <strigazi> We have two options IMO
09:10:46 <strigazi> either let users add the tag (we can't control this)
09:11:12 <strigazi> or we add the stories by hand to a worklist (there are ACLs for this, I think)
09:11:32 <strigazi> for example I created this worklist https://storyboard.openstack.org/#!/worklist/865
09:11:56 <strigazi> that includes all stories that have the magnum-victoria tag
09:12:07 <strigazi> and they are active
09:12:29 <brtknr> strigazi: flwang1 sorry guys i might have to drop out of the meeting today, i will catch up on the discussion later
09:12:43 <strigazi> brtknr sure
09:12:45 <flwang1> brtknr: no worries, take care
09:12:49 <brtknr> have a good day
09:12:59 <strigazi> brtknr: cheers
09:13:25 <flwang1> strigazi: can any user add a story to that list?
09:13:37 <flwang1> or only people who have the permission?
09:14:17 <strigazi> flwang1 if we use the tag, anyone
09:14:41 <strigazi> flwang1: if we add stories manually to the worklist (only people with permission)
09:14:43 <flwang1> ok, that's alright, i don't think many people will do that
09:15:15 <flwang1> strigazi: and I think we can remove a story by removing the tag from it?
09:15:28 <strigazi> flwang1: and here is a board with more lists https://storyboard.openstack.org/#!/board/212
09:15:28 <flwang1> which one you prefer?
09:15:37 <strigazi> flwang1 I think the tag makes sense at the moment
09:15:44 <flwang1> agree
09:15:55 <flwang1> it's easy to manage
09:16:23 <strigazi> we can review what we want to do and add the tag, then it will appear in the lists of the board
09:16:26 <flwang1> at this stage, are you suggesting we use the magnum-victoria tag to track all the work for V?
09:16:31 <strigazi> yes
09:16:35 <flwang1> sounds good
09:17:07 <strigazi> I have added you both in the board https://storyboard.openstack.org/#!/board/212
09:17:18 <flwang1> can we do that individually and then go through it at the virtual PTG?
09:18:04 <strigazi> we can do it individually i think
09:18:07 <strigazi> it is not much
09:18:11 <flwang1> cool
09:18:16 <strigazi> we can present it at the PTG
09:18:23 <flwang1> right, agree
09:18:35 <flwang1> thanks for working on this
09:19:26 <strigazi> ideally we need to go through the open stories here: https://storyboard.openstack.org/#!/project/openstack/magnum and add the tag if we want
09:19:50 <flwang1> move on?
09:19:50 <flwang1> anything else?
09:20:12 <strigazi> let's move on
09:20:24 <flwang1> yep, given you have closed a lot, i assume there's not much left
09:20:49 <strigazi> yeah it is only recent things
09:20:58 <flwang1> strigazi: which one do you want to discuss next? node/API version in health_status_reason?
09:21:21 <strigazi> let's do health first, it is trivial more or less
09:22:07 <flwang1> #topic node/API version in health_status_reason
09:22:14 <flwang1> strigazi: tell me more
09:22:29 <flwang1> why do you want to add the version? for NGs?
09:22:45 <strigazi> not for NG
09:23:08 <strigazi> We want to have a view of alive clusters and which versions they are running
09:23:32 <flwang1> hmm... can't you get it from the coe_version?
09:24:05 <strigazi> that is what magnum expects to have in the cluster
09:24:25 <strigazi> or what it tried to have
09:24:41 <flwang1> you mean coe_version or the health_status_reason?
09:24:52 <strigazi> coe_version is the desired
09:25:02 <strigazi> health_status_reason will be the "current"
09:25:48 <flwang1> firstly, i think anything that can help admins/users understand the health status can be put into the dict
09:25:58 <flwang1> so i'm totally ok with that
09:26:08 <flwang1> i'm just trying to understand the use case
09:26:40 <strigazi> the biggest use case is old clusters where we don't know what is going on
09:26:41 <flwang1> do you mean master and worker may run different versions?
09:26:49 <flwang1> ah, i see :D
09:27:02 <strigazi> and clusters where the user SSHed in and did things
09:27:07 <flwang1> because it has been upgraded and we lost the versions?
09:27:17 <strigazi> yeah, both
09:27:22 <flwang1> right
09:27:25 <flwang1> i'm ok with that
09:27:44 <flwang1> throw a patch and i'm happy to review
09:28:01 <flwang1> move on?
09:28:05 <strigazi> wait
09:28:15 <strigazi> I have the content in the etherpad
09:28:20 <strigazi> and there is a follow up
09:28:54 <strigazi> the dict we have
09:28:58 <strigazi> it is not nested
09:28:59 <flwang1> i'm ok with that format
09:29:16 <flwang1> strigazi: we touched that topic before
09:29:21 <strigazi> yeap
09:29:23 <flwang1> we can do nested dict
09:29:33 <strigazi> but we can't really do it
09:29:39 <flwang1> why?
09:29:50 <strigazi> it will be string, no?
09:30:20 <flwang1> do you mean it will be saved in db as string?
09:30:35 <strigazi> in db it will be string anyway
09:30:44 <strigazi> I mean the type in the API
09:30:59 <flwang1> right
09:31:03 <flwang1> i see your point
09:31:17 <flwang1> i need to do some test to double confirm
09:31:31 <flwang1> but we probably can't do nested IIRC
09:31:47 <flwang1> we can live with flat dict until we figure out a better way
09:31:55 <strigazi> yeap, we can't do arbitrary depth or mix list and dict
09:32:54 <strigazi> so what do we do about depth?
09:33:10 <strigazi> strings that are escaped json?
09:33:37 <flwang1> strigazi: hmm... that's ugly :(
09:34:20 <flwang1> can we just use flat dict for now? i don't have a good answer tbh
09:34:28 <strigazi> flwang1: and we have the issue for helm-config (there base64 makes some sense)
09:34:46 <strigazi> flwang1 for health_status we can do flat
09:34:50 <flwang1> ahhhhh
09:34:58 <strigazi> for helm, we can not
09:35:15 <strigazi> so to wrap the health_status subject
09:35:25 <strigazi> you are OK with having the version
09:35:34 <strigazi> and have a flat dict for now, correct?
09:35:38 <flwang1> yes
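A minimal sketch of the flat shape agreed here, assuming the version information is stored as extra scalar keys next to the per-node entries. The key names (`api.version`, `<node>.Ready`, `<node>.version`) and the helpers `build_health_status_reason` and `is_flat` are hypothetical illustrations, not Magnum's actual format, and the example values are made up. The point is that every value stays a plain string, so nothing has to be nested inside an API field typed as a flat string-to-string dict, and the running version can be compared with the desired coe_version key by key.

```python
from typing import Dict, Iterable, Mapping


def build_health_status_reason(
    api_status: str,
    api_version: str,
    nodes: Iterable[Mapping[str, str]],
) -> Dict[str, str]:
    """Build a flat health_status_reason dict (hypothetical key layout).

    Each node contributes two keys: its Ready condition and the kubelet
    version it reports, so no value ever needs to be a nested dict.
    """
    reason: Dict[str, str] = {"api": api_status, "api.version": api_version}
    for node in nodes:
        name = node["name"]
        reason[f"{name}.Ready"] = node["ready"]
        reason[f"{name}.version"] = node["kubelet_version"]
    return reason


def is_flat(reason: Mapping[str, object]) -> bool:
    """True if every key and value is a plain string (no nested dicts/lists)."""
    return all(isinstance(k, str) and isinstance(v, str) for k, v in reason.items())


if __name__ == "__main__":
    # Made-up example: a worker that was left on an older kubelet.
    reason = build_health_status_reason(
        "ok",
        "v1.18.2",
        [
            {"name": "k8s-master-0", "ready": "True", "kubelet_version": "v1.18.2"},
            {"name": "k8s-node-0", "ready": "True", "kubelet_version": "v1.15.7"},
        ],
    )
    print(reason)
    print(is_flat(reason))
```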
09:35:48 <strigazi> let's switch to helm-config?
09:36:31 <flwang1> for helm config, can we just read it as an escaped string? and do the magic on the server side?
09:36:39 <flwang1> #topic helm-config
09:37:05 <strigazi> #action striazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:12 <strigazi> #undo
09:37:20 <strigazi> #action strigazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:39 <flwang1> #action strigazi to propose a patch for adding node/API version in health_status_reason with a flat dict
09:37:47 <strigazi> flwang1: for helm-config, I don't know, some parsing will happen
09:38:03 <strigazi> either to escape the JSON
09:38:18 <flwang1> strigazi: can we use json.dumps and json.loads
09:38:23 <strigazi> or encode it to base64 and pass it as is
09:38:34 <flwang1> on the two sides, to make sure they're compatible
09:38:37 <strigazi> flwang1: can we?
09:38:53 <flwang1> don't know, just thinking aloud
09:39:19 <strigazi> I don't know either
09:39:40 <flwang1> for this piece, you guys probably need to do a small PoC
09:39:58 <strigazi> the relevant part is here: https://review.opendev.org/#/c/727756/4/specs/victoria/helm-config.rst@87
09:40:05 <flwang1> i don't like the idea of base64, TBH
09:41:08 <strigazi> can you leave a comment with preference? I guess you propose to try escaped json
09:42:22 <flwang1> sure, will do
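The two helm-config options discussed above can be compared with a small PoC-style round trip. It assumes the nested helm configuration has to travel as a single string value inside a flat string-to-string field (called `helm_config` here purely for illustration); `json.dumps`/`json.loads` give the escaped-JSON variant, and `base64` wraps the same JSON text. The function names and the example config are hypothetical; this is not the design in the spec under review.

```python
import base64
import json
from typing import Any


def encode_escaped_json(config: Any) -> str:
    """Client side: serialize a nested config to JSON text (one string value)."""
    return json.dumps(config, sort_keys=True)


def decode_escaped_json(value: str) -> Any:
    """Server side: parse the JSON text back into dicts/lists."""
    return json.loads(value)


def encode_base64(config: Any) -> str:
    """Client side: serialize to JSON, then base64 so the value has no quotes."""
    raw = json.dumps(config, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_base64(value: str) -> Any:
    """Server side: undo base64, then parse the JSON."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical nested helm values mixing dicts and lists.
    config = {"nginx-ingress": {"enabled": True, "args": ["--v=2"]}}

    # The outer request body is itself JSON, so the escaped-JSON value
    # appears with backslash-escaped quotes, while base64 stays opaque.
    print(json.dumps({"helm_config": encode_escaped_json(config)}))
    print(json.dumps({"helm_config": encode_base64(config)}))

    assert decode_escaped_json(encode_escaped_json(config)) == config
    assert decode_base64(encode_base64(config)) == config
```

Both variants carry the same data; the trade-off is that escaped JSON stays human-readable in API output, while base64 is opaque but never needs any thought about quoting.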
09:43:19 <strigazi> move on?
09:43:41 <flwang1> comments added
09:43:46 <flwang1> which next?
09:44:12 <strigazi> #topic resize: Send only nodes_to_remove and node_count
09:44:22 <flwang1> #topic resize: Send only nodes_to_remove and node_count
09:44:30 <strigazi> https://review.opendev.org/#/c/730868/
09:44:30 <flwang1> i didn't get the issue
09:45:05 <strigazi> resize needs to do only the resize
09:45:19 <strigazi> so to do only the resize, we need to send only the node_count
09:45:26 <strigazi> and which nodes to drop
09:45:42 <strigazi> after Train, Stein clusters were breaking
09:46:05 <strigazi> something similar can happen again (doesn't happen now)
09:46:59 <flwang1> hmmm... that's the kind of patch i don't want to review :D
09:47:15 <flwang1> it's too dangerous
09:47:28 <strigazi> what is dangerous is what we have now
09:47:40 <strigazi> see the commit message
09:48:08 <flwang1> i understand.
09:48:24 <flwang1> please give me some time to review it
09:48:25 <strigazi> now we have this: https://review.opendev.org/#/c/642009/7/magnum/drivers/heat/driver.py@176
09:48:55 <flwang1> strigazi: can you please add steps on how to reproduce the issue?
09:50:01 <strigazi> flwang1: it is not reproducible now (at least the same thing that I had issues with)
09:50:18 <strigazi> but it was catastrophic i.e. it was replacing all nodes
09:50:34 <flwang1> do you mean i need a very old cluster to reproduce this?
09:50:47 <strigazi> stein
09:51:02 <strigazi> not very old
09:51:12 <strigazi> you upgraded recently, no?
09:51:16 <flwang1> yes
09:51:21 <strigazi> so not that old
09:51:56 <flwang1> let me ask more questions
09:52:04 <strigazi> check this part of the code you had:
09:52:31 <flwang1> even if the user only has the default-master and default-worker NGs, when doing a resize, all nodes will be replaced?
09:53:09 <strigazi> https://github.com/openstack/magnum/blob/stable/stein/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L705
09:53:24 <strigazi> it really depends
09:53:39 <openstackgerrit> Merged openstack/magnum stable/ussuri: atomic: Do not install control-plane on minions  https://review.opendev.org/731092
09:54:03 <strigazi> it depends on what is here: https://github.com/openstack/magnum/blob/stable/stein/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml#L510
09:54:23 <strigazi> I don't know about your cherry-picks
09:55:16 <strigazi> flwang1: take your time and review this method: https://review.opendev.org/#/c/730868/1/magnum/drivers/heat/driver.py@264
09:55:27 <strigazi> it is very small and very very clear
09:56:05 <strigazi> instead of sending many things in this dict:
09:56:05 <flwang1> i see the issue now
09:56:11 <strigazi> heat_params.update(scale_params)
09:56:16 <strigazi> 'parameters': heat_params,
09:56:42 <flwang1> you mean it will rebuild all the nodes of that NG?
09:56:50 <strigazi> guaranteed to send only two parameters
09:57:09 <strigazi> depending on the parameters, both default NGs
09:57:29 <strigazi> maybe it won't rebuild anything
09:57:38 <strigazi> but "maybe" is not enough
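The difference between sending the whole regenerated parameter dict (`heat_params.update(scale_params)`) and sending only the resize parameters can be sketched as two ways of building the Heat stack-update fields. The sketch assumes a PATCH-style update (`existing=True`), in which Heat keeps the stack's current value for any parameter that is not sent, and it assumes `number_of_minions` and `minions_to_remove` as the two resize parameters of the default worker nodegroup. The helper names and example values are hypothetical; this is not the code from the review.

```python
from typing import Dict, Iterable, Mapping, Set


def full_update_fields(
    heat_params: Mapping[str, object],
    node_count: int,
    nodes_to_remove: Iterable[str],
) -> Dict[str, object]:
    """Old pattern: send every regenerated template parameter plus the scaling ones."""
    params = dict(heat_params)
    params.update({"number_of_minions": node_count,
                   "minions_to_remove": list(nodes_to_remove)})
    return {"parameters": params, "existing": True}


def resize_only_fields(node_count: int, nodes_to_remove: Iterable[str]) -> Dict[str, object]:
    """Resize-only pattern: send just the two scaling parameters."""
    return {
        "parameters": {
            "number_of_minions": node_count,
            "minions_to_remove": list(nodes_to_remove),
        },
        "existing": True,
    }


def changed_keys(stack_params: Mapping[str, object], fields: Mapping[str, object]) -> Set[str]:
    """Parameters whose sent value differs from what the stack currently has."""
    sent = fields["parameters"]
    return {k for k, v in sent.items() if stack_params.get(k) != v}


if __name__ == "__main__":
    # Parameters as an older (e.g. Stein-era) stack has them; values are made up.
    stack_params = {"number_of_minions": 3, "minions_to_remove": [],
                    "kube_tag": "v1.14.3", "boot_volume_size": 0}
    # What newer code might regenerate for the same cluster; any drift here
    # may make Heat update or replace servers, depending on the template.
    regenerated = {"number_of_minions": 3, "minions_to_remove": [],
                   "kube_tag": "v1.15.7", "boot_volume_size": 10}

    print(sorted(changed_keys(stack_params, full_update_fields(regenerated, 2, ["node-2"]))))
    print(sorted(changed_keys(stack_params, resize_only_fields(2, ["node-2"]))))
```

With the resize-only fields, the only parameters that can change are the node count and the removal list, whatever has drifted elsewhere in the template.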
09:58:54 <flwang1> ok, i see. thanks for the heads up
09:59:02 <flwang1> i will review it tomorrow
09:59:55 <strigazi> it is not urgent for the master branch, but we need it to avoid breaking things
10:01:01 <strigazi> I think our time is up
10:01:03 <flwang1> strigazi: right.
10:01:13 <flwang1> it's a critical fix
10:01:22 <flwang1> let's get it done asap
10:01:26 <flwang1> #endmeeting