09:01:39 <flwang1> #startmeeting magnum
09:01:40 <openstack> Meeting started Wed Feb  5 09:01:39 2020 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:43 <openstack> The meeting name has been set to 'magnum'
09:01:51 <flwang1> #topic roll call
09:01:57 <strigazi> o/
09:01:58 <flwang1> o/
09:02:01 <flwang1> brtknr: ?
09:02:32 <flwang1> strigazi: so far i think just you and me
09:03:38 <flwang1> strigazi: is there anything you want to discuss?
09:03:42 <brtknr> flwang1: hi
09:03:44 <brtknr> o/
09:04:07 <brtknr> was scoffing down breakfast, apologies
09:04:16 <flwang1> brtknr: all good
09:04:35 <flwang1> brtknr: strigazi: do you guys have special topic to discuss?
09:04:46 <flwang1> otherwise we go through the agenda?
09:04:55 <strigazi> flwang1: brtknr: the new heat agent logs are worse
09:05:15 <brtknr> elaborate
09:05:42 <strigazi> flwang1: brtknr: the output of a command must be printed when it is executed
09:05:49 <brtknr> +1
09:06:06 <strigazi> flwang1: brtknr: atm everything is printed at the end
09:06:10 <brtknr> there is some time where nothing gets printed
09:06:19 <brtknr> would be nice to fix that
09:06:22 <strigazi> flwang1: brtknr: can't catch things that are stuck
09:06:26 <flwang1> strigazi: yep, i notice that as well
09:06:42 <strigazi> flwang1: brtknr: when did this happen?
09:06:49 <flwang1> i'd like to see the logs being printed for each execution
09:07:08 <flwang1> i think since the last time brtknr fixed it?
09:07:10 <brtknr> it's been like this since we moved everything from cloud-init-output to journalctl
09:07:25 <brtknr> when i fixed it, nothing was getting printed at all
09:07:53 <strigazi> "nothing was getting printed at all" when was this true?
09:07:58 <brtknr> when i fixed it, everything was getting printed on a single line
09:08:08 <strigazi> ok
09:08:34 <brtknr> i simply changed it so that newlines were parsed correctly
09:08:38 <strigazi> I don't remember that. Anything we can do?
09:09:11 <flwang1> nothing special, just review the code and see how we can fix it
09:09:36 <strigazi> is it our code or os-collect-config?
09:10:15 <flwang1> probably os-collect-config, i need to check the code, can't remember the details
09:13:46 <brtknr> ok who is going to look into this?
09:14:38 <flwang1> i just created a story to track this https://storyboard.openstack.org/#!/story/2007256
09:14:49 <flwang1> strigazi: can you work on this?
09:15:18 <strigazi> flwang1: I can have a look
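
A minimal sketch of the behaviour being asked for here, in plain Python (this is not the actual os-collect-config / heat-container-agent code, just an illustration): forward a hook's combined stdout/stderr line by line while it runs, instead of capturing everything and printing it once the command has finished.

    # Illustrative only -- not the real agent code.
    import subprocess
    import sys

    def run_and_stream(cmd):
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # interleave stderr so nothing is lost
            text=True,
        )
        for line in proc.stdout:        # lines arrive as the command produces them
            sys.stdout.write(line)      # visible immediately, e.g. via journalctl
            sys.stdout.flush()
        return proc.wait()

    if __name__ == "__main__":
        run_and_stream(["/bin/bash", "-c", "for i in 1 2 3; do echo step $i; sleep 1; done"])

With this pattern a command that hangs still shows its partial output, which covers the "can't catch things that are stuck" case above.
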
09:16:18 <brtknr> next topic?
09:17:05 <strigazi> where is the etherpad?
09:17:19 <flwang1> https://etherpad.openstack.org/p/magnum-weekly-meeting
09:17:24 <strigazi> it is not in the channel anymore
09:17:49 <strigazi> thx
09:20:19 <flwang1> shall we go through the agenda?
09:20:28 <brtknr> yes
09:20:49 <brtknr> are you guys happy to provide feedback for cinder_csi_enabled
09:21:03 <strigazi> is it ready?
09:21:07 <brtknr> to support out of tree cinder?
09:21:25 <brtknr> yep but i don't understand what the pep8 failure is all about
09:21:51 <flwang1> brtknr: so you have already tested and it works?
09:21:55 <flwang1> i'm happy to test it
09:22:07 <brtknr> yep ive tested and it works
09:22:17 <flwang1> great
09:22:23 <brtknr> works for xinliang too
09:23:09 <flwang1> i will help take a look at the pep8 and docs job failures
09:23:26 <strigazi> brtknr: I will have a look, at CERN we have two more CSI drivers. Need to check that this is compatible.
09:25:23 <brtknr> there are some generic csi components, only one of them is cinder_csi related
09:25:27 <flwang1> strigazi: if cern can give some comments on this csi patch, it would be great
09:25:30 <strigazi> brtknr: don't you need any of: --feature-gates=CSINodeInfo=true,CSIDriverRegistry=true ?
09:25:45 <strigazi> brtknr: or --runtime-config=storage.k8s.io/v1alpha1=true ?
09:26:03 <brtknr> didnt need those
09:26:27 <brtknr> seemed to work with v1.16.x and 1.17.x
09:26:46 <strigazi> interesting
09:27:19 <strigazi> ok, I will test
09:27:21 <brtknr> CSIDriverRegistry went into beta from 1.14
09:27:35 <brtknr> same for CSINodeInfo
09:27:43 <brtknr> so true by default
09:27:48 <strigazi> brtknr: means on by default?
09:27:49 <strigazi> ok
09:28:09 <strigazi> The world will end based on a default
09:28:21 <brtknr> XD
09:28:37 <brtknr> I'll quote you on that when the world ends
09:29:07 <strigazi> 100% sure
09:29:33 <strigazi> someone will change the nuke_everything default to true in qa, and that's it
09:30:11 <strigazi> anyway, let's stay on track, I will test it
09:30:36 <strigazi> brtknr: would you like to do manila provisioner too?
09:31:16 <strigazi> I'm good, let's move on?
09:32:38 <flwang1> ok
09:32:56 <brtknr> strigazi: i tried the manila provisioner but lost my patience with it
09:32:57 <flwang1> brtknr: what's the nginx issue?
09:35:04 <brtknr> with regards to nginx and traefik, at the moment, we need to label an ingress node. we wanted to explore adding another option which runs a deployment with n replicas behind a load balancer
09:36:25 <strigazi> brtknr: what do you propose?
09:37:10 <brtknr> effectively a label which changes nginx and traefik services from ClusterIP to LoadBalancer type
09:37:17 <brtknr> fairly small change
09:37:32 <strigazi> and the DS to a Deployment, and hostNetwork: false
09:37:35 <brtknr> it can be ClusterIP by default
09:37:47 <strigazi> and remove node-selector
09:39:02 <brtknr> possibly those too :S need to look into it more closely
09:40:03 <strigazi> as mentioned before, just don't break the default :)
09:41:06 <brtknr> ok im glad you're okay to support the alternative, ill refine what it would involve a bit more in a PS
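
A rough illustration of the proposed behaviour (the real change would be a Magnum label wired through the Heat templates; the snippet below just shows the end effect with the kubernetes Python client, and the Service name and namespace are assumptions):

    # Illustrative only: switch an ingress controller Service from ClusterIP
    # to LoadBalancer, which is what the proposed label would enable.
    from kubernetes import client, config

    def expose_ingress_via_lb(name="nginx-ingress-controller",
                              namespace="kube-system"):
        config.load_kube_config()                  # or load_incluster_config()
        v1 = client.CoreV1Api()
        body = {"spec": {"type": "LoadBalancer"}}  # previously ClusterIP
        return v1.patch_namespaced_service(name, namespace, body)

Keeping ClusterIP as the default, as agreed above, leaves existing clusters unaffected.
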
09:41:43 <strigazi> for prometheus, ping dioguerra , I think it works but double check with him
09:42:08 <flwang1> re prometheus, what's the status of removing heapster and adding metrics-server
09:42:13 <brtknr> i take back the prometheus issue
09:42:16 <strigazi> it is done
09:42:25 <flwang1> i can see the metrics-server pod is broken on my local env
09:42:44 <brtknr> metrics server doesn't work for me either at 9.2.0
09:43:00 <flwang1> strigazi: you mean the heapster->metrics-server work is done, right?
09:43:04 <brtknr> lemme grab the log
09:43:05 <strigazi> yes
09:43:41 <strigazi> metrics server works for me. Only the metrics of the master node are not collected due to the cert
09:44:09 <flwang1> strigazi: ok, i will test again
09:44:15 <flwang1> what's the k8s version you're using?
09:44:48 <strigazi> v1.17.2
09:45:26 <flwang1> fairly new :)
09:45:32 <flwang1> ok, i will give it a try
09:45:36 <flwang1> let's move on?
09:46:17 <strigazi> yes
09:46:22 <flwang1> https://review.opendev.org/#/c/700565  work for ARM
09:46:35 <flwang1> personally i'm ok with that
09:46:45 <brtknr> i created a story for the metrics-server issue: https://storyboard.openstack.org/#!/story/2007264
09:47:20 <brtknr> do you need special label for metrics server?
09:47:38 <brtknr> flwang1: have you tested https://review.opendev.org/#/c/700565
09:47:56 <strigazi> https://storyboard.openstack.org/#!/story/2007264 this is the heat-agent
09:48:36 <brtknr> i created a story for the metrics-server issue: https://storyboard.openstack.org/#!/story/2007265
09:48:39 <brtknr> sorry
09:48:40 <flwang1> no i haven't, i just had a quick look about the code
09:49:20 <brtknr> the ARCH change looks trivial, i wanted to get your thoughts on the general approach
09:49:24 <strigazi> brtknr: DNS k8s-podman-calico-k7tz6itxx7k6-node-0 on 10.254.0.10:53: no such host
09:49:34 <flwang1> brtknr: will do
09:49:58 <strigazi> for ARCH I will check that the default x86_64 works
09:50:05 <brtknr> i.e. determining arch inside /write-heat-params.sh script
09:50:10 <strigazi> I can't test arm
09:51:07 <brtknr> strigazi: what is the fix for DNS k8s-podman-calico-k7tz6itxx7k6-node-0 on 10.254.0.10:53: no such host
09:51:31 <strigazi> left a comment
09:53:11 <brtknr> strigazi: ok thanks, that was not obvious to me :)
09:53:52 <strigazi> https://pbs.twimg.com/media/C9o9ZO0UwAAi3ib.jpg
09:54:09 <brtknr> lol
09:54:54 <strigazi> defaults usually assume DNS. I have more scenarios for the end of the world, all of them include DNS as well
09:55:27 <flwang1> move on?
09:55:32 <flwang1> we have only 5 mins
09:56:07 <jakeyip> hi all
09:56:38 <flwang1> brtknr: what's this 'Modify default-worker flavor after cluster creation'?
09:56:41 <flwang1> jakeyip: hi
09:58:09 <brtknr> flwang1: so we have a use case where we want to be able to change the default worker flavor after the cluster is already created
09:58:46 <brtknr> at the moment, we can delete nodegroup and create a new nodegroup with different flavor
09:58:53 <brtknr> but we cant do this for default worker
09:59:24 <brtknr> i was wondering how involved a change it would be to allow update of default-worker flavor
09:59:55 <strigazi> I think the best approach is to make the default worker deletable
10:00:05 <flwang1> brtknr: maybe we should allow create an empty cluster
10:00:07 <flwang1> like EKS
10:00:22 <strigazi> changing the flavor sounds orthogonal to nova to me
10:00:45 <flwang1> i don't think resizing the instance is a good idea
10:00:51 <strigazi> +1
10:01:09 <brtknr> yes i'd prefer to make the default worker deletable
10:01:13 <strigazi> EKS logic gives little value
10:01:22 <brtknr> and create cluster with 0 workers
10:02:10 <flwang1> btw, i'd like to propose a patch to support distribute master nodes to different AZs, strigazi, do you like the idea?
10:02:22 <brtknr> flwang1: is that what you mean by empty cluster ?
10:02:33 <flwang1> brtknr: no, different idea
10:02:35 <brtknr> or does this also include 0 masters?
10:02:41 <jakeyip> flwang1: +1 we would like that for workers
10:02:48 <flwang1> it's like the Regional cluster in GKE
10:03:33 <flwang1> jakeyip: for workers, we can already support it with node groups, but i know where you're coming from
10:04:15 <strigazi> master across AZs could work, but it feels hacky
10:04:41 <strigazi> I mean you can deploy N NGs for masters by default
10:04:47 <strigazi> this sounds better
10:05:07 <flwang1> strigazi: what do you mean deploy N NGs for masters?
10:05:48 <strigazi> if you have 3 AZs in your cloud, deploy three master NGs
10:05:54 <strigazi> on creation
10:06:03 <strigazi> N=3 in this case
10:06:22 <brtknr> that sounds like a better approach
10:06:29 <flwang1> you mean create 3NGs for master nodes?
10:06:34 <flwang1> how can i do that?
10:06:58 <strigazi> yes
10:07:04 <strigazi> one sec
10:10:08 <strigazi> still looking
10:10:26 <strigazi> I don't remember where it is
10:11:33 <brtknr> i didn't know you could create a master NG
10:11:57 <flwang1> yep, i thought master can only be in default-master NG
10:11:59 <strigazi> you can't atm, we have similar code though
10:12:00 <flwang1> that's why i asked
10:12:29 <flwang1> strigazi: can you please share the code with me when you found it?
10:12:35 <flwang1> i'm keen to learn that
10:13:35 <strigazi> found it
10:13:37 <strigazi> wait
10:14:19 <strigazi> https://github.com/openstack/magnum/blob/e52f77b299c50f004ee5a875c68d7f129b88a2af/magnum/conductor/handlers/cluster_conductor.py#L56
10:14:57 <strigazi> you can add the additional NGs there, needs some extra work for the LB to be shared across the stacks
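
A hypothetical sketch of that idea (this is not existing Magnum code; the helper and the per-nodegroup availability zone label are illustrative, and as noted the load balancer still needs to be shared across the resulting stacks): create one extra master nodegroup per availability zone at cluster-create time.

    # Hypothetical sketch -- not existing Magnum code.
    from magnum import objects

    def create_master_nodegroups_per_az(context, cluster, availability_zones):
        """Create one master nodegroup per AZ so masters are spread out."""
        for az in availability_zones:
            ng = objects.NodeGroup(
                context,
                cluster_id=cluster.uuid,
                project_id=cluster.project_id,
                name='default-master-%s' % az,
                role='master',
                node_count=1,
                labels={'availability_zone': az},  # hypothetical per-NG label
                # plus the other required fields (flavor, image, ...)
            )
            ng.create()
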
10:15:26 <flwang1> hmm... what's the benefit of putting masters into different NGs?
10:15:46 <flwang1> anyway, thanks, i will think about this
10:15:47 <strigazi> to spawn across AZs?
10:15:57 <strigazi> that is the goal no?
10:16:26 <strigazi> Then you will have a NG with nodes in many AZs inside?
10:16:26 <flwang1> if there is an AZ list, we should be able to spawn masters into different AZ
10:16:50 <strigazi> So then all the nodegroup work we did is kind of worthless
10:17:12 <strigazi> as you want
10:17:24 <flwang1> strigazi: please don't say words like this
10:17:41 <flwang1> i'm not trying to make anyone's work worthless
10:18:02 <strigazi> The biggest use case of NGs was AZs
10:19:42 <flwang1> ok, then if you guys do have a good solution for this, please share, otherwise, what i am doing is trying to figure out a way to achieve that
10:20:15 <strigazi> well, I might be wrong. In GKE they spawn node pools across zones, if I read correctly
10:20:18 <strigazi> https://cloud.google.com/kubernetes-engine/docs/concepts/node-pools
10:20:24 <strigazi> https://cloud.google.com/kubernetes-engine/docs/concepts/node-pools#nodes_in_multi-zonal_clusters
10:22:09 <strigazi> Maybe the thing with the list works
10:22:29 <flwang1> https://cloud.google.com/kubernetes-engine/docs/concepts/types-of-clusters
10:22:38 <strigazi> It will require some serious hacking in the heat templates, maybe
10:22:39 <flwang1> let's use a spec to track this work
10:22:53 <flwang1> i will start to draft that
10:23:24 <flwang1> brtknr: strigazi: could you please help review this one: Volume AZ regression fix - https://review.opendev.org/705592 ?
10:23:47 <flwang1> it's kind of a regression issue, i didn't fix the volume AZ issue on the first attempt
10:23:56 <strigazi> let's wrap up the meeting? with Train, many bugs arrived
10:24:10 <flwang1> cinder doesn't respect "" as AZ
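
To make the Cinder point concrete, a small sketch (illustrative only; the actual fix in review 705592 is in the Heat templates, not Python): the availability_zone parameter has to be omitted entirely when it is empty, because Cinder treats an explicit "" as an unknown AZ rather than "use the default".

    # Illustrative only -- the real fix lives in the Heat templates.
    from cinderclient import client as cinder_client  # python-cinderclient

    def create_volume(session, size_gb, availability_zone=None):
        cinder = cinder_client.Client('3', session=session)
        kwargs = {}
        if availability_zone:              # drop '' / None instead of forwarding it
            kwargs['availability_zone'] = availability_zone
        return cinder.volumes.create(size_gb, **kwargs)
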
10:24:15 <flwang1> strigazi: sure
10:24:17 <strigazi> flwang1: I will have a look, I might have time only on monday though
10:24:23 <flwang1> #endmeeting