09:01:39 <flwang1> #startmeeting magnum
09:01:40 <openstack> Meeting started Wed Feb 5 09:01:39 2020 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:43 <openstack> The meeting name has been set to 'magnum'
09:01:51 <flwang1> #topic roll call
09:01:57 <strigazi> o/
09:01:58 <flwang1> o/
09:02:01 <flwang1> brtknr: ?
09:02:32 <flwang1> strigazi: so far i think just you and me
09:03:38 <flwang1> strigazi: is there anything you want to discuss?
09:03:42 <brtknr> flwang1: hi
09:03:44 <brtknr> o/
09:04:07 <brtknr> was scoffing down breakfast, apologies
09:04:16 <flwang1> brtknr: all good
09:04:35 <flwang1> brtknr: strigazi: do you guys have a special topic to discuss?
09:04:46 <flwang1> otherwise we go through the agenda?
09:04:55 <strigazi> flwang1: brtknr: the new heat agent logs are worse
09:05:15 <brtknr> elaborate
09:05:42 <strigazi> flwang1: brtknr: the output of a command must be printed when it is executed
09:05:49 <brtknr> +1
09:06:06 <strigazi> flwang1: brtknr: atm everything is printed at the end
09:06:10 <brtknr> there is some time where nothing gets printed
09:06:19 <brtknr> would be nice to fix that
09:06:22 <strigazi> flwang1: brtknr: can't catch trhings that are stucj
09:06:26 <flwang1> strigazi: yep, i notice that as well
09:06:27 <strigazi> flwang1: brtknr: can't catch things that are stuck
09:06:42 <strigazi> flwang1: brtknr: when did this happen?
09:06:49 <flwang1> i'd like to see the logs being printed for each execution
09:07:08 <flwang1> i think since last time brtknr fixed it?
09:07:10 <brtknr> its been like this since we moved everything from cloud-init-output to journalctl
09:07:25 <brtknr> when i fixed it, nothing was getting printed at all
09:07:53 <strigazi> "nothing was getting printed at all" when was this true?
09:07:58 <brtknr> when i fixed it, everything was getting printed on a single line
09:08:08 <strigazi> ok
09:08:34 <brtknr> i simply changed it so that newlines were parsed correctly
09:08:38 <strigazi> I don't remember that. Anything we can do?
09:09:11 <flwang1> nothing special, just review the code and see how we can fix it
09:09:36 <strigazi> is it our code or os-collect-config?
09:10:15 <flwang1> probably os-collect-config, i need to check the code, can't remember the details
09:13:46 <brtknr> ok who is going to look into this?
09:14:38 <flwang1> i just created a story to track this https://storyboard.openstack.org/#!/story/2007256
09:14:49 <flwang1> strigazi: can you work on this?
09:15:18 <strigazi> flwang1: I can have a look
09:16:18 <brtknr> next topic?
09:17:05 <strigazi> where is the etherpad?
09:17:19 <flwang1> https://etherpad.openstack.org/p/magnum-weekly-meeting
09:17:24 <strigazi> it is not in the channel anymore
09:17:49 <strigazi> thx
09:20:19 <flwang1> shall we go through the agenda?
09:20:28 <brtknr> yes
09:20:49 <brtknr> are you guys happy to provide feedback for cinder_csi_enabled
09:21:03 <strigazi> is it ready?
09:21:07 <brtknr> to support out of tree cinder?
09:21:25 <brtknr> yep but i dont understand what the pep8 failure is all about
09:21:51 <flwang1> brtknr: so you have already tested and it works?
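For reference on the heat-agent logging discussion earlier in the meeting: the behaviour strigazi asks for is that each script's output shows up while the script runs, not in one block at the end. Below is a minimal, hedged Python sketch of that pattern; it assumes the agent launches deployment scripts as subprocesses and is only an illustration, not the actual heat-container-agent or os-collect-config code.

```python
# Illustration only: stream a command's output as it is produced instead of
# printing it after the command exits. Not the real agent implementation.
import subprocess
import sys


def run_and_stream(cmd):
    """Run cmd, echoing combined stdout/stderr line by line as it appears."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line buffered, so long-running steps are visible early
    )
    for line in proc.stdout:
        sys.stdout.write(line)  # visible immediately in journalctl
        sys.stdout.flush()
    return proc.wait()


if __name__ == "__main__":
    rc = run_and_stream(["/bin/sh", "-c", "echo step one; sleep 1; echo step two"])
    sys.exit(rc)
```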
09:21:55 <flwang1> i'm happy to test it
09:22:07 <brtknr> yep ive tested and it works
09:22:17 <flwang1> great
09:22:23 <brtknr> works for xinliang too
09:23:09 <flwang1> i will help take a look at the pep8 and docs job failure
09:23:26 <strigazi> brtknr: I will have a look, at CERN we have two more CSI drivers. Need to check that this is compatible.
09:25:23 <brtknr> there are some generic csi components, only one of them is cinder_csi replated
09:25:26 <brtknr> related
09:25:27 <flwang1> strigazi: if cern can give some comments on this csi patch, it would be great
09:25:30 <strigazi> brtknr: don't you need any of: --feature-gates=CSINodeInfo=true,CSIDriverRegistry=true ?
09:25:45 <strigazi> brtknr: or --runtime-config=storage.k8s.io/v1alpha1=true" ?
09:26:03 <brtknr> didnt need those
09:26:27 <brtknr> seemed to work with v1.16.x and 1.17.x
09:26:46 <strigazi> interesting
09:27:19 <strigazi> ok, I will test
09:27:21 <brtknr> CSIDriverRegistry went into beta from 1.14
09:27:35 <brtknr> same for CSINodeInfo
09:27:43 <brtknr> so true by default
09:27:48 <strigazi> brtknr: means on by default?
09:27:49 <strigazi> ok
09:28:09 <strigazi> The world will end based on a default
09:28:21 <brtknr> XD
09:28:37 <brtknr> I'll quote you on that when the world ends
09:29:07 <strigazi> 100% sure
09:29:33 <strigazi> someone will change the nuke_everything default to true in qa, and that's it
09:30:11 <strigazi> anyway, let's stay on track, I will test it
09:30:36 <strigazi> brtknr: would you like to do manila provisioner too?
09:31:16 <strigazi> I'm good, let's move on?
09:32:38 <flwang1> ok
09:32:56 <brtknr> strigazi: i tried the manila provisioner but lost my patience with it
09:32:57 <flwang1> brtknr: what's the nginx issue?
09:35:04 <brtknr> with regards to nginx and traefik, at the moment, we need to label an ingress node. we wanted to explore adding another option which runs a deployment with n replicas behind a load balancer
09:36:25 <strigazi> brtknr: what do you propose?
09:37:10 <brtknr> effectively a label which changes nginx and traefik services from ClusterIP to LoadBalancer type
09:37:17 <brtknr> fairly small change
09:37:32 <strigazi> and DS to deploy and hostNetwork: false
09:37:35 <brtknr> it can be ClusterIP by default
09:37:47 <strigazi> and remove node-selector
09:39:02 <brtknr> possibly those too :S need to look into it more closely
09:40:03 <strigazi> as mentioned before, just don't break the default :)
09:41:06 <brtknr> ok im glad you're okay to support the alternative, ill refine what it would involve a bit more in a PS
09:41:43 <strigazi> for prometheus, ping dioguerra , I think it works but double check with him
09:42:08 <flwang1> re prometheus, what's the status of removing heapster and adding metrics-server
09:42:13 <brtknr> i take back the prometheus issue
09:42:16 <strigazi> it is done
09:42:25 <flwang1> i can see the metrics-server pods are broken on my local
09:42:44 <brtknr> metrics server doesnt work for me either at 9.2.0
09:43:00 <flwang1> strigazi: you mean the heapster->metrics-server work is done, right
09:43:02 <flwang1> ?
09:43:04 <brtknr> lemme grab the log
09:43:05 <strigazi> yes
09:43:41 <strigazi> metrics server works for me. Only the logs of the master node are not collected due to the cert
09:44:09 <flwang1> strigazi: ok, i will test again
09:44:15 <flwang1> what's the k8s version you're using?
09:44:48 <strigazi> 17.2
09:45:26 <flwang1> fairly new :)
09:45:32 <flwang1> ok, i will give it a try
09:45:36 <flwang1> let's move on?
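To make the nginx/traefik proposal above concrete: the idea is a cluster label that switches the ingress controller Service from ClusterIP to LoadBalancer, keeping ClusterIP as the default so current behaviour is not broken. The sketch below is hypothetical; the label name ingress_controller_service_type and the helper are assumptions, not the patch under discussion.

```python
# Hypothetical sketch of a label-driven Service type; the label name
# "ingress_controller_service_type" is an assumption, not an agreed Magnum label.
ALLOWED_TYPES = ("ClusterIP", "NodePort", "LoadBalancer")


def ingress_service_type(labels):
    """Pick the Service type to render into the ingress controller manifest."""
    value = labels.get("ingress_controller_service_type", "ClusterIP")
    if value not in ALLOWED_TYPES:
        raise ValueError("unsupported ingress service type: %s" % value)
    return value


# Default stays ClusterIP; opting in to a load balancer is explicit.
print(ingress_service_type({}))
print(ingress_service_type({"ingress_controller_service_type": "LoadBalancer"}))
```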
09:46:17 <strigazi> yes
09:46:22 <flwang1> https://review.opendev.org/#/c/700565 work for ARM
09:46:35 <flwang1> personally i'm ok with that
09:46:45 <brtknr> i created a story for the metrics-server issue: https://storyboard.openstack.org/#!/story/2007264
09:47:20 <brtknr> do you need a special label for metrics server?
09:47:29 <brtknr> flwang1: have you tested https://storyboard.openstack.org/#!/story/2007264
09:47:38 <brtknr> flwang1: have you tested https://review.opendev.org/#/c/700565
09:47:56 <strigazi> https://storyboard.openstack.org/#!/story/2007264 this is the heat-agent
09:48:36 <brtknr> i created a story for the metrics-server issue: https://storyboard.openstack.org/#!/story/2007265
09:48:39 <brtknr> sorry
09:48:40 <flwang1> no i haven't, i just had a quick look at the code
09:49:20 <brtknr> the ARCH change looks trivial, i wanted to get your thoughts on the general approach
09:49:24 <strigazi> brtknr: DNS k8s-podman-calico-k7tz6itxx7k6-node-0 on 10.254.0.10:53: no such host
09:49:34 <flwang1> brtknr: will do
09:49:58 <strigazi> for ARCH I will test that the default x86_64 works
09:50:05 <brtknr> i.e. determining arch inside /write-heat-params.sh script
09:50:10 <strigazi> I can't test arm
09:51:07 <brtknr> strigazi: what is the fix for DNS k8s-podman-calico-k7tz6itxx7k6-node-0 on 10.254.0.10:53: no such host
09:51:31 <strigazi> left a comment
09:53:11 <brtknr> strigazi: ok thanks, that was not obvious to me :)
09:53:52 <strigazi> https://pbs.twimg.com/media/C9o9ZO0UwAAi3ib.jpg
09:54:09 <brtknr> lol
09:54:54 <strigazi> defaults usually assume DNS. I have more scenarios for the end of the world, all of them include DNS as well
09:55:27 <flwang1> move on?
09:55:32 <flwang1> we have only 5 mins
09:56:07 <jakeyip> hi all
09:56:38 <flwang1> brtknr: what's this 'Modify default-worker flavor after cluster creation'?
09:56:41 <flwang1> jakeyip: hi
09:58:09 <brtknr> flwang1: so we have a use case where we want to be able to change the default worker flavor after the cluster is already created
09:58:46 <brtknr> at the moment, we can delete a nodegroup and create a new nodegroup with a different flavor
09:58:53 <brtknr> but we cant do this for the default worker
09:59:24 <brtknr> i was wondering how involved a change it would be to allow update of the default-worker flavor
09:59:55 <strigazi> I think the best approach is to make the default worker deletable
10:00:05 <flwang1> brtknr: maybe we should allow creating an empty cluster
10:00:07 <flwang1> like EKS
10:00:22 <strigazi> changing the flavor sounds orthogonal to nova to me
10:00:45 <flwang1> i don't think resizing the instance is a good idea
10:00:51 <strigazi> +1
10:01:09 <brtknr> yes i'd prefer to make the default worker deletable
10:01:13 <strigazi> EKS logic gives little value
10:01:22 <brtknr> and create cluster with 0 workers
10:02:10 <flwang1> btw, i'd like to propose a patch to support distributing master nodes to different AZs, strigazi, do you like the idea?
10:02:22 <brtknr> flwang1: is that what you mean by empty cluster ?
10:02:33 <flwang1> brtknr: no, different idea
10:02:35 <brtknr> or does this also include 0 masters?
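On the ARM discussion above: the change under review determines the architecture inside /write-heat-params.sh, defaulting to x86_64 so existing clusters are unaffected. The snippet below is only a hedged Python illustration of the same idea, not the code in https://review.opendev.org/#/c/700565.

```python
# Hedged illustration of the ARCH idea, not the patch under review: map the
# host architecture to the suffix container images or templates would use,
# keeping the x86_64/amd64 default untouched.
import platform

ARCH_ALIASES = {
    "x86_64": "amd64",
    "aarch64": "arm64",
}


def container_arch(default="amd64"):
    """Return the architecture suffix for the current host, defaulting to amd64."""
    return ARCH_ALIASES.get(platform.machine(), default)


if __name__ == "__main__":
    print(container_arch())
```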
10:02:41 <jakeyip> flwang1: +1 we would like that for workers
10:02:48 <flwang1> it's like the Regional cluster in GKE
10:03:33 <flwang1> jakeyip: for workers, we already can support it with node groups, but i know where you come from
10:04:15 <strigazi> master across AZs could work, but it feels hacky
10:04:41 <strigazi> I mean you can deploy N NGs for masters by default
10:04:47 <strigazi> this sounds better
10:05:07 <flwang1> strigazi: what do you mean deploy N NGs for masters?
10:05:48 <strigazi> if you have 3 AZs in your cloud, deploy three master NGs
10:05:54 <strigazi> on creation
10:06:03 <strigazi> N=3 in this case
10:06:22 <brtknr> that sounds like a better approach
10:06:29 <flwang1> you mean create 3 NGs for master nodes?
10:06:34 <flwang1> how can i do that?
10:06:58 <strigazi> yes
10:07:04 <strigazi> one sec
10:10:08 <strigazi> still looking
10:10:26 <strigazi> I don't remember where it is
10:11:33 <brtknr> i didnt know you could create master N G
10:11:35 <brtknr> i didnt know you could create master NG
10:11:57 <flwang1> yep, i thought master can only be in the default-master NG
10:11:59 <strigazi> you can't atm, we have similar code though
10:12:00 <flwang1> that's why i asked
10:12:29 <flwang1> strigazi: can you please share the code with me when you find it?
10:12:35 <flwang1> i'm keen to learn that
10:13:35 <strigazi> found ir
10:13:36 <strigazi> found it
10:13:37 <strigazi> wait
10:14:19 <strigazi> https://github.com/openstack/magnum/blob/e52f77b299c50f004ee5a875c68d7f129b88a2af/magnum/conductor/handlers/cluster_conductor.py#L56
10:14:57 <strigazi> you can add the additional NGs there, needs some extra work for the LB to be shared across the stacks
10:15:26 <flwang1> hmm... what's the benefit of putting masters into different NGs?
10:15:46 <flwang1> anyway, thanks, i will think about this
10:15:47 <strigazi> to spawn across AZs?
10:15:57 <strigazi> that is the goal no?
10:16:26 <strigazi> Then you will have a NG with nodes in many AZs inside?
10:16:26 <flwang1> if there is an AZ list, we should be able to spawn masters into different AZs
10:16:50 <strigazi> So then all the nodegroup work we did is kind of worthless
10:17:12 <strigazi> as you want
10:17:24 <flwang1> strigazi: please don't say words like this
10:17:41 <flwang1> i'm not trying to make anyone's work worthless
10:18:02 <strigazi> The biggest use case of NGs was AZs
10:19:42 <flwang1> ok, then if you guys do have a good solution for this, please share, otherwise, what i am doing is trying to figure out a way to achieve that
10:20:15 <strigazi> well, I might be wrong. In GKE they spawn NGs across zones, if I read correctly
10:20:18 <strigazi> https://cloud.google.com/kubernetes-engine/docs/concepts/node-pools
10:20:24 <strigazi> https://cloud.google.com/kubernetes-engine/docs/concepts/node-pools#nodes_in_multi-zonal_clusters
10:22:09 <strigazi> Maybe the thing with the list works
10:22:29 <flwang1> https://cloud.google.com/kubernetes-engine/docs/concepts/types-of-clusters
10:22:38 <strigazi> It will require some serious hacking in the heat templates, maybe
10:22:39 <flwang1> let's use a spec to track this work
10:22:53 <flwang1> i will start to draft that
10:23:24 <flwang1> brtknr: strigazi: could you please help review this one Volume AZ regression fix - https://review.opendev.org/705592 ?
10:23:47 <flwang1> it's kind of a regression issue, i didn't fix the volume AZ issue at the first shot
10:23:56 <strigazi> let's wrap the meeting?
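To sketch the "one master nodegroup per AZ" idea from the discussion above: round-robin the requested master count over the cloud's availability zones, producing one would-be master nodegroup per zone. This is a conceptual illustration only; it does not use Magnum's actual nodegroup objects or the cluster_conductor.py code linked above.

```python
# Conceptual sketch only: distribute the requested number of masters over the
# availability zones, one planned master nodegroup per AZ. Not Magnum's real API.
def plan_master_nodegroups(azs, master_count):
    """Return (nodegroup_name, az, node_count) tuples for AZs that get masters."""
    plan = []
    for index, az in enumerate(azs):
        # Spread masters as evenly as possible; earlier AZs absorb the remainder.
        count = master_count // len(azs) + (1 if index < master_count % len(azs) else 0)
        if count:
            plan.append(("default-master-%s" % az, az, count))
    return plan


# 3 masters over 3 AZs -> one master per zone.
print(plan_master_nodegroups(["az-1", "az-2", "az-3"], 3))
```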
with train many bugs arrived
10:24:10 <flwang1> cinder doesn't respect "" as AZ
10:24:15 <flwang1> strigazi: sure
10:24:17 <strigazi> flwang1: I will have a look, I might have time only on monday though
10:24:23 <flwang1> #endmeeting