09:01:39 #startmeeting magnum
09:01:40 Meeting started Wed Feb 5 09:01:39 2020 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:43 The meeting name has been set to 'magnum'
09:01:51 #topic roll call
09:01:57 o/
09:01:58 o/
09:02:01 brtknr: ?
09:02:32 strigazi: so far i think it's just you and me
09:03:38 strigazi: is there anything you want to discuss?
09:03:42 flwang1: hi
09:03:44 o/
09:04:07 was scoffing down breakfast, apologies
09:04:16 brtknr: all good
09:04:35 brtknr: strigazi: do you guys have a special topic to discuss?
09:04:46 otherwise we go through the agenda?
09:04:55 flwang1: brtknr: the new heat agent logs are worse
09:05:15 elaborate
09:05:42 flwang1: brtknr: the output of a command must be printed when it is executed
09:05:49 +1
09:06:06 flwang1: brtknr: atm everything is printed at the end
09:06:10 there are stretches of time where nothing gets printed
09:06:19 would be nice to fix that
09:06:22 flwang1: brtknr: can't catch things that are stuck
09:06:26 strigazi: yep, i noticed that as well
09:06:42 flwang1: brtknr: when did this happen?
09:06:49 i'd like to see the logs being printed for each execution
09:07:08 i think since the last time brtknr fixed it?
09:07:10 it's been like this since we moved everything from cloud-init-output to journalctl
09:07:25 when i fixed it, nothing was getting printed at all
09:07:53 "nothing was getting printed at all" when was this true?
09:07:58 when i fixed it, everything was getting printed on a single line
09:08:08 ok
09:08:34 i simply changed it so that newlines were parsed correctly
09:08:38 I don't remember that. Anything we can do?
09:09:11 nothing special, just review the code and see how we can fix it
09:09:36 is it our code or os-collect-config?
09:10:15 probably os-collect-config, i need to check the code, can't remember the details
09:13:46 ok who is going to look into this?
09:14:38 i just created a story to track this https://storyboard.openstack.org/#!/story/2007256
09:14:49 strigazi: can you work on this?
09:15:18 flwang1: I can have a look
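
(For context, a minimal sketch of the behaviour being asked for here: relay each line of a hook script's output as it is produced, rather than buffering everything until the process exits. This is not os-collect-config's code, and the script path is hypothetical.)

```python
# Minimal sketch only, assuming a wrapper around a deploy/hook script:
# stream its output line by line so journalctl shows progress in real time,
# instead of printing everything only after the command finishes.
import subprocess

proc = subprocess.Popen(
    ["/bin/sh", "/path/to/deploy-script.sh"],  # hypothetical script path
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)

for line in proc.stdout:
    print(line, end="", flush=True)  # emit each line as soon as it arrives

proc.wait()
```
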
09:16:18 next topic?
09:17:05 where is the etherpad?
09:17:19 https://etherpad.openstack.org/p/magnum-weekly-meeting
09:17:24 it is not in the channel anymore
09:17:49 thx
09:20:19 shall we go through the agenda?
09:20:28 yes
09:20:49 are you guys happy to provide feedback for cinder_csi_enabled?
09:21:03 is it ready?
09:21:07 to support out-of-tree cinder?
09:21:25 yep but i don't understand what the pep8 failure is all about
09:21:51 brtknr: so you have already tested it and it works?
09:21:55 i'm happy to test it
09:22:07 yep i've tested it and it works
09:22:17 great
09:22:23 works for xinliang too
09:23:09 i will help take a look at the pep8 and docs job failures
09:23:26 brtknr: I will have a look, at CERN we have two more CSI drivers. Need to check that this is compatible.
09:25:23 there are some generic csi components, only one of them is cinder_csi related
09:25:27 strigazi: if cern can give some comments on this csi patch, it would be great
09:25:30 brtknr: don't you need any of: --feature-gates=CSINodeInfo=true,CSIDriverRegistry=true ?
09:25:45 brtknr: or --runtime-config=storage.k8s.io/v1alpha1=true ?
09:26:03 didn't need those
09:26:27 seemed to work with v1.16.x and 1.17.x
09:26:46 interesting
09:27:19 ok, I will test
09:27:21 CSIDriverRegistry went to beta in 1.14
09:27:35 same for CSINodeInfo
09:27:43 so true by default
09:27:48 brtknr: means it's on by default?
09:27:49 ok
09:28:09 The world will end based on a default
09:28:21 XD
09:28:37 I'll quote you on that when the world ends
09:29:07 100% sure
09:29:33 someone will change the nuke_everything default to true in QA, and that's it
09:30:11 anyway, let's stay on track, I will test it
09:30:36 brtknr: would you like to do the manila provisioner too?
09:31:16 I'm good, let's move on?
09:32:38 ok
09:32:56 strigazi: i tried the manila provisioner but lost my patience with it
09:32:57 brtknr: what's the nginx issue?
09:35:04 with regards to nginx and traefik, at the moment, we need to label an ingress node. we wanted to explore adding another option which runs a deployment with n replicas behind a load balancer
09:36:25 brtknr: what do you propose?
09:37:10 effectively a label which changes the nginx and traefik services from ClusterIP to LoadBalancer type
09:37:17 fairly small change
09:37:32 and the DS to a Deployment, and hostNetwork: false
09:37:35 it can be ClusterIP by default
09:37:47 and remove the node-selector
09:39:02 possibly those too :S need to look into it more closely
09:40:03 as mentioned before, just don't break the default :)
09:41:06 ok i'm glad you're okay to support the alternative, i'll refine what it would involve a bit more in a PS
09:41:43 for prometheus, ping dioguerra, I think it works but double check with him
09:42:08 re prometheus, what's the status of removing heapster and adding metrics-server?
09:42:13 i take back the prometheus issue
09:42:16 it is done
09:42:25 i can see the metrics-server pod is broken on my local environment
09:42:44 metrics server doesn't work for me either at 9.2.0
09:43:00 strigazi: you mean the heapster->metrics-server work is done, right?
09:43:04 lemme grab the log
09:43:05 yes
09:43:41 metrics server works for me. Only the logs of the master node are not collected due to the cert
09:44:09 strigazi: ok, i will test again
09:44:15 what's the k8s version you're using?
09:44:48 17.2
09:45:26 fairly new :)
09:45:32 ok, i will give it a try
09:45:36 let's move on?
09:46:17 yes
09:46:22 https://review.opendev.org/#/c/700565 the work for ARM
09:46:35 personally i'm ok with that
09:46:45 i created a story for the metrics-server issue: https://storyboard.openstack.org/#!/story/2007264
09:47:20 do you need a special label for metrics server?
09:47:29 flwang1: have you tested https://storyboard.openstack.org/#!/story/2007264
09:47:38 flwang1: have you tested https://review.opendev.org/#/c/700565
09:47:56 https://storyboard.openstack.org/#!/story/2007264 this is the heat-agent one
09:48:36 i created a story for the metrics-server issue: https://storyboard.openstack.org/#!/story/2007265
09:48:39 sorry
09:48:40 no i haven't, i just had a quick look at the code
09:49:20 the ARCH change looks trivial, i wanted to get your thoughts on the general approach
09:49:24 brtknr: DNS k8s-podman-calico-k7tz6itxx7k6-node-0 on 10.254.0.10:53: no such host
09:49:34 brtknr: will do
09:49:58 for ARCH I will check that the default x86_64 works
09:50:05 i.e. determining the arch inside the /write-heat-params.sh script
09:50:10 I can't test arm
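
(The ARM patch under review does its detection inside write-heat-params.sh; as a rough illustration of the general approach only, here is the same idea in Python. The amd64/arm64 mapping is an assumption for the sketch, not a description of the actual change.)

```python
# Rough sketch of the idea only (the real change lives in the heat
# templates / write-heat-params.sh): detect the host architecture and
# fall back to the current default, x86_64/amd64, when it is unknown.
import platform

ARCH_MAP = {
    "x86_64": "amd64",
    "aarch64": "arm64",
}

def image_arch(default="amd64"):
    """Return the container image architecture tag for this host."""
    return ARCH_MAP.get(platform.machine(), default)

print(image_arch())  # "amd64" on x86_64 hosts, "arm64" on aarch64 hosts
```
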
09:51:07 strigazi: what is the fix for "DNS k8s-podman-calico-k7tz6itxx7k6-node-0 on 10.254.0.10:53: no such host"?
09:51:31 left a comment
09:53:11 strigazi: ok thanks, that was not obvious to me :)
09:53:52 https://pbs.twimg.com/media/C9o9ZO0UwAAi3ib.jpg
09:54:09 lol
09:54:54 defaults usually assume DNS. I have more scenarios for the end of the world, all of them include DNS as well
09:55:27 move on?
09:55:32 we have only 5 mins
09:56:07 hi all
09:56:38 brtknr: what's this 'Modify default-worker flavor after cluster creation'?
09:56:41 jakeyip: hi
09:58:09 flwang1: so we have a use case where we want to be able to change the default worker flavor after the cluster is already created
09:58:46 at the moment, we can delete a nodegroup and create a new nodegroup with a different flavor
09:58:53 but we can't do this for the default worker
09:59:24 i was wondering how involved a change it would be to allow updating the default-worker flavor
09:59:55 I think the best approach is to make the default worker deletable
10:00:05 brtknr: maybe we should allow creating an empty cluster
10:00:07 like EKS
10:00:22 changing the flavor sounds orthogonal to nova to me
10:00:45 i don't think resizing the instance is a good idea
10:00:51 +1
10:01:09 yes i'd prefer to make the default worker deletable
10:01:13 EKS logic gives little value
10:01:22 and create clusters with 0 workers
10:02:10 btw, i'd like to propose a patch to support distributing master nodes across different AZs, strigazi, do you like the idea?
10:02:22 flwang1: is that what you mean by empty cluster?
10:02:33 brtknr: no, different idea
10:02:35 or does this also include 0 masters?
10:02:41 flwang1: +1 we would like that for workers
10:02:48 it's like the regional cluster in GKE
10:03:33 jakeyip: for workers, we can already support it with node groups, but i know where you're coming from
10:04:15 masters across AZs could work, but it feels hacky
10:04:41 I mean you can deploy N NGs for masters by default
10:04:47 this sounds better
10:05:07 strigazi: what do you mean, deploy N NGs for masters?
10:05:48 if you have 3 AZs in your cloud, deploy three master NGs
10:05:54 on creation
10:06:03 N=3 in this case
10:06:22 that sounds like a better approach
10:06:29 you mean create 3 NGs for master nodes?
10:06:34 how can i do that?
10:06:58 yes
10:07:04 one sec
10:10:08 still looking
10:10:26 I don't remember where it is
10:11:33 i didn't know you could create a master NG
10:11:57 yep, i thought masters could only be in the default-master NG
10:11:59 you can't atm, we have similar code though
10:12:00 that's why i asked
10:12:29 strigazi: can you please share the code with me when you find it?
10:12:35 i'm keen to learn that
10:13:36 found it
10:13:37 wait
10:14:19 https://github.com/openstack/magnum/blob/e52f77b299c50f004ee5a875c68d7f129b88a2af/magnum/conductor/handlers/cluster_conductor.py#L56
10:14:57 you can add the additional NGs there, it needs some extra work for the LB to be shared across the stacks
10:15:26 hmm... what's the benefit of putting masters into different NGs?
10:15:46 anyway, thanks, i will think about this
10:15:47 to spawn across AZs?
10:15:57 that is the goal, no?
10:16:26 Then you will have an NG with nodes in many AZs inside?
10:16:26 if there is an AZ list, we should be able to spawn masters into different AZs
10:16:50 So then all the nodegroup work we did is kind of worthless
10:17:12 as you want
10:17:24 strigazi: please don't say things like that
10:17:41 i'm not trying to make anyone's work worthless
10:18:02 The biggest use case of NGs was AZs
10:19:42 ok, then if you guys do have a good solution for this, please share; otherwise, what i am doing is trying to figure out a way to achieve that
10:20:15 well, I might be wrong. In GKE they spawn node pools across zones, if I read correctly
10:20:18 https://cloud.google.com/kubernetes-engine/docs/concepts/node-pools
10:20:24 https://cloud.google.com/kubernetes-engine/docs/concepts/node-pools#nodes_in_multi-zonal_clusters
10:22:09 Maybe the thing with the list works
10:22:29 https://cloud.google.com/kubernetes-engine/docs/concepts/types-of-clusters
10:22:38 It will require some serious hacking in the heat templates, maybe
10:22:39 let's use a spec to track this work
10:22:53 i will start to draft that
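
(A rough sketch of the idea strigazi describes above, not Magnum's actual NodeGroup API or the code in cluster_conductor.py: build one master nodegroup per availability zone at cluster creation time instead of stretching a single nodegroup across AZs. All names below are illustrative, and the LB sharing across heat stacks is not modelled.)

```python
# Illustrative sketch only; Magnum's real NodeGroup objects are assumed,
# simplified here with a dataclass so the example is self-contained.
from dataclasses import dataclass

@dataclass
class MasterNodeGroup:  # simplified stand-in for magnum's NodeGroup
    name: str
    role: str
    node_count: int
    availability_zone: str

def master_nodegroups_for(azs, masters_per_az=1):
    """Return one master nodegroup per AZ, e.g. N=3 for a cloud with 3 AZs."""
    return [
        MasterNodeGroup(
            name="default-master-%s" % az,
            role="master",
            node_count=masters_per_az,
            availability_zone=az,
        )
        for az in azs
    ]

print(master_nodegroups_for(["az-1", "az-2", "az-3"]))
```
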
10:23:24 brtknr: strigazi: could you please help review this one: Volume AZ regression fix - https://review.opendev.org/705592 ?
10:23:47 it's kind of a regression issue, i didn't fix the volume AZ issue on the first shot
10:23:56 let's wrap the meeting? with Train many bugs arrived
10:24:10 cinder doesn't respect "" as AZ
10:24:15 strigazi: sure
10:24:17 flwang1: I will have a look, I might have time only on monday though
10:24:23 #endmeeting
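
(On the volume AZ regression discussed just before the close: the symptom is that Cinder does not accept an empty string as an availability zone. A minimal sketch of the kind of fix this implies, which is an assumption for illustration and not the content of https://review.opendev.org/705592.)

```python
# Assumption about the shape of the fix, for illustration only: omit the
# availability zone entirely when no AZ is configured, instead of passing "".
def volume_properties(size_gb, availability_zone=""):
    props = {"size": size_gb}
    if availability_zone:  # only set the AZ when one is explicitly given
        props["availability_zone"] = availability_zone
    return props

print(volume_properties(10))          # {'size': 10}
print(volume_properties(10, "nova"))  # includes 'availability_zone': 'nova'
```
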