09:00:14 #startmeeting magnum
09:00:15 Meeting started Wed Oct 2 09:00:14 2019 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:19 The meeting name has been set to 'magnum'
09:00:29 #topic roll call
09:01:11 o/
09:01:17 o/
09:01:35 o/
09:01:41 o/
09:01:46 wow
09:01:55 hello folks, thanks for joining
09:02:05 #topic agenda
09:02:15 https://etherpad.openstack.org/p/magnum-weekly-meeting
09:02:46 let's go through the topic list one by one
09:02:55 anyone want to talk first?
09:02:58 I start?
09:03:06 strigazi: go for it
09:03:33 I completed the work for moving to 1.16 here: https://review.opendev.org/#/q/status:open+project:openstack/magnum+branch:master+topic:v1.16
09:03:43 minus the FCOS patch
09:03:54 we don't care about that for 1.16
09:04:28 the main reason for moving to podman is that I couldn't make the kubelet start with atomic install https://review.opendev.org/#/c/685749/
09:05:14 I can shrink the patch above ^^ and start only kubelet with podman
09:05:38 strigazi: yep, that's my question. should we merge them into one patch?
09:05:55 to make the review easier if we have to have all of them together for v1.16
09:06:10 the benefit is that we don't have to build atomic containers, just maintain the systemd units
09:06:41 for me the review is easier in many ways, but I won't review so I can squash
09:07:00 I'd prefer they are separate as it's easier to follow what each patch is doing
09:07:08 the podman one is so big
09:07:49 ok, then let's keep it as it is
09:08:01 i will start to review the podman patch first
09:08:07 the podman one is 7 systemd units basically
09:08:27 the podman patch is also a necessity for moving to coreos so good that we have it :)
09:08:58 I can help you review it
09:09:01 thanks for the work, that was a very quick turnaround! i was expecting it would take much longer
09:09:26 i'm happy to get the small patches merged quickly
09:09:44 actually, i have reviewed them without leaving comments
09:10:03 I can break the big one up per component if it helps
09:10:55 strigazi: no, that's OK
09:11:07 i just want to stay focused, all good
09:11:11 thanks for the great work
09:11:22 flwang1: brtknr jakeyip do you need any clarification on any of the patches?
09:11:49 strigazi: i do have some small questions
09:11:52 ttsiouts passes by my office when he has questions and vice versa :)
09:12:06 for https://review.opendev.org/#/c/685746/3 i saw you change from version 0.3 to 0.2, why?
09:12:08 strigazi: I think it's clear to me, I agree with flwang1 about merging small patches quickly
09:12:21 perhaps the rp filter one can go before the podman one
09:12:42 hmm I have some questions, e.g. for the etcd service why are kill and rm in ExecStartPre and not ExecStop?
09:12:54 flwang1: because upstream flannel said so https://github.com/coreos/flannel/pull/1174/files
09:13:33 strigazi: thanks
09:13:48 brtknr: +1 for rp filter
09:14:00 jakeyip: I can make it the same as the other ones
09:14:23 jakeyip: maybe rm is useful, e.g. when you want to change the image
09:14:57 I'll consolidate with the same practice everywhere
09:15:10 strigazi: so labelling via kubelet arg is no longer valid? --node-labels=node-role.kubernetes.io/master=""
09:15:18 nope
09:15:33 the kubelet does not accept node-role
09:16:15 since 1.16?
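For context, the podman approach discussed above replaces the atomic-install containers with plain systemd units that run each component (kubelet, etcd, apiserver, ...) via podman, with the kill/rm cleanup done in ExecStartPre. A minimal sketch of one such unit follows; the unit name, image, mounts and kubelet flags here are illustrative, not the exact content of the patch:

    [Unit]
    Description=Kubelet via podman
    After=network-online.target

    [Service]
    # "-" prefix: ignore failures, e.g. when no old container exists.
    # Cleaning up in ExecStartPre (rather than only ExecStop) guarantees the
    # old container is gone before every start, even after an unclean stop or
    # when the image tag has changed in the meantime.
    ExecStartPre=-/bin/podman kill kubelet
    ExecStartPre=-/bin/podman rm -f kubelet
    ExecStart=/bin/podman run --name kubelet \
        --privileged --network host --pid host \
        --volume /etc/kubernetes:/etc/kubernetes:ro \
        --volume /var/lib/kubelet:/var/lib/kubelet:rshared,z \
        docker.io/openstackmagnum/kubernetes-kubelet:v1.16.0 \
        kubelet --config /etc/kubernetes/kubelet-config.yaml
    ExecStop=-/bin/podman stop kubelet
    Restart=always

    [Install]
    WantedBy=multi-user.target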
09:16:15 strigazi: as for https://review.opendev.org/#/c/685748/3, the reference issue is still open, https://github.com/kubernetes/kubernetes/issues/75457 so why do we have to drop the current way?
09:16:38 brtknr: we're asking the same question, good
09:17:04 flwang1: yes since 1.16, the change is in the kubelet
09:17:16 strigazi: good, thanks for the clarification
09:17:47 strigazi: yes, understood. Why did you choose to put it in ExecStartPre instead of ExecStop? seems more reasonable to clean up in the stop stage. is this a standard from somewhere?
09:18:24 shall we move to the next topic? the fcos driver
09:18:25 jakeyip: I copied it from some redhat docs
09:18:52 for the podman patch, i prefer to leave comments on that one since it's bigger
09:20:35 jakeyip: i suppose it ensures that the cleanup always happens before the start
09:20:37 jakeyip: https://docs.fedoraproject.org/en-US/fedora-coreos/getting-started/
09:21:28 move to fcos?
09:21:34 yes
09:21:37 thanks, yes
09:22:03 I made the agent work after flwang1's excellent work in heat
09:22:30 \o/ nice work both!
09:22:32 strigazi: thanks, i will try to push the heat team to allow us to backport it to train
09:22:43 the only diff with the atomic driver is how the agent starts, and docker-storage-setup which is dropped.
09:23:15 strigazi: are we trying to get the fcos driver in train?
09:23:19 flwang1: it would be crazy if they say no
09:23:31 :)
09:23:35 flwang1: I think we must do it
09:23:45 yep, i think so
09:23:55 just wanna make sure we're all on the same page
09:24:38 I can finish it as a new driver today
09:25:07 strigazi: it would be great, if you can start it as a new driver, i will test it tomorrow
09:25:21 +1
09:25:44 strigazi: you're welcome to propose a patchset on mine or start with a new patch
09:26:02 if the heat team doesn't merge ignition support, what do we do?
09:26:16 cry?
09:26:18 patch ignition?
09:26:55 strigazi: i will blame them in the heat channel every day for the whole U release
09:27:05 brtknr: well at CERN we will cherry-pick to rocky even, but for other people it may not be an option
09:27:22 strigazi: we will cherry-pick as well
09:27:28 it's such a simple patch
09:27:46 i can't see any reason they can't accept it for train
09:27:49 cherry-pick in heat or magnum?
09:27:53 or both?
09:28:00 heat
09:28:02 brtknr: heat
09:28:27 we can try the upstream route in parallel too
09:29:00 what does this mean ^^
09:29:15 but it may be time for an upgrade before a release is cut :)
09:29:39 cherry-pick upstream back to rocky
09:30:01 brtknr: i'm a bit confused
09:30:14 we're talking about the ignition patch in heat
09:30:31 yes, i'm proposing cherry-picking it back to rocky upstream
09:30:48 is that clear?
09:30:56 hmm... it doesn't really matter for us, TBH
09:31:06 although it's not too relevant for us as we are planning to upgrade to stein soon
09:31:13 Q, R, S could have it
09:31:28 we care about the cherry-pick to train, because otherwise our fcos driver won't work in Train
09:31:39 train sounds enough for me
09:31:58 i will follow up on that, don't worry, guys
09:32:22 strigazi: so action on you: a new patch(set) for the fcos driver
09:32:31 +1
09:32:37 thanks
09:34:12 next topic?
09:34:54 sure
09:36:04 brtknr: do you want to talk about yours?
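Since the 1.16 kubelet rejects node-role.kubernetes.io/* labels passed via --node-labels, the labelling patch above applies the role label after the node registers instead. Roughly, with the kubeconfig path and the retry loop only illustrating the idea, not quoting the patch:

    # wait until the master node has registered, then apply the role label
    until kubectl --kubeconfig /etc/kubernetes/admin.conf \
          label node "$(hostname -s)" node-role.kubernetes.io/master="" --overwrite
    do
        echo "node not registered yet, retrying"
        sleep 5
    done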
09:36:28 sure
09:36:33 about the compatibility matrix
09:36:43 I'd like feedback on the accuracy of the information
09:37:03 i am confident about stein and train but not so sure about compatibility of kube tags before then
09:37:15 let's see what we will manage to merge :)
09:37:34 i tried to sift through the commit logs and derive an answer but only got so far
09:37:40 for stein ++
09:37:49 when is the deadline?
09:38:11 before the auto-release is cut?
09:38:29 two weeks? but we can do kind of late releases
09:38:38 next week
09:38:46 https://releases.openstack.org/train/schedule.html
09:38:52 will it be rc2 or final?
09:39:16 both work
09:40:03 I don't think we need pike and queens in the compat matrix, pike is unmaintained and queens is supported till 2019-10-25 which is soon https://releases.openstack.org/
09:40:40 jakeyip: +1
09:40:54 just stein and rocky are good enough
09:40:54 jakeyip: fine, i'll remove those
09:42:59 I can help test R, see how it goes with 1.15.x
09:43:31 Bharat Kunwar proposed openstack/magnum master: Add compatibility matrix for kube_tag https://review.opendev.org/685675
09:43:46 please check this is accurate ^
09:45:00 this does need testing to confirm
09:45:05 let's move to the next topic?
09:46:16 +1
09:47:19 i would like to discuss the rolling upgrade
09:47:52 so far the k8s version rolling upgrade runs very well on our cloud, we haven't seen issues with it
09:48:20 but magnum does need to support the os upgrade as well
09:48:52 with the current limitation, i'm proposing this solution https://review.opendev.org/#/c/686033/
09:49:04 using the ostree command to do the upgrade
09:49:36 before we fix the master resizing issue, this seems to be the only working solution
09:49:47 strigazi: brtknr: jakeyip: thoughts?
09:49:52 +1
09:50:56 flwang1: is that the right link?
09:50:58 strigazi: i have fixed the issue you found
09:51:13 sorry, https://review.opendev.org/#/c/669593/
09:51:15 will test today
09:51:43 strigazi: and it could work for the fcos driver with minor changes in the future
09:52:00 since fcos uses ostree as well
09:52:33 maybe yes
09:52:38 we will need to find an alternative way to do upgrades if we are dropping ostree
09:53:13 with nodegroups now we can add a new ng and drop the old one
09:53:31 strigazi: but we still need to fix the master
09:53:34 node is easy
09:53:41 this will replace the VMs and you get a 100% guarantee that the nodes will work
09:53:44 I dunno if I like the idea of in-place upgrade.
09:53:58 and the most important thing is, how to get 0 downtime
09:54:05 changing a kernel is always a risk
09:54:18 jakeyip: we don't have a good choice at this moment
09:54:26 0 downtime of what?
09:54:33 your k8s service
09:54:38 i think he means the cluster?
09:54:41 sorry, your k8s cluster
09:55:00 a node down doesn't mean cluster down
09:55:01 is that assuming multimaster?
09:55:15 strigazi: ok, i mean the services running on the cluster
09:55:21 my bad brain
09:55:27 is there a constraint of "only allow upgrades when it's multimaster"?
09:55:32 I'd rather we blow away old VMs and build new ones. This is why we are doing k8s instead of docker, ya?
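For reference, on ostree-based hosts (Fedora Atomic today, Fedora CoreOS later) an in-place OS upgrade of the kind proposed above boils down to commands along these lines; the exact invocation used by the upgrade script in the patch may differ:

    # show the currently booted deployment
    rpm-ostree status

    # download and stage the new tree (on Fedora Atomic, `atomic host upgrade`
    # is an equivalent entry point)
    rpm-ostree upgrade

    # the staged deployment only becomes active after a reboot
    systemctl reboot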
09:55:50 +1 ^^
09:55:53 is the in-place upgrade only for the master?
09:56:01 guys, we need to be on the same page about the 0 downtime
09:56:06 that is what I meant with using nodegroups
09:56:24 let me find some docs for you guys
09:56:33 trust me, i spent quite a lot of time on this
09:57:01 https://blog.gruntwork.io/zero-downtime-server-updates-for-your-kubernetes-cluster-902009df5b33
09:57:21 there are 4 blogs
09:57:33 there is nothing in this world with 0 downtime, there are services with good SLAs
09:57:52 strigazi: fine, you can definitely argue that
09:58:01 I am a little confused as fedora atomic is EOL so there won't be anything to upgrade to
09:58:20 ^ +1 :P
09:58:38 brtknr: that's not true
09:59:01 for example, our customers are using fa29-20190801
09:59:20 and then a big security issue happens today, 20191002
09:59:27 how can you upgrade that?
10:00:21 the upgrade is not just for fa28-fa29, it also supports fa29-0001 to fa29-0002
10:00:29 flwang1: when you mention node is easy, it's the master we need to care about - are we just solving this problem for clusters without multi master?
10:01:13 works for single master too with some api downtime, not service
10:01:30 jakeyip: no, when i say master is not easy, i mean currently magnum doesn't support resizing the master, which means we can't just add a new master and remove an old one
10:01:53 maybe 0 downtime is not a good word
10:02:15 but the target is to get minimum downtime for the services running on the cluster
10:02:40 to get somewhere
10:02:50 when we say minimum, that means, at least, we should be able to do a drain before doing the upgrade
10:02:54 for train we will have in-place upgrade for OS and k8s
10:03:08 strigazi: yes
10:03:18 i don't like the in-place either
10:03:20 for U we will have node-replacement
10:03:45 for node-replacement, we still need a way to drain the node before deleting it
10:03:45 Merged openstack/magnum master: Set cniVersion for flannel https://review.opendev.org/685746
10:03:52 for T, users will be able to do node-replacement by adding a new NG and migrating the workload
10:04:22 isn't this good enough for train ^^
10:04:29 users will have maximum control
10:04:46 that's ok for train
10:04:58 i mean it's good for train
10:05:04 +1
10:05:25 for U we have two options
10:05:25 for U, i will still insist on having a drain before removing a node
10:06:22 we can do it with the agent with the DELETE SD
10:06:29 +1 to drain before delete
10:06:51 or we can have a controller that does this in the cluster
10:07:26 or the magnum-controller
10:07:30 so three options
10:07:33 Merged openstack/magnum master: Add hostname-override to kube-proxy https://review.opendev.org/685747
10:07:34 strigazi: we already discussed the in-cluster 'controller' at catalyst
10:08:01 strigazi: we just need to extend the magnum-auto-healer
10:08:11 to magnum-controller
10:08:50 that's my goal for U :)
10:09:06 I think the magnum-auto-healer needs to come closer to us. the cloud-provider repo is a horrible place to host it
10:09:20 strigazi: i agree
10:09:25 -2 to have it in that repo
10:09:29 we need a better place
10:09:34 :D
10:09:52 we don't have a good golang based code repo in openstack
10:10:18 we can try to push the CPO team to split the repo
10:10:31 anything else team?
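The "drain before delete" step everyone is +1 on above is the standard kubectl flow; a sketch with an illustrative node name:

    # cordon the node and evict its pods before the VM is removed
    kubectl drain mycluster-minion-0 --ignore-daemonsets --delete-local-data --timeout=300s

    # once it is empty, drop it from the cluster and let magnum/heat delete the VM
    kubectl delete node mycluster-minion-0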
10:10:34 i need to go
10:10:40 that's all
10:11:15 thank you very much
10:11:18 see you
10:11:23 good discussion
10:11:25 strigazi: thank you
10:11:38 strigazi: i look forward to your fcos driver
10:11:49 same
10:11:50 sure
10:11:56 this afternoon
10:12:01 brtknr: can i get your blessing on https://review.opendev.org/#/c/675511/ ?
10:12:43 brtknr: i'd like to get it in train
10:13:03 flwang1: np, I'll quickly test it first
10:13:13 brtknr: thank you very much
10:13:26 see you, team
10:14:16 bye
10:14:55 Merged openstack/magnum master: k8s_fedora: Label master nodes with kubectl https://review.opendev.org/685748
10:15:56 flwang1: brtknr https://review.opendev.org/#/c/686028/1
10:16:19 strigazi: what about it?
10:16:29 merge? xD
10:16:36 i left a comment
10:16:38 also end meeting?
10:16:57 it needs to come before podman, no?
10:16:57 Spyros Trigazis proposed openstack/magnum master: k8s_fedora: Move rp_filter=1 for calico up https://review.opendev.org/686028
10:17:50 brtknr: it is before now
10:17:51 i've +2ed
10:18:03 flwang1: ?
10:18:16 over to you flwang1 if you're still there
10:18:27 he also needs to end the meeting
10:18:33 I'll try
10:18:38 #endmeeting
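As background on the rp_filter patch referenced above: calico expects strict reverse-path filtering rather than the 'loose' mode some images default to, so the driver sets the sysctl before calico starts; the patch only moves that fragment earlier in the ordering. A sketch of the setting, with an illustrative file name:

    # enforce strict reverse-path filtering for calico
    cat > /etc/sysctl.d/99-rp-filter.conf <<EOF
    net.ipv4.conf.all.rp_filter = 1
    EOF
    sysctl -p /etc/sysctl.d/99-rp-filter.conf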