09:00:14 <flwang1> #startmeeting magnum
09:00:15 <openstack> Meeting started Wed Oct 2 09:00:14 2019 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:19 <openstack> The meeting name has been set to 'magnum'
09:00:29 <flwang1> #topic roll call
09:01:11 <brtknr> o/
09:01:17 <ttsiouts> o/
09:01:35 <strigazi> o/
09:01:41 <jakeyip> o/
09:01:46 <flwang1> wow
09:01:55 <flwang1> hello folks, thanks for joining
09:02:05 <flwang1> #topic agenda
09:02:15 <flwang1> https://etherpad.openstack.org/p/magnum-weekly-meeting
09:02:46 <flwang1> let's go through the topic list one by one
09:02:55 <flwang1> anyone you guys want to talk first?
09:02:58 <strigazi> I start?
09:03:06 <flwang1> strigazi: go for it
09:03:33 <strigazi> I completed the work for moving to 1.16 here: https://review.opendev.org/#/q/status:open+project:openstack/magnum+branch:master+topic:v1.16
09:03:43 <strigazi> minus the FCOS patch
09:03:54 <strigazi> we don't care about that for 1.16
09:04:28 <strigazi> the main reason for moving to podman is that I couldn't make the kubelet start with atomic install https://review.opendev.org/#/c/685749/
09:05:14 <strigazi> I can shrink the patch above ^^ and start only kubelet with podman
09:05:38 <flwang1> strigazi: yep, that's my question. should we merge them into one patch?
09:05:55 <flwang1> to make the review easier, if we have to have all of them together for v1.16
09:06:10 <strigazi> the benefit is that we don't have to build atomic containers, just maintain the systemd units
09:06:41 <strigazi> for me the review is easier in many ways, but I won't be the one reviewing, so I can squash
09:07:00 <brtknr> I'd prefer they stay separate as it's easier to follow what each patch is doing
09:07:08 <flwang1> the podman one is so big
09:07:49 <flwang1> ok, then let's keep it as it is
09:08:01 <flwang1> i will start to review the podman patch first
09:08:07 <strigazi> the podman one is 7 systemd units basically
09:08:27 <brtknr> the podman patch is also a necessity for moving to coreos, so good that we have it :)
09:08:58 <strigazi> I can help you review it
09:09:01 <brtknr> thanks for the work, that was a very quick turnaround! i was expecting it would take much longer
09:09:26 <flwang1> i'm happy to get the small patches merged quickly
09:09:44 <flwang1> actually, i have reviewed them without leaving comments
09:10:03 <strigazi> I can break the big one up per component if it helps
09:10:55 <flwang1> strigazi: no, that's OK
09:11:07 <flwang1> i just want to stay focused, all good
09:11:11 <flwang1> thanks for the great work
09:11:22 <strigazi> flwang1: brtknr jakeyip do you need any clarification on any of the patches?
09:11:49 <flwang1> strigazi: i do have some small questions
09:11:52 <strigazi> ttsiouts: passes by my office when he has questions, and vice versa :)
09:12:06 <flwang1> for https://review.opendev.org/#/c/685746/3 i saw you change from version 0.3 to 0.2, why?
09:12:08 <brtknr> strigazi: I think it's clear to me, I agree with flwang1 about merging small patches quickly
09:12:21 <brtknr> perhaps the rp filter one can go before the podman one
09:12:42 <jakeyip> hmm I have some questions, e.g. for the etcd service, why are kill and rm in ExecStartPre and not ExecStop?
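For context on the podman approach discussed above: the idea is to replace atomic-install containers with plain systemd units that run each component via podman. Below is a minimal sketch of what one such unit could look like, written as a bash fragment in the style of Magnum's driver scripts. It is illustrative only, not the content of https://review.opendev.org/#/c/685749/; the image name, KUBE_TAG value, mounts and kubelet flags are assumptions. It also shows the kill/rm-in-ExecStartPre pattern that is being asked about.

    #!/bin/bash
    # Sketch of a kubelet-via-podman systemd unit; the real patch defines
    # roughly 7 such units. All values below are illustrative.
    KUBE_TAG="v1.16.0"

    cat > /etc/systemd/system/kubelet.service <<EOF
    [Unit]
    Description=kubelet via podman
    After=network-online.target
    Wants=network-online.target

    [Service]
    # Clean up any leftover container before starting (the ExecStartPre
    # kill/rm pattern rather than doing it in ExecStop). The leading '-'
    # tells systemd to ignore failures, e.g. when no container exists yet.
    ExecStartPre=-/usr/bin/podman kill kubelet
    ExecStartPre=-/usr/bin/podman rm kubelet
    ExecStart=/usr/bin/podman run --name kubelet \\
        --privileged --net host --pid host \\
        --volume /etc/kubernetes:/etc/kubernetes:ro \\
        --volume /var/lib/kubelet:/var/lib/kubelet:rshared \\
        k8s.gcr.io/hyperkube:${KUBE_TAG} \\
        kubelet --config=/etc/kubernetes/kubelet-config.yaml
    ExecStop=/usr/bin/podman stop kubelet
    Restart=always

    [Install]
    WantedBy=multi-user.target
    EOF

    systemctl daemon-reload
    systemctl enable --now kubelet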
09:12:54 <strigazi> flwang1: because upstream flannel said so https://github.com/coreos/flannel/pull/1174/files
09:13:33 <flwang1> strigazi: thanks
09:13:48 <strigazi> brtknr: +1 for rp filter
09:14:00 <strigazi> jakeyip: I can make it the same as the other ones
09:14:23 <strigazi> jakeyip: maybe rm is useful, eg when you want to change image
09:14:57 <strigazi> I'll consolidate with the same practice everywhere
09:15:10 <brtknr> strigazi: so labelling via kubelet arg is no longer valid? --node-labels=node-role.kubernetes.io/master=""
09:15:18 <strigazi> nope
09:15:33 <strigazi> the kubelet does not accept node-role
09:16:15 <brtknr> since 1.16?
09:16:15 <flwang1> strigazi: as for https://review.opendev.org/#/c/685748/3 the referenced issue is still open, https://github.com/kubernetes/kubernetes/issues/75457 so why do we have to drop the current way?
09:16:38 <flwang1> brtknr: we're asking the same question, good
09:17:04 <strigazi> flwang1: yes, since 1.16 the change is in the kubelet
09:17:16 <flwang1> strigazi: good, thanks for the clarification
09:17:47 <jakeyip> strigazi: yes, understood. Why did you choose to put it in ExecStartPre instead of ExecStop? seems more reasonable to clean up in the stop stage. is this a standard from somewhere?
09:18:24 <flwang1> shall we move to the next topic? the fcos driver
09:18:25 <strigazi> jakeyip: I copied it from some redhat docs
09:18:52 <flwang1> for the podman patch, i prefer to leave comments on that one since it's bigger
09:20:35 <brtknr> jakeyip: i suppose it ensures that the cleanup always happens before the start
09:20:37 <strigazi> jakeyip: https://docs.fedoraproject.org/en-US/fedora-coreos/getting-started/
09:21:28 <strigazi> move to fcos?
09:21:34 <brtknr> yes
09:21:37 <jakeyip> thanks, yes
09:22:03 <strigazi> I made the agent work after flwang1's excellent work in heat
09:22:30 <brtknr> \o/ nice work both!
09:22:32 <flwang1> strigazi: thanks, i will try to push the heat team to allow us to backport it to train
09:22:43 <strigazi> the only diff with the atomic driver is how the agent starts, and docker-storage-setup, which is dropped
09:23:15 <flwang1> strigazi: are we trying to get the fcos driver in train?
09:23:19 <strigazi> flwang1: it would be crazy if they say no
09:23:31 <flwang1> :)
09:23:35 <strigazi> flwang1: I think we must do it
09:23:45 <flwang1> yep, i think so
09:23:55 <flwang1> just wanna make sure we're all on the same page
09:24:38 <strigazi> I can finish it as a new driver today
09:25:07 <flwang1> strigazi: it would be great if you can start it as a new driver, i will test it tomorrow
09:25:21 <brtknr> +1
09:25:44 <flwang1> strigazi: you're welcome to propose a patchset on my one or start with a new patch
09:26:02 <strigazi> if the heat team doesn't merge ignition support, what do we do?
09:26:16 <brtknr> cry?
09:26:18 <strigazi> patch ignition?
09:26:55 <flwang1> strigazi: i will blame them in the heat channel every day for the whole U release
09:27:05 <strigazi> brtknr: well, at CERN we will cherry-pick to rocky even, but for other people that may not be an option
09:27:22 <flwang1> strigazi: we will cherry-pick as well
09:27:28 <flwang1> it's such a simple patch
09:27:46 <flwang1> i can't see any reason they can't accept it for train
09:27:49 <brtknr> cherry-pick in heat or magnum?
09:27:53 <brtknr> or both?
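On the labelling change above: since Kubernetes 1.16 the kubelet restricts which labels a node may set on itself, and the node-role.kubernetes.io/* prefix is no longer accepted via --node-labels, so the master role label has to be applied from the API side instead. A minimal sketch of the idea behind https://review.opendev.org/#/c/685748/ follows; the kubeconfig path and node name derivation are assumptions, not necessarily what the patch does.

    #!/bin/bash
    # Apply the master role label with admin credentials instead of relying
    # on kubelet self-labelling (rejected since 1.16). Paths are examples.
    export KUBECONFIG=/etc/kubernetes/admin.conf
    NODE_NAME=$(hostname -s)

    # --overwrite makes the call safe to repeat on an already-labelled node.
    kubectl label node "${NODE_NAME}" \
        node-role.kubernetes.io/master="" --overwrite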
09:28:00 <strigazi> heat
09:28:02 <flwang1> brtknr: heat
09:28:27 <brtknr> we can try the upstream route in parallel too
09:29:00 <strigazi> what does this mean ^^
09:29:15 <brtknr> but it may be time to upgrade before a release is cut :)
09:29:39 <brtknr> cherry-pick upstream back to rocky
09:30:01 <flwang1> brtknr: i'm a bit confused
09:30:14 <flwang1> we're talking about the ignition patch in heat
09:30:31 <brtknr> yes, i'm proposing cherry-picking back to rocky upstream
09:30:48 <brtknr> is that clear?
09:30:56 <flwang1> hmm... it doesn't really matter for us, TBH
09:31:06 <brtknr> although it's not too relevant for us as we are planning to upgrade to stein soon
09:31:13 <strigazi> Q,R,S could have it
09:31:28 <flwang1> we care about the cherry-pick to train, because otherwise our fcos driver won't work in Train
09:31:39 <strigazi> train sounds enough for me
09:31:58 <flwang1> i will follow up on that, don't worry, guys
09:32:22 <flwang1> strigazi: so action on you: a new patch(set) for the fcos driver
09:32:31 <strigazi> +1
09:32:37 <flwang1> thanks
09:34:12 <flwang1> next topic?
09:34:54 <strigazi> sure
09:36:04 <flwang1> brtknr: do you want to talk about yours?
09:36:28 <brtknr> sure
09:36:33 <brtknr> about the compatibility matrix
09:36:43 <brtknr> I'd like feedback on the accuracy of the information
09:37:03 <brtknr> i am confident about stein and train but not so sure about the compatibility of kube tags before then
09:37:15 <strigazi> let's see what we will manage to merge :)
09:37:34 <brtknr> i tried to sift through the commit logs and derive an answer but only got so far
09:37:40 <strigazi> for stein ++
09:37:49 <brtknr> when is the deadline?
09:38:11 <brtknr> before the autorelease is cut?
09:38:32 <strigazi> two weeks? but we can do kind of late releases
09:38:38 <flwang1> next week
09:38:46 <flwang1> https://releases.openstack.org/train/schedule.html
09:38:52 <brtknr> will it be rc2 or final?
09:39:16 <flwang1> both work
09:40:03 <jakeyip> I don't think we need pike and queens in the compat matrix, pike is unmaintained and queens is maintained only till 2019-10-25, which is soon https://releases.openstack.org/
09:40:40 <flwang1> jakeyip: +1
09:40:54 <flwang1> just stein and rocky are good enough
09:40:54 <brtknr> jakeyip: fine, i'll remove those
09:42:59 <jakeyip> I can help test R, see how it goes with 1.15.x
09:43:31 <openstackgerrit> Bharat Kunwar proposed openstack/magnum master: Add compatibility matrix for kube_tag https://review.opendev.org/685675
09:43:46 <brtknr> please check this is accurate ^
09:45:00 <flwang1> this does need testing to confirm
09:45:05 <flwang1> let's move to the next topic?
09:46:16 <strigazi> +1
09:47:19 <flwang1> i would like to discuss the rolling upgrade
09:47:52 <flwang1> so far the k8s version rolling upgrade runs very well on our cloud, we haven't seen issues with that
09:48:20 <flwang1> but magnum does need to support the os upgrade as well
09:48:52 <flwang1> with the current limitations, i'm proposing this solution https://review.opendev.org/#/c/686033/
09:49:04 <flwang1> using the ostree command to do the upgrade
09:49:36 <flwang1> before we fix the master resizing issue, this seems to be the only working solution
09:49:47 <flwang1> strigazi: brtknr: jakeyip: thoughts?
09:49:52 <strigazi> +1
09:50:56 <brtknr> flwang1: is that the right link?
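For reference on how the kube_tag compatibility matrix above is used in practice, here is a sketch of pinning the Kubernetes version on a cluster template via the kube_tag label. All names, flavors and the tag value are illustrative; the tag chosen has to be one the matrix lists as compatible with the installed Magnum release.

    # Pin the Kubernetes release via the kube_tag label on the template;
    # all names and values below are examples.
    openstack coe cluster template create k8s-v1-15 \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --external-network public \
        --flavor m1.medium \
        --network-driver flannel \
        --labels kube_tag=v1.15.7

    openstack coe cluster create mycluster \
        --cluster-template k8s-v1-15 \
        --node-count 2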
09:50:58 <flwang1> strigazi: i have fixed the issue you found
09:51:13 <flwang1> sorry, https://review.opendev.org/#/c/669593/
09:51:15 <strigazi> will test today
09:51:43 <flwang1> strigazi: and it could work for the fcos driver with minion changes in the future
09:52:00 <flwang1> since fcos does use ostree as well
09:52:33 <strigazi> maybe yes
09:52:38 <brtknr> we will need to find an alternative way to do upgrades if we are dropping ostree
09:53:13 <strigazi> with nodegroups now we can add a new ng and drop the old one
09:53:31 <flwang1> strigazi: but we still need to fix the master
09:53:34 <flwang1> node is easy
09:53:41 <strigazi> this will replace the VMs and you get a 100% guarantee that the nodes will work
09:53:44 <jakeyip> I dunno if I like the idea of in-place upgrade.
09:53:58 <flwang1> and the most important thing is how to get 0 downtime
09:54:05 <strigazi> changing a kernel is always a risk
09:54:18 <flwang1> jakeyip: we don't have a good choice at this moment
09:54:26 <strigazi> 0 downtime of what?
09:54:33 <flwang1> your k8s service
09:54:38 <brtknr> i think he means the cluster?
09:54:41 <flwang1> sorry, your k8s cluster
09:55:00 <strigazi> a node down doesn't mean cluster down
09:55:01 <brtknr> is that assuming multimaster?
09:55:15 <flwang1> strigazi: ok, i mean the services running on the cluster
09:55:21 <flwang1> my bad brain
09:55:27 <brtknr> is there a constraint of "only allow upgrades when it's a multimaster"?
09:55:32 <jakeyip> I'd rather we blow away old VMs and build new ones. This is why we are doing k8s instead of docker, ya?
09:55:50 <strigazi> +1 ^^
09:55:53 <jakeyip> is in-place upgrade only for master?
09:56:01 <flwang1> guys, we need to be on the same page about the 0 downtime
09:56:06 <strigazi> that is what I meant with using nodegroups
09:56:24 <flwang1> let me find some docs for you guys
09:56:33 <flwang1> trust me, i spent quite a lot of time on this
09:57:01 <flwang1> https://blog.gruntwork.io/zero-downtime-server-updates-for-your-kubernetes-cluster-902009df5b33
09:57:21 <flwang1> there are 4 blogs
09:57:33 <strigazi> there is nothing in this world with 0 downtime, there are services with good SLAs
09:57:52 <flwang1> strigazi: fine, you can definitely argue that
09:58:01 <brtknr> I am a little confused, as fedora atomic is EOL so there won't be anything to upgrade to
09:58:20 <jakeyip> ^ +1 :P
09:58:38 <flwang1> brtknr: that's not true
09:59:01 <flwang1> for example, our customers are using fa29-20190801
09:59:20 <flwang1> and then a big security issue happens today, 20191002
09:59:27 <flwang1> how can you upgrade that?
10:00:21 <flwang1> the upgrade is not just for fa28 to fa29, it does support fa29-0001 to fa29-0002
10:00:29 <jakeyip> flwang1: when you mention node is easy, it's master we need to care about - are we just solving this problem for clusters without multi master?
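On the ostree-based in-place upgrade discussed above (e.g. fa29-0001 to fa29-0002), the underlying operation on an rpm-ostree based host looks roughly like the sketch below. This is a rough illustration only; the exact commands used by https://review.opendev.org/#/c/669593/ may differ, and the version string is an example.

    #!/bin/bash
    # In-place OS upgrade on an rpm-ostree based host (Fedora Atomic here);
    # the deployed version and reboot handling are illustrative.

    # Show the currently booted deployment.
    rpm-ostree status

    # Move to a specific ostree version rather than just "latest", so the
    # whole cluster converges on the same image.
    rpm-ostree deploy 29.20191002.0

    # The new deployment only takes effect after a reboot.
    systemctl reboot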
10:01:13 <strigazi> works for single master too with some api downtime, not service
10:01:30 <flwang1> jakeyip: no, when i say master is not easy, i mean currently magnum doesn't support resizing the master, which means we can't just add a new master and remove an old one
10:01:53 <flwang1> maybe 0 downtime is not a good word
10:02:15 <flwang1> but the target is to get minimum downtime for the services running on the cluster
10:02:40 <strigazi> to get somewhere
10:02:50 <flwang1> when we say minimum, that means, at least, we should be able to drain before doing the upgrade
10:02:56 <strigazi> for train we will have in-place upgrade for OS and k8s
10:03:08 <flwang1> strigazi: yes
10:03:18 <flwang1> i don't like the in-place either
10:03:20 <strigazi> for U we will have node-replacement
10:03:45 <flwang1> for node-replacement, we still need a way to drain the node before deleting it
10:03:45 <openstackgerrit> Merged openstack/magnum master: Set cniVersion for flannel https://review.opendev.org/685746
10:03:52 <strigazi> for T, users will be able to do node-replacement by adding a new NG and migrating the workload
10:04:22 <strigazi> isn't this good enough for train ^^
10:04:29 <strigazi> users will have maximum control
10:04:46 <flwang1> that's ok for train
10:04:58 <flwang1> i mean it's good for train
10:05:04 <strigazi> +1
10:05:25 <strigazi> for U we have two options
10:05:25 <flwang1> for U, i will still insist on having a drain before removing a node
10:06:22 <strigazi> we can do it with the agent with the DELETE SD
10:06:29 <brtknr> +1 to drain before delete
10:06:51 <strigazi> or we can have a controller that does this in the cluster
10:07:26 <strigazi> or the magnum-controller
10:07:30 <strigazi> so three options
10:07:33 <openstackgerrit> Merged openstack/magnum master: Add hostname-override to kube-proxy https://review.opendev.org/685747
10:07:34 <flwang1> strigazi: we already discussed the in-cluster 'controller' at catalyst
10:08:01 <flwang1> strigazi: we just need to extend the magnum-auto-healer
10:08:11 <flwang1> to magnum-controller
10:08:50 <flwang1> that's my goal for U :)
10:09:06 <strigazi> I think the magnum-auto-healer needs to come closer to us. the cloud-provider repo is a horrible place to host it
10:09:20 <flwang1> strigazi: i agree
10:09:25 <strigazi> -2 to have it in that repo
10:09:29 <flwang1> we need a better place
10:09:34 <flwang1> :D
10:09:52 <flwang1> we don't have a good golang based code repo in openstack
10:10:18 <flwang1> we can try to push the CPO team to split the repo
10:10:31 <flwang1> anything else, team?
10:10:34 <flwang1> i need to go
10:10:40 <brtknr> that's all
10:11:15 <flwang1> thank you very much
10:11:18 <strigazi> see you
10:11:23 <jakeyip> good discussion
10:11:25 <flwang1> strigazi: thank you
10:11:38 <flwang1> strigazi: i look forward to your fcos driver
10:11:49 <brtknr> same
10:11:50 <strigazi> sure
10:11:56 <strigazi> this afternoon
10:12:01 <flwang1> brtknr: can i get your blessing on https://review.opendev.org/#/c/675511/ ?
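On "drain before delete" above: until Magnum automates this, the manual sequence looks roughly like the sketch below. The cluster and node names are examples; the resize call with --nodes-to-remove is the existing Magnum node-removal interface, and the example assumes the cluster currently has two workers so the new count is 1.

    #!/bin/bash
    # Manual drain-then-remove workflow; all names are examples.
    NODE="mycluster-abc123-minion-0"

    # Stop new pods landing on the node, then evict what is running there.
    kubectl cordon "${NODE}"
    kubectl drain "${NODE}" --ignore-daemonsets --delete-local-data

    # Remove exactly this node while shrinking the cluster to one worker.
    openstack coe cluster resize mycluster 1 --nodes-to-remove "${NODE}"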
10:12:43 <flwang1> brtknr: i'd like to get it in train
10:13:03 <brtknr> flwang1: np, I'll quickly test it first
10:13:13 <flwang1> brtknr: thank you very much
10:13:26 <flwang1> see you, team
10:14:16 <strigazi> bye
10:14:55 <openstackgerrit> Merged openstack/magnum master: k8s_fedora: Label master nodes with kubectl https://review.opendev.org/685748
10:15:56 <strigazi> flwang1: brtknr https://review.opendev.org/#/c/686028/1
10:16:19 <brtknr> strigazi: what about it?
10:16:29 <strigazi> merge? xD
10:16:36 <brtknr> i left a comment
10:16:38 <strigazi> also end meeting?
10:16:57 <brtknr> it needs to come before podman, no?
10:16:57 <openstackgerrit> Spyros Trigazis proposed openstack/magnum master: k8s_fedora: Move rp_filter=1 for calico up https://review.opendev.org/686028
10:17:50 <strigazi> brtknr: it is before now
10:17:51 <brtknr> i've +2ed
10:18:03 <strigazi> flwang1: ?
10:18:16 <brtknr> over to you flwang1 if you're still there
10:18:27 <strigazi> he also needs to end the meeting
10:18:33 <strigazi> I'll try
10:18:38 <strigazi> #endmeeting