09:00:14 <flwang1> #startmeeting magnum
09:00:15 <openstack> Meeting started Wed Oct  2 09:00:14 2019 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:19 <openstack> The meeting name has been set to 'magnum'
09:00:29 <flwang1> #topic roll call
09:01:11 <brtknr> o/
09:01:17 <ttsiouts> o/
09:01:35 <strigazi> o/
09:01:41 <jakeyip> o/
09:01:46 <flwang1> wow
09:01:55 <flwang1> hello folks, thanks for joining
09:02:05 <flwang1> #topic agenda
09:02:15 <flwang1> https://etherpad.openstack.org/p/magnum-weekly-meeting
09:02:46 <flwang1> let's go through the topic list one by one
09:02:55 <flwang1> anything you guys want to talk about first?
09:02:58 <strigazi> I start?
09:03:06 <flwang1> strigazi: go for it
09:03:33 <strigazi> I completed the work for moving to 1.16 here: https://review.opendev.org/#/q/status:open+project:openstack/magnum+branch:master+topic:v1.16
09:03:43 <strigazi> minus the FCOS patch
09:03:54 <strigazi> we don't care about that for 1.16
09:04:28 <strigazi> the main reason for moving to podman is that I couldn't make the kubelet start with atomic install https://review.opendev.org/#/c/685749/
09:05:14 <strigazi> I can shrink the patch above ^^ and start only kubelet with podman
09:05:38 <flwang1> strigazi: yep, that's my question. should we merge them into one patch?
09:05:55 <flwang1> to make the review easier, if we have to have all of them together for v1.16
09:06:10 <strigazi> the benefit is that we don't have to build atomic containers, just maintain the systemd units
09:06:41 <strigazi> for me the review is easier in many ways, but I won't be reviewing, so I can squash
09:07:00 <brtknr> I'd prefer they stay separate as it's easier to follow what each patch is doing
09:07:08 <flwang1> the podman one is so big
09:07:49 <flwang1> ok, then let's keep it as it is
09:08:01 <flwang1> i will start to review the podman patch first
09:08:07 <strigazi> the podman one is 7 systemd units basically
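For context, roughly what one of these podman-based units looks like. This is a minimal sketch, not the actual patch: the image placeholder, mounts, and kubelet flags are illustrative assumptions.

    # write a kubelet unit that runs the kubelet in a podman container
    cat > /etc/systemd/system/kubelet.service <<EOF
    [Unit]
    Description=kubelet (podman)
    After=network-online.target
    Wants=network-online.target

    [Service]
    # clean up any container left behind by a previous, possibly unclean, run
    ExecStartPre=-/usr/bin/podman rm -f kubelet
    ExecStart=/usr/bin/podman run --name kubelet \
        --privileged --pid host --network host \
        --volume /etc/kubernetes:/etc/kubernetes:ro \
        --volume /var/lib/kubelet:/var/lib/kubelet:rshared \
        ${KUBELET_IMAGE} kubelet --config=/etc/kubernetes/kubelet-config.yaml
    ExecStop=/usr/bin/podman stop kubelet
    Restart=always

    [Install]
    WantedBy=multi-user.target
    EOF
    systemctl daemon-reload
    systemctl enable --now kubelet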
09:08:27 <brtknr> the podman patch is also a necessity for moving to coreos so it's good that we have it :)
09:08:58 <strigazi> I can help you review it
09:09:01 <brtknr> thanks for the work, that was a very quick turnaround! i was expecting it would take much longer
09:09:26 <flwang1> i'm happy to get the small patches merged quickly
09:09:44 <flwang1> actually, i have reviewed them without leaving comments
09:10:03 <strigazi> I can break the big one up per component if it helps
09:10:55 <flwang1> strigazi: no, that's OK
09:11:07 <flwang1> i just want to stay focused, all good
09:11:11 <flwang1> thanks for the great work
09:11:22 <strigazi> flwang1: brtknr jakeyip do you need any clarification on any of the patches?
09:11:49 <flwang1> strigazi: i do have some small questions
09:11:52 <strigazi> ttsiouts: passes by my office when he has questions and vice versa :)
09:12:06 <flwang1> for https://review.opendev.org/#/c/685746/3 i saw you changed the version from 0.3 to 0.2, why?
09:12:08 <brtknr> strigazi: I think it's clear to me, I agree with flwang1 about merging small patches quickly
09:12:21 <brtknr> perhaps the rp filter one can go before the podman one
09:12:42 <jakeyip> hmm I have some questions, e.g. for the etcd service, why are kill and rm in ExecStartPre and not ExecStop?
09:12:54 <strigazi> flwang1 because upstream flannel said so https://github.com/coreos/flannel/pull/1174/files
09:13:33 <flwang1> strigazi: thanks
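For reference, the change pins the cniVersion field in the flannel CNI config. A minimal sketch of such a config, assuming the conventional /etc/cni/net.d location; the surrounding fields and file name are illustrative, and the value follows the review and the upstream flannel change linked above.

    mkdir -p /etc/cni/net.d
    cat > /etc/cni/net.d/10-flannel.conf <<EOF
    {
        "name": "cbr0",
        "cniVersion": "0.2.0",
        "type": "flannel",
        "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
        }
    }
    EOF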
09:13:48 <strigazi> brtknr +1 for rp filter
09:14:00 <strigazi> jakeyip: I can make it the same as the other ones
09:14:23 <strigazi> jakeyip: maybe rm is useful, e.g. when you want to change the image
09:14:57 <strigazi> I'll consolidate with the same practice everywhere
09:15:10 <brtknr> strigazi: so labelling via kubelet arg is no longer valid? --node-labels=node-role.kubernetes.io/master=\"\"
09:15:18 <strigazi> nope
09:15:33 <strigazi> the kubelet does not accept node-role
09:16:15 <brtknr> since 1.16?
09:16:15 <flwang1> strigazi: as for https://review.opendev.org/#/c/685748/3 the referenced issue https://github.com/kubernetes/kubernetes/issues/75457 is still open, so why do we have to drop the current way?
09:16:38 <flwang1> brtknr: we're asking the same question, good
09:17:04 <strigazi> flwang1: yes since 1.16 the change is in the kubelet
09:17:16 <flwang1> strigazi: good, thanks for clarification
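In practice the label moves from a kubelet flag to a post-registration kubectl call, since kubelet 1.16 refuses to self-assign node-role.* labels via --node-labels. A minimal sketch; the INSTANCE_NAME variable is a placeholder, not taken from the patch.

    # wait until the kubelet has registered the node, then apply the label
    until kubectl get node "${INSTANCE_NAME}" >/dev/null 2>&1; do
        sleep 5
    done
    kubectl label --overwrite node "${INSTANCE_NAME}" node-role.kubernetes.io/master=""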
09:17:47 <jakeyip> strigazi: yes, understood. Why did you choose to put it in ExecStartPre instead of ExecStop? It seems more reasonable to clean up in the stop stage. Is this a standard from somewhere?
09:18:24 <flwang1> shall we move to next topic? the fcos driver
09:18:25 <strigazi> jakeyip I copied it from some redhat docs
09:18:52 <flwang1> for the podman patch, i prefer to leave comments on that one since it's bigger
09:20:35 <brtknr> jakeyip: i suppose it ensures that the cleanup always happens before the start
09:20:37 <strigazi> jakeyip: https://docs.fedoraproject.org/en-US/fedora-coreos/getting-started/
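In short, the pattern under discussion: cleanup in ExecStartPre runs before every start, including after a crash or reboot where ExecStop never ran, so a stale container can never block the new one. Sketch only; the unit and container names are illustrative and the image placeholder is not from the patch.

    [Service]
    # the leading "-" makes the cleanup non-fatal when there is nothing to kill or remove
    ExecStartPre=-/usr/bin/podman kill etcd
    ExecStartPre=-/usr/bin/podman rm etcd
    ExecStart=/usr/bin/podman run --name etcd <etcd image and args>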
09:21:28 <strigazi> move to fcos?
09:21:34 <brtknr> yes
09:21:37 <jakeyip> thanks, yes
09:22:03 <strigazi> I made the agent work after flwang1's excellent work in heat
09:22:30 <brtknr> \o/ nice work both!
09:22:32 <flwang1> strigazi: thanks, i will try to push the heat team to let us backport it to train
09:22:43 <strigazi> the only diff with the atomic driver is how the agent starts, and docker-storage-setup which is dropped.
09:23:15 <flwang1> strigazi: are we trying to get fcos driver in train?
09:23:19 <strigazi> flwang1: it would be crazy if they say no
09:23:31 <flwang1> :)
09:23:35 <strigazi> flwang1: I think we must do it
09:23:45 <flwang1> yep, i think so
09:23:55 <flwang1> just wanna make sure we're all on the same page
09:24:38 <strigazi> I can finish it as a new driver today
09:25:07 <flwang1> strigazi: it would be great, if you can start it as a new driver, i will test it tomorrow
09:25:21 <brtknr> +1
09:25:44 <flwang1> strigazi: you're welcome to propose a patchset on top of mine or start with a new patch
09:26:02 <strigazi> if the heat team doesn't merge ignition support, what do we do?
09:26:16 <brtknr> cry?
09:26:18 <strigazi> patch ignition?
09:26:55 <flwang1> strigazi: i will blame them in the heat channel every day for the whole U release
09:27:05 <strigazi> brtknr: well at CERN we will cherry-pick to rocky even, but for other people that may not be an option
09:27:22 <flwang1> strigazi: we will cherrypick as well
09:27:28 <flwang1> it's such a simple patch
09:27:46 <flwang1> i can't see any reason they wouldn't accept it for train
09:27:49 <brtknr> cherry-pick in heat or magnum?
09:27:53 <brtknr> or both?
09:28:00 <strigazi> heat
09:28:02 <flwang1> brtknr: heat
09:28:27 <brtknr> we can try the upstream route in parallel too
09:29:00 <strigazi> what does this mean ^^
09:29:15 <brtknr> but it may be time for us to upgrade before a release is cut :)
09:29:39 <brtknr> cherry-pick upstream back to rocky
09:30:01 <flwang1> brtknr: i'm a bit confused
09:30:14 <flwang1> we're talking about the ignition patch in heat
09:30:31 <brtknr> yes, i'm proposing cherry-picking back to rocky upstream
09:30:48 <brtknr> is that clear?
09:30:56 <flwang1> hmm... it doesn't really matter for us, TBH
09:31:06 <brtknr> although it's not too relevant for us as we are planning to upgrade to stein soon
09:31:13 <strigazi> Q,R,S could have it
09:31:28 <flwang1> we care about the cherry-pick to train, because otherwise our fcos driver won't work in Train
09:31:39 <strigazi> train sounds like enough for me
09:31:58 <flwang1> i will follow up that, don't worry, guys
09:32:22 <flwang1> strigazi: so action on you: a new patch(set) for fcos driver
09:32:31 <strigazi> +1
09:32:37 <flwang1> thanks
09:34:12 <flwang1> next topic?
09:34:54 <strigazi> sure
09:36:04 <flwang1> brtknr: do you want to talk about yours?
09:36:28 <brtknr> sure
09:36:33 <brtknr> about the compatibility matrix
09:36:43 <brtknr> I'd like feedback on the accuracy of the information
09:37:03 <brtknr> i am confident about stein and train but not so sure about compatibility of kube tags before then
09:37:15 <strigazi> let's see what we will manage to merge :)
09:37:34 <brtknr> i tried to sift through the commit logs and derive an answer but only got so far
09:37:40 <strigazi> for stein ++
09:37:49 <brtknr> when is the deadline?
09:38:11 <brtknr> before the autorelease is cut?
09:38:32 <strigazi> two weeks? but we can do kind of late releases
09:38:38 <flwang1> next week
09:38:46 <flwang1> https://releases.openstack.org/train/schedule.html
09:38:52 <brtknr> will it be rc2 or final?
09:39:16 <flwang1> both work
09:40:03 <jakeyip> I don't think we need pike and queens in the compat matrix, pike is unmaintained and queens is maintained only till 2019-10-25 which is soon https://releases.openstack.org/
09:40:40 <flwang1> jakeyip: +1
09:40:54 <flwang1> just stein and rocky are good enough
09:40:54 <brtknr> jakeyip: fine, i'll remove those
09:42:59 <jakeyip> I can help test R, see how it goes with 1.15.x
09:43:31 <openstackgerrit> Bharat Kunwar proposed openstack/magnum master: Add compatibility matrix for kube_tag  https://review.opendev.org/685675
09:43:46 <brtknr> please check this is accurate^
09:45:00 <flwang1> this does need testing to confirm
09:45:05 <flwang1> let's move to next topic?
09:46:16 <strigazi> +1
09:47:19 <flwang1> i would like to discuss the rolling upgrade
09:47:52 <flwang1> so far the k8s version rolling upgrade runs very well on our cloud, we haven't seen issues with it
09:48:20 <flwang1> but magnum does need to support the os upgrade as well
09:48:52 <flwang1> with the current limitations, i'm proposing this solution https://review.opendev.org/#/c/686033/
09:49:04 <flwang1> using the ostree command to do the upgrade
09:49:36 <flwang1> until we fix the master resizing issue, this seems to be the only working solution
09:49:47 <flwang1> strigazi: brtknr: jakeyip: thoughts?
09:49:52 <strigazi> +1
09:50:56 <brtknr> flwang1: is that the right link?
09:50:58 <flwang1> strigazi: i have fixed the issue you found
09:51:13 <flwang1> sorry https://review.opendev.org/#/c/669593/
09:51:15 <strigazi> will test today
09:51:43 <flwang1> strigazi: and it could work for the fcos driver with minor changes in the future
09:52:00 <flwang1> since fcos does use ostree as well
09:52:33 <strigazi> maybe yes
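For context, the in-place OS upgrade leans on the ostree machinery that Fedora Atomic (and FCOS) ships. A rough per-node sketch, assuming the atomic host wrapper around rpm-ostree; the version string is the example from the discussion below, and none of this is copied from the proposed patch.

    atomic host status                 # show the currently deployed ostree commits
    atomic host deploy 29.20190801.0   # stage the target Fedora Atomic release
    systemctl reboot                   # boot into the new deployment; "atomic host rollback" reverts it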
09:52:38 <brtknr> we will need to find an alternative way to do upgrades if we are dropping ostree
09:53:13 <strigazi> with nodegroups now we can add a new ng and drop the old one
09:53:31 <flwang1> strigazi: but we still need to fix the master
09:53:34 <flwang1> node is easy
09:53:41 <strigazi> this will replace the VMs and you get a 100% guarantee that the nodes will work
09:53:44 <jakeyip> I dunno if I like the idea of in-place upgrade.
09:53:58 <flwang1> and the most important thing is, how to get 0 downtime
09:54:05 <strigazi> changing a kernel is always a risk
09:54:18 <flwang1> jakeyip: we don't have a good choice at this moment
09:54:26 <strigazi> 0 downtime of what?
09:54:33 <flwang1> your k8s service
09:54:38 <brtknr> i think he means the cluster?
09:54:41 <flwang1> sorry, your k8s cluster
09:55:00 <strigazi> a node down doesn't mean the cluster is down
09:55:01 <brtknr> is that assuming multimaster?
09:55:15 <flwang1> strigazi: ok, i mean the service running on the cluster
09:55:21 <flwang1> my bad brain
09:55:27 <brtknr> is there a constraint of "only allow upgrades when it's a multimaster"?
09:55:32 <jakeyip> I'd rather we blow away old VMs and build new ones. This is why we are doing k8s instead of docker, ya?
09:55:50 <strigazi> +1 ^^
09:55:53 <jakeyip> is in-place upgrade only for the master?
09:56:01 <flwang1> guys, we need to be on the same page about the 0 downtime
09:56:06 <strigazi> that is what I meant with using nodegroups
09:56:24 <flwang1> let me find some docs for you guys
09:56:33 <flwang1> trust me, i spent quite a lot of time on this
09:57:01 <flwang1> https://blog.gruntwork.io/zero-downtime-server-updates-for-your-kubernetes-cluster-902009df5b33
09:57:21 <flwang1> there are 4 blogs
09:57:33 <strigazi> there is nothing in this world with 0 downtime, there are services with a good SLA
09:57:52 <flwang1> strigazi: fine, you can definitely argue that
09:58:01 <brtknr> I am a little confused as fedora atomic is EOL so there won't be anything to upgrade to
09:58:20 <jakeyip> ^ +1 :P
09:58:38 <flwang1> brtknr: that's not true
09:59:01 <flwang1> for example, our customers are using fa29-20190801
09:59:20 <flwang1> and then a big security issue happens today, 20191002
09:59:27 <flwang1> how can you upgrade that?
10:00:21 <flwang1> the upgrade is not just for fa28 to fa29, it also supports fa29-0001 to fa29-0002
10:00:29 <jakeyip> flwang1: when you mention that nodes are easy and it's the master we need to care about - are we just solving this problem for clusters without multi-master?
10:01:13 <strigazi> it works for single master too, with some api downtime but no service downtime
10:01:30 <flwang1> jakeyip: no, when i say master is not easy, i mean that currently magnum doesn't support resizing the master, which means we can't just add a new master and remove an old one
10:01:53 <flwang1> maybe 0 downtime is not a good word
10:02:15 <flwang1> but the target is to get minimum downtime for services running on the cluster
10:02:40 <strigazi> to get somewhere
10:02:50 <flwang1> when we say minimum, that means we should at least be able to drain a node before doing an upgrade
10:02:54 <strigazi> for train we will have in-place upgrade for the OS and k8s
10:03:08 <flwang1> strigazi: yes
10:03:18 <flwang1> i don't like the in-place approach either
10:03:20 <strigazi> for U we will have node-replacement
10:03:45 <flwang1> for node-replacement, we still need a way to drain the node before delete it
10:03:45 <openstackgerrit> Merged openstack/magnum master: Set cniVersion for flannel  https://review.opendev.org/685746
10:03:52 <strigazi> for T, users will be able to do node-replacement by adding a new NG and migrating the workload
10:04:22 <strigazi> isn't this good enough for train ^^
10:04:29 <strigazi> users will have maximum control
10:04:46 <flwang1> that's ok for train
10:04:58 <flwang1> i mean it's good for train
10:05:04 <strigazi> +1
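To spell out the nodegroup route described above, a hedged sketch with the Train-era magnum CLI; the cluster, nodegroup, and flavor names are made up, and the exact flags should be checked against the client version.

    openstack coe nodegroup create mycluster new-workers --node-count 3 --flavor m1.large
    # cordon/drain the old nodes and let the workload reschedule onto the new nodegroup
    openstack coe nodegroup delete mycluster old-workers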
10:05:25 <strigazi> for U we have two options
10:05:25 <flwang1> for U, i will still insist on having a drain before removing a node
10:06:22 <strigazi> we can do it with the agent with the DELETE SD
10:06:29 <brtknr> +1 to drain before delete
10:06:51 <strigazi> or we can have a controller that does this in the cluster
10:07:26 <strigazi> or the magnum-controller
10:07:30 <strigazi> so three options
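Whichever of the three options drives it, the drain-before-delete step amounts to something like the following. A minimal sketch: the node name and resize arguments are placeholders, not an agreed interface.

    kubectl cordon mycluster-node-3          # stop new pods landing on the node
    kubectl drain mycluster-node-3 --ignore-daemonsets --delete-local-data --timeout=300s
    openstack coe cluster resize mycluster 2 --nodes-to-remove <server-uuid-of-that-node>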
10:07:33 <openstackgerrit> Merged openstack/magnum master: Add hostname-override to kube-proxy  https://review.opendev.org/685747
10:07:34 <flwang1> strigazi: we already discussed the in-cluster 'controller' at catalyst
10:08:01 <flwang1> strigazi: we just need to extend the magnum-auto-healer
10:08:11 <flwang1> to magnum-controller
10:08:50 <flwang1> that's my goal for U :)
10:09:06 <strigazi> I think the magnum-auto-healer needs to come closer to us. the cloud-provider repo is a horrible place to host it
10:09:20 <flwang1> strigazi: i agree
10:09:25 <strigazi> -2 to have it in that repo
10:09:29 <flwang1> we need a better place
10:09:34 <flwang1> :D
10:09:52 <flwang1> we don't have a good golang-based code repo in openstack
10:10:18 <flwang1> we can try to push the CPO team to split the repo
10:10:31 <flwang1> anything else team?
10:10:34 <flwang1> i need to go
10:10:40 <brtknr> thats all
10:11:15 <flwang1> thank you very much
10:11:18 <strigazi> see you
10:11:23 <jakeyip> good discussion
10:11:25 <flwang1> strigazi: thank you
10:11:38 <flwang1> strigazi: i look forward to your fcos driver
10:11:49 <brtknr> same
10:11:50 <strigazi> sure
10:11:56 <strigazi> this afternoon
10:12:01 <flwang1> brtknr: can i get your bless on https://review.opendev.org/#/c/675511/ ?
10:12:43 <flwang1> brtknr: i'd like to get it in train
10:13:03 <brtknr> flwang1: np, I'll quickly test it first
10:13:13 <flwang1> brtknr: thank you very much
10:13:26 <flwang1> see you, team
10:14:16 <strigazi> bye
10:14:55 <openstackgerrit> Merged openstack/magnum master: k8s_fedora: Label master nodes with kubectl  https://review.opendev.org/685748
10:15:56 <strigazi> flwang1: brtknr https://review.opendev.org/#/c/686028/1
10:16:19 <brtknr> strigazi: what about it?
10:16:29 <strigazi> merge? xD
10:16:36 <brtknr> i left a comment
10:16:38 <strigazi> also end meeting?
10:16:57 <brtknr> it needs to come before podman, no?
10:16:57 <openstackgerrit> Spyros Trigazis proposed openstack/magnum master: k8s_fedora: Move rp_filter=1 for calico up  https://review.opendev.org/686028
10:17:50 <strigazi> brtknr: it is before now
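For context, the patch boils down to applying the sysctl before the podman/kubelet units start, since calico expects strict reverse-path filtering. A sketch only; the exact key and file name are assumptions based on the patch title, not copied from it.

    cat > /etc/sysctl.d/90-rp-filter.conf <<EOF
    net.ipv4.conf.all.rp_filter = 1
    EOF
    sysctl --system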
10:17:51 <brtknr> i've +2ed
10:18:03 <strigazi> flwang1: ?
10:18:16 <brtknr> over to you flwang1 if you're still there
10:18:27 <strigazi> he also needs to end the meeting
10:18:33 <strigazi> I'll try
10:18:38 <strigazi> #endmeeting