09:01:48 <flwang1> #startmeeting Magnum
09:01:49 <openstack> Meeting started Wed Nov 13 09:01:48 2019 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:53 <openstack> The meeting name has been set to 'magnum'
09:02:04 <flwang1> #topic roll call
09:02:13 <flwang1> o/
09:02:17 <strigazi> o/
09:02:17 <jakeyip> o/
09:02:25 <brtknr> o/
09:02:29 <flwang1> strigazi: long time no see
09:02:38 <flwang1> thank you for joining, guys
09:03:15 <strigazi> last week I was at the summit. I could not join.
09:03:24 <flwang1> before we go through the agenda on https://etherpad.openstack.org/p/magnum-weekly-meeting, anything you guys want to start first?
09:03:41 <flwang1> strigazi: anything you can share from the summit?
09:04:42 <brtknr> How was the summit?
09:04:53 <strigazi> I'm reading the etherpad, give me a moment
09:06:02 <strigazi> The summit had 1800 attendees
09:07:12 <strigazi> From my experience, an English-speaking conference in China is not going to be attractive. I'm sure the conference would be more popular in Chinese.
09:07:50 <flwang1> strigazi: i can imagine
09:08:28 <strigazi> I was not able to attend the PTG, there was an issue with my flight and I had to leave earlier. (Strike in Germany)
09:09:21 <strigazi> I didn't see any new projects/products at the summit.
09:09:47 <flwang1> sigh...
09:10:50 <brtknr> Stig, the only person who attended from StackHPC shared similar feelings
09:11:24 <flwang1> strigazi: do you know where will be the next summit?
09:11:35 <flwang1> North America?
09:11:42 <brtknr> I believe it's in Vancouver
09:11:45 <strigazi> IMO, the TC should focus on stabilizing the core-projects. No new crazy changes
09:12:02 <strigazi> yes, vancouver
09:12:23 <jakeyip> that'll be the third time in vancouver
09:12:32 <flwang1> strigazi: I'd like to see TC take more responsibility on T instead of others
09:12:53 <flwang1> i'd like to go the next one
09:12:55 <brtknr> What OpenStack needs is fewer projects that work well
09:13:05 <flwang1> i have missed the other two in vancouver
09:13:22 <jakeyip> long long flight for us flwang1 :)
09:13:32 <strigazi> Keystone, glance, nova, neutron should work. The rest is debatable.
09:13:36 <flwang1> jakeyip: i know, my friend :)
09:13:51 <brtknr> Equally long time to get there from the UK I think
09:14:13 <flwang1> brtknr: which city are you based in UK?
09:14:32 <brtknr> Bristol, west coast... not that UK is very wide to begin with
09:14:39 <flwang1> jakeyip: are you in Sydney or Melbourne?
09:14:54 <jakeyip> Melbourne. The Core team is in Melbourne.
09:15:19 <jakeyip> Nectar Cloud Core Services team, to clear up any confusion with the 'core' work
09:15:21 <jakeyip> word
09:15:32 <flwang1> jakeyip: :)
09:15:39 <flwang1> strigazi: anything else you want to share?
09:15:59 <strigazi> no, there was nothing else
09:16:04 <jakeyip> was there much interest in magnum? :)
09:16:27 <brtknr> that was going to be my question
09:16:55 <strigazi> I don't think any of our other contributors were there. Mohamed was there
09:17:10 <strigazi> We didn't have a Project Update.
09:17:57 <strigazi> So I can not tell what the interest was.
09:18:43 <flwang1> fair enough
09:19:15 <strigazi> Finally,
09:19:36 <strigazi> Manila and other teams will have additional on-line PTGs
09:20:26 <brtknr> Are we other teams?
09:20:44 <openstackgerrit> Feilong Wang proposed openstack/magnum master: Support TimeoutStartSec for k8s systemd services  https://review.opendev.org/690445
09:20:49 <strigazi> We should be
09:21:21 <brtknr> Do we have a date/time
09:21:26 <strigazi> No
09:21:53 <flwang1> strigazi: brtknr: if you guys all think we should have a dedicated PTG, then we can plan it
09:22:02 <flwang1> before the Xmas holiday
09:22:22 <brtknr> I think it would be useful to have some kind of planning meeting, even if we don't call it a PTG
09:22:25 <strigazi> Let's decide next week? Ricardo is not here. I prefer that he is available before we (cern) can commit to something.
09:22:40 <flwang1> strigazi: works for me
09:22:43 <brtknr> sounds good
09:22:51 <flwang1> when we say PTG, how long a session do we need?
09:23:06 <flwang1> given we're a worldwide team, the TZ is still a problem for us
09:23:15 <strigazi> Two two-hour sessions?
09:23:29 <strigazi> In different days?
09:23:31 <flwang1> then can we split it into 2 days?
09:23:36 <strigazi> I don't think we need more
09:23:46 <strigazi> yes, exactly.
09:23:50 <flwang1> 4 hours is enough i think
09:23:55 <strigazi> yeap
09:24:08 <flwang1> next Wed and Thu?
09:24:36 <brtknr> Would a meet/hangout be an option?
09:24:53 <strigazi> I can not say for sure now, I need to talk to others here
09:24:54 <flwang1> brtknr: yep
09:25:00 <brtknr> or would it be IRC only?
09:25:41 <flwang1> or etherpad
09:26:08 <flwang1> all good for me, i prefer to start with meet/hangout to say hi to each other
09:26:16 <flwang1> then we can stay on etherpad
09:26:40 <flwang1> and use the voice call for necessary cases
09:27:39 <strigazi> ok
09:28:19 <flwang1> strigazi: did you see my email about master resize?
09:28:46 <brtknr> okay shall we move to a topic on the agenda from roll call?
09:28:49 <flwang1> the main thing i'd like to do in U release is the master resize and containerized master nodes
09:29:25 <flwang1> sure
09:29:41 <flwang1> #topic stable/stein 8.2.0
09:30:10 <flwang1> brtknr: would you like to give us an update?
09:31:09 <brtknr> Yes, so we recently noticed that the dns autoscaler is broken in stein as the docker repo has been removed completely
09:31:30 <brtknr> also fa27 has also been removed so CI jobs are failing
09:31:31 <jakeyip> same here...
09:31:45 <brtknr> stein 8.2.0 incorporates these changes
09:32:08 <brtknr> for us at stackhpc, we also need to support multiple NICs on a cluster and I have backported changes from master to enable this
09:33:16 <brtknr> lastly, i'd like to also incorporate changes to support 1.14.7/1.14.8 in stein, possibly also 1.15.x, but haven't managed to get to the bottom of why 1.14.7 and 1.14.8 clusters fail to successfully spawn calico and flannel services in the kube-system namespace
09:33:58 <brtknr> does any of it seem controversial?
09:34:07 <flwang1> brtknr: i think that's why strigazi replaced the atomic system container with podman
09:34:32 <flwang1> strigazi: do you know the root cause why the 1.15.x doesn't work on atomic system container?
09:34:35 <brtknr> i think podman was to support 1.16.x
09:35:39 <flwang1> brtknr: without podman, the max version of v1.15.x  working for me is v1.15.3
09:36:04 <flwang1> after cherry-pick the podman patch, v1.15.5 works for me
09:36:23 <flwang1> i'm curious about the root cause
09:37:06 <strigazi> I haven't tried, I don't know. we are using 1.15.3 with atomic.
09:37:07 <jakeyip> wondering if it is efficient to spend time figuring out why stein won't work with 1.14.7+? I think users would like to see 1.15 / 1.16 support more
09:38:06 <jakeyip> for us I think we will support at least one good version in stein and figure out how to get to train ASAP
09:38:23 <jakeyip> so many nice new features
09:38:29 <strigazi> 1.14.x should work in atomic.
09:38:40 <strigazi> where x any version
09:38:49 <strigazi> I will try and let you know
09:39:08 <brtknr> strigazi: we have multiple sites where 1.14.7 and 1.14.8 are consistently failing to spawn with upstream stable/stein
09:39:09 <strigazi> #action strigazi to try latest 1.14.x with atomic
09:39:23 <brtknr> as in, calico and flannel pods fail to start
09:39:40 <strigazi> brtknr: what is the failure? why don't they start? what is the error?
09:39:44 <jakeyip> same thing I am seeing (flannel crashing)
09:39:57 <brtknr> but please try with upstream stable/stein, not a modified branch
09:40:09 <brtknr> yes, lots of CrashLoopBacks
09:40:12 <strigazi> yes, but why? it can't read its token?
09:40:22 <strigazi> can you do logs?
09:40:32 <brtknr> it causes everything else to stay in pending state
09:40:48 <brtknr> can't read logs, says IP not assigned
09:40:51 <jakeyip> logs are broken for us, I still haven't figured out why. does it work for you?
09:41:06 <strigazi> ssh to node, docker logs
09:41:14 <jakeyip> same here brtknr. (I feel like I'm saying that a lot this meeting)
09:41:37 <strigazi> also k get nodes?
09:41:41 <strigazi> do you see an IP?
09:42:12 <strigazi> if k8s doesn't have node ips (i.e. the occm hasn't given one)
09:42:16 <strigazi> logs won't work
09:43:19 <brtknr> occm has a daemonset but doesn't spawn the pod
09:43:59 <flwang1> i think it maybe related to the occm
09:44:12 <brtknr> when i do k get nodes, no IP
09:44:33 <strigazi> that is why logs don't work
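strigazi's point above (no node address from occm means `kubectl logs` cannot work) can be checked directly; a small sketch, assuming a working kubeconfig on the master node:

```shell
# Sketch: confirm whether occm has assigned node addresses.
# If INTERNAL-IP shows <none>, `kubectl logs` will fail for pods on that node.
kubectl get nodes -o wide 2>/dev/null || echo "kubectl unavailable"
# Print each node name followed by its address list:
addrs=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[*].address}{"\n"}{end}' 2>/dev/null)
echo "${addrs:-no node addresses found}"
```

If the address list is empty, debug the occm pod/daemonset rather than the CNI pods.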
09:44:38 <jakeyip> I've got a failing cluster on hand, where should I dump the output?
09:44:48 <strigazi> paste.openstack.org
09:44:58 <strigazi> in fedora: fpaste <file>
09:45:05 <brtknr> jakeyip: you can also do | nc seashells.io 1337
09:45:10 <brtknr> much easier :)
09:45:36 <jakeyip> https://seashells.io/v/2MkCnqdw
09:45:52 <jakeyip> brtknr:  exactly what I was looking for :P
09:46:15 <strigazi> brtknr: well not easier than fedora
09:46:27 <brtknr> https://seashells.io/p/2MkCnqdw for plaintext
09:46:28 <strigazi> 5 chars vs 21
09:46:43 <strigazi> 6 chars :)
09:46:45 <brtknr> nc seashells.io 1337 is platform agnostic :P
09:47:12 <strigazi> and not community managed
09:47:31 <jakeyip> ok can we concentrate on the error message please :P
09:47:40 <brtknr> tbh i didn't know about fpaste, good to know...
09:48:03 <strigazi> jakeyip: ssh to master, docker ps | grep flannel
09:48:13 <strigazi> docker logs <flannel container>
09:49:11 <strigazi> brtknr: jakeyip flwang1 before continuing with debugging, anything else for the meeting?
09:49:28 <jakeyip> blank when I run logs
09:49:32 <jakeyip> http://paste.openstack.org/raw/786024/
09:49:45 <brtknr> are we happy with the shopping list for stein-8.2.0?
09:49:45 <strigazi> Let's sync via email for the online planning/PTG
09:49:49 <brtknr> anything else people want to add
09:50:17 <strigazi> jakeyip: docker ps -a | grep flannel | grep -v pause
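strigazi's suggestion can be wrapped into a small sketch, run on the master node (the `flannel` name pattern is an assumption about the container naming; adjust for your deployment):

```shell
# Locate the flannel container (including exited ones), skip the pause
# container, and dump the tail of its logs.
cid=$(docker ps -a --format '{{.ID}} {{.Names}}' 2>/dev/null | grep flannel | grep -v pause | awk '{print $1}' | head -n1)
if [ -n "$cid" ]; then
  docker logs "$cid" 2>&1 | tail -n 50
else
  echo "no flannel container found"
fi
```

Using `docker ps -a` rather than `docker ps` matters here because a CrashLoopBackOff container is usually in the Exited state between restarts.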
09:50:18 <jakeyip> oh I have a minor bug. I upgraded to stein and my public templates were no longer visible to the users
09:50:28 <flwang1> strigazi: i'd like to know the master resize work you mentioned before
09:50:54 <jakeyip> turns out the new column in DB 'hidden' had the values set to 'NULL' instead of 1 or 0
09:51:22 <strigazi> flwang1: We did some work on adding/dropping members from the etcd clusters. That's it.
09:51:36 <jakeyip> I told brtknr it's minor and don't need to bother fixing it. But since there's going to be a new stein version not sure if we should fix this.
09:51:36 <flwang1> jakeyip: it can be fixed by updating the existing templates' 'hidden' field
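The NULL-vs-0 issue jakeyip describes can be patched directly in the database; a hypothetical sketch (the table/column names `cluster_template.hidden` are an assumption based on the Stein migration — verify against your schema and take a backup first):

```shell
# Hypothetical fix: set NULL 'hidden' values to 0 so public templates
# become visible again. Run against the magnum database.
SQL="UPDATE cluster_template SET hidden = 0 WHERE hidden IS NULL;"
echo "$SQL"
# mysql -u magnum -p magnum -e "$SQL"   # uncomment only after a DB backup
```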
09:51:55 <flwang1> strigazi: where can i see the code?
09:52:17 <jakeyip> flwang1: yes all fixed but just to bring it up because it's a breaking behaviour
09:53:19 <brtknr> jakeyip: if you can locate the commit, please cherry-pick it to stein-8.1.1... i agree 8.2.0 implies there are new features but it's mostly bug fixes
09:53:23 <strigazi> we haven't pushed it. But we need to decide first on VMs vs k8s cluster for master nodes.
09:53:29 <strigazi> flwang1: ^^
09:54:07 <strigazi> flwang1: You have a fork that runs the control-plane in k8s, the work I mentioned is irrelevant to that use case.
09:54:43 <strigazi> flwang1: It makes sense only if the master nodes are in dedicated VMs and run etcd
09:55:09 <flwang1> strigazi: ok
09:55:21 <flwang1> i will think about it again
09:55:25 <jakeyip> http://paste.openstack.org/show/786025/
09:55:25 <flwang1> thanks for sharing that
09:57:48 <strigazi> jakeyip: so flannel, calico etc, can't read the token to talk to the k8s api.
09:58:05 <strigazi> jakeyip: I couldn't make it work without podman.
09:58:13 <flwang1> https://stackoverflow.com/questions/46178684/flannel-fails-in-kubernetes-cluster-due-to-failure-of-subnet-manager
09:58:18 <strigazi> jakeyip: but this was for 1.16.x
09:58:47 <brtknr> looks like they have backported the same changes to 1.14.7 and 1.14.8
09:58:57 <flwang1> it sounds like we need another mount for the kubelet atomic system container
09:59:45 <strigazi> we have /var/lib/kubelet already.
09:59:58 <flwang1> :(
10:00:09 <strigazi> "looks like they have backported the same changes to 1.14.7 and 1.14.8" What does this mean?
10:00:21 <strigazi> which changes?
10:00:52 <flwang1> i think brtknr doesn't know the changes, he's just guessing there are some changes :)
10:01:01 <strigazi> oh, ok :)
10:01:01 <brtknr> yes, its a guess
10:01:06 <flwang1> should we call this meeting done?
10:01:12 <strigazi> +
10:01:14 <flwang1> i'm going to leave
10:01:15 <strigazi> +1
10:01:21 <brtknr> goodnight!
10:01:23 <flwang1> thank you guys
10:01:27 <flwang1> #endmeeting