09:01:48 #startmeeting Magnum
09:01:49 Meeting started Wed Nov 13 09:01:48 2019 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:53 The meeting name has been set to 'magnum'
09:02:04 #topic roll call
09:02:13 o/
09:02:17 o/
09:02:17 o/
09:02:25 o/
09:02:29 strigazi: long time no see
09:02:38 thank you for joining, guys
09:03:15 last week I was at the summit. I could not join.
09:03:24 before we go through the agenda on https://etherpad.openstack.org/p/magnum-weekly-meeting, anything you guys want to start with first?
09:03:41 strigazi: anything you can share from the summit?
09:04:42 How was the summit?
09:04:53 I'm reading the etherpad, give me a moment
09:06:02 The summit had 1800 attendees
09:07:12 From my experience, an English-speaking conference in China is not going to be attractive. I'm sure the conference would be more popular in Chinese.
09:07:50 strigazi: i can imagine
09:08:28 I was not able to attend the PTG; there was an issue with my flight and I had to leave earlier. (Strike in Germany)
09:09:21 I didn't see many new projects/products at the summit.
09:09:33 s/many/any/
09:09:47 sigh...
09:10:50 Stig, the only person who attended from StackHPC, shared similar feelings
09:11:24 strigazi: do you know where the next summit will be?
09:11:35 North America?
09:11:42 I believe it's in Vancouver
09:11:45 IMO, the TC should focus on stabilizing the core projects. No new crazy changes
09:12:02 yes, Vancouver
09:12:23 that'll be the third time in Vancouver
09:12:32 strigazi: I'd like to see the TC take more responsibility on T instead of others
09:12:53 i'd like to go to the next one
09:12:55 What OpenStack needs is fewer projects that work well
09:13:05 i have missed the other two in Vancouver
09:13:22 long long flight for us flwang1 :)
09:13:32 Keystone, glance, nova, neutron should work. The rest is debatable.
09:13:36 jakeyip: i know, my friend :)
09:13:51 Equally long time to get there from the UK I think
09:14:13 brtknr: which city in the UK are you based in?
09:14:32 Bristol, west coast... not that the UK is very wide to begin with
09:14:39 jakeyip: are you in Sydney or Melbourne?
09:14:54 Melbourne. The Core team is in Melbourne.
09:15:19 Nectar Cloud Core Services team, to clear up any confusion with the 'core' word
09:15:21 word
09:15:32 jakeyip: :)
09:15:39 strigazi: anything else you want to share?
09:15:59 no, there was nothing else
09:16:04 was there much interest in magnum? :)
09:16:27 that was going to be my question
09:16:55 I don't think any of our contributors were there. Mohamed was there
09:17:10 We didn't have a Project Update.
09:17:57 So I cannot tell what the interest was.
09:18:43 fair enough
09:19:15 Finally,
09:19:36 Manila and other teams will have additional on-line TPGs
09:19:46 s/TPG/PTG/
09:20:26 Are we other teams?
09:20:44 Feilong Wang proposed openstack/magnum master: Support TimeoutStartSec for k8s systemd services https://review.opendev.org/690445
09:20:49 We should be
09:21:21 Do we have a date/time?
09:21:26 No
09:21:53 strigazi: brtknr: if you guys all think we should have a dedicated PTG, then we can plan it
09:22:02 before the Xmas holiday
09:22:22 I think it would be useful to have some kind of planning meeting, even if we don't call it a PTG
09:22:25 Let's decide next week? Ricardo is not here. I prefer that he is available before we (CERN) can commit to something.
09:22:40 strigazi: works for me
09:22:43 sounds good
09:22:51 when we say PTG, how long a session do we need?
09:23:06 given we're a worldwide team, the TZ is still a problem for us
09:23:15 Two two-hour sessions?
09:23:29 On different days?
09:23:31 then can we split it into 2 days?
09:23:36 I don't think we need more
09:23:46 yes, exactly.
09:23:50 4 hours is enough i think
09:23:55 yeap
09:24:08 next Wed and Thu?
09:24:36 Would a meet/hangout be option?
09:24:48 Would meet/hangout be an option?
09:24:53 I cannot say for sure now, I need to talk to others here
09:24:54 brtknr: yep
09:25:00 or would it be IRC only?
09:25:41 or etherpad
09:26:08 all good for me, i prefer to start with meet/hangout to say hi to each other
09:26:16 then we can stay on etherpad
09:26:40 and use the voice call for necessary cases
09:27:39 ok
09:28:19 strigazi: did you see my email about master resize?
09:28:46 okay shall we move to a topic on the agenda from roll call?
09:28:49 the main thing i'd like to do in the U release is the master resize and containerized master nodes
09:29:25 sure
09:29:41 #topic stable/stein 8.2.0
09:30:10 brtknr: would you like to give us an update?
09:31:09 Yes, so we recently noticed that the DNS autoscaler is broken in stein as the docker repo has been removed completely
09:31:30 also fa27 has also been removed so CI jobs are failing
09:31:31 same here...
09:31:45 stein 8.2.0 incorporates these changes
09:32:08 for us at StackHPC, we also need to support multiple NICs on a cluster and I have backported changes from master to enable this
09:33:16 lastly, i'd like to also incorporate changes to support 1.14.7 and 1.14.8 in stein, possibly also 1.15.x, but haven't managed to get to the bottom of why 1.14.7 and 1.14.8 clusters fail to successfully spawn calico and flannel services in the kube-system namespace
09:33:58 does any of it seem controversial?
09:34:07 brtknr: i think that's why strigazi replaced the atomic system container with podman
09:34:32 strigazi: do you know the root cause of why 1.15.x doesn't work in the atomic system container?
09:34:35 i think podman was to support 1.16.x
09:35:39 brtknr: without podman, the max version of v1.15.x working for me is v1.15.3
09:36:04 after cherry-picking the podman patch, v1.15.5 works for me
09:36:23 i'm curious about the root cause
09:37:06 I haven't tried, I don't know. we are using 1.15.3 with atomic.
09:37:07 wondering if it is efficient to spend time figuring out why stein won't work with 1.14.7+? I think users would like to see 1.15 / 1.16 support more
09:38:06 for us I think we will support at least one good version in stein and figure out how to get to train ASAP
09:38:23 so many nice new features
09:38:29 1.14.x should work in atomic.
09:38:40 where x is any version
09:38:49 I will try and let you know
09:39:08 strigazi: we have multiple sites where 1.14.7 and 1.14.8 are consistently failing to spawn with upstream stable/stein
09:39:09 #action strigazi to try latest 1.14.x with atomic
09:39:23 as in, calico and flannel pods fail to start
09:39:40 brtknr: what is the failure? why don't they start? what is the error?
09:39:44 same thing I am seeing (flannel crashing)
09:39:57 but please try with upstream stable/stein, not a modified branch
09:40:09 yes, lots of CrashLoopBackOffs
09:40:12 yes, but why? it can't read its token?
09:40:22 can you do logs?
09:40:32 it causes everything else to stay in the Pending state
09:40:48 can't read logs, says IP not assigned
09:40:51 logs is broken for us; I still haven't figured out why. does it work for you?
09:41:06 ssh to node, docker logs
09:41:14 same here brtknr. (I feel like I'm saying that a lot this meeting)
09:41:37 also k get nodes?
09:41:41 do you see an IP?
09:42:12 if k8s doesn't have node IPs (i.e. the occm hasn't given one)
09:42:16 logs won't work
09:43:19 occm has a daemonset but doesn't spawn a pod
09:43:28 occm has a daemonset but doesn't spawn the pod
09:43:59 i think it may be related to the occm
09:44:12 when i do k get nodes, no IP
09:44:33 that is why logs don't work
09:44:38 I've got a failing cluster on hand, where should I dump the output?
09:44:48 paste.openstack.org
09:44:58 in fedora: fpaste
09:45:05 jakeyip: you can also do | nc seashells.io 1337
09:45:10 much easier :)
09:45:36 https://seashells.io/v/2MkCnqdw
09:45:52 brtknr: exactly what I was looking for :P
09:46:15 brtknr: well not easier than fedora
09:46:27 https://seashells.io/p/2MkCnqdw for plaintext
09:46:28 5 chars vs 21
09:46:43 6 chars :)
09:46:45 nc seashells.io 1337 is platform agnostic :P
09:47:12 and not community managed
09:47:31 ok can we concentrate on the error message please :P
09:47:40 tbh i didn't know about fpaste, good to know...
09:48:03 jakeyip: ssh to master, docker ps | grep flannel
09:48:13 docker logs
09:49:11 brtknr: jakeyip flwang1 before continuing with debugging, anything else for the meeting?
09:49:28 blank when I run logs
09:49:32 http://paste.openstack.org/raw/786024/
09:49:45 are we happy with the shopping list for stein-8.2.0
09:49:45 To sync via email for the online planning/PTG
09:49:49 anything else people want to add
09:50:17 jakeyip: docker ps -a | grep flannel | grep -v pause
09:50:18 oh I have a minor bug. I upgraded to stein and my public templates were no longer visible to the users
09:50:28 strigazi: i'd like to know more about the master resize work you mentioned before
09:50:54 turns out the new column in DB 'hidden' had the values set to 'NULL' instead of 1 or 0
09:51:22 flwang1: We did some work on adding/dropping members of the clusters. That's it.
09:51:34 flwang1: We did some work on adding/dropping members from the etcd clusters. That's it.
09:51:36 I told brtknr it's minor and doesn't need fixing. But since there's going to be a new stein version, I'm not sure if we should fix this.
09:51:36 jakeyip: it can be fixed by updating the existing cluster's 'hidden' field
09:51:55 strigazi: where can i see the code?
09:52:17 flwang1: yes all fixed, but just to bring it up because it's a breaking behaviour
09:53:19 jakeyip: if you can locate the commit, please cherry-pick it to stein-8.1.1... i agree 8.2.0 implies there are new features but it's mostly bug fixes
09:53:23 we haven't pushed it. But we need to decide first on VMs vs k8s cluster for master nodes.
09:53:29 flwang1: ^^
09:54:07 flwang1: You have a fork that runs the control in k8s, the work I mentioned is irrelevant to that use case.
09:54:15 flwang1: You have a fork that runs the control-plane in k8s; the work I mentioned is irrelevant to that use case.
09:54:43 flwang1: It makes sense only if the master nodes are in dedicated VMs and run etcd
09:55:09 strigazi: ok
09:55:21 i will think about it again
09:55:25 http://paste.openstack.org/show/786025/
09:55:25 thanks for sharing that
09:57:48 jakeyip: so flannel, calico, etc. can't read the token to talk to the k8s API.
09:58:05 jakeyip: I couldn't make it work without podman.
09:58:13 https://stackoverflow.com/questions/46178684/flannel-fails-in-kubernetes-cluster-due-to-failure-of-subnet-manager
09:58:18 jakeyip: but this was for 1.16.x
09:58:47 looks like they have backported the same changes to 1.14.7 and 1.14.8
09:58:57 it sounds like we need another mount for the kubelet atomic system container
09:59:45 we have /var/lib/kubelet already.
09:59:58 :(
10:00:09 "looks like they have backported the same changes to 1.14.7 and 1.14.8" What does this mean??
10:00:21 which changes?
10:00:52 i think brtknr doesn't know the changes, he's just guessing there are some changes :)
10:01:01 oh, ok :)
10:01:01 yes, it's a guess
10:01:06 should we call this meeting done?
10:01:12 +
10:01:14 i'm going to leave
10:01:15 +1
10:01:21 goodnight!
10:01:23 thank you guys
10:01:27 #endmeeting
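The template-visibility bug raised at 09:50:54 (public templates disappearing after the Stein upgrade because the new 'hidden' column was left as NULL) comes down to a one-statement backfill, as suggested at 09:51:36. Below is a minimal sketch run against an in-memory SQLite stand-in. The table name `cluster_template`, its columns, and the convention that 0 means "not hidden" are assumptions for illustration only, so check the real schema (and take a backup) before running anything similar against a production Magnum database.

```python
import sqlite3


def backfill_hidden(conn: sqlite3.Connection) -> int:
    """Set hidden = 0 wherever an upgrade left it NULL; return rows changed."""
    # Assumed convention: 0 means "not hidden", so the affected templates
    # become visible again. Rows explicitly hidden (hidden = 1) are untouched.
    cur = conn.execute(
        "UPDATE cluster_template SET hidden = 0 WHERE hidden IS NULL"
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in for the real database; table and column names
    # are hypothetical, not taken from Magnum's actual schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster_template (name TEXT, public INTEGER, hidden INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cluster_template VALUES (?, ?, ?)",
        [("k8s-a", 1, None), ("k8s-b", 1, None), ("k8s-old", 1, 1)],
    )
    print(backfill_hidden(conn), "rows fixed")  # prints "2 rows fixed"
    print(conn.execute(
        "SELECT name, hidden FROM cluster_template ORDER BY name"
    ).fetchall())
```

The `WHERE hidden IS NULL` filter keeps the update idempotent: running it a second time changes nothing, and templates an operator deliberately hid stay hidden.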