09:01:00 #startmeeting magnum
09:01:01 Meeting started Wed Apr 15 09:01:00 2020 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:04 The meeting name has been set to 'magnum'
09:01:12 o/
09:01:12 #topic roll call
09:01:18 o/
09:01:22 o/
09:01:35 anyone else?
09:01:42 cosmicsound: ttsiouts: ?
09:01:42 o/
09:01:52 o/
09:02:08 here is the agenda for today https://etherpad.opendev.org/p/magnum-weekly-meeting
09:02:31 #topic labels override
09:02:42 ttsiouts: any updates?
09:03:42 brtknr: ^
09:03:42 flwang1: I tried to create a PoC without DB migrations
09:03:53 I can push it today
09:04:21 strigazi: are you ware of the PoC's design?
09:04:29 aware
09:04:51 the idea is that there is going to be a new boolean flag when creating a cluster or nodegroup
09:04:56 ttsiouts: without DB migration? how so?
09:04:59 yes, but let's see it gerrit. It is more or less based on brtknr idea
09:05:23 brtknr: flwang1: isn't it better to discuss it in gerrit?
09:05:23 brtknr: let's discuss it in gerrit
09:05:41 gerrit++
09:05:41 i'll push asap
09:05:47 i think we are on track, right ttsiouts ?
09:06:00 ttsiouts: will you update the spec?
09:06:09 strigazi: yes
09:06:16 ok, let's review it on gerrit
09:06:30 anything else we need to discuss about this topic?
09:06:49 i cant wait to try it :)
09:06:57 brtknr: sure but I wanted to see if it's possible to have something without DB migrations
09:07:16 ttsiouts: does adding a new field != migration?
09:07:28 no new field in the DB
09:07:37 o/
09:07:45 let's review it on gerrit, team
09:07:46 #topic Private clusters got stuck at CREATE_INPROGRESS
09:07:50 brtknr: ^
09:07:53 ok
09:08:18 cosmicsound: found this corner case
09:08:28 looks like the regression was introduced in train
09:08:33 https://review.opendev.org/#/c/720022/ is the fix
09:09:34 brtknr: how to reproduce it?
09:10:06 it is because the nodegroups change by ttsiouts assumed floating_ip_enabled was a cluster template property whereas by the time we merged the nodegroups changes, it was possible to set floating_ip_enabled in the cluster scope
09:10:27 to reproduce, create CT with floating IP enabled on then create cluster with floating ip disabled
09:10:43 ok, i see
09:10:58 to reproduce, create CT with --floating-ip-enabled flag on then create cluster with --floating-ip-disabled
09:11:12 i will test it tomorrow
09:11:34 flwang1: you may not have noticed this in your prod if your cluster templates have --floating-ip-disabled by default
09:11:58 yep, that's right
09:12:07 anything else about this?
09:12:14 thats all
09:12:17 #topic Fedora CoreOS docker storage https://review.opendev.org/718296
09:13:12 I forgot to add the docker storage for worker node in my previous patch, and this patch removes the hardcoding of vdb
09:13:23 brtknr: strigazi: pls help review it
09:13:34 let me know if you have any questions about this
09:14:06 flwang1: you also forgot to mention that it uses /etc/fstab instead of systemd unit file + ext4->xfs default in the commit message :)
09:14:50 ok, will do, can you please leave your comments on that patch?
09:15:29 flwang1: already have
09:15:51 brtknr: great
09:15:56 #topic stable/train
09:16:09 now our train gate is broken
09:16:38 and we're hoping it can be fixed by a heat patch, but that patch is also broken now :(
09:17:22 flwang1: http://zuul.openstack.org/status/change/718522,3 looks promising though
09:17:22 I tried to totally remove the py2 functional test, but seems the infra team believe there is a better way
09:17:36 it is running right now
09:17:49 hopefully
09:17:52 let's see
09:18:13 if we can get the train gate back to normal, we do have several patches to backport
09:18:33 that said, we may need a 9.3.1 or 9.4.0
09:19:12 I vote for 9.3.1
09:19:27 Because SEMVER :)
09:19:29 it's all yours ;)
09:19:32 whatever you prefer team
09:19:56 #topic Expose traefik metrics - https://review.opendev.org/#/c/697044/
09:20:16 dioguerra: ^
09:20:19 note: 9.3.0 works as long as the user is using the latest stable fcos image
09:20:20 what's that for?
09:20:43 because zincati does not have a reason to reboot the machine
09:20:46 this was actually a split to divide traefik and autoscaler expose metrics
09:20:55 i am here just in case
09:20:59 to be used with this patch:
09:21:01 https://review.opendev.org/#/c/715142/
09:21:17 I think im going to pool everything together
09:21:47 dioguerra: ok, got it
09:21:50 Meanwhile, i would like you to give a quick glance to the ticket above
09:21:52 thanks for working on that
09:22:14 before i finish adding stuff, to see if you want to add or remove something
09:22:32 dioguerra: i like the helm client tag
09:22:45 dioguerra: are you going to work on the helm v3?
09:22:55 what do we discuss now? helm_client or traefik or helm3?
09:23:33 It all depends on what strigazi says i guess
09:23:41 I am confused too
09:23:47 strigazi: i don't think there is much to discuss about the helm_client tag and the traefik patch
09:24:05 1. +1 to helm_client_tag
09:24:23 sorry, i copied wrong link: this https://review.opendev.org/#/c/720027/
09:24:33 so i just think aloud, sorry if it confused you guys
09:25:13 2. +1 to https://review.opendev.org/#/c/697044/5/magnum/drivers/common/templates/kubernetes/fragments/enable-ingress-traefik.sh
09:26:32 anything else about this topic?
09:27:15 dtomasgu: ^^
09:27:18 if you think helm3 is not related to this, we can discuss it in the future
09:27:40 helm and label for client is not really related to this
09:28:04 flwang1: helm3 is another important topic, we can discuss it when we finish the agenda
09:28:13 strigazi: sure
09:28:20 dtomasgu: are you covered for https://review.opendev.org/#/c/697044 ?
09:28:32 I have added helm3 to the end of the agenda
09:28:49 #topic Build cluster autoscaler containers: https://review.opendev.org/#/c/714986/
09:29:07 can anybody help me understand why we need a new version of autoscaler?
09:30:02 flwang1: these are just the new releases, no?
09:30:15 flwang1: I don't think we "need" them
09:30:23 flwang1: the current version we are running is 1.15.x
09:30:26 I think brtknr just wants to build them
09:30:40 And anyone can use them, right?
09:30:54 i assumed the autoscaler versions somehow reflect k8s version
09:31:57 They do, but for magnum there is no difference,
09:32:09 it is possible nothing new has been added for magnum driver
09:32:32 nothing new so far, Thomas from our team is working on NGs.
09:32:37 i know thomas is doing some work for nodegroup
09:32:46 but i don't think the work has been merged yet
09:33:01 but that's alright, i just wanna understand why we need the new versions
09:33:02 Let's move on? This patch is only for adding new builds
09:33:20 #topic Make helm jobs retry internally: https://review.opendev.org/#/c/718794/
09:33:21 yes
09:33:53 strigazi: do we need a discussion for the above one?
09:34:02 as long as there is a timeout, i'm ok with that
09:34:17 +1 ^^
09:34:20 flwang1: yes, retry 60 times
09:34:33 == timeout after 5 mins
09:34:34 do you mean 60s?
09:34:43 for each job?
09:34:47 yes
09:34:54 fair enough
09:35:03 let's move on
09:35:15 #topic tempest/automation test
09:35:36 at catalyst cloud, we're trying to upgrade to stable/train
09:36:02 and i'm realizing that the testing is getting harder and harder because we don't have a reasonable functional test suite
09:36:19 strigazi: brtknr: how did you guys do automation test in your org?
09:36:49 flwang1: For the upgrade we didn't have any automated testing. We did it by hand.
09:37:16 flwang1: my colleagues use refstack for the core openstack services
09:37:18 flwang1: we have rally running periodically and we started adding more tests to it.
09:37:21 internally, we have some very simple shell script which can cover dns, loadbalancer, pvc, networkpolicy, ingress, etc
09:37:27 run manually
09:37:48 for magnum, its mostly manual testing
09:37:50 strigazi: brtknr: i'm talking about the magnum scope
09:38:11 flwang1: I was referring to magnum
09:38:25 i'm planning to put the shell script and some yamls into a docker container
09:38:46 We have a list of functionalities we need to test and we do it by hand.
09:38:47 so that we can run it after tempest creates the cluster, what do you guys think?
09:39:15 strigazi: but, by hand is so complex
09:39:18 flwang1: which one of our CI jobs creates a cluster ATM?
09:39:50 brtknr: the openstack infra doesn't have the nested virt, otherwise, sure, we can
09:39:53 i thought the functional-k8s test was currently disabled
09:40:26 brtknr: even if it's working, it's far from a useful functional test
09:40:46 because, currently, we have so many new features
09:41:16 flwang1: the only reasonable thing would be conformance
09:41:27 conformance test is not enough
09:41:51 flwang1: It should be. There are 4000 tests, we run only ~200 of them
09:41:56 it doesn't cover many things like lb, pvc, keystone auth, rolling upgrade, etc
09:42:23 lb, pvc should be there
09:42:35 Is there any chance we could enable nested virt?
09:42:54 there is nested virt enabled
09:43:06 dioguerra: mnaser said vexxhost has already enabled that, and we can use it
09:43:20 it worked before, but it's broken now
09:43:31 there was kernel incompatibility issue with ubuntu
09:43:44 this was 12 months ago
09:43:52 strigazi: do you know the latest status?
09:44:12 flwang1: no, and I abandoned the idea of the openstack CI
09:44:31 what do you mean abandon the idea of openstack ci?
09:44:40 flwang1: that it will ever work
09:44:49 never or ever?
09:45:01 flwang1: maybe it will work for a couple of weeks
09:45:17 flwang1: for local testing, i have a terraform script which creates a cluster with master lb with bunch of labels enabled which covers most use cases
09:46:04 I have a meeting in 14 mins sharp. Is there anything else for the meeting?
09:46:07 brtknr: we should have one we all can contribute to
09:46:23 ok, we can discuss this later
09:46:30 #topic helm3
09:46:37 ok maybe worth contacting mnaser again
09:46:49 strigazi: any comments about helm3?
09:46:57 or guilhermesp who works on magnum and works at vexxhost
09:47:18 Does someone work on helm3? We need an assignee
09:47:51 I can give it a go after the helm_client_tag is merged
09:47:59 cool
09:49:02 Has anyone tried installing the charts we have via helm3?
09:49:22 it would be nice to not drop tiller and keep compatibility of the scripts.
09:49:34 brtknr: everything works with helm3
09:49:49 afaik, helm2 and helm3 can coexist on the same cluster
09:49:57 they can
09:50:48 strigazi: before you leave, can you revisit https://review.opendev.org/710384 ?
09:51:54 brtknr: strigazi: dioguerra: anything else?
09:52:00 looks ok, I need to test
09:52:30 strigazi: pls do, it would be appreciated
09:53:46 i'm going to end the meeting now
09:53:54 thanks flwang1
09:53:56 anything else we need to discuss?
09:54:04 strigazi: brtknr: thank you team
09:54:07 #endmeeting
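
Note on the "Make helm jobs retry internally" topic: the figures quoted in the meeting (retry 60 times, timeout after about 5 mins) would be consistent with roughly 5 seconds between attempts. A minimal Python sketch of such a bounded retry loop is given here for illustration only. It is not the code from the review; the names run_with_retries, job, attempts, interval_s and sleep are hypothetical, and the 5 second interval is an assumption derived from the numbers above.

```python
import time
from typing import Callable


def run_with_retries(
    job: Callable[[], bool],
    attempts: int = 60,          # "retry 60 times" from the meeting
    interval_s: float = 5.0,     # assumed: 60 tries * 5 s is about 5 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `job` until it returns True or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        if job():
            return True
        # Wait between attempts, but not after the final failed one.
        if attempt < attempts:
            sleep(interval_s)
    return False
```

With these defaults, a job that never succeeds is given up on after 60 calls and 59 waits of 5 seconds, so the wait time alone stays just under 5 minutes. Passing `sleep` as a parameter keeps the loop testable without real delays.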