09:01:00 <flwang1> #startmeeting magnum
09:01:01 <openstack> Meeting started Wed Apr 15 09:01:00 2020 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:04 <openstack> The meeting name has been set to 'magnum'
09:01:12 <brtknr> o/
09:01:12 <flwang1> #topic roll call
09:01:18 <strigazi> o/
09:01:22 <flwang1> o/
09:01:35 <flwang1> anyone else?
09:01:42 <flwang1> cosmicsound: ttsiouts: ?
09:01:42 <brtknr> o/
09:01:52 <ttsiouts> o/
09:02:08 <flwang1> here is the agenda for today https://etherpad.opendev.org/p/magnum-weekly-meeting
09:02:31 <flwang1> #topic labels override
09:02:42 <flwang1> ttsiouts: any updates?
09:03:42 <flwang1> brtknr: ^
09:03:42 <ttsiouts> flwang1: I tried to create a PoC without DB migrations
09:03:53 <ttsiouts> I can push it today
09:04:21 <flwang1> strigazi: are you ware of the PoC's design?
09:04:29 <flwang1> aware
09:04:51 <ttsiouts> the idea is that there is going to be a new boolean flag when creating a cluster or nodegroup
09:04:56 <brtknr> ttsiouts: without DB migration? how so?
09:04:59 <strigazi> yes, but let's see it in gerrit. It is more or less based on brtknr's idea
09:05:23 <strigazi> brtknr: flwang1: isn't it better to discuss it in gerrit?
09:05:23 <ttsiouts> brtknr: let's discuss it in gerrit
09:05:41 <brtknr> gerrit++
09:05:41 <ttsiouts> i'll push asap
09:05:47 <strigazi> i think we are on track, right ttsiouts ?
09:06:00 <brtknr> ttsiouts: will you update the spec?
09:06:09 <ttsiouts> strigazi: yes
09:06:16 <flwang1> ok, let's review it on gerrit
09:06:30 <flwang1> anything else we need to discuss about this topic?
09:06:49 <brtknr> i cant wait to try it :)
09:06:57 <ttsiouts> brtknr: sure but I wanted to see if it's possible to have something without DB migrations
09:07:16 <brtknr> ttsiouts: does adding a new field != migration?
09:07:28 <ttsiouts> no new field in the DB
09:07:37 <dioguerra> o/
09:07:45 <flwang1> let's review it on gerrit, team
09:07:46 <flwang1> #topic Private clusters got stuck at CREATE_IN_PROGRESS
09:07:50 <flwang1> brtknr: ^
09:07:53 <brtknr> ok
09:08:18 <brtknr> cosmicsound: found this corner case
09:08:28 <brtknr> looks like the regression was introduced in train
09:08:33 <brtknr> https://review.opendev.org/#/c/720022/ is the fix
09:09:34 <flwang1> brtknr: how to reproduce it?
09:10:06 <brtknr> it is because the nodegroups change by ttsiouts assumed floating_ip_enabled was a cluster template property whereas by the time we merged the nodegroups changes, it was possible to set floating_ip_enabled in the cluster scope
09:10:27 <brtknr> to reproduce, create CT with floating IP enabled on then create cluster with floating ip disabled
09:10:43 <flwang1> ok, i see
09:10:58 <brtknr> to reproduce, create CT with --floating-ip-enabled flag on then create cluster with --floating-ip-disabled
09:11:12 <flwang1> i will test it tomorrow
09:11:34 <brtknr> flwang1: you may not have noticed this in your prod if your cluster templates have --floating-ip-disabled by default
09:11:58 <flwang1> yep, that's right
09:12:07 <flwang1> anything else about this?
09:12:14 <brtknr> thats all
09:12:17 <flwang1> #topic Fedora CoreOS docker storage https://review.opendev.org/718296
09:13:12 <flwang1> I forgot to add the docker storage for worker node in my previous patch, and this patch removes the hardcoded vdb
09:13:23 <flwang1> brtknr: strigazi: pls help review it
09:13:34 <flwang1> let me know if you have any question about this
09:14:06 <brtknr> flwang1: you also forgot to mention that it uses /etc/fstab instead of systemd unit file + ext4->xfs default in the commit message :)
09:14:50 <flwang1> ok, will do, can you please leave your comments on that patch?
09:15:29 <brtknr> flwang1: already have
09:15:51 <flwang1> brtknr: great
09:15:56 <flwang1> #topic stable/train
09:16:09 <flwang1> now our train gate is broken
09:16:38 <flwang1> and we're hoping it can be fixed by a heat patch, but that patch is also broken now :(
09:17:22 <brtknr> flwang1: http://zuul.openstack.org/status/change/718522,3 looks promising though
09:17:22 <flwang1> I tried to totally remove the py2 functional test, but seems the infra team believe there is a better way
09:17:36 <brtknr> it is running right now
09:17:49 <flwang1> hopefully
09:17:52 <flwang1> let's see
09:18:13 <flwang1> if we can get the train gate back to normal, we do have several patches to backport
09:18:33 <flwang1> that said, we may need a 9.3.1 or 9.4.0
09:19:12 <brtknr> I vote for 9.3.1
09:19:27 <brtknr> Because SEMVER :)
09:19:29 <flwang1> it's all yours ;)
09:19:32 <strigazi> whatever you prefer team
09:19:56 <flwang1> #topic Expose traefik metrics - https://review.opendev.org/#/c/697044/
09:20:16 <flwang1> dioguerra: ^
09:20:19 <brtknr> note: 9.3.0 works as long as the user is using the latest stable fcos image
09:20:20 <flwang1> what's that for?
09:20:43 <brtknr> because zincati does not have a reason to reboot the machine
09:20:46 <dioguerra> this was actually a split to divide traefik and autoscaler expose metrics
09:20:55 <cosmicsound> i am here just in case
09:20:59 <dioguerra> to be used with this patch:
09:21:01 <dioguerra> https://review.opendev.org/#/c/715142/
09:21:17 <dioguerra> I think im going to pool everything together
09:21:47 <flwang1> dioguerra: ok, got it
09:21:50 <dioguerra> Meanwhile, i would like you to give a quick glance to the ticket above
09:21:52 <flwang1> thanks for working on that
09:22:14 <dioguerra> before i finish adding stuff, to see if you want to add or remove something
09:22:32 <flwang1> dioguerra: i like the helm client tag
09:22:45 <flwang1> dioguerra: are you going to work on the helm v3?
09:22:55 <strigazi> what do we discuss now? helm_client or traefik or helm3?
09:23:33 <dioguerra> It all depends on what strigazi says i guess
09:23:41 <brtknr> I am confused too
09:23:47 <flwang1> strigazi: i don't think there is much to discuss about the helm_client tag and the traefik patch
09:24:05 <strigazi> 1. +1 to helm_client_tag
09:24:23 <dioguerra> sorry, i copied wrong link: this https://review.opendev.org/#/c/720027/
09:24:33 <flwang1> so i just think aloud, sorry if it confused you guys
09:25:13 <strigazi> 2. +1 to https://review.opendev.org/#/c/697044/5/magnum/drivers/common/templates/kubernetes/fragments/enable-ingress-traefik.sh
09:26:32 <flwang1> anything else about this topic?
09:27:15 <strigazi> dtomasgu: ^^
09:27:18 <flwang1> if you think helm3 is not related to this, we can discuss it in the future
09:27:40 <dioguerra> helm and label for client is not really related to this
09:28:04 <strigazi> flwang1: helm3 is another important topic, we can discuss it when we finish the agenda
09:28:13 <flwang1> strigazi: sure
09:28:20 <strigazi> dtomasgu: are you covered for https://review.opendev.org/#/c/697044 ?
09:28:32 <brtknr> I have added helm3 to the end of the agenda
09:28:49 <flwang1> #topic Build cluster autoscaler containers: https://review.opendev.org/#/c/714986/
09:29:07 <flwang1> can anybody help me understand why we need a new version of autoscaler?
09:30:02 <strigazi> flwang1: these are just the new releases, no?
09:30:15 <strigazi> flwang1: I don't think we "need" them
09:30:23 <brtknr> flwang1: the current version we are running is 1.15.x
09:30:26 <strigazi> I think brtknr just wants to build them
09:30:40 <strigazi> And anyone can use them, right?
09:30:54 <brtknr> i assumed the autoscaler versions somehow reflect k8s version
09:31:57 <strigazi> They do, but for magnum there is no difference,
09:32:09 <brtknr> it is possible nothing new has been added for magnum driver
09:32:32 <strigazi> nothing new so far, Thomas from our team is working on NGs.
09:32:37 <flwang1> i know thomas is doing some work for nodegroup
09:32:46 <flwang1> but i don't think the work has been merged yet
09:33:01 <flwang1> but that's alright, i just wanna understand why we need the new versions
09:33:02 <strigazi> Let's move on? This patch is only for adding new builds
09:33:20 <flwang1> #topic Make helm jobs retry internally: https://review.opendev.org/#/c/718794/
09:33:21 <brtknr> yes
09:33:53 <flwang1> strigazi: do we need a discussion for the above one?
09:34:02 <flwang1> as long as there is a timeout, i'm ok with that
09:34:17 <strigazi> +1 ^^
09:34:20 <brtknr> flwang1: yes, retry 60 times
09:34:33 <brtknr> == timeout after 5 mins
09:34:34 <flwang1> do you mean 60s?
09:34:43 <flwang1> for each job?
09:34:47 <brtknr> yes
09:34:54 <flwang1> fair enough
09:35:03 <flwang1> let's move on
09:35:15 <flwang1> #topic tempest/automation test
09:35:36 <flwang1> at catalyst cloud, we're trying to upgrade to stable/train
09:36:02 <flwang1> and i'm realizing that the testing is getting harder and harder because we don't have a reasonable functional test suite
09:36:19 <flwang1> strigazi: brtknr: how did you guys do automation test in your org?
09:36:49 <strigazi> flwang1: For the upgrade we didn't have any automated testing. We did it by hand.
09:37:16 <brtknr> flwang1: my colleagues use refstack for the core openstack services
09:37:18 <strigazi> flwang1: we have rally running periodically and we started adding more tests to it.
09:37:21 <flwang1> internally, we have some very simple shell script which can cover dns, loadbalancer, pvc, networkpolicy, ingress, etc
09:37:27 <flwang1> run manually
09:37:48 <brtknr> for magnum, its mostly manual testing
09:37:50 <flwang1> strigazi: brtknr: i'm talking about the magnum scope
09:38:11 <strigazi> flwang1: I was referring to magnum
09:38:25 <flwang1> i'm planning to put the shell script and some yamls into a docker container
09:38:46 <strigazi> We have a list of functionalities we need to test and we do it by hand.
09:38:47 <flwang1> so that we can run it after tempest creates the cluster, what do you guys think?
09:39:15 <flwang1> strigazi: but, by hand is so complex
09:39:18 <brtknr> flwang1: which one of our CI jobs creates a cluster ATM?
09:39:50 <flwang1> brtknr: the openstack infra doesn't have nested virt, otherwise, sure, we can
09:39:53 <brtknr> i thought the functional-k8s test was currently disabled
09:40:26 <flwang1> brtknr: even if it's working, it's far away from a useful functional test
09:40:46 <flwang1> because, currently, we have so many new features
09:41:16 <strigazi> flwang1: the only reasonable thing would be conformance
09:41:27 <flwang1> conformance test is not enough
09:41:51 <strigazi> flwang1: It should be. There are 4000 tests, we run only ~200 of them
09:41:56 <flwang1> it doesn't cover many things like lb, pvc, keystone auth, rolling upgrade, etc
09:42:23 <strigazi> lb, pvc should be there
09:42:35 <dioguerra> Is there any chance we could enable nested virt?
09:42:54 <strigazi> there is nested virt enabled
09:43:06 <flwang1> dioguerra: mnaser said vexxhost has already enabled that, and we can use it
09:43:20 <flwang1> it worked before, but it's broken now
09:43:31 <strigazi> there was kernel incompatibility issue with ubuntu
09:43:44 <strigazi> this was 12 months ago
09:43:52 <flwang1> strigazi: do you know the latest status?
09:44:12 <strigazi> flwang1: no, and I abandoned the idea of the openstack CI
09:44:31 <flwang1> what do you mean abandon the idea of openstack ci?
09:44:40 <strigazi> flwang1: that it will ever work
09:44:49 <flwang1> never or ever?
09:45:01 <strigazi> flwang1: maybe it will work for a couple of weeks
09:45:17 <brtknr> flwang1: for local testing, i have a terraform script which creates a cluster with master lb with bunch of labels enabled which covers most use cases
09:46:04 <strigazi> I have a meeting in 14' mins sharp. Is there anything else for the meeting?
09:46:07 <flwang1> brtknr: we should have one we all can contribute
09:46:23 <flwang1> ok, we can discuss this later
09:46:30 <flwang1> #topic helm3
09:46:37 <brtknr> ok maybe worth contacting mnaser again
09:46:49 <flwang1> strigazi: any comments about helm3?
09:46:57 <brtknr> or guilhermesp who works on magnum and works at vexxhost
09:47:18 <strigazi> Does someone work on helm3? We need an assignee
09:47:51 <brtknr> I can give it a go after the helm_client_tag is merged
09:47:59 <strigazi> cool
09:49:02 <brtknr> Has anyone tried installing the charts we have via helm3?
09:49:22 <strigazi> it would be nice to not drop tiller and keep compatibility of the scripts.
09:49:34 <strigazi> brtknr: everything works with helm3
09:49:49 <brtknr> afaik, helm2 and helm3 can coexist on the same cluster
09:49:57 <strigazi> they can
09:50:48 <flwang1> strigazi: before you leave, can you revisit https://review.opendev.org/710384 ?
09:51:54 <flwang1> brtknr: strigazi: dioguerra: anything else?
09:52:00 <strigazi> looks ok, I need to test
09:52:30 <flwang1> strigazi: pls do, it would be appreciated
09:53:46 <flwang1> i'm going to end the meeting now
09:53:54 <strigazi> thanks flwang1
09:53:56 <flwang1> anything else we need to discuss?
09:54:04 <flwang1> strigazi: brtknr: thank you team
09:54:07 <flwang1> #endmeeting
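
The floating IP regression described under the "Private clusters" topic comes down to which scope wins when a cluster template and a cluster disagree about floating_ip_enabled. The Python sketch below illustrates that precedence only. The function and parameter names are hypothetical and this is not Magnum's code; the actual fix is the review linked in the log. It assumes an unset cluster value is represented as None.

```python
from typing import Optional


def effective_floating_ip_enabled(cluster_value: Optional[bool],
                                  template_value: bool) -> bool:
    """Pick the floating IP setting that should apply to a cluster.

    Hypothetical helper: a value set in the cluster scope takes
    precedence, and the cluster template value is only a default.
    """
    if cluster_value is not None:
        # The cluster explicitly set the flag, so it overrides the template.
        return cluster_value
    # Nothing was set on the cluster, so fall back to the template default.
    return template_value


# Reproduction case from the log: template enabled, cluster disabled.
print(effective_floating_ip_enabled(False, True))
# Cluster did not set the flag: the template default applies.
print(effective_floating_ip_enabled(None, True))
```

Reading only the template value, as the log says the nodegroups change assumed, would return True in the first call even though the cluster asked for floating IPs to be disabled.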