09:01:00 <flwang1> #startmeeting magnum
09:01:01 <openstack> Meeting started Wed Apr 15 09:01:00 2020 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:04 <openstack> The meeting name has been set to 'magnum'
09:01:12 <brtknr> o/
09:01:12 <flwang1> #topic roll call
09:01:18 <strigazi> o/
09:01:22 <flwang1> o/
09:01:35 <flwang1> anyone else?
09:01:42 <flwang1> cosmicsound:  ttsiouts: ?
09:01:42 <brtknr> o/
09:01:52 <ttsiouts> o/
09:02:08 <flwang1> here is the agenda for today https://etherpad.opendev.org/p/magnum-weekly-meeting
09:02:31 <flwang1> #topic labels override
09:02:42 <flwang1> ttsiouts: any updates?
09:03:42 <flwang1> brtknr: ^
09:03:42 <ttsiouts> flwang1: I tried to create a PoC without DB migrations
09:03:53 <ttsiouts> I can push it today
09:04:21 <flwang1> strigazi: are you aware of the PoC's design?
09:04:51 <ttsiouts> the idea is that there is going to be a new boolean flag when creating a cluster or nodegroup
09:04:56 <brtknr> ttsiouts: without DB migration? how so?
09:04:59 <strigazi> yes, but let's see it in gerrit. It is more or less based on brtknr's idea
09:05:23 <strigazi> brtknr: flwang1: isn't it better to discuss it in gerrit?
09:05:23 <ttsiouts> brtknr: let's discuss it in gerrit
09:05:41 <brtknr> gerrit++
09:05:41 <ttsiouts> i'll push asap
09:05:47 <strigazi> i think we are on track, right ttsiouts ?
09:06:00 <brtknr> ttsiouts: will you update the spec?
09:06:09 <ttsiouts> strigazi: yes
09:06:16 <flwang1> ok, let's review it on gerrit
09:06:30 <flwang1> anything else we need to discuss about this topic?
09:06:49 <brtknr> i cant wait to try it :)
09:06:57 <ttsiouts> brtknr: sure but I wanted to see if it's possible to have something without DB migrations
09:07:16 <brtknr> ttsiouts: does adding a new field != migration?
09:07:28 <ttsiouts> no new field in the DB
09:07:37 <dioguerra> o/
09:07:45 <flwang1> let's review it on gerrit, team
09:07:46 <flwang1> #topic Private clusters got stuck at CREATE_INPROGRESS
09:07:50 <flwang1> brtknr: ^
09:07:53 <brtknr> ok
09:08:18 <brtknr> cosmicsound: found this corner case
09:08:28 <brtknr> looks like the regression was introduced in train
09:08:33 <brtknr> https://review.opendev.org/#/c/720022/ is the fix
09:09:34 <flwang1> brtknr: how to reproduce it?
09:10:06 <brtknr> it is because the nodegroups change by ttsiouts assumed floating_ip_enabled was a cluster template property whereas by the time we merged the nodegroups changes, it was possible to set floating_ip_enabled in the cluster scope
09:10:27 <brtknr> to reproduce, create a CT with floating IP enabled, then create a cluster with floating IP disabled
09:10:43 <flwang1> ok, i see
09:10:58 <brtknr> to reproduce, create CT with --floating-ip-enabled flag on then create cluster with --floating-ip-disabled
09:11:12 <flwang1> i will test it tomorrow
09:11:34 <brtknr> flwang1: you may not have noticed this in your prod if your cluster templates have --floating-ip-disabled by default
09:11:58 <flwang1> yep, that's right
09:12:07 <flwang1> anything else about this?
09:12:14 <brtknr> thats all
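[editor's note] The precedence brtknr describes above can be sketched as follows. This is a minimal illustration, not the code in the linked fix: the function name and the use of None for "not set at cluster scope" are assumptions made for the example.

```python
from typing import Optional


def effective_floating_ip_enabled(
    template_value: bool, cluster_value: Optional[bool]
) -> bool:
    """Return the floating IP setting a cluster should actually use.

    Hypothetical helper: a cluster-scope value, when given, overrides
    the cluster template default; None means "not set on the cluster".
    """
    if cluster_value is None:
        return template_value
    return cluster_value


# The reproduction case from the discussion: template on, cluster off.
print(effective_floating_ip_enabled(True, False))  # False
```

Reading the value only from the cluster template would return True for this case, which is the mismatch described in the discussion.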
09:12:17 <flwang1> #topic Fedora CoreOS docker storage https://review.opendev.org/718296
09:13:12 <flwang1> I forgot to add the docker storage for worker nodes in my previous patch, and this patch also removes the hardcoded vdb
09:13:23 <flwang1> brtknr: strigazi: pls help review it
09:13:34 <flwang1> let me know if you have any questions about this
09:14:06 <brtknr> flwang1: you also forgot to mention that it uses /etc/fstab instead of systemd unit file + ext4->xfs default in the commit message :)
09:14:50 <flwang1> ok, will do, can you please leave your comments on that patch?
09:15:29 <brtknr> flwang1: already have
09:15:51 <flwang1> brtknr: great
09:15:56 <flwang1> #topic stable/train
09:16:09 <flwang1> now our train gate is broken
09:16:38 <flwang1> and we're hoping it can be fixed by a heat patch, but that patch is also broken now :(
09:17:22 <brtknr> flwang1: http://zuul.openstack.org/status/change/718522,3 looks promising though
09:17:22 <flwang1> I tried to totally remove the py2 functional test, but seems the infra team believe there is a better way
09:17:36 <brtknr> it is running right now
09:17:49 <flwang1> hopefully
09:17:52 <flwang1> let's see
09:18:13 <flwang1> if we can get the train gate back to normal, we do have several patches to backport
09:18:33 <flwang1> that said, we may need a 9.3.1 or 9.4.0
09:19:12 <brtknr> I vote for 9.3.1
09:19:27 <brtknr> Because SEMVER :)
09:19:29 <flwang1> it's all yours ;)
09:19:32 <strigazi> whatever you prefer team
09:19:56 <flwang1> #topic Expose traefik metrics - https://review.opendev.org/#/c/697044/
09:20:16 <flwang1> dioguerra: ^
09:20:19 <brtknr> note: 9.3.0 works as long as the user is using the latest stable fcos image
09:20:20 <flwang1> what's that for?
09:20:43 <brtknr> because zincati does not have a reason to reboot the machine
09:20:46 <dioguerra> this was actually a split to divide traefik and autoscaler expose metrics
09:20:55 <cosmicsound> i am here just in case
09:20:59 <dioguerra> to be used with this patch:
09:21:01 <dioguerra> https://review.opendev.org/#/c/715142/
09:21:17 <dioguerra> I think im going to pool everything together
09:21:47 <flwang1> dioguerra: ok, got it
09:21:50 <dioguerra> Meanwhile, i would like you to take a quick glance at the ticket above
09:21:52 <flwang1> thanks for working on that
09:22:14 <dioguerra> before i finish adding stuff, to see if you want to add or remove something
09:22:32 <flwang1> dioguerra: i like the helm client tag
09:22:45 <flwang1> dioguerra: are you going to work on the helm v3?
09:22:55 <strigazi> what do we discuss now? helm_client or traefik or helm3?
09:23:33 <dioguerra> It all depends on what strigazi says i guess
09:23:41 <brtknr> I am confused too
09:23:47 <flwang1> strigazi: i don't think there is much to discuss about the helm_client tag and the traefik patch
09:24:05 <strigazi> 1. +1 to helm_client_tag
09:24:23 <dioguerra> sorry, i copied wrong link: this https://review.opendev.org/#/c/720027/
09:24:33 <flwang1> so i was just thinking aloud, sorry if it confused you guys
09:25:13 <strigazi> 2. +1  to https://review.opendev.org/#/c/697044/5/magnum/drivers/common/templates/kubernetes/fragments/enable-ingress-traefik.sh
09:26:32 <flwang1> anything else about this topic?
09:27:15 <strigazi> dtomasgu: ^^
09:27:18 <flwang1> if you think helm3 is not related to this, we can discuss it in the future
09:27:40 <dioguerra> helm and label for client is not really related to this
09:28:04 <strigazi> flwang1: helm3 is another important topic, we can discuss it when we finish the agenda
09:28:13 <flwang1> strigazi: sure
09:28:20 <strigazi> dtomasgu: are you covered for https://review.opendev.org/#/c/697044 ?
09:28:32 <brtknr> I have added helm3 to the end of the agenda
09:28:49 <flwang1> #topic Build cluster autoscaler containers: https://review.opendev.org/#/c/714986/
09:29:07 <flwang1> can anybody help me understand why we need a new version of autoscaler?
09:30:02 <strigazi> flwang1: these are just the new releases, no?
09:30:15 <strigazi> flwang1: I don't think we "need" them
09:30:23 <brtknr> flwang1: the current version we are running is 1.15.x
09:30:26 <strigazi> I think brtknr just wants to build them
09:30:40 <strigazi> And anyone can use them, right?
09:30:54 <brtknr> i assumed the autoscaler versions somehow reflect k8s version
09:31:57 <strigazi> They do, but for magnum there is no difference
09:32:09 <brtknr> it is possible nothing new has been added for magnum driver
09:32:32 <strigazi> nothing new so far, Thomas from our team is working on NGs.
09:32:37 <flwang1> i know thomas is doing some work for nodegroup
09:32:46 <flwang1> but i don't think the work has been merged yet
09:33:01 <flwang1> but that's alright, i just wanna understand why we need the new versions
09:33:02 <strigazi> Let's move on? This patch is only for adding new builds
09:33:20 <flwang1> #topic Make helm jobs retry internally: https://review.opendev.org/#/c/718794/
09:33:21 <brtknr> yes
09:33:53 <flwang1> strigazi: do we need a discussion for the above one?
09:34:02 <flwang1> as long as there is a timeout, i'm ok with that
09:34:17 <strigazi> +1 ^^
09:34:20 <brtknr> flwang1: yes, retry 60 times
09:34:33 <brtknr> == timeout after 5 mins
09:34:34 <flwang1> do you mean 60s?
09:34:43 <flwang1> for each job?
09:34:47 <brtknr> yes
09:34:54 <flwang1> fair enough
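[editor's note] The "retry 60 times == timeout after 5 mins" arithmetic above works out if each failed attempt is followed by a 5 second wait (60 x 5 s = 300 s). The 5 second interval is inferred from those two numbers, not stated in the meeting, and the function below is a hypothetical sketch, not the helm job script under review.

```python
import time
from typing import Callable


def retry(
    job: Callable[[], bool],
    attempts: int = 60,
    interval: float = 5.0,  # assumed: 60 attempts x 5 s = 300 s = 5 min
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run job until it succeeds or the attempts are used up.

    Waits `interval` seconds after every failed attempt, so the
    worst-case wait is attempts * interval.
    """
    for _ in range(attempts):
        if job():
            return True
        sleep(interval)
    return False


# Record the waits instead of really sleeping, to check the bound.
waited = []
ok = retry(lambda: False, sleep=waited.append)
print(ok, sum(waited))  # False 300.0
```

Bounding the loop by attempt count gives the timeout flwang1 asked for, so a job that never succeeds still ends after a fixed worst-case wait.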
09:35:03 <flwang1> let's move on
09:35:15 <flwang1> #topic tempest/automation test
09:35:36 <flwang1> at catalyst cloud, we're trying to upgrade to stable/train
09:36:02 <flwang1> and i'm realizing that the testing is getting harder and harder because we don't have a reasonable functional test suite
09:36:19 <flwang1> strigazi: brtknr: how did you guys do automation test in your org?
09:36:49 <strigazi> flwang1: For the upgrade we didn't have any automated testing. We did it by hand.
09:37:16 <brtknr> flwang1: my colleagues use refstack for the core openstack services
09:37:18 <strigazi> flwang1: we have rally running periodically and we started adding more tests to it.
09:37:21 <flwang1> internally, we have some very simple shell script which can cover dns, loadbalancer, pvc, networkpolicy, ingress, etc
09:37:27 <flwang1> run manually
09:37:48 <brtknr> for magnum, its mostly manual testing
09:37:50 <flwang1> strigazi: brtknr: i'm talking about the magnum scope
09:38:11 <strigazi> flwang1: I was referring to magnum
09:38:25 <flwang1> i'm planning to put the shell script and some yamls into a docker container
09:38:46 <strigazi> We have a list of functionalities we need to test and we do it by hand.
09:38:47 <flwang1> so that we can run it after tempest creates the cluster, what do you guys think?
09:39:15 <flwang1> strigazi: but, by hand is so complex
09:39:18 <brtknr> flwang1: which one of our CI jobs creates a cluster ATM?
09:39:50 <flwang1> brtknr: the openstack infra doesn't have nested virt, otherwise, sure, we can
09:39:53 <brtknr> i thought the functional-k8s test was currently disabled
09:40:26 <flwang1> brtknr: even if it's working, it's far from a useful functional test
09:40:46 <flwang1> because, currently, we have so many new features
09:41:16 <strigazi> flwang1: the only reasonable thing would be conformance
09:41:27 <flwang1> conformance test is not enough
09:41:51 <strigazi> flwang1: It should be. There are 4000 tests, we run only ~200 of them
09:41:56 <flwang1> it doesn't cover many things like lb, pvc, keystone auth, rolling upgrade, etc
09:42:23 <strigazi> lb, pvc should be there
09:42:35 <dioguerra> Is there any chance we could enable nested virt?
09:42:54 <strigazi> there is nested virt enabled
09:43:06 <flwang1> dioguerra: mnaser said vexxhost has already enabled that, and we can use it
09:43:20 <flwang1> it worked before, but it's broken now
09:43:31 <strigazi> there was a kernel incompatibility issue with ubuntu
09:43:44 <strigazi> this was 12 months ago
09:43:52 <flwang1> strigazi: do you know the latest status?
09:44:12 <strigazi> flwang1: no, and I abandoned the idea of the openstack CI
09:44:31 <flwang1> what do you mean abandon the idea of openstack ci?
09:44:40 <strigazi> flwang1: that it will ever work
09:44:49 <flwang1> never or ever?
09:45:01 <strigazi> flwang1: maybe it will work for a couple of weeks
09:45:17 <brtknr> flwang1: for local testing, i have a terraform script which creates a cluster with master lb with bunch of labels enabled which covers most use cases
09:46:04 <strigazi> I have a meeting in 14 mins sharp. Is there anything else for the meeting?
09:46:07 <flwang1> brtknr: we should have one we all can contribute to
09:46:23 <flwang1> ok, we can discuss this later
09:46:30 <flwang1> #topic helm3
09:46:37 <brtknr> ok maybe worth contacting mnaser again
09:46:49 <flwang1> strigazi: any comments about helm3?
09:46:57 <brtknr> or guilhermesp who works on magnum and works at vexxhost
09:47:18 <strigazi> Does someone work on helm3? We need an assignee
09:47:51 <brtknr> I can give it a go after the helm_client_tag is merged
09:47:59 <strigazi> cool
09:49:02 <brtknr> Has anyone tried installing the charts we have via helm3?
09:49:22 <strigazi> it would be nice to not drop tiller and keep compatibility of the scripts.
09:49:34 <strigazi> brtknr: everything works with helm3
09:49:49 <brtknr> afaik, helm2 and helm3 can coexist on the same cluster
09:49:57 <strigazi> they can
09:50:48 <flwang1> strigazi: before you leave, can you revisit https://review.opendev.org/710384 ?
09:51:54 <flwang1> brtknr: strigazi: dioguerra: anything else?
09:52:00 <strigazi> looks ok, I need to test
09:52:30 <flwang1> strigazi: pls do, it would be appreciated
09:53:46 <flwang1> i'm going to end the meeting now
09:53:54 <strigazi> thanks flwang1
09:53:56 <flwang1> anything else we need to discuss?
09:54:04 <flwang1> strigazi: brtknr: thank you team
09:54:07 <flwang1> #endmeeting