08:59:33 <flwang1> #startmeeting magnum
08:59:34 <openstack> Meeting started Wed Oct  9 08:59:33 2019 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:59:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:59:37 <openstack> The meeting name has been set to 'magnum'
08:59:43 <flwang1> #topic roll call
08:59:48 <flwang1> o/
08:59:49 <strigazi> o/
09:01:26 <flwang1> brtknr: ?
09:01:34 <brtknr> oo/
09:02:02 <flwang1> #topic fcos driver
09:02:10 <flwang1> strigazi: do you wanna give us an update?
09:02:39 <strigazi> the patch is in working state, I want to only add UTs
09:02:51 <jakeyip> o/
09:02:52 <strigazi> works with 1.15.x and 1.16.x
09:03:12 <brtknr> strigazi: when I tested it yesterday, I was having issues with the $ssh_cmd
09:03:29 <strigazi> The only thing I want to check before merging is selinux
09:03:32 <strigazi> brtknr: works for me
09:03:44 <strigazi> brtknr: can you be more specific?
09:03:51 <brtknr> why do we use $ssh_cmd in some places and not others
09:04:06 <strigazi> is this the problem?
09:04:17 <brtknr> + ssh -F /srv/magnum/.ssh/config root@localhost openssl genrsa -out /etc/kubernetes/certs/kubelet.key
09:04:18 <brtknr> ssh: connect to host 127.0.0.1 port 22: Connection refused
09:04:45 <strigazi> sshd down?
09:04:49 <brtknr> this is on the worker node, after which it fails to join the cluster
09:05:07 <brtknr> when I tried the command manually, it worked
09:05:23 <brtknr> also when i restarted the heat-container-agent manually, it joined the cluster
09:05:34 <strigazi> I have After=sshd.service
09:05:45 <strigazi> I can add requires too
09:05:52 <brtknr> wondering if heat-container-agent is racing against sshd
09:06:07 <strigazi> transient error? I never saw this after adding After=sshd.service
09:06:12 <flwang1> strigazi: it would be nice to have After=sshd.service
09:06:26 <strigazi> we do have it!!!
09:06:34 <strigazi> https://review.opendev.org/#/c/678458/8/magnum/drivers/k8s_fedora_coreos_v1/templates/user_data.json@75
09:06:43 <flwang1> then good
09:07:12 <flwang1> i haven't got time to test your new patch
09:07:25 <strigazi> hm, no, it's only in configure-agent-env.service
09:07:43 <strigazi> but
09:08:16 <strigazi> heat-container-agent.service has After=network-online.target configure-agent-env.service
09:08:27 <strigazi> and configure-agent-env.service has After=sshd.service
09:08:29 <flwang1> so it would work
09:08:39 <strigazi> I'll add Requires as well
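For context on the distinction being made here: systemd's After= only orders two units when both are already scheduled to start; it does not pull sshd.service in or fail if it is absent. Requires= adds that hard dependency (Wants= is the softer variant). A minimal sketch of the unit stanzas under discussion, assuming the unit names from the patch linked above rather than its exact contents:

    # configure-agent-env.service -- illustrative sketch, not the merged patch
    [Unit]
    After=sshd.service
    Requires=sshd.service

    # heat-container-agent.service picks up the sshd ordering transitively
    [Unit]
    After=network-online.target configure-agent-env.service
    Requires=configure-agent-env.service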
09:09:07 <strigazi> anyway, that is a detail
09:09:12 <flwang1> strigazi: re UT, i'm happy with having limited UT for this new driver given the gate is closing
09:09:46 <strigazi> there are two important parts
09:10:00 <strigazi> one, the ignition patch is not working :(
09:10:10 <flwang1> ???
09:10:16 <flwang1> you mean my heat patch?
09:10:19 <strigazi> I missed testing the last PS
09:10:20 <strigazi> yes
09:10:29 <flwang1> why?
09:10:33 <flwang1> what's the problem?
09:10:46 <strigazi> https://review.opendev.org/#/c/683723/5..13/heat/engine/clients/os/nova.py@454
09:10:48 <flwang1> is it a regression issue?
09:11:01 <strigazi> /var/lib/os-collect-config/local-data must be a directory
09:11:05 <strigazi> not a file
09:11:33 <strigazi> see https://github.com/openstack/os-collect-config/blob/master/os_collect_config/local.py#L70
09:11:57 <flwang1> shit
09:12:07 <flwang1> do you have a fix ?
09:12:35 <strigazi> not a proper one
09:12:54 <strigazi> if we write a file in /var/lib/os-collect-config/local-data, our agent still doesn't work
09:13:12 <strigazi> but at least os-collect-config is not throwing an exception
09:13:26 <ttsiouts> o/ sorry I was late..
09:14:01 <strigazi> my fix was to just copy the file to /var/lib/cloud/data/cfn-init-data
09:14:26 <strigazi> I don't know why the local collector is not working
09:14:27 <flwang1> can we propose a patch to still use the two 'old' file paths?
09:14:49 <strigazi> yes, a million times
09:16:08 <strigazi> or the heat team can help us use the local collector, which would require us to patch our agent
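The collector linked above treats /var/lib/os-collect-config/local-data as a directory of JSON files, which is why writing a single file at that path makes os-collect-config throw. A rough Python approximation of that behaviour (a simplified sketch for illustration, not the actual os-collect-config code):

    import json
    import os

    LOCAL_DATA_DIR = '/var/lib/os-collect-config/local-data'

    def collect_local():
        # The local collector expects a *directory* and iterates the
        # JSON files inside it; a plain file at this path fails instead.
        if not os.path.isdir(LOCAL_DATA_DIR):
            raise RuntimeError('%s is not a directory' % LOCAL_DATA_DIR)
        collected = []
        for name in sorted(os.listdir(LOCAL_DATA_DIR)):
            with open(os.path.join(LOCAL_DATA_DIR, name)) as f:
                collected.append((name, json.load(f)))
        return collected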
09:16:52 <strigazi> so the action is to patch heat again with the old files?
09:16:54 <flwang1> ok, i will take this
09:17:08 <flwang1> i will try the current way first
09:17:16 <strigazi> FYI, FCOS team is happy to patch ignition as well
09:17:24 <strigazi> I'm testing that too
09:17:36 <strigazi> they wrote the patch already
09:17:38 <flwang1> ok, let's do it in parallel
09:17:40 <flwang1> just in case
09:17:44 <strigazi> yeap
09:18:00 <strigazi> and the second part for fcos driver is:
09:18:30 <strigazi> will we enable selinux? conformance passes (I think), I'll post the result to be sure
09:18:47 <flwang1> if we can enable selinux, why not?
09:19:36 <strigazi> I don't know how it will work with the storage plugins
09:19:57 <strigazi> flwang1: you can test cinder if you use it
09:20:07 <strigazi> not sure if cinder works with 1.16
09:20:51 <flwang1> strigazi: you mean test cinder with 1.16 with selinux enabled? or are they 2 different cases?
09:21:00 <strigazi> one
09:21:08 <strigazi> test cinder with 1.16 with selinux enabled
09:21:21 <strigazi> does it work with 1.15?
09:21:31 <flwang1> iirc, in 1.16, k8s won't use the built-in cinder driver, right?
09:22:09 <strigazi> excellent :)
09:22:16 <strigazi> so nothing to test?
09:22:33 <strigazi> will we use csi?
09:22:43 <flwang1> i don't know, i need to confirm with CPO team
09:22:58 <flwang1> i will let you guys know
09:23:04 <strigazi> we're getting off-topic, but leave a comment in the review for cinder
09:23:20 <strigazi> we go with selinux if conformance passes.
09:23:28 <strigazi> that's it for me
09:23:43 <flwang1> cool, thanks
09:25:58 <strigazi> next? NGs?
09:26:10 <flwang1> ttsiouts: ng?
09:26:26 <ttsiouts> flwang1: strigazi: sure
09:26:39 <flwang1> i can see there are still several patches for NG, are we still going to get them in train?
09:27:02 <strigazi> we need them, they are fixes really
09:27:31 <ttsiouts> flwang1: yes
09:27:59 <ttsiouts> flwang1: the NG upgrade is the main thing I wanted to talk about
09:27:59 <flwang1> ok
09:28:52 <ttsiouts> the WIP patch https://review.opendev.org/#/c/686733/ does what it is supposed to do
09:29:00 <ttsiouts> it upgrades the given NG
09:30:07 <ttsiouts> the user has to be really careful as to what labels to use
09:30:29 <flwang1> ttsiouts: ok, user can upgrade a specific ng with the current upgrade command, right?
09:30:41 <ttsiouts> flwang1: exactly
09:30:54 <ttsiouts> flwang1: by defining --nodegroup <nodegroup>
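Presumably the invocation looks like the following (hypothetical cluster and nodegroup names; the --nodegroup flag is the one being added by the NG patches under review, so the exact syntax may differ):

    # Upgrade a single nodegroup instead of the whole cluster
    openstack coe cluster upgrade mycluster new-template --nodegroup my-worker-ng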
09:30:57 <flwang1> what do you mean 'what labels to use'?
09:31:52 <ttsiouts> flwang1: for example if the availability zone label is in the cluster template, and it is different than the one set in the NG
09:32:05 <ttsiouts> flwang1: it will cause the nodes of the NG to be rebuilt
09:32:27 <brtknr> ttsiouts: i thought all other labels get ignored apart from kube_tag?
09:32:45 <strigazi> the heat_container_agent_tag can be destructive too
09:32:57 <flwang1> brtknr: it should be, that's why i asked
09:33:41 <ttsiouts> hmm.. I tried to remove the labels that could cause such things here: https://review.opendev.org/#/c/686733/1/magnum/drivers/k8s_fedora_atomic_v1/driver.py@121
09:34:21 <ttsiouts> flwang1: brtknr: I am not sure they are ignored. But I can test again
09:34:42 <flwang1> hmm... i'd like to test as well to understand it better
09:35:05 <strigazi> the happy path scenarios work great
09:35:18 <ttsiouts> strigazi: yes
09:35:36 <strigazi> corner cases like changing one of the labels ttsiouts linked in the patch can be destructive
09:35:52 <strigazi> or a bit rebuildy
09:36:21 <ttsiouts> yes this is why I wanted to raise this here.
09:36:38 <ttsiouts> so we are on the same page
09:36:48 <flwang1> ok
09:36:50 <flwang1> thanks
09:38:11 <flwang1> anything else?
09:38:23 <brtknr> I would rather prevent the upgrade from taking place if the specific subset of labels does not match
09:38:25 <strigazi> and we need these for train
09:38:33 <brtknr> e.g. force those specific labels to match
09:38:39 <brtknr> ttsiouts: ^
09:38:48 <strigazi> for U yes
09:38:55 <strigazi> for T a bit late?
09:40:45 <ttsiouts> brtknr: the logic is that the non-default NGs get upgraded to match the cluster
09:41:34 <ttsiouts> brtknr: there is a validation in the api that checks that the user provided the same CT as the cluster has (only for non-default NGs)
09:41:49 <brtknr> strigazi: Yes, I am happy to develop it further later... perhaps add accompanying notes for bits that are "hacks" for this release... we don't want code to silently ignore...
09:42:34 <ttsiouts> brtknr: If we have one CT, the labels cannot match all cluster NGs
09:43:26 <brtknr> ttsiouts: i meant matching labels during upgrades
09:43:44 <brtknr> you have CT-A-v1 which has AZ-A
09:44:23 <brtknr> then you want to upgrade a nodegroup from CT-A-v1 to CT-A-v2, which has AZ-B
09:44:40 <brtknr> I'd rather the upgrade wasn't allowed in this situation
09:44:50 <brtknr> as it might be an error on the part of the user...
09:44:53 <strigazi> this doesn't make sense
09:45:15 <strigazi> I have ng-1-az-A and ng-1-az-B
09:45:30 <strigazi> I want to upgrade both
09:45:52 <strigazi> the CT in the cluster can be only one.
09:46:19 <strigazi> the rule above would never allow me to upgrade one of the two NGs
09:47:03 <ttsiouts> brtknr: check here: https://review.opendev.org/#/c/686733/1/magnum/api/controllers/v1/cluster_actions.py@153
09:47:31 <strigazi> for U we will put it in the NG spec with details.
09:47:54 <strigazi> for T we can say that only kube_tag will be taken into account
09:48:03 <strigazi> sounds reasonable?
09:48:08 <brtknr> I'm happy with only kube_tag for T
09:49:10 <strigazi> so we add only kube_tag in https://review.opendev.org/#/c/686733 ?
09:49:16 <strigazi> ttsiouts: brtknr flwang1 ^^
09:49:23 <flwang1> i'm ok
09:49:33 <ttsiouts> this is the safest for now
09:49:46 <strigazi> and safest
09:49:56 <brtknr> yes, does it still require stripping things out?
09:50:18 <brtknr> or would it be better to only select kube_tag during upgrade?
09:51:15 <ttsiouts> brtknr: we could use the code that checks the validity of the versions and pass only the kube_tag
09:51:16 <brtknr> it might be better to use "upgradeable labels" rather than "skip these labels"
09:51:46 <brtknr> for clarity
09:52:19 <brtknr> we can expand the list of "upgradeable labels" as we are able to upgrade more things
09:52:20 <flwang1> we're running out of time
09:52:26 <brtknr> does that make sense?
09:52:49 <strigazi> let's keep it simple for T
09:52:56 <strigazi> pass only kube_tag, end of story
09:53:31 <strigazi> add it in reno too, and we are clear
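To make the agreed behaviour concrete, a hypothetical Python sketch of the "only kube_tag" rule (names and structure are assumptions for illustration, not Magnum's actual driver code): during an upgrade, only kube_tag is taken from the new template's labels, so a differing availability zone or heat_container_agent_tag cannot trigger a rebuild.

    # Illustrative helper for the rule agreed above; not Magnum code.
    UPGRADEABLE_LABELS = {'kube_tag'}

    def labels_for_upgrade(current_labels, new_template_labels):
        """Keep current labels, honouring only the upgradeable ones."""
        merged = dict(current_labels)
        for key in UPGRADEABLE_LABELS & set(new_template_labels):
            merged[key] = new_template_labels[key]
        return merged

    ng = {'kube_tag': 'v1.15.4', 'availability_zone': 'az-A'}
    ct = {'kube_tag': 'v1.16.1', 'availability_zone': 'az-B'}
    # Only kube_tag changes; the AZ stays, so nodes are not rebuilt.
    assert labels_for_upgrade(ng, ct) == {'kube_tag': 'v1.16.1',
                                          'availability_zone': 'az-A'}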
09:53:41 <brtknr> sounds good
09:54:02 <brtknr> let's start backporting things
09:54:36 <flwang1> #topic backport for train
09:54:57 <brtknr> bfv+ng+podman+fcos
09:55:30 <strigazi> basically current master plus ng and fcos?
09:55:56 <brtknr> yes
09:56:07 <strigazi> flwang1: sounds good?
09:57:22 <flwang1> yep, good for me
09:58:19 <strigazi> anything else for the topic/meeting?
09:58:28 <flwang1> i'm good
09:58:37 <flwang1> actually, i'm sleepy ;)
09:58:41 <brtknr> that's all
09:58:48 <strigazi> on time
09:59:30 <flwang1> cool, thank you, guys
09:59:32 <flwang1> #endmeeting