08:59:33 <flwang1> #startmeeting magnum
08:59:34 <openstack> Meeting started Wed Oct 9 08:59:33 2019 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:59:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:59:37 <openstack> The meeting name has been set to 'magnum'
08:59:43 <flwang1> #topic roll call
08:59:48 <flwang1> o/
08:59:49 <strigazi> o/
09:01:26 <flwang1> brtknr: ?
09:01:34 <brtknr> o/
09:02:02 <flwang1> #topic fcos driver
09:02:10 <flwang1> strigazi: do you wanna give us an update?
09:02:39 <strigazi> the patch is in a working state, I only want to add UTs
09:02:51 <jakeyip> o/
09:02:52 <strigazi> works with 1.15.x and 1.16.x
09:03:12 <brtknr> strigazi: when I tested it yesterday, I was having issues with the $ssh_cmd
09:03:29 <strigazi> The only thing I want to check before merging is selinux
09:03:32 <strigazi> brtknr: works for me
09:03:44 <strigazi> brtknr: can you be more specific?
09:03:51 <brtknr> why do we use $ssh_cmd in some places and not others?
09:04:06 <strigazi> is this the problem?
09:04:17 <brtknr> + ssh -F /srv/magnum/.ssh/config root@localhost openssl genrsa -out /etc/kubernetes/certs/kubelet.key
09:04:18 <brtknr> ssh: connect to host 127.0.0.1 port 22: Connection refused
09:04:45 <strigazi> sshd down?
09:04:49 <brtknr> this is on the worker node, after which it fails to join the cluster
09:05:07 <brtknr> when I tried the command manually, it worked
09:05:23 <brtknr> also, when I restarted the heat-container-agent manually, it joined the cluster
09:05:34 <strigazi> I have After=sshd.service
09:05:45 <strigazi> I can add Requires too
09:05:52 <brtknr> wondering if heat-container-agent is racing against sshd
09:06:07 <strigazi> transient error? I never saw this after adding After=sshd.service
09:06:12 <flwang1> strigazi: it would be nice to have After=sshd.service
09:06:26 <strigazi> we do have it!!!
09:06:34 <strigazi> https://review.opendev.org/#/c/678458/8/magnum/drivers/k8s_fedora_coreos_v1/templates/user_data.json@75
09:06:43 <flwang1> then good
09:07:12 <flwang1> I haven't had time to test your new patch
09:07:25 <strigazi> hm, no, it's only in configure-agent-env.service
09:07:43 <strigazi> but
09:08:16 <strigazi> heat-container-agent.service has After=network-online.target configure-agent-env.service
09:08:27 <strigazi> and configure-agent-env.service has After=sshd.service
09:08:29 <flwang1> so it would work
09:08:39 <strigazi> I'll add Requires as well
09:09:07 <strigazi> anyway, that is a detail
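To make the ordering chain above concrete, here is a sketch of the relevant unit directives; the actual units live in the patch's user_data.json and may differ. After= only orders startup, while Requires= additionally keeps the dependent unit from running if sshd.service fails, which is what the proposed addition buys:

```ini
# Sketch only, based on the discussion above; see the fcos driver's
# user_data.json (https://review.opendev.org/#/c/678458/) for the real units.

# configure-agent-env.service
[Unit]
After=sshd.service
# Proposed addition: with Requires=, this unit does not start at all
# if sshd.service fails, instead of merely being ordered after it.
Requires=sshd.service

# heat-container-agent.service: ordered after the env setup, and hence
# transitively after sshd, which should close the race brtknr hit.
[Unit]
After=network-online.target configure-agent-env.service
```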
09:09:12 <flwang1> strigazi: re UTs, I'm happy with having limited UTs for this new driver given the gate is closing
09:09:46 <strigazi> there are two important parts
09:10:00 <strigazi> one, the ignition patch is not working :(
09:10:10 <flwang1> ???
09:10:16 <flwang1> you mean my heat patch?
09:10:19 <strigazi> I missed testing the last PS
09:10:20 <strigazi> yes
09:10:29 <flwang1> why?
09:10:33 <flwang1> what's the problem?
09:10:46 <strigazi> https://review.opendev.org/#/c/683723/5..13/heat/engine/clients/os/nova.py@454
09:10:48 <flwang1> is it a regression issue?
09:11:01 <strigazi> /var/lib/os-collect-config/local-data must be a directory
09:11:05 <strigazi> not a file
09:11:33 <strigazi> see https://github.com/openstack/os-collect-config/blob/master/os_collect_config/local.py#L70
09:11:57 <flwang1> shit
09:12:07 <flwang1> do you have a fix?
09:12:35 <strigazi> not a proper one
09:12:54 <strigazi> if we write a file in /var/lib/os-collect-config/local-data, our agent still doesn't work
09:13:12 <strigazi> but at least os-collect-config is not throwing an exception
09:13:26 <ttsiouts> o/ sorry I was late..
09:14:01 <strigazi> my fix was to just copy the file to /var/lib/cloud/data/cfn-init-data
09:14:26 <strigazi> I don't know why the local collector is not working
09:14:27 <flwang1> can we propose a patch to still use the two 'old' file paths?
09:14:49 <strigazi> yes, a million times
09:16:08 <strigazi> or the heat team can help us use the local collector, which will require us to patch our agent
09:16:52 <strigazi> so the action is to patch heat again with the old files?
09:16:54 <flwang1> ok, I will take this
09:17:08 <flwang1> I will try the current way first
09:17:16 <strigazi> FYI, the FCOS team is happy to patch ignition as well
09:17:24 <strigazi> I'm testing that too
09:17:36 <strigazi> they wrote the patch already
09:17:38 <flwang1> ok, let's do it in parallel
09:17:40 <flwang1> just in case
09:17:44 <strigazi> yeap
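For reference, a sketch of the interim workaround strigazi describes, with paths taken from the discussion above; the proper fix, in heat or in Ignition, is still open. The local collector iterates over /var/lib/os-collect-config/local-data as a directory of JSON files, so metadata written there as a single file makes it throw:

```sh
# os-collect-config's local collector expects a *directory* here
# (os_collect_config/local.py#L70), not a single file. Until heat or
# Ignition is fixed, hand the metadata to the cfn collector's legacy
# path instead, which is what the copy workaround above does.
mkdir -p /var/lib/cloud/data
cp /var/lib/os-collect-config/local-data /var/lib/cloud/data/cfn-init-data
```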
09:18:00 <strigazi> and the second part for the fcos driver is:
09:18:30 <strigazi> will we enable selinux? conformance passes (I think), I'll post the result to be sure
09:18:47 <flwang1> if we can enable selinux, why not?
09:19:36 <strigazi> I don't know how it will work with the storage plugins
09:19:57 <strigazi> flwang1: you can test cinder if you use it
09:20:07 <strigazi> not sure if cinder works with 1.16
09:20:51 <flwang1> strigazi: you mean test cinder with 1.16 with selinux enabled? or are they 2 different cases?
09:21:00 <strigazi> one
09:21:08 <strigazi> test cinder with 1.16 with selinux enabled
09:21:21 <strigazi> does it work with 1.15?
09:21:31 <flwang1> iirc, in 1.16, k8s won't use the built-in cinder driver, right?
09:22:09 <strigazi> excellent :)
09:22:16 <strigazi> so nothing to test?
09:22:33 <strigazi> will we use csi?
09:22:43 <flwang1> I don't know, I need to confirm with the CPO team
09:22:58 <flwang1> I will let you guys know
09:23:04 <strigazi> we're getting off-topic, but leave a comment in the review for cinder
09:23:20 <strigazi> we go with selinux if conformance passes.
09:23:28 <strigazi> that's it for me
09:23:43 <flwang1> cool, thanks
09:25:58 <strigazi> next? NGs?
09:26:10 <flwang1> ttsiouts: ng?
09:26:26 <ttsiouts> flwang1: strigazi: sure
09:26:39 <flwang1> I can see there are still several patches for NG, are we still going to get them in train?
09:27:02 <strigazi> we need them, they are fixes really
09:27:31 <ttsiouts> flwang1: yes
09:27:59 <ttsiouts> flwang1: the NG upgrade is the main thing I wanted to talk about
09:27:59 <flwang1> ok
09:28:52 <ttsiouts> the WIP patch https://review.opendev.org/#/c/686733/ does what it is supposed to do
09:29:00 <ttsiouts> it upgrades the given NG
09:30:07 <ttsiouts> the user has to be really careful as to what labels to use
09:30:29 <flwang1> ttsiouts: ok, the user can upgrade a specific NG with the current upgrade command, right?
09:30:41 <ttsiouts> flwang1: exactly
09:30:54 <ttsiouts> flwang1: by defining --nodegroup <nodegroup>
09:30:57 <flwang1> what do you mean 'what labels to use'?
09:31:52 <ttsiouts> flwang1: for example, if the availability zone label is in the cluster template and it is different from the one set in the NG
09:32:05 <ttsiouts> flwang1: it will cause the nodes of the NG to be rebuilt
09:32:27 <brtknr> ttsiouts: I thought all other labels get ignored apart from kube_tag?
09:32:45 <strigazi> the heat_container_agent_tag can be destructive too
09:32:57 <flwang1> brtknr: it should be, that's why I asked
09:33:41 <ttsiouts> hmm.. I tried to remove the labels that could cause such things here: https://review.opendev.org/#/c/686733/1/magnum/drivers/k8s_fedora_atomic_v1/driver.py@121
09:34:21 <ttsiouts> flwang1: brtknr: I am not sure they are ignored. But I can test again
09:34:42 <flwang1> hmm... I'd like to test as well to understand it better
09:35:05 <strigazi> the happy path scenarios work great
09:35:18 <ttsiouts> strigazi: yes
09:35:36 <strigazi> corner cases like changing one of the labels ttsiouts linked in the patch can be destructive
09:35:52 <strigazi> or a bit rebuildy
09:36:21 <ttsiouts> yes, this is why I wanted to raise this here.
09:36:38 <ttsiouts> so we are on the same page
09:36:48 <flwang1> ok
09:36:50 <flwang1> thanks
09:38:11 <flwang1> anything else?
09:38:23 <brtknr> I would rather prevent the upgrade from taking place if the specific subset of labels does not match
09:38:28 <strigazi> and we need these for train
09:38:33 <brtknr> e.g. force those specific labels to match
09:38:39 <brtknr> ttsiouts: ^
09:38:48 <strigazi> for U yes
09:38:55 <strigazi> for T, a bit late?
09:40:45 <ttsiouts> brtknr: the logic is that the non-default NGs get upgraded to match the cluster
09:41:34 <ttsiouts> brtknr: there is a validation in the API that checks that the user provided the same CT as the cluster has (only for non-default NGs)
09:41:49 <brtknr> strigazi: Yes, I am happy to develop it further later... perhaps add accompanying notes for bits that are "hacks" for this release... we don't want code to silently ignore things...
09:42:34 <ttsiouts> brtknr: If we have one CT, the labels cannot match all cluster NGs
09:43:26 <brtknr> ttsiouts: I meant matching labels during upgrades
09:43:44 <brtknr> you have CT-A-v1 which has AZ-A
09:44:23 <brtknr> then you want to upgrade the nodegroup from CT-A-v1 to CT-A-v2, which has AZ-B
09:44:40 <brtknr> I'd rather the upgrade wasn't allowed in this situation
09:44:50 <brtknr> as it might be an error on the part of the user...
09:44:53 <strigazi> this doesn't make sense
09:45:15 <strigazi> I have ng-1-az-A and ng-1-az-B
09:45:41 <strigazi> I want to upgrade both
09:45:52 <strigazi> there can be only one CT in the cluster.
09:46:19 <strigazi> the rule above would never allow me to upgrade one of the two NGs
09:47:03 <ttsiouts> brtknr: check here: https://review.opendev.org/#/c/686733/1/magnum/api/controllers/v1/cluster_actions.py@153
09:47:31 <strigazi> for U we will put it in the NG spec with details.
09:47:54 <strigazi> for T we can say that only kube_tag will be taken into account
09:48:03 <strigazi> sounds reasonable?
09:48:08 <brtknr> I'm happy with only kube_tag for T
09:49:10 <strigazi> so we add only kube_tag in https://review.opendev.org/#/c/686733 ?
09:49:16 <strigazi> ttsiouts: brtknr flwang1 ^^
09:49:23 <flwang1> I'm ok
09:49:33 <ttsiouts> this is the safest for now
09:49:50 <strigazi> and safest
09:49:56 <brtknr> yes, does it still require stripping things out?
09:50:18 <brtknr> or would it be better to only select kube_tag during upgrade?
09:51:15 <ttsiouts> brtknr: we could use the code that checks the validity of the versions and pass only the kube_tag
09:51:37 <brtknr> it might be better to use "upgradeable labels" rather than "skip these labels"
09:51:46 <brtknr> for clarity
09:52:19 <brtknr> we can expand the list of "upgradeable labels" as we are able to upgrade more things
09:52:20 <flwang1> we're running out of time
09:52:26 <brtknr> does that make sense?
09:52:49 <strigazi> let's keep it simple for T
09:53:00 <strigazi> pass only kube_tag, end of story
09:53:31 <strigazi> add it in reno too, and we are clear
09:53:41 <brtknr> sounds good
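To pin down the decision, here is a minimal Python sketch of the "only kube_tag" rule agreed for Train, following brtknr's "upgradeable labels" framing. This is illustrative, not the actual magnum code; the function name and structure are assumptions:

```python
# Labels that a new cluster template is allowed to change during a
# nodegroup upgrade. For Train this is only kube_tag; the set can be
# expanded in the U cycle as more labels become safely upgradeable.
UPGRADEABLE_LABELS = {"kube_tag"}


def labels_for_upgrade(nodegroup_labels, template_labels):
    """Keep the nodegroup's labels, updating only the upgradeable ones.

    A differing availability_zone (or heat_container_agent_tag) in the
    new cluster template is ignored, so it cannot trigger a rebuild of
    the nodegroup's nodes.
    """
    upgraded = dict(nodegroup_labels)
    for key in UPGRADEABLE_LABELS & set(template_labels):
        upgraded[key] = template_labels[key]
    return upgraded
```

Under the WIP patch, a single nodegroup would then be upgraded with something like `openstack coe cluster upgrade --nodegroup <nodegroup> <cluster> <new-cluster-template>`; the exact client syntax may differ from this sketch.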
09:54:02 <brtknr> let's start backporting things
09:54:36 <flwang1> #topic backport for train
09:54:57 <brtknr> bfv+ng+podman+fcos
09:55:30 <strigazi> basically current master plus ng and fcos?
09:55:56 <brtknr> yes
09:56:07 <strigazi> flwang1: sounds good?
09:57:22 <flwang1> yep, good for me
09:58:19 <strigazi> anything else for the topic/meeting?
09:58:28 <flwang1> I'm good
09:58:37 <flwang1> actually, I'm sleepy ;)
09:58:41 <brtknr> that's all
09:58:48 <strigazi> on time
09:59:30 <flwang1> cool, thank you, guys
09:59:32 <flwang1> #endmeeting
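As a postscript to the backport agreement above, the standard OpenStack stable-branch workflow would look roughly like the following; the branch topic name and commit SHA are placeholders:

```sh
# Cherry-pick a merged master change onto stable/train and submit it to
# Gerrit with git-review; -x records the original commit SHA in the message.
git fetch origin
git checkout -b train-backports origin/stable/train
git cherry-pick -x <master-commit-sha>
git review stable/train
```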