08:59:33 #startmeeting magnum
08:59:34 Meeting started Wed Oct 9 08:59:33 2019 UTC and is due to finish in 60 minutes. The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:59:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:59:37 The meeting name has been set to 'magnum'
08:59:43 #topic roll call
08:59:48 o/
08:59:49 o/
09:01:26 brtknr: ?
09:01:34 o/
09:02:02 #topic fcos driver
09:02:10 strigazi: do you wanna give us an update?
09:02:39 the patch is in a working state, I only want to add UTs
09:02:51 o/
09:02:52 works with 1.15.x and 1.16.x
09:03:12 strigazi: when I tested it yesterday, I was having issues with the $ssh_cmd
09:03:29 The only thing I want to check before merging is selinux
09:03:32 brtknr: works for me
09:03:44 brtknr: can you be more specific?
09:03:51 why do we use $ssh_cmd in some places and not others
09:04:06 is this the problem?
09:04:17 + ssh -F /srv/magnum/.ssh/config root@localhost openssl genrsa -out /etc/kubernetes/certs/kubelet.key
09:04:18 ssh: connect to host 127.0.0.1 port 22: Connection refused
09:04:45 sshd down?
09:04:49 this is on the worker node, after which it fails to join the cluster
09:05:07 when I tried the command manually, it worked
09:05:23 also when I restarted the heat-container-agent manually, it joined the cluster
09:05:34 I have After=sshd.service
09:05:45 I can add Requires too
09:05:52 wondering if heat-container-agent is racing against sshd
09:06:07 transient error? I never saw this after adding After=sshd.service
09:06:12 strigazi: it would be nice to have After=sshd.service
09:06:26 we do have it!!!
09:06:34 https://review.opendev.org/#/c/678458/8/magnum/drivers/k8s_fedora_coreos_v1/templates/user_data.json@75
09:06:43 then good
09:07:12 i haven't got time to test your new patch
09:07:25 hm, no, it's only in configure-agent-env.service
09:07:43 but
09:08:16 heat-container-agent.service has After=network-online.target configure-agent-env.service
09:08:27 and configure-agent-env.service has After=sshd.service
09:08:29 so it would work
09:08:39 I'll add Requires as well
09:09:07 anyway, that is a detail
09:09:12 strigazi: re UTs, i'm happy with having limited UTs for this new driver given the gate is closing
09:09:46 the important parts are two
09:10:00 one, the ignition patch is not working :(
09:10:10 ???
09:10:16 you mean my heat patch?
09:10:19 I missed testing the last PS
09:10:20 yes
09:10:29 why?
09:10:33 what's the problem?
09:10:46 https://review.opendev.org/#/c/683723/5..13/heat/engine/clients/os/nova.py@454
09:10:48 is it a regression issue?
09:11:01 /var/lib/os-collect-config/local-data must be a directory
09:11:05 not a file
09:11:33 see https://github.com/openstack/os-collect-config/blob/master/os_collect_config/local.py#L70
09:11:57 shit
09:12:07 do you have a fix?
09:12:35 not a proper one
09:12:54 if we write a file in /var/lib/os-collect-config/local-data our agent still doesn't work
09:13:12 but at least os-collect-config is not throwing an exception
09:13:26 o/ sorry I was late..
09:14:01 my fix was to just copy the file to /var/lib/cloud/data/cfn-init-data
09:14:26 I don't know why the local collector is not working
09:14:27 can we propose a patch to still use the two 'old' file paths?
09:14:49 yes, a million times
09:16:08 or the heat team can help us use the local collector, which will require us to patch our agent
09:16:52 so the action is to patch heat again with the old files?
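
For reference, a minimal sketch of the unit ordering discussed above, assuming the unit names mentioned in the log (configure-agent-env.service and heat-container-agent.service); the directives are illustrative only, not the exact contents of the driver templates:

    # configure-agent-env.service (sketch): start only after sshd is up,
    # and pull sshd in as a hard dependency so the agent cannot race it
    [Unit]
    After=sshd.service
    Requires=sshd.service

    # heat-container-agent.service (sketch): ordered after the env setup,
    # and therefore transitively after sshd as well
    [Unit]
    After=network-online.target configure-agent-env.service
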
09:16:54 ok, i will take this
09:17:08 i will try the current way first
09:17:16 FYI, the FCOS team is happy to patch ignition as well
09:17:24 I'm testing that too
09:17:36 they wrote the patch already
09:17:38 ok, let's do it in parallel
09:17:40 just in case
09:17:44 yeap
09:18:00 and the second part for the fcos driver is:
09:18:30 will we enable selinux? conformance passes (I think), I'll post the result to be sure
09:18:47 if we can enable selinux, why not?
09:19:36 I don't know how it will work with the storage plugins
09:19:57 flwang1: you can test cinder if you use it
09:20:07 not sure if cinder works with 1.16
09:20:51 strigazi: you mean test cinder with 1.16 with selinux enabled? or are they 2 different cases
09:21:00 one
09:21:08 test cinder with 1.16 with selinux enabled
09:21:21 does it work with 1.15?
09:21:31 iirc, in 1.16, k8s won't use the built-in cinder driver, right?
09:22:09 excellent :)
09:22:16 so nothing to test?
09:22:33 will we use csi?
09:22:43 i don't know, i need to confirm with the CPO team
09:22:58 i will let you guys know
09:23:04 we're getting off-topic, but leave a comment in the review for cinder
09:23:20 we go with selinux if conformance passes.
09:23:28 that's it for me
09:23:43 cool, thanks
09:25:58 next? NGs?
09:26:10 ttsiouts: ng?
09:26:26 flwang1: strigazi: sure
09:26:39 i can see there are still several patches for NG, are we still going to get them in train?
09:27:02 we need them, they are really fixes
09:27:31 flwang1: yes
09:27:59 flwang1: the NG upgrade is the main thing I wanted to talk about
09:27:59 ok
09:28:52 the WIP patch https://review.opendev.org/#/c/686733/ does what it is supposed to do
09:29:00 it upgrades the given NG
09:30:07 the user has to be really careful as to what labels to use
09:30:29 ttsiouts: ok, the user can upgrade a specific NG with the current upgrade command, right?
09:30:41 flwang1: exactly
09:30:54 flwang1: by defining --nodegroup
09:30:57 what do you mean 'what labels to use'?
09:31:52 flwang1: for example, if the availability zone label is in the cluster template, and it is different from the one set in the NG
09:32:05 flwang1: it will cause the nodes of the NG to be rebuilt
09:32:27 ttsiouts: i thought all other labels get ignored apart from kube_tag?
09:32:45 the heat_container_agent_tag can be destructive too
09:32:57 brtknr: it should be, that's why i asked
09:33:41 hmm.. I tried to remove the labels that could cause such things here: https://review.opendev.org/#/c/686733/1/magnum/drivers/k8s_fedora_atomic_v1/driver.py@121
09:34:21 flwang1: brtknr: I am not sure they are ignored. But I can test again
09:34:42 hmm... i'd like to test as well to understand it better
09:35:05 the happy path scenarios work great
09:35:18 strigazi: yes
09:35:36 corner cases like changing one of the labels ttsiouts linked in the patch can be destructive
09:35:52 or a bit rebuildy
09:36:21 yes, this is why I wanted to raise this here.
09:36:38 so we are on the same page
09:36:48 ok
09:36:50 thanks
09:38:11 anything else?
09:38:23 I would rather prevent the upgrade from taking place if the specific subset of labels does not match
09:38:25 and we need these for train
09:38:33 e.g. force those specific labels to match
09:38:39 ttsiouts: ^
09:38:48 for U yes
09:38:55 for T a bit late?
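
A hedged example of the per-nodegroup upgrade flow described above, assuming the openstack coe CLI from python-magnumclient; the cluster, template, and nodegroup names are placeholders:

    # Upgrade only the nodegroup "my-worker-ng" of cluster "mycluster"
    # to the cluster template "template-v2" (all names hypothetical).
    openstack coe cluster upgrade mycluster template-v2 --nodegroup my-worker-ng
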
09:40:45 brtknr: the logic is that the non-default NGs get upgraded to match the cluster
09:41:34 brtknr: there is a validation in the api that checks that the user provided the same CT as the cluster has (only for non-default NGs)
09:41:49 strigazi: Yes, I am happy to develop it further later... perhaps add accompanying notes for bits that are "hacks" for this release... we don't want code to silently ignore...
09:42:34 brtknr: If we have one CT, the labels cannot match all cluster NGs
09:43:26 ttsiouts: i meant matching labels during upgrades
09:43:44 you have CT-A-v1 which has AZ-A
09:44:23 then you want to upgrade a nodegroup from CT-A-v1 to CT-A-v2 which has AZ-B
09:44:40 I'd rather the upgrade wasn't allowed in this situation
09:44:50 as it might be an error on the part of the user...
09:44:53 this doesn't make sense
09:45:15 I have ng-1-az-A and ng-1-az-B
09:45:30 I want to upgrade both
09:45:52 the CT in the cluster can be only one.
09:46:19 the rule above would never allow me to upgrade one of the two NGs
09:47:03 brtknr: check here: https://review.opendev.org/#/c/686733/1/magnum/api/controllers/v1/cluster_actions.py@153
09:47:31 for U we will put it in the NG spec with details.
09:47:54 for T we can say that only kube_tag will be taken into account
09:48:03 sounds reasonable?
09:48:08 i'm happy with only kube_tag for T
09:49:10 so we add only kube_tag in https://review.opendev.org/#/c/686733 ?
09:49:16 ttsiouts: brtknr flwang1 ^^
09:49:23 i'm ok
09:49:33 this is the safest for now
09:49:50 and safest
09:49:56 yes, does it still require stripping things out?
09:50:18 or would it be better to only select kube_tag during upgrade?
09:51:15 brtknr: we could use the code that checks the validity of the versions and pass only the kube_tag
09:51:37 it might be better to use "upgradeable labels" rather than "skip these labels"
09:51:46 for clarity
09:52:19 we can expand the list of "upgradeable labels" as we are able to upgrade more things
09:52:20 we're running out of time
09:52:26 does that make sense?
09:52:49 let's keep it simple for T
09:52:56 pass only kube_tag, end of story
09:53:31 add it in reno too, and we are clear
09:53:41 sounds good
09:54:02 let's start backporting things
09:54:36 #topic backport for train
09:54:57 bfv+ng+podman+fcos
09:55:30 basically current master plus ng and fcos?
09:55:56 yes
09:56:07 flwang1: sounds good?
09:57:22 yep, good for me
09:58:19 anything else for the topic/meeting?
09:58:28 i'm good
09:58:37 actually, i'm sleepy ;)
09:58:41 that's all
09:58:48 on time
09:59:30 cool, thank you, guys
09:59:32 #endmeeting
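
To illustrate the kube_tag-only agreement above, a sketch of what a Train upgrade could look like from the user side, assuming the openstack coe CLI; the image, network, and template names are placeholders, and only the kube_tag label difference is expected to be honoured:

    # Create a new template that differs only in kube_tag (names hypothetical).
    openstack coe cluster template create template-v1.16 \
        --coe kubernetes --image fedora-coreos-30 \
        --external-network public --labels kube_tag=v1.16.0
    # Upgrade the cluster (or a single nodegroup with --nodegroup) to it.
    openstack coe cluster upgrade mycluster template-v1.16
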