21:04:18 <strigazi> #startmeeting containers
21:04:18 <openstack> Meeting started Tue Apr 9 21:04:18 2019 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:04:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:04:21 <openstack> The meeting name has been set to 'containers'
21:04:21 <strigazi> #topic Roll Call
21:04:30 <strigazi> o/
21:04:33 <colin-> hello
21:04:36 <ttsiouts> o/
21:06:10 <strigazi> #topic Stories/Tasks
21:06:11 <imdigitaljim> o/
21:06:25 <brtknr> o/
21:06:46 <strigazi> Last week I attempted to upgrade the default version of k8s to 1.14.0 but calico v2 wasn't passing
21:07:01 <strigazi> wasn't passing the conformance test
21:07:13 <strigazi> I have the patch and results here:
21:07:40 <strigazi> https://review.openstack.org/#/c/649609/
21:08:01 <strigazi> flwang suggested that the latest calico may work. I'll give it a go
21:08:23 <imdigitaljim> we use the latest calico
21:08:29 <colby_> Hey Guys. What's the latest version of kubernetes I can use on the queens version of magnum (6.3.0)? I tried with kube_tag=1.11.1-5 and 1.12 and both failed to build. The default 1.9.3 builds fine.
21:08:37 <strigazi> imdigitaljim: i know, that is why I'm not asking :)
21:08:50 <imdigitaljim> ah :D
21:09:05 <colby_> I mean kube_tag=v1.11.1-5
21:09:05 <imdigitaljim> conformance was passing as well
21:09:13 <imdigitaljim> so you might be right
21:10:24 <strigazi> For upgrades, I did some modifications for the worker nodes; with the heat API it works pretty well for workers and it validates the passed nodegroup.
21:10:38 <strigazi> Some more clean up and it will work with the API.
21:10:51 <imdigitaljim> strigazi: https://kubernetes.io/docs/setup/version-skew-policy/
21:11:03 <imdigitaljim> have you seen that for upgrades?
21:11:16 <imdigitaljim> specifically https://kubernetes.io/docs/setup/version-skew-policy/#supported-component-upgrade-order
21:11:32 <flwang> o/
21:11:35 <strigazi> The only missing part is the container registry on clusters
21:12:21 <strigazi> imdigitaljim: yes, but it doesn't enforce it
21:12:22 <flwang> sorry i'm late, NZ just had a daylight saving change
21:13:06 <strigazi> this madness with daylight saving will end soon, at least in the EU
21:13:36 <flwang> strigazi: yep
21:13:54 <flwang> strigazi: so are you still going to do the master upgrade in your existing patch?
21:13:59 <strigazi> yes
21:14:01 <flwang> or will you propose another one?
21:14:07 <strigazi> this one
21:14:44 <strigazi> flwang: do you want to the 1.14.0, it is calico related
21:14:52 <strigazi> flwang: do you want to the 1.14.0 patch, it is calico related
21:14:56 <strigazi> also 1.14.1 is out
21:15:38 <flwang> want to (do)?
21:15:59 <strigazi> flwang: do you want to take the 1.14.0 patch, it is calico related
21:16:06 <flwang> hehe, sure i can
21:16:42 <flwang> but i'm busy on the auto scaling regression issue and the upgrade testing/review, is the v1.14.0 urgent for you?
21:17:08 <strigazi> not really really urgent
21:17:22 <flwang> strigazi: ok, then i can take it, no problem
21:17:32 <strigazi> i said not :)
21:18:18 <strigazi> regarding the *possible* regression with the autoscaler: I wasn't able to reproduce it. Can you describe it in storyboard?
21:18:44 <flwang> strigazi: sure, are you using devstack or stable/rocky?
21:18:49 <strigazi> devstack
21:19:00 <flwang> and are you using the image from openstackmagnum?
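(A short aside on the version-skew-policy links above: the policy referenced there says, roughly, that components should be upgraded apiserver-first and that a kubelet may lag the kube-apiserver by at most two minor versions. The sketch below is purely illustrative, stdlib-only, and not part of Magnum; the helper names are made up for the example.)

```python
# Illustrative only: a rough check of the Kubernetes version-skew rule
# discussed above (kubelet may lag kube-apiserver by at most two minor
# versions and must not be newer). Version tags are assumed to look like
# "v1.14.0", as used for the kube_tag label in this meeting.

def minor(version: str) -> int:
    """Return the minor version number of a tag such as 'v1.14.0'."""
    return int(version.lstrip("v").split(".")[1])

def kubelet_skew_ok(apiserver: str, kubelet: str, max_skew: int = 2) -> bool:
    """True if the kubelet version is allowed next to this apiserver version."""
    return 0 <= minor(apiserver) - minor(kubelet) <= max_skew

if __name__ == "__main__":
    print(kubelet_skew_ok("v1.14.0", "v1.12.7"))  # True: within the allowed skew
    print(kubelet_skew_ok("v1.14.0", "v1.11.1"))  # False: three minors behind
```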
21:19:08 <strigazi> but in a good machine :)
21:19:21 <strigazi> yes
21:19:34 <flwang> are you using my patch or a home-made autoscaler yaml?
21:19:52 <strigazi> from the CA repo, not your patch
21:20:39 <strigazi> I don't think this is the issue https://github.com/kubernetes/autoscaler/issues/1870
21:20:43 <flwang> my code is also from the CA repo but i'd like to understand the difference, and i think it is a corner case, but we need to figure it out
21:21:24 <flwang> strigazi: not sure, and I also got a scale down issue where the autoscaler and magnum/heat are using different formats of UUID
21:21:27 <strigazi> ok, with your patch is it 100% reproducible?
21:21:57 <flwang> i think it's reproducible, but i don't think it's 100%, better to give it a try yourself
21:22:07 <flwang> and that would be really appreciated
21:22:16 <strigazi> ok, where do you test? dsvm?
21:22:23 <strigazi> master branch?
21:22:29 <flwang> master branch
21:22:32 <strigazi> ok
21:22:44 <flwang> with all the latest code, including the NG-5 patch
21:23:03 <strigazi> ok
21:23:28 <flwang> i will dig into it today as well
21:23:40 <flwang> back to your upgrade patch, did you see all my comments?
21:23:47 <strigazi> cool, I'll check gerrit tmr
21:24:36 <flwang> now i can see the minion upgrade works with those changes i mentioned in the patch, but in my testing the master node will be rebuilt even though i didn't change the image
21:25:38 <strigazi> I am lost in the comments, there are too many. what changes?
21:26:05 <strigazi> for the additional mounts it is fixed.
21:26:14 <flwang> i suggest you review all my comments, because that took me a lot of testing time
21:26:30 <flwang> the additional mounts are for the minion side
21:26:35 <flwang> i'm talking about the master
21:26:40 <strigazi> sure, I'll address them
21:27:11 <flwang> so do you mean i shouldn't care about the master behaviour now since you haven't done it?
21:27:33 <strigazi> master is expected to fail atm.
21:27:59 <flwang> strigazi: it's not "fail", it's being rebuilt
21:28:12 <flwang> after the rebuild, the master is using the new version of k8s
21:28:29 <strigazi> that is kind of a failure :)
21:28:44 <strigazi> I'll fix it
21:28:50 <flwang> let me explain a bit
21:29:22 <strigazi> I know the issue, it is because of user data
21:29:23 <flwang> after the rebuild, all components except kubelet will be back soon, and i have to restart kubelet to get it back
21:29:42 <flwang> it's really like the issue we're seeting for autoscaler's master rebuilt
21:29:50 <flwang> s/seeting/seeing
21:30:18 <strigazi> yeap, it is the issue we had with cluster_update some months ago and it was fixed
21:30:22 <flwang> i just wanna highlight that to see if you have any idea
21:30:31 <strigazi> yeap, it is the same issue we had with cluster_update some months ago and we fixed it
21:30:45 <flwang> which patch fixed it?
21:31:02 <flwang> with the autoscaler testing, i'm using master
21:31:15 <flwang> and i also rebased the upgrade patch locally for testing
21:31:23 <flwang> so i'm wondering which patch you're talking about
21:31:28 <strigazi> no, I mean the cause is the same as in cluster_update in the past.
21:32:03 <flwang> strigazi: so you mean you fixed it in your existing patch?
21:32:04 <strigazi> https://github.com/openstack/magnum/commit/3f773f1fd045a507c3962ae509fcd57352cdc9ae
21:32:07 <strigazi> no
21:32:28 <strigazi> flwang: let's take a step back.
21:32:53 <strigazi> The current patch for upgrades is expected to "fail" for master.
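(On the scale-down issue flwang mentions, the symptom described is that the cluster-autoscaler and magnum/heat pass around the same UUID in different textual forms, e.g. with and without dashes. The sketch below is a stdlib-only illustration of a tolerant comparison; the helper name and the sample IDs are hypothetical and not taken from either project.)

```python
# Illustrative only: compare resource IDs regardless of whether they are
# written with dashes ("5d12f6fd-a196-4bf0-ae4c-1f639a523a52") or without
# ("5d12f6fda1964bf0ae4c1f639a523a52"). Hypothetical helper, not taken from
# the cluster-autoscaler or Magnum code bases.
import uuid

def same_uuid(a: str, b: str) -> bool:
    """Return True if a and b denote the same UUID, ignoring formatting."""
    try:
        return uuid.UUID(a) == uuid.UUID(b)
    except ValueError:
        # Not valid UUIDs at all; fall back to a plain string comparison.
        return a == b

if __name__ == "__main__":
    print(same_uuid("5d12f6fd-a196-4bf0-ae4c-1f639a523a52",
                    "5d12f6fda1964bf0ae4c1f639a523a52"))  # True
```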
21:33:17 <strigazi> The reason is "change of user_data of the vm"
21:33:17 <flwang> i get that
21:33:36 <strigazi> This reason used to break cluster_update and it was fixed.
21:33:58 <strigazi> I don't know what breaks the autoscaler, I'll check.
21:34:10 <flwang> The reason is "change of user_data of the vm" --- so we have to do the same thing for the master as we did for the minion?
21:34:28 <flwang> to put those scripts "into" heat-container-agent?
21:34:47 <flwang> ok, i understand
21:34:49 <strigazi> in the current patch in gerrit I'll push a fix for master upgrade.
21:34:55 <strigazi> yes
21:34:58 <flwang> strigazi: cool
21:36:07 <flwang> based on my understanding of heat update, the master nodes being rebuilt is still caused by something that changed for the master
21:36:16 <strigazi> correct
21:36:30 <flwang> we just need to figure out what has been changed that slipped past our eyes
21:36:47 <flwang> cool, good to know we're on the same page
21:36:51 <strigazi> for upgrades yes; for the autoscaler, to be checked.
21:37:16 <flwang> dioguerra suspected the new security group rules from master to nodes
21:37:44 <flwang> but it still failed after reverting that one
21:38:01 <flwang> dioguerra: can you give us more details?
21:38:03 <strigazi> it comes from Ricardo too and I also mentioned it, but I didn't have enough courage to insist
21:38:41 <strigazi> in the patches in gerrit I mentioned that it breaks the pattern we use for ingress
21:39:03 <strigazi> this is the removal of ports 80/443
21:39:41 <strigazi> the other port is ssh which change the default behaviour.
21:39:46 <strigazi> the other port is ssh which changed the default behaviour.
21:40:08 <strigazi> I mentioned this in gerrit as well.
21:40:26 <flwang> if we confirm the issue is caused by the security rules, i think we can revisit this part
21:41:33 <strigazi> as I mentioned before, in clouds that don't have octavia (like ours), or even if they do but users don't want to use it,
21:41:57 <strigazi> ingress works with traefik or nginx or appscode/voyager
21:42:16 <flwang> strigazi: i can see your point, we may be able to introduce a config to let the cloud provider configure those rules?
21:42:16 <strigazi> using ports 80/443 on the workers
21:43:19 <strigazi> For this, we can open them when traefik or nginx is used, or with another label
21:43:50 <strigazi> same for ssh
21:43:52 <flwang> strigazi: yep, that would be better and easier
21:44:04 <strigazi> can be cloud wide or with labels
21:44:13 <strigazi> can be cloud/magnum-deployment wide or with labels
21:45:36 <flwang> yep, we can discuss this in more detail later
21:46:11 <strigazi> we can put additional details in storyboard
21:46:19 <flwang> sure
21:47:55 <brtknr> what is the update on CRUD for nodegroups ttsiouts?
21:49:07 <flwang> strigazi: ttsiouts: is the ng-6 patch the last one we need for NG? on the server side
21:49:59 <strigazi> is there an NG-6 in gerrit?
21:50:13 <flwang> https://review.openstack.org/#/c/647792/
21:50:57 <strigazi> before the new driver, i think this is the last one
21:51:17 <flwang> ok, good
21:51:32 <strigazi> and the client
21:51:43 <flwang> i'm reviewing the client now
21:51:57 <ttsiouts> brtknr: I am refactoring the scripts for the deployment of the cluster
21:52:02 <strigazi> I guess we need a microversion
21:52:13 <flwang> strigazi: do you think we can start to merge the upgrade api now?
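(The idea floated above is to open worker ports 80/443, and ssh, only when the deployment or a cluster label asks for it, rather than unconditionally. The sketch below illustrates that selection logic only; the label names "ingress_controller" and "allow_ssh" and the rule dictionaries are hypothetical and do not mirror Magnum's Heat templates.)

```python
# Illustrative only: build a list of extra worker security-group rules
# depending on cluster labels, as discussed above. Label names and rule
# shapes are hypothetical, not Magnum's actual template code.

def extra_worker_rules(labels: dict) -> list:
    rules = []
    # Open 80/443 only when a worker-hosted ingress controller is requested.
    if labels.get("ingress_controller") in ("traefik", "nginx"):
        rules.append({"protocol": "tcp", "port_range_min": 80, "port_range_max": 80})
        rules.append({"protocol": "tcp", "port_range_min": 443, "port_range_max": 443})
    # Open ssh only when explicitly asked for.
    if labels.get("allow_ssh") == "true":
        rules.append({"protocol": "tcp", "port_range_min": 22, "port_range_max": 22})
    return rules

if __name__ == "__main__":
    print(extra_worker_rules({"ingress_controller": "traefik"}))
    print(extra_worker_rules({}))  # no extra rules by default
```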
21:52:16 <ttsiouts> brtknr: on the heat side
21:52:48 <strigazi> flwang: we just need a check for master vs worker and we are good
21:53:42 <flwang> strigazi: ok, cool, i have done the api, api-ref and client, and it generally works with your functional patch
21:54:09 <strigazi> yeap
21:54:09 <flwang> so as long as your master upgrade work is submitted, we can start to do integration testing and get things done
21:54:49 <flwang> strigazi: thank you for working on this, i know it's a hard one
21:54:51 <brtknr> ttsiouts: sounds good! let me know when it's ready for testing :)
21:55:35 <strigazi> :)
21:55:53 <strigazi> just before closing, I want to make a shameless plug
21:56:07 <strigazi> if you use barbican, you may like this one:
21:56:20 <strigazi> https://techblog.web.cern.ch/techblog/post/helm-barbican-plugin/
21:56:37 <flwang> install barbican on k8s?
21:56:47 <strigazi> Ricardo wrote an excellent plugin
21:57:24 <strigazi> it can be easily added as a kubectl plugin
21:57:24 <flwang> strigazi: ah, that's nice, so you still need to have barbican deployed already, right?
21:57:39 <strigazi> yes, you need the barbican API
21:57:51 <flwang> and then just use barbican as the secret backend for k8s?
21:58:04 <brtknr> strigazi: I like that Kustomize is mentioned :)
21:58:12 <flwang> that's cool, actually, we already have customers asking for that
21:58:27 <strigazi> this plugin is for client-side usage.
21:58:48 <strigazi> For KMS there is an implementation in the CPO repo
21:59:40 <flwang> strigazi: cool
22:01:36 <strigazi> let's end the meeting?
22:02:39 <strigazi> Said once
22:02:51 <strigazi> said twice
22:02:56 <flwang> i'm good
22:03:01 <strigazi> thanks for joining everyone
22:03:01 <flwang> thanks strigazi
22:03:05 <strigazi> flwang: cheers
22:03:11 <strigazi> #endmeeting
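(On the upgrade API and the microversion question raised earlier in the meeting: a client call might look roughly like the sketch below, selecting Magnum behaviour with the container-infra microversion header. The endpoint path, body fields, port and microversion value are assumptions based on the patches under review at the time, not a confirmed or documented API.)

```python
# Illustrative only: a possible client-side call to the cluster upgrade API
# discussed in this meeting. Endpoint shape and body fields are assumptions,
# not a documented Magnum API.
import requests

MAGNUM_URL = "http://magnum.example.com:9511"  # hypothetical Magnum endpoint
TOKEN = "<keystone-token>"                     # obtained from Keystone separately

def upgrade_cluster(cluster_id: str, cluster_template_id: str) -> requests.Response:
    headers = {
        "X-Auth-Token": TOKEN,
        # Magnum selects API behaviour via this microversion header.
        "OpenStack-API-Version": "container-infra latest",
        "Content-Type": "application/json",
    }
    # Assumed request body: the new cluster template plus a batch size.
    body = {"cluster_template": cluster_template_id, "max_batch_size": 1}
    return requests.post(
        f"{MAGNUM_URL}/v1/clusters/{cluster_id}/actions/upgrade",
        json=body,
        headers=headers,
    )
```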