21:04:18 <strigazi> #startmeeting containers
21:04:18 <openstack> Meeting started Tue Apr  9 21:04:18 2019 UTC and is due to finish in 60 minutes.  The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:04:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:04:21 <openstack> The meeting name has been set to 'containers'
21:04:21 <strigazi> #topic Roll Call
21:04:30 <strigazi> o/
21:04:33 <colin-> hello
21:04:36 <ttsiouts> o/
21:06:10 <strigazi> #topic Stories/Tasks
21:06:11 <imdigitaljim> o/
21:06:25 <brtknr> o/
21:06:46 <strigazi> Last week I attempted to upgrade the default version of k8s to 1.14.0 but calico v2 wasn't passing
21:07:01 <strigazi> wasn't passing the conformance test
21:07:13 <strigazi> I have the patch and results here:
21:07:40 <strigazi> https://review.openstack.org/#/c/649609/
21:08:01 <strigazi> flwang: suggested that the latest calico may work. I'll give it a go
21:08:23 <imdigitaljim> we use the latest calico
21:08:29 <colby_> Hey guys. What's the latest version of kubernetes I can use on the queens version of magnum (6.3.0)? I tried with kube_tag=1.11.1-5 and 1.12 and both failed to build. The default 1.9.3 builds fine.
21:08:37 <strigazi> imdigitaljim: i know, that is why I'm not asking :)
21:08:50 <imdigitaljim> ah :D
21:09:05 <colby_> I mean kube_tag=v1.11.1-5
21:09:05 <imdigitaljim> conformance was passing as well
21:09:13 <imdigitaljim> so you might be right
21:10:24 <strigazi> For upgrades, I made some modifications for the worker nodes; with the heat API it works pretty well for workers and it validates the passed nodegroup.
21:10:38 <strigazi> Some more clean-up and it will work with the API.
21:10:51 <imdigitaljim> strigazi: https://kubernetes.io/docs/setup/version-skew-policy/
21:11:03 <imdigitaljim> have you seen that for upgrades?
21:11:16 <imdigitaljim> specifically https://kubernetes.io/docs/setup/version-skew-policy/#supported-component-upgrade-order
21:11:32 <flwang> o/
21:11:35 <strigazi> The only missing part is the container registry on clusters
21:12:21 <strigazi> imdigitaljim: yes, but it doesn't enforce it
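(The skew policy linked above allows the kubelet to be at most two minor versions older than kube-apiserver, and never newer. A minimal sketch of the kind of check that could be enforced; the names are illustrative, not Magnum code.)

    # Sketch only: kubelet/kube-apiserver version-skew check per the policy above.
    def parse_major_minor(version):
        # "v1.14.1" -> (1, 14)
        major, minor = version.lstrip("v").split(".")[:2]
        return int(major), int(minor)

    def skew_is_supported(apiserver_version, kubelet_version, max_skew=2):
        api = parse_major_minor(apiserver_version)
        kubelet = parse_major_minor(kubelet_version)
        if api[0] != kubelet[0]:
            return False
        # kubelet must not be newer than the API server and may lag by at most max_skew minors.
        return 0 <= api[1] - kubelet[1] <= max_skew

    assert skew_is_supported("v1.14.0", "v1.12.7")
    assert not skew_is_supported("v1.14.0", "v1.11.1")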
21:12:22 <flwang> sorry i'm late, NZ just had a daylight saving
21:13:06 <strigazi> this madness with daylight will end soon, at least in the EU
21:13:36 <flwang> strigazi: yep
21:13:54 <flwang> strigazi: so are you still going to do the master upgrade in your existing patch?
21:13:59 <strigazi> yes
21:14:01 <flwang> or you will propose another one?
21:14:07 <strigazi> this one
21:14:44 <strigazi> flwang: do you want to the 1.14.0, it is calico related
21:14:56 <strigazi> also 1.14.1 is out
21:15:38 <flwang> want to (do)?
21:15:59 <strigazi> flwang: do you want to take the 1.14.0 patch, it is calico related
21:16:06 <flwang> hehe, sure i can
21:16:42 <flwang> but i'm busy with the auto scaling regression issue and the upgrade testing/review; is the v1.14.0 patch urgent for you?
21:17:08 <strigazi> not really really urgent
21:17:22 <flwang> strigazi: ok, then i can take it, no problem
21:17:32 <strigazi> i said not :)
21:18:18 <strigazi> regarding the *possible* regression with the autoscaler: I wasn't able to reproduce it. Can you describe it in storyboard?
21:18:44 <flwang> strigazi: sure, are you using devstack or stable/rocky?
21:18:49 <strigazi> devstack
21:19:00 <flwang> and are you using the image from openstackmagnum?
21:19:08 <strigazi> but on a good machine :)
21:19:21 <strigazi> yes
21:19:34 <flwang> are you using my patch or a home-made autoscaler yaml?
21:19:52 <strigazi> from the CA repo, not your patch
21:20:39 <strigazi> I don't think this is the issue https://github.com/kubernetes/autoscaler/issues/1870
21:20:43 <flwang> my code is also from the CA repo, but i'd like to understand the difference; i think it is a corner case, but we need to figure it out
21:21:24 <flwang> strigazi: not sure, and I also hit a scale down issue where the autoscaler and magnum/heat are using different formats of UUID
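(On the UUID format mismatch: one side reporting dashed UUIDs and the other dashless or prefixed provider IDs is enough to break node matching on scale down. A hypothetical normalization helper; the openstack:/// prefix handling is an assumption, not taken from either codebase.)

    # Hypothetical helper: compare node IDs regardless of UUID formatting.
    import uuid

    def normalize_node_id(raw_id):
        # Drop an optional provider prefix such as "openstack:///" (assumption),
        # then canonicalize via uuid.UUID, which accepts dashed and dashless input.
        return str(uuid.UUID(raw_id.rsplit("/", 1)[-1]))

    assert (normalize_node_id("openstack:///1f2a3b4c5d6e4f708192a3b4c5d6e7f8")
            == normalize_node_id("1f2a3b4c-5d6e-4f70-8192-a3b4c5d6e7f8"))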
21:21:27 <strigazi> ok, with your patch is it 100% reproducible?
21:21:57 <flwang> i think it's reproducible, but i don't think it's 100%; better give it a try yourself
21:22:07 <flwang> and that would be really appreciated
21:22:16 <strigazi> ok, where do you test? dsvm?
21:22:23 <strigazi> master branch?
21:22:29 <flwang> master branch
21:22:32 <strigazi> ok
21:22:44 <flwang> with all the latest code, including the NG-5 patch
21:23:03 <strigazi> ok
21:23:28 <flwang> i will dig it today as well
21:23:40 <flwang> back to your upgrade patch, did you see all my comments?
21:23:47 <strigazi> cool, I'll check gerrit tomorrow
21:24:36 <flwang> now i can see the minion upgrade works with those changes i mentioned in the patch, but in my testing the master node is rebuilt even though i didn't change the image
21:25:38 <strigazi> I am lost in the comments, there are too many. what changes?
21:26:05 <strigazi> for the additional mounts it is fixed.
21:26:14 <flwang> i suggest you review all my comments, because that took me a lot of time for testing
21:26:30 <flwang> the additional mounts is for the minion side
21:26:35 <flwang> i'm talking about the master
21:26:40 <strigazi> sure, I'll address them
21:27:11 <flwang> so do you mean i shouldn't care about the master behaviour now since you haven't done it?
21:27:33 <strigazi> master is expected to fail atm.
21:27:59 <flwang> strigazi: it's not "fail", it's being rebuilt
21:28:12 <flwang> after the rebuild, master is using the new version of k8s
21:28:29 <strigazi> that is kind of a failure :)
21:28:44 <strigazi> I'll fix it
21:28:50 <flwang> let me explain a bit
21:29:22 <strigazi> I know the issue, it is because of user data
21:29:23 <flwang> after the rebuild, all components except kubelet come back soon, and i have to restart kubelet to get it back
21:29:42 <flwang> it's really like the issue we're seeing with the autoscaler's master rebuild
21:30:18 <strigazi> yeap, it is the same issue we had with cluster_update some months ago and we fixed it
21:30:22 <flwang> i just wanna highlight that to see if you have any idea
21:30:45 <flwang> which patch fixed it?
21:31:02 <flwang> with the autoscaler testing, i'm using master
21:31:15 <flwang> and i also rebased the upgrade patch locally for testing
21:31:23 <flwang> so i'm wondering which patch you're talking about
21:31:28 <strigazi> no, I mean the cause is the same as in cluster_update in the past.
21:32:03 <flwang> strigazi: so you mean you fixed it in your existing patch?
21:32:04 <strigazi> https://github.com/openstack/magnum/commit/3f773f1fd045a507c3962ae509fcd57352cdc9ae
21:32:07 <strigazi> no
21:32:28 <strigazi> flwang: let's take a step back.
21:32:53 <strigazi> The current patch for upgrades is expected to "fail" for master.
21:33:17 <strigazi> The reason is "change of user_data of the vm"
21:33:17 <flwang> i get that
21:33:36 <strigazi> This reason used to break cluster_update and it was fixed.
21:33:58 <strigazi> I don't know what breaks the autoscaler, I'll check.
21:34:10 <flwang> The reason is "change of user_data of the vm" --- so we have to do the same thing for the master as we did for the minion?
21:34:28 <flwang> to put those scripts "into" heat-container-agent?
21:34:47 <flwang> ok, i understand
21:34:49 <strigazi> in the current patch in gerrit I'll push a fix for master upgrade.
21:34:55 <strigazi> yes
21:34:58 <flwang> strigazi: cool
21:36:07 <flwang> based on my understanding of heat updates, the master nodes being rebuilt must still be caused by something that changed for the master
21:36:16 <strigazi> correct
21:36:30 <flwang> we just need to figure out what has changed that slipped past our eyes
21:36:47 <flwang> cool, good to know we're on the same page
21:36:51 <strigazi> for upgrades yes; for the autoscaler, it's still to be checked.
21:37:16 <flwang> dioguerra suspected the new security group rules from master to nodes
21:37:44 <flwang> but it still failed after reverting that one
21:38:01 <flwang> dioguerra: can you give us more details?
21:38:03 <strigazi> it comes from Ricardo too and I also mentioned it, but I didn't have enough courage to insist
21:38:41 <strigazi> in the patches in gerrit I mentioned that it breaks the pattern we use for ingress
21:39:03 <strigazi> this is the removal of ports 80/443
21:39:41 <strigazi> the other port is ssh, which changed the default behaviour.
21:40:08 <strigazi> I mentioned this in gerrit as well.
21:40:26 <flwang> if we confirmed the issue is caused by the security rules, i think we can revisit this part
21:41:33 <strigazi> as I mentioned before, in clouds that don't have octavia (like ours), or even if they do but users don't want to use it,
21:41:57 <strigazi> ingress works with traefik or nginx or appscode/voyager
21:42:16 <flwang> strigazi: i can see your point; maybe we can introduce a config option to let the cloud provider configure those rules?
21:42:16 <strigazi> using ports 80/443 in the workers
21:43:19 <strigazi> For this, we can open them when traefik or nginx is used, or with another label
21:43:50 <strigazi> same for ssh
21:43:52 <flwang> strigazi: yep, that would be better and easier
21:44:04 <strigazi> it can be cloud/magnum-deployment wide or set with labels
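(A rough sketch of the label-gated idea being discussed: derive the extra worker security-group rules from cluster labels instead of opening 80/443 and ssh unconditionally. The allow_node_ssh label is made up for illustration; the real change would live in the driver/Heat templates.)

    # Illustrative only: compute extra worker security-group rules from labels.
    def extra_worker_rules(labels):
        rules = []
        # Open 80/443 only when a host-port ingress controller is requested.
        if labels.get("ingress_controller") in ("traefik", "nginx"):
            for port in (80, 443):
                rules.append({"protocol": "tcp",
                              "port_range_min": port,
                              "port_range_max": port})
        # Keep ssh opt-in rather than open by default (hypothetical label).
        if labels.get("allow_node_ssh") == "true":
            rules.append({"protocol": "tcp",
                          "port_range_min": 22,
                          "port_range_max": 22})
        return rules

    print(extra_worker_rules({"ingress_controller": "traefik"}))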
21:45:36 <flwang> yep, we can discuss this later for more details
21:46:11 <strigazi> we can put additional details in storyboard
21:46:19 <flwang> sure
21:47:55 <brtknr> what is the update on CRUD for nodegroups ttsiouts?
21:49:07 <flwang> strigazi: ttsiouts: is the ng-6 the last one we need for NG? on server side
21:49:59 <strigazi> is there an NG-6 in gerrit?
21:50:13 <flwang> https://review.openstack.org/#/c/647792/
21:50:57 <strigazi> before the new driver, i think this is the last one
21:51:17 <flwang> ok, good
21:51:32 <strigazi> and client
21:51:43 <flwang> i'm reviewing the client now
21:51:57 <ttsiouts> brtknr: I am refactoring the scripts for the deployment of the cluster
21:52:02 <strigazi> I guess we need a microversion
21:52:13 <flwang> strigazi: do you think we can start to merge upgrade api now?
21:52:16 <ttsiouts> brtknr: in heat side
21:52:48 <strigazi> flwang: we just need a check for master VS worker and we are good
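(A minimal sketch of the master-vs-worker check mentioned here, following the component upgrade order linked earlier: masters first, workers never ahead of the control plane. The role/version arguments are assumptions about the upgrade API, not its actual signature.)

    # Hypothetical guard for the upgrade API: a worker nodegroup must not be
    # upgraded past the version the master nodegroup is running.
    def upgrade_allowed(nodegroup_role, target_version, master_version):
        def major_minor(v):
            parts = v.lstrip("v").split(".")[:2]
            return int(parts[0]), int(parts[1])
        if nodegroup_role == "master":
            return True
        return major_minor(target_version) <= major_minor(master_version)

    assert upgrade_allowed("worker", "v1.14.0", "v1.14.1")
    assert not upgrade_allowed("worker", "v1.14.0", "v1.13.5")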
21:53:42 <flwang> strigazi: ok, cool, i have done the api, ref, and client, and it generally works with your functional patch
21:54:09 <strigazi> yeap
21:54:09 <flwang> so as soon as your master upgrade work is submitted, we can start integration testing and get things done
21:54:49 <flwang> strigazi: thank you for working on this, i know it's a hard one
21:54:51 <brtknr> ttsiouts: sounds good! let me know when its ready for testing :)
21:55:35 <strigazi> :)
21:55:53 <strigazi> just before closing, I want to make a shameless plug
21:56:07 <strigazi> if you use barbican, you may like this one:
21:56:20 <strigazi> https://techblog.web.cern.ch/techblog/post/helm-barbican-plugin/
21:56:37 <flwang> install barbican on k8s?
21:56:47 <strigazi> Ricardo wrote an excellent plugin
21:57:24 <strigazi> it can be easily added as a kubectl plugin
21:57:24 <flwang> strigazi: ah, that's nice, so you still need to have barbican deployed already, right?
21:57:39 <strigazi> yes, you need the barbican API
21:57:51 <flwang> and then just use barbican as the secret backend for k8s?
21:58:04 <brtknr> strigazi: I like that Kustomize is mentioned :)
21:58:12 <flwang> that's cool; actually, we already have customers asking for that
21:58:27 <strigazi> this plugin is for client side usage.
21:58:48 <strigazi> For KMS there is an implementation in the CPO repo
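(For context on the Barbican dependency mentioned above: the helm plugin is client side, but any consumer still stores and fetches secrets through the Barbican API. A minimal sketch using python-barbicanclient with keystoneauth1; the auth URL and credentials are placeholders.)

    # Minimal sketch: store and fetch a secret through the Barbican API.
    # Auth URL and credentials are placeholders, not from the meeting.
    from keystoneauth1 import identity, session
    from barbicanclient import client

    auth = identity.Password(auth_url="https://keystone.example.com/v3",
                             username="demo", password="secret",
                             project_name="demo",
                             user_domain_name="Default",
                             project_domain_name="Default")
    barbican = client.Client(session=session.Session(auth=auth))

    secret = barbican.secrets.create(name="helm-release-key",
                                     payload="super-secret-value")
    ref = secret.store()                      # returns the secret href
    print(barbican.secrets.get(ref).payload)  # fetch it back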
21:59:40 <flwang> strigazi: cool
22:01:36 <strigazi> let's end the meeting?
22:02:39 <strigazi> Said once
22:02:51 <strigazi> said twice
22:02:56 <flwang> i'm good
22:03:01 <strigazi> thanks for joining everyone
22:03:01 <flwang> thanks strigazi
22:03:05 <strigazi> flwang: cheers
22:03:11 <strigazi> #endmeeting