09:01:01 <flwang1> #startmeeting magnum
09:01:02 <openstack> Meeting started Wed Aug 26 09:01:01 2020 UTC and is due to finish in 60 minutes.  The chair is flwang1. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:05 <openstack> The meeting name has been set to 'magnum'
09:01:10 <flwang1> #topic roll call
09:01:12 <flwang1> o/
09:01:24 <brtknr> o/
09:01:25 <openstackgerrit> Spyros Trigazis proposed openstack/magnum master: Very-WIP: Drop hyperkube  https://review.opendev.org/748141
09:01:25 <dioguerra> o/
09:01:29 <strigazi> o/
09:02:56 <flwang1> thank you for joining, guys
09:03:13 <flwang1> shall we go through the topic list?
09:03:57 <strigazi> +1
09:04:47 <flwang1> ok, recently we (catalyst cloud) had a security review done by a 3rd party and we got some good comments for improvement
09:05:29 <flwang1> hence you probably saw the patch i proposed to separate the CA for k8s, etcd and front-proxy, though we discussed this a long time ago
09:06:01 <flwang1> the patch https://review.opendev.org/746864 has been rebased on the ca rotate patch and tested locally
09:06:02 <strigazi> We knew that before, right? That each node could contact etcd.
09:06:09 <flwang1> strigazi: yep
09:06:37 <flwang1> each node can use the kubelet cert to access etcd
09:06:43 <flwang1> to be more clear ^
09:07:06 <strigazi> or kube-proxy
09:07:10 <flwang1> yes
09:07:16 <strigazi> or any cert signed by the CA
09:07:36 <flwang1> so, please review that one asap, hopefully we can get it in this V cycle
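(For context, a minimal sketch of what the CA split means on the kube-apiserver side, assuming hypothetical certificate paths and separately generated CAs; the actual flags and paths used by the patch may differ.)

    # Sketch only: with distinct CAs, a cert signed by the kubelet/client CA
    # no longer authenticates against etcd or the aggregation layer
    # (etcd itself would point --trusted-ca-file at the etcd CA).
    exec kube-apiserver \
        --client-ca-file=/etc/kubernetes/certs/ca.crt \
        --etcd-cafile=/etc/kubernetes/certs/etcd/ca.crt \
        --etcd-certfile=/etc/kubernetes/certs/etcd/apiserver-client.crt \
        --etcd-keyfile=/etc/kubernetes/certs/etcd/apiserver-client.key \
        --requestheader-client-ca-file=/etc/kubernetes/certs/front-proxy/ca.crt \
        --proxy-client-cert-file=/etc/kubernetes/certs/front-proxy/client.crt \
        --proxy-client-key-file=/etc/kubernetes/certs/front-proxy/client.key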
09:08:02 <strigazi> I have a question though
09:08:06 <flwang1> sure
09:08:21 <flwang1> i'm listening
09:08:27 <strigazi> We need this patch. I'm not against it by any means; it helps a lot.
09:09:07 <strigazi> The trust and trustee are in all nodes anyway. So one can get whatever certs they need, this is known, right?
09:09:34 <strigazi> And
09:09:53 <strigazi> What about serving service account keypair via the API?
09:10:18 <strigazi> That's it
09:10:58 <flwang1> re the trust and trustee, yep. that's a good point, we can try to limit the request in the future to only allow it to come from master nodes
09:11:20 <flwang1> though i don't know how yet
09:11:20 <strigazi> So RBAC on magnum API
09:11:46 <strigazi> I see different trustees or application creds as a solution.
09:11:47 <flwang1> some silly (home-made) RBAC probably
09:11:57 <flwang1> strigazi: that also works
09:12:12 <strigazi> Why silly, it can't get better than this
09:12:34 <strigazi> different policy per role
09:12:41 <flwang1> sorry, i mean, openstack doesn't really have a good RBAC design yet
09:13:52 <strigazi> This point needs discussion. Maybe a spec. I just wanted to mention it.
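(A rough sketch of what per-role policy on the Magnum API could look like via oslo.policy; the "k8s-master-trustee" role is hypothetical, and the "certificate:*" policy target should be checked against the installed release rather than taken from here.)

    # Sketch only: restrict CSR signing to admins and a dedicated role held by
    # the master-node trustee, so worker-node trustees cannot mint new certs.
    cat > /etc/magnum/policy.yaml <<EOF
    "certificate:create": "role:admin or role:k8s-master-trustee"
    EOF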
09:13:58 <strigazi> For the 2nd point?
09:14:20 <strigazi> Since we pass type in the API we could serve the service account RSA keypair
09:14:30 <flwang1> yep, it's a very good point. thanks for the reminder
09:15:35 <flwang1> strigazi: can you explain more about the benefit of serving the service account rsa keypair by api?
09:15:53 <strigazi> Add a new master NG
09:16:47 <strigazi> ATM we can't access the RSA keypair. It is a hidden (as it should be) parameter in the heat stack
09:16:52 <flwang1> master NG for what? resizing?
09:17:04 <strigazi> resizing is not blocked by this
09:17:14 <strigazi> adding a new one
09:17:19 <strigazi> master-ng-B
09:17:21 <flwang1> i can't see the point of having another master NG
09:17:48 <flwang1> for worker nodes, NG makes good sense
09:18:04 <strigazi> I want to use bigger master nodes, how do I do this?
09:18:10 <flwang1> but what's the value of multi master NG
09:18:25 <flwang1> nova resizing?
09:19:28 <strigazi> That's an option, but you see my point.
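(For reference, the certificates API already serves the cluster CA; the idea above would extend it, hypothetically with a type-style selector, so a new master NG could reuse the service-account keypair instead of it living only as a hidden Heat parameter.)

    # Existing behaviour: the Magnum API already hands out the cluster CA.
    openstack coe ca show <cluster-uuid>
    # Hypothetical extension discussed above (not implemented): a selector for
    # the service-account RSA keypair as well, so a new master NG can join.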
09:19:45 <flwang1> sure
09:19:54 <flwang1> very good comments, i appreciate it
09:20:00 <flwang1> next one?
09:20:04 <strigazi> +1
09:20:16 <flwang1> #topic PodSecurityPolicy and Calico
09:20:30 <strigazi> What did they break again?
09:20:54 <flwang1> i'm still working on this, maybe you guys can help confirm
09:21:29 <flwang1> after adding PodSecurityPolicy to the admission controller list label, the calico pod can't start, so the cluster can't come up
09:22:20 <flwang1> with this post http://blog.tundeoladipupo.com/2019/06/01/Kubernetes,-PodSecurityPolicy-and-Kubeadm/ i think we need a dedicated psp for calico if we want to enable PodSecurityPolicy
09:22:50 <strigazi> we have a very privileged PSP for calico, it doesn't work?
09:23:00 <strigazi> calico is using a PSP already
09:23:18 <flwang1> strigazi: good to know. i haven't reviewed the code yet
09:23:35 <flwang1> i will do another test, and it would be nice if you guys can help test it as well.
09:23:53 <flwang1> PSP is becoming a common requirement for enterprise users
09:24:03 <flwang1> EKS is enabling it by default
09:24:28 <flwang1> btw, i just found our default admission list in the code is very old, should we update it?
09:24:39 <strigazi> sure
09:24:58 <strigazi> At CERN we have it in our CT
09:25:01 <flwang1> right
09:25:35 <strigazi> https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/calico-service-v3-3-x.sh#L18
09:25:36 <flwang1> i will propose a patch based on the default list from v1.16.x, sounds ok?
09:27:36 <strigazi> sure
09:27:51 <strigazi> maybe for V we do the list for 1.19?
09:28:52 <flwang1> strigazi: can do
09:29:25 <strigazi> cool
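(For reference, a sketch of how the plugin list is driven per cluster template today, assuming the admission_control_list label; the plugin list shown is illustrative, and the actual v1.19 default is what the proposed patch will settle.)

    # Sketch: overriding the admission plugin list via label at CT creation time.
    openstack coe cluster template create k8s-v1.19 \
        --image fedora-coreos-32 \
        --external-network public \
        --coe kubernetes \
        --labels admission_control_list="NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota,NodeRestriction,PodSecurityPolicy"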
09:29:26 <flwang1> re the calico psp, maybe i missed something, but at line https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/calico-service.sh#L30
09:29:44 <strigazi> yeap, that's it
09:29:44 <flwang1> i can see this role name, but it seems we didn't create the role?
09:29:53 <strigazi> we do
09:30:18 <strigazi> https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/kube-apiserver-to-kubelet-role.sh#L125
09:31:21 <flwang1> ok, got it, so we are using magnum.privileged as the psp
09:31:26 <strigazi> yes
09:31:42 <strigazi> same as GKE was doing (at least when I checked)
09:32:20 <flwang1> ok, i will test this again
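(For anyone retesting this, a trimmed sketch of the privileged PSP and the ClusterRole granting "use" of it that the linked fragments set up; the PSP name matches the discussion above, the ClusterRole name is approximate, so check the linked scripts for the authoritative definitions.)

    # Sketch of the privileged PSP calico's service account is allowed to use.
    kubectl apply -f - <<EOF
    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: magnum.privileged
    spec:
      privileged: true
      allowPrivilegeEscalation: true
      hostNetwork: true
      hostPID: true
      hostIPC: true
      volumes: ["*"]
      runAsUser:
        rule: RunAsAny
      seLinux:
        rule: RunAsAny
      supplementalGroups:
        rule: RunAsAny
      fsGroup:
        rule: RunAsAny
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: magnum:podsecuritypolicy:privileged   # approximate name, see linked file
    rules:
    - apiGroups: ["policy"]
      resources: ["podsecuritypolicies"]
      resourceNames: ["magnum.privileged"]
      verbs: ["use"]
    EOF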
09:32:33 <flwang1> let's move on?
09:32:36 <strigazi> +1
09:32:55 <flwang1> #topic Add observations field to CT
09:33:06 <flwang1> dioguerra: ^
09:34:21 <dioguerra> Hey, so the idea of this is to add a new field visible to the user where we can add observations
09:34:53 <dioguerra> The idea came so that we could label the CT with DEPRECATED/PRODUCTION/BETA or something similar
09:35:09 <flwang1> does the field have to be an enum?
09:35:21 <flwang1> or can the list be defined by the admin via a config option?
09:35:50 <flwang1> sorry, i haven't read the code yet
09:35:56 <dioguerra> no, i made it so it is free text (other admins might want to add other observations like HA, Multimaster)
09:36:19 <dioguerra> you agree?
09:37:14 <flwang1> i'm not sure. if it's kind of like a label or tag, then if i want to have several tags for the CT, i have to do something like AAA-BBB-CCC, is that it?
09:39:39 <dioguerra> it's not a label, it's a new field with free text
09:39:56 <dioguerra> so --observations <something>
09:39:59 <flwang1> i understand
09:40:23 <flwang1> i'm just saying free text makes it like a free-form tag
09:40:26 <brtknr> It's basically a tag to filter cluster templates, right? "observations" is a bit of a mouthful. Can we just call it "tags"?
09:41:43 <brtknr> does the current implementation allow users to filter using this field?
09:41:46 <dioguerra> it is not a filter (although you might use it for that), it's just a ref http://paste.openstack.org/show/797160/
09:42:18 <dioguerra> brtknr: no
09:43:38 <flwang1> hmm... i understand this is neater than putting HA or Prod into the cluster template name, but i expect it to be more useful than that to deserve a dedicated db field
09:44:37 <flwang1> dioguerra: i'm not saying i don't like the idea. actually it will be useful, i probably need a bit of time to think about it.
09:45:06 <dioguerra> well, the idea of doing filtering with the field crossed my mind. We can add it now if you would like, or later...
09:46:23 <brtknr> dioguerra: i think that would be the thing which would add value to this proposal
09:46:59 <flwang1> shall we move to next topic?
09:47:04 <jakeyip> will it be easier to filter using tags instead of free text?
09:47:05 <flwang1> we have 15 mins left
09:47:17 <jakeyip> databases don't match well on TEXT columns
09:47:54 <jakeyip> sorry, cont' please
09:49:07 <dioguerra> in our use case we usually only have a few visible templates (3 to 4) so filtering is not really required. everything else is hidden
09:49:36 <dioguerra> jakeyip: a tag would be better for the DB, yes. but that restricts what you can put in.
09:50:23 <jakeyip> do we have a description field?
09:50:25 <flwang1> i would suggest putting the discussion into the patch
09:50:35 <flwang1> not here
09:50:42 <flwang1> move on?
09:50:47 <jakeyip> +1
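(To make the proposal concrete before it moves to the review, a hypothetical sketch of the usage being discussed; neither the observations property nor any filtering on it is merged, which is exactly the open question.)

    # Hypothetical usage of the proposed free-text field (not merged):
    openstack coe cluster template update k8s-v1.18-ha replace observations="PRODUCTION HA"
    # and, if it ends up as tags instead, the filtering brtknr/jakeyip are after
    # might look something like:
    # openstack coe cluster template list --tags production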
09:51:00 <flwang1> #topic Drop hyperkube https://review.opendev.org/748141
09:51:02 <flwang1> strigazi: ^
09:51:12 <flwang1> tell us more?
09:51:48 <strigazi> the k8s community and some ex-openstack members thought we should not have too much fun with hyperkube and dropped it.
09:52:22 <strigazi> I have a solution there that gets a tarball with kubelet, kubectl, kubeadm and kube-proxy (90mb)
09:52:36 <strigazi> it works, running the kubelet from a bin
09:52:51 <strigazi> and the rest components from their respective images.
09:53:01 <strigazi> All good so far, now the problems
09:53:04 <flwang1> kubeadm?
09:53:19 <strigazi> flwang1: well, it is there
09:53:28 <strigazi> flwang1: can't skip it
09:53:44 <strigazi> even the kube-proxy binary, we don't need it
09:53:50 <flwang1> sounds like another breaking change
09:53:51 <strigazi> we need only kubelet and kubectl
09:54:07 <strigazi> flwang1: which one? kubeadm?
09:54:23 <brtknr> hmm why does k8s.gcr.io make other binaries available in containers but not kubelet I wonder
09:55:00 <strigazi> flwang1: I just mention what is in the tarball. Is it clear?
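(For reference, a sketch of fetching the node tarball being described, with the URL pattern assumed from the upstream release layout; the WIP patch above is the authoritative version.)

    # Sketch: pull the ~90MB node tarball and keep only what we need on the host
    # (kubelet as a binary; the other components keep running from their images).
    KUBE_TAG="v1.19.0"
    curl -L "https://dl.k8s.io/${KUBE_TAG}/kubernetes-node-linux-amd64.tar.gz" \
        -o /tmp/kubernetes-node.tar.gz
    tar -C /tmp -xzf /tmp/kubernetes-node.tar.gz
    install -m 0755 /tmp/kubernetes/node/bin/kubelet /usr/local/bin/kubelet
    install -m 0755 /tmp/kubernetes/node/bin/kubectl /usr/local/bin/kubectl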
09:55:07 <flwang1> strigazi: i wonder if the new change will allow old clusters to be upgraded
09:55:56 <brtknr> i suppose that is why the PS is "Very WIP"
09:55:57 <strigazi> flwang1: there is literally nothing we can do for clusters that reference k8s.gcr.io/hyperkube.
09:56:38 <strigazi> flwang1: if your clusters use <my-registry>/hyperkube, we can build a hyperkube
09:57:11 <dioguerra> i need to go o/
09:57:18 <strigazi> brtknr: does it matter? They decided to stop building it. (the reason is CVEs in debian base image)
09:58:09 <strigazi> brtknr: flwang1: hello?
09:58:23 <flwang1> strigazi: i appreciate the work, my only concern is we need to work out a design that makes sure old clusters can be upgraded
09:59:24 <strigazi> flwang1: what is your situation? (regarding there is literally nothing we can do for clusters that reference k8s.gcr.io/hyperkube. && if your clusters use <my-registry>/hyperkube, we can build a hyperkube)
09:59:52 <flwang1> we're getting hyperkube from dockerhub/catalystcloud
10:00:06 <strigazi> pro account i guess
10:00:17 <flwang1> now i'm trying to build hyperkube for v1.19.x
10:00:22 <strigazi> the free account won't cut it any more
10:00:55 <strigazi> it's relatively easy. But let me rephrase
10:01:15 <strigazi> Shall we move to the binary for V, so that we don't have to maintain a new image?
10:01:38 <strigazi> brtknr: ^^ flwang1 ^^
10:02:40 <flwang1> strigazi: moving to binary is our goal i think, and we don't have a choice
10:02:49 <strigazi> flwang1: easy to build, hard to maintain.
10:03:06 <flwang1> you mean maintain the binary?
10:03:15 <strigazi> flwang1: no, the image
10:03:29 <flwang1> i see
10:03:48 <brtknr> I'm okay with that, and I echo flwang's concern that existing clusters should also be upgradable, which should be possible.
10:04:06 <flwang1> yep, but again, as a public cloud with a GAed service, we can't break the upgrade path
10:04:51 <flwang1> though we should be able to do magic in the upgrade-k8s.sh
10:05:11 <flwang1> at least, the good thing is we don't have to replace the operating system
10:05:15 <strigazi> The upstream project broke it. So for V we do the binary hoping they won't break it again.
10:05:55 <flwang1> that's a good excuse for us, but we can't use it for our public cloud customers unfortunately :(
10:06:40 <flwang1> strigazi: i will review your patch and see how we can resolve the upgrade issue
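(One hedged idea for that "magic" in upgrade-k8s.sh, purely a sketch and not what the WIP patch does: during upgrade, detect whether a hyperkube image is still usable for the target tag and otherwise switch the kubelet over to the binary. Variable names are illustrative.)

    # Hypothetical fallback inside the upgrade path (illustrative only):
    if [ -z "${HYPERKUBE_IMAGE}" ] || ! podman pull "${HYPERKUBE_IMAGE}:${KUBE_TAG}"; then
        curl -L "https://dl.k8s.io/${KUBE_TAG}/kubernetes-node-linux-amd64.tar.gz" \
            | tar -C /tmp -xz
        install -m 0755 /tmp/kubernetes/node/bin/kubelet /usr/local/bin/kubelet
        systemctl restart kubelet
    fi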
10:09:02 <flwang1> strigazi: brtknr: anything else?
10:09:08 <strigazi> I'm good
10:10:24 <brtknr> strigazi: any reason not to use binaries for everything?
10:11:00 <brtknr> there is also a server binaries tarball which i assume is for the master node
10:11:37 <strigazi> brtknr: they are 300mb and I think it is more secure and elegant to run them in containers.
10:12:22 <flwang1> ok, let me end the meeting first
10:12:27 <flwang1> #endmeeting