10:00:40 <strigazi> #startmeeting containers
10:00:41 <openstack> Meeting started Tue Jul 17 10:00:40 2018 UTC and is due to finish in 60 minutes. The chair is strigazi. Information about MeetBot at http://wiki.debian.org/MeetBot.
10:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
10:00:44 <openstack> The meeting name has been set to 'containers'
10:00:48 <strigazi> #topic Roll Call
10:00:54 <strigazi> o/
10:01:10 <flwang1> o/
10:02:49 <strigazi> It is just the two of us then, let's do it quickly
10:02:56 <flwang1> ok
10:02:58 <sfilatov> o/
10:03:08 <strigazi> hey sfilatov
10:03:09 <sfilatov> I'm here to discuss deletion :)
10:03:25 <strigazi> #topic Blueprints/Bugs/Ideas
10:03:40 <strigazi> Test v1.11.0 images
10:03:47 <strigazi> #link https://hub.docker.com/r/openstackmagnum/kubernetes-kubelet/tags/
10:03:59 <strigazi> conformance tests are passing for me
10:04:11 <flwang1> nice nice
10:04:26 <strigazi> Note: you must use cgroupfs as cgroup_driver
10:04:45 <flwang1> so are we going to bump the k8s version for rocky to 1.11.0?
10:04:50 <strigazi> with systemd the node is ready, but it cannot schedule pods
10:04:53 <brtknr> o/
10:05:09 <strigazi> flwang1: I think yes, it is better
10:05:10 <flwang1> strigazi: then we need to document it somewhere
10:05:24 <strigazi> of course
10:05:24 <flwang1> strigazi: cool, i can do that
10:05:37 <strigazi> but we need to test it first :)
10:05:42 <strigazi> not only me :)
10:05:56 <flwang1> strigazi: sure, when i say i can do that, i mean i will test it
10:06:10 <flwang1> and if it works for me, i will propose a patch and add docs
10:06:36 <strigazi> btw I'm still evaluating the gcr.io hyperkube containers vs a fedora-based one
10:07:11 <flwang1> any benefit in using hyperkube?
10:07:56 <strigazi> we don't build kubernetes at all, we just package the hyperkube container
10:07:59 <strigazi> but
10:08:25 <strigazi> hyperkube is based on debian, incompatibilities may occur
10:08:57 <strigazi> I'll push two patches for the two ways to build
10:09:04 <flwang1> hmm...
10:09:30 <flwang1> ok
10:09:37 <strigazi> building the rpms is trivial, like: git clone kube && bazel build ./build/rpms
10:10:52 <strigazi> like this: http://paste.openstack.org/show/726097/
10:11:57 <flwang1> nice
10:12:27 <strigazi> bazel is a black box for me, but seems to work pretty well and pretty fast
10:12:56 <strigazi> we can take this offline
10:12:58 <strigazi> next:
10:13:18 <strigazi> this is a trivial change after all: https://review.openstack.org/#/c/582506/
10:13:23 <strigazi> Resolve stack outputs only on COMPLETE
10:13:42 <strigazi> but I expect it to help a lot when magnum polls heat
10:14:04 <strigazi> in devstack you cannot see it, but on a big stack it will make a difference
10:14:48 <strigazi> are you looking? should I move on?
10:15:09 <sfilatov> + from me
10:15:26 <flwang1> that looks good to me
10:15:48 <strigazi> next is the keypair issue and scaling
10:15:50 <flwang1> though i may need to take a look at the resolve_outputs param
10:16:18 <strigazi> flwang1: there is a show_params api
10:16:27 <flwang1> cool
10:16:44 <strigazi> flwang1: resolve_outputs tells heat to not bother with the outputs of the stack
10:17:02 <flwang1> and outputs means more api calls in heat i guess?
10:17:14 <flwang1> to other services
10:17:15 <strigazi> flwang1: even during stack creation heat will go through all servers to get the IPs
10:17:28 <strigazi> flwang1: it means more load on the engine
10:17:39 <flwang1> right, matching what i thought
10:17:44 <flwang1> all good
10:17:53 <strigazi> flwang1: and it means slow api responses
10:18:09 <strigazi> flwang1: normally I have 250ms response time
10:18:45 <strigazi> flwang1: with a 50 node cluster in progress, any api call goes to 15 seconds
10:19:17 <flwang1> omg
10:19:26 <strigazi> flwang1: all magnum nodes eventually hit all the heat apis with the same request and the apis block
10:19:41 <strigazi> but
10:20:07 <strigazi> if you create the stack with the heat api and magnum is not hammering it, all good
10:20:29 <strigazi> without output resolving, the stack get call is a simple lookup
10:20:44 <flwang1> strigazi: thanks for the clarification
10:21:31 <strigazi> I created a 500 node cluster 2 weeks ago and immediately stopped magnum from hitting heat, everything was smooth
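For reference, the heat call this change is about could look roughly like the following on the client side. This is a minimal sketch assuming python-heatclient and an already-configured keystoneauth1 session; session and stack_id are illustrative names, not from the patch itself.

import sys

from heatclient import client as heat_client

# 'session' is assumed to be an existing keystoneauth1 session
heat = heat_client.Client('1', session=session)

# resolve_outputs=False asks heat not to resolve the stack outputs,
# so on a large stack the GET stays a cheap lookup instead of
# touching every server resource.
stack = heat.stacks.get(stack_id, resolve_outputs=False)
print(stack.stack_status, file=sys.stdout)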
10:21:53 <strigazi> next, keypair
10:22:15 <strigazi> a keypair is an OS resource that cannot be shared in a project
10:22:44 <strigazi> and a cluster owns, let's say, a key
10:23:02 <strigazi> the admin or other members can't do a stack update
10:23:07 <strigazi> with this patch:
10:23:24 <strigazi> #link https://review.openstack.org/#/c/582880/
10:23:40 <strigazi> users can do a stack update freely
10:23:46 <strigazi> with these params in heat:
10:23:51 <strigazi> deferred_auth_method = trusts
10:23:52 <strigazi> reauthentication_auth_method = trusts
10:24:06 <strigazi> all good so far, but
10:24:28 <strigazi> if the user is deleted, their trust is deleted, so the stack cannot be touched again
10:25:18 <strigazi> afaik the only solution: pass the key in user_data and not as a keypair in nova
10:25:22 <strigazi> thoughts?
10:25:53 <brtknr> How does it allow other users to make changes? it is not immediately obvious to me
10:25:59 <brtknr> looking at the patch
10:26:21 <strigazi> brtknr the keypair is created on cluster creation
10:26:54 <flwang1> so instead of using the original user's public key, we just generate a common one for everybody?
10:27:10 <strigazi> brtknr: and authentication is done using the trust, so userB will authenticate behind the scenes with the trust of userA
10:27:34 <strigazi> flwang1 the thing is that there is no such thing as a common key
10:27:51 <strigazi> the key is only visible to the creator
10:28:06 <brtknr> how does userA enable trust for userB? I suppose this has to be set somewhere?
10:28:06 <flwang1> can't we just generate one and 'share' it with all users in the tenant?
10:28:27 <strigazi> flwang1 impossible in nova
10:28:32 <flwang1> strigazi: ok
10:28:32 <sfilatov> flwang1: can we share nova keys
10:28:35 <sfilatov> y
10:28:45 <strigazi> brtknr: heat does this
10:28:49 <strigazi> sfilatov: no we can't
10:29:55 <strigazi> sfilatov: https://docs.openstack.org/horizon/pike/user/configure-access-and-security-for-instances.html
10:30:01 <strigazi> "A key pair belongs to an individual user, not to a project. To share a key pair across multiple users, each user needs to import that key pair."
10:30:23 <strigazi> that is not correctly expressed
10:30:49 <strigazi> it means all users must import the same public_key
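For reference, a minimal sketch of the user_data alternative strigazi mentions above: the same public key is injected on every node through cloud-init instead of a per-user Nova keypair. This is illustrative only (novaclient shown here; in magnum it would go through the heat template's user_data), and session, image_id and flavor_id are assumed to exist.

from novaclient import client as nova_client

# The public key travels with the instance data, so no per-user
# Nova keypair (key_name) is involved at all.
user_data = """#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAA... cluster-shared-key
"""

nova = nova_client.Client('2.1', session=session)
server = nova.servers.create(
    name='k8s-node-0',   # illustrative name
    image=image_id,      # assumed to exist
    flavor=flavor_id,    # assumed to exist
    userdata=user_data,
)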
10:31:04 <sfilatov> strigazi: yep, so we can't do it natively
10:31:31 <strigazi> we can simulate the shared key with heat using the trust
10:31:48 <strigazi> but as I said, if the user is deleted the trust is gone
10:32:32 <brtknr> hmm
10:32:48 <brtknr> would be nice to set a group privilege
10:33:03 <brtknr> e.g. any user in the admin group can modify the cluster
10:33:03 <strigazi> I don't think it is possible
10:33:54 <brtknr> but this is certainly the next best thing
10:34:02 <strigazi> brtknr: and it is not only desirable for admins
10:34:15 <strigazi> brtknr: in private clouds you have shared resources
10:34:33 <strigazi> in public too, but not as much as in private
10:34:58 <strigazi> and as I mentioned, passing the key in user_data will work in all cases.
10:35:16 <strigazi> does this sound bad to you ^^
10:36:37 <brtknr> how does it limit who is allowed to make changes to the cluster or not?
10:36:53 <brtknr> or is it not limited at all?
10:36:55 <strigazi> it does not
10:37:16 <brtknr> sounds a little worrying lol
10:38:19 <brtknr> how about a heat parameter that is a list of users that are allowed to make changes
10:38:33 <brtknr> or we assume that anyone in the project is allowed to make changes
10:39:10 <strigazi> you know about this right? https://github.com/strigazi.keys
10:39:36 <strigazi> let's take this offline, I need to explain the problem more I guess
10:39:46 <strigazi> it is a limitation of nova
10:39:54 <strigazi> not magnum or heat
10:40:18 <sfilatov> Let's talk about k8s loadbalancer deletions then
10:40:32 <strigazi> I have one more thing for upgrades, sorry
10:40:36 <sfilatov> sure
10:41:12 <strigazi> For the past weeks, I have been trying to drain nodes before rebuilding them
10:41:49 <strigazi> The issue is that this api call must be executed before every node rebuild
10:42:01 <strigazi> so it must be in the heat workflow
10:42:22 <strigazi> otherwise heat is not managing the status
10:42:32 <strigazi> of the infrastructure anymore
10:43:16 <strigazi> I'm trying this pattern so far: http://paste.openstack.org/show/726098/
10:43:45 <strigazi> with no success so far
10:44:14 <strigazi> I'm thinking of putting the workflow in the master or in magnum
10:45:07 <sfilatov> And btw, is draining the node the only right way to do this? Are there issues behind upgrading in-place?
10:45:17 <sfilatov> downtime?
10:45:26 <strigazi> in-place there is no such problem
10:46:04 <sfilatov> yes, but the workflow you are considering is about draining and rebuilding the nodes
10:46:08 <strigazi> but it means the OS must support upgrades in place, and if you have upgraded a few times
10:46:15 <flwang1> can you remind me of the limitations of in-place upgrades?
10:46:17 <strigazi> things will go wrong
10:47:10 <strigazi> 1. GKE and cluster-api are not doing in-place
10:47:33 <strigazi> 2. upgrading an OS in-place is not an atomic operation
10:48:28 <strigazi> rebuild works even in ironic
10:49:09 <strigazi> the suggested way from the lifecycle sig is replace
10:49:25 <strigazi> only master nodes in-place
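For reference, a simplified sketch of the per-node drain step described above, assuming the kubernetes Python client (class names vary across client versions; a real kubectl drain also handles DaemonSets, mirror pods and PodDisruptionBudgets). All names here are illustrative.

from kubernetes import client, config

def drain_node(node_name):
    """Cordon a node and evict its pods, roughly what `kubectl drain` does."""
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # cordon: mark the node unschedulable so nothing new lands on it
    v1.patch_node(node_name, {"spec": {"unschedulable": True}})

    # evict every pod currently scheduled on the node
    pods = v1.list_pod_for_all_namespaces(
        field_selector="spec.nodeName={}".format(node_name))
    for pod in pods.items:
        eviction = client.V1beta1Eviction(
            metadata=client.V1ObjectMeta(name=pod.metadata.name,
                                         namespace=pod.metadata.namespace))
        v1.create_namespaced_pod_eviction(
            pod.metadata.name, pod.metadata.namespace, eviction)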
10:50:14 <sfilatov_> sry, I'm back
10:50:38 <strigazi> flwang1: sfilatov_ thoughts?
10:50:54 <sfilatov_> I don't see the history since I reconnected
10:51:20 <strigazi> < flwang1> can you remind me of the limitations of in-place upgrades?
10:51:20 <sfilatov_> strigazi: could you copy and paste it pls
10:51:23 <flwang1> strigazi: fair enough
10:51:27 <strigazi> 1. GKE and cluster-api are not doing in-place
10:51:31 <strigazi> 2. upgrading an OS in-place is not an atomic operation
10:51:36 <strigazi> rebuild works even in ironic
10:51:40 <strigazi> the suggested way from the lifecycle sig is replace
10:51:44 <strigazi> only master nodes in-place
10:52:33 <strigazi> with multimaster you can even replace masters one by one with no downtime
10:52:39 <sfilatov_> strigazi: what do you mean by upgrading the OS?
10:53:09 <strigazi> kernel versionN to kernel versionN+1
10:53:24 <sfilatov_> ok, got it
10:53:40 <strigazi> have you ever upgraded docker? it is so nice
10:53:57 <strigazi> but mostly the kernel
10:54:06 <sfilatov_> strigazi: that's true
10:54:17 <sfilatov_> strigazi: are you considering rebuilding masters as well?
10:54:34 <strigazi> yes, with a param
10:54:35 <sfilatov_> strigazi: looks like we have more or less the same issues with this
10:55:26 <sfilatov_> strigazi: I agree then. I looked through the API in the upgrade patch
10:55:39 <sfilatov_> strigazi: and it seems we need nodegroups implemented
10:55:52 <strigazi> let's move this to gerrit then
10:55:57 <sfilatov_> ok
10:55:59 <strigazi> sfilatov_: about delete?
10:56:02 <sfilatov_> yes
10:56:04 <strigazi> what is the issue?
10:56:15 <strigazi> I mean, I know the issue
10:56:23 <strigazi> what is the solution(s)?
10:56:29 <sfilatov_> I have almost prepared a patch with software deployments for deletions
10:56:50 <strigazi> with an on-delete SD?
10:56:54 <sfilatov_> yes
10:56:56 <strigazi> push
10:57:06 <sfilatov_> I'd like to discuss 2 issues
10:57:12 <strigazi> shoot :)
10:57:23 <sfilatov_> We still need to wait for the LB in neutron
10:57:41 <sfilatov_> since the cloud provider does not support waiting for LB deletion
10:57:51 <sfilatov_> we can't wait using kubectl
10:58:11 <strigazi> hmm, that is not nice
10:58:29 <strigazi> flwang1: maybe kong has some input for this?
10:58:30 <flwang1> how do you wait for the LB in neutron?
10:58:41 <strigazi> you ask the api I imagine
10:58:43 <sfilatov_> you get the LB by name
10:58:46 <sfilatov_> yes
10:58:59 <sfilatov_> since you know LB name = 'a' + k8s svc id
10:59:09 <sfilatov_> but it's not really nice
10:59:13 <flwang1> and polling the neutron api to see if it's still there?
10:59:20 <sfilatov_> yep
10:59:24 <flwang1> hmm...
10:59:44 <sfilatov_> we can fix this via the cloud provider
10:59:53 <strigazi> 1. must be solved in the cloud-provider
11:00:01 <strigazi> 2. polling as a workaround
11:00:10 <sfilatov_> got it
11:00:12 <strigazi> lgty?
11:00:15 <strigazi> flwang1: ^^
11:00:24 <flwang1> i'm ok with that
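For reference, a minimal sketch of the polling workaround agreed above, assuming openstacksdk with an Octavia (LBaaS v2) endpoint; the cloud name is illustrative and the LB naming follows the 'a' + service UID convention mentioned in the discussion.

import time

import openstack

def wait_for_lb_deletion(service_uid, timeout=300, interval=10):
    """Poll the load balancer API until the LB for a k8s Service is gone."""
    conn = openstack.connect(cloud='mycloud')  # illustrative cloud name
    lb_name = 'a' + service_uid                # cloud provider naming convention
    deadline = time.time() + timeout
    while time.time() < deadline:
        # if no LB with that name remains, the deletion has finished
        if not any(lb.name == lb_name
                   for lb in conn.load_balancer.load_balancers()):
            return True
        time.sleep(interval)
    return False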
11:00:24 <sfilatov_> the other issue
11:00:35 <sfilatov_> what if the user stopped the vms
11:00:42 <sfilatov_> basically - shutdown
11:00:48 <sfilatov_> I have faced the issue
11:00:59 <strigazi> why? I don't get it
11:01:01 <sfilatov_> and there is nothing I can do about it
11:01:11 <sfilatov_> vms are shutdown
11:01:18 <sfilatov_> and the k8s api is not available
11:01:37 <sfilatov_> so when delete is triggered we can't delete the resources
11:02:32 <flwang1> hmm... i'm wondering if magnum should take care of such a corner case
11:02:34 <strigazi> there is no solution for this
11:02:43 <strigazi> only what flwang1 said
11:03:00 <strigazi> if we do it for the corner case though
11:03:14 <flwang1> for this case, the user needs to open a support ticket to ops
11:03:20 <flwang1> and get it removed :D
11:03:32 <strigazi> or he can remove it manually
11:03:38 <flwang1> so magnum ops don't get too bored
11:03:51 <sfilatov_> there's a way
11:03:57 <sfilatov_> to solve all the issues
11:04:10 <flwang1> i don't even think magnum should just bravely delete a cluster and everything on the cluster
11:04:20 <sfilatov_> if we add cluster_id to the lb metadata
11:04:40 <flwang1> sfilatov_: Lingxian Kong is working on that
11:04:41 <strigazi> and then?
11:04:45 <sfilatov_> and delete lbs based on their metadata
11:05:02 <sfilatov_> in this case we don't need to access the k8s API
11:05:05 <flwang1> that's the current solution
11:05:28 <strigazi> is there anything stopping us from doing this now?
11:05:40 <sfilatov_> we need to patch the cloud provider
11:06:40 <strigazi> flwang1: Lingxian's patch is not in?
11:06:58 <flwang1> https://github.com/kubernetes/cloud-provider-openstack/pull/223
11:07:31 <flwang1> they're just putting the cluster name in the lb description
11:07:41 <flwang1> so we're ok to go with the current way i think
11:08:02 <sfilatov_> so there's no need for my patch?
11:08:10 <flwang1> i guess so?
11:08:15 <sfilatov_> with software deployment
11:08:16 <flwang1> if you're happy with this way
11:08:18 <strigazi> we still need a patch
11:08:25 <strigazi> in magnum
11:08:51 <flwang1> i can propose a new patch set on this https://review.openstack.org/#/c/497144/
11:09:04 <flwang1> to check the cluster name
11:09:11 <flwang1> then we should be ok
11:09:22 <strigazi> uuid is better I guess
11:10:00 <flwang1> strigazi: i think so, there is probably a limitation, i will check with the CPO team
11:10:07 <strigazi> folks, anything else? we are 10 mins late and I'm 10 mins late for another meeting
11:10:16 <strigazi> flwang1: it accepts a string
11:10:21 <flwang1> good for me
11:10:24 <strigazi> flwang1: it can be anything
11:10:42 <flwang1> i mean it may be hard for CPO to get the UUID of magnum's cluster
11:11:26 <strigazi> flwang1: it looks like a generic parameter to me, let's see
11:11:27 <flwang1> unless we pass it somewhere so that CPO can easily get it, just my guess
11:11:36 <flwang1> need to check with the author
11:11:42 <strigazi> cool
11:11:56 <strigazi> sfilatov_: anything else?
11:12:42 <flwang1> strigazi: i think you're good to go ;)
11:12:47 <strigazi> let's wrap this up then
11:12:54 <strigazi> thanks flwang1 sfilatov_ and brtknr
11:12:55 <strigazi> #endmeeting
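For reference, a minimal sketch of the description-based cleanup discussed at the end of the meeting, assuming openstacksdk with an Octavia endpoint. It only illustrates the idea: the cluster identifier placed in the LB description by the linked cloud-provider-openstack change is matched here, the cloud name is illustrative, and the cascade flag is an assumption about the SDK and deployment, not a confirmed magnum implementation.

import openstack

def delete_cluster_loadbalancers(cluster_id):
    """Delete load balancers whose description carries the cluster identifier."""
    conn = openstack.connect(cloud='mycloud')  # illustrative cloud name
    for lb in conn.load_balancer.load_balancers():
        if cluster_id in (lb.description or ''):
            # cascade removes listeners/pools/members together with the LB
            # (assumed to be supported by the Octavia deployment)
            conn.load_balancer.delete_load_balancer(lb.id, cascade=True)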