14:30:39 #startmeeting k8s integration discussion
14:30:40 Meeting started Wed Mar 16 14:30:39 2016 UTC and is due to finish in 60 minutes. The chair is apuimedo. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:30:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:30:45 The meeting name has been set to 'k8s_integration_discussion'
14:30:53 Gal is not able to join today
14:30:54 Who is here for the meeting?
14:30:59 o/
14:31:11 I am
14:31:21 I am
14:31:26 banix?
14:31:34 I happen to be here.
14:31:35 morning
14:31:37 o/
14:31:41 o/
14:31:52 fawadkhaliq: good for you that you moved to DST this week
14:31:53 vikasc: nice to see you after some time :-)
14:32:19 salv-orlando: yes :)
14:32:28 #info irenab, mspreitz, salv-orlando, tfukushima, fawadkhaliq, banix, vikasc and apuimedo present
14:32:58 apuimedo: thanks apuimedo, I was not well!
14:33:10 vikasc: hope you are very well now :-)
14:33:22 apuimedo: perfect now
14:33:24 :)
14:33:33 We are here today to discuss the status of the two efforts that were proposed in the previous one of these meetings
14:33:58 One effort is the CNI->CNM translation integration, led by mspreitz
14:34:22 The other is the direct k8s/CNI support, led by irena
14:34:47 #topic direct CNI->CNM translation
14:34:59 #link https://github.com/kubernetes/kubernetes/pull/21956/commits/06ad511c055e8a9eb8cadf0e128d413ccdcdf64f
14:35:10 #link https://review.openstack.org/#/c/290172/
14:35:38 Above are the commit that mspreitz submitted to k8s upstream and the devref he has submitted here for review
14:36:01 mspreitz: would you like to give a short summary of the status?
14:36:03 that commit is just T0; the devref is T1 and T2
14:36:07 sure
14:36:26 thanks
14:36:30 The k8s PR is a very basic CNI plugin, but it covers most of what we need in the no-policy case.
14:36:36 mspreitz: the thing you pushed for k8s... is it just a sample of how to do CNI, or do you have other plans?
14:37:08 The devref for T1 is about how to implement the Network Policy proposal from the k8s network SIG, still single-tenant
14:37:17 The devref for T2 adds multi-tenancy.
14:37:50 The T0 plugin I have tested; it is very simple and does what it is intended to do.
14:37:58 I have started implementing T1, but am just getting started.
14:38:04 alright. Let's go T by T
14:38:20 sure
14:38:33 apuimedo: and eventually we'll get to Mr T
14:38:43 I pity the fool!
14:39:37 So has anybody looked at the T0 plugin?
14:39:42 T0 gets the Neutron network from the CNI config file
14:39:51 yes
14:40:06 then it talks to Neutron to get a port and returns the IP information as per the CNI contract, from what I saw
14:40:22 (through libnetwork connect and disconnect)
14:40:24 yes, but not directly. It goes through Docker's network CLI
14:40:31 exactly
14:40:53 More precisely, it goes through `docker network` to the Kuryr libnetwork plugin
14:40:58 apuimedo, mspreitz: it could then work regardless of Neutron
14:41:11 yes, the T0 plugin is not actually specific to Kuryr
14:41:12 salv-orlando: yes. It is just CNI to CNM
14:41:17 it works with any Docker network
14:41:23 yeah, makes sense
14:41:46 the only limitation is that it is pod level only, so you don't get the service stuff
14:42:17 That kind of information does not reach CNI presently
14:42:20 The idea of T0 is that this is within the k8s networking context...
14:42:36 in which every pod and every host can open a connection to any pod.
14:42:48 Thus the existing k8s service load balancers work.
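
For illustration only, here is a minimal sketch of the T0 ADD flow as described above: read the pre-created network from the CNI config on stdin, delegate to `docker network` (and through it to the Kuryr libnetwork plugin), and report the resulting IP per the CNI contract. This is not the actual k8s PR; the structure, config keys and inspect template are assumptions.

    #!/usr/bin/env python
    # Sketch of a T0-style CNI ADD handler (illustrative, not the real PR).
    import json
    import os
    import subprocess
    import sys

    def cni_add():
        conf = json.load(sys.stdin)            # CNI network config from stdin
        network = conf['name']                 # pre-created Docker network
        container_id = os.environ['CNI_CONTAINERID']

        # Delegate the real work (Neutron port creation, IPAM, binding)
        # to the Kuryr libnetwork plugin sitting behind `docker network`.
        subprocess.check_call(
            ['docker', 'network', 'connect', network, container_id])

        # Ask Docker which address the container got on that network.
        tmpl = ('{{(index .NetworkSettings.Networks "%s").IPAddress}}/'
                '{{(index .NetworkSettings.Networks "%s").IPPrefixLen}}'
                % (network, network))
        ip = subprocess.check_output(
            ['docker', 'inspect', '-f', tmpl, container_id]).strip()

        # Reply per the CNI contract: a JSON result on stdout.
        json.dump({'cniVersion': '0.1.0',
                   'ip4': {'ip': ip.decode()}}, sys.stdout)

    if __name__ == '__main__':
        if os.environ.get('CNI_COMMAND') == 'ADD':
            cni_add()
        # DEL would run `docker network disconnect` symmetrically.

Note how this plugin never talks to Neutron itself, which is why, as said above, it would work with any Docker network driver, not just Kuryr.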
14:42:58 so you assume kube-proxy takes care of the services?
14:43:02 mspreitz: sure. I get that. I'm just giving information about which gaps it fills and which it does not
14:43:16 irenab: yes. It relies on kube-proxy
14:43:28 The T0 plugin assumes that something else has completed the functional requirements: that is, enabled every host to open connections to every pod.
14:43:44 As discussed in the T1 devref, this can be accomplished in Neutron by adding some routes.
14:44:09 so the vendors that implement T0 must have the network configured so that the containers are accessible from the hosts (for example by it being a public external network)
14:44:23 irenab: in short, I am confirming that I am talking about enabling kube-proxy to work.
14:44:34 apuimedo: right
14:44:44 ;-)
14:44:46 mspreitz: how is IPAM done?
14:45:01 irenab: however the Docker network does it
14:45:14 the T0 plugin is just a piece of glue
14:45:17 irenab: CNI -> libnetwork -> Kuryr IPAM -> Neutron
14:45:27 so a subnet per host is not assumed
14:45:31 right
14:45:42 irenab: no
14:45:47 this is the comment that is being discussed on the spec I pushed
14:46:21 irenab: I am not sure I follow. But I have not read the most recent comments.
14:46:24 yet
14:46:59 alright. Did everybody properly follow the assumptions, preconditions and workflow of T0?
14:47:25 yes, CNI for libnetwork
14:47:34 yes
14:47:43 the rest is native k8s
14:47:49 salv-orlando: tfukushima: ?
14:47:53 yes
14:47:55 vikasc: ?
14:47:56 yes
14:48:04 Yes.
14:48:41 alright. Let's move on to T1
14:48:48 mspreitz: short intro, please
14:48:56 sure...
14:49:23 The k8s network SIG has been defining a new concept in k8s called "network policy", by which a k8s user can declare what connectivity is allowed.
14:50:00 It is deliberately more abstract than Neutron. Rather than talking about networks, subnets, firewalls, routes, and so on, it talks about which endpoints can open connections to which endpoints, that's all
14:50:17 it is designed, however, in a way that maps pretty directly onto Neutron security groups.
14:50:27 Both are additive sets of rules about allowed connections.
14:50:41 So the T1 devref specifies the mapping from network policies to Neutron security groups.
14:50:48 end of intro.
14:51:11 thanks mspreitz
14:51:38 mspreitz: by design of k8s "network policy", the network definition etc. still has to be defined separately, right? Or are you saying this is a policy-based type of networking, where everything is just taken care of by magic?
14:51:56 No magic here
14:52:06 It is a matter of deliberate abstraction.
14:52:13 mspreitz: IIUC, to what you have in T0 you'd add a policy agent that watches the API for policy needs and creates SGs
14:52:15 mspreitz: I experimented with mappings of policy ingress to security groups and ended up with pretty much the same understanding as you
14:52:18 The idea is that the k8s user *never* talks about networks. He only talks about connectivity.
14:52:20 mspreitz: okay. makes sense.
14:53:30 frankly, talking about a network as a "domain where things can connect to each other" is not relevant to what we have today, because kubernetes does everything it can to ensure there can be only one of these things
14:53:38 It is up to the implementation to map allowed connectivity into lower level details.
14:53:50 but the mapping between security groups and network policies makes a lot of sense
14:54:34 I think the obvious choice is that a k8s namespace equates to a Neutron virtual ethernet.
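
mspreitz's exact mapping lives in the T1 devref, not in this log; the following is only a hedged sketch of the shape such a policy-to-SG translation could take with python-neutronclient. The SG names, credentials and the remote group ID are all invented placeholders.

    # Sketch: one additive allowFrom-style rule becomes a Neutron security
    # group with a matching ingress rule (illustrative placeholders only).
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='...',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    # One SG per policy, mirroring the policy's additive semantics.
    sg = neutron.create_security_group({
        'security_group': {'name': 'k8s-ns-frontend-allow-web',
                           'description': 'derived from a k8s network policy'}})
    sg_id = sg['security_group']['id']

    # "pods selected by X may open TCP/80 connections to pods selected by Y"
    # becomes an ingress rule whose remote_group_id is the SG holding the
    # ports of the X pods.
    neutron.create_security_group_rule({
        'security_group_rule': {
            'security_group_id': sg_id,
            'direction': 'ingress',
            'protocol': 'tcp',
            'port_range_min': 80,
            'port_range_max': 80,
            'remote_group_id': 'SG_OF_ALLOWED_SOURCE_PODS',
        }})

Because both models are purely additive allow-lists, each policy rule can map to one SG rule with no conflict resolution needed, which is the point made at 14:50:27 above.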
14:54:40 frankly my understanding is that you just need that one additional step due to translating pods and namespaces into source IPs
14:54:45 for me it looks very much like GBP talking about app policies, while network policy is not the app's concern
14:55:03 because in k8s network policy there is no allowed non-IP traffic between k8s namespaces.
14:55:06 mspreitz: I am not 100% convinced about the statement that k8s implicitly forbids non-IP traffic. IIRC it was said that it was up to the network plugin
14:55:16 irenab: exactly my thoughts
14:55:17 but that it was common to do it like that
14:55:28 "GBP"?
14:55:29 mspreitz: you have to make that choice in the plugin or in any logic which sits outside of kubernetes anyway
14:55:51 yeah, the British pound
14:55:57 mspreitz: GBP -> group based policy
14:55:58 it's roughly 1.27 to the euro
14:56:12 I think the decision that k8s network policy forbids non-IP traffic between namespaces and says nothing else about non-IP traffic was agreed in a meeting and never written down.
14:56:16 let us not talk about GBP. it still hurts.
14:56:32 hey, for the young people, what is GBP?
14:56:35 banix: conceptually :-)
14:56:49 why, banix, did you move all your savings into the UK before the dollar strengthened? :)
14:56:51 group based policy
14:56:52 (young to the OSt world)
14:57:01 banix: :-)
14:57:09 :)
14:57:17 you should put it in CZK
14:57:24 eastern currency best currency
14:57:43 (I lost 10% of my money just so that car makers would be happy...)
14:57:52 Group Based Policy was the moment at which the neutron community came very close to breaking in two
14:58:05 as someone said, they are devaluing their currency, we are going to stop it, and make them pay for it!
14:58:06 but I don't think this is relevant at all here
14:58:14 So we are agreed to ignore group based policy here and now?
14:58:28 mspreitz: yes, that was just a comment :)
14:58:34 OK.
14:58:40 So getting back to T1...
14:58:44 mspreitz: from the T1 devref
14:59:00 There is an unwritten detail about non-IP traffic that implies k8s namespace = neutron virtual ethernet.
14:59:15 I missed what puts the ports of a namespace into different SGs. I take it that it is the policy agent
14:59:46 apuimedo: the policy agent ensures that it is eventually consistent. The CNI plugin speeds that up.
14:59:52 but the algorithm for creating the needed SGs and doing the assignment once the nets and ports are created was a bit fuzzy for me
15:00:34 mspreitz: does it check the thirdparty resources for networkpolicy elements?
15:00:38 yes
15:00:47 it monitors, keeping an in-memory cache
15:01:04 keeping the cache up to date, incrementally responding to changes, after the startup transient.
15:01:19 so the burden of converting the allowFrom (ingress and egress) into network policies is on the k8s API, then, right?
15:01:51 no, allowFrom is part of the syntax of a network policy
15:01:53 apuimedo: I'm not sure, but it would be great if the api server did something like that
15:01:57 allowFrom is the policy
15:02:12 but I'm afraid the burden is on the agent
15:02:13 apuimedo: a watcher of policy that translates to SGs and assigns ports
15:02:24 The k8s api server just stores the network policies and provides events about their changes.
15:02:36 It is all up to us to interpret the policies
15:02:39 banix: mspreitz: if my understanding is right, there are several levels of policy and overrides that you can have: service, pod and namespace
15:03:02 is the current policy object a frankenstein with overrides for each definition?
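
As a rough illustration of the monitoring loop mspreitz describes (watch, in-memory cache, incremental response to changes): the resource path, group name and object shapes below are invented, since third-party resources were still experimental at this point.

    # Sketch of a policy agent's watch loop (hypothetical API path and shapes).
    import json
    import requests

    API = 'http://127.0.0.1:8080'  # assumed insecure local apiserver endpoint
    CACHE = {}                     # name -> latest policy object

    def watch_network_policies():
        # `watch=true` keeps the HTTP response open; the apiserver streams one
        # JSON event per line: {"type": "ADDED"|"MODIFIED"|"DELETED", "object": {...}}
        url = (API + '/apis/experimental.example.org/v1/'
                     'namespaces/default/networkpolicies?watch=true')
        resp = requests.get(url, stream=True)
        for line in resp.iter_lines():
            if not line:
                continue
            event = json.loads(line)
            obj = event['object']
            name = obj['metadata']['name']
            if event['type'] == 'DELETED':
                CACHE.pop(name, None)   # here: tear down SGs derived from it
            else:
                CACHE[name] = obj       # here: (re)create the matching SGs

    if __name__ == '__main__':
        watch_network_policies()

As noted at 14:59:46, the agent only guarantees eventual consistency; the CNI plugin can apply the cached result at pod creation time to speed things up.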
15:03:09 apuimedo: actually it is deliberately simple, no overrides
15:03:18 it is only additive, just like neutron SGs
15:03:36 the k8s net policy is just a set of additive rules
15:03:39 mspreitz: good then
15:04:04 I talked to another vendor at KubeCon that has added more structure, so they can have negatives and overrides
15:04:12 but that is not in the k8s net sig policy.
15:04:54 Because it is just a set of additive rules, it maps easily to Neutron security groups.
15:05:04 salv-orlando: it is almost a matter of necessity, if they want to keep the networking generic and without vendor lock-in, that the translation into useful enough policy objects is done by the API. Otherwise it is going to be a mess
15:05:45 apuimedo: I do not understand what you are saying. The k8s api server just stores the policy objects, it does not translate them into anything.
15:05:46 mspreitz: do you have an example response from the API for the thirdparty creation for people to see? It is generally good to include those kinds of things in a devref
15:06:02 apuimedo: good point, I will add examples.
15:06:18 both the definitions in the service description, the response from the API, and example SG creation
15:06:26 I do not have examples of translation to SGs, but I think you can find example policy source in the k8s net sig work.
15:07:02 mspreitz: It would be good to come up with a tentative one for the guestbook example and put it in the devref
15:07:12 apuimedo: will do.
15:07:17 some people understand better by example than by explanation. So it is good to have both
15:07:18 mspreitz: yeah
15:07:34 agreed, just an oversight in my hurry.
15:07:37 apuimedo: +1
15:07:39 We're starting to be a bit short on time
15:07:48 8 minutes for T2
15:07:54 apuimedo: as you attend the sig-network meetings quite regularly you surely know what approach they're adopting. Third party resources with completely external processing are a stopgap while we experiment
15:07:59 and then we go with irenab
15:08:32 apuimedo: I would not worry too much about what the API server does or doesn't do at this stage
15:08:33 actually T2 overlaps with one of the biggest open issues in irena's proposal, namely multi-tenancy
15:08:42 salv-orlando: I know it's a stopgap, but I'm not sure if it will just graduate to the regular API in 1.3 or they are going to leave it for third parties
15:09:42 So my proposal for T2 is that each tenant gets its own router, and its Neutron networks attach to its router
15:09:50 I would wait with multi-tenancy for the next phase
15:09:58 But since k8s requires all this reachability, we have to augment that with a lot of routes.
15:10:12 mspreitz: just so that everyone follows: here we are talking about ad-hoc multitenancy, i.e. multitenancy from the OSt point of view, previous to any multi-tenancy that may eventually be brought in by k8s
15:10:13 apuimedo: third party APIs in k8s are not really APIs... just a way to stash blobs of arbitrary JSON data
15:10:25 so it'll have to graduate to become a thing
15:10:38 actually this is an important point, MT in k8s
15:10:44 It is not there now, as we all know.
15:10:45 salv-orlando: I sure hope it does, but it's not going to be for 1.2, right?
15:10:55 apuimedo: most definitely not
15:10:56 I have been supposing that some of us will wedge it in there somehow real soon.
15:11:31 (regarding 3rd party resource graduation: yes, it will happen, details TBD, not in 1.2)
15:11:33 salv-orlando: that's my point. Until it does, policy and, to a bigger extent, multi-tenancy, are scaffolding
15:11:53 that is useful for our OSt + k8s operators
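
The log does not record the actual Neutron calls behind "we have to augment that with a lot of routes" (15:09:58); as an assumed sketch, publishing one such route on a tenant's router via python-neutronclient could look like the following. The router ID, pod CIDR and nexthop are placeholders.

    # Sketch: satisfy the host-to-pod reachability requirement by adding a
    # static route toward one node's pod subnet on the tenant router.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='...',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    # Route traffic destined to that node's pod network via the node address,
    # so kube-proxy and host-to-pod connections keep working.
    neutron.update_router('ROUTER_UUID', {
        'router': {
            'routes': [
                {'destination': '10.244.1.0/24', 'nexthop': '192.168.0.11'},
            ],
        },
    })

One such entry would be needed per node, which is exactly the proliferation of routes mspreitz says below he is unhappy about.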
15:12:05 Some of us are going to do multi-tenancy in k8s real soon, by whatever hook and crook we need to get it creaking along.
15:12:12 but that will be in motion, and we probably can't guarantee BC for it
15:12:21 mspreitz: indeed
15:12:23 Right now, I am thinking of a wrapper that adds multi-tenancy and sharding.
15:12:31 sharding is beyond the scope of this discussion.
15:12:48 mspreitz: sharding not based on namespaces?
15:12:53 The important point, I think, is only this: each k8s namespace will belong entirely to exactly one tenant.
15:13:08 the tenant can be specified as a label on the Pod/Namespace
15:13:14 So we can put a label on a k8s namespace that identifies the tenant.
15:13:41 mspreitz: when you say exactly one tenant, is that 1 - 1 or 1 - N?
15:13:55 sounds reasonable
15:13:56 namespace -> tenant is N -> 1
15:14:04 good. I can agree with that
15:14:45 And that is why I suppose each k8s namespace has a label identifying the tenant
15:14:56 I hope that we all agree that whatever we build for policy and multitenancy will have to be reviewed and possibly changed without BC once the APIs graduate in k8s
15:15:08 "BC"?
15:15:23 backwards compatibility
15:15:28 British Columbia
15:15:29 right
15:15:39 the whole point of the current status is that it is not stable
15:15:47 or before Christ
15:15:48 banix: we were there already ;-)
15:15:51 no promise of stability to users
15:16:10 I don't think anyone is even considering backward compatibility
15:16:19 that was my position
15:16:24 alright. Because our contract with Magnum is that we have to adapt to fit what k8s ends up doing
15:16:28 anyway, back to achieving the MT.
15:16:35 the kubernetes folks are rather pragmatic... they won't enforce something like that on a feature they know no one is using
15:16:37 good that we all understand that
15:16:40 I am not happy with requiring lots of routes
15:16:58 I am also not happy that Neutron routers do not exchange routes. Why is that, anyway?
15:17:20 mspreitz: you mean like implicit BGP?
15:17:23 exchange routes?
15:17:25 right
15:17:27 can we please get to the point of the different approaches? We have less than 15 mins
15:17:31 heh
15:17:45 #topic direct k8s/cni
15:17:52 irenab: the floor is yours
15:17:57 irenab is right, we've been digressing on several non-essential points
15:18:24 As I see it, the main difference with mspreitz's approach is on the CNI side
15:18:33 a native CNI plugin that works similarly to the Nova plugging, plus an API watcher
15:19:05 the native API watcher is not that different from what mspreitz proposes for networkpolicy monitoring
15:19:11 meaning that the Neutron API calls are done by the watcher and not by CNI
15:19:11 and it could take on that job
15:19:26 CNM is not involved
15:19:27 CNI gets all the required data to complete the local binding
15:19:37 it gets rid of kube-proxy
15:20:00 apuimedo: hope our notes are synchronized ;-)
15:20:00 apuimedo: I thought the floor was Irena's :)
15:20:21 banix: it is. I'm just wiping it
15:20:23 banix: we both are deeply involved :-)
15:20:24 * apuimedo janitor
15:20:33 :)
15:20:36 it allows the usage of FIPs and LBs, which are common tools for OSt operators
15:20:44 irenab: continue please
15:21:12 (the floor is clean now. /me puts up the slippery floor sign)
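
Purely as a hypothetical illustration of the labeling convention mspreitz proposes above (the "tenant" label key and the fallback are invented; nothing here was agreed in the meeting):

    # Sketch: resolve the Neutron tenant for a pod from its namespace label.
    import requests

    API = 'http://127.0.0.1:8080'  # assumed local apiserver endpoint

    def tenant_for_namespace(namespace):
        ns = requests.get('%s/api/v1/namespaces/%s' % (API, namespace)).json()
        labels = ns['metadata'].get('labels', {})
        # N -> 1: many namespaces may carry the same tenant label;
        # a namespace without one could fall back to a default tenant.
        return labels.get('tenant', 'default')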
15:21:27 I think we have all the details already
15:21:40 * mspreitz thinks "we" does not include me
15:21:47 the watcher gets k8s events and calls Neutron
15:21:54 mspreitz: this is the place to ask
15:22:12 CNI receives the IP, port UUID and MAC address via pod annotations and performs the local binding
15:22:40 personally I prefer not using CNM, but I do like the scaffolding for policy and tenancy that mspreitz is pushing
15:22:50 mspreitz: I meant apuimedo and myself by 'we'
15:23:16 apuimedo: for me it is complementary and can be achieved by the watcher
15:23:47 fawadkhaliq: salv-orlando: vikasc: tfukushima: mspreitz: thoughts? Questions?
15:23:54 banix: you too, okay
15:23:55 so the policy agent that mspreitz mentioned is one of the watchers
15:23:57 I have some questions. Since we already have a libnetwork plugin that does local binding, why avoid using it?
15:24:20 yeah... can you spend some words on why CNI frees you up to use floating IPs?
15:24:28 I just don't understand why CNM would not
15:25:01 not sure how CNI/CNM relates to FIPs
15:25:02 salv-orlando: the API watcher doesn't watch just for Pods (which is what CNI gets info about)
15:25:09 it watches for services
15:25:20 which can have externalIP requests defined
15:25:29 as well as the LoadBalancer type
15:25:36 and replaces kube-proxy
15:25:38 apuimedo: ok... but then if it was the CNI/CNM thing mspreitz did, it could do the same
15:25:57 the API watcher, when it sees those, does the FIP/LB assignment to the port it creates for CNI
15:26:05 cool guys, you were then referring to the whole solution, not the CNI plugin only.
15:26:06 Q2: it looks like Irena wants to replace kube-proxy with Neutron load balancers. Can we factor that into a separate spec/devref? It has lots of issues.
15:26:14 salv-orlando: exactly
15:26:19 I think there are several independent issues we need to discuss
15:26:36 1. CNI approach
15:26:46 2. Service implementation
15:26:59 3. External IP support
15:27:00 3. Policy
15:27:08 ui, got my numbering wrong :P
15:27:32 42. tenancy
15:27:44 (that padding should give enough space)
15:27:47 * salv-orlando apuimedo knows 42 is always the right answer
15:28:01 :-)
15:28:10 I guess avoiding CNM makes sense if we try to consider other container engines. Otherwise, why?
15:28:30 which is something we should do sooner or later… ok, perhaps later
15:28:33 and k8s supports rkt
15:28:34 banix: I think we are digressing too much into this CNI/CNM dichotomy
15:28:35 I think avoiding `docker network` makes no sense, it leaves Docker misinformed about connectivity.
15:28:46 banix: there are two extra reasons
15:29:02 rkt and host access to the Neutron APIs
15:29:18 frankly I am ok with supporting both approaches... and then maybe we can use the pure CNI to provide Neutron networking with rkt or whatever
15:29:35 salv-orlando: I wanted to say the same :-)
15:29:40 salv-orlando: that's what I proposed from the beginning
15:29:51 that we'd have a CNI->CNM translation
15:29:53 I would love to focus more on the "watcher", which will give us 2, 4 (apuimedo's 3) and maybe also 3
15:29:53 sounds good
15:29:54 now
15:29:57 seems we agree
15:30:04 just in time
15:30:24 time for a group hug
15:30:25 then build the CNI direct integration for rkt and other advanced stuff we may need to do
15:30:34 banix: :-)
15:30:44 one last thing. I'd like a watcher discussion next week
15:30:58 #action mspreitz to update the devrefs with examples
15:31:03 same time, same place, apuimedo?
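
As a loose sketch of the split irenab describes (watcher does all Neutron calls; CNI only reads the result from pod annotations and binds locally): the annotation keys and the local apiserver endpoint below are invented for illustration, though the K8S_POD_* entries in CNI_ARGS are the ones the kubelet's CNI driver sets.

    # Sketch of the direct-CNI ADD: no Neutron call from the plugin itself.
    import json
    import os
    import sys

    import requests

    API = 'http://127.0.0.1:8080'  # assumed local apiserver endpoint

    def cni_add():
        # CNI_ARGS looks like "IgnoreUnknown=1;K8S_POD_NAMESPACE=...;K8S_POD_NAME=..."
        args = dict(kv.split('=', 1)
                    for kv in os.environ['CNI_ARGS'].split(';'))
        pod_name = args['K8S_POD_NAME']
        namespace = args.get('K8S_POD_NAMESPACE', 'default')

        pod = requests.get('%s/api/v1/namespaces/%s/pods/%s'
                           % (API, namespace, pod_name)).json()
        annotations = pod['metadata']['annotations']

        # Everything needed for binding was published there by the watcher,
        # which already created the Neutron port (hypothetical keys).
        ip = annotations['kuryr.org/ip']        # e.g. "10.10.0.5/24"
        mac = annotations['kuryr.org/mac']
        port_id = annotations['kuryr.org/port-id']

        # ... create the veth pair, set mac/ip inside CNI_NETNS, and plug the
        # host end according to the port's binding details (ovs etc.) ...

        json.dump({'cniVersion': '0.1.0', 'ip4': {'ip': ip}}, sys.stdout)

    if __name__ == '__main__':
        if os.environ.get('CNI_COMMAND') == 'ADD':
            cni_add()

Since the watcher also sees Service objects, it is the natural place to react to externalIP and LoadBalancer requests with FIP/LB assignments, which is how this design replaces kube-proxy.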
15:31:12 apuimedo: I'm up for it
15:31:18 apuimedo: maybe we can push the devref by then
15:31:24 banix: yes. The same bat-hour on the same bat-channel
15:31:35 I won't be able to attend next week
15:31:36 irenab: which?
15:31:38 I would rather start 30 min earlier, if possible
15:31:51 mspreitz: sounds good to me
15:31:52 apuimedo: for the watcher
15:31:54 start at 14:00 UTC
15:32:03 irenab: yes. That's a good point
15:32:06 let's do that
15:32:11 ok with me
15:32:12 14 UTC everybody?
15:32:19 lgtm
15:32:19 ok with me too
15:32:31 apuimedo: thanks a lot for facilitating
15:32:33 It's better than 15:00 UTC.
15:32:37 very well. Thank you very much to all of you for joining. It was a very nice meeting
15:32:44 thank you all
15:32:45 #endmeeting