18:01:12 <daneyon_> #startmeeting container-networking
18:01:14 <openstack> Meeting started Thu Nov 19 18:01:12 2015 UTC and is due to finish in 60 minutes.  The chair is daneyon_. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:18 <openstack> The meeting name has been set to 'container_networking'
18:01:21 <daneyon_> Agenda
18:01:26 <daneyon_> #link https://wiki.openstack.org/wiki/Meetings/Containers#Agenda
18:01:38 <daneyon_> I'll give everyone a minute to review the agenda
18:02:17 <daneyon_> #topic roll call
18:02:36 <hongbin> o/
18:02:38 <Tango> o/
18:02:51 <gus> hi
18:03:17 <daneyon_> Thank you hongbin Tango gus for joining
18:03:19 <Tango> gus: Pretty early for you?
18:03:26 <eghobo_> o/
18:03:40 <gus> Tango: yep, 5am - so use small words ;)
18:03:52 <daneyon_> Thank you eghobo_ for joining
18:03:58 <suro-patz> o/
18:03:58 <daneyon_> #topic Flannel host-gw patch
18:04:03 <daneyon_> #link https://review.openstack.org/#/c/241866/
18:04:09 <adrian_otto> skipped roll call?
18:04:20 <adrian_otto> oh, lagged
18:04:23 <daneyon_> I think the community has settled on option 2 of the patch
18:04:49 <daneyon_> which means the patch has to be refactored since it implements option 3
18:05:11 <daneyon_> Thank you adrian_otto for joining
18:05:17 <Tango> So I tried out the host-gw backend on our atomic-5 image but couldn't get it to work.
18:05:29 <Tango> No connectivity between the containers
18:05:39 <daneyon_> anyone willing to refactor the host-gw patch to implement option 2?
18:05:46 <Tango> Wonder if there is anything else we need to configure
18:06:21 <Tango> daneyon_: I would be interested to carry it forward.
18:06:32 <daneyon_> Tango you may need to open up the neutron anti-spoofing iptables rules (enabled by default)
18:06:47 <daneyon_> Tango that's great
18:07:19 <Tango> daneyon_: Does that prevent rules from being added to iptables?  Because I saw no change in the NAT table.
18:07:21 <gus> Tango: yep, it needs to "spoof" IP, but not MAC.
18:07:29 <daneyon_> If you haven't done so already, can you leave a comment in the review letting everyone know you are going to carry it forward using option 2?
18:07:49 <Tango> daneyon_: Sure, will do that.
18:08:31 <daneyon_> Tango I have yet to play with the host-gw mode, but I would not be surprised if neutron's anti-spoofing is blocking it. ping me offline to discuss more
18:08:46 <daneyon_> i'll move the agenda forward
18:08:48 <Tango> daneyon_: ok, sounds good.
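A minimal sketch of the anti-spoofing workaround discussed above, assuming neutron's allowed-address-pairs extension is enabled; the flannel subnet (10.100.0.0/16) and the port ID are placeholder values, not anything from the meeting:

    # Illustrative only: let a bay node's neutron port carry traffic for the
    # (assumed) flannel container subnet, which neutron's default
    # anti-spoofing rules would otherwise drop.
    neutron port-update <node-port-id> \
        --allowed-address-pairs type=dict list=true ip_address=10.100.0.0/16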
18:08:55 <daneyon_> #topic Latest on Swarm CNM patches
18:08:59 <daneyon_> #link https://review.openstack.org/#/c/245286/2
18:09:03 <daneyon_> #link https://review.openstack.org/#/c/244978/
18:09:07 <daneyon_> #link https://review.openstack.org/#/c/244848/
18:09:20 <daneyon_> good news, 1 of the 3 merged... yay!!!!
18:09:30 <daneyon_> and the other 2 are well on their way.
18:09:56 <daneyon_> adrian_otto if you have time it would be much appreciated if you could review the final 2.
18:10:11 <daneyon_> i have addressed comments from previous patch sets
18:10:16 <daneyon_> and i think they are ready to go
18:10:34 <daneyon_> otherwise we need another core to +2 and +1 workflow
18:10:57 <daneyon_> then we'll have the flannel network-driver implemented for Swarm :-)
18:11:24 <daneyon_> unless there are any questions, I'll move the agenda forward.
18:11:28 <adrian_otto> daneyon_: I will make time to review them, thanks
18:11:43 <daneyon_> adrian_otto thank you sir.
18:11:46 <daneyon_> #topic Flannel for Mesos Update
18:11:59 <daneyon_> #link https://bugs.launchpad.net/magnum/+bug/1516767
18:11:59 <openstack> Launchpad bug 1516767 in Magnum "Add etcd support in mesos bay" [Undecided,New]
18:12:12 * gus is new to this one and the bug doesn't give background - why are we considering flannel for mesos?
18:12:22 <daneyon_> everyone take a moment to check out the link
18:13:02 <daneyon_> hongbin and i discussed not implementing the flannel driver for mesos until etcd is supported and that support has had a chance to land
18:13:21 <hongbin> Yes, it is
18:13:31 <gus> My understanding is that mesos uses host IPs (or things NATted to look like host IPs), and so doesn't need any additional routing help since the hosts already know how to reach each other..
18:13:32 <daneyon_> otherwise we have to make considerable changes to how magnum implements k/v stores
18:14:10 <daneyon_> gus, that is correct. mesos is using "legacy" docker networking... containers get nat'd by the host (vm in our case)
18:14:24 <gus> ok, so where does flannel come into it>
18:14:24 <gus> ?
18:14:35 <daneyon_> this mesos issue brings up an important point of the Magnum CNM...
18:14:58 <daneyon_> Not every network-driver needs to be implemented across all bay types
18:15:32 <hongbin> I agree
18:15:38 <gus> right - I don't think there's any need for any additional network driver for mesos?
18:15:42 <Tango> +1
18:15:45 <hongbin> If flannel doesn't fit into mesos, we don't have to support that
18:16:10 <daneyon_> Some network-drivers may provide significant value to a particular bay type, but not to others. We should be supportive of this instead of trying to jam a particular networking implementation across all bay types
18:16:14 <daneyon_> thoughts?
18:16:23 <Tango> Agree
18:16:42 <Tango> It may not make sense to fill out the matrix
18:17:05 <gus> I hadn't realised network drivers were being considered *across* bay types.  Yes, I strongly suggest network drivers should only be relevant/useful within specific bay types.
18:17:39 <daneyon_> gus just as with swarm, an operator/user may want to use flannel to realize the container network within the cluster... For example, if they do not want container IPs NAT'd.
18:18:07 <Tango> So that raises the question:  how do we communicate which is supported where?
18:18:10 <gus> daneyon_: except then you aren't using mesos, right?
18:18:13 <daneyon_> this is what the Magnum CNM is all about, providing choice of container networking implementations, while using sensible defaults for simplicity
18:18:30 <daneyon_> mesos does not yet support the Magnum CNM
18:19:06 <daneyon_> we were planning on using flannel as the 1st implementation of the CNM for mesos, but then we hit the etcd issue
18:19:21 <adrian_otto> the idea is to have what we as a community agree is the best default choice for networking for each bay type, and a way that alternate choices can be selected when needs justify a departure from the default.
18:19:56 <adrian_otto> consistent application of one networking type for all bays is a non-goal.
18:20:04 <daneyon_> this caused me to think about our general approach to implementing the CNM... in that not every net-driver necessarily needs to cover every bay type (e.g. mesos).
18:20:53 <daneyon_> Tango a net-driver support matrix is also what hongbin and i discussed
18:21:13 <daneyon_> any takers to create the matrix on the wiki?
18:21:38 <Tango> daneyon_: You would be the best candidate :)
18:22:03 <daneyon_> #action Tango to implement option 2 in the flannel host-gw patch https://review.openstack.org/#/c/241866/
18:22:14 <daneyon_> Tango agreed
18:22:31 <daneyon_> #action danehans to create a network-driver support matrix
18:23:40 <daneyon_> gus the matrix would show N/A or not supported for flannel on the mesos bay type
18:23:56 <gus> daneyon_: yep, makes sense.
18:24:05 <daneyon_> adrian_otto thx for sharing
18:24:24 <daneyon_> so we are in agreement on this topic
18:24:35 <daneyon_> thanks for the discussion.
18:24:47 <daneyon_> #topic Open Discussion
18:25:05 <daneyon_> I have a topic to kick it off and it includes Tango
18:25:28 <daneyon_> Tango ran into an issue that requires the flannel pkg to be upgraded
18:25:55 <daneyon_> however our version of atomic (fc21) does not support the required flannel pkg.
18:26:44 <hongbin> A question. Why not put flannel in a container?
18:26:46 <daneyon_> we can do 1 of 2 things: 1. live with the bug (vxlan does not work) or 2. update the image so we can support a newer version of atomic and the required flannel pkg
18:26:50 <daneyon_> thoughts?
18:27:15 <daneyon_> hongbin not a bad idea
18:27:23 <daneyon_> has anyone tried running flannel in a container
18:27:44 <adrian_otto> you'd need to use --net=host
18:27:49 <daneyon_> in my experience working on the kolla project, some services are a PITA to containerize
18:28:12 <daneyon_> it may be easier updating the atomic version, which i think needs to be done no matter what
18:28:16 <eghobo_> hongbin: I am going to try it for one of my real deployments and will share my experience later
18:28:23 <daneyon_> and containerizing flannel could be a follow-on
18:28:47 <Tango> Some update: I saw kojipkgs has a new update for flannel. I tried building the packages but they all failed, even with the fc22 base.
18:28:47 <daneyon_> eghobo_ that would be great to know
18:29:02 <hongbin> eghobo_: thx
18:29:09 <adrian_otto> I would urge us not to just containerize flannel as a point solution, but could we containerize the full scope of what we run on bay nodes?
18:29:23 <adrian_otto> it might be much easier to do image maintenance that way
18:29:30 <daneyon_> i have a feeling that adrian_otto is correct about using --net=host, but that should not be an issue since the tenant owns the nova vm
18:30:09 <adrian_otto> it could potentially make bay nodes a bit more secure as well
18:30:19 <daneyon_> adrian_otto agreed. and containerizing flannel would be just 1 of the work streams of the BP.
18:30:24 <eghobo_> adrian_otto: we cannot put kubelet in a container yet
18:30:34 <hongbin> We have BP to containerize k8s into containers
18:30:37 <hongbin> #link https://blueprints.launchpad.net/magnum/+spec/run-kube-as-container
18:30:37 <gus> adrian_otto: I've certainly run all these bits in the past in containers, with liberal use of --net=host and --privileged.
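A minimal sketch of what a containerized flanneld could look like along the lines just described; the image name, etcd endpoint, and mount path are placeholders/assumptions, not anything agreed in the meeting:

    # Illustrative only: flanneld shares the host network namespace and runs
    # privileged so it can manage routes and the flannel interface on the node.
    docker run -d --name flanneld \
        --net=host --privileged \
        -v /run/flannel:/run/flannel \
        <flannel-image> \
        flanneld --etcd-endpoints=http://<etcd-ip>:2379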
18:30:46 <daneyon_> is anyone interested in creating this BP and kicking off the implementation?
18:31:07 <adrian_otto> eghobo_: I think it's possible with some elbow grease
18:31:42 <Tango> Would the BP cover just Flannel, or other services as well?
18:31:45 <eghobo_> gus: I talked with the Kube and Docker folks and volumes are not going to work
18:32:16 <gus> eghobo_: depending on where you mount things from, yes
18:32:17 <daneyon_> Tango I think we just leverage the bp hongbin shared
18:32:17 <eghobo_> as Tim K said, only 90% of the functionality will work
18:33:00 <daneyon_> I think it would be nice to expand the language in that bp, so it's not only k8s... but i'm not hung up on it.
18:33:03 <hongbin> eghobo_: do you know exactly what won't work?
18:33:32 <gus> it's kind of a question of what you actually _want_ to run on the outside - you could run an entire nested docker for example, but you probably don't want to.
18:33:59 <daneyon_> as with the network-driver stuff, even if we can't containerize all services, let's containerize what makes sense.
18:34:26 <hongbin> +1
18:34:56 <gus> note flannel is also implemented in go, so "upgrading" flannel is about as easy as installing a container - you just need to install a single file.
18:35:04 <daneyon_> does anyone want to lead the effort of containerizing the services, maybe starting off with flannel?
18:35:17 <adrian_otto> great, let's proceed on that track, and apply what we learn.
18:35:33 <hongbin> daneyon_: wanghua is the owner of the BP
18:35:46 <hongbin> daneyon_: He said he was interested in working on that
18:36:13 <daneyon_> #action danehans check with wanghua on implementation status of https://blueprints.launchpad.net/magnum/+spec/run-kube-as-container
18:37:28 <daneyon_> i still think we need to update the atomic image to a newer release, but that's a bit outside the scope of networking.
18:38:06 <daneyon_> any other discussion topics?
18:39:03 <Tango> I am checking the logs for the failed flannel builds. We may need to get help on IRC.
18:39:40 <daneyon_> gus that is correct re: the single flanneld bin. however, we have been using pkgs from the distro provider
18:40:33 <gus> yeah sure, just highlighting that we might not need to do that.  We're only supporting a single architecture so just downloading our own flannel binary somewhere is quite feasible.
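A rough illustration of that alternative (the release URL pattern and archive layout are assumptions; check the flannel release page for the actual artifact names):

    # Illustrative only: fetch an upstream flannel release and install the
    # single flanneld binary, instead of relying on the distro package.
    curl -L -o /tmp/flannel.tar.gz \
        https://github.com/coreos/flannel/releases/download/v<version>/flannel-<version>-linux-amd64.tar.gz
    tar -xzf /tmp/flannel.tar.gz -C /tmp
    install -m 0755 /tmp/flannel-<version>/flanneld /usr/local/bin/flanneld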
18:40:47 <daneyon_> Tango it was my understanding that the newer flannel src RPMs will not build w/o going to atomic f22 or newer
18:41:10 <Tango> daneyon_: I tried building it against fc22, but that also failed
18:41:25 <gus> oh and re running everything in containers: I think mesos slaves only have support for a single "special" docker container (--docker_mesos_image) - it will assume it can garbage collect all the others.
18:41:40 <daneyon_> it seems like we are running an old-school version of atomic, and no matter what we do with flannel we should maintain the distro image
18:41:55 <daneyon_> or drop it and just focus on coreos, ubuntu, etc..
18:42:34 <eghobo> gus: there are a bunch of images at https://hub.docker.com/u/mesoscloud/
18:42:57 <eghobo> but I have no idea whether they work or not :(
18:43:22 <gus> eghobo: yep, the mesos slave is usually just the single service.  But if we start putting various cluster-admin pieces in containers that might no longer be true.
18:43:23 <daneyon_> gus agreed. Since it's a broader discussion topic, we just need to address the design change (from pkgs to src) with the community.
18:44:30 <daneyon_> i took an action to check with wanghua on the status of the BP. If it's stalled, we will need someone to drive the BP forward
18:44:56 <daneyon_> or update the atomic image to f22, update the flannel pkg, etc..
18:45:34 <daneyon_> or we just live with the bug :-( Not so great since vxlan provides better performance than UDP and we don't have the host-gw option implemented yet.
18:45:48 <Tango> I am following up on the image
18:45:56 <daneyon_> thanks Tango
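For context on the udp/vxlan/host-gw choice: flanneld takes its backend from the network config stored in etcd. A minimal sketch, using flannel's default key path and an example subnet rather than values decided here:

    # Illustrative only: switching "Type" between "udp", "vxlan", and
    # "host-gw" selects the flannel backend for the whole network.
    etcdctl set /coreos.com/network/config \
        '{ "Network": "10.100.0.0/16", "Backend": { "Type": "host-gw" } }'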
18:46:51 <gus> What's the appetite for using a complex neutron setup underneath?
18:47:06 <Tango> gus: Can you elaborate?
18:47:13 <daneyon_> i think whether we containerize or not, there will be some services (at least for the foreseeable future) that cannot be containerized, and we will have to support them directly on the OS.
18:47:20 <gus> I'm thinking of something like a route per host - i.e. doing all of flannel's work in neutron itself.
18:47:35 <gus> (More like what k8s does on GCE)
18:48:02 <Tango> Would that be BP or patches done in neutron?
18:48:11 <daneyon_> gus: every bay gets a neutron router and a shared network, and a floating IP is created so you can ssh to the nodes from outside
18:48:16 <gus> downside: In a typical neutron setup using linuxbridge or ovs (without DVR), it won't scale well in practice since the traffic would all go up through the network node.
18:48:16 <daneyon_> i think it's pretty basic
18:48:47 <daneyon_> we are using the neutron lbaas, but that is needed for HA of the API, etcd, etc.
18:49:10 <gus> daneyon_: right, it's basic currently.  Given the questions raised in the host-gw discussion about scaling beyond a single L2 network, I was just thinking of what that future might look like.
18:49:52 <daneyon_> gus i'm open to hearing other neutron network designs. Feel free to start the discussion on the ML
18:50:17 <gus> daneyon_: ack.
18:50:37 <hongbin> gus: Just confirm. Do you know kuryr?
18:50:45 <gus> hongbin: yes.
18:50:55 <daneyon_> note that we also need to communicate with the kuryr team.
18:51:02 <hongbin> gus: And we plan to have a kuryr net-driver
18:51:07 <daneyon_> so we are in sync
18:51:26 <daneyon_> any other discussion?
18:51:38 <daneyon_> if not i'll close the meeting
18:51:54 <gus> hongbin: right - if it's just a matter of routes, we can probably assemble it using the current neutron api - but it would have some serious implications depending on how neutron itself was configured.
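A rough sketch of that route-per-host idea using the current neutron API; the router ID, container subnets, and node addresses are made-up examples, not a proposal agreed here:

    # Illustrative only: publish each node's container subnet as a static
    # route on the bay's neutron router, so traffic is routed natively
    # rather than encapsulated (combined with allowed-address-pairs per port).
    neutron router-update <bay-router-id> --routes type=dict list=true \
        destination=10.100.1.0/24,nexthop=10.0.0.4 \
        destination=10.100.2.0/24,nexthop=10.0.0.5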
18:52:08 <gus> daneyon_: (nothing else from me)
18:52:15 <daneyon_> ok
18:52:29 <daneyon_> thanks everyone for joining and for the great discussion
18:52:38 <daneyon_> #endmeeting