18:01:37 <daneyon_> #startmeeting container-networking
18:01:38 <openstack> Meeting started Thu Jan 7 18:01:37 2016 UTC and is due to finish in 60 minutes. The chair is daneyon_. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:42 <openstack> The meeting name has been set to 'container_networking'
18:01:46 <daneyon_> Agenda
18:01:51 <daneyon_> #link https://wiki.openstack.org/wiki/Meetings/Containers#Agenda
18:02:02 <daneyon_> i'll give everyone a few minutes to review the agenda.
18:02:08 <daneyon_> #topic roll call
18:02:15 <dane_leblanc_> o/
18:02:34 <Tango> o/
18:03:15 <daneyon_> Thank you dane_leblanc_ Tango for joining.
18:03:30 <dane_leblanc_> daneyon: The agenda link is for containers, not container networking?
18:04:29 <daneyon_> you will need to scroll down to hit the subteam agenda
18:04:29 <dane_leblanc_> #link https://wiki.openstack.org/wiki/Meetings/Containers#Container_Networking_Subteam_Meeting
18:04:35 <daneyon_> that's it, thx
18:05:22 <daneyon_> #topic Flannel host-gw patch
18:05:29 <daneyon_> #link https://review.openstack.org/#/c/241866/
18:06:09 <Tango> So I got to spend some more time on this in December before the break
18:06:11 <daneyon_> Tango have you been able to spend any time on this patch?
18:06:26 <daneyon_> how is it coming along?
18:06:46 <Tango> I debugged and found the problem: an extra route on the local host causing the response to be trapped
18:07:06 <Tango> So the work around is to delete this extra route, and everything works very well
18:07:28 <Tango> I notified Angus, he is looking into replicating the problem to fix the code
18:07:38 <daneyon_> so basically each flanneld host creates a route to other flanneld hosts for the flannel subnet that sits behind the host, correct?
18:08:05 <Tango> Yes, there is one route in the routing table for each other flannel host
18:08:20 <Tango> this allows the packet to reach the target host without encapsulation
18:08:28 <daneyon_> right
18:08:46 <Tango> The performance looks very good: on a 10 Gbits/sec network, I am getting 6 Gbits/sec
18:08:50 <Tango> with hostgw
18:09:02 <daneyon_> so adding routes to the other flannel subnets with the next-hop of each flannel host would be expected.
18:09:13 <daneyon_> is the route you're referring to some other route that is being added?
18:09:25 <daneyon_> nice!
18:09:30 <Tango> For vxlan, I get 1.7 Gbits/sec
18:09:42 <dane_leblanc_> Wow, significant difference
18:09:52 <daneyon_> i would expect a big difference
18:09:54 <Tango> And for udp, it's pretty bad: 0.385 Gbits/sec
18:10:14 <dane_leblanc_> Is this for packets of 500 bytes or so?
18:10:14 <Tango> So this does answer the cost of encapsulation
18:10:22 <hongbin> o/
18:10:30 <Tango> The MTU is set at 1500
18:10:42 <Tango> I use iperf3 to run the test
18:10:50 <daneyon_> i think the flannel vxlan will be a much better option when magnum has support for ironic hosts and the cloud provider has vxlan hw in the ironic hosts
18:10:57 <eghobo> Tango: we got the same numbers at our infra ;)
18:11:14 <daneyon_> hi hongbin thanks for joining
18:11:24 <Tango> eghobo: great, correlation
18:11:47 <daneyon_> eghobo for vxlan, hostgw mode or both?
18:12:13 <eghobo> udp, vxlan
18:12:25 <Tango> So I am picking up the patch that Angus started, with the work around for now until the bug is fixed in Flannel
18:12:46 <daneyon_> Tango could you push the latest patch set so others can test drive?
18:13:01 <eghobo> i need OpenStack changes for hostgw, but no changes allowed during holidays
18:13:19 <Tango> yep, working on that right now, should have it shortly
18:13:27 <dane_leblanc_> Tango: the bug description lists the host-gw option as unconditional. Should the description be changed to what's listed as option (2)?
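Editor's note: the throughput figures Tango reported above (10 Gbits/sec link; 6 for hostgw, 1.7 for vxlan, 0.385 for udp) can be put side by side with a little arithmetic. The sketch below only restates those reported numbers as link utilization and as a slowdown relative to hostgw; the function and variable names are made up for illustration and nothing here is a new measurement.

```python
# Figures quoted in the log (Gbits/sec, iperf3, MTU 1500).
LINK_GBITS = 10.0
MEASURED_GBITS = {"host-gw": 6.0, "vxlan": 1.7, "udp": 0.385}


def summarize(link, measured, baseline="host-gw"):
    """Return link utilization and slowdown vs. the baseline backend."""
    rows = {}
    for backend, rate in measured.items():
        rows[backend] = {
            # Fraction of the raw link rate the backend achieved.
            "link_utilization": rate / link,
            # How many times slower than the baseline backend.
            "slowdown_vs_baseline": measured[baseline] / rate,
        }
    return rows


summary = summarize(LINK_GBITS, MEASURED_GBITS)
for backend, row in summary.items():
    print(f"{backend:8s} {row['link_utilization']:6.1%} of link, "
          f"{row['slowdown_vs_baseline']:.1f}x slower than host-gw")
```

Read this way, the reported vxlan rate is roughly 3.5 times lower than hostgw and the udp rate roughly 15.6 times lower, which is the "cost of encapsulation" Tango refers to.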
18:14:03 <Tango> I will change the option so that the user can specify any option: udp, vxlan, hostgw
18:14:19 <daneyon_> Tango +1 re dane_leblanc_ updating the commit message to state option 2
18:14:25 <dane_leblanc_> Tango: Sounds great.
18:14:38 <daneyon_> Tango I am still confused on the extra route issue
18:15:02 <daneyon_> I understand how hostgw mode works, what was the problem with the routes being added?
18:15:14 <Tango> So there is one route for every other flannel host: if there are n hosts, you would see n-1 routes in the table
18:15:43 <Tango> The problem is that there is one route for the local host itself, which is not needed and not correct
18:15:58 <daneyon_> right, these routes are how the flannel host selects the correct hop to send packets.
18:16:00 <Tango> so the number of routes is n instead of n-1
18:16:12 <daneyon_> ah
18:16:16 <daneyon_> i see now
18:16:41 <Tango> The local route confuses the response packet
18:16:47 <daneyon_> so, that must be an issue when using hostgw in a standalone setup too
18:17:30 <Tango> probably
18:17:51 <daneyon_> is there an issue that has been created in upstream flannel?
If not, I think it's a good idea to create one and add a link to it in the review
18:18:12 <daneyon_> Tango thanks for pushing through the issue
18:18:26 <Tango> One question I want to check with everyone is, what should the default backend option be: udp, vxlan, or hostgw
18:18:36 <daneyon_> I think the hostgw option will make a lot of magnum users :-)
18:19:01 <hongbin> Yes, if it works
18:19:04 <daneyon_> I say leave the default as-is
18:19:09 <Tango> Angus was suggesting hostgw since it's the best, but this assumes all the nodes are on the same L2 network
18:19:22 <Tango> later when we add more advanced networking, it may break
18:19:23 <daneyon_> when the changes bake for a while and we get feedback, we can then change the default
18:19:36 <Tango> udp is the most general
18:20:00 <daneyon_> and hostgw is a good solution for a small-med size cluster... not for large though
18:20:07 <Tango> true
18:20:32 <Tango> so maybe we leave the default as udp, and in the user guide, give guidance on what to use
18:20:33 <dane_leblanc_> host-gw would eventually be a good option when we get Kuryr integrated, right?
18:20:47 <daneyon_> and when M supports ironic nodes and a cloud provider has vxlan hw, I would expect to see vxlan be a solid option for balancing scale and perf
18:21:13 <Tango> dane_leblanc_: maybe not, since kuryr would allow connecting between different networks
18:21:34 <daneyon_> dane_leblanc_ kuryr would be a completely different network-driver with its own options
18:22:01 <dane_leblanc_> daneyon: I see.
18:22:23 <daneyon_> or kuryr would fall under the libnetwork driver and pass a label to specify which libnetwork driver (kuryr,calico,weave, etc.) to use
18:23:45 <daneyon_> eg --network-droiver=libnetwork, --label=libnetwork-driver=kuryr or --label=libnetwork-driver=overlay etc..
18:24:08 <daneyon_> s/droiver/driver
18:25:10 <daneyon_> #action Tango to update https://review.openstack.org/#/c/241866/ to include wip patch set and updated commit msg to indicate option 2
18:25:26 <daneyon_> any other discussion about the hostgw patch?
18:25:51 <daneyon_> #topic Review Action Items
18:26:00 * daneyon_ danehans to address the 2-daemon approach on the kube irc and provide add'l info through Magnum ML
18:26:22 <daneyon_> I sent a msg to the ML before the holiday break.
18:27:19 <daneyon_> Brendan Burns from kube said containerizing etcd, flannel and kube services were out of documentation and support convenience.
18:28:26 <Tango> Do they recommend this approach?
18:28:28 <daneyon_> ATM i think it's still best to run flannel and etcd on the host instead of a container. The 2 docker daemon solution overcomplicates things.
18:28:41 <hongbin> +1
18:28:59 <hongbin> It is not the common practice in CoreOS as well
18:29:06 <daneyon_> Tango they don't recommend it or recommend against it. They are taking a Switzerland approach ;-)
18:29:26 <daneyon_> I think it's up to us to make the call
18:29:56 <Tango> ok, I guess the reason we were thinking doing this is to simplify things, so if it's more complicated, then maybe not
18:30:16 <daneyon_> However, if we don't containerize flannel, we need to update the flannel pkg in our images so vxlan works again
18:30:50 <hongbin> daneyon_: I think Tango built a new image already?
18:30:57 <Tango> So I have been working on using diskimagebuilder to build new images
18:31:06 <daneyon_> oh, good
18:31:24 <daneyon_> any details you can share Tango on your DIB progress?
18:31:24 <Tango> I uploaded a new one: fedora-21-7.qcow2
18:31:44 <Tango> This has k8s 1.1, docker 1.9.1, flannel 0.5.5
18:31:56 <daneyon_> is this still an atomic image or simply f21?
18:32:10 <Tango> The image is fedora only without atomic
18:32:13 <Tango> f21
18:32:17 <daneyon_> ok
18:32:40 <Tango> I skip atomic to make it easier to work with, you can run apt-get install
18:32:40 <daneyon_> how is it coming along making it an "official" image?
18:32:55 <daneyon_> yeah, atomic is a PITA
18:33:16 <dane_leblanc_> Yay for apt-get
18:33:25 <Tango> Since we move to k8s 1.1, we need more testing on the API
18:33:25 <daneyon_> i think we still need to support a micro os, but I would like that to be coreos and forget atomic
18:33:28 <hongbin> simply because the heat templates are developed from a guy from Redhat :)
18:33:37 <hongbin> at the very beginning
18:33:39 <daneyon_> not a pressing need, but something more long-term
18:34:02 <daneyon_> as long as we have at least 1 OS that is well supported by Magnum and is easy to use, maintain, etc..
18:34:04 <hongbin> CoreOS is almost ready
18:34:21 <hongbin> I have patches that are under review
18:34:50 <hongbin> #link https://review.openstack.org/#/q/status:open+project:openstack/magnum+branch:master+topic:bp/coreos-k8s-bay
18:35:06 <daneyon_> hongbin right on. larsks did a great job, now we need to continue tailoring the templates to meet our needs.
18:35:21 <daneyon_> nice hongbin
18:35:23 <hongbin> daneyon_: agreed
18:35:40 <hongbin> I am all for moving away from Atomic
18:35:58 <Tango> Good to hear agreement on atomic
18:36:20 <daneyon_> Tango can you post a link to the f21 image so we have it recorded?
18:36:26 <eghobo> hongbin: +1 but I still think we need Ubuntu as well
18:36:32 <Tango> https://fedorapeople.org/groups/magnum/fedora-21-7.qcow2
18:36:46 <daneyon_> #link https://fedorapeople.org/groups/magnum/fedora-21-7.qcow2
18:36:55 <daneyon_> Tango thanks
18:37:14 <hongbin> eghobo: want a BP for ubuntu?
18:37:28 <daneyon_> eghobo I am all for supporting Ubuntu or any other add'l distro as long as it's well supported by the community
18:37:37 <daneyon_> the matrix of support can get out of hand
18:38:01 <eghobo> hongbin: I think a BP exists, Tango?
18:38:04 <Tango> I am setting up DIB to choose fedora, ubuntu, and I guess coreOS now that we are close to getting it working
18:38:05 <daneyon_> i would much rather have a solid solution on 1 distro than breakage on multiple distros
18:38:55 <dane_leblanc_> What image should be used as default for upstream gate testing? Whatever is smallest?
18:39:05 <Tango> +1, but we do have to show users how to create their own image
18:39:25 <hongbin> dane_leblanc_: The gate is using Atomic
18:39:56 <Tango> dane_leblanc_: I am also trying to get one of the minimal images to work, either fedora or ubuntu
18:40:10 <Tango> this would solve the size problem
18:40:22 <daneyon_> yeah, i think f21 minimal is key to the gate
18:40:30 <daneyon_> otherwise it's a pretty big image
18:41:27 <daneyon_> #topic Future Meetings
18:41:46 <daneyon_> I know we discussed this topic briefly before the holiday break
18:42:27 <daneyon_> Unless the group disagrees, I think we can move our discussions back to the general magnum meetings.
18:42:51 <daneyon_> If the group would like to continue the sub team, I would like to ask someone to chair the meetings.
18:43:09 <daneyon_> thoughts?
18:43:36 <hongbin> daneyon_: you are not available to chair this meeting?
18:43:55 <daneyon_> hongbin I'm divided
18:44:08 <hongbin> np from me to move it back if you want
18:44:12 <dane_leblanc_> daneyon: I would agree with moving back to using just the container meeting.
Doesn't seem to be too much network-specific stuff to discuss
18:44:30 <daneyon_> I am being pulled into a few different directions and I don't feel i have the necessary time to lead the sub team
18:44:52 <Tango> Sounds reasonable, we can resume if there is not enough time in the general meeting
18:45:19 <daneyon_> OK
18:45:54 <daneyon_> Then I will consider that an agreement and let Adrian know that we are moving our discussion back to the general meeting.
18:46:36 <daneyon_> I think these meetings have been helpful over the last 6 months.
18:46:52 <daneyon_> I appreciate everyone's involvement in magnum networking.
18:46:59 <Tango> Seems like we have an agreement, and Adrian can kick us back out if we take too much time in the general meeting
18:47:01 <daneyon_> we are headed in the right direction.
18:47:30 <daneyon_> If we can get the hostgw patch merged and add a few network-drivers, then I will be very :-)
18:48:14 <Tango> I do want to mention an observation that we may want to keep an eye on
18:48:28 <daneyon_> #agreed Move the subteam meeting back to the general magnum meeting #link https://wiki.openstack.org/wiki/Meetings/Containers#Weekly_Containers_Team_Meeting
18:48:41 <daneyon_> Tango go ahead
18:48:51 <Tango> There seems to be divergence between the Docker community and Kubernetes community with regard to networking
18:48:58 <daneyon_> agreed
18:49:06 <Tango> This would complicate things for Magnum
18:49:15 <daneyon_> one of the reasons why i chose --network-driver instead of --libnetwork-driver
18:49:28 <Tango> although it does give Magnum a chance to be agnostic and give user choices
18:49:56 <Tango> I'm not sure how things will shake out, but at the moment it's very confusing
18:50:01 <daneyon_> i feel that I was prepared for the goog<>docker war between networking.
18:51:02 <Tango> kuryr seems to lean toward docker libnetwork for now
18:51:06 <daneyon_> for example contiv is a container networking solution that has support for kube and docker libnetwork
18:51:19 <daneyon_> i believe calico too
18:52:25 <daneyon_> when using --network-driver=calico with a k8s bay type, the M templates will need to make sure the correct DIR's, bins, config files, etc.. are orchestrated
18:52:41 <daneyon_> the same when using --network-driver=calico with a swarm bay type
18:53:23 <Tango> anyway, a lot of activities on networking ahead
18:53:28 <daneyon_> in general, as we add drivers we will see a lot more heat templates or conditions in the jinja/heat conditional support templates
18:54:00 <daneyon_> we need to get conditional logic in heat templates or implement the jinja layer that was discussed at the last mid cycle
18:54:22 <daneyon_> I believe Rob Pothier is working with the Heat community to add conditional support
18:54:32 <Tango> daneyon_: Is there a BP yet on adding kuryr as a driver?
18:54:41 <daneyon_> IMO this will be huge for Magnum to support a minimal # of templates
18:54:48 <daneyon_> not that i know of Tango
18:55:01 <daneyon_> the last we discussed, one of 2 things needed to happen
18:55:10 <daneyon_> 1. Magnum add support for ironic hosts
18:55:23 <daneyon_> 2. kuryr add support for nested VM's
18:56:00 <daneyon_> as we discussed in the past, I would expect someone from the K team to take the lead on adding the K driver to M
18:56:13 <daneyon_> Same goes with any other driver that gets added.
18:56:33 <daneyon_> If support lags in the future for a particular driver, then it gets pulled
18:56:54 <daneyon_> and M only includes drivers that are actively maintained
18:57:11 <daneyon_> otherwise we go down the same path and pain of our distro support
18:57:18 <hongbin> sounds good
18:57:29 <daneyon_> we have just a few mins left
18:57:35 <daneyon_> #topic Open Discussion
18:57:44 <daneyon_> any open discussion?
18:57:47 <daneyon_> quick?
18:57:48 <daneyon_> :-)
18:58:35 <daneyon_> Otherwise I'll send a msg to Adrian and the mailer that the sub team is merging back to the general community.
18:59:04 <daneyon_> I'll take that as no open discussion
18:59:22 <daneyon_> thanks again for everyone's support!!!
18:59:33 <Tango> Thanks daneyon_ for hosting
18:59:34 <daneyon_> and Happy New Year
18:59:38 <daneyon_> yw
18:59:40 <dane_leblanc_> Happy New Year!
18:59:49 <daneyon_> take care everyone and have a great day.
18:59:56 <daneyon_> #endmeeting
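
Editor's note: the extra-route problem discussed under the Flannel host-gw patch topic (n routes in the table where n-1 are expected) can be illustrated with a small check. This is a minimal sketch, not Flannel's code: the host addresses, subnets, and function names are hypothetical, the routing table is simulated rather than read from a real host, and the log does not say exactly what the stray route looked like, so it is modeled here as a route to the local host's own flannel subnet via the local host's own address.

```python
import ipaddress

# Hypothetical cluster: host IP -> flannel subnet that sits behind that host.
HOSTS = {
    "10.0.0.1": "10.100.1.0/24",
    "10.0.0.2": "10.100.2.0/24",
    "10.0.0.3": "10.100.3.0/24",
}


def expected_routes(hosts, local_ip):
    """Routes a host-gw node should hold: one per *other* host (n-1 total)."""
    return {
        (ipaddress.ip_network(subnet), ipaddress.ip_address(ip))
        for ip, subnet in hosts.items()
        if ip != local_ip
    }


def unexpected_routes(table, hosts, local_ip):
    """Routes present in `table` that are not expected for this host."""
    return set(table) - expected_routes(hosts, local_ip)


local = "10.0.0.1"
# Simulated table showing the reported symptom: n routes instead of n-1,
# because a route for the local host itself is also present.
table = {
    (ipaddress.ip_network(subnet), ipaddress.ip_address(ip))
    for ip, subnet in HOSTS.items()
}
extra = unexpected_routes(table, HOSTS, local)
for dest, via in sorted(extra, key=str):
    print(f"unexpected route: {dest} via {via}")
```

In this model the workaround Tango describes corresponds to deleting whatever `unexpected_routes` reports, leaving exactly one route per peer host.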