18:01:37 #startmeeting container-networking
18:01:38 Meeting started Thu Jan 7 18:01:37 2016 UTC and is due to finish in 60 minutes. The chair is daneyon_. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:39 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:42 The meeting name has been set to 'container_networking'
18:01:46 Agenda
18:01:51 #link https://wiki.openstack.org/wiki/Meetings/Containers#Agenda
18:02:02 I'll give everyone a few minutes to review the agenda.
18:02:08 #topic roll call
18:02:15 o/
18:02:34 o/
18:03:15 Thank you dane_leblanc_ and Tango for joining.
18:03:30 daneyon: The agenda link is for containers, not container networking?
18:04:29 You will need to scroll down to the subteam agenda.
18:04:29 #link https://wiki.openstack.org/wiki/Meetings/Containers#Container_Networking_Subteam_Meeting
18:04:35 That's it, thanks
18:05:22 #topic Flannel host-gw patch
18:05:29 #link https://review.openstack.org/#/c/241866/
18:06:09 So I got to spend some more time on this in December before the break
18:06:11 Tango: have you been able to spend any time on this patch?
18:06:26 How is it coming along?
18:06:46 I debugged and found the problem: an extra route on the local host was causing the response to be trapped
18:07:06 So the workaround is to delete this extra route, and then everything works very well
18:07:28 I notified Angus; he is looking into replicating the problem so he can fix the code
18:07:38 So basically each flanneld host creates a route to every other flanneld host for the flannel subnet that sits behind that host, correct?
18:08:05 Yes, there is one route in the routing table for each other flannel host
18:08:20 This allows the packet to reach the target host without encapsulation
18:08:28 right
18:08:46 The performance looks very good: on a 10 Gbits/sec network, I am getting 6 Gbits/sec
18:08:50 with host-gw
18:09:02 So adding routes to the other flannel subnets, with each flannel host as the next hop, would be expected.
18:09:13 Is the route you're referring to some other route that is being added?
18:09:25 nice!
18:09:30 For vxlan, I get 1.7 Gbits/sec
18:09:42 Wow, significant difference
18:09:52 I would expect a big difference
18:09:54 And for udp, it's pretty bad: 0.385 Gbits/sec
18:10:14 Is this for packets of 500 bytes or so?
18:10:14 So this does show the cost of encapsulation
18:10:22 o/
18:10:30 The MTU is set at 1500
18:10:42 I use iperf3 to run the test
18:10:50 I think flannel vxlan will be a much better option once magnum has support for ironic hosts and the cloud provider has vxlan hardware in the ironic hosts
18:10:57 Tango: we got the same numbers on our infra ;)
18:11:14 hi hongbin, thanks for joining
18:11:24 eghobo: great, that correlates
18:11:47 eghobo: for vxlan, host-gw mode, or both?
18:12:13 udp, vxlan
18:12:25 So I am picking up the patch that Angus started, with the workaround for now until the bug is fixed in flannel
18:12:46 Tango: could you push the latest patch set so others can test drive?
18:13:01 I need OpenStack changes for host-gw, but no changes were allowed during the holidays
18:13:19 yep, working on that right now, should have it shortly
18:13:27 Tango: the bug description lists the host-gw option as unconditional. Should the description be changed to what's listed as option (2)?
18:14:03 I will change the option so that the user can specify any of: udp, vxlan, host-gw
18:14:19 Tango: +1 re dane_leblanc_'s suggestion to update the commit message to state option 2
18:14:25 Tango: Sounds great.
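For context on the three backends being compared, this is roughly how a flannel backend is selected in the network config that flannel reads from etcd. It is a minimal sketch: the etcd prefix is flannel's default, and the CIDR, subnet length, VNI, and port values are illustrative, not necessarily what Magnum's templates write.

    # select the host-gw backend (illustrative values)
    etcdctl set /coreos.com/network/config \
      '{ "Network": "10.100.0.0/16", "SubnetLen": 24, "Backend": { "Type": "host-gw" } }'
    # the alternatives discussed above would instead use
    #   "Backend": { "Type": "vxlan", "VNI": 1 }
    # or
    #   "Backend": { "Type": "udp", "Port": 8285 }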
18:14:38 Tango: I am still confused about the extra route issue
18:15:02 I understand how host-gw mode works; what was the problem with the routes being added?
18:15:14 So there is one route for every other flannel host: if there are n hosts, you would see n-1 routes in the table
18:15:43 The problem is that there is one route for the local host itself, which is not needed and not correct
18:15:58 Right, these routes are how the flannel host selects the correct next hop to send packets.
18:16:00 So the number of routes is n instead of n-1
18:16:12 ah
18:16:16 I see now
18:16:41 The local route confuses the response packet
18:16:47 So that must be an issue when using host-gw in a standalone setup too
18:17:30 probably
18:17:51 Is there an issue that has been created in upstream flannel? If not, I think it's a good idea to create one and add a link to it in the review
18:18:12 Tango: thanks for pushing through the issue
18:18:26 One question I want to check with everyone: what should the default backend option be, udp, vxlan, or host-gw?
18:18:36 I think the host-gw option will make a lot of magnum users happy :-)
18:19:01 Yes, if it works
18:19:04 I say leave the default as-is
18:19:09 Angus was suggesting host-gw since it's the best, but this assumes all the nodes are on the same L2 network
18:19:22 Later, when we add more advanced networking, it may break
18:19:23 Once the changes bake for a while and we get feedback, we can then change the default
18:19:36 udp is the most general
18:20:00 And host-gw is a good solution for a small-to-medium size cluster... not for large ones though
18:20:07 true
18:20:32 So maybe we leave the default as udp, and in the user guide give guidance on what to use
18:20:33 host-gw would eventually be a good option when we get Kuryr integrated, right?
18:20:47 And when Magnum supports ironic nodes and a cloud provider has vxlan hardware, I would expect vxlan to be a solid option for balancing scale and performance
18:21:13 dane_leblanc_: maybe not, since kuryr would allow connecting between different networks
18:21:34 dane_leblanc_: kuryr would be a completely different network-driver with its own options
18:22:01 daneyon: I see.
18:22:23 Or kuryr would fall under the libnetwork driver, and we would pass a label to specify which libnetwork driver (kuryr, calico, weave, etc.) to use
18:23:45 e.g. --network-driver=libnetwork, --label=libnetwork-driver=kuryr or --label=libnetwork-driver=overlay etc.
18:25:10 #action Tango to update https://review.openstack.org/#/c/241866/ to include a WIP patch set and an updated commit message to indicate option 2
18:25:26 Any other discussion about the host-gw patch?
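To make the extra-route issue discussed above concrete, here is a hypothetical sketch of the routing table on one host in a three-node host-gw cluster. The addresses and interface name are invented for illustration; the final command is the workaround Tango described (deleting the self-referencing route) until the fix lands upstream in flannel.

    # on host 10.0.0.11, whose local flannel subnet is 10.100.11.0/24
    # expected host-gw routes (n-1 = 2, one per remote flannel host):
    #   10.100.12.0/24 via 10.0.0.12 dev eth0
    #   10.100.13.0/24 via 10.0.0.13 dev eth0
    # buggy extra route for the local host's own subnet (n routes total):
    #   10.100.11.0/24 via 10.0.0.11 dev eth0
    # workaround: delete the self-referencing route
    ip route del 10.100.11.0/24 via 10.0.0.11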
18:25:51 #topic Review Action Items
18:26:00 * daneyon_ danehans to address the 2-daemon approach on the kube IRC and provide additional info through the Magnum ML
18:26:22 I sent a message to the ML before the holiday break.
18:27:19 Brendan Burns from the kube team said containerizing etcd, flannel, and the kube services was done out of documentation and support convenience.
18:28:26 Do they recommend this approach?
18:28:28 ATM I think it's still best to run flannel and etcd on the host instead of in a container. The 2-docker-daemon solution overcomplicates things.
18:28:41 +1
18:28:59 It is not the common practice in CoreOS either
18:29:06 Tango: they don't recommend it or recommend against it. They are taking a Switzerland approach ;-)
18:29:26 I think it's up to us to make the call
18:29:56 OK, I guess the reason we were thinking of doing this was to simplify things, so if it's more complicated, then maybe not
18:30:16 However, if we don't containerize flannel, we need to update the flannel package in our images so vxlan works again
18:30:50 daneyon_: I think Tango built a new image already?
18:30:57 So I have been working on using diskimage-builder to build new images
18:31:06 oh, good
18:31:24 Any details you can share, Tango, on your DIB progress?
18:31:24 I uploaded a new one: fedora-21-7.qcow2
18:31:44 This has k8s 1.1, docker 1.9.1, flannel 0.5.5
18:31:56 Is this still an Atomic image or simply f21?
18:32:10 The image is Fedora only, without Atomic
18:32:13 f21
18:32:17 ok
18:32:40 I skip Atomic to make it easier to work with; you can just install packages with yum
18:32:40 How is it coming along with making it an "official" image?
18:32:55 yeah, Atomic is a PITA
18:33:16 Yay for yum
18:33:25 Since we moved to k8s 1.1, we need more testing on the API
18:33:25 I think we still need to support a micro OS, but I would like that to be CoreOS, and forget Atomic
18:33:28 simply because the heat templates were developed by a guy from Red Hat :)
18:33:37 at the very beginning
18:33:39 not a pressing need, but something more long-term
18:34:02 as long as we have at least 1 OS that is well supported by Magnum and is easy to use, maintain, etc.
18:34:04 CoreOS is almost ready
18:34:21 I have patches that are under review
18:34:50 #link https://review.openstack.org/#/q/status:open+project:openstack/magnum+branch:master+topic:bp/coreos-k8s-bay
18:35:06 hongbin: right on. larsks did a great job; now we need to continue tailoring the templates to meet our needs.
18:35:21 nice hongbin
18:35:23 daneyon_: agreed
18:35:40 I am all for moving away from Atomic
18:35:58 Good to hear agreement on Atomic
18:36:20 Tango: can you post a link to the f21 image so we have it recorded?
18:36:26 hongbin: +1, but I still think we need Ubuntu as well
18:36:32 https://fedorapeople.org/groups/magnum/fedora-21-7.qcow2
18:36:46 #link https://fedorapeople.org/groups/magnum/fedora-21-7.qcow2
18:36:55 Tango: thanks
18:37:14 eghobo: want a BP for Ubuntu?
18:37:28 eghobo: I am all for supporting Ubuntu or any other additional distro, as long as it's well supported by the community
18:37:37 the matrix of support can get out of hand
18:38:01 hongbin: I think the BP already exists, Tango?
18:38:04 I am setting up DIB to choose Fedora, Ubuntu, and I guess CoreOS now that we are close to getting it working
18:38:05 I would much rather have a solid solution on 1 distro than breakage on multiple distros
18:38:55 What image should be used as the default for upstream gate testing? Whatever is smallest?
18:39:05 +1, but we do have to show users how to create their own image
18:39:25 dane_leblanc_: The gate is using Atomic
18:39:56 dane_leblanc_: I am also trying to get one of the minimal images to work, either Fedora or Ubuntu
18:40:10 This would solve the size problem
18:40:22 yeah, I think f21 minimal is key for the gate
18:40:30 otherwise it's a pretty big image
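As a rough idea of the diskimage-builder workflow described above, the sketch below builds a plain Fedora 21 cloud image with DIB's stock fedora and vm elements. The extra elements that would bake in the k8s, docker, and flannel versions mentioned above are not shown, and the output name simply mirrors the uploaded file; none of this is necessarily what Tango's actual build uses.

    # minimal diskimage-builder sketch (assumes DIB is installed);
    # additional, magnum-specific elements would layer in k8s/docker/flannel
    export DIB_RELEASE=21
    disk-image-create fedora vm -o fedora-21-7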
18:41:27 #topic Future Meetings
18:41:46 I know we discussed this topic briefly before the holiday break
18:42:27 Unless the group disagrees, I think we can move our discussions back to the general magnum meetings.
18:42:51 If the group would like to continue the subteam, I would like to ask someone to chair the meetings.
18:43:09 thoughts?
18:43:36 daneyon_: you are not available to chair this meeting?
18:43:55 hongbin: I'm torn
18:44:08 np from me to move it back if you want
18:44:12 daneyon: I would agree with moving back to using just the container meeting. There doesn't seem to be too much network-specific stuff to discuss
18:44:30 I am being pulled in a few different directions and I don't feel I have the necessary time to lead the subteam
18:44:52 Sounds reasonable; we can resume if there is not enough time in the general meeting
18:45:19 OK
18:45:54 Then I will consider that an agreement and let Adrian know that we are moving our discussion back to the general meeting.
18:46:36 I think these meetings have been helpful over the last 6 months.
18:46:52 I appreciate everyone's involvement in magnum networking.
18:46:59 Seems like we have an agreement, and Adrian can kick us back out if we take too much time in the general meeting
18:47:01 We are headed in the right direction.
18:47:30 If we can get the host-gw patch merged and add a few network drivers, then I will be very happy :-)
18:48:14 I do want to mention an observation that we may want to keep an eye on
18:48:28 #agreed Move the subteam meeting back to the general magnum meeting #link https://wiki.openstack.org/wiki/Meetings/Containers#Weekly_Containers_Team_Meeting
18:48:41 Tango: go ahead
18:48:51 There seems to be divergence between the Docker community and the Kubernetes community with regard to networking
18:48:58 agreed
18:49:06 This would complicate things for Magnum
18:49:15 One of the reasons why I chose --network-driver instead of --libnetwork-driver
18:49:28 Although it does give Magnum a chance to be agnostic and give users choices
18:49:56 I'm not sure how things will shake out, but at the moment it's very confusing
18:50:01 I feel that I was prepared for the Google <-> Docker networking war.
18:51:02 Kuryr seems to lean toward Docker libnetwork for now
18:51:06 For example, Contiv is a container networking solution that has support for both kube and Docker libnetwork
18:51:19 I believe Calico does too
18:52:25 When using --network-driver=calico with a k8s bay type, the Magnum templates will need to make sure the correct directories, binaries, config files, etc. are orchestrated
18:52:41 The same when using --network-driver=calico with a swarm bay type
18:53:23 Anyway, a lot of networking activity ahead
18:53:28 In general, as we add drivers we will see a lot more heat templates, or more conditions once we have jinja/heat conditional support in the templates
18:54:00 We need to get conditional logic into heat templates, or implement the jinja layer that was discussed at the last midcycle
18:54:22 I believe Rob Pothier is working with the Heat community to add conditional support
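To give a sense of the jinja layer mentioned above, here is a purely illustrative sketch that switches a Heat resource on the bay's network driver. The variable name, fragment paths, and resource layout are assumptions for illustration, not the actual Magnum templates or the Heat conditional syntax being worked on upstream.

    {# illustrative jinja-over-HOT sketch only #}
    {% if network_driver == 'flannel' %}
      network_config:
        type: OS::Heat::SoftwareConfig
        properties:
          group: ungrouped
          config: {get_file: fragments/configure-flannel.sh}
    {% elif network_driver == 'calico' %}
      network_config:
        type: OS::Heat::SoftwareConfig
        properties:
          group: ungrouped
          config: {get_file: fragments/configure-calico.sh}
    {% endif %}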
18:57:48 :-) 18:58:35 Otherwise I'll send a msg to Adrian and the mailer that the sub team is merging back to the general community. 18:59:04 I'll take that as no open discussion 18:59:22 thanks again for everyone's support!!! 18:59:33 Thanks daneyon_ for hosting 18:59:34 and Happy New Year 18:59:38 yw 18:59:40 Happy New Year! 18:59:49 take care everyone and have a great day. 18:59:56 #endmeeting