15:01:54 <carl_baldwin> #startmeeting neutron_l3
15:01:55 <openstack> Meeting started Thu Oct 2 15:01:54 2014 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:59 <openstack> The meeting name has been set to 'neutron_l3'
15:02:02 <carl_baldwin> #topic Announcements
15:02:14 <carl_baldwin> #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam#Agenda
15:02:35 <carl_baldwin> I don’t know yet if RC1 has been cut. Anyone know for sure?
15:03:20 <armax> carl_baldwin: not yet but we should be in lock down mode now
15:03:33 <armax> carl_baldwin: so that ttx and mestery can do it
15:03:49 <armax> #link https://launchpad.net/neutron/+milestone/juno-rc1
15:03:58 <carl_baldwin> armax: Makes sense. I know there were a few bug fixes they had their eye on last night.
15:04:06 <armax> nothing else should be approved at this point
15:04:15 <armax> carl_baldwin: afaik we’re the last project standing ;)
15:04:50 <carl_baldwin> Yeah, it looks like all of the bugs are marked “fix committed”. Shouldn’t be long then.
15:05:04 <carl_baldwin> #topic Bugs
15:05:44 <carl_baldwin> Actually, the subteam bugs are looking pretty good. I have gotten behind on triage though.
15:06:13 <carl_baldwin> Are there any bugs that need to be brought up?
15:06:48 <carl_baldwin> #action carl_baldwin will triage new bugs this week
15:07:54 <carl_baldwin> At Swami’s request, I’m going to move the DVR report to later in the meeting.
15:08:09 <carl_baldwin> #topic l3-high-availability
15:08:40 <carl_baldwin> amuller, safchain: What is the news here?
15:09:11 <safchain> I submitted a fix about a potential lock wait
15:09:30 <safchain> https://review.openstack.org/#/c/124408/
15:09:59 <ttx> armax: I need mestery to approve https://review.openstack.org/#/c/125081/
15:10:07 <safchain> nothing else this week
15:10:09 <ttx> then I branch from there
15:10:18 <armax> ttx: I’ll reach out to him
15:10:20 <carl_baldwin> safchain: I did notice that as a new bug.
15:10:29 <ttx> I already sent him an email
15:11:40 <carl_baldwin> safchain: anything else? How about reviews in progress?
15:11:50 <amuller> Sorry I'm late =/
15:12:14 <safchain> carl_baldwin, that is the only one I have about the HA
15:12:33 <carl_baldwin> amuller: hi
15:12:53 <amuller> working on https://bugs.launchpad.net/neutron/+bug/1365453
15:12:55 <amuller> some patches up
15:12:58 <amuller> early feedback is welcome
15:13:01 <amuller> more patches on the way
15:14:03 <jschwarz> carl_baldwin, I'm also working with amuller on the HA functional tests patch
15:14:46 <carl_baldwin> amuller: I will have a look at them.
15:14:52 <carl_baldwin> jschwarz: do you have a link?
15:15:01 <jschwarz> it's on hold until some 4 minor patches gets through that the functional tests require (123434, 124752)
15:15:38 <jschwarz> the functional tests patch is https://review.openstack.org/#/c/117994/
15:16:24 <jschwarz> after these are done we'll go on to writing the integration tests framework (there are some patches there also but they can wait)
15:17:43 <carl_baldwin> jschwarz: I will have a look at those. 117994 was on my radar but the other two somehow were not on there yet.
15:18:19 <carl_baldwin> Anything else for L3 HA?
15:18:24 <jschwarz> carl_baldwin, quite alright since we were waiting for RC1 to push them anyway :)
15:18:41 <amuller> Nothing on my end
15:19:58 <carl_baldwin> Thanks, all.
15:20:07 <carl_baldwin> #topic bgp-dynamic-routing
15:20:12 <carl_baldwin> devvesa: ping
15:20:16 <devvesa> hi
15:20:43 <carl_baldwin> You have posted the spec against Kilo.
15:20:55 <devvesa> yes
15:21:11 <carl_baldwin> do you have a link handy?
15:21:37 <carl_baldwin> Found it.
15:21:39 <carl_baldwin> #link https://review.openstack.org/#/c/125401/
15:21:49 <devvesa> oh, 2 seconds faster than me :)
15:21:58 <carl_baldwin> :)
15:22:08 <devvesa> I've read your comments
15:22:33 <carl_baldwin> I wrote them late, I hope they make sense.
15:22:34 <devvesa> The first one about register the gateway router as a neutron router makes me think a lot
15:23:37 <devvesa> I think is a great idea, but it is too early to express my crazy conclusions
15:23:58 <carl_baldwin> that comment came out of my thoughts to associate advertisedroutes with a router. Those, in turn, came from my thoughts about advertising floating ips which are already associated with a router.
15:24:19 <carl_baldwin> Could we find some time early next week to discuss it more?
15:24:24 <devvesa> anytime
15:25:19 <carl_baldwin> I’ll be in touch. Until then, we can add discussion to the proposed blueprint.
15:25:23 <devvesa> I'm starting to think that we may not a new kind of agent, but extend the l3 one
15:25:49 <devvesa> that will apply to some 'edge' routers
15:26:21 <carl_baldwin> That sounds like a whole discussion in itself but I think I see where you’re going. Let’s discuss more outside the meeting and then work it in to the proposal.
15:26:25 <devvesa> s/may not /may not need
15:26:33 <devvesa> great
15:27:47 <carl_baldwin> devvesa: Anything else?
15:28:02 <devvesa> Nothing else from me
15:28:49 <carl_baldwin> devvesa: Thanks.
15:29:37 <carl_baldwin> #topic L3 Agent cleanup, refactoring, and possible restructuring.
15:29:42 <carl_baldwin> amuller: still there?
15:29:49 <amuller> yeppers
15:29:56 <carl_baldwin> I know you mentioned some interest here.
15:29:59 <Swami> carl_baldwin: hi
15:30:00 <amuller> I did
15:30:12 <carl_baldwin> Swami: hi, welcome.
15:30:14 <Swami> mike smith was also interested in this topic
15:30:21 <carl_baldwin> mrsmith: ping
15:30:28 <Swami> mike is away today and will not be able to join
15:30:39 <carl_baldwin> Swami: Right, he’s still out. I knew that.
15:30:54 <carl_baldwin> Swami: I’ll bring him in the loop here when he’s back.
15:30:59 <Swami> ok
15:32:11 <carl_baldwin> I got started looking at the l3 agent yesterday. Some low hanging fruit popped out quickly that I will post for review when I’ve run tests. But, I haven’t really gotten to the meat of the project.
15:32:18 <amuller> I gave it a fair bit of thought, trying to find a design pattern that would fit the requirements and would get rid of the endless if ha, if distributed conditions. I don't have anything too concrete. It's difficult to design something like this without trying it out in limited scale. Also, we have to consider a router that's both HA and distributed, which the code doesn't support right now.
15:32:53 <amuller> So while it makes sense to refactor the code before we enable HA + DVR routers, it makes it more difficult
15:33:40 <Swami> amuller: In the case of the distributed routers, the HA is only valid for the snat scenario.
15:33:48 <amuller> correct
15:33:50 <Swami> So we can't tie the HA to the routers by itself.
15:33:53 <carl_baldwin> Swami: makes sense
15:35:30 <amuller> If someone comes up with a concrete suggestion we can schedule a video chat over it
15:35:31 <Swami> In this case we have to rely on the l3_agent mode and if it is running in dvr_snat mode and if HA is enabled for a router that has snat enabled, then we have to apply the HA for the snat for the dvr_snat nodes.
15:35:33 <carl_baldwin> amuller: Swami: I’d like to be iterative on this. Hopefully, we can get some early work in the review/merge queue soon and make continual progress toward goodness. However, it will make sense to have an end goal in mind.
15:35:45 <amuller> I suspect the earliest would be at Paris
15:36:18 <Swami> amuller: let us work together to come out with a plan for the HA with distributed router support
15:36:30 <carl_baldwin> What will be the best way to collaborate on the overall goal?
15:36:51 <carl_baldwin> Wiki? etherpad?
15:36:55 <carl_baldwin> Something else?
15:37:01 <Swami> Do we want to start with a design doc that states the flows and makes sure everyone is on the same page.
15:37:21 <amuller> Ahh, 1 point I wanted to make, is that HA routers have functional testing, so I'm pretty confident in butchering the l3 agent code without ruining something basic. I'd love it if someone could implement functional testing for distributed routers before we make major changes to the agent.
15:37:22 <Swami> Wiki would be fine.
15:37:38 <pcm__> Was wondering if there was info on the goals - for the uninitiated
15:38:28 <carl_baldwin> pcm__: I fear that different people may have different goals in mind.
15:39:19 <pcm__> carl_baldwin: sounds like a good first step is to come to consensus on goals
15:39:41 <Swami> carl_baldwin: I will create a wiki for the DVR and HA
15:40:06 <Swami> Just to capture all our thoughts in a single page.
15:40:50 <carl_baldwin> I’d like to arrive at a sensible architecture which allows different types of routers, DVR, HA, DVR + HA and works in FWaaS without all of the special cases we have sprinkled throughout the code. Better use of inheritance and composition than there is now.
15:41:54 <amuller> carl_baldwin: I think we have the same idea. I'd like to support all 4 permutations of routers (legacy, HA, DVR, both) via inheritance and composition, without conditionals all over the place.
15:42:05 <carl_baldwin> The agent needs to be split up too. The one class in one file is doing too much. We need some clean separation of concerns.
15:42:07 <amuller> Also, the 'RouterInfo' class as it is is just a struct with no meat
15:42:19 <amuller> and the agent has all the brains
15:42:30 <carl_baldwin> amuller: Agreed 100% about RouterInfo. I was looking at it last night.
15:42:31 <amuller> the agent should be orchestrating, and routers should be configuring themselves
15:42:53 <carl_baldwin> amuller: yes. My thoughts are along the same lines.
15:42:55 <amuller> also the way we have RouterInfo hold a 'router' attribute in it, which is the one in the update, but the other attributes of RouterInfo are the 'old' routers...
15:42:59 <amuller> holy crap that's awful
15:43:30 <carl_baldwin> awful only begins to describe it.
15:43:35 <amuller> We should have two 'Router' instances, one is the 'old' one that the agent has, and one is the 'new' one in the update
15:43:40 <amuller> Routers should be able to diff themselves
15:43:46 <amuller> figure out what's changed
15:44:03 * haleyb likes this suggestion to not have conditionals everywhere
15:44:43 <carl_baldwin> I do want to stress an iterative approach with this. Learning from history, if we make this effort too monolithic, it will likely not be merged in this cycle (maybe never).
15:45:09 <amuller> Luckily there's so much crap to change it shouldn't be a problem doing it piece by piece :)
15:45:12 <pcm__> Right now FW is a superclass, and VPN a subclass of L3 agent too. Seems messy
15:45:58 <carl_baldwin> pcm__: Yes, not the best use of inheritance. We’ll need to evaluate that and decide what a better pattern will be.
15:46:46 <carl_baldwin> These notes should get us started. Let’s translate them to the wiki page and keep the discussion going. Right now, though, we’ve got a little more to get to in this meeting.
15:46:56 <carl_baldwin> Is everyone okay if I move on to another topic?
15:47:16 <pcm__> +1. good stuff though.
15:48:01 <carl_baldwin> #topic neutron-ovs-dvr
15:48:17 <carl_baldwin> Swami: I moved dvr to later to accommodate.
15:48:40 <carl_baldwin> Swami: what do you have for us?
15:49:10 <Swami> carl_baldwin: thanks for getting the dvr later.
15:49:30 <Swami> Right now we are working on fixing bugs and working on back_log items.
15:49:53 <Swami> There are patches out there for review
15:50:08 <Swami> router_migration patch is out there for review.
15:50:27 <Swami> #link https://review.openstack.org/#/c/105855/
15:50:52 <Swami> #link https://review.openstack.org/#/c/123273/
15:51:14 <Swami> These two patches address the migration of legacy routers to distributed routers.
15:51:56 <carl_baldwin> Swami: Thanks for the links.
15:52:15 <Swami> There is a "DB lockwait issue" with DVR delete_ports which I am trying to fix.
15:53:05 <Swami> But this one I thin carl you have already reviewed it.
15:53:18 <carl_baldwin> Swami: Is there a bug for that? Do you have a link handy?
15:53:21 <Swami> s/thin/think
15:53:50 <Swami> I will check it out, I don't think there was a bug on the launchpad, if not I will create one.
15:54:31 <carl_baldwin> I do recall seeing the patch but I can’t find it in the bugs. When you create it, be sure it has the dvr tag on it. I’ll watch for it.
15:54:34 <Swami> The problem is occasionally we see a "db lockwait timeout" when we clear snat or remove an interface.
15:55:03 <Swami> #link https://review.openstack.org/#/c/124849/
15:55:35 <Swami> In both cases it is caused when "delete_port" is called from delete_csnat_router_interface.
15:55:58 <carl_baldwin> Swami: Thanks for the link. That one needs a bit more discussion.
15:56:01 <Swami> I will work on it today to see if there is any other way to fix this problem.
15:56:40 <carl_baldwin> Also, a rebase because half of it was merged in to RC1.
15:56:43 <Swami> because I do see even in delete port, there are rpc calls from within the db transaction for the l2pop mechanism drivers.
15:57:30 <carl_baldwin> Could you point them out? There are some we’ve been looking at. I’d be interested to know if you’ve found some that I haven’t seen yet.
15:57:44 <Swami> I will keep you posted if I find anything concrete.
15:57:58 <carl_baldwin> Maybe you could reference them in the review or in the bug report that you create.
15:58:11 <Swami> Also there was a patch that was abandoned that we did for VPN.
15:58:12 <carl_baldwin> Swami: Anything else?
15:58:22 <Swami> So I am also working on fixing the VPN with DVR.
15:58:43 <Swami> I am almost halfway through, once I have the patch I will post a quick WIP for review.
15:58:56 <Swami> that's all from me.
15:59:02 <carl_baldwin> Swami: Thanks for the update.
15:59:14 <carl_baldwin> #topic Open Discussion
15:59:21 <carl_baldwin> We have 45 seconds… :)
16:00:16 <carl_baldwin> Thanks everyone. We got a lot accomplished in Juno and I’m looking forward to improving on it and getting more accomplished in Kilo.
16:00:28 <carl_baldwin> Until next time.
16:00:30 <carl_baldwin> #endmeeting
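
Editor's note on the "L3 Agent cleanup, refactoring, and possible restructuring" topic: the ideas raised there (routers that diff an 'old' and a 'new' instance, an agent that only orchestrates, and the four router permutations built by composition rather than scattered conditionals) might look roughly like the sketch below. This is only an illustration under assumptions. None of the names (RouterState, Router, HaFeature, DistributedFeature, build_router) come from the Neutron code base or from the meeting; they are hypothetical, and the "configuration" steps are just log strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterState:
    """Hypothetical immutable snapshot of a router, as carried in an update."""
    router_id: str
    interfaces: frozenset = frozenset()
    floating_ips: frozenset = frozenset()

    def diff(self, new):
        """Return what was added and removed going from this state to `new`."""
        return {
            "interfaces_added": new.interfaces - self.interfaces,
            "interfaces_removed": self.interfaces - new.interfaces,
            "floating_ips_added": new.floating_ips - self.floating_ips,
            "floating_ips_removed": self.floating_ips - new.floating_ips,
        }


class HaFeature:
    """Hypothetical stand-in for HA-specific configuration."""
    name = "ha"

    def apply(self, state, log):
        log.append(f"{state.router_id}: apply HA configuration")


class DistributedFeature:
    """Hypothetical stand-in for DVR-specific configuration."""
    name = "dvr"

    def apply(self, state, log):
        log.append(f"{state.router_id}: apply distributed configuration")


class Router:
    """A router that configures itself; the agent only hands it new state."""

    def __init__(self, state, features=()):
        self.state = state
        self.features = tuple(features)
        self.log = []  # records the steps a real router would perform

    def update(self, new_state):
        # The router works out what changed instead of the agent doing it.
        changes = self.state.diff(new_state)
        for kind, items in changes.items():
            for item in sorted(items):
                self.log.append(f"{kind}: {item}")
        # Each composed feature applies itself; no "if ha / if distributed" here.
        for feature in self.features:
            feature.apply(new_state, self.log)
        self.state = new_state
        return changes


def build_router(state, ha=False, distributed=False):
    """The single place where the router type is decided (all 4 permutations)."""
    features = []
    if ha:
        features.append(HaFeature())
    if distributed:
        features.append(DistributedFeature())
    return Router(state, features)
```

In this sketch a legacy router is simply build_router(state) with no features, and an HA + DVR router is build_router(state, ha=True, distributed=True). Swami's point that HA only applies to the snat part of a distributed router is not modelled here.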