15:02:31 #startmeeting neutron_l3
15:02:31 Meeting started Thu Oct 16 15:02:31 2014 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:32 Hi
15:02:34 The meeting name has been set to 'neutron_l3'
15:02:37 Rajeev: hi
15:02:49 #topic Announcements
15:03:12 #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam
15:03:22 I don’t think I have any announcements. Anyone else?
15:04:04 #topic l3-high-availability
15:04:15 safchain: hi. Anything new?
15:05:13 #undo
15:05:14 Removing item from minutes:
15:05:32 Actually, I thought of an announcement. It just hit me that Juno final should be today.
15:05:44 #topic bgp-dynamic-routing
15:05:46 devvesa: hi
15:06:23 hi
15:07:32 I think we’re almost there with the blueprint.
15:08:45 The question remaining is whether the dr agent is associated with a network or a router. I’m still thinking about it.
15:09:07 me too. I understand your point, it can become annoying to add all the routers
15:09:27 me too :) an alternative could be to associate it with an l3 agent?
15:09:29 but... there can be a use case where you don't want to advertise all of them?
15:09:34 I can see advantages and disadvantages to both strategies. I’m wondering if we need to hear some more opinions.
15:09:49 matrohon: the alternative is to associate the router with the dr_agent
15:10:12 so: a network with all its routers, or single routers one by one?
15:11:14 matrohon: I’m not sure an l3 agent fits. They can be associated with many routers. There can be many l3 agents associated with an external network. In some cases, there can be more than one external network associated with an l3 agent.
15:11:25 I see attaching to a router as more intuitive to the users, but providing a way to do 'bulk attach' by specifying an external network sounds useful. Perhaps allow both?
15:11:58 ryu25: I was just going to suggest maybe both too. We could start with routers initially maybe.
15:12:28 That complicates the db model a little bit… It is something to think about.
15:13:36 Would we model that with two types of associations (two tables) or one table that can handle different types of association?
15:13:40 devvesa: ^
15:14:09 yes, why not?
15:14:30 another thing to talk about is whether we drop the discovery or not
15:14:45 I removed any discovery reference in the spec, but you seemed concerned about it
15:14:50 haleyb: hi
15:14:56 right... the db model also feels more natural if the router and dynamic routing are directly attached. If the only reason for going with external network is to provide a way to associate it with multiple routers, then it sounds like it should be thought of as an API-level enhancement
15:14:58 carl_baldwin: hi
15:15:28 devvesa: Are you thinking of putting that off to another blueprint? How important is the discovery to your use case?
15:15:42 it is not important, we just need to advertise a full range
15:16:19 let's leave out the discovery, but try to do things in a way that won't make it difficult to add in the future
15:16:25 agreed that advertisement is more useful than discovery
15:16:25 what do you think?
15:16:26 Then I think leaving it out of this blueprint is fine. I’d like to leave the door open to addressing it later.
15:17:06 ok
15:17:06 It does make sense to address advertisement first and discovery later if there is demand for it.
15:18:35 So, have we decided to implement an association with a router now and follow on later with an association to a network? I’m happy with that.
15:18:59 I'm happy too
15:19:30 I think I’m leaning toward adding a property to the association to indicate if it is a router or a network. What do you think?
15:19:48 sounds good to me. In the future, tenants will want to add BGP to their own routers to advertise routes over VPN too
15:20:38 ryu25: Yes, I think so. I still see an admin involved in that.
15:21:14 devvesa: okay. Anything else? I hope we can get this blueprint in soon.
15:21:18 Ok, I'll take that into account in the next spec review
15:21:32 devvesa: great. Thanks for your work and patience.
15:21:37 me too! thanks carl!
15:21:58 thanks ryu25 and matrohon for your input too.
15:22:09 #topic L3 Agent Refactoring
15:22:11 carl_baldwin: anytime!
15:22:20 please matrohon, review the spec, we want to do something useful for VPN guys
15:22:38 devvesa: matrohon: +1
15:23:00 devvesa : I'll send a potential design for IPVPN attachment
15:23:16 looking forward to seeing it
15:23:31 you can have a first look here : https://docs.google.com/drawings/d/1NN4tDgnZlBRr8ZUf5-6zzUcnDOUkWSnSiPm8LuuAkoQ/edit
15:23:40 I have to apologize that I have not posted the spec for the refactoring of the L3 agent. I wrote it as promised but I need a sign off from our legal team to post it. I’m still trying to get that.
15:24:12 hi, I have uploaded a WIP patch for the l3 agent.
15:24:24 https://review.openstack.org/#/c/128846/
15:25:00 That's quite incomplete, but enough to show the idea/direction
15:25:35 Will this be part of the L3 agent refactoring effort or will this be a separate patch?
15:25:54 yamahata: do you have a blueprint? This is a pretty big change and looks like it has a big chance of conflicting with other efforts.
15:25:58 I expect it will be a part of l3 agent refactoring.
15:26:31 yamahata : this leads to a modular l3 agent?
15:26:31 yamahata: then you may need to add your idea to the same blueprint that carl was mentioning
15:26:40 carl_baldwin: no blueprint.
15:27:02 carl_baldwin: so I'd like to discuss before going further.
15:27:09 is modular l3 agent the overall direction? (sorry for the newbie question :)
15:27:21 yamahata: I see value in this. Could you hold off on it for just a bit while we try to bring our efforts together?
15:27:41 carl_baldwin: Sure.
15:27:51 matrohon: We would like to get closer to a modular agent. But that is a bit longer term.
15:28:20 carl_baldwin : thanks
15:28:44 yamahata: I like the enthusiasm. I don’t want to squash it. I will have a look at what you have proposed and try to work it into the overall effort.
15:29:07 matrohon: We won’t get there in one step. However, the L3 agent should get more modular as the work progresses.
15:29:30 carl_baldwin: so far I have only heard "refactoring of the l3 agent". Can you please elaborate on what kind of refactoring?
15:29:46 My motivation is routervm.
15:30:08 carl_baldwin : fine : will try to help since I think I'll need it for BGPVPN
15:30:19 yamahata: I was hoping to have the blueprint up to answer these questions. Sigh. Let me try to give you the gist of it.
15:30:29 So I'd like to split out device specific logic and the logic of polling/syncing
15:31:20 I’d like to start by adding an encapsulation for a router to remove all of the router stuff from l3_agent.py. This will leave the agent class to handle RPC updates and queuing them to workers.
15:31:30 This sounds similar to your goal.
15:31:55 carl_baldwin: yes, sounds similar.
15:32:24 Another early step will be to encapsulate a namespace and create a manager that will handle the clean up of stale ones.
15:33:09 I like the idea. +1
15:33:19 Then, I’d like to encapsulate plugging ports, especially the external gateway port since there is so much logic around that for DVR vs legacy routers. All of the floating ip namespace handling would go under this encapsulation.
15:33:56 great ideas carl_baldwin
15:34:08 I’m thinking of maybe a driver model to handle the differences between DVR and legacy connection to the external network.
15:35:02 I think with those initial steps, the L3 agent will be significantly less tangled up in itself than it is now.
15:35:25 It should enable further work to clean it up even more.
15:36:07 I’ll try to get the sign-off to post the blueprint and will add you all as reviewers.
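[editor's note] The refactoring steps outlined in this topic (a router encapsulation, a namespace manager that finds stale namespaces, and a driver model for DVR vs legacy gateway plumbing) could be sketched roughly as follows. This is only an illustration under stated assumptions: every class and method name below is hypothetical, the "ns-" prefix is made up, and nothing here is taken from the blueprint or from l3_agent.py.

```python
from abc import ABC, abstractmethod


class GatewayDriver(ABC):
    """Hypothetical interface hiding how a router reaches the external network."""

    @abstractmethod
    def plug_external_gateway(self, router_id):
        """Return a label describing the plumbing that was done."""


class LegacyGatewayDriver(GatewayDriver):
    def plug_external_gateway(self, router_id):
        return f"legacy gateway for {router_id}"


class DvrGatewayDriver(GatewayDriver):
    def plug_external_gateway(self, router_id):
        # Floating IP namespace handling would live behind this call.
        return f"dvr gateway for {router_id}"


class Router:
    """Per-router logic moved out of the agent class."""

    def __init__(self, router_id, driver):
        self.router_id = router_id
        self.driver = driver
        self.namespace = f"ns-{router_id}"  # made-up naming scheme

    def process(self):
        return self.driver.plug_external_gateway(self.router_id)


class NamespaceManager:
    """Remembers which namespaces are in use so stale ones can be removed."""

    def __init__(self, existing):
        self.existing = set(existing)
        self.in_use = set()

    def keep(self, namespace):
        self.in_use.add(namespace)

    def stale(self):
        return sorted(self.existing - self.in_use)


class Agent:
    """Left with update handling only; router details live in Router."""

    def __init__(self, namespaces):
        self.routers = {}
        self.namespaces = namespaces

    def router_updated(self, router_id, distributed):
        driver = DvrGatewayDriver() if distributed else LegacyGatewayDriver()
        router = self.routers.setdefault(router_id, Router(router_id, driver))
        self.namespaces.keep(router.namespace)
        return router.process()


manager = NamespaceManager(["ns-r1", "ns-old"])
agent = Agent(manager)
result = agent.router_updated("r1", distributed=False)
leftover = manager.stale()
```

In this toy layout the agent never branches on DVR vs legacy after picking a driver, which is the kind of untangling the discussion is aiming for; RPC handling and worker queues are left out entirely.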
15:37:13 #topic neutron-ovs-dvr
15:37:26 carl_baldwin: hi
15:37:29 Swami: mrsmith: Rajeev: anything since yesterday?
15:37:45 We got another bug filed regarding the floating IP status update
15:37:48 Swami: I did not get a chance yesterday to look at the locking issues much. I have that on my plan for today.
15:37:54 We are looking into it right now.
15:38:08 Swami: Ah yes, the new tempest test was added for that.
15:38:13 #link https://bugs.launchpad.net/neutron/+bug/1381617
15:38:34 It seems to be a timing issue, but we are looking into it to confirm.
15:38:51 We made some headway on the lockwait issue.
15:39:10 Swami: headway since yesterday?
15:39:14 What have you found?
15:39:38 On the lockwait it seems that a port is deleted by "gateway_clear".
15:39:53 The router_interface_delete is trying to delete the same port.
15:40:30 When it goes to the delete_port, it calls "db.get_locked_port_and_binding".
15:40:51 In a normal scenario, this function returns the port that was deleted and then it logs a message and quits.
15:41:03 But in this case this function does not return.
15:41:16 the query to find out the ports that are locked fails.
15:42:44 On another note: Saw the re-occurrence of the race condition in l_3 processing floating ips.
15:43:04 filed defect 1381238
15:43:15 #link https://bugs.launchpad.net/neutron/+bug/1381238
15:43:23 Swami: So, let me see if I have this right...
15:43:31 This is the query that fails: port = (session.query(models_v2.Port).enable_eagerloads(False).filter_by(id=port_id).with_lockmode('update').one())
15:44:00 delete_port tries to get_locked_port_and_binding and it cannot because gateway_clear has already obtained the lock to delete the port?
15:44:37 Rajeev: Thanks for the link.
15:44:44 Yes, in this case gateway_clear has already deleted the port; it is not clear whether it released the lock after deleting the port.
15:45:14 The log message shows that the port has already been deleted.
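[editor's note] The lockwait scenario described above (two callers trying to delete the same port, where the second one either finds the port gone or never gets the lock) can be modelled with a toy in-memory example. This is a sketch under loud assumptions: it uses a plain threading.Lock instead of a database row lock, it is not SQLAlchemy or Neutron code, and PortStore, PortNotFound, LockWaitTimeout and the return strings are all hypothetical names chosen for illustration.

```python
import threading


class PortNotFound(Exception):
    pass


class LockWaitTimeout(Exception):
    pass


class PortStore:
    """Toy stand-in for the ports table; one lock stands in for row locks."""

    def __init__(self, port_ids):
        self._ports = set(port_ids)
        self._lock = threading.Lock()

    def get_locked_port(self, port_id, timeout=1.0):
        # Loose analogue of a locking read: wait for the lock, then look up.
        if not self._lock.acquire(timeout=timeout):
            raise LockWaitTimeout(port_id)
        if port_id not in self._ports:
            self._lock.release()
            raise PortNotFound(port_id)
        return port_id

    def delete_locked_port(self, port_id):
        # Caller must hold the lock obtained from get_locked_port.
        self._ports.discard(port_id)
        self._lock.release()


def delete_port(store, port_id, timeout=1.0):
    """Delete a port, tolerating a concurrent delete or a stuck lock."""
    try:
        store.get_locked_port(port_id, timeout)
    except PortNotFound:
        return "already deleted"
    except LockWaitTimeout:
        return "lock wait timeout"
    store.delete_locked_port(port_id)
    return "deleted"


store = PortStore(["p1"])
first = delete_port(store, "p1")
second = delete_port(store, "p1")
```

The "already deleted" branch corresponds to the normal scenario mentioned in the log (log a message and quit); the "lock wait timeout" branch corresponds to the case where the lock is still held and the call does not return in time.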
15:45:57 I have updated the launchpad bug with the neutron-server log and I have mentioned the port-id that was causing this problem. Take a look at it.
15:46:26 Swami: okay. I couldn’t yesterday but will probably have some time today.
15:46:35 carl_baldwin: thanks
15:46:47 Swami: thank you
15:47:31 * carl_baldwin goes to look at bug 1381238
15:48:31 Rajeev: How is this bug going?
15:49:00 carl_baldwin: have a review up with a possible fix
15:49:19 #link https://review.openstack.org/#/c/128131/
15:50:22 Rajeev: Could you link the review to the bug? Somehow it did not get linked.
15:50:42 sure, didn't realize that.
15:50:56 ^ It did not get linked because the first PS didn’t mention the bug in the commit msg. I’ve had that happen.
15:51:23 Swami: mrsmith: Rajeev: Anything else?
15:51:30 I see, how do I link it now?
15:51:52 carl_baldwin: I don't think I have anything more to add. We are still focussed on bugs and backlog items.
15:51:59 Swami: Thanks.
15:52:06 On bug #1381617
15:52:06 Rajeev, just paste a link in a comment.
15:53:10 from the logs it appears the test is checking the status of the floating ips too quickly.
15:53:43 It would be a timing issue with the test as Swami mentioned earlier.
15:53:54 dvr takes a little longer to set up FIPs since we have extra ns, port, etc
15:54:00 #link https://bugs.launchpad.net/neutron/+bug/1381617
15:54:10 mrsmith: yes
15:55:23 no more updates from me, except: take a look at review https://review.openstack.org/#/c/128131/
15:57:28 Thanks all. We’re about out of time.
15:57:34 Keep up the good work.
15:57:43 #endmeeting
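[editor's note] For the floating IP status bug (#1381617), the log suggests the test checks the status too quickly, before DVR has finished setting up the extra namespace and port. One common way to make such a check tolerant of setup delay is to poll with a deadline instead of checking once. The helper below is a minimal sketch, not the tempest test or its fix: wait_for_status, its parameters, and the "DOWN"/"ACTIVE" strings used in the example are assumptions made for illustration.

```python
import time


def wait_for_status(get_status, expected, timeout=10.0, interval=0.5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns expected or the timeout expires.

    Returns True if the expected status was seen, False on timeout.
    clock and sleep are injectable so the helper can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example with a fake status source that becomes ACTIVE on the third poll.
statuses = iter(["DOWN", "DOWN", "ACTIVE"])
became_active = wait_for_status(lambda: next(statuses), "ACTIVE",
                                timeout=1.0, interval=0,
                                sleep=lambda seconds: None)
```

The status is always checked at least once, so a zero timeout degrades to the original single check rather than failing outright.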