15:01:09 <carl_baldwin> #startmeeting neutron_l3
15:01:10 <openstack> Meeting started Thu Aug 7 15:01:09 2014 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:13 <openstack> The meeting name has been set to 'neutron_l3'
15:01:18 <carl_baldwin> #topic Announcements
15:01:32 <carl_baldwin> Juno-3 is September 4th
15:02:03 <carl_baldwin> FeatureProposalFreeze (FPF) is August 21st
15:02:20 <carl_baldwin> That is only two weeks away and you know how these weeks fly by.
15:02:46 <carl_baldwin> #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
15:03:14 <carl_baldwin> #topic neutron-ovs-dvr
15:03:54 <mrsmith> o/
15:04:10 <Swami> bug fixes are currently in full swing.
15:04:13 <carl_baldwin> #link https://bugs.launchpad.net/neutron/+bugs?field.tag=l3-dvr-backlog
15:04:27 <Swami> I think most of us are working on the bug fixes.
15:04:36 <mrsmith> yup
15:04:38 <carl_baldwin> Yup
15:05:04 <Swami> vivek had posted the fix for the critical L2 pop.
15:05:04 <carl_baldwin> I’d like to get this one reviewed and merged very soon: https://bugs.launchpad.net/neutron/+bug/1350485
15:05:20 <carl_baldwin> Swami: the same
15:05:37 <Swami> carl_baldwin: yes you are right.
15:05:38 <mrsmith> yes
15:05:44 <mrsmith> I think the ml2 prob is affecting lots of areas
15:05:47 <carl_baldwin> I was looking at the UTs that were added (PS4) that had a problem.
15:06:08 <carl_baldwin> I have not found the problem and so I trimmed the UTs down to a minimal set for the patch.
15:06:29 <carl_baldwin> So, you’ll see a big difference in UTs from PS4 to PS5.
15:06:32 <Swami> carl_baldwin: Yes I did see your message on that.
15:06:59 <carl_baldwin> I think keeping the patch focused will help us to review and merge it more quickly.
15:07:08 <Swami> Your point is valid to just focus on the bug fix and the related UT for now to make the review easy.
15:07:42 <carl_baldwin> The UTs developed by Vivek can be worked on and proposed as a new patch at a later time. They can still add much value.
15:08:07 <carl_baldwin> I see there is a new bug 1353885
15:08:07 <Rajeev> bug 1353885 L2Pop on OVS broken due to DeferredBridge introduction : Vivek filed, is also in the related area. UTs will help.
15:08:13 <carl_baldwin> Rajeev: :)
15:08:40 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1353885
15:08:40 <Rajeev> carl_baldwin: I ran into it yesterday :)
15:09:01 <carl_baldwin> I actually ran in to the same thing yesterday and was getting ready to file a bug.
15:09:07 <carl_baldwin> #link http://paste.openstack.org/show/91167/
15:09:51 <carl_baldwin> I just added my trace to the bug.
15:10:20 <Rajeev> carl_baldwin: yes, same symptoms
15:10:25 <Swami> was this Deferred bridge late introduction.
15:10:26 <carl_baldwin> Just a little hint. These stack traces don’t copy/paste well in to bug reports. paste.openstack.org is a good way to get them in there.
15:11:41 <carl_baldwin> I’ll review the patch (a one-liner) and see if it makes sense to add a UT. I think this should merge quickly.
15:12:19 <Swami> carl_baldwin: thanks
15:12:19 <carl_baldwin> Swami: Rajeev: Do we know which patch introduced this bug?
15:12:38 <carl_baldwin> It would be good to note that in the bug report.
15:12:44 <Swami> carl_baldwin: no I am not sure.
15:12:58 <carl_baldwin> Never mind, it is in the bug report.
15:13:07 <carl_baldwin> I was snow blind because of the stack trace. ;)
15:14:21 <Rajeev_> carl_baldwin: sorry lost connection. don't know the patch # but came in last 2 days
15:14:30 <carl_baldwin> #action carl_baldwin will shepherd bug 1353885 through
15:14:41 <carl_baldwin> Rajeev_: np, the patch is reported in the bug report. I had missed it.
15:14:50 <carl_baldwin> Other progress?
15:15:07 <mrsmith> carl_baldwin: I am hitting an issue with delete namespaces
15:15:16 <mrsmith> the driver is throwing an error
15:15:20 <Swami> we are progressing on the migration patch
15:15:22 <mrsmith> "Device or resource not ready"
15:15:30 <mrsmith> anyone seeing this?
15:15:44 <mrsmith> this is for #link https://bugs.launchpad.net/neutron/+bug/1353287
15:15:50 <carl_baldwin> “not ready”? I’m not sure I’ve seen that.
15:16:07 <mrsmith> if I put a delay in after interfaces are unplugged and the delete namespace, no error
15:16:24 <mrsmith> we can rely on "delays"
15:16:45 <carl_baldwin> mrsmith: you’re asking?
15:16:48 <mrsmith> this is causing tempest errors
15:17:01 <mrsmith> I'm asking if anyone else has seen this error lately
15:17:12 <mrsmith> in the community
15:17:42 <mrsmith> it seems to be yet another "recent" problem
15:17:46 <mrsmith> we weren't seeing this before
15:17:51 <carl_baldwin> I haven’t but others can speak up.
15:18:01 <mrsmith> k
15:18:08 <mrsmith> I'll keep digging
15:18:22 <carl_baldwin> mrsmith: Could you paste some context around the error and link it to the bug?
15:18:42 <mrsmith> sure
15:19:14 <Rajeev_> mrsmith: I just tried it and got this: Cannot remove /var/run/netns/qrouter-3f587793-02a6-4fc3-8b97-dc38581ef92a: Device or resource busy
15:19:20 <mrsmith> right
15:19:22 <mrsmith> thats it
15:19:39 <mrsmith> that looks like a plain router ns
15:19:45 <mrsmith> I am hitting it with a fip ns
15:19:56 <mrsmith> so - same possible issue in the driver?
15:20:06 <carl_baldwin> mrsmith: Oh, that is different. What OS are you on?
15:20:27 <mrsmith> ubuntu
15:20:43 <Rajeev_> ubuntu here too
15:21:01 <mrsmith> 12.04
15:21:19 <carl_baldwin> mrsmith: Rajeev_: That is a known issue with the iproute package on 12.04.
15:21:57 <carl_baldwin> I don’t remember all of the details but there are broad locks created by execing in the namespace.
15:22:14 <carl_baldwin> This problem is the whole reason why namespace deletion is off by default.
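[Editor's note] The "off by default" behaviour carl_baldwin describes can be pictured as an opt-in guard around the delete call. The following is only a minimal sketch under stated assumptions: `delete_namespace`, `delete_enabled`, and the injectable `run` argument are hypothetical names chosen for illustration and are not taken from the Neutron code base. The only real command involved is `ip netns delete`.

```python
import subprocess


def delete_namespace(name, delete_enabled=False, run=subprocess.check_call):
    """Delete a network namespace only when the operator has opted in.

    `delete_enabled` is a hypothetical stand-in for an agent option that
    defaults to off. `run` is injectable so the sketch can be exercised
    without root privileges or a real namespace.
    """
    if not delete_enabled:
        # Default path: leave the namespace in place and report that
        # nothing was removed.
        return False
    # Opt-in path: ask iproute to remove the namespace. Any failure
    # raised by `run` propagates to the caller.
    run(["ip", "netns", "delete", name])
    return True
```

With the default arguments the function never shells out, which mirrors the conservative default discussed in the meeting. It does not address the underlying iproute problem on 12.04.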
15:22:31 <Rajeev_> carl_baldwin: good to know, any workarounds ?
15:22:42 <carl_baldwin> Update iproute package.
15:22:49 <mrsmith> so jenkins/gate/tempest runs with delete off?
15:23:10 <mrsmith> or we need to support either regardless?
15:23:22 <carl_baldwin> mrsmith: yes, but now that you mention it it could be turned on now that 14.04 is in the gate.
15:24:05 <mrsmith> "could" or "might have"
15:25:02 <carl_baldwin> mrsmith: I’m sure it is off in the gate because off is the default.
15:25:23 <carl_baldwin> #link https://bugs.launchpad.net/neutron/+bug/1052535
15:25:47 <mrsmith> ya - we've talked about this before
15:25:51 <mrsmith> its an old bug
15:26:04 <mrsmith> its just we've been deleting ns pretty reliably for months
15:26:18 <mrsmith> and now it seems to be affecting us more
15:26:30 <mrsmith> I'll look at how to work around this
15:26:34 <mrsmith> in the code
15:26:41 <carl_baldwin> See my comment from 2013-10-01
15:27:15 <carl_baldwin> I might be able to find you an update to iproute with the fix in it.
15:27:26 <carl_baldwin> Any other DVR issues to discuss?
15:27:34 <mrsmith> well - updating iproute is easy enough
15:27:43 <mrsmith> getting the code to be more rubust is what I'm after
15:27:48 <mrsmith> *robust
15:27:54 <mrsmith> we can move on -thanks
15:28:34 <Swami> I think that's all we have for now.
15:28:46 <carl_baldwin> mrsmith: The problem is that if you hit the error, we’ve found that the system tends to get bad after. So, simply handling the error gracefully isn’t really going to cut it.
15:29:37 <mrsmith> fair enough
15:29:37 <carl_baldwin> So, you’ve got to avoid hitting the error in the first place or your machine will not be the same until a reboot.
15:29:52 <carl_baldwin> mrsmith: Let me know what you come up with.
15:30:08 <carl_baldwin> Keep up the good work DVR team.
15:30:14 <PraveenSM> Hello All,
15:30:17 <PraveenSM> We have written a blueprint “DHCP Service LoadBalancing Scheduler”.
15:30:28 <PraveenSM> This blueprint is written to address the problem of uneven scheduling of DHCP namespaces on multiple network nodes. The problem we faced is, Consider we have 1 OpenStack Controller, 4 Network Nodes, 100 Compute nodes. We have created 200 Networks and booted 800 VMs across 200 networks. When the VMs are booted across networks then DHCP namespaces pertaining to network will be created on Network Nodes. However arou
15:30:40 <carl_baldwin> #topic l3-high-availability
15:30:50 <carl_baldwin> PraveenSM: We’ll catch you in Open Discussion.
15:31:03 <PraveenSM> ok thanks
15:31:06 <carl_baldwin> safchain: amuller: ping
15:31:10 <safchain> hi
15:31:26 <carl_baldwin> How is this progressing?
15:31:48 <carl_baldwin> I did some reviewing last week but some of it was WIP. I’m happy to review this week.
15:32:00 <safchain> base classes and scheduler rebased, all UT work
15:32:16 <safchain> amuller did a great job on the agent side
15:32:30 <safchain> he split the agent code into two classes
15:33:27 <carl_baldwin> Is most of it ready for review?
15:33:33 <safchain> sure
15:34:01 <carl_baldwin> Great, I’ll make a pass over them today. Be sure that anything that may not be ready is marked WIP.
15:34:07 <carl_baldwin> safchain: anything else?
15:34:08 <safchain> assaf is still working to add more functional tests, but we can start the review
15:34:24 <safchain> no everything is ok
15:34:56 <safchain> ok I'll check the WIP status
15:35:24 <carl_baldwin> safchain: thanks.
15:35:40 <carl_baldwin> #topic l3-svcs-vendor-*
15:35:49 <carl_baldwin> pcm_: Is there anything outstanding on this topic?
15:36:11 <pcm_> No all set. BP done, VPN implemented.
15:36:12 <carl_baldwin> I saw that your Cisco impl was merged, I think.
15:36:22 <carl_baldwin> pcm_: Great. Shall I remove it from the agenda?
15:36:35 <pcm_> If other services want to do this, we can do as bugs.
15:36:38 <pcm_> Sure.
15:36:47 <carl_baldwin> pcm_: Okay, great work.
15:36:53 <pcm_> thanks!
15:37:19 <carl_baldwin> #topic bgp-dynamic-routing
15:37:31 <carl_baldwin> devvesa, nextone92: ping
15:37:49 <carl_baldwin> #action carl_baldwin will review bgp code in progress
15:38:00 <carl_baldwin> Looks like they’re not around.
15:38:10 <carl_baldwin> yamamoto: do you have anything?
15:38:17 <yamamoto> nothing
15:38:20 <yamamoto> #link https://review.openstack.org/#/q/topic:bp/bgp-dynamic-routing,n,z
15:39:18 <carl_baldwin> Okay, I guess we’ll take the topic to gerrit.
15:39:22 <carl_baldwin> yamamoto: thanks
15:39:37 <carl_baldwin> #topic Reschedule routers from downed agents
15:39:47 <carl_baldwin> kevinbenton: are you around?
15:39:50 <kevinbenton> https://review.openstack.org/#/c/110893/
15:40:03 <kevinbenton> it’s now configuration enabled
15:40:16 <carl_baldwin> kevinbenton: that is good.
15:40:56 <kevinbenton> default disabled so people concerned with zombie agents won’t have to worry
15:41:27 <carl_baldwin> I’ll have another look. There is one colleague here at HP who dealt with our rescheduling solution a lot. He may be able to provide better feedback about the sorts of things that go wrong.
15:41:57 <carl_baldwin> I also heard from some other HP guys who were working along the same lines.
15:42:14 <kevinbenton> i’m aware of most of them. one of the guys from redhat already provided quite a bit on the bug report
15:42:19 <carl_baldwin> I’m trying to nudge them to discuss it out in the open. :)
15:42:33 <carl_baldwin> kevinbenton: great.
15:42:47 <carl_baldwin> kevinbenton: thanks for the update. Anything else to discuss?
15:43:02 <kevinbenton> there is nothing more that can be done from the neutron side if we assume neutron is disconnected
15:43:19 <kevinbenton> carl_baldwin: nope, this patch probably isn’t going to change much now
15:43:44 <carl_baldwin> kevinbenton: thanks.
15:44:36 <carl_baldwin> #topic Open Discussion
15:45:07 <PraveenSM> We have written a blueprint “DHCP Service LoadBalancing Scheduler”.
https://review.openstack.org/#/c/111210/ https://blueprints.launchpad.net/neutron/+spec/dhcpservice-loadbalancing
15:45:24 <PraveenSM> This blueprint is written to address the problem of uneven scheduling of DHCP namespaces on multiple network nodes. The problem we faced is, Consider we have 1 OpenStack Controller, 4 Network Nodes, 100 Compute nodes. We have created 200 Networks and booted 800 VMs across 200 networks. When the VMs are booted across networks then DHCP namespaces pertaining to network will be created on Network Nodes.
15:45:34 <PraveenSM> However around 95% of DHCP namespaces will be created on only one Network Node and remaining 5% DHCP namespaces will be distributed among remaining 3 Network Nodes. Hence there will be excess load on only one Network Node. To address this problem we have written the blueprint so that DHCP namespaces will be distributed equally among Network Nodes based on number of DHCP namespaces hosted on each Network node.
15:46:13 <PraveenSM> Please review it and give comments
15:46:14 <carl_baldwin> Like LeastRouters?
15:46:21 <PraveenSM> yes
15:46:30 <carl_baldwin> PraveenSM: I will add it to my radar. Thanks for bringing it up.
15:46:39 <PraveenSM> thanks
15:47:21 <seizadi> Can we register for Kilo Design Sessions?
15:47:43 <carl_baldwin> seizadi: Good question. I have not heard.
15:47:57 <carl_baldwin> Usually that comes after summit talks (voting ended yesterday)
15:48:24 <seizadi> How do we track on twitter?
15:49:49 <carl_baldwin> seizadi: Not sure what you’re asking but I’ve never been on twitter so maybe that’s why.
15:50:53 <seizadi> :) A lot of the summit announcements are on #Openstack I am new and don't know how the process works.
15:52:54 <carl_baldwin> seizadi: I see. I just sort of hear word of mouth or by email.
15:54:03 <seizadi> OK, Thx
15:54:41 <carl_baldwin> if that is all, I will close the meeting.
15:54:56 <carl_baldwin> Great work!
15:54:58 <carl_baldwin> #endmeeting
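[Editor's note] The placement rule PraveenSM describes in Open Discussion (put each new DHCP namespace on the network node that currently hosts the fewest) could be sketched roughly as below. This is an illustration under stated assumptions, not the blueprint's implementation and not Neutron's scheduler API: `pick_least_loaded`, `schedule_networks`, and the node and network names are all hypothetical.

```python
def pick_least_loaded(namespace_counts):
    """Return the node that currently hosts the fewest DHCP namespaces.

    `namespace_counts` maps a node name to its namespace count. Ties are
    broken by node name so the choice is deterministic.
    """
    if not namespace_counts:
        raise ValueError("no network nodes available")
    return min(namespace_counts, key=lambda node: (namespace_counts[node], node))


def schedule_networks(networks, nodes):
    """Assign each network's DHCP namespace to the least-loaded node."""
    counts = {node: 0 for node in nodes}
    placement = {}
    for net in networks:
        node = pick_least_loaded(counts)
        placement[net] = node
        # Record the new namespace so the next decision sees the update.
        counts[node] += 1
    return placement, counts


if __name__ == "__main__":
    # Scenario from the discussion: 200 networks over 4 network nodes.
    nets = ["net-%d" % i for i in range(200)]
    _, load = schedule_networks(nets, ["nn1", "nn2", "nn3", "nn4"])
    print(load)
```

The sketch counts namespaces only, as the blueprint summary does. It ignores agent liveness and any other load signal a real scheduler would likely weigh.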