15:01:09 <carl_baldwin> #startmeeting neutron_l3
15:01:10 <openstack> Meeting started Thu Aug  7 15:01:09 2014 UTC and is due to finish in 60 minutes.  The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:13 <openstack> The meeting name has been set to 'neutron_l3'
15:01:18 <carl_baldwin> #topic Announcements
15:01:32 <carl_baldwin> Juno-3 is September 4th
15:02:03 <carl_baldwin> FeatureProposalFreeze (FPF) is August 21st
15:02:20 <carl_baldwin> That is only two weeks away and you know how these weeks fly by.
15:02:46 <carl_baldwin> #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
15:03:14 <carl_baldwin> #topic neutron-ovs-dvr
15:03:54 <mrsmith> o/
15:04:10 <Swami> bug fixes are currently in full swing.
15:04:13 <carl_baldwin> #link https://bugs.launchpad.net/neutron/+bugs?field.tag=l3-dvr-backlog
15:04:27 <Swami> I think most of us are working on the bug fixes.
15:04:36 <mrsmith> yup
15:04:38 <carl_baldwin> Yup
15:05:04 <Swami> vivek had posted the fix for the critical L2 pop.
15:05:04 <carl_baldwin> I’d like to get this one reviewed and merged very soon:  https://bugs.launchpad.net/neutron/+bug/1350485
15:05:20 <carl_baldwin> Swami: the same one.
15:05:37 <Swami> carl_baldwin: yes you are right.
15:05:38 <mrsmith> yes
15:05:44 <mrsmith> I think the ml2 prob is affecting lots of areas
15:05:47 <carl_baldwin> I was looking at the UTs that were added (PS4) that had a problem.
15:06:08 <carl_baldwin> I have not found the problem and so I trimmed the UTs down to a minimal set for the patch.
15:06:29 <carl_baldwin> So, you’ll see a big difference in UTs from PS4 to PS5.
15:06:32 <Swami> carl_baldwin: Yes I did see your message on that.
15:06:59 <carl_baldwin> I think keeping the patch focused will help us to review and merge it more quickly.
15:07:08 <Swami> Your point is valid: just focus on the bug fix and the related UTs for now to make the review easy.
15:07:42 <carl_baldwin> The UTs developed by Vivek can be worked on and proposed as a new patch at a later time.  They can still add much value.
15:08:07 <carl_baldwin> I see there is a new bug 1353885
15:08:07 <Rajeev> bug 1353885 L2Pop on OVS broken due to DeferredBridge introduction : Vivek filed, is also in the related area. UTs will help.
15:08:13 <carl_baldwin> Rajeev: :)
15:08:40 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1353885
15:08:40 <Rajeev> carl_baldwin: I ran into it yesterday :)
15:09:01 <carl_baldwin> I actually ran into the same thing yesterday and was getting ready to file a bug.
15:09:07 <carl_baldwin> #link http://paste.openstack.org/show/91167/
15:09:51 <carl_baldwin> I just added my trace to the bug.
15:10:20 <Rajeev> carl_baldwin: yes, same symptoms
15:10:25 <Swami> Was this caused by the recent DeferredBridge introduction?
15:10:26 <carl_baldwin> Just a little hint.  These stack traces don’t copy/paste well into bug reports.  paste.openstack.org is a good way to get them in there.
15:11:41 <carl_baldwin> I’ll review the patch (a one-liner) and see if it makes sense to add a UT.  I think this should merge quickly.
15:12:19 <Swami> carl_baldwin: thanks
15:12:19 <carl_baldwin> Swami: Rajeev:  Do we know which patch introduced this bug?
15:12:38 <carl_baldwin> It would be good to note that in the bug report.
15:12:44 <Swami> carl_baldwin: no I am not sure.
15:12:58 <carl_baldwin> Never mind, it is in the bug report.
15:13:07 <carl_baldwin> I was snow blind because of the stack trace.  ;)
15:14:21 <Rajeev_> carl_baldwin: sorry, lost connection. I don't know the patch #, but it came in during the last 2 days.
15:14:30 <carl_baldwin> #action carl_baldwin will shepherd bug 1353885 through
15:14:41 <carl_baldwin> Rajeev_: np, the patch is reported in the bug report.  I had missed it.
15:14:50 <carl_baldwin> Other progress?
15:15:07 <mrsmith> carl_baldwin: I am hitting an issue with delete namespaces
15:15:16 <mrsmith> the driver is throwing an error
15:15:20 <Swami> we are progressing on the migration patch
15:15:22 <mrsmith> "Device or resource not ready"
15:15:30 <mrsmith> anyone seeing this?
15:15:44 <mrsmith> this is for #link https://bugs.launchpad.net/neutron/+bug/1353287
15:15:50 <carl_baldwin> “not ready”?  I’m not sure I’ve seen that.
15:16:07 <mrsmith> if I put a delay between unplugging the interfaces and deleting the namespace, there is no error
15:16:24 <mrsmith> we can rely on "delays"
15:16:45 <carl_baldwin> mrsmith: you’re asking?
15:16:48 <mrsmith> this is causing tempest errors
15:17:01 <mrsmith> I'm asking if anyone else has seen this error lately
15:17:12 <mrsmith> in the community
15:17:42 <mrsmith> it seems to be yet another "recent" problem
15:17:46 <mrsmith> we weren't seeing this before
15:17:51 <carl_baldwin> I haven’t but others can speak up.
15:18:01 <mrsmith> k
15:18:08 <mrsmith> I'll keep digging
15:18:22 <carl_baldwin> mrsmith: Could you paste some context around the error and link it to the bug?
15:18:42 <mrsmith> sure
15:19:14 <Rajeev_> mrsmith: I just tried it and got this: Cannot remove /var/run/netns/qrouter-3f587793-02a6-4fc3-8b97-dc38581ef92a: Device or resource busy
15:19:20 <mrsmith> right
15:19:22 <mrsmith> that's it
15:19:39 <mrsmith> that looks like a plain router ns
15:19:45 <mrsmith> I am hitting it with a fip ns
15:19:56 <mrsmith> so - same possible issue in the driver?
15:20:06 <carl_baldwin> mrsmith: Oh, that is different.  What OS are you on?
15:20:27 <mrsmith> ubuntu
15:20:43 <Rajeev_> ubuntu here too
15:21:01 <mrsmith> 12.04
15:21:19 <carl_baldwin> mrsmith: Rajeev_:  That is a known issue with the iproute package on 12.04.
15:21:57 <carl_baldwin> I don’t remember all of the details but there are broad locks created by execing in the namespace.
15:22:14 <carl_baldwin> This problem is the whole reason why namespace deletion is off by default.
15:22:31 <Rajeev_> carl_baldwin: good to know, any workarounds ?
15:22:42 <carl_baldwin> Update iproute package.
15:22:49 <mrsmith> so jenkins/gate/tempest runs with delete off?
15:23:10 <mrsmith> or we need to support either regardless?
15:23:22 <carl_baldwin> mrsmith: yes, but now that you mention it, it could be turned on now that 14.04 is in the gate.
15:24:05 <mrsmith> "could" or "might have"
15:25:02 <carl_baldwin> mrsmith: I’m sure it is off in the gate because off is the default.
15:25:23 <carl_baldwin> #link https://bugs.launchpad.net/neutron/+bug/1052535
15:25:47 <mrsmith> ya - we've talked about this before
15:25:51 <mrsmith> it's an old bug
15:26:04 <mrsmith> it's just that we've been deleting namespaces pretty reliably for months
15:26:18 <mrsmith> and now it seems to be affecting us more
15:26:30 <mrsmith> I'll look at how to work around this
15:26:34 <mrsmith> in the code
15:26:41 <carl_baldwin> See my comment from 2013-10-01
15:27:15 <carl_baldwin> I might be able to find you an update to iproute with the fix in it.
15:27:26 <carl_baldwin> Any other DVR issues to discuss?
15:27:34 <mrsmith> well - updating iproute is easy enough
15:27:43 <mrsmith> getting the code to be more robust is what I'm after
15:27:54 <mrsmith> we can move on, thanks
15:28:34 <Swami> I think that's all we have for now.
15:28:46 <carl_baldwin> mrsmith: The problem is that if you hit the error, we’ve found that the system tends to get bad after.  So, simply handling the error gracefully isn’t really going to cut it.
15:29:37 <mrsmith> fair enough
15:29:37 <carl_baldwin> So, you’ve got to avoid hitting the error in the first place or your machine will not be the same until a reboot.
15:29:52 <carl_baldwin> mrsmith: Let me know what you come up with.
15:30:08 <carl_baldwin> Keep up the good work DVR team.
15:30:14 <PraveenSM> Hello All,
15:30:17 <PraveenSM> We have written a blueprint “DHCP Service LoadBalancing Scheduler”.
15:30:40 <carl_baldwin> #topic l3-high-availability
15:30:50 <carl_baldwin> PraveenSM: We’ll catch you in Open Discussion.
15:31:03 <PraveenSM> ok thanks
15:31:06 <carl_baldwin> safchain: amuller: ping
15:31:10 <safchain> hi
15:31:26 <carl_baldwin> How is this progressing?
15:31:48 <carl_baldwin> I did some reviewing last week but some of it was WIP.  I’m happy to review this week.
15:32:00 <safchain> base classes and scheduler rebased, all UTs pass
15:32:16 <safchain> amuller did a great job on the agent side
15:32:30 <safchain> he split the agent code into two classes
15:33:27 <carl_baldwin> Is most of it ready for review?
15:33:33 <safchain> sure
15:34:01 <carl_baldwin> Great, I’ll make a pass over them today.  Be sure that anything that may not be ready is marked WIP.
15:34:07 <carl_baldwin> safchain: anything else?
15:34:08 <safchain> assaf is still working on adding more functional tests, but we can start the review
15:34:24 <safchain> no everything is ok
15:34:56 <safchain> ok I'll check the WIP status
15:35:24 <carl_baldwin> safchain: thanks.
15:35:40 <carl_baldwin> #topic l3-svcs-vendor-*
15:35:49 <carl_baldwin> pcm_: Is there anything outstanding on this topic?
15:36:11 <pcm_> No all set. BP done, VPN implemented.
15:36:12 <carl_baldwin> I saw that your Cisco impl was merged, I think.
15:36:22 <carl_baldwin> pcm_: Great.  Shall I remove it from the agenda?
15:36:35 <pcm_> If other services want to do this, we can do as bugs.
15:36:38 <pcm_> Sure.
15:36:47 <carl_baldwin> pcm_: Okay, great work.
15:36:53 <pcm_> thanks!
15:37:19 <carl_baldwin> #topic bgp-dynamic-routing
15:37:31 <carl_baldwin> devvesa, nextone92: ping
15:37:49 <carl_baldwin> #action carl_baldwin will review bgp code in progress
15:38:00 <carl_baldwin> Looks like they’re not around.
15:38:10 <carl_baldwin> yamamoto: do you have anything?
15:38:17 <yamamoto> nothing
15:38:20 <yamamoto> #link https://review.openstack.org/#/q/topic:bp/bgp-dynamic-routing,n,z
15:39:18 <carl_baldwin> Okay, I guess we’ll take the topic to gerrit.
15:39:22 <carl_baldwin> yamamoto: thanks
15:39:37 <carl_baldwin> #topic Reschedule routers from downed agents
15:39:47 <carl_baldwin> kevinbenton: are you around?
15:39:50 <kevinbenton> https://review.openstack.org/#/c/110893/
15:40:03 <kevinbenton> it’s now configuration enabled
15:40:16 <carl_baldwin> kevinbenton: that is good.
15:40:56 <kevinbenton> default disabled so people concerned with zombie agents won’t have to worry
15:41:27 <carl_baldwin> I’ll have another look.  There is one colleague here at HP who dealt with our rescheduling solution a lot.  He may be able to provide better feedback about the sorts of things that go wrong.
15:41:57 <carl_baldwin> I also heard from some other HP guys who were working along the same lines.
15:42:14 <kevinbenton> i’m aware of most of them. one of the guys from redhat already provided quite a bit on the bug report
15:42:19 <carl_baldwin> I’m trying to nudge them to discuss it out in the open.  :)
15:42:33 <carl_baldwin> kevinbenton: great.
15:42:47 <carl_baldwin> kevinbenton: thanks for the update.  Anything else to discuss?
15:43:02 <kevinbenton> there is nothing more that can be done from the neutron side if we assume neutron is disconnected
15:43:19 <kevinbenton> carl_baldwin: nope, this patch probably isn’t going to change much now
15:43:44 <carl_baldwin> kevinbenton: thanks.
15:44:36 <carl_baldwin> #topic Open Discussion
15:45:07 <PraveenSM> We have written a blueprint “DHCP Service LoadBalancing Scheduler”. https://review.openstack.org/#/c/111210/ https://blueprints.launchpad.net/neutron/+spec/dhcpservice-loadbalancing
15:45:24 <PraveenSM> This blueprint addresses the uneven scheduling of DHCP namespaces across multiple network nodes.  The problem we faced: consider a deployment with 1 OpenStack controller, 4 network nodes, and 100 compute nodes.  We created 200 networks and booted 800 VMs across them.  As VMs are booted on each network, the DHCP namespace for that network is created on a network node.
15:45:34 <PraveenSM> However, around 95% of the DHCP namespaces end up on a single network node, and the remaining 5% are spread over the other 3 network nodes, so one network node carries almost all of the load.  The blueprint distributes DHCP namespaces evenly among network nodes based on the number of DHCP namespaces each node already hosts.
15:46:13 <PraveenSM> Please review it and give comments
15:46:14 <carl_baldwin> Like LeastRouters?
15:46:21 <PraveenSM> yes
15:46:30 <carl_baldwin> PraveenSM: I will add it to my radar.  Thanks for bringing it up.
15:46:39 <PraveenSM> thanks
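A minimal sketch of the least-loaded placement PraveenSM describes (the same idea as the LeastRouters comparison), written in Python since that is Neutron's language. It assumes a hypothetical in-memory dict mapping each network ID to the agent host serving it. `pick_least_loaded_agent` and `schedule_networks` are illustrative names, not Neutron's scheduler API or the blueprint's actual code.

```python
def pick_least_loaded_agent(agents, hosted_networks):
    """Return the agent host currently serving the fewest networks.

    Hypothetical inputs (not Neutron's data model):
      agents: list of DHCP agent host names
      hosted_networks: dict mapping network_id -> agent host serving it
    Ties are broken by host name so the choice is deterministic.
    """
    if not agents:
        raise ValueError("no DHCP agents available")
    load = {agent: 0 for agent in agents}
    for host in hosted_networks.values():
        if host in load:  # ignore networks hosted on agents not in the candidate list
            load[host] += 1
    return min(agents, key=lambda agent: (load[agent], agent))


def schedule_networks(agents, network_ids):
    """Schedule networks one at a time, each onto the least-loaded agent."""
    hosted = {}
    for network_id in network_ids:
        hosted[network_id] = pick_least_loaded_agent(agents, hosted)
    return hosted


if __name__ == "__main__":
    # The scenario from the blueprint: 4 network nodes, 200 networks.
    nodes = ["network-node-1", "network-node-2", "network-node-3", "network-node-4"]
    placement = schedule_networks(nodes, [f"net-{i}" for i in range(200)])
    for node in nodes:
        print(node, list(placement.values()).count(node))  # each node hosts 50
```

The sketch rescans the whole mapping on every call to keep it short. A real scheduler would more likely ask the database for per-agent network counts.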
15:47:21 <seizadi> Can we register for Kilo Design Sessions?
15:47:43 <carl_baldwin> seizadi: Good question.  I have not heard.
15:47:57 <carl_baldwin> Usually that comes after summit talks (voting ended yesterday)
15:48:24 <seizadi> How do we track on twitter?
15:49:49 <carl_baldwin> seizadi: Not sure what you’re asking but I’ve never been on twitter so maybe that’s why.
15:50:53 <seizadi> :) A lot of the summit announcements are on #Openstack. I am new and don't know how the process works.
15:52:54 <carl_baldwin> seizadi: I see.  I just sort of hear word of mouth or by email.
15:54:03 <seizadi> OK, Thx
15:54:41 <carl_baldwin> if that is all, I will close the meeting.
15:54:56 <carl_baldwin> Great work!
15:54:58 <carl_baldwin> #endmeeting