15:00:08 <Swami> #startmeeting neutron_dvr
15:00:10 <regXboi> in all seriousness, I'll be a bit distracted during the front end of the call
15:00:13 <openstack> Meeting started Wed Nov 11 15:00:08 2015 UTC and is due to finish in 60 minutes.  The chair is Swami. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:17 <openstack> The meeting name has been set to 'neutron_dvr'
15:00:34 <Swami> #topic announcement
15:00:49 <Swami> Do we have anything to share with the team.
15:00:59 <regXboi> just for the minutes, link to the agenda?
15:01:07 <Swami> The gate was broken yesterday, is it stable now.
15:01:13 <Swami> regXboi: sure thanks
15:01:35 <Swami> The agenda for today's meeting can be seen at the following link
15:01:48 <regXboi> #link https://wiki.openstack.org/wiki/Meetings/Neutron-DVR agenda link for today's meeting
15:01:49 <Swami> #link https://wiki.openstack.org/wiki/Meetings/Neutron-DVR
15:01:54 <regXboi> heh - sorry about that
15:02:03 <Swami> regXboi: you are faster than me
15:02:11 <regXboi> too much IRC meeting
15:02:24 <Swami> haleyb will be joining late today, so I will be running this meeting today.
15:02:34 <Swami> Now let us move on
15:03:05 <Swami> #topic Bugs
15:03:53 <Swami> regXboi: Thanks, you have added a list of bugs for today's discussion; let me know if I miss anything.
15:03:58 <Swami> The first on the list is
15:04:02 <Swami> #link  https://bugs.launchpad.net/neutron/+bug/1372141
15:04:02 <openstack> Launchpad bug 1372141 in neutron "StaleDataError while updating ml2_dvr_port_bindings" [Medium,Confirmed]
15:04:04 <regXboi> will do
15:04:29 <Swami> regXboi: You have mentioned that this has not been seen in the gate for the last seven days.
15:04:41 <regXboi> yes - let me double check that as of today
15:05:17 <Swami> regXboi: we should probably wait for another week and then close if not seen
15:06:12 <Swami> The next bug in the list is
15:06:19 <regXboi> yes - I see 37 hits in logstash and none of them are from DVR - and I'm good waiting another week
15:06:34 <Swami> ok lets move on.
15:06:46 <Swami> that is good news that it is not seen against DVR.
15:06:52 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1456624
15:06:52 <openstack> Launchpad bug 1456624 in neutron "DVR Connection to external network lost when associating a floating IP" [Medium,In progress] - Assigned to venkata anil (anil-venkata)
15:07:11 <Swami> regXboi: you had a note in there to make sure if this is really a bug.
15:07:26 <regXboi> My problem is that I'm not convinced this is a bug - I think this is expected behavior
15:08:12 <carl_baldwin> I think I agree.  I've been thinking about it and haven't changed my mind.
15:08:28 <Swami> The bug states that while pinging through the snat, when a FIP is applied the ping hangs and then resumes if we re-ping. I am not sure either. If this is the same behavior as the centralized router, then we can close it. But if the centralized router behaves differently, then we should pay attention to it.
15:09:04 <Swami> carl_baldwin: so what is your thought right now
15:09:48 <carl_baldwin> It does behave differently.  But, the fix is very painful.
15:10:18 <carl_baldwin> Maybe we could propose a fix which makes centralized behave the same.  ;)
15:10:39 <obondarev_> carl_baldwin: nice! :-)
15:10:41 <Swami> carl_baldwin: +1
15:10:42 <regXboi> um
15:10:50 <regXboi> I'm not sure how that would work
15:10:55 <regXboi> but that now sounds like an rfe
15:11:29 <carl_baldwin> Just a conntrack kill of existing traffic
15:11:31 <Swami> ok let us tag it as opinion.
15:11:47 * regXboi has to think about that idea
15:11:57 * regXboi notes "that idea" = "conntrack kill"
15:12:08 <Swami> carl_baldwin: I think for now we can document this behavior so that we have captured it and then move on and focus on other issues.
15:12:12 <carl_baldwin> I still don't like the proposed cure.  Very complicated
15:12:33 <regXboi> I'd say let's go opinion and revisit it in a bit
15:12:35 <carl_baldwin> Swami: +1
15:12:57 <Swami> is everyone ok with this proposal, if so we can move on.
15:13:00 <carl_baldwin> regXboi: +1 too
15:13:29 <Swami> ok, moving on to the next one.
15:13:35 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1458541
15:13:35 <openstack> Launchpad bug 1458541 in neutron "Decomposite DVR router compute node and network node functionallity to two classes" [Medium,In progress] - Assigned to Ryan Moats (rmoats)
15:14:18 <Swami> I hope it says that we will wait until the binding refactoring. So nothing more to add on this item.
15:14:20 <regXboi> this is on hold for the rebinding
15:14:22 <regXboi> yes
15:14:27 <Swami> regXboi: do you want to add anything to it.
15:14:40 <regXboi> nope that's up to date
15:14:54 <Swami> moving on to the next one.
15:14:59 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1510796
15:14:59 <openstack> Launchpad bug 1510796 in neutron "Function sync_routers always call _get_dvr_sync_data in ha scenario" [Low,In progress] - Assigned to ZongKai LI (lzklibj)
15:15:20 <Swami> There is already a patch set for this bug.
15:15:34 <Swami> #link https://review.openstack.org/#/c/239908/
15:15:40 <regXboi> yes there is
15:16:19 <Swami> Ok this might require more reviews
15:16:49 <Swami> Anything else to add to this bug.
15:17:19 <regXboi> nope
15:17:25 <Swami> regXboi: ok
15:17:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1513678
15:17:30 <openstack> Launchpad bug 1513678 in neutron "At scale router scheduling takes a long time with DVR routers with multiple compute nodes hosting thousands of VMs" [High,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:18:03 <Swami> I had two patches (WIP) on this bug.
15:18:46 <Swami> But I got review comments from carl_baldwin and obondarev_ that one of the patches may be addressed after the scheduler refactor.
15:19:11 <obondarev_> Swami: your point about backporting it sounds reasonable to me
15:19:31 <obondarev_> so I’m ok with both
15:19:53 <Swami> obondarev_: meaning addressing this bug on liberty and kilo rather than in master.
15:20:24 <obondarev_> Swami: we need to merge it in master first before backporting I guess
15:20:32 <carl_baldwin> obondarev_: +1
15:20:39 <regXboi> yes, that's the process
15:20:42 <Swami> obondarev_: ok, so you are good moving forward with these two patches.
15:20:55 <carl_baldwin> I'm good with both once they're in shape.
15:20:57 <obondarev_> Swami: yep
15:21:01 <Swami> ok, no worries, then let us move on. Thanks
15:21:17 <Swami> next one
15:21:21 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1513574
15:21:21 <openstack> Launchpad bug 1513574 in neutron "firewall rules on DVR FIP fails to work for ingress traffic" [Undecided,Confirmed]
15:21:44 <Swami> This one was filed recently and the bug is confirmed.
15:22:03 <regXboi> I saw the spec for FWaaS 2.0, which I agree should address this
15:22:40 <Swami> So we will not be fixing this bug right now; we will wait for FWaaS 2.0, and once the rules are applied to ports, this should solve the problem.
15:23:03 <Swami> I will update sc68cal on this in the FWaaS meeting.
15:23:31 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1362242
15:23:31 <openstack> Launchpad bug 1362242 in neutron "bridge_mappings isn't bound to any segment warning from l2pop" [Low,Confirmed] - Assigned to Irena Berezovsky (irenab)
15:24:21 <Swami> I have not seen any patch for this bug and it has been there for a while. Since the priority is not high I have not paid attention to this bug.
15:25:02 <regXboi> yes, I'm trying to clean up some of the lower priority cruft today - I'd like to see if somebody is willing to pick this up
15:25:35 <Swami> We need to see if this is still seen in master. This bug was reported a long time ago.
15:25:50 <carl_baldwin> I can look at this a bit.  I honestly don't remember filing it.
15:25:58 <regXboi> oh - that I can update - yes it is - I looked yesterday
15:26:00 <Swami> carl_baldwin: thanks
15:26:18 <Swami> Ok, let us move on.
15:26:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1447227
15:26:30 <openstack> Launchpad bug 1447227 in neutron "Connecting two or more distributed routers to a subnet doesn't work properly" [Low,In progress] - Assigned to ZongKai LI (lzklibj)
15:27:26 <Swami> I don't think we have a patch for this bug as well.
15:28:00 <Swami> Sorry there was a patch and it was abandoned a while back
15:28:03 <Swami> #link https://review.openstack.org/#/c/191671/
15:28:24 <carl_baldwin> auto-abandoned
15:28:27 <Swami> I am not sure if ZongKai Li is still pursuing this patch.
15:28:43 <regXboi> swami: can you follow up to see?
15:28:48 <regXboi> that was the AI I wanted
15:28:58 <Swami> Ok I will check with ZongKai Li on this.
15:29:01 <regXboi> note: me updated https://bugs.launchpad.net/neutron/+bug/1362242 with logstash information
15:29:01 <openstack> Launchpad bug 1362242 in neutron "bridge_mappings isn't bound to any segment warning from l2pop" [Low,Confirmed] - Assigned to Irena Berezovsky (irenab)
15:29:08 <Swami> ok let us move on.
15:29:37 <Swami> we are half way, we need to move on
15:29:55 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1445255
15:29:55 <openstack> Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound port does not work" [Low,In progress]
15:30:23 <Swami> Yes, there is a patch for this bug, but it is inactive or abandoned.
15:30:45 <regXboi> well, there is a reference from obondarev to a merged patch, so I was wondering if this is still an issue
15:30:46 <obondarev_> I thought it was fixed a while ago
15:31:06 <carl_baldwin> What exactly is an unbound port here?
15:31:17 <Swami> This issue might have been fixed, since we do have a way to address floatingip on bound ports once it is bound to a host.
15:31:40 <carl_baldwin> nm, answered my own question by reading the bug.
15:31:48 <regXboi> can somebody take the AI of verifying if this still exists?
15:31:50 <Swami> carl_baldwin: When a port does not have a host_binding and that port is associated with a FIP.
15:32:27 <Swami> I think we addressed this bug by supporting late-binding to the FIP. This should be solved by now
15:32:37 <Swami> I will update the bug.
15:32:41 <regXboi> thx
15:32:51 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1452458
15:32:51 <openstack> Launchpad bug 1452458 in neutron "Server returns error 500 when setting a DVR router as a gateway on a network with no subnet, and then adding an interface" [Low,In progress] - Assigned to Nikhil AP (niks3089)
15:33:28 <Swami> I have not seen any patch on this bug yet.
15:33:56 <regXboi> no, the question is to find out if the assignee is working on one
15:34:21 <Swami> I did not see any activity.
15:34:30 <Swami> If not we will find a new owner.
15:34:49 <Swami> I will ask either adolfo or ritesh to take a look at it.
15:34:52 <Swami> Let us move on.
15:35:00 <regXboi> I can take the AI of verifying that it still exists
15:35:15 <regXboi> #action regXboi to verify https://bugs.launchpad.net/neutron/+bug/1452458
15:35:15 <openstack> Launchpad bug 1452458 in neutron "Server returns error 500 when setting a DVR router as a gateway on a network with no subnet, and then adding an interface" [Low,In progress] - Assigned to Nikhil AP (niks3089)
15:35:22 <Swami> regXboi: thanks
15:35:30 <Swami> The next one in the list is
15:35:33 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1463831
15:35:33 <openstack> Launchpad bug 1463831 in neutron "neutron DVR poor performance" [Undecided,Incomplete] - Assigned to Adolfo Duarte (adolfo-duarte)
15:35:55 <Swami> I did not see adolfo in here.
15:36:10 <regXboi> we can follow up with him in channel
15:36:20 <Swami> Ok, I will check with him and update the bug.
15:36:24 <fitoduarte> I'm here
15:36:45 <Swami> fitoduarte: hi
15:36:54 <fitoduarte> this one is caused by the router staying very very quite :)
15:37:06 <regXboi> quiet?
15:37:25 <Swami> It seems that you are the owner of this bug. Is it still seen, and is it a valid bug?
15:37:35 <fitoduarte> basically Open vSwitch ages out the MAC
15:38:19 <regXboi> I'm thinking we don't want that to ever happen, do we?
15:38:25 <fitoduarte> it is seen, not sure how much impact it has. very corner case
15:38:45 <fitoduarte> no we don't
15:38:46 <Swami> but is that behavior different than the cvr in dvr
15:39:22 <fitoduarte> it is, in that you can come up with a rare case which only happens for dvr.
15:39:35 <Swami> fitoduarte: ok adolfo, the bug currently says incomplete, if you can update it with the right flag, that would work out.
15:39:47 <fitoduarte> will do
15:39:53 <Swami> ok, thanks we will move
15:39:53 <regXboi> and I'm thinking Low/Wishlist as it *is* a corner case
15:40:07 <Swami> regXboi: agree
15:40:20 <Swami> The next one in the list
15:40:24 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1496201
15:40:24 <openstack> Launchpad bug 1496201 in neutron "DVR: router namespace can't be deleted if bulk delete VMs" [Medium,New] - Assigned to Kasey Alusi (kasey-alusi)
15:40:24 * haleyb wanders in
15:40:44 <Swami> I have not triaged this one, but there is an owner already assigned.
15:40:52 <regXboi> haleyb: we're still walking through the old/crufty bugs that had accumulated at the bottom of the list
15:41:10 <haleyb> regXboi: thanks, trying to read s/b
15:41:15 <regXboi> yes, this needs triage/verification
15:41:20 <Swami> I will ping the owner and see if he has any patch set on it.
15:41:24 <regXboi> thx
15:41:27 <Swami> I will triage this.
15:41:41 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1499045
15:41:41 <openstack> Launchpad bug 1499045 in neutron "get_snat_port_for_internal_port called twice when an interface is added or removed by the l3 agent in the case of DVR routers." [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:42:04 <regXboi> should we tag this low hanging fruit?
15:42:09 <regXboi> it seems simple enough
15:42:16 <Swami> This is part of refactor one. I think carl_baldwin was not happy with this one. So let us close this bug.
15:42:27 <regXboi> ok that works for me too :)
15:42:39 <Swami> regXboi: agreed
15:42:51 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1505571
15:42:51 <openstack> Launchpad bug 1505571 in neutron "FIP disassociation takes longer in non DVR test scenario" [Undecided,Incomplete] - Assigned to Sonu (sonu-sudhakaran)
15:43:07 <Swami> There are a couple of Kilo related bugs.
15:43:28 <Swami> Last week we discussed the state of the Kilo related bugs; do we have a handle on it?
15:43:38 <regXboi> yes - and since we've decided to support kilo, these need review
15:43:41 <Swami> are we going to fix them, or only security related bugs?
15:43:54 <Swami> regXboi: Ok thanks
15:44:09 <Swami> Since we only have 15 more minutes let us get into the other topics
15:44:22 <regXboi> so, folks - please look at the patchsets for the last two bugs in today's list
15:44:44 <Swami> #topic Gate_failures
15:44:58 <Swami> regXboi: Any updates on this
15:45:07 <regXboi> the gate took a bad hit this week due to other issues
15:45:17 <regXboi> so we can't really say anything
15:45:27 <obondarev_> both dvr and non-dvr gates were broken
15:45:36 <regXboi> oh yeah, multinode got hammered
15:45:47 <Swami> I had a general question on the failures.
15:45:49 <regXboi> so it was pretty much a lost week :(
15:46:10 <Swami> The "SSHTimeout" failure, I was just doing some research on this. This failure is seen on both cvr and dvr
15:46:20 <Swami> But it occurs more frequently with DVR.
15:46:36 <regXboi> oy
15:46:40 <Swami> We have talked about this earlier on how to attack this issue with tempest.
15:47:12 <Swami> As a team, what should we do to address this issue in the gate/tempest tests so that they report a more useful error?
15:47:53 <regXboi> so I've been trying to replicate the race condition locally and I've not hit a success case *yet*
15:48:12 <Swami> haleyb: you mentioned last week that we can get hold of the tempest core to discuss.
15:48:29 <Swami> regXboi: yes that is tough
15:48:31 <regXboi> until I see an example, I'm loath to fire blindly
15:48:51 <carl_baldwin> regXboi: "success" as in see a failure?
15:48:55 <haleyb> Swami: was this about somehow stopping things during failure to try and debug?
15:48:56 <Swami> we need to invite the tempest folks to the dvr meeting to discuss a strategy.
15:49:10 <regXboi> carl_baldwin: yes in this case "success" is seeing a test fail on SSHTimeout
15:49:23 <Swami> haleyb: yes, that is one thing, the other one is how to improve the error report from the tempest test.
15:50:19 <Swami> carl_baldwin: haleyb: can we get some tempest cores' attention on this issue?
15:50:24 <haleyb> Swami: is the bug # in the meeting notes?
15:50:52 <Swami> haleyb: There is no specific bug to this, but it is related to general gate failures for DVR.
15:51:38 <carl_baldwin> Should it have a bug?  Might be good to have something to center a discussion with them around.
15:52:07 <Swami> carl_baldwin: ok we can add a bug to track DVR gate failures. I will file one today.
15:52:11 <haleyb> yes, we should create a bug with the output of the failure
15:52:23 <Swami> will do.
15:52:42 <Swami> anything else on the gate failures that the team wanted to discuss. If not we will move on
15:52:58 <Swami> #topic Performance_scalability
15:53:07 <Swami> obondarev_: are you still there.
15:53:30 <obondarev_> yep
15:53:35 <Swami> any update
15:53:38 <obondarev_> I guess next step in scalability improvement should be binding refactoring
15:54:04 <obondarev_> didn’t have a chance to get scale lab for my tests this week
15:54:33 <obondarev_> should grab it next week and do some verification
15:54:34 <regXboi> I've started an etherpad page: https://etherpad.openstack.org/p/hyper-scale
15:54:40 <Swami> obondarev_: ok thanks
15:54:50 <Swami> regXboi: yes thanks for the link
15:56:10 <Swami> obondarev_: Thanks for the update.
15:56:22 <Swami> we do have 5 more minutes.
15:56:41 <obondarev_> Swami: your patches for scheduling optimization should help scalability as well
15:56:51 <Swami> obondarev_: yes that would.
15:57:11 <Swami> #topic open_discussion
15:57:15 <regXboi> Swami: let's add those as links to the agenda so that people can review
15:57:46 <Swami> regXboi: Yes I added the bug in there but not the patches, will add it to the scalability section.
15:57:53 <regXboi> thx
15:58:23 <Swami> The DVR HA server side patch requires one more core review. #link https://bugs.launchpad.net/neutron/+bug/1365473/
15:58:23 <openstack> Launchpad bug 1365473 in neutron "Unable to create a router that's both HA and distributed" [High,In progress] - Assigned to Adolfo Duarte (adolfo-duarte)
15:58:45 <Swami> I will ping amuller to see if he can bless it.
15:59:21 <Swami> Thanks everyone for joining the meeting.
15:59:27 <Swami> Meet you all next week.
15:59:30 <regXboi> I think we are done - hopefully bugs won't eat us out of house and home next week
15:59:43 <Swami> #endmeeting