15:00:23 <haleyb> #startmeeting neutron_dvr
15:00:26 <openstack> Meeting started Wed May 25 15:00:23 2016 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 <openstack> The meeting name has been set to 'neutron_dvr'
15:00:36 <haleyb> #chair Swami
15:00:37 <openstack> Current chairs: Swami haleyb
15:01:08 <haleyb> #topic Announcements
15:02:14 <haleyb> I really have no announcements, other than N-1 is soon (next week)
15:02:37 <Swami> that quick
15:03:05 <carl_baldwin> Always comes quickly.
15:03:42 <Swami> yep
15:03:55 <haleyb> Yes, the schedule shows May 30 - June 3
15:04:11 <haleyb> #topic Bugs
15:04:28 <Swami> haleyb: yes
15:04:44 <Swami> This week we had two new bugs that were filed,
15:04:55 <Swami> or rather tagged with dvr_l3_backlog.
15:04:59 <Swami> The first one.
15:05:20 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1583266
15:05:22 <openstack> Launchpad bug 1583266 in neutron "watch_log_file = true badness" [Undecided,New]
15:05:57 <haleyb> It really seems like DVR is the victim of this, not the cause
15:06:24 <Swami> This is related to the watch_log_file=True setting. Yes, DVR seems to be the most affected, since we are creating/deleting floatingips on different nodes.
15:06:38 <Swami> I am not sure how adding nodes into the mix causes this issue.
15:07:41 <haleyb> There's just not a bug in the DVR code, using LOG.debug() is pretty normal
15:07:48 <Swami> I have not seen any difference. amuller also commented in there that he had not seen this until March. I am not sure if there was any change after March that caused this problem.
15:07:53 <haleyb> i'll at least add a comment to the bug
15:08:56 <Swami> There was another bug that was filed against neutron for the live migration.
15:09:00 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1585165
15:09:01 <openstack> Launchpad bug 1585165 in neutron "floating ip not reachable after vm migration" [High,New] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:09:21 <Swami> This seems to me like a duplicate of the bug that we had and addressed in Mitaka.
15:09:44 <carl_baldwin> I just assigned that to you, Swami, to confirm.  I suspected that.
15:09:54 <Swami> But I have not seen the same behavior with the fix.
15:10:24 <Swami> So I need to re-evaluate with my patch and see if I can reproduce the same behavior.
15:10:36 <Swami> carl_baldwin: yes I saw your message and added a comment to it.
15:10:49 <Swami> The unfortunate thing is that the 'nova' patch has not merged yet.
15:10:59 <carl_baldwin> Makes sense.
15:11:12 <Swami> The nova team wants to have the tempest test, before it can merge.
15:11:33 <haleyb> carl_baldwin: we need someone to complete that test, as Swami just said...
15:12:17 <Swami> So I have asked hardik to help me out on the tempest test. He said he might have some time tomorrow. If it cannot be resolved by tomorrow, we need some help from the tempest or nova folks to fix this test.
15:12:18 <carl_baldwin> Just a tempest test for live migration?
15:12:39 <carl_baldwin> ok
15:12:40 <haleyb> live migration with dvr enabled
15:12:45 <Swami> It seems that we need a tempest test to show that we can ssh into a VM and do a live migration and then the ssh connection does not break.
15:13:25 <Swami> there is already a simple live migration test, but poking into it deeper showed that it might not be doing the right tests.
15:13:40 <Swami> That is what is required by the nova team.
15:15:14 <carl_baldwin> Seems basic enough that this patch shouldn't be held hostage to make it happen.  Seems like it should happen regardless.
15:15:17 <carl_baldwin> But, ok.
15:15:20 <Swami> carl_baldwin or haleyb, let me know if you know someone who can help in writing this tempest test, since I am not too comfortable with a tempest test that involves the nova api and neutron scenario tests.
15:15:30 <haleyb> I have reservations about the test being 100% reliable, as a migration could cause a packet drop and result in a connection drop
15:16:09 <Swami> haleyb: yes, that is my concern too, on how we are going to achieve it.
15:16:14 <carl_baldwin> Swami: Have you tried Paul from HPE Bristol?  He's done a lot of work with live migration and might have some experience testing.
15:16:32 <haleyb> carl_baldwin: yes, the original comment was that it needs to happen; it has turned into "must happen before the nova patch"
15:17:05 <Swami> carl_baldwin: I have pinged him a couple of times, but he has not reviewed the patch yet. I will try again.
15:18:05 <haleyb> let me add a link for the tempest change to the wiki
15:18:27 <Swami> haleyb: thanks
15:19:23 <Swami> This is the patch that Matt Riedemann was working on for the tempest test. #link https://review.openstack.org/#/c/286855/
15:19:40 <haleyb> thanks
15:20:09 <Swami> ok, the next one in the list is
15:20:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1564776
15:20:31 <openstack> Launchpad bug 1564776 in neutron "DVR l3 agent should check for snat namespace existence before adding or deleting anything from the namespace" [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:20:52 <Swami> carl_baldwin: since you are here, we need you to take a look at this patch
15:21:07 <Swami> #link  https://review.openstack.org/#/c/300358/
15:21:35 <Swami> haleyb and I had a chat a couple of days back about the issue.
15:21:37 <carl_baldwin> Swami: ok
15:21:41 <Swami> But we need your opinion.
15:22:07 <Swami> There seems to be a small race between when the namespace is checked and when the device is configured in the namespace through ip_lib.
15:22:36 <Swami> So haleyb recommended that we should not silently ignore it, but raise a warning message and also should recreate the namespace if not available.
15:23:01 <carl_baldwin> I'll take a look.
15:23:08 <Swami> Re-creating the namespace is a generic approach that we should take on all the namespaces.
15:23:34 <Swami> So I thought we should take it up in a different patch, where we address recreating the namespaces and restoring them to their original state.
15:23:56 <haleyb> But it's almost an async event, as recreating it just to add an IP address won't rebuild it completely
15:24:05 <Swami> carl_baldwin: yes let me know your thoughts on the patch.
15:24:33 <carl_baldwin> ok
15:24:35 <Swami> haleyb: we need to kind of cache the state and rebuild it completely if something like this happens.
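[Editor's sketch, not part of the meeting] The "warn and recreate instead of silently ignoring" idea discussed above could look roughly like the following. This is only a minimal illustration under stated assumptions: FakeNamespaceManager, NamespaceMissing and add_address_or_recreate are hypothetical names invented here, and the manager is an in-memory stand-in rather than Neutron's ip_lib or the code in the patch under review. Acting first and handling the failure, rather than checking first, is one way to avoid the window between the check and the configuration step.

```python
import logging

LOG = logging.getLogger(__name__)


class NamespaceMissing(Exception):
    """Raised by the hypothetical manager when a namespace has vanished."""


class FakeNamespaceManager:
    """Hypothetical in-memory stand-in for namespace handling (not ip_lib)."""

    def __init__(self):
        # Maps a namespace name to the set of CIDRs configured in it.
        self.namespaces = {}

    def create(self, name):
        self.namespaces.setdefault(name, set())

    def delete(self, name):
        self.namespaces.pop(name, None)

    def add_address(self, name, cidr):
        if name not in self.namespaces:
            raise NamespaceMissing(name)
        self.namespaces[name].add(cidr)


def add_address_or_recreate(manager, name, cidr):
    """Add cidr to the namespace; return True if it had to be recreated."""
    try:
        manager.add_address(name, cidr)
        return False
    except NamespaceMissing:
        # Do not ignore the failure silently: warn, recreate, then retry.
        LOG.warning("Namespace %s disappeared; recreating it", name)
        manager.create(name)
        manager.add_address(name, cidr)
        return True
```

Note that after a recreate the namespace holds only the address that was just added, which is the limitation haleyb points out: anything configured earlier is lost unless the agent caches that state and replays it.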
15:25:32 <Swami> The next in the list is
15:25:36 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1541406
15:25:37 <openstack> Launchpad bug 1541406 in neutron "IPv6 prefix delegation does not work with DVR" [Medium,In progress] - Assigned to Ritesh Anand (ritesh-anand)
15:26:11 <haleyb> looks like that just needs a review
15:26:17 <Swami> This patch is pending for a long time.
15:26:20 <Swami> #link https://review.openstack.org/#/c/277657/
15:26:40 <Swami> Did the dependent patches get merged?
15:26:40 <haleyb> it was waiting on another prefix delegation patch that has merged
15:26:52 * haleyb should just let swami talk :)
15:27:32 <Swami> haleyb: I went ahead of you.
15:28:32 <Swami> That's all I had for the bugs today.
15:28:54 <Swami> I do have a topic to discuss, we can take it up in the open discussion. This is related to a bug.
15:29:33 <haleyb> Ok, we can cover your RFE work there as well
15:30:00 <Swami> haleyb: thanks
15:30:31 <haleyb> #topic Gate failures
15:31:25 <Swami> The graphite link was not working today.
15:31:34 <haleyb> There have been other failures causing more issues than any dvr or dvr multinode job failure
15:32:05 <haleyb> and yes, today everything is broken, so we will just have to shelve it for next week as there is no status to see
15:32:22 <Swami> no problem
15:32:31 <haleyb> #topic Stable backports
15:33:01 <haleyb> Swami: you've been trying to get those three ipdevice changes to mitaka, still broken?
15:33:44 <Swami> Yes, the last one is still not working and gives me the change-id issue. I can't resolve it. If I add my id, it accepts, but that is not what we want.
15:34:22 <Swami> haleyb: Also I have a bunch of patches that have comments from Ihar on the reason for backport.
15:34:37 <Swami> #link https://review.openstack.org/#/c/313130/ This is the first one in the series and it has many child patches.
15:34:58 <haleyb> should we get the other two in that series in? that way gerrit might be able to do a cherry-pick from the GUI successfully
15:35:08 <Swami> haleyb: can you take a look at it and see how we can resolve it with a clean backport?
15:35:37 <Swami> haleyb: you mean the patch to the ipdevice?
15:35:42 <haleyb> 313130 ?
15:35:49 <haleyb> yes, i'm typing slow
15:36:25 <haleyb> 313130 is a setup for others, let me look
15:36:29 <Swami> Yes 313130 is the first patch and all other patches depend on it.
15:37:07 <haleyb> any other backports?
15:37:27 <Swami> There is one more.
15:37:44 <Swami> I have also been updating the 'Etherpad' link that you posted last week with all the backports.
15:38:06 <haleyb> https://etherpad.openstack.org/p/stable-bug-candidates-from-master
15:38:12 <Swami> #link https://review.openstack.org/#/c/319397/
15:38:22 <Swami> This patch needs another +2.
15:38:35 <Swami> This backport is required only for mitaka and not for liberty.
15:38:48 <haleyb> let me look, i have my Super Powers now :)
15:39:53 <Swami> haleyb: Yeah :))
15:40:26 <haleyb> #topic Open Discussion
15:40:34 <haleyb> Swami: you had some items
15:41:04 <Swami> Yes, this is regarding the floatingip and allowed_address_pair that is associated with multiple VMs that are active.
15:41:20 <haleyb> the fix for lbaas
15:41:26 <Swami> Yes
15:41:51 <Swami> Based on your suggestion, I was wondering whether we can have the floatingip for this use case handled by the network node.
15:42:10 <Swami> Which is a kind of a hybrid scenario.
15:42:38 <Swami> The option that i have is, let us make it user configurable to override the DVR fip behavior for the unbound ports.
15:42:49 <haleyb> so it's a special case based on device owner ?
15:43:18 <Swami> It will be a special case for any 'unbound' port; we are not even going to check the device owner in this case.
15:43:55 <haleyb> Did you start writing an RFE already :)
15:43:58 <Swami> We don't want to restrict this to just the lbaas, but for any application that uses HA.
15:44:07 <Swami> Yes it is already captured in the RFE.
15:44:11 <Swami> Let me post the link.
15:44:39 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1583694
15:44:40 <openstack> Launchpad bug 1583694 in neutron "[RFE] DVR support for Allowed_address_pair port that are bound to multiple ACTIVE VM ports" [Wishlist,Confirmed]
15:45:12 <haleyb> thanks, added myself
15:46:01 <Swami> ok, the way I am planning to approach this is, I am going to utilize the 'SNAT_Namespace' to add the floatingip functionality for the private IPs that are connected to the unbound allowed_address_pair.
15:46:22 <Swami> This will work with DVR, since all node traffic by default will be forwarded to the SNAT namespace.
15:47:01 <haleyb> Right, sounds good to me
15:47:03 <Swami> In the snat namespace we can add the iptables rules to apply DNAT for the IPs configured for fip.
15:47:18 <Swami> We will not be touching the router_namespace.
15:47:30 <Swami> The only dependency here is the SNAT namespace and the config option.
15:47:42 <Swami> Do you see any issue in backporting the config option?
15:48:11 <haleyb> Yes, new config options are enhancements, and not typically allowed
15:48:40 <Swami> So will it be a problem if we cannot backport this feature and just make it work in newton?
15:48:45 <haleyb> even if default is False and getting current behavior
15:49:11 <Swami> Yes if the default is False we will get the current behavior, no issues there.
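[Editor's sketch, not part of the meeting] The proposal above, DNAT rules added in the SNAT namespace for floating IPs on unbound ports and gated by a config option that defaults to False, might be pictured as below. Everything here is an assumption for illustration: the function name, the option_enabled parameter and the rule layout are hypothetical and are not taken from the RFE or the posted patch. Only the iptables match and target syntax (-d, -j DNAT, --to-destination) is standard.

```python
def snat_namespace_fip_rules(floating_ip, fixed_ip, port_is_bound,
                             option_enabled=False):
    """Return (chain, rule) pairs to install in the SNAT namespace.

    option_enabled models the hypothetical config option; with its default
    of False no rules are produced, so behavior stays as it is today.
    """
    if port_is_bound or not option_enabled:
        # Bound ports keep the regular DVR floating IP handling.
        return []
    return [
        # Translate traffic addressed to the floating IP to the private IP.
        ("PREROUTING",
         f"-d {floating_ip}/32 -j DNAT --to-destination {fixed_ip}"),
    ]
```

For example, calling it with "203.0.113.10", "10.0.0.5", port_is_bound=False and option_enabled=True yields a single PREROUTING rule, while leaving the option at its default yields an empty list. The router namespace is not involved in either case.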
15:49:16 <haleyb> I haven't looked at the RFE, what is the config controlling ?
15:49:25 <Swami> I have posted a patch.
15:49:43 <Swami> #link https://review.openstack.org/#/c/320669/
15:51:11 <haleyb> i think i added myself, but didn't look closely
15:51:20 <Swami> haleyb: no problem, take a look at it.
15:51:36 <Swami> haleyb: I will try to work on the agent side patch today and see how it goes.
15:51:47 <haleyb> ok, thanks.  anything else?
15:51:59 <Swami> haleyb: the agent side should be a little complex.
15:52:06 <Swami> That's all I have.
15:52:51 <haleyb> I just remembered I think I found another bug related to DVR Monday
15:53:04 <Swami> haleyb: One more fun.
15:53:29 <haleyb> The metering agent doesn't know how to meter FIP traffic, it only can handle things via the centralized router, so only default SNAT in DVR
15:54:04 <Swami> Have you filed a bug already?
15:54:59 <haleyb> No, i was helping someone trying to use it.  Looked like a config issue, but once we got through that, it looks like it's broken.  I'll file a bug today
15:55:26 <Swami> ok, will take it.
15:56:08 <haleyb> the metering agent hasn't been maintained much, so guess we should have known dvr would break it
15:56:39 <haleyb> that's all i had, anything else from anyone here?
15:57:16 <Swami_> I should take a look at it.
15:57:34 <Swami_> I don't have anything else to discuss more.
15:57:46 <haleyb> Swami: i'll forward you the email
15:58:02 <Swami_> haleyb: thanks
15:58:24 <haleyb> that's it, thanks everyone
15:58:28 <haleyb> #endmeeting