15:00:23 #startmeeting neutron_dvr
15:00:26 Meeting started Wed May 25 15:00:23 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 The meeting name has been set to 'neutron_dvr'
15:00:36 #chair Swami
15:00:37 Current chairs: Swami haleyb
15:01:08 #topic Announcements
15:02:14 I really have no announcements, other than N-1 is soon (next week)
15:02:37 that quick
15:03:05 Always comes quickly.
15:03:42 yep
15:03:55 Yes, schedule shows May 30-03
15:04:11 #topic Bugs
15:04:28 haleyb: yes
15:04:44 This week we had two new bugs that were filed.
15:04:55 or kind of tagged with dvr_l3_backlog.
15:04:59 The first one.
15:05:20 #link https://bugs.launchpad.net/neutron/+bug/1583266
15:05:22 Launchpad bug 1583266 in neutron "watch_log_file = true badness" [Undecided,New]
15:05:57 It really seems like DVR is the victim of this, and isn't causing it
15:06:24 This is related to the watch_log_file=True setting. Yes, DVR seems to be the most affected, since we are creating/deleting floatingips on different nodes.
15:06:38 I am not sure how adding nodes into the mix causes this issue.
15:07:41 There's just not a bug in the DVR code, using LOG.debug() is pretty normal
15:07:48 I have not seen any difference. amuller also commented there that he had not seen this until March. I am not sure if there was any change after March that caused this problem.
15:07:53 i'll at least add a comment to the bug
15:08:56 There was another bug that was filed against neutron for the live migration.
15:09:00 #link https://bugs.launchpad.net/neutron/+bug/1585165
15:09:01 Launchpad bug 1585165 in neutron "floating ip not reachable after vm migration" [High,New] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:09:21 This seems to me like a duplicate of the bug that we had and addressed in Mitaka.
15:09:44 I just assigned that to you, Swami, to confirm. I suspected that.
15:09:54 But I have not seen the same behavior with the fix.
15:10:24 So I need to re-evaluate with my patch and see if I can reproduce the same behavior.
15:10:36 carl_baldwin: yes I saw your message and added a comment to it.
15:10:49 The unfortunate thing is that the 'nova' patch has not merged yet.
15:10:59 Makes sense.
15:11:12 The nova team wants to have the tempest test before it can merge.
15:11:33 carl_baldwin: we need someone to complete that test, as Swami just said...
15:12:17 So I have asked hardik to help me out on the tempest test. He said he might have some time tomorrow. If it cannot be resolved by tomorrow, we need some help from the tempest or nova folks to fix this test.
15:12:18 Just a tempest test for live migration?
15:12:39 ok
15:12:40 live migration with dvr enabled
15:12:45 It seems that we need a tempest test to show that we can ssh into a VM and do a live migration and then the ssh connection does not break.
15:13:25 there is already a simple live migration test, but poking into it deeper showed that it might not be doing the right tests.
15:13:40 That is what is required by the nova team.
15:15:14 Seems basic enough that this patch shouldn't be held hostage to make it happen. Seems like it should happen regardless.
15:15:17 But, ok.
15:15:20 carl_baldwin or haleyb let me know if you know someone who can help in writing this tempest test, since I am not too comfortable with this tempest test that involves the nova api and neutron scenario tests.
15:15:30 I have reservations about the test being 100% reliable, as a migration could cause a packet drop and result in a connection drop
15:16:09 haleyb: yes that is my concern too, on how we are going to achieve it.
15:16:14 Swami: Have you tried Paul from HPE Bristol? He's done a lot of work with live migration and might have some experience testing.
15:16:32 carl_baldwin: yes, the original comment was it needs to happen, it's turned into must happen before the nova patch
15:17:05 carl_baldwin: I have pinged him a couple of times, but he has not reviewed the patch yet. I will try again.
15:18:05 let me add a link for the tempest change to the wiki
15:18:27 haleyb: thanks
15:19:23 This is the patch that Matt Riedemann was working on for the tempest test. #link https://review.openstack.org/#/c/286855/
15:19:40 thanks
15:20:09 ok, the next one in the list is
15:20:30 #link https://bugs.launchpad.net/neutron/+bug/1564776
15:20:31 Launchpad bug 1564776 in neutron "DVR l3 agent should check for snat namespace existence before adding or deleting anything from the namespace" [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:20:52 carl_baldwin: since you are here, we need you to take a look at this patch
15:21:07 #link https://review.openstack.org/#/c/300358/
15:21:35 haleyb and I had a chat a couple of days back about the issue.
15:21:37 Swami: ok
15:21:41 But we need your opinion.
15:22:07 There seems to be a small race between when the namespace is checked and when the device is configured in the namespace through ip_lib.
15:22:36 So haleyb recommended that we should not silently ignore it, but raise a warning message and also recreate the namespace if it is not available.
15:23:01 I'll take a look.
15:23:08 Re-creating the namespace is a generic approach that we should take on all the namespaces.
15:23:34 So I thought we should take it up in a different patch, where we address recreating the namespaces and reverting to their original state.
15:23:56 But it's almost an async event, as creating to add an IP address won't rebuild it completely
15:24:05 carl_baldwin: yes let me know your thoughts on the patch.
15:24:33 ok
15:24:35 haleyb: we need to kind of cache the state and rebuild it completely if something like this happens.
15:25:32 The next in the list is
15:25:36 #link https://bugs.launchpad.net/neutron/+bug/1541406
15:25:37 Launchpad bug 1541406 in neutron "IPv6 prefix delegation does not work with DVR" [Medium,In progress] - Assigned to Ritesh Anand (ritesh-anand)
15:26:11 looks like that just needs a review
15:26:17 This patch has been pending for a long time.
15:26:20 #link https://review.openstack.org/#/c/277657/
15:26:40 did the dependent patches get merged?
15:26:40 it was waiting on another prefix delegation patch that has merged
15:26:52 * haleyb should just let swami talk :)
15:27:32 haleyb: I went ahead of you.
15:28:32 That's all I had for the bugs today.
15:28:54 I do have a topic to discuss, we can take it up in the open discussion. This is related to a bug.
15:29:33 Ok, we can cover your RFE work there as well
15:30:00 haleyb: thanks
15:30:31 #topic Gate failures
15:31:25 The graphite link was not working today.
15:31:34 There have been other failures causing more issues than any dvr or dvr multinode job failure
15:32:05 and yes, today everything is broken, so we will just have to shelve it for next week as there is no status to see
15:32:22 no problem
15:32:31 #topic Stable backports
15:33:01 Swami: you've been trying to get those three ipdevice changes to mitaka, still broken?
15:33:44 Yes, the last one is still not working and gives me the change-id issue. I can't resolve it. If I add my id, it accepts, but that is not what we want.
15:34:22 haleyb: Also I have a bunch of patches that have comments from Ihar on the reason for backport.
15:34:37 #link https://review.openstack.org/#/c/313130/ This is the first one in the series and it has many child patches.
15:34:58 should we get the other two in in that series, that way gerrit might be able to do a cherry-pick from the GUI successfully
15:35:08 haleyb: can you take a look at it and see how we can resolve it with a clean backport.
15:35:37 haleyb: you mean the patch to the ipdevice.
15:35:42 313130 ?
15:35:49 yes, i'm typing slow
15:36:25 313130 is a setup for others, let me look
15:36:29 Yes 313130 is the first patch and all other patches depend on it.
15:37:07 any other backports?
15:37:27 There is one more.
15:37:44 I have also been updating the 'Etherpad' link that you posted last week with all the backports.
15:38:06 https://etherpad.openstack.org/p/stable-bug-candidates-from-master
15:38:12 #link https://review.openstack.org/#/c/319397/
15:38:22 This patch needs another +2.
15:38:35 This backport is required only for mitaka and not for liberty.
15:38:48 let me look, i have my Super Powers now :)
15:39:53 haleyb: Yeah :))
15:40:26 #topic Open Discussion
15:40:34 Swami: you had some items
15:41:04 Yes, this is regarding the floatingip and allowed_address_pair that is associated with multiple VMs that are active.
15:41:20 the fix for lbaas
15:41:26 Yes
15:41:51 Based on your suggestion I was thinking, can we have the floatingip for this use case handled by the network node?
15:42:10 Which is a kind of hybrid scenario.
15:42:38 The option that I have is, let us make it user configurable to override the DVR fip behavior for the unbound ports.
15:42:49 so it's a special case based on device owner ?
15:43:18 It will be a special case for any 'unbound' ports, we are not even going to check for the device owner in this case.
15:43:55 Did you start writing an RFE already :)
15:43:58 We don't want to restrict this to just lbaas, but allow it for any application that uses HA.
15:44:07 Yes it is already captured in the RFE.
15:44:11 Let me post the link.
15:44:39 #link https://bugs.launchpad.net/neutron/+bug/1583694
15:44:40 Launchpad bug 1583694 in neutron "[RFE] DVR support for Allowed_address_pair port that are bound to multiple ACTIVE VM ports" [Wishlist,Confirmed]
15:45:12 thanks, added myself
15:46:01 ok, the way I am planning to approach this is, I am going to utilize the 'SNAT_Namespace' to add the floatingip functionality for the private IPs that are connected to the unbound allowed_address_pair.
15:46:22 This will work with DVR, since all node traffic by default will be forwarded to the SNAT namespace.
15:47:01 Right, sounds good to me
15:47:03 In the snat namespace we can add the iptables rules to apply DNAT for the IPs configured for fip.
15:47:18 We will not be touching the router_namespace.
15:47:30 The only dependency here is the SNAT namespace and the config option.
15:47:42 Do you see any issue in backporting the config option?
15:48:11 Yes, new config options are enhancements, and not typically allowed
15:48:40 So will that be a problem, if we cannot backport this feature and just make it work in newton?
15:48:45 even if default is False and getting current behavior
15:49:11 Yes if the default is False we will get the current behavior, no issues there.
15:49:16 I haven't looked at the RFE, what is the config controlling ?
15:49:25 I have posted a patch.
15:49:43 #link https://review.openstack.org/#/c/320669/
15:51:11 i think i added myself, but didn't look closely
15:51:20 haleyb: no problem, take a look at it.
15:51:36 haleyb: I will try to work on the agent side patch today and see how it goes.
15:51:47 ok, thanks. anything else?
15:51:59 haleyb: agent side should be a little complex.
15:52:06 That's all I have.
15:52:51 I just remembered I think I found another bug related to DVR Monday
15:53:04 haleyb: One more fun.
15:53:29 The metering agent doesn't know how to meter FIP traffic, it only can handle things via the centralized router, so only default SNAT in DVR
15:54:04 Have you filed a bug already.
15:54:59 No, i was helping someone trying to use it. Looked like a config issue, but once we got through that looks like it's broken. I'll file a bug today
15:55:26 ok, will take it.
15:56:08 the metering agent hasn't been maintained much, so guess we should have known dvr would break it
15:56:39 that's all i had, anything else from anyone here?
15:57:16 I should take a look at it.
15:57:34 I don't have anything else to discuss more.
15:57:46 Swami: i'll forward you the email
15:58:02 haleyb: thanks
15:58:24 that's it, thanks everyone
15:58:28 #endmeeting