15:00:28 #startmeeting neutron_dvr
15:00:30 Meeting started Wed Apr 6 15:00:28 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:33 The meeting name has been set to 'neutron_dvr'
15:00:39 #chair Swami
15:00:39 Current chairs: Swami haleyb
15:01:18 #topic Announcements
15:01:39 hi
15:01:45 obondarev: hi
15:01:49 Just realized i forgot to update this section on the wiki
15:02:17 haleyb: I updated a bit
15:02:35 Guess my only announcement is we've been seeing some issues and proposed some reverts, can cover that in bugs
15:02:37 haleyb: I just updated about the RC3, do you have anything more.
15:03:06 oh, guess I forgot about RC3, yes, it's final final i hope
15:03:32 #topic Bugs
15:03:38 haleyb: hi
15:03:52 This week as you mentioned we have uncovered more bugs.
15:04:13 #link https://bugs.launchpad.net/neutron/+bug/1564776
15:04:14 Launchpad bug 1564776 in neutron "DVR l3 agent should check for snat namespace existence before adding or deleting anything from the namespace" [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:04:57 this one is about manual deleting of snat namespace, right?
15:05:03 This is related to the agent not checking for the existence of the namespace before proceeding with adding or deleting interfaces to the namespace.
15:05:28 obondarev: yes, but it also covers, certain uncleaned namespaces which is caused due to some errors.
15:05:43 This throws in a bunch of log messages in the l3 agent over and over.
15:05:56 Right now i have a patch under review.
15:06:17 #link https://review.openstack.org/#/c/300358/
15:07:17 Do you have anymore questions on this bug/patch or can I move on.
15:07:56 not from my side
15:08:08 i'll just go review it
15:08:14 ok thanks.
15:08:17 Let us move on
15:08:38 #link https://bugs.launchpad.net/neutron/+bug/1564757
15:08:39 Launchpad bug 1564757 in neutron "Configure DVR fip router fail when l3 agent restarts and it cannot be recovered" [Undecided,New] - Assigned to RaoFei (milo-frao)
15:08:59 This is a new bug filed this week, there is no patch yet for this bug. I need to triage it.
15:09:32 This is related to unfinished FIP namespace state while the agent restarts.
15:10:10 I will try to validate this. There might be some clean up loop that we should be handling for unfinished or errored out floatingips with DVR.
15:10:29 do you have to kill things at just the right time for this?
15:11:18 1. loop create and delete fip on neutron-server side.
15:11:18 2. loop restart l3 agent.
15:11:28 haleyb: What I think is just verify the namespace that is in the node with the namespace data that we have in the agent cache and then clean up if not required. Also when Floatingip errors out, go through the clean loop to clean it up.
15:11:46 haleyb: Yes this is timing related, but we can try to reduce it.
15:12:32 Basically we need to make sure that the agent restart logic is solid with the DVR routers.
15:13:05 Any other questions on this
15:13:09 right, it just doesn't seem the highest priority, but should get it fixed
15:13:27 haleyb: you right.
15:13:41 haleyb: probably it should be a low.
15:14:01 #link https://bugs.launchpad.net/neutron/+bug/1557290
15:14:02 Launchpad bug 1557290 in neutron "DVR FIP agent gateway does not pass traffic directed at fixed IP" [Low,Triaged]
15:14:21 #link https://bugs.launchpad.net/neutron/+bug/1462154
15:14:22 Launchpad bug 1462154 in neutron "With DVR Pings to floating IPs replied with fixed-ips if VMs are on the same network" [High,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:14:51 i have some comments on this
15:14:55 Sorry the first link is the wrong link
15:14:59 haleyb: yes.
15:15:24 This bug was reopened recently by carl_baldwin.
15:15:30 carl_baldwin proposed revert https://review.openstack.org/#/c/301348/
15:15:43 i wrote-up the explanation why in that review
15:15:50 The reason is the snat functionality is partially done in a router namespace rather than in a snat namespace.
15:16:02 basically it looks like we could have a 5-tuple collision
15:16:32 I'm catching up on the comments.
15:16:54 Swami: right. one solution might be to change the nat code to use port-ranges, such that the snat and qrouter namespace don't collide
15:17:16 The other is to use your patch, which sends everything via the snat namespace
15:18:33 haleyb: yes we should look through it. The issue that Xiao had with my patch was, he was more concerned about the behavior that the packet was going through snat and looped back and reached the fipnamespace even if they both reside on the same node.
15:19:34 Swami: yes, non-optimal but functionally correct. What do you think of forcing a port-range ?
15:19:44 In most cases, it will go through snat ns anyway.
15:20:08 It is just trying to optimize the case where two vms happen to land on the same host. It isn't a compelling optimization IMO.
15:20:20 carl_baldwin: agreed
15:20:56 But, I'm willing to entertain the port range idea a bit too. Though, it does seem a bit like a hack to optimize a lucky case.
15:21:42 carl_baldwin: yes, the odds are small that two are on the same node, except in our non-real-world testing :)
15:21:56 haleyb: very good point!
15:22:00 haleyb:+1
15:22:32 when we can support 500 VMs per compute...
15:24:13 Any other questions on this bug? if not we can move on.
15:25:15 Ok, the next one is
15:25:20 #link https://bugs.launchpad.net/neutron/+bug/1566046
15:25:21 Launchpad bug 1566046 in neutron "Fix TypeError when trying to update an arp entry for ports with allowed_address_pairs on DVR router" [Medium,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:25:43 There is a patch under review for this bug.
15:25:57 #link https://review.openstack.org/#/c/301410/
15:27:53 is anyone there
15:28:01 I can't see any response
15:28:06 yes, i was looking at the review
15:28:54 haleyb: thanks
15:29:08 It was silent, so I just thought I got dropped.
15:30:05 next?
15:30:14 yes, let me move on
15:30:30 #link https://bugs.launchpad.net/neutron/+bug/1564575 - (MUST FIX) (SNAT Move)
15:30:31 Launchpad bug 1564575 in neutron "DVR router namespaces are deleted when we manually move a DVR router from one SNAT_node to another SNAT_node even though there are active VMs in the node" [Medium,In progress] - Assigned to Brian Haley (brian-haley)
15:30:31 Patch: * https://review.openstack.org/#/c/30
15:30:46 The patch to address this bug is
15:31:00 #link https://review.openstack.org/#/c/300268/
15:31:09 I need to add some test cases for this patch.
15:31:47 But before we address this patch, there is an auto-scheduler issue with DVR routers that does not help this patch.
15:32:02 So I have reverted the auto-schedule enablement for the dvr router patch.
15:32:16 #link https://review.openstack.org/#/c/301880/
15:32:54 I did see a comment from obondarev that there is another patch that assaf owns that might work.
15:32:57 Swami: Oleg mentioned https://review.openstack.org/#/c/285480 in that
15:33:11 I have not tested with assaf patch, but will test it out.
15:33:14 that removes the auto-schedule code
15:33:24 yeah
15:33:29 that should help
15:33:42 anyway I think it's too radical to revert
15:33:52 will test it out, then we can probably push assaf patch first and then address the snat move patch, otherwise it would not work, or I will make mine dependent on the other.
15:34:26 obondarev: sorry I was not intending to revert, but that's how it works.
15:35:39 Ok, that's all I had for the new bugs this week.
15:35:44 Swami: no worries, I just want us to handle it as correctly
15:35:55 as possible
15:36:04 obondarev: yes understood.
15:36:53 obondarev: also with the snat manual move, I just want to check if it does not affect the snat ha logic, if it has any dependency on auto-scheduler.
15:37:47 I also have a patch related to SNAT move and also addresses this bug.
15:37:51 #link https://bugs.launchpad.net/neutron/+bug/1557909
15:37:52 Launchpad bug 1557909 in neutron "SNAT namespace is not getting cleared after the manual move of SNAT with dead agent" [Medium,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:38:08 #link https://review.openstack.org/#/c/302068/
15:38:14 This patch needs review
15:38:49 Right now this bug has been filed against liberty, but this will be also seen when we introduce the snat move again in master.
15:40:03 Quick question about https://bugs.launchpad.net/neutron/+bug/1562110
15:40:04 Launchpad bug 1562110 in neutron "link-lock-address allocater for DVR has a limit of 256 address pairs per node" [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:40:31 carl_baldwin: did you want us to remove the constants.py changes from https://review.openstack.org/#/c/297839/ for this?
15:40:33 haleyb: yes
15:40:50 * carl_baldwin looks...
15:41:22 yes
15:41:29 sorry got disconnected
15:41:50 carl_baldwin: ok, i'll split it out as I added it in the first place
15:42:16 haleyb: what are we planning to split
15:42:51 Swami: removing the constants.py changes from https://review.openstack.org/#/c/297839/ into a follow-on. i.e. fix the bug first, then make it pretty
15:43:12 haleyb: ok got it
15:43:24 i'll take care of it today
15:43:48 haleyb: thanks
15:44:11 that's all I have for the bugs today.
15:45:27 Any other bugs anyone wants to bring up ?
15:46:26 nothing more.
15:47:09 #topic Gate Failures
15:47:39 The multinode failures have subsided.
15:48:37 But the delta for the single node check job is above normal.
15:49:11 yes, still higher than base job, i can't say i've seen a clear-cut pattern (yet)
15:49:17 I haven't had a chance to see the details, but this is based on the graph.
15:49:50 haleyb: you are right it always fluctuates.
15:51:23 i always say i'll spend a day looking, but never found that much time, so partly just relying on each of us to raise when a specific issue is noticed
15:51:52 haleyb: yes same here
15:52:07 i.e. if you see a failure in one of your patches, look in logs before running recheck
15:52:29 ++
15:52:38 haleyb:++
15:52:40 or in patches that you review
15:53:02 we could always blame it on tox_install.sh, or whatever that was last week
15:53:05 :)
15:53:29 moving on...
15:53:38 #topic Stable backports
15:54:37 imo, https://review.openstack.org/#/c/273235/ and https://review.openstack.org/#/c/273236/ need to merge into stable/liberty, as they make some base changes that make other backports easier
15:55:16 haleyb: ok will take a look at those patches.
15:55:19 ihar +2'd them today, need to get someone else with stable +2 power to take a look
15:55:50 haleyb: good
15:56:53 #topic Open Discussion
15:57:08 any other issues, got 3 minutes
15:57:18 haleyb: the nova patch that we had for live migration is still idle.
15:57:57 https://review.openstack.org/#/c/275073/
15:58:31 Do we need to get the tempest change in? maybe ping mattm on that patch
15:58:36 haleyb: yes, it has +1s from the neutron cores right now.
15:58:54 haleyb: ok will ping him again today
15:59:18 i think he was fine with the nova patch merging without the tempest one
15:59:32 haleyb: yes that's what I thought as well
15:59:58 if we can get one +2 the other should follow quickly
16:00:12 haleyb: yes
16:00:14 top of hour, thanks everyone
16:00:17 #endmeeting