15:00:28 <haleyb> #startmeeting neutron_dvr
15:00:30 <openstack> Meeting started Wed Apr  6 15:00:28 2016 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:33 <openstack> The meeting name has been set to 'neutron_dvr'
15:00:39 <haleyb> #chair Swami
15:00:39 <openstack> Current chairs: Swami haleyb
15:01:18 <haleyb> #topic Announcements
15:01:39 <obondarev> hi
15:01:45 <Swami> obondarev: hi
15:01:49 <haleyb> Just realized i forgot to update this section on the wiki
15:02:17 <Swami> haleyb: I updated a bit
15:02:35 <haleyb> Guess my only announcement is we've been seeing some issues and proposed some reverts, can cover that in bugs
15:02:37 <Swami> haleyb: I just updated it about RC3; do you have anything more?
15:03:06 <haleyb> oh, guess I forgot about RC3, yes, it's final final i hope
15:03:32 <haleyb> #topic Bugs
15:03:38 <Swami> haleyb: hi
15:03:52 <Swami> This week, as you mentioned, we have uncovered more bugs.
15:04:13 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1564776
15:04:14 <openstack> Launchpad bug 1564776 in neutron "DVR l3 agent should check for snat namespace existence before adding or deleting anything from the namespace" [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:04:57 <obondarev> this one is about manual deleting of snat namespace, right?
15:05:03 <Swami> This is related to the agent not checking for the existence of the namespace before proceeding with adding or deleting interfaces in the namespace.
15:05:28 <Swami> obondarev: yes, but it also covers certain uncleaned namespaces that are left behind due to errors.
15:05:43 <Swami> This floods the l3 agent log with the same messages over and over.
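(A minimal sketch of the existence guard being discussed, assuming the agent shells out to 'ip netns'; the helper names and call site are illustrative, not the actual l3-agent code:)

    import subprocess

    def namespace_exists(name):
        # 'ip netns list' prints one namespace per line, e.g. "snat-<router-id> (id: 3)"
        out = subprocess.check_output(['ip', 'netns', 'list']).decode()
        return any(line.split()[0] == name
                   for line in out.splitlines() if line.strip())

    def add_interface(ns_name, device_name):
        if not namespace_exists(ns_name):
            return  # skip quietly instead of erroring and flooding the l3 agent log
        # ... proceed with adding or deleting interfaces inside the namespace ...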
15:05:56 <Swami> Right now i have a patch under review.
15:06:17 <Swami> #link https://review.openstack.org/#/c/300358/
15:06:17 <Swami> Do you have any more questions on this bug/patch, or can I move on?
15:07:56 <obondarev> not from my side
15:08:08 <haleyb> i'll just go review it
15:08:14 <Swami> ok thanks.
15:08:17 <Swami> Let us move on
15:08:38 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1564757
15:08:39 <openstack> Launchpad bug 1564757 in neutron "Configure DVR fip router fail when l3 agent restarts and it cannot be recovered" [Undecided,New] - Assigned to RaoFei (milo-frao)
15:08:59 <Swami> This is a new bug filed this week; there is no patch yet. I need to triage it.
15:09:32 <Swami> This is related to unfinished FIP namespace state while the agent restarts.
15:10:10 <Swami> I will try to validate this. There might be a cleanup loop we should add to handle unfinished or errored-out floating IPs with DVR.
15:10:29 <haleyb> do you have to kill things at just the right time for this?
15:11:18 <haleyb> 1. loop create and delete fip on neutron-server side.
15:11:18 <haleyb> 2. loop restart l3 agent.
15:11:28 <Swami> haleyb: What I think is: verify the namespaces on the node against the namespace data in the agent cache, and clean up any that are not required. Also, when a floating IP errors out, go through the cleanup loop to remove it.
15:11:46 <Swami> haleyb: Yes, this is timing related, but we can try to reduce the window.
15:12:32 <Swami> Basically we need to make sure that the agent restart logic is solid with the DVR routers.
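(A rough illustration of that restart reconciliation, assuming the agent can list the node's namespaces and knows the external networks in its cache; all names here are hypothetical:)

    FIP_NS_PREFIX = 'fip-'

    def cleanup_stale_fip_namespaces(host_namespaces, cached_ext_net_ids, delete_ns):
        # DVR fip namespaces are named 'fip-<external-network-id>'; compare what
        # exists on the node against the agent cache and remove the leftovers.
        expected = {FIP_NS_PREFIX + net_id for net_id in cached_ext_net_ids}
        for ns in host_namespaces:
            if ns.startswith(FIP_NS_PREFIX) and ns not in expected:
                delete_ns(ns)  # also the hook for tearing down errored-out FIP state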
15:13:05 <Swami> Any other questions on this?
15:13:09 <haleyb> right, it just doesn't seem the highest priority, but should get it fixed
15:13:27 <Swami> haleyb: you're right.
15:13:41 <Swami> haleyb: it should probably be a Low.
15:14:01 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1557290
15:14:02 <openstack> Launchpad bug 1557290 in neutron "DVR FIP agent gateway does not pass traffic directed at fixed IP" [Low,Triaged]
15:14:21 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1462154
15:14:22 <openstack> Launchpad bug 1462154 in neutron "With DVR Pings to floating IPs replied with fixed-ips if VMs are on the same network" [High,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:14:51 <haleyb> i have some comments on this
15:14:55 <Swami> Sorry the first link is the wrong link
15:14:59 <Swami> haleyb: yes.
15:15:24 <Swami> This bug was reopened recently by carl_baldwin.
15:15:30 <haleyb> carl_baldwin proposed revert https://review.openstack.org/#/c/301348/
15:15:43 <haleyb> i wrote-up the explanation why in that review
15:15:50 <Swami> The reason is the snat functionality is partially done in a router namespace rather than in a snat namespace.
15:16:02 <haleyb> basically it looks like we could have a 5-tuple collision
15:16:32 <carl_baldwin> I'm catching up on the comments.
15:16:54 <haleyb> Swami: right.  one solution might be to change the nat code to use port-ranges, such that the snat and qrouter namespace don't collide
15:17:16 <haleyb> The other is to use your patch, which sends everything via the snat namespace
15:18:33 <Swami> haleyb: yes, we should look through it. The concern Xiao had with my patch was that the packet goes through snat and loops back to reach the fip namespace even when both reside on the same node.
15:19:34 <haleyb> Swami: yes, non-optimal but functionally correct.  What do you think of forcing a port-range ?
15:19:44 <carl_baldwin> In most cases, it will go through snat ns anyway.
15:20:08 <carl_baldwin> It is just trying to optimize the case where two vms happen to land on the same host.  It isn't a compelling optimization IMO.
15:20:20 <Swami> carl_baldwin: agreed
15:20:56 <carl_baldwin> But, I'm willing to entertain the port range idea a bit too.  Though, it does seem a bit like a hack to optimize a lucky case.
15:21:42 <haleyb> carl_baldwin: yes, the odds are small that two are on the same node, except in our non-real-world testing :)
15:21:56 <carl_baldwin> haleyb: very good point!
15:22:00 <Swami> haleyb:+1
15:22:32 <haleyb> when we can support 500 VMs per compute...
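(For reference, a hedged sketch of the port-range idea: iptables SNAT accepts '--to-source addr:port-port' for TCP/UDP, so the qrouter and snat namespaces could be pinned to disjoint source-port ranges and their 5-tuples could never collide. The ranges and rule fragments below are illustrative only, not the proposed fix:)

    # Disjoint source-port ranges per namespace; values are illustrative.
    QROUTER_PORTS = (32768, 49151)
    SNAT_NS_PORTS = (49152, 65535)

    def snat_rules(cidr, gw_ip, port_range):
        low, high = port_range
        # iptables only honors a port range in --to-source when the rule
        # carries an explicit -p tcp / -p udp protocol match.
        return ['-p %s -s %s -j SNAT --to-source %s:%d-%d'
                % (proto, cidr, gw_ip, low, high) for proto in ('tcp', 'udp')]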
15:24:13 <Swami> Any other questions on this bug? If not, we can move on.
15:25:15 <Swami> Ok, the next one is
15:25:20 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1566046
15:25:21 <openstack> Launchpad bug 1566046 in neutron "Fix TypeError when trying to update an arp entry for ports with allowed_address_pairs on DVR router" [Medium,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:25:43 <Swami> There is a patch under review for this bug.
15:25:57 <Swami> #link https://review.openstack.org/#/c/301410/
15:27:53 <Swami> is anyone there?
15:28:01 <Swami> I can't see any response
15:28:06 <haleyb> yes, i was looking at the review
15:28:54 <Swami> haleyb: thanks
15:29:08 <Swami> It was silent, so I just thought I got dropped.
15:30:05 <haleyb> next?
15:30:14 <Swami> yes, let me move on
15:30:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1564575 - (MUST FIX) (SNAT Move)
15:30:31 <openstack> Launchpad bug 1564575 in neutron "DVR router namespaces are deleted when we manually move a DVR router from one SNAT_node to another SNAT_node even though there are active VMs in the node" [Medium,In progress] - Assigned to Brian Haley (brian-haley)
15:30:31 <Swami> Patch: * https://review.openstack.org/#/c/30
15:30:46 <Swami> The patch to address this bug is
15:31:00 <Swami> #link https://review.openstack.org/#/c/300268/
15:31:09 <Swami> I need to add some test cases for this patch.
15:31:47 <Swami> But before we address this patch, there is an auto-scheduler issue with DVR routers that works against it.
15:32:02 <Swami> So I have proposed a revert of the auto-schedule enablement for the dvr router patch.
15:32:16 <Swami> #link https://review.openstack.org/#/c/301880/
15:32:54 <Swami> I did see a comment from obondarev that there is another patch that assaf owns that might work.
15:32:57 <haleyb> Swami: Oleg mentioned https://review.openstack.org/#/c/285480 in that
15:33:11 <Swami> I have not tested with assaf patch, but will test it out.
15:33:14 <haleyb> that removes the auto-schedule code
15:33:24 <obondarev> yeah
15:33:29 <obondarev> that should help
15:33:42 <obondarev> anyway I think it's too radical to revert
15:33:52 <Swami> will test it out; then we can probably push assaf's patch first and then address the snat move patch, otherwise it would not work. Or I will make mine dependent on the other.
15:34:26 <Swami> obondarev: sorry I was not intending to revert, but that's how it works.
15:35:39 <Swami> Ok, that's all I had for the new bugs this week.
15:35:44 <obondarev> Swami: no worries, I just want us to handle it as correctly
15:35:55 <obondarev> as possible
15:36:04 <Swami> obondarev: yes understood.
15:36:53 <Swami> obondarev: also with the snat manual move, I just want to check that it does not affect the snat HA logic, in case it has any dependency on the auto-scheduler.
15:37:47 <Swami> I also have a patch that is related to the SNAT move and also addresses this bug.
15:37:51 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1557909
15:37:52 <openstack> Launchpad bug 1557909 in neutron "SNAT namespace is not getting cleared after the manual move of SNAT with dead agent" [Medium,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:38:08 <Swami> #link https://review.openstack.org/#/c/302068/
15:38:14 <Swami> This patch needs review
15:38:49 <Swami> Right now this bug has been filed against liberty, but this will also be seen when we reintroduce the snat move in master.
15:40:03 <haleyb> Quick question about https://bugs.launchpad.net/neutron/+bug/1562110
15:40:04 <openstack> Launchpad bug 1562110 in neutron "link-local-address allocator for DVR has a limit of 256 address pairs per node" [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:40:31 <haleyb> carl_baldwin: did you want us to remove the constants.py changes from https://review.openstack.org/#/c/297839/ for this?
15:40:33 <Swami> haleyb: yes
15:40:50 * carl_baldwin looks...
15:41:22 <carl_baldwin> yes
15:41:29 <Swami_> sorry got disconnected
15:41:50 <haleyb> carl_baldwin: ok, i'll split it out as I added it in the first place
15:42:16 <Swami_> haleyb: what are we planning to split?
15:42:51 <haleyb> Swami: removing the constants.py changes from https://review.openstack.org/#/c/297839/ into a follow-on.  i.e. fix the bug first, then make it pretty
15:43:12 <Swami_> haleyb: ok got it
15:43:24 <haleyb> i'll take care of it today
15:43:48 <Swami_> haleyb: thanks
15:44:11 <Swami_> that's all I have for the bugs today.
15:45:27 <haleyb> Any other bugs anyone wants to bring up ?
15:46:26 <Swami_> nothing more.
15:47:09 <haleyb> #topic Gate Failures
15:47:39 <Swami_> The multinode failures have subsided.
15:48:37 <Swami_> But the delta for the single node check job is above normal.
15:49:11 <haleyb> yes, still higher than base job, i can't say i've seen a clear-cut pattern (yet)
15:49:17 <Swami_> I haven't had a chance to see the details, but this is based on the graph.
15:49:50 <Swami_> haleyb: you are right, it always fluctuates.
15:51:23 <haleyb> i always say i'll spend a day looking, but never find that much time, so partly just relying on each of us to raise it when a specific issue is noticed
15:51:52 <Swami_> haleyb: yes same here
15:52:07 <haleyb> i.e. if you see a failure in one of your patches, look in logs before running recheck
15:52:29 <obondarev> ++
15:52:38 <Swami_> haleyb:++
15:52:40 <obondarev> or in patches that you review
15:53:02 <haleyb> we could always blame it on tox_install.sh, or whatever that was last week
15:53:05 <haleyb> :)
15:53:29 <haleyb> moving on...
15:53:38 <haleyb> #topic Stable backports
15:54:37 <haleyb> imo, https://review.openstack.org/#/c/273235/ and https://review.openstack.org/#/c/273236/ need to merge into stable/liberty, as they make some base changes that make other backports easier
15:55:16 <Swami_> haleyb: ok will take a look at those patches.
15:55:19 <haleyb> ihar +2'd them today, need to get someone else with stable +2 power to take a look
15:55:50 <Swami_> haleyb: good
15:56:53 <haleyb> #topic Open Discussion
15:57:08 <haleyb> any other issues, got 3 minutes
15:57:18 <Swami_> haleyb: the nova patch that we had for live migration is still idle.
15:57:57 <haleyb> https://review.openstack.org/#/c/275073/
15:58:31 <haleyb> Do we need to get the tempest change in?  maybe ping mattm on that patch
15:58:36 <Swami_> haleyb: yes, it has +1s from the neutron cores right now.
15:58:54 <Swami_> haleyb: ok will ping him again today
15:59:18 <haleyb> i think he was fine with the nova patch merging without the tempest one
15:59:32 <Swami_> haleyb: yes that's what I thought as well
15:59:58 <haleyb> if we can get one +2 the other should follow quickly
16:00:12 <Swami_> haleyb: yes
16:00:14 <haleyb> top of hour, thanks everyone
16:00:17 <haleyb> #endmeeting