15:01:46 <haleyb> #startmeeting neutron_dvr
15:01:47 <openstack> Meeting started Wed Sep 14 15:01:46 2016 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:48 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:51 <openstack> The meeting name has been set to 'neutron_dvr'
15:01:54 <haleyb> #chair Swami
15:01:55 <openstack> Current chairs: Swami haleyb
15:02:24 <haleyb> #topic Announcements
15:03:03 <haleyb> RC1 is this week
15:03:16 <Swami> haleyb: great.
15:04:20 <haleyb> yes, we've come a long way i think, lots of changes, bug count decreasing
15:04:41 <haleyb> that said, we still have a few bugs to try and get fixed
15:05:06 <Swami> haleyb: agree
15:05:17 <haleyb> #topic Bugs
15:05:26 <Swami> haleyb: thanks
15:05:35 <Swami> There are no new bugs filed this week.
15:05:51 <Swami> But some of the bugs have been closed recently.
15:06:21 <Swami> Let us go over the list provided by John here.
15:06:24 <Swami> https://bugs.launchpad.net/neutron/+bug/1580648
15:06:27 <openstack> Launchpad bug 1580648 in neutron "Two HA routers in master state during functional test" [Undecided,Opinion]
15:07:01 <haleyb> There was a change merged that i thought fixed this, but apparently not
15:07:04 <Swami> There is a patch that has merged, which John mentioned would fix the problem, but I did see a note by Ann that the issue can still be reproduced.
15:07:24 <Swami> So this is still on our plate and needs to be watched.
15:07:48 <akamyshnikova__> I've checked this bug today, it is still reproducible with the latest code, although I consider this a keepalived limitation
15:08:04 <akamyshnikova__> I put a comment on the bug report
15:08:25 <Swami> akamyshnikova__: so do you want to close the bug or do some more triaging?
15:09:16 <Swami> akamyshnikova__: let me know your thoughts on this, while we move forward.
15:09:19 <haleyb> akamyshnikova__: is it something fixed in a later keepalived version?
15:10:44 <akamyshnikova__> Swami, I can look through the latest keepalived changes to see if it is fixed, but I'm not sure.
15:11:06 <Swami> akamyshnikova__: ok thanks, let us wait then.
15:11:11 <Swami> The next in the list is
15:11:13 <Swami> https://bugs.launchpad.net/neutron/+bug/1602614
15:11:14 <openstack> Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [High,Fix released] - Assigned to Carl Baldwin (carl-baldwin)
15:11:32 <Swami> This bug has been fixed with the l2pop patch that merged recently.
15:11:43 <Swami> Thanks to anilvenkata for his efforts.
15:12:36 <Swami> The next one in the list is
15:12:39 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1602320
15:12:40 <openstack> Launchpad bug 1602320 in neutron "ha + distributed router: keepalived process kill vrrp child process" [Medium,Fix released] - Assigned to He Qing (tsinghe-7)
15:13:01 <Swami> This patch has also merged recently, so this bug can be marked as fixed.
15:13:25 <Swami> These are all the updates from the L3+HA+DVR team.
15:14:25 <Swami> Now let us come back to the existing bugs and let us talk about the critical ones.
15:14:41 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612192
15:14:41 <openstack> Launchpad bug 1612192 in neutron "L3 DVR: Unable to complete operation on subnet" [Critical,Confirmed]
15:15:15 <Swami> haleyb: last week you asked whether we can drop the severity on this bug. Do you have any input on this?
15:15:57 <haleyb> Swami: i am going to lower it, that string has not been seen by logstash since 9/8
15:16:23 <Swami> haleyb: ok, that would be great.
15:16:26 <Swami> haleyb: thanks
15:16:49 <Swami> Does the same thing apply to this bug too?
15:16:52 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612804
15:16:53 <openstack> Launchpad bug 1612804 in neutron "test_shelve_instance fails with sshtimeout" [Critical,Confirmed]
15:17:53 <haleyb> looking
15:18:31 <haleyb> still see some
15:20:00 <Swami> ok, then we should still try to see what is causing these tests to fail. Are these test failures only seen in the multinode job or also on the single node check jobs?
15:20:30 <haleyb> gate-tempest-dsvm-multinode-full-ubuntu-xenial and some others
15:20:45 <Swami> haleyb: thanks
15:20:59 <Swami> haleyb: so let us monitor this bug and find out the root cause.
15:21:04 <haleyb> some have nothing to do with neutron, but if neutron is not working this failure could be seen
15:21:38 <haleyb> i will see if it's in something other than the check queue
15:21:50 <Swami> haleyb: ok,
15:22:06 <Swami> haleyb: I know that these are some of the vulnerable tests that have failed in the past.
15:22:25 <Swami> haleyb: But so far no clues on why it is failing.
15:22:53 <haleyb> eliminating the check queue, it dropped to 9 in the past two days
15:23:26 <Swami> haleyb: good that it dropped.
15:23:27 <haleyb> all others in experimental queue
15:24:26 <Swami> haleyb: ok
15:24:45 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1593354
15:24:46 <openstack> Launchpad bug 1593354 in neutron "SNAT HA failed because of missing nat rule in snat namespace iptable" [Undecided,New]
15:25:16 <Swami> John has already triaged this bug and mentioned that it is not reproducible on master, but the bug was reported in mitaka.
15:25:42 <Swami> I don't think John has triaged this in mitaka yet. So let us wait for his input on this.
15:26:31 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1571676
15:26:32 <openstack> Launchpad bug 1571676 in neutron "After binding a floating IP to VM, the static route can't work in DVR." [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:26:43 <Swami> I just reopened this bug since it was outdated.
15:26:48 <Swami> I do have a patch for review.
15:27:14 <Swami> #link https://review.openstack.org/#/c/308068/
15:28:23 <Swami> haleyb: can you take a look at it?
15:28:55 <haleyb> Swami: yes, will take a look, i remember we couldn't merge before because >1 tenant shares the namespace
15:29:27 <Swami> haleyb: yes you are right. Now each router has its own routing table and the extra routes will be addressed in that table.
15:30:09 <Swami> I initially wanted to have it on top of the fast-path exit, then I decided to move to on top so that it can be backported.
15:30:37 <Swami> the next one in the list is.
15:30:40 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1619312
15:30:42 <openstack> Launchpad bug 1619312 in neutron "dvr: can't migrate legacy router to DVR" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
15:31:20 <haleyb> i got some good info from armando and kevin on a direction, will work on patch today
15:31:35 <haleyb> has to do with transaction guard code merged a little while back
15:31:58 <Swami> haleyb: yes I did see your discussion with kevin on this guarded transact
15:32:08 <Swami> s/transact/transaction
15:32:49 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1577488
15:32:50 <openstack> Launchpad bug 1577488 in neutron "[RFE]"Fast exit" for compute node egress flows when using DVR" [Wishlist,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:33:18 <Swami> haleyb: the patches for this RFE are ready for review.
15:33:44 <Swami> #link https://review.openstack.org/#/c/283757/ (Server side patch)
15:34:04 <Swami> #link https://review.openstack.org/#/c/355062/ - Agent side patch.
15:34:12 <haleyb> Swami: yes, will look
15:34:26 <Swami> haleyb: thanks
15:34:53 <Swami> That's all I had for bugs today.
15:35:42 <haleyb> Any other bugs from anyone?
15:36:51 <haleyb> #topic Gate failures
15:38:37 <haleyb> so there were some fixes recently in a couple of areas that seem to have helped the dvr jobs
15:38:46 <haleyb> multiple in the dhcp-agent
15:39:01 <Swami> haleyb: have all the fixes in the dhcp agent merged?
15:39:35 <haleyb> but a really ugly one in the infra code that could have caused two tunnels with the same VNI to be configured - one overcloud, one under
15:39:52 <Swami> haleyb: that is not good.
15:40:06 <haleyb> Swami: a lot of minor dhcp changes have merged, so exceptions are down
15:41:12 <haleyb> Swami: yes, the VNI one was interesting and would have randomly affected multinode jobs where we saw a dhcp failure, packet was sometimes going the wrong way from what i saw in the review
15:41:31 <Swami> haleyb: thanks
15:42:02 <Swami> haleyb: how far are we from making the multinode job voting?
15:42:40 <haleyb> http://grafana.openstack.org/dashboard/db/neutron-failure-rate shows the latest, but i'm still confused as to why there are multiple jobs with the same name, think it's the xenial change-over
15:43:48 <haleyb> Swami: the dvr-multinode failure rate is ~5% in the gate queue now, that might still be high to change it to voting
15:44:25 <Swami> haleyb: yes
15:45:09 <Swami> haleyb: maybe if we need to prioritize the bugs, we should prioritize on the basis of gate failures so that we can make it voting soon.
15:46:25 <haleyb> Swami: yes, we just need another look at the failures to see if it's really a dvr issue
15:47:26 <Swami> haleyb: makes sense
15:48:10 <haleyb> Swami: "we" means you or me typically too :)
15:48:29 <Swami> haleyb: yes understood.
15:49:01 <Swami> haleyb: since I am done with the fast-path-exit I might have some bandwidth to help you. Please let me know.
15:50:08 <haleyb> Swami: if you want to go through logstash looking for a gate failure on that job it would help
15:50:41 <Swami> haleyb: sure will do
15:51:39 <haleyb> #topic Stable backports
15:52:14 <Swami> https://review.openstack.org/#/c/363970/
15:52:23 <haleyb> More of a note - keep tagging things as backport potential, or just cherry-pick
15:52:31 <Swami> haleyb: need a +2 on this and the related one for liberty.
15:52:44 <Swami> haleyb: sure will do
15:52:56 <haleyb> Swami: will look, hadn't seen that one
15:53:08 <Swami> haleyb: thanks
15:53:43 <haleyb> #topic Open discussion
15:53:59 <Swami> haleyb: not related to a neutron backport, but the nova migration patch that just got merged is a candidate for mitaka backport and I have the patch up for review.
15:54:15 <Swami> https://review.openstack.org/#/c/367646/
15:54:52 <haleyb> Swami: had we already backported the neutron change?
15:54:57 <Swami> There was a comment in there about the requirement to backport. If possible, can you add your comment?
15:55:23 <Swami> haleyb: I though neutron change merged in mitaka.
15:55:29 <Swami> s/though/thought
15:55:40 <haleyb> it was so long ago i can't remember...
15:55:47 <haleyb> ;-)
15:56:08 <Swami> Yes it was merged at the end of the mitaka cycle, but we couldn't merge the nova patch and it just got merged.
15:56:43 <haleyb> great
15:57:46 <haleyb> That's all i had, if there's anything you need to target at RC1 just ping a core to help out
15:58:51 <haleyb> #endmeeting