15:00:57 <haleyb> #startmeeting neutron_dvr
15:00:58 <openstack> Meeting started Wed Jul 27 15:00:57 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:59 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:02 <openstack> The meeting name has been set to 'neutron_dvr'
15:01:08 <haleyb> #chair Swami
15:01:13 <openstack> Current chairs: Swami haleyb
15:01:51 <haleyb> #topic Announcements
15:02:27 <Swami> when is the neutron mid-cycle?
15:02:43 <haleyb> it's August 15-18 or so
15:03:06 <Swami> haleyb: thanks
15:03:07 <haleyb> I am not attending (family commitments) but carl is
15:03:33 <Swami> haleyb: where is it going to be?
15:03:41 <haleyb> https://etherpad.openstack.org/p/newton-neutron-midcycle
15:03:49 <haleyb> it's in Cork, Ireland
15:04:08 <Swami> haleyb: thanks
15:04:37 <haleyb> doesn't look like anyone from the DVR "team" is going to be there
15:05:13 <Swami> haleyb: no, I am not going to be there.
15:05:14 <haleyb> N-3 is near the end of August, so we have some time to land things
15:05:27 <Swami> haleyb: will try to sync up online, but there may be a time difference.
15:06:29 <haleyb> Swami: yes, especially for you.
15:06:33 <Swami> We should probably target landing the fast-exit changes by N-3.
15:07:06 <Swami> Also the DVR+SNAT+HA related bugs are piling up.
15:07:20 <haleyb> let's move on to bugs/rfes then
15:07:25 <haleyb> #topic Bugs
15:07:59 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1606741
15:07:59 <openstack> Launchpad bug 1606741 in neutron "Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode" [High,New] - Assigned to Zhixin Li (lizhixin)
15:08:24 <Swami> It is mentioned that it is seen only with multiple dvr_snat nodes.
15:08:26 <haleyb> is that valid? compute shouldn't be dvr_snat
15:08:49 * haleyb should actually look at the bug
15:08:51 <Swami> Yes, a compute should not be a dvr_snat; I have asked the same question in the bug comments.
15:09:15 <Swami> Especially since he is configuring every node to be a dvr_snat node in his testing.
15:09:29 <Swami> But let us discuss this bug further in Launchpad.
15:09:40 <haleyb> I know a single-node devstack can run it, i'll subscribe to the bug
15:10:28 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1597461
15:10:28 <openstack> Launchpad bug 1597461 in neutron "L3 HA: 2 masters after reboot of controller" [High,Confirmed] - Assigned to Ann Taraday (akamyshnikova)
15:11:11 <Swami> I think you have already commented on this bug and it has been escalated to High at this point. I don't think it is just related to DVR, but to L3 HA and DVR. It is seen in both cases.
15:11:53 <haleyb> yes, we have seen that internally, and there is another similar bug
15:12:27 <Swami> The next one is
15:12:33 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1602794
15:12:33 <openstack> Launchpad bug 1602794 in neutron "ItemAllocator class can throw a ValueError when file is corrupted" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:12:47 <haleyb> oh, that's me :)
15:12:59 <Swami> haleyb: I think you also have a patch for it.
15:13:08 <haleyb> https://review.openstack.org/#/c/341794/
15:13:33 <Swami> I think the patch is in good shape.
15:13:36 <haleyb> let me ping carl to review, and i'll add oleg
15:13:55 <Swami> haleyb: ok
15:14:00 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1602614
15:14:00 <openstack> Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [Undecided,In progress] - Assigned to venkata anil (anil-venkata)
15:14:51 <anilvenkata> I have a patch for it https://review.openstack.org/#/c/323314/
15:15:06 <Swami> This has not been triaged yet to see how big the loss is and whether it is only with the DVR combination or in general.
15:15:47 <Swami> haleyb: yes, anilvenkata feels that if we fix the binding issue for HA, these problems might go away.
15:16:10 <anilvenkata> Swami, yes we can reduce failover time
15:16:30 <haleyb> so does this get back to the DB changes you're doing?
15:16:39 <anilvenkata> yes
15:16:53 <Swami> anilvenkata: can we backport these changes cleanly?
15:16:54 <anilvenkata> explanation is given in the bug https://bugs.launchpad.net/neutron/+bug/1602614/comments/2
15:16:54 <openstack> Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [Undecided,In progress] - Assigned to venkata anil (anil-venkata)
15:17:03 <anilvenkata> we can backport
15:17:20 <anilvenkata> i will take that backporting also
15:17:27 <haleyb> can we backport the DB contraction?
15:17:47 <anilvenkata> I think so, we can check with ihrachys
15:18:15 <anilvenkata> it will be renaming the table and changing the port field
15:18:27 <haleyb> i didn't think we could do that to stable, at least i've never seen it
15:18:52 <ihrachys> anilvenkata: no backports for any alembic scripts
15:19:02 <ihrachys> it's explicitly forbidden by stable policy
15:19:32 <ihrachys> please everyone make yourself comfortable with http://docs.openstack.org/project-team-guide/stable-branches.html#review-guidelines
15:19:59 <Swami> ihrachys: haleyb: thanks, that was my understanding too.
15:19:59 <anilvenkata> I will check and ping you with details about this change
15:22:15 <haleyb> anilvenkata: i'm not against the changes; they will make backports harder going forward, but that's not a reason to not fix things
15:23:00 <Swami> haleyb: +1
15:23:12 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1593354
15:23:12 <openstack> Launchpad bug 1593354 in neutron "SNAT HA failed because of missing nat rule in snat namespace iptable" [Undecided,New]
15:24:10 <Swami> This bug shows that the 'sg-' port has been removed from one of the namespaces while the failover happens.
15:24:39 <Swami> Need to triage this further to see what is deleting the 'sg-' interface from the namespace on the given node during failover.
15:25:43 <haleyb> do you want to take it or is your plate full?
15:26:10 <Swami> haleyb: my plate is full, I might check with Adolfo to get some help here.
15:26:18 <haleyb> ok, great
15:27:05 <Swami> The next one is interesting.
15:27:09 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1596473
15:27:09 <openstack> Launchpad bug 1596473 in neutron "Packet loss with DVR and IPv6" [Undecided,New]
15:27:18 <Swami> haleyb: you may be interested in this.
15:27:43 <haleyb> yes, looks like i asked for more info but didn't get it
15:28:53 <haleyb> changed status to incomplete until i can verify it or the submitter responds
15:29:27 <Swami> haleyb: ok, makes sense.
15:29:40 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1506567
15:29:41 <openstack> Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,New]
15:30:31 <haleyb> Swami: there might be another related one, let me look
15:30:35 <Swami> haleyb: I think you have mentioned this bug earlier.
15:31:01 <Swami> haleyb: I thought that it was the same, but it was not the one submitted by you.
15:31:39 <haleyb> it's the same. someone saw this internally and referenced that bug
15:32:37 <Swami> haleyb: ok
15:32:44 <Swami> That's all I had for the new bugs.
15:32:53 <haleyb> it's similar to the IPv6 issue we had wrt using the correct namespace
15:33:37 <Swami> This patch is ready for review again, after a lot of changes back and forth. Can you take a look at it? #link https://review.openstack.org/#/c/326729/
15:34:05 <haleyb> yes, i will look today
15:35:03 <haleyb> Swami: what about the RFE "bugs" like fast-exit?
15:35:39 <Swami> haleyb: yes, I am working on the "create fip interface on all nodes irrespective of the floatingips" patch that was failing Jenkins.
15:35:44 <Swami> will push it in today.
15:36:05 <Swami> #link https://review.openstack.org/#/c/283757/
15:37:40 <Swami> keep an eye on it, I still see one UT failure in my setup, will fix it and spin it up again.
15:37:43 <haleyb> and i see the tempest change for dvr live migration has been updated, https://review.openstack.org/#/c/286855/
15:38:28 <haleyb> it would be great to get that and the nova change in
15:38:59 <Swami> I think the nova patch has a merge conflict and I need to fix it as well.
15:39:53 <Swami> haleyb: yes, I saw it.
15:40:47 <Swami> That's all I had for bugs today.
15:41:01 <haleyb> #topic Gate failures
15:41:48 <Swami> I did see a spike in the multinode failures
15:42:27 <haleyb> https://goo.gl/L1WODG shows a spike, confirmed by http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:43:24 <haleyb> looks like it started yesterday
15:43:37 <Swami> haleyb: yes
15:44:18 <haleyb> i will have to look in the neutron channel to see if someone is already on it, since it's not just dvr
15:44:44 <Swami> haleyb: yes, it is both the neutron full and dvr multinode jobs
15:46:49 <Swami> Anything else on the gate failures?
15:46:58 <haleyb> i don't see anyone in the other channel talking about it, will look at recent changes, we can't be the first to notice
15:47:14 <haleyb> #topic Stable backports
15:48:32 <Swami> This patch needs a workflow #link https://review.openstack.org/#/c/341779/
15:48:39 <haleyb> i had triaged two weeks ago, and reviewed a number of changes the past few days. Will go through them, but we're keeping up with getting things backported
15:49:17 <Swami> haleyb: good
15:49:18 <haleyb> Swami: i'll look after the meeting
15:49:24 <Swami> haleyb: thanks
15:49:46 <haleyb> #topic Open Discussion
15:50:08 <haleyb> 10 minutes left, anything else to discuss?
15:50:17 <Swami> I don't have anything more to add.
15:50:33 <Swami> But I will sync up with you to target and address the bugs and reduce the pile-up.
15:51:45 <haleyb> Swami: sounds good
15:51:58 <haleyb> ok, if nothing else i'll call the meeting
15:52:06 <haleyb> #endmeeting