15:00:57 <haleyb> #startmeeting neutron_dvr
15:00:58 <openstack> Meeting started Wed Jul 27 15:00:57 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:59 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:02 <openstack> The meeting name has been set to 'neutron_dvr'
15:01:08 <haleyb> #chair Swami
15:01:13 <openstack> Current chairs: Swami haleyb
15:01:51 <haleyb> #topic Announcements
15:02:27 <Swami> when is the neutron mid-cycle?
15:02:43 <haleyb> it's August 15-18 or so
15:03:06 <Swami> haleyb: thanks
15:03:07 <haleyb> I am not attending (family commitments) but carl is
15:03:33 <Swami> haleyb: where is it going to be?
15:03:41 <haleyb> https://etherpad.openstack.org/p/newton-neutron-midcycle
15:03:49 <haleyb> it's in Cork, Ireland
15:04:08 <Swami> haleyb: thanks
15:04:37 <haleyb> doesn't look like anyone from the DVR "team" is going to be there
15:05:13 <Swami> haleyb: no, I am not going to be there.
15:05:14 <haleyb> N-3 is near the end of August, so we have some time to land things
15:05:27 <Swami> haleyb: will try to sync up online, but there may be a time difference.
15:06:29 <haleyb> Swami: yes, especially for you.
15:06:33 <Swami> We should probably target landing the fast-exit changes by N-3.
15:07:06 <Swami> Also the DVR+SNAT+HA related bugs are piling up.
15:07:20 <haleyb> let's move on to bugs/rfes then
15:07:25 <haleyb> #topic Bugs
15:07:59 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1606741
15:07:59 <openstack> Launchpad bug 1606741 in neutron "Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode" [High,New] - Assigned to Zhixin Li (lizhixin)
15:08:24 <Swami> It is mentioned that it is seen only with multiple dvr_snat nodes.
15:08:26 <haleyb> is that valid? compute shouldn't be dvr_snat
15:08:49 * haleyb should actually look at the bug
15:08:51 <Swami> Yes, a compute should not be a dvr_snat; I have asked the same question in the bug comments.
15:09:15 <Swami> Especially since he is configuring every node to be a dvr_snat node in his testing.
15:09:29 <Swami> But let us discuss this bug further in Launchpad.
15:09:40 <haleyb> I know a single-node devstack can run it, i'll subscribe to the bug
15:10:28 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1597461
15:10:28 <openstack> Launchpad bug 1597461 in neutron "L3 HA: 2 masters after reboot of controller" [High,Confirmed] - Assigned to Ann Taraday (akamyshnikova)
15:11:11 <Swami> I think you have already commented on this bug and it has been escalated to High at this point. I don't think it is just related to DVR, but to L3 HA and DVR. It is seen in both cases.
15:11:53 <haleyb> yes, we have seen that internally, and there is another similar bug
15:12:27 <Swami> The next one is
15:12:33 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1602794
15:12:33 <openstack> Launchpad bug 1602794 in neutron "ItemAllocator class can throw a ValueError when file is corrupted" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:12:47 <haleyb> oh, that's me :)
15:12:59 <Swami> haleyb: I think you also have a patch for it.
15:13:08 <haleyb> https://review.openstack.org/#/c/341794/
15:13:33 <Swami> I think the patch is in good shape.
15:13:36 <haleyb> let me ping carl to review, and i'll add oleg
15:13:55 <Swami> haleyb: ok
15:14:00 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1602614
15:14:00 <openstack> Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [Undecided,In progress] - Assigned to venkata anil (anil-venkata)
15:14:51 <anilvenkata> I have a patch for it https://review.openstack.org/#/c/323314/
15:15:06 <Swami> This has not been triaged yet to see how big the loss is and whether it is only with the DVR combination or in general.
15:15:47 <Swami> haleyb: yes, anilvenkata feels that if we fix the binding issue for HA, these problems might go away.
15:16:10 <anilvenkata> Swami, yes we can reduce failover time
15:16:30 <haleyb> so does this get back to the DB changes you're doing?
15:16:39 <anilvenkata> yes
15:16:53 <Swami> anilvenkata: can we backport these changes cleanly?
15:16:54 <anilvenkata> explanation is given in the bug https://bugs.launchpad.net/neutron/+bug/1602614/comments/2
15:16:54 <openstack> Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [Undecided,In progress] - Assigned to venkata anil (anil-venkata)
15:17:03 <anilvenkata> we can backport
15:17:20 <anilvenkata> i will take that backporting also
15:17:27 <haleyb> can we backport the DB contraction?
15:17:47 <anilvenkata> I think so, we can check with ihrachys
15:18:15 <anilvenkata> it will be renaming the table and changing the port field
15:18:27 <haleyb> i didn't think we could do that to stable, at least i've never seen it
15:18:52 <ihrachys> anilvenkata: no backports for any alembic scripts
15:19:02 <ihrachys> it's explicitly forbidden by stable policy
15:19:32 <ihrachys> please everyone make yourself comfortable with http://docs.openstack.org/project-team-guide/stable-branches.html#review-guidelines
15:19:59 <Swami> ihrachys: haleyb: thanks, that was my understanding too.
15:19:59 <anilvenkata> I will check and ping you with details about this change
15:22:15 <haleyb> anilvenkata: i'm not against the changes; they will make backports harder going forward, but that's not a reason to not fix things
15:23:00 <Swami> haleyb: +1
15:23:12 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1593354
15:23:12 <openstack> Launchpad bug 1593354 in neutron "SNAT HA failed because of missing nat rule in snat namespace iptable" [Undecided,New]
15:24:10 <Swami> This bug shows that the 'sg-' port has been removed from one of the namespaces while the failover happens.
15:24:39 <Swami> Need to triage this further to see what is deleting the 'sg-' interface from the namespace on the given node during failover.
15:25:43 <haleyb> do you want to take it or is your plate full?
15:26:10 <Swami> haleyb: my plate is full, I might check with Adolfo to get some help here.
15:26:18 <haleyb> ok, great
15:27:05 <Swami> The next one is interesting.
15:27:09 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1596473
15:27:09 <openstack> Launchpad bug 1596473 in neutron "Packet loss with DVR and IPv6" [Undecided,New]
15:27:18 <Swami> haleyb: you may be interested in this.
15:27:43 <haleyb> yes, looks like i asked for more info but didn't get it
15:28:53 <haleyb> changed status to incomplete until i can verify it or the submitter responds
15:29:27 <Swami> haleyb: ok, makes sense.
15:29:40 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1506567
15:29:41 <openstack> Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,New]
15:30:31 <haleyb> Swami: there might be another related one, let me look
15:30:35 <Swami> haleyb: I think you have mentioned this bug earlier.
15:31:01 <Swami> haleyb: I thought that it was the same, but it was not the one submitted by you.
15:31:39 <haleyb> it's the same. someone saw this internally and referenced that bug
15:32:37 <Swami> haleyb: ok
15:32:44 <Swami> That's all I had for the new bugs.
15:32:53 <haleyb> it's similar to the IPv6 issue we had wrt using the correct namespace
15:33:37 <Swami> This patch is ready for review again, after a lot of changes back and forth. Can you take a look at it? #link https://review.openstack.org/#/c/326729/
15:34:05 <haleyb> yes, i will look today
15:35:03 <haleyb> Swami: what about the RFE "bugs" like fast-exit?
15:35:39 <Swami> haleyb: yes, I am working on the "create fip interface on all nodes irrespective of the floatingips" patch that was failing Jenkins.
15:35:44 <Swami> will push it in today.
15:36:05 <Swami> #link https://review.openstack.org/#/c/283757/
15:37:40 <Swami> keep an eye on it, I still see one UT failure in my setup, will fix it and spin it up again.
15:37:43 <haleyb> and i see the tempest change for dvr live migration has been updated, https://review.openstack.org/#/c/286855/
15:38:28 <haleyb> it would be great to get that and the nova change in
15:38:59 <Swami> I think the nova patch has a merge conflict and I need to fix it as well.
15:39:53 <Swami> haleyb: yes, I saw it.
15:40:47 <Swami> That's all I had for bugs today.
15:41:01 <haleyb> #topic Gate failures
15:41:48 <Swami> I did see a spike in the multinode failures
15:42:27 <haleyb> https://goo.gl/L1WODG shows a spike, confirmed by http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:43:24 <haleyb> looks like it started yesterday
15:43:37 <Swami> haleyb: yes
15:44:18 <haleyb> i will have to look in the neutron channel to see if someone is already on it, since it's not just dvr
15:44:44 <Swami> haleyb: yes, it is both the neutron full and dvr multinode jobs
15:46:49 <Swami> Anything else on the gate failures?
15:46:58 <haleyb> i don't see anyone in the other channel talking about it, will look at recent changes, we can't be the first to notice
15:47:14 <haleyb> #topic Stable backports
15:48:32 <Swami> This patch needs a workflow #link https://review.openstack.org/#/c/341779/
15:48:39 <haleyb> i had triaged two weeks ago, and reviewed a number of changes the past few days. Will go through them, but we're keeping up with getting things backported
15:49:17 <Swami> haleyb: good
15:49:18 <haleyb> Swami: i'll look after the meeting
15:49:24 <Swami> haleyb: thanks
15:49:46 <haleyb> #topic Open Discussion
15:50:08 <haleyb> 10 minutes left, anything else to discuss?
15:50:17 <Swami> I don't have anything more to add.
15:50:33 <Swami> But I will sync up with you to target and address the bugs and reduce the pile-up.
15:51:45 <haleyb> Swami: sounds good
15:51:58 <haleyb> ok, if nothing else i'll call the meeting
15:52:06 <haleyb> #endmeeting