15:00:57 #startmeeting neutron_dvr
15:00:58 Meeting started Wed Jul 27 15:00:57 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:59 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:02 The meeting name has been set to 'neutron_dvr'
15:01:08 #chair Swami
15:01:13 Current chairs: Swami haleyb
15:01:51 #topic Announcements
15:02:27 when is the neutron mid-cycle?
15:02:43 it's august 15-18 or so
15:03:06 haleyb: thanks
15:03:07 I am not attending (family commitments) but carl is
15:03:33 haleyb: where is it going to be?
15:03:41 https://etherpad.openstack.org/p/newton-neutron-midcycle
15:03:49 it's in Cork, Ireland
15:04:08 haleyb: thanks
15:04:37 doesn't look like anyone from the DVR "team" is going to be there
15:05:13 haleyb: no I am not going to be there.
15:05:14 N-3 is near the end of August, so we have some time to land things
15:05:27 haleyb: will try to sync up online, but there may be a time difference.
15:06:29 Swami: yes, especially for you.
15:06:33 We should probably target to push in the fast-exit changes by N-3.
15:07:06 Also the DVR+SNAT+HA related bugs are piling up.
15:07:20 let's move on to bugs/rfes then
15:07:25 #topic Bugs
15:07:59 #link https://bugs.launchpad.net/neutron/+bug/1606741
15:07:59 Launchpad bug 1606741 in neutron "Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode" [High,New] - Assigned to Zhixin Li (lizhixin)
15:08:24 It is mentioned that it is seen only with multiple dvr_snat nodes.
15:08:26 is that valid? compute shouldn't be dvr_snat
15:08:49 * haleyb should actually look at the bug
15:08:51 Yes, a compute should not be a dvr_snat, I have asked the same question in the bug comment.
15:09:15 In particular, he is configuring every node to be a dvr_snat node in his testing.
15:09:29 But let us discuss this bug further on Launchpad.
15:09:40 I know a single-node devstack can run it, i'll subscribe to the bug
15:10:28 #link https://bugs.launchpad.net/neutron/+bug/1597461
15:10:28 Launchpad bug 1597461 in neutron "L3 HA: 2 masters after reboot of controller" [High,Confirmed] - Assigned to Ann Taraday (akamyshnikova)
15:11:11 I think you have already commented on this bug and it has been escalated to high at this point. I don't think it is related just to DVR, but to both L3 HA and DVR. It is seen in both cases.
15:11:53 yes, we have seen that internally, and there is another similar bug
15:12:27 The next one is
15:12:33 #link https://bugs.launchpad.net/neutron/+bug/1602794
15:12:33 Launchpad bug 1602794 in neutron "ItemAllocator class can throw a ValueError when file is corrupted" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:12:47 oh, that's me :)
15:12:59 haleyb: I think you also have a patch for it.
15:13:08 https://review.openstack.org/#/c/341794/
15:13:33 I think the patch is in good shape.
15:13:36 let me ping carl to review, and i'll add oleg
15:13:55 haleyb: ok
15:14:00 #link https://bugs.launchpad.net/neutron/+bug/1602614
15:14:00 Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [Undecided,In progress] - Assigned to venkata anil (anil-venkata)
15:14:51 I have a patch for it: https://review.openstack.org/#/c/323314/
15:15:06 This has not been triaged yet to see how big the loss is and whether it occurs only with the DVR combination or in general.
15:15:47 haleyb: yes, anilvenkata feels that if we fix the binding issue for the HA, these problems might go away.
15:16:10 Swami, yes we can reduce failover time
15:16:30 so does this get back to the DB changes you're doing?
15:16:39 yes
15:16:53 anilvenkata: can we backport these changes cleanly?
15:16:54 explanation is given in the bug https://bugs.launchpad.net/neutron/+bug/1602614/comments/2
15:16:54 Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [Undecided,In progress] - Assigned to venkata anil (anil-venkata)
15:17:03 we can backport
15:17:20 i will take that backporting also
15:17:27 can we backport the DB contraction?
15:17:47 I think so, we can check with ihrachys
15:18:15 it will be renaming the table and changing the port field
15:18:27 i didn't think we could do that to stable, at least i've never seen it
15:18:52 anilvenkata: no backports for any alembic scripts
15:19:02 it's explicitly forbidden by stable policy
15:19:32 please everyone make yourself comfortable with http://docs.openstack.org/project-team-guide/stable-branches.html#review-guidelines
15:19:59 ihrachys: haleyb: thanks, that was my understanding too.
15:19:59 I will check and ping you with details about this change
15:22:15 anilvenkata: i'm not against the changes, but they will make backports harder going forward, but that's not a reason to not fix things
15:23:00 haleyb: +1
15:23:12 #link https://bugs.launchpad.net/neutron/+bug/1593354
15:23:12 Launchpad bug 1593354 in neutron "SNAT HA failed because of missing nat rule in snat namespace iptable" [Undecided,New]
15:24:10 This bug shows that the 'sg-' port has been removed from one of the namespaces while the failover happens.
15:24:39 Need to triage this further to see what is deleting the 'sg-' interface from the namespace on the given node during failover.
15:25:43 do you want to take it or is your plate full?
15:26:10 haleyb: my plate is full, I might check with adolfo to get some help here.
15:26:18 ok, great
15:27:05 The next one is interesting.
15:27:09 #link https://bugs.launchpad.net/neutron/+bug/1596473
15:27:09 Launchpad bug 1596473 in neutron "Packet loss with DVR and IPv6" [Undecided,New]
15:27:18 haleyb: you may be interested in this.
15:27:43 yes. looks like i asked for more info but didn't get it
15:28:53 changed status to incomplete until i can verify it or submitter responds
15:29:27 haleyb: ok makes sense.
15:29:40 #link https://bugs.launchpad.net/neutron/+bug/1506567
15:29:41 Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,New]
15:30:31 Swami: there might be another related one, let me look
15:30:35 haleyb: I think you have mentioned this bug earlier.
15:31:01 haleyb: I thought that it is the same, but then it was not the one submitted by you.
15:31:39 it's the same. someone saw this internally and referenced that bug
15:32:37 haleyb: ok
15:32:44 That's all i had for the new bugs.
15:32:53 it's similar to the IPv6 issue we had wrt using the correct namespace
15:33:37 This patch is ready for review again, with a lot of changes back and forth. Can you take a look at it? #link https://review.openstack.org/#/c/326729/
15:34:05 yes, i will look today
15:35:03 Swami: what about the RFE "bugs" like fast-exit?
15:35:39 haleyb: yes I am working on the "create fip interface on all nodes irrespective of the floatingips" patch that was failing Jenkins.
15:35:44 will push it in today.
15:36:05 #link https://review.openstack.org/#/c/283757/
15:37:40 keep an eye on it, still I see one UT fail in my setup, will fix it and spin it up again.
15:37:43 and i see the tempest change for dvr live migration has been updated, https://review.openstack.org/#/c/286855/
15:38:28 it would be great to get that and the nova change in
15:38:59 I think the nova patch has a merge conflict and I need to fix it as well.
15:39:53 haleyb: yes I saw it.
15:40:47 That's all I had for bugs today.
15:41:01 #topic Gate failures
15:41:48 I did see that the multinode failures have a spike
15:42:27 https://goo.gl/L1WODG shows a spike, confirmed by http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:43:24 looks like it started yesterday
15:43:37 haleyb: yes
15:44:18 i will have to look in the neutron channel to see if someone is already on it, since it's not just dvr
15:44:44 haleyb: yes it is both the neutron full and dvr multinode jobs
15:46:49 Anything else on the gate failures?
15:46:58 i don't see anyone in the other channel talking about it, will look at recent changes, we can't be the first to notice
15:47:14 #topic Stable backports
15:48:32 This patch needs a workflow #link https://review.openstack.org/#/c/341779/
15:48:39 i had triaged two weeks ago, and reviewed a number of changes the past few days. Will go through but we're keeping up with getting things backported
15:49:17 haleyb: good
15:49:18 Swami: i'll look after meeting
15:49:24 haleyb: thanks
15:49:46 #topic Open Discussion
15:50:08 10 minutes left, anything else to discuss?
15:50:17 I don't have anything more to add.
15:50:33 But I will sync up with you to target and address the bugs and reduce the pile up.
15:51:45 Swami: sounds good
15:51:58 ok, if nothing else i'll call the meeting
15:52:06 #endmeeting