15:00:28 #startmeeting neutron_dvr
15:00:29 Meeting started Wed Jan 20 15:00:28 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:32 The meeting name has been set to 'neutron_dvr'
15:00:40 #chair Swami
15:00:41 Current chairs: Swami haleyb
15:01:40 let's get started
15:01:43 #topic Announcements
15:02:15 just a reminder, M-2 is this week
15:02:50 o/
15:02:59 And thanks everyone for keeping up with the code reviews, I think we merged a lot of changes in the past week
15:03:13 ++
15:03:24 good job
15:04:04 and Swami went through the bug list marking things with nice BOLD letters, I'll pass it to him
15:04:08 #topic Bugs
15:04:17 hi
15:04:28 This week we had two new bugs that were filed.
15:04:49 #link https://bugs.launchpad.net/neutron/+bug/1535928
15:04:51 Launchpad bug 1535928 in neutron "Duplicate IPtables rule detected warning message seen in L3 agent" [Undecided,New]
15:05:37 I have been seeing this warning message in the l3 agent log, and so went and filed a bug. I did see that there was a patch initiated by Kevin to address this, but it is still under review.
15:06:05 #link https://review.openstack.org/#/c/255484/1
15:06:24 That was just downgrading the LOG message, not fixing the problem
15:06:31 At this point I think this issue is seen in both the dvr and non-dvr cases.
15:07:17 haleyb: Yes, I just captured the related patch. But I think we need to fix this problem.
15:07:39 seems not a dvr issue
15:08:07 haleyb: obondarev: yes, it is not a dvr issue, it seems to be an issue with the iptables utils.
15:08:31 obondarev: what would be the best "tag" for this bug?
15:08:32 at least it's always the same rule, so that will help track it down
15:09:23 anything to add to this bug, or can we move on?
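For context on the warning discussed above: it fires when the agent's iptables manager finds the same rule queued more than once. A minimal, hypothetical sketch of deduplicating at add time rather than only logging at apply time — `IptablesRuleSet` and its methods are illustrative, not neutron's actual classes:

```python
import logging

LOG = logging.getLogger(__name__)


class IptablesRuleSet:
    """Toy model of one iptables chain: skips duplicate rules on add
    instead of only warning about them when the chain is applied."""

    def __init__(self, chain):
        self.chain = chain
        self.rules = []

    def add_rule(self, rule):
        if rule in self.rules:
            # A duplicate add is the condition bug 1535928 warns about;
            # here we skip it so the applied state stays clean.
            LOG.warning("Duplicate iptables rule detected in chain %s: %s",
                        self.chain, rule)
            return False
        self.rules.append(rule)
        return True


ruleset = IptablesRuleSet("neutron-l3-agent-PREROUTING")
ruleset.add_rule("-d 169.254.169.254/32 -p tcp --dport 80 -j REDIRECT")
added_again = ruleset.add_rule(
    "-d 169.254.169.254/32 -p tcp --dport 80 -j REDIRECT")
# added_again is False; the chain still holds a single copy of the rule
```

This is only a sketch of the "fix the problem, not the log level" direction raised in the discussion; where the duplicate add actually originates would still need tracking down.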
15:09:54 actually, the only place that rule is added is the DVR code
15:10:23 ah, then it might be a dvr issue
15:10:41 haleyb: ok, then it makes sense to tag it as a dvr bug.
15:11:07 ok, let us not change the tag.
15:11:15 could be, we should ping kevin
15:11:37 kevin mentioned in his commit message that this warning message is seen in the gate.
15:11:47 but we can clarify with him later.
15:12:11 ok, let us move on.
15:12:20 The next one that came in yesterday is
15:12:23 #link https://bugs.launchpad.net/neutron/+bug/1536110
15:12:25 Launchpad bug 1536110 in neutron "OVS agent should fail if can't get DVR mac address" [Undecided,In progress] - Assigned to Oleg Bondarev (obondarev)
15:12:41 obondarev already has a patch for this bug.
15:12:54 I saw, haleyb, that you have already provided your review comments.
15:13:08 nitpicking, yes
15:13:12 #link https://review.openstack.org/270130
15:13:23 yeah. I was wondering if there is any reason to continue running in non-dvr mode
15:13:46 if anyone is aware of such a reason, please speak up
15:14:13 otherwise it's better to fail
15:14:14 obondarev: I will check with vivek on this, I was not sure about the reason behind this fallback option.
15:14:47 Swami: ok thanks
15:15:28 ok, let us move on to the next bug.
15:15:43 The next bug, which is targeted for mitaka-2, is HA-DVR
15:16:16 #link https://review.openstack.org/#/c/143169/
15:16:38 This patch requires some core attention; it has been rebased and is in good shape.
15:16:57 Swami: I will try to look today.
15:17:07 carl_baldwin: thanks
15:17:28 I noticed in the comments from yesterday that Armando mentions the failure rate of the DVR job. It has been climbing a little and getting noticed at the higher levels.
15:18:05 We should be sure to hit that topic in this meeting.
15:18:33 carl_baldwin: yes, I captured it in the gate_failures section.
15:18:47 carl_baldwin: when we move on to that section we can talk about it.
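The fail-versus-fallback question for bug 1536110 above can be sketched roughly as follows; `get_dvr_mac_address`, `init_dvr`, and the exception type are hypothetical stand-ins for the OVS agent's actual RPC call, not neutron's real API:

```python
class DVRMacUnavailable(Exception):
    """Raised when the server cannot provide a DVR MAC for this host."""


def get_dvr_mac_address(host):
    # Stand-in for the OVS agent's RPC call to the neutron server;
    # here it always fails to illustrate both code paths below.
    raise DVRMacUnavailable(host)


def init_dvr(host, fallback_to_non_dvr=False):
    """Fail fast by default (the behaviour proposed in the bug)
    instead of silently dropping back to non-DVR mode."""
    try:
        return get_dvr_mac_address(host)
    except DVRMacUnavailable:
        if fallback_to_non_dvr:
            # Old behaviour: keep running without DVR, which can mask
            # a misconfigured deployment.
            return None
        raise  # new behaviour: let the agent exit loudly
```

The trade-off debated in the meeting is exactly the `fallback_to_non_dvr` flag: silently degrading keeps the agent up, but failing makes the misconfiguration visible immediately.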
15:19:07 sounds good
15:19:34 The next one in the list with the mitaka tag is
15:19:37 #link https://bugs.launchpad.net/neutron/+bug/1504039
15:19:39 Launchpad bug 1504039 in neutron "Linuxbridge DVR" [Wishlist,In progress] - Assigned to Hirofumi Ichihara (ichihara-hirofumi)
15:19:58 I did see that there were a couple of related patches that went in
15:20:30 #link https://review.openstack.org/#/c/266210/
15:21:06 This is a related patch, and I think right now it is WIP.
15:21:49 Now let us move on to the other bugs.
15:22:08 #link https://bugs.launchpad.net/neutron/+bug/1522824
15:22:09 Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:22:42 obondarev: did we get any closure on this patch? #link https://review.openstack.org/#/c/215467
15:23:05 obondarev: there were two patches related to this fix, did you get to review the other one?
15:23:22 I abandoned my fix in favor of https://review.openstack.org/#/c/215467 which was submitted earlier
15:23:32 Swami: there was only one patch
15:23:49 from me I mean
15:24:05 https://review.openstack.org/#/c/215467 is the one that I wasn't aware of
15:24:46 and it should fix several bugs at once
15:24:56 obondarev: that's what I meant.
15:25:21 so.. please review
15:25:40 obondarev: will do.
15:25:46 cool
15:25:56 The next one in the list is
15:26:14 #link https://bugs.launchpad.net/neutron/+bug/1450604
15:26:16 Launchpad bug 1450604 in neutron "Fix DVR multinode upstream CI testing" [Medium,In progress] - Assigned to Ryan Moats (rmoats)
15:26:56 Recently the failure rate for the multinode upstream job has shot up, and there was also a comment from Armando about the increasing failure rate in the single node check job.
15:27:19 I saw several failures due to bug 1522824
15:27:20 bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] https://launchpad.net/bugs/1522824 - Assigned to Oleg Bondarev (obondarev)
15:27:23 Has anyone noticed any specific test failures in the last two days on the single node check job for dvr?
15:28:03 No, but it is climbing along with the regular neutron job
15:28:10 obondarev: is that also affecting the single node check job failure?
15:28:28 haleyb: do you mean the multinode job or the single node job?
15:28:30 Swami: no, it's only on multinode
15:29:14 the multi-node job also had some infra-related issues where I was seeing some "SSHFailures" and "SCP" failures in the last two days.
15:29:33 another issue which affects both multinode jobs is the LiveBlockMigration test failure
15:29:42 also not dvr specific, I guess
15:30:03 obondarev: yes, you are right.
15:30:21 well, all have gone up since 1/18. I know there have been some patches on getting MTUs sorted out, but I don't know how many have merged yet
15:30:55 so it might not be fair to blame dvr only
15:31:01 So based on the discussion, do you all think that the single node check job is still under control?
15:31:48 Swami: single-node DVR? probably still higher than expected over the regular job
15:32:13 haleyb: the pattern seems similar to me. I don't see any new failures.
15:32:34 haleyb: please let me know if you have seen any new failures in single node, or if we are missing something here.
15:32:57 https://goo.gl/L1WODG
15:33:21 that shows the single-node neutron job at over 25%, dvr at maybe 35%
15:33:53 haleyb: the delta seems to have bumped up a little.
15:34:39 Could this be because of the overall gate-related failures seen in the last two days, with certain tests taking longer to complete?
15:36:09 carl_baldwin: do you think we need to investigate the delta here before merging the HA patch?
it's hard to say the issue yesterday in the gate didn't help
15:36:42 didn't help level out the failure rate, I mean
15:36:45 haleyb: I think we do need to continue investigating.
15:37:21 haleyb: I had a hard time parsing your second sentence. What issue yesterday in the gate?
15:38:18 carl_baldwin: there was a pip issue (?) I think, gate was at 15h or so due to failures
15:38:27 haleyb: I am not clear as to what difference the ha patch would make.
15:38:52 carl_baldwin: yesterday the gate had issues with some keystone upper constraints, which caused most of the tests to take longer to pass. armax had a patch for it.
15:39:26 haleyb: I did notice that the gate queue was long but wasn't sure what the cause was. It seems it has been running long for a while.
15:39:30 carl_baldwin: I think that patch had not merged yet.
15:39:38 fitoduarte: we just don't want to de-stabilize things more than today; it's not that the HA patch isn't ready, but adding fuel to the fire is what armando doesn't want
15:41:02 haleyb: ah, sorry. I thought Armando's comment was about the refactoring patch
15:41:18 We will investigate the failures on the gate further.
15:41:53 fitoduarte: yes, armando's comment was on both sides; he mentioned gate failures shooting up for dvr and also cautioned us to focus on the HA patch rather than pushing other patches.
15:41:55 fitoduarte: it was about which should merge first; I think obondarev answered that HA was the priority, but refactoring will continue
15:42:40 fitoduarte: we will investigate further on the gate failures, but that should not stop your patch from getting merged. Both can go in parallel.
15:43:13 swami: sounds good
15:43:39 Ok, let us move on.
15:44:07 The next one in the list is
15:44:09 #link https://bugs.launchpad.net/neutron/+bug/1462154
15:44:10 Launchpad bug 1462154 in neutron "With DVR Pings to floating IPs replied with fixed-ips" [High,In progress] - Assigned to ZongKai LI (lzklibj)
15:44:42 #link https://review.openstack.org/#/c/246855/
15:44:53 still under review
15:45:34 The next one in the list is
15:45:37 #link https://bugs.launchpad.net/neutron/+bug/1445255
15:45:38 Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound allowed_address_pairs does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:46:10 #link https://review.openstack.org/254439
15:46:20 I need more reviews on this patch.
15:46:57 Swami: yes, sorry, that somehow slipped off my list, will look
15:47:06 haleyb: thanks
15:47:13 that's all I had for bugs today.
15:47:55 Thanks. I will skip over gate failures since we already discussed them
15:48:09 haleyb: yes, I was about to say that.
15:48:10 #topic Performance/Scalability
15:48:50 obondarev: i see only 4 reviews left?
15:49:01 https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/improve-dvr-l3-agent-binding
15:49:10 thanks for the reviews on the scheduling refactoring patches, folks
15:49:31 haleyb: yeah, and one of them will not be needed, I guess
15:50:17 obondarev: Very nice work on this overall. I was excited to review it.
15:50:27 carl_baldwin: thanks
15:50:45 https://review.openstack.org/#/c/254837/ needs a little more work on the migration side
15:51:03 will do it and rebase soon
15:51:47 https://review.openstack.org/262558 and https://review.openstack.org/261477 are ready for review
15:53:15 obondarev: I'll bump it up in my queue again.
15:53:26 carl_baldwin: cool, thanks
15:54:09 #topic Open Discussion
15:54:28 anyone have a random item to discuss?
15:54:31 Swami: hey, can you please restore https://review.openstack.org/#/c/266026/ ?
15:55:15 I'd like to backport that chain of optimizations to stable/liberty
15:56:08 obondarev: yes, will do. I did have some issues when I tried to address merge conflicts on that patch.
15:56:32 Swami: I can upload a new patchset, I have it ready
15:56:43 obondarev: also, I have added a comment on one of your other cherry-pick patches regarding the need for the tempest patch in liberty.
15:57:02 obondarev: if you have one, just upload it, I will see what is wrong on my side.
15:57:15 Swami: missed that, will check
15:57:36 Swami: so please just restore the abandoned one
15:57:47 obondarev: ok
15:57:52 Swami: thanks
15:58:34 Swami: i will look for the iptables footprint on my test system
15:58:40 haleyb: thanks
15:58:44 obondarev: restored.
15:58:49 and keep an eye on the gate to see if it calms down
15:58:50 Swami: great
15:59:14 haleyb: obondarev: if you find any gate failures, let me know.
15:59:25 will do
15:59:31 +
15:59:54 thanks everyone for making good progress
15:59:55 haleyb: with logstash it is very difficult to find the new failures that are occurring, since it only reports the first 500 failures, and one bad patch can account for all of them.
16:00:09 Swami: o
16:00:34 thanks folks
16:00:37 i'm going to wait a few hours, or maybe tomorrow, since it's too crazy to filter now, i think
16:00:40 #endmeeting
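On the logstash limitation raised near the end of the meeting (only the first 500 hits are returned): one common workaround is to page through the result set rather than taking a single capped response. A minimal sketch of that paging pattern — `fetch_page` is a hypothetical stand-in for a from/size style query against the Elasticsearch backend, not a real logstash.openstack.org API:

```python
def fetch_all_failures(fetch_page, page_size=500):
    """Collect results past a per-query cap by paging with offset/size,
    the way one would page an Elasticsearch query behind logstash."""
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # short page means we've reached the end
        offset += page_size
    return results


# Fake backend with 1200 hits to demonstrate paging past a 500-hit cap.
hits = ["failure-%d" % i for i in range(1200)]

def fake_fetch(offset, size):
    return hits[offset:offset + size]

all_hits = fetch_all_failures(fake_fetch)
```

With the fake backend above, `all_hits` contains all 1200 entries instead of stopping at the first 500, which would make a single noisy patch less able to drown out genuinely new failures.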