15:00:28 <haleyb> #startmeeting neutron_dvr
15:00:29 <openstack> Meeting started Wed Jan 20 15:00:28 2016 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:32 <openstack> The meeting name has been set to 'neutron_dvr'
15:00:40 <haleyb> #chair Swami
15:00:41 <openstack> Current chairs: Swami haleyb
15:01:40 <haleyb> let's get started
15:01:43 <haleyb> #topic Announcements
15:02:15 <haleyb> just a reminder M-2 this week
15:02:50 <carl_baldwin> o/
15:02:59 <haleyb> And thanks everyone for keeping up with the code reviews, I think we merged a lot of changes in the past week
15:03:13 <obondarev> ++
15:03:24 <Swami> good job
15:04:04 <haleyb> and Swami went through the bug list marking things with nice BOLD letters, i'll pass it to him
15:04:08 <haleyb> #topic Bugs
15:04:17 <Swami> hi
15:04:28 <Swami> This week we had two new bugs that were filed.
15:04:49 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1535928
15:04:51 <openstack> Launchpad bug 1535928 in neutron "Duplicate IPtables rule detected warning message seen in L3 agent" [Undecided,New]
15:05:37 <Swami> I have been seeing this warning message in the l3 agent log, and so went and filed a bug. I did see that there was a patch initiated by Kevin to address this, but it is still under review.
15:06:05 <Swami> #link https://review.openstack.org/#/c/255484/1
15:06:24 <haleyb> That was just downgrading the LOG message, not fixing the problem
15:06:31 <Swami> At this point I think this issue is seen both in dvr and non-dvr case.
15:07:17 <Swami> haleyb: Yes I just captured the related patch. But I think we need to fix this problem.
15:07:39 <obondarev> seems not a dvr issue
15:08:07 <Swami> haleyb: obondarev: yes it is not a dvr issue, it seems to be an issue with the iptable utils.
15:08:31 <Swami> obondarev: what would be the best "tag" for this bug.
15:08:32 <haleyb> at least it's always the same rule, so that will help track it down
15:09:23 <Swami> anything to add to this bug or can we move on.
15:09:54 <haleyb> actually, the only place that rule is added is the DVR code
15:10:23 <obondarev> ah, then it might be dvr issue
15:10:41 <Swami> haleyb: ok, then it makes sense to be tagged as dvr bug.
15:11:07 <Swami> ok, let us not change the tag.
15:11:15 <haleyb> could be, we should ping kevin
15:11:37 <Swami> kevin mentioned in his commit message that this warning message is seen in the gate.
15:11:47 <Swami> but we can clarify with him later.
15:12:11 <Swami> ok, let us move on.
15:12:20 <Swami> The next one that came in yesterday is
15:12:23 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1536110
15:12:25 <openstack> Launchpad bug 1536110 in neutron "OVS agent should fail if can't get DVR mac address" [Undecided,In progress] - Assigned to Oleg Bondarev (obondarev)
15:12:41 <Swami> obondarev already has a patch for this bug.
15:12:54 <Swami> I saw haleyb you have already provided your review comments.
15:13:08 <haleyb> nitpicking, yes
15:13:12 <Swami> #link  https://review.openstack.org/270130
15:13:23 <obondarev> yeah. I was wondering if there is any reason to continue running in non-dvr mode
15:13:46 <obondarev> if anyone is aware of such a reason please speak
15:14:13 <obondarev> otherwise it's better to fail
15:14:14 <Swami> obondarev: I will check with vivek on this, I was not sure about the reason behind this fallback option.
15:14:47 <obondarev> Swami: ok thanks
15:15:28 <Swami> ok, let us move on to the next bug.
15:15:43 <Swami> The next bug which is targeted for mitaka-2 is HA-DVR
15:16:16 <Swami> #link https://review.openstack.org/#/c/143169/
15:16:38 <Swami> This patch requires some core attention, it has been rebased and is in good shape.
15:16:57 <carl_baldwin> Swami: I will try to look today.
15:17:07 <Swami> carl_baldwin: thanks
15:17:28 <carl_baldwin> I noticed in the comments from yesterday, Armando mentions the failure rate of the DVR job.  It has been climbing a little and getting noticed at the higher levels.
15:18:05 <carl_baldwin> We should be sure to hit that topic in this meeting.
15:18:33 <Swami> carl_baldwin: yes I captured in the gate_failures section.
15:18:47 <Swami> carl_baldwin: while we move on to that section we can talk about it.
15:19:07 <carl_baldwin> sounds good
15:19:34 <Swami> The next in the list which has the mitaka tag is
15:19:37 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1504039
15:19:39 <openstack> Launchpad bug 1504039 in neutron "Linuxbridge DVR" [Wishlist,In progress] - Assigned to Hirofumi Ichihara (ichihara-hirofumi)
15:19:58 <Swami> I did see that there were a couple of related patches that went in
15:20:30 <Swami> #link https://review.openstack.org/#/c/266210/
15:21:06 <Swami> This is a related patch and I think right now it is WIP.
15:21:49 <Swami> Now let us move on to the other bugs.
15:22:08 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1522824
15:22:09 <openstack> Launchpad bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
15:22:42 <Swami> obondarev: did we get any closure on this patch #link https://review.openstack.org/#/c/215467
15:23:05 <Swami> obondarev: there were two related patches related to this fix, did you get to review the other one.
15:23:22 <obondarev> I abandoned my fix in favor of https://review.openstack.org/#/c/215467 which was submitted earlier
15:23:32 <obondarev> Swami: there was only one patch
15:23:49 <obondarev> from me I mean
15:24:05 <obondarev> https://review.openstack.org/#/c/215467 is the one that I wasn't aware of
15:24:46 <obondarev> and it should fix several bugs at once
15:24:56 <Swami> obondarev: that's what I meant.
15:25:21 <obondarev> so.. please review
15:25:40 <Swami> obondarev: will do.
15:25:46 <obondarev> cool
15:25:56 <Swami> The next one in the list is
15:26:14 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1450604
15:26:16 <openstack> Launchpad bug 1450604 in neutron "Fix DVR multinode upstream CI testing" [Medium,In progress] - Assigned to Ryan Moats (rmoats)
15:26:56 <Swami> Recently the failure rate for the multinode upstream job has shot up and also there was a comment from Armando about the increasing failure rate in the single node check job.
15:27:19 <obondarev> I saw several failures due to bug 1522824
15:27:20 <openstack> bug 1522824 in neutron "DVR multinode job: test_shelve_instance failure due to SSHTimeout" [High,In progress] https://launchpad.net/bugs/1522824 - Assigned to Oleg Bondarev (obondarev)
15:27:23 <Swami> Has anyone noticed any specific test failures in the last two days on the single node check job for dvr?
15:28:03 <haleyb> No, but it is climbing along with the regular neutron job
15:28:10 <Swami> obondarev: is that also affecting the single node check job failure.
15:28:28 <Swami> haleyb: you mean the multinode job or the single node job.
15:28:30 <obondarev> Swami: no, it's only on multinode
15:29:14 <Swami> multi-node job also had some infra related issues where I was seeing some "SSHFailures" and "SCP" failures in the last two days.
15:29:33 <obondarev> another issue which affects both multinode jobs is LiveBlockMigration test failure
15:29:42 <obondarev> also not dvr specific I guess
15:30:03 <Swami> obondarev: yes you are right.
15:30:21 <haleyb> well, all have gone up since 1/18.  I know there's been some patches on getting MTU's sorted out, but don't know how many have merged yet
15:30:55 <obondarev> so might be no fair to blame dvr only
15:31:01 <Swami> So based on the discussion, do you all think that the single node check job is still under control.
15:31:02 <obondarev> not*
15:31:48 <haleyb> Swami: single-node DVR?  probably still higher than expected over regular job
15:32:13 <Swami> haleyb: the pattern seems to be similar to me. I don't see any new failures.
15:32:34 <Swami> haleyb: please let me know if you have seen any new failures in single node or are we missing something here.
15:32:57 <haleyb> https://goo.gl/L1WODG
15:33:21 <haleyb> that shows single-node neutron job at over 25%, dvr 35% maybe
15:33:53 <Swami> haleyb: the delta seems to have bumped up a little.
15:34:39 <Swami> Could this be because of the overall gate related failures seen in the last two days, with certain tests taking a longer time to complete?
15:36:09 <haleyb> carl_baldwin: do you think we need to investigate the delta here before merging the HA patch?  it's hard to not say the issue yesterday in the gate didn't help
15:36:42 <haleyb> didn't help level-out the failure rate i mean
15:36:45 <carl_baldwin> haleyb: I think we do need to continue investigating.
15:37:21 <carl_baldwin> haleyb: I had a hard time parsing your second sentence.  What issue yesterday in the gate?
15:38:18 <haleyb> carl_baldwin: there was a pip issue (?) I think, gate was at 15h or so due to failures
15:38:27 <fitoduarte> haleyb: I am not clear as to what difference would the ha patch make. i
15:38:52 <Swami> carl_baldwin: yesterday the gate had issues with some keystone upper constraints which took a longer time for most of the tests to pass. armax had a patch for it.
15:39:26 <carl_baldwin> haleyb: I did notice that the gate queue was long but wasn't sure what the cause was.  It seems it has been running long for a while.
15:39:30 <Swami> carl_baldwin: I think, that patch had not merged yet.
15:39:38 <haleyb> fitoduarte: we just don't want to de-stabilize more than today, not that the HA patch isn't ready, but adding fuel to fire is what armando doesn't want
15:41:02 <fitoduarte> haleyb: ah sorry. I thought Armando's comment was about the refactoring patch
15:41:18 <Swami> We will investigate the failures on the gate further.
15:41:53 <Swami> fitoduarte: yes armando's comment was on both sides, he mentioned about gate failures shooting up for dvr and also cautioned us to focus on the HA patch rather than pushing other patches.
15:41:55 <haleyb> fitoduarte: it was about which should merge first, i think obondarev answered that HA was priority, but refactoring will continue
15:42:40 <Swami> fitoduarte: we will investigate further on the gate failures, but that should not stop your patch from getting merged. Both can go in parallel.
15:43:13 <fitoduarte> swami: sounds good
15:43:39 <Swami> Ok let us move on.
15:44:07 <Swami> The next one in the list is
15:44:09 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1462154
15:44:10 <openstack> Launchpad bug 1462154 in neutron "With DVR Pings to floating IPs replied with fixed-ips" [High,In progress] - Assigned to ZongKai LI (lzklibj)
15:44:42 <Swami> #link https://review.openstack.org/#/c/246855/
15:44:53 <Swami> still under review
15:45:34 <Swami> The next one in the list is
15:45:37 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1445255
15:45:38 <openstack> Launchpad bug 1445255 in neutron "DVR FloatingIP to unbound allowed_address_pairs does not work" [Low,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:46:10 <Swami> #link https://review.openstack.org/254439
15:46:20 <Swami> I need more reviews on this patch.
15:46:57 <haleyb> Swami: yes, sorry, that somehow slipped off my list, will look
15:47:06 <Swami> haleyb: thanks
15:47:13 <Swami> that's all I had for bugs today.
15:47:55 <haleyb> Thanks.  I will skip over gate failures since we already discussed
15:48:09 <Swami> haleyb: yes I was about to say that.
15:48:10 <haleyb> #topic Performance/Scalability
15:48:50 <haleyb> obondarev: i see only 4 reviews left ?
15:49:01 <haleyb> https://review.openstack.org/#/q/status:open+project:openstack/neutron+branch:master+topic:bp/improve-dvr-l3-agent-binding
15:49:10 <obondarev> thanks for reviews on scheduling refactoring patches folks
15:49:31 <obondarev> haleyb: yeah, and one of them will not be needed I guess
15:50:17 <carl_baldwin> obondarev: Very nice work on this overall.  I was excited to review.
15:50:27 <obondarev> carl_baldwin: thanks
15:50:45 <obondarev> https://review.openstack.org/#/c/254837/ needs a little more work on the migration side
15:51:03 <obondarev> will do it and rebase soon
15:51:47 <obondarev> https://review.openstack.org/262558 and https://review.openstack.org/261477 are ready for review
15:53:15 <carl_baldwin> obondarev: I'll bump it up on my queue again.
15:53:26 <obondarev> carl_baldwin: cool, thanks
15:54:09 <haleyb> #topic Open Discussion
15:54:28 <haleyb> anyone have a random item to discuss?
15:54:31 <obondarev> Swami: hey, can you please restore https://review.openstack.org/#/c/266026/ ?
15:55:15 <obondarev> I'd like to backport that chain of optimizations to stable/liberty
15:56:08 <Swami> obondarev: yes will do, I do have some issue when I try to address merge conflicts on that patch.
15:56:32 <obondarev> Swami: I can upload new patches, I have it ready
15:56:40 <obondarev> patchset*
15:56:43 <Swami> obondarev: also I have added a comment on one of your other cherry-pick patch regarding the need for the tempest patch in liberty.
15:57:02 <Swami> obondarev: if you have one just upload, I will see what is wrong on my side.
15:57:15 <obondarev> Swami: missed that, will check
15:57:36 <obondarev> Swami: so please just restore abandoned one
15:57:47 <Swami> obondarev: ok
15:57:52 <obondarev> Swami: thanks
15:58:34 <haleyb> Swami: i will look for the iptables footprint on my test system
15:58:40 <Swami> haleyb: thanks
15:58:44 <Swami> obondarev: restored.
15:58:49 <haleyb> and keep an eye on the gate to see if it calms down
15:58:50 <obondarev> Swami: great
15:59:14 <Swami> haleyb: obondarev: if you find any gate failures let me know.
15:59:25 <haleyb> will do
15:59:31 <obondarev> +
15:59:54 <haleyb> thanks everyone for making good progress
15:59:55 <Swami> haleyb: with logstash it is very difficult to find out the new failures that are occurring, since it only reports the first 500 failures and if there is one bad patch that has everything.
16:00:09 <haleyb> Swami: o
16:00:34 <Swami> thanks folks
16:00:37 <haleyb> i'm going to wait a few hours, or maybe tomorrow, since it's too crazy to filter now i think
16:00:40 <haleyb> #endmeeting