15:04:07 <haleyb> #startmeeting neutron_dvr
15:04:08 <openstack> Meeting started Wed Sep 21 15:04:07 2016 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:09 <Swami> cool
15:04:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:13 <openstack> The meeting name has been set to 'neutron_dvr'
15:04:26 <haleyb> #chair Swami
15:04:26 <openstack> Current chairs: Swami haleyb
15:04:33 <haleyb> #topic Announcements
15:05:07 <haleyb> The sky is blue, the grass is green... and RC1 was cut
15:05:26 <Swami> haleyb: no rain so far.
15:06:09 <haleyb> not much here either
15:06:35 <Swami> haleyb: :-)
15:07:11 <haleyb> Any critical fixes that need to get into newton should be raised to our leader, i am but a minion
15:07:39 <Swami> haleyb: sure
15:08:23 <haleyb> So master is open, and things will start merging, but we're still trying to focus on high priority bug fixes
15:08:47 <haleyb> with that...
15:08:47 <Swami> haleyb: good
15:08:50 <haleyb> #topic Bugs
15:08:58 <Swami> haleyb: thanks
15:09:07 <Swami> this week we had one new bug reported.
15:09:26 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1625333
15:09:27 <openstack> Launchpad bug 1625333 in neutron "Booting VM with a Floating IP and pinging it via that takes a long time with errors in L3-Agent logs when using DVR" [Undecided,New]
15:09:43 <Swami> haleyb: I think you have already replied to this bug.
15:10:01 <Swami> Is this related to the ip_nonlocal_bind setting?
15:10:42 <Swami> We see the GARP fail from the FIP namespace.
15:10:58 <haleyb> Swami: yes, it seems to be, but that should have been set, so a reproducer is needed
15:11:02 <Swami> So that's the reason for the delay in response.
15:11:48 <Swami> haleyb: Yeah, there are redundant settings that set ip_nonlocal_bind. I don't see any error message about setting it.
15:12:04 <Swami> So something else might be causing this issue.
15:13:03 <haleyb> We'd need to know kernel version, or get on a system exhibiting the problem, or see the full logs for the l3-agent as i don't see them there
15:13:36 <Swami> haleyb: yes you are right. I don't think we are seeing such error logs in the gate.
15:14:15 <haleyb> i'll add another comment as i don't think it's reasonable to ask that we load, configure and run this new rally test
15:14:49 <Swami> haleyb: makes sense
15:15:50 <Swami> That's all for the new bugs, and let us go over the old bugs.
15:16:04 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612192
15:16:07 <openstack> Launchpad bug 1612192 in neutron "L3 DVR: Unable to complete operation on subnet" [High,Confirmed]
15:17:06 <Swami> Do we still see this message in the gate check jobs?
15:18:46 <Swami> You mentioned last week that we hadn't seen this error for the last 6 days. If so, should we still keep the severity as High?
15:19:19 <haleyb> sorry, got a call
15:19:43 <Swami> haleyb: no problem, go ahead
15:20:39 <Swami> The next one in the list is
15:20:53 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1612804
15:20:54 <openstack> Launchpad bug 1612804 in neutron "test_shelve_instance fails with sshtimeout" [High,Confirmed]
15:21:22 <Swami> This bug is also under the watch list.
15:21:58 <haleyb> i'm back
15:22:06 <Swami> haleyb: great
15:22:46 <haleyb> i don't think we've seen that in weeks now, i'll run a logstash and update
15:22:52 <Swami> haleyb: both the bugs mentioned above can stay on the watch list, and let us see if this has been triggered by something other than dvr
15:23:43 <haleyb> of course i jinxed myself and see them now
15:24:17 <haleyb> but i need to do some work on it, they might not be in the gate
15:24:24 <Swami> haleyb: do you mean these errors are still seen upstream?
15:25:25 <haleyb> the logstash link from the bug shows errors
15:25:49 <Swami> haleyb: ok
15:25:57 <haleyb> but i need to filter by build_queue:gate
15:26:44 <Swami> The most vulnerable tests for dvr are 'shelve_instance', 'volume_boot_pattern', dhcpv6_stateful and dhcpv6_stateless.
15:27:16 <haleyb> there have been 7 in the gate the past 7 days
15:27:49 <Swami> haleyb: ok, and is it only seen for the dvr case and not for the non-dvr case?
15:28:57 <haleyb> yes, i think just dvr
15:29:08 <Swami> haleyb: ok
15:29:18 <haleyb> actually one is ovn
15:29:30 <Swami> haleyb: thanks
15:29:36 <Swami> The next in the list is
15:29:39 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1606741
15:29:40 <openstack> Launchpad bug 1606741 in neutron "Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:29:58 <haleyb> why do i own that :)
15:30:05 <Swami> haleyb: we discussed this bug. Are we still considering it a bug that needs a fix?
15:30:29 <Swami> Not sure, someone assigned your name to the bug recently. Maybe the recent bug deputy did it.
15:30:46 <haleyb> i think i updated the review, jenkins did it
15:31:09 <Swami> It might be a minor fix where it checks for the agent type and takes necessary action.
15:31:37 <Swami> We might not be configuring it for the dvr_snat agent.
15:31:58 <haleyb> the patch needs an update at the least based on the comments
15:32:57 <Swami> haleyb: yes let us wait for the update.
15:34:35 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1571676
15:34:36 <openstack> Launchpad bug 1571676 in neutron "After binding a floating IP to VM, the static route can't work in DVR." [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:34:58 <Swami> Patch needs review #link https://review.openstack.org/#/c/308068/
15:36:16 <haleyb> i need to re-review, but think it's close
15:36:34 <Swami> haleyb: yes it should be close.
15:36:40 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1506567
15:36:41 <openstack> Launchpad bug 1506567 in neutron "No information from Neutron Metering agent" [Undecided,Confirmed]
15:37:21 <Swami> I need to work on it. I started to triage it but got distracted.
15:37:56 <Swami> That's all I had for bugs this week.
15:38:11 <haleyb> i think it has similarities to the ipv6 PD issue/fix ritesh had done, but we need a generic way for an agent to ask what namespace to work on
15:38:27 <haleyb> thanks for all the updates Swami
15:38:39 <Swami> haleyb: yes I agree.
15:39:22 <haleyb> jschwarz: any HA bugs to add?  you fixed them all, right? :)
15:40:22 <Swami> haleyb: I think there is one ha functional test issue still
15:40:24 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1580648
15:40:25 <openstack> Launchpad bug 1580648 in neutron "Two HA routers in master state during functional test" [Undecided,Opinion]
15:40:31 <jschwarz> haleyb, I hope so :)
15:40:33 <jschwarz> haleyb, I was on PTO for the last week, but I haven't received any mails about new bugs, etc.
15:40:49 <Swami> jschwarz:^^
15:41:08 <jschwarz> Swami, I'll add this one to my todo list and will have a good response by next week hopefully
15:41:33 <jschwarz> alas, I haven't got anything to add yet
15:41:56 <haleyb> jschwarz: should i assign it to you?
15:42:28 <jschwarz> haleyb, let's assign it to me with the note "John should decide if this is a bug or not" and I'll work on it
15:42:55 <jschwarz> haleyb, opinions seem to be conflicted on that one so that seems like the best in-the-middle approach
15:44:19 <haleyb> jschwarz: it's all yours now :)
15:44:24 <jschwarz> haleyb, yay! :)
15:45:26 <haleyb> #topic Gate failures
15:46:38 <haleyb> the failures had been up and down, but now it seems the tempest dvr gate job is at 10% failure
15:46:49 <haleyb> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=5&fullscreen
15:47:47 <Swami> haleyb: that is not good.
15:47:53 <haleyb> this started yesterday, but actually now i see it's falling this morning, could be directly related to the regular job
15:49:13 <haleyb> i will watch it today and look for failures, but sometimes we are just the victim
15:49:23 <Swami> haleyb: as always
15:50:03 <haleyb> #topic Open Discussion
15:50:21 <jschwarz> So I talked a few weeks back about adding a new DVR+HA gate
15:50:39 <jschwarz> Unfortunately I just lack the capacity to deal with this altogether and I've left it untouched since
15:50:48 <Swami> jschwarz: yes
15:51:07 <jschwarz> I'd be happy if some generous contributor would help me with this one
15:51:32 <haleyb> jschwarz: is it all a yaml change in the infra jobs?
15:51:37 <Swami> jschwarz: I think we should first get infra permission to run three nodes for the multinode job.
15:52:03 <Swami> jschwarz: where two nodes can run dvr_snat HA, and the remaining one can run the dvr agent type.
15:52:16 <jschwarz> haleyb, I have no idea of how to work infra stuff
15:52:25 <jschwarz> Swami, agreed
15:52:33 <haleyb> i would think we could add an experimental job that is 3 nodes
15:52:48 <jschwarz> this seems like mostly a biocratic task that I don't have the capacity to do atm
15:53:11 <haleyb> jschwarz: hmm, i have done very little there myself, but should be similar to the DVR job
15:53:26 * haleyb googles biocratic :)
15:53:39 * jschwarz 's spelling is bad ;-)
15:53:47 <Swami> haleyb: need to add config details for the ha.
15:54:17 <jschwarz> haleyb, so if you can find the exact code that hosts the DVR jobs, I can probably do some copy-pasting and propose on that, and get the discussion going on that patch instead
15:54:26 <haleyb> Swami: right, and possibly packages?
15:54:44 <Swami> jschwarz: I can point you to the patch in infra that I added for making the multinode job.
15:54:46 <haleyb> jschwarz: i'll look, but some of that infra code is kryptonite
15:54:56 <jschwarz> Swami, awesome
15:55:01 * haleyb takes a step back and lets Swami help
15:55:15 <jschwarz> ok, so let's co-operate on this one and see how it goes
15:55:24 <jschwarz> I'll do my best to put some cycles into this
15:55:30 <Swami> jschwarz: I will send you the link to the patch and we can follow up on this.
15:55:48 <jschwarz> Swami, much appreciated
15:56:15 <jschwarz> In other news, I have plans to submit some patches that refactor the l3 scheduler in the coming weeks
15:56:39 <jschwarz> me and kevinbenton worked on a DB change that got in for N, so refactors can also be backported to N now, which is good for us
15:56:39 <Swami> jschwarz: will look for it.
15:56:49 <haleyb> +1
15:56:51 <jschwarz> I'll make sure to add you guys as reviewers
15:57:12 <jschwarz> seems like haleyb likes to ask for patches to review ;-)
15:57:53 <haleyb> that's the only way i can fall asleep at night now, like reading a book
15:58:18 <Swami> haleyb: great medicine to sleep. :-)
15:58:40 <haleyb> i had one other bug that i must have forgot to add
15:58:42 <haleyb> https://bugs.launchpad.net/neutron/+bug/1624515
15:58:43 <openstack> Launchpad bug 1624515 in neutron "DVR: SNAT port not found in the list error in check jobs" [Medium,In progress] - Assigned to Brian Haley (brian-haley)
15:59:02 <Swami> haleyb: yes
15:59:11 <haleyb> Swami: oleg said he's fine assuming i add a test, so i'll fix that up
15:59:20 <Swami> haleyb: ok thanks,
15:59:56 <haleyb> we need to write a DVR book to hand out at the summit one day...
16:00:05 <jschwarz> haleyb, sounds like a plan
16:00:11 <haleyb> thanks everyone, top of hour
16:00:14 * jschwarz is working on a DVR lecture for some folks atm
16:00:19 <haleyb> #endmeeting