15:00:37 <carl_baldwin> #startmeeting neutron_l3
15:00:38 <regXboi> carl_baldwin: I made it :)
15:00:39 <openstack> Meeting started Thu Sep 17 15:00:37 2015 UTC and is due to finish in 60 minutes.  The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:42 <openstack> The meeting name has been set to 'neutron_l3'
15:00:45 <mlavalle> ahoy
15:00:49 <carl_baldwin> #topic Announcements
15:01:03 <carl_baldwin> The RC period starts next week.
15:01:25 <carl_baldwin> #link https://wiki.openstack.org/wiki/Liberty_Release_Schedule
15:01:35 <neiljerram> o/
15:01:57 <carl_baldwin> We’ve got three candidates for PTL
15:02:01 <carl_baldwin> #link https://review.openstack.org/#/q/project:openstack/election+file:%255Ecandidates/mitaka/Neutron/.*,n,z
15:02:33 <Swami> hi
15:02:44 <carl_baldwin> Your vote is important.  The last election for Neutron was won by one vote if I recall correctly.  ;)
15:02:56 <neiljerram> It's good of those people to stand.
15:03:03 <carl_baldwin> +1
15:03:14 <regXboi> more should feel free :)
15:03:24 <carl_baldwin> Any other announcements?
15:03:45 <carl_baldwin> regXboi: I think nominations are now closed.
15:04:04 <regXboi> carl_baldwin: that was more of a statement for next time
15:04:35 <carl_baldwin> I see
15:04:57 <carl_baldwin> Let’s move on to bugs.
15:05:00 <carl_baldwin> #topic Bugs
15:05:08 <mlavalle> hi
15:05:15 <carl_baldwin> #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam#Agenda
15:05:17 <carl_baldwin> mlavalle: hi
15:05:31 <mlavalle> last week we were tracking 2 critical bugs
15:05:39 <carl_baldwin> Looks like they’re both doing well.
15:06:03 <mlavalle> yeah, I helped to diagnose one of them and armax proposed a fix over the weekend
15:06:11 <mlavalle> no events since then
15:06:21 <carl_baldwin> Good work.
15:06:31 <mlavalle> the other one had a fix by armax on labor day
15:06:36 <mlavalle> no events since then
15:06:44 <mlavalle> so thank you very much to armax
15:07:15 <mlavalle> I kept 2 high importance bugs in the agenda
15:07:29 <armax> mlavalle: thank you guys for staying on top of things
15:07:37 <carl_baldwin> mlavalle: Any idea where we’re at with HA/DVR?
15:07:45 <mlavalle> https://bugs.launchpad.net/neutron/+bug/1365473
15:07:46 <openstack> Launchpad bug 1365473 in neutron "Unable to create a router that's both HA and distributed" [High,In progress] - Assigned to Adolfo Duarte (adolfo-duarte)
15:08:03 <mlavalle> Adolfo had been actively pushing fixes to this
15:08:17 <mlavalle> seems to be making progress, although it seems difficult to fix
15:08:17 <Swami> carl_baldwin: adolfo mentioned that he was rebasing his patch yesterday
15:08:38 <mlavalle> there are 2 fixes, one for the server and one for the agent
15:08:39 <amuller> sadly jschwarz is away on PTO
15:08:51 <amuller> so he won't be able to help there any more for the upcoming week or two
15:09:08 <Swami> carl_baldwin: There was some functional test that broke because of a recent change in upstream and adolfo is working on it.
15:09:13 <mlavalle> last time adolfo pushed code was the end of last week
15:09:13 <mlavalle> so i say let's give adolfo time to keep making progress
15:09:40 <carl_baldwin> amuller: Any idea where things stood when jschwarz left?
15:09:42 <Swami> mlavalle: last week he was in training the whole week.
15:09:52 <amuller> not great...
15:10:11 <amuller> I asked John to create an etherpad for it
15:10:14 <amuller> which is linked from the bug report
15:10:28 <amuller> https://etherpad.openstack.org/p/DVR_HA_Routers
15:11:15 <carl_baldwin> amuller: Thanks for calling attention to that.
15:11:20 <Swami> amuller: thanks for the link, I will communicate to adolfo.
15:12:15 <carl_baldwin> For bug #1494351, pavel has a fix up marked WIP.
15:12:16 <openstack> bug 1494351 in neutron "Observed StaleDataError in gate-neutron-dsvm-api tests if reference IPAM driver is used" [High,In progress] https://launchpad.net/bugs/1494351 - Assigned to Pavel Bondar (pasha117)
15:12:22 <carl_baldwin> #link https://review.openstack.org/#/c/223123/
15:12:23 <mlavalle> correct
15:12:41 <regXboi> carl_baldwin, mlavalle: I don't have anything to add to the list this week - the backlog is now down to 14 bugs without fix commits for RC1 :)
15:13:01 <mlavalle> pavel_bondar: thanks for that progress!
15:13:05 <regXboi> so I will go through these and pull up ones for next week that could use some review love
15:13:28 <carl_baldwin> regXboi: That is great progress.
15:13:42 <regXboi> carl_baldwin: thanks to all the folks pitching in!
15:13:51 <pavel_bondar> I need to rework current fix to use compare-and-swap logic to make it work with Galera, so it may take time
15:14:13 <carl_baldwin> pavel_bondar: Thanks.
15:14:22 <mlavalle> regXboi: I kept https://bugs.launchpad.net/neutron/+bug/1450982
15:14:22 <openstack> Launchpad bug 1450982 in neutron "DVR: Moving FloatingIP between hosts can lead to Floating Agent Gateway Port not being deleted" [Medium,Fix committed] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:14:29 <mlavalle> in the agenda
15:14:41 <regXboi> mlavalle: I believe all of the patches for that have merged?
15:14:41 <mlavalle> but I think all the fixes are in, so if that's ok with you, i'll remove it
15:15:06 <Swami> mlavalle: the above bug was fixed and merged.
15:15:18 <regXboi> Swami: thanks, beat me to asking the question :)
15:15:26 <mlavalle> regXboi, Swami: ok, i'll remove it from tracking
15:15:38 <Swami> mlavalle: thanks
15:15:59 <carl_baldwin> Any other bugs to mention?
15:16:16 <Swami> carl_baldwin: there was another bug that I filed yesterday.
15:16:16 <mlavalle> finally there are medium importance bugs that need pruning.... i'll work on it this week
15:16:25 <Swami> carl_baldwin: I have a patch for it
15:16:47 <carl_baldwin> Swami: That’s right, do you have a link for it?
15:16:47 <Swami> #link https://review.openstack.org/#/c/224250/
15:17:01 <carl_baldwin> Swami: thanks
15:17:28 <mlavalle> Swami: I'll track it for next week
15:17:36 <Swami> mlavalle: thanks
15:17:53 <carl_baldwin> #topic Routed network segments
15:18:33 <carl_baldwin> Work is continuing on this.  I will have a spec up by tomorrow.
15:18:56 <neiljerram> Looking forward to that!
15:19:08 <tidwellr> then the party really starts ;0
15:19:33 <carl_baldwin> :)
15:19:43 <carl_baldwin> We should have a lot more to discuss next week.
15:20:03 <carl_baldwin> #topic DVR
15:20:28 <Swami> carl_baldwin: hi
15:20:31 <carl_baldwin> DVR came up in a big way at the QA sprint this week.  regXboi, sc68cal, dougwig, and I represented Neutron there.
15:20:42 <sc68cal> :)
15:20:47 <regXboi> :) :)
15:20:49 <mlavalle> nice representation
15:20:53 <regXboi> (oh my)
15:21:00 <mlavalle> we sent the big guns!
15:21:00 <carl_baldwin> What kicked off the discussion was the failure rate of the DVR job.
15:21:30 <carl_baldwin> #link http://goo.gl/j5UkwT
15:21:39 <regXboi> sc68cal - do you have link to the email you sent to the mailing list as well?
15:21:57 <sc68cal> regXboi: sure
15:22:12 <Swami> carl_baldwin: but that was a spike because of bug introduced on a patch.
15:22:25 <sc68cal> http://lists.openstack.org/pipermail/openstack-dev/2015-September/074433.html
15:22:25 <carl_baldwin> #link http://lists.openstack.org/pipermail/openstack-dev/2015-September/074445.html
15:22:37 <carl_baldwin> Swami: It wasn’t the spike in the multi-node job that sparked the discussion.
15:22:37 <Swami> carl_baldwin: The main thing that we need to solve in DVR is consistency
15:23:06 <carl_baldwin> It was the gate-tempest-dsvm-neutron-dvr job that sparked the discussion.
15:23:25 <regXboi> there are *lots* of things to solve in DVR
15:23:32 <Swami> carl_baldwin: we have always seen a 10-12% deviation in the DVR job and all those are related to session timed out.
15:23:57 <carl_baldwin> For those of us who have looked at this graph for a long time now, we know that this job consistently doubles the failure rate of the gate-tempest-dsvm-neutron-full job.
15:24:11 <regXboi> carl_baldwin: ack and #sadpanda
15:24:33 <Swami> carl_baldwin: right now the only difference in floating ip is basically we do have two namespaces for the packets to travel through and we need to figure out a way to debug where it is exactly failing.
15:25:12 <carl_baldwin> Swami: Right, and we need to get to the bottom of it.
15:25:17 <mlavalle> Swami: would it be a metter of training people so we have more eyes to debug issues?
15:25:26 <mlavalle> matter^^^
15:25:40 <Swami> mlavalle: sure I can help anyone if they are interested in poking around.
15:26:18 <mlavalle> Swami: if that helps, we can send an invitation to the ML asking for volunteers....
15:26:18 <Swami> mlavalle: Some of the tempest tests do not wait for the floatingip status.
15:27:05 <sc68cal> mlavalle: think my post to the ML may cover the ask for volunteers
15:27:20 <regXboi> Swami: that is because some of the scenario tests use nova apis for creating floating IPs and they *can't* check for status
15:27:20 <Swami> mlavalle: a long time back I tried to add a patch to check for status and then do an ssh. But that patch got -1 since people said that some of the nova network tests do not have floatingip status.
15:27:43 <Swami> regXboi: you are right, you beat me again.
15:27:46 <mlavalle> sc68cal: yeah! you covered that base
15:28:13 <regXboi> Swami: I think we need to file a bug against the nova legacy apis that if they are going to return 201 then internally they have to check status
15:28:14 <Swami> regXboi: mlavalle: so my question is how do we tackle this problem.
15:28:21 <regXboi> ^^^^
15:28:31 <haleyb> one thing that I'm guilty of myself is always doing a 'recheck' when that job failed, so we were letting possible race conditions survive.  I've seen some interesting errors this past week that I'm filing bugs for
15:28:31 <carl_baldwin> regXboi and haleyb have graciously agreed to devote some of their time to this.  I’d like for them to lead the effort to root cause this.
15:28:57 <Swami> haleyb: thanks
15:29:13 <regXboi> I talked to mtreinish about this very issue during the sprint and his suggestion was to file the bug on nova
15:29:17 <Swami> carl_baldwin: I will also help them to root cause the issue.
15:29:35 <regXboi> because we don't have to change the API behavior, we need to change what it does before the API behavior
15:29:39 <carl_baldwin> Swami: Much appreciated, thanks.
15:30:23 <haleyb> Swami: sometimes it's ovs, or some other thing like "no router namespace", either way investigating more is a good thing
15:30:23 <Swami> regXboi: haleyb: I can sync up with you guys; please keep me in the loop.
15:30:25 <carl_baldwin> I’ll leave it to regXboi and haleyb to decide what the best way to coordinate and communicate is.
15:31:04 <Swami> haleyb: yes I agree.
15:31:40 <Swami> carl_baldwin: so what was the decision taken in the qa meeting.
15:32:02 <Swami> carl_baldwin: should dvr completely line up with CVR before we re-enable the voting.
15:32:45 <regXboi> I would say we should be *much* closer than we are now, but people should be prepared for patches to be -1'd because the n-v dvr job failed
15:33:15 <carl_baldwin> Swami: There will always be instantaneous differences in the failure rates.  But, averaging over a period of recent history should show that they’re about the same.
15:33:15 <Swami> regXboi: haleyb: carl_baldwin: One thing we should also keep in mind is to look at the non-voting dvr job before pushing in any patch; that would save my life.
15:34:02 <Swami> carl_baldwin: got it. thanks.
15:34:46 <haleyb> Swami: i hope you're not putting neutron on a defibrillator :)
15:34:51 <carl_baldwin> We should mention this in the neutron meeting as a reminder to core reviewers to pay attention to oven the non-voting jobs.
15:35:07 <carl_baldwin> s/oven/even/
15:35:10 <regXboi> carl_baldwin: +1
15:35:10 <mlavalle> I think also, that until further notice, we should keep DVR as an explicit agenda topic during this weekly meeting
15:35:41 <mtreinish> carl_baldwin: one option is to make it voting, but on neutron patches only
15:35:49 <mtreinish> but then y'all have to suffer the pain of a 25% fail rate
15:35:49 <haleyb> carl_baldwin: armando did just send out a review about being 'effective', seems like an addition there
15:35:49 <Swami> mlavalle: sure I will try to join the meeting every week. I had to drop off my kids during this time, so I have not joined for a while. But I will make alternate arrangements.
15:36:21 <carl_baldwin> mtreinish: We almost went that way but decided against it.
15:36:24 <carl_baldwin> haleyb: link?
15:36:33 <regXboi> mtreinish: greetings!
15:36:37 <haleyb> https://review.openstack.org/#/c/224419/
15:37:24 <haleyb> i haven't even read it yet myself
15:37:33 <mlavalle> Swami: thanks!
15:38:31 <mtreinish> carl_baldwin: ok sure
15:38:44 <carl_baldwin> mtreinish: Thanks.
15:39:05 <carl_baldwin> I feel like we’re about at the end of this topic, anything else?
15:39:27 <regXboi> carl_baldwin: not this week - there is more, but the prep work isn't done yet and that's on me
15:39:32 <Swami> carl_baldwin: thanks, I don't have anything .
15:40:12 <carl_baldwin> Thanks, all.
15:40:18 <carl_baldwin> #topic BGP dynamic routing
15:40:24 <tidwellr> hi
15:40:30 <carl_baldwin> tidwellr: vikram: hi
15:40:54 <vikram> hi
15:41:17 <vikram> sorry, i don't have much update from my end as I was on vacation..
15:41:33 <vikram> thanks to tidewellr for taking care of things
15:41:41 <vikram> thanks to tidwellr for taking care of things
15:41:41 <tidwellr> good progress in the last week, we've worked out some more kinks. Neutron is now peering and advertising routes with other routers
15:42:17 <vikram> Oh My God!!! We did it;)
15:42:32 <carl_baldwin> That sounds like great progress.
15:42:49 <mlavalle> tidwellr: ++
15:42:49 <tidwellr> attention now turns to getting the agent to report status of peering sessions and getting Neutron to push the correct routes through the dr_agent
15:44:06 <carl_baldwin> Anything to discuss here?
15:44:32 <tidwellr> don't need any time here
15:44:58 <carl_baldwin> tidwellr: Thanks.
15:45:07 <vikram> great work tidwellr
15:45:11 <carl_baldwin> #topic DNS
15:45:15 <carl_baldwin> mlavalle: hi
15:45:17 <mlavalle> hi again
15:45:17 <mlavalle> progress was a little slower this week, diagnosing bugs and taking a few days off (daughter got engaged in Washington DC)
15:46:24 <mlavalle> however, I managed to crank some code in the airplanes, while wife was sleepong
15:46:32 <mlavalle> sleeping^^^^
15:46:50 <carl_baldwin> mlavalle: congrats to daughter!
15:46:52 <mlavalle> I addressed carl_baldwin 's comments in his last review
15:47:14 <mlavalle> he pointed correctly that I was forgetting PTR records
15:47:14 <mlavalle> so I added code for that and also the alembic migration script
15:48:04 <mlavalle> pushed PS8 yesterday, which I am going to debug today in my sandbox
15:48:09 <carl_baldwin> #link https://review.openstack.org/#/c/212213/
15:48:25 <johnbelamaric> mlavalle: congrats. you should have told me you were here in town :)
15:49:00 <mlavalle> also yesterday had a conversation with Kiall (designate PTL) and mestery about adding neutron + designate to the gate
15:49:00 <mlavalle> the decision was to do it experimentally during the rest of Liberty and aim to merge early in M
15:50:13 <mlavalle> expect PS( later today, with the results of my debugging session
15:50:19 <mlavalle> PS9^^^
15:50:19 <mlavalle> and at that point I will ask for more reviews, also from the Designate team
15:51:00 <mlavalle> I want them to see the way I am handling the PTR records
15:51:05 <carl_baldwin> mlavalle: +1.  Encouraging reviewers.
15:51:12 <mlavalle> that's it for this week
15:51:26 <carl_baldwin> mlavalle: Thank you.
15:51:45 <carl_baldwin> #topic Open Discussion
15:51:54 <carl_baldwin> Any other topics needing attention this week?
15:53:51 <carl_baldwin> Thanks, all!
15:53:54 <carl_baldwin> #endmeeting