15:01:50 <carl_baldwin> #startmeeting neutron_l3
15:01:50 <johnbelamaric> hello
15:01:51 <openstack> Meeting started Thu Dec  3 15:01:50 2015 UTC and is due to finish in 60 minutes.  The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:55 <haleyb> hi
15:01:56 <openstack> The meeting name has been set to 'neutron_l3'
15:02:04 <Swami> hi
15:02:05 <carl_baldwin> #topic Announcements
15:02:14 <vikram> hi
15:02:27 * njohnston is auditing this course
15:02:29 <regXboi> moo
15:02:30 <stephen-ma> hi
15:02:50 <carl_baldwin> Mitaka-1 is pretty much now.
15:02:53 <carl_baldwin> #link https://wiki.openstack.org/wiki/Mitaka_Release_Schedule
15:03:26 <carl_baldwin> It just keeps moving along whether we're ready or not.  ;)
15:03:36 <regXboi> ...and there was much rejoicing... (yea!)
15:03:51 <carl_baldwin> Any other announcements?
15:03:54 <john-davidge> morning
15:04:35 <carl_baldwin> #topic Bugs
15:04:41 <carl_baldwin> mlavalle: Do you want to take this one?
15:04:46 <mlavalle> yeah
15:05:07 <mlavalle> Since our last meeting, we closed 3 bugs ++
15:05:23 <mlavalle> We have now 2 alive.
15:05:33 <mlavalle> First one is https://bugs.launchpad.net/neutron/+bug/1478100
15:05:33 <openstack> Launchpad bug 1478100 in neutron "DHCP agent scheduler can schedule dnsmasq to an agent without reachability to the network its supposed to serve" [High,In progress] - Assigned to Cedric Brandily (cbrandily)
15:05:51 <mlavalle> amuller and zzelle are working it
15:06:14 <mlavalle> they have a fix ready for review https://review.openstack.org/#/c/205631
15:06:31 <mlavalle> so I encourage the team to take some time and review it
15:07:16 <carl_baldwin> I was actually just thinking about this yesterday.  There are a lot of things that assume that all hosts have reachability to all networks.  Are there other places where reachability is taken into account?
15:07:17 <mlavalle> any comments on this one?
15:07:40 <amuller> carl_baldwin: I don't think there are references to bridge_mappings in other places
15:07:44 <amuller> apart from port binding obviously
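For context on the discussion above: the review adds a reachability filter to the DHCP scheduler based on bridge_mappings. A hypothetical sketch of the idea — the function name and dict shapes are illustrative, not the actual code from review 205631:

```python
# Hypothetical sketch of the reachability check under discussion: a DHCP
# agent can only serve a flat/vlan network if its bridge_mappings cover
# that network's physical_network; tunnelled networks are assumed
# reachable from any agent.  Shapes are illustrative, not Neutron's API.

def agent_can_reach(network, agent_config):
    net_type = network.get("provider:network_type")
    if net_type in ("flat", "vlan"):
        mappings = agent_config.get("bridge_mappings", {})
        return network.get("provider:physical_network") in mappings
    # vxlan/gre: reachability comes from the tunnel mesh instead
    return True

net = {"provider:network_type": "vlan",
       "provider:physical_network": "physnet1"}
print(agent_can_reach(net, {"bridge_mappings": {"physnet1": "br-eth1"}}))  # True
print(agent_can_reach(net, {"bridge_mappings": {"physnet2": "br-eth2"}}))  # False
```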
15:07:58 <amuller> carl_baldwin: mlavalle: Indeed that patch is waiting for reviews now
15:08:03 <amuller> it's pretty mature / far along
15:08:15 <amuller> Eugene has some concern which I've asked him to clarify
15:08:34 <mlavalle> amuller: yeah, but hasn't responded in a bit
15:08:40 <carl_baldwin> amuller: Thanks, I'll take a look.
15:08:49 <amuller> mlavalle: And so I'm asking for additional eyes :)
15:09:02 <carl_baldwin> I'm all for adding this, btw.
15:09:06 <amuller> I had a conversation with Cedric and I'm convinced that the patch makes sense regardless of AZs
15:09:12 <amuller> and I can defend that position
15:09:30 <carl_baldwin> amuller: +1 to that.
15:09:39 <amuller> so to my eyes it's just a matter of reviewing the actual implementation
15:10:00 * amuller will shut up now
15:10:11 <mlavalle> thanks amuller !
15:10:16 <carl_baldwin> amuller: Good work!
15:10:27 <amuller> actually Cedric did most of the work but ok =D
15:10:50 <carl_baldwin> ZZelle:  Good work!
15:11:22 <carl_baldwin> mlavalle: I think we can move on.  :)
15:11:27 <mlavalle> ok, next up is a new one https://bugs.launchpad.net/neutron/+bug/1519926
15:11:27 <openstack> Launchpad bug 1519926 in neutron "L3-agent restart causes VM connectivity loss" [High,Confirmed] - Assigned to Hong Hui Xiao (xiaohhui)
15:11:55 <mlavalle> it has an assignee but not a fix proposed yet, as far as I know
15:12:32 <mlavalle> any comments on this one?
15:13:13 <mlavalle> if not, that's all I have for today's meeting....
15:13:19 <amuller> I don't understand that bug
15:13:33 <amuller> it's stopping the agent, killing the namespace, then starting the agent? What is that supposed to simulate?
15:13:34 <carl_baldwin> stephen-ma: Are you around?
15:14:31 <stephen-ma> yes
15:15:08 <stephen-ma> amuller: it is trying to see whether the L3 agent can restore the states of all the routers after it crashed.
15:15:32 <regXboi> then why do we kill the namespace?
15:15:39 <amuller> stephen-ma: ok, what is the namespace killing supposed to simulate? What happened between the L3 agent crashing and it being respawned?
15:15:43 <carl_baldwin> stephen-ma: Is removing the namespace simulating a node crash?
15:16:12 <stephen-ma> Yes, it is supposed to test what happens if the node crashed.
15:16:30 <amuller> stephen-ma: ok but if you restart the node I don't think the bug will happen
15:16:33 <stephen-ma> I did the same test on kilo and liberty.  The test passed there.
15:16:34 <carl_baldwin> Okay, so more than an agent crash.
15:16:42 <amuller> deleting the namespace is not a good simulation of a node restart
15:16:59 <amuller> (I'm not saying we don't have a bug here, I'm just trying to understand the severity)
15:17:27 <amuller> stephen-ma: any errors in the L3 agent log when it starts?
15:17:42 <carl_baldwin> stephen-ma: Does it happen with a node restart?
15:18:00 <stephen-ma> I didn't see any errors in the L3-agent log.
15:18:12 <stephen-ma> I didn't try by actually restarting the node.
15:18:54 <carl_baldwin> stephen-ma: amuller:  I agree that it is a bug.  I'm thinking of setting it to Medium unless it does happen on node restart.
15:19:43 <carl_baldwin> I set it to High before because I thought it would, so maybe I'll keep it there until we know it doesn't happen on node restart.
15:20:15 <amuller> if it happens on restart this would be a critical bug... I *really* doubt that's the case though
15:20:36 <carl_baldwin> stephen-ma: Can we determine whether this is the case?
15:21:07 <stephen-ma> I'll try
15:21:14 <carl_baldwin> stephen-ma: Thanks.
15:21:30 <carl_baldwin> mlavalle: I think we'll take the discussion to the bug report.  We can move on.
15:21:39 <mlavalle> carl_baldwin: done with bugs
15:22:08 <carl_baldwin> mlavalle: Thanks.  I see some without severity on the list.
15:22:26 <carl_baldwin> Should we make a pass through those outside this meeting and discuss any new High ones next week?
15:22:37 <mlavalle> carl_baldwin: ++
15:22:45 <mlavalle> let's do that for next week
15:22:45 <carl_baldwin> mlavalle: Thanks
15:23:31 <carl_baldwin> regXboi: haleyb: Swami: obondarev: Is there anything we need to wrap up from yesterday's defunct DVR meeting?
15:23:52 <regXboi> carl_baldwin: the dvr jobs in the check pipeline are very sick
15:23:57 <Swami> carl_baldwin: The only thing that was a concern was the functional test failures.
15:23:58 <carl_baldwin> There were a couple of rather urgent bugs being discussed yesterday.  I wanted to just check in.
15:24:16 <amuller> haleyb is handling the DVR job failing on metering tests
15:24:20 <regXboi> carl_baldwin: I'm hoping that haleyb's infra patch to get metering back into the list of services will help them be happy
15:24:27 <Swami> carl_baldwin: I could not reproduce the failures in my local setup.
15:24:29 <regXboi> (i.e. what amuller said)
15:24:32 <amuller> the one I'm concerned about is the DVR functional tests failure
15:24:37 <amuller> which I could not reproduce locally either
15:24:41 <haleyb> i created https://review.openstack.org/#/c/252493/ to fix the other issue
15:24:42 <regXboi> the DVR functional tests failure is not consistent
15:24:48 <Swami> carl_baldwin: yes, haleyb's fix will fix the dvr job in single node.
15:24:54 <amuller> regXboi: that's always the case =p
15:25:08 <regXboi> amuller: not always - sometimes we get lucky :)
15:25:27 <carl_baldwin> amuller: Sometimes they are perfectly consistent.  All failures.  ;)
15:25:45 <regXboi> ^^^^^ is what I mean by "getting lucky"
15:26:25 <carl_baldwin> So, do we think that haleyb 's fix will cure the sickness?  I'm having trouble sensing if that is the case.
15:26:36 <Swami> carl_baldwin: at this point it is not clear whether this was introduced by a patch in neutron.
15:26:36 <regXboi> carl_baldwin: we are talking about 2 things here
15:26:54 <amuller> The functional job failure rate is higher than is acceptable for a voting job... We have to address this by either skipping these tests or solving the issue but we can't let this situation go on
15:26:56 <Swami> carl_baldwin: haleyb's fix will solve the metering problem and not the functional tests.
15:27:07 <amuller> skipping the tests is not great because this could be a legit bug
15:27:11 <regXboi> amuller is correct :(
15:27:14 <carl_baldwin> I was confused by regXboi 's statement above about making them happy.
15:27:32 <Swami> amuller: ya I agree, we should not skip the tests, since that could let more bugs through.
15:27:43 <regXboi> carl_baldwin: are you clear now?
15:28:17 <haleyb> regXboi: were you able to get the functional test to fail at all yesterday in testing?
15:28:24 <regXboi> haleyb: nope
15:28:47 <regXboi> haleyb: *but* amuller pushed a patch to change the timeout and we saw that it is not always honored
15:28:48 <carl_baldwin> regXboi: If the two are independent issues then I'm clear.
15:28:56 <regXboi> carl_baldwin: I'm assuming they are
15:29:10 <regXboi> carl_baldwin: I've seen no evidence that they aren't independent
15:29:39 <amuller> carl_baldwin: the dvr tempest job is failing because of a job configuration issue (It's not configuring the metering service plugin), a subset of the dvr functional tests are failing for another, unknown reason
15:29:50 <amuller> it's definitely independent
15:29:50 * regXboi wonders did I really just use a double negative? *shudder*
15:30:03 <Swami> amuller: agree
15:30:33 <amuller> there are several weird things with those functional test failures, chief of which is that those tests run for thousands of seconds and then either pass or fail (but this doesn't always happen; sometimes they run fine)
15:30:43 <amuller> we have a per-test timeout of 180 seconds which these tests can somehow just bypass
15:31:01 <amuller> and because those tests take so long (sometimes) the global job times out at 2 hours and we don't get the functional tests logs...
15:31:09 <amuller> which makes this harder since we can't seem to be able to reproduce locally either
15:31:37 <haleyb> and this wasn't blue box related, right?  or some other infra thing
15:31:47 <regXboi> haleyb: nope
15:31:57 <regXboi> haleyb: the errors are showing up at rax and hp
15:32:18 <regXboi> as for "some other infra thing" that might be possible, but very difficult to run down
15:32:51 <regXboi> I'm worried about the fact that tests are able to bypass the timeout
15:33:06 <regXboi> we should nail that down so that we can get reasonable logs on the failure
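One plausible way a SIGALRM-based per-test timeout (the mechanism fixtures.Timeout uses in gentle mode) can be silently bypassed is a broad `except` in test or cleanup code swallowing the timeout exception. A minimal stdlib sketch of that failure mode — hypothetical, not confirmed as what was actually happening in the gate:

```python
import signal
import time

class TestTimeout(Exception):
    pass

def run_with_timeout(budget_secs, test_func):
    # Arm a SIGALRM that raises inside the test, the way signal-based
    # per-test timeouts work.
    def on_alarm(signum, frame):
        raise TestTimeout()
    old = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(budget_secs)
    try:
        test_func()
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old)

def sloppy_test():
    try:
        time.sleep(3)   # blocks; the alarm fires at the 1s budget
    except Exception:
        pass            # broad except swallows the timeout exception...
    time.sleep(0.5)     # ...and the test keeps running past its budget

start = time.time()
run_with_timeout(1, sloppy_test)
elapsed = time.time() - start
print("test finished after %.1fs despite a 1s budget" % elapsed)
```

Under this mechanism the per-test limit would only be honored if the raised exception actually propagates out of the test, which would also explain why the job can run for thousands of seconds before the 2-hour global timeout kills it without logs.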
15:33:09 <carl_baldwin> I tried looking through logstash a bit yesterday.  Sometimes logstash steers me towards a culprit but that wasn't the case for this.
15:33:32 <regXboi> carl_baldwin: yes, that fact makes me suspect infra, but I can't find a smoking gun
15:33:42 <carl_baldwin> Who can take on the timeout issue?
15:35:03 * regXboi listens to crickets
15:35:05 <Swami> amuller was already working on the timeout issue, I thought.
15:35:11 <amuller> I ran into a dead end
15:35:19 <regXboi> amuller: what was the dead end?
15:35:27 <Swami> amuller: did you get any help from infra on this?
15:35:33 <amuller> Swami: no
15:35:40 <amuller> regXboi: I didn't learn anything =/
15:35:50 <regXboi> amuller: ah
15:36:11 <regXboi> I'll sync up with amuller in channel and see if I can find some time to look at it
15:36:15 <Swami> carl_baldwin: is there anyone from infra who can help us here?
15:37:18 <regXboi> #action regXboi to sync up with amuller and see if he can roll the ball forward on the timeout issue
15:37:24 <regXboi> there - let's move on
15:37:34 <carl_baldwin> Swami: I think I would just ask in the infra room.  Sometimes anteaya can help get things moving for an urgent issue that is stuck.
15:37:47 <Swami> carl_baldwin: thanks
15:38:15 <carl_baldwin> regXboi: amuller:  Thank you.  I should be around today if I can help, let me know.
15:38:15 <regXboi> carl_baldwin, Swami: I'm a known handle over in infra, so they usually answer me if I ask questions
15:38:42 <Swami> regXboi: sounds good.
15:38:54 <carl_baldwin> Okay.  Let's move on.
15:39:01 <carl_baldwin> #topic Routed Networks
15:39:20 <carl_baldwin> I tried to clarify the spec once again.
15:39:35 <carl_baldwin> #link https://review.openstack.org/#/c/225384/
15:39:41 <carl_baldwin> I got a +1.  Merge it!
15:40:48 <carl_baldwin> I tried to clarify the use of bridge mappings a bit based on a discussion with russellb
15:40:50 <regXboi> hahahahahaha
15:41:24 <carl_baldwin> So, Neutron will depend on the plugin for network / host mapping and the ML2 plugin with agents will use bridge mappings to provide the mapping.
15:41:34 <neiljerram> Ah, interesting, I didn't get to that part yet.
15:41:57 <regXboi> so routed networks will depend on the ML2 plugin/agents?
15:42:01 <carl_baldwin> neiljerram: I'm sure I'll need another pass to actually say it clearly.  :)
15:42:12 <carl_baldwin> regXboi: No
15:42:13 <neiljerram> No, that's exactly what carl didn't just say
15:42:31 <carl_baldwin> routed networks will depend on the plugin (whatever that may be) providing a mapping.
15:42:48 <tidwellr1> regXboi: network/host mapping becomes an implementation detail
15:42:58 <regXboi> ok... can we say "a plugin" then? "the" is a bit more definitive than "a" as an article
15:43:06 <neiljerram> But that means it's an implementation point and not something described on the API
15:43:49 <regXboi> sorry for the grammar police, folks :/
15:44:05 <carl_baldwin> Anyway, go tear up the spec for me again, please.  :)
15:44:17 <neiljerram> But I guess that's OK as it only applies to Networks that are being used as part of an IpNetwork.
15:44:22 <carl_baldwin> Seriously though, I don't know what I'd do without good feedback on the spec.  I appreciate it very much.
15:44:37 <neiljerram> Will review again later.
15:44:41 <regXboi> carl_baldwin: it is on my list to do today
15:45:03 <vikram> Still going through ... It's massive ;)
15:45:29 <carl_baldwin> vikram: Thanks for your help with the API / model patch.
15:45:54 <vikram> carl_baldwin: Real game needs to begin now
15:46:00 <carl_baldwin> neiljerram: Thanks for yours too.  You gave some valuable feedback.
15:46:27 <neiljerram> Thanks, I'll try to continue that :-)
15:46:58 <carl_baldwin> Right now, I'm very interested in creating ports on IpNetworks and binding to a Network / IP address later.
15:47:27 <carl_baldwin> I'm also very interested in diving in to the Nova / Neutron interaction.
15:47:52 <carl_baldwin> I'll ping johnthetubaguy for some more Nova attention for this spec.
15:48:08 <neiljerram> I agree, but we need a working DB model in order to play with more code, and I think we need the spec to be closer to done in order to decide what the right working DB model is.
15:48:28 <vikram> neiljerram:++
15:48:50 <neiljerram> So for me iterating on the spec is still the most important thing
15:49:36 <carl_baldwin> neiljerram: I agree that we still need to iterate on the spec.  On the other hand, I think some tinkering with it will give us valuable insights to get the spec right.
15:49:59 <neiljerram> That is definitely true too!  This is a hard change :-)
15:51:56 <carl_baldwin> Later today, I'm going to chat with kevinbenton a bit about the Nova part too.
15:52:29 <carl_baldwin> Well, we're running out of time.  Let's move on more quickly.
15:52:37 <carl_baldwin> #topic Address Scopes
15:52:57 <carl_baldwin> I'm hoping to get more patches merged.  If you're a core, could you take a look at the topic?
15:53:33 <carl_baldwin> #link https://review.openstack.org/#/q/status:open+topic:bp/address-scopes,n,z
15:53:37 <carl_baldwin> #topic BGP
15:53:54 <carl_baldwin> vikram: tidwellr1: ping
15:54:03 <vikram> carl_baldwin: pong
15:54:06 <tidwellr1> hi
15:55:14 <tidwellr1> not ready to have it hit too hard with reviews yet
15:55:22 <tidwellr1> between my wife having a baby and thanksgiving holiday, it's been slow going the last couple weeks
15:55:35 <carl_baldwin> tidwellr1: Congratulations!
15:55:43 <mlavalle> congrats tidwellr1 !!!!
15:55:54 <neiljerram> congratulations!
15:56:08 <neiljerram> Happy sleeping!
15:56:13 <vikram> great news ryan! congratulations!
15:56:23 <johnbelamaric> congratulations Ryan!
15:56:27 * carl_baldwin was trying not to spill the beans before tidwellr1. Now he can relax.
15:56:33 <tidwellr1> neiljerram: ;)
15:56:42 <mlavalle> carl_baldwin: LOL
15:57:08 <carl_baldwin> Well, we have 4 minutes.  What shall we do with those?
15:57:12 <regXboi> so, what are the details?
15:57:15 <vikram> carl_baldwin: Just resumed the work this week.. after a couple of weeks.. so no progress from my side either
15:57:17 <regXboi> i.e. boy/girl, etc. etc. :)
15:57:21 <tidwellr1> I expect to have some code in shape to start merging before the end of the year
15:57:35 <tidwellr1> ah yes, we had a little girl
15:57:37 <carl_baldwin> #topic Open Discussion (or just baby news)
15:57:54 <regXboi> congratulations!
15:58:01 <neiljerram> Does she have a Gerrit account yet?
15:58:08 <vikram> tidwellr1: what name are you thinking?
15:58:10 <tidwellr1> 6lbs 11oz, 19.6 in long, thankfully made it to the hospital instead of doing this on the side of the road
15:58:33 <regXboi> always a good thing :)
15:58:37 <tidwellr1> yes!
15:59:05 <mlavalle> carl_baldwin: from the DNS side, finished coding all the functionality before Thanksgiving. Got review feedback to implement it as a full extension. Did that last week and debugging now. Testing will be done today and tomorrow and should be good for reviews on Monday
15:59:08 <tidwellr1> @vikram her name is Isla (think how you pronounce "island")
15:59:21 <vikram> tidwellr1: nice ;)
15:59:36 <mlavalle> carl_baldwin: also the Nova spec was merged
15:59:43 <regXboi> tidwellr1: very nice - I'll have to remember that one
15:59:48 <carl_baldwin> mlavalle: Yeah!
16:00:04 <carl_baldwin> #endmeeting