15:01:50 #startmeeting neutron_l3
15:01:50 hello
15:01:51 Meeting started Thu Dec 3 15:01:50 2015 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:53 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:55 hi
15:01:56 The meeting name has been set to 'neutron_l3'
15:02:04 hi
15:02:05 #topic Announcements
15:02:14 hi
15:02:27 * njohnston is auditing this course
15:02:29 moo
15:02:30 hi
15:02:50 Mitaka-1 is pretty much now.
15:02:53 #link https://wiki.openstack.org/wiki/Mitaka_Release_Schedule
15:03:26 It just keeps moving along whether we're ready or not. ;)
15:03:36 ...and there was much rejoicing... (yea!)
15:03:51 Any other announcements?
15:03:54 morning
15:04:35 #topic Bugs
15:04:41 mlavalle: Do you want to take this one?
15:04:46 yeah
15:05:07 Since our last meeting, we closed 3 bugs ++
15:05:23 We have now 2 alive.
15:05:33 First one is https://bugs.launchpad.net/neutron/+bug/1478100
15:05:33 Launchpad bug 1478100 in neutron "DHCP agent scheduler can schedule dnsmasq to an agent without reachability to the network its supposed to serve" [High,In progress] - Assigned to Cedric Brandily (cbrandily)
15:05:51 amuller and zzelle are working it
15:06:14 they have a fix ready for review https://review.openstack.org/#/c/205631
15:06:31 so I encourage the team to take some time and review it
15:07:16 I was actually just thinking about this yesterday. There are a lot of things that assume that all hosts have reachability to all networks. Are there other places where reachability is taken in to account?
15:07:17 any comments on this one?
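The reachability concern raised for bug 1478100 can be illustrated with a small sketch. It assumes, hypothetically, that each agent reports a `bridge_mappings` dictionary (physical network name to bridge name) and that a network names the physical network it lives on. `Agent`, `eligible_agents`, and the sample hosts are invented for this example; this is not the code under review in 205631.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    """Hypothetical stand-in for a DHCP agent record."""
    host: str
    # Assumed shape: physical network name -> local bridge name.
    bridge_mappings: Dict[str, str] = field(default_factory=dict)


def eligible_agents(agents: List[Agent], physical_network: str) -> List[Agent]:
    """Keep only agents whose host has a bridge for the given physical network."""
    return [a for a in agents if physical_network in a.bridge_mappings]


agents = [
    Agent("host-a", {"physnet1": "br-eth1"}),
    Agent("host-b", {"physnet2": "br-eth2"}),
    Agent("host-c", {"physnet1": "br-eth1", "physnet2": "br-eth2"}),
]

# Only host-a and host-c can reach physnet1, so only they are candidates.
print([a.host for a in eligible_agents(agents, "physnet1")])
```

A scheduler that skips this kind of filter may pick host-b for a network on physnet1, which is the situation the bug title describes.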
15:07:40 carl_baldwin: I don't think there's references to bridge_mappings in other places
15:07:44 apart from port binding obviously
15:07:58 carl_baldwin: mlavalle: Indeed that patch is waiting for reviews now
15:08:03 it's pretty mature / far long
15:08:15 Eugene has some concern which I've asked him to clarify
15:08:21 far along*
15:08:34 amuller: yeah, but hasn't responded in a bit
15:08:40 amuller: Thanks, I'll take a look.
15:08:49 mlavalle: And so I'm asking for additional eyes :)
15:09:02 I'm all for adding this, btw.
15:09:06 I had a conversation with Cedric and I'm convinced that the patch makes sense regardless of AZs
15:09:12 and I can defend that position
15:09:30 amuller: +1 to that.
15:09:39 so to my eyes it's just a matter of reviewing the actual implementation
15:10:00 * amuller will shut up now
15:10:11 thanks amuller !
15:10:16 amuller: Good work!
15:10:27 actually Cedrid did most of the work but ok =D
15:10:31 Cedric*
15:10:50 ZZelle: Good work!
15:11:22 mlavalle: I think we can move on. :)
15:11:27 ok, next up is a new one https://bugs.launchpad.net/neutron/+bug/1519926
15:11:27 Launchpad bug 1519926 in neutron "L3-agent restart causes VM connectivity loss" [High,Confirmed] - Assigned to Hong Hui Xiao (xiaohhui)
15:11:55 it has an assignee but not a fix proposed yet, as far as I know
15:12:32 any comments on this one?
15:13:13 if not, that's all I have for today's meeting....
15:13:19 I don't understand that bug
15:13:33 it's stopping the agent, killing the namespace, then starting the agent? What is that supposed to simulate?
15:13:34 stephen-ma: Are you around?
15:14:31 yes
15:15:08 amuller: it is trying to see whether the L3 agent can restore the states of all the routers after it crashed.
15:15:32 then why do we kill the namespace?
15:15:39 stephen-ma: ok, what is the namespace killing supposed to simulate? What happened between the L3 agent crashing and it being respawned?
15:15:43 stephen-ma: Is removing the namespace simulating a node crash?
15:16:12 Yes, it is supposed to test what happens if the node crashed.
15:16:30 stephen-ma: ok but if you restart the node I don't think the bug will happen
15:16:33 I did the same test on kilo and liberty. The test passed there.
15:16:34 Okay, so more than an agent crash.
15:16:42 deleting the namespace is not a good simulation of a node restart
15:16:59 (I'm not saying we don't have a bug here, I'm just trying to understand the severity)
15:17:27 stephen-ma: any errors in the L3 agent log when it starts?
15:17:42 stephen-ma: Does it happen with a node restart?
15:18:00 I didn't see any errors in the L3-agent log.
15:18:12 I didn't try by actually restarting the node.
15:18:54 stephen-ma: amuller: I agree that it is a bug. I'm thinking of setting it to Medium unless it does happen on node restart.
15:19:43 I set it to High before because I thought it would so maybe I'll keep it there until we know it doesn't happen on node restart.
15:20:15 if it happens on restart this would be a critical bug... I *really* doubt that's the case though
15:20:36 stephen-ma: Can we determine whether this is the case?
15:21:07 I'll try
15:21:14 stephen-ma: Thanks.
15:21:30 mlavalle: I think we'll take the discussion to the bug report. We can move on.
15:21:39 carl_baldwin: done with bugs
15:22:08 mlavalle: Thanks. I see some without severity on the list.
15:22:26 Should we make a pass through those outside this meeting and discuss any new High ones next week?
15:22:37 carl_baldwin: ++
15:22:45 let's do that for next week
15:22:45 mlavalle: Thanks
15:23:31 regXboi: haleyb: Swami: obondarev: Is there anything we need to wrap up from yesterday's defunct DVR meeting?
15:23:52 carl_baldwin: the dvr jobs in the check pipeline are very sick
15:23:57 carl_baldwin: The only thing that was a concern was the functional test failures.
15:23:58 There were a couple of rather urgent bugs being discussed yesterday.
I wanted to just check in.
15:24:16 haleyb is handling the DVR job failing on metering tests
15:24:20 carl_baldwin: I'm hoping that haleyb's infra patch to get metering back into the list of services will help them be happy
15:24:27 carl_baldwin: I could not reproduce the failures in my local setup.
15:24:29 (i.e. what amuller said)
15:24:32 the one I'm concerned about is the DVR functional tests failure
15:24:37 which I could not reproduce locally either
15:24:41 i created https://review.openstack.org/#/c/252493/ to fix the other issue
15:24:42 the DVR functional tests failure is not consistent
15:24:48 carl_baldwin: yes haleyb's fix will fix the dvr job in single node.
15:24:54 regXboi: that's always the case =p
15:25:08 amuller: not always - sometimes we get lucky :)
15:25:27 amuller: Sometimes they are perfectly consistent. All failures. ;)
15:25:45 ^^^^^ is what I mean by "getting lucky"
15:26:25 So, do we think that haleyb 's fix will cure the sickness? I'm having trouble sensing if that is the case.
15:26:36 carl_baldwin: at this point it is not clear if this would have been introduced by a patch in neutron.
15:26:36 carl_baldwin: we are talking about 2 things here
15:26:54 The functional job failure rate is higher than it can be for a voting job... We have to address this by either skipping these tests or solving the issue but we can't let this situation go on
15:26:56 carl_baldwin: haleyb's fix will solve the metering problem and not the functional tests.
15:27:07 skipping the tests is not great because this could be a legit bug
15:27:11 amuller is correct :(
15:27:14 I was confused by regXboi 's statement above about making them happy.
15:27:32 amuller: ya I agree, we should not skip the tests, since this will introduce more bugs.
15:27:43 carl_baldwin: are you clear now?
15:28:17 regXboi: were you able to get the functional test to fail at all yesterday in testing?
15:28:24 haleyb: nope
15:28:47 haleyb: *but* amuller pushed a patch to change the timeout and we saw that it is not always honored
15:28:48 regXboi: If the two are independent issues then I'm clear.
15:28:56 carl_baldwin: I'm assuming they are
15:29:10 carl_baldwin: I've seen no evidence that they aren't independent
15:29:39 carl_baldwin: the dvr tempest job is failing because of a job configuration issue (It's not configuring the metering service plugin), a subset of the dvr functional tests are failing for another, unknown reason
15:29:50 it's definitely independent
15:29:50 * regXboi wonders did I really just use a double negative? *shudder*
15:30:03 amuller: agree
15:30:33 there's several weird things with those functional test failures, chief of which is that those tests run for thousands of seconds and then either pass or fail (But this doesn't always happen, sometimes they run fine)
15:30:43 we have a per-test timeout of 180 seconds which these tests can somehow just bypass
15:31:01 and because those tests take so long (sometimes) the global job times out at 2 hours and we don't get the functional tests logs...
15:31:09 which makes this harder since we can't seem to be able to reproduce locally either
15:31:37 and this wasn't blue box related, right? or some other infra thing
15:31:47 haleyb: nope
15:31:57 haleyb: the errors are showing up at rax and hp
15:32:18 as for "some other infra thing" that might be possible, but very difficult to run down
15:32:51 I'm worried about the fact that tests are able to bypass the timeout
15:33:06 we should nail that down so that we can get reasonable logs on the failure
15:33:09 I tried looking through logstash a bit yesterday. Sometimes logstash steers me towards a culprit but that wasn't the case for this.
15:33:32 carl_baldwin: yes, that fact makes me suspect infra, but I can't find a smoking gun
15:33:42 Who can take on the timeout issue?
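For readers unfamiliar with per-test timeouts, here is a minimal sketch of one common way to enforce one, using a POSIX alarm signal. It is an illustration only, not Neutron's actual test harness: `run_with_timeout`, `PerTestTimeout`, and the 0.2-second limit are hypothetical names and values chosen for the example (the log mentions a 180-second limit). A signal-based timeout like this only fires once the interpreter's main thread gets to run the handler, which is one generic way a test could outlast its limit; nothing in the discussion confirms that this is what happened here.

```python
import signal
import time


class PerTestTimeout(Exception):
    """Raised when the wrapped callable exceeds its time limit."""


def run_with_timeout(func, seconds):
    """Run func() and raise PerTestTimeout if it takes longer than `seconds`.

    Relies on SIGALRM, so it works only on POSIX systems and only when
    called from the main thread.
    """
    def _on_alarm(signum, frame):
        raise PerTestTimeout("exceeded %s seconds" % seconds)

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    try:
        run_with_timeout(lambda: time.sleep(2), 0.2)
    except PerTestTimeout as exc:
        print("timed out:", exc)
```

The `finally` clause matters: without cancelling the timer, a leftover alarm could fire during a later, unrelated test.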
15:35:03 * regXboi listens to crickets
15:35:05 amuller was already working on the timeout issue, I thought.
15:35:11 I ran in to a dead end
15:35:19 amuller: what was the dead end?
15:35:27 amuller: did you get any help from infra on this.
15:35:33 Swami: no
15:35:40 regXboi: I didn't learn anything =/
15:35:50 amuller: ah
15:36:11 I'll sync up with amuller in channel and see if I can find some time to look at it
15:36:15 carl_baldwin: is there anyone from infra who can help us here.
15:37:18 #action regXboi to sync up with amuller and see if he can roll the ball forward on the timeout issue
15:37:24 there - let's move on
15:37:34 Swami: I think I would just ask in the infra room. Sometimes anteaya can help get things moving for an urgent issue that is stuck.
15:37:47 carl_baldwin: thanks
15:38:15 regXboi: amuller: Thank you. I should be around today if I can help, let me know.
15:38:15 carl_baldwin, Swami: I'm a known handle over in infra, so they usually answer me if I ask questions
15:38:42 regXboi: sounds good.
15:38:54 Okay. Let's move on.
15:39:01 #topic Routed Networks
15:39:20 I tried to clarify the spec once again.
15:39:35 #link https://review.openstack.org/#/c/225384/
15:39:41 I got a +1. Merge it!
15:40:48 I tried to clarify the use of bridge mappings a bit based on a discussion with russellb
15:40:50 hahahahahaha
15:41:24 So, Neutron will depend on the plugin for network / host mapping and the ML2 plugin with agents will use bridge mappings to provide the mapping.
15:41:34 Ah, interesting, I didn't get to that part yet.
15:41:57 so routed networks will depend on the ML2 plugin/agents?
15:42:01 neiljerram: I'm sure I'll need another pass to actually say it clearly. :)
15:42:12 regXboi: No
15:42:13 No, that's exactly what carl didn't just say
15:42:31 routed networks will depend on the plugin (whatever that may be) providing a mapping.
15:42:48 regXboi: network/host mapping becomes an implementation detail
15:42:58 ok... can we say "a plugin" then?
"the" is a bit more definitive than "a" as an article
15:43:06 But that means it's an implementation point and not something described on the API
15:43:49 sorry for the grammar police, folks :/
15:44:05 Anyway, go tear up the spec for me again, please. :)
15:44:17 But I guess that's OK as it only applies to Networks that are being used as part of an IpNetwork.
15:44:22 Seriously though, I don't know what I'd do without good feedback on the spec. I appreciate it very much.
15:44:37 Will review again later.
15:44:41 carl_baldwin: it is on my list to do today
15:45:03 Still going through ... It's massive ;)
15:45:29 vikram: Thanks for your help with the API / model patch.
15:45:54 carl_baldwin: Real game needs to begin now
15:46:00 neiljerram: Thanks for yours too. You gave some valuable feedback.
15:46:27 Thanks, I'll try to continue that :-)
15:46:58 Right now, I'm very interested in creating ports on IpNetworks and binding to a Network / IP address later.
15:47:27 I'm also very interested in diving in to the Nova / Neutron interaction.
15:47:52 I'll ping johnthetubaguy for some more Nova attention for this spec.
15:48:08 I agree, but we need a working DB model in order to play with more code, and I think we need the spec to be closed to done in order to decide what the right working DB model is.
15:48:20 s/closed/closer
15:48:28 neiljerram:++
15:48:50 So for me iterating on the spec is still the most important thing
15:49:36 neiljerram: I agree that we still need to iterate on the spec. On the other hand, I think some tinkering with it will give us valuable insights to get the spec right.
15:49:59 That is definitely true too! This is a hard change :-)
15:51:56 Later today, I'm going to chat with kevinbenton a bit about the Nova part too.
15:52:29 Well, we're running out of time. Let's move on more quickly.
15:52:37 #topic Address Scopes
15:52:57 I'm hoping to get more patches merged. If you're a core, could you take a look at the topic?
15:53:33 #link https://review.openstack.org/#/q/status:open+topic:bp/address-scopes,n,z
15:53:37 #topic BGP
15:53:54 vikram: tidwellr1: ping
15:54:03 carl_baldwin: pong
15:54:06 hi
15:55:14 not ready to have it hit too hard with reviews yet
15:55:22 between my wife having a baby and thanksgiving holiday, it's been slow going the last couple weeks
15:55:35 tidwellr1: Congratulations!
15:55:43 congrats tidwellr1 !!!!
15:55:54 congratulations!
15:56:08 Happy sleeping!
15:56:13 great new ryan! congratulation!
15:56:19 *news
15:56:23 congratulations Ryan!
15:56:27 * carl_baldwin was trying not to spill the beans before tidwellr1. Now he can relax.
15:56:33 neiljerram: ;)
15:56:42 carl_baldwin: LOL
15:57:08 Well, we have 4 minutes. What shall we do with those?
15:57:12 so, what are the details?
15:57:15 carl_baldwin: Just resumed the work this week.. after a couple of weeks.. so no progress from my side either
15:57:17 i.e. boy/girl, etc. etc. :)
15:57:21 I expect to have some code in shape to start merging before the end of the year
15:57:35 ah yes, we had a little girl
15:57:37 #topic Open Discussion (or just baby news)
15:57:54 congratulations!
15:58:01 Does she have a Gerrit account yet?
15:58:08 tidwellr1: what name you are thinking?
15:58:10 6lbs 11oz, 19.6 in long, thankfully made it to the hospital instead of doing this on the side of throad
15:58:16 * the road
15:58:33 always a good thing :)
15:58:37 yes!
15:59:05 carl_baldwin: from DNS side, finished coding all the functionality before Thanksgiving. Got review to implement it as a full extension. Did that last week and debugging now. Testing will be done today and tomorrow and should be good for reviews on Monday
15:59:08 @vikram her name is Isla (think how you pronounce "island")
15:59:21 tidwellr1: nice ;)
15:59:36 carl_baldwin: also Nova spec was merged
15:59:43 tidwellr1: very nice - I'll have to remember that one
15:59:48 mlavalle: Yeah!
16:00:04 #endmeeting