15:00:37 #startmeeting neutron_l3
15:00:38 carl_baldwin: I made it :)
15:00:39 Meeting started Thu Sep 17 15:00:37 2015 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:42 The meeting name has been set to 'neutron_l3'
15:00:45 ahoy
15:00:49 #topic Announcements
15:01:03 The RC period starts next week.
15:01:25 #link https://wiki.openstack.org/wiki/Liberty_Release_Schedule
15:01:35 o/
15:01:57 We’ve got three candidates for PTL
15:02:01 #link https://review.openstack.org/#/q/project:openstack/election+file:%255Ecandidates/mitaka/Neutron/.*,n,z
15:02:33 hi
15:02:44 Your vote is important. The last election for Neutron was won by one vote if I recall correctly. ;)
15:02:56 It's good of those people to stand.
15:03:03 +1
15:03:14 more should feel free :)
15:03:24 Any other announcements?
15:03:45 regXboi: I think nominations are now closed.
15:04:04 carl_baldwin: that was more of a statement for next time
15:04:35 I see
15:04:57 Let’s move on to bugs.
15:05:00 #topic Bugs
15:05:08 hi
15:05:15 #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam#Agenda
15:05:17 mlavalle: hi
15:05:31 last week we were tracking 2 critical bugs
15:05:39 Looks like they’re both doing well.
15:06:03 yeah, I helped to diagnose one of them and armax proposed a fix over the weekend
15:06:11 no events since then
15:06:21 Good work.
15:06:31 the other one had a fix by armax on labor day
15:06:36 no events since then
15:06:44 so thank you very much to armax
15:07:15 I kept 2 high importance bugs in the agenda
15:07:29 mlavalle: thank you guys for staying on top of things
15:07:37 mlavalle: Any idea where we’re at with HA/DVR?
15:07:45 https://bugs.launchpad.net/neutron/+bug/1365473
15:07:46 Launchpad bug 1365473 in neutron "Unable to create a router that's both HA and distributed" [High,In progress] - Assigned to Adolfo Duarte (adolfo-duarte)
15:08:03 Adolfo had been actively pushing fixes to this
15:08:17 seems to be making progress, although it seems difficult to fix
15:08:37 carl_baldwin: adolfo mentioned that he was rebasing his patch yesterday
15:08:38 there are 2 fixes, one for the server and one for the agent
15:08:39 sadly jschwarz is away on PTO
15:08:51 so he won't be able to help there any more for the upcoming week or two
15:09:08 carl_baldwin: There was some functional test that broke because of a recent change in upstream and adolfo is working on it.
15:09:13 last time adolfo pushed code was the end of last week
15:09:40 so i say let's give adolfo time to keep making progress
15:09:42 amuller: Any idea where things stood when jschwarz left?
15:09:44 mlavalle: last week he was on a training the whole week.
15:09:52 not great...
15:10:11 I asked John to create an etherpad for it
15:10:14 which is linked from the bug report
15:10:28 https://etherpad.openstack.org/p/DVR_HA_Routers
15:11:15 amuller: Thanks for calling attention to that.
15:11:20 amuller: thanks for the link, I will communicate to adolfo.
15:12:15 For bug #1494351, pavel has a fix up marked WIP.
15:12:16 bug 1494351 in neutron "Observed StaleDataError in gate-neutron-dsvm-api tests if reference IPAM driver is used" [High,In progress] https://launchpad.net/bugs/1494351 - Assigned to Pavel Bondar (pasha117)
15:12:22 #link https://review.openstack.org/#/c/223123/
15:12:23 correct
15:12:41 carl_baldwin, mlavalle: I don't have anything to add to the list this week - the backlog is now down to 14 bugs without fix commits for RC1 :)
15:13:01 pavel_bondar: thanks for that progress!
15:13:05 so I will go through these and pull up ones for next week that could use some review love
15:13:28 regXboi: That is great progress.
15:13:42 carl_baldwin: thanks to all the folks pitching in!
15:13:51 I need to rework current fix to use compare-and-swap logic to make it work with Galera, so it may take time
15:14:13 pavel_bondar: Thanks.
15:14:22 regXboi: I kept https://bugs.launchpad.net/neutron/+bug/1450982
15:14:22 Launchpad bug 1450982 in neutron "DVR: Moving FloatingIP between hosts can lead to Floating Agent Gateway Port not being deleted" [Medium,Fix committed] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:14:29 in the agenda
15:14:41 mlavalle: I believe all of the patches for that have merged?
15:14:51 but I think all the fixes are in, so if that's ok with you, i'll remove it
15:15:06 mlavalle: the above bug was fixed and merged.
15:15:18 Swami: thanks, beat me to asking the question :)
15:15:26 regXboi, Swami: ok, i'll remove it from tracking
15:15:38 mlavalle: thanks
15:15:59 Any other bugs to mention?
15:16:16 carl_baldwin: there was another bug that I filed yesterday.
15:16:19 finally there are medium importance bugs that need pruning.... i'll work on it this week
15:16:25 carl_baldwin: I have a patch for it
15:16:47 Swami: That’s right, do you have a link for it?
15:16:47 #link https://review.openstack.org/#/c/224250/
15:17:01 Swami: thanks
15:17:28 Swami: I'll track it for next week
15:17:36 mlavalle: thanks
15:17:53 #topic Routed network segments
15:18:33 Work is continuing on this. I will have a spec up by tomorrow.
15:18:56 Looking forward to that!
15:19:08 then the party really starts ;0
15:19:33 :)
15:19:43 We should have a lot more to discuss next week.
15:20:03 #topic DVR
15:20:28 carl_baldwin: hi
15:20:31 DVR came up in a big way at the QA sprint this week. regXboi, sc68cal, dougwig, and I represented Neutron there.
15:20:42 :)
15:20:47 :) :)
15:20:49 nice representation
15:20:53 (oh my)
15:21:00 we sent the big guns!
15:21:27 What kicked off the discussion was the failure rate of the DVR job.
15:21:30 #link http://goo.gl/j5UkwT
15:21:39 sc68cal - do you have a link to the email you sent to the mailing list as well?
15:21:57 regXboi: sure
15:22:12 carl_baldwin: but that was a spike because of a bug introduced on a patch.
15:22:25 http://lists.openstack.org/pipermail/openstack-dev/2015-September/074433.html
15:22:25 #link http://lists.openstack.org/pipermail/openstack-dev/2015-September/074445.html
15:22:37 Swami: It wasn’t the spike in the multi-node job that sparked the discussion.
15:22:58 carl_baldwin: The main thing that we need to solve in DVR is consistency
15:23:06 It was the gate-tempest-dsvm-neutron-dvr job that sparked the discussion.
15:23:25 there are *lots* of things to solve in DVR
15:23:32 carl_baldwin: we have always seen a 10-12% deviation in the DVR job and all those are related to session timed out.
15:23:57 For those of us who have looked at this graph for a long time now, we know that this job consistently doubles the failure rate of the gate-tempest-dsvm-neutron-full job.
15:24:11 carl_baldwin: ack and #sadpanda
15:24:33 carl_baldwin: right now the only difference in floating ip is basically we do have two namespaces for the packets to travel through and we need to figure out a way to debug where it is exactly failing.
15:25:12 Swami: Right, and we need to get to the bottom of it.
15:25:17 Swami: would it be a metter of training people so we have more eyes to debug issues?
15:25:26 matter^^^
15:25:40 mlavalle: sure I can help anyone if they are interested in poking around.
15:26:18 Swami: if that helps, we can send an invitation to the ML asking for volunteers....
15:26:37 mlavalle: Some of the tempest tests do not wait for the floatingip status.
15:27:05 mlavalle: think my post to the ML may cover the ask for volunteers
15:27:20 Swami: that is because some of the scenario tests use nova apis for creating floating IPs and they *can't* check for status
15:27:27 mlavalle: long time back I tried to add a patch to check for status and then do an ssh. But that patch got -1 since people said that some of the nova network tests do not have floatingip status.
15:27:43 regXboi: you are right, you beat me again.
15:27:46 sc68cal: yeah! you covered that base
15:28:13 Swami: I think we need to file a bug against the nova legacy apis that if they are going to return 201 then internally they have to check status
15:28:14 regXboi: mlavalle: so my question is how do we tackle this problem.
15:28:21 ^^^^
15:28:31 one thing that I'm guilty of myself is always doing a 'recheck' when that job failed, so we were letting possible race conditions survive. I've seen some interesting errors this past week that I'm filing bugs for
15:28:54 regXboi and haleyb have graciously agreed to devote some of their time to this. I’d like for them to lead the effort to root cause this.
15:28:57 haleyb: thanks
15:29:13 I talked to mtreinish about this very issue during the sprint and his suggestion was to file the bug on nova
15:29:17 carl_baldwin: I will also help them to root cause the issue.
15:29:35 because we don't have to change the API behavior, we need to change what it does before the API behavior
15:29:39 Swami: Much appreciated, thanks.
15:30:23 Swami: sometimes it's ovs, or some other thing like "no router namespace", either way investigating more is a good thing
15:30:25 regXboi: haleyb: I can sync up with you guys and keep me in the loop.
15:30:25 I’ll leave it to regXboi and haleyb to decide what the best way to coordinate and communicate is.
15:31:04 haleyb: yes I agree.
15:31:40 carl_baldwin: so what was the decision taken in the qa meeting.
15:32:02 carl_baldwin: should dvr completely line up with CVR before we re-enable the voting.
15:32:45 I would say we should be *much* closer than we are now, but people should be prepared for patches to be -1'd because the n-v dvr job failed
15:33:15 Swami: There will always be instantaneous differences in the failure rates. But, averaging over a period of recent history should show that they’re about the same.
15:33:24 regXboi: haleyb: carl_baldwin: One thing we should also keep in mind is to look at the non-voting dvr job before pushing in any patch; that would save my life.
15:34:02 carl_baldwin: got it. thanks.
15:34:46 Swami: i hope you're not putting neutron on a defibrillator :)
15:34:51 We should mention this in the neutron meeting as a reminder to core reviewers to pay attention to oven the non-voting jobs.
15:35:07 s/oven/even/
15:35:10 carl_baldwin: +1
15:35:17 I think also, that until further notice, we should keep DVR as an explicit agenda topic during this weekly meeting
15:35:41 carl_baldwin: one option is to make it voting, but on neutron patches only
15:35:49 but then y'all have to suffer the pain of a 25% fail rate
15:35:49 carl_baldwin: armando did just send out a review about being 'effective', seems like an addition there
15:36:19 mlavalle: sure I will try to join the meeting every week. I had to drop off my kids during this time, so I have not joined for a while. But I will make alternate arrangements.
15:36:21 mtreinish: We almost went that way but decided against it.
15:36:24 haleyb: link?
15:36:33 mtreinish: greetings!
15:36:37 https://review.openstack.org/#/c/224419/
15:37:24 i haven't even read it yet myself
15:37:33 Swami: thanks!
15:38:31 carl_baldwin: ok sure
15:38:44 mtreinish: Thanks.
15:39:05 I feel like we’re about at the end of this topic, anything else?
15:39:27 carl_baldwin: not this week - there is more, but the prep work isn't done yet and that's on me
15:39:32 carl_baldwin: thanks, I don't have anything.
15:40:12 Thanks, all.
15:40:18 #topic BGP dynamic routing
15:40:24 hi
15:40:30 tidwellr: vikram: hi
15:40:54 hi
15:41:17 sorry, i don't have much update from my end as I was on vacation..
15:41:33 thanks to tidewellr for taking care of things
15:41:41 thanks to tidwellr for taking care of things
15:41:58 good progress in the last week, we've worked out some more kinks. Neutron is now peering and advertising routes with other routers
15:42:17 Oh My God!!! We did it;)
15:42:32 That sounds like great progress.
15:42:49 tidwellr: ++
15:43:26 attention now turns to getting the agent to report status of peering sessions and getting Neutron to push the correct routes through the dr_agent
15:44:06 Anything to discuss here?
15:44:32 don't need any time here
15:44:58 tidwellr: Thanks.
15:45:07 great work tidwellr
15:45:11 #topic DNS
15:45:15 mlavalle: hi
15:45:17 hi again
15:45:56 progress was a little slower this week, diagnosing bugs and taking a few days off (daughter got engaged in Washington DC)
15:46:24 however, I managed to crank some code in the airplanes, while wife was sleepong
15:46:32 sleeping^^^^
15:46:50 mlavalle: congrats to daughter!
15:46:52 I addressed carl_baldwin's comments in his last review
15:47:14 he pointed out correctly that I was forgetting PTR records
15:47:34 so I added code for that and also the alembic migration script
15:48:04 pushed PS8 yesterday, which I am going to debug today in my sandbox
15:48:09 #link https://review.openstack.org/#/c/212213/
15:48:25 mlavalle: congrats. you should have told me you were here in town :)
15:49:00 also yesterday had a conversation with Kiall (designate PTL) and mestery about adding neutron + designate to the gate
15:49:32 the decision was to do it experimentally during the rest of Liberty and aim to merge early in M
15:50:13 expect PS( later today, with the results of my debugging session
15:50:19 PS9^^^
15:50:41 and at that point I will ask for more reviews, also from the Designate team
15:51:00 I want them to see the way I am handling the PTR records
15:51:05 mlavalle: +1. Encouraging reviewers.
15:51:12 that's it for this week
15:51:26 mlavalle: Thank you.
15:51:45 #topic Open Discussion
15:51:54 Any other topics needing attention this week?
15:53:51 Thanks, all!
15:53:54 #endmeeting