15:00:37 <carl_baldwin> #startmeeting neutron_l3
15:00:38 <regXboi> carl_baldwin: I made it :)
15:00:39 <openstack> Meeting started Thu Sep 17 15:00:37 2015 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:42 <openstack> The meeting name has been set to 'neutron_l3'
15:00:45 <mlavalle> ahoy
15:00:49 <carl_baldwin> #topic Announcements
15:01:03 <carl_baldwin> The RC period starts next week.
15:01:25 <carl_baldwin> #link https://wiki.openstack.org/wiki/Liberty_Release_Schedule
15:01:35 <neiljerram> o/
15:01:57 <carl_baldwin> We’ve got three candidates for PTL
15:02:01 <carl_baldwin> #link https://review.openstack.org/#/q/project:openstack/election+file:%255Ecandidates/mitaka/Neutron/.*,n,z
15:02:33 <Swami> hi
15:02:44 <carl_baldwin> Your vote is important. The last election for Neutron was won by one vote if I recall correctly. ;)
15:02:56 <neiljerram> It's good of those people to stand.
15:03:03 <carl_baldwin> +1
15:03:14 <regXboi> more should feel free :)
15:03:24 <carl_baldwin> Any other announcements?
15:03:45 <carl_baldwin> regXboi: I think nominations are now closed.
15:04:04 <regXboi> carl_baldwin: that was more of a statement for next time
15:04:35 <carl_baldwin> I see
15:04:57 <carl_baldwin> Let’s move on to bugs.
15:05:00 <carl_baldwin> #topic Bugs
15:05:08 <mlavalle> hi
15:05:15 <carl_baldwin> #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam#Agenda
15:05:17 <carl_baldwin> mlavalle: hi
15:05:31 <mlavalle> last week we were tracking 2 critical bugs
15:05:39 <carl_baldwin> Looks like they’re both doing well.
15:06:03 <mlavalle> yeah, I helped to diagnose one of them and armax proposed a fix over the weekend
15:06:11 <mlavalle> no events since then
15:06:21 <carl_baldwin> Good work.
15:06:31 <mlavalle> the other one had a fix by armax on labor day
15:06:36 <mlavalle> no events since then
15:06:44 <mlavalle> so thank you very much to armax
15:07:15 <mlavalle> I kept 2 high importance bugs in the agenda
15:07:29 <armax> mlavalle: thank you guys for staying on top of things
15:07:37 <carl_baldwin> mlavalle: Any idea where we’re at with HA/DVR?
15:07:45 <mlavalle> https://bugs.launchpad.net/neutron/+bug/1365473
15:07:46 <openstack> Launchpad bug 1365473 in neutron "Unable to create a router that's both HA and distributed" [High,In progress] - Assigned to Adolfo Duarte (adolfo-duarte)
15:08:03 <mlavalle> Adolfo had been actively pushing fixes to this
15:08:17 <mlavalle> seems to be making progress, although it seems difficult to fix
15:08:37 <Swami> carl_baldwin: adolfo mentioned that he was rebasing his patch yesterday
15:08:38 <mlavalle> there are 2 fixes, one for the server and one for the agent
15:08:39 <amuller> sadly jschwarz is away on PTO
15:08:51 <amuller> so he won't be able to help there any more for the upcoming week or two
15:09:08 <Swami> carl_baldwin: There was some functional test that broke because of a recent change in upstream and adolfo is working on it.
15:09:13 <mlavalle> last time adolfo pushed code was the end of last week
15:09:40 <mlavalle> so i say let's give adolfo time to keep making progress
15:09:42 <carl_baldwin> amuller: Any idea where things stood when jschwarz left?
15:09:44 <Swami> mlavalle: last week he was on a training the whole week.
15:09:52 <amuller> not great...
15:10:11 <amuller> I asked John to create an etherpad for it
15:10:14 <amuller> which is linked from the bug report
15:10:28 <amuller> https://etherpad.openstack.org/p/DVR_HA_Routers
15:11:15 <carl_baldwin> amuller: Thanks for calling attention to that.
15:11:20 <Swami> amuller: thanks for the link, I will communicate to adolfo.
15:12:15 <carl_baldwin> For bug #1494351, pavel has a fix up marked WIP.
15:12:16 <openstack> bug 1494351 in neutron "Observed StaleDataError in gate-neutron-dsvm-api tests if reference IPAM driver is used" [High,In progress] https://launchpad.net/bugs/1494351 - Assigned to Pavel Bondar (pasha117)
15:12:22 <carl_baldwin> #link https://review.openstack.org/#/c/223123/
15:12:23 <mlavalle> correct
15:12:41 <regXboi> carl_baldwin, mlavalle: I don't have anything to add to the list this week - the backlog is now down to 14 bugs without fix commits for RC1 :)
15:13:01 <mlavalle> pavel_bondar: thanks for that progress!
15:13:05 <regXboi> so I will go through these and pull up ones for next week that could use some review love
15:13:28 <carl_baldwin> regXboi: That is great progress.
15:13:42 <regXboi> carl_baldwin: thanks to all the folks pitching in!
15:13:51 <pavel_bondar> I need to rework current fix to use compare-and-swap logic to make it work with Galera, so it may take time
15:14:13 <carl_baldwin> pavel_bondar: Thanks.
15:14:22 <mlavalle> regXboi: I kept https://bugs.launchpad.net/neutron/+bug/1450982
15:14:22 <openstack> Launchpad bug 1450982 in neutron "DVR: Moving FloatingIP between hosts can lead to Floating Agent Gateway Port not being deleted" [Medium,Fix committed] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:14:29 <mlavalle> in the agenda
15:14:41 <regXboi> mlavalle: I believe all of the patches for that have merged?
15:14:51 <mlavalle> but I think all the fixes are in, so if that's ok with you, i'll remove it
15:15:06 <Swami> mlavalle: the above bug was fixed and merged.
15:15:18 <regXboi> Swami: thanks, beat me to asking the question :)
15:15:26 <mlavalle> regXboi, Swami: ok, i'll remove it from tracking
15:15:38 <Swami> mlavalle: thanks
15:15:59 <carl_baldwin> Any other bugs to mention?
15:16:16 <Swami> carl_baldwin: there was another bug that I filed yesterday.
15:16:19 <mlavalle> finally there are medium importance bugs that need pruning.... i'll work on it this week
15:16:25 <Swami> carl_baldwin: I have a patch for it
15:16:47 <carl_baldwin> Swami: That’s right, do you have a link for it?
15:16:47 <Swami> #link https://review.openstack.org/#/c/224250/
15:17:01 <carl_baldwin> Swami: thanks
15:17:28 <mlavalle> Swami: I'll track it for next week
15:17:36 <Swami> mlavalle: thanks
15:17:53 <carl_baldwin> #topic Routed network segments
15:18:33 <carl_baldwin> Work is continuing on this. I will have a spec up by tomorrow.
15:18:56 <neiljerram> Looking forward to that!
15:19:08 <tidwellr> then the party really starts ;0
15:19:33 <carl_baldwin> :)
15:19:43 <carl_baldwin> We should have a lot more to discuss next week.
15:20:03 <carl_baldwin> #topic DVR
15:20:28 <Swami> carl_baldwin: hi
15:20:31 <carl_baldwin> DVR came up in a big way at the QA sprint this week. regXboi, sc68cal, dougwig, and I represented Neutron there.
15:20:42 <sc68cal> :)
15:20:47 <regXboi> :) :)
15:20:49 <mlavalle> nice representation
15:20:53 <regXboi> (oh my)
15:21:00 <mlavalle> we sent the big guns!
15:21:27 <carl_baldwin> What kicked off the discussion was the failure rate of the DVR job.
15:21:30 <carl_baldwin> #link http://goo.gl/j5UkwT
15:21:39 <regXboi> sc68cal - do you have link to the email you sent to the mailing list as well?
15:21:57 <sc68cal> regXboi: sure
15:22:12 <Swami> carl_baldwin: but that was a spike because of bug introduced on a patch.
15:22:25 <sc68cal> http://lists.openstack.org/pipermail/openstack-dev/2015-September/074433.html
15:22:25 <carl_baldwin> #link http://lists.openstack.org/pipermail/openstack-dev/2015-September/074445.html
15:22:37 <carl_baldwin> Swami: It wasn’t the spike in the multi-node job that sparked the discussion.
15:22:58 <Swami> carl_baldwin: The main thing that we need to solve in DVR is consistency
15:23:06 <carl_baldwin> It was the gate-tempest-dsvm-neutron-dvr job that sparked the discussion.
15:23:25 <regXboi> there are *lots* of things to solve in DVR
15:23:32 <Swami> carl_baldwin: we have always seen a 10-12% deviation in the DVR job and all those are related to session timed out.
15:23:57 <carl_baldwin> For those of us who have looked at this graph for a long time now, we know that this job consistently doubles the failure rate of the gate-tempest-dsvm-neutron-full job.
15:24:11 <regXboi> carl_baldwin: ack and #sadpanda
15:24:33 <Swami> carl_baldwin: right now the only difference in floating ip is basically we do have two namespaces for the packets to travel through and we need to figure out a way to debug where it is exactly failing.
15:25:12 <carl_baldwin> Swami: Right, and we need to get to the bottom of it.
15:25:17 <mlavalle> Swami: would it be a metter of training people so we have more eyes to debug issues?
15:25:26 <mlavalle> matter^^^
15:25:40 <Swami> mlavalle: sure I can help anyone if they are interested in poking around.
15:26:18 <mlavalle> Swami: if that helps, we can send an invitation to the ML asking for volunteers....
15:26:37 <Swami> mlavalle: Some of the tempest tests do not wait for the floatingip status.
15:27:05 <sc68cal> mlavalle: think my post to the ML may cover the ask for volunteers
15:27:20 <regXboi> Swami: that is because some of the scenario tests use nova apis for creating floating IPs and they *can't* check for status
15:27:27 <Swami> mlavalle: long time back I tried to add a patch to check for status and then do an ssh. But that patch got -1 since people said that some of the nova network tests do not have floatingip status.
15:27:43 <Swami> regXboi: you are right, you beat me again.
15:27:46 <mlavalle> sc68cal: yeah! you covered that base
15:28:13 <regXboi> Swami: I think we need to file a bug against the nova legacy apis that if they are going to return 201 then internally they have to check status
15:28:14 <Swami> regXboi: mlavalle: so my question is how do we tackle this problem.
15:28:21 <regXboi> ^^^^
15:28:31 <haleyb> one thing that I'm guilty of myself is always doing a 'recheck' when that job failed, so we were letting possible race conditions survive. I've seen some interesting errors this past week that I'm filing bugs for
15:28:54 <carl_baldwin> regXboi and haleyb have graciously agreed to devote some of their time to this. I’d like for them to lead the effort to root cause this.
15:28:57 <Swami> haleyb: thanks
15:29:13 <regXboi> I talked to mtreinish about this very issue during the sprint and his suggestion was to file the bug on nova
15:29:17 <Swami> carl_baldwin: I will also help them to root cause the issue.
15:29:35 <regXboi> because we don't have to change the API behavior, we need to change what it does before the API behavior
15:29:39 <carl_baldwin> Swami: Much appreciated, thanks.
15:30:23 <haleyb> Swami: sometimes it's ovs, or some other thing like "no router namespace", either way investigating more is a good thing
15:30:25 <Swami> regXboi: haleyb: I can sync up with you guys and keep me in the loop.
15:30:25 <carl_baldwin> I’ll leave it to regXboi and haleyb to decide what the best way to coordinate and communicate is.
15:31:04 <Swami> haleyb: yes I agree.
15:31:40 <Swami> carl_baldwin: so what was the decision taken in the qa meeting.
15:32:02 <Swami> carl_baldwin: should dvr completely line up with CVR before we re-enable the voting.
15:32:45 <regXboi> I would say we should be *much* closer than we are now, but people should be prepared for patches to be -1'd because the n-v dvr job failed
15:33:15 <carl_baldwin> Swami: There will always be instantaneous differences in the failure rates. But, averaging over a period of recent history should show that they’re about the same.
15:33:24 <Swami> regXboi: haleyb: carl_baldwin: One thing we should also keep in mind is to look at the non-voting dvr job before pushing in any patch; that would save my life.
15:34:02 <Swami> carl_baldwin: got it. thanks.
15:34:46 <haleyb> Swami: i hope you're not putting neutron on a defibrillator :)
15:34:51 <carl_baldwin> We should mention this in the neutron meeting as a reminder to core reviewers to pay attention to oven the non-voting jobs.
15:35:07 <carl_baldwin> s/oven/even/
15:35:10 <regXboi> carl_baldwin: +1
15:35:17 <mlavalle> I think also, that until further notice, we should keep DVR as an explicit agenda topic during this weekly meeting
15:35:41 <mtreinish> carl_baldwin: one option is to make it voting, but on neutron patches only
15:35:49 <mtreinish> but then y'all have to suffer the pain of a 25% fail rate
15:35:49 <haleyb> carl_baldwin: armando did just send out a review about being 'effective', seems like an addition there
15:36:19 <Swami> mlavalle: sure I will try to join the meeting every week. I had to drop off my kids during this time, so I have not joined for a while. But I will make alternate arrangements.
15:36:21 <carl_baldwin> mtreinish: We almost went that way but decided against it.
15:36:24 <carl_baldwin> haleyb: link?
15:36:33 <regXboi> mtreinish: greetings!
15:36:37 <haleyb> https://review.openstack.org/#/c/224419/
15:37:24 <haleyb> i haven't even read it yet myself
15:37:33 <mlavalle> Swami: thanks!
15:38:31 <mtreinish> carl_baldwin: ok sure
15:38:44 <carl_baldwin> mtreinish: Thanks.
15:39:05 <carl_baldwin> I feel like we’re about at the end of this topic, anything else?
15:39:27 <regXboi> carl_baldwin: not this week - there is more, but the prep work isn't done yet and that's on me
15:39:32 <Swami> carl_baldwin: thanks, I don't have anything.
15:40:12 <carl_baldwin> Thanks, all.
15:40:18 <carl_baldwin> #topic BGP dynamic routing
15:40:24 <tidwellr> hi
15:40:30 <carl_baldwin> tidwellr: vikram: hi
15:40:54 <vikram> hi
15:41:17 <vikram> sorry, i don't have much update from my end as I was on vacation..
15:41:33 <vikram> thanks to tidewellr for taking care of things
15:41:41 <vikram> thanks to tidwellr for taking care of things
15:41:58 <tidwellr> good progress in the last week, we've worked out some more kinks. Neutron is now peering and advertising routes with other routers
15:42:17 <vikram> Oh My God!!! We did it;)
15:42:32 <carl_baldwin> That sounds like great progress.
15:42:49 <mlavalle> tidwellr: ++
15:43:26 <tidwellr> attention now turns to getting the agent to report status of peering sessions and getting Neutron to push the correct routes through the dr_agent
15:44:06 <carl_baldwin> Anything to discuss here?
15:44:32 <tidwellr> don't need any time here
15:44:58 <carl_baldwin> tidwellr: Thanks.
15:45:07 <vikram> great work tidwellr
15:45:11 <carl_baldwin> #topic DNS
15:45:15 <carl_baldwin> mlavalle: hi
15:45:17 <mlavalle> hi again
15:45:56 <mlavalle> progress was a little slower this week, diagnosing bugs and taking a few days off (daughter got engaged in Washington DC)
15:46:24 <mlavalle> however, I managed to crank some code in the airplanes, while wife was sleepong
15:46:32 <mlavalle> sleeping^^^^
15:46:50 <carl_baldwin> mlavalle: congrats to daughter!
15:46:52 <mlavalle> I addressed carl_baldwin 's comments in his last review
15:47:14 <mlavalle> he pointed correctly that I was forgetting PTR records
15:47:34 <mlavalle> so I added code for that and also the alembic migration script
15:48:04 <mlavalle> pushed PS8 yesterday, which I am going to debug today in my sandbox
15:48:09 <carl_baldwin> #link https://review.openstack.org/#/c/212213/
15:48:25 <johnbelamaric> mlavalle: congrats. you should have told me you were here in town :)
15:49:00 <mlavalle> also yesterday had a conversation with Kiall (designate PTL) and mestery about adding neutron + designate to the gate
15:49:32 <mlavalle> the decision was to do it experimentally during the rest of Liberty and aim to merge early in M
15:50:13 <mlavalle> expect PS( later today, with the results of my debugging session
15:50:19 <mlavalle> PS9^^^
15:50:41 <mlavalle> and at that point I will ask more reviews, also from the Designate team
15:51:00 <mlavalle> I want them to see the way I am handling the PTR records
15:51:05 <carl_baldwin> mlavalle: +1. Encouraging reviewers.
15:51:12 <mlavalle> that's it for this week
15:51:26 <carl_baldwin> mlavalle: Thank you.
15:51:45 <carl_baldwin> #topic Open Discussion
15:51:54 <carl_baldwin> Any other topics needing attention this week?
15:53:51 <carl_baldwin> Thanks, all!
15:53:54 <carl_baldwin> #endmeeting