15:00:20 <carl_baldwin> #startmeeting neutron_l3
15:00:21 <openstack> Meeting started Thu Mar 13 15:00:20 2014 UTC and is due to finish in 60 minutes. The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:24 <openstack> The meeting name has been set to 'neutron_l3'
15:00:28 <carl_baldwin> #topic Announcements
15:00:37 <carl_baldwin> #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam
15:01:07 <carl_baldwin> First, I didn’t plan for daylight savings time when I chose the time for this meeting.
15:01:23 <carl_baldwin> The time shift has caused a bit of a problem for me.
15:01:46 <carl_baldwin> I'd like to suggest a couple of possible meeting times. Both Thursday.
15:02:15 <carl_baldwin> The first is an hour earlier. I know that makes things very early for a few in Western US.
15:02:29 <ajo> For me it's actually better, +1
15:02:48 <carl_baldwin> The second is two hours later which could be difficult for others in other parts of the world.
15:03:21 <safchain> for me both are ok
15:03:28 <safchain> Hi all btw
15:03:34 <carl_baldwin> safchain: hi
15:03:41 <ajo> safchain, hi :)
15:04:54 <carl_baldwin> I'll wait a few days for others who might be reading the meeting logs to chime in. Ping me on irc or email with any concerns. I'll announce the meeting time before next week. And, I'll consider the next shift in daylight savings time. ;)
15:05:15 <ajo> sure, thanks carl_baldwin
15:05:19 <carl_baldwin> #topic l3-high-availability
15:05:32 <carl_baldwin> safchain: Anything to report?
15:05:54 <safchain> currently I'm working on the conntrackd integration,
15:06:14 <safchain> Assaf's patch has to be reworked a bit to support multicast
15:07:13 <carl_baldwin> I need to review again. Anything new on the FFE? I fear that it didn't happen.
15:07:18 <safchain> I don't know if all of you have tested patches
15:07:35 <safchain> carl_baldwin, no news on the FFE
15:08:22 <ajo> I couldn't test yet safchain, but I will try to allocate some time for it.
15:08:34 <carl_baldwin> Okay. The sub team page has links and information about reviewing and testing but I'll admit I've not yet tested.
15:08:44 <safchain> carl_baldwin, I think this is almost everything for me, just need more feedback from functional tests
15:09:13 <ajo> safchain, do you have some functional test examples?
15:09:25 <ajo> I could get some people on our team to provide feedback on that.
15:09:27 <carl_baldwin> Okay, I am looking forward to running it. I need a multi host development setup soon anyway.
15:09:56 <carl_baldwin> #link https://docs.google.com/document/d/1P2OnlKAGMeSZTbGENNAKOse6B2TRXJ8keUMVvtUCUSM/edit#
15:10:03 <safchain> ajo, I will add some test use cases on the doc.
15:10:20 <carl_baldwin> ajo: ^ This is the doc.
15:11:07 <ajo> Thanks, I mean, if we have already some kind of initial functional test written for this. I will keep a link to this doc for manual testing.
15:11:37 <safchain> ajo, not yet
15:11:58 <ajo> ok, it's not easy
15:12:08 <safchain> ajo, but tempest tests should work with HA enabled
15:12:19 <ajo> ok, that's a good start
15:12:59 <carl_baldwin> safchain: anything else?
15:13:13 <safchain> It's ok for me
15:13:16 <carl_baldwin> #topic neutron-ovs-dvr
15:14:05 <carl_baldwin> Doesn't look like Swami is around.
15:14:30 <carl_baldwin> Swami is still working on detailing changes to L3.
15:14:41 <carl_baldwin> The doc for L2 is up. Could use more review.
15:14:57 <carl_baldwin> #link https://docs.google.com/document/d/1depasJSnGZPOnRLxEC_PYsVLcGVFXZLqP52RFTe21BE/edit#heading=h.5w7clq272tji
15:15:18 <safchain> Sure, I plan to review it by the end of the week
15:15:53 <carl_baldwin> Also looking into integrating the HA L3 and HA DHCP was discussed.
15:16:38 <carl_baldwin> safchain: great
15:16:46 <Sudhakar_> hi all...
15:17:01 <ajo> hi Sudhakar_
15:17:03 <carl_baldwin> Sudhakar_: hi
15:17:08 <safchain> carl_baldwin, yes I'll try to ping swami after reviewing the doc
15:17:17 <Sudhakar_> carl_baldwin, is there a doc about HA DHCP?
15:17:19 <safchain> hi Sudhakar_
15:17:26 <Sudhakar_> hi ajo...
15:17:42 <carl_baldwin> Sudhakar_: I don't think there is a doc yet about it. Only some initial discussion expressing interest in starting that work.
15:17:43 <Sudhakar_> hi carl
15:18:10 <Sudhakar_> Did Swami initiate the discussion?
15:18:22 <carl_baldwin> Sudhakar_: yes
15:18:30 <Sudhakar_> Ok. I have some context then..
15:18:44 <Sudhakar_> I am Swami's colleague ..based out of India
15:19:12 <Sudhakar_> basically we were thinking of an Agent monitoring service....which can be used to monitor different agents ...
15:19:22 <Sudhakar_> typically useful for L3 and DHCP when we have multiple NNs
15:20:03 <ajo> Sudhakar_, something like rpcdaemon ?
15:20:19 <Sudhakar_> not exactly..
15:20:41 <Sudhakar_> a thread which can be started from the plugin itself...
15:20:49 <Sudhakar_> and act based on the agent report_states...
15:21:21 <ajo> Sudhakar_, what kind of actions?
15:22:03 <Sudhakar_> for ex: if a DHCP agent hosting a particular network goes down ....and we have another active DHCP agent in the cloud...
15:22:37 <Sudhakar_> agent monitor detects that this DHCP agent went down and triggers rescheduling the network's DHCP on to the other agent..
15:22:54 <ajo> A few weeks ago, I was proposing that daemon agents could provide status via status file -> init.d "status", but it could be complementary.
15:23:08 <ajo> aha, it makes sense Sudhakar_
15:23:21 <Sudhakar_> currently we have agent_down_time configuration which will help us decide on rescheduling...
15:23:28 <carl_baldwin> Sudhakar_: Do you have any document describing this that we could review offline?
15:23:33 <Sudhakar_> we could have another parameter altogether to avoid mixing up..
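An illustrative aside on the agent-monitor idea described above, not something posted in the meeting. It is a minimal sketch, assuming each agent's last report_state time is available as a timestamp and that a single threshold (called agent_down_time here, as in the discussion) decides when an agent counts as down. The `Agent` class, `is_down`, `plan_rescheduling`, and the value 75 are hypothetical names and numbers chosen for the example; they are not Neutron's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Agent:
    """Hypothetical record of one DHCP agent and the networks it hosts."""
    host: str
    last_heartbeat: datetime
    networks: list


def is_down(agent, now, agent_down_time):
    """An agent is considered down once its last report is too old."""
    return now - agent.last_heartbeat > timedelta(seconds=agent_down_time)


def plan_rescheduling(agents, now, agent_down_time=75):
    """Return {network_id: new_host} for networks hosted on down agents.

    Each orphaned network goes to the live agent that currently hosts
    the fewest networks. With no live agent, nothing is moved.
    """
    alive = [a for a in agents if not is_down(a, now, agent_down_time)]
    moves = {}
    if not alive:
        return moves
    load = {a.host: len(a.networks) for a in alive}
    for agent in agents:
        if is_down(agent, now, agent_down_time):
            for net in agent.networks:
                target = min(load, key=load.get)
                moves[net] = target
                load[target] += 1
    return moves
```

A monitor thread could call `plan_rescheduling` periodically and hand the resulting moves to the scheduler. Using a separate threshold parameter, as suggested in the log, would only change the value passed as `agent_down_time`.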
15:23:48 <safchain> Sudhakar_, It seems there is something like that for LBaaS
15:23:49 <ajo> yes, a document on those ideas would be interesting,
15:23:57 <Sudhakar_> we are refining the doc... will publish it for review soon..
15:24:16 <carl_baldwin> Actually, I made a mistake above. I said HA DHCP where I should have said, more precisely, distributed DHCP.
15:24:36 <carl_baldwin> Sudhakar_: Great.
15:24:39 <ajo> aha carl_baldwin, the one based on OpenFlow rules?
15:24:46 <Sudhakar_> Ok ..:)
15:25:12 <Sudhakar_> Distributed DHCP was another thought...but I don't have much idea on that yet...
15:25:22 <carl_baldwin> ajo: OpenFlow rules could play a part but that did not come up explicitly.
15:25:33 <ajo> understood
15:26:01 <carl_baldwin> #topic l3-agent-consolidation
15:26:36 <carl_baldwin> This work is up for review but the bp was pushed out to Icehouse.
15:26:52 <carl_baldwin> yamahata: anything to add?
15:27:10 <yamahata> carl_baldwin: nothing new this week.
15:27:28 <carl_baldwin> #topic bgp-dynamic-routing
15:27:45 <carl_baldwin> #link https://blueprints.launchpad.net/neutron/+spec/bgp-dynamic-routing
15:27:52 <carl_baldwin> #link https://blueprints.launchpad.net/neutron/+spec/neutron-bgp-mpls-vpn
15:28:10 <carl_baldwin> nextone92: are you around?
15:28:59 <carl_baldwin> I spent some time reviewing the bgp-mpls bp this week and made some notes.
15:29:38 <carl_baldwin> It looks like a few key people aren't around this week to discuss. So, I'll try again next week.
15:30:02 <carl_baldwin> #topic DNS lookup of instances
15:30:32 <carl_baldwin> Really quick, I’m almost done writing a blueprint for this. Then, I need to get it reviewed internally before I can post it.
15:30:42 <carl_baldwin> I hope to have more to report on this next week.
15:30:51 <ajo> sounds interesting, thanks carl_baldwin
15:30:57 <carl_baldwin> #topic Agent Performance with Wrapper Overhead
15:31:07 <carl_baldwin> #link https://etherpad.openstack.org/p/neutron-agent-exec-performance
15:31:31 <carl_baldwin> This has come up on the ML this week. I have worked on it some so I created this etherpad.
15:32:06 <rossella_> carl_baldwin: nice summary on the etherpad
15:32:18 <ajo> yes, thanks carl_baldwin :)
15:32:39 <carl_baldwin> rossella_: ajo: thanks
15:32:59 <carl_baldwin> So, there are a number of potential ways to tackle the problem.
15:33:36 <carl_baldwin> I'm wondering what could be done for Icehouse.
15:33:57 <nextone92> carl_baldwin - sorry I'm so late to join the meeting
15:34:03 <Sudhakar_> carl_baldwin, thanks for putting up the doc. looking forward to this..
15:34:06 <ajo> Yes, I had that thought too carl_baldwin
15:34:09 <rossella_> Icehouse is now
15:34:19 <rossella_> we can't do much
15:34:23 <Swami> Carl: Sorry I am late today
15:34:35 <safchain> yes, I will have a look at this etherpad
15:34:37 <carl_baldwin> Swami: nextone92: Hi
15:34:59 <Swami> Carl: hi
15:35:04 <ajo> Yuriy's idea (privileged agent) doesn't look bad from the point of view of keeping all in python. But looks like it requires more changes into neutron. Too bad to be at the end of the cycle.
15:35:10 <carl_baldwin> rossella_: I fear you are right. There isn't much unless we can find bugs that could be fixed in short order.
15:35:24 <YorikSar> o/
15:35:31 <YorikSar> I'm that Yuriy.
15:35:40 <ajo> Ho YorikSar ! :)
15:35:43 <ajo> Hi :)
15:35:51 <YorikSar> I don't think it'll become very intrusive
15:35:57 <carl_baldwin> ajo: YorikSar: My thinking is similar. It may be a very good long term solution.
15:36:25 <carl_baldwin> YorikSar: I noticed your additions to the etherpad only this morning so I have not had a chance to review them.
15:36:35 <YorikSar> We basically need to replace execute() calls with smth like rootwrap.client.execute()
15:36:40 <ajo> I'm just worried about, for example, memory consumption. We must keep all instances tied tight... to avoid leaking "agents"
15:37:08 <YorikSar> ajo: They can kill themselves by timeout.
15:37:27 <YorikSar> Then we won't leak them.
15:37:35 <ajo> And at client exit
15:37:50 <ajo> Maybe, for ones running inside a netns: kill by timeout
15:37:51 <YorikSar> ajo: Yeah. Which can end up basically the same.
15:38:03 <ajo> the system-wide ones: kill by client exit + longer timeout
15:38:51 <ajo> carl_baldwin, do you think this approach could have the potential to be backported to Icehouse if it's tackled from now to the start of Juno?
15:40:11 <YorikSar> ajo: I'm thinking about trying to push this to oslo.rootwrap, actually. So backporting will be minimal, but it'll be another feature.
15:40:24 <carl_baldwin> ajo: I don't think it adds features and it wouldn't change the database. So, I think there might be hope for it.
15:40:45 <ajo> carl_baldwin, do we have a bug filed for this?
15:40:51 <carl_baldwin> ... not a new feature from the user perspective. More of an implementation detail.
15:41:09 <ajo> Yes, we're killing a cpu-eating-bug....
15:41:31 <YorikSar> carl_baldwin: Oh, yes. Agree.
15:41:49 <carl_baldwin> It is a significant implementation detail though.
15:41:59 <ajo> yes, I agree carl_baldwin
15:42:00 <carl_baldwin> I don't think there is one overarching bug for this.
15:42:22 <carl_baldwin> I have filed detailed bugs for some of the individual problems that I've found and fixed.
15:42:50 <ajo> carl_baldwin, I can file a bug with the details
15:42:56 <carl_baldwin> ajo: Great.
15:42:58 <ajo> (basically, the start of the latest mail thread)
15:43:08 <ajo> #action file bug about the rootwrap overhead problem.
15:43:15 <ajo> is it done this way?
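An illustrative aside on the "kill themselves by timeout" and "kill by client exit" lifetime rules discussed above, not something posted in the meeting. It is a minimal sketch, assuming a long-lived helper that receives commands over a queue. The `serve` function and the use of `None` as a "client is exiting" marker are hypothetical; this is not the oslo.rootwrap API, and the helper only echoes commands instead of running them.

```python
import queue
import threading


def serve(requests, results, idle_timeout):
    """Handle requests until none arrives for idle_timeout seconds."""
    while True:
        try:
            command = requests.get(timeout=idle_timeout)
        except queue.Empty:
            # Nobody asked for anything: exit so the helper is not leaked.
            results.put("exit: idle timeout")
            return
        if command is None:
            # The client announced its own exit.
            results.put("exit: client closed")
            return
        # A real helper would filter the command and run it with
        # privileges; this sketch only reports what it would run.
        results.put("ran: " + " ".join(command))


if __name__ == "__main__":
    requests, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=serve, args=(requests, results, 0.5))
    worker.start()
    requests.put(["ip", "netns", "list"])
    print(results.get())  # the echoed command
    print(results.get())  # the idle-timeout exit message
    worker.join()
```

The two policies from the log map onto the two exit paths: a helper inside a netns would rely on the short idle timeout alone, while a system-wide one would combine the client-exit marker with a longer timeout.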
15:43:33 <ajo> sorry, I'm almost new to meetings
15:44:02 <haleyb> carl_baldwin: perhaps for icehouse all we can do is continue chipping away at unnecessary calls, and maybe get your priority change in? my $.02
15:44:06 <carl_baldwin> ajo: I think you need to mention your handle after action. But, yes. Everyone should feel free to add their own action items.
15:44:31 <ajo> #action ajo file bug about the rootwrap overhead problem.
15:44:36 <Swami> is that even possible for icehouse, at this time
15:44:42 <rossella_> haleyb: +1
15:44:56 <YorikSar> I'm going to work on POC for that agent soon, btw.
15:45:10 <carl_baldwin> haleyb: Yes. I'm hoping to wrap up that priority change this week as a bug fix.
15:45:11 <YorikSar> It's going to be interesting stuff to code :)
15:45:39 <ajo> maybe, for icehouse, I could try to spend some time in reducing the python subset in the current rootwrap, and get a C++ translation we can use.
15:45:46 <carl_baldwin> Swami: I imagine there is little that can be done for Icehouse. Only bug fixes and I imagine that significant changes will not be accepted.
15:46:04 <Swami> Yes that's my thought as well.
15:46:07 <ajo> (automated one), but I'm unsure about the auditability of such a solution. that might require some investigation.
15:46:18 <carl_baldwin> ajo: It might be worth a try. That is something I'm not very familiar with though.
15:47:07 <ajo> carl_baldwin: maybe it's not much work <1 week, I could try to allocate the time for that with my manager...
15:47:30 <ajo> I have found speed improvements of >x50 with the C++ translation, but the python subset is rather reduced.
15:48:07 <carl_baldwin> ajo: Remember that we need to reduce start up time and not necessarily execution speed.
15:48:23 <ajo> Yes, that's greatly reduced, let me look for some numbers I had.
15:48:36 <carl_baldwin> ajo: sounds good.
15:48:51 <carl_baldwin> There are updates to "sudo" and "ip" that can help at scale.
These fall outside the scope of the OpenStack release.
15:49:14 <YorikSar> I wouldn't actually call switching to some subset of Python staying with Python. it'd still be some other language.
15:49:30 <carl_baldwin> Is there any documentation existing in openstack about tuning at the OS level?
15:49:51 <YorikSar> But it might be worth it to compare our approaches and probably come up with some benchmark.
15:49:54 <ajo> 1 sec. getting the numbers
15:50:00 <carl_baldwin> If so, I thought we could add some information from the etherpad to that document. If not, it could be created.
15:50:23 <mwagner_lap> carl_baldwin, not sure if there are any docs on tuning at the OS level
15:50:31 <ajo> http://fpaste.org/85068/25818139/
15:50:49 <mwagner_lap> assuming you are talking about the neutron server itself
15:50:52 <ajo> [majopela@redcylon ~]$ time python test.py
15:50:53 <ajo> real 0m0.094s
15:50:58 <ajo> [majopela@redcylon ~]$ time ./test
15:50:58 <ajo> real 0m0.004s
15:51:37 <carl_baldwin> #action carl_baldwin will look for OS level tuning documentation and either augment it or create it.
15:52:20 <ajo> carl_baldwin, there is an "iproute" patch, and a "sudo" patch, could you add them to the etherpad?
15:52:46 <carl_baldwin> FWIW, my efforts at consolidating system calls to run multiple calls under a single wrapper invocation have shown that it is extremely challenging with little reward.
15:53:18 <carl_baldwin> ajo: I believe those patches are referenced from the etherpad.
15:53:28 <ajo> ah, thanks carl_baldwin
15:53:48 <ajo> you're right, [3] and [2]
15:53:57 <carl_baldwin> ajo: Some of them are rather indirect. I'll fix that.
15:54:24 <carl_baldwin> #action carl_baldwin will fix references to patches to make them easier to spot and follow.
15:54:40 <ajo> carl_baldwin, doing it as we talk, :)
15:54:49 <carl_baldwin> ajo: cool, thanks.
15:56:09 <carl_baldwin> So, ajo and YorikSar We'll be looking forward to seeing what you come up with.
Keep the etherpad up and we'll collaborate there.
15:56:43 <YorikSar> ok
15:56:50 <carl_baldwin> Anything else?
15:57:12 <carl_baldwin> #topic General Discussion
15:57:48 <ajo> carl_baldwin,
15:58:01 <ajo> I've seen neighbour table overflow messages from kernel,
15:58:09 <ajo> when I start lots of networks,
15:58:13 <ajo> have you seen this before?
15:58:36 <ajo> lots (>100)
15:58:43 <safchain> ajo, which plugin/agent ?
15:58:51 <haleyb> ipv6 error? i think we've seen that too
15:58:54 <ajo> normal neutron-l3-agent
15:58:59 <ajo> with ipv4
15:59:24 <ajo> and openvswitch
15:59:27 <carl_baldwin> I believe that we have seen it but I did not work on that issue directly. So, I cannot offer the solution.
15:59:39 <ajo> It's in my todo list
15:59:54 <ajo> I tried to tune the ARP garbage collection settings on the kernel
16:00:03 <ajo> but, I'm not sure if it's namespace related
16:00:28 <haleyb> ajo: found my notes - yes, found and solution is to increase size - gc_thresh*
16:00:31 <carl_baldwin> I've got a hard stop at the hour. Feel free to continue discussion in the neutron room or here if no one has this room.
16:00:52 <carl_baldwin> Thank you all who came and participated.
16:01:05 <safchain> thx carl_baldwin
16:01:08 <haleyb> ajo: neighbor table is shared between all namespaces
16:01:09 <carl_baldwin> Please review the meeting logs and get back to me about potential time change for this meeting.
16:01:17 <Sudhakar_> thanks carl
16:01:22 <carl_baldwin> Bye!
16:01:23 <carl_baldwin> #endmeeting
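
An illustrative aside on the gc_thresh* settings mentioned in the general discussion, not something posted in the meeting. It is a minimal sketch, assuming a Linux host where the IPv4 neighbour-table thresholds are exposed under /proc/sys/net/ipv4/neigh/default. The function names and the 90% margin are hypothetical choices for the example; gc_thresh3 is treated as the hard limit on table entries.

```python
from pathlib import Path


def read_gc_thresh(base="/proc/sys/net/ipv4/neigh/default"):
    """Read gc_thresh1..3 from the given directory, skipping missing files."""
    values = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        path = Path(base) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def near_overflow(neighbour_count, thresholds, margin=0.9):
    """True when the entry count is close to the gc_thresh3 hard limit."""
    limit = thresholds.get("gc_thresh3")
    return limit is not None and neighbour_count >= margin * limit


if __name__ == "__main__":
    # Prints whatever thresholds the current host exposes (empty off Linux).
    print(read_gc_thresh())
```

Comparing the number of neighbour entries on a busy network node against these values would show whether raising gc_thresh*, as suggested in the log, is the relevant fix.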