14:00:38 <kevinbenton> #startmeeting networking
14:00:38 <haleyb> hi
14:00:38 <openstack> Meeting started Tue Mar 15 14:00:38 2016 UTC and is due to finish in 60 minutes.  The chair is kevinbenton. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:39 <iwamoto> o/
14:00:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:42 <openstack> The meeting name has been set to 'networking'
14:00:43 <rossella_s> hi
14:00:44 <johnsom> o/
14:01:00 <jschwarz> \o/
14:01:04 <kevinbenton> This will be a relatively short meeting, just a few announcements
14:01:16 <kevinbenton> #topic Announcements/Reminders
14:01:18 <njohnston> o/
14:01:24 <mhickey> hello
14:01:41 <kevinbenton> Today the branch for stable/mitaka is going to be cut
14:01:53 <dasm> \o/
14:01:59 <korzen> hello
14:02:07 <salv-orlando> aloha
14:02:27 <kevinbenton> so any fixes that need to go into mitaka after this will need to be back-ported like we would for any other stable branch
14:02:28 <ajo> o/
14:03:03 <ajo> ack
14:03:09 <vhoward> o/
14:03:20 <kevinbenton> I believe armax and ihrachys have narrowed the bugs down so there are no major blockers for RC-1 that we need to worry about
14:03:38 <kevinbenton> does anyone have any bugs that need to be brought to everyone's attention?
14:04:05 <kevinbenton> (that would be a major issue in Mitaka)
14:04:13 <hichihara> https://bugs.launchpad.net/neutron/+bug/1556884
14:04:13 <openstack> Launchpad bug 1556884 in neutron "floating-ip association is allowed via router interface" [Medium,In progress] - Assigned to YAMAMOTO Takashi (yamamoto)
14:04:52 <kevinbenton> hichihara: thanks, i saw this one as well and it looks like we accidentally added a feature :)
14:04:55 <hichihara> I'm not sure that it's worth fixing for Mitaka. L3 folks should review it.
14:05:27 <kevinbenton> hichihara: it may be worth considering because we don't want to accidentally ship a feature that looks like it works
14:06:13 <hichihara> kevinbenton: I think so
14:06:15 <kevinbenton> hichihara: probably something to propose as a back-port to the mitaka branch before the final release
14:06:23 <reedip__> o/
14:07:09 <kevinbenton> Also, this week is the week to announce PTL candidacies, so if you are interested in being PTL, send an email to the list!
14:07:10 <irenab> kevinbenton: The feature is indeed very useful
14:07:19 <hichihara> kevinbenton: I'm OK.
14:08:17 <hichihara> I haven't seen a Neutron candidacy yet.
14:08:39 <kevinbenton> hichihara: i don't think anyone has sent one yet
14:08:50 <amuller> Armando is suspiciously quiet :)
14:09:00 <ihrachys> hichihara: we still have hope Armando will lead the way ;)
14:09:22 <hichihara> ihrachys: Of course! :)
14:09:23 <kevinbenton> #link https://launchpad.net/neutron/+milestone/mitaka-rc1
14:09:43 <salv-orlando> I've heard Cthulhu wants to run as neutron PTL...
14:09:46 <kevinbenton> ^^ that's the stuff targeted for RC1, keep an eye on anything still open in high or critical status
14:10:00 <ajo> salv-orlando, is he friends with zuul?
14:10:27 <salv-orlando> ajo: they might know each other from some past work experience but I don't think they're friends
14:10:40 * njohnston is highly amused
14:10:45 <rossella_s> lol
14:10:46 <kevinbenton> #info hichihara is bug deputy this week!
14:10:57 <hichihara> Yeah.
14:11:07 <rossella_s> good luck hichihara
14:11:17 <hichihara> I have already started
14:11:43 <mhickey> hichihara: ++
14:11:55 <kevinbenton> #link https://github.com/openstack/neutron/blob/master/doc/source/policies/bugs.rst#neutron-bug-deputy
14:12:01 <kevinbenton> ^^ bug deputy info
14:12:25 <kevinbenton> #topic open discussion
14:12:38 <kevinbenton> i don't have anything else. does anyone have anything they would like to discuss?
14:12:50 <iwamoto> there are some restructure-l2-agent related bugs
14:13:02 <iwamoto> bug/1528895 and bug/1430999
14:13:10 <ajo> salv-orlando: lol
14:13:22 <iwamoto> I wonder if we want a quick fix for the coming release
14:13:56 <ajo> hichihara++
14:14:14 <kevinbenton> i don't think so on bug/1430999. we can advise timeout increases for that as a workaround
14:14:36 <iwamoto> the bugs have been there for more than a release and affect only high density environments
14:15:22 <kevinbenton> iwamoto: yes, i don't think we want to put together last minute chunking fixes for these
14:15:54 <rossella_s> I agree with you kevinbenton ...
14:16:12 <reedip__> we can target such issues in N-1?
14:16:46 <iwamoto> ok
14:16:53 <kevinbenton> yes, we need to clearly identify the bottlenecks anyway
14:17:01 <iwamoto> I think reverting the change is better than increasing timeouts
14:17:05 <rossella_s> we need a more general approach to fix those issues...I think Kevin is working on that, right Kevinbenton?
14:17:09 <iwamoto> the new RPCs don't scale
14:17:11 <kevinbenton> rossella_s: yes
14:17:21 <kevinbenton> iwamoto: wait, revert what?
14:17:39 <iwamoto> batched agent RPC calls
14:18:32 <iwamoto> or impose some limit on the number of ports one RPC can send
14:18:49 <kevinbenton> iwamoto: didn't that ship in liberty?
14:18:50 <ajo> iwamoto, I think the former is better
14:19:16 <ajo> iwamoto,  we could have a parameter for agents (max bulk ports objects call) or something like that
14:19:21 <ajo> objects per call
14:19:47 <iwamoto> yes, liberty has the bug, and at least one person had to increase the timeout as a workaround, it seems
14:19:50 <kevinbenton> breaking the calls into chunks is basically pointless though
14:20:04 <kevinbenton> when increasing the timeout achieves the same effect
14:20:04 <rossella_s> ajo we need something better than that
14:20:14 <rossella_s> kevinbenton you were working on that right? can't find the patch right now
14:20:23 <kevinbenton> rossella_s: yes, there is a spec
14:20:32 <kevinbenton> https://review.openstack.org/#/c/225995/
14:20:46 <ajo> why is it pointless?
14:20:54 <kevinbenton> ajo: what does it achieve?
14:21:07 <kevinbenton> ajo: the agent still sits there and waits for all of the calls to return
14:21:27 <ajo> kevinbenton: smaller subsets that need to be completed
14:21:30 <kevinbenton> ajo: so waiting for 50 smaller calls to return instead of 1 big one doesn't improve anything on the agent side
14:21:38 <iwamoto> IMO agent should  not issue such a gigantic RPC call
14:21:41 <rossella_s> that could be backported when ready to fix these issues
14:21:41 <rossella_s> thanks kevinbenton
14:21:45 <ajo> the agent waits, but if some of the calls time out the succeeded ones don't need to be retried
14:21:56 <amuller> why don't we increase the timeout from 1 min? I've made that suggestion before
14:22:12 <amuller> We know 1 min is too low, we know the value is arbitrary anyway
14:22:15 <amuller> Let's bump it up
14:22:17 <kevinbenton> amuller: +1
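[Editor's note: the knob being discussed is oslo.messaging's `rpc_response_timeout` (default 60 seconds), so "bumping it up" is a one-line change in neutron.conf. The value below is purely illustrative, not a number agreed in this meeting:]

```ini
[DEFAULT]
# Seconds to wait for a response from an RPC call before raising a
# MessagingTimeout. 60 is the inherited default; 300 is only an
# example of a bump, not a value agreed in this meeting.
rpc_response_timeout = 300
```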
14:22:23 <ajo> we can make the timeout dynamic
14:22:37 <ajo> a factor per number of bulk call objects
14:23:09 <ajo> if we can do that in oslo.messaging (not sure if we can dynamically control that per call)
14:23:12 <njohnston> amuller: +1
14:23:30 <ajo> but +1 to just bumping it a bit
14:23:53 <rossella_s> we could avoid such gigantic calls when they are not needed and send only a diff to update the l2 agent
14:23:53 <kevinbenton> we are only chunking because for some reason people have been afraid to increase these timeouts
14:24:13 <reedip__> I agree with Ajo of keeping the bump dynamic
14:24:23 <kevinbenton> rossella_s: i think the only time we get the huge ones is on startup anyway
14:24:54 <ajo> how's timeout controlled?
14:24:59 <ajo> is there any way to set it per RPC call?
14:25:00 <kevinbenton> a configuration variable
14:25:03 <rossella_s> kevinbenton, also on bulk create that might happen
14:25:12 * ajo digs oslo_messaging
14:25:18 <iwamoto> is timeout an issue in other projects besides neutron?
14:25:21 <amuller> ajo: yes it can be per call
14:25:29 <ajo> iwamoto, it is, for example in cinder
14:25:41 <ajo> I know they need to bump it in some deployments
14:25:58 <ajo> amuller, if that's the case, I'd propose controlling it on bulk calls based on the amount of objects
14:26:13 <amuller> we need something now for the Mitaka release
14:26:15 <ajo> and we can have a rpc_bulk_call_timeout_factor
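[Editor's note: a minimal sketch of ajo's proposal, assuming a hypothetical `rpc_bulk_call_timeout_factor` option and relying on oslo.messaging's per-call timeout (`RPCClient.prepare(timeout=...)`, which amuller confirms above). Names and numbers are illustrative only:]

```python
# Sketch: scale the per-call RPC timeout with the size of a bulk call.
# RPC_RESPONSE_TIMEOUT mirrors the existing oslo.messaging option;
# RPC_BULK_CALL_TIMEOUT_FACTOR is the hypothetical knob proposed here.
RPC_RESPONSE_TIMEOUT = 60            # base timeout, seconds
RPC_BULK_CALL_TIMEOUT_FACTOR = 0.5   # extra seconds per object (made up)

def bulk_call_timeout(num_objects):
    """Return a timeout that grows with the number of objects sent."""
    return RPC_RESPONSE_TIMEOUT + RPC_BULK_CALL_TIMEOUT_FACTOR * num_objects

# With oslo.messaging this would be applied per call, roughly:
#   cctxt = client.prepare(timeout=bulk_call_timeout(len(devices)))
#   cctxt.call(context, 'get_devices_details_list', devices=devices)
```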
14:26:16 <kevinbenton> there is almost no reason to not have a high timeout
14:26:24 <ajo> amuller, short term: just bump it
14:26:39 <kevinbenton> short timeouts only protect against completely lost messages, that's it
14:26:44 <rossella_s> amuller, bump it ;)
14:26:51 <ajo> bump!
14:26:55 <reedip__> lol
14:27:02 <ajo> :)
14:27:29 <ajo> Kevinbenton, in fact, they don't even stop the server operation,
14:27:37 <kevinbenton> no, they don't
14:27:43 <ajo> so the impact for server load is even worse, as the operation would be retried
14:27:49 <ajo> so yes
14:28:21 <kevinbenton> from the server's perspective, processing a giant call is not much better than processing smaller calls
14:28:22 <iwamoto> do RPC timeouts serve any positive purpose?
14:28:26 <ajo> it makes sense to raise those timeouts by default, message loss is generally a non-expected event, stuff could freeze for 3-4 minutes in such case and it'd be ok
14:28:31 <kevinbenton> iwamoto: detecting lost messages
14:29:08 <iwamoto> kevinbenton: aren't tcp and amqp supposed to handle that?
14:29:36 <kevinbenton> iwamoto: a server can die after it takes the call message off the queue
14:29:38 <salv-orlando> ajo: while your claim is questionable about system freeze a timeout should be set in a way that 99% of non-problematic calls typically finish within that time
14:30:03 <salv-orlando> so if a call takes over 5 secs 50% of the time, a 5 sec timeout makes no sense, it must be increased
14:30:05 <iwamoto> so it has some meaning in an active-active setup
14:30:20 <kevinbenton> iwamoto: yes
14:30:31 <ajo> salv-orlando, even 99.9% ? :)
14:30:48 <ajo> failing 1 of 100 non-problematic calls also sounds problematic
14:30:49 <ajo> :)
14:31:07 <salv-orlando> ajo: whatever... nines are not my department
14:31:12 <salv-orlando> ;)
14:31:12 <ajo> :)
14:31:20 <ajo> salv-orlando, what timeout do we have now by default?
14:31:35 <kevinbenton> either 30 or 60 seconds
14:31:43 <kevinbenton> it comes from oslo messaging i think
14:31:50 <salv-orlando> kevinbenton: which we arbitrarily chose, didn't we?
14:31:55 <salv-orlando> right inherited
14:32:03 <salv-orlando> so arbitrary from our perspective
14:32:07 <kevinbenton> yep
14:32:27 <amuller> 60 secs
14:32:31 <salv-orlando> I just think a timeout should be set to a realistic value wrt the call you're making
14:32:42 <ajo> I keep thinking, any other number would always be arbitrary...
14:32:52 <ajo> but for now, higher is better
14:33:01 <ajo> from bulk calls perspective
14:33:36 <salv-orlando> ajo: an "educated guess" timeout... not entirely arbitary, come on ;)
14:33:54 <ajo> salv-orlando, yes that's why I say proportional timeouts could be a better approach looking at the long term
14:33:57 <kevinbenton> what about one that grows every time a timeout exception is encountered?
14:34:06 <kevinbenton> and sleeps in between?!
14:34:10 <amuller> kevinbenton: you mean like the patch you already have up for review? =D
14:34:16 <kevinbenton> yeah, that one :)
14:34:21 <amuller> funny you should mention it
14:34:46 <kevinbenton> https://review.openstack.org/#/c/286405/
14:35:11 <ajo> +1 for exponential backoffs (not only extra timeout)
14:35:11 <kevinbenton> This was all an elaborate setup to get everyone to look at unicode table flipping
14:35:17 <haleyb> there's a lot of red on that one :)
14:35:42 <kevinbenton> Add exponential backoff to RPC client: https://review.openstack.org/#/c/280595/
14:36:31 <ajo> hmm, Kevinbenton++
14:36:51 * ajo reviews again
14:36:55 <iwamoto> what's the point in gradually increasing timeouts?
14:37:01 <ajo> garyk1, about https://review.openstack.org/#/c/286405/
14:37:09 <kevinbenton> iwamoto: to make them larger :)
14:37:10 <iwamoto> we can have the max from the beginning
14:37:12 <ajo> I think it could be beneficial now,
14:37:26 <ajo> and later on that could be added into oslo messaging itself
14:37:51 <kevinbenton> iwamoto: the idea is that you want a timeout still to be able to detect lost messages in a reasonably quick time
14:38:45 <kevinbenton> iwamoto: this allows it to be increased for just calls that trigger timeouts if the configuration setting is too low (which will probably be the case for many deployers)
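[Editor's note: the approach in that patch can be sketched roughly as follows. This is a simplified stand-in, not the actual code under review: keep a per-method timeout that starts at the configured value and doubles each time that method hits a timeout exception, up to a cap:]

```python
import collections

class BackoffTimeouts:
    """Per-method RPC timeouts that double after each timeout exception.

    Lost messages are still detected quickly at first, while calls that
    are merely slow (e.g. huge startup syncs) eventually get a timeout
    large enough to succeed, without manual operator tuning.
    """

    def __init__(self, base=60, cap=600):
        self.cap = cap
        self._timeouts = collections.defaultdict(lambda: base)

    def get(self, method):
        # Current timeout to use when issuing this RPC method.
        return self._timeouts[method]

    def record_timeout(self, method):
        # Called when `method` raised a timeout: double its timeout,
        # bounded by the cap, and return the new value.
        self._timeouts[method] = min(self._timeouts[method] * 2, self.cap)
        return self._timeouts[method]
```

[The real patch also sleeps between retries, which this sketch leaves out.]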
14:39:16 <kevinbenton> Let's discuss on that patch
14:39:25 <kevinbenton> does anyone have anything else, or can we end the meeting?
14:40:04 <reedip__> I agree with kevinbenton, timeout should be large so that small delays are ignored, but not so large that it takes an eternity to get the message returned back. Keeping it dynamic helps in having the timeout in an acceptable range for different systems
14:41:51 <salv-orl_> kevinbenton: yay neutron as a learning system....
14:41:55 <salv-orl_> eventually it will be able to play go
14:42:22 <kevinbenton> salv-orl_: i think we need a neural network to determine timeout values
14:42:33 <salv-orl_> kevinbenton: I think you need some sleep
14:42:44 <reedip__> kevinbenton: how many layers of neural network do you need??? ;)
14:43:14 <ajo> kevinbenton, I added you a nit comment: https://review.openstack.org/#/c/280595/7/neutron/common/rpc.py
14:43:41 * haleyb buys stock in Skynet :)
14:43:50 <amuller> kevinbenton: not interested unless it's running on containers
14:43:55 <ajo> to really make the backoffs exponential too, as the literature suggests
14:44:05 <ajo> I'm not an expert in fact, just a reader
14:44:18 <kevinbenton> ok
14:44:26 <kevinbenton> time for meeting to be over i think :)
14:44:30 <kevinbenton> thanks everyone!
14:44:32 <iwamoto> good night
14:44:34 <kevinbenton> #endmeeting