05:31:15 <anil_rao> #startmeeting taas
05:31:17 <openstack> Meeting started Wed Sep  7 05:31:15 2016 UTC and is due to finish in 60 minutes.  The chair is anil_rao. Information about MeetBot at http://wiki.debian.org/MeetBot.
05:31:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
05:31:20 <openstack> The meeting name has been set to 'taas'
05:31:23 <soichi> hi
05:31:26 <kaz> hi
05:31:27 <anil_rao> Hi
05:32:04 <anil_rao> #topic  Performance measurement (progress report)
05:33:10 <soichi> #link: https://wiki.openstack.org/w/images/2/22/Increasing_Source_VMs-20160906.png
05:33:14 <kaz> I uploaded a document about the performance measurement last week.
05:33:35 <kaz> I guess the cause was that softirq processing was concentrated on one physical cpu.
05:34:28 <kaz> So we tried to balance softirq among cpus.
05:34:58 <anil_rao> kaz: That is interesting to see.
05:35:23 <kaz> It seems that the received packet rates without mirroring have increased.
05:35:31 <kaz> i guess because of the softirq balancing.
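(The slides do not say exactly how the softirq load was spread, so the following is only a minimal sketch of one common approach, Receive Packet Steering (RPS). The interface name and CPU mask are assumptions; irqbalance or writing /proc/irq/<n>/smp_affinity would be alternative ways to achieve the same balancing.)

```python
#!/usr/bin/env python
# Hedged sketch: spread NIC receive-side softirq work across CPUs via RPS.
# IFACE and CPU_MASK are hypothetical values, not taken from the measurement setup.
import glob

IFACE = "eth0"      # assumed NIC carrying the iperf traffic
CPU_MASK = "ff"     # hex bitmask: allow receive processing on CPUs 0-7

for rps_file in glob.glob("/sys/class/net/%s/queues/rx-*/rps_cpus" % IFACE):
    with open(rps_file, "w") as f:   # requires root
        f.write(CPU_MASK)
    print("set RPS mask %s on %s" % (CPU_MASK, rps_file))
```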
05:37:10 <anil_rao> Can you please clarify what you mean by "when source VMs are increased"?
05:38:17 <kaz> please see last week's slide, page 1
05:38:43 <soichi> #link: https://wiki.openstack.org/w/images/7/74/Increasing_Source_VMs-20160831.png
05:39:24 <kaz> this means that the number of source vms is increasing while the number of destination vms and monitor vms is fixed.
05:39:51 <anil_rao> Thanks.
05:40:13 <vnyyad> Kaz: so overall the total number of flows being monitored is increased
05:40:39 <kaz> yes
05:40:51 <anil_rao> So, if I am reading this right, when mirroring is enabled, we halve the throughput of the receiving VM but send at the same rate to the monitor VM.
05:41:57 <kaz> yes, i think so
05:42:00 <vnyyad> anil_rao: looks like
05:44:29 <soichi> last week, i got several valuable comments from anil
05:44:42 <soichi> 1) it would be better to also measure the TCP case
05:45:37 <soichi> 2) it would be better to use the SIPp benchmark, too
05:46:29 <anil_rao> soichi: I am not sure if those cases are better but they would serve to highlight other aspects. :-)
05:46:48 <soichi> okay, i see
05:48:38 <anil_rao> kaz: I did not fully understand the last (2nd) bullet item below the graph in slide #4.
05:49:34 <anil_rao> Compared to the results in last week's graph both cases (w and w/o mirroring) have improved after IRQ balancing.
05:55:22 <anil_rao> Without mirroring, the receiving VM is getting between 200K and 250K pps. With mirroring it gets between 100K and 130K pps, but the monitor VM also receives at the same rate.
05:56:17 <anil_rao> Both the receiving VM and the monitor VM are on the same host, so we are essentially dealing with the same volume  of traffic, split between 2 VMs.
05:56:42 <anil_rao> Shouldn't this be expected?
05:57:30 <vnyyad> +1
05:58:24 <kaz> sorry, i don't know why, yet.
05:59:52 <anil_rao> What iperf is doing is maxing out the bandwidth (for any given packet size). So once we have reached that point without mirroring and then turn on mirroring, we can expect the performance to be halved.
06:00:25 <soichi> +1
06:00:57 <anil_rao> It would be interesting to drive the receiving VM at say 100K pps without mirroring and then turn on mirroring. In that case we should expect no change in the performance of the receiving VM.
06:01:30 <soichi> i think so, too
06:01:37 <anil_rao> The receiving VM should continue to get 100K pps but the monitor VM should get the same rate too.
06:01:50 <soichi> sure
06:02:15 <kaz> I will try.
06:02:35 <anil_rao> Thanks kaz.
06:02:44 <vnyyad> thanks
06:03:38 <anil_rao> Looking at last week's results I see the same behavior there too. I.e., even without IRQ balancing, when mirroring was turned on, the receiving VM + monitor VM together were getting the same rate as just the receiving VM without mirroring.
06:04:07 <anil_rao> IRQ balancing has definitely helped improve the overall host throughput.
06:04:30 <kaz> yes
06:04:37 <vnyyad> yeah
06:05:12 <anil_rao> These are good results!
06:05:31 <kaz> thank you
06:06:28 <anil_rao> To actually demonstrate the overhead of monitoring, it might be better to not saturate the system's bandwidth limit. I.e. we keep enough room for the extra volume generated from mirroring. This way we should be able to show that mirroring doesn't affect the receiving VM (or at least that is the goal)
06:06:28 <reedip> hi
06:07:00 <anil_rao> reedip: Hi
06:07:05 <soichi> reedip: hi
06:07:13 <reedip> sorry, was late, reading up the logs
06:08:03 <soichi> anil_rao: agree
06:08:14 <kaz> anil_rao: +1
06:09:40 <anil_rao> Here is a proposal for the test.
06:09:54 <anil_rao> Compute highest throughput for the receiving VM.
06:10:17 <anil_rao> Send at less than half that rate to the receiving VM (for multiple source VMs)
06:10:24 <anil_rao> Enable mirroring.
06:10:47 <anil_rao> See the difference in the rate at the receiving VM and the monitor VM.
06:11:01 <anil_rao> Expected result: No change to receiving VM. Same rate at monitor VM.
06:11:23 <anil_rao> In reality there might be a little difference and we should report that.
06:12:15 <kaz> OK, i will try that.
06:12:52 <anil_rao> Thanks kaz. I look forward to the results.
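(A minimal sketch of the test proposed above, assuming iperf3 as the traffic generator. The addresses, the target rate, and the way mirroring is enabled are hypothetical placeholders; the actual setup would use whatever generator and TaaS configuration kaz's environment already has.)

```python
#!/usr/bin/env python
# Hedged sketch: drive the receiving VM at a fixed rate well below its measured
# maximum, then enable mirroring and repeat, comparing the two runs.
import subprocess

RECEIVING_VM = "10.0.0.10"   # assumed address of the receiving VM
TARGET_RATE = "50M"          # assumed rate, below half of the measured maximum
DURATION = "30"              # seconds per run

def run_iperf(label):
    # iperf3 UDP client at a fixed offered rate; -J emits JSON for later parsing
    out = subprocess.check_output(
        ["iperf3", "-c", RECEIVING_VM, "-u", "-b", TARGET_RATE, "-t", DURATION, "-J"])
    print(label, ":", len(out), "bytes of JSON results")

run_iperf("baseline (mirroring off)")
# ... enable mirroring here, e.g. via the TaaS API / tap-flow creation ...
run_iperf("with mirroring on")
# Expected result: the receiving VM's rate is unchanged; the monitor VM sees the same rate.
```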
06:12:54 <soichi> i guess we can see an increase in CPU usage on the host
06:13:21 <soichi> after enabling mirroring
06:13:38 <anil_rao> soichi: Yes. That would be nice to measure.
06:13:43 <kaz> soichi: I think so
06:14:00 <anil_rao> If we don't hit 100% we get the true overhead, otherwise the overhead is clipped and we don't get a worthwhile result.
06:14:19 <soichi> +1
06:14:39 <kaz> I agree
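(To measure the host CPU overhead mentioned above, one simple option is sampling /proc/stat around the run; this is only a sketch of that idea, and tools like mpstat or sar would do the same job. The sampling window is an assumption.)

```python
#!/usr/bin/env python
# Hedged sketch: compute aggregate host CPU busy % over a measurement window
# from the first "cpu" line of /proc/stat.
import time

def cpu_sample():
    with open("/proc/stat") as f:
        fields = [float(x) for x in f.readline().split()[1:]]
    return fields[3], sum(fields)   # (idle jiffies, total jiffies)

idle0, total0 = cpu_sample()
time.sleep(10)                      # assumed measurement window
idle1, total1 = cpu_sample()
busy = 100.0 * (1.0 - (idle1 - idle0) / (total1 - total0))
print("host CPU busy: %.1f%%" % busy)
```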
06:15:10 <anil_rao> If folks are interested, we can discuss the TaaS bug related to ingress side mirroring.
06:15:24 <kaz> sure
06:15:24 <vnyyad> ani_rao: +1
06:15:28 <soichi> +1
06:15:36 <anil_rao> #topic Open Discussion
06:16:18 <anil_rao> I had sent out a mail to the Neutron mailing list with a detailed description of the problem, its root cause, and a proposal to move forward.
06:17:10 <anil_rao> In summary, given the way OVS treats VLAN tagged ports on a host, we don't have any options left for solutions completely within the scope of TaaS.
06:18:00 <anil_rao> We will need the core Neutron OVS driver to explicitly tag VLAN ids for packets coming in to br-int from the 'instance' ports.
06:18:15 <soichi> +1
06:18:47 <anil_rao> I am prototyping this solution and will report back to the mailing list when I have a working version.
06:18:59 <vnyyad> anil_rao: Can we handle this specific case by not forwarding the mirrored traffic to br-tap but handling it in br-int?
06:19:25 <vnyyad> it will be a crude solution but might work
06:20:03 <anil_rao> vnyyad: We cannot avoid forwarding to br-tap because the mirror destination may be on a different host.
06:20:30 <anil_rao> Here is the basic problem.
06:20:36 <vnyyad> hmmm... yes true realized it...
06:20:46 <anil_rao> OVS does not tag packets flowing within the same host's br-int.
06:21:12 <anil_rao> Neutron specifies that port MACs are unique only within a network.
06:21:39 <anil_rao> This means that it is (technically) possible for two ports on different networks but on the same host to have the same MAC.
06:21:50 <vnyyad> yes
06:22:15 <anil_rao> If these two networks belong to different tenants, TaaS would have really broken tenant isolation because we would leak traffic of one tenant to another.
06:22:34 <soichi> yes
06:22:40 <kaz> +1
06:22:49 <vnyyad> yes... a thin chance of happening but nevertheless can happen
06:23:06 <anil_rao> vnyyad: Yes. :-(
06:24:20 <anil_rao> My prototype involves having the Neutron driver explicitly add a vlan tag (corresponding to the port) for all packets coming in via that port. After that TaaS works without any modification.
06:24:45 <soichi> +1
06:25:08 <anil_rao> This way our current solution for broadcast/multicast ingress traffic also works as is.
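(A minimal sketch of the idea behind the prototype, not the actual Neutron patch: push the port's local VLAN onto packets as they enter br-int from an instance port, so they stay distinguishable even when MACs collide across networks. The OpenFlow port number and VLAN id below are hypothetical.)

```python
#!/usr/bin/env python
# Hedged sketch: tag untagged packets entering br-int from one instance port
# with that port's local VLAN, then continue with NORMAL forwarding.
import subprocess

OF_PORT = 5       # assumed OpenFlow port number of the instance's vNIC on br-int
LOCAL_VLAN = 3    # assumed local VLAN id Neutron gave the port's network

flow = ("priority=25,in_port=%d,vlan_tci=0x0000,"
        "actions=mod_vlan_vid:%d,normal" % (OF_PORT, LOCAL_VLAN))
subprocess.check_call(["ovs-ofctl", "add-flow", "br-int", flow])
```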
06:25:34 <vnyyad> any rationale why they don't tag? maybe it's an optimization
06:25:34 <soichi> it sounds good
06:25:45 <vnyyad> but this solution should be good to have
06:26:23 <soichi> i have another topic
06:26:29 <anil_rao> When OVS bridges work in normal mode they operate as a legacy switch and just keep track of ports and tags internally without having to actually tag packets.
06:27:00 <anil_rao> Neutron has also set br-int in legacy (or normal) mode for its typical operation. So everything seemed good until now.
06:27:21 <vnyyad> ok
06:27:29 <anil_rao> We are the first application that is trying to detect packets ingressing a VM's vNIC in br-int. However, I am sure there will be others soon.
06:27:43 <vnyyad> for sure
06:28:37 <soichi> anil_rao: would you please submit to the vBrownBag Tech Talks at Barcelona Summit? Submission deadline: Sep. 15th
06:28:38 <anil_rao> Looks like we are running out of time. Any other topics
06:28:38 <soichi> in my understanding, the speakers will be anil, kaz, and reedip (3 min. each?)
06:28:53 <anil_rao> soichi: I will do that tomorrow morning.
06:29:03 <soichi> okay, thank you
06:30:32 <anil_rao> We'll continue the discussion next week.
06:30:38 <soichi> bye
06:30:42 <kaz> bye
06:30:43 <anil_rao> #endmeeting