Wednesday, 2019-04-24

*** auk has joined #kata-dev00:21
*** tmhoang has quit IRC01:55
*** igordc has joined #kata-dev02:43
*** igordc has quit IRC05:49
*** pcaruana has joined #kata-dev06:24
*** auk has quit IRC06:25
*** tmhoang has joined #kata-dev07:47
*** jodh has joined #kata-dev08:06
*** gwhaley has joined #kata-dev08:09
kata-irc-bot2<james.o.hunt> @claire - I think syndicating the weekly emails to the Kata blog is a great idea! :) /cc @eric.ernst.08:23
*** stackedsax has quit IRC09:53
*** stackedsax has joined #kata-dev09:53
brtknrHey all, just reporting on some preliminary fio test results.... why does kata perform so well in the sequential write case compared to sequential read: https://raw.githubusercontent.com/brtknr/kata-vs-crio/master/aggregate-bw-kata.png11:38
brtknrFor comparison, this is the raw baremetal performance: https://raw.githubusercontent.com/brtknr/kata-vs-crio/master/aggregate-bw-bare.png11:39
*** devimc has joined #kata-dev12:42
kata-irc-bot2<graham.whaley> Hi @brtknr - wrt why write might be looking much better than read - my best guess is it might be due to what is being cached and where... you might have to show the exact fio settings you used (and there are sooooo many ;) ).12:52
kata-irc-bot2<graham.whaley> writes could be getting cached in the VM (and not flushed out or sync'd to the host), whereas reads have to come from the host (unless the same item is being read again, and happens to have been cached in the guest)12:52
kata-irc-bot2<graham.whaley> For instance, in our fio metrics tests, we default to using fio direct=true mode to try and avoid such effects12:53
kata-irc-bot2<graham.whaley> https://github.com/kata-containers/tests/blob/master/metrics/storage/fio.sh#L5412:53
kata-irc-bot2<graham.whaley> as we really are interested in measuring the transport mechanism, and not any guest cache effects. Really depends on exactly what you want to measure ;)12:53
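
As a rough illustration of the direct I/O point above (flag values are placeholders, not the settings from the linked fio.sh), a sequential-write job that bypasses the page cache inside the guest might look like:

    # direct=1 opens the file with O_DIRECT, so writes are not absorbed
    # by the guest page cache before they ever reach the host
    fio --name=seqwrite --rw=write --bs=64k --size=2g \
        --ioengine=libaio --direct=1 --numjobs=4 --group_reporting
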
*** irclogbot_3 has quit IRC12:55
*** irclogbot_2 has joined #kata-dev12:55
*** altlogbot_1 has quit IRC12:57
*** altlogbot_0 has joined #kata-dev12:57
*** fuentess has joined #kata-dev12:58
brtknrgwhaley: Thanks for the heads up, I will try again with direct=true :)13:05
brtknrgwhaley: Do you know why sequential read is not getting cached in the same way?13:06
kata-irc-bot2<graham.whaley> brtknr - reads will only be cached once they are read - so, if you read the same thing twice, maybe the 2nd read will hit the cache. sequential will probably never hit the cache, as you only read each item once. random you have some chance (depending on the random algorithm) of reading some items more than once. but13:07
kata-irc-bot2<graham.whaley> this also all depends on how big your test file is, and how big your cache (ram) is... :slightly_smiling_face:13:08
kata-irc-bot2<graham.whaley> it is probably worth you reading the top part (the config settings) of that fio test we have already. fio is very 'flexible' though13:08
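
A quick way to see the read-cache behaviour described above (generic Linux commands, shown only as a sketch; testfile is a hypothetical file name) is to flush the page cache and then read the same file twice:

    # drop the page cache so the first read is genuinely cold
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches

    # first pass comes from disk; a second pass over the same data
    # is served from the page cache and reports much higher throughput
    dd if=testfile of=/dev/null bs=1M
    dd if=testfile of=/dev/null bs=1M
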
brtknrso write caching acts more like a buffer whereas read caching is more like a short term memory?13:13
kata-irc-bot2<graham.whaley> yes, I think you could say that. Of course, reality is a bit more complex...13:15
kata-irc-bot2<graham.whaley> in the case of Kata for instance, as it is a VM, you have caches potentially (but not always) both in the VM (guest) and on the host. There are pros and cons to having either both enabled, or only one enabled. And, I think the setups change depending on what sort of backing store/graph driver is being used (like 9p or devicemapper or virtio-fs etc.). It's moderately complex :slightly_smiling_face:13:17
kata-irc-bot2<graham.whaley> I remember now @brtknr - somebody asked about some of this before on an Issue, and I wrote some stuff down - have a look at https://github.com/kata-containers/tests/issues/560 maybe13:19
brtknrgwhaley: that's great, some nice explanations there, thank you :)13:31
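
For background on the host-side half of that caching stack: with plain QEMU, the block device cache mode determines whether the host page cache is involved at all. The commands below are standard QEMU options shown purely as a sketch (disk.img is a placeholder); the mode Kata actually selects depends on its configuration and the storage driver in use.

    # host page cache bypassed: QEMU opens the image with O_DIRECT
    qemu-system-x86_64 -m 2048 -drive file=disk.img,format=raw,cache=none

    # host page cache used: writes complete once they land in host RAM
    qemu-system-x86_64 -m 2048 -drive file=disk.img,format=raw,cache=writeback
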
brtknrgwhaley: Also looking at the graph again, why does a single kata VM have a significantly better randwrite IO compared to 64 running in parallel?14:23
brtknrDo they have a global lock?14:24
gwhaleybrtknr: are you running 64 katas, or 64 fio tests inside a single kata?14:24
gwhaleythat is - maybe a 64 threaded parallel fio test inside a single kata (to be clear)14:24
brtknrgwhaley: I'm running 64 katas across 2 nodes, 32 instances per node14:24
brtknrSingle fio job per kata14:25
brtknr4 threads per job14:25
brtknrbut I think fio aggregates this result14:25
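
For reference on that aggregation (a sketch, not the job actually run): when numjobs is greater than one and group_reporting is set, fio folds all threads of the job into a single reported bandwidth figure.

    # four parallel threads; fio reports one combined bandwidth line for the group
    fio --name=randwrite --rw=randwrite --bs=4k --size=1g \
        --numjobs=4 --group_reporting
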
gwhaleyright. I've not tried that ;-) So, a single instance is out-performing the total of the 64 instances?14:25
brtknrgwhaley: yep, that's the claim14:25
gwhaleyit could be a latency thing, or a bottleneck somewhere. It's an interesting case/finding14:26
brtknrhttps://raw.githubusercontent.com/brtknr/kata-vs-crio/master/aggregate-bw-kata.png14:26
gwhaleydoes the non-kata case scale more linearly?14:26
brtknrgwhaley: it does: https://raw.githubusercontent.com/brtknr/kata-vs-crio/master/aggregate-bw-runc.png14:26
brtknrhttps://raw.githubusercontent.com/brtknr/kata-vs-crio/master/aggregate-bw-bare.png14:26
brtknrbare=bare metal, runc=well, runc14:27
gwhaleyjust to throw one thought out then - in the kata case, each instance will have its own cache inside the VM, and then share the cache on the host as well. in the bare metal case there is just the one cache on the host. It might be14:28
gwhaleythat kata is therefore consuming more RAM (as it has more caches), so the caches are not being as effective individually.14:28
brtknrgwhaley: that's an interesting thought14:28
gwhaleyyou may be able to get clues if that is the case by changing the size of either/both the size of the test file, and the amount of RAM each kata instance has14:29
brtknrhow would you explain runc?14:30
brtknris that also being cached on the host since they share the same kernel?14:30
gwhaleybrtknr: using the same kernel, but runc will be using the host cache - so only one cache for all instances. So, one big cache. For kata, as it can have caches inside the VMs (guests), they may be duplicating data that is also held in the host cache, and thus effectively reducing the host cache size (consuming memory that may have been used by the host for caching). It is just a thought. It might not actually be the bottleneck or reason :-)14:32
brtknrgwhaley: interesting thought!14:35
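
One way to probe the double-caching theory above (the path, sizes and values are illustrative assumptions, not measured settings) is to vary the guest RAM and the fio working set independently and see whether the scaling gap moves:

    # adjust the RAM handed to each Kata VM, e.g. in
    # /usr/share/defaults/kata-containers/configuration.toml:
    #   default_memory = 2048    # MiB per guest

    # then make the working set larger than the guest page cache, so
    # guest-side caching can no longer mask the host transport cost
    fio --name=seqread --rw=read --bs=64k --size=8g --numjobs=1 --direct=0
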
*** igordc has joined #kata-dev14:42
*** pcaruana has quit IRC15:06
*** tmhoang has quit IRC15:13
*** pcaruana has joined #kata-dev15:46
*** altlogbot_0 has quit IRC16:09
*** altlogbot_1 has joined #kata-dev16:11
*** altlogbot_1 has quit IRC16:43
*** altlogbot_3 has joined #kata-dev16:43
*** igordc has quit IRC16:45
*** altlogbot_3 has quit IRC16:53
*** altlogbot_2 has joined #kata-dev16:53
*** devimc has quit IRC16:57
kata-irc-bot2<eric.ernst> Heads up - sweet post from @salvador.fuentes on our CI/CD, and how we leverage service offerings donated by various cloud providers, live on the blog now! https://medium.com/kata-containers/kata-containers-testing-and-packaging-powered-by-the-cloud-b752de2ee47116:57
*** devimc has joined #kata-dev16:57
brtknrhttps://raw.githubusercontent.com/brtknr/kata-vs-bare-vs-runc/master/scenario-cumulative-aggregate-cf-1-clients.png17:06
brtknrhttps://raw.githubusercontent.com/brtknr/kata-vs-bare-vs-runc/master/scenario-cumulative-aggregate-cf-64-clients.png17:06
brtknrHere's some funky graphs which compare kata-vs-bare-vs-runc all side by side17:07
*** igordc has joined #kata-dev17:16
*** jodh has quit IRC17:23
*** gwhaley has quit IRC17:23
*** igordc has quit IRC20:22
*** igordc has joined #kata-dev20:23
*** igordc has quit IRC20:23
*** pcaruana has quit IRC20:39
*** devimc has quit IRC21:09
*** igordc has joined #kata-dev21:53
*** fuentess has quit IRC21:59
*** davidgiluk has joined #kata-dev22:36
*** davidgiluk has quit IRC22:50

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!