Friday, 2020-11-13

kata-irc-bot<eric.ernst> :thread: shim size comparison for 2.0 and 1.x00:19
kata-irc-bot<eric.ernst> Was doing some quick testing to compare shim size between 2.0 and 1.x.00:20
kata-irc-bot<eric.ernst> ```
| shim | ps rss (kB) | total program size | resident set size | shared pages | text (code) | library | data/stack | dirty pages |
|---|---|---|---|---|---|---|---|---|
| 2.x | 33920 | 250093 | 8480 | 4521 | 4176 | 0 | 43064 | 0 |
| 1.x | 25252 | 248467 | 6313 | 4393 | 4989 | 0 | 43120 | 0 |
```00:25
kata-irc-bot<eric.ernst> So, code is less, which I expect based on some of what we removed for store, etc.00:26
kata-irc-bot<eric.ernst> But the memory usage is significantly higher. I would have expected maybe parity, or less, but not an ~8MB increase.00:27
kata-irc-bot<eric.ernst> That's eating into the benefits of the reduced guest/agent footprint.00:27
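The per-shim figures above appear to combine the RSS reported by `ps` (in kB) with the seven fields of `/proc/<pid>/statm` (in pages). A minimal Python sketch of that conversion follows, assuming 4 KiB pages and the field order given in proc(5) (size, resident, shared, text, lib, data, dt). The helper names `parse_statm` and `read_statm` are illustrative and are not part of Kata.

```python
import os

# Field order of /proc/<pid>/statm as documented in proc(5); values are in pages.
STATM_FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(line, page_size=4096):
    """Return a dict mapping each statm field to its size in kB."""
    pages = [int(value) for value in line.split()]
    if len(pages) != len(STATM_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(STATM_FIELDS), len(pages)))
    return {name: count * page_size // 1024 for name, count in zip(STATM_FIELDS, pages)}


def read_statm(pid="self"):
    """Read and convert the statm line of a live process (Linux only)."""
    with open("/proc/%s/statm" % pid) as handle:
        return parse_statm(handle.read(), os.sysconf("SC_PAGE_SIZE"))


# Page counts copied from the 2.x and 1.x rows of the comparison (assumed 4 KiB pages).
shim_2x = parse_statm("250093 8480 4521 4176 0 43064 0")
shim_1x = parse_statm("248467 6313 4393 4989 0 43120 0")
rss_delta_kb = shim_2x["resident"] - shim_1x["resident"]
print(rss_delta_kb)  # 8668 kB, i.e. the ~8MB gap
```

With 4 KiB pages, the resident page counts reproduce the `ps` RSS values exactly (8480 pages is 33920 kB), which suggests the two measurements agree and the gap is not a tooling artifact.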
*** bumperSteff has quit IRC00:57
*** bumperSteff has joined #kata-dev00:57
*** sameo has quit IRC00:57
*** auk_ is now known as auk01:46
*** auk_ has joined #kata-dev03:45
*** auk has quit IRC03:46
*** auk_ is now known as auk03:55
kata-irc-bot<eric.ernst> @liubin0329 thanks for adding pprof.  I'm going to take a look now.04:07
kata-irc-bot<bergwolf> yeah, that's totally unexpected05:47
kata-irc-bot<eric.ernst> this is just swapping agent/runtime, rest of it is the same on the system.05:48
kata-irc-bot<eric.ernst> https://gist.github.com/egernst/f3f786d38f92c7a7f20baf581f490e6d05:50
kata-irc-bot<bergwolf> this one? containerd-shim-kata-v2 threads 3 vs. 505:53
kata-irc-bot<liubin0329> Seems related to Prometheus: ```
(pprof) top
Showing nodes accounting for 2581.10kB, 100% of 2581.10kB total
Showing top 10 nodes out of 36
      flat  flat%   sum%        cum   cum%
  528.17kB 20.46% 20.46%   528.17kB 20.46%  github.com/kata-containers/kata-containers/src/runtime/containerd-shim-v2.glob..func1
  516.76kB 20.02% 40.48%  1028.79kB 39.86%  encoding/json.typeFields
  512.14kB 19.84% 60.33%  1024.14kB 39.68%  github.com/prometheus/client_golang/prometheus.(*metricMap).getOrCreateMetricWithLabelValues
  512.02kB 19.84% 80.16%  1028.79kB 39.86%  encoding/json.newStructEncoder
     512kB 19.84%   100%      512kB 19.84%  github.com/prometheus/client_golang/prometheus.makeLabelPairs
         0     0%   100%   516.76kB 20.02%  encoding/json.(*Encoder).Encode
         0     0%   100%   512.02kB 19.84%  encoding/json.(*decodeState).object
         0     0%   100%   512.02kB 19.84%  encoding/json.(*decodeState).unmarshal
         0     0%   100%   512.02kB 19.84%  encoding/json.(*decodeState).value
         0     0%   100%   516.76kB 20.02%  encoding/json.(*encodeState).marshal
```05:55
kata-irc-bot<liubin0329> Wait, I'm using FC, let me change to QEMU and check again.05:56
*** auk has quit IRC07:16
*** sgarzare has joined #kata-dev07:51
*** pcaruana has joined #kata-dev07:52
*** jodh has joined #kata-dev07:59
*** sameo has joined #kata-dev08:11
*** fgiudici has joined #kata-dev08:20
*** david-lyle has joined #kata-dev08:47
*** sgarzare_ has joined #kata-dev08:47
*** dklyle has quit IRC08:47
*** sgarzare has quit IRC08:47
*** snir has quit IRC08:49
*** snir has joined #kata-dev08:50
*** david-lyle has quit IRC08:58
*** davidgiluk has joined #kata-dev09:07
*** th0din has joined #kata-dev10:19
*** devimc has joined #kata-dev12:32
*** jodh_ has joined #kata-dev13:38
*** jodh has quit IRC13:40
*** EricAdamsZNC2 has quit IRC13:40
*** EricAdamsZNC has joined #kata-dev13:42
*** sameo has quit IRC14:03
*** devimc has quit IRC14:12
*** devimc has joined #kata-dev14:15
*** crobinso has joined #kata-dev14:35
kata-irc-bot<christophe> Could someone help me understand the failure in http://jenkins.katacontainers.io/job/kata-containers-2.0-fedora-PR/560? The change is https://github.com/kata-containers/kata-containers/pull/1114, and the fedora-crio test fails with ```
Failed at 48: chronic sudo -E yum install -y kubelet-"$install_kubernetes_version" kubeadm-"$install_kubernetes_version" kubectl-"$install_kubernetes_version" --disableexcludes=kubernetes
Kubernetes not installed
Openshift not installed
Disable systemd-journald rate limit
Terminated
++ handle_error 69
++ local exit_code=143
++ local line_number=69
++ echo 'Failed at /tmp/jenkins/workspace/kata-containers-2.0-fedora-PR/ci_entry_point.sh +69: .ci/jenkins_job_build.sh "${repo_to_test}"'
```15:05
kata-irc-bot<fidencio> Let me take a look.15:08
kata-irc-bot<fidencio> ```Install Kubernetes components Build timed out (after 5 minutes). Marking the build as aborted. Build was aborted Performing Post build task...```15:10
kata-irc-bot<fidencio> It does look like a flake, and it does seem to be safe enough to just restart that specific CI.15:12
kata-irc-bot<fidencio> @christophe:  ^15:12
kata-irc-bot<eric.ernst> @c3d is the workload being constrained?15:16
kata-irc-bot<eric.ernst> Adjusting score seems okay in general, but I’m interested in better understanding the failure here. We don’t support “bestEffort” (unconstrained) very well.15:16
kata-irc-bot<christophe> @eric.ernst Well, there is a problem here that the OOM killer adjustment applies also to children. Is that what you are referring to? But in the original bug, the OOM killer specifically killed the agent.15:18
kata-irc-bot<eric.ernst> Well, before getting into that, I’m also interested, just in general, in how this can be reproduced. I’m guessing there isn’t a memory limit applied to the container’s cgroup in the guest.15:19
kata-irc-bot<christophe> Ah, this happened while running a fuzzer in a container with vfio, originally. So I'm not sure how easy it is to reproduce15:20
kata-irc-bot<christophe> I don't think there was a specific constraint.15:20
kata-irc-bot<christophe> I will ask. Unfortunately, it's on a private Bugzilla.15:20
kata-irc-bot<eric.ernst> The workload was a fuzzer? What’s the workload spec, or cmdline?15:21
kata-irc-bot<eric.ernst> If you don’t constrain a memory-greedy workload, bad things happen.15:21
kata-irc-bot<christophe> ```podman --runtime=kata-vfio run --security-opt label=type:container_kvm_t -it --rm --cap-add=CAP_IPC_LOCK --device=/dev/vfio/120 --device=/dev/vfio/vfio  fedora sh
Now inside the container:
# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make
# ./random -x 0-100 -f (which just runs some syscall fuzzing)```15:21
kata-irc-bot<eric.ernst> --memory=(something sane)15:21
kata-irc-bot<christophe> So no, no memory constraint15:21
kata-irc-bot<eric.ernst> Maybe we can fail more gracefully .. maybe.15:22
kata-irc-bot<christophe> Can we have that discussion on the issue itself, BTW, so that there is a record?15:22
kata-irc-bot<eric.ernst> OOM adj as well, but... I can see why this happens.15:22
*** sameo has joined #kata-dev15:28
*** devimc has quit IRC15:39
*** devimc has joined #kata-dev15:39
*** dklyle has joined #kata-dev15:48
*** pcaruana has quit IRC16:09
*** sgarzare_ has quit IRC17:00
kata-irc-bot<fidencio> @jose.carlos.venegas.m, @salvador.fuentes, hola! :slightly_smiling_face:17:07
kata-irc-bot<fidencio> I'd like to ask what exactly is tested with containerd on the 2.x branch, as I'd like to reach parity between what's tested with containerd and what's tested with CRI-O. In the past few days I was able to enable the `bats` that were skipped on CRI-O, meaning that we're fine in that part.  But what are the other bits that would need some love? :slightly_smiling_face:17:07
kata-irc-bot<fidencio> Adding @wmoschet and @cmeadors to the loop as well.17:07
kata-irc-bot<fidencio> And @fgiudici. :slightly_smiling_face:17:14
kata-irc-bot<jose.carlos.venegas.m> @fidencio Hey17:14
kata-irc-bot<jose.carlos.venegas.m> Let me check whether I can tell you based on the CI files. @salvador.fuentes and @gabriela.cervantes.te, who are the ones who would know quickly, are offline today.17:15
kata-irc-bot<fidencio> Ah, @jose.carlos.venegas.m, don't need to spend time on this now.17:16
kata-irc-bot<fidencio> I can poke them again next week, no problem at all.17:16
kata-irc-bot<jose.carlos.venegas.m> @fidencio Sure, no problem. They are back next week.17:16
kata-irc-bot<fidencio> Thanks!17:17
fgiudiciroot17:18
fgiudiciups :-P17:18
davidgilukPassword:17:34
fidenciohunter217:37
*** jodh_ has quit IRC18:02
fgiudicilol18:27
*** fgiudici has quit IRC18:29
kata-irc-bot<eric.ernst> @bergwolf -- yeah. I see that in the pmap, but when checking /proc/$PID/status, I see # threads listed as 9 for each. The ~8MB is coming from RssAnon (not terribly useful)18:43
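The thread count and the RssAnon figure mentioned above both come from `/proc/$PID/status`. A minimal Python sketch for pulling them out is shown below. It assumes the `Threads`, `RssAnon`, `RssFile` and `RssShmem` lines described in proc(5); the helper names and the sample text are hypothetical and are not measurements from the shim.

```python
def parse_status(text):
    """Parse /proc/<pid>/status text into a dict of raw string values."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def memory_breakdown(fields):
    """Extract the thread count and RSS components (kB) from parsed status fields."""
    def kb(name):
        # Values look like "20000 kB"; fields missing on older kernels become None.
        raw = fields.get(name)
        return int(raw.split()[0]) if raw else None

    return {
        "threads": int(fields["Threads"]),
        "rss_anon_kb": kb("RssAnon"),
        "rss_file_kb": kb("RssFile"),
        "rss_shmem_kb": kb("RssShmem"),
    }


# Hypothetical sample text; only the thread count of 9 echoes the discussion.
sample = (
    "Name:\tshim\n"
    "Threads:\t9\n"
    "RssAnon:\t   20000 kB\n"
    "RssFile:\t   13000 kB\n"
    "RssShmem:\t       0 kB\n"
)
summary = memory_breakdown(parse_status(sample))
print(summary)
```

Running the same helper against both shim versions and diffing the results would show whether the extra memory is anonymous (heap, stacks) or file-backed (binary and libraries).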
*** davidgiluk has quit IRC20:11
*** auk has joined #kata-dev21:19
*** devimc has quit IRC22:03
*** th0din has quit IRC22:21
*** th0din has joined #kata-dev22:24
*** ajin has quit IRC22:37
kata-irc-bot<eric.ernst> So, I still think the shim binary is super large, and I hope we can look into reducing size if feasible, but I think I found some of the issues22:39
kata-irc-bot<eric.ernst> 1. On 1.x there is a bug in the makefile: buildmode=pie was dropped. Yikes.  This accounts for a ~20% reduction in binary size. Opened https://github.com/kata-containers/runtime/issues/3074 and will fix, which will make 1.x almost as large as 2.0.
2. I noticed that on 2.0 we link a couple of libraries now: ```
required from libc.so.6:
    0x0d696914 0x00 06 GLIBC_2.4
    0x09691974 0x00 04 GLIBC_2.3.4
    0x09691a75 0x00 02 GLIBC_2.2.5
```22:41
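One quick way to tell whether a given shim binary was built with `buildmode=pie` is to look at the `e_type` field of its ELF header: position-independent executables are marked `ET_DYN`, while fixed-address ones are `ET_EXEC`. The Python sketch below is a heuristic under that assumption (shared libraries are also `ET_DYN`); the function names are illustrative and the sample headers are synthetic.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object; PIE executables also use this type


def elf_type(header):
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    return struct.unpack_from(byte_order + "H", header, 16)[0]


def looks_like_pie(path):
    """Heuristic: an executable linked as PIE has e_type ET_DYN."""
    with open(path, "rb") as handle:
        return elf_type(handle.read(18)) == ET_DYN


# Synthetic little-endian 64-bit headers, for illustration only.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
pie_header = ident + struct.pack("<H", ET_DYN)
exec_header = ident + struct.pack("<H", ET_EXEC)
print(elf_type(pie_header) == ET_DYN, elf_type(exec_header) == ET_EXEC)
```

Calling `looks_like_pie()` on the installed 1.x and 2.0 shim binaries (their paths depend on the installation) would confirm whether the build modes differ before comparing sizes.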
kata-irc-bot<eric.ernst> @samuel.ortiz - thanks for the soundboard on some of this.22:42
kata-irc-bot<eric.ernst> This shim is run per pod, and is just as critical as the agent and the VMM and the guest kernel for reducing the footprint of Kata. We should monitor this closely, and see what we can do to reduce it (am not saying rewrite in rust!)22:50
kata-irc-bot<eric.ernst> devimc @archana.m.shinde @chen.bo PTAL: https://github.com/kata-containers/runtime/pulls?q=is%3Apr+is%3Aopen+shim-v2++buildmode22:56
*** sameo has quit IRC23:52
