*** sameo has joined #kata-dev | 05:07 | |
*** ailan_ has joined #kata-dev | 05:37 | |
*** pcaruana has joined #kata-dev | 06:31 | |
*** jodh has joined #kata-dev | 06:43 | |
*** sgarzare has joined #kata-dev | 07:23 | |
*** davidgiluk has joined #kata-dev | 08:01 | |
*** sgarzare has quit IRC | 08:08 | |
*** gwhaley has joined #kata-dev | 08:08 | |
*** sgarzare has joined #kata-dev | 08:16 | |
*** tmhoang has joined #kata-dev | 08:57 | |
*** ailan_ has quit IRC | 09:21 | |
*** ailan_ has joined #kata-dev | 10:58 | |
*** jodh has quit IRC | 11:58 | |
*** vgoyal has joined #kata-dev | 12:16 | |
*** devimc has joined #kata-dev | 12:35 | |
*** pcaruana has quit IRC | 12:39 | |
*** pcaruana has joined #kata-dev | 12:40 | |
*** hashar has joined #kata-dev | 12:53 | |
kata-irc-bot | <fidencio> Howdy! So, last week we talked a bit about possible gating tests we could use for Fedora packages. libpod gating tests seemed like a good start and @wmoschet gave it a try. Now, I'm looking at the results and trying to understand whether the failures are "expected" or not (so, sorry, but I'll end up dropping a few questions here today, this being the first one). | 12:59 |
---|---|---|
kata-irc-bot | <fidencio> Is kata-runtime supposed to deal with uidmappings? I mean, something like `podman run --rm --uidmap 0:100:1000 fedora mount` ? I'm currently getting the following error: `Error: rpc error: code = Unknown desc = User namespaces enabled, but no user mapping found.: OCI runtime error` Of course, it works as expected when not using kata as the runtime. | 12:59 |
fidencio | devimc: ^ :-) | 13:05 |
fidencio | and buenos días! | 13:06 |
devimc | fidencio: buenos dias | 13:16 |
devimc | fidencio: take a look https://github.com/kata-containers/tests/blob/master/.ci/podman/configuration_podman.yaml | 13:16 |
fidencio | devimc: cool, so it basically means that if it's not in the list, it's not supported and intended to not be supported for now. Do I understand it correctly? | 13:18 |
devimc | fidencio: yup | 13:20 |
fidencio | devimc: cool, that was easy. Is it worth opening issues for (at least some of) the non-supported cases? | 13:22 |
fidencio | devimc: it could help us to keep track of things that are failing, in general | 13:23 |
devimc | fidencio: yes, it'd great | 13:24 |
devimc | *it'd be | 13:24 |
fidencio | devimc: okay, I'll open the issues against the "test" repo | 13:25 |
fidencio | devimc: thanks for the help! | 13:25 |
devimc | fidencio: yw ;) | 13:25 |
*** hashar has quit IRC | 13:33 | |
kata-irc-bot | <wmoschet> @fidencio is it one of the cases that failed in my tests? | 13:41 |
kata-irc-bot | <wmoschet> Another question: is there such a list of non-supported cases for cri-o as well? | 13:42 |
kata-irc-bot | <fidencio> @wmoschet, yep, that's one of your cases and I'm going through case by case to check what's the reason of the failure | 13:47 |
kata-irc-bot | <fidencio> and then I'll open issues on kata-containers that we can end up pointing to podman itself if something could be improved on their side | 13:47 |
kata-irc-bot | <wmoschet> @fidencio got it. Then for those non-supported cases I can change my script to filter out the tests | 13:47 |
kata-irc-bot | <wmoschet> cool | 13:47 |
kata-irc-bot | <fidencio> Yep, for now we'll just skip the ones failing ... just gimme some time to go through all of them, so we can at least annotate which one we expect to be an easy fix, and whatnot | 13:48 |
kata-irc-bot | Action: fidencio has a meeting in a few minutes, so it'll take some time :slightly_smiling_face: | 13:48 |
kata-irc-bot | <wmoschet> @fidencio sure, I have plenty of other things to do...so no pressure. And I am returning from vacation and still in slow motion mode :slightly_smiling_face: | 13:49 |
kata-irc-bot | <wmoschet> s/vacation/extended weekend/ | 13:50 |
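The skip-list approach discussed above (run the libpod gating tests, but filter out the cases Kata does not support yet and annotate why) could be sketched roughly as follows. The helper `filter_tests`, the test names, and the annotation text are hypothetical; the real list of supported cases lives in the configuration_podman.yaml file linked earlier, whose format is not shown in this log.

```python
def filter_tests(all_tests, unsupported):
    """Split test names into those to run and those to skip.

    all_tests   -- list of test names, in execution order
    unsupported -- dict mapping a test name to a short note
                   (e.g. a reason or an issue reference)
    """
    to_run = [name for name in all_tests if name not in unsupported]
    skipped = [(name, unsupported[name]) for name in all_tests if name in unsupported]
    return to_run, skipped


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    tests = ["run basic", "run uidmap", "stop"]
    notes = {"run uidmap": "user namespaces not supported for now"}
    run, skip = filter_tests(tests, notes)
    print("run:", run)
    for name, note in skip:
        print("skip:", name, "->", note)
```

Keeping a note next to each skipped case makes it easier to tell later which failures were expected and which are candidates for an easy fix.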
*** pcaruana has quit IRC | 13:52 | |
*** pcaruana has joined #kata-dev | 13:52 | |
*** sbrivio has quit IRC | 14:03 | |
*** st3 has joined #kata-dev | 14:04 | |
*** devimc has quit IRC | 14:38 | |
*** devimc has joined #kata-dev | 14:39 | |
*** hashar has joined #kata-dev | 14:45 | |
*** hashar has quit IRC | 14:50 | |
*** hashar has joined #kata-dev | 15:51 | |
*** gwhaley has quit IRC | 17:02 | |
*** sgarzare has quit IRC | 17:03 | |
*** ailan__ has joined #kata-dev | 18:22 | |
*** Jeffrey4l has quit IRC | 18:23 | |
*** ailan_ has quit IRC | 18:24 | |
*** Jeffrey4l has joined #kata-dev | 18:31 | |
*** dklyle has joined #kata-dev | 18:36 | |
*** Jeffrey4l has quit IRC | 18:36 | |
*** Jeffrey4l has joined #kata-dev | 18:43 | |
fidencio | devimc: don't we control whether we register the handler or not? | 19:00 |
devimc | fidencio: I think so, there is a list of signals to handle | 19:00 |
fidencio | devimc: so how would we hit that situation? Oo | 19:01 |
devimc | fidencio: give more context about the test | 19:02 |
devimc | what commands it runs? | 19:03 |
devimc | fidencio: does podman use `kata-runtime` to signal the container? | 19:06 |
fidencio | devimc: podman --runtime=/usr/bin/kata-runtime run -d alpine sleep 60; podman --runtime=/usr/bin/kata-runtime stop $container_id; | 19:07 |
devimc | fidencio: or the workload (in our case the kata-shim) is signaled directly? | 19:07 |
fidencio | devimc: that's basically what the test does | 19:07 |
devimc | fidencio: ohh i see | 19:10 |
fidencio | devimc: somehow it triggers a SIGKILL instead of SIGTERM | 19:10 |
devimc | fidencio: https://paste.centos.org/view/54cd00b7 | 19:13 |
devimc | this patch will fix it | 19:14 |
devimc | fidencio: but now the question is why that condition was added? | 19:14 |
fidencio | devimc: yep, I'd dare to say the patch doesn't fix it unless we understand why that was added in the first place | 19:16 |
*** davidgiluk has quit IRC | 19:17 | |
fidencio | devimc: let me open an issue for that so we can keep debugging and keep track of what we're doing | 19:17 |
fidencio | devimc: btw, this is *not* high-prio from my side | 19:17 |
devimc | fidencio: oks, thx | 19:18 |
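For context on the SIGTERM/SIGKILL exchange above: a stop operation typically sends SIGTERM first and only escalates to SIGKILL if the process is still alive after a grace period. A minimal sketch of that sequence on a plain local process follows. It assumes a POSIX system, `stop_process` is a hypothetical helper, and this is not how podman or kata-runtime implement it.

```python
import signal
import subprocess


def stop_process(proc, grace_seconds=10.0):
    """Send SIGTERM, wait, then fall back to SIGKILL.

    Returns the name of the signal that ended the process, or
    "exited" if the process handled SIGTERM and exited on its own.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored or blocked SIGTERM: SIGKILL cannot be caught.
        proc.kill()
        proc.wait()
    if proc.returncode < 0:
        # A negative return code means "terminated by signal -returncode".
        return signal.Signals(-proc.returncode).name
    return "exited"
```

If a helper like this reports SIGKILL for a workload that should exit on SIGTERM, either the signal is not being forwarded or no handler is registered for it, which is the question raised in the conversation.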
*** st3 is now known as sbrivio | 20:12 | |
fidencio | devimc: https://github.com/kata-containers/tests/issues/2504 does the error look familiar? O:-) | 20:12 |
devimc | fidencio: lol - not again | 20:14 |
fidencio | devimc: at least now we have it in a test | 20:15 |
devimc | fidencio: does it run with selinux on? | 20:15 |
devimc | why it's failing again? | 20:16 |
fidencio | devimc: so, I'm just checking which are the libpod failures and this one is the 3rd of the 6 we faced ... | 20:16 |
fidencio | devimc: of course, as I'm running it on a default RHEL, it comes with SELinux enabled | 20:17 |
fidencio | devimc: now I just deleted /var/lib/{vc,containers} | 20:17 |
fidencio | devimc: set the selinux to disabled, rebooted the machine ... and let's see | 20:17 |
fidencio | devimc: I'm just glad I've faced this before and you've helped me before ... | 20:18 |
sbrivio | hi, i'm getting the dreaded "Failed to check if grpc server is working: context deadline exceeded: OCI runtime error" while running a somewhat "heavy" kernel (say, KASan and lockdep) | 20:19 |
sbrivio | kata-agent is up and running after ~20s (if i boot up the image in qemu stand-alone), i'm running kata-runtime with podman, it gives up after ~17s | 20:20 |
sbrivio | i changed defaultDialTimeout from 15s to 60s in agent, shim, and runtime, rebuilt, still it seems to time out after 15s, does anybody have any pointer? | 20:21 |
sbrivio | (that change was inspired by devimc's https://github.com/jshachm/agent/commit/6bd9b01106f6b8570f48c6cd7b8403dc5a831d30 ) | 20:23 |
devimc | sbrivio: if the container has not been created after 10s, the container manager will kill it | 20:29 |
sbrivio | devimc, i see, do you know where i could change that? | 20:30 |
devimc | sbrivio: I tried, but it was impossible, that timeout is not configurable | 20:30 |
sbrivio | shim/vendor/github.com/docker/docker/container/monitor.go:10:loggerCloseTimeout = 10 * time.Second | 20:31 |
devimc | sbrivio: why is your kernel so "heavy"? | 20:31 |
sbrivio | this one perhaps? hmm | 20:31 |
fidencio | devimc: debug kernel :-) | 20:31 |
sbrivio | devimc, yeah, essentially, because i have KASan enabled :) lockdep alone would probably not take so long | 20:32 |
devimc | sbrivio: have you tried with the kata cli ? | 20:32 |
sbrivio | oh, wait, that's just for docker, one of the many random results grep -rn "10 \* time" gave me | 20:32 |
sbrivio | devimc, no, i haven't, not sure how to do that, is there some guide? | 20:33 |
devimc | sbrivio: guide? - ppff I have something better.. source code | 20:34 |
devimc | sbrivio: https://github.com/kata-containers/tests/blob/master/functional/vfio/run.sh#L53-L80 | 20:34 |
sbrivio | devimc, oh, that's a language i like! thanks :) | 20:35 |
devimc | sbrivio: yw ;) | 20:35 |
fidencio | okay, 3 out of 6 issues are on github. the rest will be opened tomorrow. | 20:38 |
fidencio | devimc: as usual, thanks for the help! | 20:38 |
devimc | fidencio: cool! and yw | 20:39 |
fidencio | take care everyone and "siganme los buenos" O:-) | 20:40 |
devimc | haha | 20:40 |
devimc | fidencio: take care man! | 20:40 |
sbrivio | devimc, [while trying to recycle a bundle that kata-runtime can digest] by "container manager" you mean podman or docker? because from my logs that doesn't seem to be the case | 20:51 |
devimc | yes | 20:52 |
devimc | sbrivio: what's the failure? | 20:52 |
sbrivio | the first one i see in syslog from kata-runtime is a: "Stopping Sandbox" | 20:52 |
sbrivio | (pasting logs...) | 20:53 |
devimc | sbrivio: you can use paste.centos.org | 20:54 |
devimc | sbrivio: https://paste.centos.org/ | 20:54 |
sbrivio | devimc, yes yes, i'm used to 0bin, using that as i've seen it used here already :) | 20:55 |
sbrivio | https://paste.centos.org/view/fb809212 sorry, it's horrible | 20:55 |
sbrivio | failure seems to come at line 29 | 20:55 |
*** sameo has quit IRC | 20:57 | |
devimc | sbrivio: are you running this nested ? | 20:58 |
devimc | sbrivio: on azure? | 20:59 |
sbrivio | devimc, nested on local kvm-amd box | 20:59 |
devimc | sbrivio: run `kata-runtime kata-check` | 21:00 |
sbrivio | # kata-runtime kata-check | 21:00 |
sbrivio | System is capable of running Kata Containers | 21:00 |
sbrivio | System can currently create Kata Containers | 21:00 |
devimc | I see | 21:00 |
sbrivio | i mean, it works with the same kernel built without KASan and lockdep (and a few others, perhaps) | 21:00 |
devimc | sbrivio: the VM is started at 22:17:52 and stopped 14 seconds later | 21:02 |
devimc | sbrivio: have you tried without vsocks? | 21:03 |
sbrivio | devimc, 14 to 15 seconds, yes, that's what made me think it was that defaultDialTimeout | 21:03 |
sbrivio | not yet, or i'm not even sure it's configured to use vsocks (sorry, i'm a kernel developer, relatively new to this) -- checking | 21:03 |
devimc | sbrivio: it's using vsocks | 21:04 |
sbrivio | devimc, oh, now that you mention it, i see it in the logs :) | 21:04 |
devimc | sbrivio: take a look at /usr/share/defaults/kata-containers/configuration.toml | 21:04 |
devimc | you can disable it there | 21:05 |
sbrivio | yep, editing now | 21:05 |
devimc | or here /etc/kata-containers/configuration.toml | 21:06 |
sbrivio | yeah, i have a copy in /usr/local/share that i'm using, it's actually disabled there, trying to find out where that might come from instead... | 21:06 |
devimc | the file in /etc takes precedence | 21:07 |
sbrivio | i built the thing with SYSCONFDIR to be sure, anyway, checking it's not enabled anywhere now... | 21:07 |
sbrivio | (the thing == kata-runtime) | 21:07 |
sbrivio | devimc, okay, disabled it for real now :) and it goes further, thanks! | 21:09 |
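The precedence rule mentioned above (the file in /etc wins over the one in /usr/share/defaults) can be sketched as a first-match lookup. This is only an illustration: `pick_config` is a hypothetical helper, and a runtime built with a custom SYSCONFDIR or prefix, as in this conversation, may look in other locations.

```python
import os

# Candidate paths in the order described in the conversation:
# the file in /etc takes precedence over the packaged default.
DEFAULT_CANDIDATES = (
    "/etc/kata-containers/configuration.toml",
    "/usr/share/defaults/kata-containers/configuration.toml",
)


def pick_config(candidates=DEFAULT_CANDIDATES, exists=os.path.isfile):
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(pick_config())
```

Editing a copy that is not the first match has no effect, which would explain why a setting can appear disabled in one file and still be active.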
sbrivio | just hitting this for some reason now: | 21:10 |
sbrivio | DEBU[0021] Starting container 434a74005f29c79a8fd7507a8e9e2a7b3c0a9f9b1d2f9180269e5b1696a2e178 with command [/bin/bash] | 21:10 |
sbrivio | rpc error: code = Unknown desc = path "memory" missing | 21:10 |
sbrivio | ERRO[0032] `/usr/bin/kata-runtime start 434a74005f29c79a8fd7507a8e9e2a7b3c0a9f9b1d2f9180269e5b1696a2e178` failed: exit status 1 | 21:10 |
sbrivio | but i doubt it's related to vsocks | 21:10 |
devimc | yeah, that's a different error | 21:10 |
devimc | at least now `kata-runtime start` is executed | 21:10 |
devimc | sbrivio: do you have enough (>3GB) RAM memory? | 21:11 |
sbrivio | 6GB, yeah | 21:12 |
sbrivio | total used free shared buff/cache available | 21:12 |
sbrivio | Mem: 6795520 1227944 3007304 7852 2560272 5253604 | 21:12 |
devimc | sbrivio: are you using podman? | 21:13 |
sbrivio | yep: | 21:13 |
devimc | `path "memory" missing` doesn't make sense to me | 21:13 |
sbrivio | something like this: podman --runtime /usr/bin/kata-runtime run --log-level=debug --security-opt label=type:container_kvm_t -it fedora | 21:13 |
devimc | selinux on - ouch! | 21:14 |
sbrivio | ha, sorry! :D | 21:14 |
* sbrivio disables and retries | 21:14 | |
sbrivio | nope, it's not that, grepping around now... | 21:15 |
devimc | sbrivio: it should work, | 21:15 |
devimc | sbrivio: but I don't use it | 21:15 |
devimc | sbrivio: I think there are still some bugs - https://github.com/kata-containers/tests/issues/2504#issuecomment-623689071 | 21:16 |
devimc | disable it, rm -rf /var/lib/{vc,containers}, and try again.. | 21:16 |
* sbrivio tries that too | 21:16 | |
sbrivio | devimc, that's another failure now, Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: OCI runtime error | 21:19 |
sbrivio | but... i have a hint now, before "Stopping Sandbox", i'm getting a message by earlyoom | 21:20 |
sbrivio | May 04 23:18:53 localhost.localdomain earlyoom[1012]: mem avail: 4472 of 6636 MiB (67 %), swap free: 0 of 0 MiB ( 0 %) | 21:20 |
sbrivio | (67%? come on, it's not much...) | 21:20 |
* sbrivio restarts the whole thing with way more memory | 21:20 | |
devimc | sbrivio: no | 21:20 |
devimc | change the default_memory | 21:20 |
devimc | in the configuration file | 21:20 |
sbrivio | also in configuration.toml? | 21:20 |
sbrivio | okay | 21:20 |
devimc | 640 - 1024 should be enough | 21:21 |
devimc | the default is 2048 | 21:21 |
devimc | sbrivio: and don't forget to enable_debug | 21:21 |
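To double-check which `default_memory` and `enable_debug` values are actually active (as opposed to commented out) in a configuration.toml, a small line scanner is enough. This is a rough sketch, not a TOML parser: `scan_settings` is a hypothetical helper, it ignores inline comments and multi-line values, and the section names in the usage example are illustrative.

```python
import re

_SECTION = re.compile(r"^\[([^\]]+)\]$")
_ASSIGN = re.compile(r"^([A-Za-z0-9_]+)\s*=\s*(.+)$")


def scan_settings(text, keys):
    """List (section, key, value) for uncommented assignments of `keys`."""
    section = ""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        m = _SECTION.match(line)
        if m:
            section = m.group(1)
            continue
        m = _ASSIGN.match(line)
        if m and m.group(1) in keys:
            found.append((section, m.group(1), m.group(2).strip()))
    return found


if __name__ == "__main__":
    sample = "[hypervisor.qemu]\ndefault_memory = 1024\n#enable_debug = true\n"
    for section, key, value in scan_settings(sample, {"default_memory", "enable_debug"}):
        print(f"[{section}] {key} = {value}")
```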
sbrivio | devimc, thanks, done, set enable_debug for all the components now, same "path memory missing" thing: | 21:24 |
devimc | sbrivio: that's weird | 21:24 |
* sbrivio fetching logs | 21:24 | |
devimc | sbrivio: how are you building the kernel? | 21:24 |
sbrivio | with make, on the host | 21:25 |
sbrivio | devimc, what do you mean exactly? kernel config? | 21:25 |
devimc | sbrivio: yes | 21:25 |
sbrivio | just a moment | 21:25 |
devimc | sbrivio: seems like you forgot to enable something | 21:25 |
devimc | sbrivio: clone this repo https://github.com/kata-containers/packaging | 21:26 |
sbrivio | devimc, https://paste.centos.org/view/b2cec596 and still, it works without CONFIG_KASAN and other stuff | 21:26 |
devimc | sbrivio: I see 5.7 | 21:27 |
devimc | that's too new for us | 21:27 |
devimc | not sure if we support it | 21:27 |
sbrivio | devimc, yeah, it's pretty much latest upstream | 21:27 |
sbrivio | devimc, still, it works without those options | 21:28 |
sbrivio | (i mean, net-next.git kind of upstream) | 21:28 |
sbrivio | devimc, so that could be something that kata-agent doesn't understand? or something unexpected happening in the guest, you mean? | 21:29 |
devimc | sbrivio: yeah - something missing under /sys | 21:29 |
* sbrivio still trying to relate podman logs to syslog | 21:30 | |
devimc | sbrivio: sudo journalctl -b -t kata-proxy | 21:30 |
sbrivio | devimc, oh, so much better, thanks (laugh if you want, but i'm not used to systemd :)) | 21:31 |
*** vgoyal has quit IRC | 21:32 | |
devimc | sbrivio: devuan ? | 21:32 |
sbrivio | devimc, and finally: https://paste.centos.org/view/5878ffb9 | 21:32 |
sbrivio | devimc, debian, at some point devuan, now debian, still not used to it :) | 21:33 |
sbrivio | (i thought, let's switch back to "proper" debian so that i learn... eventually...) | 21:34 |
devimc | sbrivio: "Could not update parent cpuset cgroup (/sys/fs/cgroup/cpuset/libpod/cpuset.cpus) cpuset:'0': open /sys/fs/cgroup/cpuset/libpod/cpuset.cpus: no such file or directory" | 21:34 |
devimc | sbrivio: cool! | 21:34 |
sbrivio | devimc, i see, checking configuration :) | 21:34 |
devimc | I like debian | 21:34 |
devimc | sbrivio: seems like you don't have support for cgroups | 21:34 |
devimc | cpu cgroups | 21:35 |
sbrivio | yeah, that would be totally weird, still, checking :) | 21:35 |
devimc | sbrivio: take a look at this folder https://github.com/kata-containers/packaging/tree/master/kernel/configs/fragments | 21:36 |
devimc | it contains all the CONFIGs that kata needs | 21:36 |
sbrivio | devimc, useful, thanks. how do you "source" those? | 21:36 |
devimc | maybe you missed one | 21:36 |
devimc | sbrivio: scripts/kconfig/merge_config.sh | 21:38 |
devimc | it's a kernel tool | 21:38 |
sbrivio | devimc, ah, that :) yep, sure | 21:38 |
sbrivio | CONFIG_CGROUP_CPUACCT=y can't be, really... | 21:38 |
sbrivio | also CPUSET set | 21:38 |
devimc | sbrivio: don't forget all *_CGROUP_* | 21:39 |
sbrivio | devimc, it's all there -- let me check what happens if i boot the guest in a similar way... | 21:40 |
sbrivio | (with just qemu, as i would get with kata-runtime) | 21:40 |
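Besides merging the fragments with scripts/kconfig/merge_config.sh, one way to see whether a built kernel misses anything the fragments ask for is to compare the two files directly. The sketch below assumes the usual `.config` syntax (`CONFIG_FOO=y` and `# CONFIG_FOO is not set`). `parse_kconfig` and `missing_options` are hypothetical helpers, and the option names in the usage example are only illustrative.

```python
import re

_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map option name to value; options marked 'is not set' map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def missing_options(kernel_config, fragment):
    """Return {option: (wanted, actual)} where the kernel config differs."""
    have = parse_kconfig(kernel_config)
    want = parse_kconfig(fragment)
    return {
        name: (value, have.get(name, "n"))
        for name, value in want.items()
        if have.get(name, "n") != value
    }


if __name__ == "__main__":
    config = "CONFIG_CGROUP_CPUACCT=y\nCONFIG_CPUSETS=y\n# CONFIG_MEMCG is not set\n"
    fragment = "CONFIG_CGROUPS=y\nCONFIG_CPUSETS=y\nCONFIG_MEMCG=y\n"
    for name, (wanted, actual) in missing_options(config, fragment).items():
        print(f"{name}: want {wanted}, have {actual}")
```

Note that the comparison is strict, so an option built as a module (`=m`) where the fragment asks for `=y` is reported as a difference too.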
devimc | sbrivio: gotta go - let's continue this tomorrow -cu! take care | 21:41 |
*** devimc has quit IRC | 21:41 | |
*** ailan__ has quit IRC | 21:52 | |
*** hashar has quit IRC | 22:42 |