kata-irc-bot1 | <saini.paramdeep> @archana.m.shinde @julio.montes I tested both `reboot` and a reset via the sysrq trigger. With `reboot`, the container is killed and doesn't restart, but the QEMU process keeps running. With the sysrq trigger, the container is killed, but crictl doesn't clean the container up. How can we handle this scenario better? Many applications might need to forcefully reboot a Kata VM. | 03:26 |
---|---|---|
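For context on the second test above: a forced reboot via SysRq is usually done by writing `b` to `/proc/sysrq-trigger` inside the guest, which reboots immediately without syncing or unmounting filesystems. The exact command used in the test isn't stated, so this is a minimal Python sketch under that assumption. `force_reboot` is a hypothetical helper, and the path is a parameter so it can be exercised against a scratch file instead of rebooting anything.

```python
# Assumption: standard Linux magic SysRq interface; needs root inside the VM.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def force_reboot(trigger_path: str = SYSRQ_TRIGGER) -> None:
    """Hypothetical helper: request an immediate reboot via magic SysRq.

    'b' reboots the kernel at once, without syncing or unmounting disks,
    so the guest never goes through an orderly shutdown.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("b")
```

Per the kernel's SysRq documentation, writes to `/proc/sysrq-trigger` by a privileged user are honored regardless of the `kernel.sysrq` sysctl, which only gates the keyboard combination.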
*** bpradipt has joined #kata-dev | 04:19 | |
*** sameo_ has joined #kata-dev | 04:41 | |
*** bpradipt has joined #kata-dev | 04:52 | |
kata-irc-bot1 | <fidencio> Out of curiosity, are you testing using containerd? | 06:20 |
*** pcaruana has joined #kata-dev | 06:49 | |
*** sgarzare has joined #kata-dev | 06:59 | |
*** Yarboa has joined #kata-dev | 07:00 | |
*** jodh has joined #kata-dev | 07:13 | |
*** jodh has quit IRC | 07:18 | |
*** fgiudici has joined #kata-dev | 07:19 | |
*** davidgiluk has joined #kata-dev | 08:04 | |
*** jodh has joined #kata-dev | 08:07 | |
*** sameo_ has quit IRC | 08:10 | |
*** sameo has joined #kata-dev | 08:22 | |
*** sgarzare has quit IRC | 09:06 | |
*** sgarzare has joined #kata-dev | 09:09 | |
*** sgarzare has quit IRC | 09:59 | |
*** Yarboa has quit IRC | 10:42 | |
*** Yarboa has joined #kata-dev | 10:54 | |
*** fgiudici has quit IRC | 11:19 | |
*** fgiudici_ has joined #kata-dev | 11:19 | |
*** sgarzare has joined #kata-dev | 12:09 | |
*** bpradipt has quit IRC | 13:27 | |
*** th0din_ has quit IRC | 13:27 | |
*** irclogbot_3 has quit IRC | 13:27 | |
*** georgk has quit IRC | 13:27 | |
*** bpradipt has joined #kata-dev | 13:28 | |
*** th0din_ has joined #kata-dev | 13:28 | |
*** irclogbot_3 has joined #kata-dev | 13:28 | |
*** georgk has joined #kata-dev | 13:28 | |
*** crobinso has joined #kata-dev | 13:46 | |
*** dklyle has joined #kata-dev | 14:39 | |
*** fgiudici_ has quit IRC | 15:20 | |
*** fgiudici_ has joined #kata-dev | 15:20 | |
*** Yarboa has quit IRC | 15:28 | |
*** Yarboa has joined #kata-dev | 15:43 | |
kata-irc-bot1 | <saini.paramdeep> @fidencio Yes, I am using crio with the kata runtime, and it uses containerd. | 15:53 |
kata-irc-bot1 | <fidencio> You either use crio, or containerd, to start the kata runtime (or the shimv2). | 15:56 |
*** sgarzare has quit IRC | 16:01 | |
kata-irc-bot1 | <saini.paramdeep> Hmm… sorry for the confusion. I checked again and we are using crio. | 16:03 |
*** Yarboa has quit IRC | 16:12 | |
kata-irc-bot1 | <fidencio> Ack! I'm not the person to help, but I'm curious to follow this one up | 16:13 |
*** Yarboa has joined #kata-dev | 16:24 | |
*** jodh has quit IRC | 17:02 | |
kata-irc-bot1 | <saini.paramdeep> @fidencio Sure. | 17:18 |
*** fgiudici_ has quit IRC | 17:23 | |
*** davidgiluk has quit IRC | 19:12 | |
*** th0din has joined #kata-dev | 19:14 | |
*** th0din_ has quit IRC | 19:15 | |
*** Yarboa has quit IRC | 19:43 | |
kata-irc-bot1 | <fidencio> @efoster, around? | 19:54 |
kata-irc-bot1 | <efoster> Indeed! | 19:55 |
kata-irc-bot1 | <fidencio> Looking at ContainerCreate... don't you think we should actually delete the container on failure, instead of stopping it? | 19:56 |
kata-irc-bot1 | <efoster> Stopping it cleans up all of its resources. If we just delete it, then I don't think the mounts get cleaned up properly. | 19:56 |
kata-irc-bot1 | <efoster> https://github.com/kata-containers/runtime/blob/master/virtcontainers/container.go#L957-L976 | 19:57 |
kata-irc-bot1 | <efoster> No cleanup there, and `c.sandbox.removeContainer` just deletes the container from a data structure: https://github.com/kata-containers/runtime/blob/a885b1bbf9069f8cfe10b21ba4314f5ef430b29a/virtcontainers/sandbox.go#L774-L791 | 19:58 |
kata-irc-bot1 | <fidencio> Yeah, just checked that now. :S That leads me to another question: shouldn't we also delete it? | 19:59 |
kata-irc-bot1 | <fidencio> After the stop() call | 19:59 |
kata-irc-bot1 | <efoster> https://github.com/kata-containers/runtime/pull/2826/files#diff-bca2c2b5b04e0e74533b492fd23a3cc8R1175 I believe this makes the same call that `c.delete()` does, but I'm not in love with it. | 20:00 |
kata-irc-bot1 | <efoster> I'd be fine with calling `c.delete` after `c.stop` | 20:00 |
kata-irc-bot1 | <fidencio> You're right. `c.delete()` ends up calling `s.removeContainer`. | 20:02 |
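The ordering agreed on here (stop first to release resources, then delete to drop the bookkeeping) can be illustrated with a toy Python model. This is not Kata's Go code: `Sandbox`, `Container`, `create_container` and the `mounts` list are hypothetical stand-ins for `c.stop()`, `c.delete()` and `s.removeContainer` as they are described in the conversation.

```python
class Sandbox:
    """Toy stand-in for a Kata sandbox; tracks containers by ID."""

    def __init__(self):
        self.containers = {}

    def remove_container(self, cid):
        # Like s.removeContainer as described above: bookkeeping only.
        del self.containers[cid]


class Container:
    """Toy container whose 'mounts' stand in for host-side resources."""

    def __init__(self, sandbox, cid):
        self.sandbox = sandbox
        self.id = cid
        self.mounts = ["rootfs"]  # hypothetical resource list
        sandbox.containers[cid] = self

    def stop(self):
        # Releases resources such as mounts.
        self.mounts.clear()

    def delete(self):
        # Drops the container from the sandbox; releases nothing else.
        self.sandbox.remove_container(self.id)


def create_container(sandbox, cid, start):
    """Create a container and run `start`; on failure, stop then delete."""
    c = Container(sandbox, cid)
    try:
        start(c)
    except Exception:
        c.stop()    # free mounts and other resources first
        c.delete()  # then forget the container so nothing stale is tracked
        raise
    return c
```

Calling only `delete()` in the error path would leave `mounts` populated, which is the concern raised about mounts not being cleaned up; calling only `stop()` would leave the sandbox still tracking a dead container.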
kata-irc-bot1 | <efoster> Hrm... Not sure why this pod doesn't want to clean up. | 20:06 |
kata-irc-bot1 | <efoster> There don't seem to be any resources hanging around, but cri-o really doesn't want to let go of it | 20:07 |
kata-irc-bot1 | <fidencio> I assume you got some log about the CreateContainer failure, didn't you? | 20:11 |
kata-irc-bot1 | <fidencio> Mind sharing? | 20:11 |
kata-irc-bot1 | <fidencio> I'm trying to take a look at CRI-O's cleanup | 20:11 |
kata-irc-bot1 | <efoster> | 20:11 |
kata-irc-bot1 | <efoster> Here's a little bit out of CRI-O's logs | 20:12 |
kata-irc-bot1 | <efoster> let me do another test run and I'll get you the full thing from start to finish | 20:12 |
kata-irc-bot1 | <efoster> I apologize in advance for all the junk, there's some other stuff running on this node that's going to clutter up the logs. | 20:12 |
kata-irc-bot1 | <fidencio> So, my guess: • CreateContainer fails • The error gets propagated all the way up to CRI-O • CRI-O cleans itself up by doing: https://github.com/cri-o/cri-o/blob/master/internal/oci/runtime_vm.go#L116-L120 • Kata ends up deleting the container • CRI-O bails on deleting the container, as the cleanup already took care of part of that: https://github.com/cri-o/cri-o/blob/master/internal/oci/runtime_vm.go#L530-L533 | 20:19 |
kata-irc-bot1 | <fidencio> I can provide you a quick patch for cri-o, if you think my guess makes sense | 20:19 |
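The last step of that guess is a double-cleanup problem: two layers both try to delete the same container, and the second fails because the first already did. One common way to make such a path tolerant is to treat "not found" as success. The sketch below is a Python model under that assumption; `FakeRuntime`, `NotFoundError` and `delete_tolerant` are hypothetical, are not CRI-O's or Kata's API, and this is not necessarily how the eventual CRI-O patch handles it.

```python
class NotFoundError(Exception):
    """Hypothetical error a runtime raises for an unknown container ID."""


class FakeRuntime:
    """Toy runtime that, like the guess above, cleans up after a failed create."""

    def __init__(self):
        self.containers = set()

    def create(self, cid, fail=False):
        self.containers.add(cid)
        if fail:
            self.containers.discard(cid)  # the runtime already deleted it
            raise RuntimeError("CreateContainer failed")

    def delete(self, cid):
        if cid not in self.containers:
            raise NotFoundError(cid)
        self.containers.remove(cid)


def delete_tolerant(runtime, cid):
    """Delete cid, treating an already-removed container as success."""
    try:
        runtime.delete(cid)
    except NotFoundError:
        pass  # an earlier cleanup already removed it; nothing left to do
```

With a plain `runtime.delete(cid)`, the caller's second cleanup raises `NotFoundError` and aborts, which corresponds to the "bails on deleting" step; `delete_tolerant` lets it carry on.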
kata-irc-bot1 | <efoster> That's different from https://github.com/cri-o/cri-o/pull/3924, right? Full logs will be heading your way in a second btw | 20:21 |
kata-irc-bot1 | <fidencio> That's different, yes | 20:21 |
kata-irc-bot1 | <efoster> Well, I'm 100% down to try it out! | 20:21 |
kata-irc-bot1 | <efoster> That sounds pretty reasonable to me. | 20:22 |
kata-irc-bot1 | <efoster> https://gist.github.com/evanfoster/79c6c86feca841114f000621fb2d3043 Full CRI-O and Kata logs. Pod ID and name in the gist description | 20:23 |
kata-irc-bot1 | <fidencio> Same wip/1.17-to-evan-foster branch you were already using | 20:26 |
kata-irc-bot1 | <efoster> Alrighty, I'll rebuild that and try again. | 20:27 |
kata-irc-bot1 | <fidencio> Thanks! | 20:29 |
kata-irc-bot1 | <efoster> That did it! The pod hung around in a `NotReady` state for 30 seconds and then was fully deleted. | 20:32 |
kata-irc-bot1 | <fidencio> Cool, I'll submit the patch to CRI-O | 20:32 |
kata-irc-bot1 | <fidencio> Thanks for digging into this, Evan! | 20:32 |
kata-irc-bot1 | <efoster> Thanks for all the help! Glad we got it figured out. | 20:34 |
kata-irc-bot1 | <efoster> Could you send me the link to the CRI-O PR when you create it? | 20:35 |
kata-irc-bot1 | <efoster> Or just tag me, whatever's most convenient. | 20:36 |
kata-irc-bot1 | <fidencio> Sure, just trying to add as much detail as possible in the commit message :slightly_smiling_face: | 20:44 |
kata-irc-bot1 | <fidencio> https://github.com/cri-o/cri-o/pull/3949 | 20:55 |
kata-irc-bot1 | <efoster> Awesome, thanks much! | 20:55 |
kata-irc-bot1 | <fidencio> np! Let's wait and see if the PR actually makes as much sense to the maintainers as it did to us. :slightly_smiling_face: | 20:56 |
kata-irc-bot1 | <fidencio> Thanks for the help on this; nice collaboration, very much appreciated! | 20:56 |
kata-irc-bot1 | <efoster> Same here! 1 issue down, 1 to go. | 20:59 |
kata-irc-bot1 | <fidencio> BTW, I'm still seeing some shimv2 processes lying around when a pod terminates successfully on its own. But I need to finish some other stuff before I get back to this one. | 21:01 |
kata-irc-bot1 | <efoster> I've seen that occasionally, but it seems pretty intermittent. I'll let my test system churn and see if I can reproduce it and maybe track it down. | 21:03 |
kata-irc-bot1 | <fidencio> That one is quite tricky, mainly because it's not reproducible with containerd. So, I ended up setting up a server and will trace both containerd and crio to figure out what's going on. I just need some time. | 21:05 |
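For anyone trying to reproduce the leftover shimv2 processes, one quick way to list candidates on a node is to scan `/proc/<pid>/cmdline`. A minimal Python sketch, assuming the shim binary is named `containerd-shim-kata-v2` (the usual Kata shimv2 name; adjust for your install). `find_processes` is a hypothetical helper, and it only lists matches; deciding which ones are orphaned (for example by comparing against `crictl pods`) is left to the reader.

```python
import os


def find_processes(name_fragment, proc_root="/proc"):
    """Return sorted PIDs whose command line contains name_fragment.

    /proc/<pid>/comm is truncated to 15 characters ("containerd-shim"),
    so the full, NUL-separated cmdline is read instead.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
        if name_fragment in cmdline:
            pids.append(int(entry))
    return sorted(pids)
```

On a node, `find_processes("containerd-shim-kata-v2")` returns the PIDs of all running shims, which can then be matched against the pods CRI-O still knows about.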
*** Yarboa has joined #kata-dev | 21:09 | |
kata-irc-bot1 | <fidencio> Okay, let me call it a day, have a good one! | 21:32 |
*** crobinso has quit IRC | 22:00 | |
*** th0din has quit IRC | 23:36 | |
*** th0din has joined #kata-dev | 23:41 |