Monday, 2020-07-13

kata-irc-bot1<saini.paramdeep> @archana.m.shinde @julio.montes I tested both reboot and resetting the sysrq trigger. Reboot kills the container and doesn't restart it, but the qemu process keeps running. Resetting the sysrq trigger also kills the container, but crictl doesn't clean up the container. I am wondering how we can handle this scenario better, as many applications might need to forcefully reboot a kata vm.03:26
*** bpradipt has joined #kata-dev04:19
*** sameo_ has joined #kata-dev04:41
*** bpradipt has joined #kata-dev04:52
kata-irc-bot1<fidencio> Out of curiosity, are you testing using containerd?06:20
*** pcaruana has joined #kata-dev06:49
*** sgarzare has joined #kata-dev06:59
*** Yarboa has joined #kata-dev07:00
*** jodh has joined #kata-dev07:13
*** jodh has quit IRC07:18
*** fgiudici has joined #kata-dev07:19
*** davidgiluk has joined #kata-dev08:04
*** jodh has joined #kata-dev08:07
*** sameo_ has quit IRC08:10
*** sameo has joined #kata-dev08:22
*** sgarzare has quit IRC09:06
*** sgarzare has joined #kata-dev09:09
*** sgarzare has quit IRC09:59
*** Yarboa has quit IRC10:42
*** Yarboa has joined #kata-dev10:54
*** fgiudici has quit IRC11:19
*** fgiudici_ has joined #kata-dev11:19
*** sgarzare has joined #kata-dev12:09
*** bpradipt has quit IRC13:27
*** th0din_ has quit IRC13:27
*** irclogbot_3 has quit IRC13:27
*** georgk has quit IRC13:27
*** bpradipt has joined #kata-dev13:28
*** th0din_ has joined #kata-dev13:28
*** irclogbot_3 has joined #kata-dev13:28
*** georgk has joined #kata-dev13:28
*** crobinso has joined #kata-dev13:46
*** dklyle has joined #kata-dev14:39
*** fgiudici_ has quit IRC15:20
*** fgiudici_ has joined #kata-dev15:20
*** Yarboa has quit IRC15:28
*** Yarboa has joined #kata-dev15:43
kata-irc-bot1<saini.paramdeep> @fidencio Yes, I am using crio with kata runtime and it uses containerd.15:53
kata-irc-bot1<fidencio> You either use crio, or containerd, to start the kata runtime (or the shimv2).15:56
*** sgarzare has quit IRC16:01
kata-irc-bot1<saini.paramdeep> hmm…sorry for the confusion. I again checked and we are using crio.16:03
*** Yarboa has quit IRC16:12
kata-irc-bot1<fidencio> Ack! I'm not the person to help, but I'm curious to follow this one up16:13
*** Yarboa has joined #kata-dev16:24
*** jodh has quit IRC17:02
kata-irc-bot1<saini.paramdeep> @fidencio Sure.17:18
*** fgiudici_ has quit IRC17:23
*** davidgiluk has quit IRC19:12
*** th0din has joined #kata-dev19:14
*** th0din_ has quit IRC19:15
*** Yarboa has quit IRC19:43
kata-irc-bot1<fidencio> @efoster, around19:54
kata-irc-bot1<efoster> Indeed!19:55
kata-irc-bot1<fidencio> Looking at ContainerCreate ... don't you think we should actually delete the container in case of failure, instead of stopping it?19:56
kata-irc-bot1<efoster> Stopping it cleans up all of its resources. If we just delete it then I don't think the mounts get cleaned up properly19:56
kata-irc-bot1<efoster> https://github.com/kata-containers/runtime/blob/master/virtcontainers/container.go#L957-L976 19:57
kata-irc-bot1<efoster> No cleanup there, and `c.sandbox.removeContainer` just deletes the container from a data structure: https://github.com/kata-containers/runtime/blob/a885b1bbf9069f8cfe10b21ba4314f5ef430b29a/virtcontainers/sandbox.go#L774-L791 19:58
kata-irc-bot1<fidencio> Yeah, just checked that now. :S Then it leads me to another question, shouldn't we also delete it?19:59
kata-irc-bot1<fidencio> After the stop() call19:59
kata-irc-bot1<efoster> https://github.com/kata-containers/runtime/pull/2826/files#diff-bca2c2b5b04e0e74533b492fd23a3cc8R1175 I believe this makes the same call that `c.delete()` does, but I'm not in love with it.20:00
kata-irc-bot1<efoster> I'd be fine with calling `c.delete` after `c.stop`20:00
kata-irc-bot1<fidencio> You're right. `c.delete()`  ends up calling `s.removeContainer`.20:02
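The stop-then-delete ordering discussed above can be illustrated with a toy model. This is only a sketch in Python, not Kata's Go code: the `Sandbox` and `Container` classes, the `mounts` set, and the `stop_first` flag are all hypothetical names made up for illustration. It assumes, as described in the conversation, that stopping releases resources while deleting only drops the container from a data structure.

```python
class Sandbox:
    """Toy stand-in for a sandbox: tracks containers and their mounts."""

    def __init__(self):
        self.containers = {}  # container id -> Container
        self.mounts = set()   # hypothetical host-side mounts still in place

    def remove_container(self, cid):
        # Bookkeeping only: no resources are released here.
        self.containers.pop(cid, None)


class Container:
    def __init__(self, sandbox, cid):
        self.sandbox = sandbox
        self.cid = cid

    def start(self, fail=False):
        # Register the container and set up its (pretend) mount.
        self.sandbox.containers[self.cid] = self
        self.sandbox.mounts.add(self.cid)
        if fail:
            raise RuntimeError("simulated create failure")

    def stop(self):
        # Stopping is where resources such as mounts are released.
        self.sandbox.mounts.discard(self.cid)

    def delete(self):
        # Deleting only removes the container from the sandbox's records.
        self.sandbox.remove_container(self.cid)


def create_container(sandbox, cid, fail=False, stop_first=True):
    """Create a container; on failure, clean up and re-raise."""
    c = Container(sandbox, cid)
    try:
        c.start(fail=fail)
    except RuntimeError:
        if stop_first:
            c.stop()  # release resources first
        c.delete()    # then drop the bookkeeping entry
        raise
    return c
```

In this model, calling `create_container(..., fail=True, stop_first=False)` leaves an entry behind in `mounts`, which mirrors the concern that deleting without stopping would not clean up mounts.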
kata-irc-bot1<efoster> Hrm... Not sure why this pod doesn't want to clean up.20:06
kata-irc-bot1<efoster> There don't seem to be any resources hanging around, but cri-o really doesn't want to let go of it20:07
kata-irc-bot1<fidencio> I assume you got some log about the CreateContainer failure, didn't you?20:11
kata-irc-bot1<fidencio> Mind sharing?20:11
kata-irc-bot1<fidencio> I'm trying to take a look at CRI-O's cleanup20:11
kata-irc-bot1<efoster> Here's a little bit out of CRI-O's logs20:12
kata-irc-bot1<efoster> let me do another test run and I'll get you the full thing from start to finish20:12
kata-irc-bot1<efoster> I apologize in advance for all the junk, there's some other stuff running on this node that's going to clutter up the logs.20:12
kata-irc-bot1<fidencio> So, my guess:
• CreateContainer fails
• The error gets propagated all the way up to CRI-O
• CRI-O cleans itself up by doing: https://github.com/cri-o/cri-o/blob/master/internal/oci/runtime_vm.go#L116-L120
• Kata ends up deleting the container
• CRI-O bails on deleting the container, as the clean up already took care of part of that: https://github.com/cri-o/cri-o/blob/master/internal/oci/runtime_vm.go#L530-L533 20:19
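The guessed sequence can be reproduced in a small toy model. This is a hedged Python sketch, not CRI-O or Kata code: `ToyRuntime`, `NotFound`, `strict_delete`, and `tolerant_delete` are hypothetical names, and the tolerant variant is only one possible way to handle the situation, not necessarily what the proposed CRI-O patch does.

```python
class NotFound(Exception):
    """Raised when the toy runtime no longer knows the container."""


class ToyRuntime:
    def __init__(self):
        self.containers = set()

    def create(self, cid, fail=False):
        self.containers.add(cid)
        if fail:
            # Assumption: cleanup on failure already removes the container.
            self.containers.discard(cid)
            raise RuntimeError("simulated CreateContainer failure")

    def delete(self, cid):
        if cid not in self.containers:
            raise NotFound(cid)
        self.containers.remove(cid)


def strict_delete(runtime, cid):
    # Bails out (raises NotFound) if the container is already gone.
    runtime.delete(cid)
    return "deleted"


def tolerant_delete(runtime, cid):
    # Treats "already gone" as success so the caller can finish its cleanup.
    try:
        runtime.delete(cid)
    except NotFound:
        return "already gone"
    return "deleted"
```

After a failed `create`, `strict_delete` raises `NotFound` while `tolerant_delete` returns normally, which is the difference between a pod that stays stuck and one whose cleanup can complete in this model.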
kata-irc-bot1<fidencio> I can provide you a quick patch for cri-o, if you think my guess does make sense20:19
kata-irc-bot1<efoster> That's different from https://github.com/cri-o/cri-o/pull/3924, right? Full logs will be heading your way in a second btw20:21
kata-irc-bot1<fidencio> That's different, yes20:21
kata-irc-bot1<efoster> Well, I'm 100% down to try it out!20:21
kata-irc-bot1<efoster> That sounds pretty reasonable to me.20:22
kata-irc-bot1<efoster> https://gist.github.com/evanfoster/79c6c86feca841114f000621fb2d3043 Full CRI-O and Kata logs. Pod ID and name in the gist description20:23
kata-irc-bot1<fidencio> Same wip/1.17-to-evan-foster branch you were already using20:26
kata-irc-bot1<efoster> Alrighty, I'll rebuild that and try again.20:27
kata-irc-bot1<fidencio> Thanks!20:29
kata-irc-bot1<efoster> That did it! The pod hung around in a `NotReady` state for 30 seconds and then was fully deleted.20:32
kata-irc-bot1<fidencio> Cool, I'll submit the patch to CRI-O20:32
kata-irc-bot1<fidencio> Thanks for digging into this, Evan!20:32
kata-irc-bot1<efoster> Thanks for all the help! Glad we got it figured out.20:34
kata-irc-bot1<efoster> Could you send me the link to the CRI-O PR when you create it?20:35
kata-irc-bot1<efoster> Or just tag me, whatever's most convenient.20:36
kata-irc-bot1<fidencio> Sure, just trying to add as much detail as possible in the commit message :slightly_smiling_face:20:44
kata-irc-bot1<fidencio> https://github.com/cri-o/cri-o/pull/3949 20:55
kata-irc-bot1<efoster> Awesome, thanks much!20:55
kata-irc-bot1<fidencio> np! Let's wait and see if the PR actually makes as much sense for the maintainers as it did for us. :slightly_smiling_face:20:56
kata-irc-bot1<fidencio> Thanks for the help on this, nice collaboration, very much appreciated!20:56
kata-irc-bot1<efoster> Same here! 1 issue down, 1 to go.20:59
kata-irc-bot1<fidencio> BTW, I'm still seeing some shimv2 processes lying around when a pod terminates itself successfully. But I need to finish some other stuff before I get back to this one.21:01
kata-irc-bot1<efoster> I've seen that occasionally, but it seems pretty intermittent. I'll let my test system churn and see if I can reproduce it and maybe track it down.21:03
kata-irc-bot1<fidencio> That one is quite tricky, mainly because it's not reproducible with containerd.  So, I ended up setting up a server and will trace both containerd and crio to figure out what's going on. I just need some time.21:05
*** Yarboa has joined #kata-dev21:09
kata-irc-bot1<fidencio> Okay, let me call it a day, have a good one!21:32
*** crobinso has quit IRC22:00
*** th0din has quit IRC23:36
*** th0din has joined #kata-dev23:41
