Wednesday, 2020-10-14

*** crobinso has quit IRC00:23
kata-irc-bot<eric.ernst> @bergwolf @fupan @liubin0329 can any of y'all confirm you've tested crictl update with a pod running on 2.0?00:30
kata-irc-bot<eric.ernst> When updating cpusets, I'm seeing: ```FATA[0060] Updating container resources for "e8881a3fee67b0" failed: rpc error: code = DeadlineExceeded desc = failed to update resources: failed to update resources: context deadline exceeded ```00:31
kata-irc-bot<eric.ernst> I see this with both cpusets and with updating cpu quota :S00:39
kata-irc-bot<eric.ernst> This is with master as well as rc1; if someone can TAL and try to reproduce that'd be very helpful.00:39
kata-irc-bot<eric.ernst> have to drop for now....00:39
*** auk has joined #kata-dev00:40
kata-irc-bot<bergwolf> we do use it internally but I haven’t tried update with upstream. Let me TAL01:03
*** dklyle has quit IRC01:17
*** david-lyle has joined #kata-dev01:17
*** david-lyle has quit IRC01:24
kata-irc-bot<eric.ernst> I wonder if this is similar to the exec issues I was running into.01:44
kata-irc-bot<eric.ernst> I’ll try on azure later tonight.01:44
kata-irc-bot<bergwolf> Do I need k8s 1.18? with 1.17 I’m getting ```The Pod "busybox" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)```02:14
kata-irc-bot<eric.ernst> i think you have an existing busybox running?02:30
kata-irc-bot<eric.ernst> that you need to remove?02:30
kata-irc-bot<eric.ernst> I just ran my test w/ crictl.02:30
kata-irc-bot<eric.ernst> ```sudo crictl run --runtime kata-qemu container.json sandbox.json```02:31
kata-irc-bot<eric.ernst> ```sudo crictl update --cpu-quota=2000 229b6f0b69fe157fb0a39eb```02:31
kata-irc-bot<eric.ernst> using @liubin0329's wonderful json ;) : https://github.com/kata-containers/kata-containers/tree/2.0-dev/docs/how-to/data/crictl/busybox02:32
kata-irc-bot<bergwolf> ok, thx, let me retry02:51
*** auk has quit IRC03:06
kata-irc-bot<bergwolf> https://github.com/kata-containers/kata-containers/pull/947 should fix it05:29
kata-irc-bot<bergwolf> The problem is containerd somehow sends an update request with empty device type while updating the cpu quota. It should not happen IMHO but we should not panic either.05:30
kata-irc-bot<bergwolf> ```&UpdateContainerRequest{ContainerId:69ea8dafe282f29bece8ad5828c0f24a3ecf28af052c1cfe8be61e3f4094d989,Resources:&LinuxResources{Devices:[]LinuxDeviceCgroup{LinuxDeviceCgroup{Allow:false,Type:,Major:0,Minor:0,Access:rwm,XXX_unrecognized:[],},},Memory:&LinuxMemory{Limit:0,Reservation:0,Swap:0,Kernel:0,KernelTCP:0,Swappiness:0,DisableOOMKiller:false,XXX_unrecognized:[],},CPU:&LinuxCPU{Shares:0,Quota:2000,Period:0,RealtimeRuntime:0,RealtimePeriod:0,Cpus:,Mems:,XXX_unrecognized:[],},Pids:nil,BlockIO:nil,HugepageLimits:[]LinuxHugepageLimit{},Network:nil,XXX_unrecognized:[],},XXX_unrecognized:[],}```05:31
kata-irc-bot<bergwolf> The offending request. It has an unexpected Device there05:31
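
The "should not panic" point above can be illustrated with a minimal sketch. This is not the change made in pull 947, and it is not the agent's real code; `DeviceCgroupRule`, `check_device_rule`, and `split_device_rules` are hypothetical names that only mirror the fields visible in the dumped request. The idea shown is to turn a device rule with an empty type into a reported error instead of an unhandled failure. Rejecting the rule is just one possible policy, assumed here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Device type letters accepted in this sketch: all, char, block.
VALID_DEVICE_TYPES = ("a", "c", "b")


@dataclass
class DeviceCgroupRule:
    """Hypothetical mirror of the LinuxDeviceCgroup fields in the dump above."""
    allow: bool
    type: str
    major: int
    minor: int
    access: str


def check_device_rule(rule: DeviceCgroupRule) -> Optional[str]:
    """Return an error message for an unusable rule, or None if it is fine."""
    if rule.type == "":
        return "device cgroup rule has an empty type"
    if rule.type not in VALID_DEVICE_TYPES:
        return f"device cgroup rule has unknown type {rule.type!r}"
    return None


def split_device_rules(
    rules: List[DeviceCgroupRule],
) -> Tuple[List[DeviceCgroupRule], List[str]]:
    """Separate usable rules from problems so the caller can log and continue."""
    usable: List[DeviceCgroupRule] = []
    errors: List[str] = []
    for rule in rules:
        problem = check_device_rule(rule)
        if problem is None:
            usable.append(rule)
        else:
            errors.append(problem)
    return usable, errors


if __name__ == "__main__":
    # Same shape as the device entry in the request above: empty Type, Access rwm.
    offending = DeviceCgroupRule(allow=False, type="", major=0, minor=0, access="rwm")
    usable, errors = split_device_rules([offending])
    print("usable:", usable)
    print("errors:", errors)
```

With this shape, a CPU quota update that carries a stray device entry can still proceed with the CPU part while the device problem is logged.
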
*** sameo has joined #kata-dev06:14
*** sgarzare has joined #kata-dev06:51
*** jodh has joined #kata-dev07:02
*** fgiudici has joined #kata-dev07:11
*** davidgiluk has joined #kata-dev08:07
kata-irc-bot<ssheribe> Hi, while trying to run on openshift (cri-o 1.19, upstream 2.0 agent+runtime), I'm getting a hotplug memory error. Hotplugging issues were mentioned yesterday and, IIUC, should be fixed upstream, so I wonder whether this is the same issue or if I'm missing something. Any suggestions?12:16
kata-irc-bot<ssheribe> {"msg":"hotplug memory error: No such file or directory","level":"INFO","ts":"2020-10-14T11:59:31.315302384+00:00","source":"agent","name":"kata-agent","pid":"1","version":"0.1.0","subsystem":"rpc"} [    9.881403] pci 0000:00:02.0: PCI bridge to [bus 01] [    9.896093] pci 0000:00:02.0:   bridge window [io  0xc000-0xcfff]12:17
kata-irc-bot<fidencio> @ssheribe, just for your information, this is the issue and patch @eric.ernst merged Yesterday: • issue: https://github.com/kata-containers/kata-containers/issues/908 • patch: https://github.com/kata-containers/kata-containers/pull/910 I'm not exactly sure those are the same.12:33
kata-irc-bot<fidencio> Do you happen to have the name of the tests where it's happening, and could you also upload the full log (for this specific test) somewhere we can access?  I think people will be mostly interested in the messages that happened before that12:34
kata-irc-bot<fidencio> And, this is with  rc1 or latest git?12:35
kata-irc-bot<ssheribe> Latest. The thing is that it does work with the single-node k8s (which uses cri-o 1.18.3 and an older qemu)12:37
kata-irc-bot<ssheribe> I'll share it, unfortunately i don't have access to it currently as it's on my older laptop, I'll be able to get later today/tomorrow, so meanwhile i'm trying to just re-run it12:39
kata-irc-bot<fidencio> Okay, it works with the k8s single node. Interesting.  How complicated would it be to try the following tests?12:42
kata-irc-bot<ssheribe> (and yes this fix is in, so it's not that i suppose, thanks)12:43
kata-irc-bot<fidencio> 1. in the same working environment, replace qemu with the one used in 2.0; 2. after checking with qemu, also replace the initrd / image used; By doing this we ensure qemu & initrd are out of the question; Then replace, 3. shimv2 & agent12:44
kata-irc-bot<fidencio> with this we can ensure that if the issue occurs, it's on shimv2 or agent12:44
kata-irc-bot<fidencio> if you still cannot reproduce, try updating cri-o ...  By doing this we kind of are at least certain about which component may be causing it12:45
kata-irc-bot<fidencio> Does this make sense?12:45
kata-irc-bot<ssheribe> change the init rd to one with 1.x?12:48
kata-irc-bot<fidencio> Hmmm. Yeah, the initrd contains the agent, that isn't a simple replace.12:53
kata-irc-bot<fidencio> Once we use 2.0 agent we must go with 2.0 runtime as well12:54
kata-irc-bot<fidencio> So, if there's no simple way to split the parts in order to try to isolate the problematic component, the best we can do is get the logs and provide as many instructions as possible on how to reproduce12:55
kata-irc-bot<fidencio> @ssheribe, an issue would also be very much welcome12:55
kata-irc-bot<fidencio> I have to run for a training, and will be back in 3 hours or so12:55
kata-irc-bot<ssheribe> np, Thanks! I'll try to do that, though upgrading the k8s machine piece by piece will probably be hard, I guess. I'm also considering starting to run tests with 4.5.0 so I'll have some results first12:58
kata-irc-bot<bergwolf> The error says that there is no `/sys/devices/system/memory/probe` in your guest. Which kernel are you testing against? Can you retry with the one shipped by Kata 2.0 (like in the -rc1)?13:14
kata-irc-bot<bergwolf> My guess is that your kernel config does not enable memory hotplug13:14
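
The guess above can be checked from inside the guest with a small sketch like the following. It only tests whether the probe file named in the previous message exists; the function names and the `sysfs_root` parameter are hypothetical and exist so the check can be pointed at a different root. A missing file is consistent with memory hotplug probing not being enabled in the kernel config, but which config option controls it should be verified against the kernel in use.

```python
import os

# Path of the probe file relative to the sysfs mount point.
PROBE_RELATIVE_PATH = os.path.join("devices", "system", "memory", "probe")


def memory_probe_path(sysfs_root: str = "/sys") -> str:
    """Build the full path of the memory probe file under the given sysfs root."""
    return os.path.join(sysfs_root, PROBE_RELATIVE_PATH)


def memory_probe_available(sysfs_root: str = "/sys") -> bool:
    """Report whether the memory probe file is present."""
    return os.path.exists(memory_probe_path(sysfs_root))


if __name__ == "__main__":
    path = memory_probe_path()
    if memory_probe_available():
        print(f"{path} is present")
    else:
        print(f"{path} is missing; the guest kernel may lack memory hotplug probe support")
```
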
kata-irc-bot<eric.ernst> Thanks. We need to up the coverage.13:24
kata-irc-bot<ssheribe> @bergwolf I'm using vmlinuz-kata-v5.6-april-09-2020-83-virtiofs13:27
kata-irc-bot<bergwolf> can you retry with `vmlinux-5.4.60-83` ?13:30
kata-irc-bot<bergwolf> surely you need to switch back to 9pfs IIUC13:33
kata-irc-bot<ssheribe> trying13:34
*** crobinso has joined #kata-dev13:38
*** fuentess has joined #kata-dev13:41
kata-irc-bot<ssheribe> well i get 'virtio-9p-pci' is not a valid device model name, sounds like it's disabled in the qemu.. (i will check that)13:47
kata-irc-bot<bergwolf> maybe switching qemu to `qemu-system-x86_64` instead of `qemu-virtiofs-system-x86_64` ? Possibly we disabled 9p when enabling virtio-fs13:53
kata-irc-bot<ssheribe> It's 4.2.0 with the necessary parts backported, but I can try this version and hope that nothing will break13:59
kata-irc-bot<bergwolf> ah, I thought you were using the kata 2.0 qemu. If it was built in-house, it’s possible that 9pfs is disabled by your script or environment.14:05
kata-irc-bot<ssheribe> Yes, it indeed seems to be disabled, but it used to work with virtiofs14:10
kata-irc-bot<eric.ernst> If it’s RH, I assume 9p was dropped from it.14:13
*** dklyle has joined #kata-dev14:54
kata-irc-bot<gabriela.cervantes.te> I think that we should test the PR with the jenkins job that you created @salvador.fuentes before merging, in case not all the images pass14:56
kata-irc-bot<gabriela.cervantes.te> and then it can run daily once that is merged @salvador.fuentes14:57
kata-irc-bot<eric.ernst> Sounds good. I expect some will need to be removed (they failed for me w runc)15:01
kata-irc-bot<eric.ernst> exciting night of bug fixes -- thanks  @bergwolf  et al15:50
kata-irc-bot<salvador.fuentes> ok, sure, let me run with PR, will let you know the results15:56
kata-irc-bot<eric.ernst> ...master is looking much better now, will continue testing...16:56
*** jodh has quit IRC17:03
kata-irc-bot<salvador.fuentes> @eric.ernst the job ran and failed from the first test, in function `setup`17:30
kata-irc-bot<salvador.fuentes> ```12:24:33 not ok 1 [display text] hello world in an alpine container 12:24:33 # (from function `setup' in test file popular_docker_images.bats, line 20) 12:24:33 #   `[ "$status" -eq 0 ]' failed```17:30
kata-irc-bot<salvador.fuentes> http://jenkins.katacontainers.io/job/kata-containers-2.0-popular-images/4/console17:30
*** sgarzare has quit IRC17:33
*** davidgiluk has quit IRC19:36
*** auk has joined #kata-dev20:03
*** th0din has quit IRC20:42
*** fgiudici has quit IRC21:15
*** crobinso has quit IRC21:16
*** auk has quit IRC21:42
*** auk has joined #kata-dev21:46
kata-irc-bot<fidencio> People, I'd like to forward the CRI-O survey, which is still open for the next 2 weeks. If you're using CRI-O, please, consider taking the survey. https://www.surveymonkey.com/r/crio-user-poll-202022:00
*** sameo has quit IRC22:02
*** auk has quit IRC22:06
*** th0din has joined #kata-dev22:10
*** sameo has joined #kata-dev22:20
*** auk has joined #kata-dev22:33
*** fuentess has quit IRC22:39
*** th0din has quit IRC22:47
*** EricAdamsZNC has joined #kata-dev23:05
*** crobinso has joined #kata-dev23:08
*** crobinso has quit IRC23:10
*** th0din has joined #kata-dev23:14
*** sameo has quit IRC23:17
*** dklyle has quit IRC23:57
