Tuesday, 2019-06-11

00:56 *** igordc has quit IRC
03:35 *** khyr0n has joined #kata-dev
05:26 *** khyr0n has quit IRC
05:28 *** lpetrut has joined #kata-dev
05:43 *** lpetrut has quit IRC
06:37 *** sameo has joined #kata-dev
06:46 *** lpetrut has joined #kata-dev
06:52 *** sameo has quit IRC
06:55 *** pcaruana has joined #kata-dev
07:13 *** tmhoang has joined #kata-dev
07:22 *** jodh has joined #kata-dev
07:59 *** sameo has joined #kata-dev
07:59 *** gwhaley has joined #kata-dev
08:05 *** davidgiluk has joined #kata-dev
09:22 *** tobberydberg has quit IRC
09:31 *** tobberydberg has joined #kata-dev
09:43 *** gwhaley has quit IRC
10:19 *** pcaruana has quit IRC
10:26 <kata-irc-bot2> <xu> I’ve sent the mail to skip the AC meeting this week.
10:27 <kata-irc-bot2> <xu> Let’s discuss topics online.
10:28 *** gwhaley has joined #kata-dev
11:09 *** pcaruana has joined #kata-dev
12:12 *** devimc has joined #kata-dev
12:35 *** fuentess has joined #kata-dev
13:51 *** devimc has quit IRC
14:04 *** pcaruana has quit IRC
14:19 *** devimc has joined #kata-dev
14:30 *** pcaruana has joined #kata-dev
14:31 *** pcaruana has quit IRC
14:31 *** pcaruana has joined #kata-dev
14:59 *** jodh has quit IRC
15:11 <brtknr> hey all, is docker going to support RuntimeClass at some point, in the same way crio and containerd do now?
15:12 *** lpetrut has quit IRC
15:17 *** pcaruana has quit IRC
15:19 *** devimc has quit IRC
15:20 *** devimc has joined #kata-dev
15:21 <brtknr> to work with kubernetes... as I understand, docker already has a --runtime flag
15:30 <kata-irc-bot2> <eric.ernst> I’m not sure. Seems a lot of docker integration is migrating into containerd for k8s...
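For reference, the two mechanisms being compared here: docker picks a runtime per container with its --runtime flag, while Kubernetes picks one per pod through a RuntimeClass object that CRI runtimes such as containerd and CRI-O honour. A minimal sketch, assuming a handler named `kata` is configured in the CRI runtime and a `kata-runtime` entry exists in docker's daemon.json (both names are assumptions, not from the log):

```
# docker: select the runtime per container ("kata-runtime" is an assumed
# name registered under "runtimes" in /etc/docker/daemon.json)
docker run --rm --runtime kata-runtime busybox uname -r

# kubernetes 1.14: declare a RuntimeClass once, then reference it per pod
cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata   # must match the handler name in the containerd/CRI-O config
---
apiVersion: v1
kind: Pod
metadata:
  name: kata-test
spec:
  runtimeClassName: kata
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
EOF
```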
15:43 *** pcaruana has joined #kata-dev
15:44 <brtknr> I see
15:44 <brtknr> Failed create pod sandbox: rpc error: code = Unknown desc = container create failed: nemu-system-x86_64: -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/hugepages,share=on,prealloc=on: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM
15:44 *** igordc has joined #kata-dev
15:45 <brtknr> I am trying to get the kata-nemu example off the ground but hitting this
15:46 <brtknr> I have vm.nr_hugepages=1024 in /etc/sysctl.conf
15:49 <stefanha> brtknr: Did you actually set that sysctl?  (Putting it in the file isn't enough, at least if you don't run other commands or restart the host.)
15:49 <stefanha> brtknr: sysctl vm.nr_hugepages=1024
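A minimal sketch of both halves of this, reserving 2 MB hugepages persistently and for the running system (the value is illustrative; 1024 x 2 MB = 2 GB, which is exactly the guest RAM the error above is preallocating):

```
# persist across reboots
echo 'vm.nr_hugepages = 1024' | sudo tee /etc/sysctl.d/99-hugepages.conf

# apply immediately, without a reboot
sudo sysctl -w vm.nr_hugepages=1024

# verify what the kernel actually reserved (it can be less than requested
# when memory is fragmented)
sysctl vm.nr_hugepages
grep Huge /proc/meminfo
```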
15:50 <brtknr> stefanha: yes, i set it in the file and rebooted the machine
15:50 <stefanha> brtknr: What output does "sysctl vm.nr_hugepages" produce?
15:50 <brtknr> [centos@kata-worker-1 ~]$ sysctl vm.nr_hugepages
15:50 <brtknr> vm.nr_hugepages = 1024
15:51 <stefanha> brtknr: Any other QEMU processes running? :)
15:52 *** sameo has quit IRC
15:52 <brtknr> stefanha: not as far as I'm aware
15:52 <brtknr> it's a new instance
15:55 <stefanha> brtknr: If the host has enough memory you could try doubling the number of hugepages just to make sure there are enough hugepages for this 2 GB guest.
15:55 *** pcaruana has quit IRC
15:56 <brtknr> The host has 128GB of memory
15:58 <kata-irc-bot2> <raravena80> not that I know of, probably not. They are even talking about deprecating the docker-shim.
15:59 <brtknr> stefanha: I've doubled it to 2048, same error
15:59 <brtknr> stefanha: I'm only trying to run the test-deploy-kata-nemu.yaml example
16:00 <stefanha> brtknr: Hm... I'm stumped.  QEMU prints this error when it gets SIGBUS while accessing all guest RAM pages during startup.
16:00 <stefanha> brtknr: I'm not sure why it's getting SIGBUS if there are enough hugepages reserved.
16:00 <brtknr> I've tried running the kata-qemu example and it works fine
16:11 <brtknr> stefanha: so I tried running the containers directly via docker and it appears to work
16:13 *** gwhaley has quit IRC
16:13 <davidgiluk> brtknr: You could try checking /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages before starting it
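The per-size accounting files under /sys give a finer-grained view than the sysctl, since they distinguish pages reserved from pages actually free right now:

```
# total 2 MB pages reserved in the pool
cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# pages currently unused (this is what QEMU's preallocation draws from)
cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages
# pages promised to existing mappings but not yet faulted in
cat /sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages
```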
16:21 *** tmhoang has quit IRC
16:25 <brtknr> davidgiluk: that gives me the same result as `sysctl vm.nr_hugepages`
16:25 *** khyr0n has joined #kata-dev
16:25 <brtknr> it's the invocation through k8s that's tanking
16:26 <davidgiluk> ok, so I guess that means it actually did allocate the number of pages you asked for - it doesn't always
16:26 <brtknr> davidgiluk: do I need a newer version of crio?
16:26 <brtknr> or containerd?
16:26 <davidgiluk> no idea
16:28 *** gwhaley has joined #kata-dev
16:28 <davidgiluk> brtknr: I wonder if k8s is giving the container its own /dev/hugepages mount or something similar
16:30 <brtknr> looks like @eric.ernst is the one who committed the kata-nemu example
16:30 <davidgiluk> brtknr: https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/ says 'Huge pages are isolated at a pod scope' - so I guess it's doing something, but my knowledge of k8s is ><
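That pod-scoped isolation means a pod normally has to request hugepages as an explicit resource before its containers can draw on the host pool. A hedged sketch of such a request (the resource key `hugepages-2Mi` and `medium: HugePages` are real Kubernetes constructs of this era; whether the kata runtime path needs this on a given setup is exactly the open question in this thread):

```
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo
spec:
  containers:
  - name: demo
    image: busybox
    command: ["sleep", "3600"]
    resources:
      limits:
        hugepages-2Mi: 2Gi   # 1024 x 2 MB pages
        memory: 1Gi          # regular memory for everything else
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF
```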
16:30 <kata-irc-bot2> <eric.ernst> :wave:
16:31 <brtknr> @eric.ernst how did you get kata-nemu working on k8s 1.14?
16:31 *** eernst has joined #kata-dev
16:31 <kata-irc-bot2> <eric.ernst> kubectl apply :slightly_smiling_face:
16:31 <brtknr> lol that easy?
16:31 <kata-irc-bot2> <eric.ernst> w/ containerd
16:31 <brtknr> hmm
16:32 <brtknr> wonder if cri-o is the culprit
16:32 <kata-irc-bot2> <eric.ernst> there may be an issue there.
16:32 <kata-irc-bot2> <eric.ernst> @gmmaharaj is digging into it.
16:34 <kata-irc-bot2> <gmmaharaj> brtknr: what is the cni you are using in k8s?
16:34 <brtknr> gmmaha: flannel
16:34 <kata-irc-bot2> <gmmaharaj> thanks.
16:35 <brtknr> gmmaha: what should i use?
16:35 <kata-irc-bot2> <gmmaharaj> i am using canal and i am hitting the same issue. was trying to see if it was cni specific. in my case, the pods crash right after kata is deployed and then never come back up
16:37 <brtknr> gmmaha: are you also using cri-o?
16:37 <gmmaha> brtknr: yeah, i am using crio
16:37 <brtknr> eric.ernst what cni are you using?
16:38 <brtknr> @eric.ernst what cni are you using?
16:38 <kata-irc-bot2> <eric.ernst> canal
16:39 <brtknr> thanks
16:40 <gmmaha> brtknr: what version of crio are you running on this machine?
16:41 <brtknr> crio version 1.13.6-1.dev.rhaos4.1.gitee2e748.el7-dev and you?
16:42 <gmmaha> i have tried 1.13.1, 1.14.1
16:42 <gmmaha> same outcome
16:49 *** pcaruana has joined #kata-dev
16:49 <brtknr> I'm gonna attempt with containerd/flannel
17:23 *** devimc has quit IRC
17:40 *** sameo has joined #kata-dev
17:43 *** devimc has joined #kata-dev
17:52 *** devimc has quit IRC
17:53 <kata-irc-bot2> <graham.whaley> hi @salvador.fuentes @gabriela.cervantes.te - I've been peeking at the k8s expose IP CI integration test, as it popped out in the timestamp logs when we were analysing where the time went. First, I think we can improve it with a change of the image used. It seems node-hello has been deprecated, and if we replace it with hello-app we get a big saving in download size:
17:53 <kata-irc-bot2> <graham.whaley> ```
gcr.io/google-samples/hello-app      1.0    bc5c421ecd6c7    10.1MB
gcr.io/google-samples/node-hello     1.0    4c7ea8709739b    665MB
```
17:54 <kata-irc-bot2> <graham.whaley> 2nd thing though, I'm not sure if the test is quite testing what we think? On these lines: https://github.com/kata-containers/tests/blob/master/integration/kubernetes/k8s-expose-ip.bats#L55-L59 - that looks to me like maybe it is doing a curl from the pod IP addresses, and not from the externally exposed IP address from the load balancer. wdyt @gabriela.cervantes.te and @archana.m.shinde?  I'll see if I can play with that a bit more to confirm
17:56 <kata-irc-bot2> <salvador.fuentes> yeah, agree on changing the image :slightly_smiling_face:... on the second item I am not sure what is returned with that query...
17:57 <kata-irc-bot2> <graham.whaley> yeah, I think I've only done the node expose IP thing once before, so I need to stare at that more. My local hand test didn't work first time for me :(
17:57 <kata-irc-bot2> Action: graham.whaley very happy at trimming 650Mb off our 1.8Gb download though
17:59 <kata-irc-bot2> <archana.m.shinde> @graham.whaley about your 2nd point, yeah looks like we are just curling against the pod-ip (actual endpoint)
18:00 <kata-irc-bot2> <archana.m.shinde> we should probably do a ```kubectl get svc``` to get the ClusterIP there and curl against that
18:00 <kata-irc-bot2> <graham.whaley> @archana.m.shinde - right. that's what I thought. I'll see if I can make that work whilst fixing the image then ;)  /me hates BATS for its lack of debuggability...
18:02 <kata-irc-bot2> <archana.m.shinde> and since it's a loadbalancer svc and not just a clusterip svc, it would make sense to add a further step to curl against the localhost IP itself, to check that it is exposed outside the cluster
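A sketch of what that stronger check could look like: resolve the service's cluster IP and node port, then curl each in turn (the service name `nginx-service` and the jsonpath expressions are illustrative, not the test's actual variables):

```
# cluster-internal address and port of the service
CLUSTER_IP=$(kubectl get svc nginx-service -o jsonpath='{.spec.clusterIP}')
PORT=$(kubectl get svc nginx-service -o jsonpath='{.spec.ports[0].port}')
curl -s "http://${CLUSTER_IP}:${PORT}"

# for a LoadBalancer/NodePort service, the externally reachable port
NODE_PORT=$(kubectl get svc nginx-service -o jsonpath='{.spec.ports[0].nodePort}')
curl -s "http://localhost:${NODE_PORT}"
```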
18:03 <kata-irc-bot2> <graham.whaley> np. let me first open an Issue. guess who gets to be a reviewer when the PR comes ;)
18:03 <kata-irc-bot2> <archana.m.shinde> haha :slightly_smiling_face:
18:07 *** eernst has quit IRC
18:13 *** eernst has joined #kata-dev
18:17 *** eernst has quit IRC
18:17 *** eernst has joined #kata-dev
18:18 *** eernst has quit IRC
18:39 *** pcaruana has quit IRC
18:39 <kata-irc-bot2> <eric.ernst> @salvador.fuentes @gabriela.cervantes.te any changes for ubuntu-ci-18-04 in the packaging repo in the last day or so?
18:40 <kata-irc-bot2> <eric.ernst> I see a few failures that I didn't expect: ```
ok 1 Replication controller
1..1
not ok 1 Verify nginx connectivity between pods
# (in test file nginx.bats, line 29)
#   `kubectl wait --for=condition=Available deployment/${deployment}' failed
# Image is up to date for docker.io/library/busybox@sha256:6dce4a9c1635c4c9b6a2b645e6613fa0238182fe13929808ee2258370d0f3497
# Image is up to date for docker.io/library/nginx@sha256:706446e9c6667c0880d5da3f39c09a6c7d2114f5a5d6b74a2fafd24ae30d2078
# INFO: k8s configured to use runtimeclass
# deployment.apps/nginx-deployment created
# error: timed out waiting for the condition on deployments/nginx-deployment
# deployment.extensions "nginx-deployment" deleted
# Error from server (NotFound): services "nginx-deployment" not found
# error: resource name may not be empty
Makefile:107: recipe for target 'kubernetes' failed
make: *** [kubernetes] Error 1
Failed at 40: sudo -E PATH="$PATH" bash -c "make test"
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for :.* : True
Logical operation result is TRUE
Running script  : #!/bin/bash
```
18:41 <kata-irc-bot2> <graham.whaley> hmm, I've been running that test locally today and it was fine. nothing I know of... still, my components will not have been up to date
18:42 <kata-irc-bot2> <eric.ernst> I see several PRs; same for ubuntu-ci-18-04 in the proxy repo
18:45 <kata-irc-bot2> <eric.ernst> i see the volume test isn't using nginx now... not sure if this could be related.
18:45 <kata-irc-bot2> <eric.ernst> not seeing other changes in the tests repo that would indicate a problem.
18:46 <kata-irc-bot2> <gabriela.cervantes.te> I will try to reproduce the error
18:55 <kata-irc-bot2> <graham.whaley> @eric.ernst yeah, (I asked that we) move the volume test off nginx, as the volume test itself has no need for nginx or an image of that size - part of trying to optimise and reduce the CI where we can (micro opts, whilst also looking at macro opts)
18:56 <kata-irc-bot2> <eric.ernst> Yes, I'm just trying to understand why the network connectivity test is failing.
18:56 <kata-irc-bot2> <eric.ernst> and that's the only even semi-relevant change in the tests repo over the last day or two.
18:56 <kata-irc-bot2> <eric.ernst> if the volume test ran before connectivity, maybe it is just the terrible amount of time needed to pull the image? idk...
19:01 <kata-irc-bot2> <eric.ernst> I saw that particular nginx test fail last week too.
19:01 <kata-irc-bot2> <eric.ernst> failed waiting for it to become ready.
19:01 <kata-irc-bot2> <eric.ernst> it --> nginx deployment
19:01 <kata-irc-bot2> <graham.whaley> @eric.ernst - if you need to see what changed across all the major repos, I wrote a script for that: https://github.com/kata-containers/tests/tree/master/cmd/history  coz, otherwise it can be hard (although you probably can coax it out of github if you have the right foo magic on the date range stuff...)
19:16 *** davidgiluk has quit IRC
19:17 *** khyr0n has quit IRC
19:23 <kata-irc-bot2> <graham.whaley> hi @archana.m.shinde - so, I can curl off the service/loadbalancer port (8080), but not the service nodeport (port 31478 for instance) - might that be because that port is meant to be served up/out externally via a proxy (like an nginx etc.), which means maybe it is not iptables-routed or whatever on the localhost? That feels like a lot of trouble for me to add to the test.... are you OK with me moving the test from testing the pods directly to testing the port on the svc IP? :slightly_smiling_face:
19:26 *** igordc has quit IRC
19:28 <kata-irc-bot2> <gabriela.cervantes.te> @eric.ernst, I can reproduce the issue locally `not ok 1 Verify nginx connectivity between pods` and I am only running that test
19:28 <kata-irc-bot2> <eric.ernst> does kubectl describe show anything helpful?
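A sketch of the usual first steps for a deployment that times out waiting for the Available condition (the deployment name is from the failing test above; the pod label selector is an assumption):

```
# why are the pods not becoming Ready?
kubectl describe deployment nginx-deployment
kubectl get pods -l app=nginx-deployment -o wide   # label is an assumption
kubectl describe pod <pod-name>                    # check the Events section
# cluster-wide events, most recent last
kubectl get events --sort-by=.metadata.creationTimestamp
```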
19:31 <kata-irc-bot2> <archana.m.shinde> @graham.whaley yeah sounds good
19:32 <gwhaley> ok, I'll fix that and PR in the morning. thx. cu
19:33 *** gwhaley has quit IRC
19:34 <kata-irc-bot2> <gabriela.cervantes.te> @eric.ernst I will check that in a fresh VM because what is funny is that I am getting an error like `/usr/share/defaults/kata-containers/configuration.toml: file /usr/bin/qemu-system-x86_64 does not exist`
19:35 <kata-irc-bot2> <eric.ernst> did our default toml change?
19:35 <kata-irc-bot2> <eric.ernst> tbh, I don't test via packages often, so haven't seen this.
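One way to see which hypervisor binary the installed configuration actually resolves to, assuming the kata 1.x CLI (`kata-env` dumps the runtime's resolved settings; the path checked is the one from the error above):

```
# which config files the runtime loads, in priority order
kata-runtime --kata-show-default-config-paths

# dump resolved settings; the [Hypervisor] section includes the binary path
kata-runtime kata-env | grep -A3 '\[Hypervisor\]'

# sanity-check the path from the error message
ls -l /usr/bin/qemu-system-x86_64
```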
19:36 <kata-irc-bot2> <gabriela.cervantes.te> ohh ok, I will install everything in a new VM
19:48 *** eernst has joined #kata-dev
19:48 *** khyr0n has joined #kata-dev
19:52 *** eernst has quit IRC
19:54 *** eernst has joined #kata-dev
19:58 *** sameo has quit IRC
20:08 <kata-irc-bot2> <gabriela.cervantes.te> @eric.ernst on my ubuntu 18.04 it is working ok, I will change the test to use the nginx image instead of nginx:1.14
20:51 <kata-irc-bot2> <gmmaharaj> @eric.ernst we're not moving to a 5.x kernel anytime soon?
20:51 <kata-irc-bot2> <eric.ernst> yerp
20:58 <kata-irc-bot2> <gmmaharaj> @eric.ernst is on fire with reviews today.
20:58 <kata-irc-bot2> <eric.ernst> bang bang!
21:04 *** tmhoang has joined #kata-dev
21:07 *** tmhoang has quit IRC
21:17 *** eernst has quit IRC
21:18 *** eernst has joined #kata-dev
21:23 *** eernst has quit IRC
21:31 *** igordc has joined #kata-dev
21:35 *** tmhoang has joined #kata-dev
22:11 *** fuentess has quit IRC
22:24 *** jugs has quit IRC
22:24 *** jugs has joined #kata-dev
22:28 <brtknr> gmmaha: I have it working with containerd and flannel
22:29 <gmmaha> brtknr: aah nice.. we have been debugging this as well. if crio is restarted once the k8s static pods + cni are up, things go wrong: the cni pod is never able to come up and crio keeps crashing.
22:30 <gmmaha> it is safer to say that when you are using crio, make sure that before you start k8s, you set the config options, mainly `manage_network_ns_lifecycle` to true, before starting anything
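A minimal sketch of that ordering, assuming CRI-O reads its settings from /etc/crio/crio.conf (which TOML table this key lives in has moved between CRI-O versions, so treat the placement as illustrative and check crio.conf(5) for your version):

```
# in /etc/crio/crio.conf, set before the cluster is brought up:
#
#   manage_network_ns_lifecycle = true
#
# then restart CRI-O before the kubelet starts scheduling pods
sudo systemctl restart crio
sudo systemctl restart kubelet
```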
22:31 <brtknr> hmm interesting, any idea why crio is not as well documented as containerd?
22:32 <gmmaha> ohh our documents do need some love. :) i will make a change to fix at least this piece today.
22:33 <brtknr> gmmaha: also how do I verify that kata-nemu is using virtiofs?
22:34 <brtknr> gmmaha: i definitely can't find any mention of 9p under `ps aux | grep nemu | grep 9p`
22:34 <gmmaha> on the host if you do `ps -ef | grep virt` you should see 2x the number of processes as kata containers. each pod will have a qemu process and a virtiofsd_x86_64 process.
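Two hedged checks, one from the host and one from inside a container (the exact virtiofsd binary name and the guest filesystem type string can differ between builds, so treat both greps as illustrative):

```
# host: expect one hypervisor process plus one virtiofsd per kata pod
ps -ef | grep -E 'virtiofsd|nemu-system' | grep -v grep

# inside the container: the shared rootfs should show up as virtio-fs
# rather than 9p in the mount table
kubectl exec <kata-pod> -- grep -E 'virtio|9p' /proc/mounts
```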
22:35 <brtknr> gmmaha: that's just what i needed! thanks :)
22:36 <gmmaha> cool.. glad i could help
22:36 <brtknr> this is very exciting! i can go to sleep for now
22:37 <gmmaha> :D
22:37 <brtknr> is there somewhere i can read up on how virtiofs differs from 9p and its upsides/downsides?
22:37 <gmmaha> brtknr: https://virtio-fs.gitlab.io/ that is their landing page.
22:38 <brtknr> gmmaha: they are advertising kata 1.7 there :D
22:38 <gmmaha> i believe an arch document for virtio-fs is there or is being worked on. I will try and fill you in on that as soon as i hear more.
22:39 <brtknr> just found this document by stefanha https://vmsplice.net/~stefan/stefanha-virtio-fs-kata.pdf
22:40 <gmmaha> coool..
22:49 <brtknr> gmmaha: hmm i am running `dd if=/dev/zero of=block bs=512M count=2` and i am surprised by the numbers
22:50 <gmmaha> brtknr: good surprised or bad surprised? ;)
22:50 <brtknr> 1073741824 bytes (1.1 GB) copied, 8.85543 s, 121 MB/s for kata-qemu
22:50 <brtknr> 1048576000 bytes (1.0 GB) copied, 59.3652 s, 17.7 MB/s for kata-nemu
22:50 <brtknr> I am guessing I have configured something incorrectly somewhere
22:52 <gmmaha> yeah those are odd. i am also guessing kata-qemu is using 9p and not virtio-fs?
22:52 <brtknr> gmmaha: yeah that's right
22:53 <gmmaha> brtknr: yeah, not quite the numbers i was looking for.
22:53 <gmmaha> or what i got when i tested it.
22:53 <gmmaha> i was easily seeing twice the throughput
22:53 <brtknr> really?
22:54 <gmmaha> the last time i tested, yes..
22:54 <brtknr> gmmaha: this is baremetal performance for reference: 1073741824 bytes (1.1 GB) copied, 0.606793 s, 1.8 GB/s
22:54 <gmmaha> it has been a little while since i did (2 weeks ish)
22:59 <gmmaha> i was comparing just 9p vs virtio-fs and 9p took twice the amount of time compared to virtio-fs to write a 1G file.
22:59 *** eernst has joined #kata-dev
23:00 *** eernst has quit IRC
23:00 *** eernst has joined #kata-dev
23:01 <brtknr> what was the command you ran?
23:01 <gmmaha> `dd if=/dev/zero of=test.file bs=1M count=1024`
23:02 <brtknr> gmmaha: were you doing this inside kata containers or outside?
23:02 <gmmaha> inside the kata container, onto the rootfs
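Worth noting when reading the numbers in this thread: plain dd reports success as soon as the data is in the page cache, so it can flatter or punish different backends unevenly. Variants that force the data to stable storage give more comparable figures:

```
# flush to stable storage before reporting the rate
dd if=/dev/zero of=test.file bs=1M count=1024 conv=fdatasync

# or bypass the page cache entirely
dd if=/dev/zero of=test.file bs=1M count=1024 oflag=direct
```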
23:05 *** eernst has quit IRC
23:06 <brtknr> kata-nemu: 268435456 bytes (268 MB) copied, 2.99375 s, 89.7 MB/s
23:06 <brtknr> kata-qemu: 268435456 bytes (268 MB) copied, 1.89188 s, 142 MB/s
23:07 <brtknr> it's looking 2x slower
23:07 <brtknr> I must have something badly configured!
23:07 <gmmaha> brtknr: and you are using 1.7.1 right?
23:07 <gmmaha> installed that using kata-deploy?
23:08 <gmmaha> i will try and bring up my system (virtual unfortunately) and test this out with just docker.
23:09 <gmmaha> ls
23:09 <brtknr> I'm using the packaging master branch
23:22 <brtknr> gmmaha: something strange just happened
23:22 <brtknr> 536870912 bytes (537 MB) copied, 0.979417 s, 548 MB/s
23:22 <brtknr> this is in the kata-nemu pod
23:23 <brtknr> 536870912 bytes (537 MB) copied, 0.736673 s, 729 MB/s
23:23 <brtknr> i deleted kata-deploy and redeployed it
23:31 <brtknr> gmmaha: the network file system I am using is BeeGFS... when writing to this filesystem directly on the hypervisor, I get the good performance
23:31 <brtknr> OTOH, when I use Ceph as the storage backend, I get much better performance from kata-nemu
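When comparing storage backends like this, a tool such as fio (if available in the image) separates backend behaviour from caching effects better than dd; a minimal sketch, with the mount paths as placeholders:

```
# sequential 1 GiB write, bypassing the page cache, fsync at the end
fio --name=seqwrite --directory=/mnt/beegfs --rw=write \
    --bs=1M --size=1G --direct=1 --end_fsync=1

# repeat with --directory=/mnt/ceph and compare the reported bandwidth
```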
23:34 <gmmaha> brtknr: aaah..
23:34 <gmmaha> i am a bit familiar with ceph. never had the chance to work with BeeGFS
23:35 <gmmaha> so if i understand it right, it works fine if your rootfs is on the host itself.
23:35 <gmmaha> whereas when backed by BeeGFS, you see a big performance hit. is that right?
23:37 <brtknr> Well, in both cases I mounted network disks (Ceph and BeeGFS) on the hypervisor and passed them to the kata container
23:38 <brtknr> Ceph worked well, BeeGFS didn't
23:39 <gmmaha> aah..
23:39 <gmmaha> brtknr: i need to head out now, but let me see if i can look into what BeeGFS has.. this will all boil down to how you have set up ceph/beegfs - rbd vs cephfs equivalents etc.
23:42 <brtknr> gmmaha: thanks for the nice discussion, speak soon :)
