*** igordc has quit IRC | 00:56 | |
*** khyr0n has joined #kata-dev | 03:35 | |
*** khyr0n has quit IRC | 05:26 | |
*** lpetrut has joined #kata-dev | 05:28 | |
*** lpetrut has quit IRC | 05:43 | |
*** sameo has joined #kata-dev | 06:37 | |
*** lpetrut has joined #kata-dev | 06:46 | |
*** sameo has quit IRC | 06:52 | |
*** pcaruana has joined #kata-dev | 06:55 | |
*** tmhoang has joined #kata-dev | 07:13 | |
*** jodh has joined #kata-dev | 07:22 | |
*** sameo has joined #kata-dev | 07:59 | |
*** gwhaley has joined #kata-dev | 07:59 | |
*** davidgiluk has joined #kata-dev | 08:05 | |
*** tobberydberg has quit IRC | 09:22 | |
*** tobberydberg has joined #kata-dev | 09:31 | |
*** gwhaley has quit IRC | 09:43 | |
*** pcaruana has quit IRC | 10:19 | |
kata-irc-bot2 | <xu> I’ve sent the mail to skip the AC meeting this week. | 10:26 |
kata-irc-bot2 | <xu> Let’s discuss topics online. | 10:27 |
*** gwhaley has joined #kata-dev | 10:28 | |
*** pcaruana has joined #kata-dev | 11:09 | |
*** devimc has joined #kata-dev | 12:12 | |
*** fuentess has joined #kata-dev | 12:35 | |
*** devimc has quit IRC | 13:51 | |
*** pcaruana has quit IRC | 14:04 | |
*** devimc has joined #kata-dev | 14:19 | |
*** pcaruana has joined #kata-dev | 14:30 | |
*** pcaruana has quit IRC | 14:31 | |
*** pcaruana has joined #kata-dev | 14:31 | |
*** jodh has quit IRC | 14:59 | |
brtknr | hey all, is docker going to support RuntimeClass at some point in the same way crio and containerd do now? | 15:11 |
*** lpetrut has quit IRC | 15:12 | |
*** pcaruana has quit IRC | 15:17 | |
*** devimc has quit IRC | 15:19 | |
*** devimc has joined #kata-dev | 15:20 | |
brtknr | to work with kubernetes... as I understand, docker already has a --runtime flag | 15:21 |
kata-irc-bot2 | <eric.ernst> I’m not sure. Seems a lot of docker integration is migrating into containerd for k8s... | 15:30 |
*** pcaruana has joined #kata-dev | 15:43 | |
brtknr | I see | 15:44 |
brtknr | Failed create pod sandbox: rpc error: code = Unknown desc = container create failed: nemu-system-x86_64: -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/hugepages,share=on,prealloc=on: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM | 15:44 |
*** igordc has joined #kata-dev | 15:44 | |
brtknr | I am trying to get the kata-nemu example off the ground but hitting this | 15:45 |
brtknr | I have vm.nr_hugepages=1024 in /etc/sysctl.conf | 15:46 |
stefanha | brtknr: Did you actually set that sysctl? (Putting it in the file isn't enough, at least if you don't run other commands or restart the host.) | 15:49 |
stefanha | brtknr: sysctl vm.nr_hugepages=1024 | 15:49 |
brtknr | stefanha: yes, i set it in the file and rebooted the machine | 15:50 |
stefanha | brtknr: What output does "sysctl vm.nr_hugepages" produce? | 15:50 |
brtknr | [centos@kata-worker-1 ~]$ sysctl vm.nr_hugepages | 15:50 |
brtknr | vm.nr_hugepages = 1024 | 15:50 |
stefanha | brtknr: Any other QEMU processes running? :) | 15:51 |
*** sameo has quit IRC | 15:52 | |
brtknr | stefanha: not as far as im aware | 15:52 |
brtknr | its a new instance | 15:52 |
stefanha | brtknr: If the host has enough memory you could try doubling the number of hugepages just to make sure there are enough hugepages for this 2 GB guest. | 15:55 |
*** pcaruana has quit IRC | 15:55 | |
brtknr | The host has 128GB of memory | 15:56 |
kata-irc-bot2 | <raravena80> not that I know of, probably not. They are even talking about deprecating the docker-shim. | 15:58 |
brtknr | stefanha: I've doubled it to 2048, same error | 15:59 |
brtknr | stefanha: I'm only trying to run the test-deploy-kata-nemu.yaml example | 15:59 |
stefanha | brtknr: Hm...I'm stumped. QEMU prints this error when it gets SIGBUS while accessing all guest RAM pages during startup. | 16:00 |
stefanha | brtknr: I'm not sure why it's getting SIGBUS if there are enough hugepages reserved. | 16:00 |
brtknr | I've tried running the kata-qemu example and it works fine | 16:00 |
brtknr | stefanha: so I tried running the containers directly via docker and it appears to work | 16:11 |
*** gwhaley has quit IRC | 16:13 | |
davidgiluk | brtknr: You could try checking /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages before starting it | 16:13 |
*** tmhoang has quit IRC | 16:21 | |
brtknr | davidgiluk: that gives me the same result as `sysctl vm.nr_hugepages` | 16:25 |
*** khyr0n has joined #kata-dev | 16:25 | |
brtknr | its invocation through k8s thats tanking | 16:25 |
davidgiluk | ok, so I guess that means it actually did allocate the number of pages you asked for - it doesn't always | 16:26 |
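The checks stefanha and davidgiluk describe can be combined into a quick host-side sketch (assumes 2 MB hugepages, the x86_64 default; the fallbacks to 0 are only so the snippet runs on hosts without hugepage support):

```shell
# Verify hugepages were actually reserved and are still free --
# an entry in /etc/sysctl.conf alone does nothing until it is applied.
requested=$(cat /proc/sys/vm/nr_hugepages 2>/dev/null || echo 0)
free=$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages 2>/dev/null || echo 0)
echo "reserved=$requested free=$free"
# A 2048M guest backed by 2 MB pages needs at least 1024 free pages at start.
```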
brtknr | davidgiluk: do I need a newer version of crio? | 16:26 |
brtknr | or containerd? | 16:26 |
davidgiluk | no idea | 16:26 |
*** gwhaley has joined #kata-dev | 16:28 | |
davidgiluk | brtknr: I wonder if k8s is giving the container its own /dev/hugepages mount or something similar | 16:28
brtknr | looks like @eric.ernst is the one who committed the kata-nemu example | 16:30
davidgiluk | brtknr: https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/ says 'Huge pages are isolated at a pod scope' - so I guess it's doing something, but my knowledge of k8s is >< | 16:30 |
kata-irc-bot2 | <eric.ernst> :wave: | 16:30 |
brtknr | @eric.ernst how did you get kata-nemu working on k8s 1.14? | 16:31 |
*** eernst has joined #kata-dev | 16:31 | |
kata-irc-bot2 | <eric.ernst> kubectl apply :slightly_smiling_face: | 16:31 |
brtknr | lol that easy? | 16:31 |
kata-irc-bot2 | <eric.ernst> w containerd | 16:31 |
brtknr | hmm | 16:31 |
brtknr | wonder if cri-o is the culprit | 16:32 |
kata-irc-bot2 | <eric.ernst> there may be an issue there. | 16:32 |
kata-irc-bot2 | <eric.ernst> @gmmaharaj is digging into it. | 16:32 |
kata-irc-bot2 | <gmmaharaj> brtknr: what is the cni you are using in k8s? | 16:34 |
brtknr | gmmaha: flannel | 16:34 |
kata-irc-bot2 | <gmmaharaj> thanks. | 16:34 |
brtknr | gmmaha: what should i use? | 16:35 |
kata-irc-bot2 | <gmmaharaj> i am using canal and i am hitting the same issue. was trying to see if it was cni specific. in my case, the pods crash right after kata is deployed and then never come back up | 16:35 |
brtknr | gmmaha: are you also using cri-o? | 16:37 |
gmmaha | brtknr: yeah, i am using crio | 16:37 |
brtknr | eric.ernst what cni are you using? | 16:37 |
brtknr | @eric.ernst what cni are you using? | 16:38 |
kata-irc-bot2 | <eric.ernst> canal | 16:38 |
brtknr | thanks | 16:39 |
gmmaha | brtknr: what version of crio are you running on this machine? | 16:40 |
brtknr | crio version 1.13.6-1.dev.rhaos4.1.gitee2e748.el7-dev and you? | 16:41 |
gmmaha | i have tried 1.13.1, 1.14.1 | 16:42 |
gmmaha | same outcome | 16:42 |
*** pcaruana has joined #kata-dev | 16:49 | |
brtknr | Im gonna attempt with containerd/flannel | 16:49 |
*** devimc has quit IRC | 17:23 | |
*** sameo has joined #kata-dev | 17:40 | |
*** devimc has joined #kata-dev | 17:43 | |
*** devimc has quit IRC | 17:52 | |
kata-irc-bot2 | <graham.whaley> hi @salvador.fuentes @gabriela.cervantes.te - I've been peeking at the k8s expose IP CI integration test, as it popped out in the time stamp logs when we were analysing where time went. First, I think we can improve it with a change of the image used. It seems node-hello has been deprecated, and if we replace it with hello-app we get a big saving in download size: | 17:53 |
kata-irc-bot2 | <graham.whaley> ```gcr.io/google-samples/hello-app 1.0 bc5c421ecd6c7 10.1MB gcr.io/google-samples/node-hello 1.0 4c7ea8709739b 665MB ``` | 17:53 |
kata-irc-bot2 | <graham.whaley> 2nd thing though, I'm not sure if the test is quite testing what we think? On these lines: https://github.com/kata-containers/tests/blob/master/integration/kubernetes/k8s-expose-ip.bats#L55-L59 - that looks to me like maybe it is doing a curl from the pod IP addresses, and not from the externally exposed IP address from the load balancer. wdyt @gabriela.cervantes.te and @archana.m.shinde? I'll see if I can play with that a bit | 17:54 |
kata-irc-bot2 | more to confirm | 17:54 |
kata-irc-bot2 | <salvador.fuentes> yeah, agree on changing the image :slightly_smiling_face:... on the second item I am not sure what is returned with that query... | 17:56 |
kata-irc-bot2 | <graham.whaley> yeah, I think I've only done the node expose IP thing once before, so I need to stare at that more. My local hand test didn't work first time for me :( | 17:57 |
kata-irc-bot2 | Action: graham.whaley very happy at trimming 650Mb off our 1.8Gb download though | 17:57 |
kata-irc-bot2 | <archana.m.shinde> @graham.whaley about your 2nd point, yeah looks like we are just curling against the pod-ip(actual endpoint) | 17:59 |
kata-irc-bot2 | <archana.m.shinde> we should probably do a ```kubectl get svc ``` to get the ClusterIP there and curl against that | 18:00
kata-irc-bot2 | <graham.whaley> @archana.m.shinde - right. that's what I thought. I'll see if I can make that work whilst fixing the image then ;) /me hates BATS for lack of debugability... | 18:00 |
kata-irc-bot2 | <archana.m.shinde> and since its a loadbalancer svc and not just a clusterip svc, it would make sense to add a step further to curl against the localhost IP itself, to check that it is exposed outside the cluster | 18:02
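The change being discussed can be sketched as a small helper (the service name in the usage line is illustrative, not taken from the actual test):

```shell
# Curl the Service's ClusterIP:port rather than the individual pod IPs,
# so the test exercises the exposed service path instead of the endpoints.
check_service() {
    svc="$1"
    ip=$(kubectl get svc "$svc" -o jsonpath='{.spec.clusterIP}')
    port=$(kubectl get svc "$svc" -o jsonpath='{.spec.ports[0].port}')
    curl -sf "http://${ip}:${port}" >/dev/null
}
# Usage (hypothetical service name): check_service hello-app
```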
kata-irc-bot2 | <graham.whaley> np. let me first open an Issue. guess who gets to be a reviewer when the PR comes ;) | 18:03 |
kata-irc-bot2 | <archana.m.shinde> haha :slightly_smiling_face: | 18:03 |
*** eernst has quit IRC | 18:07 | |
*** eernst has joined #kata-dev | 18:13 | |
*** eernst has quit IRC | 18:17 | |
*** eernst has joined #kata-dev | 18:17 | |
*** eernst has quit IRC | 18:18 | |
*** pcaruana has quit IRC | 18:39 | |
kata-irc-bot2 | <eric.ernst> @salvador.fuentes @gabriela.cervantes.te anything change for ubuntu-ci-18-04 for the packaging repo in the last day or so? | 18:39
kata-irc-bot2 | <eric.ernst> I see a few failures that I didn't expect: ``` ok 1 Replication controller 1..1 not ok 1 Verify nginx connectivity between pods # (in test file nginx.bats, line 29) # `kubectl wait --for=condition=Available deployment/${deployment}' failed # Image is up to date for docker.io/library/busybox@sha256:6dce4a9c1635c4c9b6a2b645e6613fa0238182fe13929808ee2258370d0f3497 # Image is up to date for | 18:40
kata-irc-bot2 | docker.io/library/nginx@sha256:706446e9c6667c0880d5da3f39c09a6c7d2114f5a5d6b74a2fafd24ae30d2078 # INFO: k8s configured to use runtimeclass # deployment.apps/nginx-deployment created # error: timed out waiting for the condition on deployments/nginx-deployment # deployment.extensions "nginx-deployment" deleted # Error from server (NotFound): services "nginx-deployment" not found # error: resource name may not be empty Makefile:107: recipe for | 18:40 |
kata-irc-bot2 | target 'kubernetes' failed make: *** [kubernetes] Error 1 Failed at 40: sudo -E PATH="$PATH" bash -c "make test" Build step 'Execute shell' marked build as failure Performing Post build task... Match found for :.* : True Logical operation result is TRUE Running script : #!/bin/bash ``` | 18:40 |
kata-irc-bot2 | <graham.whaley> hmm, I've been running that test locally today and it was fine. nothing I know of... still, my components will not have been up to date | 18:41
kata-irc-bot2 | <eric.ernst> I see several PRs; same for ubuntu-ci-18-04 in proxy repo | 18:42 |
kata-irc-bot2 | <eric.ernst> i see volume test isn't using nginx now... not sure if this could be related. | 18:45 |
kata-irc-bot2 | <eric.ernst> not seeing other changes in tests repo that would indicate a problem. | 18:45 |
kata-irc-bot2 | <gabriela.cervantes.te> I will try to reproduce the error | 18:46 |
kata-irc-bot2 | <graham.whaley> @eric.ernst yeah, (I asked that we) move the volume test off nginx, as the volume test itself has no need for nginx or an image of that size - part of trying to optimise and reduce the CI where we can (micro opts, whilst also looking at macro opts) | 18:55 |
kata-irc-bot2 | <eric.ernst> Yes, I'm just trying to understand why the network connectivity test is failing. | 18:56 |
kata-irc-bot2 | <eric.ernst> and that's the only even-semi-relevant change in the tests repo over last day or two. | 18:56 |
kata-irc-bot2 | <eric.ernst> if volume test ran before connectivity, maybe it is just a terrible amount of time needed to pull the image? idk... | 18:56
kata-irc-bot2 | <eric.ernst> I see that particular nginx test even last week too. | 19:01 |
kata-irc-bot2 | <eric.ernst> failed waiting for it to become ready. | 19:01 |
kata-irc-bot2 | <eric.ernst> it --> nginx deployment | 19:01 |
kata-irc-bot2 | <graham.whaley> @eric.ernst - if you needed to see what changed across all the major repos, I wrote a script for that: https://github.com/kata-containers/tests/tree/master/cmd/history coz, otherwise it can be hard (although, you probably can coax it out of github if you have the right foo magic on the date range stuff... ) | 19:01 |
*** davidgiluk has quit IRC | 19:16 | |
*** khyr0n has quit IRC | 19:17 | |
kata-irc-bot2 | <graham.whaley> hi @archana.m.shinde - so, I can curl off the service/loadbalancer port (8080), but not the service nodeport (port 31478 for instance) - might that be because that port is meant to be served up/out externally via a proxy (like an nginx etc.), which means maybe it is not ip table routed or whatever on the localhost? That feels like a lot of trouble for me to add to the test.... you OK with me moving the test from testing the pods | 19:23 |
kata-irc-bot2 | directly to testing the port on the svc IP ? :slightly_smiling_face: | 19:23 |
*** igordc has quit IRC | 19:26 | |
kata-irc-bot2 | <gabriela.cervantes.te> @eric.ernst, I can reproduce the issue locally `not ok 1 Verify nginx connectivity between pods` and I am only running that test | 19:28 |
kata-irc-bot2 | <eric.ernst> kubectl describe show anything helpful? | 19:28 |
kata-irc-bot2 | <archana.m.shinde> @graham.whaley yeah sounds good | 19:31 |
gwhaley | ok, fix that and PR in the morning. thx. cu | 19:32 |
*** gwhaley has quit IRC | 19:33 | |
kata-irc-bot2 | <gabriela.cervantes.te> @eric.ernst I will see that in a fresh VM because what is funny is that I am getting an error like `/usr/share/defaults/kata-containers/configuration.toml: file /usr/bin/qemu-system-x86_64 does not exist` | 19:34 |
kata-irc-bot2 | <eric.ernst> did our default toml change? | 19:35 |
kata-irc-bot2 | <eric.ernst> tbh, I don't test via packages often, so haven't seen this. | 19:35 |
kata-irc-bot2 | <gabriela.cervantes.te> ohh ok, I will install everything in a new VM | 19:36 |
*** eernst has joined #kata-dev | 19:48 | |
*** khyr0n has joined #kata-dev | 19:48 | |
*** eernst has quit IRC | 19:52 | |
*** eernst has joined #kata-dev | 19:54 | |
*** sameo has quit IRC | 19:58 | |
kata-irc-bot2 | <gabriela.cervantes.te> @eric.ernst on my ubuntu 18.04 it is working ok, I will change the test to use the nginx image instead of nginx:1.14 | 20:08
kata-irc-bot2 | <gmmaharaj> @eric.ernst we not moving to 5.x kernel anytime soon? | 20:51 |
kata-irc-bot2 | <eric.ernst> yerp | 20:51 |
kata-irc-bot2 | <gmmaharaj> @eric.ernst is on fire with reviews today. | 20:58 |
kata-irc-bot2 | <eric.ernst> bang bang! | 20:58 |
*** tmhoang has joined #kata-dev | 21:04 | |
*** tmhoang has quit IRC | 21:07 | |
*** eernst has quit IRC | 21:17 | |
*** eernst has joined #kata-dev | 21:18 | |
*** eernst has quit IRC | 21:23 | |
*** igordc has joined #kata-dev | 21:31 | |
*** tmhoang has joined #kata-dev | 21:35 | |
*** fuentess has quit IRC | 22:11 | |
*** jugs has quit IRC | 22:24 | |
*** jugs has joined #kata-dev | 22:24 | |
brtknr | gmmaha: I have it working with containerd and flannel | 22:28 |
gmmaha | brtknr: aah nice.. we have been debugging this as well. if crio is restarted once k8s static pods + cni are up, things go wrong and the cni pod is never able to come up and crio keeps crashing. | 22:29 |
gmmaha | it is safer to say that when you are using crio, make sure that before you start k8s, you set the config options, mainly `manage_network_ns_lifecycle` to true before starting anything | 22:30 |
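For reference, that option is set in CRI-O's configuration file (typically /etc/crio/crio.conf); the exact section and option name may vary by CRI-O version — later releases renamed it `manage_ns_lifecycle`:

```toml
[crio.runtime]
manage_network_ns_lifecycle = true
```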
brtknr | hmm interesting, any idea why crio is not as well documented as containerd? | 22:31 |
gmmaha | ohh our documents do need some love. :) i will make a change to fix at least this piece up today. | 22:32
brtknr | gmmaha: also how do I verify that kata-nemu is using virtiofs? | 22:33 |
brtknr | gmmaha: i definitely cant find any mention of 9p under `ps aux | grep nemu | grep 9p` | 22:34 |
gmmaha | on the host if you do `ps -ef | grep virt` you should see 2x the no. of processes as that of kata containers. each pod wil have a qemu process and a virtiofsd_x86_64 process. | 22:34 |
brtknr | gmmaha: thats just what i needed! thanks :) | 22:35
gmmaha | cool.. glad i could help | 22:36 |
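gmmaha's process check can be sketched as follows (the process-name patterns assume the nemu/qemu VMMs discussed above):

```shell
# Count VMM and virtiofsd processes on the host; with virtio-fs enabled,
# each Kata pod should contribute one VMM process and one virtiofsd process.
vmms=$(pgrep -c -f 'nemu-system|qemu-system' || true)
daemons=$(pgrep -c -f 'virtiofsd' || true)
echo "VMM processes: $vmms, virtiofsd processes: $daemons"
```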
brtknr | this is very exciting! i can go to sleep for now | 22:36 |
gmmaha | :D | 22:37 |
brtknr | is there somewhere i can read up on how virtiofs differs from 9p and its upsides/downsides? | 22:37 |
gmmaha | brtknr: https://virtio-fs.gitlab.io/ that is their landing page. | 22:37 |
brtknr | gmmaha: they are advertising kata 1.7 there :D | 22:38 |
gmmaha | i believe an arch document for virtio-fs is there or is being worked on. I will try and fill you in on that as soon as i hear more. | 22:38 |
brtknr | just found this document by stefanha https://vmsplice.net/~stefan/stefanha-virtio-fs-kata.pdf | 22:39 |
gmmaha | coool.. | 22:40 |
brtknr | gmmaha: hmm i am running `dd if=/dev/zero of=block bs=512M count=2` and i am surprised by the numbers | 22:49 |
gmmaha | brtknr: good surprised or bad surprised? ;) | 22:50 |
brtknr | 1073741824 bytes (1.1 GB) copied, 8.85543 s, 121 MB/s for kata-qemu | 22:50 |
brtknr | 1048576000 bytes (1.0 GB) copied, 59.3652 s, 17.7 MB/s for kata-nemu | 22:50 |
brtknr | I am guessing I have configured something incorrectly somewhere | 22:50 |
gmmaha | yeah those are odd. i am also guessing kata-qemu is using 9p and not virtio-fs? | 22:52 |
brtknr | gmmaha: yeah thats right | 22:52 |
gmmaha | brtknr: yeah, not quite the numbers i was looking for. | 22:53
gmmaha | or what i got when i tested it. | 22:53 |
gmmaha | i was easily seeing twice the throughput | 22:53 |
brtknr | really? | 22:53 |
gmmaha | the last i tested yes.. | 22:54 |
brtknr | gmmaha: this is baremetal performance for reference: 1073741824 bytes (1.1 GB) copied, 0.606793 s, 1.8 GB/s | 22:54
gmmaha | it has been a little while since i did (2 weeks ish) | 22:54 |
gmmaha | i was comparing just 9p vs virtio-fs and 9p took twice the amount of time compared to virtio-fs to write a 1G file. | 22:59 |
*** eernst has joined #kata-dev | 22:59 | |
*** eernst has quit IRC | 23:00 | |
*** eernst has joined #kata-dev | 23:00 | |
brtknr | what was the command you ran? | 23:01 |
gmmaha | `dd if=/dev/zero of=test.file bs=1M count=1024` | 23:01 |
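One caveat with benchmarks of this shape (an editorial observation, not something raised in the discussion): without a sync, dd largely measures guest page-cache writes, which may explain some of the swings above. A sketch of a variant that forces the data through the storage path:

```shell
# conv=fsync makes dd fsync the output file before reporting throughput,
# so the number reflects the virtio-fs/9p/backing-filesystem path rather
# than the guest page cache.
dd if=/dev/zero of=test.file bs=1M count=64 conv=fsync
rm -f test.file
```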
brtknr | gmmaha: were you doing this inside kata containers or outside? | 23:02 |
gmmaha | inside the kata container. onto the rootfs | 23:02 |
*** eernst has quit IRC | 23:05 | |
brtknr | kata-nemu: 268435456 bytes (268 MB) copied, 2.99375 s, 89.7 MB/s | 23:06 |
brtknr | kata-qemu: 268435456 bytes (268 MB) copied, 1.89188 s, 142 MB/s | 23:06 |
brtknr | its looking 2x slower | 23:07 |
brtknr | I must have something badly configured! | 23:07 |
gmmaha | brtknr: and you are using 1.7.1 right? | 23:07 |
gmmaha | installed that using kata-deploy? | 23:07 |
gmmaha | i will try and bring up my system (virtual unfortunately) and test this out with just docker. | 23:08 |
gmmaha | ls | 23:09 |
brtknr | I'm using packaging master branch | 23:09 |
brtknr | gmmaha: something strange just happened | 23:22 |
brtknr | 536870912 bytes (537 MB) copied, 0.979417 s, 548 MB/s | 23:22 |
brtknr | this is in kata-nemu pod | 23:22 |
brtknr | 536870912 bytes (537 MB) copied, 0.736673 s, 729 MB/s | 23:23 |
brtknr | i deleted kata-deploy and redeployed it | 23:23 |
brtknr | gmmaha: the network file system I am using is BeeGFS... when writing to this filesystem directly on the hypervisor, I get the good performance | 23:31 |
brtknr | OTOH, when I use Ceph as the storage backend, I get a much better performance for kata-nemu | 23:31 |
gmmaha | brtknr: aaah.. | 23:34 |
gmmaha | i am a bit familiar with ceph. never had the chance to work with beeGFS | 23:34 |
gmmaha | so if i understand it right, it works fine if your rootfs is on the host itself. | 23:35
gmmaha | where when backed by BeeGFS, you see a big performance hit. is that right? | 23:35 |
brtknr | Well, in both cases I mounted network disks (Ceph and BeeGFS) from the hypervisor into the kata container | 23:37
brtknr | Ceph worked well, BeeGFS didnt | 23:38 |
gmmaha | aah.. | 23:39 |
gmmaha | brtknr: i need to head out now, but let me see if i can look into what BeeGFS has.. but this will all boil down to how you have set up ceph/beegfs. rbd, cephfs equivalent etc. | 23:39
brtknr | gmmaha: thanks for the nice discussion, speak soon :) | 23:42 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!