*** igordc has quit IRC | 00:56 | |
*** khyr0n has joined #kata-dev | 03:35 | |
*** khyr0n has quit IRC | 05:26 | |
*** lpetrut has joined #kata-dev | 05:28 | |
*** lpetrut has quit IRC | 05:43 | |
*** sameo has joined #kata-dev | 06:37 | |
*** lpetrut has joined #kata-dev | 06:46 | |
*** sameo has quit IRC | 06:52 | |
*** pcaruana has joined #kata-dev | 06:55 | |
*** tmhoang has joined #kata-dev | 07:13 | |
*** jodh has joined #kata-dev | 07:22 | |
*** sameo has joined #kata-dev | 07:59 | |
*** gwhaley has joined #kata-dev | 07:59 | |
*** davidgiluk has joined #kata-dev | 08:05 | |
*** tobberydberg has quit IRC | 09:22 | |
*** tobberydberg has joined #kata-dev | 09:31 | |
*** gwhaley has quit IRC | 09:43 | |
*** pcaruana has quit IRC | 10:19 | |
kata-irc-bot2 | <xu> I’ve sent the mail to skip the AC meeting this week. | 10:26 |
kata-irc-bot2 | <xu> Let’s discuss topics online. | 10:27 |
*** gwhaley has joined #kata-dev | 10:28 | |
*** pcaruana has joined #kata-dev | 11:09 | |
*** devimc has joined #kata-dev | 12:12 | |
*** fuentess has joined #kata-dev | 12:35 | |
*** devimc has quit IRC | 13:51 | |
*** pcaruana has quit IRC | 14:04 | |
*** devimc has joined #kata-dev | 14:19 | |
*** pcaruana has joined #kata-dev | 14:30 | |
*** pcaruana has quit IRC | 14:31 | |
*** pcaruana has joined #kata-dev | 14:31 | |
*** jodh has quit IRC | 14:59 | |
brtknr | hey all, is docker going to support RuntimeClass at some point in the same way crio and containerd do now? | 15:11 |
*** lpetrut has quit IRC | 15:12 | |
*** pcaruana has quit IRC | 15:17 | |
*** devimc has quit IRC | 15:19 | |
*** devimc has joined #kata-dev | 15:20 | |
brtknr | to work with kubernetes... as I understand, docker already has a --runtime flag | 15:21 |
kata-irc-bot2 | <eric.ernst> I’m not sure. Seems a lot of docker integration is migrating into containerd for k8s... | 15:30 |
*** pcaruana has joined #kata-dev | 15:43 | |
brtknr | I see | 15:44 |
brtknr | Failed create pod sandbox: rpc error: code = Unknown desc = container create failed: nemu-system-x86_64: -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/hugepages,share=on,prealloc=on: os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM | 15:44 |
*** igordc has joined #kata-dev | 15:44 | |
brtknr | I am trying to get the kata-nemu example off the ground but hitting this | 15:45 |
brtknr | I have vm.nr_hugepages=1024 in /etc/sysctl.conf | 15:46 |
stefanha | brtknr: Did you actually set that sysctl? (Putting it in the file isn't enough, at least if you don't run other commands or restart the host.) | 15:49 |
stefanha | brtknr: sysctl vm.nr_hugepages=1024 | 15:49 |
brtknr | stefanha: yes, i set it in the file and rebooted the machine | 15:50 |
stefanha | brtknr: What output does "sysctl vm.nr_hugepages" produce? | 15:50 |
brtknr | [centos@kata-worker-1 ~]$ sysctl vm.nr_hugepages | 15:50 |
brtknr | vm.nr_hugepages = 1024 | 15:50 |
stefanha | brtknr: Any other QEMU processes running? :) | 15:51 |
*** sameo has quit IRC | 15:52 | |
brtknr | stefanha: not as far as im aware | 15:52 |
brtknr | its a new instance | 15:52 |
stefanha | brtknr: If the host has enough memory you could try doubling the number of hugepages just to make sure there are enough hugepages for this 2 GB guest. | 15:55 |
*** pcaruana has quit IRC | 15:55 | |
brtknr | The host has 128GB of memory | 15:56 |
kata-irc-bot2 | <raravena80> not that I know of, probably not. They are even talking about deprecating the docker-shim. | 15:58 |
brtknr | stefanha: I've doubled it to 2048, same error | 15:59 |
brtknr | stefanha: I'm only trying to run the test-deploy-kata-nemu.yaml example | 15:59 |
stefanha | brtknr: Hm...I'm stumped. QEMU prints this error when it gets SIGBUS while accessing all guest RAM pages during startup. | 16:00 |
stefanha | brtknr: I'm not sure why it's getting SIGBUS if there are enough hugepages reserved. | 16:00 |
brtknr | I've tried running the kata-qemu example and it works fine | 16:00 |
brtknr | stefanha: so I tried running the containers directly via docker and it appears to work | 16:11 |
*** gwhaley has quit IRC | 16:13 | |
davidgiluk | brtknr: You could try checking /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages before starting it | 16:13 |
*** tmhoang has quit IRC | 16:21 | |
brtknr | davidgiluk: that gives me the same result as `sysctl vm.nr_hugepages` | 16:25 |
*** khyr0n has joined #kata-dev | 16:25 | |
brtknr | its invocation through k8s thats tanking | 16:25 |
davidgiluk | ok, so I guess that means it actually did allocate the number of pages you asked for - it doesn't always | 16:26 |
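The checks stefanha and davidgiluk describe can be combined into a quick host-side sketch (assumes 2 MB hugepages, the x86_64 default; the fallbacks to 0 are only so the snippet runs on hosts without hugepage support):

```shell
# Verify hugepages were actually reserved and are still free --
# an entry in /etc/sysctl.conf alone does nothing until it is applied.
requested=$(cat /proc/sys/vm/nr_hugepages 2>/dev/null || echo 0)
free=$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages 2>/dev/null || echo 0)
echo "reserved=$requested free=$free"
# A 2048M guest backed by 2 MB pages needs at least 1024 free pages at start.
```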
brtknr | davidgiluk: do I need a newer version of crio? | 16:26 |
brtknr | or containerd? | 16:26 |
davidgiluk | no idea | 16:26 |
*** gwhaley has joined #kata-dev | 16:28 | |
davidgiluk | brtknr: I wonder if k8s is giving the container its own /dev/hugepages mount or something similar | 16:28
brtknr | looks like @eric.ernst is the one who committed the kata-nemu example | 16:30
davidgiluk | brtknr: https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/ says 'Huge pages are isolated at a pod scope' - so I guess it's doing something, but my knowledge of k8s is >< | 16:30 |
kata-irc-bot2 | <eric.ernst> :wave: | 16:30 |
brtknr | @eric.ernst how did you get kata-nemu working on k8s 1.14? | 16:31 |
*** eernst has joined #kata-dev | 16:31 | |
kata-irc-bot2 | <eric.ernst> kubectl apply :slightly_smiling_face: | 16:31 |
brtknr | lol that easy? | 16:31 |
kata-irc-bot2 | <eric.ernst> w containerd | 16:31 |
brtknr | hmm | 16:31 |
brtknr | wonder if cri-o is the culprit | 16:32 |
kata-irc-bot2 | <eric.ernst> there may be an issue there. | 16:32 |
kata-irc-bot2 | <eric.ernst> @gmmaharaj is digging into it. | 16:32 |
kata-irc-bot2 | <gmmaharaj> brtknr: what is the cni you are using in k8s? | 16:34 |
brtknr | gmmaha: flannel | 16:34 |
kata-irc-bot2 | <gmmaharaj> thanks. | 16:34 |
brtknr | gmmaha: what should i use? | 16:35 |
kata-irc-bot2 | <gmmaharaj> i am using canal and i am hitting the same issue. was trying to see if it was cni specific. in my case, the pods crash right after kata is deployed and then never come back up | 16:35 |
brtknr | gmmaha: are you also using cri-o? | 16:37 |
gmmaha | brtknr: yeah, i am using crio | 16:37 |
brtknr | eric.ernst what cni are you using? | 16:37 |
brtknr | @eric.ernst what cni are you using? | 16:38 |
kata-irc-bot2 | <eric.ernst> canal | 16:38 |
brtknr | thanks | 16:39 |
gmmaha | brtknr: what version of crio are you running on this machine? | 16:40 |
brtknr | crio version 1.13.6-1.dev.rhaos4.1.gitee2e748.el7-dev and you? | 16:41 |
gmmaha | i have tried 1.13.1, 1.14.1 | 16:42 |
gmmaha | same outcome | 16:42 |
*** pcaruana has joined #kata-dev | 16:49 | |
brtknr | Im gonna attempt with containerd/flannel | 16:49 |
*** devimc has quit IRC | 17:23 | |
*** sameo has joined #kata-dev | 17:40 | |
*** devimc has joined #kata-dev | 17:43 | |
*** devimc has quit IRC | 17:52 | |
kata-irc-bot2 | <graham.whaley> hi @salvador.fuentes @gabriela.cervantes.te - I've been peeking at the k8s expose IP CI integration test, as it popped out in the time stamp logs when we were analysing where time went. First, I think we can improve it with a change of the image used. It seems node-hello has been deprecated, and if we replace it with hello-app we get a big saving in download size: | 17:53 |
kata-irc-bot2 | <graham.whaley> ```gcr.io/google-samples/hello-app 1.0 bc5c421ecd6c7 10.1MB gcr.io/google-samples/node-hello 1.0 4c7ea8709739b 665MB ``` | 17:53 |
kata-irc-bot2 | <graham.whaley> 2nd thing though, I'm not sure if the test is quite testing what we think? On these lines: https://github.com/kata-containers/tests/blob/master/integration/kubernetes/k8s-expose-ip.bats#L55-L59 - that looks to me like maybe it is doing a curl from the pod IP addresses, and not from the externally exposed IP address from the load balancer. wdyt @gabriela.cervantes.te and @archana.m.shinde? I'll see if I can play with that a bit | 17:54 |
kata-irc-bot2 | more to confirm | 17:54 |
kata-irc-bot2 | <salvador.fuentes> yeah, agree on changing the image :slightly_smiling_face:... on the second item I am not sure what is returned with that query... | 17:56 |
kata-irc-bot2 | <graham.whaley> yeah, I think I've only done the node expose IP thing once before, so I need to stare at that more. My local hand test didn't work first time for me :( | 17:57 |
kata-irc-bot2 | Action: graham.whaley very happy at trimming 650Mb off our 1.8Gb download though | 17:57 |
kata-irc-bot2 | <archana.m.shinde> @graham.whaley about your 2nd point, yeah looks like we are just curling against the pod-ip(actual endpoint) | 17:59 |
kata-irc-bot2 | <archana.m.shinde> we should probably do a ```kubectl get svc ``` to get the ClusterIP there and curl against that | 18:00
kata-irc-bot2 | <graham.whaley> @archana.m.shinde - right. that's what I thought. I'll see if I can make that work whilst fixing the image then ;) /me hates BATS for lack of debugability... | 18:00 |
kata-irc-bot2 | <archana.m.shinde> and since its a loadbalancer svc and not just a clusterip svc, it would make sense to add a step further to curl against the localhost IP itself, to check that it is exposed outside the cluster | 18:02
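The change being discussed can be sketched as a small helper (the service name in the usage line is illustrative, not taken from the actual test):

```shell
# Curl the Service's ClusterIP:port rather than the individual pod IPs,
# so the test exercises the exposed service path instead of the endpoints.
check_service() {
    svc="$1"
    ip=$(kubectl get svc "$svc" -o jsonpath='{.spec.clusterIP}')
    port=$(kubectl get svc "$svc" -o jsonpath='{.spec.ports[0].port}')
    curl -sf "http://${ip}:${port}" >/dev/null
}
# Usage (hypothetical service name): check_service hello-app
```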
kata-irc-bot2 | <graham.whaley> np. let me first open an Issue. guess who gets to be a reviewer when the PR comes ;) | 18:03 |
kata-irc-bot2 | <archana.m.shinde> haha :slightly_smiling_face: | 18:03 |
*** eernst has quit IRC | 18:07 | |
*** eernst has joined #kata-dev | 18:13 | |
*** eernst has quit IRC | 18:17 | |
*** eernst has joined #kata-dev | 18:17 | |
*** eernst has quit IRC | 18:18 | |
*** pcaruana has quit IRC | 18:39 | |
kata-irc-bot2 | <eric.ernst> @salvador.fuentes @gabriela.cervantes.te anything change for ubuntu-ci-18-04 for the packaging repo in the last day or so? | 18:39
kata-irc-bot2 | <eric.ernst> I see a few failures that I didn't expect: ``` ok 1 Replication controller 1..1 not ok 1 Verify nginx connectivity between pods # (in test file nginx.bats, line 29) # `kubectl wait --for=condition=Available deployment/${deployment}' failed # Image is up to date for docker.io/library/busybox@sha256:6dce4a9c1635c4c9b6a2b645e6613fa0238182fe13929808ee2258370d0f3497 # Image is up to date for | 18:40
kata-irc-bot2 | docker.io/library/nginx@sha256:706446e9c6667c0880d5da3f39c09a6c7d2114f5a5d6b74a2fafd24ae30d2078 # INFO: k8s configured to use runtimeclass # deployment.apps/nginx-deployment created # error: timed out waiting for the condition on deployments/nginx-deployment # deployment.extensions "nginx-deployment" deleted # Error from server (NotFound): services "nginx-deployment" not found # error: resource name may not be empty Makefile:107: recipe for | 18:40 |
kata-irc-bot2 | target 'kubernetes' failed make: *** [kubernetes] Error 1 Failed at 40: sudo -E PATH="$PATH" bash -c "make test" Build step 'Execute shell' marked build as failure Performing Post build task... Match found for :.* : True Logical operation result is TRUE Running script : #!/bin/bash ``` | 18:40 |
kata-irc-bot2 | <graham.whaley> hmm, I've been running that test locally today and it was fine. nothing I know of... still, my components will not have been up to date | 18:41
kata-irc-bot2 | <eric.ernst> I see several PRs; same for ubuntu-ci-18-04 in proxy repo | 18:42 |
kata-irc-bot2 | <eric.ernst> i see volume test isn't using nginx now... not sure if this could be related. | 18:45 |
kata-irc-bot2 | <eric.ernst> not seeing other changes in tests repo that would indicate a problem. | 18:45 |
kata-irc-bot2 | <gabriela.cervantes.te> I will try to reproduce the error | 18:46 |
kata-irc-bot2 | <graham.whaley> @eric.ernst yeah, (I asked that we) move the volume test off nginx, as the volume test itself has no need for nginx or an image of that size - part of trying to optimise and reduce the CI where we can (micro opts, whilst also looking at macro opts) | 18:55 |
kata-irc-bot2 | <eric.ernst> Yes, I'm just trying to understand why the network connectivity test is failing. | 18:56 |
kata-irc-bot2 | <eric.ernst> and that's the only even-semi-relevant change in the tests repo over last day or two. | 18:56 |
kata-irc-bot2 | <eric.ernst> if volume test ran before connectivity, maybe it is just a terrible amount of time needed to pull the image? idk... | 18:56
kata-irc-bot2 | <eric.ernst> I see that particular nginx test even last week too. | 19:01 |
kata-irc-bot2 | <eric.ernst> failed waiting for it to become ready. | 19:01 |
kata-irc-bot2 | <eric.ernst> it --> nginx deployment | 19:01 |
kata-irc-bot2 | <graham.whaley> @eric.ernst - if you needed to see what changed across all the major repos, I wrote a script for that: https://github.com/kata-containers/tests/tree/master/cmd/history coz, otherwise it can be hard (although, you probably can coax it out of github if you have the right foo magic on the date range stuff... ) | 19:01 |
*** davidgiluk has quit IRC | 19:16 | |
*** khyr0n has quit IRC | 19:17 | |
kata-irc-bot2 | <graham.whaley> hi @archana.m.shinde - so, I can curl off the service/loadbalancer port (8080), but not the service nodeport (port 31478 for instance) - might that be because that port is meant to be served up/out externally via a proxy (like an nginx etc.), which means maybe it is not ip table routed or whatever on the localhost? That feels like a lot of trouble for me to add to the test.... you OK with me moving the test from testing the pods | 19:23 |
kata-irc-bot2 | directly to testing the port on the svc IP ? :slightly_smiling_face: | 19:23 |
*** igordc has quit IRC | 19:26 | |
kata-irc-bot2 | <gabriela.cervantes.te> @eric.ernst, I can reproduce the issue locally `not ok 1 Verify nginx connectivity between pods` and I am only running that test | 19:28 |
kata-irc-bot2 | <eric.ernst> kubectl describe show anything helpful? | 19:28 |
kata-irc-bot2 | <archana.m.shinde> @graham.whaley yeah sounds good | 19:31 |
gwhaley | ok, fix that and PR in the morning. thx. cu | 19:32 |
*** gwhaley has quit IRC | 19:33 | |
kata-irc-bot2 | <gabriela.cervantes.te> @eric.ernst I will see that in a fresh VM because what is funny is that I am getting an error like `/usr/share/defaults/kata-containers/configuration.toml: file /usr/bin/qemu-system-x86_64 does not exist` | 19:34 |
kata-irc-bot2 | <eric.ernst> did our default toml change? | 19:35 |
kata-irc-bot2 | <eric.ernst> tbh, I don't test via packages often, so haven't seen this. | 19:35 |
kata-irc-bot2 | <gabriela.cervantes.te> ohh ok, I will install everything in a new VM | 19:36 |
*** eernst has joined #kata-dev | 19:48 | |
*** khyr0n has joined #kata-dev | 19:48 | |
*** eernst has quit IRC | 19:52 | |
*** eernst has joined #kata-dev | 19:54 | |
*** sameo has quit IRC | 19:58 | |
kata-irc-bot2 | <gabriela.cervantes.te> @eric.ernst on my ubuntu 18.04 it is working ok, I will change the test to use the nginx image instead of nginx:1.14 | 20:08
kata-irc-bot2 | <gmmaharaj> @eric.ernst we not moving to 5.x kernel anytime soon? | 20:51 |
kata-irc-bot2 | <eric.ernst> yerp | 20:51 |
kata-irc-bot2 | <gmmaharaj> @eric.ernst is on fire with reviews today. | 20:58 |
kata-irc-bot2 | <eric.ernst> bang bang! | 20:58 |
*** tmhoang has joined #kata-dev | 21:04 | |
*** tmhoang has quit IRC | 21:07 | |
*** eernst has quit IRC | 21:17 | |
*** eernst has joined #kata-dev | 21:18 | |
*** eernst has quit IRC | 21:23 | |
*** igordc has joined #kata-dev | 21:31 | |
*** tmhoang has joined #kata-dev | 21:35 | |
*** fuentess has quit IRC | 22:11 | |
*** jugs has quit IRC | 22:24 | |
*** jugs has joined #kata-dev | 22:24 | |
brtknr | gmmaha: I have it working with containerd and flannel | 22:28 |
gmmaha | brtknr: aah nice.. we have been debugging this as well. if crio is restarted once k8s static pods + cni are up, things go wrong and the cni pod is never able to come up and crio keeps crashing. | 22:29 |
gmmaha | it is safer to say that when you are using crio, make sure that before you start k8s, you set the config options, mainly `manage_network_ns_lifecycle` to true before starting anything | 22:30 |
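For reference, that option is set in CRI-O's configuration file (typically /etc/crio/crio.conf); the exact section and option name may vary by CRI-O version — later releases renamed it `manage_ns_lifecycle`:

```toml
[crio.runtime]
manage_network_ns_lifecycle = true
```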
brtknr | hmm interesting, any idea why crio is not as well documented as containerd? | 22:31 |
gmmaha | ohh our documents do need some love. :) i will make a change to fix at least this piece up today. | 22:32
brtknr | gmmaha: also how do I verify that kata-nemu is using virtiofs? | 22:33 |
brtknr | gmmaha: i definitely cant find any mention of 9p under `ps aux | grep nemu | grep 9p` | 22:34 |
gmmaha | on the host if you do `ps -ef | grep virt` you should see 2x the no. of processes as that of kata containers. each pod wil have a qemu process and a virtiofsd_x86_64 process. | 22:34 |
brtknr | gmmaha: thats just what i needed! thanks :) | 22:35
gmmaha | cool.. glad i could help | 22:36 |
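gmmaha's process check can be sketched as follows (the process-name patterns assume the nemu/qemu VMMs discussed above):

```shell
# Count VMM and virtiofsd processes on the host; with virtio-fs enabled,
# each Kata pod should contribute one VMM process and one virtiofsd process.
vmms=$(pgrep -c -f 'nemu-system|qemu-system' || true)
daemons=$(pgrep -c -f 'virtiofsd' || true)
echo "VMM processes: $vmms, virtiofsd processes: $daemons"
```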
brtknr | this is very exciting! i can go to sleep for now | 22:36 |
gmmaha | :D | 22:37 |
brtknr | is there somewhere i can read up on how virtiofs differs from 9p and its upsides/downsides? | 22:37 |
gmmaha | brtknr: https://virtio-fs.gitlab.io/ that is their landing page. | 22:37 |
brtknr | gmmaha: they are advertising kata 1.7 there :D | 22:38 |
gmmaha | i believe an arch document for virtio-fs is there or is being worked on. I will try and fill you in on that as soon as i hear more. | 22:38 |
brtknr | just found this document by stefanha https://vmsplice.net/~stefan/stefanha-virtio-fs-kata.pdf | 22:39 |
gmmaha | coool.. | 22:40 |
brtknr | gmmaha: hmm i am running `dd if=/dev/zero of=block bs=512M count=2` and i am surprised by the numbers | 22:49 |
gmmaha | brtknr: good surprised or bad surprised? ;) | 22:50 |
brtknr | 1073741824 bytes (1.1 GB) copied, 8.85543 s, 121 MB/s for kata-qemu | 22:50 |
brtknr | 1048576000 bytes (1.0 GB) copied, 59.3652 s, 17.7 MB/s for kata-nemu | 22:50 |
brtknr | I am guessing I have configured something incorrectly somewhere | 22:50 |
gmmaha | yeah those are odd. i am also guessing kata-qemu is using 9p and not virtio-fs? | 22:52 |
brtknr | gmmaha: yeah thats right | 22:52 |
gmmaha | brtknr: yeah, not quite the numbers i was looking for. | 22:53
gmmaha | or what i got when i tested it. | 22:53 |
gmmaha | i was easily seeing twice the throughput | 22:53 |
brtknr | really? | 22:53 |
gmmaha | the last i tested yes.. | 22:54 |
brtknr | gmmaha: this is baremetal performance for reference: 1073741824 bytes (1.1 GB) copied, 0.606793 s, 1.8 GB/s | 22:54
gmmaha | it has been a little while since i did (2 weeks ish) | 22:54 |
gmmaha | i was comparing just 9p vs virtio-fs and 9p took twice the amount of time compared to virtio-fs to write a 1G file. | 22:59 |
*** eernst has joined #kata-dev | 22:59 | |
*** eernst has quit IRC | 23:00 | |
*** eernst has joined #kata-dev | 23:00 | |
brtknr | what was the command you ran? | 23:01 |
gmmaha | `dd if=/dev/zero of=test.file bs=1M count=1024` | 23:01 |
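One caveat with benchmarks of this shape (an editorial observation, not something raised in the discussion): without a sync, dd largely measures guest page-cache writes, which may explain some of the swings above. A sketch of a variant that forces the data through the storage path:

```shell
# conv=fsync makes dd fsync the output file before reporting throughput,
# so the number reflects the virtio-fs/9p/backing-filesystem path rather
# than the guest page cache.
dd if=/dev/zero of=test.file bs=1M count=64 conv=fsync
rm -f test.file
```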
brtknr | gmmaha: were you doing this inside kata containers or outside? | 23:02 |
gmmaha | inside the kata container. onto the rootfs | 23:02 |
*** eernst has quit IRC | 23:05 | |
brtknr | kata-nemu: 268435456 bytes (268 MB) copied, 2.99375 s, 89.7 MB/s | 23:06 |
brtknr | kata-qemu: 268435456 bytes (268 MB) copied, 1.89188 s, 142 MB/s | 23:06 |
brtknr | its looking 2x slower | 23:07 |
brtknr | I must have something badly configured! | 23:07 |
gmmaha | brtknr: and you are using 1.7.1 right? | 23:07 |
gmmaha | installed that using kata-deploy? | 23:07 |
gmmaha | i will try and bring up my system (virtual unfortunately) and test this out with just docker. | 23:08 |
gmmaha | ls | 23:09 |
brtknr | I'm using packaging master branch | 23:09 |
brtknr | gmmaha: something strange just happened | 23:22 |
brtknr | 536870912 bytes (537 MB) copied, 0.979417 s, 548 MB/s | 23:22 |
brtknr | this is in kata-nemu pod | 23:22 |
brtknr | 536870912 bytes (537 MB) copied, 0.736673 s, 729 MB/s | 23:23 |
brtknr | i deleted kata-deploy and redeployed it | 23:23 |
brtknr | gmmaha: the network file system I am using is BeeGFS... when writing to this filesystem directly on the hypervisor, I get the good performance | 23:31 |
brtknr | OTOH, when I use Ceph as the storage backend, I get a much better performance for kata-nemu | 23:31 |
gmmaha | brtknr: aaah.. | 23:34 |
gmmaha | i am a bit familiar with ceph. never had the chance to work with beeGFS | 23:34 |
gmmaha | so if i understand it right, it works fine if your rootfs is on the host itself. | 23:35
gmmaha | where when backed by BeeGFS, you see a big performance hit. is that right? | 23:35 |
brtknr | Well, in both cases I mounted network disks (Ceph and BeeGFS) from the hypervisor into the kata container | 23:37
brtknr | Ceph worked well, BeeGFS didnt | 23:38 |
gmmaha | aah.. | 23:39 |
gmmaha | brtknr: i need to head out now, but let me see if i can look into what BeeGFS has.. but this will all boil down to how you have set up ceph/beegfs. rbd, cephfs equivalent etc. | 23:39
brtknr | gmmaha: thanks for the nice discussion, speak soon :) | 23:42 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!