15:10:50 #startmeeting libvirt
15:10:51 Meeting started Tue Jul 22 15:10:50 2014 UTC and is due to finish in 60 minutes. The chair is danpb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:10:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:10:54 The meeting name has been set to 'libvirt'
15:11:01 o/
15:11:04 o/
15:11:04 o/
15:11:07 o/
15:11:15 o/
15:11:38 sorry, I got sidetracked & forgot ...
15:12:29 No worries, it happens. We just had a topic from our team that we'd really value your input on.
15:12:36 ok, go for it
15:13:21 So, we're exploring CPU tuning for containers, but, as I'm sure you've seen, /proc/cpuinfo still shows the host's processor info instead of something that better reflects the tuning for the guest, the way /proc/meminfo does.
15:13:54 ah, so this is a real can of worms you'll wish you'd not raised :-)
15:14:00 haha
15:14:05 Lol, oh my favorite!
15:14:32 so for a start, containers don't really have any concept of virtualized CPUs
15:14:47 so with cpu shares/quota you still technically have every CPU, but if you've locked down the CPUs with cpusets, I believe you would only have the CPUs you've been allocated
15:14:53 so you can simulate vcpus with cpusets
15:15:01 eg if you tell libvirt 3 or 8 vcpus or whatever, it is meaningless
15:15:10 mhmm
15:15:19 what containers do give you is the ability to set the affinity of the container to the host
15:15:35 so you can say only run this container on host CPUs n->m
15:15:43 which is done with the cgroups cpuset controller
15:16:01 the /proc/cpuinfo file, though, is really unrelated to CPU affinity masks
15:16:09 eg, consider if you ignore containers for a minute
15:16:24 and just have a host OS and put apache inside a cpuset cgroup
15:16:35 you then have the exact same scenario wrt /proc/cpuinfo
15:17:13 what this all says to me is that applications should basically ignore /proc/cpuinfo as a way to determine how many CPUs they have available
15:17:34 they need to look at what they are bound to
15:18:08 How would we inform applications of that? Is it common for applications to inspect /proc/cpuinfo to tune themselves?
15:18:45 i don't know to be honest - I've been told some (to remain unnamed) large enterprise software parses stuff in /proc/cpuinfo
15:19:00 heh
15:19:10 hmmm
15:19:19 i kind of see this as somewhat of a gap in the Linux ecosystem API
15:19:46 nothing really provides apps a good library API to determine CPU / RAM availability
15:20:30 I see where you're coming from
15:20:53 i kind of feel the same way about /proc/meminfo - what we hacked up in libvirt is really not container specific - it's the same issue for any app that wants to see "available 'host' memory" that is confined by the cgroups memory controller
15:21:15 so wrt vcpu and flavors for containers, we're considering just setting vcpu to zero for our lxc flavors - does that sound reasonable?
15:21:20 so in that sense I (at least partially) regret that we added overriding of /proc/meminfo to libvirt
15:22:05 sew: not sure about that actually
15:22:23 sew: it depends how things interact with the NUMA/CPU pinning stuff I'm working on
15:22:37 What would you have in place of the /proc/meminfo solution in libvirt to give guests a normal way of understanding their capabilities?
15:22:45 sew: we're aiming to avoid directly exposing the idea of setting a CPU affinity mask to the user/admin
15:22:53 danpb: working on that in libvirt or nova's libvirt driver?
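A minimal sketch of the advice at 15:17:13-15:17:34 above (look at what the process is bound to, not at /proc/cpuinfo), assuming Linux and Python 3; the /proc/cpuinfo parsing shown is x86-style and illustrative only:

    import os

    def host_cpu_count():
        # Host view: count "processor" entries in /proc/cpuinfo.
        # Note this parsing is x86-centric; the file's layout differs per architecture.
        with open("/proc/cpuinfo") as f:
            return sum(1 for line in f if line.startswith("processor"))

    def usable_cpu_count():
        # Scheduler view: CPUs this process may actually run on, which honours
        # cpuset cgroups and explicit affinity masks (Linux-only API).
        return len(os.sched_getaffinity(0))

    if __name__ == "__main__":
        print("CPUs listed in /proc/cpuinfo :", host_cpu_count())
        print("CPUs this process can use   :", usable_cpu_count())

Run inside a container (or any cpuset-confined cgroup), the second number reflects the confinement while the first keeps reporting the whole host.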
15:23:00 sew: so the flavour would just say "want exclusive CPU pinning"
15:23:18 and then the libvirt nova driver would figure out which host CPUs to pin the guest to
15:23:26 interesting concept danpb
15:23:31 so to be able to do that, we need the vcpus number set to a sensible value
15:23:41 ah, danpb, so we could mimic vcpus for containers with that?
15:23:42 just so that we can figure out how many host CPUs to pin the guest to
15:24:10 even though when we pass this vcpu value on to libvirt it will be ignored
15:24:30 IOW at the Nova flavour level, vcpus is still useful even though it isn't useful at the libvirt level
15:24:47 apmelton: it is a Juno feature for the Nova libvirt driver
15:25:20 ah really, I wasn't aware of that, how do you use it?
15:25:23 thomasem: ultimately i think there needs to be some kind of API to more easily query cgroup confinement / resource availability
15:25:51 apmelton: the big picture is outlined here https://wiki.openstack.org/wiki/VirtDriverGuestCPUMemoryPlacement
15:25:56 Ah, so a process, whether in a full container guest or simply under a single cgroup limitation, can find its boundaries?
15:26:15 thomasem: yeah, pretty much
15:26:26 Hmmm, I wonder how we could pursue that, tbh.
15:26:37 Start a chat on ze mailing list for LXC?
15:26:48 Or perhaps work like that is already underway?
15:26:49 with the way systemd is becoming a standard in Linux, and the owner of cgroups, it is possible that systemd's DBus APIs might be the way forward
15:27:00 oh okay
15:27:25 interesting
15:27:29 but i'm fairly sure there's more that systemd would need to expose in this respect still
15:28:06 overall though the current view is that systemd will be the exclusive owner of all things cgroup related - libvirt and other apps need to talk to systemd to make changes to cgroups config
15:28:15 systemd does seem like the logical place for all that to happen
15:28:21 gotcha
15:28:36 Something to research and pursue, then.
15:31:05 that all said, if there's a compelling reason for libvirt to fake /proc/cpuinfo for the sake of compatibility, we might be able to explore that upstream
15:31:34 just that it would really be a work of pure fiction based solely on the value from the XML, doing nothing from a functional POV :-)
15:31:47 Yeah, we'd be lying.
15:31:48 lol
15:32:00 for added fun, /proc/cpuinfo is utterly different for each CPU architecture - thanks linux :-(
15:32:02 danpb: wouldn't it be better to base it off CPU pinning?
15:32:05 It's just the question of whether it's better to lie closer to the truth :P
15:32:16 or is that not supported with libvirt-lxc?
15:32:30 you can do guest pinning with libvirt lxc
15:33:35 for instance, instead of ignoring the vcpu value in lxc, could libvirt translate that into CPU pins?
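A hedged sketch of the "guest pinning with libvirt lxc" point at 15:32:30, not the Nova driver's actual code: the cpuset attribute of the domain XML's vcpu element constrains the container's host CPU affinity via the cpuset cgroup, even though the vcpu count itself is largely meaningless for LXC. The container name, CPU range and init binary are illustrative only, and the script assumes the libvirt-python bindings and a working LXC driver.

    import libvirt  # libvirt-python bindings

    # Minimal LXC domain definition; cpuset='2-3' restricts the container
    # to host CPUs 2 and 3, while the vcpu count of 2 is effectively ignored.
    DOMAIN_XML = """
    <domain type='lxc'>
      <name>pinned-container</name>
      <memory unit='KiB'>524288</memory>
      <vcpu placement='static' cpuset='2-3'>2</vcpu>
      <os>
        <type>exe</type>
        <init>/bin/sh</init>
      </os>
      <devices>
        <console type='pty'/>
      </devices>
    </domain>
    """

    if __name__ == "__main__":
        conn = libvirt.open("lxc:///")    # connect to the LXC driver
        dom = conn.defineXML(DOMAIN_XML)  # persist the definition
        dom.create()                      # start the container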
15:33:56 if the kernel ever introduced a cgroup tunable like "max N processes concurrently in the running state for the scheduler", that would conceptually work for a vcpu value, but that's probably not likely to happen
15:34:13 heh
15:34:17 apmelton: that would mean the guests were always pinned even when pinning is not requested
15:34:38 which is something i'd prefer to avoid, since while it works ok as you start up a sequence of guests
15:34:48 once you shut down a few & start some new ones, you end up with very unbalanced placement
15:34:59 yea, that gets complex fast
15:35:15 you'd have to have libvirt constantly re-pinning containers to balance things out again
15:36:41 ok, that makes sense
15:39:27 regarding faking /proc/cpuinfo for compatibility, I am not immediately aware of an application use-case that would look for that. Can anyone think of an issue with the guest being able to see the host processor info in general (in a multi-tenant env)?
15:40:00 i don't think there are any real information leakage problems
15:41:41 Okay
15:41:44 maybe if you knew that the data you were after was on a particular type of node, you could use /proc/cpuinfo to navigate the cloud
15:42:10 just an idea
15:43:41 dgenin: the /proc/cpuinfo file is pretty low entropy as far as identifying information is concerned
15:44:05 particularly as clouds will involve large pools of identical hardware
15:44:13 true, there are bound to be many nodes with the same cpuinfo
15:44:31 there are many other, easier ways to identify hosts
15:44:52 what do you have in mind?
15:46:10 sysfs exposes host UUIDs :-)
15:47:08 yeah, but the attacker is not likely to know something so precise
15:47:20 another possibility is that cpuinfo is not sufficient alone, but it could be combined with other identifying information to pigeonhole the node
15:47:47 any other agenda items to discuss, or shall we call it a wrap?
15:48:28 Not from me. I have stuff to think about now. :)
15:48:45 Not like I didn't before, but more now. hehe.
15:49:04 thx for the background on cpuinfo danpb
15:49:26 ok, till next week....
15:49:30 #endmeeting
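A small follow-up sketch on the 15:46:10 remark that sysfs exposes host UUIDs, assuming an x86 host with DMI and a container that has not had sysfs hidden away: /sys/class/dmi/id/product_uuid is a far more direct host identifier than anything in /proc/cpuinfo (it is typically readable only by root), while the CPU model string shown for contrast is the low-entropy data point discussed above.

    from pathlib import Path

    def read_or_note(path):
        # Return file contents, or a note if the file is absent or unreadable.
        try:
            return Path(path).read_text().strip()
        except OSError:
            return "<not readable>"

    if __name__ == "__main__":
        # High-entropy, per-host identifier (usually requires root to read).
        print("host UUID :", read_or_note("/sys/class/dmi/id/product_uuid"))
        # Low-entropy by comparison: many cloud nodes share the same model string.
        for line in read_or_note("/proc/cpuinfo").splitlines():
            if line.startswith("model name"):
                print("cpu model :", line.split(":", 1)[1].strip())
                break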