15:02:01 <danpb> #startmeeting libvirt
15:02:02 <openstack> Meeting started Tue Jul 15 15:02:01 2014 UTC and is due to finish in 60 minutes. The chair is danpb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:05 <openstack> The meeting name has been set to 'libvirt'
15:02:08 <ndipanov> \o
15:02:10 <apmelton> o/
15:03:01 <sew> o/
15:03:25 <nelsnelson> o/
15:03:49 <danpb> ....no agenda so...
15:03:55 <danpb> #topic Open Discussion
15:03:57 <thomasem> o/
15:04:44 <thomasem> Who here is going to the minisummit?
15:05:15 <apmelton> I'll be there
15:05:16 <danpb> you mean the Nova mid-cycle meetup ?
15:05:20 <thomasem> yeah
15:05:24 <s1rp_> ill be there
15:05:29 <danpb> afraid I won't be there - clashes with my holiday
15:05:33 <ndipanov> I am sadly giving it a miss this time
15:05:37 <thomasem> Ahhhh gotcha
15:05:39 <ndipanov> another conf just before that
15:05:58 <thomasem> Okey dokey
15:06:04 <thomasem> I'm going to be there too
15:06:39 <thomasem> Want to discuss that review, s1rp_?
15:07:14 <s1rp_> sure, so the idea is that we want to add config-drive support to lxc guests
15:07:27 <s1rp_> but config-drive as it is in trunk only exposes block-devices
15:07:58 <s1rp_> so this patch adds a new backend type called `fs` which just drops the configdrive info into a directory on the host
15:08:03 <danpb> right yes, i think we discussed in the past doing a bind mount of a filesystem
15:08:06 <danpb> yes
15:08:09 <s1rp_> from there, we copy it into the rootfs to avoid a bind mount
15:08:28 <s1rp_> we avoid a bindmount because there were recent security issues with it
15:08:47 <danpb> well when i say bind mount, i don't mean nova doing a bind mount
15:08:57 <danpb> i mean just add a <filesystem> element to libvirt XML for it
15:09:14 <danpb> and let libvirt figure out the bind mount setup it wants to do
15:09:38 <danpb> in fact you would not actually need to necessarily change nova config drive support
15:09:44 <s1rp_> right, but bind-mounts were recently shown to be hackable, the kernel datastructure allowed you to traverse to the root of the bind-mounted filesystem
15:09:48 <danpb> you could let nova continue to create a FAT or ISO image
15:09:53 <s1rp_> so we wanted to avoid that entirely
15:10:00 <danpb> and then libvirt would just mount the image in the container
15:10:35 <danpb> s1rp_: do you have a link for that problem
15:10:46 <danpb> s1rp_: basically everything about the way LXC is setup involves bind mounts
15:10:59 <danpb> so config drive use of bind mounts is the least of your worries if that's got a flaw
15:11:05 <apmelton> danpb: it's based on this: http://seclists.org/oss-sec/2014/q2/565
15:11:10 <s1rp_> danpb: yeah ill try and dig that up... was on HN a few weeks ago... apmelton thomasem happen to remember where we saw that
15:11:27 <danpb> oh that one
15:11:35 <s1rp_> danpb: userns insulates you from this problem, but we wanted to be extra careful
15:11:42 <danpb> that's a docker flaw due to lack of userns
15:11:56 <danpb> if you don't use userns then you must *not* give out capabilities to the container if you want to be secure
15:12:05 <danpb> that's not a bind mount flaw
15:12:21 <danpb> so i don't think that's any reason to avoid use of bind mounts for config drive
15:12:49 <apmelton> using bind mount can also expose details about the host
15:12:49 <sew> so in general, we'd like to shrink the attack surface of lxc containers by not exposing underlying host resources
15:12:50 <danpb> we already document that LXC in Nova is considered insecure (until we have the userns feature done)
15:13:21 <apmelton> for instance if we're bind mounting from the host's root filesystem, details like capacity are exposed to the container
15:13:37 <s1rp_> apmelton: yeah good point, that was another issue with it
15:13:42 <danpb> ok, so that's a more reasonable argument
15:14:06 <danpb> so if that's a concern then instead of introducing a new configdrive type = fs, just use the existing type=iso or type=fat
15:14:15 <danpb> and we can just loopback mount the config drive image in the container
15:14:25 <danpb> at a well known location of /config or some such
15:14:49 <s1rp_> danpb: isn't that more moving parts?
15:15:20 <danpb> that'd have the advantage that we'd not have so much difference between lxc & non-lxc setup from nova's pov
15:15:44 <apmelton> danpb: does that expose the iso as a device, or as a filesystem?
15:15:53 <s1rp_> configdrive already creates a temp-directory and then moves that into the blockdev... the proposed lxc approach just 'blesses' that temp directory into a real directory on the host
15:15:59 <s1rp_> then we move that into rootfs
15:16:09 <danpb> apmelton: if you use <filesystem type=file> then you provide a file containing a filesystem which gets mounted at the desired place
15:16:38 <danpb> apmelton: so the container would see the filesystem, just as it would if you'd done a bind mount but without exposing host FS
15:17:09 <apmelton> ah so we'd do <filesystem type=iso>?
15:17:11 <danpb> s1rp_: you'd have to keep that temp directory around forever though now
15:17:27 <danpb> s1rp_: so it introduces extra cleanup that has to happen for LXC
15:17:40 <s1rp_> danpb: how come, it just gets moved into rootfs and we can forget about it
15:17:40 <danpb> apmelton: no just type=file, libvirt should auto-detect the format and mount it
15:17:57 <apmelton> ah cool
15:18:02 <thomasem> did not know that
15:18:08 <danpb> s1rp_: oh hmm, you're actually moving the directory
15:18:16 <s1rp_> danpb: yep
15:18:28 <s1rp_> danpb: code happily will skip the rm if it's not there..
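For reference, the <filesystem type=file> idea danpb describes above could be sketched roughly as follows. This is only an illustration, not code from the review under discussion: the helper name config_drive_filesystem_xml and the image path are hypothetical, the /config target is the "well known location" floated in the chat, and any <driver> sub-element a real deployment might want is left out.

```python
import xml.etree.ElementTree as ET


def config_drive_filesystem_xml(image_path, target_dir="/config"):
    """Build a libvirt <filesystem type='file'> element for an LXC guest.

    image_path is the existing FAT or ISO config drive image on the host;
    libvirt is expected to detect the format and mount it at target_dir.
    """
    fs = ET.Element("filesystem", {"type": "file"})
    ET.SubElement(fs, "source", {"file": image_path})
    ET.SubElement(fs, "target", {"dir": target_dir})
    return ET.tostring(fs, encoding="unicode")


# Hypothetical path, used purely for illustration.
print(config_drive_filesystem_xml("/var/lib/nova/instances/demo/disk.config"))
```

The appeal of this shape is the one raised in the discussion: Nova keeps producing the same image it already builds for non-LXC guests, and only the guest XML differs.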
15:18:55 <s1rp_> danpb: also need to add `mv` to compute.filters to allow the copy into the rootfs
15:19:07 <s1rp_> tried to lock that down with a regex though
15:19:25 <danpb> s1rp_: i think from the cloud admin POV it'd still be compelling if we could do this without introducing the need to use CONF.config_drive_format == 'fs'
15:19:37 <danpb> s1rp_: because that would be one less magic config setting they need to remember to change for LXC only
15:19:50 <s1rp_> danpb: yeah i definitely ++ that point
15:20:27 <danpb> so could you just try out the <filesystem type=file> approach to see if it will "just work" with the config_drive_format = fat or iso9660 or both
15:20:44 <danpb> if we hit problems with that idea then we could re-visit it
15:21:01 <s1rp_> sounds good
15:21:16 <danpb> i'll comment on the review for sake of history
15:21:22 <s1rp_> coolness
15:21:50 <apmelton> danpb: do you know if filesystem supports blkiotune?
15:23:00 <apmelton> like, can we set limits for the underlying block device supporting the filesystem?
15:24:26 <danpb> apmelton: there's two block I/O tuning options in libvirt - there's global to the VM settings and there's per-disk settings
15:24:47 <danpb> the former uses cgroups and should work with LXC IIRC, and the latter uses built-in QEMU rate limiting of which there's no equivalent with LXC
15:25:26 <danpb> if the LXC <filesystems> are each backed by unique loopback or nbd devices though
15:25:49 <danpb> the top level global I/O tuning with cgroups is probably sufficient, since each loop/nbd dev would have a distinct major/minor number
15:26:09 <apmelton> so there's that, but also tunes on the config drive
15:26:34 <apmelton> we wouldn't want a container to be able to peg the drive hosting nova's data dir
15:26:50 <apmelton> so we'd wanna set limits on that loopback device pretty low
15:27:18 <danpb> yep, this is probably something that could use some enhancement
15:27:47 <apmelton> I was doing some digging a little while ago, and I don't think the filesystem tag worked with blkiotune
15:28:27 <apmelton> I didn't see any cgroups getting set at least
15:28:38 <danpb> it currently doesn't - you'd have to figure out which loop device was used and then set the top level policy
15:30:06 <apmelton> I suppose that could be tricky if you're setting two different tunings on filesystems hosted on the same block device
15:30:59 <danpb> yeah, that's why QEMU added support for per-device limits itself
15:31:10 <danpb> because cgroups can't distinguish that scenario
15:31:20 <apmelton> gotcha
15:33:46 <thomasem> Well this has certainly been a productive chat.
15:35:07 <danpb> cool, so any other topics....
15:37:21 <danpb> ok, lets call this a wrap
15:37:24 <danpb> #endmeeting
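
The manual step danpb mentions (work out which loop device backs the filesystem, then set the top level policy) depends on the device's major/minor number. A minimal sketch of that lookup, assuming the cgroup v1 blkio controller, whose throttle files such as blkio.throttle.read_bps_device take lines of the form "major:minor bytes_per_second". The function names here are hypothetical, and which cgroup directory a given container uses is not covered.

```python
import os


def blkio_throttle_entry(st_rdev, bytes_per_sec):
    """Format a 'major:minor limit' line for a cgroup v1 blkio throttle file."""
    return "%d:%d %d" % (os.major(st_rdev), os.minor(st_rdev), bytes_per_sec)


def throttle_entry_for_device(dev_path, bytes_per_sec):
    """Look up the device number of a block device node, e.g. /dev/loop0."""
    # st_rdev carries the device number of a block or character special file.
    return blkio_throttle_entry(os.stat(dev_path).st_rdev, bytes_per_sec)


# Loop devices use major number 7 on Linux; cap this one at 1 MiB/s.
print(blkio_throttle_entry(os.makedev(7, 0), 1024 * 1024))  # prints "7:0 1048576"
```

Because the entry is keyed on the device number alone, two filesystems backed by the same block device cannot get different limits this way, which matches the limitation raised at the end of the discussion.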