15:02:01 <danpb> #startmeeting libvirt
15:02:02 <openstack> Meeting started Tue Jul 15 15:02:01 2014 UTC and is due to finish in 60 minutes.  The chair is danpb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:05 <openstack> The meeting name has been set to 'libvirt'
15:02:08 <ndipanov> \o
15:02:10 <apmelton> o/
15:03:01 <sew> o/
15:03:25 <nelsnelson> o/
15:03:49 <danpb> ....no agenda so...
15:03:55 <danpb> #topic Open Discussion
15:03:57 <thomasem> o/
15:04:44 <thomasem> Who here is going to the minisummit?
15:05:15 <apmelton> I'll be there
15:05:16 <danpb> you mean the Nova mid-cycle meetup ?
15:05:20 <thomasem> yeah
15:05:24 <s1rp_> ill be there
15:05:29 <danpb> afraid I won't be there - clashes with my holiday
15:05:33 <ndipanov> I am sadly giving it a miss this time
15:05:37 <thomasem> Ahhhh gotcha
15:05:39 <ndipanov> another conf just before that
15:05:58 <thomasem> Okey dokey
15:06:04 <thomasem> I'm going to be there too
15:06:39 <thomasem> Want to discuss that review, s1rp_?
15:07:14 <s1rp_> sure, so the idea is that we want to add config-drive support to lxc guests
15:07:27 <s1rp_> but config-drive as it is in trunk only exposes block-devices
15:07:58 <s1rp_> so this patch adds a new backend type called `fs` which just drops the configdrive info into a directory on the host
15:08:03 <danpb> right yes, i think we discussed in the past doing a bind mount of a filesystem
15:08:06 <danpb> yes
15:08:09 <s1rp_> from there, we copy it into the rootfs to avoid a bind mount
15:08:28 <s1rp_> we avoid a bind mount because there were recent security issues with it
15:08:47 <danpb> well when i say bind mount, i don't mean nova doing a bind mount
15:08:57 <danpb> i mean just add a  <filesystem> element to libvirt XML for it
15:09:14 <danpb> and let libvirt figure out the bind mount setup it wants to do
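For reference, here is a minimal sketch of the kind of `<filesystem>` element danpb describes, built with Python's standard `xml.etree.ElementTree`. It assumes the passthrough `type='mount'` form from the libvirt domain XML format. The host and guest paths are hypothetical placeholders, not what Nova actually uses.

```python
import xml.etree.ElementTree as ET


def filesystem_mount_xml(host_dir: str, guest_dir: str) -> str:
    """Build a libvirt <filesystem type='mount'> element.

    For LXC guests libvirt sets up the bind mount of host_dir at guest_dir
    itself; the caller only describes the mapping.
    """
    fs = ET.Element("filesystem", type="mount", accessmode="passthrough")
    ET.SubElement(fs, "source", dir=host_dir)   # directory on the host
    ET.SubElement(fs, "target", dir=guest_dir)  # mount point inside the container
    return ET.tostring(fs, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(filesystem_mount_xml("/var/lib/nova/instances/demo/config", "/config"))
```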
15:09:38 <danpb> in fact you would not actually need to necessarily change nova config drive support
15:09:44 <s1rp_> right, but bind-mounts were recently shown to be hackable, the kernel data structure allowed you to traverse to the root of the bind-mounted filesystem
15:09:48 <danpb> you could let nova continue to create a FAT or ISO image
15:09:53 <s1rp_> so we wanted to avoid that entirely
15:10:00 <danpb> and then libvirt would just mount the image in the container
15:10:35 <danpb> s1rp_: do you have a link for that problem?
15:10:46 <danpb> s1rp_: basically everything about the way LXC is setup involves bind mounts
15:10:59 <danpb> so config drive use of bind mounts is the least of your worries if that's got a flaw
15:11:05 <apmelton> danpb: it's based on this: http://seclists.org/oss-sec/2014/q2/565
15:11:10 <s1rp_> danpb: yeah i'll try and dig that up... was on HN a few weeks ago... apmelton, thomasem: happen to remember where we saw that?
15:11:27 <danpb> oh that one
15:11:35 <s1rp_> danpb: userns insulate you from this problem, but we wanted to be extra careful
15:11:42 <danpb> that's a docker flaw due to lack of userns
15:11:56 <danpb> if you don't use userns then you must *not* give out capabilities to the container if you want to be secure
15:12:05 <danpb> that's not a bind mount flaw
15:12:21 <danpb> so i don't think that's any reason to avoid use of bind mounts for config drive
15:12:49 <apmelton> using bind mount can also expose details about the host
15:12:49 <sew> so in general, we'd like to shrink the attack surface of lxc containers by not exposing underlying host resources
15:12:50 <danpb> we already document that LXC in Nova is considered insecure (until we have the userns feature done)
15:13:21 <apmelton> for instance if we're bind mounting from those hosts root filesystem, details like capacity are exposed to the container
15:13:37 <s1rp_> apmelton: yeah good point, that was another issue with it
15:13:42 <danpb> ok, so that's a more reasonable argument
15:14:06 <danpb> so if that's a concern then instead of introducing a new configdrive  type = fs, just use the existing type=iso or type=fat
15:14:15 <danpb> and we can just loopback mount the config drive image in the container
15:14:25 <danpb> at a well known location of /config or some such
15:14:49 <s1rp_> danpb: isn't that more moving parts?
15:15:20 <danpb> that'd have the advantage that we'd not have so much difference between lxc & non-lxc setup from nova's pov
15:15:44 <apmelton> danpb: does that expose the iso as a device, or as a filesystem?
15:15:53 <s1rp_> configdrive already creates a temp directory and then moves that into the blockdev... the proposed lxc approach just 'blesses' that temp directory into a real directory on the host
15:15:59 <s1rp_> then we move that into rootfs
15:16:09 <danpb> apmelton: if you use <filesystem type=file> then you provide a file containing a filesystem which gets mounted at the desired place
15:16:38 <danpb> apmelton: so the container would see the filesystem,  just as it would if you'd done a bind mount but without exposing host FS
15:17:09 <apmelton> ah so we'd do <filesystem type=iso>?
15:17:11 <danpb> s1rp_: you'd have to keep that temp directory around forever though now
15:17:27 <danpb> s1rp_: so it introduces extra cleanup that has to happen for LXC
15:17:40 <s1rp_> danpb: how come, it just gets moved into rootfs and we can forget about it
15:17:40 <danpb> apmelton: no just type=file, libvirt should auto-detect the format and mount it
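Here is a sketch of the `type='file'` variant, again using `xml.etree.ElementTree`. The image path is a hypothetical stand-in for wherever Nova writes the FAT or ISO 9660 config drive. The `<driver type='loop' format='raw'/>` and `<readonly/>` children follow the libvirt domain XML documentation for file-backed LXC filesystems. Per danpb, the filesystem type inside the image is auto-detected rather than declared.

```python
import xml.etree.ElementTree as ET


def config_drive_filesystem_xml(image_path: str, guest_dir: str = "/config") -> str:
    """Describe a config-drive image that libvirt should loop-mount in an LXC guest."""
    fs = ET.Element("filesystem", type="file", accessmode="passthrough")
    # Attach the image file through a host loop device; the filesystem inside
    # it (vfat or iso9660) is not spelled out here.
    ET.SubElement(fs, "driver", type="loop", format="raw")
    ET.SubElement(fs, "source", file=image_path)
    ET.SubElement(fs, "target", dir=guest_dir)
    # The container should not be able to modify its own metadata.
    ET.SubElement(fs, "readonly")
    return ET.tostring(fs, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical instance path, for illustration only.
    print(config_drive_filesystem_xml("/var/lib/nova/instances/demo/disk.config"))
```

Only the image file is exposed, so the container sees the config drive's own size and contents rather than the host filesystem's capacity.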
15:17:57 <apmelton> ah cool
15:18:02 <thomasem> did not know that
15:18:08 <danpb> s1rp_: oh hmm, you're actually moving the directory
15:18:16 <s1rp_> danpb: yep
15:18:28 <s1rp_> danpb: code happily will skip the rm if it's not there..
15:18:55 <s1rp_> danpb: also need to add `mv` to compute.filters to allow the copy into the rootfs
15:19:07 <s1rp_> tried to lock that down with a regex though
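The proposed `fs` flow can be illustrated with a minimal Python sketch of the approach: move the generated directory into the rootfs, and let cleanup skip the directory if it is already gone. The function names and the `config` destination are hypothetical, not the patch's actual code. In Nova the move would run as root through rootwrap, which is why `mv` would need an entry in compute.filters.

```python
import os
import shutil


def install_config_dir(tmp_dir: str, rootfs: str, guest_subdir: str = "config") -> str:
    """Hypothetical: move a generated config-drive tree into a container rootfs."""
    dest = os.path.join(rootfs, guest_subdir)
    if os.path.exists(dest):
        # Refuse to merge into (or nest inside) an existing directory.
        raise FileExistsError(dest)
    # dest does not exist, so shutil.move renames tmp_dir to dest
    # (falling back to copy + delete across filesystems).
    shutil.move(tmp_dir, dest)
    return dest


def cleanup_tmp_dir(tmp_dir: str) -> None:
    """Hypothetical cleanup hook: a no-op if the directory was already moved."""
    shutil.rmtree(tmp_dir, ignore_errors=True)
```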
15:19:25 <danpb> s1rp_: i think from the cloud admin POV it'd still be compelling if we could do this without introducing the need to use   CONF.config_drive_format == 'fs'
15:19:37 <danpb> s1rp_: because that would be one less magic config setting they need to remember to change for LXC only
15:19:50 <s1rp_> danpb: yeah i definitely ++ that point
15:20:27 <danpb> so could you just try out the  <filesystem type=file> approach to see if it will "just work" with the  config_drive_format = fat or iso9660 or both
15:20:44 <danpb> if we hit problems with that idea then we could re-visit it
15:21:01 <s1rp_> sounds good
15:21:16 <danpb> i'll comment on the review for sake of history
15:21:22 <s1rp_> coolness
15:21:50 <apmelton> danpb: do you know if filesystem supports blkiotune?
15:23:00 <apmelton> like, can we set limits for the underlying block device supporting the filesystem?
15:24:26 <danpb> apmelton: there's two block I/O tuning options in libvirt - there's global to the VM settings and there's per-disk settings
15:24:47 <danpb> the former uses cgroups and should work with LXC  IIRC, and the latter uses built-in QEMU rate limiting of which there's no equivalent with LXC
15:25:26 <danpb> if the LXC <filesystems> are each backed by unique  loopback or nbd devices though
15:25:49 <danpb> the top level global I/O tuning with cgroups is probably sufficient, since each loop/nbd dev would have a distinct major/minor number
15:26:09 <apmelton> so there's that, but also tunes on the config drive
15:26:34 <apmelton> we wouldn't want a container to be able to peg the drive hosting nova's data dir
15:26:50 <apmelton> so we'd wanna set limits on that loopback device pretty low
15:27:18 <danpb> yep, this is probably something that could use some enhancement
15:27:47 <apmelton> I was doing some digging a little while ago, and I don't think the filesystem tag worked with blkiotune
15:28:27 <apmelton> I didn't see any cgroups getting set at least
15:28:38 <danpb> it currently doesn't - you'd have to figure out which loop device was used and then set the top level policy
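Once the loop device is known, "set the top level policy" could look like this sketch: a domain-wide `<blkiotune>` element with a per-device entry, which libvirt applies through the blkio cgroup. The `/dev/loop3` path and the limits are hypothetical. As danpb notes, working out which loop device libvirt actually attached is the part that is not handled today. The per-device `read_bytes_sec`/`write_bytes_sec` children come from the libvirt domain XML format and need a reasonably recent libvirt, so check the version in use.

```python
import xml.etree.ElementTree as ET


def blkiotune_xml(device_limits: dict) -> str:
    """Build a domain-wide <blkiotune> element.

    device_limits maps a host block-device path to (read_bytes_sec, write_bytes_sec).
    libvirt resolves each path to its major:minor number for the blkio cgroup.
    """
    tune = ET.Element("blkiotune")
    for path, (read_bps, write_bps) in device_limits.items():
        dev = ET.SubElement(tune, "device")
        ET.SubElement(dev, "path").text = path
        ET.SubElement(dev, "read_bytes_sec").text = str(read_bps)
        ET.SubElement(dev, "write_bytes_sec").text = str(write_bps)
    return ET.tostring(tune, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical loop device backing the config drive, kept deliberately low (1 MiB/s).
    print(blkiotune_xml({"/dev/loop3": (1048576, 1048576)}))
```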
15:30:06 <apmelton> I suppose that could be tricky if you're setting two different tunings on filesystems hosted on the same block device
15:30:59 <danpb> yeah, that's why QEMU added support for per-device limits itself
15:31:10 <danpb> because cgroups can't distinguish that scenario
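The reason is that the blkio cgroup identifies devices by major:minor number, and every file on one host filesystem reports the same device in `st_dev`. The Python sketch below (hypothetical helper names) checks whether two image files could be told apart by a cgroup rule keyed on their backing device. One caveat: on some filesystems (btrfs subvolumes, overlayfs) `st_dev` is an anonymous device number rather than a real disk.

```python
import os
import tempfile


def backing_device(path: str) -> str:
    """Return 'major:minor' of the device holding `path`, the form blkio rules use."""
    dev = os.stat(path).st_dev
    return f"{os.major(dev)}:{os.minor(dev)}"


def distinguishable_by_cgroup(path_a: str, path_b: str) -> bool:
    """True only if the two files sit on different devices."""
    return backing_device(path_a) != backing_device(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        images = [os.path.join(d, name) for name in ("a.img", "b.img")]
        for image in images:
            with open(image, "wb") as f:
                f.write(b"\0" * 4096)
        # Both files live on the same filesystem, so this prints False.
        print(distinguishable_by_cgroup(*images))
```

Loop-mounting each image gives it its own /dev/loopN, and so its own major:minor number. That is why danpb suggests that cgroup-level limits suffice once each filesystem has a distinct device.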
15:31:20 <apmelton> gotcha
15:33:46 <thomasem> Well this has certainly been a productive chat.
15:35:07 <danpb> cool, so any other topics....
15:37:21 <danpb> ok, lets call this a wrap
15:37:24 <danpb> #endmeeting