13:00:34 #startmeeting image_encryption
13:00:34 Meeting started Mon Sep 9 13:00:34 2024 UTC and is due to finish in 60 minutes. The chair is Luzi. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:34 The meeting name has been set to 'image_encryption'
13:00:44 #topic Roll Call
13:00:53 o/
13:01:07 ahoy!
13:01:46 dansmith, pdeore and anyone from Cinder, are you around?
13:01:50 hi fungi
13:02:13 hi
13:03:20 well mhen asked for people from Cinder and Glance to come around today, because we need to discuss the disk-format problem
13:03:28 i recall rosmaita had some input during the last meeting
13:04:10 we have a cross-project session planned for the PTG now, see last entry here: https://etherpad.opendev.org/p/2025.1-ptg-glance-planning
13:05:17 i suppose the issue you're alluding to is the recent realization that we need to check images for safety before exposing qemu to them means there's a need for additional inspection after decryption?
13:06:20 correct
13:06:27 dan is working on a spec for this: https://review.opendev.org/c/openstack/glance-specs/+/925111
13:07:00 quoting dansmith here: "I think we need a new disk_format for luks-encrypted images. So much of the complexity of handling that CVE came from all the side vectors by which you can fool nova, glance, and cinder into doing something bad by saying an image is in one format, but actually sending another."
13:07:13 "Having to probe raw images to see if they smell like a LUKS disk, and if not, assume it's a regular raw is inviting more possibility for issue here I think."
13:07:24 "I've got a spec proposed to make glance inspect and reject uploads that do not conform to the stated disk_format"
13:08:03 yep, that matches my understanding from our recent security advisories and the related qemu-img cve
13:08:35 so we need to discuss what kind of images we want to allow and what the properties of these images should be, so an inspection will notice that 1. the image is encrypted, and maybe 2. it is LUKS encrypted qcow or so
13:09:22 we effectively have two possible encrypted formats right now: 1. raw luks (as created by Cinder from volumes for example) and 2. qcow2+luks
13:09:52 so i guess the idea is that there would be a specific luks-encrypted variant of each basic image type that we support? or would luks be an image type and then you'd have a property stating what kind of payload is encapsulated inside it?
13:10:17 that's the question
13:10:31 how to map this to container_format and disk_format values
13:10:39 any other encrypted images than those mentioned by mhen shouldn't be allowed
13:10:45 yeah
13:10:49 ... in a way that pleases Glance, Cinder and Nova
13:11:07 so you could have luks-raw and luks-qcow2 as image types i guess, which wouldn't require additional fields
13:11:58 the can of worms here is: what is considered container and what is disk format when we talk about qcow2+luks for example
13:12:27 we had a similar discussion in the beginning of the image encryption contribution iirc
13:13:06 there was some discussion around this and Glance ended up preferring we leave disk_format and container_format alone and put this into metadata (os_encrypt_*)
13:13:33 ah, like a qcow2 format image that contains an encrypted block payload? rather than a qcow2 image file the entirety of which is an opaque encrypted blob?
13:14:17 oh, i see, just from the metadata perspective then
13:14:29 https://docs.openstack.org/glance/latest/user/formats.html
13:14:48 qcow2 is classified as a disk_format here
13:15:16 aha, i was blithely unaware of the distinction in glance
13:15:50 so "encrypted" container similar to the existing "compressed" container?
13:16:22 I dunno, that's up to Glance (and by extension Cinder and Nova) and we need to discuss exactly this.
13:16:33 makes sense
13:17:17 could imagine a "raw+luks" and "qcow2+luks" disk_format for example
13:17:36 making luks a container_format like "compressed" currently is, is questionable
13:18:03 or we can add an os_decrypt_disk_format which contains the format after decryption: allowed values would be raw or qcow2
13:18:04 since with qcow2+luks, qcow2 is outer and luks is inner format afaik
13:19:27 yeah, that's what i was wondering, so given encryption is an existing option for the qcow2 format already, it technically is still a qcow2 image file just with a different block encoding
13:20:23 whereas with raw, encryption is at the outer layer
13:20:23 "format after decryption" <- I don't think this is applicable here; this would suggest that qcow2 is the inner format but I think it is the other way around
13:21:53 so maybe the real problem is considering qcow2 and raw image encryption as similar, when they're fairly distinct underlying representations that simply need some similar logistics for handling key material and such
13:22:01 whatever we do, we need to be able to deal with both kinds of encrypted images: luks encrypted raw blocks and qcow2 with encrypted payload
13:23:23 yes, and we need appropriate format metadata values to represent those
13:24:16 well we did not want to wait until PTG to discuss this, but already have considered some options at least
13:24:27 so for qcow2 there's a crypt_method field which contains values indicating none or aes
13:25:22 or luks
13:25:24 https://github.com/qemu/qemu/blob/master/docs/interop/qcow2.txt#L62-L65
13:25:30 got it
13:26:47 but really if the encryption can be determined by inspecting the header of a qcow2 encrypted image, then that's clearly distinct (and auditable) when compared to an encrypted blob the user claims is hiding a raw format image inside
13:27:30 er, rather, auditable without access to the decryption key i mean
13:27:39 that would be another thing we may need to discuss
13:27:49 Glance will have access to the encryption keys
13:28:02 encryption/decryption
13:28:27 here's what the output looks like from qemu-img info for a qcow2-luks image: https://etherpad.opendev.org/p/image-formats
13:28:29 okay, so that does at least mean it can theoretically validate both, and it's more a matter of the time and storage required to do the decryption
13:30:01 both qcow2+luks as well as raw luks have a header
13:30:26 oh, right, because raw luks has a luks header
13:30:41 using `qemu-img info` on a raw luks disk will print "file format: luks"
13:30:44 the kernel needs to know the various parameters it was created with, etc
13:31:08 you can identify both pretty easily by looking at the first chunks I guess
13:31:40 and if I understand dansmith correctly, he wants to have disk_format values that represent those specifically so that he can check exactly that
13:31:46 presumably we "just" need the format inspector to be able to check those header details without relying on calls to qemu-img then
13:35:00 so the luks+raw image isn't really a "could be anything" sort of raw image, it's a file with a luks header
13:35:11 pretty much
13:35:13 and we can tell qemu to treat it as luks rather than raw
13:35:37 I guess we could simplify this as "luks" disk_format rather than "raw+luks"
13:36:20 then we'd have "luks" and "qcow2+luks" as new disk_formats, which we can differentiate
13:36:35 both in the inspector and the actual handling
13:37:32 and technically speaking, qcow2+luks isn't really a new image type (though i suppose openstack services could treat it as such if that's the desire), merely a set of supported options for the existing qcow2 image type
13:38:32 would probably be better to just stick with qcow2 as the disk_format
13:39:08 how do we mark it as encrypted then?
13:39:25 just based on whether it has a "os_encrypt_key_id" metadata?
13:39:25 well, it will have the other metadata
13:39:32 won't be usable without it
13:39:47 yeah, so this goes back to my earlier assertion that trying to treat the two cases similarly may be the problem. one is essentially a new type of image, while the other is a new set of options for a type of image openstack already recognizes
13:40:07 fair point
13:40:10 i am coming around to your way of thinking
13:40:51 there are some similar logistics for both since you do need to track key material
13:41:07 right
13:41:21 what happens if someone creates a luks image and writes a vmdk or something into it?
13:41:46 can luks encapsulate random other image types?
13:42:02 it encrypts block data, so yes, anything effectively
13:42:03 well, with luks you create a "container" that holds stuff
13:42:35 but the vulns were based on qemu-img handling it before writing it to the target (e.g. for conversion)
13:42:54 I believe it would only look at the first layer here (luks) because it is not called recursively
13:43:03 mmm, right, thinking about how i use it on physical machines, i generally make a disk partition, luks encrypt that, then set the luks device as an lvm physical volume
13:43:33 so i've always thought of our cinder volumes-uploaded-as-images to be container_format==luks, disk_format=raw
13:44:13 though 'luks' is not a defined value for glance container_format
13:44:26 https://docs.openstack.org/glance/latest/user/formats.html
13:44:32 "The container format refers to whether the virtual machine image is in a file format that also contains metadata about the actual virtual machine."
13:44:57 I first thought so too, but container_format means something different in Glance.
13:45:00 mhen: the risks of passing untrusted image types to qemu-img extend to qemu itself as well. basically we need to make sure that qemu's format guesser doesn't come into play and that it only uses the precise drivers we want because it also includes drivers with unsafe side effects
13:45:05 yeah, so i've been thinking about it incorrectly
13:45:24 the qemu-img risk is merely an extension of a larger risk inherent to qemu as a whole
13:45:52 yea right, qemu is also handling the attachment to libvirt iirc
13:48:06 i guess the question is whether, when qemu boots a luks encrypted image, does it decrypt the luks wrapper and use its own (vmdk or whatever) driver on what's inside the luks encryption in order to present that unencrypted to the guest kernel? or does it pass the luks encrypted device to the guest kernel and let the kernel's own block device drivers handle what's inside?
13:49:02 the risk we're trying to mitigate is not knowing what disk drivers qemu will decide to use on an image
13:49:07 the guest kernel sees an unencrypted block device afaik, encryption/decryption is handled by the host somewhere around KVM and qemu
13:50:53 okay, so in this case qemu's luks driver would decrypt the image and then present a block device with vmdk headers to the guest kernel, rather than qemu also iteratively calling into its vmdk driver and presenting whatever's inside that vmdk to the guest kernel?
13:51:33 the former behavior should be safe, the latter unsafe
13:52:03 fungi: I believe that the former is the case, however I'm no expert of qemu
13:52:33 Anything sounds funny about adimm's answer? :) https://bugs.launchpad.net/glance/+bug/2079027
13:53:02 croelandt: wrong channel?
13:54:37 croelandt: and yes, that's clearly something pasted from an llm chatbot
13:55:42 oops sorry, wrong channel indeed
13:56:58 okay so we definitely need to discuss this with Nova and Glance
13:57:46 Luzi: and possibly also with someone from qemu who has deeper knowledge about whether it's "safe" to let qemu directly handle untrusted luks format images
13:58:18 though the cinder/glance/nova folks probably know who from qemu to loop in
13:59:04 I'll see if I can attempt to try the vmdk-within-luks case in one of my dev setups
14:01:05 thanks!
14:02:26 okay, thank you for the discussion today
14:02:35 have a nice week
14:02:41 thanks!
14:02:49 #endmeeting image_encryption
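
Note on the header checks discussed between 13:24 and 13:31: the idea of telling a raw LUKS file apart from a qcow2 file with LUKS-encrypted payload "by looking at the first chunks", without calling qemu-img, could look roughly like the sketch below. It is only an illustration and not the Glance format inspector. The function name `sniff_format` and the returned labels are hypothetical (the labels reuse the "luks" and "qcow2+luks" names floated in the meeting, which are not defined Glance disk_format values). The magic bytes and the position of the crypt_method field follow the published LUKS and qcow2 header layouts.

```python
import struct

# Magic bytes at offset 0 of each format's header.
LUKS_MAGIC = b"LUKS\xba\xbe"
QCOW2_MAGIC = b"QFI\xfb"

# qcow2 stores crypt_method as a big-endian uint32 at bytes 32-35:
# 0 = no encryption, 1 = AES, 2 = LUKS.
QCOW2_CRYPT_OFFSET = 32


def sniff_format(header: bytes) -> str:
    """Classify an image by its outermost header only (hypothetical helper).

    Nothing is decrypted here, so this says nothing about what a LUKS
    payload contains once it is unlocked.
    """
    if header.startswith(LUKS_MAGIC):
        return "luks"
    if header.startswith(QCOW2_MAGIC) and len(header) >= QCOW2_CRYPT_OFFSET + 4:
        (crypt_method,) = struct.unpack_from(">I", header, QCOW2_CRYPT_OFFSET)
        if crypt_method == 0:
            return "qcow2"
        if crypt_method == 2:
            return "qcow2+luks"
        # Legacy AES or an unrecognized value: treat as not allowed.
        return "qcow2+unsupported-encryption"
    return "unknown"
```

A caller would read the first few hundred bytes of the upload, pass them to `sniff_format`, and reject the image if the result does not match the stated disk_format. Because only the outer layer is inspected, this does not answer the vmdk-within-luks question raised at 13:41.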