opendevreview | Rajesh Tailor proposed openstack/python-novaclient master: Bump microversion to 2.96 https://review.opendev.org/c/openstack/python-novaclient/+/911575 | 05:06 |
---|---|---|
*** mklejn_ is now known as mklejn | 09:00 | |
*** fnordahl_ is now known as fnordahl | 10:13 | |
opendevreview | Rajesh Tailor proposed openstack/python-novaclient master: Bump microversion to 2.96 https://review.opendev.org/c/openstack/python-novaclient/+/911575 | 12:00 |
-opendevstatus- NOTICE: Jobs that fail due to being unable to resolve mirror.dfw.rackspace.opendev.org can be rechecked. This error was an unexpected side effect of some nodepool configuration changes which have been reverted. | 16:55 | |
dansmith | melwitt: so I created an encrypted instance, cirros image, booted fine | 18:04 |
dansmith | I snapshotted it, the snap has the new credential uuid | 18:05 |
dansmith | I then booted a new instance from that snap, saw that it ran qemu-img convert to (I think) unencrypt it before it uploaded to ceph to then create a vm image from, | 18:05 |
dansmith | but the disk is unreadable in the new instance.. like fails in BIOS | 18:06 |
dansmith | since this is ceph it's hard to examine the actual disk to see what the state is.. I should try to repro with qcow because I can poke at it easier | 18:06 |
dansmith | but (a) that's expected to work, right? | 18:07 |
dansmith | and (b) I'm thinking if a user does this, they don't realize that you just stripped the encryption from their snapshot image to upload it | 18:07 |
dansmith | or maybe I'm wrong about what was happening - perhaps we were just re-encrypting it with the new key? if had used qcow it'd be easier for me to see what happened | 18:08 |
dansmith | maybe it's in the logs actually | 18:08 |
melwitt | dansmith: hm, well it shouldn't be unencrypting it, convert is used to make a copy with a new key. but as we discussed I have to change all that and haven't yet | 18:08 |
dansmith | okay, I thought that maybe the "no support yet for encrypted backing files" meant we were de-keying it because it's the image we're based on | 18:09 |
melwitt | and yeah, it was expected to work, to be able to boot a new instance from the image. there might be something that got messed up when I refactored the things needed from review feedback on the bottom patches | 18:10 |
melwitt | well, the rbd patch is after the encrypted backing file support one | 18:10 |
dansmith | okay so, there's a command in the log which is what I was seeing when I watched it, where we convert _to_ luks _from_ raw | 18:11 |
dansmith | so I assumed that meant we had raw'd it | 18:11 |
melwitt | and that's the root disk, not swap or ephemeral? | 18:12 |
dansmith | we wouldn't run convert on ephemeral or swap because we're creating from scratch would we/ | 18:14 |
melwitt | oh, right. in earlier revisions I had it creating as raw blank and then mkfs and then encrypted it. but yeah I think I changed it to create luks off the bat | 18:14 |
melwitt | so it sounds like there must be a bug there that happened in the intervening time. I need to look into it | 18:16 |
melwitt | converting from raw to luks should only happen if the source image is unencrypted. and it would be interesting to see if it's erroneously getting converted to raw after pulling it down from ceph. I wonder if maybe i messed something up in the fetch_to_raw function. it's supposed to be a no-op if the source image is luks | 18:17 |
melwitt | that was something that got changed during addressing review feedback | 18:18 |
melwitt | I'll investigate | 18:21 |
dansmith | there's a raw disk in _base, but I think that might be the original cirros I created from | 18:22 |
dansmith | but yeah, I probably need to take ceph out of the equation here so I can grok what's going on | 18:22 |
dansmith | I'm trying to look at the log and I don't see anything with "-O raw" which is good | 18:22 |
dansmith | yeah I'm not sure maybe I misread a command line as a convert when it was something else | 18:24 |
melwitt | it would be 'qemu-img convert' in that case | 18:26 |
dansmith | yeah | 18:27 |
dansmith | we are doing a lot of qemu-img info execs with no input format, which is a problem, | 18:27 |
dansmith | not sure if that's old or new code though | 18:27 |
dansmith | okay let me redo this with no ceph so I can try to figure out they boot-of-a-snap is dead | 18:28 |
dansmith | or see if it's a ceph thing I guess | 18:29 |
melwitt | should be old, for qemu-img info | 18:29 |
dansmith | ack, I thought we had fixed those | 18:30 |
melwitt | I'm double checking. maybe I'm forgetting something | 18:31 |
melwitt | info itself didn't change but passing the format is optional, it's possible I added a call and didn't pass the format. I don't remember off the top of my head but just saying there's a possibility. I'll go back and fix that if I find it | 18:33 |
dansmith | oh I know it's optional, it's just a security bug to not provide the format | 18:36 |
dansmith | I guess maybe we had some info cases where we need to probe in the dark, as long as we're careful not to follow the links | 18:36 |
dansmith | convert is the one you absolutely have to always have | 18:36 |
dansmith | anyway, no need to rathole on that specifically, just noticed it | 18:37 |
melwitt | ack, thanks | 18:38 |
stblatzheim | dansmith: if you're around would be happy for review of https://review.opendev.org/c/openstack/nova/+/911070 :) | 20:10 |
artom | I think I need a bit of philosophical guidance from anyone still around... | 21:20 |
artom | What's the general opinion for changing Nova code _just_ to enable additional testing in CI? | 21:21 |
artom | Specifically https://review.opendev.org/c/openstack/nova/+/910022/ | 21:21 |
artom | CPU power management in libvirt is done via module-level functions, so I don't see a way to mock/patch/stub stuff to do live migration testing in functional tests, as both of the libvirt drivers on the source and destination host call into the same module, nova.virt.libvirt.cpu.api | 21:22 |
artom | I would essentially have to turn all that machinery into classes, and give each virt driver an instance of the class | 21:23 |
artom | Which would allow proper per-host mocking | 21:23 |
artom | And I feel like that's a pretty radical change | 21:23 |
dansmith | philosophically, I think we should design code to work well and to be testable (as the latter feeds into the former) | 21:38 |
dansmith | whether or not it's worth refactoring what you're looking at specifically is maybe a judgment call, but I wouldn't rule it out on principle or anything | 21:38 |
artom | So - thinking out loud here then. | 21:39 |
artom | 1. The scope is in-tree functional tests. | 21:39 |
artom | 2. The existing code is _not_ testable for live migration. Not without some pretty convoluted, probably fragile, and hard to understand hax that may not even work. | 21:40 |
artom | 3. Sounds like I have a case for refactoring. At least, it's not to be dismissed off-hand. | 21:42 |
artom | If I do it and it ends up getting shot down in review, at least that outcome wasn't immediately obvious when I started (now). | 21:43 |
dansmith | yeah I mean it's hard to say without a little more detail, but I'm too fried at this point in the day to dig into something else | 21:43 |
dansmith | but the words you're saying sound fine on their face | 21:44 |
artom | Yeah, that's the best I can hope for, at this point. Hard to argue about hypothetical without concrete code to look at. | 21:44 |
artom | At least I know I'm not way cray-cray with my reflex to refactor. | 21:44 |
dansmith | not *because* of this anyway :) | 21:46 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!