fungi | i'm reenqueuing 927214,1 into the deploy pipeline to see if that rax iad api error is transient | 00:05 |
fungi | same error again | 00:37 |
fungi | https://rackspace.service-now.com/system_status doesn't indicate any known problem | 00:38 |
fungi | build history says the job's been working in daily periodic | 00:45 |
fungi | the same keypair task succeeded for rax dfw and ord but failed on iad, so this must be an iad-specific issue, hopefully temporary | 00:48 |
fungi | maybe when the periodic run happens in a couple of hours i'll still be up, otherwise i can reenqueue it again tomorrow when i get a free moment | 00:49 |
Clark[m] | You can also probably remove that region temporarily? | 01:35 |
opendevreview | tzing proposed opendev/system-config master: Update openEuler mirror repo https://review.opendev.org/c/opendev/system-config/+/927462 | 06:33 |
opendevreview | tzing proposed openstack/diskimage-builder master: Upgrade openEuler to 22.03 LTS https://review.opendev.org/c/openstack/diskimage-builder/+/927466 | 06:41 |
frickler | the keypair api was still broken for the periodic job, too, but it is working when I'm checking it manually from bridge now, so I'll try to reenqueue once again | 06:51 |
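For reference, the manual check frickler describes can be run from bridge with the openstack CLI; a sketch, with the cloud/region names below as placeholders for whatever is actually in clouds.yaml:

```bash
# Hit only the os-keypair API in the suspect region...
openstack --os-cloud openstackci-rax --os-region-name IAD keypair list
# ...and compare against a region where the deploy task succeeded
openstack --os-cloud openstackci-rax --os-region-name DFW keypair list
```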
frickler | likely not related, but checking grafana it looks like we have about 70 instances stuck deleting in rax for a week now https://grafana.opendev.org/d/a8667d6647/nodepool3a-rackspace?orgId=1&from=now-30d&to=now | 06:54 |
frickler | also ubuntu-ports mirroring is broken, might be related to arm slowness issues: "The lock file '/afs/.openstack.org/mirror/ubuntu-ports/db/lockfile' already exists." | 06:59 |
frickler | not sure if I'll get to any of this today, though | 06:59 |
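A sketch of how that stale mirror lock might be cleared, assuming the usual reprepro-based mirroring and that no run is still in progress (the path comes from the error above):

```bash
# Verify no mirror update is still running before touching the lock
ps -ef | grep '[r]eprepro'
ls -l /afs/.openstack.org/mirror/ubuntu-ports/db/lockfile
# Only if nothing holds it: remove the stale lock so the next run can proceed
rm /afs/.openstack.org/mirror/ubuntu-ports/db/lockfile
```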
opendevreview | tzing proposed openstack/diskimage-builder master: Upgrade openEuler to 24.03 LTS https://review.opendev.org/c/openstack/diskimage-builder/+/927466 | 08:35 |
frickler | next failure, this time in infra-prod-base, but transient: E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 15213 (unattended-upgr) | 09:24 |
frickler | doing another enqueue ... | 09:24 |
frickler | oh, seems interest in openeuler has returned. curious to see for how long it will last this time | 09:26 |
opendevreview | Thomas Bachman proposed openstack/project-config master: Re-introduce gate jobs for networking-cisco https://review.opendev.org/c/openstack/project-config/+/927493 | 10:38 |
opendevreview | Jens Harbott proposed openstack/project-config master: Pin publish-openstack-releasenotes-python3 to jammy https://review.opendev.org/c/openstack/project-config/+/927495 | 11:43 |
frickler | fungi: elodilles: ^^ | 11:43 |
fungi | thanks | 11:43 |
elodilles | thanks, too o/ | 11:50 |
opendevreview | Merged openstack/project-config master: Pin publish-openstack-releasenotes-python3 to jammy https://review.opendev.org/c/openstack/project-config/+/927495 | 12:03 |
frickler | fungi: latest invocation of infra-prod-run-cloud-launcher seems to have passed | 12:10 |
frickler | confirming keypairs are in place for SJC3(flex) | 12:12 |
fungi | yep, i saw you mention rerunning it earlier, thanks! | 12:14 |
fungi | once i'm in a stable place for a bit, and assuming nothing more urgent comes up, i'll try launching a mirror server there | 12:16 |
cardoe | If ya need any help on the flex bits or the existing rax bits, you can ping me. | 12:51 |
frickler | cardoe: nice, thx, we'll certainly come back to that ... like possibly with the stuck instances I mentioned earlier | 12:53 |
cardoe | You know the tenant / project? I’ll poke it when I’m at my desk. | 12:56 |
fungi | cardoe: we were getting a lengthy (at least several hours) 503 "service unavailable" error back from the iad keystone endpoint over night, at least 23:38-02:35 utc but could have been longer. the rackspace status page didn't indicate any outage though | 13:00 |
fungi | it's working fine now, just figured you'd want to know if it wasn't an unannounced maintenance or something | 13:01 |
frickler | actually when I checked this morning, only the os-keypair API seemed affected, other things like "flavor list" did work, which seemed very confusing to me | 13:04 |
cardoe | Yeah that's weird. | 13:04 |
frickler | the issue was gone when I tried to debug in more detail later | 13:04 |
fungi | oh, interesting, so it was just os-keypair. agreed that's extra strange | 13:05 |
frickler | cardoe: project_id 637776, example instance 97bf1d90-7137-420e-9156-95224ff72945, stuck in deleting for a week now | 13:07 |
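One way to enumerate the stuck instances frickler mentions; a sketch, with the cloud name as a placeholder (`openstack server list --long` exposes the task state column):

```bash
# List servers whose task state is stuck in "deleting"
openstack --os-cloud rax-iad server list --long -f value \
  -c ID -c Status -c "Task State" | grep -i deleting
```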
cardoe | poking | 13:20 |
fungi | cardoe: if you're interested, i tried to boot our first instance in flex sjc3, dc5d05c7-8bdb-4b68-a1ee-eac44d6312f2 (project name 610275_Flex) errored with "Exceeded max scheduling attempts 3... Last exception: HTTP HTTPInternalServerError" as the detail in server show | 13:41 |
fungi | similar info returned from the sdk during the server create: | 13:43 |
fungi | openstack.exceptions.SDKException: Error in creating the server. Compute service reports fault: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance dc5d05c7-8bdb-4b68-a1ee-eac44d6312f2. Last exception: HTTP HTTPInternalServerError | 13:43 |
fungi | i'll leave that one for now and try booting a second one just to see if it persists | 13:49 |
fungi | my second try succeeded, so it must have been an intermittent failure | 13:51 |
fungi | infra-root: new development in raxflex testing... the Ubuntu-24.04 image they have lacks ifupdown (i think it must be all netplan/networkmanager and maybe systemd-networkd?). does that sound familiar? is there a preference for falling back on jammy for now vs trying with an official noble image downloaded from ubuntu.com? | 14:37 |
fungi | or is there an outstanding change for our launch script to handle noble? | 14:38 |
frickler | fungi: interesting, best refer to how tonyb did this, iiuc he did create mirror nodes running noble already? | 14:42 |
frickler | my noble vm running the default cloud image doesn't have ifupdown either | 14:44 |
Clark[m] | Maybe login to the vexxhost nodes and see what they have? | 14:48 |
Clark[m] | I wasn't aware of any patches to launch node to make that work | 14:48 |
frickler | dpkg-query: package 'ifupdown' is not installed | 14:53 |
frickler | on mirror02.sjc3.vexxhost | 14:54 |
fungi | it's possible i misread where the actual error was on launch | 14:55 |
fungi | in openeuler mirror news, the reason for the failures/stale lock is because the volume is full | 14:55 |
fungi | it's got a 350g quota right now, should i try increasing that to 400g? | 14:56 |
fungi | also i've commented on https://review.opendev.org/927462 trying to get a handle on what the addition of another release means for storage requirements | 14:56 |
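For context, AFS quotas are managed with the `fs` utility and expressed in kilobytes; a sketch of the 350G-to-400G bump, assuming the volume is mounted at the obvious path:

```bash
# Show current usage vs. quota for the openeuler mirror volume
fs listquota /afs/.openstack.org/mirror/openeuler
# Raise the quota to roughly 400G (value is in KB)
fs setquota -path /afs/.openstack.org/mirror/openeuler -max 400000000
```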
fungi | okay, yeah it's actually make_swap.sh that the launch failed on | 14:57 |
fungi | swapon: /dev/sdc: read swap header failed | 14:58 |
fungi | mount: /mnt: unknown filesystem type 'swap'. | 14:58 |
frickler | is that the ephemeral volume or an extra one? the one on vexxhost doesn't have any extra disks, so this may be an untested scenario | 15:01 |
fungi | oh, interesting, it also formatted a /swapfile so i think what's happening is that there's a preexisting entry for /dev/sdc (probably the ephemeral disk?) as a swap device | 15:01 |
fungi | i'll have to rerun with --keep and inspect the fstab just to be sure | 15:01 |
frickler | seems the script only knows about /dev/vdb and /dev/xvde, so we'll need to adapt it | 15:02 |
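A minimal sketch of that adaptation, assuming the script picks the first matching block device (variable names here are illustrative, not the script's actual ones):

```bash
# Probe the known ephemeral-device names, now also covering the
# sdb/sdc names seen on this raxflex flavor
for dev in /dev/vdb /dev/xvde /dev/sdb /dev/sdc; do
    if [ -b "$dev" ]; then
        EPHEMERAL_DEV="$dev"
        break
    fi
done
```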
fungi | oh, good point, i could give that a try too. right now struggling with how to get ssh'd into the held node | 15:09 |
fungi | s/held node/kept server/ | 15:09 |
fungi | aha, it may be added by cloud-init? found this entry in fstab: | 15:14 |
fungi | /dev/sdc none swap sw,comment=cloudconfig 0 0 | 15:14 |
fungi | but yeah, i can try patching it to add /dev/sdc as a possibility and see what happens | 15:14 |
Clark[m] | If you keep the node you should be able to use the temporary ssh key that launch node generates | 15:25 |
fungi | yeah, i did, that's how i got the fstab | 15:29 |
fungi | i think the server is coming up with /dev/sdc already active as swap too, so i'm testing unswapping and deleting that line out of fstab in the script before it tries to configure swap | 15:30 |
clarkb | we could also just have two swap devices if that is simpler | 15:30 |
clarkb | but ya maybe we should do something like if `lsblk | grep SWAP`; then skip; fi | 15:31 |
fungi | well, the ephemeral disk is configured as swap but then we try mounting it | 15:31 |
clarkb | hrm reading make_swap.sh we check if the device is mounted, if it is we unmount it then format it then remount the formatted device | 15:33 |
clarkb | how is it getting to the point where it mounts something that hasn't been reformatted? | 15:33 |
clarkb | and all of that is already protected by a check for "only do this if we have no swap" | 15:33 |
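A guard along the lines clarkb suggests, as a sketch (using `swapon --show`, a variant of the `lsblk | grep SWAP` idea):

```bash
# If the host already has active swap, skip all of make_swap.sh's work
if [ -n "$(swapon --show --noheadings)" ]; then
    echo "swap already configured, nothing to do"
    exit 0
fi
```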
clarkb | re openeuler, maybe the thing that makes the most sense is to delete what we've already got and start fresh? | 15:36 |
clarkb | I'm not sure there was much if any utilization of the old release | 15:36 |
fungi | yeah, i'm not sure. i've instrumented some additional debugging around that part to find out what exactly is happening | 15:37 |
clarkb | is it possible that the swap device is one that we created so ended up in that code | 15:37 |
clarkb | then its the reformatting we got wrong preventing it from being mounted? | 15:38 |
fungi | oh, there's also already a /dev/sdb mounted as swap | 15:42 |
fungi | er, not mounted, but formatted | 15:43 |
fungi | blkid reports this: | 15:43 |
fungi | /dev/sdb: UUID="db3566d3-28f5-4d1b-86f7-d0edc30f2b14" TYPE="swap" | 15:43 |
clarkb | might also help to get a before picture too | 15:43 |
fungi | that is before | 15:43 |
fungi | also this: | 15:44 |
fungi | /dev/sdc: LABEL="ephemeral0" UUID="f8bf32ef-29c0-4eb5-99b0-8b28e6527add" BLOCK_SIZE="4096" TYPE="ext4" | 15:44 |
clarkb | if there is already a swap device we should completely skip the code in make_swap.sh | 15:44 |
fungi | but then this is what's in the fstab: | 15:44 |
fungi | /dev/sdb /mnt auto defaults,nofail,x-systemd.requires=cloud-init.service,_netdev,comment=cloudconfig 0 2 | 15:44 |
clarkb | oh! | 15:44 |
clarkb | so the disk layout for the image is buggy | 15:45 |
clarkb | in that case maybe uploading the upstream image does make sense | 15:45 |
fungi | oh, er | 15:45 |
fungi | also this is in the original fstab: | 15:46 |
fungi | /dev/sdc none swap sw,comment=cloudconfig 0 0 | 15:47 |
fungi | so i think that means cloud-init detected sdb and sdc backwards? | 15:47 |
clarkb | ya seems like something got them mixed up (could be udev changing the order too I think) | 15:48 |
mordred | stupid cloud-init | 15:48 |
corvus | mordred with the hot takes :) | 15:48 |
clarkb | I think the next thing to try would be to upload an upstream image to see if it behaves any better | 15:48 |
mordred | I'm useful | 15:48 |
fungi | assuming comment=cloudconfig is an indicator the entry was added by cloud-init anyway | 15:48 |
fungi | yeah, i can try uploading a known official ubuntu image later today maybe, we'll see how far i get with it | 15:49 |
clarkb | ya but maybe their image is a snapshot of an image that was booted and added that stuff prior to our boot | 15:50 |
clarkb | then when we boot off the snapshot udev runs against new devices and gets the names mixed up | 15:50 |
clarkb | would need to look at cloud-init logs to see where it is doing that | 15:50 |
frickler | fungi: what does the flavor say that you used? iiuc nova can create both swap + ephemeral if both is in the flavor | 15:51 |
fungi | swap: 4096 | 15:52 |
fungi | OS-FLV-EXT-DATA:ephemeral: 64 | 15:52 |
fungi | so yes, it has both devices, but they're switched around in the fstab for some reason | 15:52 |
fungi | or maybe blkid is identifying them backwards | 15:53 |
clarkb | fungi: I think this is the classic issue with using device paths in fstab. Order isn't guaranteed boot to boot | 15:53 |
clarkb | you should prefer uuids or labels instead | 15:53 |
fungi | yep | 15:53 |
clarkb | which is why I wonder if the fstab was written by cloud-init, then the image was snapshotted, then we boot that later and get a different order | 15:53 |
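For illustration, label/UUID-based fstab entries built from the blkid output pasted earlier (the UUIDs are of course specific to that one boot):

```bash
# /etc/fstab entries that survive device-name reordering:
# UUID=db3566d3-28f5-4d1b-86f7-d0edc30f2b14  none  swap  sw               0 0
# LABEL=ephemeral0                           /mnt  ext4  defaults,nofail  0 2
```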
frickler | oh, that isn't even shown in "flavor list" by default | 15:53 |
fungi | yeah, it's possible the /etc/fstab is baked into the image and it's a custom doctored image supplied by the provider instead of an official distro image | 15:54 |
frickler | maybe that's also something cardoe can find out more about | 15:54 |
frickler | but also another reason to try with a stock image | 15:55 |
fungi | trying to upload one from https://cloud-images.ubuntu.com/noble/current/ right now | 15:57 |
fungi | it's queued as ubuntu-noble-server-cloudimg-2024-08-22 | 15:57 |
fungi | once it's done importing glance-side i'll try booting with that instead | 15:57 |
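The upload fungi describes would look roughly like this; the cloud name is a placeholder, and the disk/container formats are the usual ones for ubuntu cloud images:

```bash
wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
# Upload with no extra properties set
openstack --os-cloud raxflex image create \
  --file noble-server-cloudimg-amd64.img \
  --disk-format qcow2 --container-format bare \
  ubuntu-noble-server-cloudimg-2024-08-22
```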
clarkb | (side note, we may need to try multiple reboots too to check that things are stable as we may encounter the same issue down the road with a reboot) | 15:58 |
fungi | agreed | 15:58 |
fungi | yeah, looks like that's getting past the swap setup issues | 15:59 |
fungi | i'll unwind my changes to the script and see if it "just works" without fixing | 15:59 |
clarkb | that == upstream image? | 16:00 |
fungi | yes | 16:00 |
fungi | it got as far as trying to start unbound and then broke | 16:01 |
fungi | unbound.service: Start request repeated too quickly. | 16:02 |
fungi | unbound.service: Failed with result 'exit-code'. | 16:02 |
clarkb | I think that means it failed for some other reason previously and is trying to restart too quickly, which systemd doesn't like | 16:03 |
clarkb | may need to keep the node and check journalctl logs | 16:03 |
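A sketch of that inspection on the kept node, plus clearing systemd's start-rate limit once the underlying cause is fixed:

```bash
systemctl status unbound                # shows the start-limit failure
journalctl -u unbound --no-pager -n 50  # the real error behind it
systemctl reset-failed unbound          # clear the rate-limit state
systemctl start unbound                 # then try again
```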
fungi | though now openstack server list is failing there... 502 Bad Gateway: nginx/1.27.0 | 16:04 |
clarkb | maybe we wrote out ipv6 configs for unbound when we shouldn't have or something | 16:04 |
fungi | coming back from the nova api | 16:04 |
fungi | and now it's back to working again | 16:04 |
fungi | confirmed that the official noble image doesn't have a problem with the original launch node script before i worked on trying to fix it | 16:06 |
clarkb | fungi: oh so undoing your changes fixed the unbound issue? | 16:07 |
fungi | i don't know about that yet, just noting that they were unnecessary to get make_swap.sh to succeed once i switched to the official image | 16:09 |
fungi | and no, unbound still isn't starting, but i told it to keep the instance this time so i can look closer | 16:10 |
clarkb | got it | 16:10 |
fungi | error: unable to open /var/lib/unbound/root.key for reading: No such file or directory | 16:14 |
fungi | i guess we need to fix that in the launch script too | 16:14 |
frickler | oh, I think I remember something. that could be in a separate pkg that moved from required to optional | 16:15 |
fungi | maybe we have a race condition? dns-root-data was already installed, and stopping/starting unbound "just worked" | 16:16 |
frickler | hmm, the file exists on mirror02, but is not known to dpkg | 16:16 |
clarkb | fungi: I thought there was a fix for that in system-config. We don't install unbound until we run ansible so that sould cover it | 16:17 |
fungi | yeah | 16:17 |
clarkb | but ya maybe it is an order of operations thing we install unbound before getting to that package? | 16:17 |
frickler | the pkg puts it into /usr/share/dns/root.key | 16:17 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/925311/2/playbooks/roles/base/unbound/tasks/main.yaml | 16:18 |
fungi | yeah, it's an order of operations problem | 16:18 |
clarkb | yes I think the order is the issue | 16:18 |
clarkb | we need to flip the task we added and the one above it | 16:18 |
fungi | unbound is already enabled, and fails to start, then we install dns-root-data and we try to enable unbound which is already enabled and remains in a failed state | 16:19 |
frickler | ExecStartPre=-/usr/libexec/unbound-helper root_trust_anchor_update | 16:19 |
frickler | so the service needs a restart after installing dns-root-data I'd think | 16:20 |
fungi | right, unless we can install dns-root-data before unbound | 16:20 |
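The shell-level equivalent of the ordering fix proposed below, as a sketch: get the trust anchor data onto disk before unbound's first start, so the ExecStartPre helper frickler quoted has something to update from:

```bash
# Install the root trust anchor data first...
apt-get install -y dns-root-data   # ships /usr/share/dns/root.key
# ...so unbound's first start (triggered by its postinst) can populate
# /var/lib/unbound/root.key instead of failing
apt-get install -y unbound
```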
opendevreview | Jeremy Stanley proposed opendev/system-config master: Install dns-root-data before unbound https://review.opendev.org/c/opendev/system-config/+/927526 | 16:22 |
clarkb | the dns-root-data package looks simple and should be able to write the data out without any interaction with anything else. Then when we install unbound I would expect it to work | 16:26 |
clarkb | but I didn't approve that change in case there was any other debugging you wanted to do first. Feel free to approve if not | 16:26 |
frickler | seems like tonyb hit the same error, but didn't test with a fresh install https://review.opendev.org/c/opendev/system-config/+/925311 | 16:28 |
frickler | so that fix seems reasonable | 16:28 |
fungi | i don't need to check anything else i don't think, i'll go ahead and approve it so i can test again here in a bit and see if we have any other problems | 16:29 |
fungi | oh, frickler approved it. thanks all! | 16:30 |
cardoe | apologies. been in video chats for hours on end. | 16:57 |
cardoe | fungi: Flex SJC3 should be fixed now. | 16:58 |
fungi | cardoe: thanks! | 16:58 |
fungi | i'll go ahead and delete the error node i was leaving there | 16:59 |
cardoe | So for some context, flex is cloudnull's | 17:00 |
cardoe | It used to be my org/team so that's why I said I'd wrangle. | 17:00 |
cardoe | I'm doing the ironic-y side | 17:00 |
fungi | no problem, thanks for the assist! | 17:01 |
cardoe | I know the image is custom and it's silly/annoying. | 17:01 |
fungi | no worries, we thought we'd try it first | 17:01 |
fungi | but we're used to uploading our own images for stuff, so it's no biggie | 17:01 |
fungi | just be aware there might be a mismatch between the swap/ephemeral devices and the baked-in fstab on the noble image there | 17:02 |
opendevreview | Merged opendev/system-config master: Install dns-root-data before unbound https://review.opendev.org/c/opendev/system-config/+/927526 | 17:36 |
cardoe | well now I can finally peek. | 17:59 |
cardoe | fungi: the noble image on flex is straight Ubuntu... https://docs.rackspacecloud.com/openstack-glance-images/#ubuntu-2404-noble that's how it's sourced. | 18:06 |
cardoe | The image is custom on OSPC (the old stuff) and that's annoying. | 18:06 |
Clark[m] | cardoe: is it edited in anyway? It seemed like the fstab entries between swap and the ephemeral disk were getting mixed up. One theory I had is maybe y'all are snapshotting a slightly edited version that includes a cloud inited fstab that maybe gets it wrong when booted later | 18:13 |
Clark[m] | I also theorized this mixup would occur in subsequent boots regardless and we'll need to test that | 18:13 |
cardoe | I'm being told that it's not. But I'm gonna compare hashes. | 18:14 |
JayF | This issue was pointed out in Ironic regarding device reordering; it was cited as being seen on new RHELs but maybe worth consideration? https://review.opendev.org/c/openstack/ironic/+/927518 tl;dr: new async device initialization can cause direct-use of e.g. /dev/sda to fail (and reorder) | 18:20 |
cardoe | It's unmodified. It's old. It's from June but it's the same hash. | 18:20 |
JayF | I'd be surprised if this is it, but just worth a thought at all | 18:21 |
cardoe | So I've reproduced your issue and I even grabbed the latest image from Canonical and reproduced it as well. | 18:41 |
fungi | gah, sorry, typing on a phone | 18:54 |
fungi | cardoe: when uploading the current noble cloud image (url mentioned above) we didn't run into the issue, but at the moment i'm not in a position to investigate further | 18:56 |
cardoe | hrm. I literally did exactly what was in that doc and reproduced it. | 18:57 |
fungi | interesting | 19:00 |
fungi | i'll be doing some more test boots soon and can try to replicate both ways | 19:01 |
fungi | once i'm at a computer for a few minutes | 19:01 |
clarkb | ya it could be a race | 19:07 |
clarkb | and if so the solution is likely to have cloud-init write out fstab using labels or uuids | 19:07 |
clarkb | or maybe there is something that can be done on the kvm/qemu side to make the devices attach more consistently though I think that could always still be potentially problematic | 19:08 |
fungi | cardoe: `sha512sum noble-server-cloudimg-amd64.img` claims "06a897542d73dbfd40ad5131ba108a21aed84b027c71451e4fd756681802dd62529f32867c4f38c148cb068076eeba334b7e8880336c7f1a829cd57fa3e9850b" which is also what `openstack image show ubuntu-noble-server-cloudimg-2024-08-22` claims, while `openstack image show Ubuntu-24.04` says its hash is "06a897542d73dbfd40ad5131ba108a21aed84b027c71451e4fd756681802dd62529f32867c4f38c148cb068076eeba334b7e8880336c7f1a829cd57fa3e9850b" | 19:14 |
fungi | so both match, yes | 19:15 |
fungi | i'll retry with your latest Ubuntu-24.04 now | 19:15 |
fungi | cardoe: sorry, no, i pasted from the wrong output, the Ubuntu-24.04 hash claimed by glance is actually "a8c6a2adfcf440f5d8f849afc1e84790558fbf163abe9bb925b05f6e861d5c90dca3c3652a54d852821edfad1d0d2bc7749e0661f5055209ca063c1e99421bb3" instead | 19:16 |
fungi | so different image. which one did you download? | 19:17 |
fungi | the image i was having success with is https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img specifically | 19:17 |
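A sketch of the comparison being made here; glance's multihash shows up as os_hash_algo/os_hash_value on `image show` (sha512 in this cloud, apparently):

```bash
sha512sum noble-server-cloudimg-amd64.img
openstack image show ubuntu-noble-server-cloudimg-2024-08-22 -f value -c os_hash_value
openstack image show Ubuntu-24.04 -f value -c os_hash_value
```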
Clark[m] | The hash reported by glance is never the same as the one you upload aiui (which is crazy but ya) unfortunately | 19:22 |
fungi | oddly, it does match in the one i uploaded | 19:24 |
fungi | https://paste.opendev.org/show/bXaIezBstl4YLazqm1j9/ | 19:27 |
fungi | though maybe the problem isn't a difference in the image, but a difference in the image properties which is changing nova's behavior | 19:28 |
fungi | i didn't set any properties at all when uploading to glance | 19:28 |
fungi | yeah, i'm still consistently failing on "mount: /mnt: unknown filesystem type 'swap'." when launching with Ubuntu-24.04 while the ubuntu-noble-server-cloudimg-2024-08-22 i uploaded is working fine | 19:31 |
Clark[m] | Have you tried rebooting the one you uploaded multiple times on a single instance or just new boots via new instances? | 19:35 |
fungi | not yet | 19:37 |
fungi | struggling with car chargers, netbook battery is about to give out | 19:38 |
fungi | and cellular wifi is being finicky too | 19:38 |
cardoe | So I set all those properties and it was a baddie. | 19:46 |
cardoe | The deleting stuff on the old rax are cleaned up now. | 19:48 |
fungi | just confirmed for the image i uploaded (no properties), cloud-init correctly found and used /dev/vdb as ephemeral and /dev/vdc as swap | 19:49 |
fungi | so virtual disk driver instead of scsi disk driver | 19:50 |
fungi | guess that's the hw_scsi_model='virtio-scsi' property at work for the latter | 19:51 |
fungi | and/or hw_disk_bus='scsi' | 19:52 |
fungi | repeatedly rebooting the instance created from my propertyless image upload, several times now, comes up with vdb as ephemeral and vdc as swap every time so far | 20:02 |
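The reboot loop being described, sketched with a placeholder address:

```bash
# Check that ephemeral/swap keep their names across several reboots
for i in 1 2 3; do
    ssh root@<server-address> 'lsblk -o NAME,LABEL,MOUNTPOINT; swapon --show; reboot' || true
    sleep 120   # wait for the instance to come back up
done
```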
JayF | fungi: this is sounding really similar to the issue TheJulia documented in Ironic that I linked above ^^ | 20:03 |
fungi | i can try reuploading and setting the same scsi properties i see on the public Ubuntu-24.04 that's been exhibiting the problem, maybe that will narrow it down | 20:04 |
JayF | especially since vdx works and sdx breaks | 20:04 |
fungi | yeah, that's what i'm surmising | 20:04 |
fungi | cardoe: JayF: yep, bug reproduced when i boot from the exact same image file uploaded with --property hw_disk_bus='scsi' --property hw_scsi_model='virtio-scsi' | 20:13 |
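The reproduction amounts to re-uploading the identical file with only the two properties added (the image name here is illustrative):

```bash
openstack --os-cloud raxflex image create \
  --file noble-server-cloudimg-amd64.img \
  --disk-format qcow2 --container-format bare \
  --property hw_disk_bus=scsi \
  --property hw_scsi_model=virtio-scsi \
  ubuntu-noble-scsi-repro
```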
JayF | What is it, they change ordering from some device every 5 years in the kernel? You could set a very slow annoying clock by it | 20:14 |
fungi | not sure which of those two properties is sufficient to reproduce the problem, or whether both are required together | 20:15 |
fungi | i guess for now, from opendev's perspective, it's a question of whether infra-root sysadmins would prefer a mirror server that's using virtio or an older ubuntu jammy instead of noble | 20:18 |
JayF | there are absolutely ways to fix this, I suspect even in a cloud-init script | 20:18 |
JayF | e.g. sub out device names for LABEL= in a first boot script | 20:18 |
JayF | as long as cloud-init stuff is setup to find the right device and not *also* hardcoded to sda | 20:19 |
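JayF's first-boot idea, sketched: rewrite any bare device references in fstab to the UUIDs blkid reports at that moment:

```bash
# Substitute UUID= references for raw device names in /etc/fstab
for dev in /dev/sdb /dev/sdc; do
    uuid=$(blkid -s UUID -o value "$dev") || continue
    sed -i "s|^${dev}[[:space:]]|UUID=${uuid} |" /etc/fstab
done
```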
cardoe | Well so for you to use virtio-scsi you gotta be using the scsi disk bus | 20:21 |
cardoe | So I suspect that disabling either effectively turns them both off. | 20:21 |
fungi | i suspected as much, which is why i simply set both | 20:22 |
Clark[m] | I suspect that we don't need whatever performance improvement that might bring for the mirror node | 20:23 |
Clark[m] | Its bottleneck is almost certainly going to be network and not disk related | 20:23 |
fungi | JayF: well, i think it's being detected the wrong way around *by* cloud-init | 20:23 |
JayF | they're going to get a flood of bug reports then and will have to find a way to solve that | 20:24 |
fungi | Clark[m]: also i'm going to attach a cinder volume regardless because neither the rootfs nor the ephemeral disk is large enough to hold all our usual caches | 20:24 |
fungi | so yeah, we'll hardly be touching these devices either way | 20:24 |
fungi | and sorry for slow replies. i switched to my backup netbook which had a full charge but teensy little keyboard | 20:25 |
cardoe | I was trying to see if there was an existing report against cloud-init | 20:26 |
cardoe | Could it be how nova creates the entry in libvirt is causing it to hint sda? | 20:26 |
cardoe | I'm not crazy familiar with the nova side but I do know the libvirt side. | 20:26 |
fungi | well, sda is the rootfs in this case, but yeah b and c for ephemeral and swap are being detected in reverse | 20:27 |
cardoe | I meant sdb/sdc sorry. | 20:27 |
cardoe | Basically in libvirt you'd end up having <target dev='sdb' bus='scsi'/> | 20:28 |
fungi | i do happen to know some nova people, unsurprisingly. we can certainly ask around ;) | 20:28 |
cardoe | But you don't have to specify those and instead can just say where on the bus it's hanging from. | 20:28 |
cardoe | And you can even say "gimme the next one" | 20:28 |
JayF | cardoe: AIUI, even those are reordering, the pci path stuff | 20:29 |
JayF | cardoe: at least according to the doc julia wrote up | 20:29 |
fungi | so it depends on the guest kernel vintage i guess? | 20:29 |
cardoe | Kevin says he reproduced it in 22.04 | 20:30 |
cardoe | Which shouldn't have async device initialization | 20:30 |
fungi | oh, good, in that case i can cancel my boot attempt i was testing that exact question with | 20:31 |
fungi | though yeah it already failed on the same problem before i could cancel | 20:32 |
fungi | so, double-confirmed i guess | 20:32 |
cardoe | So removed that property on all the images. | 22:50 |
cardoe | issue on Flex should go away. | 22:50 |
fungi | thanks cardoe! easily worked around, but i'm sure it was going to trip up other users who might not be in as good a position to figure it out | 23:53 |