*** mlavalle has quit IRC | 00:00 | |
fungi | sounds likely | 00:04 |
---|---|---|
*** brinzhang_ has joined #opendev | 00:09 | |
*** brinzhang0 has quit IRC | 00:12 | |
*** tosky has quit IRC | 00:17 | |
corvus | fungi, frickler: fungi just made me aware of this change: https://review.opendev.org/773710 because i spent about 40 minutes trying to figure out why a job which previously passed suddenly didn't have enough ram | 00:43 |
corvus | i am strongly in favor of limiting the memory in use on those nodes to match the other providers | 00:44 |
fungi | yeah, we merged that quickly because it was requested by a representative of the resource donor, indicating the old flavor was going to be removed | 00:44 |
corvus | honestly, it's fine if we want to supply nodes with more ram | 00:44 |
corvus | but it's *really* important that nodes with the same nodepool label are at least roughly equivalent | 00:44 |
fungi | i was tempted to also announce it on the ml, as it worried me, but there didn't seem to be much concern from anyone else at the time so figured we'd address it when it became a problem | 00:45 |
corvus | it's a problem, and i'm concerned :) | 00:45 |
corvus | was it decided that v3-standard-2 was not cpu sufficient? | 00:47 |
corvus | https://vexxhost.com/pricing/ | 00:47 |
corvus | 2 cores, 8gb | 00:47 |
fungi | (brief) discussion in addition to what's in the change comments happened in here http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-02-02.log.html#t2021-02-02T15:33:33-2 and then again later when i brought it up in the meeting http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.log.html#l-146 | 00:47 |
fungi | it seemed like the concern was that a 2 cpu server instance would be very slow relative to other providers | 00:47 |
fungi | where we're typically doing ~8 cpus | 00:48 |
*** hamalq has quit IRC | 00:48 | |
fungi | mnaser wanted the memory to balance the cpu count based on the ratios provided by his hardware | 00:48 |
corvus | yeah, i assume that's still balanced at 2 vcpus; and i *assume* that's not enough, but it's just an assumption; i'm curious if anyone looked into it | 00:49 |
fungi | we can certainly give the 2cpu flavor a try, i'm not opposed, just expecting it will be slow | 00:49 |
fungi | and no i don't think we tested it | 00:50 |
corvus | fungi: what labels have > 8g of ram? | 01:02 |
openstackgerrit | James E. Blair proposed openstack/project-config master: Revert "Remove restrict-memory" https://review.opendev.org/c/openstack/project-config/+/785081 | 01:07 |
corvus | that reverts a revert from 2016 | 01:07 |
corvus | fungi, frickler, clarkb, ianw: ^ honestly no idea if that will still work, and unfortunately, i don't have time to drive something of this complexity right now. i'd appreciate some help getting a fix in place. | 01:12 |
fungi | corvus: ubuntu-focal-arm64-xxxlarge, ubuntu-bionic-arm64-large, ubuntu-bionic-expanded-vexxhost, centos-7-expanded, ubuntu-bionic-expanded, ubuntu-bionic-32GB, multi-numa-ubuntu-bionic-expanded, multi-numa-centos-7-expanded | 01:15 |
fungi | those are the ones i'm finding in our launcher configs at least | 01:15 |
corvus | fungi: oof. i guess i missed a bunch then :( | 01:15 |
corvus | oh, because we use flavor-name for those instead of min-ram | 01:16 |
corvus | fungi: wait what about v2-highcpu-8 ? | 01:18 |
corvus | are they only in sjc1 and not in ymq-1 ? | 01:18 |
fungi | yeah, we have max-servers 0 in sjc1 | 01:21 |
fungi | all our capacity is in ymq-1 | 01:21 |
corvus | and do we know why v2-highcpu-8 wasn't used? is it not available or some other reason? | 01:22 |
fungi | i don't know, but we could ask mnaser when he's around | 01:22 |
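
The requirement corvus states above, that nodes sharing a nodepool label should be roughly equivalent across providers, can be sketched as a simple check. This is only an illustration: the label names, provider names, RAM figures and the 25% tolerance are all hypothetical, and the function does not read real nodepool launcher configs.

```python
def uneven_labels(ram_by_label, tolerance=0.25):
    """Return labels whose largest per-provider RAM exceeds the smallest
    by more than `tolerance` (a fraction, e.g. 0.25 for 25%)."""
    flagged = {}
    for label, ram_by_provider in ram_by_label.items():
        smallest = min(ram_by_provider.values())
        largest = max(ram_by_provider.values())
        if largest > smallest * (1 + tolerance):
            flagged[label] = (smallest, largest)
    return flagged


# Hypothetical data: RAM in MB per provider for each label.
SAMPLE = {
    "example-label-a": {"provider-1": 8192, "provider-2": 8192, "provider-3": 32768},
    "example-label-b": {"provider-1": 8192, "provider-2": 8192},
}

for label, (low, high) in uneven_labels(SAMPLE).items():
    print(f"{label}: RAM ranges from {low} MB to {high} MB")
```

A job that happens to pass on the largest provider can then fail elsewhere for lack of memory, which is the kind of surprise described earlier in the log.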
kevinz | fungi: corvus: Morning! Is there anything I can help? | 01:28 |
corvus | kevinz: with what? | 01:28 |
corvus | with the 8gb limit thing? | 01:29 |
kevinz | Maybe, I saw that clarkb said that "re centos arm64 images maybe we should email kevinz about trying fat32 config drives instead of iso?" | 01:29 |
kevinz | Arm64 CI related | 01:29 |
corvus | kevinz: oh i have no idea about that, sorry | 01:29 |
fungi | kevinz: we noticed our centos-8-arm64 images stopped booting sometime around friday, so were trying to figure out why (they just drop into a dracut emergency shell now according to the console log) | 01:30 |
fungi | we were trying to figure out what might have changed to cause that, and clarkb surmised it could be a change in iso support in dib | 01:31 |
kevinz | corvus: Np | 01:35 |
kevinz | fungi: OK, understand. So if you need any change from the arm64 cloud side, please let me know. | 01:37 |
fungi | kevinz: thanks, clarkb was suggesting as a possibility switching the configdrive type, but i don't think we've even determined what's causing dracut to timeout starting things. we probably need to boot a standard image or modified one there so we can log in and take a look | 01:40 |
kevinz | fungi: OK, np, thanks for letting me know :-) | 01:48 |
*** rh-jlabarre has joined #opendev | 01:55 | |
*** rh-jlabarre has quit IRC | 01:55 | |
*** rh-jlabarre has joined #opendev | 01:56 | |
*** rh-jelabarre has quit IRC | 01:56 | |
*** ykarel has joined #opendev | 04:11 | |
*** rh-jlabarre has quit IRC | 04:20 | |
*** ysandeep|away is now known as ysandeep | 04:59 | |
*** marios has joined #opendev | 05:09 | |
*** whoami-rajat has joined #opendev | 05:23 | |
*** sboyron has joined #opendev | 05:32 | |
*** ralonsoh has joined #opendev | 05:51 | |
*** remal has joined #opendev | 05:56 | |
*** lpetrut has joined #opendev | 06:02 | |
*** ykarel_ has joined #opendev | 06:18 | |
*** ykarel has quit IRC | 06:21 | |
*** remal has quit IRC | 06:22 | |
*** eolivare has joined #opendev | 06:27 | |
*** ykarel_ is now known as ykarel | 06:32 | |
*** slaweq has joined #opendev | 06:37 | |
*** lpetrut has quit IRC | 06:40 | |
*** fressi has joined #opendev | 06:42 | |
*** amoralej|off is now known as amoralej | 06:56 | |
*** rpittau|afk is now known as rpittau | 07:03 | |
*** andrewbonney has joined #opendev | 07:14 | |
openstackgerrit | Riccardo Pittau proposed openstack/diskimage-builder master: Convert multi line if statement to case https://review.opendev.org/c/openstack/diskimage-builder/+/734479 | 07:17 |
*** tosky has joined #opendev | 07:37 | |
*** artom has quit IRC | 07:46 | |
*** artom has joined #opendev | 07:46 | |
hrw | hm. looks like need to look where to find someone donating more arm64 nodes. check-arm64 queue looks overloaded | 07:48 |
*** ykarel has quit IRC | 07:56 | |
*** jpena|off is now known as jpena | 07:57 | |
*** ykarel has joined #opendev | 08:20 | |
*** tkajinam has quit IRC | 08:25 | |
*** tkajinam has joined #opendev | 08:26 | |
lourot | o/ zuul seems to be having random failures when git-cloning at the moment, sometimes from opendev.org, sometimes from github.com, see for example the last failures in this review: https://review.opendev.org/c/openstack/charm-ceph-iscsi/+/784421 - Is it a known issue? thanks! | 08:43 |
lourot | oh it's a cert issue on opendev.org: if you open https://opendev.org/openstack/charm-ops-openstack in your browser you'll see the cert isn't trusted anymore | 08:50 |
*** darshna has quit IRC | 08:50 | |
lourot | the cert just expired | 08:51 |
hrw | lourot: 6th June 2021 is expiration date | 08:52 |
hrw | lourot: cert was refreshed on 8th March | 08:52 |
lourot | for me it reads: | 08:53 |
lourot | Issued On Thursday, January 7, 2021 at 6:43:43 AM | 08:53 |
lourot | Expires On Wednesday, April 7, 2021 at 7:43:43 AM | 08:53 |
lourot | am I hitting a server that still has the old cert? | 08:53 |
hrw | looks like | 08:53 |
lourot | from gitea08.opendev.org | 08:53 |
hrw | I got gitea07.opendev.org | 08:54 |
hrw | that's why | 08:54 |
*** rpittau is now known as rpittau|bbl | 09:23 | |
kevinz | hrw: will consider adding some resources from Linaro Cambridge Colo side, the uk2.linaro.cloud | 09:44 |
hrw | hm. it is not only gitea08 which has ssl cert expired | 09:58 |
hrw | curl -o /requirements/upper-constraints.txt https://releases.openstack.org/constraints/upper/master | 09:59 |
hrw | INFO:kolla.common.utils.kolla-toolbox:curl: (60) SSL certificate problem: certificate has expired | 09:59 |
hrw | so random CI jobs fail depending on which area they work in | 10:05 |
*** dtantsur|afk is now known as dtantsur | 10:23 | |
*** fressi has quit IRC | 10:24 | |
yoctozepto | infra-root: at least one of opendev mirrors has an expired ssl cert, jobs fail randomly :-( ^^ | 10:34 |
openstackgerrit | chandan kumar proposed openstack/diskimage-builder master: Make DIB_DNF_MODULE_STREAMS part of yum element https://review.opendev.org/c/openstack/diskimage-builder/+/785138 | 10:36 |
*** ykarel has quit IRC | 10:41 | |
*** fressi has joined #opendev | 10:42 | |
*** ykarel has joined #opendev | 10:48 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/project-config master: Add Debian Bullseye nodepool images and wheels https://review.opendev.org/c/openstack/project-config/+/783613 | 11:02 |
yoctozepto | infra running rootless today ;-( | 11:02 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/project-config master: Add Debian bullseye wheel cache publish jobs https://review.opendev.org/c/openstack/project-config/+/783633 | 11:03 |
*** fressi has quit IRC | 11:15 | |
*** fressi has joined #opendev | 11:17 | |
*** zoharm has joined #opendev | 11:30 | |
*** dpawlik4 has joined #opendev | 11:40 | |
*** dpawlik4 is now known as dpawlik | 11:42 | |
*** dtantsur is now known as dtantsur|bbl | 11:42 | |
*** pas-ha has joined #opendev | 11:47 | |
*** kopecmartin has quit IRC | 11:51 | |
*** kopecmartin has joined #opendev | 11:52 | |
*** pas-ha has left #opendev | 11:56 | |
openstackgerrit | Hervé Beraud proposed opendev/irc-meetings master: Switch the release team meeting to 2pm UTC https://review.opendev.org/c/opendev/irc-meetings/+/785157 | 12:03 |
*** amoralej is now known as amoralej|lunch | 12:15 | |
*** artom has quit IRC | 12:27 | |
*** rh-jlabarre has joined #opendev | 12:28 | |
*** dtantsur|bbl is now known as dtantsur | 12:28 | |
*** fresta has joined #opendev | 12:34 | |
*** mgoddard has joined #opendev | 12:35 | |
*** rpittau|bbl is now known as rpittau | 12:51 | |
*** amoralej|lunch is now known as amoralej | 12:59 | |
*** lpetrut has joined #opendev | 13:16 | |
fungi | i'm working on disabling any gitea backends with expired certs now | 13:44 |
fungi | i'm not seeing an expired cert served from any of our 8 gitea backends... is anyone still seeing it? | 13:46 |
fungi | i suppose it could have been a transient problem with a stale apache worker which finally got recycled | 13:47 |
fungi | right now my browser is happy with https://gitea01.opendev.org:3000/ through https://gitea08.opendev.org:3000/ | 13:48 |
fungi | hrw: i don't know that the arm64 nodes are all that backed up, but we observed that as of friday we stopped being able to boot our centos-8-arm64 images there, the kernel loads but dracut times out waiting for something to start (doesn't say what) and then drops to an interactive recovery shell, but novnc doesn't seem to be working for it so we can't easily investigate and i'm personally not all that | 13:51 |
fungi | familiar with centos to begin guessing what the problem is | 13:51 |
*** chkumar|ruck is now known as raukadah | 13:52 | |
fungi | clarkb suggested something might have changed with dib's support for iso configdrives and that switching the configdrive to vfat might be able to rule that out, but we're down to a skeleton crew and could use all the help we can get diagnosing problems | 13:52 |
corvus | fungi: i randomly got a gitea exp cert on 08 | 13:54 |
corvus | fungi: but via the lb, not directly to 3000 | 13:55 |
fungi | corvus: like just now? | 13:56 |
corvus | yep | 13:56 |
fungi | the cert on it was replaced march 8, so the apache theory is strong | 13:56 |
fungi | i'll check process times | 13:56 |
corvus | fungi: i also get good certs from 08 | 13:56 |
fungi | i see 6 apache processes with start times prior to march 8 | 13:56 |
fungi | i'll restart apache there | 13:57 |
fungi | #status log cold restarted apache on gitea08.opendev.org as there were some stale worker processes which seemed to be serving expired certs from more than a month ago | 13:59 |
openstackstatus | fungi: finished logging | 13:59 |
fungi | this would also explain why our cert checker didn't alert us to that | 14:00 |
*** tinwood has joined #opendev | 14:02 | |
fungi | i need to go run some quick errands but should be back by 15:00 utc | 14:04 |
*** fressi has quit IRC | 14:18 | |
yoctozepto | thanks fungi | 14:27 |
*** artom has joined #opendev | 14:36 | |
hrw | fungi: thanks for info. I am not familiar with how dib works (despite having some changes there). Can add it to todo but no idea will find time/ideas | 14:45 |
zbr | certificate expired on opendev.org? https://zuul.opendev.org/t/openstack/build/479f1cbbc2a34d259cbeb341e4075d57 | 14:54 |
*** lpetrut has quit IRC | 14:54 | |
yoctozepto | zbr: fungi fixed that recently | 14:57 |
*** ysandeep is now known as ysandeep|dinner | 15:00 | |
*** kopecmartin has quit IRC | 15:01 | |
*** kopecmartin has joined #opendev | 15:01 | |
zbr | this was like 2 minutes before I posted the message, so unless he fixed them during the last 15-20 mins. | 15:03 |
clarkb | I know we put mitigations in place for that on some apache configs, iirc we set worker request limits | 15:03 |
clarkb | it is possible more than one gitea backend has stale apache workers. Note everyone is able to check the backends directly, you can point s_client or your browser at them and see what you get. However if it is stale apache backends then you will only get errors some percentage of the time | 15:06 |
clarkb | give me a minute to load keys and I'll look | 15:06 |
zbr | thanks! | 15:07 |
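
Because a stale Apache worker serves the old certificate only some of the time, a single check can miss it. A minimal sketch of the repeated probing clarkb describes, using only the Python standard library; the attempt count is an arbitrary assumption, and port 3000 for the direct backend comes from the URLs fungi mentions in the log.

```python
import socket
import ssl
from collections import Counter


def probe_once(host, port, timeout=5.0):
    """Attempt one verified TLS handshake; return "ok" or a short failure reason."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries OpenSSL's reason for rejecting the certificate
        return exc.verify_message
    except OSError as exc:
        return f"connection error: {exc}"


def probe_many(host, port, attempts=20):
    """Repeat the handshake and count each distinct outcome."""
    return Counter(probe_once(host, port) for _ in range(attempts))


def failure_rate(counts):
    """Fraction of attempts that did not end in a verified handshake."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - counts["ok"] / total


# Example use (performs real network connections, so it is left commented out):
# counts = probe_many("gitea08.opendev.org", 3000)
# print(counts, failure_rate(counts))
```

A non-zero failure rate against a backend whose certificate file on disk is current would be consistent with the stale-worker theory, though it does not prove it.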
fungi | i can go through and restart apache on all the backends i suppose, if we want to be completely sure | 15:09 |
clarkb | 06 had a couple of processes from february and I've restarted apache there now too | 15:11 |
fungi | looks like https://zuul.opendev.org/t/openstack/build/479f1cbbc2a34d259cbeb341e4075d57/log/job-output.txt#783 was indeed ~50 minutes after i restarted apache on gitea08 so there may be more than one backend in a similar situation still | 15:12 |
clarkb | 05 and 04 also have older processes but not that old. I'll restart them too | 15:12 |
clarkb | the others look fine | 15:12 |
clarkb | oh I guess 07 has a slightly old set too | 15:12 |
fungi | yeah, any with worker processes from prior to whatever the timestamp on /etc/letsencrypt-certs/gitea0?.opendev.org/gitea0?.opendev.org.cer is | 15:13 |
fungi | which i suppose could vary between backends | 15:13 |
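
The rule fungi gives here, that a worker is suspect if it started before the certificate file was last written, can be sketched as a small comparison. The PIDs and worker start times below are invented for illustration; only the March 8 replacement date comes from the log, and its time of day is assumed. Collecting real start times (for example from `ps`) is left out.

```python
from datetime import datetime


def stale_workers(worker_start_times, cert_mtime):
    """Return the sorted PIDs of workers that started before the certificate
    file was last modified, and so may still hold the old certificate."""
    return sorted(
        pid for pid, started in worker_start_times.items() if started < cert_mtime
    )


# Certificate replaced on March 8 per the log; midnight is an assumption.
CERT_REPLACED = datetime(2021, 3, 8)

# Hypothetical worker processes: PID -> start time.
SAMPLE_WORKERS = {
    101: datetime(2021, 2, 20, 4, 0),
    102: datetime(2021, 3, 1, 12, 30),
    103: datetime(2021, 3, 15, 9, 0),
}

print(stale_workers(SAMPLE_WORKERS, CERT_REPLACED))
```

Since the certificate timestamp can differ between backends, the comparison would need to be made per host.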
clarkb | we can port the worker limits from mirrors to the giteas if we haven't already | 15:14 |
fungi | i was wondering about that, i'll see if i can find them | 15:14 |
zbr | that is one moment when i am proud about my homelab using traefik which takes care of cert management itself (including renewals) | 15:14 |
clarkb | zbr: we have automated renewals as well. The problem is that when you tell apache to gracefully reload it does so at its own pace | 15:15 |
clarkb | it's not a problem with cert renewals | 15:15 |
clarkb | it's an issue with apache reload being too graceful | 15:15 |
fungi | looks like we're setting MaxConnectionsPerChild in /etc/apache2/conf-enabled/connection-tuning.conf on the mirrors | 15:15 |
zbr | stuck processes I guess? | 15:15 |
clarkb | zbr: yes | 15:15 |
fungi | zbr: not really "stuck" but rather by default apache doesn't recycle worker processes | 15:16 |
clarkb | fungi: though a graceful reload is supposed to eventually turn them over aiui | 15:16 |
fungi | right, but the default turnover is essentially set to never | 15:16 |
zbr | tbh, i did not use apache in many years, more of nginx guy. | 15:16 |
zbr | i am reading the docs of MaxConnectionsPerChild -- and based on this i should assume that w/ default settings apache never restarts them? meaning graceful-restart never finishes? | 15:20 |
clarkb | fungi: hrw: kevinz: note I'm not sure it was anything to do with dib's support for isos. glean + simple-init leans on mount and the kernel for that. I was suggesting that something may have changed in centos to break that. I have been meaning to boot the upstream arm64 centos8 image and see if that helps give us any clues (if it reproducing maybe the console works there, if it doesn't reproduce we | 15:21 |
clarkb | can compare package versions, etc) | 15:21 |
hrw | clarkb: and it is on c8 not cs8, right? | 15:22 |
*** andrewbonney has quit IRC | 15:22 | |
* hrw in kolla meeting | 15:22 | |
fungi | hrw: it's both actually | 15:22 |
hrw | thanks | 15:22 |
fungi | i confirmed centos-stream-8-arm64 is breaking the same way | 15:22 |
clarkb | zbr: I think it's a bit more forceful than that, but by setting our own limits we can cycle much more quickly | 15:23 |
clarkb | zbr: it runs apachectl graceful which sends sigusr1 to apache if you want to go figure out exactly what it does | 15:24 |
zbr | GracefulShutdownTimeout also defaults to 0, which may explain why very old processes were never restarted. | 15:24 |
*** andrewbonney has joined #opendev | 15:24 | |
zbr | would a 60 (s) limit be too aggressive? | 15:25 |
zbr | usually if you want to restart you have a need, so any value between 60-1800 would seem ok to me. | 15:26 |
fungi | zbr: we've been setting MaxConnectionsPerChild 8192 which seems to work fine | 15:27 |
Alex_Gaynor | FYI we're disabling centos arm64 builders on pyca/cryptography due to the booting issues. Is there a good way to get notified when they're working again, so we can re-enable? | 15:27 |
fungi | the workers will each field far more connections than that in a month | 15:27 |
*** mlavalle has joined #opendev | 15:28 | |
zbr | then there was at least one immortal among them :D | 15:28 |
fungi | Alex_Gaynor: yeah we can give you a heads up in here | 15:30 |
Alex_Gaynor | fungi: 🙇♂️ much obliged, thanks! | 15:30 |
fungi | zbr: well, no, i mean we've set that elsewhere (static sites, mirror servers...) just not on our gitea backends | 15:30 |
fungi | i'm adding that now | 15:31 |
fungi | change will be up for review in a few minutes | 15:31 |
*** lourot has quit IRC | 15:32 | |
clarkb | #status log Restarted apache on gitea 04-07 to clean up additional stale processes which may have served old certs | 15:34 |
openstackstatus | clarkb: finished logging | 15:34 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Set MaxConnectionsPerChild 8192 for Gitea backends https://review.opendev.org/c/opendev/system-config/+/785226 | 15:38 |
fungi | clarkb: corvus: zbr: yoctozepto: hrw: ^ that should prevent the issue in the future | 15:40 |
fungi | prevent the stale certs problem i mean | 15:40 |
*** ykarel is now known as ykarel|away | 15:41 | |
hrw | fungi: super | 15:47 |
*** zoharm has quit IRC | 15:52 | |
zbr | apparently gerrit is no longer responding | 15:58 |
*** lourot has joined #opendev | 15:59 | |
fungi | current load average is only around 13, so actually fairly low for this time of day | 15:59 |
*** hamalq has joined #opendev | 16:00 | |
clarkb | I was just able to +2 fungi's change | 16:01 |
clarkb | and my dashboard loads | 16:01 |
*** ykarel|away has quit IRC | 16:02 | |
*** sshnaidm is now known as sshnaidm|afk | 16:06 | |
fungi | maybe it was only briefly choked | 16:07 |
*** dtantsur is now known as dtantsur|afk | 16:18 | |
*** ysandeep|dinner is now known as ysandeep | 16:28 | |
clarkb | it just occurred to me that another way to test the arm64 centos-8 image could be with qemu emulating arm64 | 16:29 |
clarkb | https://nb03.opendev.org/images/ the images are there if anyone wants to try ^ they expect a config drive. cc hrw | 16:29 |
*** marios is now known as marios|out | 16:31 | |
openstackgerrit | Merged opendev/irc-meetings master: Switch the release team meeting to 2pm UTC https://review.opendev.org/c/opendev/irc-meetings/+/785157 | 16:43 |
hrw | clarkb: [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.18.0-294.el8.aarch64 root=/dev/mapper/loop7p3 ro | 16:44 |
hrw | clarkb: root= value is weird | 16:44 |
fungi | i did see a kmesg in the consoles about a missing loop device, but it didn't look fatal | 16:45 |
fungi | however i hadn't considered it in conjunction with that kernel command line | 16:45 |
*** avass has joined #opendev | 16:46 | |
hrw | changed by hand to root=/dev/vda3 and it boots fine | 16:46 |
*** rpittau is now known as rpittau|afk | 16:46 | |
hrw | CentOS Stream 8 | 16:46 |
hrw | Kernel 4.18.0-294.el8.aarch64 on an aarch64 | 16:46 |
hrw | localhost login: | 16:46 |
hrw | so image build process needs fixing | 16:47 |
hrw | I suspect that centos 8 has the same issue | 16:48 |
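
The symptom hrw found can be spotted automatically from a console log. A minimal sketch, assuming that any `root=` value under `/dev/loop` or `/dev/mapper/loop` is a leftover build-time device; the function names are hypothetical and the sample command line is the one quoted in the log.

```python
def root_argument(cmdline):
    """Return the value of the root= parameter on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("root="):
            return token[len("root="):]
    return None


def root_is_build_time_loop_device(cmdline):
    """True when root= points at a loop device, which exists on the image
    builder but not on the booted instance."""
    root = root_argument(cmdline)
    return root is not None and root.startswith(("/dev/loop", "/dev/mapper/loop"))


BROKEN = "BOOT_IMAGE=/boot/vmlinuz-4.18.0-294.el8.aarch64 root=/dev/mapper/loop7p3 ro"
FIXED_BY_HAND = "BOOT_IMAGE=/boot/vmlinuz-4.18.0-294.el8.aarch64 root=/dev/vda3 ro"

print(root_is_build_time_loop_device(BROKEN))
print(root_is_build_time_loop_device(FIXED_BY_HAND))
```

A check like this could run against a freshly built image's grub configuration before upload, though where that hook would live is not something the log settles.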
clarkb | I wonder why that only affects arm64 centos. That makes me suspect a bug in centos grub though thats mostly due to process of elimination | 16:48 |
*** eolivare has quit IRC | 16:48 | |
clarkb | however dib does use loop devices so maybe its detecting the wrong device when it does the install. We also have the build logs at https://nb03.opendev.org/ | 16:48 |
*** marios|out has quit IRC | 16:49 | |
clarkb | maybe that will offer clues as to how that device is selected | 16:49 |
fungi | i agree, it does sound very much like some new centos mechanism has started trying to autodetect the rootfs at build time, and seems to be exclusive to the bootloader setup for arm | 16:50 |
fungi | could it be (u)efi-related? and we do mbr on x86? | 16:51 |
clarkb | yes good point | 16:55 |
clarkb | mbr across x86 as far as I know | 16:55 |
hrw | dib supports both mbr and uefi on x86. I do not know which one you use | 16:57 |
fungi | we set hw_firmware_type to uefi in the meta block for arm64 diskimages and not for the others. i guess mbr must be the default? | 16:59 |
hrw | probably | 17:00 |
hrw | hw_firmware_type=uefi is not needed since mitaka | 17:00 |
hrw | I got it to be default in nova then | 17:00 |
fungi | ahh, so that doesn't control image properties i guess | 17:00 |
hrw | it is saying 'dear nova, please use uefi as a bootloader while booting VM with this image' | 17:01 |
hrw | while nova assumes uefi on aarch64 unless said otherwise | 17:01 |
mordred | it would be amazing if that was how api calls were written | 17:01 |
mordred | POST "dear nova, please use uefi as a bootloader while booting VM with this image" | 17:02 |
fungi | just give ai/ml a little more time | 17:02 |
mordred | RESPONSE "I understand you want to order a bucket of fried chicken" | 17:02 |
hrw | mordred: nope, too many different versions of english | 17:02 |
hrw | and someone could use 'my dear nova' or just 'nova' | 17:03 |
hrw | or 'you #@$@#%^@^@$#%@ nova $@#$@%@' | 17:03 |
hrw | https://marcin.juszkiewicz.com.pl/2018/01/04/today-i-was-fighting-with-nova-no-idea-who-won/ | 17:03 |
mordred | hrw: I mean - I do the 'you #@$@#%^@^@$#%@ nova $@#$@%@' version in the comments of openstacksdk already ... :) | 17:04 |
hrw | when I was writing it my API calls would involve curses | 17:04 |
hrw | mordred: please. 'openstack HERE-WE-SPEAK-TO-NOVA' vs novaclient is not something I can discuss while sober | 17:05 |
hrw | bbiab - have to go and buy some stuff | 17:05 |
mordred | hrw: we'll schedule it for when we're drinking some time | 17:05 |
*** jpena is now known as jpena|off | 17:08 | |
fungi | clarkb: hrw: okay, so we add the block-device-efi element in the config for our arm builder, but not in the config for our x86 builders: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nb03.opendev.org.yaml#L81 | 17:08 |
fungi | which means this element is only used in our arm images: https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/block-device-efi | 17:10 |
fungi | and all that really seems to do is export DIB_BLOCK_DEVICE=efi in the environment for other elements to key on | 17:11 |
clarkb | ya so very likely x86 + mbr + centos8 is fine but arm64 + uefi + centos8 trips some issue with device detection | 17:11 |
fungi | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/finalise.d/50-bootloader has some switching on that value, seems to be the main place it's used | 17:13 |
*** amoralej is now known as amoralej|off | 17:14 | |
*** ysandeep is now known as ysandeep|away | 17:16 | |
clarkb | install-packages -m bootloader grub-efi-$ARCH | 17:16 |
* clarkb looks at a build log to see if it emits device choices there | 17:17 | |
fungi | maybe this section? https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/finalise.d/50-bootloader#L244-L255 | 17:18 |
clarkb | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/finalise.d/50-bootloader#L14-L16 contains the loop device in it | 17:19 |
hrw | mordred: ok | 17:19 |
clarkb | but that seems to only be used by ppc in that script | 17:19 |
clarkb | https://nb03.opendev.org/centos-8-arm64-0000036826.log is the log I am looking at | 17:19 |
hrw | I need dib shell command so can recreate locally | 17:20 |
clarkb | hrw: something like `disk-image-create block-device-efi vm simple-init initialize-urandom growroot journal-to-console centos-minimal epel` | 17:21 |
clarkb | that should default to centos 8 as latest centos and you need to run it on aarch64 host | 17:22 |
hrw | clarkb: thanks | 17:22 |
clarkb | I've stripped out infra specific elements that cache repos and stuff as they significantly lengthen the build time | 17:22 |
*** brinzhang_ has quit IRC | 17:22 | |
fungi | we also have diskimage-builder/src/branch/master/diskimage_builder/elements/block-device-efi/environment.d/15-block-device.bash though you'd need to uncomment line 39 there | 17:22 |
fungi | er, sorry, pasted from wrong buffer | 17:23 |
fungi | https://opendev.org/openstack/project-config/src/branch/master/tools/build-image.sh | 17:23 |
*** brinzhang_ has joined #opendev | 17:23 | |
hrw | and no initialize-urandom | 17:23 |
hrw | heh. dib assumes centos on host too ;( | 17:24 |
clarkb | hrw it shouldn't we cross build | 17:24 |
clarkb | using debian, you do have to have some rpm tools though iirc | 17:24 |
clarkb | if you run on centos then they will already be present | 17:24 |
*** lbragstad has quit IRC | 17:25 | |
clarkb | Creating fs command [['mkfs', '-t', 'ext4', '-i', '4096', '-J', 'size=512', '-L', 'cloudimg-rootfs', '-U', '4c9f96ba-63ca-443f-9721-080da3b255de', '-q', '/dev/mapper/loop7p3']] create /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/level2/mkfs.py:132 <- that is from earlier in the build where the actual fs is written | 17:26 |
clarkb | I suspect that we want grub-efi-aa64 to use labels or uuids and not paths | 17:26 |
clarkb | since it is grub-efi-aa64 that should be writing the kernel command line info right? | 17:27 |
hrw | would be best | 17:27 |
hrw | clarkb: it can be either grub which generate it on fly or rootfs | 17:30 |
clarkb | looks like the centos8 grub2-efi-aa64 package (and related modules etc packages) were last updated on march 2. There are two versions though, an el8 version and an el8_3.1 | 17:31 |
clarkb | el8_3.1 is the newer one | 17:31 |
clarkb | and build log says we install _3.1 | 17:32 |
fungi | i wonder how far back we started using that | 17:33 |
fungi | seems like we had working images until last week | 17:33 |
fungi | but also, possible something else was breaking centos image builds until last week and that was the first time it updated since early march | 17:34 |
clarkb | ya | 17:34 |
clarkb | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/finalise.d/50-bootloader#L198 is likely related? | 17:36 |
clarkb | note the line above we specifically tell it to use the label, now to check if the logs show us that label is set to the label and not the path | 17:36 |
clarkb | 2021-04-06 23:29:22.724 | + echo GRUB_DEVICE=LABEL=cloudimg-rootfs | 17:37 |
fungi | oh, yep | 17:38 |
clarkb | it appears we are explicitly attempting to set the root device via a label | 17:38 |
clarkb | and that label matches the label set in the mkfs above | 17:38 |
fungi | which maybe has stopped working with uefi on centos? | 17:39 |
fungi | (the label detection i mean) | 17:39 |
*** klonn has joined #opendev | 17:39 | |
clarkb | at build time or boot time? | 17:40 |
fungi | if the problem is there in the config, then at boot time i guess | 17:42 |
hrw | root=LABEL=cloudimg-rootfs boots fine | 17:42 |
fungi | though that doesn't square with how the root=/dev/loopfoo is getting into the config | 17:42 |
clarkb | hrw: ya and we seem to explicitly try to set that value in our script | 17:43 |
clarkb | so at least the intended behavior is good, we just have to figure out how to apply it :) | 17:44 |
clarkb | we run grub2-mkconfig twice with two different outputs https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/finalise.d/50-bootloader#L177-L179 and https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/bootloader/finalise.d/50-bootloader#L229 | 17:46 |
clarkb | the first one is uefi specific | 17:46 |
clarkb | the first one also happens prior to us setting GRUB_DEVICE=LABEL=cloudimg-rootfs | 17:46 |
clarkb | are we maybe running that too early or we should be setting those grub.cfg options prior? | 17:47 |
fungi | i wonder if there's useful output in the build log to tell us | 17:47 |
clarkb | fungi: I'm looking at the output to find ^ | 17:47 |
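
The ordering question clarkb raises, whether grub2-mkconfig runs before GRUB_DEVICE is written, could be answered by scanning the build log in order. A rough sketch under stated assumptions: it assumes the log contains shell trace lines naming `grub2-mkconfig`, which the excerpts quoted here do not confirm. The first sample line below is a made-up placeholder; only the `echo GRUB_DEVICE=...` line is taken from the log.

```python
def mkconfig_before_device_set(log_lines):
    """Return True if a grub2-mkconfig call is seen before any line that
    writes GRUB_DEVICE, meaning that config was generated without the label."""
    for line in log_lines:
        if "GRUB_DEVICE=" in line:
            return False
        if "grub2-mkconfig" in line:
            return True
    return False


SAMPLE_LOG = [
    # Hypothetical placeholder, not copied from a real build log:
    "<timestamp> | + grub2-mkconfig -o <efi grub.cfg path>",
    # Quoted from the build log earlier in this conversation:
    "2021-04-06 23:29:22.724 | + echo GRUB_DEVICE=LABEL=cloudimg-rootfs",
]

print(mkconfig_before_device_set(SAMPLE_LOG))
```

If the real log shows the same ordering, the first generated config would fall back to whatever device grub detects at build time, which would match the loop device seen on the kernel command line.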
*** slaweq has quit IRC | 17:47 | |
clarkb | what isn't clear to me is which one is used at boot time, but wouldn't be surprised if a uefi specific config is used when booting uefi :) | 17:47 |
hrw | the one from L177 | 17:48 |
hrw | on normal system /etc/grub.cfg is often symlink to efi one | 17:49 |
clarkb | hrw: ok in that case we set BOOT_DEVICE after L177 which may explain the issue | 17:49 |
clarkb | hrw: oh interesting I wonder if that was the case until recently on centos maybe? | 17:49 |
clarkb | so when we updated the normal path before it was updating the efi file | 17:49 |
clarkb | but now perhaps not | 17:49 |
hrw | commit 27a326dafb621269c501225fd4842615ca4adf73 | 17:51 |
hrw | Author: Steve Baker <sbaker@redhat.com> | 17:51 |
hrw | Date: Fri Mar 5 16:35:21 2021 +1300 | 17:51 |
hrw | Support secure-boot bootloader where possible | 17:51 |
hrw | I suspect that part | 17:51 |
hrw | dib-- | 17:52 |
hrw | 2021-04-07 17:50:36.484 | diskimage_builder.block_device.exception.BlockDeviceSetupException: exec_sudo failed | 17:52 |
hrw | in shell sudo works.. | 17:52 |
hrw | ah. no gdisk in system ;d | 17:52 |
clarkb | hrw: ya I'm noticing the code that is bad mentions secure boot | 17:54 |
clarkb | I'm trying to write up a quick change (that is probably wrong) but at least gives us enough breadcrumb to follow | 17:55 |
hrw | clarkb: https://paste.centos.org/view/6f166b9d like? | 17:56 |
hrw | generate /etc/default/grub and then handle generation of both grub configs | 17:56 |
hrw | each dib run shows me what my aarch64 system lacks ;D | 17:58 |
hrw | gdisk, kpartx, dosfstools... | 17:59 |
hrw | to make things funnier - iirc I added them to dib... | 17:59 |
clarkb | hrw: you might be able to grab the zuul/nodepool-builder docker image to avoid finding all that stuff | 18:00 |
clarkb | hrw: I'm moving just the GRUB_MKCONFIG calls | 18:00 |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Properly set grub2 root device when using efi https://review.opendev.org/c/openstack/diskimage-builder/+/785247 | 18:02 |
clarkb | hrw: ^ something like that | 18:02 |
clarkb | hrw: what isn't clear to me is why run `$GRUBNAME --modules="$modules" $extra_options $GRUB_OPTS $BOOT_DEV` only when /boot/efi/$EFI_BOOT_DIR does not exist | 18:05 |
*** klonn has quit IRC | 18:05 | |
fungi | i wonder if that was meant to skip preexisting efi setups | 18:06 |
clarkb | the ubuntu focal build shows it is doing that skip which is why those haven't broken | 18:06 |
hrw | no idea too | 18:06 |
hrw | ok, I have to end a day | 18:06 |
fungi | thanks for the help hrw! this was very useful | 18:07 |
clarkb | we run grub-mkconfig once at the very end of the script against /boot/grub/grub.cfg on ubuntu-focal | 18:07 |
hrw | fungi: yw | 18:07 |
clarkb | that likely explains why only centos8 is broken | 18:07 |
* hrw off | 18:07 | |
fungi | yes, i figured it wasn't necessarily an efi problem, merely a problem exposed by some of the efi-specific switching logic | 18:08 |
clarkb | the change that seems to have introduced the problem does set EFI_BOOT_DIR="EFI/ubuntu" in the ubuntu-common element | 18:09 |
clarkb | but our build logs imply that doesn't exist | 18:10 |
clarkb | on my local nas server that path does exist though | 18:11 |
*** andrewbonney has quit IRC | 18:13 | |
clarkb | reading the commit message and going through the logical paths of the script I think my change may actually be the fix | 18:15 |
*** lbragstad has joined #opendev | 18:19 | |
*** sboyron has quit IRC | 18:29 | |
clarkb | fungi: re the flavor change change, I didn't sense urgency in the commit message when I read it, but it was early morning for me iirc and I may have missed it | 18:41 |
*** rh-jlabarre has quit IRC | 19:06 | |
*** sboyron has joined #opendev | 19:10 | |
*** sboyron has quit IRC | 19:11 | |
*** sboyron has joined #opendev | 19:12 | |
fungi | commit message said the flavor we were using was being deleted, change was pushed by an operator from the donor who also immediately pinged us in here upon pushing it | 19:16 |
*** sboyron has quit IRC | 19:16 | |
fungi | clarkb: okay, so current best theory is the bug was introduced in 3.7.1 which was tagged a week ago... timing roughly matches when we started seeing problems | 19:19 |
fungi | we could roll back to 3.7.0 temporarily, though we're currently running 3.8.0 which includes fixes for our debian-bullseye images | 19:20 |
clarkb | fungi: ah I totally missed that the old flavors have been removed | 19:24 |
clarkb | fungi: ya, I also suspect that my fix would actually fix it | 19:24 |
clarkb | though testing is difficult. I guess we can deploy the nodepool-builder image made by check to nb03 and test it that way? | 19:24 |
fungi | oh, that's a good idea | 19:26 |
fungi | yeah i was trying to figure out a good hotfix which we could use to exercise the patch | 19:26 |
*** weshay|ruck has left #opendev | 19:31 | |
*** ralonsoh has quit IRC | 19:50 | |
*** slaweq has joined #opendev | 19:53 | |
openstackgerrit | Tristan Cacqueray proposed opendev/gerritlib master: Add ignore events filter https://review.opendev.org/c/opendev/gerritlib/+/785262 | 19:57 |
*** whoami-rajat has quit IRC | 19:57 | |
openstackgerrit | Tristan Cacqueray proposed opendev/gerritbot master: Ignore replication event https://review.opendev.org/c/opendev/gerritbot/+/785264 | 19:59 |
*** mgagne has joined #opendev | 20:22 | |
*** dmellado has quit IRC | 20:27 | |
*** dmellado has joined #opendev | 20:29 | |
*** spotz has quit IRC | 20:38 | |
clarkb | infra-root I've put nb03 in the emergency file. Now I'm going to pause all the image builds but centos-8 and centos-8-stream on nb03 and run the image built for testing my change? | 21:18 |
fungi | clarkb: sounds like a good test | 21:19 |
clarkb | the only concern with that is it is the siblings image which means a few things will be installed from source. I'm double checking what those items are now | 21:19 |
fungi | you may need to delete old images, but that's no loss, they don't boot anyway | 21:19 |
clarkb | looks like openstacksdk and disk image builder are the expected siblings. I think we'll be ok with sdk | 21:21 |
clarkb | this didn't work | 21:25 |
clarkb | we seem to not build arm64 images in those jobs :( | 21:26 |
clarkb | I think I know a workaround to that though | 21:26 |
clarkb | does pausing only pause uploads but not builds? it seems that it is building an ubuntu image after I restarted it back on the old image which has arm64 version | 21:27 |
fungi | oh, so did you do it in the config or the cli? | 21:28 |
fungi | also there's pausing the image for a specific provider vs pausing the diskimage | 21:28 |
fungi | clarkb: stevebaker has a comment on 785247 too | 21:30 |
stevebaker | hey | 21:30 |
fungi | might be faster to hash stuff out in here ;) | 21:31 |
stevebaker | yes lets | 21:31 |
fungi | stevebaker: if you need to catch up, basically the secure boot change seems to have stopped us from being able to boot arm64 centos-(stream-)8 images | 21:32 |
*** lbragstad has quit IRC | 21:32 | |
clarkb | ok give me a minute to clean up the mess I just made (basically put nb03 back to normal) | 21:32 |
clarkb | but I'm around | 21:32 |
stevebaker | fungi: ack | 21:33 |
fungi | currently it tries to boot with a nonexistent loop device as the kernel root= parameter | 21:33 |
fungi | dracut gives up on mounting the rootfs eventually and drops to an emergency shell | 21:33 |
clarkb | infra-root I have restarted the builder on its normal image, updated the nodepool.yaml to unpause all images, and removed nb03 from the emergency file | 21:34 |
clarkb | this is the state it was in before I tried to test things | 21:34 |
stevebaker | hopefully my suggestion to copy the grub.cfg to /boot/grub2 at the end of the function only for amd64 will result in working arm64 *and* amd64 images which boot in legacy bios | 21:34 |
clarkb | stevebaker: I don't think your suggestion will fix anything. The problem is that block you modified is before we set grub settings so any grub config that is written then is bad | 21:35 |
clarkb | I think the fix for your concern is to run mkconfig twice or copy it later in the script | 21:36 |
stevebaker | clarkb: that is what I meant to suggest, build on https://review.opendev.org/c/openstack/diskimage-builder/+/785247 to copy it later in the script | 21:38 |
clarkb | stevebaker: gotcha, sorry the comment context was in the location that was the problem. I'm working on an update to do that now | 21:38 |
*** spotz has joined #opendev | 21:38 | |
stevebaker | clarkb, fungi: I'd like to discuss ubuntu secure boot too, when you're ready | 21:39 |
fungi | well, currently we're just trying to get arm jobs running again without breaking what you implemented so far, further feature work should probably be discussed in #openstack-dib, and when more of the dib maintainers are around | 21:42 |
stevebaker | ok | 21:42 |
* stevebaker joins that | 21:42 | |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Properly set grub2 root device when using efi https://review.opendev.org/c/openstack/diskimage-builder/+/785247 | 21:43 |
clarkb | I think ^ addresses both of the comments on the previous patchset | 21:43 |
clarkb | if you can review that and see if it makes sense I can push a change to nodepool that builds an arm64 image of that (or attempts to anyway) | 21:43 |
mordred | clarkb: your commit message makes sense to me | 21:44 |
fungi | awesome, was just about to ask what the plan was to work around the lack of test image builds for arm | 21:44 |
clarkb | I've removed the x86 image from nb03 too just to avoid any potential future confusion | 21:50 |
stevebaker | clarkb: I've commented | 21:51 |
clarkb | good catch. Do we expect an efi config to be different than a legacy config? It's all about path lookups in the bios/uefi system right? not the actual grub config (because once grub is running we want it to do the same thing in both scenarios)? | 21:54 |
clarkb | we can't use hardlinks because these files are on different partitions? and we can't use symlinks because uefi is looking for a specific partition uuid right? | 21:56 |
stevebaker | clarkb: I would assume they should be identical, until we come across a specific need to do otherwise | 21:57 |
*** slaweq has quit IRC | 21:57 | |
clarkb | stevebaker: should we always run the grub-install --modules on line 179? | 22:02 |
clarkb | based on comments above that line it seems to imply that is necessary for supporting legacy boot too | 22:02 |
stevebaker | clarkb: #172 handles dual legacy boot for x86 efi. So #179 is only needed for installing grub as the bootloader when that distro hasn't been properly set up for booting the secure boot shim. The redhat version of grub2 now errors when doing this to force proper shim secure boot. This is what prompted me to do that change in the first place | 22:07 |
clarkb | stevebaker: but that only addresses x86 | 22:08 |
clarkb | stevebaker: why not do the grub-install for everyone after setting the appropriate flags as before, then additionally ensure grub.cfg exists in the proper location? | 22:09 |
clarkb | it's not clear to me why those would be mutually exclusive if the point is to produce images that do legacy and efi | 22:09 |
clarkb | (since you need both to do that aiui) | 22:09 |
clarkb | is the problem that uefi will prefer the generic and not specific grub install? | 22:11 |
clarkb | oh wait I think I may be conflating a couple of things. There is grub support for legacy bios. grub support for generic uefi. grub support for specific uefi with the shim | 22:11 |
clarkb | I was treating the first two as if they were the same thing in my head | 22:11 |
*** ianw_pto is now known as ianw | 22:14 | |
ianw | o/ | 22:14 |
clarkb | I'm going to try and simplify some of the conditions in this script to make this more apparent | 22:14 |
stevebaker | clarkb: grub2-install now fails on redhat in the efi case (landing in rhel-8.4) so it can't just be run for everyone. I'm using the presence of /boot/efi/$EFI_BOOT_DIR as a flag for "the shim is the bootloader, assume everything is set up correctly to just use that instead of grub" | 22:14 |
stevebaker | ianw: hai! | 22:14 |
fungi | ianw: aren't you still on vacation? | 22:14 |
*** lbragstad has joined #opendev | 22:15 | |
clarkb | stevebaker: right, but it is failing because we set --removable right? | 22:15 |
clarkb | stevebaker: I think if we drop the --removable then rhel should be fine? | 22:15 |
ianw | fungi: no back thu/fri , driving to sydney next week and working from there for week two of school holidays :) | 22:16 |
clarkb | fungi: I think the dateline makes stuff weird | 22:16 |
ianw | easter makes things weird, we have easter friday and easter monday here as public holidays | 22:16 |
ianw | this all sounds super fun and i'll read through scrollback ... :/ | 22:17 |
clarkb | hrm --removable also prevents updating nvram settings /me reads to see if a flag does just that | 22:22 |
stevebaker | clarkb: It looks like --removable stops it messing with nvram, which you wouldn't want during an image build. The grub2-install error is because the redhat grub2 maintainer is very opinionated that every UEFI boot should be secure boot capable ;) and that /boot/efi/EFI/BOOT/BOOTX64.EFI should always be the shim bootloader, and never the grub binary. | 22:22 |
clarkb | stevebaker: ya, my concern is that we aren't installing the modules that we want. We're essentially crossing our fingers that the rhel package maintainer understands that you might need these various modules | 22:23 |
clarkb | though the manpage for grub2-install implies the default is all so we're doing an optimization by reducing the list? | 22:24 |
clarkb | stevebaker: what if we set --boot-directory to /boot/efi/EFI/centos ? | 22:25 |
stevebaker | clarkb: we can't write out any grub binary in the secure boot case, unless we can sign them. /boot/efi/EFI/BOOT/BOOTX64.EFI shim is signed by microsoft and adds keys so that redhat signed /boot/efi/EFI/centos/grubx64.efi can run. All we need to do is ensure /boot/efi/EFI/centos/grub.cfg is there for grubx64.efi to use. | 22:30 |
clarkb | stevebaker: and in the case that /boot/efi/EFI/BOOT/BOOTX64.EFI is grub and not the shim it knows to look at /boot/grub/grub.cfg? | 22:31 |
clarkb | it appears that my suse system does this but also with a shim | 22:32 |
clarkb | oh it has the config in both locations | 22:32 |
clarkb | stevebaker: I think I get it now. I'm just trying to make the script in dib more verbose about what is going on | 22:33 |
clarkb | I'll get a new ps up shortly | 22:33 |
stevebaker | clarkb: yeah I'm not sure about that case. Does grub2-install generate /boot/efi/EFI/BOOT/BOOTX64.EFI *and* /boot/efi/EFI/BOOT/grub.cfg? | 22:34 |
clarkb | stevebaker: my suse install does not | 22:36 |
clarkb | so it must be using /boot/grub/grub.cfg along with bios boot | 22:36 |
stevebaker | it must be, yeah | 22:37 |
stevebaker | this may be complicated by each distro forking grub and doing different things :/ | 22:38 |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Properly set grub2 root device when using efi https://review.opendev.org/c/openstack/diskimage-builder/+/785247 | 22:43 |
clarkb | stevebaker: fungi ianw ^ ok that tries to simplify things just a bit more to make it more clear about what is going on (also comments) | 22:44 |
ianw | i'm running a backup prune in a root screen on backup02 vexxhost | 22:45 |
clarkb | oh I've got the no secure boot warning in there twice now. Let me clean that up | 22:45 |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Properly set grub2 root device when using efi https://review.opendev.org/c/openstack/diskimage-builder/+/785247 | 22:46 |
stevebaker | clarkb: that looks good, thanks. I'll try building some images locally and trying (non) secure uefi and legacy boot, but I'm happy for this to land asap | 22:48 |
clarkb | stevebaker: thank you for talking me through that, it was really helpful to understand the various scenarios at play | 22:49 |
clarkb | I've rechecked https://review.opendev.org/c/zuul/nodepool/+/785286 which should get us an arm64 image we can use too | 22:50 |
ianw | clarkb: do you have an arm64 node setup as a builder already to try that? | 23:07 |
clarkb | ianw: no was going to put it on nb03 with everything but centos-8 paused | 23:08 |
*** tosky has quit IRC | 23:11 | |
fungi | that will hopefully also get centos-8-arm and centos-stream-8-arm jobs running again without having to wait for a dib release | 23:18 |
clarkb | the image build for that change failed though. Looking at it now | 23:18 |
fungi | they've been stuck and queuing up for roughly a week at this point | 23:18 |
clarkb | I think I see the issue (the first one to fail anyway :) ) | 23:19 |
clarkb | I doubt that will be done before I need to figure out dinner. If someone else wants to give it a go feel free. Otherwise I'll try it in the morning | 23:24 |
ianw | clarkb: sorry 785286 is what we need to look at? | 23:31 |
clarkb | ianw: https://review.opendev.org/c/openstack/diskimage-builder/+/785247 is the dib change and 785286 depends on it in nodepool to try and build an arm64 image with that fix in it | 23:32 |
clarkb | ianw: the jobs that run against dib only do x86 out of the box so the second change updates the job in nodepool to also do arm | 23:33 |
ianw | ok, so which one failed? | 23:34 |
clarkb | 785286's previous patchset failed to build the arm image | 23:35 |
clarkb | https://5acac5304d1654f87681-c72835273dbeced4955618c918040aa4.ssl.cf5.rackcdn.com/785286/1/check/nodepool-build-image-siblings/ab6540a/job-output.txt is the failure | 23:35 |
clarkb | current versions of both changes are passing | 23:36 |
clarkb | though the dib one did fail a nonvoting job | 23:36 |
ianw | ok, so in short we want to test the dib in that builder image building a centos image right? | 23:39 |
clarkb | yup and the key thing for this particular issue is the resulting centos image boots | 23:41 |
clarkb | the current images fail booting very early in the process | 23:42 |
clarkb | because root device target for the linux kernel as set by grub is /dev/mapper/loop7p3 but should be LABEL=cloudimg-rootfs (I think I got the two names correct) | 23:42 |
fungi | as in dracut essentially times out waiting for someone to hotplug the root device it's been told to look for | 23:42 |
clarkb | and the reason this happened is for centos efi builds we were running grub-mkconfig prior to setting GRUB_DEVICE=LABEL=cloudimg-rootfs in /etc/default/grub | 23:43 |
ianw | ok. i'll see about getting a test env up. | 23:44 |
ianw | i feel like the devstack arm64 work has continued, but iirc has issues fitting into 8gb. that's the requirement for getting an end-to-end arm64 test which would obviously be good | 23:44 |
clarkb | what my fix attempts to do is run grub-mkconfig only once then copy the resulting config to the appropriate locations once it has been written and had some tweaks applied to it | 23:45 |
ianw | https://grafana.opendev.org/d/T5zTt6PGk/afs?viewPanel=34&orgId=1&from=now-7d&to=now ... i think i underestimated the time openafs requires to keep the tarballs mirror in sync | 23:51 |