opendevreview | Merged zuul/zuul-jobs master: Add jammy testing https://review.opendev.org/c/zuul/zuul-jobs/+/851418 | 00:27 |
opendevreview | Merged zuul/zuul-jobs master: Sort supported platforms https://review.opendev.org/c/zuul/zuul-jobs/+/851419 | 00:36 |
opendevreview | Merged zuul/zuul-jobs master: Support subsets of platforms in update-test-platforms https://review.opendev.org/c/zuul/zuul-jobs/+/851420 | 00:43 |
opendevreview | Merged zuul/zuul-jobs master: Revert "Revert cri-dockerd changes" https://review.opendev.org/c/zuul/zuul-jobs/+/851425 | 00:44 |
opendevreview | Merged zuul/zuul-jobs master: Test ensure-kubernetes on all Ubuntu platforms https://review.opendev.org/c/zuul/zuul-jobs/+/851421 | 00:44 |
ianw | https://zuul.opendev.org/t/openstack/builds?job_name=propose-translation-update is showing a lot of green | 00:47 |
clarkb | ya I think elodilles confirmed it was looking good earlier today too | 00:48 |
clarkb | thank you for working through that | 00:48 |
ianw | i guess something like https://zuul.opendev.org/t/openstack/build/6c1c8a3b6f6042539ddbf9562f005fa7/console is really a failure of the package and its bindep dependencies | 00:49 |
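A rough illustration of that failure mode (not from the log; the checkout path is hypothetical): bindep reads a project's bindep.txt and reports missing system packages, so a build like the one linked above fails on the project's own declarations rather than the job infrastructure.

```bash
# Check a project's bindep declarations locally.
pip install bindep
cd some-project/    # hypothetical checkout containing a bindep.txt
bindep -b           # print only the names of missing system packages
```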
opendevreview | Merged openstack/project-config master: Synchronize diskimages on builders https://review.opendev.org/c/openstack/project-config/+/851444 | 00:55 |
*** ysandeep|out is now known as ysandeep | 01:54 | |
*** ysandeep is now known as ysandeep|afk | 03:24 | |
*** ysandeep|afk is now known as ysandeep | 03:38 | |
tonyb | How do I see which groups/users have access to a repo? I expected it to be https://review.opendev.org/admin/repos/openstack/pymod2pkg,access but that has an empty area where I'd expect the details to be | 04:24 |
tonyb | I can see the answer in the appropriate REST API | 04:30 |
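A minimal sketch of the kind of REST query tonyb means (assuming anonymous read access is sufficient here): the /projects/{name}/access endpoint returns the ACL data the web UI at .../admin/repos/openstack/pymod2pkg,access was not showing.

```bash
# Project name is URL-encoded; tail strips Gerrit's ")]}'" XSSI prefix line.
curl -s "https://review.opendev.org/projects/openstack%2Fpymod2pkg/access" \
  | tail -n +2 | python3 -m json.tool
```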
*** ysandeep is now known as ysandeep|break | 04:32 | |
*** ysandeep|break is now known as ysandeep | 05:36 | |
*** bhagyashris is now known as bhagyashris|ruck | 06:05 | |
*** ysandeep is now known as ysandeep|lunch | 07:14 | |
bbezak | Hi - I've noticed that 'rockylinux-8' nodes stopped working a couple of days ago. Do you happen to know what's up with that? - https://zuul.opendev.org/t/openstack/builds?job_name=kayobe-seed-vm-rocky8&project=openstack/kayobe | 07:37 |
opendevreview | Slawek Kaplonski proposed openstack/project-config master: Add new project "whitebox-neutron-tempest-plugin" in the x/ namespace https://review.opendev.org/c/openstack/project-config/+/851031 | 07:41 |
*** jpena|off is now known as jpena | 08:45 | |
*** akahat|ruck is now known as akahat | 09:12 | |
*** ysandeep|lunch is now known as ysandeep | 10:21 | |
*** rlandy|out is now known as rlandy | 10:35 | |
*** dviroel|afk is now known as dviroel | 11:15 | |
*** dviroel is now known as dviroel|rover | 11:18 | |
dtantsur | hi folks, could someone elaborate on the error message in https://review.opendev.org/c/openstack/ironic-inspector/+/851501 and why it ignored depends-on? | 11:40 |
dtantsur | this is definitely something that used to work | 11:40 |
*** soniya is now known as soniya|ruck | 11:52 | |
*** soniya|ruck is now known as soniya|afk | 11:52 | |
*** frenzy_friday is now known as frenzyfriday|rover | 11:54 | |
NeilHanlon | bbezak: I'm poking at that today, came to ask for some help myself :) | 12:17 |
fungi | tonyb: unfortunately, gerrit's security model is such that it errs on the side of only showing you access information for things you have access to do, so you can't actually see the acl entries for things that don't authorize you to do them | 12:19 |
fungi | tonyb: the workaround would be to parse the gerrit/projects.yaml and gerrit/acls/*/* files in the openstack/project-config repo | 12:20 |
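A sketch of that workaround (the ACL file path follows the usual project-config layout and is an assumption, not quoted from the log):

```bash
# projects.yaml maps each project to its acl-config; the ACL file itself
# lists who can approve, submit, etc.
git clone https://opendev.org/openstack/project-config
grep -A2 'project: openstack/pymod2pkg' project-config/gerrit/projects.yaml
cat project-config/gerrit/acls/openstack/pymod2pkg.config
```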
tonyb | fungi: but I can see the information via that rest API with curl | 12:20 |
fungi | i'm not surprised that they apply that security model inconsistently | 12:20 |
tonyb | Fair enough. i wanted to answer the question "who can land this patch" which I used to be able to answer in an older Gerrit at that url. | 12:23 |
tonyb | I got my answer; I just found it strange (a regression, really) that I can't do it in the web UI like I could | 12:23 |
tonyb | anyway, thanks fungi for the reply | 12:23 |
fungi | tonyb: i had a script for that at https://opendev.org/opendev/system-config/src/branch/master/tools/who-approves.py but it likely needs some updating for recent changes in gerrit | 12:24 |
fungi | it handled things like recursive group resolution too | 12:25 |
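Roughly the sort of group expansion who-approves.py performs, expressed as a direct REST call (the group name here is illustrative; non-public groups may need the authenticated /a/ prefix plus an HTTP password):

```bash
# List a Gerrit group's members, resolving included groups recursively.
curl -s "https://review.opendev.org/groups/pymod2pkg-core/members/?recursive" \
  | tail -n +2 | python3 -m json.tool
```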
fungi | bbezak: NeilHanlon: it looks like the images are being built successfully, just looking at the logs on our nodepool image builders. my guess is they've ceased to be bootable for some reason | 12:25 |
tonyb | fungi thanks. I'll check that out over the weekend | 12:26 |
fungi | bbezak: NeilHanlon: the most recently built one seems to be here if you want to give it a try locally: https://nb02.opendev.org/images/ | 12:27 |
fungi | dtantsur: a while back, zuul went from ignoring approvals on changes like that to registering a failure, so that you'd be aware | 12:30 |
fungi | the reason and solution remain the same: if changes depend on each other but their projects don't share a change queue, don't approve the depending change until the one it depends on has merged | 12:31 |
NeilHanlon | sweet, thanks fungi. checking out now | 12:31 |
dtantsur | fungi: hmm, I see. That's probably a message that did not come across, at least I somehow missed it. | 12:35 |
dtantsur | maybe an error message could be expanded with an explanation? | 12:35 |
NeilHanlon | fungi: so I grabbed that rocky8 qcow and booted it fine in KVM using virt-install. are there any logs I can look at for it trying to start up? | 12:47 |
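For reference, a local boot test along those lines might look like the following (a sketch only; --os-variant generic is a placeholder where no rocky8 osinfo id is available locally):

```bash
wget https://nb02.opendev.org/images/rockylinux-8-0000009494.qcow2
virt-install --name rocky8-test --memory 2048 --vcpus 2 \
  --disk path=rockylinux-8-0000009494.qcow2,format=qcow2 \
  --import --os-variant generic --graphics none
# --import boots the existing disk instead of running an installer, and
# --graphics none attaches the serial console so grub/kernel output is visible
```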
fungi | dtantsur: well, it already says "Change 851500 in project openstack/requirements does not share a change queue with 851501 in project openstack/ironic-inspector" as its reason. are you suggestion it should explain why changes have to share a change queue in order to be enqueued together, or that zuul start embedding links to relevant sections of its documentation in errors, or something else? | 12:57 |
fungi | s/suggestion/suggesting/ | 12:58 |
fungi | NeilHanlon: i think we might have a feature in nodepool which tries to capture the console log from nova for boot failures, i'll have to see if that happens by default (and where it gets written), or if it's something we have to turn on | 12:59 |
fungi | just booting isn't sufficient though, we need to be able to ssh into it for the boot to be considered successful, so if it's coming up with no networking or sshd isn't starting that could still explain it | 12:59 |
NeilHanlon | ah, yeah that makes sense, too. i'm also rebuilding my AIO here as I ruined it a few weeks ago and haven't had a chance to fix it. so I can test in nova too | 13:00 |
*** ysandeep is now known as ysandeep|out | 13:02 | |
dtantsur | fungi: well, it's confusing for those who got used to the old behavior (which I liked much more tbh). Like, it never shared a change queue, so the cause is unclear without a further explanation. | 13:03 |
fungi | ahh, yeah the old behavior was that it ignored your approval completely and you had to approve it again once the dependency merged | 13:04 |
fungi | now it actually gives you feedback explaining why it wasn't enqueued, rather than just silence | 13:04 |
fungi | NeilHanlon: i can confirm, our launcher logs indicate a timeout waiting for a connection to port 22 on the vm | 13:05 |
fungi | it reports that the node makes it to a running state in nova, but we can't reach it | 13:05 |
fungi | so something has happened with networking setup i guess, either in rocky itself or in dib | 13:06 |
NeilHanlon | fungi: is it safe to assume it's trying to ssh in with a password, not a pubkey? | 13:17 |
fungi | nope, we embed allowed keys into the image, but you can override them through configdrive | 13:19 |
fungi | first order of business though would be to see if there's even a reachable socket on 22/tcp | 13:19 |
fungi | if you can netcat or telnet to that and get an sshd banner back, then at least that much is working | 13:19 |
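The quick check fungi describes, with a placeholder node IP:

```bash
# If networking and sshd came up, connecting to 22/tcp should return a banner
# such as "SSH-2.0-OpenSSH_8.0" almost immediately.
nc -w 5 203.0.113.10 22
```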
*** dviroel|rover is now known as dviroel | 13:20 | |
fungi | as far as i can tell from our launcher logs, i don't think we're getting that far even | 13:20 |
Clark[m] | If the issue is with glean the openstack console log should record what glean has done or attempted to do. Often times with issues like this we end up needing to manually boot the image, check console log, then maybe rescue instance to edit something and try rebooting and so on | 13:21 |
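A sketch of that manual loop using the openstack CLI (the image, flavor, key, and server names are placeholders, not values from the log):

```bash
openstack server create --image rockylinux-8-0000009494 \
  --flavor m1.small --key-name mykey rocky8-debug
openstack console log show rocky8-debug   # see how far boot (and glean) got
openstack server rescue rocky8-debug      # boot a rescue image to edit the root disk
# ...fix whatever looks wrong on the attached disk, then:
openstack server unrescue rocky8-debug
openstack console log show rocky8-debug   # check whether the change helped
```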
NeilHanlon | on my qemu/kvm setup here, sshd is started and I can ssh to it | 13:22 |
NeilHanlon | (I recognize this is not the setup it's failing in, though) | 13:22 |
*** dviroel is now known as dviroel|afk | 13:25 | |
fungi | i can confirm, for example, that it's failing in rackspace but that's of course xen and not kvm. i'll see if i can also find a similar situation in another provider | 13:33 |
fungi | Clark[m]: did we or did we not add a feature to nodepool for capturing console logs automatically? | 13:34 |
Clark[m] | I think you might have to toggle it on for the label or the image | 13:36 |
Clark[m] | I've got to pop out now but can take a second look in a couple hours | 13:36 |
fungi | strangely, i see a bunch of rockylinux-8 nodes in a ready state in various providers, about half of which are over a day old | 13:37 |
fungi | and the rest are from roughly 4 hours ago | 13:37 |
fungi | about 20 in total | 13:38 |
fungi | yet they're not getting used to fulfill node requests | 13:38 |
fungi | no, nevermind. i was looking at image-list. those are images. *sigh* | 13:39 |
* fungi finishes his coffee | 13:39 | |
fungi | it looks like nl01 (our rackspace launcher) is the only one that's tried booting grep rockylinux-8 nodes in the past few hours, so maybe it's failing on xen but no other launchers are taking over the node request? | 13:42 |
fungi | s/grep // | 13:42 |
*** arxcruz|rover is now known as arxcruz | 13:43 | |
fungi | nevermind, i found an example in iweb from 07:38:10 utc, same story (timeout waifing for connection to port 22) | 13:48 |
fungi | so we're seeing it in a kvm provider too | 13:48 |
fungi | er, timeout waiting | 13:49 |
fungi | anyway, i think this confirms the problem isn't provider or backend specific, so we need to see what's being recorded to the console log (which i think will be easier in iweb due to rackspace only providing web-based console access) | 13:50 |
*** dasm|off is now known as dasm | 13:59 | |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Temporarily turn on console logs for rocky in iweb https://review.opendev.org/c/openstack/project-config/+/851519 | 13:59 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Revert "Temporarily turn on console logs for rocky in iweb" https://review.opendev.org/c/openstack/project-config/+/851520 | 13:59 |
fungi | i'll set the revert to wip for now | 13:59 |
*** pojadhav is now known as pojadhav|out | 14:08 | |
opendevreview | Merged openstack/project-config master: Allow Stackalytics maintainers to rewrite history https://review.opendev.org/c/openstack/project-config/+/851454 | 14:16 |
fungi | self-approving 851519 so we can start collecting logs on that asap | 14:18 |
opendevreview | Merged openstack/project-config master: Temporarily turn on console logs for rocky in iweb https://review.opendev.org/c/openstack/project-config/+/851519 | 14:31 |
fungi | deploy job for it is almost done running | 14:59 |
fungi | and it's deployed. now to wait for another rockylinux-8 boot attempt in iweb-mtl01 | 15:03 |
fungi | it doesn't appear to actually put the captured console in the launcher's debug log, nor are there any separate files for that i can see in the log dir | 15:25 |
fungi | nothing relevant in /tmp either | 15:25 |
fungi | the launcher doesn't need to be restarted after a config update, does it? | 15:26 |
Clark[m] | It shouldn't. It reloads the config each pass through the run loop | 15:35 |
fungi | maybe the console log collector is bitrotten? | 15:38 |
fungi | ahh, nope it's in there, i was just having trouble sifting it out | 15:43 |
fungi | very little console output captured because it's sitting at a grub> prompt | 15:45 |
fungi | looks like it probably has no idea what to boot | 15:46 |
fungi | no errors though that i can see, it's just not automatically booting. maybe it can't find the grub config? | 15:48 |
fungi | looks like we're generating a /boot/grub2/grub.cfg during image building at least | 15:51 |
fungi | starting at 08:29:32 in the log here: https://nb02.opendev.org/rockylinux-8-0000009494.log | 15:51 |
clarkb | is the device label wrong maybe? or did we switch to efi somehow? | 15:52 |
*** jpena is now known as jpena|off | 15:52 | |
fungi | it's not finding a /sys/firmware/efi and so doing mbr | 15:53 |
fungi | /usr/sbin/grub2-install '--modules=part_msdos part_gpt lvm biosdisk' --target=i386-pc --force /dev/loop0 | 15:53 |
clarkb | ok thats good. we want bios for the x86 clouds | 15:54 |
clarkb | fungi: often at this point a rescue instance or mounting the image locally can be helpful just to see what it looks like. With a rescue instance you can make changes and reboot to see if they fix it | 15:55 |
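Mounting the image locally, as suggested, could look roughly like this with libguestfs (read-only, so nothing in the image is disturbed):

```bash
guestmount -a rockylinux-8-0000009494.qcow2 -i --ro /mnt
cat /mnt/boot/grub2/grub.cfg    # does it contain any boot entries?
ls /mnt/boot                    # are the kernel and initramfs files present?
guestunmount /mnt
```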
fungi | looks like the image builds were failing on the 26th which is as far back as our log retention for those goes, so i suspect it went from "we have working images" to "we can't build images" for some period of time and then to "now we can build images again but they no longer boot in our providers" at which point people started getting node_error results | 15:55 |
clarkb | it is odd that the image would boot for NeilHanlon but not in the cloud if grub config is the problem | 15:55 |
clarkb | yes, all image builds were failing because the disks were full from the leaked fedora-35 images I cleaned up | 15:56 |
clarkb | once I cleared that up we had room to make images again and new images would've been created | 15:56 |
fungi | ahh, okay. so really no idea how long ago this broke, because it's likely the working nodes were booting from quite stale images | 15:56 |
clarkb | ya | 15:56 |
*** iurygregory_ is now known as iurygregory | 15:57 | |
fungi | last time i see any substantive changes for rocky image builds in dib was february when it started insisting on networkmanager | 15:59 |
fungi | so probably some outside factor changed | 15:59 |
fungi | we updated our container base images for nodepool on june 30, but that's also quite a while ago | 16:01 |
clarkb | and the container base images shouldn't affect building much | 16:01 |
clarkb | dib makes a chroot and isolates itself from the host env | 16:01 |
fungi | ahh, yep | 16:02 |
clarkb | it is probably some change to the distro itself that we aren't accommodating properly | 16:02 |
*** marios is now known as marios|out | 16:02 | |
fungi | "the distro" meaning rocky, not the builder's distro? | 16:03 |
clarkb | yes | 16:03 |
fungi | rocky linux 8.6 happened in may | 16:05 |
fungi | ~10 weeks ago | 16:05 |
NeilHanlon | there are some changes, definitely. i didn't anticipate any making the builds fail though. my apologies | 16:06 |
fungi | 9.0 happened a couple of weeks back though, are we accidentally installing that thinking it's v8? | 16:06 |
fungi | NeilHanlon: "recent" changes after 8.6? | 16:07 |
NeilHanlon | the latest containers I pushed in early July | 16:08 |
NeilHanlon | the only difference in the container package set is a langpack, so my assumption is there's a change in config somewhere/somehow going on | 16:12 |
*** dviroel|afk is now known as dviroel | 16:14 | |
fungi | i suppose we could try to figure out how to roll back to the earlier container for that and test the theory | 16:14 |
NeilHanlon | if the build does any dnf upgrade, it would go to the latest versions, so i'm not sure that would help | 16:15 |
clarkb | ya I think it will be more productive to work backward from grub doesn't work | 16:16 |
clarkb | rather than try and identify a needle in a haystack | 16:16 |
clarkb | it could also be a change to how dib manages grub | 16:18 |
NeilHanlon | https://review.opendev.org/c/openstack/diskimage-builder/+/826976 maybe? | 16:21 |
clarkb | ya I wonder what was in the most recent dib release | 16:23 |
fungi | `git tag --contains 9987d09` says that first appeared in 3.21.0 tagged 2022-05-04 | 16:26 |
fungi | so it's been in for a while | 16:26 |
fungi | and we bumped our nodepool images to that dib version the same day | 16:26 |
clarkb | unrelated, why in 2022 do we still have to do an initial login to set an admin user password | 16:43 |
fungi | is that for postorius/hyperkitty? | 16:47 |
clarkb | yes | 17:00 |
clarkb | reading the script that handles it, it looks like you can just not set the vars to create an admin user, and I'm hoping doing that is "secure". The docs say you have to do it though, so we'll see | 17:00 |
clarkb | these containers use alpine too and the user management is weird. I'm worried we might have to replace them. But one step at a time | 17:01 |
fungi | yeah, i had similar concerns when looking at them. though when i first looked they were building "fat" containers with an init and multiple services | 17:14 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 17:14 |
clarkb | fbo[m]: ya, so far no real deal breakers, just some odd decisions (but I've got my own biases). Another thing I noticed is one container uses pymysql to talk to the db and the other uses the C-bound library | 17:15 |
clarkb | er sorry fbo[m] that was for fungi | 17:15 |
fungi | yes, it seems like a somewhat disjoint hodgepodge | 17:15 |
clarkb | that latest patchset will almost certainly explode, but it should give us logs I hope for what assumptions we need to work around | 17:17 |
clarkb | it isn't clear to me how to configure the vhost aspects of mailman3 through the docker containers yet for example | 17:17 |
fungi | it's been a while since i did the first poc, but i can pull up my notes when i'm at another computer. my recollection though is you don't set up the vhosts, you just add mailing lists to them and they appear as if by magic. also possible i was drunk, no guarantees | 17:19 |
clarkb | ha ok | 17:20 |
fungi | in mm3 though, i'm pretty sure the domain is just part of the list info | 17:20 |
clarkb | but ya I figure a lot of this stuff will make more sense if we just get a thing in CI with some tests | 17:20 |
fungi | clarkb: my old notes (from 4 years ago, omg!) are at https://etherpad.openstack.org/p/mm3poc | 17:32 |
fungi | those include creating mailing lists in multiple domains | 17:32 |
fungi | looks like you pass the `mailman create` subcommand a parameter for the list address which necessarily includes the domain (unlike in mm2 where it's just the base name for the ml) | 17:34 |
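So list creation in the mm3 world looks roughly like this (real OpenDev list addresses used purely as examples; whether the domain is created implicitly depends on the mailman version and its --domain option):

```bash
# Run inside the mailman core environment/container.
mailman create service-discuss@lists.opendev.org
mailman create openstack-discuss@lists.openstack.org
```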
fungi | a downside to that design is that we'll probably need to have some extra validation or maybe a hierarchical data structure (like now) in our meta-config to make sure you don't accidentally typo a new domain into existence | 17:39 |
fungi | so even though mm3 fully qualifies the listnames now, we could keep our yaml as a nested associative array of domain:list | 17:41 |
clarkb | ++ | 17:50 |
fungi | technically we organize it by a "site name" and then a list at present, but we could replace the site names with corresponding fqdn from our existing map | 17:51 |
fungi | this is a good opportunity to simplify all that anyway | 17:52 |
clarkb | I've booted clarkb-test-rocky in ovh bhs1 just to verify the behavior is consistent there (it is) | 17:56 |
clarkb | Now I'm going to try rescuing it to see if anything stands out in the image | 17:56 |
fungi | oh, thanks! | 17:57 |
fungi | NeilHanlon: ^ | 17:57 |
clarkb | side note: ovh apparently has uefi images. I wonder if we can boot efi there now | 17:57 |
fungi | i got briefly sidetracked by overdue yardwork and an impending appliance delivery | 17:57 |
fungi | oh neat | 17:58 |
clarkb | Ok first thing I checked is that it isn't somehow rocky 9. /etc/os-release reports 8.6 | 18:02 |
fungi | oh good | 18:03 |
clarkb | next, the device label is cloudimg-rootfs so that was set properly | 18:03 |
corvus | the zuul pipeline refresh change merged, so that should be in place after this weekend's restarts (this is the issue that caused 2 zuul changes to get stuck in gate for 24h) | 18:04 |
clarkb | fungi: NeilHanlon /boot/grub2/grub.cfg doesn't appear to have any menu entries | 18:06 |
clarkb | I think we haven't specified what to boot in the grub config | 18:06 |
clarkb | NeilHanlon: I'm surprised it was able to boot for you? | 18:07 |
clarkb | oh wait | 18:08 |
fungi | thanks corvus! | 18:08 |
clarkb | I may have derped and looked at the wrong path | 18:08 |
NeilHanlon | the image i downloaded from nb02 seems to be fine | 18:08 |
clarkb | no I looked at the correct grub config. I looked at the wrong etc/default/grub | 18:08 |
NeilHanlon | w.r.t. the kernel, i mean | 18:08 |
clarkb | NeilHanlon: thats the image I booted on ovh | 18:08 |
clarkb | NeilHanlon: grub.cfg is present but without any entries to boot | 18:09 |
fungi | maybe grub picks the first definition it finds in that case? | 18:09 |
fungi | but yeah, the behavior we're observing looks (from the console log) exactly like "i launched grub... what next?" | 18:10 |
clarkb | well there isn't anything in the grub config to say what it should boot | 18:10 |
NeilHanlon | https://paste.opendev.org/show/bhdevsP8brrCdOg9LXnH/ | 18:10 |
clarkb | https://nb02.opendev.org/images/rockylinux-8-0000009494.qcow2 is the image I should be on in ovh | 18:10 |
NeilHanlon | yeah that's the one I grabbed | 18:11 |
clarkb | NeilHanlon: uh this image only has /boot/grub2/grub.cfg | 18:11 |
NeilHanlon | 🤔 | 18:11 |
clarkb | that paste looks like grub 1 | 18:11 |
clarkb | there is also a /boot/efi/EFI here (which on the surface shouldn't affect anything but it is unexpected to me) | 18:12 |
clarkb | parted shows the partition type is mbr not gpt so I think that implies we aren't trying to build an efi image with dib (as dib assumes gpt with efi?) | 18:14 |
clarkb | NeilHanlon: can you check your /etc/dib-build-date.txt | 18:15 |
clarkb | er /etc/dib-builddate.txt | 18:15 |
clarkb | 2022-07-29 08:17 is what I've got in there | 18:15 |
NeilHanlon | i'm very confused. i downloaded the image again and now it is booting to a grub console... | 18:18 |
NeilHanlon | i don't believe the lack of menuentries in grub.cfg is a problem; none of my installs have that either | 18:19 |
clarkb | how does grub know what to boot in that case? | 18:19 |
clarkb | it may be that in the efi case shortcuts can be taken but I think in the bios case you have to have something there for grub to boot | 18:20 |
clarkb | /etc/default/grub appears to be set properly with GRUB_DEVICE=LABEL=cloudimg-rootfs so whatever is generating the grub config is just not producing entries for that device and the kernel? | 18:21 |
NeilHanlon | https://rpa.st/A3WA this is from a different rocky 8 system | 18:21 |
clarkb | oh! I see no kernels in /boot | 18:22 |
NeilHanlon | 😂oops | 18:22 |
NeilHanlon | we don't need those, right? | 18:22 |
clarkb | I bet the underlying issue is no kernel means grub config isn't properly populated | 18:23 |
clarkb | in your paste you have stuff under 10_linux starting at line 106, which is empty in these images | 18:23 |
clarkb | but ya I suspect if we fix the lack of kernels then grub install will be happy | 18:23 |
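A hedged checklist for confirming that theory from a rescue instance or a chroot of the image:

```bash
rpm -q kernel                            # the package can be installed even if...
ls /boot/vmlinuz-* /boot/initramfs-*     # ...the files grub needs are absent
ls /boot/loader/entries/                 # BLS entries grub2-mkconfig would consume
grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate and see if entries appear
```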
clarkb | I think that is enough info to debug via the build logs and local dib runs now? I'm going to clean up my test instance | 18:23 |
NeilHanlon | fair enough! :) yep, definitely | 18:24 |
NeilHanlon | thank you and fungi for the assistance thus far | 18:24 |
clarkb | you're welcome. Fwiw I was expecting a glean issue because it is always a glean (or networkmanager) issue :) | 18:25 |
clarkb | neat to find something different | 18:25 |
fungi | also happy when the problem isn't something we have to fix ;) | 18:43 |
NeilHanlon | i enjoy breaking things in new and exciting ways | 18:45 |
fungi | that's the thrill of computering | 18:52 |
NeilHanlon | it seems to be installing the kernel, so I am le confused | 18:52 |
NeilHanlon | but that is relatively normal for me, so i'll take it | 18:53 |
fungi | you're looking at the image build log i linked earlier, right? | 18:53 |
NeilHanlon | that, and running locally | 18:53 |
fungi | 2022-07-29 08:15:30.945 | > ---> Package kernel.x86_64 4.18.0-372.16.1.el8_6.0.1 will be installed | 18:55 |
fungi | that looks good | 18:55 |
fungi | 2022-07-29 08:16:40.587 | > Installing : kernel-headers-4.18.0-372.16.1.el8_6.0.1.x86_64 79/193 | 18:55 |
fungi | oh, wait, wrong package | 18:55 |
NeilHanlon | https://paste.opendev.org/show/b9N8BkCEAptouFUklhsS/ | 18:56 |
fungi | 2022-07-29 08:17:03.186 | > Installing : kernel-4.18.0-372.16.1.el8_6.0.1.x86_64 171/193 | 18:56 |
fungi | "/boot is not a mountpoint" so it wants a separate /boot partition? | 18:58 |
clarkb | it is common practice to do that but I'm not aware of anyone requiring it be done | 18:58 |
fungi | or is it just that something needs to add a /boot/grub2/grubenv? | 18:58 |
NeilHanlon | i think it is stemming from /boot/loader/entries/ being empty | 18:59 |
clarkb | it is required to have a fat32 partition for efi at /boot/efi or whatever but we aren't efi'ing | 18:59 |
fungi | yeah, it's not clear to me whether any of the entries in that paste are errors or just informing the decisions | 19:00 |
fungi | i do see it in our build log too: 2022-07-29 08:29:32.972 | grep: /boot/grub2/grubenv: No such file or directory | 19:00 |
fungi | though "/boot is not a mountpoint" is not in our log | 19:01 |
NeilHanlon | yeah, same. will have to dig in, I think. I wonder if it is a separate partition at some point, and then after save it reverts to being a (different) directory? Does that make sense? i.e., something is mounting on top of an existing directory and doing work that gets lost | 19:02 |
clarkb | if that were happening it would have to do a weird end-run around dib's disk management. I think that would be weird for physical devices too | 19:03 |
NeilHanlon | I.. might have found it. Confirming | 19:06 |
NeilHanlon | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/iscsi-boot/finalise.d/51-open-iscsi-config#L10 | 19:07 |
fungi | last touched 2 years ago, did something change in rocky to need it? | 19:09 |
fungi | a recent addition of the iscsi element to rocky maybe necessitates it? | 19:09 |
clarkb | fungi: I think it is the source of that message you saw but unlikely related to this bug | 19:09 |
fungi | ahh, yes | 19:09 |
NeilHanlon | no, this wasn't it either. i was hoping it was somehow responsible for making the configs on centos images, but I was mistaken | 19:10 |
fungi | half of the problem is that i would normally do a side-by-side comparison of the working and broken builds, but since we also had a bout of rapid-fire build failures we no longer have a good log to compare against | 19:10 |
NeilHanlon | I believe what needs to happen is adding `GRUB_ENABLE_BLSCFG=true` to /etc/default/grub - but it is not clear to me why it's unnecessary on other images, too | 19:11 |
NeilHanlon | e.g., I expect this to be roughly equivalent to a centos8 one or a rhel 8 | 19:12 |
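Sketch of the change NeilHanlon suggests (illustrative only; whether this is the actual fix is still open at this point in the discussion):

```bash
# Enable BootLoaderSpec handling so grub2-mkconfig emits a blscfg-driven
# config that reads entries from /boot/loader/entries/ instead of needing
# inline menuentry stanzas.
echo 'GRUB_ENABLE_BLSCFG=true' >> /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg
```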
clarkb | we only boot centos-8-stream now so maybe they have diverged and rhel 8 is also broken | 19:13 |
fungi | yeah, at this point rocky and openeuler are the closest things we have to rhel | 19:14 |
fungi | or did someone get alma images going too? | 19:15 |
NeilHanlon | i don't think so | 19:15 |
clarkb | ya no alma | 19:15 |
*** dasm is now known as dasm|off | 20:46 | |
clarkb | when I run into distro specific issues I tend to fire up a container really quickly and sanity check. Unfortunately that doesn't help with kernels and boot stuff | 20:51 |
clarkb | dib-run-parts Running /tmp/in_target.d/finalise.d/01-clean-old-kernels seems to run and not find any old kernels to clean | 20:55 |
BlaisePabon[m]1 | you had me going for a bit here, because this is (almost) a plausible sentence in Spanish and I was staring at it, trying to figure out what you were trying to say. | 20:55 |
clarkb | BlaisePabon[m]1: with "ya no alma" ? | 20:56 |
BlaisePabon[m]1 | I imagined a woman named Alma who was giving you a hard time and you were telling her off. | 20:57 |
clarkb | NeilHanlon: I see in the image size report that 69MiB /opt/dib_tmp/dib_build.5sTNqxWo/built/usr/lib/modules/4.18.0-372.16.1.el8_6.0.1.x86_64 is present. Is it possible that the symlinks in /boot are missing due to package changes? | 21:00 |
clarkb | maybe we need another package to add those symlinks? | 21:00 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 21:05 |
NeilHanlon | that seems reasonable. i'll be trying to sort it later this evening. gotta go do family stuff | 21:07 |
clarkb | no rush, thanks for looking | 21:10 |
*** dviroel is now known as dviroel|out | 21:33 |