opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire PowerVMStacker SIG: Update ACL for repositories https://review.opendev.org/c/openstack/project-config/+/909628 | 01:28 |
---|---|---|
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire PowerVMStacker SIG: Remove Project from Infrastructure Systems https://review.opendev.org/c/openstack/project-config/+/909584 | 01:29 |
opendevreview | Takashi Kajinami proposed openstack/diskimage-builder master: Bump upper version of flake8 https://review.opendev.org/c/openstack/diskimage-builder/+/909336 | 02:38 |
*** liuxie is now known as liushy | 06:15 | |
opendevreview | Merged openstack/project-config master: Retire PowerVMStacker SIG: Update ACL for repositories https://review.opendev.org/c/openstack/project-config/+/909628 | 11:29 |
frickler | not sure if related to the general connectivity issues for gerrit, but I've now had multiple times the situation that a review was shown as usual in the CI, except that all comments were missing, the corresponding fields were just blank. resolved each time by a reload of the page. just mentioning in case others are seeing this, too | 16:38 |
frickler | s/CI/UI/ | 16:38 |
fungi | thanks, noted. pretty much the entire ui is javascript making asynchronous calls to the gerrit rest api and rendering them in the browser, so i can imagine how intermittent connectivity issues could make some of those queries fail | 16:40 |
clarkb | fungi: yes that is how the ui operates. If it happens again I would check your browser debug tools to see if some requests failed | 16:44 |
clarkb | or maybe timed out etc | 16:44 |
clarkb | re centos-7 cleanup I just discovered that starlingx appears to still be building on it.... | 16:50 |
clarkb | though the job last ran in 2020 and wasn't successful in the history we've got | 16:51 |
clarkb | so maybe this zuul config is deader than I thought | 16:51 |
fungi | ...undead, undead, undead | 17:06 |
fungi | (apologies to bauhaus) | 17:06 |
clarkb | it's surprising how entrenched these not very used things can end up in configs | 17:09 |
clarkb | opensuse-15 is all over stable branches I think | 17:09 |
clarkb | despite never really running successfully or running very often | 17:10 |
clarkb | I'm pushing changes up to clean up some of that, but I don't expect us to have to clean up everything. Nor do I expect us to have to wait | 17:10 |
clarkb | this is more of a courtesy saying "hey this isn't working and hasn't worked in forever you should clean it up" | 17:10 |
fungi | makes sense | 17:12 |
opendevreview | Clark Boylan proposed opendev/bindep master: Drop bindep-opensuse-15 job https://review.opendev.org/c/opendev/bindep/+/909771 | 17:15 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Drop opensuse-15 jobs https://review.opendev.org/c/zuul/zuul-jobs/+/909772 | 17:18 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Drop the opensuse-15 nodeset https://review.opendev.org/c/opendev/base-jobs/+/909773 | 17:21 |
clarkb | infra-root if you start with 909773 ^ and then follow its depends on list those are the subset of things I think we should try and remove before cleaning up nodepool and mirrors | 17:22 |
clarkb | I'll work on getting all the changes up with topic:drop-opensuse but some of the changes will be in places like tempest and keystone where I don't think we should hold up cleanup for | 17:22 |
fungi | can do | 17:25 |
opendevreview | Clark Boylan proposed openstack/project-config master: Set opensuse-15 min-ready to 0 https://review.opendev.org/c/openstack/project-config/+/909774 | 17:25 |
clarkb | oh I missed one spot too. I'll have to update the base-jobs change with another entry | 17:28 |
clarkb | so many tendrils | 17:28 |
opendevreview | Clark Boylan proposed openstack/project-config master: Set opensuse-15 min-ready to 0 https://review.opendev.org/c/openstack/project-config/+/909774 | 17:33 |
opendevreview | Clark Boylan proposed openstack/project-config master: Drop bindep fallback testing on opensuse https://review.opendev.org/c/openstack/project-config/+/909775 | 17:33 |
opendevreview | Clark Boylan proposed openstack/project-config master: Drop opensuse nodes and images from nodepool https://review.opendev.org/c/openstack/project-config/+/909776 | 17:33 |
opendevreview | Clark Boylan proposed opendev/base-jobs master: Drop the opensuse-15 nodeset https://review.opendev.org/c/opendev/base-jobs/+/909773 | 17:33 |
opendevreview | Merged openstack/diskimage-builder master: Fetch compatibile dnf download command in functest setup https://review.opendev.org/c/openstack/diskimage-builder/+/909466 | 17:36 |
opendevreview | Clark Boylan proposed opendev/system-config master: Stop mirroring OpenSUSE Leap 15 https://review.opendev.org/c/opendev/system-config/+/909779 | 17:47 |
clarkb | Ok I think that is the last change I'll be proposing for this cleanup assuming I didn't miss anything | 17:47 |
clarkb | yoctozepto: your name popped up related to centos 7 and kolla testing stuff. CentOS 7 is also on the removal list. Is this something kolla would be prepared for at this point? | 17:49 |
frickler | clarkb: yoctozepto is no longer involved in kolla. mnasiadka might know. but since we eoled xena and later, it should all be centos8 or later I think | 17:51 |
clarkb | frickler: great. There are a few other places centos 7 has popped up that I need to run down but that seems to be the theme. Centos 7 was used but not so much anymore | 17:51 |
mnasiadka | centos 7 is long gone in Kolla | 17:52 |
clarkb | which makes sense since centos 7 is about 10 years old now and eols in a few months | 17:52 |
mnasiadka | unmaintained/yoga is the last one to use centos 8 stream | 17:52 |
mnasiadka | (but I guess you're not phasing that out yet) | 17:52 |
clarkb | no. just 7 | 17:52 |
clarkb | mnasiadka: the full list is opensuse-15, Debian buster, centos 7 and ubuntu xenial | 17:52 |
mnasiadka | clarkb: that's safe to delete from kolla perspective, branches that supported that are long EOL | 17:53 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Drop debian buster testing https://review.opendev.org/c/zuul/zuul-jobs/+/909786 | 18:19 |
opendevreview | Clark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing https://review.opendev.org/c/opendev/system-config/+/909787 | 18:23 |
clarkb | mnaser: for buster and centos 7 cleanup vexxhost/ansible-role-base-server and vexxhost/ansible-role-frrouting show use of centos 7 and/or debian buster in their zuul configs but the last time the jobs ran appears to have been around 2020. I don't think cleaning the node types up poses a problem as result but figured I'd mention it | 18:31 |
mnaser | clarkb: go for dropping it :) | 18:35 |
clarkb | fungi: any idea why this failed: https://zuul.opendev.org/t/openstack/build/d8367d9080c54fff8c082991724eccb1/console#1/2/17/base seems like dkms didn't run to build the package | 18:49 |
clarkb | but https://zuul.opendev.org/t/openstack/build/d8367d9080c54fff8c082991724eccb1/console#1/2/16/base seems to show that it does build | 18:50 |
fungi | clarkb: we'd need to collect the dkms log or hold a node to inspect the same | 18:50 |
clarkb | got it | 18:50 |
clarkb | autohold is set and I rechecked the change | 18:52 |
fungi | it'll write to something like /var/lib/dkms/openafs/<some_version>/build/make.log (i think) | 18:54 |
fungi | unfortunately, as it's compiling c with kernel bindings, there are an infinite number of possible reasons it could have failed | 18:54 |
fungi | often i've seen it be incompatible kernel header package versions | 18:55 |
fungi | but as this is an arm build, i can imagine the builds for that aren't as well-trodden | 18:56 |
clarkb | ya I guess I'm surprised the second link would seem to show success if it actually failed | 18:57 |
clarkb | but it could be a deeper issue than dpkg is able to detect | 18:57 |
clarkb | 147.28.149.110 is the node that will eventually be held (the job is still running) | 19:01 |
clarkb | /var/lib/dkms/openafs/1.8.9/6.1.0-18-arm64/aarch64 has stuff in it including a module/ dir with a .ko | 19:06 |
clarkb | should I try to modprobe things? | 19:06 |
fungi | check /var/log/dpkg.log for errors too | 19:06 |
fungi | though i would have expected those to end up on stdout/stderr | 19:06 |
clarkb | /lib/modules/6.1.0-18-arm64/updates/dkms/openafs.ko also exists (it isn't a symlink but filesizes are the same). I'm going to hash compare now | 19:07 |
clarkb | ya hashes compare the same so whatever was built appears to be properly installed to the kernel's module list | 19:08 |
clarkb | the dpkg log isn't super helpful just a list of steps with no errors | 19:08 |
clarkb | `modprobe -v openafs` reports modprobe: ERROR: could not insert 'openafs': Exec format error | 19:09 |
clarkb | which could be a different kernel version | 19:09 |
fungi | does the booted kernel version match the installed kernel header package? | 19:10 |
clarkb | it appears to: uname -a says Linux np0036828704 6.1.0-18-arm64 #1 SMP Debian 6.1.76-1 (2024-02-01) aarch64 GNU/Linux and the module is in /lib/modules/6.1.0-18-arm64/updates/dkms/openafs.ko | 19:11 |
fungi | they can sometimes get out of sync, but that's usually just in unstable where kernel versions change | 19:11 |
clarkb | there is only the one kernel in /boot | 19:11 |
clarkb | /boot/vmlinuz-6.1.0-18-arm64 appears to be the only kernel | 19:11 |
clarkb | I can apt-get dist-upgrade and reboot and retry. Maybe we are on an older kernel but the openafs package can only build on the latest kernel? | 19:12 |
fungi | does dpkg -l report a linux-headers-.*-arm64 package installed with the same version? | 19:12 |
clarkb | yes ii linux-headers-6.1.0-18-arm64 6.1.76-1 | 19:13 |
clarkb | apt-get update is sloooooowwwwww | 19:13 |
jrosser | you can try modinfo on the built module too | 19:13 |
clarkb | vermagic: 6.1.0-18-arm64 SMP mod_unload modversions aarch64 | 19:14 |
clarkb | dist-upgrade reports nothing to upgrade to other than trying to resolve the broken package | 19:14 |
clarkb | so not a newer kernel that we need to build against | 19:15 |
fungi | i suppose another possibility could be the kernel enforcing signed modules, but i wouldn't expect that to suddenly start happening in stable and would also expect a clearer error from modprobe/insmod | 19:15 |
clarkb | oh! arm64 is the only place we boot with uefi so that could be | 19:16 |
clarkb | fungi: it isn't suddenly happening in stable though this is the first time we've added the job to bookworm | 19:16 |
clarkb | so maybe it would have always been an issue with uefi booted bookworm and we're just noticing now | 19:16 |
fungi | ohhh | 19:16 |
clarkb | also I thought when you used dkms packages they were supposed to figure this out for you | 19:17 |
fungi | and it's working for amd64 just not arm64? | 19:17 |
clarkb | correct | 19:17 |
clarkb | https://unix.stackexchange.com/questions/543576/modprobe-fails-with-operation-not-permitted indicates we should get an operation not permitted error instead maybe | 19:17 |
fungi | i know i've been struggling lately with https://bugs.debian.org/945506 but this seems different (especially since it's not breaking amd64) | 19:18 |
clarkb | giving sysrq-trigger the x value to disable kernel lockdown doesn't seem to have helped | 19:21 |
clarkb | likely a different issue | 19:21 |
clarkb | there is a mok.key and mok.pub in /var/lib/dkms | 19:22 |
clarkb | I seem to recall that using official dkms packages was meant to work at least on debuntu | 19:22 |
jrosser | it is possible in that case you still have to manually enrol the dkms mok key to make it trusted | 19:23 |
jrosser | modinfo will tell you about any signing during the dkms build | 19:24 |
clarkb | jrosser: ah. that said the error isn't the one that appears tied to kernel lockdown | 19:24 |
clarkb | This is suspicious too from the make log: Skipping BTF generation for /var/lib/dkms/openafs/1.8.9/build/src/libafs/MODLOAD-6.1.0-18-arm64-SP/openafs.ko due to unavailability of vmlinux | 19:24 |
clarkb | modinfo seems to report it is properly signed, but as you say the boot process itself may not trust that | 19:25 |
clarkb | or the kernel | 19:25 |
jrosser | there is a ton of info here https://wiki.debian.org/SecureBoot#MOK_-_Machine_Owner_Key | 19:26 |
clarkb | SecureBoot disabled | 19:33 |
clarkb | I guess kernel lockdown can still occur at runtime, but considering the module is signed and secure boot is disabled and the error message isn't about permissions but instead format i don't think this is the issue | 19:34 |
clarkb | ok here we go from dmesg module openafs: unsupported RELA relocation: 311 | 19:42 |
clarkb | https://github.com/NixOS/nixpkgs/issues/284501 | 19:45 |
fungi | good find | 19:47 |
clarkb | so the dkms compilation is using some sort of cpu features that aren't present on the running system? | 19:47 |
fungi | affecting nixos and debian | 19:48 |
fungi | and yeah, that unresolved github issue seems to be the only hit google has for the error, ddg doesn't even find that | 19:48 |
clarkb | 0000000779c4 091800000137 R_AARCH64_ADR_GOT 0000000000000000 key_type_keyring + 0 | 19:49 |
clarkb | I get that from the readelf command against our build | 19:49 |
clarkb | which is the same thing there. Though I'm not sure I understand this well enough to know if that is the problematic entry in openafs | 19:50 |
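The relocation number from dmesg and the `readelf` row quoted at 19:49 can be tied together by decoding the row's second column by hand. This is a minimal sketch, assuming that column is the ELF64 `r_info` field (symbol index in the upper 32 bits, relocation type in the lower 32 bits); the function name `split_r_info` is illustrative and not part of any tool mentioned in the log.

```python
def split_r_info(r_info: int) -> tuple[int, int]:
    """Split an ELF64 r_info value into (symbol_index, relocation_type)."""
    # ELF64 packs the symbol table index into the high 32 bits and the
    # relocation type into the low 32 bits.
    return r_info >> 32, r_info & 0xFFFFFFFF


# Second column of the readelf row for key_type_keyring quoted in the log.
sym_index, reloc_type = split_r_info(0x091800000137)
print(f"symbol index {sym_index:#x}, relocation type {reloc_type}")
```

The low word `0x137` is 311 in decimal, the same number dmesg printed, so the `key_type_keyring` row is at least one instance of the relocation type the module loader refused. The name in the `readelf` column looks truncated; in the AArch64 ELF specification type 311 is `R_AARCH64_ADR_GOT_PAGE`.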
clarkb | I have a feeling I'm going to learn about PIC instructions after lunch | 19:50 |
clarkb | https://developer.arm.com/documentation/101458/1930/Coding-best-practice/Note-about-building-Position-Independent-Code--PIC--on-AArch64?lang=en | 19:54 |
clarkb | make.log shows gcc-12 -fPIC | 19:55 |
clarkb | so ya a package bug I guess? | 19:55 |
clarkb | oh wait -fpic and -fPIC are different | 19:56 |
fungi | of course they are! | 19:57 |
fungi | why would you expect otherwise? ;) | 19:57 |
fungi | working on embedding the gerrit commit-msg hook into git-review i've turned up a 13-year-old file permission handling bug which came in with the very first commit, and it's sent me down a rabbit hole of some of the more terrible parts of permissions management in python | 21:33 |
fungi | https://opendev.org/opendev/git-review/src/commit/ae80cb6/git_review/cmd.py#L433 | 21:34 |
fungi | if the script isn't executable after writing to disk (which i think is only when it gets pulled over https), permissions get set to 0x0400 | 21:35 |
fungi | unfortunately, python has no analogue of `chmod a+x` so if you really want to only add flags, you need to stat the file and then compose the new mask based on the existing one | 21:38 |
fungi | worse, it's unclear to me how portable that is, or how portable the existing code is for that matter | 21:39 |
fungi | do people use git-review on windows? | 21:39 |
clarkb | yes I think we've gotten bugs from windows users at least | 21:45 |
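The stat-then-compose approach fungi describes at 21:38 can be sketched as follows. This is an illustration under assumptions, not the git-review code: the helper name `add_execute_bits` is hypothetical, and it only shows how to emulate `chmod a+x` by OR-ing the execute bits into the existing mode instead of overwriting it.

```python
import os
import stat


def add_execute_bits(path: str) -> int:
    """Add u+x, g+x and o+x to path while keeping every other mode bit."""
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    current = stat.S_IMODE(os.stat(path).st_mode)
    wanted = current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted != current:
        os.chmod(path, wanted)
    return wanted
```

On the portability question, the Python documentation for `os.chmod` notes that on Windows only the read-only flag can be set and other bits are ignored, so a call like this should not raise there but also will not make the file executable in the POSIX sense.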
fungi | the main concern i have with that os.chmod() call is that, when writing the vendored hook to disk becomes the default behavior, the buggy (and possibly non-portable) code path will be hit by a majority of users | 21:54 |
clarkb | I'm beginning to wonder if the issue with openafs on arm is that we need -fPIC on more things to enable relocations across all the compiled code | 21:54 |
clarkb | fungi: can't we safely set it to 755? | 21:56 |
clarkb | 555 should be fine too | 21:56 |
fungi | open to suggestions if anybody knows recommended practices for portable file perms handling in python | 21:56 |
fungi | i haven't thought through the potential ramifications of not obeying the user's umask either | 21:56 |
clarkb | basically don't bother determining if we need to do + behavior for +x. just make it executable and readable | 21:56 |
clarkb | both are required for the file to work as intended and both should be safe | 21:57 |
fungi | also i seem to be coming down with some crud, so possible my investigations have a tinge of fever-induced confusion, hard to tell | 21:57 |
clarkb | I would have python write the file then chmod the file to 755 | 21:58 |
fungi | er, also related, i mis-stated earlier, file is being set to 0x0500 not 0x0400 | 21:58 |
clarkb | chmod won't fail on windows (though i'm not sure if it will do what we want) and windows users can report if something doesn't work properly for them | 21:59 |
fungi | yeah, 1. not sure if there are precautions we need to take to avoid that causing crashes for windows users, 2. setting to 0x0755 could be surprising for users with a 0x007 or 0x077 umask | 22:00 |
clarkb | python docs indicate 1 should probably succeed though the side effects are less clear to me. Basically crashes would be unexpected | 22:02 |
clarkb | for 2. I think meh | 22:02 |
fungi | probably setting 0x0700 perms is the best compromise. avoids the file being harder to delete while not exposing it on systems where the users expect tighter controlled umask | 22:02 |
clarkb | it's an executable script that needs to be executable and there is no harm in groups or others being able to execute it | 22:02 |
clarkb | its just hashing git tree info to spit out a unique identifier | 22:02 |
clarkb | if anyone on a system can execute that we'll be fine | 22:03 |
fungi | yes, i agree there's no inherent privacy concern with it being globally readable. and global execute is only a concern for files that are setuid (or gid) | 22:03 |
fungi | i suppose if anyone complains that git-review isn't respecting their umask, "patches welcome" | 22:04 |
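If someone did want to send that patch, one option is to mask the requested mode with the process umask before calling `os.chmod`. The sketch below is an assumption about how that could look, not something proposed in the discussion; both function names are hypothetical, and the modes are written with Python's `0o` octal prefix since permission modes such as 755 and 077 are octal values.

```python
import os


def mode_under_umask(requested: int, umask: int) -> int:
    """Return requested with every bit that umask masks out cleared."""
    return requested & ~umask & 0o7777


def current_umask() -> int:
    """Read the process umask.

    os.umask() can only report the old value by setting a new one, so the
    old value is restored immediately. Not safe against concurrent threads.
    """
    previous = os.umask(0)
    os.umask(previous)
    return previous


# Example: a hook written as 755 under a 077 umask would end up as 700.
print(oct(mode_under_umask(0o755, 0o077)))
```

With a 077 umask this yields 700, which is the compromise fungi suggests at 22:02, while a more permissive umask such as 022 leaves 755 unchanged.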
fungi | and yeah, i was confusing os with sys, stuff under os is generally portable i think, sys less so? | 22:05 |
clarkb | I'm quickly starting to get lost in this openafs arm thing. It isn't clear to me if there is too little PICing or too much. Perhaps we need the flag to be more consistently set (there are 4 gcc commands recorded by the make log but only 2 set -fPIC) or maybe we need to set no PIC at all because the cpu here can't support that functionality? | 22:16 |
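The count of gcc commands with and without `-fPIC` could be reproduced from a saved copy of make.log with something like the sketch below. It assumes the log records full compiler command lines one per line (a quiet kernel build that prints only `CC [M] ...` summaries would not match), and the function name `pic_usage` is hypothetical.

```python
import shlex


def pic_usage(log_lines):
    """Return (with_pic, without_pic) lists of gcc command lines."""
    with_pic, without_pic = [], []
    for line in log_lines:
        try:
            args = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes, so not a plain command line
        # Accept gcc, gcc-12, /usr/bin/gcc-12 and similar as the command.
        if not args or not args[0].split("/")[-1].startswith("gcc"):
            continue
        # -fpic and -fPIC are distinct options, so match case-sensitively.
        (with_pic if "-fPIC" in args else without_pic).append(line)
    return with_pic, without_pic
```

Called as `pic_usage(open(path_to_make_log))` with the log path from the held node, the two returned lists show which compile steps carry the flag. The comparison is deliberately case-sensitive because, as noted at 19:56, GCC treats `-fpic` and `-fPIC` as separate options.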
fungi | ah, yeah it could be specific to the hypervisor configuration/cpu flags exposed? | 22:18 |
clarkb | ya except the arm64_linux26 autoconf compiler flags settings have been set this way since 2015 | 22:19 |
clarkb | and this does seem to work on bullseye | 22:19 |
clarkb | it could be a bug with gcc-12 on debian for aarch64 maybe? | 22:19 |
clarkb | in the nixos case the issue I linked above implies that they are trying to disable PIC entirely for openafs | 22:20 |
clarkb | though they appear to have disabled pic (or tried to anyway) for a long long time so not a new issue there | 22:24 |
clarkb | "Most notably, the Linux kernel, kernel modules and other code not running in an operating system environment like boot loaders won’t build with PIC enabled" from nixos docs | 22:26 |
clarkb | https://www.phoronix.com/news/Linux-x86_64-PIE-2023 this says position independent executable for the kernel itself is relatively recent for x86_64 | 22:29 |
clarkb | (I think PIC is different in that it is for libs not executables and I guess kernel modules are more like libs?) | 22:29 |
clarkb | oh bullseye didn't run recently so maybe it could fail too | 22:33 |
opendevreview | Clark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing https://review.opendev.org/c/opendev/system-config/+/909787 | 22:38 |
clarkb | I think that will build for bullseye and we can see if this fails there too | 22:38 |
clarkb | hrm that still didn't trigger the jobs I want to trigger | 22:40 |
opendevreview | Clark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing https://review.opendev.org/c/opendev/system-config/+/909787 | 22:42 |
clarkb | that confirms it does build on bullseye | 23:03 |
clarkb | gcc 10 is used on bullseye and gcc 12 is used on bookworm. The -fPIC flags don't change though. Other changes are the openafs version 1.8.6 vs 1.8.9 and the kernel version | 23:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing https://review.opendev.org/c/opendev/system-config/+/909787 | 23:08 |
clarkb | I'm forcing failure there to see if readelf shows the same reloc stuff on bullseye but not on bookworm | 23:09 |
clarkb | ya readelf doesn't show that reloc (or any other ones matching that egrep expression) | 23:25 |
clarkb | fungi: I guess we should file a bug against debian? | 23:26 |
clarkb | fungi: something along the lines of "we install package and the insmod fails with this error and dmesg records this other thing" | 23:26 |
clarkb | and my hunch is the issue is in either code changes to the newer version of openafs that produce that result or behavior of gcc 10 vs 12 | 23:26 |
clarkb | the key_type_keyring code looks pretty similar between the two openafs versions doing some naive diffing | 23:29 |
clarkb | specifically the objects appear to be defined in src/afs/LINUX/osi_groups.c and that file isn't different between 1.8.6 and 1.8.9 | 23:29 |
clarkb | oh but there is conditional stuff based on the newness of the kernel supporting the keyring | 23:30 |
clarkb | make.log reports checking for exported key_type_keyring... yes for both kernel versions and builds | 23:31 |
clarkb | (the file actually does have some small diffs in ifdef checks but they are just renamed flags and none of the flags are arm specific (they do seem arch specific so maybe we need an arch specific check for arm here)) | 23:32 |
clarkb | ianw: do you recall if you ever ran into problems with reloc instructions for openafs' kernel module? | 23:55 |