Wednesday, 2024-02-21

opendevreviewTakashi Kajinami proposed openstack/project-config master: Retire PowerVMStacker SIG: Update ACL for repositories  https://review.opendev.org/c/openstack/project-config/+/90962801:28
opendevreviewTakashi Kajinami proposed openstack/project-config master: Retire PowerVMStacker SIG: Remove Project from Infrastructure Systems  https://review.opendev.org/c/openstack/project-config/+/90958401:29
opendevreviewTakashi Kajinami proposed openstack/diskimage-builder master: Bump upper version of flake8  https://review.opendev.org/c/openstack/diskimage-builder/+/90933602:38
*** liuxie is now known as liushy06:15
opendevreviewMerged openstack/project-config master: Retire PowerVMStacker SIG: Update ACL for repositories  https://review.opendev.org/c/openstack/project-config/+/90962811:29
fricklernot sure if related to the general connectivity issues for gerrit, but I've now had multiple times the situation that a review was shown as usual in the CI, except that all comments were missing, the corresponding fields were just blank. resolved each time by a reload of the page. just mentioning in case others are seeing this, too16:38
fricklers/CI/UI/16:38
fungithanks, noted. pretty much the entire ui is javascript making asynchronous calls to the gerrit rest api and rendering them in the browser, so i can imagine how intermittent connectivity issues could make some of those queries fail16:40
clarkbfungi: yes that is how the ui operates. If it happens again I would check your browser debug tools to see if some requests failed16:44
clarkbor maybe timed out etc16:44
clarkbre centos-7 cleanup I just discovered that starlingx appears to still be building on it....16:50
clarkbthough the job last ran in 2020 and wasn't successful in the history we've got16:51
clarkbso maybe this zuul config is deader than I thought16:51
fungi...undead, undead, undead17:06
fungi(apologies to bauhaus)17:06
clarkbit's surprising how entrenched these rarely used things can end up in configs17:09
clarkbopensuse-15 is all over stable branches I think17:09
clarkbdespite never really running successfully or running very often17:10
clarkbI'm pushing changes up to clean up some of that, but I don't expect us to have to clean up everything. Nor do I expect us to have to wait17:10
clarkbthis is more of a courtesy saying "hey this isn't working and hasn't worked in forever you should clean it up"17:10
fungimakes sense17:12
opendevreviewClark Boylan proposed opendev/bindep master: Drop bindep-opensuse-15 job  https://review.opendev.org/c/opendev/bindep/+/90977117:15
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Drop opensuse-15 jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/90977217:18
opendevreviewClark Boylan proposed opendev/base-jobs master: Drop the opensuse-15 nodeset  https://review.opendev.org/c/opendev/base-jobs/+/90977317:21
clarkbinfra-root if you start with 909773 ^ and then follow its depends on list those are the subset of things I think we should try and remove before cleaning up nodepool and mirrors17:22
clarkbI'll work on getting all the changes up with topic:drop-opensuse but some of the changes will be in places like tempest and keystone where I don't think we should hold up cleanup for17:22
fungican do17:25
opendevreviewClark Boylan proposed openstack/project-config master: Set opensuse-15 min-ready to 0  https://review.opendev.org/c/openstack/project-config/+/90977417:25
clarkboh I missed one spot too. I'll have to update the base-jobs change with another entry17:28
clarkbso many tendrils17:28
opendevreviewClark Boylan proposed openstack/project-config master: Set opensuse-15 min-ready to 0  https://review.opendev.org/c/openstack/project-config/+/90977417:33
opendevreviewClark Boylan proposed openstack/project-config master: Drop bindep fallback testing on opensuse  https://review.opendev.org/c/openstack/project-config/+/90977517:33
opendevreviewClark Boylan proposed openstack/project-config master: Drop opensuse nodes and images from nodepool  https://review.opendev.org/c/openstack/project-config/+/90977617:33
opendevreviewClark Boylan proposed opendev/base-jobs master: Drop the opensuse-15 nodeset  https://review.opendev.org/c/opendev/base-jobs/+/90977317:33
opendevreviewMerged openstack/diskimage-builder master: Fetch compatibile dnf download command in functest setup  https://review.opendev.org/c/openstack/diskimage-builder/+/90946617:36
opendevreviewClark Boylan proposed opendev/system-config master: Stop mirroring OpenSUSE Leap 15  https://review.opendev.org/c/opendev/system-config/+/90977917:47
clarkbOk I think that is the last change I'll be proposing for this cleanup assuming I didn't miss anything17:47
clarkbyoctozepto: your name popped up related to centos 7 and kolla testing stuff. CentOS 7 is also on the removal list. Is this something kolla would be prepared for at this point?17:49
fricklerclarkb: yoctozepto is no longer involved in kolla. mnasiadka might know. but since we eoled xena and later, it should all be centos8 or later I think17:51
clarkbfrickler: great. There are a few other places centos 7 has popped up that I need to run down but that seems to be the theme. Centos 7 was used but not so much anymore17:51
mnasiadkacentos 7 is long gone in Kolla17:52
clarkbwhich makes sense since centos 7 is about 10 years old now and eols in a few months17:52
mnasiadkaunmaintained/yoga is the last one to use centos 8 stream17:52
mnasiadka(but I guess you're not phasing that out yet)17:52
clarkbno. just 717:52
clarkbmnasiadka: the full list is opensuse-15, Debian buster, centos 7 and ubuntu xenial17:52
mnasiadkaclarkb: that's safe to delete from kolla perspective, branches that supported that are long EOL17:53
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Drop debian buster testing  https://review.opendev.org/c/zuul/zuul-jobs/+/90978618:19
opendevreviewClark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing  https://review.opendev.org/c/opendev/system-config/+/90978718:23
clarkbmnaser: for buster and centos 7 cleanup vexxhost/ansible-role-base-server and vexxhost/ansible-role-frrouting show use of centos 7 and/or debian buster in their zuul configs but the last time the jobs ran appears to have been around 2020. I don't think cleaning the node types up poses a problem as a result but figured I'd mention it18:31
mnaserclarkb: go for dropping it :)18:35
clarkbfungi: any idea why this failed: https://zuul.opendev.org/t/openstack/build/d8367d9080c54fff8c082991724eccb1/console#1/2/17/base seems like dkms didn't run to build the package18:49
clarkbbut https://zuul.opendev.org/t/openstack/build/d8367d9080c54fff8c082991724eccb1/console#1/2/16/base seems to show that it does build18:50
fungiclarkb: we'd need to collect the dkms log or hold a node to inspect the same18:50
clarkbgot it18:50
clarkbautohold is set and I rechecked the change18:52
fungiit'll write to something like /var/lib/dkms/openafs/<some_version>/build/make.log (i think)18:54
fungiunfortunately, as it's compiling c with kernel bindings, there are an infinite number of possible reasons it could have failed18:54
fungioften i've seen it be incompatible kernel header package versions18:55
fungibut as this is an arm build, i can imagine the builds for that aren't as well-trodden18:56
clarkbya I guess I'm surprised the second link would seem to show success if it actually failed18:57
clarkbbut it could be a deeper issue than dpkg is able to detect18:57
clarkb147.28.149.110 is the node that will eventually be held (the job is still running)19:01
clarkb/var/lib/dkms/openafs/1.8.9/6.1.0-18-arm64/aarch64 has stuff in it including a module/ dir with a .ko19:06
clarkbshould I try to modprobe things?19:06
fungicheck /var/log/dpkg.log for errors too19:06
fungithough i would have expected those to end up on stdout/stderr19:06
clarkb/lib/modules/6.1.0-18-arm64/updates/dkms/openafs.ko also exists (it isn't a symlink but filesizes are the same). I'm going to hash compare now19:07
clarkbya hashes compare the same so whatever was built appears to be properly installed to the kernel's module list19:08
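
The hash comparison described above can be sketched in Python. This is only an illustration: the chat does not say which tool or digest was used, so SHA-256 and the helper names `sha256_of` and `same_contents` are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(first: Path, second: Path) -> bool:
    """True when both files hash to the same digest."""
    return sha256_of(first) == sha256_of(second)
```

Called with the module under /var/lib/dkms and the copy under /lib/modules, a True result would match what was observed here: the built module and the installed one are byte-for-byte the same.
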
clarkbthe dpkg log isn't super helpful just a list of steps with no errors19:08
clarkb`modprobe -v openafs` reports modprobe: ERROR: could not insert 'openafs': Exec format error19:09
clarkbwhich could be a different kernel version19:09
fungidoes the booted kernel version match the installed kernel header package?19:10
clarkbit appears to: uname -a says Linux np0036828704 6.1.0-18-arm64 #1 SMP Debian 6.1.76-1 (2024-02-01) aarch64 GNU/Linux and the module is in /lib/modules/6.1.0-18-arm64/updates/dkms/openafs.ko19:11
fungithey can sometimes get out of sync, but that's usually just in unstable where kernel versions change19:11
clarkbthere is only the one kernel in /boot19:11
clarkb/boot/vmlinuz-6.1.0-18-arm64 appears to be the only kernel 19:11
clarkbI can apt-get dist-upgrade and reboot and retry. Maybe we are on an older kernel but the openafs package can only build on the latest kernel?19:12
fungidoes dpkg -l report a linux-headers-.*-arm64 package installed with the same version?19:12
clarkbyes ii  linux-headers-6.1.0-18-arm64  6.1.76-119:13
clarkbapt-get update is sloooooowwwwww19:13
jrosseryou can try modinfo on the built module too19:13
clarkbvermagic:       6.1.0-18-arm64 SMP mod_unload modversions aarch6419:14
clarkbdist-upgrade reports nothing to upgrade to other than trying to resolve the broken package19:14
clarkbso not a newer kernel that we need to build against19:15
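
The version check being done by hand here (the vermagic line from modinfo against the release from uname) can be sketched as below. The helper names are hypothetical, and the parsing assumes modinfo prints one `key: value` pair per line, as in the vermagic line quoted above.

```python
import platform
from typing import Optional


def vermagic_release(modinfo_output: str) -> Optional[str]:
    """Return the kernel release a module was built for, or None."""
    for line in modinfo_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "vermagic":
            fields = value.split()
            # The first vermagic field is the kernel release string.
            return fields[0] if fields else None
    return None


def built_for_running_kernel(modinfo_output: str,
                             running_release: Optional[str] = None) -> bool:
    """Compare the module's vermagic release with the running kernel."""
    if running_release is None:
        # Same value as `uname -r` on Linux.
        running_release = platform.release()
    return vermagic_release(modinfo_output) == running_release
```

For the values in this log, both sides are 6.1.0-18-arm64, so a mismatch between the module and the booted kernel is not the cause of the Exec format error.
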
fungii suppose another possibility could be the kernel enforcing signed modules, but i wouldn't expect that to suddenly start happening in stable and would also expect a clearer error from modprobe/insmod19:15
clarkboh! arm64 is the only place we boot with uefi so that could be19:16
clarkbfungi: it isn't suddenly happening in stable though this is the first time we've added the job to bookworm19:16
clarkbso maybe it would have always been an issue with uefi booted bookworm and we're just noticing now19:16
fungiohhh19:16
clarkbalso I thought when you used dkms packages they were supposed to figure this out for you19:17
fungiand it's working for amd64 just not arm64?19:17
clarkbcorrect19:17
clarkbhttps://unix.stackexchange.com/questions/543576/modprobe-fails-with-operation-not-permitted indicates we should get an operation not permitted error instead maybe19:17
fungii know i've been struggling lately with https://bugs.debian.org/945506 but this seems different (especially since it's not breaking amd64)19:18
clarkbgiving sysrq-trigger the x value to disable kernel lockdown doesn't seem to have helped19:21
clarkblikely a different issue19:21
clarkbthere is a mok.key and mok.pub in /var/lib/dkms19:22
clarkbI seem to recall that using official dkms packages was meant to work at least on debuntu19:22
jrosserit is possible in that case you still have to manually enrol the dkms mok key to make it trusted19:23
jrossermodinfo will tell you about any signing during the dkms build19:24
clarkbjrosser: ah. that said the error isn't the one that appears tied to kernel lockdown19:24
clarkbThis is suspicious too from the make log: Skipping BTF generation for /var/lib/dkms/openafs/1.8.9/build/src/libafs/MODLOAD-6.1.0-18-arm64-SP/openafs.ko due to unavailability of vmlinux19:24
clarkbmodinfo seems to report it is properly signed, but as you say the boot process itself may not trust that19:25
clarkbor the kernel19:25
jrosserthere is a ton of info here https://wiki.debian.org/SecureBoot#MOK_-_Machine_Owner_Key19:26
clarkbSecureBoot disabled19:33
clarkbI guess kernel lockdown can still occur at runtime, but considering the module is signed and secure boot is disabled and the error message isn't about permissions but instead format i don't think this is the issue19:34
clarkbok here we go from dmesg module openafs: unsupported RELA relocation: 31119:42
clarkbhttps://github.com/NixOS/nixpkgs/issues/28450119:45
fungigood find19:47
clarkbso the dkms compilation is using some sort of cpu features that aren't present on the running system?19:47
fungiaffecting nixos and debian19:48
fungiand yeah, that unresolved github issue seems to be the only hit google has for the error, ddg doesn't even find that19:48
clarkb0000000779c4  091800000137 R_AARCH64_ADR_GOT 0000000000000000 key_type_keyring + 019:49
clarkbI get that from the readelf command against our build19:49
clarkbwhich is the same thing there. Though I'm not sure I understand this well enough to know if that is the problematic entry in openafs19:50
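
One way to tie the readelf line to the dmesg error is to split the Info column by hand. In ELF64 the upper 32 bits of r_info are the symbol index and the lower 32 bits are the relocation type. The function name below is made up for this sketch.

```python
from typing import Tuple


def split_r_info(r_info: int) -> Tuple[int, int]:
    """Split an ELF64 r_info value into (symbol index, relocation type)."""
    return r_info >> 32, r_info & 0xFFFFFFFF


# Info column from the readelf line quoted above.
sym_index, reloc_type = split_r_info(0x091800000137)
print(sym_index, reloc_type)  # 2328 311
```

The type comes out as 311, the same number dmesg reports as the unsupported RELA relocation. As far as I can tell, the AArch64 ELF ABI names that type R_AARCH64_ADR_GOT_PAGE, which readelf truncates to R_AARCH64_ADR_GOT, so the key_type_keyring entry is at least one relocation of the kind the kernel rejected.
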
clarkbI have a feeling I'm going to learn about PIC instructions after lunch19:50
clarkbhttps://developer.arm.com/documentation/101458/1930/Coding-best-practice/Note-about-building-Position-Independent-Code--PIC--on-AArch64?lang=en19:54
clarkbmake.log shows gcc-12 -fPIC19:55
clarkbso ya a package bug I guess?19:55
clarkboh wait -fpic and -fPIC are different19:56
fungiof course they are!19:57
fungiwhy would you expect otherwise? ;)19:57
fungiworking on embedding the gerrit commit-msg hook into git-review i've turned up a 13-year-old file permission handling bug which came in with the very first commit, and it's sent me down a rabbit hole of some of the more terrible parts of permissions management in python21:33
fungihttps://opendev.org/opendev/git-review/src/commit/ae80cb6/git_review/cmd.py#L43321:34
fungiif the script isn't executable after writing to disk (which i think is only when it gets pulled over https), permissions get set to 0x040021:35
fungiunfortunately, python has no analogue of `chmod a+x` so if you really want to only add flags, you need to stat the file and then compose the new mask based on the existing one21:38
fungiworse, it's unclear to me how portable that is, or how portable the existing code is for that matter21:39
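
The stat-then-compose approach described above could look like the sketch below. It is not the git-review code; `add_execute_bits` is a hypothetical helper, and its behavior on Windows is an open question here just as it is in the discussion.

```python
import os
import stat


def add_execute_bits(path: str) -> int:
    """Roughly `chmod a+x`: keep the existing mode and add the x bits."""
    # S_IMODE strips the file-type bits and leaves only the permissions.
    current = stat.S_IMODE(os.stat(path).st_mode)
    wanted = current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, wanted)
    return wanted
```

On a POSIX system a file that starts at mode 644 (octal) ends up at 755, while the read and write bits that were already set are left alone.
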
fungido people use git-review on windows?21:39
clarkbyes I think we've gotten bugs from windows users at least21:45
fungithe main concern i have with that os.chmod() call is that, when writing the vendored hook to disk becomes the default behavior, the buggy (and possibly non-portable) code path will be hit by a majority of users21:54
clarkbI'm beginning to wonder if the issue with openafs on arm is that we need -fPIC on more things to enable relocations across all the compiled code21:54
clarkbfungi: can't we safely set it to 755?21:56
clarkb555 should be fine too21:56
fungiopen to suggestions if anybody knows recommended practices for portable file perms handling in python21:56
fungii haven't thought through the potential ramifications of not obeying the user's umask either21:56
clarkbbasically don't bother determining if we need to do + behavior for +x. just make it executable and readable21:56
clarkbboth are required for the file to work as intended and both should be safe21:57
fungialso i seem to be coming down with some crud, so possible my investigations have a tinge of fever-induced confusion, hard to tell21:57
clarkbI would have python write the file then chmod the file to 75521:58
fungier, also related, i mis-stated earlier, file is being set to 0x0500 not 0x040021:58
clarkbchmod won't fail on windows (though i'm not sure if it will do what we want) and windows users can report if something doesn't work properly for them21:59
fungiyeah, 1. not sure if there are precautions we need to take to avoid that causing crashes for windows users, 2. setting to 0x0755 could be surprising for users with a 0x007 or 0x077 umask22:00
clarkbpython docs indicate 1 should probably succeed though the side effects are less clear to me. Basically crashes would be unexpected22:02
clarkbfor 2. I think meh22:02
fungiprobably setting 0x0700 perms is the best compromise. avoids the file being harder to delete while not exposing it on systems where the users expect tighter controlled umask22:02
clarkbit's an executable script that needs to be executable and there is no harm in groups or others being able to execute it22:02
clarkbit's just hashing git tree info to spit out a unique identifier22:02
clarkbif anyone on a system can execute that we'll be fine22:03
fungiyes, i agree there's no inherent privacy concern with it being globally readable. and global execute is only a concern for files that are setuid (or gid)22:03
fungii suppose if anyone complains that git-review isn't respecting their umask, "patches welcome"22:04
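
For reference, the umask trade-off can be worked through numerically. The modes in this discussion are octal permission bits (written 0o755 in Python), and os.chmod() applies exactly the mode it is given without consulting the umask. The helper names below are illustrative only.

```python
import os


def current_umask() -> int:
    """Read the process umask; os.umask() can only read it by setting it."""
    previous = os.umask(0)
    os.umask(previous)  # restore immediately (not thread-safe)
    return previous


def mode_after_umask(requested: int, umask: int) -> int:
    """Mode a file would get if the requested bits honored the umask."""
    return requested & ~umask & 0o777


for mask in (0o022, 0o007, 0o077):
    print(oct(mask), oct(mode_after_umask(0o755, mask)))
```

With a 077 umask, a umask-respecting 755 would come out as 700, which is the compromise mode suggested above. A plain chmod to 755 would instead grant group and other access that such a user's umask normally withholds.
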
fungiand yeah, i was confusing os with sys, stuff under os is generally portable i think, sys less so?22:05
clarkbI'm quickly starting to get lost in this openafs arm thing. It isn't clear to me if there is too little PICing or too much. Perhaps we need the flag to be more consistently set (there are 4 gcc commands recorded by the make log but only 2 set -fPIC) or maybe we need to set no PIC at all because the cpu here can't support that functionality?22:16
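
Counting which compiler invocations carry the flag could be automated along these lines. The real make.log layout is not shown in this log, so the sketch assumes each gcc invocation sits on a single line starting with the compiler path. The match is case-sensitive, so -fpic does not count as -fPIC.

```python
import os
import shlex
from typing import List


def gcc_commands_without_flag(log_text: str, flag: str = "-fPIC") -> List[str]:
    """Return the gcc command lines in a build log that lack `flag`."""
    missing = []
    for line in log_text.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes: not a command line we can parse.
            continue
        if not tokens:
            continue
        # Accept gcc, gcc-12, /usr/bin/gcc-12 and similar spellings.
        if os.path.basename(tokens[0]).startswith("gcc") and flag not in tokens:
            missing.append(line)
    return missing
```

Run against the make.log from the held node, this would list the two of four gcc commands that were seen without -fPIC, which is the inconsistency in question.
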
fungiah, yeah it could be specific to the hypervisor configuration/cpu flags exposed?22:18
clarkbya except the arm64_linux26 autoconf compiler flags settings have been set this way since 201522:19
clarkband this does seem to work on bullseye22:19
clarkbit could be a bug with gcc-12 on debian for aarch64 maybe?22:19
clarkbin the nixos case the issue I linked above implies that they are trying to disable PIC entirely for openafs22:20
clarkbthough they appear to have disabled pic (or tried to anyway) for a long long time so not a new issue there22:24
clarkb"Most notably, the Linux kernel, kernel modules and other code not running in an operating system environment like boot loaders won’t build with PIC enabled" from nixos docs22:26
clarkbhttps://www.phoronix.com/news/Linux-x86_64-PIE-2023 this says position independent executable for the kernel itself is relatively recent for x86_6422:29
clarkbI think PIC is different in that it is for libs not executables (and I guess kernel modules are more like libs?)22:29
clarkboh bullseye didn't run recently so maybe it could fail too22:33
opendevreviewClark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing  https://review.opendev.org/c/opendev/system-config/+/90978722:38
clarkbI think that will build for bullseye and we can see if this fails there too22:38
clarkbhrm that still didn't trigger the jobs I want to trigger22:40
opendevreviewClark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing  https://review.opendev.org/c/opendev/system-config/+/90978722:42
clarkbthat confirms it does build on bullseye23:03
clarkbgcc 10 is used on bullseye and gcc 12 is used on bookworm. The -fPIC flags don't change though. Other changes are the openafs version 1.8.6 vs 1.8.9 and the kernel version23:03
opendevreviewClark Boylan proposed opendev/system-config master: Replace buster with bookworm in role integration testing  https://review.opendev.org/c/opendev/system-config/+/90978723:08
clarkbI'm forcing failure there to see if readelf shows the same reloc stuff on bullseye but not on bookworm23:09
clarkbya readelf doesn't show that reloc (or any other ones matching that egrep expression)23:25
clarkbfungi: I guess we should file a bug against debian?23:26
clarkbfungi: something along the lines of "we install package and the insmod fails with this error and dmesg records this other thing"23:26
clarkband my hunch is the issue is in either code changes to the newer version of openafs that produce that result or behavior of gcc 10 vs 1223:26
clarkbthe key_type_keyring code looks pretty similar between the two openafs versions doing some naive diffing23:29
clarkbspecifically the objects appear to be defined in src/afs/LINUX/osi_groups.c and that file isn't different between 1.8.6 and 1.8.923:29
clarkboh but there is conditional stuff based on the newness of the kernel supporting the keyring23:30
clarkbmake.log reports checking for exported key_type_keyring... yes for both kernel versions and builds23:31
clarkb(the file actually does have some small diffs in ifdef checks but they are just renamed flags and none of the flags are arm specific (they do seem arch specific so maybe we need an arch specific check for arm here))23:32
clarkbianw: do you recall if you ever ran into problems with reloc instructions for openafs' kernel module?23:55
