Wednesday, 2024-11-20

fungiup to 48 uncaught bounce notifications to openstack-discuss-owner now00:49
fungihopefully should see some users with disabled subscriptions soon00:50
opendevreviewDr. Jens Harbott proposed opendev/irc-meetings master: Typo fix for eventlet-removal meeting  https://review.opendev.org/c/opendev/irc-meetings/+/93574409:03
opendevreviewMerged opendev/irc-meetings master: Typo fix for eventlet-removal meeting  https://review.opendev.org/c/opendev/irc-meetings/+/93574411:35
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404512:05
fungiopenstack-discuss-owner has now received 112 uncaught bounce notifications. more than 5 per address for most of the defunct subscribers. i would have expected to see their subscriptions disabled by this point, so will look closer in a bit14:19
fungihuh, so there are subscribers with nonzero bounce scores, but checking a sample of the ones i'm getting uncaught bounce notifications for they all have bounce scores of 0. i guess that's what the "uncaught" means, i think they are set up as forwards, so the ndr which comes back isn't for the same address as what's in the subscription. i guess i need to try to correlate them to14:25
fungisubscribed addresses (luckily 95% are red hat addresses and they only seem to differ by the domain part), then manually disable delivery for those14:25
fungias for bounce scores, the highest i'm seeing for any subscriber is 2, i think it must only increment them at most once a day14:27
fungiso it will take until the weekend to reach the necessary threshold for disablement on the ones that are bouncing sensibly14:28
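A rough sketch of the arithmetic behind "until the weekend", assuming the score rises by at most one per day and that disablement happens at a threshold of 5. The threshold value and the helper name `projected_disable_date` are assumptions for illustration; the log does not state the configured value.

```python
from datetime import date, timedelta


def projected_disable_date(today, current_score, threshold, per_day=1):
    """Return the first date the score could reach the threshold.

    Assumes the score grows by `per_day` (a positive integer) each day
    starting tomorrow, which is a simplification of real bounce scoring.
    """
    remaining = max(0, threshold - current_score)
    days = -(-remaining // per_day)  # ceiling division
    return today + timedelta(days=days)


# Highest observed score is 2 on Wednesday 2024-11-20; threshold of 5 is assumed.
d = projected_disable_date(date(2024, 11, 20), current_score=2, threshold=5)
print(d.isoformat())  # 2024-11-23, a Saturday
```

With those assumptions, three more daily increments are needed, which lands on the Saturday.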
karolinku[m]Hey folks, I'm working on adding CS10 support in DIB; recently some issues with architecture appeared, so I'm testing it on the nested-virt label. Unfortunately, it looks like devstack jobs can't deal with nested virtualization https://github.com/openstack/devstack/blob/72f99641f15464dca45e42ab0bdae9d3e0cbbe0f/.zuul.yaml#L349-L350, so I wanted to try devstack's var LIBVIRT_TYPE=kvm. So the question is, can I somehow, in a tricky way, inject this14:51
karolinku[m]variable from DIB repo level to devstack?14:51
fungikarolinku[m]: there should be numerous examples of devstack-based jobs which override its envvars, for example in the openstack/neutron repo... i'll find you one15:16
fungikarolinku[m]: i think this is the sort of thing you're looking for? https://opendev.org/openstack/neutron/src/branch/master/zuul.d/base.yaml#L55-L5815:17
fungibasically, child jobs inheriting from e.g. the devstack-minimal parent job then insert values into devstack's localrc via the devstack_localrc array15:18
fungithis might also be a question to bring up with the devstack maintainers in #openstack-qa15:19
karolinku[m]yeah, that may be something I need. Thanks for tips!15:19
fungiany time!15:20
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404515:27
corvusfungi: but shouldn't the bounce come from a verp which should match a subscribed address?15:27
fungicorvus: for handled ("caught") bounces yes15:36
fungiuncaught bounce notifications come with the following preface:15:37
fungi"The attached message was received as a bounce, but either the bounce format was not recognized, or no member addresses could be extracted from it.  This mailing list has been configured to send all unrecognized bounce messages to the list administrator(s)."15:37
fungiinterestingly, the "to" address in the ndr is <openstack-discuss-bounces+someuser=somedomain@lists.openstack.org> where someuser@somedomain does correspond to a list subscriber, so it's unclear why mailman is unable to map them back15:40
fungithey seem to all be redhat.com or linux.vnet.ibm.com addresses which were at some point disabled or deleted at the receiving mta, but the way they're being bounced back seems to confound mailman's bounce processing15:43
corvusfungi: yeah, that's the part that's weird.  the "to" verp should be authoritative.15:44
corvusi would only expect that message in response to a message to <openstack-discuss-bounces@lists.o.o>, not to one with a verp.15:44
corvusmakes me wonder if the verp bounces are not being delivered to mm in a way that it understands they are verp bounces.15:45
fungibut it definitely is incrementing bounce counts for other subscribers, just not these15:45
corvusfungi: or the "to" address does not match the envelope-to (rcpt to)15:46
fungiit seems to though, expanding the full headers of the attached message/rfc822 part15:48
fungiReceived: from ... by lists01.opendev.org .. for openstack-discuss-bounces+someuser=somedomain@lists.openstack.org15:49
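A minimal sketch of the VERP mapping being discussed, assuming the address shape quoted above (`<list>-bounces+user=domain@listhost`). The function name `verp_to_subscriber` is hypothetical and this is not Mailman's own parsing code, which is more general; it only illustrates why the recipient address alone is enough to identify the subscriber.

```python
def verp_to_subscriber(recipient, list_name="openstack-discuss"):
    """Map a VERP bounce recipient back to a subscriber address, or None.

    Illustrative only: assumes the '<list>-bounces+user=domain@host' shape.
    """
    local, at, _host = recipient.strip("<>").rpartition("@")
    prefix = f"{list_name}-bounces+"
    if not at or not local.startswith(prefix):
        return None  # plain -bounces address or something unrelated
    encoded = local[len(prefix):]
    # Split on the last '=' since the domain part cannot contain one.
    user, eq, domain = encoded.rpartition("=")
    if not eq or not user or not domain:
        return None
    return f"{user}@{domain}"


print(verp_to_subscriber(
    "<openstack-discuss-bounces+someuser=somedomain@lists.openstack.org>"
))  # someuser@somedomain
print(verp_to_subscriber("openstack-discuss-bounces@lists.openstack.org"))  # None
```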
opendevreviewMarios Andreou proposed opendev/irc-meetings master: Update Watcher team meeting information  https://review.opendev.org/c/opendev/irc-meetings/+/93580615:59
fungihttps://docs.mailman3.org/projects/mailman/en/latest/src/mailman/model/docs/bounce.html has a fairly detailed explanation of how bounce processing works, but unfortunately doesn't explain why these specific bounces weren't caught16:01
corvusmaybe something in the logs?16:02
corvus0xb1/0xff = blob delete 69% complete16:03
fungiyeah, i'm poring through /var/lib/mailman/core/var/logs/bounce.log currently16:03
fungilots of entries for "VERPed bounce message but not a recognized DSN" which seem to correspond16:03
fungilooks like mailman tries to suss out whether the bounce was a temporary or permanent condition from the ndr text, and https://gitlab.com/mailman/mailman/-/merge_requests/913 switched things so that if it can't figure it out then it assumes it's non-permanent16:07
fungitrying to work out whether flufl.bounce has an accessible vcs presence somewhere16:12
corvusi wouldn't expect a bounce to represent a temporary condition; i'd expect that to cause our local mta to queue.  iow, shouldn't any verp bounce be considered permanent?16:13
fungiapparently some e.g. vacation autoresponders reply to the verp address16:13
fungihence the above mr16:13
corvusof course they do.16:14
fungihttps://gitlab.com/warsaw/flufl.bounce/-/tree/master/flufl/bounce/_detectors?ref_type=heads16:14
fungithat's where the various dsn matchers live16:14
corvuscynical-corvus says score them as bounces anyway; autoresponders shouldn't respond to lists at all.16:15
fungiyeah, per https://www.rfc-editor.org/rfc/rfc5230.html#section-4.6%3E16:16
fungier, https://www.rfc-editor.org/rfc/rfc5230.html#section-4.616:16
fungiImplementations SHOULD NOT respond to any message that contains a "List-Id" ...16:17
fungianyway, it looks like people do submit patters for additional dsn formats at https://gitlab.com/warsaw/flufl.bounce/-/merge_requests?scope=all&state=all16:20
fungis/patters/patterns/16:20
fungie.g. https://gitlab.com/warsaw/flufl.bounce/-/merge_requests/15/diffs16:21
JayFI had more unread emails at one time in my inbox than I have for a decade, thanks bounce-processor-email-thingy /s :P hehe16:23
* JayF has setup a filter but it's just funny16:23
fungii guess we could even shoehorn temporary flufl.bounce patches into our mailman container image builds and then push an mr once we see they're working16:23
JayFI'm very sad to learn that is a pypi package and not a domain with a fun TLD16:24
fungiat least this explains why we're seeing them from specific domains and not others16:25
corvusthe more i think about it, the more i'm convinced it was just a wrong path to go down.  it neuters the point of verp.  if there's an option to enable rfc-compliant bounce processing at the cost of vacation bounces increasing the score, i would be in favor of that.16:30
corvusanother argument for that: ^ vacation responses should only add one point, and that should roll off.  if we get 5 vacation response bounces, then, honestly, seriously, that address should be removed from the list.16:30
fungiyeah, i guess an option to toggle https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/runners/bounce.py?ref_type=heads#L68 could do that, then it at least still comes with some logging and notification of those mismatches16:37
corvusi can't believe they changed that with no accompanying toggle.  i mean, i'm just waking up from a long slumber, but it seems to me that single-handedly (and i mean single-handedly -- where's the code review on that change?) undid mailman's near-perfect bounce handling (that it has had for decades!)16:40
corvusthat change is just wrong.16:41
clarkbcatching up: do we think the bounces we're getting are vacation bounces so are handled "properly" according to mm3 rules? Or are they just caught up in buggy processing related to that?16:42
corvusno they're caught in buggy processing16:43
corvusthe whole point of verp was that parsing every dsn any mta on the internet can produce is a losing proposition.  it would never be complete.  but with verp you don't need to do that.16:43
corvusbut this change has basically said "verp is not enough to detect a bounce, we still have to process the dsn".16:43
clarkbah16:43
corvusi think what it effectively does is say that in order to remove someone from a list, we have to in all circumstances, recognize that a dsn says there is a permanent failure.  then verp can be used to get the address so that we don't have to textually extract the address from the dsn.16:45
fungithese are dsn messages from mimecast, quobyte, and similar third-party hosting/filtering services stating that the account is disabled or does not exist16:46
corvusso basically, instead of "verp means we got a permanent bounce and can remove the member, full-stop" we now have "verp is a helper so that we don't have to fully parse a dsn, but we still have to recognize what it's trying to tell us".16:46
fungiand yeah, they don't match the recognized patterns in flufl.bounce so mailman is erring on the side of (excessive) caution in not incrementing the bounce score and instead forwarding the dsn to the list owner16:47
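The two policies contrasted above can be sketched side by side. This is a simplified model under stated assumptions, not Mailman's implementation: the names `Verdict`, `verp_only_policy`, and `verp_plus_dsn_policy` are hypothetical, and `dsn_permanent` is `None` when the DSN text matches no known pattern.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SCORE = "increment bounce score"
    FORWARD = "forward to list owner as uncaught"


def verp_only_policy(subscriber: Optional[str],
                     dsn_permanent: Optional[bool]) -> Verdict:
    """Any message to a VERP address that maps to a subscriber counts."""
    return Verdict.SCORE if subscriber else Verdict.FORWARD


def verp_plus_dsn_policy(subscriber: Optional[str],
                         dsn_permanent: Optional[bool]) -> Verdict:
    """VERP only supplies the address; the DSN must still be recognized
    as a permanent failure, and an unrecognized DSN is not scored."""
    if subscriber and dsn_permanent is True:
        return Verdict.SCORE
    return Verdict.FORWARD


# A bounce with a valid VERP recipient but an unrecognized DSN format:
case = ("someuser@somedomain", None)
print(verp_only_policy(*case).name)      # SCORE
print(verp_plus_dsn_policy(*case).name)  # FORWARD
```

The second function matches the behaviour described in the channel: the address is known, but the score is left alone because the DSN was not recognized.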
corvusif they want to do this, they should probably just call out to claude and ask if it's a permanent error.16:50
corvusthat's the only way this doesn't turn into whack-a-mole.  which is exactly what the world of list processing was before verp.16:50
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation  https://review.opendev.org/c/opendev/system-config/+/93581217:08
opendevreviewMerged zuul/zuul-jobs master: Cap the ansible version used by ansible-lint  https://review.opendev.org/c/zuul/zuul-jobs/+/93572617:11
clarkbthe above change and remote:   https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/935813 Update system packages and reboot when building centos openafs should hopefully make openafs arm64 package builds on centos less flaky17:13
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation  https://review.opendev.org/c/opendev/system-config/+/93581217:17
opendevreviewMerged zuul/zuul-jobs master: Support new style mirror_info in use-docker-mirror  https://review.opendev.org/c/zuul/zuul-jobs/+/93572217:30
opendevreviewJay Faulkner proposed openstack/diskimage-builder master: [gentoo] Fix+Update CI for 23.0 profile  https://review.opendev.org/c/openstack/diskimage-builder/+/92398517:33
clarkbfungi: I think https://zuul.opendev.org/t/openstack/build/0a5188f5db5c435eb2dcc4039c29ab5a/console might show that the rpm dkms packaging for openafs is kernel specific?19:05
clarkbwhereas ubuntu/debian packages is more flexible?19:05
clarkbinfra-root anyone else want to weigh in on https://review.opendev.org/c/openstack/project-config/+/935725 to disable use of docker proxy caching so that we can hopefully get more reliable use of docker again?19:28
clarkbthe depends on has merged19:28
clarkbfungi: since package upgrades and rebooting may not fix things for openafs package builds/installs do we want to go ahead and start a new arm64 centos 9 image build and/or make that job non voting?19:33
clarkbmostly I don't want to get hung up on this problem since it is somewhat orthogonal and has been ongoing. I'd rather we find a way forward19:33
fungisorry, pulled in too many directions at once. i don't think the noble openafs upgrades are super urgent, so if just waiting a little longer to let things work themselves out is an option, i'm in favor19:52
clarkbok I'll trigger an arm64 image rebuild now19:57
clarkbmore generally though it has been frustrating that anytime we trigger those jobs we're forced to wait for an image rebuild. I think I would be in favor of making that specific job nonvoting19:58
fungii would +2 that change19:59
clarkbwe are currently building an ubuntu image so may be a while to get through whatever build requests are queued up too20:00
* clarkb will push that up20:00
opendevreviewClark Boylan proposed opendev/system-config master: Make system-config-zuul-role-integration-centos-9-stream-arm64 nonvoting  https://review.opendev.org/c/opendev/system-config/+/93582820:04
opendevreviewJay Faulkner proposed openstack/diskimage-builder master: [gentoo] Fix+Update CI for 23.0 profile  https://review.opendev.org/c/openstack/diskimage-builder/+/92398520:09
clarkbfungi: any opinion on https://review.opendev.org/c/openstack/project-config/+/935725 ? I'm hoping that will make things less likely to fail when doing anything with docker20:34
fungilgtm, depends-on has already merged20:36
clarkbcourse now that I've said that I have to climb up a roof to fix a leak20:37
clarkbif it goes sideways feel free to revert quickly :) otherwise I'll check in when I can20:37
fungisure, i'm around, and not getting on a roof in the coming hours (afaik anyway)20:38
opendevreviewMerged openstack/project-config master: Disable docker hub mirror use in jobs  https://review.opendev.org/c/openstack/project-config/+/93572520:47
Clark[m]Any sense yet if ^ is helping (or at least doesn't make it worse)? I need to turn laptop on and recheck some changes21:46
fungii've seen no complaints yet, though it's rather soon21:48
Clark[m]There is a lodgeit change to recheck and corvus' nodepool rix21:49
Clark[m]*fix21:49
clarkbhttps://review.opendev.org/c/opendev/lodgeit/+/935712/3 and https://review.opendev.org/c/zuul/nodepool/+/935820 and https://review.opendev.org/c/zuul/zuul-jobs/+/849989 have been rechecked21:56
clarkbI see a bug22:10
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Fix dockerhub check in use-docker-mirror role  https://review.opendev.org/c/zuul/zuul-jobs/+/93583722:14
clarkbinfra-root corvus fyi ^22:14
clarkbsaw that here: https://zuul.opendev.org/t/zuul/build/e9afc70473a142428cc8c4594319b80f22:14
corvusclarkb: aprvd22:22
clarkbthanks22:24
ianwre:  935812,2 there's two boots, but both seem to be into 5.14.0-529.el9.x86_64 (https://753af432ac11edc3fa55-e24395b1f226ab7bf437be1dd808e069.ssl.cf1.rackcdn.com/935812/2/check/system-config-zuul-role-integration-centos-9-stream/4560d78/messages.txt)22:42
ianwwhich is actually the latest anyway -> https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/Packages/kernel-5.14.0-529.el9.x86_64.rpm22:44
ianwohhhh, this is x86, where the image is up to date22:44
ianwok, on arm64 it seems like both boots were into 527 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4b7/935812/2/check/system-config-zuul-role-integration-centos-9-stream-arm64/4b7c3f8/messages.txt22:45
ianw2024-11-20 19:06:56.696207 | TASK [DNF Update]22:46
ianw2024-11-20 19:06:58.818939 | base | ok: Nothing to do22:46
ianwseems unlikely22:46
ianwhttps://mirror.iad.rax.opendev.org/centos-stream/9-stream/BaseOS/aarch64/os/Packages/kernel-5.14.0-529.el9.aarch64.rpm exists22:50
clarkboh weird22:59
opendevreviewIan Wienand proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation  https://review.opendev.org/c/opendev/system-config/+/93581223:01
ianwtry with some debugging23:01
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Fix dockerhub check in use-docker-mirror role  https://review.opendev.org/c/zuul/zuul-jobs/+/93583723:06
clarkbcorvus: ^ yaml got me there23:06
clarkbhaving a leading quote but then not quoting the entire string made the yaml parsing sad23:06
ianwclarkb: it looks like it does see the updated packages, and i think it likely installs them @ https://zuul.opendev.org/t/openstack/build/d2751de1692e48bdb0eb2c6a5256fac4/console (roles-test/pre.yaml) 23:18
ianwbut i suspect it's perhaps not booting into the new kernel :/23:18
ianwas for Failed to download packages: krb5-pkinit-1.21.1-4.el9.x86_64: Cannot download, all mirrors were already tried without success23:22
ianwhttps://mirror.iad.rax.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/Packages/krb5-pkinit-1.21.1-4.el9.x86_64.rpm23:23
clarkbI wonder if something about uefi and grub and gpt isn't happy in those images?23:24
clarkbor maybe kvm is bypassing grub and picking a kernel for us?23:25
opendevreviewMerged zuul/zuul-jobs master: Fix dockerhub check in use-docker-mirror role  https://review.opendev.org/c/zuul/zuul-jobs/+/93583723:25
ianwyeah it's kind of intersection of dnf/grub/disk-image-builder ... i don't have an immediate answer23:29
ianwnor why that 1.21.1-4 package would fail to download.  it's definitely in the mirror that test ran in (https://mirror.sjc3.raxflex.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/Packages/)23:30
ianwunfortunately any useful debugging has been swallowed :/23:30
clarkbthe internet suggests it could be related to bootloaderspec23:44
clarkbbasically grub doesn't update if you enable bootloaderspec23:44
clarkband at some point fedora went to only use bootloaderspec. Maybe centos 9 stream is similar?23:45
clarkblooks like maybe people are complaining about that in rocky linux too so ya maybe this is related23:45
clarkbI suspect we're using grub because dib uses grub but then dnf doesn't touch grub because of bls? maybe we should do a grub mkconfig and call it a day23:46
clarkblooks like current centos builds fail on dns lookups for opendev.org when caching git repos... I was looking there to find paths for grub-mkconfig23:47
clarkb`grub2-mkconfig --update-bls-cmdline -o /boot/grub2/grub.cfg` appears to be what we're running in image builds. I'll do that in ansible and see if that helps23:48
ianw... ohhh ... unbound issues?  that would explain that download error23:50
clarkbianw: the dns lookup issue was on nb04 but ya that should run an unbound too23:51
clarkband maybe it's similar problems from that test node23:51
ianw++ on testing that ... grub+efi+arm64+dib images == basically a black box for me :)  23:51
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation  https://review.opendev.org/c/opendev/system-config/+/93581223:52
clarkbI think maybe ns04 is not resolving things23:53
clarkbnow to figure out hostkey things from sshfp records since I'm on a non home network on a laptop that hasn't ssh'd there before23:54
clarkb#status log Restarted nsd on ns0423:57
opendevstatusclarkb: finished logging23:58
clarkbit looks like maybe nsd is trying to startup and bind to the external address before it is available on the system post boot23:59
