Thursday, 2025-09-04

opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Simplify testing of some upload roles  https://review.opendev.org/c/zuul/zuul-jobs/+/95940200:02
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Simplify testing of some upload roles  https://review.opendev.org/c/zuul/zuul-jobs/+/95940200:11
opendevreviewMerged zuul/zuul-jobs master: Simplify testing of some upload roles  https://review.opendev.org/c/zuul/zuul-jobs/+/95940200:24
*** liuxie is now known as liushy02:28
opendevreviewOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/95799502:30
*** bbezak_ is now known as bbezak06:22
*** liuxie is now known as liushy07:04
opendevreviewMerged openstack/diskimage-builder master: bootloader: Fix searching for grub-mkconfig  https://review.opendev.org/c/openstack/diskimage-builder/+/94949308:14
amorinhey team, are you aware of any recent ipv6 issues on openstack CI? for a few days I've been getting 503 errors that seem random and not related to my work09:49
amorine.g. https://zuul.opendev.org/t/openstack/build/f0ae336a1b2142a7930d68e46f3a965f09:50
frickleramorin: nothing I'm aware of, but it also doesn't look related to IPv6, like I'm seeing the same (or so it seems) failure on plain devstack here https://zuul.opendev.org/t/openstack/build/47e48f0c00614ea4b215ff407b35cdfc , would rather look like some flaky/racey test?10:42
amorinack, thanks, so if it's only me, it's very high possibility that this is my code change that is doing races :)10:44
fungibad news on the afs01.dfw upgrade. while it did successfully move mirror.ubuntu it ran out of space on afs02.dfw for mirror.ubuntu-ports12:30
fungiin retrospect, i should probably have inserted a vos release after each move, i bet that's what frees the temporary consumption12:31
fungii'll make sure to do that when i move them all back to their original homes12:31
fungianyway, it's in progress again12:37
Clark[m]If you do vos releases you need to hold the lock right? So maybe it does make sense to grab those on the mirror node for the next pass?13:29
amorinahah frickler, in mistral test, we do a request against: https://httpbin.org/encoding/utf814:00
amoringiving 503 at the moment14:00
fungiClark[m]: oh, good point, so maybe it's more a matter of pausing between the larger volume moves to let our scheduled processes execute a vos release on their own14:30
clarkbfungi: if you have time I have a whole bunch of changes that have come up while you were out that shouldn't require too much effort to review: https://review.opendev.org/c/opendev/system-config/+/959236 https://review.opendev.org/q/hashtag:"drop-bionic" https://review.opendev.org/c/zuul/zuul-jobs/+/958800 https://review.opendev.org/c/opendev/infra-manual/+/958571 note that some of14:47
clarkbthem also have children14:47
fungii'll take a look after lunch. trying to catch up on languishing mailing list moderation tasks first14:47
fungioh, also an additional stat, the mirror.ubuntu move last week took a little over 19 hours to complete15:04
fungii'm guessing the mirror.ubuntu-ports move will be a little faster owing to the removal of bionic-arm64 earlier15:04
fungigonna grab a quick lunch, bbiab15:21
opendevreviewMohammad Issa proposed openstack/project-config master: Add repo app-metallb for starlingx  https://review.opendev.org/c/openstack/project-config/+/95957418:14
clarkbthat change sets requireContributorAgreement = true18:15
clarkbI don't think we are doing that anymore and expect ci to -1 for that reason. but figured I'd call it out if we want to provide any special messaging18:16
opendevreviewMohammad Issa proposed openstack/project-config master: Add repo app-metallb for starlingx  https://review.opendev.org/c/openstack/project-config/+/95957418:18
fungilooks like they got the hint18:21
opendevreviewClark Boylan proposed opendev/system-config master: Update UA filter rules  https://review.opendev.org/c/opendev/system-config/+/95957618:32
opendevreviewMohammad Issa proposed openstack/project-config master: Add repo app-metallb for starlingx  https://review.opendev.org/c/openstack/project-config/+/95957418:33
clarkbside note on 959576 at least one UA is already blocked and shows up with a ton of 403 hits18:38
clarkbso they don't even back off when they get told to go away but at least that is far cheaper for the server to respond with than trying to process expensive requests for bots that don't identify themselves properly18:39
fungiyeah, at least it doesn't result in more load on the db19:08
fungiis there a way to do a quoted reply to an inline comment in gerrit where someone else has already replied? the ui only seems to let me reply to the most recent comment, not any earlier ones19:11
fungithe very bottom of the comment thread has a "quote" link but it only ever quotes the last comment in the thread19:11
opendevreviewMerged openstack/project-config master: Add repo app-metallb for starlingx  https://review.opendev.org/c/openstack/project-config/+/95957419:27
opendevreviewNicolas Hicher proposed zuul/zuul-jobs master: Refactor: multi-node-bridge to use linux bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/95939319:41
clarkbfungi: it uses email like quoting > foo19:59
clarkbbut no the UI doesn't give you an easy button like it does for earlier comments. But you can copy the content and prefix with >20:00
fungii didn't know if it did any sort of fancy linking or username reference beyond just the actual quote markers20:04
clarkbI don't think it does20:04
clarkbthe problem from the UI perspective is you can only respond to the last comment20:05
clarkbregardless of replies or quoting20:05
corvusthere actually is a comment thread, but it's really hard to piece together using the api (which is why gertty's implementation of it is not quite 100% yet)20:08
corvus(as in, replies do have a reference to their parent comment.)20:09
clarkbya and the web UI only lets you reply to the last comment in the thread. It doesn't do branching20:10
fungiclarkb: question on 95880920:23
funginever mind! i should have read the next change in that series first20:24
clarkbfungi: zuul-jobs automatic job management doesn't let non voting jobs go into the gate20:25
clarkbso when I made them nonvoting when changing the behavior of the test to enforce the behavior we want things got sad20:25
fungiyeah, mentioning that in the commit message might have helped, but no biggie20:25
clarkbah ya you weren't around for the day of wtf is going on debugging that happened. Sorry I should've included more detail there20:26
fungii re-read that commit message several times to make sure i wasn't missing some additional context20:26
fungii saw a bit of the discussion in here or in matrix (i forget which), but that was many beers ago for me now20:26
clarkbthe main source of the confusion was that the uwsgi image build worked. But eventually I realized that was because we were doing a multiarch build for it with a single arch listed20:27
clarkbthen it all sort of started to come together after some manual testing on a held node20:27
clarkbbut it was very confusing for a while20:27
fungiunderstandably confusing, it's quite complicated20:27
fungii'm still working to wrap my head around some of it20:28
clarkbas a side note: I think it is crazy that the default buildx builder doesn't honor the settings in the documented config file. I also think it is crazy that ssl cert management isn't simpler, but I learned long ago that docker really wants you to use things the prescribed way and anything else is good luck have fun20:28
fungistarting with not rolling your own network connectivity20:28
fungiif you pass a list as the condition for an ansible "when" directive, are the items tested as a logical and?20:33
opendevreviewMerged zuul/zuul-jobs master: Fix kubernetes install methods  https://review.opendev.org/c/zuul/zuul-jobs/+/95880020:33
clarkbfungi: yes listed conditions should all be satisfied to trigger the task/block20:34
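A minimal Python analogy for the answer above, not Ansible's actual implementation: a list passed to `when` behaves like a logical AND over its items, so the task runs only if every item is true. The function name and the sample variables are hypothetical.

```python
def when_list_passes(conditions):
    """Mimic a list-valued Ansible `when`: every item must be truthy."""
    return all(conditions)


# Hypothetical variables, loosely modeled on the roles discussed in this log.
container_command = "docker"
multiarch = False

# Both items hold, so the task would run.
print(when_list_passes([container_command == "docker", not multiarch]))
# One item fails, so the task would be skipped.
print(when_list_passes([container_command == "docker", multiarch]))
```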
opendevreviewNicolas Hicher proposed zuul/zuul-jobs master: Refactor: multi-node-bridge to use linux bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/95939320:37
opendevreviewMerged opendev/system-config master: Exclude django caches from lists.o.o backups  https://review.opendev.org/c/opendev/system-config/+/95923620:42
fungiclarkb: okay, now i have a question on 958783. i expect i'm missing some fundamental nuance of the change20:45
clarkbI just remembered to remove mirror02.sjc3.raxflex.opendev.org from the emergency file. This is now done20:45
opendevreviewMerged zuul/zuul-jobs master: Update registry tests to better cover speculative image builds  https://review.opendev.org/c/zuul/zuul-jobs/+/95880920:46
clarkbfungi: responded. Does that help?20:47
fungiah, yep, so it was being done needlessly before in single-arch jobs?20:49
clarkbfungi: no single arch builds used the default docker buildx builder. They didn't use the custom one.20:49
opendevreviewMerged opendev/infra-manual master: Drop the suggestion for using the x/ namespace  https://review.opendev.org/c/opendev/infra-manual/+/95857120:49
clarkbfungi: using the default builder doesn't work with our buildset registries because it seems to ignore the config file and even then we'd have to configure /etc/hosts or DNS and ssl certs to make it all work20:49
clarkbfungi: the multiarch builds use a custom buildx builder that solves all of those problems. So we can run single arch builds just like the multiarch builds with a custom builder. We just leave out the multiarch specific bits (the emulation essentially)20:50
fungioh, so this starts using the custom one for single-arch runs, and then you're just omitting steps that they don't need20:50
clarkbyes20:50
fungigot it20:50
clarkbhttps://review.opendev.org/c/zuul/zuul-jobs/+/958783/9/roles/build-container-image/tasks/main.yaml that's what this diff does essentially20:51
clarkbwhen container_command == docker we always use the custom buildx builder. Whereas before it split based on whether or not the build was multiarch20:51
fungiclarkb: and now a question on 95890620:54
fungiin 958907 (for bindep) you did increase it20:55
fungisadly some of the packaging improvements won't be possible until we also drop support for 3.8, not quite there yet20:57
clarkbfungi: responded20:58
clarkbI'm happy to go either way on it20:58
fungia bunch of the support for more modern metadata didn't show up until setuptools 77, which requires python 3.9 or later20:58
clarkbalso for some reason I thought that was the bindep change (which is probably evident in my response)20:58
clarkbI think I mixed up where I wanted to update python requires. I meant to leave it on bindep and bump it on git review20:58
clarkbessentially because bindep doesn't change anymore20:58
clarkbbut I'm happy to bump it for all of them if we think that is just simpler20:59
fungiyeah, there are things we can't do with bindep's packaging, as i just mentioned, until we're only supporting newer python, so this is a forcing function for us to slowly get there20:59
clarkback so the purpose behind it is to simplify and converge the packaging behavior/tooling on a modern common platform21:00
clarkbI'll update the git review change21:00
opendevreviewClark Boylan proposed opendev/git-review master: Drop testing on Bionic and Python36  https://review.opendev.org/c/opendev/git-review/+/95890621:00
fungithat's at least the driving concern for me. newer packaging needs newer deps which need newer python, so we're essentially stuck producing python 3.6 era packages as long as we want to support python 3.621:01
clarkbI guess this also goes back to my frustrations with python packaging breaking old stuff that was perfectly fine21:02
clarkbsimilar to how old openstack packages are no longer installable from pypi anymore or something21:02
fungiand then not backporting fixes for older setuptools, yeah21:03
clarkbruamel.yaml had a warning about this too but that seems to still work for now21:03
clarkbbasically if the old code isn't broken as it stands then we shouldn't be breaking it just because21:03
clarkbbut that is the approach most of the python ecosystem has taken and trying to swim against that tide is a losing battle21:03
fungithey introduced a regression that nobody pointed out before they dropped old python support in the next version, and then couldn't fix setuptools for people on the older python version they broke21:04
fungiwith the argument that such old python versions are eol upstream21:04
fungiso trying to continue supporting them is too much work21:04
clarkbit's literally more work to do what they've done...21:05
clarkbfungi: I'm noticing in https://review.opendev.org/c/opendev/system-config/+/959236/1/inventory/service/group_vars/mailman3.yaml I kept a trailing / I don't think it matters but none of the other values have a trailing /21:06
clarkbya those values get passed to borg --exclude21:07
fungii didn't spot that, i guess it's a question of whether borg thinks trailing slashes are special or just filters them out21:07
clarkbhttps://borgbackup.readthedocs.io/en/stable/usage/help.html#borg-patterns discusses patterns21:09
clarkbI don't see it handling the / special at all. I expect this will be fine and we can confirm after the next rounds of backups21:09
fungianyway, on the python_requires/requires-python front, i think the setuptools situation is why i'd rather take this opportunity to increase it when we drop testing for it, because otherwise we may drop support after merging a regression for older python and be unable to go back and fix that as easily21:10
clarkbfungi: # Exclude the contents of '/home/user/cache' but not the directory itself: when using -e /home/user/cache/ there21:10
clarkbso the diskcache/ dir will get backed up but none of its contents. I think that is fine21:10
fungiyeah, it won't really take up any room21:10
fungiat most its last updated time will have some churn21:10
clarkbya shouldn't be a big deal. But happy to push a change up to drop the / if we want21:11
fungino need in my opinion21:14
opendevreviewMerged zuul/zuul-jobs master: Always build docker images with custom buildx builder  https://review.opendev.org/c/zuul/zuul-jobs/+/95878321:14
fungias for preventing users from installing newer bindep on older platforms, the issue is that pip will end up automatically selecting a bindep not tested with python 3.6 when users `pip install bindep` there, while setting python_requires/requires-python higher will stop it from being auto-selected on platforms where it might have stopped working21:18
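A simplified sketch of the selection behavior described above, assuming each release declares only a minimum Python version. Real pip evaluates full `Requires-Python` specifiers; the function name and the release history here are hypothetical and are not bindep's actual releases.

```python
def newest_installable(releases, running_python):
    """Return the highest release whose minimum Python is satisfied.

    releases: list of (release_version, minimum_python) tuples.
    running_python: (major, minor) of the interpreter doing the install.
    """
    candidates = [
        version for version, minimum_python in releases
        if minimum_python <= running_python
    ]
    return max(candidates) if candidates else None


# Hypothetical release history, for illustration only.
releases = [
    ((1, 0, 0), (3, 6)),  # still declares support for Python 3.6
    ((2, 0, 0), (3, 8)),  # raised its minimum after dropping 3.6 testing
]

# On Python 3.6 the older, tested release is selected automatically.
print(newest_installable(releases, (3, 6)))
# On a newer interpreter the latest release is selected.
print(newest_installable(releases, (3, 12)))
```

If the second release had left its minimum at 3.6, an installer on Python 3.6 would pick it even though it was never tested there, which is the situation the higher bound is meant to avoid.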
fungii do wish pip had a middle ground and "recommended for use with" wasn't the same as "will only be installable on"21:19
opendevreviewClark Boylan proposed opendev/grafyaml master: Pull python base images from quay.io  https://review.opendev.org/c/opendev/grafyaml/+/95860121:20
clarkbwith 958783 landed I can clean up changes like ^ and recheck the others that failed21:21
fungithe python version support removal changes probably also warrant release notes, but we can worry about those when we get ready to release. they're fairly copy-paste one sentence deals anyway21:23
opendevreviewClark Boylan proposed opendev/lodgeit master: Pull base images from opendevorg rather than opendevmirror  https://review.opendev.org/c/opendev/lodgeit/+/95860221:24
fungiat least for bindep and git-review, i don't recall if we ever bothered to do them for glean21:24
clarkbI can't recall for glean21:24
fungii don't see any, don't think we did21:25
fungialso looks like we never added python_requires to the setup.cfg for glean?21:26
fungibut that's not something like a tool end users typically install directly, so probably fine21:26
clarkbprobably not, because of our reliance on it and the slowness with which we remove diskimages I don't think we've run into problems there21:27
clarkbwe could potentially here with bionic testing going away before we remove bionic nodes but we're actively trying to remove bionic nodes and that chicken and egg has always existed with glean so I expect it's mostly fine21:27
fungiright, i'm not overly concerned21:27
clarkbfungi: were you going to review https://review.opendev.org/c/opendev/glean/+/953163/2 as well?21:29
clarkbI think I'm happy to approve that as is if not21:29
fungiyeah, i'm looking at it already21:29
fungiseems like the dib change it depends-on has already merged21:29
opendevreviewMerged opendev/bindep master: Drop Bionic testing  https://review.opendev.org/c/opendev/bindep/+/95890721:29
clarkbfungi: yes there was a lot more momentum on the dib side to get those new jobs in place to test centos 1021:31
fungicool, well this finishes it off i guess21:32
opendevreviewMerged opendev/git-review master: Drop testing on Bionic and Python36  https://review.opendev.org/c/opendev/git-review/+/95890621:41
fungilooking like i'll probably be afk by the time the mirror.ubuntu-ports move completes, but assuming it does then i can likely knock out the afs01.dfw upgrade to noble tomorrow, as well as the afsdb and kdc servers, and get the rw volume moves back to afs01.dfw underway in preparation to upgrade afs01.ord and afs02.dfw early next week21:45
fungithough even if the volume moves start tomorrow, i expect it'll be tuesday at the earliest before they're finished21:46
clarkbmakes sense given how long it took the first pass through21:49
opendevreviewNicolas Hicher proposed zuul/zuul-jobs master: Refactor: multi-node-bridge to use linux bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/95939321:55
clarkbcorvus: I think the image builds are a bit slower now. Not the end of the world I just wanted to make note of it (not surprising given the few extra steps we're taking)22:06
opendevreviewClark Boylan proposed opendev/system-config master: Build gerrit image with python base from quay.io  https://review.opendev.org/c/opendev/system-config/+/95859722:10
opendevreviewClark Boylan proposed opendev/system-config master: Pull hound's base python image from quay  https://review.opendev.org/c/opendev/system-config/+/95859322:11
corvuslike pulling the builder image and starting it?  or is this a situation where we've got too many ansible tasks and we should just squash them into a script?22:12
opendevreviewClark Boylan proposed opendev/system-config master: Build ircbots with base python image from quay.io  https://review.opendev.org/c/opendev/system-config/+/95859622:13
opendevreviewClark Boylan proposed opendev/system-config master: Pull python base image for statsd metric reporters from quay.io  https://review.opendev.org/c/opendev/system-config/+/95859422:16
clarkbcorvus: yes I think it is all of the extra pulls and pushes22:16
clarkbcorvus: we also have the temporary registry that we pull into from the custom buildx builder22:16
corvusi think that was due to multi-arch; i wonder if modern skopeo could accomplish a transfer that retains all the info.  that might be a thread to pull on.22:17
opendevreviewClark Boylan proposed opendev/system-config master: Update jinjia-init and gitea-init to modern image build tooling  https://review.opendev.org/c/opendev/system-config/+/95859822:18
clarkbcorvus: oh something like tell skopeo to directly transfer from the custom builder to the buildset/intermediate registries?22:18
clarkbya that could be quicker if possible22:19
corvusyep22:19
clarkbin any case all of these changes I just updated/rechecked should be good exercise of the updated system22:19
clarkband extra proof that this now works as expected for docker22:19
fungilooking at the graphs, mirror.ubuntu-ports is roughly 77% the size of mirror.ubuntu at the moment, so projecting completion time around 03:25 utc22:31
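The projection above can be reproduced roughly as follows. This is a sketch that assumes the move restarted at 12:37 UTC (when it was reported as in progress again) and that transfer time scales linearly with volume size; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def project_completion(start, reference_duration, size_ratio):
    """Scale a known transfer duration by relative volume size."""
    return start + reference_duration * size_ratio


# Assumption: the mirror.ubuntu-ports move restarted at 12:37 UTC.
start = datetime(2025, 9, 4, 12, 37)
# mirror.ubuntu took "a little over 19 hours"; 19 hours is used as a floor.
eta = project_completion(start, timedelta(hours=19), 0.77)
print(eta)
```

With exactly 19 hours as the reference the estimate lands shortly after 03:00 UTC on 2025-09-05; a reference slightly above 19 hours moves it toward the 03:25 figure quoted above.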
fungiif i'm still awake i'll check in on it22:32
fungibut doesn't look like i'll be able to proceed with upgrades until tomorrow morning my time in any case22:33
fungialso a heads up, mirror.centos-stream seems to be over 95% of its quota, so could fill up pretty easily22:34
clarkbno rush. I've got to pop out this evening for school thing myself22:34
clarkbfungi: yes one of the issues with centos mirroring (and why I'm wary of mirroring it more) is that they don't remove a lot of old packages like debuntu do22:35
clarkband some of the packages are quite large (like firefox and thunderbird iirc) so things can grow unwieldy22:35
fungiso basically the only culling that happens for it is when we drop entire release series22:36
clarkbyup22:36
clarkbI think some packages do get deleted when replaced, but not all of them. And notably the very large packages don't22:37
opendevreviewMerged opendev/glean master: Drop testing on Bionic and Xenial  https://review.opendev.org/c/opendev/glean/+/95890922:39
clarkbhttps://mirror.dfw.rax.opendev.org/centos-stream/9-stream/AppStream/aarch64/os/Packages/?C=S;O=A and https://mirror.dfw.rax.opendev.org/centos-stream/9-stream/AppStream/x86_64/os/Packages/?C=S;O=A illustrate the problem (scroll to the bottom)22:40
clarkbhttps://mirror.dfw.rax.opendev.org/centos-stream/9-stream/CRB/x86_64/os/Packages/?C=S;O=A this is a good one. There is like 25GB of unneeded data right there22:41
fungilooking at https://zuul.opendev.org/t/openstack/build/1f72b674eac0472fb116dfcef250f0f3 it's unclear to me whether "ensure-dib: Check if diskimage-builder is installed" should be reworked to not present a failure state22:42
fungi"Wait for server to boot or fail" later looks like the actual issue22:42
clarkbits 5 major versions of dotnet sdk with multiple versions of each one except for 1022:42
clarkbfungi: that test is building an image with dib (which it appears to have done), uploading to the openstack, then booting with openstack and waiting for the boot to finish. This last step appears to have failed. Not sure about the diskimage-builder is installed question22:44
clarkbI suspect we may not be waiting long enough for things to boot in all cases and may need to increase the time we wait there22:44
fungithe n-cpu log does have a bunch of tracebacks in it, but i have no idea whether that's normal22:45
fungiwell, not a bunch, just a few. and they look like slow rabbitmq maybe22:45
clarkboh I see if you look at the console log it's doing a check to see if dib is installed then installing dib. I think that is just noise and an artifact of how things get rendered. It's probably a failed when false situation22:46
fungiright, i was just wondering if that task should be reworked to not exhibit a failure result, since it also ends up in the summary as the likely cause for the job failure (which it isn't)22:46
fungior, well, as a likely cause anyway22:47
fungiand yes, ultimately noise22:47
clarkbthe task is marked ignore_errors: true22:48
clarkbmaybe it needs to be failed_when: false to not set the status to failed and then ignore that status22:48
clarkbthe boot timeout is ~20 minutes. Its 120 checks with a 10 second delay each time22:48
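The arithmetic behind the "~20 minutes" figure, as a small sketch. It counts only the delay between checks and ignores the time each check itself takes; the function name is hypothetical.

```python
def max_wait_minutes(checks, delay_seconds):
    """Upper bound on polling time from retry count and per-retry delay."""
    return checks * delay_seconds / 60


# 120 checks with a 10 second delay each.
print(max_wait_minutes(120, 10))
```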
fungior maybe the dashboard could also filter out its typical failure identification if ignore_errors is true on the task22:48
clarkboh except it logs that it only tried 100 attempts22:49
clarkboh! the server entered an ERROR status with nova that is why22:49
fungiaha, that'd do it22:50
fungimaybe still a timeout, just one internal to the scheduler/controller?22:50
clarkbya I think Instance failed to spawn: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5ef90e0256e94292986613ba4ae98e64 is why it entered an ERROR state with nova22:50
fungiokay, so those exceptions that i mistook for slow rabbitmq responses may have been an actual indication22:51
clarkbyes I suspect this is a sad cloud more than anything else22:52
clarkbcorvus: its a bit late to do anything about ze11 now. But I'm wondering if we test it one more time then maybe redeploy it?22:53
corvusoh yeah that22:53
clarkbcorvus: the interesting thing is you just deployed ze11 on noble recently and that would've built up the git caches from scratch and been fine at that time so something definitely changed with ipv6 on that instance but maybe the easiest thing to do is move on like we did with the opendev.org haproxy22:53
corvustest cloning now22:54
fungioh, i missed this bit of fun... is it exhibiting the same symptoms?22:54
corvusstill slow22:55
clarkbfungi: no it never seems to drop the connection (and it is in rax not vexxhost), but it clones nova in about 12-13 minutes which is several minutes longer than our 10 minute timeout22:55
clarkbfungi: testing via ipv4 on ze11 and ipv6 on ze01 we get clones in the 3-4 minute range which is what we expect (and gives our timeout a typical 2.5x buffer)22:55
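A short sketch of the buffer arithmetic quoted above, using the clone times mentioned in this discussion. The function name is hypothetical; a ratio below 1.0 means the clone would exceed the timeout.

```python
TIMEOUT_MINUTES = 10


def timeout_buffer(clone_minutes, timeout_minutes=TIMEOUT_MINUTES):
    """Ratio of the timeout to the observed clone time."""
    return timeout_minutes / clone_minutes


# Expected range: 3 to 4 minute clones leave a comfortable margin.
print(timeout_buffer(4))
print(timeout_buffer(3))
# Observed over ipv6 on ze11: 12 to 13 minutes, past the timeout.
print(timeout_buffer(12))
```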
fungiah, okay so totally different problem then22:55
clarkbbut the node would've cloned nova under the timeout when it was rebuilt onto noble recently22:56
clarkbso within the last couple of months its ipv6 connection to review (and possibly elsewhere) is slower than expected22:56
clarkbwe noticed because for some reason that executor decided it needed to reclone nova then entered a failure loop that actually impacted other jobs running on it (as their git stuff got backed up behind the things trying to clone nova)22:57
clarkbso we turned off ze11 (and it's been off since, with occasional manual tests to see if it is any better)22:57
clarkber we turned off the executor container on ze11 but the host is running and in the emergency file22:57
clarkbcloudnull: ^ is slower than expected ipv6 connectivity for specific hosts in rax classic something you'd be interested in digging into?22:58
clarkbcloudnull: tl;dr is we have a host that clones nova from review.opendev.org slower than other hosts in the same region. And forcing ipv4 also seems to provide consistent behavior with the other hosts22:59
clarkbbut its tough to say the problem is in rackspace, it could be in vexxhost. Or it could be some intermediate router/pathway on the internet...22:59
opendevreviewTim Burke proposed openstack/project-config master: update_constraints.sh: Better describe what we're skipping  https://review.opendev.org/c/openstack/project-config/+/95962823:04
corvusclarkb:  8:41.02elapsed23:04
corvusnot good enough to return to service i think23:04
clarkbcorvus: interesting that it is faster. So maybe whatever the issue is is not a consistent or persistent problem?23:07
clarkblike if it is packet loss maybe the amount of packet loss has reduced or something23:07
clarkbcloudnull: I guess if this is the sort of mystery you would be interested in running down I think we're willing to help. Otherwise we may recycle the node and see if we get better results with a new one23:19
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/959576 is a good one to look at tomorrow too if we think that will help lists performance. Note I don't want to land it tonight as that will affect all services using the filter so has potentially broad impact and I have parent teacher meeting things tonight23:24
fungiyeah, would be better to have someone on hand to edit/roll back in case it blocks legit users we're not aware of23:28
