Thursday, 2025-09-11

opendevreviewOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/95799502:43
*** liuxie is now known as liushy05:47
*** ykarel_ is now known as ykarel07:00
*** dmellado2 is now known as dmellado07:18
*** iurygregory_ is now known as iurygregory13:10
clarkbcorvus: frickler for https://review.opendev.org/c/openstack/project-config/+/960144 do we want to push a new patchset setting max width to 90 or just go with the current code. I think I'm fine with the current change. I actually set my gerrit diff line width to 80 so the current version is slightly better for me. Also keep in mind this is a one time transition. I'm not sure the14:06
clarkblonger line width helps too much beyond making this initial diff smaller?14:06
stephenfinclarkb: Apologies if I asked/suggested this before, but how hard would it be to set up a pre-commit-mirror org on opendev and automatically mirror popular pre-commit plugin repos there? Would that help address your concerns around missing pip caches?14:34
fungii'd prefer shorter lines, personally, but just because i use 80-column terminals everywhere14:35
fungistephenfin: git mirrors, yes?14:36
stephenfinyes14:36
fungithey could be added as remote connection projects in zuul and then pushed to workspaces as required-projects14:36
opendevreviewMerged openstack/project-config master: Line-wrap projects.yaml but strip trailing spaces  https://review.opendev.org/c/openstack/project-config/+/96014414:52
*** dhill is now known as Guest2632815:14
clarkbstephenfin: I think I'm mostly concerned about it from a "its bad design and tool choice" standpoint. If projects see failures from github access that they want to mitigate then using zuul to add the repos as required projects would probably address that15:19
fungiout for a quick lunch and then going to work on more afs/kerberos server upgrades15:19
clarkbI'm going to quickly rerun some of those fio checks just to see if things remain consistent hours later15:21
fungimirror.centos-stream volume move completed in 16 hours btw, mirror.debian-security has been in progress for a few hours now (since 12:12 utc)15:22
fungibbiab15:22
clarkbmirror.dfw.rax.opendev.org randread: `read: IOPS=19.8k, BW=77.3MiB/s (81.0MB/s)(2318MiB/30003msec)` read: `read: IOPS=142k, BW=556MiB/s (583MB/s)(16.3GiB/30004msec)`15:23
clarkblists: randread: `read: IOPS=761, BW=3048KiB/s (3121kB/s)(89.6MiB/30105msec)` read: `read: IOPS=20.5k, BW=80.3MiB/s (84.2MB/s)(2413MiB/30059msec)`15:24
clarkbit's interesting that lists sequential reads are on par with mirrors' random reads15:24
clarkbbut also these numbers appear to be consistent with yesterday's.15:25
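To make the comparison above easier to repeat, here is a minimal sketch that pulls the IOPS and bandwidth figures out of fio summary lines like the ones pasted in the channel. The function name `parse_fio_summary` and the unit tables are illustrative assumptions, not part of fio or of any OpenDev tooling, and the regular expression only covers the `IOPS=..., BW=...` shape seen in this log.

```python
import re

# Multipliers for the suffixes fio uses in these summary lines (assumed subset).
_IOPS_SUFFIX = {"": 1, "k": 1_000, "M": 1_000_000}
_BW_TO_MIB = {"KiB": 1 / 1024, "MiB": 1, "GiB": 1024}

_SUMMARY = re.compile(r"IOPS=([\d.]+)([kM]?),\s*BW=([\d.]+)(KiB|MiB|GiB)/s")


def parse_fio_summary(line):
    """Return (iops, bandwidth_in_MiB_per_s) from one fio summary line."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a recognised fio summary line: {line!r}")
    iops = float(match.group(1)) * _IOPS_SUFFIX[match.group(2)]
    bandwidth = float(match.group(3)) * _BW_TO_MIB[match.group(4)]
    return iops, bandwidth


# The four lines quoted in the log above.
mirror_randread = parse_fio_summary(
    "read: IOPS=19.8k, BW=77.3MiB/s (81.0MB/s)(2318MiB/30003msec)")
mirror_seqread = parse_fio_summary(
    "read: IOPS=142k, BW=556MiB/s (583MB/s)(16.3GiB/30004msec)")
lists_randread = parse_fio_summary(
    "read: IOPS=761, BW=3048KiB/s (3121kB/s)(89.6MiB/30105msec)")
lists_seqread = parse_fio_summary(
    "read: IOPS=20.5k, BW=80.3MiB/s (84.2MB/s)(2413MiB/30059msec)")

# Ratio of lists sequential-read IOPS to mirror random-read IOPS.
ratio = lists_seqread[0] / mirror_randread[0]
print(f"lists seq / mirror rand IOPS ratio: {ratio:.2f}")
```

Normalising both hosts to the same units is what makes the "on par" observation checkable at a glance; the same helper could be run against a later fio pass to confirm the numbers stay consistent.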
opendevreviewElod Illes proposed openstack/project-config master: Make update_constraints.sh work with pyproject.toml  https://review.opendev.org/c/openstack/project-config/+/96062715:44
opendevreviewElod Illes proposed openstack/project-config master: Fix update_constraints.sh for 'python_version>=' cases  https://review.opendev.org/c/openstack/project-config/+/96062815:44
clarkbinfra-root I think today is a reasonable day to land https://review.opendev.org/c/opendev/system-config/+/957277 to start the base container image location update process. Any objections?15:46
clarkbwe're not going to get everything moved in one day or even a couple so I think we just need to start somewhere and keep pushing on it over a week or three15:47
corvusno objection15:51
clarkbok I've approved it. It should be in the gate long enough for any last minute objections16:17
clarkbonce it lands and things publish I'll send an announcement to service-announce about the move16:18
clarkbthen for followup I believe these changes are relatively low impact and can be landed afterwards with some observation: https://review.opendev.org/c/opendev/system-config/+/958480/ https://review.opendev.org/c/opendev/system-config/+/958598/ https://review.opendev.org/c/opendev/system-config/+/958593/ https://review.opendev.org/c/opendev/system-config/+/958594/16:19
clarkbhttps://review.opendev.org/c/opendev/grafyaml/+/958601 https://review.opendev.org/c/opendev/lodgeit/+/958602 https://review.opendev.org/c/opendev/gerritbot/+/958600 and https://review.opendev.org/c/opendev/statusbot/+/95860316:20
clarkbthe more difficult changes to make are gerrit and ircbot16:28
clarkbboth are more noticeable outages16:28
opendevreviewMerged opendev/system-config master: Move python base images back to quay.io  https://review.opendev.org/c/opendev/system-config/+/95727716:46
clarkbonce promotion occurs I can get links to the new stuff and send out my announcement email16:47
fungisounds good16:47
clarkbarg promotion has failed16:48
clarkbhttps://zuul.opendev.org/t/openstack/build/78293cc7053f432689474bfd2d740a6c16:48
clarkbeverything was going so smoothly up to this point16:48
fungiParse build response is not very verbose about failures, huh?16:49
clarkbI think I see the bug16:49
clarkbgive me a few to confirm but I should be able to get a patch up that we can land quickly and fix this16:50
clarkbpromote_container_image_job: system-config-promote-image-python-builder-3.11-bookworm <- I think that needs to be the upload job not the promote job name16:51
clarkbsince the promote job is looking up the artifact from the upload job16:51
clarkbI just want to confirm this16:51
fungiworking on upgrading kdc04 at the moment16:51
fungihttps://review.opendev.org/959962 indicates noble should work16:52
opendevreviewClark Boylan proposed opendev/system-config master: Fix base image quay.io promotion  https://review.opendev.org/c/opendev/system-config/+/96064416:58
clarkbinfra-root ^ I think that should fix the issue with publication to quay. Sorry for that. I'm going to go and check the other changes now16:58
clarkbthe first base image move change and the trixie image addition change are the only two with this problem17:01
opendevreviewClark Boylan proposed opendev/system-config master: Build trixie python base container images  https://review.opendev.org/c/opendev/system-config/+/95848017:05
clarkbthat should fix it for the trixie image change17:05
fungihaving trouble untangling whatever edit you made on that revision from the rebase17:22
fungibut it does seem to be using the upload instead of publish jobs like the base change17:24
*** dhill is now known as Guest2633317:24
fungilgtm17:24
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/958480/2..3 shows it just fine17:27
clarkbfungi: if you want to remove your +A on 958480 I think that would be fine. I kinda want to wait to see things work with the existing images first17:28
clarkb(though at this point it should work so it is probably fine)17:29
fungiah, gertty mixed in the diff from the rebase17:29
fungii guess the gerrit webui has gotten better about filtering out files that were only rebased17:30
clarkbfungi: did the kdcs end up back in the emergency file? I guess either way the real concern is the ansible cache files needing to be cleared out17:44
fungioh, good point, i forgot they'd been taken out17:45
fungiwell, they're upgraded now, i guess i just need to clear the cache on bridge17:45
clarkbya I think the change you merged might trigger jobs against them and/or the daily runs17:46
fungii've removed /var/cache/ansible/facts/kdc0{3,4}.openstack.org again17:46
clarkbthat is why I thought about it17:46
fungikdcs are both upgraded, reachable and have the requisite cpu count (2 each, smaller flavor, but as many as they had before upgrading)17:46
fungiand i followed https://docs.opendev.org/opendev/system-config/latest/kerberos.html#no-service-outage-server-maintenance again so it should have been transparent17:47
fungigoing to follow https://docs.opendev.org/opendev/system-config/latest/afs.html#afsdb0x-openstack-org for the afsdb upgrades now17:48
clarkb*the change you approved. Sorry it hasn't merged just yet17:51
fungihuh, afsdb03 is using netplan instead of ifupdown, must be a more recent addition/rebuild17:52
clarkbI think it was a newer addition when we realized that we should have 3 dbs instead of 217:53
opendevreviewMerged opendev/system-config master: Fix base image quay.io promotion  https://review.opendev.org/c/opendev/system-config/+/96064418:09
clarkbok lets see how promotions go this time18:09
clarkbwe have our first success so this is looking good18:11
clarkbthey have all succeeded18:12
clarkbI think we can let the trixie job remain in the gate and followup with additional image promotions given this change didn't break anything18:12
fungiand there was much rejoicing18:13
clarkbanything else we want to check before I send an announcement about the move?18:13
funginothing comes to mind18:14
fungiat current pace i should have the rest of the afs servers upgraded to noble by the end of my day, except for afs02.dfw which will need to wait until the volume moves are done (probably early next week)18:14
opendevreviewMerged opendev/system-config master: Use Noble for our Kerberos servers  https://review.opendev.org/c/opendev/system-config/+/95996218:15
fungii expect this weekend the first pass on moves will be done and i'll restart any that got skipped due to conflicting write locks or temporary lack of space18:15
clarkbhttps://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/HO6Z66QIMDIDY7CCVAREDOPSYZYNKIT3/ email should be in inboxes now18:18
clarkbfungi: did you want to review the easy list of followup changes to switch consuming images over to the new location? corvus already reviewed them and I think for most of them i'm comfortable +A'ing as I can monitor18:19
fungiindeed it is, just finished reading it18:19
clarkbbut happy to have extra eyeballs if you want to take a look18:19
clarkbthat list is at 16:19:58 UTC in scrollback18:20
fungii'll take a quick pass through them while these afs kernel modules build18:20
clarkbI'm happy to hit +A since for most change this will have at least a small impact on production services18:20
fungilooking at all of hashtag:opendev-on-quay (hopefully that covers them)18:21
clarkbfungi: yup that is a super set. I don't want to move gerrit or ircbot at this time18:21
clarkbboth will need coordination as we will want to restart the services they host and that causes disruption18:21
fungik18:22
clarkbbut I think all of the others are relatively safe. A small blip to hound and lodgeit on codesearch and paste isn't a big deal for example18:22
clarkbthe change to switch kdc testing to noble did not trigger infra-prod jobs against the kdcs. So it is daily runs that will do that first18:23
fungiwhat happened to make docker capable of building speculatively for single-arch?18:23
clarkbfungi: that was the change where we switched docker builds to do buildx builds just like multiarch but without the emulation18:24
fungiah, and that made the fallback work for it with the buildset/intermediate registries?18:24
clarkbthe custom buildx builder is capable of it (and was capable of it in multiarch builds which led to confusion over why it didn't work elsewhere) so we simply made that the builder in all cases18:24
clarkbyes18:24
fungigot it18:24
clarkbthat custom builder sets up /etc/hosts for registry access, it configures ssl certs in the builder for registry access, then creates a buildx toml config file that sets up mirrors for each proper registry to go through our buildset registry18:25
clarkbone confusing thing was buildx documents its config file and the impression is that the default builder should honor it but testing seems to indicate this isn't the case. If you want to configure buildx with a custom config then you need a custom builder too18:26
clarkbbut once we figured all of that out the solution was reasonably straightforward, just do the custom buildx builder for all builds18:26
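As a rough illustration of the builder config described above, the sketch below renders a BuildKit-style TOML file in which each upstream registry gets a `mirrors` entry pointing at a single buildset registry. The function name, the registry list, the `buildset-registry.example:5000` address and the CA path are all placeholders assumed for the example; the actual Zuul role may generate its config differently.

```python
def buildkit_mirror_config(registries, mirror, ca_cert=None):
    """Render a BuildKit TOML config routing each registry via one mirror.

    registries: upstream registry hostnames (e.g. "docker.io").
    mirror:     host:port of the mirror registry (placeholder value here).
    ca_cert:    optional path to a CA certificate trusted for the mirror.
    """
    lines = []
    for registry in registries:
        lines.append(f'[registry."{registry}"]')
        lines.append(f'  mirrors = ["{mirror}"]')
        lines.append("")
    if ca_cert:
        # TLS trust is configured on the mirror's own registry section.
        lines.append(f'[registry."{mirror}"]')
        lines.append(f'  ca = ["{ca_cert}"]')
        lines.append("")
    return "\n".join(lines)


# Hypothetical values, for illustration only.
config_text = buildkit_mirror_config(
    ["docker.io", "quay.io"],
    "buildset-registry.example:5000",
    ca_cert="/etc/buildkit/buildset-ca.pem",
)
print(config_text)
```

The resulting file would then be handed to a custom builder when it is created with `docker buildx create` (the config-file option has been spelled `--config` and, in newer releases, `--buildkitd-config`, as far as I know). Per the discussion above, the default builder did not appear to honor such a file in testing, which is why a custom builder is used for every build.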
fungi958598 is going to trigger a deploy for all the gitea backends, right?18:29
clarkbfungi: it might, but that image isn't actually used by gitea anymore so any deployment should noop18:29
clarkbfungi: that gitea-init image is a relic of the k8s attempt that we've been keeping alive for some reason18:30
fungioh!18:33
clarkbhowever, that reminds me there is a new gitea release18:34
clarkblet me work on a change for that while I'm thinking about it18:35
opendevreviewClark Boylan proposed opendev/system-config master: Update gitea to 1.24.6  https://review.opendev.org/c/opendev/system-config/+/96067518:38
opendevreviewClark Boylan proposed opendev/system-config master: Stop mirroring python base images to quay.io/opendevmirror  https://review.opendev.org/c/opendev/system-config/+/96067618:45
opendevreviewMerged opendev/system-config master: Build trixie python base container images  https://review.opendev.org/c/opendev/system-config/+/95848018:54
opendevreviewMerged opendev/system-config master: Pull hound's base python image from quay  https://review.opendev.org/c/opendev/system-config/+/95859318:58
clarkbhound should deploy with its updated image once the hourly jobs complete19:10
clarkbI'll keep an eye on it19:10
fungicool, i'm currently in the middle of the afsdb02 upgrade19:11
clarkbhound has restarted and is doing its startup routine19:16
opendevreviewClark Boylan proposed opendev/system-config master: Mirror golang and node trixie container images  https://review.opendev.org/c/opendev/system-config/+/96068119:25
opendevreviewMerged opendev/system-config master: Pull python base image for statsd metric reporters from quay.io  https://review.opendev.org/c/opendev/system-config/+/95859419:29
opendevreviewMerged opendev/system-config master: Update jinjia-init and gitea-init to modern image build tooling  https://review.opendev.org/c/opendev/system-config/+/95859819:29
clarkbhound seems to work for me. I'll check on gitea-lb and zookeeper deployments for 958594 once that finishes deploying19:34
fungiwith afsdb02 i tried fixing the ppa source ahead of the noble upgrade, but do-release-upgrade disables the ppa source anyway so it still requires a follow-up edit and reinstallation of the packages regardless19:38
clarkbthe statsd containers appear to have updated on all four hosts. Grafana graphs appear to have new data at https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-5m&to=now&timezone=utc and https://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-5m&to=now&timezone=utc19:38
clarkbfungi: that change did trigger a gitea deploy but it should noop so not a big deal19:39
fungicool19:39
clarkbI need to grab lunch momentarily but so far things seem happy19:39
zigoKeystone release notes are missing from website: https://docs.openstack.org/releasenotes/keystone/unreleased.html20:11
zigoThis *always* happens on each release. Something should be done for it, IMO, because that's when I need the release notes the most. :)20:11
zigoIt's ok, I can look in the diff and read the sources, but it's not very convenient ! :)20:12
Clark[m]I'm not sure what opendev is expected to do about that. It is almost certainly a job configuration or sphinx build problem20:14
Clark[m]Nova doesn't have this problem so ya I don't think it's a systemic infrastructure issue20:16
fungiit comes up a lot with release notes publication for multiple releases on different branches racing each other, because they all share the same file subtree20:18
zigoWell, something's wrong in the way things are getting released, because on each release there's instances of this. So IMO, not sure what, but something should be made so it doesn't happen every 6 months.20:18
fungithis isn't 6 months from the coordinated openstack release btw, it's avoided for the coordinated release by avoiding tagging releases on multiple branches of the same project at the same time20:20
fungii'm guessing the release team approved a point release and rc change for keystone at the same time20:21
fungiactually looks like https://review.opendev.org/960080 creates a branch and a tag at the same time, so maybe triggers two release notes builds... checking20:25
fungiclicking "view build history" at the top of https://zuul.opendev.org/t/openstack/build/1c8fdd06259f45a3bd0195e38b98760a spins forever for me20:29
corvusworks better if you drop the project20:30
corvushttps://zuul.opendev.org/t/openstack/builds?job_name=publish-openstack-releasenotes-python3&skip=020:30
fungiyeah, i guess i can hand-filter the entries for that project20:30
fungilooks like that tag pipeline build was the only one for keystone around that time, so probably not a publication race20:31
fungitaking it to the #openstack-releases channel because i suspect there's some additional step needed at rc1 tagging and stable branch creation time to create the release notes page for the upcoming release20:37
zigoSame thing for placement.20:38
clarkbfungi: note you may want to put the afs updates here :)21:32
fungid'oh!21:33
fungiall three afsdb servers are upgraded to noble now and back in the cluster, so that just leaves afs02.dfw to upgrade once the rw volume moves are done, and then residual cleanup and removing the afs servers from the emergency list21:34
clarkbmight want to remove one db server and one fileserver first and just ensure ansible doesn't do anything unexpected21:35
clarkbthen remove the others after the daily runs. We don't have great coverage of ansible things for afs unfortunately21:35
fungigood idea21:47
fungii can do that now actually21:47
fungii've removed afs01.ord and afsdb01 from the emergency file21:48
fungialso i'll clear the fact caches for all the afs servers21:48
fungii've removed /var/cache/ansible/facts/afs* from bridge now21:50
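The cache cleanup above is a one-line shell glob in practice; the sketch below shows the same idea as a small helper that removes matching fact files and reports what it deleted. The function name `clear_fact_cache` is hypothetical, and the demo runs against a throwaway temporary directory rather than the real `/var/cache/ansible/facts` on bridge. Note that `pathlib` globs have no brace expansion, so a shell pattern like `kdc0{3,4}*` would be written as `kdc0[34]*` here.

```python
import tempfile
from pathlib import Path


def clear_fact_cache(cache_dir, pattern):
    """Delete cached fact files in cache_dir matching pattern.

    Returns the sorted names of the files that were removed so the
    operator can confirm which hosts will have facts regathered.
    """
    removed = []
    for path in sorted(Path(cache_dir).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


if __name__ == "__main__":
    # Demonstration in a scratch directory with made-up cache entries.
    with tempfile.TemporaryDirectory() as scratch:
        for host in ("afs01.ord.openstack.org", "afsdb01.openstack.org",
                     "kdc03.openstack.org"):
            (Path(scratch) / host).write_text("{}")
        print(clear_fact_cache(scratch, "afs*"))
```

Returning the removed names rather than deleting silently makes it easy to check that only the upgraded servers were affected before the next Ansible run regathers their facts.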
