19:00:33 <clarkb> #startmeeting infra
19:00:33 <opendevmeet> Meeting started Tue Sep  2 19:00:33 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:33 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:33 <opendevmeet> The meeting name has been set to 'infra'
19:00:43 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/AYKNDGLH46IV3N5NI2BBSVMYMI6W4MQP/ Our Agenda
19:00:47 <clarkb> #topic Announcements
19:01:10 <clarkb> There is a matrix.org outage right now which I think may be impacting the irc bridge to oftc as well as any matrix accounts hosted by matrix.org (like mine)
19:01:30 <clarkb> https://status.matrix.org/ is tracking the issue if you want to know when things return to normal
19:01:57 <clarkb> Also fungi is out today so this meeting may just be me running through the agenda
19:02:09 <clarkb> but feel free to jump in on any topics if there is anything to share and you are following along
19:02:44 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:02:58 <clarkb> I don't have any updates on this item. I've been distracted by other things lately
19:03:54 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/957555 Gerrit image updates for bugfix releases
19:04:13 <clarkb> this change could still use reviews though and I'll land it amongst the other container update changes when it's convenient to restart gerrit
19:04:21 <clarkb> #topic Upgrading old servers
19:04:47 <clarkb> As mentioned last week fungi managed to update most of the openafs cluster to jammy and I expect when he gets back that we will continue that effort all the way to noble
19:05:11 <fungi> i'll pick the afs/kerberos upgrades back up toward the end of this week once i'm home
19:05:12 <clarkb> this is major progress in getting off of old ubuntu releases and onto more modern stuff
19:05:18 <clarkb> fungi: thanks!
19:05:32 <clarkb> then the major remaining nodes on the list are backup servers and the graphite server
19:06:12 <clarkb> Help continues to be very much appreciated if anyone else is able to dig into this
19:06:48 <clarkb> #topic Matrix for OpenDev comms
19:06:54 <clarkb> #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing
19:07:01 <corvus_> timely
19:07:04 <clarkb> followup reviews on the spec are where we're sort of treading water
19:07:23 <corvus_> the bridge appears to be broken right now
19:07:25 <clarkb> and yes the matrix.org outage may provide new thoughts/ideas on this spec if you want to wait and see how that gets resolved
19:07:46 <clarkb> corvus_: yes I think anything hosted by matrix.org (including my account and the oftc irc bridge) is not working with matrix right now
19:07:59 <clarkb> https://status.matrix.org/ is tracking the outage
19:08:36 <clarkb> reviews still very much welcome and I understand if we want to wait and see some further triage/resolution on the current issue before doing so
19:08:48 <fungi> yeah, my weechat is logging repeated connection refusal errors from the matrix.org servers
19:09:15 <corvus_> i only noticed because of this meeting
19:09:27 <clarkb> yes my browser was getting 429s from cloudflare. I suspect they've configured limits quite low to ease traffic on the backend while they fix it
19:10:14 <clarkb> #topic Pre PTG Planning
19:10:20 <clarkb> #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document
19:10:25 <clarkb> Times: Tuesday October 7 1800-2000 UTC, Wednesday October 8 1500-1700 UTC, Thursday October 9 1500-1700 UTC
19:10:33 <clarkb> This will replace our team meeting on October 7
19:10:47 <clarkb> please add discussion topics to the agenda on that etherpad
19:10:59 <clarkb> and I'll see you there in just over a month
19:11:14 <clarkb> #topic Loss of upstream Debian bullseye-backports mirror
19:11:22 <clarkb> Zuul-jobs will no longer enable debian backports by default on September 9
19:11:26 <clarkb> #link https://lists.zuul-ci.org/archives/list/zuul-announce@lists.zuul-ci.org/thread/NZ54HYFHIYW3OILYYIQ72L7WAVNSODMR/
19:11:55 <clarkb> Once zuul-jobs' default is updated then we'll be able to delete the debian bullseye backports repo from our mirror and drop our workaround
19:12:23 <clarkb> just waiting for sufficient time to pass since this was announced on the zuul announce list
19:13:30 <clarkb> #topic Etherpad 2.5.0 Upgrade
19:13:36 <clarkb> #link https://github.com/ether/etherpad-lite/blob/v2.5.0/CHANGELOG.md
19:13:39 <corvus_> regrading matrix: btw the opendev ems instance is working (gerritbot msgs are going through) and eavesdrop is logging
19:13:47 <clarkb> ack
19:14:10 <clarkb> etherpad claims the 2.5.0 release fixes our problems. I still think the root page's css is weird but the js errors related to 2.4.2 did go away
19:14:17 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/956593/
19:14:23 <clarkb> 104.130.127.119 is a held node for testing. You need to edit /etc/hosts to point etherpad.opendev.org at that IP.
19:14:44 <clarkb> if you want to test you can punch that ip into /etc/hosts and check out the root page locally as well as the clarkb-test pad (or any other pad you create)
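For reference, a minimal /etc/hosts entry for this kind of testing would look like the line below (added temporarily on your own machine and removed again once testing is done):

    # temporary override so the browser resolves etherpad.opendev.org to the held node
    104.130.127.119 etherpad.opendev.org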
19:15:15 <clarkb> again I don't think this is urgent. Mostly looking for feedback on whether we think this is workable so that we don't fall behind, or whether we should continue to pester them to fix things better
19:16:13 <clarkb> #topic Moving OpenDev's python-base/python-builder/uwsgi-base Images to Quay
19:16:47 <clarkb> last week corvus_ suggested that the way to not wait forever on merging this change is to prep changes to update all the child images, as reminders that those need updating at some point, and then proceed with moving the base image publication location
19:16:56 <clarkb> I did propose those changes and it caught a problem!
19:17:13 <clarkb> Turns out that to use speculative images via the buildset registry when building with docker we always need to build with a custom buildx builder
19:17:50 <clarkb> earlier this year I had changed image building to use podman by default in system-config. I don't remember the details but I suspect I ran into this problem and just didn't track it down fully and this was the out. The problem with that is multiarch builds
19:18:28 <clarkb> it's actually probably better for us to keep using docker to build images and switch to the custom buildx builder for all image builds so that single arch and multiarch builds use the same toolchain (podman doesn't support multiarch in our jobs yet but the underlying tool does)
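For context, a custom buildx builder in plain docker terms is created and selected roughly as sketched below; the builder name and image tag are made up for illustration, and this is not necessarily how the zuul-jobs roles invoke it or wire in the buildset registry:

    # create a docker-container driver builder and make it the default for subsequent builds
    docker buildx create --name demo-builder --driver docker-container --use
    # the same builder can then handle single arch and multiarch builds with one toolchain
    docker buildx build --platform linux/amd64,linux/arm64 -t example/image:latest .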
19:18:43 <clarkb> #link https://review.opendev.org/c/zuul/zuul-jobs/+/958783 Always build docker images with custom buildx builder
19:18:58 <clarkb> this change updates zuul-jobs to do that for everyone as it plays nice with speculative image builds
19:19:33 <clarkb> so I think the rough plan here for moving base images to quay is to land that zuul-jobs change, then move base images to quay, then update the child images to both build with docker and pull base images from quay
19:20:05 <clarkb> that zuul-jobs change has a child followup change that adds testing to zuul-jobs to cover all of this too so we should be good moving forward and any regressions should be caught early
19:20:38 <clarkb> #topic Adding Debian Trixie Base Python Container Images
19:20:50 <clarkb> Then once base images move to quay we can also add trixie based python container images
19:20:55 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/958480
19:21:39 <clarkb> with plans to worry about python3.13 after trixie is in place
19:21:53 <clarkb> just to keep the total number of images we're juggling to a reasonable number
19:22:27 <clarkb> #topic Dropping Ubuntu Bionic Test Nodes
19:22:51 <clarkb> After last week's meeting I think I convinced myself we don't need to do any major announcements for Bionic cleanups yet. Mostly because the release is long EOL at this point
19:22:57 <clarkb> #link https://review.opendev.org/q/hashtag:%22drop-bionic%22+status:open
19:23:22 <clarkb> I did write a few changes to continue to remove opendev's dependence on bionic under that hashtag. I think all of those changes are likely quick reviews and easy approvals
19:24:00 <clarkb> and dropping releases like bionic reduces the total storage used in openafs which should make things like upgrading openafs servers easier
19:24:17 <clarkb> (though I'm not sure we'll get this done before fungi completes the upgrades. I'm just trying to justify the effort generally)
19:25:38 <clarkb> #topic Temporary Shutdown of raxflex sjc3 for provider maintenance window
19:26:05 <clarkb> last week rackspace notified us via email that the cinder volume backing the rax flex sjc3 mirror would undergo maintenance tomorrow from 10:30am to 12:30pm central time
19:26:10 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/959200
19:26:46 <clarkb> this change disables this region in zuul launcher so that I can safely shut down the mirror while they do that work. My plan is to approve that change after lunch today and then manually shut down the mirror before EOD
19:26:58 <clarkb> that should be plenty of time for running jobs to complete
19:27:18 <clarkb> then tomorrow after the maintenance window completes I can start the mirror back up again and revert 959200
19:28:20 <clarkb> #topic Fixing Zuul's Trixie Image Builds
19:28:38 <clarkb> This item wasn't on the agenda (my bad) but it was pointed out that our Trixie images are still actually debian testing
19:28:52 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/958561 Build actual Trixie now that it is released
19:29:03 <clarkb> 958561 will fix that but depends on a DIB update
19:29:27 <clarkb> did anyone else want to review the DIB update? I'm thinking I may approve that one today with mnasiadka's review as the sole +2 in order to not let this problem fester for too long
19:29:58 <clarkb> #topic Open Discussion
19:30:11 <clarkb> And with that we have reached the end of the agenda. Anything else?
19:31:23 <clarkb> I know I kinda speedran through that but with fungi out, corvus impacted by matrix bridging issues, and frickler and tonyb not typically attending I figured I should just get through it
19:31:40 <clarkb> I'll leave the floor open until 19:35 UTC then call it a meeting if nothing comes up
19:31:52 <clarkb> as always feel free to continue any discussion on the mailing list or in #opendev
19:32:40 <fungi> finishing the afs/kerberos upgrades shouldn't take long, btw, it's fairly mechanical now and hopefully i can have the rw volume migration back to the noble afs01.dfw going by the weekend
19:32:48 <corvus_> fyi there's a zuul-scheduler memory leak, but i think i have a fix
19:33:01 <corvus_> we'll probably need to restart the schedulers tomorrow whether or not it lands
19:33:06 <clarkb> corvus_: oh right that came up on Friday and over the weekend
19:33:15 <fungi> that was the root cause for the connection issues last week?
19:33:22 <corvus_> #link https://review.opendev.org/959228 fix zuul-scheduler memory leak
19:33:27 <corvus_> yeah i think so
19:33:44 <corvus_> i mean, this is all well-informed supposition, not hard proof
19:34:01 <tonyb> I've been following along just didn't have thoughts
19:34:02 <corvus_> but i think we're at "fix obvious things first" and if stuff is still broken, dig deeper.
19:34:11 <clarkb> corvus_: sounds good
19:34:21 <clarkb> I'll make a note now for tomorrow to restart schedulers
19:35:25 <clarkb> and we're at the time I noted we'd end. Thank you everyone!
19:35:35 <clarkb> We should be back here at the same time and location next week
19:35:38 <clarkb> see you then
19:35:43 <corvus_> thanks clarkb !
19:35:44 <clarkb> #endmeeting