Tuesday, 2025-07-22

clarkbmeeting time19:00
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Jul 22 19:00:57 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7PMI6EMXUSHZA4J2CJ5XNXM3BKHH3CXH/ Our Agenda19:01
clarkb#topic Announcements19:02
clarkbAnything to announce?19:02
clarkbSounds like no19:03
clarkb#topic Zuul-launcher19:03
fungii didn't have anything19:03
clarkbWe can dive right in then19:03
clarkbmixed cloud nodesets are still happening (at a lower rate)19:04
clarkb#link https://review.opendev.org/c/zuul/zuul/+/955545 is current proposal due to lack of verbose errors from openstack clouds19:04
clarkbthere is also a bugfix to move image hashing into the image upload role so that we can hash the correct format of the data19:04
clarkb#link https://review.opendev.org/c/opendev/zuul-providers/+/955621/19:04
corvusyeah, afaict they're happening because we're hitting quota, but the clouds give us error messages that don't tell us that.19:05
corvuswe're unable to upload raw images until 621 lands19:05
clarkband at least some of those errors we expect to be included in the error response but there is a bug in nova19:05
fungibecause you need a time machine to interpret 15 years of openstack error messages as they evolved19:05
corvuswe do a lot, but some of them are just plain empty strings19:06
funginot much you can do with a null response19:06
corvusso even though zuul-launcher is basically at the point where anything with "exceed" or "quota" in the string is treated as a quota error, yep, not much else we can do with nothing.19:06
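[Editor's sketch] The keyword heuristic corvus describes could look roughly like the following. This is not zuul-launcher's actual code; the function name and keyword tuple are hypothetical and only illustrate why an empty error string cannot be classified.

```python
# Hypothetical illustration only; not taken from zuul-launcher.
QUOTA_KEYWORDS = ("exceed", "quota")  # the two substrings mentioned in the discussion


def looks_like_quota_error(message):
    """Return True if a cloud error message appears to describe a quota problem.

    An empty or missing message carries no information, so it can never be
    classified as a quota error by this kind of string matching.
    """
    if not message:
        return False
    text = message.lower()
    return any(keyword in text for keyword in QUOTA_KEYWORDS)
```

With a check like this, a message such as "Quota exceeded for instances" matches, while an empty string from the cloud falls through as an unclassified failure, which is the case discussed above.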
clarkbI see you noted centos 9 failures. Those issues are indicative of a broken mirror (due to upstream updates happening in files out of correct order)19:07
clarkbwe mirror directly from upstream now so this is upstream not updating their mirror files in the correct order aiui19:07
corvusthere's also a bug in launcher that's causing extra "pending" uploads; a fix for that is in progress19:07
corvusclarkb: that failed a recheck too... how long should we expect those periods to last?19:07
clarkbcorvus: sometimes it's until the next mirror sync which I think happens after 4 or 6 hours19:09
corvusoof19:09
clarkbbut sometimes we've seen it go days when upstream isn't concerned about fixing that stuff19:10
clarkbcorvus: for fixes like this I feel like we can force merge if most of the images build and we have a problem like this19:10
corvusyeah, maybe the way to go.  we could also think about making things nonvoting19:11
clarkbthough maybe that impacts record keeping for subsequent image uploads?19:11
clarkbnonvoting is an interesting idea too since it should work when images build?19:11
clarkband that would make things lazy/optimistic/eventually consistent19:11
corvusyep.  either way should work.19:11
clarkbcool. The other thing I wanted to note here is that nodepool is completely shutdown at this point so if you see errors or have questions we need to refer to niz now19:12
corvusyep, and i'm ready to delete the nodepool servers whenever19:12
corvushow does "now" sound? :)19:13
clarkbI think I'm ready if you are. Rolling back seems unlikely at this point as we've been able to rollforward on the majority of workload for several weeks19:13
corvushttps://review.opendev.org/955229 is the change to remove nodepool config completely; after that merges i can issue the delete commands19:13
clarkband worst case we can create new nodepool servers later if there is a reason to19:14
corvuslooks like that change may need to run the gauntlet (it failed on 2 different jobs on 2 different rechecks)19:14
corvusbut i think it's read19:14
corvusy19:14
funginow is good with me19:15
clarkbit runs a lot of jobs19:15
corvusokay i will make it so19:15
clarkbI suspect due to the changes to inventory/ we may be able to optimize the job selection there better if it becomes a problem19:15
clarkbanything else on the topic of nodepool in zuul?19:16
corvusi think that's it19:16
clarkb#topic Gerrit 3.11 Upgrade Planning19:16
clarkbLast Friday fungi and I landed all the gerrit image backlog changes and restarted gerrit19:16
clarkbthe end result is images that we can use to test the gerrit upgrade and they are unlikely to change much between now and the actual upgrade19:17
clarkb#link https://www.gerritcodereview.com/3.11.html19:17
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade19:17
fungiyay!19:17
clarkbIf you get a moment looking over the release notes and making notes in that etherpad for things to test/check would be great19:17
clarkbI have also held new test nodes19:18
clarkb#link https://zuul.opendev.org/t/openstack/build/f1ca0d1f2e054829a4506ececb58bed319:18
clarkb#link https://zuul.opendev.org/t/openstack/build/588723b923e94901af3065143d9df81819:18
clarkbthese two builds are the builds with held nodes. I have not done any upgrade testing on them yet. But the 3.11 nodes might be good to interact with for any ui changes19:18
clarkbMy main concern at the moment is after the recent 3.11.4 update there are two different reports on the upstream mailing list for issues that would be problematic for us19:19
clarkbfirst is offline reindexing not working (it spins forever after completing 99% of the work)19:19
clarkband the other is the replication plugin refusing to attempt replication to targets after some time. Even asking for a full replication run doesn't work. You have to restart gerrit to get it to try again19:20
clarkbso I want to see if I can test if these are problems in gerrit 3.11.4 or with the specific deployments involved as part of upgrade testing19:20
clarkbI'm hoping to start digging into this tomorrow19:21
clarkbany questions or concerns about the gerrit upgrade situation?19:21
clarkb#topic Upgrading old servers19:22
clarkbThe other train of thought I've kicked off this week is starting to look at replacing the eavesdrop server19:23
clarkb#link https://review.opendev.org/c/opendev/system-config/+/955544/ prep step for eavesdrop move to Noble19:23
clarkbthis change is a prep step to make the existing docker compose configs compatible with docker compose + podman on noble. It should be forward and backward compatible and is something we've applied elsewhere without issue19:23
clarkbonce that is in and happy on the old system I'll look into booting a new system and determining what the cut over looks like. I believe it's something like shutdown all the irc bots on the old server, deploy the new server and ensure all the bots are started there and writing to afs happily again19:24
clarkb(all the actual data is in afs iirc so we don't need to migrate volumes but I still need to double check that assertion)19:24
clarkbfungi: any word on refstack yet?19:25
fungino, i'm planning to just write up an announcement and get someone to agree it's okay19:26
fungiand then i'll send it out to openstack-discuss once i get approval19:27
clarkbsounds good19:27
clarkbthanks19:27
clarkbanyone else have server replacement updates? I don't think we've done any recently but happy to have missed some :)19:27
fungii think nobody on the foundation staff cares what happens wrt refstack and associated git repos, it's mostly just me wanting to make sure users aren't surprised and we have some information to point at when there are questions19:28
clarkb++19:28
clarkb#topic Matrix for OpenDev Comms19:28
clarkb#link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing19:28
clarkbI wrote the spec19:29
clarkbI tried to capture why we've used IRC, what we like about it  and how Matrix helps fill those needs while also being more approachable to those who are more familiar with the modern Internet19:29
clarkbI don't think anyone has reviewed it yet so I'm mostly hoping for some feedback19:30
clarkbbut feel free to leave that on the change itself19:30
clarkb#topic Working through our TODO list19:31
clarkb#link https://etherpad.opendev.org/p/opendev-january-2025-meetup19:31
clarkbI have not migrated this into a more permanent home yet19:31
corvus(thanks for the spec, i'll take a look at it soon!)19:31
clarkbI did do some cleanups to the specs repo and I'm thinking maybe I can port it in there as a high level list of things that don't have the depth of detail of a list of specs19:32
clarkbbut maybe a list of stubs that could become specs if necessary and otherwise capture the need19:32
clarkbmaybe I will just try that and see if I like it19:32
clarkb#topic Pre PTG Planning19:32
clarkb#link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document19:32
clarkband all of that feeds into planning for our october pre ptg event19:32
clarkbif you've got topics you want to cover feel free to add them. My plan is to port things that need discussion from that todo list into there as well as anything that is more currently topical19:33
clarkbwe have a lot of time to get ready though so no rush19:34
clarkb#topic Open Discussion19:34
clarkbI did want to note that as july ends we approach service coordinator election period in August19:34
clarkbI'll start putting together a plan for that next week before August actually rolls around19:34
clarkbif you are interested in running I'm happy to help/support anyone with the interest19:35
clarkband then fungi you are working on updating the gitea main page content19:36
clarkb#link https://review.opendev.org/c/opendev/system-config/+/95240719:36
clarkbthis is the resulting squashed change so that we don't do more than one rolling update of gitea19:36
fungiyeah, that's the squashed version now that we have some consensus19:36
clarkbI'll rereview that shortly but maybe we can get that deployed today19:37
fungiif folks are still okay with it, then whenever we're ready for another round of gitea restarts...19:37
fungii'm around all day19:37
fungihappy to help monitor the deploy19:37
clarkbgreat. Anything else to discuss before we end the meeting?19:39
clarkbsounds like that may be it. Thank you everyone19:40
clarkbWe'll be back here same time and location next week19:40
clarkb#endmeeting19:40
opendevmeetMeeting ended Tue Jul 22 19:40:27 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:40
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-22-19.00.html19:40
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-22-19.00.txt19:40
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-22-19.00.log.html19:40
fungithanks clarkb!19:41
