19:00:57 <clarkb> #startmeeting infra
19:00:57 <opendevmeet> Meeting started Tue Jul 22 19:00:57 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:57 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:57 <opendevmeet> The meeting name has been set to 'infra'
19:01:57 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7PMI6EMXUSHZA4J2CJ5XNXM3BKHH3CXH/ Our Agenda
19:02:27 <clarkb> #topic Announcements
19:02:29 <clarkb> Anything to announce?
19:03:19 <clarkb> Sounds like no
19:03:25 <clarkb> #topic Zuul-launcher
19:03:26 <fungi> i didn't have anything
19:03:28 <clarkb> We can dive right in then
19:04:24 <clarkb> mixed cloud nodesets are still happening (at a lower rate)
19:04:27 <clarkb> #link https://review.opendev.org/c/zuul/zuul/+/955545 is the current proposal due to lack of verbose errors from openstack clouds
19:04:47 <clarkb> there is also a bugfix to move image hashing into the image upload role so that we can hash the correct format of the data
19:04:58 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/955621/
19:05:09 <corvus> yeah, afaict they're happening because we're hitting quota, but the clouds give us error messages that don't tell us that.
19:05:27 <corvus> we're unable to upload raw images until 621 lands
19:05:37 <clarkb> and at least some of those errors we expect to be included in the error response but there is a bug in nova
19:05:42 <fungi> because you need a time machine to interpret 15 years of openstack error messages as they evolved
19:06:02 <corvus> we do a lot, but some of them are just plain empty strings
19:06:28 <fungi> not much you can do with a null response
19:06:42 <corvus> so even though zuul-launcher is basically at the point where anything with "exceed" or "quota" in the string is treated as a quota error, yep, not much else we can do with nothing.
19:07:15 <clarkb> I see you noted centos 9 failures. Those issues are indicative of a broken mirror (due to upstream updates happening to files out of the correct order)
19:07:29 <clarkb> we mirror directly from upstream now so this is upstream not updating their mirror files in the correct order aiui
19:07:32 <corvus> there's also a bug in the launcher that's causing extra "pending" uploads; a fix for that is in progress
19:07:50 <corvus> clarkb: that failed a recheck too... how long should we expect those periods to last?
19:09:38 <clarkb> corvus: sometimes it's until the next mirror sync, which I think happens after 4 or 6 hours
19:09:47 <corvus> oof
19:10:00 <clarkb> but sometimes we've seen it go on for days when upstream isn't concerned about fixing that stuff
19:10:40 <clarkb> corvus: for problems like this I feel like we can force merge as long as most of the images build
19:11:05 <corvus> yeah, maybe the way to go. we could also think about making things nonvoting
19:11:09 <clarkb> though maybe that impacts record keeping for subsequent image uploads?
19:11:25 <clarkb> nonvoting is an interesting idea too since it should work when images build?
19:11:36 <clarkb> and that would make things lazy/optimistic/eventually consistent
19:11:45 <corvus> yep. either way should work.
19:12:26 <clarkb> cool.
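(For readers, the error classification corvus describes above boils down to a substring match on whatever error text the cloud returns. The sketch below is only an illustration of that heuristic, not the actual zuul-launcher code; the function name and sample messages are made up for the example.)

    def looks_like_quota_error(message: str) -> bool:
        """Heuristic: treat any error text mentioning quota limits as a quota failure."""
        if not message:
            # Some clouds return an empty error string, so nothing can be inferred.
            return False
        lowered = message.lower()
        return "quota" in lowered or "exceed" in lowered

    # Example: a descriptive nova error is classified, an empty one cannot be.
    assert looks_like_quota_error("Quota exceeded for instances") is True
    assert looks_like_quota_error("") is False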
19:12:26 <clarkb> The other thing I wanted to note here is that nodepool is completely shut down at this point, so if you see errors or have questions we need to refer to niz now
19:12:47 <corvus> yep, and i'm ready to delete the nodepool servers whenever
19:13:06 <corvus> how does "now" sound? :)
19:13:49 <clarkb> I think I'm ready if you are. Rolling back seems unlikely at this point as we've been able to roll forward on the majority of the workload for several weeks
19:13:57 <corvus> https://review.opendev.org/955229 is the change to remove nodepool config completely; after that merges i can issue the delete commands
19:14:02 <clarkb> and worst case we can create new nodepool servers later if there is a reason to
19:14:54 <corvus> looks like that change may need to run the gauntlet (it failed on 2 different jobs on 2 different rechecks)
19:14:58 <corvus> but i think it's ready
19:15:09 <fungi> now is good with me
19:15:15 <clarkb> it runs a lot of jobs
19:15:41 <corvus> okay i will make it so
19:15:44 <clarkb> I suspect due to the changes to inventory/ we may be able to optimize the job selection there better if it becomes a problem
19:16:12 <clarkb> anything else on the topic of nodepool in zuul?
19:16:26 <corvus> i think that's it
19:16:34 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:16:57 <clarkb> Last Friday fungi and I landed all the gerrit image backlog changes and restarted gerrit
19:17:23 <clarkb> the end result is images that we can use to test the gerrit upgrade, and they are unlikely to change much between now and the actual upgrade
19:17:28 <clarkb> #link https://www.gerritcodereview.com/3.11.html
19:17:33 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade
19:17:52 <fungi> yay!
19:17:59 <clarkb> If you get a moment, looking over the release notes and making notes in that etherpad for things to test/check would be great
19:18:09 <clarkb> I have also held new test nodes
19:18:14 <clarkb> #link https://zuul.opendev.org/t/openstack/build/f1ca0d1f2e054829a4506ececb58bed3
19:18:19 <clarkb> #link https://zuul.opendev.org/t/openstack/build/588723b923e94901af3065143d9df818
19:18:44 <clarkb> these two builds are the builds with held nodes. I have not done any upgrade testing on them yet. But the 3.11 nodes might be good to interact with for any ui changes
19:19:12 <clarkb> My main concern at the moment is that after the recent 3.11.4 update there are two different reports on the upstream mailing list for issues that would be problematic for us
19:19:29 <clarkb> first is offline reindexing not working (it spins forever after completing 99% of the work)
19:20:07 <clarkb> and the other is the replication plugin refusing to attempt replication to targets after some time. Even asking for a full replication run doesn't work. You have to restart gerrit to get it to try again
19:20:23 <clarkb> so I want to see if I can test whether these are problems in gerrit 3.11.4 itself or with the specific deployments involved as part of upgrade testing
19:21:28 <clarkb> I'm hoping to start digging into this tomorrow
19:21:47 <clarkb> any questions or concerns about the gerrit upgrade situation?
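(For readers unfamiliar with how the two reported 3.11.4 issues above might be exercised on a held node, the sketch below drives the standard Gerrit offline reindex invocation and the replication plugin's SSH command from Python. It is only an illustrative assumption of one possible test approach, not part of the meeting; the site path and SSH host/port are placeholders for whatever the held node actually uses.)

    import subprocess

    # Placeholders: adjust to match the held test node.
    GERRIT_SITE = "/home/gerrit2/review_site"
    GERRIT_SSH = ["ssh", "-p", "29418", "admin@review.test"]

    def offline_reindex() -> None:
        # Offline reindex runs while Gerrit is stopped; the reported 3.11.4 bug
        # is that it appears to spin forever after completing ~99% of the work.
        subprocess.run(
            ["java", "-jar", f"{GERRIT_SITE}/bin/gerrit.war", "reindex", "-d", GERRIT_SITE],
            check=True,
        )

    def trigger_full_replication() -> None:
        # Ask the replication plugin for a full run; the reported bug is that it
        # stops attempting replication over time and only a restart recovers it.
        subprocess.run(GERRIT_SSH + ["replication", "start", "--all", "--wait"], check=True)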
19:22:55 <clarkb> #topic Upgrading old servers
19:23:09 <clarkb> The other train of thought I've kicked off this week is starting to look at replacing the eavesdrop server
19:23:14 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/955544/ prep step for eavesdrop move to Noble
19:23:43 <clarkb> this change is a prep step to make the existing docker compose configs compatible with docker compose + podman on noble. It should be forward and backward compatible and is something we've applied elsewhere without issue
19:24:19 <clarkb> once that is in and happy on the old system I'll look into booting a new system and determining what the cutover looks like. I believe it's something like shutting down all the irc bots on the old server, deploying the new server, and ensuring all the bots are started there and writing to afs happily again
19:24:32 <clarkb> (all the actual data is in afs iirc so we don't need to migrate volumes, but I still need to double check that assertion)
19:25:50 <clarkb> fungi: any word on refstack yet?
19:26:41 <fungi> no, i'm planning to just write up an announcement and get someone to agree it's okay
19:27:01 <fungi> and then i'll send it out to openstack-discuss once i get approval
19:27:19 <clarkb> sounds good
19:27:21 <clarkb> thanks
19:27:35 <clarkb> anyone else have server replacement updates? I don't think we've done any recently but happy to have missed some :)
19:28:12 <fungi> i think nobody on the foundation staff cares what happens wrt refstack and associated git repos, it's mostly just me wanting to make sure users aren't surprised and we have some information to point at when there are questions
19:28:21 <clarkb> ++
19:28:50 <clarkb> #topic Matrix for OpenDev Comms
19:28:58 <clarkb> #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing
19:29:01 <clarkb> I wrote the spec
19:29:27 <clarkb> I tried to capture why we've used IRC, what we like about it, and how Matrix helps fill those needs while also being more approachable to those who are more familiar with the modern Internet
19:30:32 <clarkb> I don't think anyone has reviewed it yet so I'm mostly hoping for some feedback
19:30:38 <clarkb> but feel free to leave that on the change itself
19:31:26 <clarkb> #topic Working through our TODO list
19:31:31 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:31:39 <clarkb> I have not migrated this into a more permanent home yet
19:31:59 <corvus> (thanks for the spec, i'll take a look at it soon!)
19:32:00 <clarkb> I did do some cleanups to the specs repo and I'm thinking maybe I can port it in there as a high level list of things that don't have the depth of detail of full specs
19:32:19 <clarkb> but maybe a list of stubs that could become specs if necessary and otherwise capture the need
19:32:30 <clarkb> maybe I will just try that and see if I like it
19:32:47 <clarkb> #topic Pre PTG Planning
19:32:51 <clarkb> #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document
19:32:58 <clarkb> and all of that feeds into planning for our october pre ptg event
19:33:23 <clarkb> if you've got topics you want to cover feel free to add them.
19:33:23 <clarkb> My plan is to port things that need discussion from that todo list into there, as well as anything that is more currently topical
19:34:19 <clarkb> we have a lot of time to get ready though so no rush
19:34:28 <clarkb> #topic Open Discussion
19:34:41 <clarkb> I did want to note that as july ends we approach the service coordinator election period in August
19:34:53 <clarkb> I'll start putting together a plan for that next week before August actually rolls around
19:35:06 <clarkb> if you are interested in running I'm happy to help/support anyone with the interest
19:36:24 <clarkb> and then fungi you are working on updating the gitea main page content
19:36:38 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/952407
19:36:47 <clarkb> this is the resulting squashed change so that we don't do more than one rolling update of gitea
19:36:49 <fungi> yeah, that's the squashed version now that we have some consensus
19:37:00 <clarkb> I'll re-review that shortly but maybe we can get that deployed today
19:37:22 <fungi> if folks are still okay with it, then whenever we're ready for another round of gitea restarts...
19:37:31 <fungi> i'm around all day
19:37:39 <fungi> happy to help monitor the deploy
19:39:11 <clarkb> great. Anything else to discuss before we end the meeting?
19:40:14 <clarkb> sounds like that may be it. Thank you everyone
19:40:23 <clarkb> We'll be back here same time and location next week
19:40:27 <clarkb> #endmeeting