19:00:16 <clarkb> #startmeeting infra
19:00:16 <opendevmeet> Meeting started Tue Nov 11 19:00:16 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:16 <opendevmeet> The meeting name has been set to 'infra'
19:00:42 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/XEGEBPR2GSFB5UDOI5WGMTUIXHKQAEAP/ Our Agenda
19:01:06 <clarkb> #topic Announcements
19:01:44 <clarkb> we are just over 2 weeks away from a major holiday in the US. I expect to be around tuesday and probably wednesday that week but not thursday and friday
19:02:09 <clarkb> All that to say I don't think it will affect our meeting schedule, but it probably will affect when people are around and active
19:02:56 <clarkb> Was there anything else to announce?
19:04:29 <fungi> i have nothing
19:05:35 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:05:49 <clarkb> Gerrit 3.13 has released
19:06:09 <clarkb> this means the pressure to upgrade to 3.11 is increasing
19:06:15 <clarkb> Before we do that there are new bugfix releases
19:06:21 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/966084 Update to Gerrit 3.10.9 and 3.11.7
19:06:32 <clarkb> and before we update to address the bugfix releases we have a docker compose bug to fix
19:06:37 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/966083 Fix container bind mounts for Gerrit
19:07:32 <clarkb> Landing these two changes and restarting Gerrit is going to be a big goal for me this week. I'm still catching up on stuff after being out yesterday but expect to be able to merge these changes and restart Gerrit sometime this week. Maybe friday if we are trying to cut down on impacts but possibly sooner
19:08:15 <clarkb> We also heard back from vexxhost on the gerrit server and it was a memory issue which should be mitigated now
19:08:21 <clarkb> (which makes updating gerrit and restarting things safer)
19:08:29 <clarkb> Any other questions or concerns about Gerrit?
19:09:36 <tonyb> if we can schedule the restart while I'm around I'd like to be a second set of eyes
19:09:51 <tonyb> mostly to confirm what a normal start looks like
19:09:54 <clarkb> oh yes we should do that. So maybe thursday afternoon (for me)/friday morning for you
19:10:13 <tonyb> sounds good
19:10:15 <clarkb> tonyb: feel free to propose some time blocks. I'm generally pretty flexible late week
19:10:57 <clarkb> #topic Upgrading old servers
19:11:28 <clarkb> tonyb has the wiki change stack been updated for quay and/or noble?
19:11:47 <tonyb> noble yes quay no
19:12:15 <clarkb> ack thanks. I think updating the image build change to do that is the next step for this effort.
19:12:43 <tonyb> also the ansible changes are going to be restructured a little to move ansible-next to jammy + 3.11
19:13:49 <tonyb> I'll get them updated this week
19:13:54 <clarkb> thanks
19:14:33 <clarkb> any other server upgrade updates? (I don't think so but want to double check before we move on)
19:14:44 <tonyb> (sorry about the typos, speed and accuracy are low on my phone)
19:15:01 <tonyb> nothing more from me
19:15:07 <clarkb> #topic Matrix for OpenDev comms
19:15:23 <clarkb> tonyb offered to look into creating the new room last week. Not sure if that happened
19:15:36 <clarkb> that is step -2 of many to get this moving forward but it is an important step
19:15:51 <tonyb> nope.   today!
19:16:01 <clarkb> thanks!
19:16:15 <clarkb> #topic Upgrade Zuul Zookeeper Cluster to 3.9
19:16:20 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/966612
19:16:27 <tonyb> I was thinking I might also make a tooling test room .... to target with tools ... for testing
19:16:36 <corvus> we have one
19:16:49 <tonyb> oh!  never mind then
19:16:59 <clarkb> the zookeeper cluster is running 3.8 which is the stable release
19:17:10 <clarkb> 3.9 is the current release and has existed for enough time now to probably also be considered stable
19:17:42 <clarkb> the normal upgrade process is to upgrade each of the non-leaders first, then the leader, which our ansible is not smart enough to do
19:17:45 <corvus> i think it's very likely that zuul is going to make 3.9 a requirement for zuul-launcher
19:18:03 <clarkb> corvus: what specific feature(s) make 3.9 useful for the launcher?
19:18:08 <corvus> so getting ahead of that would be beneficial
19:18:38 <corvus> the watch event returns the zk transaction id starting with 3.9, so we can tell our current position in cache replays
19:19:05 <clarkb> got it
19:19:06 <corvus> https://review.opendev.org/966501 is the zuul change that takes advantage of it
19:19:25 <corvus> i've written a fallback change for zuul
19:19:31 <clarkb> as far as upgrading goes I have no objections to moving to 3.9. i think I have a slight preference for manually doing the upgrade to employ the correct expected process
19:19:46 <clarkb> note you have to check the status of each member after each restart because sometimes the leader moves
19:19:56 <corvus> so this doesn't have to be in the critical path, we can upgrade whenever, but i'd like soon to increase our confidence
19:20:34 <clarkb> but I'm happy to help with the process, which is something like: put servers in the emergency file, edit docker-compose.yaml by hand and upgrade the first follower, repeat on the second follower after checking which node is the leader, then finally do the last node
19:20:59 <clarkb> the release notes for 3.9 say no special steps are required to upgrade from 3.8 to 3.9 so it should be straightforward if we use the normal process
19:21:00 <corvus> i could do it this saturday morning (my time)
19:21:13 <corvus> yeah, i also went over the notes and didn't see anything
19:21:26 <corvus> also, a lot of our zuul tests have already been using 3.9
19:21:32 <clarkb> and when done we can merge that change and pull the nodes out of the emergency file
19:21:56 <clarkb> so I guess heads up, review the upgrade change but don't approve it and if you have any concerns please raise them
19:22:14 <corvus> that process sounds good to me, and it sounds like if no one objects we could do it saturday
19:22:39 <corvus> we should make sure to take a zuul zk backup before starting too, just in case
19:22:46 <clarkb> ++
19:22:49 <corvus> (with zuul-client)
19:23:00 <clarkb> I just approved the test fix that the zk upgrade is a child of
19:23:28 <tonyb> sounds good to me
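[Editor's note: the rolling-upgrade ordering described above — followers first, re-checking which node is the leader after each restart because leadership can move, leader last — can be sketched as below. This is an illustrative sketch only; the node names and the `plan_next_upgrade` helper are hypothetical, not the real opendev hosts or tooling.]

```python
def plan_next_upgrade(states, upgraded):
    """Pick the next ZooKeeper member to restart for a rolling upgrade.

    states: current member roles, e.g. {"zk01": "follower", "zk02": "leader"}
    upgraded: set of nodes already restarted on the new version

    Followers are restarted first; the leader is only restarted once it is
    the last remaining node. Because leadership can move after any restart,
    the caller must re-query member states before each call.
    """
    remaining = [n for n in states if n not in upgraded]
    if not remaining:
        return None  # cluster fully upgraded
    followers = [n for n in remaining if states[n] == "follower"]
    # Prefer any remaining follower; otherwise only the leader is left.
    return sorted(followers)[0] if followers else remaining[0]


# Example walk-through: the leader moves after the first restart, so we
# re-query states between steps (hypothetical three-node cluster).
states = {"zk01": "follower", "zk02": "leader", "zk03": "follower"}
first = plan_next_upgrade(states, set())              # a follower: zk01

states = {"zk01": "follower", "zk02": "follower", "zk03": "leader"}
second = plan_next_upgrade(states, {first})           # remaining follower: zk02
third = plan_next_upgrade(states, {first, second})    # the current leader: zk03
```

In practice the "re-query states" step would be done by hand (e.g. checking each member's status after restart, as noted above) rather than by a script; the point is only that the leader must be re-identified between steps.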
19:24:07 <clarkb> #topic Gitea 1.25.1 Upgrade
19:24:15 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/965960 Upgrade Gitea to 1.25.1
19:24:57 <clarkb> https://158.69.67.86/opendev/system-config is a held node you can interact with to check this upgrade
19:25:15 <clarkb> gerrit bug fix upgrades, gitea new release upgrade, and zookeeper upgrades all on tap this week
19:25:44 <clarkb> I'd appreciate reviews of the change itself to make sure I haven't done anything silly when updating templates, but also read over the release notes and make sure there aren't new features we need to enable/disable/configure
19:25:57 <clarkb> This release seemed to avoid big changes like that so I think it should be easy but let me know
19:26:27 <clarkb> mostly just trying to keep up so we don't fall behind
19:26:33 <clarkb> #topic Gitea Performance
19:26:49 <clarkb> Then related to that I spot checked giteas today and they all look busy but not to the point where they are slow
19:27:21 <clarkb> both the memcached memory increase and the "force everything through the load balancer" changes merged
19:27:39 <clarkb> probably a bit early to claim improvement, but not having evidence of problems is something
19:27:54 <clarkb> fungi: related I noticed this morning when prepping for the meeting that the lists server seems sad again
19:28:33 <clarkb> I think mariadb is busy so we may have something crawling apis again and maybe we need to double check iops look reasonable still
19:28:43 <fungi> mmm
19:28:47 <clarkb> but wanted to call that out if we're discussing general performance issues related to crawlers
19:29:15 <fungi> load average is hovering around 10 at the moment, yeah
19:29:48 <clarkb> I suspect it's the same story just hitting us in new and exciting ways as we continue to improve bottlenecks
19:29:55 <clarkb> every fixed bottleneck is an opportunity to find a new one
19:30:57 <clarkb> Please say something if you notice problems in gitea (or any other service).
19:31:01 <clarkb> #topic Raxflex DFW3 Disabled
19:31:08 <clarkb> I don't think this server has been fixed or replaced yet
19:31:24 <clarkb> last week we basically said if after a week it wasn't fixed we'd boot a new one
19:31:32 <clarkb> I think we can probably proceed with that plan now if anyone has time
19:31:58 <clarkb> (my focus is probably on gerrit and gitea and whatever lists needs to be performant, but I'm happy to help if you point me to specific actions that are needed)
19:33:24 <clarkb> #topic Open Discussion
19:33:40 <tonyb> I'll try but if someone else has cycles don't let me stop you
19:34:01 <clarkb> That was all I had on the agenda. I cut out afs stuff since trixie is mirrored now. I cut out launcher things because the major bug there was fixed. We also got vexxhost to address the gerrit vm issues. We upgraded etherpad too
19:34:17 <clarkb> all that to say we got a lot done last week and I was able to trim the agenda as a result. Thank you everyone for making that happen
19:35:58 <tonyb> yeah well done!
19:36:47 <fungi> great work everyone!
19:37:18 <clarkb> maybe we can upgrade gitea tomorrow and plan for gerrit thursday. tonyb we can sync up outside of the meeting on timing for gerrit
19:37:27 <clarkb> and with that I think we can probably end early if there is nothing else
19:37:36 <clarkb> I have some zuul launcher bug fix code reviews I need to do then lunch
19:38:20 <clarkb> thanks everyone. We'll be back here at the same time and location next week
19:38:27 <clarkb> #endmeeting