19:00:16 <clarkb> #startmeeting infra
19:00:16 <opendevmeet> Meeting started Tue Nov 11 19:00:16 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:16 <opendevmeet> The meeting name has been set to 'infra'
19:00:42 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/XEGEBPR2GSFB5UDOI5WGMTUIXHKQAEAP/ Our Agenda
19:01:06 <clarkb> #topic Announcements
19:01:44 <clarkb> we are just over 2 weeks away from a major holiday in the US. I expect to be around tuesday and probably wednesday that week but not thursday and friday
19:02:09 <clarkb> All that to say I don't think it will affect our meeting schedule, but it probably will affect when people are around and active
19:02:56 <clarkb> Was there anything else to announce?
19:04:29 <fungi> i have nothing
19:05:35 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:05:49 <clarkb> Gerrit 3.13 has released
19:06:09 <clarkb> this means the pressure to upgrade to 3.11 is increasing
19:06:15 <clarkb> Before we do that there are new bugfix releases
19:06:21 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/966084 Update to Gerrit 3.10.9 and 3.11.7
19:06:32 <clarkb> and before we update to address the bugfix releases we have a docker compose bug to fix
19:06:37 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/966083 Fix container bind mounts for Gerrit
19:07:32 <clarkb> Landing these two changes and restarting Gerrit is going to be a big goal for me this week. I'm still catching up on stuff after being out yesterday but expect to be able to merge these changes and restart Gerrit sometime this week.
Maybe friday if we are trying to cut down on impacts but possibly sooner
19:08:15 <clarkb> We also heard back from vexxhost on the gerrit server and it was a memory issue which should be mitigated now
19:08:21 <clarkb> (which makes updating gerrit and restarting things safer)
19:08:29 <clarkb> Any other questions or concerns about Gerrit?
19:09:36 <tonyb> if we can schedule the restart while I'm around I'd like to be a second set of eyes
19:09:51 <tonyb> mostly to confirm what a normal start looks like
19:09:54 <clarkb> oh yes we should do that. So maybe thursday afternoon (for me)/friday morning for you
19:10:13 <tonyb> sounds good
19:10:15 <clarkb> tonyb: feel free to propose some time blocks. I'm generally pretty flexible late week
19:10:57 <clarkb> #topic Upgrading old servers
19:11:28 <clarkb> tonyb has the wiki change stack been updated for quay and/or noble?
19:11:47 <tonyb> noble yes quay no
19:12:15 <clarkb> ack thanks. I think updating the image build change to do that is the next step for this effort.
19:12:43 <tonyb> also the ansible changes are going to be restructured a little to move ansible-next to jammy+3.11
19:13:49 <tonyb> I'll get them updated this week
19:13:54 <clarkb> thanks
19:14:33 <clarkb> any other server upgrade updates? (I don't think so but want to double check before we move on)
19:14:44 <tonyb> (sorry about the typos, speed and accuracy are low on my phone)
19:15:01 <tonyb> nothing more from me
19:15:07 <clarkb> #topic Matrix for OpenDev comms
19:15:23 <clarkb> tonyb offered to look into creating the new room last week. Not sure if that happened
19:15:36 <clarkb> that is step -2 of many to get this moving forward but it is an important step
19:15:51 <tonyb> nope. today!
19:16:01 <clarkb> thanks!
19:16:15 <clarkb> #topic Upgrade Zuul Zookeeper Cluster to 3.9
19:16:20 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/966612
19:16:27 <tonyb> I was thinking I might also make a tooling test room ....
to target with tools ... for testing
19:16:36 <corvus> we have one
19:16:49 <tonyb> oh! never mind then
19:16:59 <clarkb> the zookeeper cluster is running 3.8 which is the stable release
19:17:10 <clarkb> 3.9 is the current release and has existed for enough time now to probably also be considered stable
19:17:42 <clarkb> the normal upgrade process is to upgrade each of the non-leaders first then the leader which our ansible is not smart enough to do
19:17:45 <corvus> i think it's very likely that zuul is going to make 3.9 a requirement for zuul-launcher
19:18:03 <clarkb> corvus: what specific feature(s) make 3.9 useful for the launcher?
19:18:08 <corvus> so getting ahead of that would be beneficial
19:18:38 <corvus> the watch event returns the zk transaction id starting with 3.9, so we can tell our current position in cache replays
19:19:05 <clarkb> got it
19:19:06 <corvus> https://review.opendev.org/966501 is the zuul change that takes advantage of it
19:19:25 <corvus> i've written a fallback change for zuul
19:19:31 <clarkb> as far as upgrading goes I have no objections to moving to 3.9.
i think I have a slight preference for manually doing the upgrade to employ the correct expected process
19:19:46 <clarkb> note you have to check the status of each member after each restart because sometimes the leader moves
19:19:56 <corvus> so this doesn't have to be in the critical path, we can upgrade whenever, but i'd like soon to increase our confidence
19:20:34 <clarkb> but I'm happy to help with the process which is something like put servers in emergency file, edit docker-compose.yaml by hand and upgrade the first follower, repeat on the second follower after checking which node is leader, then finally do the last node
19:20:59 <clarkb> the release notes for 3.9 say no special steps are required to upgrade from 3.8 to 3.9 so it should be straightforward if we use the normal process
19:21:00 <corvus> i could do it this saturday morning (my time)
19:21:13 <corvus> yeah, i also went over the notes and didn't see anything
19:21:26 <corvus> also, a lot of our zuul tests have already been using 3.9
19:21:32 <clarkb> and when done we can merge that change and pull the nodes out of the emergency file
19:21:56 <clarkb> so I guess heads up, review the upgrade change but don't approve it and if you have any concerns please raise them
19:22:14 <corvus> that process sounds good to me, and it sounds like if no one objects we could do it saturday
19:22:39 <corvus> we should make sure to take a zuul zk backup before starting too, just in case
19:22:46 <clarkb> ++
19:22:49 <corvus> (with zuul-client)
19:23:00 <clarkb> I just approved the test fix that the zk upgrade is a child of
19:23:28 <tonyb> sounds good to me
19:24:07 <clarkb> #topic Gitea 1.25.1 Upgrade
19:24:15 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/965960 Upgrade Gitea to 1.25.1
19:24:57 <clarkb> https://158.69.67.86/opendev/system-config is a held node you can interact with to check this upgrade
19:25:15 <clarkb> gerrit bug fix upgrades, gitea new release upgrade, and zookeeper upgrades all on tap this week
19:25:44 <clarkb> I'd appreciate reviews of the change itself to make sure I haven't done anything silly when updating templates, but also read over the release notes and make sure there aren't new features we need to enable/disable/configure
19:25:57 <clarkb> This release seemed to avoid big changes like that so I think it should be easy but let me know
19:26:27 <clarkb> mostly just trying to keep up so we don't fall behind
19:26:33 <clarkb> #topic Gitea Performance
19:26:49 <clarkb> Then related to that I spot checked giteas today and they all look busy but not to the point where they are slow
19:27:21 <clarkb> both the memcached memory increase and the "force everything through the load balancer" changes merged
19:27:39 <clarkb> probably a bit early to claim improvement, but not having evidence of problems is something
19:27:54 <clarkb> fungi: related I noticed this morning when prepping for the meeting that the lists server seems sad again
19:28:33 <clarkb> I think mariadb is busy so we may have something crawling apis again and maybe we need to double check iops look reasonable still
19:28:43 <fungi> mmm
19:28:47 <clarkb> but wanted to call that out if we're discussing general performance issues related to crawlers
19:29:15 <fungi> load average is hovering around 10 at the moment, yeah
19:29:48 <clarkb> I suspect it's the same story just hitting us in new and exciting ways as we continue to improve bottlenecks
19:29:55 <clarkb> every fixed bottleneck is an opportunity to find a new one
19:30:57 <clarkb> Please say something if you notice problems in gitea (or any other service).
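An editorial aside on the ZooKeeper topic above: the rolling upgrade hinges on rechecking each node's role after every restart, because the leader can move. This is a minimal sketch of that check using the standard `srvr` four-letter command on the client port; it assumes `srvr` is enabled, and the hostnames in the usage comment are hypothetical placeholders, not the real cluster names.

```python
import socket


def zk_mode(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the "srvr" four-letter command and return the node's Mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode())


def parse_mode(srvr_output: str) -> str:
    """Extract the value of the "Mode:" line (leader/follower/standalone)."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Mode line in srvr output")


# Example against a live cluster (placeholder hostnames):
#   for host in ("zk01.example.org", "zk02.example.org", "zk03.example.org"):
#       print(host, zk_mode(host))
```

Running something like this between restarts would confirm both followers are upgraded before touching whichever node currently reports `leader`.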
19:31:01 <clarkb> #topic Raxflex DFW3 Disabled
19:31:08 <clarkb> I don't think this server has been fixed or replaced yet
19:31:24 <clarkb> last week we basically said if after a week it wasn't fixed we'd boot a new one
19:31:32 <clarkb> I think we can probably proceed with that plan now if anyone has time
19:31:58 <clarkb> (my focus is probably on gerrit and gitea and whatever lists needs to be performant, but I'm happy to help if you point me to specific actions that are needed)
19:33:24 <clarkb> #topic Open Discussion
19:33:40 <tonyb> I'll try but if someone else has cycles don't let me stop you
19:34:01 <clarkb> That was all I had on the agenda. I cut out afs stuff since trixie is mirrored now. I cut out launcher things because the major bug there was fixed. We also got vexxhost to address the gerrit vm issues. We upgraded etherpad too
19:34:17 <clarkb> all that to say we got a lot done last week and I was able to trim the agenda as a result. Thank you everyone for making that happen
19:35:58 <tonyb> yeah well done!
19:36:47 <fungi> great work everyone!
19:37:18 <clarkb> maybe we can upgrade gitea tomorrow and plan for gerrit thursday. tonyb we can sync up outside of the meeting on timing for gerrit
19:37:27 <clarkb> and with that I think we can probably end early if there is nothing else
19:37:36 <clarkb> I have some zuul launcher bug fix code reviews I need to do then lunch
19:38:20 <clarkb> thanks everyone. We'll be back here at the same time and location next week
19:38:27 <clarkb> #endmeeting