19:00:14 <clarkb> #startmeeting infra
19:00:14 <opendevmeet> Meeting started Tue Sep 16 19:00:14 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:14 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:14 <opendevmeet> The meeting name has been set to 'infra'
19:00:20 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3XPEGJQNZUENYS54A2BRGINSG2EU7X6I/ Our Agenda
19:00:23 <clarkb> #topic Announcements
19:00:43 <clarkb> We're nearing the openstack release. I think starlingx may be working on a release too. Just keep that in mind as we're making changes
19:01:03 <clarkb> Also I will be out on the 25th (that is Thursday next week)
19:01:19 <clarkb> that was all I wanted to mention here. Did anyone else have anything to announce?
19:01:41 <fungi> i'm taking off most/all the day this thursday
19:02:15 <fungi> (but am around all next week)
19:02:45 <clarkb> thanks for the heads up
19:03:25 <clarkb> #topic Moving OpenDev's python-base/python-builder/uwsgi-base Images to Quay
19:03:36 <clarkb> #link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/HO6Z66QIMDIDY7CCVAREDOPSYZYNKIT3/
19:03:45 <clarkb> The images were moved and the move was announced in the link above
19:04:18 <clarkb> since then my changes to update opendev's consumption of those images have all merged
19:04:36 <clarkb> I believe that means this task is basically complete. I even dropped the mirroring of the images from docker hub to quay since quay is the canonical location now
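As a rough illustration of what the move means for image consumers (the exact tag below is an assumption for illustration, not something stated in the meeting), pulls now come from quay.io rather than docker hub:

    # pull the relocated base image from its new canonical location on quay
    docker pull quay.io/opendevorg/python-base:3.11-bookworm
    # the old docker.io/opendevorg/python-base path is no longer the canonical source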
19:05:19 <clarkb> The one thing I haven't pushed a change for and will do so immediately after the meeting is flipping our default container command for the generic zuul-jobs container roles back to docker from podman. The changes that have landed all selectively opted into that already so this is a noop that will simply ensure new images do the right thing
19:05:43 <clarkb> thank you everyone for all the help with this. It has been a big effort over a couple of years at this point to be able to do this
19:06:04 <clarkb> I should note that some images that are not part of the python base image inheritance path are still published to docker hub (gitea, mailman, etc)
19:06:25 <clarkb> but the servers they run on are not on noble yet which means not on podman yet so they aren't able to run speculative image builds at runtime
19:06:49 <clarkb> but we've got a good chunk moved and I think we're proving out that this generally works which is nice
19:07:08 <clarkb> any questions/concerns/comments around this change? it's a fairly big one but also largely behind the scenes as long as everything is working properly
19:08:19 <clarkb> oh as a note we also added trixie base images for python3.12 and 3.11
19:08:33 <clarkb> so eventually we'll want to start migrating things onto that platform and look at adding 3.13 images too
19:08:51 <clarkb> gerrit will be a bit trickier as there are jvm implications with that. Which we can talk about now
19:08:56 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:09:28 <clarkb> One of the things I was waiting on with Gerrit was getting the container images updated to 3.10.8 and 3.11.5 (the latest bugfix releases). We did that yesterday and also updated the base image location to quay
19:09:45 <clarkb> I then set up two new node holds on jobs using those newer images
19:09:53 <clarkb> #link https://zuul.opendev.org/t/openstack/build/54f6629a3041466ca2b1cc6bf17886c4 3.10.8 held node
19:10:02 <clarkb> #link https://zuul.opendev.org/t/openstack/build/c9051c435bf7414b986c37256f71538e 3.11.5 held node
19:10:40 <clarkb> due to the proximity of the openstack release I don't think we're upgrading in the next couple of weeks, however I think we can make a good push to get everything ready to upgrade sometime next month (after the summit maybe, as that tends to be a quieter time for us)
19:11:08 <clarkb> then once we are upgraded to 3.11.5 we can switch to trixie as I believe 3.11.5 is the first java 21 compatible release
19:11:35 <clarkb> The other Gerrit thing we learned yesterday was that when our h2 cache files get very large gerrit shutdown is very slow and hits the podman/docker shutdown timeout
19:11:42 <clarkb> that timeout is currently set to 300 seconds
19:12:15 <clarkb> I think that we should consider more regular gerrit restarts to try and avoid this problem. We could also "embrace" it and set the timeout to a much shorter value say 60 seconds
19:12:42 <clarkb> and basically expect that we're going to forcefully stop gerrit while it is trying to prune its h2 db caches, which we will delete as soon as it's shut down
19:12:58 <clarkb> I'm interested in feedback on ^ but I don't think either item is urgent right now
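For reference, a minimal sketch of what a shorter stop timeout looks like at the command line; the container name here is an assumption, and the persistent setting would live in the service's compose configuration rather than in ad hoc commands:

    # stop gerrit with a 60 second grace period instead of the current 300
    docker stop --time 60 gerrit
    # equivalent via docker compose, run from the directory holding the compose file
    docker compose stop --timeout 60 gerrit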
19:14:09 <clarkb> #topic Upgrading old servers
19:14:45 <clarkb> fungi has continued to make great progress with the openafs and kerberos clusters. Everything is upgraded to noble except for afs02.dfw.openstack.org. Waiting on one more mirror volume to move its RW copy to afs01.dfw before proceeding with afs02
19:14:56 <fungi> yeah, we're a little over 30 hours into the mirror.ubuntu rw volume move from afs02.dfw to afs01.dfw, but once that completes i'll be able to upgrade afs02.dfw from jammy to noble
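For context, a read/write volume move like this is done with OpenAFS's vos tool; a rough sketch, assuming the volume lives on partition a (vicepa) on both servers:

    # move the RW copy of mirror.ubuntu from afs02.dfw to afs01.dfw
    vos move -id mirror.ubuntu \
      -fromserver afs02.dfw.openstack.org -frompartition a \
      -toserver afs01.dfw.openstack.org -topartition a -localauth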
19:15:08 <clarkb> at least one afs server and one kdc have been removed from the emergency file list too so ansible seems happy with the results as well
19:15:36 <clarkb> fungi: do you think we should remove all of the nodes from the emergency file except for afs02 at this point or should we wait for afs02 to be done and do the whole lot at once?
19:15:56 <fungi> doesn't matter, i can do it now sure
19:16:08 <clarkb> mostly I don't want us to forget
19:16:24 <fungi> done now, only afs02.dfw is still disabled at this point
19:16:57 <clarkb> once these servers are done the next on the list are graphite and the backup servers. Then we can start looking at getting an early jump on jammy server upgrades
19:17:12 <clarkb> (as well as continued effort to uplift the truly ancient servers)
19:17:19 <clarkb> but ya this is great progress thank you!
19:17:27 <clarkb> Any other questions/concerns/comments around server upgrades?
19:18:41 <clarkb> #topic AFS mirror content cleanup
19:18:53 <clarkb> fungi discovered we're carrying some openafs mirror content that we don't need to any longer
19:18:57 <fungi> this is next on my plate after the volume moves/upgrades
19:19:15 <clarkb> specifically openeuler (we don't have openeuler test nodes) and debian stretch content
19:19:30 <fungi> #link https://review.opendev.org/959892 Stop mirroring OpenEuler packages
19:19:38 <clarkb> then hopefully not too far into the future we'll be dropping ubuntu bionic test nodes too and we can clear its content out as well
19:19:42 <fungi> we stopped mirroring debian stretch long ago
19:19:47 <fungi> just didn't delete the files
19:20:10 <clarkb> and with that all done we'll be able to answer questions about whether or not we can mirror trixie or centos 10 stream or whatever
19:20:34 <fungi> oh, also we can drop bullseye backports
19:20:43 <fungi> since it's vanished upstream
19:20:45 <clarkb> ++
19:21:03 <fungi> and as of about a week ago nothing should be accidentally using those files anyway
19:21:20 <fungi> so they can safely disappear now
19:21:34 <clarkb> if you identify any other stale content that should be removed say something
19:21:55 <clarkb> though I think after these items it's mostly old docker and ceph packages, which are relatively tiny and don't have a similar impact
19:22:20 <fungi> i expect there's plenty of puppet and ceph package mirroring that can be cleaned up
19:22:29 <fungi> er, docker, yeah
19:23:09 <clarkb> potentially puppet too. I think either the deb or rpm puppet mirror is quite large too but also seems to still be used
19:23:21 <clarkb> it's possible the changes to puppet binary releases will affect that though
19:23:34 <clarkb> (puppet upstream is only releasing source code for future work aiui and will no longer supply packages or binaries)
19:23:41 <fungi> well, we may be mirroring a lot of old packages that nobody's using too
19:23:49 <fungi> for puppet
19:23:50 <clarkb> true
19:24:07 <clarkb> could be worth a quick check given its relative size. Start there rather than docker or ceph
19:25:27 <clarkb> #topic Matrix for OpenDev comms
19:25:33 <clarkb> The spec (954826) has merged.
19:26:07 <clarkb> I'm thinking this is a good task to get going while openstack release stuff is happening as its impact on that process should be nonexistent
19:26:17 <clarkb> so once I dig out of my current backlog a bit I'll try to start on this
19:27:06 <clarkb> I don't think there is anything else really to do here other than start on the process as described in the spec. I'll let everyone know if I hit any issues
19:27:11 <clarkb> #topic Pre PTG Planning
19:27:22 <clarkb> #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document
19:27:28 <clarkb> Times: Tuesday October 7 1800-2000 UTC, Wednesday October 8 1500-1700 UTC, Thursday October 9 1500-1700
19:27:52 <clarkb> This is 3 weeks away. Add your ideas to the etherpad if you've got them
19:28:27 <clarkb> #topic Etherpad 2.5.0 Upgrade
19:28:32 <clarkb> #link https://github.com/ether/etherpad-lite/blob/v2.5.0/CHANGELOG.md
19:28:37 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/956593/
19:28:59 <clarkb> as mentioned previously I think the root page css is still a bit odd, but I'm hoping others will have a chance to check it and indicate whether or not they feel this is a blocker for us
19:29:05 <clarkb> 104.130.127.119 is a held node for testing. You need to edit /etc/hosts to point etherpad.opendev.org at that IP.
19:29:15 <clarkb> I set up the clarkb-test etherpad there if you want to see some existing edits
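For anyone testing, the /etc/hosts edit described above is a one-liner (remember to remove the entry again when done):

    # point etherpad.opendev.org at the held node for local testing
    echo '104.130.127.119 etherpad.opendev.org' | sudo tee -a /etc/hosts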
19:29:29 <fungi> what was the link to the upstream issue about the css regression?
19:29:38 <fungi> i guess there's been no further movement there
19:30:07 <clarkb> fungi: they "fixed" it while you were on vacation
19:30:12 <clarkb> and the problem went away for the etherpad pages
19:30:27 <clarkb> https://github.com/ether/etherpad-lite/issues/7065
19:30:37 <clarkb> but the root page still looks odd (not as bad as with the 2.4.2 release though)
19:30:53 <clarkb> so it's mostly a question of whether this is good enough or whether I should reopen the issue / file a new issue
19:31:00 <fungi> oh nice
19:31:24 <fungi> that's probably a new issue about an incomplete fix
19:32:08 <fungi> and reference the prior issue in it so github adds a mention
19:32:11 <clarkb> I also wondered if we could just edit some css to fix it
19:32:25 <clarkb> but I haven't looked beyond the visual check to say oh huh it's still a bit weird
19:32:39 <fungi> worth a try for sure, might be an excuse for us to custom "theme" that page i guess
19:32:40 <clarkb> so ya feedback on whether we think it is an issue would be appreciated
19:33:38 <clarkb> #topic Lists Server Slowness
19:33:55 <clarkb> "good" news! I think I managed to track down the source of the iowait on this server
19:34:30 <clarkb> the tl;dr is that the server is using a "standard" flavor not a "performance" flavor and the flavors have disk_io_index properties. The standard flavor is set to 2 and performance is set to 40
19:35:00 <clarkb> that is a 20x difference, something that is experimentally confirmed using fio's randread test. I get about 1k iops on standard there and 20k iops on performance
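The exact fio invocations weren't pasted into the meeting; a sketch of a comparable randread test (the file size, iodepth, and runtime here are assumptions) looks something like:

    # random read benchmark; creates a test file in the current directory and leaves it behind
    fio --name=randread --rw=randread --bs=4k --size=1g --direct=1 \
        --ioengine=libaio --iodepth=32 --runtime=60 --time_based --group_reporting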
19:35:31 <clarkb> considering that iowait is very high during busy periods/when mailman is slow I think the solution here is to move mailman onto a device with better iops performance
19:35:50 <clarkb> earlier today fungi attached an ssd volume to the instance and started copying data in preparation for such a move
19:36:02 <fungi> at this point i've primed a copy of all the mailman state (archives, database, et cetera) onto an ssd-backed cinder volume. first copy took 53m30s to complete, i'm doing a second one now to see how much faster that goes
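The priming copy itself is a straightforward rsync; a minimal sketch with assumed source and destination paths (not the actual paths used):

    # copy mailman state onto the ssd-backed volume; re-running only transfers what changed
    rsync -avHAX --delete /var/lib/mailman/ /mnt/mailman-ssd/mailman/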
19:36:11 <clarkb> fungi: it just occurred to me that you can run the fio tests on that ssd volume when the rsync is done just to confirm iops are better
19:36:19 <fungi> good idea
19:36:28 <clarkb> fungi: keep in mind that fio will create files to read against and not delete them after so you may need to do manual cleanup
19:36:37 <clarkb> but I think that is a good sanity check before we commit to this solution
19:36:58 <fungi> as for the cut-over, i've outlined a rough maintenance plan to minimize service downtime
19:37:06 <fungi> #link https://etherpad.opendev.org/p/2025-09-mailman-volume-maintenance
19:37:35 <clarkb> the fio commands should be in my user's history on that server. It creates the files in the current working dir. Note there is a read test and a randread test. You should probably run both
19:37:42 <clarkb> or I can run them if you like just let me know if that is easier
19:38:10 <fungi> i can give it a shot after i get back from dinner
19:39:01 <clarkb> cool I'll take a look at that etherpad plan shortly too
19:39:21 <clarkb> anything else on this topic?
19:39:27 <fungi> not from me
19:40:00 <clarkb> #topic Open Discussion
19:40:28 <clarkb> I don't know who will be attending the summit next month, but if you are going to be there it looks like Friday evening is the opportunity for an opendev+zuul type of get together
19:40:54 <fungi> let's get opendev and zuul together
19:40:54 <clarkb> I don't plan on doing anything formal, but we'll probably try to aim at a common location for dinner that night if anyone else is interested
19:41:15 <fungi> you got your opendev in my zuul! you got your zuul in my opendev!
19:41:38 <corvus> oh so we're going to cross the streams?
19:41:48 <fungi> that would be... not so bad?
19:41:57 <clarkb> also I may have found some stickers....
19:42:06 <clarkb> I just have to remember to pack them
19:42:26 <corvus> yay!
19:43:45 <clarkb> Last call. Anything else that we haven't covered that should be discussed? Also we can always discuss things on the mailing list or the regular irc channel
19:45:26 <clarkb> sounds like no
19:45:40 <fungi> thanks clarkb!
19:45:43 <clarkb> thank you everyone! we'll be back here same time and location next week. Until then thank you for your help working on opendev
19:45:51 <clarkb> and see you there
19:45:53 <clarkb> #endmeeting