Tuesday, 2025-09-16

*** diablo_rojo_phone is now known as Guest2664313:12
*** clarkb is now known as Guest2667313:57
*** Guest26673 is now known as clarkb14:46
*** Guest26643 is now known as diablo_rojo_phone17:52
clarkbAlmost meeting time18:58
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Sep 16 19:00:14 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3XPEGJQNZUENYS54A2BRGINSG2EU7X6I/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkbWe're nearing the openstack release. I think starlingx may be working on a release too. Just keep that in mind as we're making changes19:00
clarkbAlso I will be out on the 25th (that is Thursday next week)19:01
clarkbthat was all I wanted to mention here. Did anyone else have anything to announce?19:01
fungii'm taking off most/all the day this thursday19:01
fungi(but am around all next week)19:02
clarkbthanks for the heads up19:02
clarkb#topic Moving OpenDev's python-base/python-builder/uwsgi-base Images to Quay19:03
clarkb#link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/HO6Z66QIMDIDY7CCVAREDOPSYZYNKIT3/19:03
clarkbThe images were moved and the move was announced in the link above19:03
clarkbsince then my changes to update opendev's consumption of those images have all merged19:04
clarkbI believe that means this task is basically complete. I even dropped mirroring of the images on docker hub to quay since quay is the canonical location now19:04
clarkbThe one thing I haven't pushed a change for yet (I'll do so immediately after the meeting) is flipping our default container command for the generic zuul-jobs container roles back to docker from podman. The changes that have landed all selectively opted into that already, so this is a noop that will simply ensure new images do the right thing19:05
clarkbthank you everyone for all the help with this. It has been a big effort over a couple of years at this point to be able to do this19:05
clarkbI should note that some images that are not part of the python base image inheritance path are still published to docker hub (gitea, mailman, etc)19:06
clarkbbut the servers they run on are not on noble yet, which means not on podman yet, so they aren't able to run speculative image builds at runtime19:06
clarkbbut we've got a good chunk moved and I think we're proving out that this generally works which is nice19:06
clarkbany questions/concerns/comments around this change? it's a fairly big one but also largely behind the scenes as long as everything is working properly19:07
clarkboh as a note we also added trixie base images for python3.12 and 3.1119:08
clarkbso eventually we'll want to start migrating things onto that platform and look at adding 3.13 images too19:08
clarkbgerrit will be a bit trickier as there are jvm implications with that. Which we can talk about now19:08
clarkb#topic Gerrit 3.11 Upgrade Planning19:08
clarkbOne of the things I was waiting on with Gerrit was getting the container images updated to 3.10.8 and 3.11.5 (the latest bugfix releases). We did that yesterday and also updated the base image location to quay19:09
clarkbI then set up two new node holds on jobs using those newer images19:09
clarkb#link https://zuul.opendev.org/t/openstack/build/54f6629a3041466ca2b1cc6bf17886c4 3.10.8 held node19:09
clarkb#link https://zuul.opendev.org/t/openstack/build/c9051c435bf7414b986c37256f71538e 3.11.5 held node19:10
clarkbdue to the proximity of the openstack release I don't think we're upgrading in the next couple of weeks, however I think we can make a good push to get everything ready to upgrade sometime next month (maybe after the summit, that tends to be a quieter time for us)19:10
clarkbthen once we are upgraded to 3.11.5 we can switch to trixie as I believe 3.11.5 is the first java 21 compatible release19:11
clarkbThe other Gerrit thing we learned yesterday was that when our h2 cache files get very large, gerrit shutdown is very slow and hits the podman/docker shutdown timeout19:11
clarkbthat timeout is currently set to 300 seconds19:11
clarkbI think that we should consider more regular gerrit restarts to try and avoid this problem. We could also "embrace" it and set the timeout to a much shorter value say 60 seconds19:12
clarkband basically expect that we're going to forcefully stop gerrit while it is trying to prune its h2 db caches, which we will delete as soon as it's shut down19:12
clarkbI'm interested in feedback on ^ but I don't think either item is urgent right now19:12
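For reference, the "embrace it" option could look roughly like the sketch below: stop the container with a short timeout, then clear the h2 cache files so the next shutdown has nothing large to prune. The container name, timeout, and cache path are assumptions for illustration, not taken from the production deployment.

    import glob
    import os
    import subprocess

    GERRIT_CONTAINER = "gerrit"                    # assumed container name
    CACHE_DIR = "/home/gerrit2/review_site/cache"  # assumed Gerrit site cache path
    STOP_TIMEOUT = "60"                            # seconds before docker sends SIGKILL

    # Stop the container, giving Gerrit only a short window to shut down cleanly.
    subprocess.run(["docker", "stop", "-t", STOP_TIMEOUT, GERRIT_CONTAINER], check=True)

    # The persistent disk caches are h2 databases under the site's cache dir;
    # removing them after a (possibly forceful) stop means there is nothing
    # large to prune next time, and Gerrit rebuilds them on startup.
    for path in (glob.glob(os.path.join(CACHE_DIR, "*.h2.db"))
                 + glob.glob(os.path.join(CACHE_DIR, "*.lock.db"))):
        os.remove(path)

Deleting the disk caches after a stop is safe since Gerrit rebuilds them on startup, at the cost of starting with a cold cache.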
clarkb#topic Upgrading old servers19:14
clarkbfungi has continued to make great progress with the openafs and kerberos clusters. Everything is upgraded to noble except for afs02.dfw.openstack.org. Waiting on one more mirror volume to move its RW copy to afs01.dfw before proceeding with afs0219:14
fungiyeah, we're a little over 30 hours into the mirror.ubuntu rw volume move from afs02.dfw to afs01.dfw, but once that completes i'll be able to upgrade afs02.dfw from jammy to noble19:14
clarkbat least one afs server and one kdc have been removed from the emergency file list too so ansible seems happy with the results as well19:15
clarkbfungi: do you think we should remove all of the nodes from the emergency file except for afs02 at this point or should we wait for afs02 to be done and do the whole lot at once?19:15
fungidoesn't matter, i can do it now sure19:15
clarkbmostly I don't want us to forget19:16
fungidone now, only afs02.dfw is still disabled at this point19:16
clarkbonce these servers are done the next on the list are graphite and the backup servers. Then we can start looking at getting an early jump on jammy server upgrades19:16
clarkb(as well as continued effort to uplift the truly ancient servers)19:17
clarkbbut ya this is great progress thank you!19:17
clarkbAny other questions/concerns/comments around server upgrades?19:17
clarkb#topic AFS mirror content cleanup19:18
clarkbfungi discovered we're carrying some openafs mirror content that we don't need any longer19:18
fungithis is next on my plate after the volume moves/upgrades19:18
clarkbspecifically openeuler (we don't have openeuler test nodes) and debian stretch content19:19
fungi#link https://review.opendev.org/959892 Stop mirroring OpenEuler packages19:19
clarkbthen hopefully not too far into the future we'll be dropping ubuntu bionic test nodes too and we can clear its content out as well19:19
fungiwe stopped mirroring debian stretch long ago19:19
fungijust didn't delete the files19:19
clarkband with that all done we'll be able to answer questions about whether or not we can mirror trixie or centos 10 stream or whatever19:20
fungioh, also we can drop bullseye backports19:20
fungisince it's vanished upstream19:20
clarkb++19:20
fungiand as of about a week ago nothing should be accidentally using those files anyway19:21
fungiso they can safely disappear now19:21
clarkbif you identify any other stale content that should be removed say something19:21
clarkbthough I think after these items it's mostly old docker and ceph packages, which are relatively tiny and don't have a similar impact19:21
fungii expect there's plenty of puppet and ceph package mirroring that can be cleaned up19:22
fungier, docker, yeah19:22
clarkbpotentially puppet too. I think either the deb or rpm puppet mirror is quite large too but also seems to still be used19:23
clarkbit's possible the changes to puppet binary releases will affect that though19:23
clarkb(puppet upstream is only releasing source code going forward aiui and will no longer supply packages or binaries)19:23
fungiwell, we may be mirroring a lot of old packages that nobody's using too19:23
fungifor puppet19:23
clarkbtrue19:23
clarkbcould be worth a quick check given its relative size. Start there rather than docker or ceph19:24
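For that quick check, something like the sketch below walks each top-level mirror tree and reports its size, assuming the content is reachable at /afs/openstack.org/mirror (an assumed path for illustration).

    import os

    MIRROR_ROOT = "/afs/openstack.org/mirror"  # assumed AFS mirror path; adjust if different

    def tree_size(path):
        """Total size in bytes of regular files under path (symlinks skipped)."""
        total = 0
        for dirpath, _dirnames, filenames in os.walk(path):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if not os.path.islink(full):
                    try:
                        total += os.path.getsize(full)
                    except OSError:
                        pass  # file vanished or unreadable; skip it
        return total

    sizes = {
        entry: tree_size(os.path.join(MIRROR_ROOT, entry))
        for entry in os.listdir(MIRROR_ROOT)
        if os.path.isdir(os.path.join(MIRROR_ROOT, entry))
    }
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 2**30:8.1f} GiB  {name}")

Walking the whole tree over AFS is slow; asking the fileservers about volume sizes directly (e.g. vos examine per volume) would be quicker, this is just the low-effort client-side version.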
clarkb#topic Matrix for OpenDev comms19:25
clarkbThe spec (954826) has merged.19:25
clarkbI'm thinking this is a good task to get going while openstack release stuff is happening as its impact on that process should be nonexistent19:26
clarkbso once I dig out of my current backlog a bit I'll try to start on this19:26
clarkbI don't think there is anything else really to do here other than start on the process as described in the spec. I'll let everyone know if I hit any issues19:27
clarkb#topic Pre PTG Planning19:27
clarkb#link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document19:27
clarkbTimes: Tuesday October 7 1800-2000 UTC, Wednesday October 8 1500-1700 UTC, Thursday October 9 1500-170019:27
clarkbThis is 3 weeks away. Add your ideas to the etherpad if you've got them19:27
clarkb#topic Etherpad 2.5.0 Upgrade19:28
clarkb#link https://github.com/ether/etherpad-lite/blob/v2.5.0/CHANGELOG.md19:28
clarkb#link https://review.opendev.org/c/opendev/system-config/+/956593/19:28
clarkbas mentioned previously I think the root page css is still a bit odd, but I'm hoping others will have a chance to check it and indicate whether or not they feel this is a blocker for us19:28
clarkb104.130.127.119 is a held node for testing. You need to edit /etc/hosts to point etherpad.opendev.org at that IP.19:29
clarkbI set up the clarkb-test etherpad there if you want to see some existing edits19:29
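For a quick smoke test from the command line (a browser still needs the /etc/hosts edit), something like the sketch below hits the held IP while presenting the production hostname; certificate verification is skipped on the assumption that the held node does not carry the real cert.

    import socket
    import ssl

    HELD_IP = "104.130.127.119"        # held node from the meeting
    HOSTNAME = "etherpad.opendev.org"

    # The held node's certificate won't match production, so skip verification;
    # this is only a smoke test of the service on that IP.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with socket.create_connection((HELD_IP, 443)) as raw:
        # server_hostname sets SNI so the web server picks the etherpad vhost,
        # even though we connected by IP.
        with ctx.wrap_socket(raw, server_hostname=HOSTNAME) as tls:
            request = (
                f"GET / HTTP/1.1\r\nHost: {HOSTNAME}\r\n"
                "Connection: close\r\n\r\n"
            )
            tls.sendall(request.encode())
            response = b""
            while chunk := tls.recv(4096):
                response += chunk

    print(response.split(b"\r\n", 1)[0].decode())  # status line, e.g. HTTP/1.1 200 OK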
fungiwhat was the link to the upstream issue about the css regression?19:29
fungii guess there's been no further movement there19:29
clarkbfungi: they "fixed" it while you were on vacation19:30
clarkband the problem went away for the etherpad pages19:30
clarkbhttps://github.com/ether/etherpad-lite/issues/706519:30
clarkbbut the root page still looks odd (not as bad as with the 2.4.2 release though)19:30
clarkbso it's mostly a question of: is this good enough, or should I reopen the issue / file a new issue19:30
fungioh nice19:31
fungithat's probably a new issue about an incomplete fix19:31
fungiand reference the prior issue in it so github adds a mention19:32
clarkbI also wondered if we could just edit some css to fix it19:32
clarkbbut I haven't looked beyond the visual check to say, oh huh, it's still a bit weird19:32
fungiworth a try for sure, might be an excuse for us to custom "theme" that page i guess19:32
clarkbso ya feedback on whether we think it is an issue would be appreciated19:32
clarkb#topic Lists Server Slowness19:33
clarkb"good" news! I think I managed track down the source of the iowait on this server19:33
clarkbthe tl;dr is that the server is using a "standard" flavor not a "performance" flavor and the flavors have disk_io_index properties. The standard flavor is set to 2 and performance is set to 4019:34
clarkbthat is a 20x difference, something that is experimentally confirmed using fio's randread test. I get about 1k iops on standard there and 20k iops on performance19:35
clarkbconsidering that iowait is very high during busy periods/when mailman is slow I think the solution here is to move mailman onto a device with better iops performance19:35
clarkbearlier today fungi attached an ssd volume to the instance and started copying data in preparation for such a move19:35
fungiat this point i've primed a copy of all the mailman state (archives, database, et cetera) onto an ssd-backed cinder volume. first copy took 53m30s to complete, i'm doing a second one now to see how much faster that goes19:36
clarkbfungi: it just occurred to me that you can run the fio tests on that ssd volume when the rsync is done just to confirm iops are better19:36
fungigood idea19:36
clarkbfungi: keep in mind that fio will create files to read against and not delete them after so you may need to do manual cleanup19:36
clarkbbut I think that is a good sanity check before we commit to this solution19:36
fungias for the cut-over, i've outlined a rough maintenance plan to minimize service downtime19:36
fungi#link https://etherpad.opendev.org/p/2025-09-mailman-volume-maintenance19:37
clarkbthe fio commands should be in my user's history on that server. It creates the files in the current working dir. Note there is a read test and a randread test. You should probably run both19:37
clarkbor I can run them if you like just let me know if that is easier19:37
fungii can give it a shot after i get back from dinner19:38
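The exact commands are only in that shell history, but a rough equivalent of the comparison looks like the sketch below; the job parameters are illustrative guesses, not the ones actually used, and the data file is removed at the end per the cleanup note above.

    import json
    import os
    import subprocess

    TEST_FILE = "fio-testfile"  # fio leaves this behind; clean it up afterwards

    def run_fio(mode):
        """Run a short fio job in the current directory and return measured read IOPS."""
        result = subprocess.run(
            [
                "fio",
                f"--name={mode}",
                f"--rw={mode}",           # "read" (sequential) or "randread"
                f"--filename={TEST_FILE}",
                "--size=1G",
                "--bs=4k",
                "--direct=1",             # bypass the page cache
                "--runtime=30",
                "--time_based",
                "--output-format=json",
            ],
            capture_output=True, text=True, check=True,
        )
        data = json.loads(result.stdout)
        return data["jobs"][0]["read"]["iops"]

    for mode in ("read", "randread"):
        print(f"{mode}: {run_fio(mode):.0f} iops")

    os.remove(TEST_FILE)  # fio does not delete its data file itself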
clarkbcool I'll take a look at that etherpad plan shortly too19:39
clarkbanything else on this topic?19:39
funginot from me19:39
clarkb#topic Open Discussion19:40
clarkbI don't know who will be attending the summit next month, but if you are going to be there it looks like Friday evening is the opportunity for an opendev+zuul type of get together19:40
fungilet's get opendev and zuul together19:40
clarkbI don't plan on doing anything formal, but we'll probably try to aim at a common location for dinner that night if anyone else is interested19:40
fungiyou got your opendev in my zuul! you got your zuul in my opendev!19:41
corvusoh so we're going to cross the streams?19:41
fungithat would be... not so bad?19:41
clarkbalso I may have found some stickers....19:41
clarkbI just have to remember to pack them19:42
corvusyay!19:42
clarkbLast call. Anything else that we haven't covered that should be discussed? Also we can always discuss things on the mailing list or the regular irc channel19:43
clarkbsounds like no19:45
fungithanks clarkb!19:45
clarkbthank you everyone! we'll be back here same time and location next week. Until then thank you for your help working on opendev19:45
clarkband see you there19:45
clarkb#endmeeting19:45
opendevmeetMeeting ended Tue Sep 16 19:45:53 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:45
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-09-16-19.00.html19:45
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-09-16-19.00.txt19:45
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-09-16-19.00.log.html19:45

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!