Friday, 2025-09-12

fungithe project.tarballs move has been in progress for about the past 30 minutes13:03
profcoreyhi @fungi recently I did a quick review of the other 'x' namespace projects and it seems not much activity is occurring with the other x projects. If there's any talk about shutting down the 'x' namespace, please let me know ahead of time. My plan is to propose to the TC to accept cascade as an official openstack project at the next NA summit. If accepted, I'll need help moving cascade from x to the openstack namespace, thank you.14:45
fungiprofcorey: yes, we have a process for moving repositories between namespaces, that's fine14:50
profcorey@fungi, fantastic thanks!14:52
opendevreviewJan Gutter proposed zuul/zuul-jobs master: Make buildx builder image configurable  https://review.opendev.org/c/zuul/zuul-jobs/+/96084014:54
clarkbI'm rerunning fio sanity checks on lists and mirrors now15:03
clarkbthen I'm going to start looking at what container update changes I can land after the big container location move yesterday. fungi you also mentioned today is probably a good day to run the fio tests with services on lists off. Is there a time you want to do that or think would be ideal for doing that?15:03
clarkbmirror randread `read: IOPS=20.0k, BW=78.2MiB/s (82.0MB/s)(2346MiB/30004msec)` read `read: IOPS=137k, BW=534MiB/s (560MB/s)(15.7GiB/30005msec)`15:04
opendevreviewMerged openstack/project-config master: Make update_constraints.sh work with pyproject.toml  https://review.opendev.org/c/openstack/project-config/+/96062715:05
clarkblists randread `read: IOPS=814, BW=3260KiB/s (3338kB/s)(95.8MiB/30085msec)` read `read: IOPS=15.2k, BW=59.3MiB/s (62.2MB/s)(1782MiB/30060msec)`15:08
clarkbthings appear to remain pretty consistent over time (which I guess is good?)15:08
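For context, the fio sanity checks quoted above might look something like this (block size, file size, and test directory are assumptions; the exact flags used aren't shown in the log):
```
# 30-second random-read test; swap --rw=randread for --rw=read to get the
# sequential "read" numbers quoted above
fio --name=randread --rw=randread --directory=/root/fio-test \
    --bs=4k --size=2g --runtime=30 --time_based \
    --direct=1 --ioengine=libaio --iodepth=16
```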
clarkbfungi: the thing that occurred to me overnight is that with the big root disk on lists we must be using a different flavor type, and I wonder if that flavor type comes with different disk performance out of the box?15:08
funginot sure15:10
clarkbI'll look into that momentarily15:10
clarkbmeanwhile https://review.opendev.org/c/opendev/system-config/+/960676 is an easy review for post container image move cleanup I think15:10
clarkbI'm going to approve gerritbot, statusbot, grafyaml and lodgeit base image updates now15:12
clarkbI'm starting with gerritbot because the merges of the other changes should be a good test of gerritbot15:12
opendevreviewJan Gutter proposed zuul/zuul-jobs master: Make buildx builder image configurable  https://review.opendev.org/c/zuul/zuul-jobs/+/96084015:19
opendevreviewMerged opendev/gerritbot master: Pull python base images from quay.io  https://review.opendev.org/c/opendev/gerritbot/+/95860015:20
clarkbI don't think ^ will trigger automatic deployments (we'd have to wait for daily runs) so I'll go and manually update gerritbot shortly15:22
fungik15:22
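A manual update of one of these compose-managed bots is roughly the following (the compose directory name here is an assumption based on the usual layout):
```
cd /etc/gerritbot-docker
docker-compose pull      # fetch the image rebuilt from the quay.io base
docker-compose up -d     # recreate the container on the new image
docker-compose logs -f   # tail the logs to confirm it reconnects cleanly
```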
clarkbfungi: ok the lists server is a '8GB Standard Instance'. The mirror server is a '8 GB Performance'. Running flavor show on each of those gives properties listings that include 'disk_io_index' for the general flavor it is 2 and for the performance flavor it is 4015:23
fungi20x difference?15:23
fungiwow15:23
clarkbI'm beginning to suspect that this is the underlying issue, but also seems to be by design15:23
fungiso moving onto an ssd volume makes more sense in that case15:24
clarkbfungi: ya we get around 1k iops on randreads from lists and 20k randreads from mirror15:24
clarkbso that 20x value seems to align with the fio results15:24
fungiagreed15:24
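The disk_io_index comparison above can be pulled out with the openstack CLI (flavor names are from the log; the exact property formatting varies by deployment):
```
openstack flavor show '8GB Standard Instance' -f value -c properties
# ... disk_io_index='2' ...
openstack flavor show '8 GB Performance' -f value -c properties
# ... disk_io_index='40' ...
```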
clarkbI no longer think we need to stop lists' services and get a baseline that way15:24
corvusoh wow we haven't used "standard" for ages, i guess that's our legacy from carrying over the ip address.15:24
funginor i15:24
fungii don't recall why we used that flavor now15:25
clarkband yes I think moving the mailman 3 installation onto an ssd volume is a good next step. If we do that for the entire /var/lib/mailman space and wherever mariadb is backing its data, I suspect the lower iops elsewhere won't be a big deal for apache logs etc15:25
clarkbI think it was to mimic the old server more than anything else15:25
fungiyeah, that server was only created 3 years ago15:26
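A hypothetical sequence for the SSD-volume idea above (volume type, size, device name, and mountpoint are all assumptions, not commands that were actually run):
```
openstack volume create --type SSD --size 100 lists-mailman
openstack server add volume lists01.opendev.org lists-mailman
mkfs.ext4 /dev/xvdb                # device name depends on the hypervisor
mount /dev/xvdb /mnt
rsync -a /var/lib/mailman/ /mnt/   # with mailman and mariadb stopped
```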
clarkbI just restarted gerritbot15:29
clarkbI've approved the grafyaml update now which should give us a notification in a bit once it merges15:31
clarkbto confirm gerritbot is happy15:31
clarkbI tailed the logs and didn't see anything amiss so it should be15:31
opendevreviewMerged openstack/project-config master: Fix update_constraints.sh for 'python_version>=' cases  https://review.opendev.org/c/openstack/project-config/+/96062815:36
clarkbheh a different change beat me to the test of gerritbot15:36
clarkbI went ahead and approved the lodgeit and statusbot changes too15:37
opendevreviewMerged opendev/statusbot master: Pull python base images from quay.io  https://review.opendev.org/c/opendev/statusbot/+/95860315:45
opendevreviewMerged opendev/lodgeit master: Pull base images from opendevorg rather than opendevmirror  https://review.opendev.org/c/opendev/lodgeit/+/95860215:48
clarkbthat is opendevstatus updated15:49
clarkbanyone else getting `kex_exchange_identification: read: Connection reset by peer` when attempting to ssh to paste.opendev.org? I've tried three times now and 2 failed with that and one succeeded15:51
clarkbfourth attempt just succeeded too15:51
clarkband fifth. This is weird15:51
opendevreviewMerged opendev/grafyaml master: Pull python base images from quay.io  https://review.opendev.org/c/opendev/grafyaml/+/95860115:53
clarkbok I can't reproduce anymore so wondering if this was a blip in the Internets rather than a service/server problem15:53
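A quick loop like this can help distinguish a flaky network path from a consistently broken service (attempt count and timeout are arbitrary):
```
for i in $(seq 1 20); do
  ssh -o ConnectTimeout=5 paste.opendev.org true \
    && echo "attempt $i: ok" || echo "attempt $i: failed"
done
```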
clarkbI'm going to restart lodgeit on paste on the new image just as soon as I find a test paste I can check post restart15:59
clarkbhttps://paste.opendev.org/show/bWxZSt3nyaG8bxa3yW54/ is going to be my check15:59
clarkbthat paste loads again16:01
clarkbI don't know if anyone cares but in checking the logs we're definitely getting crawled on that service too16:01
fungii would be surprised to hear we weren't16:01
clarkbhttps://paste.opendev.org/show/b3GTlggxFOmP8t6cIfpB/ this is a new paste created after the update16:02
clarkbI am able to load old pastes and create new ones so I think all is well16:02
fungilgtm16:02
clarkbthat leaves https://review.opendev.org/c/opendev/system-config/+/960676 and the gerrit and ircbot changes as cleanup after the move16:03
clarkbthat first one should be a quick and easy review if anyone wants to get that done.16:03
clarkbthen for ircbot and gerrit I'm happy to defer those for now as I need to look into bwrap issues with zuul on trixie.16:04
opendevreviewJan Gutter proposed zuul/zuul-jobs master: Make buildx builder image configurable  https://review.opendev.org/c/zuul/zuul-jobs/+/96084016:05
clarkbfungi: https://zuul.opendev.org/t/openstack/build/2487e9193fc546928652e74b3ed58190 the daily afs job seems to have been successful16:13
clarkbthe cloud launcher and backup setup jobs failed though16:13
opendevreviewMerged opendev/system-config master: Stop mirroring python base images to quay.io/opendevmirror  https://review.opendev.org/c/opendev/system-config/+/96067616:15
clarkblooking at the log for infra-prod-service-borg-backup on bridge it failed due to an undefined variable for borg-user in the loop where it creates users on the backup server. It only did this for one backup server and not the other. Also this job was successful up until yesterday. So I think this may be an ansible bug/blip and not anything systemic16:15
clarkbif both backup servers failed I would worry more16:15
clarkbLet's keep an eye on it16:15
clarkbya we dynamically generate that username based on the first part of the fqdn (up to the first .)16:16
clarkbso shouldn't need to explicitly set that under normal circumstances16:16
fungimakes sense16:18
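The derivation described above is essentially this (variable names are made up; the real logic lives in the borg-backup ansible role):
```
fqdn=lists01.opendev.org        # example inventory hostname
borg_user="borg-${fqdn%%.*}"    # keep only the part before the first '.' -> borg-lists01
```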
fungineed to run a quick errand, will brb18:39
clarkbinfra-root the zuul on trixie change depends on https://review.opendev.org/c/opendev/system-config/+/960681 to mirror a couple of container images to quay.io/opendevmirror19:23
clarkband last call on https://review.opendev.org/c/opendev/infra-specs/+/954826 if anyone has feedback for that. Otherwise I plan to approve it today19:25
fungiokay, back, looks like i didn't miss much19:38
clarkbI ate lunch and it has been a quiet friday19:46
clarkbI'm just slowly getting through my todo list19:46
opendevreviewMerged opendev/system-config master: Mirror golang and node trixie container images  https://review.opendev.org/c/opendev/system-config/+/96068119:48
clarkbfungi: for afsdb03 did you end up needing to edit the netplan stuff or did you switch to /e/n/i or maybe it just worked?19:48
fungii edited the netplan configs to add enX interfaces in addition to the eth interfaces19:49
fungiwhen everything's done, i'll go back and delete the old eth interface sections19:49
clarkbgot it.19:49
fungiso basically same solution as the servers that use ifupdown, just different config files19:50
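A minimal sketch of that netplan change (file name, interface names, and addressing mode are assumptions):
```
sudo tee /etc/netplan/50-cloud-init.yaml <<'EOF' && sudo netplan apply
network:
  version: 2
  ethernets:
    eth0:    # legacy name, to be deleted once everything's done
      dhcp4: true
    enX0:    # new predictable name seen after the upgrade
      dhcp4: true
EOF
```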
clarkbI'm thinking Monday we can upgrade gitea19:51
clarkbthough I should check the screenshots on that change. I haven't even looked yet19:51
fungisounds good19:51
clarkbya screenshots lgtm19:52
clarkbhttps://review.opendev.org/c/opendev/system-config/+/960675 is the change19:52
fungihttps://github.com/advisories/GHSA-jc7w-c686-c4v9 seems to be the background on the security item in that changelog19:59
clarkbdo you think that is urgent enough to go ahead with today (I think its probably fine to wait?)19:59
clarkbmostly I didn't want to rush and ask for reviews on friday20:00
fungii don't personally see how it could even be triggered by a malicious gitea user, much less on our deployment with so much stuff already disabled20:05
fungiso, no, doesn't look urgent to me20:05
fungiodds are gitea doesn't even use the affected methods in that library20:06
clarkbwe need yaml2ical to give us a graph of when meetings happen and don't happen so that it's easier to decide when to update ircbot20:17
clarkbI don't want to import the entire ical thing into my calendar to get that info. But that is currently the easiest way to do it20:17
clarkbit does look like after about 1500 UTC on Fridays is clear though if we want to land https://review.opendev.org/c/opendev/system-config/+/95859620:18
clarkbfungi: any opinion on ^ ?20:19
fungidoesn't even need to be a graph/chart, we could make a quick algorithm that identifies all the schedule gaps from the meetings data, and return them sorted by largest to smallest20:19
fungibut yeah this time on fridays has typically been safe, i don't see anything scheduled20:20
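That gap-finding idea could be as small as an awk pass over a flattened dump of the meeting schedule (the "day start end" input format here is hypothetical; sorting the gaps largest to smallest is left as an extension):
```
sort -k1,1 -k2,2 meetings.tsv | awk '
  $1 != day { if (day) print day, prev "-24:00"; day = $1; prev = "00:00" }
  $2 > prev { print day, prev "-" $2 }   # gap before this meeting
  $3 > prev { prev = $3 }
  END { if (day) print day, prev "-24:00" }
'   # zero-padded HH:MM strings compare correctly as text
```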
clarkblooks like you approved it. thanks. I'm around this afternoon so can monitor the deployment (this one should actually trigger deployments automatically iirc)20:20
fungii'll be around for a while still too20:21
opendevreviewMerged opendev/system-config master: Build ircbots with base python image from quay.io  https://review.opendev.org/c/opendev/system-config/+/95859620:50
fungilooks like meetbot came back quickly20:56
clarkbI'm surprised it deployed so quickly20:57
opendevreviewMerged opendev/infra-specs master: Add spec to use Matrix for OpenDev comms  https://review.opendev.org/c/opendev/infra-specs/+/95482620:57
fungistatusbot never left20:57
clarkbbut I guess it got ahead of hourlies20:57
funginor gerritbot20:57
clarkbfungi: statusbot was done manually earlier today, it is a different repo20:57
fungioh, gotcha20:57
clarkbthis change was ircbot (limnoria/meetbot), accessbot which is running its own job now, and matrix-eavesdrop20:57
fungiyep, looks like the rest are successful and accessbot hopefully too since it didn't fail straight away20:58
clarkbI see logs here https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-09-12.log20:59
clarkband checked that all three images have newer versions on eavesdrop0220:59
clarkbnow I just need someone to say something in matrix20:59
clarkbhttps://meetings.opendev.org/irclogs/%23zuul/%23zuul.2025-09-12.log matrix eavesdrop looks good too21:00
fungifirst pass at moving afs rw volumes from afs02.dfw to afs01.dfw just finished, we're down to 5 volumes that didn't go on the first try21:00
fungii'll retry these now: mirror.debian, mirror.ubuntu, mirror.ubuntu-ports, mirror.wheel.deb11x64, project.opendev21:01
fungiproject.opendev and mirror.wheel.deb11x64 both claim to be locked for writes21:03
clarkbwould 954826 merging have raced your check on project.opendev?21:03
fungipossible, i'm going to try again in a few21:04
clarkbhttps://zuul.opendev.org/t/openstack/build/106d1f036a59448c926e4d0af8d46c15 accessbot reports success21:05
clarkbso all three lgtm21:06
fungilooks like project.opendev and mirror.wheel.deb11x64 may have stuck locks21:08
fungii'll see if i can clear them21:08
clarkbfungi: you should make sure the cronjob to release project.* isn't running21:10
clarkbit might be going long21:10
fungiit's not, that was the first thing i checked on mirror.update21:11
clarkbfungi: it ssh's to another host21:12
clarkbI actually wonder if maybe the host it ssh's to got shutdown and that is why its stuck now21:12
clarkblooks like it is afs01.dfw21:13
clarkbso ya maybe when you restarted it as part of the initial upgrade or when you did the subsequent restart to fix the vcpu issue21:14
fungiah yep21:14
fungiso anyway, should be able to clear those two locks and try again21:14
fungii suppose the guidance in our documentation pre-dates when we started ssh'ing to a specific server to perform the vos release calls, which essentially makes it a spof21:15
clarkbyes I think so21:15
clarkbI wonder how we can address that21:15
clarkbmaybe have the ssh commands run a wrapper script that checks for a flag file and returns exit code 1 if present21:15
clarkbthen we can set that and do maintenance on the host or something21:15
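A sketch of that wrapper idea (file locations are made up): the command ssh invokes on the fileserver would check the flag before doing anything:
```
#!/bin/bash
# refuse to run releases while the maintenance flag file is in place
if [ -e /etc/afs-maintenance.flag ]; then
    echo "vos release disabled for maintenance" >&2
    exit 1
fi
exec vos release "$1" -localauth
```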
fungiwe only did that because vos release of a few volumes can take longer than the kerberos ticket timeout21:15
fungiso we're relying on -localauth on a fileserver instead of kerberos auth21:16
clarkbyup21:16
fungiworth thinking about how to make this less fragile, yes21:17
clarkbI'll put it on the pre ptg agenda now so that we don't forget about it if we don't have a solution by then21:17
fungii was able to vos unlock both volumes and vos release them, looks like they can be moved now21:19
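Roughly the commands involved, run on the fileserver (volume names are from the log; the server and partition arguments are assumptions):
```
vos unlock project.opendev -localauth
vos release project.opendev -localauth
vos move project.opendev afs02.dfw.openstack.org a \
    afs01.dfw.openstack.org a -localauth
```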
clarkbI'll put moving lists installation to an ssd volume on the pre ptg etherpad too. If we address them before then great we can skip or recap what we did21:20
fungisounds great21:21
fungiunrelated to volume moves, we've ceased updating the mirror.debian-security volume as it ran out of quota once trixie appeared21:35
fungii'll work on fixing that now21:35
fungithe "last 2 years" graph shows it grows fairly linearly until we delete a version21:37
fungilooks like we still have stretch mirrored there21:38
fungihttps://static.opendev.org/mirror/debian-security/dists/21:38
fungii'll temporarily bump the quota by 50% to 300gb but we really should do some cleanup there and on the mirror.debian volume too (it also still has stretch)21:40
clarkb++21:40
fungi#status log Increased the mirror.debian-security AFS volume quota by 50% to 300GB after it ran out of space on 2025-08-2321:46
opendevstatusfungi: finished logging21:46
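The bump itself is a one-liner (AFS quotas are set in 1K blocks; the RW path follows the usual /afs/.<cell> convention and may differ):
```
fs setquota /afs/.openstack.org/mirror/debian-security -max 314572800   # 300GB
fs listquota /afs/.openstack.org/mirror/debian-security                 # verify
```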
fungiproject.opendev and mirror.wheel.deb11x64 rw volumes have moved back to afs01.dfw, mirror.debian is in progress now and will likely take a while, mirror.ubuntu and mirror.ubuntu-ports will hopefully finish over the weekend but i'll try to keep an eye on the screen session21:48
fungihttps://review.opendev.org/c/opendev/system-config/+/817340 dropped stretch mirroring when it merged 4 years ago, i think we're overdue for deleting those files21:57
clarkbfungi: I haven't run the manual steps to clear out old content, I wonder how expensive it would be if the reprepro synchronization script simply ran that every time through?23:05
clarkbthat way when we drop the release from the reprepro config it would automatically delete the old content but in theory it would noop 99% of the time23:05
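Those manual steps map onto real reprepro subcommands that could run at the end of each sync (the confdir path here is an assumption):
```
# drop packages belonging to distributions removed from conf/distributions,
# deleting their files too, then clean up any orphaned pool files
reprepro --confdir /etc/reprepro/debian --delete clearvanished
reprepro --confdir /etc/reprepro/debian deleteunreferenced
```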
