| fungi | the project.tarballs move has been in progress for about 30 minutes now | 13:03 |
|---|---|---|
| profcorey | hi @fungi, I recently did a quick review of the other 'x' namespace projects and it seems not much activity is occurring with the other x projects. If there's any talk about shutting down the 'x' namespace, please let me know ahead of time. My plan is to propose to the TC to accept cascade as an official openstack project at the next NA summit. If accepted, I'll need help moving cascade from x to the openstack namespace, thank you. | 14:45 |
| fungi | profcorey: yes, we have a process for moving repositories between namespaces, that's fine | 14:50 |
| profcorey | @fungi, fantastic thanks! | 14:52 |
| opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Make buildx builder image configurable https://review.opendev.org/c/zuul/zuul-jobs/+/960840 | 14:54 |
| clarkb | I'm rerunning fio sanity checks on lists and mirrors now | 15:03 |
| clarkb | then I'm going to start looking at what container update changes I can land after the big container location move yesterday. fungi you also mentioned today is probably a good day to run the fio tests with services on lists off. Is there a time you want to do that or think would be ideal for doing that? | 15:03 |
| clarkb | mirror randread `read: IOPS=20.0k, BW=78.2MiB/s (82.0MB/s)(2346MiB/30004msec)` read `read: IOPS=137k, BW=534MiB/s (560MB/s)(15.7GiB/30005msec)` | 15:04 |
| opendevreview | Merged openstack/project-config master: Make update_constraints.sh work with pyproject.toml https://review.opendev.org/c/openstack/project-config/+/960627 | 15:05 |
| clarkb | lists randread `read: IOPS=814, BW=3260KiB/s (3338kB/s)(95.8MiB/30085msec)` read `read: IOPS=15.2k, BW=59.3MiB/s (62.2MB/s)(1782MiB/30060msec)` | 15:08 |
| clarkb | things appear to remain pretty consistent over time (which I guess is good?) | 15:08 |
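For reference, a minimal fio invocation along the lines of the sanity checks quoted above might look like this; the exact flags used on lists and mirror aren't in the log, so block size, I/O engine, test file size, and path are assumptions:

```sh
# Random-read check; 4k blocks, libaio, direct I/O, and a 1GiB test file are assumed
fio --name=randread --rw=randread --bs=4k --ioengine=libaio --direct=1 \
    --size=1g --runtime=30 --time_based --filename=/var/tmp/fio-test

# Sequential-read check producing the second "read: IOPS=..., BW=..." summary
fio --name=read --rw=read --bs=4k --ioengine=libaio --direct=1 \
    --size=1g --runtime=30 --time_based --filename=/var/tmp/fio-test
```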
| clarkb | fungi: the thing that occurred to me overnight is with the big root disk on lists we must be using a different flavor type, and I wonder if that flavor type comes with different disk performance out of the box? | 15:08 |
| fungi | not sure | 15:10 |
| clarkb | I'll look into that momentarily | 15:10 |
| clarkb | meanwhile https://review.opendev.org/c/opendev/system-config/+/960676 is an easy review for post container image move cleanup I think | 15:10 |
| clarkb | I'm going to approve gerritbot, statusbot, grafyaml and lodgeit base image updates now | 15:12 |
| clarkb | I'm starting with gerritbot because the merges of the other changes should be a good test of gerritbot | 15:12 |
| opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Make buildx builder image configurable https://review.opendev.org/c/zuul/zuul-jobs/+/960840 | 15:19 |
| opendevreview | Merged opendev/gerritbot master: Pull python base images from quay.io https://review.opendev.org/c/opendev/gerritbot/+/958600 | 15:20 |
| clarkb | I don't think ^ will trigger automatic deployments (we'd have to wait for daily runs) so I'll go and manually update gerritbot shortly | 15:22 |
| fungi | k | 15:22 |
| clarkb | fungi: ok the lists server is an '8GB Standard Instance'. The mirror server is an '8 GB Performance'. Running flavor show on each of those gives properties listings that include 'disk_io_index': for the standard flavor it is 2 and for the performance flavor it is 40 | 15:23 |
| fungi | 20x difference? | 15:23 |
| fungi | wow | 15:23 |
| clarkb | I'm beginning to suspect that this is the underlying issue, but also seems to be by design | 15:23 |
| fungi | so moving onto an ssd volume makes more sense in that case | 15:24 |
| clarkb | fungi: ya we get around 1k iops on randreads from lists and 20k randreads from mirror | 15:24 |
| clarkb | so that 20x value seems to align with the fio results | 15:24 |
| fungi | agreed | 15:24 |
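A sketch of the flavor comparison described above, assuming standard openstackclient syntax (flavor names are taken from the discussion; -c limits output to the properties column that carries disk_io_index):

```sh
openstack flavor show '8GB Standard Instance' -c properties
openstack flavor show '8 GB Performance' -c properties
```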
| clarkb | I no longer think we need to stop lists' services and get a baseline that way | 15:24 |
| corvus | oh wow we haven't used "standard" for ages, i guess that's our legacy from carrying over the ip address. | 15:24 |
| fungi | nor i | 15:24 |
| fungi | i don't recall why we used that flavor now | 15:25 |
| clarkb | and yes I think moving the mailman 3 installation onto an ssd volume is a good next step. If we do that for the entire /var/lib/mailman space and wherever mariadb is backing its data, I suspect the lower iops elsewhere (apache logs etc) won't be a big deal | 15:25 |
| clarkb | I think it was to mimic the old server more than anything else | 15:25 |
| fungi | yeah, that server was only created 3 years ago | 15:26 |
| clarkb | I just restarted gerritbot | 15:29 |
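A manual container update of this kind is roughly the following; the compose directory path is an assumption:

```sh
cd /etc/gerritbot-docker   # compose directory path is an assumption
docker-compose pull        # fetch the rebuilt image
docker-compose up -d       # recreate the container on the new image
```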
| clarkb | I've approved the grafyaml update now which should give us a notification in a bit once it merges | 15:31 |
| clarkb | to confirm gerritbot is happy | 15:31 |
| clarkb | I tailed the logs and didn't see anything amiss so it should be | 15:31 |
| opendevreview | Merged openstack/project-config master: Fix update_constraints.sh for 'python_version>=' cases https://review.opendev.org/c/openstack/project-config/+/960628 | 15:36 |
| clarkb | heh a different change beat me to the test of gerritbot | 15:36 |
| clarkb | I went ahead and approved the lodgeit and statusbot changes too | 15:37 |
| opendevreview | Merged opendev/statusbot master: Pull python base images from quay.io https://review.opendev.org/c/opendev/statusbot/+/958603 | 15:45 |
| opendevreview | Merged opendev/lodgeit master: Pull base images from opendevorg rather than opendevmirror https://review.opendev.org/c/opendev/lodgeit/+/958602 | 15:48 |
| clarkb | that is opendevstatus updated | 15:49 |
| clarkb | anyone else getting `kex_exchange_identification: read: Connection reset by peer` when attempting to ssh to paste.opendev.org? I've tried three times now and 2 failed with that and one succeeded | 15:51 |
| clarkb | fourth attempt just succeeded too | 15:51 |
| clarkb | and fifth. This is weird | 15:51 |
| opendevreview | Merged opendev/grafyaml master: Pull python base images from quay.io https://review.opendev.org/c/opendev/grafyaml/+/958601 | 15:53 |
| clarkb | ok I can't reproduce anymore so wondering if this was a blip in the Internets rather than a service/server problem | 15:53 |
| clarkb | I'm going to restart lodgeit on paste on the new image just as soon as I find a test paste I can check post restart | 15:59 |
| clarkb | https://paste.opendev.org/show/bWxZSt3nyaG8bxa3yW54/ is going to be my check | 15:59 |
| clarkb | that paste loads again | 16:01 |
| clarkb | I don't know if anyone cares but in checking the logs we're definitely getting crawled on that service too | 16:01 |
| fungi | i would be surprised to hear we weren't | 16:01 |
| clarkb | https://paste.opendev.org/show/b3GTlggxFOmP8t6cIfpB/ this is a new paste created after the update | 16:02 |
| clarkb | I am able to load old pastes and create new ones so I think all is well | 16:02 |
| fungi | lgtm | 16:02 |
| clarkb | that leaves https://review.opendev.org/c/opendev/system-config/+/960676 and the gerrit and ircbot changes as cleanup after the move | 16:03 |
| clarkb | that first one should be a quick and easy review if anyone wants to get that done. | 16:03 |
| clarkb | then for ircbot and gerrit I'm happy to defer those for now as I need to look into bwrap issues with zuul on trixie. | 16:04 |
| opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Make buildx builder image configurable https://review.opendev.org/c/zuul/zuul-jobs/+/960840 | 16:05 |
| clarkb | fungi: https://zuul.opendev.org/t/openstack/build/2487e9193fc546928652e74b3ed58190 the daily afs job seems to have been successful | 16:13 |
| clarkb | the cloud launcher and backup setup jobs failed though | 16:13 |
| opendevreview | Merged opendev/system-config master: Stop mirroring python base images to quay.io/opendevmirror https://review.opendev.org/c/opendev/system-config/+/960676 | 16:15 |
| clarkb | looking at the log for infra-prod-service-borg-backup on bridge it failed due to an undefined variable for borg-user in the loop where it creates users on the backup server. It only did this for one backup server and not the other. Also this job was successful up until yesterday. So I think this may be an ansible bug/blip and not anything systemic | 16:15 |
| clarkb | if both backup servers failed I would worry more | 16:15 |
| clarkb | Let's keep an eye on it | 16:15 |
| clarkb | ya we dynamically generate that username based on the first part of the fqdn (up to the first .) | 16:16 |
| clarkb | so shouldn't need to explicitly set that under normal circumstances | 16:16 |
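A rough Ansible sketch of that dynamically generated username, assuming a set_fact task; the actual variable and task names in system-config may differ:

```yaml
# Hypothetical illustration: derive the borg user from the first label of the FQDN,
# e.g. lists01.opendev.org -> borg-lists01
- name: Derive backup username from the host's fqdn
  set_fact:
    borg_username: "borg-{{ inventory_hostname.split('.')[0] }}"
```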
| fungi | makes sense | 16:18 |
| fungi | need to run a quick errand, will brb | 18:39 |
| clarkb | infra-root the zuul on trixie change depends on https://review.opendev.org/c/opendev/system-config/+/960681 to mirror a couple of container images to quay.io/opendevmirror | 19:23 |
| clarkb | and last call on https://review.opendev.org/c/opendev/infra-specs/+/954826 if anyone has feedback for that. Otherwise I plan to approve it today | 19:25 |
| fungi | okay, back, looks like i didn't miss much | 19:38 |
| clarkb | I ate lunch and it has been a quiet friday | 19:46 |
| clarkb | I'm just slowly getting through my todo list | 19:46 |
| opendevreview | Merged opendev/system-config master: Mirror golang and node trixie container images https://review.opendev.org/c/opendev/system-config/+/960681 | 19:48 |
| clarkb | fungi: for afsdb03 did you end up needing to edit the netplan stuff or did you switch to /e/n/i or maybe it just worked? | 19:48 |
| fungi | i edited the netplan configs to add enX interfaces in addition to the eth interfaces | 19:49 |
| fungi | when everything's done, i'll go back and delete the old eth interface sections | 19:49 |
| clarkb | got it. | 19:49 |
| fungi | so basically same solution as the servers that use ifupdown, just different config files | 19:50 |
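A minimal illustration of that netplan edit, with DHCP used for brevity (file name and addressing are assumptions; the real configs presumably carry static addresses):

```yaml
# /etc/netplan/50-cloud-init.yaml (hypothetical)
network:
  version: 2
  ethernets:
    eth0:   # legacy interface name, to be removed once the migration is complete
      dhcp4: true
    enX0:   # new interface name seen after the upgrade
      dhcp4: true
```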
| clarkb | I'm thinking Monday we can upgrade gitea | 19:51 |
| clarkb | though I should check the screenshots on that change. I haven't even looked yet | 19:51 |
| fungi | sounds good | 19:51 |
| clarkb | ya screenshots lgtm | 19:52 |
| clarkb | https://review.opendev.org/c/opendev/system-config/+/960675 is the change | 19:52 |
| fungi | https://github.com/advisories/GHSA-jc7w-c686-c4v9 seems to be the background on the security item in that changelog | 19:59 |
| clarkb | do you think that is urgent enough to go ahead with today (I think it's probably fine to wait?) | 19:59 |
| clarkb | mostly I didn't want to rush and ask for reviews on friday | 20:00 |
| fungi | i don't personally see how it could even be triggered by a malicious gitea user, much less on our deployment with so much stuff already disabled | 20:05 |
| fungi | so, no, doesn't look urgent to me | 20:05 |
| fungi | odds are gitea doesn't even use the affected methods in that library | 20:06 |
| clarkb | we need yaml2ical to give us a graph of when meetings happen and don't happen so that it's easier to decide when to update ircbot | 20:17 |
| clarkb | I don't want to import the entire ical thing into my calendar to get that info. But that is currently the easiest way to do it | 20:17 |
| clarkb | it does look like after about 1500 UTC on Fridays is clear though if we want to land https://review.opendev.org/c/opendev/system-config/+/958596 | 20:18 |
| clarkb | fungi: any opinion on ^ ? | 20:19 |
| fungi | doesn't even need to be a graph/chart, we could make a quick algorithm that identifies all the schedule gaps from the meetings data, and return them sorted by largest to smallest | 20:19 |
| fungi | but yeah this time on fridays has typically been safe, i don't see anything scheduled | 20:20 |
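A rough sketch of that gap-finding idea, assuming the yaml2ical meeting data has already been reduced to weekly (start, end) pairs expressed in minutes since Monday 00:00 UTC:

```python
from typing import List, Tuple

WEEK_MINUTES = 7 * 24 * 60


def schedule_gaps(meetings: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return (length, start, end) gaps between meetings, largest first.

    meetings are (start, end) pairs in minutes since Monday 00:00 UTC;
    parsing them out of the yaml2ical data is left out of this sketch.
    """
    gaps = []
    cursor = 0
    for start, end in sorted(meetings):
        if start > cursor:
            gaps.append((start - cursor, cursor, start))
        cursor = max(cursor, end)
    if cursor < WEEK_MINUTES:
        gaps.append((WEEK_MINUTES - cursor, cursor, WEEK_MINUTES))
    return sorted(gaps, reverse=True)


# Example: two one-hour meetings leave three gaps, largest first
print(schedule_gaps([(600, 660), (1500, 1560)]))
```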
| clarkb | looks like you approved it. thanks. I'm around this afternoon so can monitor the deployment (this one should actually trigger deployments automatically iirc) | 20:20 |
| fungi | i'll be around for a while still too | 20:21 |
| opendevreview | Merged opendev/system-config master: Build ircbots with base python image from quay.io https://review.opendev.org/c/opendev/system-config/+/958596 | 20:50 |
| fungi | looks like meetbot came back quickly | 20:56 |
| clarkb | I'm surprised it deployed so quickly | 20:57 |
| opendevreview | Merged opendev/infra-specs master: Add spec to use Matrix for OpenDev comms https://review.opendev.org/c/opendev/infra-specs/+/954826 | 20:57 |
| fungi | statusbot never left | 20:57 |
| clarkb | but I guess it got ahead of hourlies | 20:57 |
| fungi | nor gerritbot | 20:57 |
| clarkb | fungi: statusbot was done manually earlier today, it is a different repo | 20:57 |
| fungi | oh, gotcha | 20:57 |
| clarkb | this change was ircbot (limnoria/meetbot), accessbot which is running its own job now, and matrix-eavesdrop | 20:57 |
| fungi | yep, looks like the rest are successful and accessbot hopefully too since it didn't fail straight away | 20:58 |
| clarkb | I see logs here https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-09-12.log | 20:59 |
| clarkb | and checked that all three images have newer versions on eavesdrop02 | 20:59 |
| clarkb | now I just need someone to say something in matrix | 20:59 |
| clarkb | https://meetings.opendev.org/irclogs/%23zuul/%23zuul.2025-09-12.log matrix eavesdrop looks good too | 21:00 |
| fungi | first pass at moving afs rw volumes from afs02.dfw to afs01.dfw just finished, we're down to 5 volumes that didn't go on the first try | 21:00 |
| fungi | i'll retry these now: mirror.debian, mirror.ubuntu, mirror.ubuntu-ports, mirror.wheel.deb11x64, project.opendev | 21:01 |
| fungi | project.opendev and mirror.wheel.deb11x64 both claim to be locked for writes | 21:03 |
| clarkb | would 954826 merging have raced your check on project.opendev? | 21:03 |
| fungi | possible, i'm going to try again in a few | 21:04 |
| clarkb | https://zuul.opendev.org/t/openstack/build/106d1f036a59448c926e4d0af8d46c15 accessbot reports success | 21:05 |
| clarkb | so all three lgtm | 21:06 |
| fungi | looks like project.opendev and mirror.wheel.deb11x64 may have stuck locks | 21:08 |
| fungi | i'll see if i can clear them | 21:08 |
| clarkb | fungi: you should make sure the cronjob to release project.* isn't running | 21:10 |
| clarkb | it might be going long | 21:10 |
| fungi | it's not, that was the first thing i checked on mirror.update | 21:11 |
| clarkb | fungi: it ssh's to another host | 21:12 |
| clarkb | I actually wonder if maybe the host it ssh's to got shutdown and that is why its stuck now | 21:12 |
| clarkb | looks like it is afs01.dfw | 21:13 |
| clarkb | so ya maybe when you restarted it as part of the initial upgrade or when you did the subsequent restart to fix the vcpu issue | 21:14 |
| fungi | ah yep | 21:14 |
| fungi | so anyway, should be able to clear those two locks and try again | 21:14 |
| fungi | i suppose the guidance in our documentation pre-dates when we started ssh'ing to a specific server to perform the vos release calls, which essentially makes it a spof | 21:15 |
| clarkb | yes I think so | 21:15 |
| clarkb | I wonder how we can address that | 21:15 |
| clarkb | maybe have the ssh commands run a wrapper script that checks for a flag file and returns exit code 1 if present | 21:15 |
| clarkb | then we can set that and do maintenance on the host or something | 21:15 |
| fungi | we only did that because vos release of a few volumes can take longer than the kerberos ticket timeout | 21:15 |
| fungi | so we're relying on -localauth on a fileserver instead of kerberos auth | 21:16 |
| clarkb | yup | 21:16 |
| fungi | worth thinking about how to make this less fragile, yes | 21:17 |
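A minimal sketch of that flag-file wrapper; the script location, flag path, and exact vos arguments are all assumptions:

```sh
#!/bin/sh
# Hypothetical wrapper invoked via ssh on the fileserver instead of vos directly:
# refuse to run while a maintenance flag is present, otherwise release as usual.
if [ -e /var/run/vos-release-disabled ]; then
    echo "vos release disabled for maintenance" >&2
    exit 1
fi
exec vos release "$@" -localauth
```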
| clarkb | I'll put it on the pre ptg agenda now so that we don't forget about it if we don't have a solution by then | 21:17 |
| fungi | i was able to vos unlock both volumes and vos release them, looks like they can be moved now | 21:19 |
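The AFS commands behind that are roughly the following; the volume names come from the log, while the full server names and partition are assumptions:

```sh
vos unlock project.opendev -localauth
vos release project.opendev -localauth
vos move -id project.opendev -fromserver afs02.dfw.openstack.org -frompartition a \
    -toserver afs01.dfw.openstack.org -topartition a -localauth
```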
| clarkb | I'll put moving lists installation to an ssd volume on the pre ptg etherpad too. If we address them before then great we can skip or recap what we did | 21:20 |
| fungi | sounds great | 21:21 |
| fungi | unrelated to volume moves, we've ceased updating the mirror.debian-security volume as it ran out of quota once trixie appeared | 21:35 |
| fungi | i'll work on fixing that now | 21:35 |
| fungi | the "last 2 years" graph shows it grows fairly linearly until we delete a version | 21:37 |
| fungi | looks like we still have stretch mirrored there | 21:38 |
| fungi | https://static.opendev.org/mirror/debian-security/dists/ | 21:38 |
| fungi | i'll temporarily bump the quota by 50% to 300gb but we really should do some cleanup there and on the mirror.debian volume too (it also still has stretch) | 21:40 |
| clarkb | ++ | 21:40 |
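For reference, the quota bump itself is a one-liner, with 300GB expressed in 1K blocks; the RW path under /afs/.openstack.org is an assumption:

```sh
fs setquota -path /afs/.openstack.org/mirror/debian-security -max 314572800
```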
| fungi | #status log Increased the mirror.debian-security AFS volume quota by 50% to 300GB after it ran out of space on 2025-08-23 | 21:46 |
| opendevstatus | fungi: finished logging | 21:46 |
| fungi | project.opendev and mirror.wheel.deb11x64 rw volumes have moved back to afs01.dfw, mirror.debian is in progress now and will likely take a while, mirror.ubuntu and mirror.ubuntu-ports will hopefully finish over the weekend but i'll try to keep an eye on the screen session | 21:48 |
| fungi | https://review.opendev.org/c/opendev/system-config/+/817340 dropped stretch mirroring when it merged 4 years ago, i think we're overdue for deleting those files | 21:57 |
| clarkb | fungi: I haven't run the manual steps to clear out old content, I wonder how expensive it would be if the reprepro synchronization script simply ran them every time through? | 23:05 |
| clarkb | that way when we drop the release from the reprepro config it would automatically delete the old content but in theory it would noop 99% of the time | 23:05 |
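If the periodic reprepro run were to pick that up, it might look like the following; the confdir path is an assumption. clearvanished drops database entries for distributions removed from conf/distributions and deleteunreferenced then prunes the orphaned pool files, so both should be no-ops on most runs:

```sh
reprepro --delete --confdir /etc/reprepro/debian-security clearvanished
reprepro --confdir /etc/reprepro/debian-security deleteunreferenced
```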