Friday, 2026-02-06

@clarkb:matrix.orgpruning has begun but it looks like it may actually take about 3 hours to complete so I probably won't be able to test that the lodgeit image is preserved until tomorrow00:01
@clarkb:matrix.orgit hasn't failed yet which is also a good sign00:12
@clarkb:matrix.org(and I did check the container updated earlier today)00:12
@clarkb:matrix.orgI have rechecked https://review.opendev.org/c/opendev/system-config/+/945143 to see if registry pruning causes the lodgeit image to 404 (it shouldn't I hope)15:07
@clarkb:matrix.orgfungi: re depends on. I think the reason I'm convinced it is fine is that we are not changing the use of override checkout, only the target, and depends on worked fine when overriding to the branch name15:15
@clarkb:matrix.orgfungi: so in theory we should continue to override whatever value is in there with the depends on in the future as well15:15
@clarkb:matrix.orgThe recheck of https://review.opendev.org/c/opendev/system-config/+/945143 succeeded and pruning seems to have completed successfully on the registry too. I think this problem may be fixed now15:43
@clarkb:matrix.orgI also don't see any keycloak backup failure emails. Let me check the host directly15:44
@clarkb:matrix.org`Fri Feb  6 05:36:08 UTC 2026 Backup finished successfully`15:45
@clarkb:matrix.orgI'm reasonably satisfied that both of those items are resolved now. Say something if you see evidence to the contrary15:45
@tafkamax:matrix.orgHey 👋15:56
@fungicide:matrix.orgwelcome!15:57
@clarkb:matrix.orgfungi: while I'm working on those infra-manual updates did we want to proceed with https://review.opendev.org/c/opendev/system-config/+/975176 or do you think we need more clarification on depends-on first?16:01
@jim:acmegating.comClark: note the infra-manual still mentions #opendev irc (not sure if you're already planning on making that update, but maybe you can while you're in there?)16:03
@clarkb:matrix.orgcorvus: yup that is the first step.16:05
@fungicide:matrix.orgClark: i'm happy to approve 975176 but am disappearing for lunch so can't work on a restart for an hour or so16:10
@fungicide:matrix.orgapproved now, it probably won't merge until i get back regardless16:10
@clarkb:matrix.orgfungi: sounds good thanks16:12
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/infra-manual] 975926: Add documentation for Matrix https://review.opendev.org/c/opendev/infra-manual/+/97592616:39
@clarkb:matrix.orgcorvus: ^ that is the first step there16:39
@clarkb:matrix.orgLooking at the docs we actually do have the getting started document point to `git review -s` and it's the extra bits document that talks about https remotes. So I'll need to think about how we can convey this better16:40
-@gerrit:opendev.org- Clark Boylan proposed:16:55
- [opendev/infra-manual] 975928: Point people at Getting Started with an attention block https://review.opendev.org/c/opendev/infra-manual/+/975928
- [opendev/infra-manual] 975929: Make it clearer that SSH is the preferred Gerrit comms protocol https://review.opendev.org/c/opendev/infra-manual/+/975929
@clarkb:matrix.orgSomething like those two changes maybe16:55
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 975176: Update Gerrit images to 3.11.8 and 3.12.4 https://review.opendev.org/c/opendev/system-config/+/97517617:18
@fungicide:matrix.orgjust in time17:19
@clarkb:matrix.orgI guess we can proceed with that whenever we think it is likely to be less disruptive (maybe already?)17:21
@clarkb:matrix.orgthe deployment jobs are still running should probably wait for those to complete17:21
@clarkb:matrix.orgfungi: did we want to do this like last time where you run things on the gerrit side and I can pause zuul queue processing?17:24
@clarkb:matrix.orgthe deploy jobs have completed successfully now17:25
@fungicide:matrix.orgsure, sounds fine to me17:42
@clarkb:matrix.orghttps://quay.io/repository/opendevorg/gerrit/manifest/sha256:8ff0f759ae6729bbf57f47721728038de6e41a92f50370ec96621d76b841c0da is the new image17:42
@fungicide:matrix.orgi've opened a root screen session on review0317:43
@fungicide:matrix.orgquay.io/opendevorg/gerrit       3.11      5bebd6c38d59   2 weeks ago    716MB17:44
@fungicide:matrix.orgthat's what we're running on at the moment17:44
@fungicide:matrix.orgi'll do a pull and inspect17:44
@clarkb:matrix.orgI have attached and see that happening17:44
@clarkb:matrix.orgI'm going to dig up the zuul pausing command17:45
@fungicide:matrix.orgquay.io/opendevorg/gerrit       3.11      b4345ed1ab79   About an hour ago   715MB17:45
@fungicide:matrix.orgquay.io/opendevorg/gerrit@sha256:8ff0f759ae6729bbf57f47721728038de6e41a92f50370ec96621d76b841c0da17:45
@fungicide:matrix.orgseems to have pulled the expected image17:45
@clarkb:matrix.orgexcellent17:45
@clarkb:matrix.orgshould we send something like `#status notice Gerrit on review.opendev.org will experience a short outage while we upgrade it to 3.11.8`17:46
@fungicide:matrix.orglgtm17:46
@clarkb:matrix.org`zuul-client manage-events --all-tenants --reason "Gerrit restart in progress" pause-result` appears to be the pausing command. Now to figure out the unpausing command17:46
@fungicide:matrix.org`docker compose -f /etc/gerrit-compose/docker-compose.yaml down && mv ~gerrit2/review_site/data/replication/ref-updates/waiting ~gerrit2/tmp/waiting_queue_2026-02-06 && rm ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff,git_modified_files,modified_files,comment_context}.* && sudo docker compose -f /etc/gerrit-compose/docker-compose.yaml up -d`17:46
@fungicide:matrix.orgthat's what i've queued up in screen on review0317:47
@clarkb:matrix.org`zuul-client manage-events --all-tenants normal` is the unpause17:47
@fungicide:matrix.orgi guess we're ready to go if you want to do the status notice?17:47
@clarkb:matrix.orgyup let me do the status notice then when that completes I'll pause zuul and you can run your restart command17:48
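The pause, restart, unpause ordering discussed above could be wrapped roughly as follows. This is a minimal sketch, not what was actually run: the two `zuul-client manage-events` invocations are the ones quoted in the chat, while the function names and the `RUN` dry-run hook are hypothetical additions for illustration.

```shell
#!/bin/sh
# Sketch only: wrap a restart command in a Zuul event pause.
# RUN is a hypothetical hook; set RUN=echo to print commands instead of running them.
RUN="${RUN:-}"

pause_zuul() {
    # Pause command as quoted in the chat.
    $RUN zuul-client manage-events --all-tenants \
        --reason "Gerrit restart in progress" pause-result
}

unpause_zuul() {
    # Unpause command as quoted in the chat.
    $RUN zuul-client manage-events --all-tenants normal
}

restart_with_pause() {
    # "$@" is the restart command to run while Zuul is paused.
    pause_zuul || return 1
    status=0
    "$@" || status=$?
    # Always try to unpause, even if the restart command failed.
    unpause_zuul
    return "$status"
}
```

Calling `restart_with_pause some-restart-command` pauses first, runs the command, and then unpauses regardless of the outcome, while still returning the restart command's exit status.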
@clarkb:matrix.org#status notice Gerrit on review.opendev.org will experience a short outage while we upgrade it to 3.11.817:48
@status:opendev.org@clarkb:matrix.org: sending notice17:48
@fungicide:matrix.orgperfect17:49
-@status:opendev.org- NOTICE: Gerrit on review.opendev.org will experience a short outage while we upgrade it to 3.11.817:51
@status:opendev.org@clarkb:matrix.org: finished sending notice17:51
@clarkb:matrix.orgfungi: zuul is paused now too17:52
@fungicide:matrix.orggerrit is restarting17:52
@clarkb:matrix.org`git_file_diff.lock.db` is the last cache lock file remaining so I think it is close to shutting down17:54
@fungicide:matrix.orgstopping took 223.9 seconds17:55
@fungicide:matrix.orgthe webui is coming up for me now17:56
@clarkb:matrix.org`[2026-02-06T17:56:18.282Z] [main] INFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.11.8-dirty ready` from the log17:56
@fungicide:matrix.org"Powered by Gerrit Code Review (3.11.8-dirty)"17:56
@clarkb:matrix.orgshould I unpause zuul now? I also have a change I want to make a small update on that I could push to first if we want17:57
@fungicide:matrix.orgi'm fine with unpausing now, and yeah a quick replication test would be good17:57
@clarkb:matrix.orgactually I have to write the change first. I should just unpause zuul now17:57
@clarkb:matrix.orgzuul is unpaused17:58
@clarkb:matrix.organd diffs load for me. Let me make my update17:58
@fungicide:matrix.orgi'm also prepped to run `gerrit index start changes --force` over the ssh api once we're ready for that17:58
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/infra-manual] 975929: Make it clearer that SSH is the preferred Gerrit comms protocol https://review.opendev.org/c/opendev/infra-manual/+/97592918:00
@clarkb:matrix.orgthere is my update18:00
@fungicide:matrix.orgah, no it's `replication start` we want to run, not index start18:00
@clarkb:matrix.orgfungi: no it's index18:01
@fungicide:matrix.orgoh, right, because we lose the pending queue when restarting18:01
@clarkb:matrix.orgbecause there is a race in the shutdown process where a new change can arrive and get recorded in git before the index is updated, then gerrit shuts down, and if we don't update the index gerrit never finds out about that change (or it finds out later when reindexing happens for another reason)18:01
@clarkb:matrix.orghttps://opendev.org/opendev/infra-manual/commit/91bab6d6ae089c1ee15c97b3a5c113d6b905e9db seems to have replicated from my push so I think that is working18:01
@fungicide:matrix.orgso it may have been preparing to index a change and we don't persist the storage for that18:01
@fungicide:matrix.orgokay, running `gerrit index start changes --force` after all18:02
@clarkb:matrix.orgyup I think that is the next step. show-queue was basically empty, I can push things and they replicate, web ui is up and diffs work etc18:02
@fungicide:matrix.orgwatching the reindex progress from gerrit logs in the screen session18:03
@clarkb:matrix.orgI wish I understood what leads to the huge variance in shutdown timing18:06
@clarkb:matrix.orgI'm half tempted to set our docker compose shutdown timeout to something like 1800 seconds (half an hour) then we can manually kill -9 if it takes longer than we want. Otherwise it lets us wait18:07
@clarkb:matrix.orgbut considering that we literally delete these caches before starting back up I think having a reasonable timeout then giving up and killing it with -9 is probably fine18:07
@clarkb:matrix.orgI don't think anything is running in the jvm other than the cache db cleanup at that point so it's super safe (particularly when paired with the h2 deletion after shutdown)18:07
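One way the longer shutdown timeout floated above could be expressed is with the `--timeout` option of `docker compose down`. This is a sketch under assumptions: the compose file path is the one quoted earlier in the chat, 1800 is the value suggested above, and the `gerrit_down` function and `RUN` dry-run hook are hypothetical. Setting `stop_grace_period` in the compose file would be another way to get a similar effect.

```shell
#!/bin/sh
# Sketch only: stop the Gerrit containers with a generous shutdown timeout.
# RUN is a hypothetical hook; set RUN=echo to print the command instead of running it.
RUN="${RUN:-}"
COMPOSE_FILE=/etc/gerrit-compose/docker-compose.yaml

gerrit_down() {
    # Timeout in whole seconds; defaults to the half hour suggested in the chat.
    timeout="${1:-1800}"
    case "$timeout" in
        *[!0-9]*) echo "timeout must be whole seconds" >&2; return 2 ;;
    esac
    $RUN docker compose -f "$COMPOSE_FILE" down --timeout "$timeout"
}
```

With a timeout this long an operator can still kill the process by hand sooner, which matches the intent described above.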
@fungicide:matrix.orgclouds18:11
@fungicide:matrix.orgthe reason is always clouds18:11
@clarkb:matrix.orgfungi: if you want to review the infra-manual updates they may be good "merge something" test cases since their ci jobs should run quickly. I'm also happy to update them if you find issues18:14
@clarkb:matrix.orgfungi: I think reindexing is slower than it has been in the past. I suspect that is due to the extra cache dbs we are deleting now as indexing relies on the caches (it will populate them as it goes)18:21
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan:18:28
- [opendev/infra-manual] 975926: Add documentation for Matrix https://review.opendev.org/c/opendev/infra-manual/+/975926
- [opendev/infra-manual] 975928: Point people at Getting Started with an attention block https://review.opendev.org/c/opendev/infra-manual/+/975928
- [opendev/infra-manual] 975929: Make it clearer that SSH is the preferred Gerrit comms protocol https://review.opendev.org/c/opendev/infra-manual/+/975929
@fungicide:matrix.orgseems like reindexing is over halfway done already18:29
@fungicide:matrix.orgi'm not sure it's any slower than in recent history18:29
@fungicide:matrix.orgbut also my sense of time is terrible to nonexistent18:30
@clarkb:matrix.orgI think half an hour is what it would take in the past18:34
@clarkb:matrix.organd we're just about at that time now18:34
@clarkb:matrix.orglooks like it completed after about 50 minutes with the expected 3 failures. fungi I detached from the screen19:06
@fungicide:matrix.orgconfirmed, i've closed out the screen session now19:10
@clarkb:matrix.orgThere are more trixie image updates if we want to take a risk on any of them on a Friday. I've actually got some yard work I should do today before the rain comes back so maybe I'll pop outside after lunch19:24
-@gerrit:opendev.org- Bartosz Bezak proposed: [opendev/system-config] 975966: UCA: Add Gazpacho https://review.opendev.org/c/opendev/system-config/+/97596621:30
@fungicide:matrix.orglooks like our uca mirror has been stale for 4 months, judging from logs it was broken by bionic removal:21:44
@fungicide:matrix.org`Error: packages database contains unused 'bionic-updates/rocky|main|arm64' database.`21:44
@fungicide:matrix.orgi think we need to run `reprepro --delete clearvanished` per https://docs.opendev.org/opendev/system-config/latest/reprepro.html#removing-components21:46
@fungicide:matrix.orgi'll work on that now, then try manually pulling updates21:46
@fungicide:matrix.orghttps://paste.opendev.org/show/bkVtLFGRdvfA7OmmaihY/21:48
@fungicide:matrix.orgwe missed doing that for xenial as well, apparently21:49
@fungicide:matrix.orgfollowing up with `reprepro --nokeepunreferencedfiles deleteunreferenced` now to clear out the associated packages21:49
@fungicide:matrix.organd finally, manual update is in progress21:50
@fungicide:matrix.orgseems to have worked, vos release is running21:58
@fungicide:matrix.orgrerunning it just to make sure it's essentially a no-op21:58
@fungicide:matrix.organd it was, finished already21:59
@fungicide:matrix.orgi've released the lock22:00
@fungicide:matrix.orgnext cron-driven run is in ~16 minutes22:00
@fungicide:matrix.org#status log Ran a reprepro clearvanished pass on our Ubuntu Cloud Archive mirror in order to resolve errors related to earlier Xenial and Bionic ARM removals which were blocking updates for the past 4 months22:02
@status:opendev.org@fungicide:matrix.org: finished logging22:02
@clarkb:matrix.orgfungi: so if we remove a release and don't intervene we break the db?22:13
@fungicide:matrix.orgno, it didn't break the db22:14
@fungicide:matrix.orgour update script runs a check of the reprepro configuration, which errors when it finds components present which aren't reflected in the config, so the script aborts without updating anything22:15
@fungicide:matrix.orgall i did was tell reprepro to clear references to anything not listed in the config22:16
@fungicide:matrix.orgwe could potentially just run that first in the script, but it's potentially destructive and doesn't need to be run often22:17
@clarkb:matrix.orgGot it. And did that clear out the old packages from the pool too or just the release details on the index side?22:19
@fungicide:matrix.orgthe first command (clearvanished) only cleared the db entries, the second command (deleteunreferenced) cleared the orphaned package files from the pool22:22
@fungicide:matrix.orgthough our update script also does a deleteunreferenced so i could have skipped that, it was useful to do it first so i could see what got deleted and separate that from the subsequent deletions during update (when new package versions made the kept ones obsolete)22:23
@fungicide:matrix.orghttps://static.opendev.org/mirror/ubuntu-cloud-archive/timestamp.txt shows the time from the cron run a few minutes ago, so i think it's back on schedule now22:24
@clarkb:matrix.orgI wonder if maybe we want to run clearvanished for repos like UCA which are much smaller (and thus easier to rebuild if necessary) and also update more frequently as they do an Ubuntu x OpenStack release matrix 22:24
@clarkb:matrix.orgThen be more conservative with the distro proper repos as those change infrequently from a release perspective and are massively expensive to rebuild22:25
@fungicide:matrix.orgit's worth thinking about, but we reuse the same script for all repositories/suites of all deb-based distros we mirror, so we'd need a selector/flag22:25
@fungicide:matrix.orgwhich i guess could just be a list of names baked into the script22:26
@clarkb:matrix.orgOr a simple getopt flag22:26
@fungicide:matrix.orgyep22:26
@fungicide:matrix.organd then set it in the cronjobs22:26
@clarkb:matrix.orgI think I would be willing to do that for the lower risk repos (mostly a cost to rebuild question I think so UCA, docker, maybe ceph?)22:28
@fungicide:matrix.orgyeah, just not for debian, ubuntu and ubuntu-ports22:29
@fungicide:matrix.orgsince those take days to a week to rebuild22:29
@clarkb:matrix.org++22:32
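The opt-in flag idea discussed above might look roughly like this. It is a hypothetical sketch, not the actual OpenDev mirror update script: the `-c` flag, the `mirror_update` function, and the `RUN` dry-run hook are all invented for illustration, and the real script does more (for example the configuration check and `deleteunreferenced` mentioned earlier). Only `reprepro --delete clearvanished` comes from the chat.

```shell
#!/bin/sh
# Hypothetical sketch of an opt-in clearvanished flag for a reprepro update script.
# RUN is a hypothetical hook; set RUN=echo to print commands instead of running them.
RUN="${RUN:-}"

mirror_update() {
    clear_vanished=0
    OPTIND=1
    while getopts "c" opt; do
        case "$opt" in
            c) clear_vanished=1 ;;
            *) echo "usage: mirror_update [-c] CONFDIR" >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    if [ "$#" -lt 1 ]; then
        echo "usage: mirror_update [-c] CONFDIR" >&2
        return 2
    fi
    confdir="$1"
    # Potentially destructive, so only run when explicitly requested.
    if [ "$clear_vanished" -eq 1 ]; then
        $RUN reprepro --confdir "$confdir" --delete clearvanished || return 1
    fi
    $RUN reprepro --confdir "$confdir" update
}
```

The cron entries for the cheaper-to-rebuild mirrors would then pass `-c`, while the debian, ubuntu and ubuntu-ports entries would leave it off.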
@fungicide:matrix.orgjust a heads up, seems like launchpad may be having problems. system-config-run-mirror-update failed twice in a row with timeouts reading from our openafs ppa22:32
@fungicide:matrix.org`failed to fetch PPA information, error was: Connection failure: The read operation timed out`22:33
@clarkb:matrix.orgThere is a whole conversation on the gerrit mailing list about 3.12 and its v2 h2 cache dbs getting corrupted. It looks like this is happening due to problems like OOMing which we should in theory avoid since we limit the jvm well below host limits. But I wonder if the kill -9 timeout from docker compose down would cause that to happen22:45
@clarkb:matrix.orgagain it probably doesn't matter too much if we are then immediately deleting the backing database file.22:45
@clarkb:matrix.orgBut something to keep in mind as we ramp up 3.12 upgrade planning22:46
@clarkb:matrix.orgit also looks like the solution is to delete the database entirely if you hit it, so again, since we're already doing that regularly, it's probably not a big deal for us22:47
