mnasiadka | fungi: thanks | 05:31 |
---|---|---|
mnasiadka | Need a second +2 on https://review.opendev.org/c/openstack/diskimage-builder/+/952548 and a DIB release afterwards to continue adding stream10 and rocky10 to niz :-) | 05:32 |
*** ykarel__ is now known as ykarel | 06:08 | |
*** darmach2 is now known as darmach | 11:33 | |
fungi | mnasiadka: i approved it, i or someone else can tag a new release once that lands | 13:14 |
mnasiadka | Nice, thanks :) | 13:32 |
opendevreview | Merged openstack/diskimage-builder master: Add Rocky Linux 10 support to rocky-container element https://review.opendev.org/c/openstack/diskimage-builder/+/952548 | 14:15 |
fungi | i'm in meetings until 16:00 utc, but can look into doing a dib release once those wrap up | 14:17 |
fungi | looks like element/matrix wants us to make some dns changes, i can get that proposed later today if nobody beats me to it | 14:44 |
clarkb | infra-root is today a good day to land https://review.opendev.org/c/opendev/system-config/+/950595 and plan for a gerrit restart? | 14:58 |
clarkb | this is the change that removes the extra h2 compaction time setting from the jvm args | 14:59 |
fungi | sure, once my meetings are done | 15:01 |
corvus | huh, we have a 16gb node for zuul01 and a 30gb node for zuul02 | 15:34 |
corvus | i guess we launched those when zuul was using a lot more ram | 15:35 |
corvus | (though, i'm not sure why they are different) | 15:35 |
corvus | for the past year, 8gb of ram would have been fine for zuul01, and these days it's even better, only using 3.67gb | 15:36 |
corvus | i think the noble replacements should both be 8gb machines | 15:38 |
clarkb | nice! | 15:39 |
corvus | https://imgur.com/YzsiUYR you can see the change to reuse config objects | 15:40 |
corvus | i'm launching replacement servers for zuul01 and zuul02 now; so heads up, since i'm reusing the names, there will be two of them in the server list for a little bit. | 15:42 |
clarkb | good to know | 15:43 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Replace zuul01 and zuul02 https://review.opendev.org/c/opendev/zone-opendev.org/+/953597 | 15:55 |
opendevreview | James E. Blair proposed opendev/system-config master: Replace zuul01 and zuul02 https://review.opendev.org/c/opendev/system-config/+/953598 | 15:55 |
corvus | i plan on pausing here and resuming later, but if folks want to +2 those changes, that would be great, and i can +w them and do the actual switches later. | 15:57 |
clarkb | I'll take a look momentarily | 15:58 |
clarkb | corvus: both lgtm. The one question I have is whether or not there is a chicken and egg situation where we rely on zuul to deploy these things but changing dns records and group membership will affect firewalls/network connectivity in a way that may impact the ability to deploy? I guess worst case we manually run things from bridge? | 16:00 |
corvus | the other concern is the load balancer; i was thinking of putting it in emergency... | 16:01 |
corvus | but perhaps i should split these into two changes | 16:01 |
corvus | and just do one at a time. will slow the process a bit, but then we can allow for one to be broken while the other is not | 16:01 |
clarkb | ya that seems like a reasonable out. It's the same issue and solution with zookeeper... | 16:02 |
clarkb | fungi: were you planning to approve https://review.opendev.org/c/opendev/system-config/+/950595 or should I? | 16:03 |
fungi | on it now | 16:03 |
clarkb | corvus: editing the haproxy config manually should be fairly easy at least | 16:03 |
clarkb | if you don't want to do them one at a time | 16:04 |
corvus | yeah i agree... but i think i'll split it though... hopefully should be a more relaxed process :) | 16:05 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Replace zuul01 https://review.opendev.org/c/opendev/zone-opendev.org/+/953602 | 16:07 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Replace zuul02 https://review.opendev.org/c/opendev/zone-opendev.org/+/953603 | 16:07 |
opendevreview | James E. Blair proposed opendev/system-config master: Replace zuul01 https://review.opendev.org/c/opendev/system-config/+/953604 | 16:07 |
opendevreview | James E. Blair proposed opendev/system-config master: Replace zuul02 https://review.opendev.org/c/opendev/system-config/+/953605 | 16:07 |
clarkb | fungi: we are not building new gerrit container images with that change (I'm just confirming we don't need to pull as part of the restart) | 16:13 |
opendevreview | Rubén proposed openstack/diskimage-builder master: docs: clarify when debian-systemd makes sense https://review.opendev.org/c/openstack/diskimage-builder/+/953612 | 16:13 |
clarkb | a simple restart should do (where simple == not simple because we have a bunch of other things to do) | 16:13 |
mnasiadka | clarkb: do you think https://review.opendev.org/c/opendev/zuul-providers/+/953269 could be merged? Would like to continue with stream10 and rocky10 for niz next week | 16:13 |
fungi | k | 16:14 |
fungi | yeah we just deploy the updated config file to the server | 16:14 |
clarkb | mnasiadka: yes, though corvus may have held off landing that one due to the issues monitoring image build status with zuul launcher right now | 16:14 |
clarkb | corvus: ^ should we go ahead and add focal and bionic or wait for logging to get better? | 16:15 |
clarkb | I also have one question on the change I'll post shortly | 16:16 |
corvus | oh er i just +3d 953269 | 16:16 |
corvus | i think i only didn't approve it so you could see it clarkb :) | 16:16 |
clarkb | corvus: the question is basically should we add the promote jobs to the periodic and promote pipelines | 16:17 |
clarkb | I think it's fine if that goes in a followup too | 16:17 |
clarkb | question is posted on the change with better context | 16:17 |
corvus | oh yes, let's do that | 16:18 |
corvus | and let's revise the change for it so we have a better model to copypasta | 16:18 |
clarkb | mnasiadka: ^ fyi | 16:19 |
corvus | i think also we were missing "image-build" | 16:20 |
corvus | so 3 pipelines | 16:20 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add Ubuntu bionic/focal builds, labels and provider config https://review.opendev.org/c/opendev/zuul-providers/+/953269 | 16:21 |
corvus | i added those and +2d it if you want to +3 it clarkb | 16:23 |
clarkb | done | 16:23 |
rubencabrera[m] | <clarkb> "rubencabrera: looks like..." <- Just submitted this README change so I don't fall for the same mistake again. Does it make sense? | 16:25 |
rubencabrera[m] | https://review.opendev.org/c/openstack/diskimage-builder/+/953612 | 16:25 |
clarkb | rubencabrera[m]: yup I +2'd it | 16:27 |
clarkb | mnasiadka: I left some comments on https://review.opendev.org/c/opendev/zuul-providers/+/953460 as well | 16:27 |
clarkb | fungi: since the gerrit restart process is complicated enough now I put together https://etherpad.opendev.org/p/gerrit-restart-process | 16:37 |
fungi | thanks | 16:37 |
clarkb | this incorporates the idea of doing a manual sighup to see if things shut down quickly doing that vs the docker compose driven sigint | 16:38 |
clarkb | fungi: you don't happen to have the command you used for deleting cache files somewhere do you? I need to add that to the etherpad still | 16:38 |
fungi | rm ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.* | 16:39 |
fungi | looks like what's in root's shell history as the most recent | 16:39 |
fungi | could be simplified to {gerrit,git}_file_diff.* | 16:40 |
clarkb | fungi: we need to add diff_intraline to the list | 16:40 |
mnasiadka | clarkb: yeah, the stream10 one needs more love, thanks for the comments :) | 16:41 |
clarkb | so I've kept the comma separated list | 16:41 |
fungi | or even g{err,}it_file_diff.* if you're insane ;) | 16:41 |
clarkb | fungi: I like to think I have managed to hold onto a small semblance of sanity :) | 16:41 |
fungi | sounds good | 16:41 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Update .well-known/matrix/client for mobile OIDC https://review.opendev.org/c/opendev/system-config/+/953624 | 16:56 |
fungi | ^ as promised earlier | 16:57 |
fungi | clarkb: okay, the dib release buildset just reported | 17:06 |
clarkb | perfect. The gerrit config update should land soon, then we wait for it to deploy then we can restart and see how we do | 17:06 |
fungi | i've started a root screen session on review in preparation | 17:06 |
clarkb | I have attached | 17:08 |
opendevreview | Merged opendev/system-config master: Revert "Set h2.maxCompactTime to 15 seconds" https://review.opendev.org/c/opendev/system-config/+/950595 | 17:28 |
clarkb | the config files appear to have updated as expected | 17:32 |
clarkb | fungi: I think we can proceed whenever we're ready. Do you want to drive or should I? | 17:32 |
clarkb | I put a suggested statusbot message in the etherpad too | 17:33 |
fungi | i can drive, sure | 17:35 |
fungi | #status notice Gerrit is being restarted to pick up a configuration change. You may notice a short outage. | 17:35 |
opendevstatus | fungi: sending notice | 17:35 |
-opendevstatus- NOTICE: Gerrit is being restarted to pick up a configuration change. You may notice a short outage. | 17:35 | |
fungi | gerrit java pid seems to be 765069 | 17:36 |
clarkb | yes that looks right to me | 17:36 |
clarkb | I stuck the && date in there so we'd have a good idea of how long it takes to shutdown if it is slow again | 17:36 |
fungi | standing by to hup the process once notifications are done | 17:36 |
opendevstatus | fungi: finished sending notice | 17:38 |
clarkb | I'm ready when you are | 17:39 |
fungi | signal sent | 17:39 |
clarkb | and process is still running last I checked | 17:40 |
clarkb | that's "good" because it means that sigint isn't necessarily the problem? | 17:40 |
clarkb | anyway I think we can let it go for a few minutes before we issue the docker compose down and let it fall back onto its timeout behavior | 17:40 |
clarkb | strace shows it writing and seeking in file id 164 | 17:41 |
clarkb | java 765069 gerrit2 164u REG 252,0 6056280064 44047297 /var/gerrit/cache/git_file_diff.h2.db | 17:42 |
clarkb | so I think our suspicion that the compaction is slowing down shutdown here may be correct | 17:42 |
clarkb | running lsof it's still compacting that file | 17:43 |
clarkb | s/compacting/seeking and writing to that file/ | 17:43 |
clarkb | which is longer than the 15 second timeout we give it... | 17:43 |
fungi | yeah | 17:44 |
clarkb | it is also writing a bunch of 0s which does seem in line with compaction. Maybe the value isn't actually being treated as milliseconds and 15000 is a longer timeout than we anticipated | 17:44 |
clarkb | oh wow the file sizes are definitely shrinking according to ls though | 17:45 |
fungi | but also compaction on shutdown is pointless if we always kill it early due to exceeding the timeout | 17:45 |
clarkb | it was like 15GB before iirc | 17:45 |
clarkb | fungi: yup and we've discovered that we get faster startup responses if we delete the cache anyway | 17:46 |
clarkb | so I think this config update to stop doing long compaction is a good one | 17:46 |
clarkb | fungi: the process stopped | 17:46 |
clarkb | so it took about 6-7 minutes. Just slightly longer than our 5 minute timeout | 17:46 |
clarkb | I think we can proceed with docker compose down now and the rest of our restart process | 17:47 |
fungi | doing | 17:47 |
fungi | on its way back up now | 17:48 |
clarkb | [2025-06-27T17:48:22.232Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.10.6-12-gf8d9e0470a-dirty ready | 17:48 |
clarkb | ps -elf | grep jvm doesn't show the h2 compaction timeout flag as expected (good) | 17:48 |
fungi | confirmed | 17:49 |
clarkb | changes and diffs load for me in the web ui | 17:49 |
fungi | yeah, poking around the ui everything appears fine | 17:49 |
fungi | i'll let the openstack release team know we're done | 17:49 |
clarkb | if someone has a change or patchset to push that would be a good check and then we can check replication too | 17:49 |
clarkb | but ya I think we're largely done at this point | 17:50 |
clarkb | I'll summarize what we've learned in the channel for record keeping purposes once we're satisfied everything is functioning | 17:50 |
clarkb | remote: https://review.opendev.org/c/zuul/zuul/+/952592 Set lower bounds on IBM, Google, and AWS cloud deps | 17:52 |
clarkb | I just pushed an update to that change | 17:52 |
clarkb | https://opendev.org/zuul/zuul/commit/d860fc4ed27c5694b5ca1318f7adf6ddd0cc1e58 and that seems to have replicated | 17:53 |
clarkb | To summarize: We shutdown Gerrit this time using a manual sighup out of band from docker compose/podman. We did that to test if sighup behaves any differently than sigint. And "good" news it did not. Gerrit took about 6-7 minutes to shutdown after the sighup. During this time I ran a strace which showed gerrit reading/writing/seeking to file descriptor 164. lsof reported this as | 17:55 |
clarkb | belonging to one of the h2 cache files. ls also showed the size of the large cache files shrinking by about a third. I think that this pretty much confirms that compaction of h2 is what is slowing down gerrit restarts and not something to do with the sighup -> sigint move | 17:55 |
clarkb | This means that the revert we just put in place to remove longer h2 compaction times is a good fix for us. Subsequent gerrit restarts should work via docker compose with sigint just fine and be much quicker. We're deleting the large cache files during the restart anyway so there isn't much sense in compacting them first especially if that takes minutes to do | 17:56 |
clarkb | hashar: ^ fyi since you had been dealing with similar | 17:57 |
clarkb | fungi: oh! I just remembered we need to issue a reindex of changes. That was the new thing we learned | 17:57 |
fungi | aha, yes, on it | 17:57 |
clarkb | can you add it to the etherpad so that we have it for future reference? | 17:58 |
fungi | yeah, that's something we trigger through the ssh api, right? | 17:59 |
clarkb | `ssh -p 29418 adminuser@review03.opendev.org gerrit index start changes --force` | 17:59 |
clarkb | looks like this is what i have in my local command history | 17:59 |
clarkb | the --force is necessary iirc because the current index version is the latest index version. --force says reindex anyway in that situation | 18:00 |
clarkb | so yes via the ssh api. Then you can use show-queue -w to track progress | 18:00 |
fungi | `gerrit index changes` looks like the syntax | 18:00 |
fungi | aha, you're further down the docs road than me | 18:00 |
fungi | Reindexer started | 18:07 |
fungi | sorry, was fumbling with my various ssh keys trying to work out the right one for my admin account | 18:07 |
clarkb | show-queue -w shows a bunch of index all changes for project foo entries now | 18:08 |
fungi | and yeah, i used `gerrit index start changes --force` as suggested | 18:08 |
clarkb | cool I added that to the etherpad | 18:09 |
clarkb | hopefully now we don't forget next time | 18:09 |
clarkb | if you tail the error log you get periodic updates on reindex progress. If you find the ssh connection errors too noisy you can filter for "Reindexing changes" | 18:13 |
clarkb | as an alternative to running show-queue periodically | 18:14 |
clarkb | https://groups.google.com/g/repo-discuss/c/qypZHLipsCU everyone continues to struggle with the ai bot crawlers | 18:16 |
fungi | yeah, my personal webserver that has gitweb set up for a few very small projects is periodically entirely offline these days due to the crawler load | 18:28 |
clarkb | once reindexing completes I'm going to pop out for lunch | 18:39 |
clarkb | Failed 3/947540 changes Reindex changes to version 86 complete Using changes schema version 86 | 18:40 |
clarkb | spot checking those three failures they are the really old changes with problems that it's struggled with forever | 18:40 |
clarkb | I don't personally see anything concerning | 18:40 |
clarkb | fungi: also I detached from the screen a while back and I think you can shut it down whenever you're done with it | 18:41 |
clarkb | alright back later | 18:42 |
fungi | yeah, looks good | 19:02 |
fungi | i too am popping out to grab an early dinner while the tourists are still busy baking themselves | 19:02 |
opendevreview | Merged openstack/diskimage-builder master: docs: clarify when debian-systemd makes sense https://review.opendev.org/c/openstack/diskimage-builder/+/953612 | 21:35 |
opendevreview | Merged opendev/zone-opendev.org master: Replace zuul01 https://review.opendev.org/c/opendev/zone-opendev.org/+/953602 | 23:35 |
opendevreview | Merged opendev/system-config master: Replace zuul01 https://review.opendev.org/c/opendev/system-config/+/953604 | 23:52 |
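
The cache-file patterns traded between 16:39 and 16:41 are all bash brace expansions, and a quick way to check that the shorter spellings name the same files is to echo them instead of passing them to rm. The sketch below is illustrative only: the function names are hypothetical, and it uses the `.h2.db` suffix seen in the lsof output as a stand-in for the `.*` glob in the actual command, so nothing on disk is touched or required.

```shell
#!/usr/bin/env bash
# Brace expansion is a bash feature (not POSIX sh), so run this with bash.
# Assumption: ".h2.db" stands in for the ".*" glob used in the real rm command.
suffix=".h2.db"

# Form found in root's shell history.
long_form() { echo {gerrit_file_diff,git_file_diff}"$suffix"; }

# Simplified form suggested afterwards.
short_form() { echo {gerrit,git}_file_diff"$suffix"; }

# The "insane" form: an empty alternative turns g{err,}it into gerrit and git.
terse_form() { echo g{err,}it_file_diff"$suffix"; }

# Comma-separated list kept so diff_intraline can be added alongside the others.
with_intraline() { echo {gerrit_file_diff,git_file_diff,diff_intraline}"$suffix"; }

long_form
short_form
terse_form
with_intraline
```

The first three functions print the same two names, which is why the comma-separated list is the easiest of the three to extend with a third cache such as diff_intraline.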