Friday, 2025-06-27

mnasiadkafungi: thanks05:31
mnasiadkaNeed a second +2 on https://review.opendev.org/c/openstack/diskimage-builder/+/952548 and a DIB release afterwards to continue adding stream10 and rocky10 to niz :-)05:32
*** ykarel__ is now known as ykarel06:08
*** darmach2 is now known as darmach11:33
fungimnasiadka: i approved it, i or someone else can tag a new release once that lands13:14
mnasiadkaNice, thanks :)13:32
opendevreviewMerged openstack/diskimage-builder master: Add Rocky Linux 10 support to rocky-container element  https://review.opendev.org/c/openstack/diskimage-builder/+/95254814:15
fungii'm in meetings until 16:00 utc, but can look into doing a dib release once those wrap up14:17
fungilooks like element/matrix wants us to make some dns changes, i can get that proposed later today if nobody beats me to it14:44
clarkbinfra-root is today a good day to land https://review.opendev.org/c/opendev/system-config/+/950595 and plan for a gerrit restart?14:58
clarkbthis is the change that removes the extra h2 compaction time setting from the jvm args14:59
fungisure, once my meetings are done15:01
corvushuh, we have a 16gb node for zuul01 and a 30gb node for zuul0215:34
corvusi guess we launched those when zuul was using a lot more ram15:35
corvus(though, i'm not sure why they are different)15:35
corvusfor the past year, 8gb of ram would have been fine for zuul01, and these days it's even better, only using 3.67gb15:36
corvusi think the noble replacements should both be 8gb machines15:38
clarkbnice!15:39
corvushttps://imgur.com/YzsiUYR  you can see the change to reuse config objects15:40
corvusi'm launching replacement servers for zuul01 and zuul02 now; so heads up, since i'm reusing the names, there will be two of them in the server list for a little bit.15:42
clarkbgood to know15:43
opendevreviewJames E. Blair proposed opendev/zone-opendev.org master: Replace zuul01 and zuul02  https://review.opendev.org/c/opendev/zone-opendev.org/+/95359715:55
opendevreviewJames E. Blair proposed opendev/system-config master: Replace zuul01 and zuul02  https://review.opendev.org/c/opendev/system-config/+/95359815:55
corvusi plan on pausing here and resuming later, but if folks want to +2 those changes, that would be great, and i can +w them and do the actual switches later.15:57
clarkbI'll take a look momentarily15:58
clarkbcorvus: both lgtm. The one question I have is whether or not there is a chicken and egg situation where we rely on zuul to deploy these things but changing dns records and group membership will affect firewalls/network connectivity in a way that may impact the ability to deploy? I guess worst case we manually run things from bridge?16:00
corvusthe other concern is the load balancer; i was thinking of putting it in emergency...16:01
corvusbut perhaps i should split these into two changes16:01
corvusand just do one at a time.  will slow the process a bit, but then we can allow for one to be broken while the other is not16:01
clarkbya that seems like a reasonable out. Its the same issue and solution with zookeeper...16:02
clarkbfungi: were you planning to approve https://review.opendev.org/c/opendev/system-config/+/950595 or should I?16:03
fungion it now16:03
clarkbcorvus: editing the haproxy config manually should be fairly easy at least16:03
clarkbif you don't want to do them one at a time16:04
corvusyeah i agree... but i think i'll split it though...  hopefully should be a more relaxed process :)16:05
opendevreviewJames E. Blair proposed opendev/zone-opendev.org master: Replace zuul01  https://review.opendev.org/c/opendev/zone-opendev.org/+/95360216:07
opendevreviewJames E. Blair proposed opendev/zone-opendev.org master: Replace zuul02  https://review.opendev.org/c/opendev/zone-opendev.org/+/95360316:07
opendevreviewJames E. Blair proposed opendev/system-config master: Replace zuul01  https://review.opendev.org/c/opendev/system-config/+/95360416:07
opendevreviewJames E. Blair proposed opendev/system-config master: Replace zuul02  https://review.opendev.org/c/opendev/system-config/+/95360516:07
clarkbfungi: we are not building new gerrit container images with that change (I'm just confirming we don't need to pull as part of the restart)16:13
opendevreviewRubén proposed openstack/diskimage-builder master: docs: clarify when debian-systemd makes sense  https://review.opendev.org/c/openstack/diskimage-builder/+/95361216:13
clarkba simple restart should do (where simple == not simple because we have a bunch of other things to do)16:13
mnasiadkaclarkb: do you think https://review.opendev.org/c/opendev/zuul-providers/+/953269 could be merged? Would like to continue with stream10 and rocky10 for niz next week16:13
fungik16:14
fungiyeah we just deploy the updated config file to the server16:14
clarkbmnasiadka: yes, though corvus may have held off landing that one due to the issues monitoring image build status with zuul launcher right now16:14
clarkbcorvus: ^ should we go ahead and add focal and bionic or wait for logging to get better?16:15
clarkbI also have one question on the change I'll post shortly16:16
corvusoh er i just +3d 95326916:16
corvusi think i only didn't approve it so you could see it clarkb :)16:16
clarkbcorvus: the question is basically should we add the promote jobs to the periodic and promote pipelines16:17
clarkbI think its fine if that goes in a followup too16:17
clarkbquestion is posted on the change with better context16:17
corvusoh yes, let's do that16:18
corvusand let's revise the change for it so we have a better model to copypasta16:18
clarkbmnasiadka: ^ fyi16:19
corvusi think also we were missing "image-build"16:20
corvusso 3 pipelines16:20
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add Ubuntu bionic/focal builds, labels and provider config  https://review.opendev.org/c/opendev/zuul-providers/+/95326916:21
corvusi added those and +2d it if you want to +3 it clarkb 16:23
clarkbdone16:23
rubencabrera[m]<clarkb> "rubencabrera: looks like..." <- Just submitted this README change so I don't fall for the same mistake again. Does it make sense?16:25
rubencabrera[m]https://review.opendev.org/c/openstack/diskimage-builder/+/95361216:25
clarkbrubencabrera[m]: yup I +2'd it16:27
clarkbmnasiadka: I left some comments on https://review.opendev.org/c/opendev/zuul-providers/+/953460 as well16:27
clarkbfungi: since the gerrit restart process is complicated enough now I put together https://etherpad.opendev.org/p/gerrit-restart-process16:37
fungithanks16:37
clarkbthis incorporates the idea of doing a manual sighup to see if things shutdown quickly doing that vs the docker compose driven sigint16:38
clarkbfungi: you don't happen to have the command you used for deleting cache files somewhere do you? I need to add that to the etherpad still16:38
fungirm ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff}.*16:39
fungilooks like what's in root's shell history as the most recent16:39
fungicould be simplified to {gerrit,git}_file_diff.*16:40
clarkbfungi: we need to add diff_intraline to the list16:40
mnasiadkaclarkb: yeah, the stream10 one needs more love, thanks for the comments :)16:41
clarkbso I've kept the comma separated list16:41
fungior even g{err,}it_file_diff.* if you're insane ;)16:41
clarkbfungi: I like to think I have managed to hold onto a small semblance of sanity :)16:41
fungisounds good16:41
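The three spellings of the cache cleanup pattern discussed above differ only in how the shell's brace expansion is written. A minimal Python sketch, assuming simple non-nested groups (it is not a full reimplementation of shell brace expansion), shows that they expand to the same names. The pattern with `diff_intraline` is an assumption about what the extended list would look like; the log does not show the final command.

```python
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups for simple, non-nested cases only."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace group in the tail is expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


# The three variants mentioned in the conversation.
variants = [
    "{gerrit_file_diff,git_file_diff}.*",
    "{gerrit,git}_file_diff.*",
    "g{err,}it_file_diff.*",
]
for variant in variants:
    print(variant, "->", expand_braces(variant))

# Assumed shape of the list once diff_intraline is added (not shown in the log).
with_intraline = expand_braces("{gerrit_file_diff,git_file_diff,diff_intraline}.*")
print(with_intraline)
```

Keeping the explicit comma-separated list makes it easy to append a cache name that does not share a prefix with the others, which the shorter spellings cannot do.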
opendevreviewJeremy Stanley proposed opendev/system-config master: Update .well-known/matrix/client for mobile OIDC  https://review.opendev.org/c/opendev/system-config/+/95362416:56
fungi^ as promised earlier16:57
fungiclarkb: okay, the dib release buildset just reported17:06
clarkbperfect. The gerrit config update should land soon, then we wait for it to deploy then we can restart and see how we do17:06
fungii've started a root screen session on review in preparation17:06
clarkbI have attached17:08
opendevreviewMerged opendev/system-config master: Revert "Set h2.maxCompactTime to 15 seconds"  https://review.opendev.org/c/opendev/system-config/+/95059517:28
clarkbthe config files appear to have updated as expected17:32
clarkbfungi: I think we can proceed whenever we're ready. Do you want to drive or should I?17:32
clarkbI put a suggested statusbot message in the etherpad too17:33
fungii can drive, sure17:35
fungi#status notice Gerrit is being restarted to pick up a configuration change. You may notice a short outage.17:35
opendevstatusfungi: sending notice17:35
-opendevstatus- NOTICE: Gerrit is being restarted to pick up a configuration change. You may notice a short outage.17:35
fungigerrit java pid seems to be 76506917:36
clarkbyes that looks right to me17:36
clarkbI stuck the && date in there so we'd have a good idea of how long it takes to shutdown if it is slow again17:36
fungistanding by to hup the process once notifications are done17:36
opendevstatusfungi: finished sending notice17:38
clarkbI'm ready when you are17:39
fungisignal sent17:39
clarkband process is still running last I checked17:40
clarkbthats "good" because it means that sigint isn't necessarily the problem?17:40
clarkbanyway I think we can let it go for a few minutes before we issue the docker compose down and let it fall back onto its timeout behavior17:40
clarkbstrace shows it writing and seeking in file id 16417:41
clarkbjava    765069 gerrit2  164u   REG              252,0 6056280064 44047297 /var/gerrit/cache/git_file_diff.h2.db17:42
clarkbso I think our suspicion that the compaction is slowing down shutdown here may be correct17:42
clarkbrunning lsof its still compacting that file17:43
clarkbs/compacting/seeking and writing to that file/17:43
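For reference, the lsof line pasted above can be read field by field. This is a small sketch, assuming the default lsof column order (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); the helper name `parse_lsof_line` is made up for illustration.

```python
import re

# The line as pasted in the channel.
LSOF_LINE = (
    "java    765069 gerrit2  164u   REG              252,0 "
    "6056280064 44047297 /var/gerrit/cache/git_file_diff.h2.db"
)


def parse_lsof_line(line):
    """Split one lsof output line into named fields (default column layout assumed)."""
    command, pid, user, fd, ftype, device, size, node, name = line.split(None, 8)
    # The FD column is a descriptor number followed by an access mode letter.
    fd_match = re.match(r"(\d+)([a-z]?)", fd)
    return {
        "command": command,
        "pid": int(pid),
        "user": user,
        "fd": int(fd_match.group(1)),
        "mode": fd_match.group(2),  # 'u' means open for read and write
        "type": ftype,
        "size_bytes": int(size),
        "name": name,
    }


info = parse_lsof_line(LSOF_LINE)
print(info["fd"], info["name"], round(info["size_bytes"] / 2**30, 2), "GiB")
```

Descriptor 164 matches the file id seen in strace, and the size at that moment works out to roughly 5.6 GiB.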
clarkbwhich is longer than the 15 second timeout we give it...17:43
fungiyeah17:44
clarkbit is also writing a bunch of 0s which does seem in line with compaction. Maybe the value isn't actually being treated as milliseconds and 15000 is a longer timeout than we anticipated17:44
clarkboh wow the file sizes are definitely shrinking according to ls though17:45
fungibut also compaction on shutdown is pointless if we always kill it early due to exceeding the timeout17:45
clarkbit was like 15GB before iirc17:45
clarkbfungi: yup and we've discovered that we get faster startup responses if we delete the cache anyway17:46
clarkbso I think this config update to stop doing long compaction is a good one17:46
clarkbfungi: the process stopped17:46
clarkbso it took about 6-7 minutes. Just slightly longer than our 5 minute timeout17:46
clarkbI think we can proceed with docker compose down now and the rest of our restart process17:47
fungidoing17:47
fungion its way back up now17:48
clarkb[2025-06-27T17:48:22.232Z] [main] INFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.10.6-12-gf8d9e0470a-dirty ready17:48
clarkbps -elf | grep jvm doesn't show the h2 compaction timeout flag as expected (good)17:48
fungiconfirmed17:49
clarkbchanges and diffs load for me in the web ui17:49
fungiyeah, poking around the ui everything appears fine17:49
fungii'll let the openstack release team know we're done17:49
clarkbif someone has a change or patchset to push that would be a good check and then we can check replication too17:49
clarkbbut ya I think we're largely done at this point17:50
clarkbI'll summarize what we've learned in the channel for record keeping purposes once we're satisfied everything is functioning17:50
clarkbremote:   https://review.opendev.org/c/zuul/zuul/+/952592 Set lower bounds on IBM, Google, and AWS cloud deps17:52
clarkbI just pushed an update to that change17:52
clarkbhttps://opendev.org/zuul/zuul/commit/d860fc4ed27c5694b5ca1318f7adf6ddd0cc1e58 and that seems to have replicated17:53
clarkbTo summarize: We shutdown Gerrit this time using a manual sighup out of band from docker compose/podman. We did that to test if sighup behaves any differently than sigint. And "good" news it did not. Gerrit took about 6-7 minutes to shutdown after the sighup. During this time I ran a strace which showed gerrit reading/writing/seeking to file descriptor 164. lsof reported this as17:55
clarkbbelonging to one of the h2 cache files. ls also showed the size of the large cache files shrinking by about a third. I think that this pretty much confirms that compaction of h2 is what is slowing down gerrit restarts and not something to do with the sighup -> sigint move17:55
clarkbThis means that the revert we just put in place to remove longer h2 compaction times is a good fix for us. Subsequent gerrit restarts should work via docker compose with sigint just fine and be much quicker. We're deleting the large cache files during the restart anyway so there isn't much sense in compacting them first especially if that takes minutes to do17:56
clarkbhashar: ^ fyi since you had been dealing with similar17:57
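The experiment summarized above (send a signal by hand, then time how long the process takes to exit) could be sketched as follows. This is not the command that was actually run in the screen session, which the log does not show; the function names and the polling approach are assumptions, and the 300 second default mirrors the 5 minute timeout mentioned in the conversation.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def signal_and_wait(pid, sig=signal.SIGHUP, timeout=300.0,
                    poll_interval=1.0, is_running=pid_alive):
    """Send sig to pid and return seconds until it exits, or None on timeout."""
    start = time.monotonic()
    os.kill(pid, sig)
    while time.monotonic() - start < timeout:
        if not is_running(pid):
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None
```

With the numbers from this restart, a call with the default 300 second timeout would have returned `None`, since the process needed about 6-7 minutes to stop. Note that `pid_alive` also reports an unreaped child (zombie) as alive, so a caller that spawned the process itself should pass its own `is_running` check.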
clarkbfungi: oh! I just remembered we need to issue a reindex of changes. That was the new thing we learned17:57
fungiaha, yes, on it17:57
clarkbcan you add it to the etherpad so that we have it for future reference?17:58
fungiyeah, that's something we trigger through the ssh api, right?17:59
clarkb`ssh -p 29418 adminuser@review03.opendev.org gerrit index start changes --force`17:59
clarkblooks like this is what i have in my local command history17:59
clarkbthe --force is necessary iirc because the current index version is the latest index version. --force says reindex anyway in that situation18:00
clarkbso yes via the ssh api. Then you can use show-queue -w to track progress18:00
fungi`gerrit index changes` looks like the syntax18:00
fungiaha, you're further down the docs road than me18:00
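The two SSH API calls mentioned above can be assembled programmatically. A minimal sketch that only builds the argument lists and prints them, without connecting anywhere; `adminuser` is the placeholder account name from the quoted command, and the helper `gerrit_ssh_command` is hypothetical.

```python
import shlex

GERRIT_SSH_PORT = 29418  # port used in the command quoted in the channel


def gerrit_ssh_command(user, host, *gerrit_args):
    """Build the argv for running a Gerrit SSH API command (nothing is executed)."""
    return ["ssh", "-p", str(GERRIT_SSH_PORT), f"{user}@{host}", "gerrit", *gerrit_args]


# Force a reindex of changes even though the index version is already current.
reindex = gerrit_ssh_command(
    "adminuser", "review03.opendev.org", "index", "start", "changes", "--force"
)
# Show the work queue to follow reindex progress.
show_queue = gerrit_ssh_command("adminuser", "review03.opendev.org", "show-queue", "-w")

print(shlex.join(reindex))
print(shlex.join(show_queue))
```

Passing either list to `subprocess.run` would execute it, assuming an SSH key for an account with administrator rights is available.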
fungiReindexer started18:07
fungisorry, was fumbling with my various ssh keys trying to work out the right one for my admin account18:07
clarkbshow-queue -w shows a bunch of index all changes for project foo entries now18:08
fungiand yeah, i used `gerrit index start changes --force` as suggested18:08
clarkbcool I added that to the etherpad18:09
clarkbhopefully now we don't forget next time18:09
clarkbif you tail the error log you get periodic updates on reindex progress. If you find the ssh connection errors too noisy you can filter for "Reindexing changes"18:13
clarkbas an alternative to running show-queue periodically18:14
clarkbhttps://groups.google.com/g/repo-discuss/c/qypZHLipsCU everyone continues to struggle with the ai bot crawlers18:16
fungiyeah, my personal webserver that has gitweb set up for a few very small projects is periodically entirely offline these days due to the crawler load18:28
clarkbonce reindexing completes I'm going to pop out for lunch18:39
clarkbFailed 3/947540 changes Reindex changes to version 86 complete Using changes schema version 8618:40
clarkbspot checking those three failures they are the really old changes with problems that it's struggled with forever18:40
clarkbI don't personally see anything concerning18:40
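To put the failure count in proportion, the summary pasted above can be parsed directly. A small sketch, assuming the "Failed N/M changes" wording stays stable; the regular expression is written against this one pasted line, not against any documented log format.

```python
import re

# The summary as pasted in the channel.
SUMMARY = (
    "Failed 3/947540 changes Reindex changes to version 86 complete "
    "Using changes schema version 86"
)

match = re.search(r"Failed (\d+)/(\d+) changes", SUMMARY)
failed, total = int(match.group(1)), int(match.group(2))
failure_rate = failed / total

print(f"{failed} of {total} changes failed ({failure_rate:.6%})")
```

Three failures out of 947540 changes is a few in a million, which fits the reading that only long-standing broken changes were affected.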
clarkbfungi: also I detached from the screen a while back and I think you can shut it down whenever you're done with it18:41
clarkbalright back later18:42
fungiyeah, looks good19:02
fungii too am popping out to grab an early dinner while the tourists are still busy baking themselves19:02
opendevreviewMerged openstack/diskimage-builder master: docs: clarify when debian-systemd makes sense  https://review.opendev.org/c/openstack/diskimage-builder/+/95361221:35
opendevreviewMerged opendev/zone-opendev.org master: Replace zuul01  https://review.opendev.org/c/opendev/zone-opendev.org/+/95360223:35
opendevreviewMerged opendev/system-config master: Replace zuul01  https://review.opendev.org/c/opendev/system-config/+/95360423:52
