Wednesday, 2024-12-18

<corvus> clarkb: whoops thanks.  no rush on merging 970 we can just leave it [00:09]
<corvus> clarkb: i'm unlikely to debug those queue items in the near term, so if they need to go, that's fine [00:09]
<Clark[m]> Ack sorry switched to dinner prep once I was happy with gerrit [01:03]
<opendevreview> Tony Breeds proposed openstack/diskimage-builder master: Add a tool for displaying CPU flags and QEMU version  https://review.opendev.org/c/openstack/diskimage-builder/+/937836 [01:55]
<opendevreview> Tony Breeds proposed opendev/system-config master: Also include tzdata when installing ARA  https://review.opendev.org/c/opendev/system-config/+/923684 [01:59]
<opendevreview> Tony Breeds proposed opendev/system-config master: Update ansible-devel job to run on a newer bridge  https://review.opendev.org/c/opendev/system-config/+/930538 [01:59]
<opendevreview> Tony Breeds proposed opendev/system-config master: Update pip3 role to work on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/934937 [01:59]
<opendevreview> Tony Breeds proposed opendev/system-config master: Add ara and tzdata as installed requirements for the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/934917 [01:59]
<opendevreview> Tony Breeds proposed opendev/system-config master: Install ARA master in the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/924012 [01:59]
<opendevreview> Tony Breeds proposed opendev/system-config master: Add some debugging commands to the post job  https://review.opendev.org/c/opendev/system-config/+/925667 [01:59]
*** ykarel_ is now known as ykarel [06:02]
<opendevreview> Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/934045 [07:22]
<tonyb> karolinku[m]: can you add a depends-on: https://review.opendev.org/c/openstack/diskimage-builder/+/937836 to your change [07:26]
<karolinku[m]> I just rebased on that change [07:51]
<tonyb> ahh cool I didn't notice [07:54]
<tonyb> karolinku[m]: Looks like you got an x86_64-v3 VM, but the job timed out.   Would you like help debugging that? [09:57]
<tonyb> Oh NM, the node from nodepool is v3, but the VM that got launched isn't so the CentOS-10 VM didn't boot. [09:59]
<karolinku[m]> yes, this is the most recent problem. it looks like even if it shows Haswell, the AVX flags are disabled [10:02]
<frickler> there aren't enough logs collected from devstack to be sure, but I assume the libvirt KVM flag doesn't really make it into devstack. either add more log collection or hold a node to check I'd suggest [10:07]
<tonyb> It looks like it does, libvirt.cpu_model = Haswell is set according to https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_22c/934045/26/check/dib-nodepool-functional-openstack-centos-10-stream-src/22cc926/nodepool/openstack/screen-n-cpu.txt [10:08]
<tonyb> The QEMU in the nodepool VM isn't new enough to emulate x86_64-v3 but it should be able to set it correctly [10:09]
<tonyb> But yeah I think placing an autohold will be my next step [10:38]
<tonyb> the config looks right but it'll be easier to verify that interactively [10:38]
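
A rough sketch of the kind of interactive check discussed above: given the text of /proc/cpuinfo from inside the guest, list which x86-64-v3 flags are absent. The function name and the flag set are assumptions made for illustration. In particular, it assumes Linux reports LZCNT as "abm" and approximates OSXSAVE with "xsave". It is not the tool proposed in change 937836.

```python
# Flags that /proc/cpuinfo is assumed to show on an x86-64-v3 capable CPU.
# Assumption: LZCNT is listed as "abm" and OSXSAVE is approximated by "xsave".
X86_64_V3_FLAGS = frozenset(
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
)


def missing_v3_flags(cpuinfo_text):
    """Return the sorted x86-64-v3 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(X86_64_V3_FLAGS - set(value.split()))
    # No flags line found at all: report every flag as missing.
    return sorted(X86_64_V3_FLAGS)
```

On a held node this could be fed the contents of /proc/cpuinfo read inside the nested VM; an empty list would suggest the guest really sees a v3-capable CPU, while a result such as avx and avx2 would match the "Haswell but AVX disabled" symptom.
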
<tonyb> karolinku[m]: If you send me an SSH public key I can add you to the held node [10:39]
<opendevreview> Rafal Lewandowski proposed openstack/diskimage-builder master: Prevent from overwriting grub defaults if no variables are set  https://review.opendev.org/c/openstack/diskimage-builder/+/937684 [11:07]
<karolinku[m]> tonyb: https://github.com/karolinku.keys [11:31]
<hashar> clarkb: corvus: hi, I was looking at the Gerrit metrics for caches and "Cache disk metrics are expensive to compute on larger installations and are not computed by default."   Which I guess explains why `gerrit show-cache` was slow yesterday [12:26]
* hashar at the bottom of https://gerrit.wikimedia.org/r/Documentation/metrics.html#_caches [12:26]
<hashar> I also restarted the Wikimedia Gerrit which had been up since October 22nd, about two months [12:39]
<hashar> git_file_diff.h2.db went from 12G to 500MB   and gerrit_file_diff.h2.db from 3.3G to 683M as well [12:40]
<hashar> other caches got made smaller [12:40]
<hashar> all thanks to `-Dh2.maxCompactTime=15000` being passed to java [12:41]
*** darmach5 is now known as darmach [14:24]
<clarkb> hashar: looks like the compact time occurs during db shutdown not startup so we would have to restart then restart again before it takes effect. Not a big deal and still likely an improvement for us. I'll work on a change in a bit [15:47]
<hashar> oh possibly yeah [15:47]
<clarkb> infra-root I also want to delete the caches I moved aside on gerrit yesterday to return that disk space to the service. Any concerns with me doing that this morning? [15:47]
<hashar> and your caches should have some noticeable size now [15:48]
<clarkb> ya I need to load ssh keys and check [15:49]
<frickler> I haven't noticed anything wrong with gerrit today, so seems fine to proceed with the cleanup [15:53]
<clarkb> those two caches I moved aside are now up to 5GB and 4GB in production again. So ya they grow, but still much smaller than they were when I moved them aside [16:02]
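
A minimal sketch of how the cache growth mentioned above could be tracked without a manual du: list every *.h2.db file in a cache directory with its size. The helper names and the idea of pointing it at the Gerrit site's cache directory are assumptions for illustration; nothing here comes from the opendev or Wikimedia tooling.

```python
from pathlib import Path


def h2_cache_sizes(cache_dir):
    """Map each *.h2.db file name in cache_dir to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(cache_dir).glob("*.h2.db")}


def format_size(num_bytes):
    """Render a byte count with binary units, e.g. 1536 -> '1.5K'."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"
```

Running h2_cache_sizes before and after a restart with h2.maxCompactTime set would give a simple before/after comparison of the kind hashar quoted for the Wikimedia instance.
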
<clarkb> infra-root I have deleted the contents of /home/gerrit2/tmp/clarkb/old_caches/ on review02 [16:08]
<clarkb> I will also shut down the screen used to do the restarts yesterday [16:08]
<opendevreview> Clark Boylan proposed opendev/system-config master: Set h2.maxCompactTime to 15 seconds  https://review.opendev.org/c/opendev/system-config/+/938000 [16:17]
<clarkb> hashar: ^ fyi and thanks again for the help [16:17]
<hashar> oh you are even adding my blog post as a reference, that is kind :) [16:22]
<clarkb> hashar: of course! [16:23]
<clarkb> I am a fan of capturing as much info for future me and you as possible. Probably 90% of the time we never refer back to that info but the 10% that you do is so much better when you have the information. Also you did all the debugging and deserve the credit [16:24]
<corvus> ++ [16:28]
<hashar> I +1ed and left a comment about our gerrit probably being killed by systemd after 90 seconds [16:30]
<hashar> and I have no idea why it takes a long time for it to stop [16:30]
<hashar> that MIGHT be due to compaction [16:30]
<hashar> or some other oddity, I have never investigated [16:30]
<hashar> $ du -s -h node_modules/ [16:37]
<hashar> 442M    node_modules/ [16:37]
<hashar> err wrong window [16:37]
<clarkb> fun fact those two large cache files were consuming more disk than the rest of the complete gerrit installation combined [16:56]
<clarkb> the ubuntu ports reprepro process is still running. Doesn't look like the log file has updated for a few hours though so not sure if it is stalling out or just busy doing expected work [17:53]
<opendevreview> Clark Boylan proposed opendev/system-config master: Set h2.maxCompactTime to 15 seconds  https://review.opendev.org/c/opendev/system-config/+/938000 [18:05]
<corvus> that's one heck of a cache [19:52]
<hashar> clarkb: fun, you apparently noticed the issue back in June 2022 based on https://review.opendev.org/c/opendev/system-config/+/849886 [22:14]
<hashar> and I hit it live in December 2022 (resulting in an outage) [22:14]
<hashar> that change can be abandoned [22:14]
* hashar sleeps [22:15]
<clarkb> oh good research [22:15]
<clarkb> pre-commit doesn't log the versions of things it installs like tox and nox do [23:38]
* clarkb adds that to reasons to not use pre-commit [23:38]
