Tuesday, 2025-04-15

*** gmann is now known as gmann_pto01:16
*** jroll05 is now known as jroll007:09
*** persia is now known as Guest1370512:03
clarkbzuul is much busier this morning than yesterday evening and things still appear to be happy14:50
mordredclarkb: where do we upload things to https://quay.io/repository/opendevmirror ?14:54
clarkbmordred: here https://review.opendev.org/c/opendev/system-config/+/947160 corvus is already on top of it14:54
clarkbI can probably approve that after meetings today if no one else beats me to it14:55
mordredcool! I'm learning things14:55
mordredwe all love the variable name mirror_container_images_images yes?14:56
clarkbabsolutely14:56
corvusyou expect some duplication when mirrors are involved14:57
corvusmirror_container_images_segami could have been a contender14:57
clarkbwe're also slowly moving images we build to quay as the canonical location. Lodgeit is already there. Hound is next. Then after Gerrit gets its server upgraded we can do gerrit too14:57
clarkbeventually they will all be there I hope and we can avoid the docker hub rate limits14:58
opendevreviewMerged openstack/project-config master: End gating for openstack/openstack-helm-infra  https://review.opendev.org/c/openstack/project-config/+/94512715:19
clarkbI've got a break in meetings from 1600-1700 UTC. I plan to rerun the index and git repo syncs from review02 to review03 in that block so that we can get a better sense of speedups/time cost and then maybe by this afternoon we'll be in a position to turn on gerrit on review03 and test its functionality15:31
clarkbalright I'm just about to start on that second pass for gerrit data sync16:04
clarkbsyncing the indexes took 5 seconds on the second pass16:08
clarkbthat seems short but there was data transferred and new files created. I do wonder if I maybe need to worry about lucene files not changing size but changing content so rsync isn't resyncing them?16:08
corvusdo timestamps change?16:09
corvuswondering if timestamp detection would be enough or would you need to turn on checksums16:09
clarkbgood question16:09
fungii would pick some random samples and checksum them manually16:11
clarkbafter the first sync we had this file: '39893792 Apr 11 14:46 _a2wu_Lucene90_0.tim' after the second sync we have '39928810 Apr 15 14:24 _acjh_Lucene90_0.tim' which appears to match what is on review02 (file size and timestamp)16:11
clarkbI'll hash that file on 02 and 03 now16:11
fungikeep in mind that checksumming can add massive delays because rsync has to read every file's content whether or not it will be transferred, and then read it again to transfer it if the checksum doesn't match16:12
fungithough in theory if there's not a ton of filesystem cache pressure the second read is probably from ram16:12
clarkbsha256sums seem to match as well16:12
fungiand the first read may be too in cases where the file is read often16:12
clarkbso I think we're probably ok with the default quick check using file size and mod time16:13
clarkband lucene/gerrit's index is just more efficient with new data than I expected16:13
fungibut yeah, it's rare for a file's modified time and length to remain unchanged while its content has been altered, so unless we expect that sort of pathological condition i wouldn't turn on checksumming16:13
clarkbya I think this looks ok after doing this spot check16:14
clarkbI will proceed with syncing the git repos now16:14
fungino well-behaved program should alter a file and reset its modification time back to an old value16:14
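[Editor's sketch] The spot check discussed above (pick a few random files, confirm that size and modification time agree, then hash them) could look roughly like this in Python. It assumes both trees are reachable as local paths, which was not the case in the channel, where the hashes were computed separately on review02 and review03. The function names are hypothetical, and `quick_check_same` only approximates rsync's default size-and-mtime comparison.

```python
import hashlib
import random
from pathlib import Path


def quick_check_same(src: Path, dst: Path) -> bool:
    """Approximate rsync's default quick check: equal size and equal mtime (whole seconds)."""
    s, d = src.stat(), dst.stat()
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large index files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(src_root: Path, dst_root: Path, samples: int = 5, seed=None) -> list:
    """Return sampled relative paths that pass the quick check but differ in content."""
    rng = random.Random(seed)
    files = sorted(p.relative_to(src_root) for p in src_root.rglob("*") if p.is_file())
    picked = rng.sample(files, min(samples, len(files)))
    suspicious = []
    for rel in picked:
        a, b = src_root / rel, dst_root / rel
        # Only files that the quick check would skip are interesting here.
        if b.is_file() and quick_check_same(a, b) and sha256_of(a) != sha256_of(b):
            suspicious.append(rel)
    return suspicious
```

An empty result from `spot_check` is consistent with the conclusion reached in the channel that the default quick check is sufficient. It is a sample, not proof.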
clarkbside note in checking review_site/git contents pre second rsync I noticed timestamps newer than the last rsync. This must be a side effect of our git gc/repack cronjob16:15
clarkbnot a huge deal as the --delete flag to rsync should make things match the source side anyway16:16
fungiare we relying on the python-openstackclient container image at all anywhere i don't know about? https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/BE7PPQL4DGNDZ2SIMUVSK67I5NF3TFCX/16:18
fungiand yeah, disabling gc/repack on the new server temporarily is probably a good idea. disabling it on the old server this weekend might similarly be wise in order to cut down on churn before a final rsync16:19
clarkbya the current git sync is running much longer than the index sync. Likely due to repacking16:20
clarkbfungi: codesearch says we are not using the oscclient/python-openstackclient container image. I think mordred did try to use it at one time but then we ended up needing special lib combos for rax on bridge and so we stopped?16:20
fungiah, right, i vaguely recall that16:21
fungiinterestingly, https://codesearch.opendev.org/?q=oscclient turns up 0 references16:22
clarkbI made a TODO note to figure out disabling repacking early16:22
clarkboh that might be too many c's16:22
clarkbosclient?16:22
mordredyeah. too many c's16:22
fungiaha, that does have results16:22
fungibut probably not ours16:22
mordredonly ref is in the osclient repo: https://codesearch.opendev.org/?q=osclient%2Fpython&i=nope&literal=nope&files&excludeFiles&repos16:23
fungithanks! mainly trying to confirm that stephenfin's plan won't impact any of our servers16:23
fungisounds like we're all clear16:23
mordredI think the reason we didn't end up using it is it was too hard - there would need to be a wrapper script to include all of the bind mounting needed for it to be sane16:24
mordredhonestly pipx is a better choice for that use case now that it exists :) 16:24
clarkbwe've got a hand curated install now with only the most artisanal libraries known to work with as many clouds as possible16:24
clarkbthe main thing is actually rax volume management which mostly works now16:25
fungiyeah, https://opendev.org/opendev/system-config/src/branch/master/launch16:25
fungithe deps in the pyproject.toml there16:25
clarkbgit sync just finished in 8 minutes 43 seconds16:25
clarkbabout half the time of the original sync16:25
fungiyeah, maybe tolerable if we want to do a warm sync just before the shutdown16:26
fungiand just not bother with disabling gc/repack anywhere16:26
fungidns propagation will take at least that long anyway16:26
clarkbya syncing should be less than 10 minutes which isn't too bad16:26
clarkb(between db, index, and git assuming we don't need anything else I've missed)16:27
clarkbto catch everyone up I did a single sync of the accountPatchReviewDB yesterday which should be enough for testing purposes, synced review_site/index and review_site/git yesterday and then again just now to get a sense of how long it will actually take on Monday. Everything else (except for the known ssh host key problem) should be handled by ansible config management so I think at16:29
clarkbthis point we're ready to turn on gerrit on review03 for the first time and see if it actually works16:29
clarkbI have a meeting in half an hour and haven't had breakfast yet so that won't happen until my next free block of time (either 1800 UTC or 2000 UTC)16:29
clarkbbut feel free to look at the server and check things that you may have concerns about and/or leave comments here or on the etherpad16:30
clarkboh and the root screen on 03 is still up if you want to look at scrollback16:31
clarkbsemi related: I just noticed that /boot on the noble images is a distinct partition and maybe slightly smaller than I would prefer. But our oldest noble node (paste) still shows only 14% usage despite kernel updates it has received so this is probably fine16:33
clarkbI think the issue is likely to be more pronounced on real hardware where you have bigger kernels?16:35
clarkbso ya I think we're ok16:35
clarkboh you know I bet the new kernel compression method makes a difference there too16:36
clarkbso a combo of not needing every last module and compressing more efficiently means we can get away with a smaller /boot16:36
clarkbanyway calling it out as this is a change from jammy and earlier where we might have a /boot/efi (required as it must be fat32) but no separate /boot (just using the / partition)16:37
fungiyes, on my physical systems i have a /boot (ext4), /boot/efi (vfat), and then an encrypted lvm pv for everything else16:39
fungihaving the contents of /boot inside a crypt device is still challenging, since grub's driver for that isn't all there16:40
fungilast i checked anyway, you could do it with luks1 but not luks2 yet16:41
fungi"Since GRUB 2.12rc1, grub-install can create a core image to unlock LUKS2. However, it only supports PBKDF2, not Argon2." https://wiki.archlinux.org/title/GRUB#LUKS216:45
fungiso i guess it's possible with the grub-pc package in debian trixie/testing and sid/unstable, or on bookworm/stable if you enable backports16:46
clarkbmy local install is / /boot/efi/ /home and swap. I have to passphrase to get grub to load the bootable things16:46
clarkbthen i have to passphrase again for the kernel to mount things16:46
clarkbso yes it's very doable. Has been the case for about 2 years I think16:47
clarkbeverything but /boot/efi is encrypted (including swap)16:47
fungiis that using pbkdf2 instead of argon2 i guess?16:47
clarkbI guess? it would use whatever tumbleweed's partitioner does16:47
clarkblsblk doesn't seem to say16:48
clarkbI think hibernate with encrypted swap may also mostly work at this point too. But I almost never try it because it never worked in the past16:48
clarkbfungi: the bit in that wiki article about recovering from a failed passphrase is annoying. I personally enter 'exit' into the grub recovery shell which causes the machine to reboot then I can try typing it again16:50
fungiyeah, works fine for me with encrypted swap, has for years16:50
fricklerfyi https://help.ubuntu.com/community/Full_Disk_Encryption_Howto_2019 describes how to setup things so that you only need one passphrase entry16:51
fungii guess https://savannah.gnu.org/bugs/index.php?55093 is the place to watch for argon2i support16:52
fungii'll fiddle with the latest debian-installer and see if it can figure out the setup itself for /boot as part of / on an lvm2 logical volume in a pv that's luks2/pbkdf2 encrypted, with just /boot/efi outside the dm-crypt device16:53
fungiit can handle that with /boot outside dm-crypt on a separate ext4 device, but i haven't tried anything more daring recently16:54
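[Editor's sketch] On the pbkdf2 versus argon2 question above: `lsblk` does not report the key derivation function, but the text printed by `cryptsetup luksDump` for a LUKS2 device lists a `PBKDF:` line per keyslot. The Python sketch below parses such text. The sample layout it expects is an assumption about that output format, the function names are hypothetical, and `grub_may_unlock` is only one reading of the wiki quote (that at least one PBKDF2 keyslot is needed).

```python
import re


def pbkdf_per_keyslot(luks_dump: str) -> dict:
    """Map keyslot number to PBKDF name, given `cryptsetup luksDump` text for LUKS2."""
    slots = {}
    current = None
    for line in luks_dump.splitlines():
        slot = re.match(r"^\s*(\d+): luks2\s*$", line)
        if slot:
            current = int(slot.group(1))
            continue
        kdf = re.match(r"^\s*PBKDF:\s+(\S+)", line)
        if kdf and current is not None:
            slots[current] = kdf.group(1)
    return slots


def grub_may_unlock(slots: dict) -> bool:
    """Assumption: GRUB needs at least one keyslot that uses PBKDF2."""
    return any(name == "pbkdf2" for name in slots.values())
```

Feeding it the saved output of `cryptsetup luksDump` for the device in question would show which keyslots, if any, use pbkdf2.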
clarkbok if there are no concerns I think I will go ahead and start gerrit on review03 in the next few minutes. I have a gap in meetings where I can turn that on and see what happens18:00
fungisgtm, thanks!18:00
clarkbI'll triple check the replication config has no targets before doing so18:01
clarkbconfirmed it has just the top level replication options without any targets18:01
clarkbin that case give me a minute or two and we should have a gerrit18:01
fungiperfect18:01
clarkbI'll do a pull first too to ensure we have the same 3.10.5 image as prod18:02
clarkb[2025-04-15T18:03:27.126Z] [main] INFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.10.5-1-g47283ba335-dirty ready18:03
clarkbas expected https://review03.opendev.org/q/status:open+-is:wip looks like a snapshot from when I did the data syncs18:04
clarkbI guess I'll leave that up for now as I don't see anything obviously wrong or problematic18:06
clarkbif you want to login to the server and interact with it in more meaningful ways you should be able to do so by putting the review03 IP(s) in /etc/hosts for review.o.o instead then all the openid redirects should work18:06
clarkbjust keep in mind when we actually switch servers we will delete any work you do on review03 at that point as the sync will overwrite the ephemeral new stuff on 0318:07
clarkbgetting to this point is a big deal to me. Even though we have CI jobs testing gerrit with podman on noble now it's nice to see it working in a prod like setup too18:12
fungithough this has led me to discover that, e.g., we don't even redirect https://review02.opendev.org/ to the review cname18:17
fungisomething we should fix, though it's fine to fix it after we're done with the server swap18:17
clarkbwell if we had those redirects this testing with review03 wouldn't be possible18:17
clarkbI'm not sure we should redirect18:17
clarkb(you would always need to update /etc/hosts and not just for the authentication dance)18:18
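[Editor's sketch] The /etc/hosts override described above can be illustrated with a small Python helper that rewrites hosts-file text so a name points at the test server. The hostname `review.opendev.org` is assumed to be what "review.o.o" abbreviates, the addresses in the usage note are documentation placeholders rather than the real review03 IPs, and `hosts_override` is a hypothetical name.

```python
def hosts_override(hosts_text: str, hostname: str, addresses: list) -> str:
    """Return hosts_text with entries for hostname replaced by the given addresses.

    Any existing line naming hostname is dropped whole, including other names
    on that line, so review the result before installing it.
    """
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are hostnames.
        if len(fields) >= 2 and hostname in fields[1:]:
            continue
        kept.append(line)
    kept.extend(f"{addr} {hostname}" for addr in addresses)
    return "\n".join(kept) + "\n"
```

For example, `hosts_override(text, "review.opendev.org", ["203.0.113.10", "2001:db8::10"])` yields text where the name resolves only to the two placeholder addresses. Removing those lines again restores normal resolution.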
clarkbsince this seems happy I'm going to try and sneak an early lunch before our team meeting in 40 minutes18:19
fungipopping out for quick dinner, bbiab19:45
corvushere's a draft email about the need for image volunteers: https://etherpad.opendev.org/p/B_3yEOP_ZQOms3ysgnXk19:46
corvuspls feel free to make changes to it19:46
clarkbcorvus: looking at it now. One thing that occurred to me is you can download the images from the builder apache servers. Does zuul-launcher have anything like that?19:48
clarkblike maybe we can register them as artifacts if we don't already?19:49
clarkbactually we probably do since that is how the whole system works? That is probably sufficient19:49
clarkbcorvus: email looks good (I made one very small edit)19:51
corvusclarkb: yes, they are artifacts and anyone can download (within 3 days since we only keep them that long; we can keep them longer if we want)19:51
clarkbthat's probably fine to start. The old images were rebuilt once a day so would only be downloadable for 2 days19:52
corvusi sort of didn't want to go too much into a sales pitch for the system -- it would be distracting, and we can do that another time.  i just wanted to cover "we're making things better" and "here's how the new system makes this specific job easier".  we can mention the artifact thing if you want -- but i don't think anyone actually needs that for this; at least, that hasn't been part of my workflow.  :)19:53
clarkbI don't think you should mention it. The email just triggered that question in my own mind19:53
corvusack19:54
clarkbI have successfully logged into review03 with /etc/hosts overridden20:35
clarkbstill shows 944183 as unmerged, otherwise it's a little tough to tell you're on the new host (can also check the ssl cert names)20:35
clarkbI guess I should start doing some write operations /me will review some stuff20:36
clarkbthat also seems to work. I left a +1 on 93969920:37
clarkbI'm currently trying to clean up the shell and shell-run containers with docker compose on review03. The service may flap while I sort through this21:04
clarkbthat was actually easier than anticipated. `docker compose down shell` seems to have done it and left the db and gerrit alone21:05
clarkbI've yet to see anything concerning myself. Definitely test too if you have time. Also reviews on the ssh keys thing would be great.21:34
clarkbI'm going to start moving towards the outside to enjoy the spring weather while I can21:34
corvusi'm going to start a full rolling reboot of zuul using the playbook, because i think we're now at a good point for a release, and i'd like to double check things without waiting for the weekly reboot22:53
corvusthat's running in a corvus-owned screen session on bridge using the usual flock and log files23:02
clarkbsounds good23:39
clarkbopenstack just released and then ptg was last week. I expect that might go fairly quickly given that demand seems to be lower this week23:40
clarkbmordred: corvus: I've approved the node:22 mirror update. That should land well in advance of the daily periodic job that mirrors things23:45
opendevreviewMerged opendev/system-config master: Mirror node:22  https://review.opendev.org/c/opendev/system-config/+/94716023:48
