*** gmann is now known as gmann_pto | 01:16 | |
*** jroll05 is now known as jroll0 | 07:09 | |
*** persia is now known as Guest13705 | 12:03 | |
clarkb | zuul is much busier this morning than yesterday evening and things still appear to be happy | 14:50 |
---|---|---|
mordred | clarkb: where do we upload things to https://quay.io/repository/opendevmirror ? | 14:54 |
clarkb | mordred: here https://review.opendev.org/c/opendev/system-config/+/947160 corvus is already on top of it | 14:54 |
clarkb | I can probably approve that after meetings today if no one else beats me to it | 14:55 |
mordred | cool! I'm learning things | 14:55 |
mordred | we all love the variable name mirror_container_images_images yes? | 14:56 |
clarkb | absolutely | 14:56 |
corvus | you expect some duplication when mirrors are involved | 14:57 |
corvus | mirror_container_images_segami could have been a contender | 14:57 |
clarkb | we're also slowly moving images we build to quay as the canonical location. Lodgeit is already there. Hound is next. Then after Gerrit gets its server upgraded we can do gerrit too | 14:57 |
clarkb | eventually they will all be there I hope and we can avoid the docker hub rate limits | 14:58 |
opendevreview | Merged openstack/project-config master: End gating for openstack/openstack-helm-infra https://review.opendev.org/c/openstack/project-config/+/945127 | 15:19 |
clarkb | I've got a break in meetings from 1600-1700 UTC. I plan to rerun the index and git repo syncs from review02 to review03 in that block so that we can get a better sense of speedups/time cost and then maybe by this afternoon we'll be in a position to turn on gerrit on review03 and test its functionality | 15:31 |
clarkb | alright I'm just about to start on that second pass for gerrit data sync | 16:04 |
clarkb | syncing the indexes took 5 seconds on the second pass | 16:08 |
clarkb | that seems short but there was data transferred and new files created. I do wonder if I maybe need to worry about lucene files not changing size but changing content so rsync isn't resyncing them? | 16:08 |
corvus | do timestamps change? | 16:09 |
corvus | wondering if timestamp detection would be enough or would you need to turn on checksums | 16:09 |
clarkb | good question | 16:09 |
fungi | i would pick some random samples and checksum them manually | 16:11 |
clarkb | after the first sync we had this file: '39893792 Apr 11 14:46 _a2wu_Lucene90_0.tim' after the second sync we have '39928810 Apr 15 14:24 _acjh_Lucene90_0.tim' which appears to match what is on review02 (file size and timestamp) | 16:11 |
clarkb | I'll hash that file on 02 and 03 now | 16:11 |
fungi | keep in mind that checksumming can add massive delays because rsync has to read every file's content whether or not it will be transferred, and then read it again to transfer it if the checksum doesn't match | 16:12 |
fungi | though in theory if there's not a ton of filesystem cache pressure the second read is probably from ram | 16:12 |
clarkb | sha256sums seem to match as well | 16:12 |
fungi | and the first read may be too in cases where the file is read often | 16:12 |
clarkb | so I think we're probably ok with the default quick check using file size and mod time | 16:13 |
clarkb | and lucene/gerrit's index is just more efficient with new data than I expected | 16:13 |
fungi | but yeah, it's rare for a file's modified time and length to remain unchanged while its content has been altered, so unless we expect that sort of pathological condition i wouldn't turn on checksumming | 16:13 |
clarkb | ya I think this looks ok after doing this spot check | 16:14 |
clarkb | I will proceed with syncing the git repos now | 16:14 |
fungi | no well-behaved program should alter a file and reset its modification time back to an old value | 16:14 |
clarkb | side note in checking review_site/git contents pre second rsync I noticed timestamps newer than the last rsync. This must be a side effect of our git gc/repack cronjob | 16:15 |
clarkb | not a huge deal as the --delete flag to rsync should make things match the source side anyway | 16:16 |
fungi | are we relying on the python-openstackclient container image at all anywhere i don't know about? https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/BE7PPQL4DGNDZ2SIMUVSK67I5NF3TFCX/ | 16:18 |
fungi | and yeah, disabling gc/repack on the new server temporarily is probably a good idea. disabling it on the old server this weekend might similarly be wise in order to cut down on churn before a final rsync | 16:19 |
clarkb | ya the current git sync is running much longer than the index sync. Likely due to repacking | 16:20 |
clarkb | fungi: codesearch says we are not using the oscclient/python-openstackclient container image. I think mordred did try to use it at one time but then we ended up needing special lib combos for rax on bridge and so we stopped? | 16:20 |
fungi | ah, right, i vaguely recall that | 16:21 |
fungi | interestingly, https://codesearch.opendev.org/?q=oscclient turns up 0 references | 16:22 |
clarkb | I made a TODO note to figure out disabling repacking early | 16:22 |
clarkb | oh that might be too many c's | 16:22 |
clarkb | osclient? | 16:22 |
mordred | yeah. too many c's | 16:22 |
fungi | aha, that does have results | 16:22 |
fungi | but probably not ours | 16:22 |
mordred | only ref is in the osclient repo: https://codesearch.opendev.org/?q=osclient%2Fpython&i=nope&literal=nope&files&excludeFiles&repos | 16:23 |
fungi | thanks! mainly trying to confirm that stephenfin's plan won't impact any of our servers | 16:23 |
fungi | sounds like we're all clear | 16:23 |
mordred | I think the reason we didn't end up using it is it was too hard - there would need to be a wrapper script to include all of the bind mounting needed for it to be sane | 16:24 |
mordred | honestly pipx is a better choice for that use case now that it exists :) | 16:24 |
clarkb | we've got a hand curated install now with only the most artisanal libraries known to work with as many clouds as possible | 16:24 |
clarkb | the main thing is actually rax volume management which mostly works now | 16:25 |
fungi | yeah, https://opendev.org/opendev/system-config/src/branch/master/launch | 16:25 |
fungi | the deps in the pyproject.toml there | 16:25 |
clarkb | git sync just finished in 8 minutes 43 seconds | 16:25 |
clarkb | about half the time of the original sync | 16:25 |
fungi | yeah, maybe tolerable if we want to do a warm sync just before the shutdown | 16:26 |
fungi | and just not bother with disabling gc/repack anywhere | 16:26 |
fungi | dns propagation will take at least that long anyway | 16:26 |
clarkb | ya syncing should be less than 10 minutes which isn't too bad | 16:26 |
clarkb | (between db, index, and git assuming we don't need anything else I've missed) | 16:27 |
clarkb | to catch everyone up I did a single sync of the accountPatchReviewDB yesterday which should be enough for testing purposes, synced review_site/index and review_site/git yesterday and then again just now to get a sense of how long it will actually take on Monday. Everything else (except for the known ssh host key problem) should be handled by ansible config management so I think at | 16:29 |
clarkb | this point we're ready to turn on gerrit on review03 for the first time and see if it actually works | 16:29 |
clarkb | I have a meeting in half an hour and haven't had breakfast yet so that won't happen until my next free block of time (either 1800 UTC or 2000 UTC) | 16:29 |
clarkb | but feel free to look at the server and check things that you may have concerns about and/or leave comments here or on the etherpad | 16:30 |
clarkb | oh and the root screen on 03 is still up if you want to look at scrollback | 16:31 |
clarkb | semi related: I just noticed that /boot on the noble images is a distinct partition and maybe slightly smaller than I would prefer. But our oldest noble node (paste) still shows only 14% usage despite kernel updates it has received so this is probably fine | 16:33 |
clarkb | I think the issue is likely to be more pronounced on real hardware where you have bigger kernels? | 16:35 |
clarkb | so ya I think we're ok | 16:35 |
clarkb | oh you know I bet the new kernel compression method makes a difference there too | 16:36 |
clarkb | so a combo of not needing every last module and compressing more efficiently means we can get away with a smaller /boot | 16:36 |
clarkb | anyway calling it out as this is a change from jammy and earlier where we might have a /boot/efi (required as it must be fat32) but no separate /boot (just using the / partition) | 16:37 |
fungi | yes, on my physical systems i have a /boot (ext4), /boot/efi (vfat), and then an encrypted lvm pv for everything else | 16:39 |
fungi | having the contents of /boot inside a crypt device is still challenging, since grub's driver for that isn't all there | 16:40 |
fungi | last i checked anyway, you could do it with luks1 but not luks2 yet | 16:41 |
fungi | "Since GRUB 2.12rc1, grub-install can create a core image to unlock LUKS2. However, it only supports PBKDF2, not Argon2." https://wiki.archlinux.org/title/GRUB#LUKS2 | 16:45 |
fungi | so i guess it's possible with the grub-pc package in debian trixie/testing and sid/unstable, or on bookworm/stable if you enable backports | 16:46 |
clarkb | my local install is / /boot/efi/ /home and swap. I have to passphrase to get grub to load the bootable things | 16:46 |
clarkb | then i have to passphrase again for the kernel to mount things | 16:46 |
clarkb | so yes it's very doable. Has been the case for about 2 years I think | 16:47 |
clarkb | everything but /boot/efi is encrypted (including swap) | 16:47 |
fungi | is that using pbkdf2 instead of argon2 i guess? | 16:47 |
clarkb | I guess? it would use whatever tumbleweed's partitioner does | 16:47 |
clarkb | lsblk doesn't seem to say | 16:48 |
clarkb | I think hibernate with encrypted swap may also mostly work at this point too. But I almost never try it because it never worked in the past | 16:48 |
clarkb | fungi: the bit in that wiki article about recovering from a failed passphrase is annoying. I personally enter 'exit' into the grub recovery shell which causes the machine to reboot then I can try typing it again | 16:50 |
fungi | yeah, works fine for me with encrypted swap, has for years | 16:50 |
frickler | fyi https://help.ubuntu.com/community/Full_Disk_Encryption_Howto_2019 describes how to set things up so that you only need one passphrase entry | 16:51 |
fungi | i guess https://savannah.gnu.org/bugs/index.php?55093 is the place to watch for argon2i support | 16:52 |
fungi | i'll fiddle with the latest debian-installer and see if it can figure out the setup itself for /boot as part of / on an lvm2 logical volume in a pv that's luks2/pbkdf2 encrypted, with just /boot/efi outside the dm-crypt device | 16:53 |
fungi | it can handle that with /boot outside dm-crypt on a separate ext4 device, but i haven't tried anything more daring recently | 16:54 |
clarkb | ok if there are no concerns I think I will go ahead and start gerrit on review03 in the next few minutes. I have a gap in meetings where I can turn that on and see what happens | 18:00 |
fungi | sgtm, thanks! | 18:00 |
clarkb | I'll triple check the replication config has no targets before doing so | 18:01 |
clarkb | confirmed it has just the top level replication options without any targets | 18:01 |
clarkb | in that case give me a minute or two and we should have a gerrit | 18:01 |
fungi | perfect | 18:01 |
clarkb | I'll do a pull first too to ensure we have the same 3.10.5 image as prod | 18:02 |
clarkb | [2025-04-15T18:03:27.126Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.10.5-1-g47283ba335-dirty ready | 18:03 |
clarkb | as expected https://review03.opendev.org/q/status:open+-is:wip looks like a snapshot from when I did the data syncs | 18:04 |
clarkb | I guess I'll leave that up for now as I don't see anything obviously wrong or problematic | 18:06 |
clarkb | if you want to log in to the server and interact with it in more meaningful ways you should be able to do so by putting the review03 IP(s) in /etc/hosts for review.o.o instead; then all the openid redirects should work | 18:06 |
clarkb | just keep in mind when we actually switch servers we will delete any work you do on review03 at that point as the sync will overwrite the ephemeral new stuff on 03 | 18:07 |
clarkb | getting to this point is a big deal to me. Even though we have CI jobs testing gerrit with podman on noble now, it's nice to see it working in a prod like setup too | 18:12 |
fungi | though this has led me to discover that, e.g., we don't even redirect https://review02.opendev.org/ to the review cname | 18:17 |
fungi | something we should fix, though it's fine to fix it after we're done with the server swap | 18:17 |
clarkb | well if we had those redirects this testing with review03 wouldn't be possible | 18:17 |
clarkb | I'm not sure we should redirect | 18:17 |
clarkb | (you would always need to update /etc/hosts and not just for the authentication dance) | 18:18 |
clarkb | since this seems happy I'm going to try and sneak an early lunch before our team meeting in 40 minutes | 18:19 |
fungi | popping out for quick dinner, bbiab | 19:45 |
corvus | here's a draft email about the need for image volunteers: https://etherpad.opendev.org/p/B_3yEOP_ZQOms3ysgnXk | 19:46 |
corvus | pls feel free to make changes to it | 19:46 |
clarkb | corvus: looking at it now. One thing that occurred to me is you can download the images from the builder apache servers. Does zuul-launcher have anything like that? | 19:48 |
clarkb | like maybe we can register them as artifacts if we don't already? | 19:49 |
clarkb | actually we probably do since that is how the whole system works? That is probably sufficient | 19:49 |
clarkb | corvus: email looks good (I made one very small edit) | 19:51 |
corvus | clarkb: yes, they are artifacts and anyone can download (within 3 days since we only keep them that long; we can keep them longer if we want) | 19:51 |
clarkb | that's probably fine to start. The old images were rebuilt once a day so would only be downloadable for 2 days | 19:52 |
corvus | i sort of didn't want to go too much into a sales pitch for the system -- it would be distracting, and we can do that another time. i just wanted to cover "we're making things better" and "here's how the new system makes this specific job easier". we can mention the artifact thing if you want -- but i don't think anyone actually needs that for this; at least, that hasn't been part of my workflow. :) | 19:53 |
clarkb | I don't think you should mention it. The email just triggered that question in my own mind | 19:53 |
corvus | ack | 19:54 |
clarkb | I have successfully logged into review03 with /etc/hosts overridden | 20:35 |
clarkb | still shows 944183 as unmerged; otherwise it's a little tough to tell you're on the new host (can also check the ssl cert names) | 20:35 |
clarkb | I guess I should start doing some write operations /me will review some stuff | 20:36 |
clarkb | that also seems to work. I left a +1 on 939699 | 20:37 |
clarkb | I'm currently trying to clean up the shell and shell-run containers with docker compose on review03. The service may flap while I sort through this | 21:04 |
clarkb | that was actually easier than anticipated. `docker compose down shell` seems to have done it and left the db and gerrit alone | 21:05 |
clarkb | I've yet to see anything concerning myself. Definitely test too if you have time. Also reviews on the ssh keys thing would be great. | 21:34 |
clarkb | I'm going to start moving towards the outside to enjoy the spring weather while I can | 21:34 |
corvus | i'm going to start a full rolling reboot of zuul using the playbook, because i think we're now at a good point for a release, and i'd like to double check things without waiting for the weekly reboot | 22:53 |
corvus | that's running in a corvus-owned screen session on bridge using the usual flock and log files | 23:02 |
clarkb | sounds good | 23:39 |
clarkb | openstack just released and then ptg was last week I expect that might go fairly quickly given that demand seems to be lower this week | 23:40 |
clarkb | mordred: corvus: I've approved the node:22 mirror update. That should land well in advance of the daily periodic job that mirrors things | 23:45 |
opendevreview | Merged opendev/system-config master: Mirror node:22 https://review.opendev.org/c/opendev/system-config/+/947160 | 23:48 |
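
The spot check discussed between 16:08 and 16:14 (compare size and modification time as rsync's default quick check does, then confirm with sha256) could be sketched roughly as below. This is only an illustration, not what was actually run on review02/review03: the function names `sha256_of` and `spot_check` are hypothetical, the comparison truncates mtimes to whole seconds as an approximation of rsync's behaviour, and both files are assumed to be reachable as local paths.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large index files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(source, copy):
    """Return (quick_check_same, content_same) for one pair of files.

    quick_check_same approximates rsync's default heuristic: same size
    and same modification time (truncated to whole seconds here).
    content_same is the stronger sha256 comparison.
    """
    src_stat, dst_stat = os.stat(source), os.stat(copy)
    quick_check_same = (
        src_stat.st_size == dst_stat.st_size
        and int(src_stat.st_mtime) == int(dst_stat.st_mtime)
    )
    content_same = sha256_of(source) == sha256_of(copy)
    return quick_check_same, content_same


if __name__ == "__main__":
    # Build the pathological case from the discussion: two files with
    # identical size and mtime but different content.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "original.tim"
        altered = Path(tmp) / "altered.tim"
        original.write_bytes(b"aaaa")
        altered.write_bytes(b"bbbb")
        os.utime(original, (1000, 1000))
        os.utime(altered, (1000, 1000))
        print(spot_check(original, altered))
```

A result where the first value is true and the second is false is the case the quick check would miss. As noted in the log, that should be rare for well-behaved programs, which is why sampling a few files by hand was preferred over turning on checksums for the whole sync.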