Wednesday, 2024-11-13

opendevreviewTony Breeds proposed opendev/system-config master: Install ARA master in the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/92401200:07
opendevreviewTony Breeds proposed opendev/system-config master: Install ARA master in the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/92401200:15
opendevreviewTony Breeds proposed opendev/system-config master: Add ara and tzdata as installed requirements for the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/93491700:15
opendevreviewMerged zuul/zuul-jobs master: Switch logo color in docs pages to dark blue  https://review.opendev.org/c/zuul/zuul-jobs/+/93445300:37
clarkbI've been semi regularly checking the gerrit queues for the last ~10 minutes or so and index interactive status:abandoned tasks keep showing up00:55
clarkbI suspect these are tasks handling those queries?00:55
clarkbinteresting though as status:abandoned is a weird query to get consistently00:56
clarkbgitea13 is having a bit of a sad (I noticed since some replication tasks were hanging around)01:00
clarkblooks like a gitea process is consuming all of the memory. I half wonder if gitea has a memory leak now01:00
clarkbwe had attributed memory pressure to git processes but in this case git seems fine01:00
clarkbI'm going to stop and start gitea on gitea1301:00
clarkbhttps://github.com/go-gitea/gitea/issues/31565 seems like a likely culprit though no known solution (just a workaround that may or may not help)01:06
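[editor's note] For reference, the stop/start clarkb describes is typically just a compose bounce; the compose file location on gitea13 is a hypothetical here, not taken from the log:

    # hedged sketch -- the directory is an assumption
    cd /etc/gitea-docker
    docker-compose down && docker-compose up -d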
clarkbI haven't seen anything out of the ordinary around 0100 in gerrit queues fwiw01:06
clarkbmaybe it was coincidence or we need more stars to align01:06
clarkbnoonedeadpunk: ^ fyi I'm beginning to suspect that this may result in your request timeouts01:08
tonybWhat distros do we care to "support" in system-config?  LTSs only? > Xenial?01:11
fungicurrently yes i think01:11
clarkbyes I think we actually started to remove xenial when there were some recent problems?01:12
tonybThis is in reference to https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/pip3/tasks/main.yaml#L17 and trying to get noble support moving along01:12
fungiif pip3 only exists to handle <=xenial at this point, then happy to see it deleted01:12
clarkbI guess that is an important distinction though. We do still run on a small number of xenial nodes01:12
tonybpython3-distutils no longer exists on noble01:13
clarkbso removing existing support like that is likely to break something01:13
clarkbtonyb: I would probably rewrite that to include vars listing the packages to install or include tasks for each release that install the right packages01:14
clarkbthe default can be noble (so we're forward looking) then have special cases for the older stuff01:14
clarkbfungi: that role exists to install pip from upstream I think01:14
tonybI'll take a gently-does-it approach01:16
clarkbya you can also just != xenial and != noble01:16
clarkbthat's probably the minimal change01:16
tonybYeah01:16
fungiah, yes i see it does do different things on xenial, bionic, and newer01:17
clarkbre status:abandoned tasks there appears to be some researcher requesting details for all abandoned changes01:19
clarkbgrep for abandoned in the apache logs and you'll see them. I don't think they are impacting things negatively so seem to be throttling well and I'll leave it be01:19
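[editor's note] A quick sketch of the check clarkb describes; the apache access log path on the gerrit server is an assumption:

    # count requests for abandoned-change queries (log path is hypothetical)
    grep -c abandoned /var/log/apache2/gerrit-ssl-access.log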
clarkband now I need to find dinner01:20
*** fungi is now known as Guest924901:33
*** kinrui is now known as fungi01:41
opendevreviewSuzan Song proposed opendev/git-review master: Support running on systems setting GIT_SSH_COMMAND environment variable  https://review.opendev.org/c/opendev/git-review/+/93474502:25
opendevreviewSuzan Song proposed opendev/git-review master: Support running on systems setting GIT_SSH_COMMAND environment variable  https://review.opendev.org/c/opendev/git-review/+/93474502:29
tonybGah this is way more complex than I'd have liked, as python is now 'externally-managed', so installing pip, and installing packages with that pip, need additional args.03:10
tonybor perhaps extra setup. I suspect we don't really want to install pip and others into a venv03:11
opendevreviewSuzan Song proposed opendev/git-review master: Support running on systems setting GIT_SSH_COMMAND environment variable  https://review.opendev.org/c/opendev/git-review/+/93474504:51
opendevreviewMerged openstack/project-config master: Fix openstack developer docs promote job  https://review.opendev.org/c/openstack/project-config/+/93483207:31
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404508:09
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404509:51
opendevreviewTony Breeds proposed opendev/system-config master: Add ara and tzdata as installed requirements for the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/93491709:52
opendevreviewTony Breeds proposed opendev/system-config master: Install ARA master in the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/92401209:52
opendevreviewTony Breeds proposed opendev/system-config master: Add some debugging commands to the post job  https://review.opendev.org/c/opendev/system-config/+/92566709:52
opendevreviewTony Breeds proposed opendev/system-config master: Update pip3 role to work on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93493709:52
fricklerI reenqueued the api-site promote job after corvus' fix merged earlier and it worked, yay https://zuul.opendev.org/t/openstack/build/74d666f516834e14814cc83ea0688e5910:59
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404511:17
opendevreviewTony Breeds proposed opendev/system-config master: Update pip3 role to work on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93493711:23
opendevreviewTony Breeds proposed opendev/system-config master: Add ara and tzdata as installed requirements for the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/93491711:23
opendevreviewTony Breeds proposed opendev/system-config master: Install ARA master in the ansible-devel job  https://review.opendev.org/c/opendev/system-config/+/92401211:23
opendevreviewTony Breeds proposed opendev/system-config master: Add some debugging commands to the post job  https://review.opendev.org/c/opendev/system-config/+/92566711:23
tonybWhy do we have docker and docker-compose on bridge01/bridge99? (I admit I haven't unlocked my keys to look on the live systems)11:58
jaltmanhttps://www.openafs.org/pages/security/#OPENAFS-SA-2024-00113:39
jaltmanhttps://www.openafs.org/pages/security/#OPENAFS-SA-2024-00213:39
jaltmanhttps://www.openafs.org/pages/security/#OPENAFS-SA-2024-00313:39
opendevreviewDaniel Bengtsson proposed opendev/irc-meetings master: Oslo update meeting time.  https://review.opendev.org/c/opendev/irc-meetings/+/93464813:43
fungithanks jaltman! doesn't look like those have hit the oss-security ml yet13:52
opendevreviewJoel Capitao proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404514:16
slaweqfrickler fungi ianw clarkb hi, thx for reviews and help with the rtd webhook job, it is working fine again now: https://zuul.opendev.org/t/openstack/build/dd2ab6308a9748aab3386555264d894b14:28
fungithanks for confirming, slaweq!14:28
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404514:33
opendevreviewJoel Capitao proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404516:12
clarkbI've rechecked https://review.opendev.org/c/zuul/zuul-jobs/+/680178 to see if rax keystone is happier now. Their status page is a bit ambiguous as today is marked green but the last incident doesn't have a resolved timestamp16:49
clarkbtonyb: some tools are run from docker iirc. I can't remember which ones though as openstackclient isn't16:50
clarkbtonyb: re pip and being externally managed this would be for noble I'm guessing? I think maybe we do need to shift our install paradigm and use python3 -m venv targets for the python stuff we install16:51
clarkbtonyb: then I would expect the venv module to install pip for us in that particular venv. So less have a venv for pip and more just put what we want in venvs?16:51
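[editor's note] A minimal sketch of the venv-per-tool approach clarkb suggests for Noble, where PEP 668 marks the system python as externally managed; the /usr/local/some-tool path and package name are illustrative assumptions:

    # python3 -m venv bootstraps pip inside the venv via ensurepip,
    # so no system-wide pip install is needed
    python3 -m venv /usr/local/some-tool
    /usr/local/some-tool/bin/pip install some-tool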
fungipip can also be used outside a venv. these days there's a directly importable zip too, so "installing" pip is as simple as dropping that file somewhere python can find it16:58
fungi(used outside the venv it's managing, so you could even install pip in one venv and tell it to manage another venv that has no pip in it at all)16:59
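[editor's note] Both of fungi's points, sketched under stated assumptions: the standalone zipapp is published at bootstrap.pypa.io/pip/pip.pyz, and pip's --python option (pip >= 22.3) lets a pip in one place manage an environment that has no pip of its own. Paths and the package name are illustrative:

    # run pip straight from the zipapp, no installation step
    python3 pip.pyz --version

    # create a venv without pip, then manage it from outside
    python3 -m venv --without-pip /tmp/pipless-venv
    pip install --python /tmp/pipless-venv/bin/python requests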
clarkb#status log Manually deleted /nodepool/images/debian-bookworm-arm64/builds/08c5105a1f5f44a487ad691cc9a97147 in nodepool zk database to remove a corrupt entry preventing other record cleanup from occurring17:23
opendevstatusclarkb: finished logging17:23
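[editor's note] The log doesn't record the exact deletion mechanism; one plausible way, assuming zkCli.sh on a zookeeper host, would be:

    # deleteall recursively removes the znode and its children (zookeeper >= 3.5)
    zkCli.sh -server localhost:2181 deleteall \
        /nodepool/images/debian-bookworm-arm64/builds/08c5105a1f5f44a487ad691cc9a97147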
opendevreviewJay Faulkner proposed openstack/diskimage-builder master: Reapply "Make sure dnf won't autoremove packages that we explicitly installed"  https://review.opendev.org/c/openstack/diskimage-builder/+/93499217:27
clarkbI have rechecked 680178 for additional data but the first pass was happy. If second pass is also happy I'll push up a revert17:44
clarkbI'm going to send the gerrit upgrade announcement momentarily17:52
clarkband sent18:01
opendevreviewClark Boylan proposed opendev/base-jobs master: Revert "Disable rax job log uploads"  https://review.opendev.org/c/opendev/base-jobs/+/93499518:04
clarkbnot super urgent but it looks like rax swift log uploads are happy again per https://review.opendev.org/c/zuul/zuul-jobs/+/680178 so I think we can proceed with the revert when ready18:04
corvusand then also we're clear for the dry run registry prune18:06
clarkbyup I'm currently checking the image version on insecure-ci-registry18:07
clarkbseems to match the latest tag in quay.io18:08
clarkbnext up is figuring out how to run the command. I don't think we want to docker exec in the existing container but instead run a new container?18:08
clarkb(mostly because this may take some time)? then do something like tee the output to a logfile and also have it emit to the console in a screen session?18:08
corvusclarkb: are you concerned we may want to restart the container while it's running?18:09
corvusthe automatic upgrade will reboot the whole server though, so even if we started a new container, it would get killed for that18:10
corvusso if that's the only concern -- i think you could just run it in the existing container with an exec18:10
clarkbcorvus: ya mostly worried that we'd do out of band container updates. Note I think zuul-registry just updates hourly and isn't part of the weekly upgrades and reboots18:11
clarkblooks like `docker-compose run foo $cmd` should override the container command which I'm checking now18:11
clarkbthere is no entrypoint and only a zuul-registry CMD so I think we can run this using `docker-compose run registry /usr/local/bin/zuul-registry -c /conf/registry.yaml prune --dry-run` ?18:12
clarkbalternatively replace run with exec if we think having a second container will cause problems18:12
corvuswe don't merge much to zuul-registry; i'd probably just do an exec.  you can write the output to, say, the conf directory if you want to get it out of the container.18:13
clarkback18:13
corvusi don't think it'd cause problems, i just thought exec would be easier.  either works i think.18:13
corvusand yeah, that cmd lgtm18:13
clarkbyour note about logging paths has me wondering if tee would even work since it would be in the container context? I guess I could try running it in a subshell or something but keeping this simple is probably best18:14
clarkbor actually no, the parser in the parent shell should know everything after the | is on the host side18:15
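[editor's note] A sketch of the approach being settled on here: the host shell parses the pipe before docker-compose ever runs, so tee executes host-side while the prune runs in the container. The log file name is an assumption:

    docker-compose exec registry \
        /usr/local/bin/zuul-registry -c /conf/registry.yaml prune --dry-run \
        2>&1 | tee /var/log/registry-prune-dry-run.log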
clarkbcorvus: I've got a screen running on insecure-ci-registry now with a command queued up if you want to sanity check it there18:17
corvusclarkb: that seems reasonable to me -- but i'm only like 51% confident those redirects are going to the right command.  :)18:19
corvusone way to find out18:19
clarkbya I can test it with ls or similar first actually18:20
clarkband now we're back to using exec :)18:21
clarkbhuh the failed docker compose run did tee as I expected but the exec didn't. Weird18:23
corvusi think it did?18:23
corvuswasn't the last line registry.yaml?18:23
clarkboh ya it did18:23
corvusso yeah it got both18:23
clarkbsorry, I'm just reading poorly. ok, I'll run this as an exec using the redirect and pipe then18:24
corvusmaybe rm and run again18:24
corvusjust to be sure :)18:24
clarkback18:24
clarkbya that lgtm.18:25
clarkbready to run the queued up command?18:25
corvusyep18:25
corvusokay that's a lot of uploads :)18:26
clarkbcorvus: interesting that it seems to be saying it would prune the actual object files and the metadata but not the parent sha location18:26
corvusthe uploads are just a staging area18:27
corvusso i think those are basically incomplete pushes18:27
clarkbah18:27
clarkbshould it prune the actual sha path too then?18:27
clarkb(I'm just wondering if there is an underprune there based on that log output)18:28
corvusoh like the containing directory?18:28
clarkbbut it might be best to let it run to completion then investigate the log18:28
clarkbya18:28
clarkbthose objects will be tiny but they accumulate as objects to track18:28
clarkbnot sure if that is a big deal or not18:28
corvusi think with swift it doesn't matter because i don't think that containing dir is a real object... but it probably should for the file backends18:28
opendevreviewMerged opendev/base-jobs master: Revert "Disable rax job log uploads"  https://review.opendev.org/c/opendev/base-jobs/+/93499518:29
clarkbchecking free on the server shows we're not filling memory tracking all of these records (one of my concerns with doing a prune after so much time)18:30
corvusoh i think i see what you mean18:30
corvusthat "Keep" line is weird isn't it18:30
clarkbya18:30
corvusit should only say that if it has a child under it that was kept (should be no) or its ctime is > the target time.18:32
corvusmaybe in swift, since it's not a real object, it's the ctime that's tripping it?18:32
corvus(if that's the case, then i think it's notabug)18:32
corvusyeah, we set the ctime to now in our swift handler for subdirs18:34
corvusso they all basically show up as immediately created; but i think they should "disappear" once they have no children18:34
clarkbgot it so multiple passes would take care of it18:36
corvusi don't think multiple passes are necessary18:37
corvuslike, it doesn't actually need to be removed because it's swift18:37
corvusit doesn't exist :)18:37
clarkboh I see what you are saying they are virtual paths18:37
corvusyep18:37
clarkbthey only exist because they have children I get it now18:37
clarkbnot that they exist within our pruning system because they have children, but in swift itself18:37
corvusand if we ran this with the file backend, it would say "Prune" instead of "Keep" because it would really exist and have a real ctime18:37
corvuslooking ahead at the other stuff -- zuul-operator is a good microcosm because it doesn't change much18:38
corvusit looks like we're keeping a handful of manifests for the operator and deleting most of the others.  that seems about right.18:38
corvusactually zuul-registry may be more interesting...18:39
corvusit looks like we're keeping no zuul-registry manifests?  shouldn't we keep like... 2?  for our recent changes?18:40
clarkbcorvus: yes I would've expected the two most recent commits (bugfix and --dry-run) to have manifests and objects that are kept18:42
corvusmaybe they didn't get pushed to the intermediate registry...18:43
clarkbcorvus: we're keeping some of the quay.io/zuul/zuul-registry manifests18:43
corvusoh was i looking at the old name?  i bet i was18:44
clarkbI think what that is doing is pruning all of the old docker hub prefixed (or lack of prefix) manifests but then after we migrated to quay.io it is deciding to keep some?18:44
corvushttps://zuul.opendev.org/t/zuul/build/9895a96eab724dbeb7abd04e76e1cce3/18:44
corvusi'm going to spot check that build18:44
clarkb++18:45
corvus2024-11-13 18:28:10,522 DEBUG registry.storage: Keep _local/repos/quay.io/zuul-ci/zuul-registry/manifests/9895a96eab724dbeb7abd04e76e1cce3_latest18:45
corvusnow to find the layers for it18:45
corvusdocker pull insecure-ci-registry.opendev.org:5000/quay.io/zuul-ci/zuul-registry:9895a96eab724dbeb7abd04e76e1cce3_latest18:47
corvusi think we need to keep all of these objects; https://paste.opendev.org/show/bqTsOb6Mz95mgZRS62R4/18:49
corvusthis is probably the part of the prune where we start to use a lot of ram18:50
clarkbfree seems to report things are still happy18:52
clarkbpython also overallocates for things like lists pretty aggressively, so we may have expanded usable memory earlier in the process, and now we're just actually going to use it for real records rather than python's speculative overallocation18:52
corvuswe get a list of manifests at the start, then we get the contents of those manifests.  using that, it looks like we're maybe 40-50% through the list of manifests18:55
corvusso maybe another 20-30m to finish that, then it's on to removing blobs18:56
clarkbcorvus: another thought which I didn't consider when reviewing the code: are prune runs resumable if interrupted? I think the answer is yes, but maybe we need to not delete manifests until after we delete blobs? Is the only way to get the blobs with a manifest?18:58
clarkbactually no, we list all blobs and then compare to those in manifests we want to keep, removing the others, right?18:58
clarkbso it should be resumable without orphaning objects18:59
clarkb(just thinking this will take a lot of time so being resumable is a good thing)18:59
corvusyep, that's the idea19:00
corvusit should be resumable19:00
clarkbmemory use is inching up but very slowly and we still have plenty of room19:01
corvusbiab19:03
corvusthis is going much slower than my initial estimate.19:36
corvusoh actually maybe not that off.  we're at blobs now.19:37
clarkband memory has remained stable19:37
clarkboh it completed19:47
clarkbcorvus: I'm not seeing 362a321e8da1d7ce58567f73e4eef005677d6d8a8b0115612d7b08fba97b60c0 in the log (no keep and no prune entries nothing at all)19:49
clarkbmaybe the format is different and I have to add some /'s or something19:49
clarkbI wonder if it stopped early for some reason19:50
corvushrm i'm looking too19:50
clarkbrc code was 019:51
clarkbwe didn't get OOMKilled19:52
corvusclarkb: i think we don't log anything about blobs we keep19:52
clarkbah that would also explain it19:52
corvusso absence of delete lines for all of the shas in that paste == success19:53
clarkb2024-11-13 19:41:09,730 DEBUG registry.storage: Keep _local/blobs/sha256:02bf457d870ff6d4e274ea79f5ce5b7bfdf5041d6ac11c5be1ba23b1cec59977/ <- that is a keep line fwiw but maybe we don't do it in every case?19:53
corvuswe deleted that blob19:53
corvusthe keep is the swift virtual subdir again19:54
clarkbaha19:54
corvusokay i checked all the shas in the paste from earlier and it looks like we did not delete them19:55
clarkbexcellent19:55
clarkbthe actual pruning will probably take much longer? I'm not sure if the delete calls are super slow but at the very least we would be adding a good chunk of them to the runtime we just had19:56
corvusi think my inclination is to say this passes the sniff test, leave this log in place, then run the prune for real as scheduled and redirect it to a second log19:56
clarkband it took just over an hour and 15 minutes in dry run mode19:56
clarkb++19:56
corvusif something goes wrong with the real prune, we will have lots of info for debugging19:56
corvuswe would have deleted 184564 objects19:57
corvusso... probably 185k by friday :)19:57
corvusmaybe fungi has a feeling for how long each individual delete call might take on average...19:58
corvusif they take a second, that's an extra 51 hours.  honestly not bad for this and i probably would just let it run; we're in no rush.  :)19:59
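[editor's note] The back-of-the-envelope arithmetic holds up:

    # 184564 deletes at ~1 second each, expressed in hours
    echo "scale=1; 184564 / 3600" | bc
    # -> 51.2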
fungii think when i was testing i was doing it through the rackspace dashboard, and it was timing out trying to do recursive operations, so i don't really have a good number20:01
clarkbcorvus: should I stop the screen now?20:05
Clark[m]I guess we can leave it in place and use it on Friday 20:33
opendevreviewDoug Goldstein proposed openstack/project-config master: Update more ironic project ACLs for editHashtags  https://review.opendev.org/c/openstack/project-config/+/93502220:47
clarkbcorvus: fwiw I double checked the playbook that manages the zuul cluster upgrades and reboots and it doesn't touch zuul registry as far as I can tell20:51
clarkbso we should be in the clear for running this friday without interference from that20:51
gouthamro/ is there a document regarding multinode setups in a zuul context? I want to know if we have any other networking options beyond the public network..  21:00
gouthamrthe use case is to be able to bind a service on a "controller" node and expose it to OpenStack VMs on the compute node.. 21:01
clarkbgouthamr: https://docs.opendev.org/opendev/infra-manual/latest/testing.html is the closest thing we have to that document and no you don't get highly configurable networking because half the clouds don't let you configure it at all. Instead you have to build that within your test system21:01
clarkbgouthamr: there are zuul roles (that devstack uses iirc) to build vxlan overlay networks that allow you to virtually create somewhat arbitrary network configurations21:01
clarkbfor example devstack multinode uses this to be able to ssh to VMs on the compute node(s) from the controller21:02
clarkbthat's very similar to your use case but instead of a client reaching a server on the VM it's the other way around21:02
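[editor's note] The multi-node-bridge role in zuul-jobs automates this kind of overlay; purely to illustrate the underlying mechanism, a hand-rolled vxlan tunnel between two test nodes looks roughly like this (interface name, VNI, and addresses are assumptions):

    # on the controller; PEER_IP is the compute node's routable address
    sudo ip link add vx0 type vxlan id 100 remote "$PEER_IP" dstport 4789 dev eth0
    sudo ip addr add 172.24.4.1/24 dev vx0
    sudo ip link set vx0 up
    # mirror these on the compute node with remote/address swapped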
gouthamrclarkb: ah yes, thank you; i can go look in that direction.. 21:02
gouthamrand i'm trying to build a devstack multinode job, so maybe i don't need to reinvent the wheel :) 21:03
corvusclarkb: i've detached; i don't think we need to keep it but i'll leave it up to you21:04
clarkbcorvus: ack I made note of the command I ran so I can just drop the --dry-run next time21:04
corvusinfra-root: i would like to examine the performance around the zuul time query, so i want to enable the slow query log on the production mariadb server that zuul uses.  i don't think i'll need this in place for long (maybe 24h?), and the least disruptive way to enable it is to enable it at runtime with the cli.  i'd like to do that today if there aren't any objections.21:10
clarkbI don't have any. I guess set a reminder to turn it off so that we don't fill the disks with logs?21:11
corvus++21:12
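[editor's note] A minimal sketch of the runtime toggle corvus describes, assuming CLI access to the mariadb instance; the file path and threshold are illustrative:

    mysql -e "SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow-query.log';"
    mysql -e "SET GLOBAL long_query_time = 1; SET GLOBAL slow_query_log = ON;"
    # and later, per clarkb's reminder, turn it back off before the disk fills
    mysql -e "SET GLOBAL slow_query_log = OFF;"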
corvus#status log enabled slow query log on zuul-db0122:08
opendevstatuscorvus: finished logging22:08
*** tkajinam is now known as Guest933822:33
opendevreviewJeremy Stanley proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.13  https://review.opendev.org/c/opendev/infra-openafs-deb/+/93502522:47
opendevreviewJeremy Stanley proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.13  https://review.opendev.org/c/opendev/infra-openafs-deb/+/93502522:59
opendevreviewJames E. Blair proposed opendev/system-config master: Add basic docs on updating the OpenAFS ppa  https://review.opendev.org/c/opendev/system-config/+/93502623:08
clarkbcorvus: small thing on ^23:10
opendevreviewJames E. Blair proposed opendev/system-config master: Add basic docs on updating the OpenAFS ppa  https://review.opendev.org/c/opendev/system-config/+/93502623:12
opendevreviewJeremy Stanley proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.13  https://review.opendev.org/c/opendev/infra-openafs-deb/+/93502523:36
opendevreviewJeremy Stanley proposed opendev/infra-openafs-deb jammy: Update Jammy to 1.8.13  https://review.opendev.org/c/opendev/infra-openafs-deb/+/93502523:40
