Tuesday, 2025-12-09

opendevreviewClark Boylan proposed opendev/system-config master: Build haproxy and zk statsd containers on Trixie  https://review.opendev.org/c/opendev/system-config/+/97020800:21
clarkbthese two images are our simplest based on python. They don't even use the assemble routines. I think if those look good then we do an irc bot or three then gerrit00:22
clarkbreading the gerrit 3.12 release notes I fear they may have made the h2 cache situation worse by updating to h2 v2 because now unclean shutdowns corrupt the cache files00:33
clarkbpreviously it seemed like we could shut down and h2 would be ok but https://www.gerritcodereview.com/3.12.html#known-issues implies this may not always be the case with h2 v2 :/00:34
clarkbthis will require some investigating. I'm also not sure I understand the implications of https://www.gerritcodereview.com/3.12.html#gitattributes-configuration-support-in-jgit-merge-driver is that maybe for the web editing of rebases/merges allowing you to edit things with markers in the browser? Otherwise it is an error to merge something in gerrit with conflicts so not sure why00:35
clarkbconfiguring the way conflict tags are handled is helpful00:35
clarkbI'll have to ask some questions as we spin up on the next upgrade00:35
*** darmach3 is now known as darmach04:26
*** dhill is now known as Guest3321613:29
mordredclarkb: I think the gitattributes thing links to a not very helpful explanation. I'd be willing to bet that's more about honoring gitattributes for things like EOL markers and binary file handling14:04
fungii'd buy that explanation14:46
clarkbmordred: ah ya could be that it is less about merge conflict behavior and other behaviors like that. I can ask Nasser who wrote the change for more details though15:02
opendevreviewMerged openstack/project-config master: Cleanup Monasca infra  https://review.opendev.org/c/openstack/project-config/+/97019315:55
clarkbinfra-root: gerrit 3.11 -> 3.12 upgrades require an h2 cache version change. I believe the way the current upgrade test is working is h2 must be recreating the database from scratch. If we do that in production I believe everyone will be logged out of the server during the upgrade. The alternative is to convert the underlying h2 cache dbs from v1 to v2 with this tool15:58
clarkbhttps://manticore-projects.com/H2MigrationTool/index.html15:59
clarkbis that something we think we would like to do? Do we think this software source is trustable? (I think the account session cache is sensitive info)15:59
clarkbanother concern is that this tool might be really slow with our large h2 v1 files. So if we do the conversion route we're probably actually going to do so selectively for caches we believe are important to preserve like the login sessions cache16:01
clarkband then let the others rebuild from scratch16:01
clarkbpart of me is thinking forcing everyone to log back in may not be the worst thing16:07
clarkbfungi: re project renames: https://zuul.opendev.org/t/openstack/build/3730086c8554461aa74ec28f7966c055/log/bridge99.opendev.org/ansible/bootstrap-and-test-review.yaml.log#956-1369 this is where we test that in our CI jobs and this log ran against gerrit 3.1116:46
fungioh perfect16:47
clarkbfungi: reading through the log I think my main concern is that the gerrit stop may timeout and we're not deleting caches16:47
clarkbhowever since we just restarted gerrit less than a week before the planned rename I suspect this won't be a big issue16:47
clarkbbasically gerrit needs to run longer for the caches to become a problem16:47
fungiyeah, it'll have been 5 days so should stop normally16:47
clarkbya I think normal prep steps are all that we need16:48
clarkbas a side note import playbook doesn't seem to log its own task name. I had a hard time finding where those logs were and had to look for the inner task names16:49
clarkba quirk of ansible I guess16:49
clarkbI approved the haproxy-statsd and zookeeper-statsd trixie update change. I'll watch it20:52
fungithanks20:54
fungilists.o.o is getting hammered again20:56
fungiit's starting to respond again, but i think it ran out of apache slots for a bit21:00
opendevreviewJeremy Stanley proposed opendev/project-config master: Add record for planned rename on December 12, 2025  https://review.opendev.org/c/opendev/project-config/+/97030721:11
*** parallax is now known as Guest3323721:14
clarkbfungi: do you think we should increase the slot count like we did with static? I feel like that is trickier with lists as its load is much higher under more typical usage levels21:37
fungi5-minute load average was only up around 6-7 when i was getting no response out of apache21:39
fungiso yeah, i think it varies21:39
fungiclarkb: oh, james replied to you almost immediately!21:42
clarkboh cool. i also followed up with EMS too21:43
clarkbso all the promised emails are off to their destinations21:43
clarkbfungi: did you have any thoughts on the testing I added to gerritlib here: https://review.opendev.org/c/opendev/gerritlib/+/970142 Mostly I'm trying to get ahead of the gerrit releases so that it's a proactive rather than post upgrade thing21:44
fungiclarkb: one question on that change, mostly for my own education21:58
clarkbfungi: oh sorry that was the main thing that needed fixing. Gerrit 3.11 and newer requires all edits to refs/meta/config go through code review unless you have force push permissions to the ref21:59
fungimostly making sure that wasn't cruft from some separate change that accidentally got picked up in the diff21:59
clarkbfungi: they basically decided that what we've been doing with manage-projects and project-config for 15 years is how everyone should use gerrit last year21:59
clarkbso we've had to update all the gerrit testing in zuul and system-config etc to push a patch to add force push then code review to land it to get manage-projects to work22:00
clarkbI had a note about that in the commit message at first but when I started going off on making all the versions work I think I rewrote it and dropped it (which in hindsight was a mistake)22:00
fungino worries, i really just wanted to know where it came from22:00
fungialso interesting that you'd need to push --force when it's a fast-forward update22:01
clarkbyes because they think all updates to refs/meta/config should be code reviewed. Which we think as well we just solved it outside of gerrit forever ago22:01
fungior is it that they require permission to use push --force even if you're not specifying the --force flag?22:02
clarkbit's that the permission that lets you bypass the implicit requirement for code review is the force push permission22:02
clarkbthey are overloading git terms/behavior a bit22:02
fungithat's a weird option to overload, yeah22:02
fungii wonder why they didn't make it a separate permission/option, but whatever22:02
clarkbthey did actually. But I tested it and it doesn't work22:03
fungioh weird, so this is a workaround for their broken permissions model?22:03
clarkbwhich I asked them about and never got a response on. It probably is a bug but ya22:03
clarkbyes pretty much22:03
clarkbfungi: it's a gerrit.config server wide option that basically says don't require code review on refs/meta/config22:03
clarkbbut in my testing of it setting it didn't change the behavior22:04
clarkballowing force push does22:04
fungimind if i push up a followup change to summarize this situation in a code comment? guessing you asked about it in discord, not a bug report22:04
clarkbyes it was on discord22:04
fungicool, just making sure there's no bug url i should include in the comment22:05
clarkband no I don't mind. You can find similar justification for similar changes in system-config's gerrit testing and zuul's quickstart testing too22:05
fungiah, maybe i should add the comment to one of those instead22:05
fungi(or all of them)22:05
clarkbfungi: https://etherpad.opendev.org/p/gerrit-upgrade-3.11 line 31 and the block below covers some of this too22:05
fungiokay great, thanks22:05
clarkbit links to where I tested the setting of the config option though I'm guessing those logs have since been deleted22:05
fungidid you test without the config option and only +force in the acl?22:06
fungiwondering if it needs both to work22:06
clarkbfungi: yes that is the situation in production and our system-config test jobs today22:07
clarkb(we don't set the server wide config option at all only the +force acl)22:07
fungiokay, so just +force, the separate permission is currently pointless i guess22:07
clarkbyup and my question on discord was basically "I set this value and the behavior didn't change. Changing the acl works. What is the point of the option in this case" and never got a response22:08
clarkbI feel like this is one of those situations where being an early adopter (another is pbr) creates some inflexibility and strong opinions whereas for others they're still learning as they go22:12
opendevreviewClark Boylan proposed opendev/system-config master: Build accessbot on trixie  https://review.opendev.org/c/opendev/system-config/+/97032122:13
fungilooks like we may not need anything similar in git-review testing, we seem to run it with the default acl22:14
clarkbya I think this may be very related to manage-projects needing to push directly to the acls22:15
clarkbif the default acls work for you then it's a noop22:16
opendevreviewClark Boylan proposed opendev/system-config master: Update Hound Container to Debian Trixie  https://review.opendev.org/c/opendev/system-config/+/97032222:25
opendevreviewMerged opendev/gerritlib master: Update Gerrit integration testing to test many Gerrit versions  https://review.opendev.org/c/opendev/gerritlib/+/97014222:26
opendevreviewMerged opendev/system-config master: Build haproxy and zk statsd containers on Trixie  https://review.opendev.org/c/opendev/system-config/+/97020822:27
opendevreviewClark Boylan proposed opendev/system-config master: Update matrix-eavesdrop container to build on Debian Trixie  https://review.opendev.org/c/opendev/system-config/+/97032522:27
clarkbok I think that set of three additional changes is a reasonable set to get in before Gerrit moves to trixie. We could also do lodgeit potentially.22:28
clarkbthere are probably a couple other candidates I'm forgetting too, but the mix I've pushed up seems like decent sanity coverage so don't need to overthink it22:29
clarkbI've logged into both load balancers and zk01 and will check that their statsd containers update cleanly then move over to grafana to ensure the stats keep rolling in22:30
clarkbthe job is deploying now to all three22:30
clarkball three hosts I'm looking at have new containers now22:31
clarkbthe zookeepers go one by one so will be a few minutes for the other two servers in the zk cluster to update22:32
clarkbhrm zk02 isn't updating and the job just turned red22:33
clarkbhrm docker hub rate limits were hit but I thought we weren't using docker hub for this image any more22:34
clarkboh! it's the zk image not the zookeeper-statsd image that hit the rate limit22:35
clarkbI think I'm ok leaving it in this state it should recover on its own and if zk01's statsd are arriving that is good enough for now22:35
fungii guess we don't mirror zk's images to quay22:36
clarkbwe might. I was going to check that after looking at grafana22:36
fungifor that matter, i wonder if some of the projects we're mirroring to quay might have separately started to publish images to non-dockerhub registries on their own22:37
fungiwe can't be the only ones struggling with this22:38
clarkbhttps://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-30m&to=now&timezone=utc has data from zk01 in the last 5 minutes (if you click zk01 on the graph it will drop the data for the other two) (approximate data size and ephemeral node counts change often enough to see it is working)22:38
clarkbfungi: I think some of the issue is that docker itself is managing these library images in many cases22:38
clarkbrather than say zookeeper themselves22:38
clarkbor python etc22:38
clarkbhttps://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-5m&to=now&timezone=utc also has data from the gitea lb system so I think the new statsd container is working22:39
clarkbfungi: also ipv6 makes this a million times worse22:41
clarkbfungi: and I suspect many/most people are still ipv4 only22:41
opendevreviewClark Boylan proposed opendev/system-config master: Mirror zookeeper:3.9 to our quay.io mirror org  https://review.opendev.org/c/opendev/system-config/+/97032722:41
clarkbwe were not mirroring zookeeper:3.9 only :latest22:42
clarkb(they may be the same thing today I'm not sure but we shouldn't count on that long term)22:45
clarkbfungi: remind me: apt-key is the old thing right? And that would explain why trixie doesn't have it by default?22:47
clarkbfungi: instead we can write the ascii armored file directly to some path in /etc/apt/ ?22:47
*** parallax is now known as Guest3325022:50
clarkbaha you're supposed to put specific signed-by lines in apt sources list entries then stick the file wherever you like22:50
opendevreviewClark Boylan proposed opendev/system-config master: Update Hound Container to Debian Trixie  https://review.opendev.org/c/opendev/system-config/+/97032222:55
fungiclarkb: yes, you've found the answer now though23:01
clarkbya found a good enough pointer via google then from that discovered we already had an example in system-config23:02
fungiif you look at the main sources list on a trixie install you'll see clear examples23:02
fungiapt-key add was deprecated a couple of debian releases back, and then global keyrings more generally were deprecated too23:03
clarkbya seems like an improvement to make sure each key is only validating the packages you expect it to23:04
fungicorrect23:04
clarkbalso it is far more convenient to write these files to disk like any other file and refer to them23:04
clarkbrather than maitnain what is basicalyl a databawse23:05
clarkbwow I cannot type23:05
funginot that it's all that huge of an improvement from a security sense, because installing a deb runs maintscripts as root and can do any arbitrary thing the package creator wants to your system23:05
clarkbah so could just say trust my key for everything anyway23:05
fungipotentially. or just replace /bin/sh with your pwn3d backdoor23:06
opendevreviewClark Boylan proposed opendev/system-config master: Update Hound Container to Debian Trixie  https://review.opendev.org/c/opendev/system-config/+/97032223:11
clarkbI guess you'd need some way of limiting access to portions of the system per package/per repo and have that info get signed by some more authoritative key and at that point you've hobbled the ability of anyone to have their own package repo23:30
clarkband maybe you've reinvented snaps/flatpaks at that point23:30
fungiyeah, that's basically the challenge, snap/fpack or containers more generally23:39
