19:00:38 <clarkb> #startmeeting infra
19:00:38 <opendevmeet> Meeting started Tue Jan 21 19:00:38 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:38 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:38 <opendevmeet> The meeting name has been set to 'infra'
19:00:40 <clarkb> Hello!
19:00:47 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JWBLUYVPNULENDQWGEKO6VX27CVXQLGO/ Our Agenda
19:01:02 <frickler> \o
19:01:09 <clarkb> #topic Announcements
19:01:22 <clarkb> I didn't have anything to announce. Did anyone else?
19:02:21 <clarkb> sounds like no. Let's dive into the agenda then
19:02:23 <clarkb> #topic Zuul-launcher image builds
19:02:31 <clarkb> #link https://review.opendev.org/q/hashtag:niz+status:open is the next set of work happening in Zuul
19:03:01 <clarkb> I believe that a good chunk of this work landed last week and should be deployed now (via our weekly updates). But there are still some open changes last I looked, so I'm not sure if we need to get those in before this can proceed
19:03:32 <clarkb> corvus: ^ is this something where we're still waiting or is what landed sufficient to make progress?
19:05:28 <clarkb> we may not have corvus right now. We can continue and get back to this later if that changes
19:05:40 <clarkb> in any case some progress has been made, I'm just not sure how much yet
19:05:46 <clarkb> #topic Deploying new Noble Servers
19:05:59 <fungi> a noble endeavor
19:06:15 <clarkb> Progress has happened here as well. I migrated paste01's db to paste02 last week and updated DNS. The next step was to get backups working, which is where I ran into complications
19:06:32 <clarkb> the tl;dr is that we need borg ~1.2.8 to run on noble for compatibility with python3.12
19:06:55 <clarkb> we currently pin to 1.1.18 and there are some big nasty warnings from borg about mixing these. However, after much reading and attempting to understand the problems I think the risk to us is extremely low
19:07:39 <clarkb> basically what could happen is we could delete valid archives (backups) if we run `borg check --repair` on a borg 1.x created archive using 1.2
19:08:09 <clarkb> however, this particular issue seems to be unlikely for us because we never used a borg old enough to produce the archives that would now be considered invalid. And we don't automatically run borg check --repair anywhere
19:08:29 <corvus> (sorry for tardiness; almost ready to make progress on images; expect real progress by next week)
19:08:44 <clarkb> (personally I would've appreciated clearer and more direct communication of the problem from borg rather than the big scary messages we got, but we muddled through)
19:09:17 <clarkb> so anyway paste02 is backing up with borg 1.2.8 to servers running 1.1.18. Worst case today only paste02's backups would be impacted, but as mentioned I don't expect problems anyway
19:09:19 <clarkb> corvus: ack thanks
19:09:25 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/939667 fixup for warnings treated as errors with new borg spamming email
19:10:04 <clarkb> there is one little borg update annoyance though and that is borg 1.2 exits with rc 1 if there are warnings, so we've been spamming root email with backup failures that are most likely all warnings not errors. This change would treat rc 1 when using newer borg as a successful backup
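A minimal sketch of the exit-code handling 939667 describes, assuming a shell wrapper around the backup invocation (the repo path and backed-up directories are placeholders, not the actual system-config script):

```bash
#!/bin/bash
# Borg documents exit status 0 = success, 1 = warning, 2 = error.
# Hypothetical wrapper; repo and paths are illustrative.
borg create "$BORG_REPO::$(date +%Y-%m-%dT%H:%M:%S)" /etc /var/log
rc=$?
if [ "$rc" -le 1 ]; then
    # rc 1 means warnings only (e.g. a log file changed while being
    # read), so treat it as a successful backup instead of emailing root.
    exit 0
fi
exit "$rc"
```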
19:10:22 <clarkb> the most common warning (the only one I've seen anyway) is a warning for files changing while being backed up. This is common with log files in particular
19:10:49 <fungi> also the unknown unencrypted volume warning
19:10:51 <clarkb> I think if we can get 939667 or something like it in and confirm that backups behave nicely afterwards, we'll be in a spot to consider deleting paste01 and retiring its backups
19:11:11 <clarkb> fungi: that one we explicitly override though, and in testing it doesn't seem to cause rc 1 with our override
19:11:16 <clarkb> I thought it did at first but that was a red herring
19:11:45 <clarkb> basically if the unknown unencrypted volume warning is the only warning, you get rc 0 with our flag to ignore that warning
19:11:46 <fungi> ah, okay, so it shows up in the log as a warning but doesn't contribute to the return code?
19:12:12 <fungi> it was the only warning i saw logged in the one i looked at
19:12:21 <clarkb> yes that seems to be the case, as I had logs with that warning and no others exit 0
19:12:29 <clarkb> fungi: the other warning isn't logged as a warning
19:12:37 <fungi> bwahahahaha
19:12:38 <clarkb> it's just a message with no prefix
19:12:44 <clarkb> let me find an example really quickly
19:13:10 <clarkb> https://zuul.opendev.org/t/openstack/build/49255ec995394248a24c5eb1e11c9a68/log/borg-backup-noble.opendev.org/borg-backup-borg-backup01.region.provider.opendev.org.log#8560
19:13:13 <fungi> okay, so exits nonzero on warnings, logs warnings even when they're explicitly disabled, but also doesn't state that some warnings it logs are warnings
19:13:45 <clarkb> right. When the linked message doesn't appear we get rc 0 even with the explicit warning about unknown unencrypted volume
19:14:16 <clarkb> so anyway, long story short, I think this is mostly working now from podman to python 3.12 to borg, with the above change being the last known cleanup. And I think I'm happy to start on old server cleanups once the last issue is sorted
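Since that message carries no "Warning:" prefix, a simple way to check a given run for it is to search for the text directly (the log path glob is illustrative; the exact message text is quoted from production later in the meeting):

```bash
# Look for the unprefixed file-changed warning in a backup log; its
# presence explains an rc 1 run that otherwise logged no errors.
grep -F 'file changed while we backed it up' /var/log/borg-backup-*.log
```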
19:14:27 <clarkb> let me know if you have concerns or questions about that and we can dig in more and make sure we're happy with it
19:14:40 <clarkb> #topic Deploying Lodgeit entirely without Docker Hub
19:14:41 <fungi> for the record, the log that's currently /var/log/borg-backup-backup01.ord.rax.opendev.org.log.3.gz on paste02 is the one i was looking at
19:15:05 <clarkb> fungi: look for 'changed while' and you should see what I linked to above, just in prod
19:15:22 <fungi> /var/log/borg-backup-backup01.ord.rax.opendev.org.log: file changed while we backed it up
19:15:25 <fungi> indeed, it's in there
19:15:29 <clarkb> cool
19:16:01 <clarkb> this next topic is related to the previous one in that I realized we could take this opportunity of paste02 running podman and docker compose to test our assumptions about speculative image testing and switch it over to quay entirely
19:16:09 <clarkb> #link https://review.opendev.org/c/opendev/lodgeit/+/939385 Publish lodgeit image to quay.io
19:16:27 <clarkb> that change updates where we push the image to (quay instead of docker hub), then I still need to write a followup to pull it from quay if we want to proceed with that
19:16:39 <clarkb> I think it is a good idea to triple check our assumptions before we get too far down this path again
19:16:56 <clarkb> please review and let me know if you have any questions or concerns about that
19:17:14 <tonyb> ++
19:17:28 <tonyb> sounds like a good plan to me
19:17:43 <clarkb> #topic Upgrading Old Servers
19:17:53 <clarkb> I think we can move on; I mostly wanted to call out the effort, and discussion can proceed in review
19:18:06 <clarkb> tonyb: I know you were out until recently, but anything we need to do / think about re wiki?
19:19:07 <tonyb> nope. I'll send the announce email and switch the skin
19:19:22 <fungi> yay! so close
19:19:35 <clarkb> tonyb: did the changes still need some updates? I seem to remember planning to change the proxy setup maybe?
19:19:39 <tonyb> given your experience with paste do you think it's "safe" to go with noble?
19:19:54 <clarkb> I guess ping for reviews when that is ready and let us know if they have already been updated
19:20:01 <tonyb> will do
19:20:21 <clarkb> noble was definitely an uplift but I am hopeful I've sorted out the major items
19:20:39 <tonyb> okay
19:21:01 <fungi> just make sure you enable configdrive when launching it
19:21:41 <clarkb> anything else on this topic?
19:21:52 <fungi> (rackspace's jammy image doesn't need that, so it's not on by default, but our noble image needs it for cloud-init to work)
19:22:06 <tonyb> not from me.
19:22:15 <clarkb> ya because we uploaded tonyb's converted upstream noble image
19:22:25 <clarkb> I suspect rax does magic to make things work without a config drive, and they haven't uploaded noble yet
19:22:33 <fungi> rackspace presumably fiddles with the image they make available
19:22:36 <clarkb> ya
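For anyone booting one of these by hand, the knob fungi mentions looks roughly like this with openstackclient (the server name, image, flavor, and network names are placeholders):

```bash
# The converted upstream Noble image needs a config drive for
# cloud-init to find its metadata; rackspace's own images do not.
openstack server create \
    --image ubuntu-noble \
    --flavor example-flavor \
    --network example-net \
    --config-drive True \
    example01.opendev.org
```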
19:22:39 <clarkb> #topic Gerrit 3.10.4
19:23:20 <clarkb> last week we managed to get borg going for noble and upgrade gitea. The last major item on my list was updating gerrit to 3.10.4, but due to our prior gerrit restart experience going poorly and a holiday weekend approaching with basically only me around I decided to defer this to this week
19:23:51 <fungi> i'm happy to help with it basically any time this week
19:23:53 <clarkb> I think I'll aim for doing this tomorrow once I'm caught up, with the plan being to land the h2 db setting and also update to the newer point release
19:24:19 <fungi> links to changes/topic might help
19:24:20 <clarkb> this may require a couple of restarts to see it take effect, but otherwise it should be similar to most of our restarts
19:24:29 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/938000
19:24:42 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/939167
19:24:46 <clarkb> those are the two related changes
19:25:34 <fungi> both lgtm, thanks
19:25:38 <clarkb> the updates between 3.10.3 and 3.10.4 seem minimal and also safe
19:25:43 <clarkb> mostly bugfixes
19:25:47 <clarkb> #topic Running certcheck on bridge
19:26:14 <clarkb> fungi: is there a change for this yet? I suspect no given the distractions last week but wanted to double check
19:26:37 <fungi> i think we had rough consensus in favor last week, but no, i said i wouldn't have time to put that together until this week. thanks for the reminder!
19:27:13 <clarkb> yup I definitely didn't expect it yet. Just checking
19:27:19 <clarkb> #topic Service Coordinator Election
19:27:25 <fungi> worth noting though, the build failure which prompted me to start looking at all of that turned out to be related to a github outage and not rate limits, so it's not super urgent
19:27:32 <clarkb> ack
19:27:51 <clarkb> Last week I said that I would send email to make the proposed plan for this election official, then didn't do that. So this is now on my list for today
19:28:19 <clarkb> as a reminder, nominations open from February 4, 2025 to February 18, 2025; voting runs February 19, 2025 to February 26, 2025. All times are UTC based
19:28:51 <clarkb> #topic Beginning of the Year (Virtual) Meetup
19:29:06 <clarkb> and finally a reminder that we're trying to meet up several times this week (with the first block of time occurring in 1.5 hours)
19:29:12 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:29:29 <clarkb> the etherpad has the schedule info and a number of topics from myself
19:29:34 <clarkb> feel free to add items
19:30:05 <clarkb> frickler: do you know if the "early" blocks from 1800-2000 UTC tomorrow and the day after are something you'll be attending?
19:30:15 <clarkb> if not I suspect we may only use the "late" blocks from 2100-2300 UTC each day
19:30:27 <frickler> I won't, sorry
19:30:46 <clarkb> ok I'll go ahead and remove the early blocks. We'll just use the late blocks
19:31:00 <tonyb> sounds good
19:31:23 <fungi> updated my reminders accordingly
19:32:04 <clarkb> we can probably do additional self-organization when we jump on meetpad later today
19:32:13 <clarkb> #topic Open Discussion
19:32:35 <clarkb> This didn't make it onto the agenda, but I think we should push a Bindep release with all the changes up to, but not including, the switch to pyproject.toml
19:32:57 <clarkb> then we can switch to pyproject.toml and have bindep act as a canary for how that all works with PBR's proposed updates to better support that system
19:32:59 <fungi> oh, right
19:33:12 <fungi> i can also try to find some time to push that this week
19:33:32 <fungi> related, there are a couple of documentation updates for pbr in review related to pyproject.toml support
19:33:45 <clarkb> unfortunately the python packaging world is full steam ahead into pyproject.toml as an assumed tool, and python3.12 and newer are getting clunkier without it (it also creates tons of confusion for people when they need to manually install setuptools)
19:34:21 <clarkb> the changes should continue to support the old school method as long as you preinstall setuptools. But we're also trying to make pyproject.toml work for people who are confused by that
19:34:22 <fungi> but yeah, it would be great to have one or more simple opendev projects we could point at as examples of how to use pbr that way
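For a sense of what such an example would contain, a minimal sketch of a pbr-consuming pyproject.toml follows (the version pins are illustrative, and the exact backend configuration should be checked against the pbr documentation updates mentioned above):

```toml
# Minimal PEP 517 build metadata for a pbr-based project. With this in
# place, build frontends install pbr and setuptools themselves instead
# of requiring users to preinstall setuptools on python3.12 and newer.
[build-system]
requires = ["pbr>=6.0", "setuptools>=64"]
build-backend = "pbr.build"
```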
19:35:23 <clarkb> also we updated gitea last week, which includes a "fix" for the memory leak
19:35:27 <fungi> also it was a good exercise for mapping out the remaining rough edges and future possible polish around pbr's support there, as well as fixing up its documentation
19:35:38 <clarkb> basically they disabled fuzzy search by default, as fuzzy search is apparently very memory hungry
19:35:49 <clarkb> fungi: ++
19:36:28 <clarkb> we also had to emergency apply a new user agent filter string for a valid but old version of edge that was impacting service availability
19:36:44 <clarkb> very likely more ai crawler bots not being nice
19:37:35 <clarkb> I found a hacker news post from someone else who had to take their gitea off the internet due to similar problems. The discussion on hacker news about it was interesting, as apparently different crawler bots respect robots.txt in different orthogonal ways that people have inferred over time
19:37:57 <clarkb> like apparently OpenAI will only respect entries specifically for its user agent string and not the generic top level rules
19:38:09 <clarkb> and others have noticed that crawl-delay has really inconsistent support
19:38:16 <fungi> but also there seemed to be some consensus that gitea is not designed to withstand aggressive crawlers, worse than many web applications anyway
19:38:25 <clarkb> https://news.ycombinator.com/item?id=42750420 this was the post
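To illustrate the quirk described above, operators reportedly end up duplicating generic rules under specific agent names (the paths here are placeholders, and GPTBot is given only as an example crawler user agent):

```
# Generic rules: some crawlers honor these, and crawl-delay support
# is inconsistent across bots.
User-agent: *
Disallow: /expensive/endpoint/
Crawl-delay: 10

# The same rule repeated for a specific crawler, since some bots
# reportedly only apply entries naming their own user agent.
User-agent: GPTBot
Disallow: /expensive/endpoint/
```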
19:39:47 <fungi> as well as general lament that it's starting to seem like the only viable solutions are to start relying on cdn-oriented ai filtering service providers
19:39:53 <clarkb> what else? there was a small zuul blip with a bug that would impact a subset of playbook runs. corvus took care of that over the weekend before it was a larger problem. Thank you for the quick turnaround on that
19:40:24 <corvus> i had help; someone wrote a patch first, but i did restart over the weekend. :)
19:40:38 <corvus> also, i wrote the bug in the first place :(
19:41:05 <fungi> if you don't write them, who will?
19:41:32 <clarkb> I'll give it a few more minutes, but we may get 15 minutes back for $meal today
19:41:32 <corvus> (fun story: it was the result of a particularly gruesome git conflict resolution. it's the biggest fail of git merge i've seen. i basically had to just completely reconstruct everything manually from the diffs)
19:41:40 <clarkb> oof
19:41:54 <fungi> i hate it when that happens
19:42:24 <fungi> e.g. when git merges a chunk to the wrong part of the file due to multiple context matches
19:42:41 <corvus> yep
19:42:55 <corvus> it got 1 line right and the remaining 70 lines wrong.
19:44:19 <clarkb> sounds like that is everything. Thank you everyone. We'll be back here next week, same time and location. We're also going to hang out on meetpad from 2100-2300 UTC today, tomorrow, and thursday to go over higher-level topic discussion
19:44:24 <fungi> thanks clarkb!
19:44:26 <clarkb> see you there in about 1.25 hours
19:44:31 <clarkb> #endmeeting