Tuesday, 2025-01-21

clarkbjust about meeting time18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Jan 21 19:00:38 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkbHello!19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JWBLUYVPNULENDQWGEKO6VX27CVXQLGO/ Our Agenda19:00
frickler\o19:01
clarkb#topic Announcements19:01
clarkbI didn't have anything to announce. Did anyone else?19:01
clarkbsounds like no. Let's dive into the agenda then19:02
clarkb#topic Zuul-launcher image builds19:02
clarkb#link https://review.opendev.org/q/hashtag:niz+status:open is next set of work happening in Zuul19:02
clarkbI believe that a good chunk of this work landed last week and should be deployed now (via our weekly updates). But there are still some open changes last I looked so not sure if we need to get those in before this can proceed19:03
clarkbcorvus: ^ is this something where we're still waiting or is what landed sufficient to make progress?19:03
clarkbwe may not have corvus right now. We can continue and get back to this later if that changes19:05
clarkbin any case some progress has been made I'm just not sure of how much yet19:05
clarkb#topic Deploying new Noble Servers19:05
fungia noble endeavor19:05
clarkbProgress has happened here as well. I migrated paste01's db to paste02 last week and updated DNS. The next step was to get backups working which is where I ran into complications19:06
clarkbthe tl;dr is that we need borg ~1.2.8 to run on noble for compatibility with python3.1219:06
clarkbwe currently pin to 1.1.18 and there are some big nasty warnings from borg about mixing these. However, after much reading and many attempts to understand the problems I think the risk to us is extremely low19:06
clarkbbasically what could happen is we could delete valid archives (backups) if we run `borg check --repair` on a borg 1.x created archive using 1.219:07
clarkbhowever, this particular issue seems to be unlikely for us because we never used a borg old enough to produce the archives that would now be considered invalid. And we don't automatically run borg check --repair anywhere19:08
corvus(sorry for tardiness; almost ready to make progress on images; expect real progress by next week)19:08
clarkb(personally I would've appreciated clearer and more direct communication of the problem from borg rather than the big scary messages we got but we muddled through)19:08
clarkbso anyway paste02 is backing up with borg 1.2.8 to servers running 1.1.18. Worst case today only paste02's backups would be impacted but as mentioned I don't expect problems anyway19:09
clarkbcorvus: ack thanks19:09
clarkb#link https://review.opendev.org/c/opendev/system-config/+/939667 fixup for warnings treated as errors with new borg spamming email19:09
clarkbthere is one little borg update annoyance though and that is borg 1.2 exits with rc 1 if there are warnings so we've been spamming root email with backup failures that are most likely all warnings not errors. This change would treat rc 1 when using newer borg as a successful backup19:10
clarkbthe most common warning (the only one I've seen anyway) is a warning for files changing while being backed up. This is common with log files in particular19:10
fungialso the unknown unencrypted volume warning19:10
clarkbI think if we can get 939667 or something like it in and confirm that backups behave nicely afterwards we'll be in a spot to consider deleting paste01 and retiring its backups19:10
clarkbfungi: that one we explicitly override though and in testing it doesn't seem to affect rc 1 with our override19:11
clarkbI thought it did at first but that was a red herring19:11
clarkbbasically if unknown unencrypted volume warning is the only warning you get an rc 0 with our flag to ignore that warning19:11
fungiah, okay, so it shows up in the log as a warning but doesn't contribute to the return code?19:11
fungiit was the only warning i saw logged in the one i looked at19:12
clarkbyes that seems to be the case as I had logs with that warning and no others exit 019:12
clarkbfungi: the other warning isn't logged as a warning19:12
fungibwahahahaha19:12
clarkbit's just a message with no prefix19:12
clarkblet me find an example really quickly19:12
clarkbhttps://zuul.opendev.org/t/openstack/build/49255ec995394248a24c5eb1e11c9a68/log/borg-backup-noble.opendev.org/borg-backup-borg-backup01.region.provider.opendev.org.log#856019:13
fungiokay, so exits nonzero on warnings, logs warnings even when they're explicitly disabled, but also doesn't state that some warnings it logs are warnings19:13
clarkbright. When the linked message doesn't appear we get rc 0 even with the explicit warning about unknown unencrypted volume19:13
clarkbso anyway long story short I think this is mostly working now from podman to python 3.12 to borg. With the above change being the last known cleanup. And I think I'm happy to start on old server cleanups once the last issue is sorted19:14
clarkblet me know if you have concerns or questions about that and we can dig in more and make sure we're happy with it19:14
clarkb#topic Deploying Lodgeit entirely without Docker Hub19:14
fungifor the record, the log that's currently /var/log/borg-backup-backup01.ord.rax.opendev.org.log.3.gz on paste02 is the one i was looking at19:14
clarkbfungi: look for 'changed while' and you should see what I linked to above just in prod19:15
fungi/var/log/borg-backup-backup01.ord.rax.opendev.org.log: file changed while we backed it up19:15
fungiindeed, it's in there19:15
clarkbcool19:15
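
The return code handling discussed above can be sketched roughly as follows. This is only an illustration, not the content of change 939667: it assumes borg's documented convention of rc 0 for success, rc 1 for warnings, and rc 2 or higher for errors, and the function names and the `warnings_ok` parameter are hypothetical. The second helper looks for the unprefixed "file changed while we backed it up" message quoted from the paste02 log.

```python
# Hypothetical helpers; names and parameters are illustrative only.
CHANGED_MARKER = ": file changed while we backed it up"


def classify_borg_exit(returncode: int, warnings_ok: bool = True) -> str:
    """Map a borg return code to a coarse backup status.

    Assumes borg's documented convention: 0 = success, 1 = warning,
    anything else = error (or killed by a signal).
    """
    if returncode == 0:
        return "success"
    if returncode == 1:
        # Warnings such as files changing mid-backup are expected for
        # live log files, so optionally count them as a good backup.
        return "success" if warnings_ok else "warning"
    return "failure"


def files_changed_during_backup(log_lines):
    """Return paths borg reported as changing while being backed up.

    The message carries no "warning" prefix, so match on its text.
    """
    changed = []
    for line in log_lines:
        line = line.rstrip()
        if line.endswith(CHANGED_MARKER):
            changed.append(line[: -len(CHANGED_MARKER)])
    return changed
```

With `warnings_ok=True`, a run that only tripped over changing log files would be reported as successful, while rc 2 would still surface as a failure worth an email.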
clarkbthis next topic is related to the previous one in that I realized we could take this opportunity of paste02 running podman and docker compose to test our assumptions about speculative image testing and switch it over to quay entirely19:16
clarkb#link https://review.opendev.org/c/opendev/lodgeit/+/939385 Publish lodgeit image to quay.io19:16
clarkbthat change updates where we push the image to (quay instead of docker hub) then I still need to write a followup to pull it from quay if we want to proceed with that19:16
clarkbI think it is a good idea to triple check our assumptions before we get too far down this path again19:16
clarkbplease review and let me know if you have any questions or concerns about that19:16
tonyb++19:17
tonybsounds like a good plan to me19:17
clarkb#topic Upgrading Old Servers19:17
clarkbI think we can move on I mostly wanted to call out the effort and discussion can proceed in review19:17
clarkbtonyb: I know you were out until recently, but anything we need to do / think about re wiki?19:18
tonybnope.   I'll send the announce email and switch the skin19:19
fungiyay! so close19:19
clarkbtonyb: did the changes still need some updates? I seem to remember planning to change the proxy setup maybe?19:19
tonybgiven your experience with paste do you think it's "safe" to go with noble 19:19
clarkbI guess ping for reviews when that is ready and let us know if they have already been updated19:19
tonybwill do19:20
clarkbnoble was definitely an uplift but I am hopeful I've sorted out the major items19:20
tonybokay19:20
fungijust make sure you enable configdrive when launching it19:21
clarkbanything else on this topic?19:21
fungi(rackspace's jammy image doesn't need that, so it's not on by default, but our noble image needs it for cloud-init to work)19:21
tonybnot from me.19:22
clarkbya because we uploaded tonyb's converted upstream noble image19:22
clarkbI suspect rax does magic to make not config drive work and they haven't uploaded noble yet19:22
fungirackspace presumably fiddles with the image they make available19:22
clarkbya19:22
clarkb#topic Gerrit 3.10.419:22
clarkblast week we managed to get borg going for noble and upgrade gitea. The last major item on my list was updating gerrit to 3.10.4 but due to our prior gerrit restart experience going poorly and a holiday weekend approaching with basically only me around I decided to defer this to this week19:23
fungii'm happy to help with it basically any time this week19:23
clarkbI think I'll aim for doing this tomorrow once I'm caught up with the plan being to land the h2 db setting and also update to the newer point release19:23
fungilinks to changes/topic might help19:24
clarkbthis may require a couple of restarts to see it take effect, but otherwise it should be similar to most of our restarts19:24
clarkb#link https://review.opendev.org/c/opendev/system-config/+/93800019:24
clarkb#link https://review.opendev.org/c/opendev/system-config/+/93916719:24
clarkbthose are the two related changes19:24
fungiboth lgtm, thanks19:25
clarkbthe updates between 3.10.3 and 3.10.4 seem minimal and also safe19:25
clarkbmostly bugfixes19:25
clarkb#topic Running certcheck on bridge19:25
clarkbfungi: is there a change for this yet? I suspect no given the distractions last week but wanted to double check19:26
fungii think we had rough consensus in favor last week, but no i said i wouldn't have time to put that together until this week. thanks for the reminder!19:26
clarkbyup I definitely didn't expect it yet. Just checking19:27
clarkb#topic Service Coordinator Election19:27
fungiworth noting though, the build failure which prompted me to start looking at all of that turned out to be related to a github outage and not rate limits, so it's not super urgent19:27
clarkback19:27
clarkbLast week I said that I would send email to make the proposed plan for this election official then didn't do that. So this is now on my list for today19:27
clarkbas a reminder Nominations Open From February 4, 2025 to February 18, 2025. Voting February 19, 2025 to February 26, 2025. All times are UTC based19:28
clarkb#topic Beginning of the Year (Virtual) Meetup19:28
clarkband finally a reminder we're trying to meetup several times this week (with the first block of time occurring in 1.5 hours)19:29
clarkb#link https://etherpad.opendev.org/p/opendev-january-2025-meetup19:29
clarkbthe etherpad has the schedule info and a number of topics from myself19:29
clarkbfeel free to add items19:29
clarkbfrickler do you know if the "early" block from 1800-2000 UTC tomorrow and day after are something you'll be attending?19:30
clarkbif not I suspect we may only use the "late" blocks from 2100-2300 UTC each day19:30
fricklerI won't, sorry19:30
clarkbok I'll go ahead and remove the early blocks. We'll just use the late blocks19:30
tonybsounds good 19:31
fungiupdated my reminders accordingly19:31
clarkbwe can probably do additional self organization when we jump on meetpad later today19:32
clarkb#topic Open Discussion19:32
clarkbThis didn't make it onto the agenda but I think we should push a Bindep release with changes up to everything before switching to pyproject.toml19:32
clarkbthen we can switch to pyproject.toml and have bindep act as a canary for how that all works with PBRs proposed updates to better support that system19:32
fungioh, right19:32
fungii can also try to find some time to push that this week19:33
fungirelated, there are a couple of documentation updates for pbr in review related to pyproject.toml support19:33
clarkbunfortunately the python packaging world is full steam ahead into pyproject.toml as an assumed tool and python3.12 and newer are getting clunkier without it (also creates tons of confusion for people when they need to manually install setuptools)19:33
clarkbthe changes should continue to support the old school method as long as you preinstall setuptools. But we're also trying to make pyproject.toml work for people who are too confused about that19:34
fungibut yeah, it would be great to have one or more simple opendev projects we could point at as examples of how to use pbr that way19:34
clarkbalso we updated gitea last week which includes a "fix" for the memory leak19:35
fungialso it was a good exercise for mapping out the remaining rough edges and future possible polish around pbr's support there, as well as fixing up its documentation19:35
clarkbbasically they disabled fuzzy search by default as fuzzy search is apparently very memory hungry19:35
clarkbfungi: ++19:35
clarkbwe also had to emergency apply a new user agent filter string for a valid but old version of edge that was impacting service availability19:36
clarkbvery likely more ai crawler bots not being nice19:36
clarkbI found a hacker news post from someone else that had to take their gitea off the internet due to similar problems. The discussion on hacker news about it was interesting as apparently different crawler bots respect robots.txt in different orthogonal ways that people have inferred over time19:37
clarkblike apparently OpenAI will only respect entries specifically for its user agent string and not the generic top level rules19:37
clarkband others have noticed that crawl-delay has really inconsistent support19:38
fungibut also there seemed to be some consensus that gitea is not designed to withstand aggressive crawlers, worse than many web applications anyway19:38
clarkbhttps://news.ycombinator.com/item?id=42750420 that was this post19:38
fungias well as general lament that it's starting to seem like the only viable solutions are to start relying on cdn-oriented ai filtering service providers19:39
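
One way to cope with crawlers that reportedly ignore the generic rules is to repeat the same rules in an explicit group for each named user agent. The sketch below is only an illustration of that idea: the `build_robots_txt` helper, the `/expensive/` path, and `example.org` are hypothetical, and `GPTBot` is used as an example agent name. It uses Python's standard `urllib.robotparser` to check the generated file, which says nothing about how any real crawler will behave.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths, named_agents):
    """Emit the same Disallow rules for '*' and for each named agent.

    Hypothetical helper: duplicating the rules per agent is meant for
    crawlers that only honor a group naming their own user agent.
    """
    lines = []
    for agent in ["*", *named_agents]:
        lines.append(f"User-agent: {agent}")
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
        lines.append("")  # a blank line separates groups
    return "\n".join(lines)


# Illustrative values only; not taken from any real deployment.
robots_txt = build_robots_txt(["/expensive/"], ["GPTBot"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

blocked = parser.can_fetch("GPTBot", "https://example.org/expensive/page")
allowed = parser.can_fetch("GPTBot", "https://example.org/")
```

Here `blocked` is `False` and `allowed` is `True` under the standard parser. As noted in the discussion, support for directives like crawl-delay is inconsistent, so a file like this is advisory at best.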
clarkbwhat else? there was a small zuul blip with a bug that would impact a subset of playbook runs. corvus took care of that over the weekend before it was a larger problem. Thank you for the quick turnaround on that19:39
corvusi had help; someone wrote a patch first, but i did restart over the weekend.  :)19:40
corvusalso, i wrote the bug in the first place :(19:40
fungiif you don't write them, who will?19:41
clarkbI'll give it a few more minutes but we may get 15 minutes back for $meal today19:41
corvus(fun story: it was the result of a particularly gruesome git conflict resolution.  it's the biggest fail of git merge i've seen.  i basically had to just completely reconstruct everything manually from the diffs)19:41
clarkboof19:41
fungii hate it when that happens19:41
fungie.g. when git merges a chunk to the wrong part of the file due to multiple context matches19:42
corvusyep19:42
corvusit got 1 line right and the remaining 70 lines wrong.19:42
clarkbsounds like that is everything. Thank you everyone. We'll be back here next week same time and location. We're also going to hang out on meetpad from 2100-2300 UTC today, tomorrow, and thursday to go over higher level topic discussion19:44
fungithanks clarkb!19:44
clarkbsee you there in about 1.25 hours19:44
clarkb#endmeeting19:44
opendevmeetMeeting ended Tue Jan 21 19:44:31 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:44
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-21-19.00.html19:44
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-21-19.00.txt19:44
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-21-19.00.log.html19:44

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!