19:00:38 <clarkb> #startmeeting infra
19:00:38 <opendevmeet> Meeting started Tue Jan 21 19:00:38 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:38 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:38 <opendevmeet> The meeting name has been set to 'infra'
19:00:40 <clarkb> Hello!
19:00:47 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JWBLUYVPNULENDQWGEKO6VX27CVXQLGO/ Our Agenda
19:01:02 <frickler> \o
19:01:09 <clarkb> #topic Announcements
19:01:22 <clarkb> I didn't have anything to announce. Did anyone else?
19:02:21 <clarkb> sounds like no. Let's dive into the agenda then
19:02:23 <clarkb> #topic Zuul-launcher image builds
19:02:31 <clarkb> #link https://review.opendev.org/q/hashtag:niz+status:open is the next set of work happening in Zuul
19:03:01 <clarkb> I believe that a good chunk of this work landed last week and should be deployed now (via our weekly updates). But there are still some open changes last I looked, so I'm not sure if we need to get those in before this can proceed
19:03:32 <clarkb> corvus: ^ is this something where we're still waiting or is what landed sufficient to make progress?
19:05:28 <clarkb> we may not have corvus right now. We can continue and get back to this later if that changes
19:05:40 <clarkb> in any case some progress has been made, I'm just not sure how much yet
19:05:46 <clarkb> #topic Deploying new Noble Servers
19:05:59 <fungi> a noble endeavor
19:06:15 <clarkb> Progress has happened here as well. I migrated paste01's db to paste02 last week and updated DNS. The next step was to get backups working, which is where I ran into complications
19:06:32 <clarkb> the tl;dr is that we need borg ~1.2.8 to run on noble for compatibility with python3.12
19:06:55 <clarkb> we currently pin to 1.1.18 and there are some big nasty warnings from borg about mixing these. However, after much reading and attempting to understand the problems I think the risk to us is extremely low
19:07:39 <clarkb> basically what could happen is we could delete valid archives (backups) if we run `borg check --repair` on a borg 1.x created archive using 1.2
19:08:09 <clarkb> however, this particular issue seems to be unlikely for us because we never used a borg old enough to produce the archives that would now be considered invalid. And we don't automatically run borg check --repair anywhere
19:08:29 <corvus> (sorry for tardiness; almost ready to make progress on images; expect real progress by next week)
19:08:44 <clarkb> (personally I would've appreciated clearer and more direct communication of the problem from borg rather than the big scary messages we got, but we muddled through)
19:09:17 <clarkb> so anyway paste02 is backing up with borg 1.2.8 to servers running 1.1.18. Worst case today only paste02's backups would be impacted, but as mentioned I don't expect problems anyway
19:09:19 <clarkb> corvus: ack thanks
19:09:25 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/939667 fixup for warnings treated as errors with new borg spamming email
19:10:04 <clarkb> there is one little borg update annoyance though and that is borg 1.2 exits with rc 1 if there are warnings, so we've been spamming root email with backup failures that are most likely all warnings not errors. This change would treat rc 1 when using newer borg as a successful backup
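A minimal sketch of the exit-code handling 939667 describes, assuming a shell wrapper around the backup invocation (the repo path and backed-up directories are placeholders, not the actual system-config script):

```bash
#!/bin/bash
# Borg documents exit status 0 = success, 1 = warning, 2 = error.
# Hypothetical wrapper; repo and paths are illustrative.
borg create "$BORG_REPO::$(date +%Y-%m-%dT%H:%M:%S)" /etc /var/log
rc=$?
if [ "$rc" -le 1 ]; then
    # rc 1 means warnings only (e.g. a log file changed while being
    # read), so treat it as a successful backup instead of emailing root.
    exit 0
fi
exit "$rc"
```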
19:10:22 <clarkb> the most common warning (the only one I've seen anyway) is a warning for files changing while being backed up. This is common with log files in particular
19:10:49 <fungi> also the unknown unencrypted volume warning
19:10:51 <clarkb> I think if we can get 939667 or something like it in and confirm that backups behave nicely afterwards, we'll be in a spot to consider deleting paste01 and retiring its backups
19:11:11 <clarkb> fungi: that one we explicitly override though, and in testing it doesn't seem to cause rc 1 with our override
19:11:16 <clarkb> I thought it did at first but that was a red herring
19:11:45 <clarkb> basically if the unknown unencrypted volume warning is the only warning, you get rc 0 with our flag to ignore that warning
19:11:46 <fungi> ah, okay, so it shows up in the log as a warning but doesn't contribute to the return code?
19:12:12 <fungi> it was the only warning i saw logged in the one i looked at
19:12:21 <clarkb> yes that seems to be the case, as I had logs with that warning and no others exit 0
19:12:29 <clarkb> fungi: the other warning isn't logged as a warning
19:12:37 <fungi> bwahahahaha
19:12:38 <clarkb> it's just a message with no prefix
19:12:44 <clarkb> let me find an example really quickly
19:13:10 <clarkb> https://zuul.opendev.org/t/openstack/build/49255ec995394248a24c5eb1e11c9a68/log/borg-backup-noble.opendev.org/borg-backup-borg-backup01.region.provider.opendev.org.log#8560
19:13:13 <fungi> okay, so exits nonzero on warnings, logs warnings even when they're explicitly disabled, but also doesn't state that some warnings it logs are warnings
19:13:45 <clarkb> right. When the linked message doesn't appear we get rc 0 even with the explicit warning about unknown unencrypted volume
19:14:16 <clarkb> so anyway, long story short, I think this is mostly working now from podman to python 3.12 to borg, with the above change being the last known cleanup. And I think I'm happy to start on old server cleanups once the last issue is sorted
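Since that message carries no "Warning:" prefix, a simple way to check a given run for it is to search for the text directly (the log path glob is illustrative; the exact message text is quoted from production later in the meeting):

```bash
# Look for the unprefixed file-changed warning in a backup log; its
# presence explains an rc 1 run that otherwise logged no errors.
grep -F 'file changed while we backed it up' /var/log/borg-backup-*.log
```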
19:14:27 <clarkb> let me know if you have concerns or questions about that and we can dig in more and make sure we're happy with it
19:14:40 <clarkb> #topic Deploying Lodgeit entirely without Docker Hub
19:14:41 <fungi> for the record, the log that's currently /var/log/borg-backup-backup01.ord.rax.opendev.org.log.3.gz on paste02 is the one i was looking at
19:15:05 <clarkb> fungi: look for 'changed while' and you should see what I linked to above, just in prod
19:15:22 <fungi> /var/log/borg-backup-backup01.ord.rax.opendev.org.log: file changed while we backed it up
19:15:25 <fungi> indeed, it's in there
19:15:29 <clarkb> cool
19:16:01 <clarkb> this next topic is related to the previous one in that I realized we could take this opportunity of paste02 running podman and docker compose to test our assumptions about speculative image testing and switch it over to quay entirely
19:16:09 <clarkb> #link https://review.opendev.org/c/opendev/lodgeit/+/939385 Publish lodgeit image to quay.io
19:16:27 <clarkb> that change updates where we push the image to (quay instead of docker hub), then I still need to write a followup to pull it from quay if we want to proceed with that
19:16:39 <clarkb> I think it is a good idea to triple check our assumptions before we get too far down this path again
19:16:56 <clarkb> please review and let me know if you have any questions or concerns about that
19:17:14 <tonyb> ++
19:17:28 <tonyb> sounds like a good plan to me
19:17:43 <clarkb> #topic Upgrading Old Servers
19:17:53 <clarkb> I think we can move on; I mostly wanted to call out the effort, and discussion can proceed in review
19:18:06 <clarkb> tonyb: I know you were out until recently, but anything we need to do / think about re wiki?
19:19:07 <tonyb> nope. I'll send the announce email and switch the skin
19:19:22 <fungi> yay! so close
19:19:35 <clarkb> tonyb: did the changes still need some updates? I seem to remember planning to change the proxy setup maybe?
19:19:39 <tonyb> given your experience with paste do you think it's "safe" to go with noble?
19:19:54 <clarkb> I guess ping for reviews when that is ready and let us know if they have already been updated
19:20:01 <tonyb> will do
19:20:21 <clarkb> noble was definitely an uplift but I am hopeful I've sorted out the major items
19:20:39 <tonyb> okay
19:21:01 <fungi> just make sure you enable configdrive when launching it
19:21:41 <clarkb> anything else on this topic?
19:21:52 <fungi> (rackspace's jammy image doesn't need that, so it's not on by default, but our noble image needs it for cloud-init to work)
19:22:06 <tonyb> not from me.
19:22:15 <clarkb> ya because we uploaded tonyb's converted upstream noble image
19:22:25 <clarkb> I suspect rax does magic to make things work without a config drive, and they haven't uploaded noble yet
19:22:33 <fungi> rackspace presumably fiddles with the image they make available
19:22:36 <clarkb> ya
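For anyone booting one of these by hand, the knob fungi mentions looks roughly like this with openstackclient (the server name, image, flavor, and network names are placeholders):

```bash
# The converted upstream Noble image needs a config drive for
# cloud-init to find its metadata; rackspace's own images do not.
openstack server create \
    --image ubuntu-noble \
    --flavor example-flavor \
    --network example-net \
    --config-drive True \
    example01.opendev.org
```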
19:22:39 <clarkb> #topic Gerrit 3.10.4
19:23:20 <clarkb> last week we managed to get borg going for noble and upgrade gitea. The last major item on my list was updating gerrit to 3.10.4, but due to our prior gerrit restart experience going poorly and a holiday weekend approaching with basically only me around I decided to defer this to this week
19:23:51 <fungi> i'm happy to help with it basically any time this week
19:23:53 <clarkb> I think I'll aim for doing this tomorrow once I'm caught up, with the plan being to land the h2 db setting and also update to the newer point release
19:24:19 <fungi> links to changes/topic might help
19:24:20 <clarkb> this may require a couple of restarts to see it take effect, but otherwise it should be similar to most of our restarts
19:24:29 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/938000
19:24:42 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/939167
19:24:46 <clarkb> those are the two related changes
19:25:34 <fungi> both lgtm, thanks
19:25:38 <clarkb> the updates between 3.10.3 and 3.10.4 seem minimal and also safe
19:25:43 <clarkb> mostly bugfixes
19:25:47 <clarkb> #topic Running certcheck on bridge
19:26:14 <clarkb> fungi: is there a change for this yet? I suspect no given the distractions last week but wanted to double check
19:26:37 <fungi> i think we had rough consensus in favor last week, but no, i said i wouldn't have time to put that together until this week. thanks for the reminder!
19:27:13 <clarkb> yup I definitely didn't expect it yet. Just checking
19:27:19 <clarkb> #topic Service Coordinator Election
19:27:25 <fungi> worth noting though, the build failure which prompted me to start looking at all of that turned out to be related to a github outage and not rate limits, so it's not super urgent
19:27:32 <clarkb> ack
19:27:51 <clarkb> Last week I said that I would send email to make the proposed plan for this election official, then didn't do that. So this is now on my list for today
19:28:19 <clarkb> as a reminder, nominations open from February 4, 2025 to February 18, 2025; voting runs February 19, 2025 to February 26, 2025. All times are UTC based
19:28:51 <clarkb> #topic Beginning of the Year (Virtual) Meetup
19:29:06 <clarkb> and finally a reminder that we're trying to meet up several times this week (with the first block of time occurring in 1.5 hours)
19:29:12 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:29:29 <clarkb> the etherpad has the schedule info and a number of topics from myself
19:29:34 <clarkb> feel free to add items
19:30:05 <clarkb> frickler: do you know if the "early" blocks from 1800-2000 UTC tomorrow and the day after are something you'll be attending?
19:30:15 <clarkb> if not I suspect we may only use the "late" blocks from 2100-2300 UTC each day
19:30:27 <frickler> I won't, sorry
19:30:46 <clarkb> ok I'll go ahead and remove the early blocks. We'll just use the late blocks
19:31:00 <tonyb> sounds good
19:31:23 <fungi> updated my reminders accordingly
19:32:04 <clarkb> we can probably do additional self-organization when we jump on meetpad later today
19:32:13 <clarkb> #topic Open Discussion
19:32:35 <clarkb> This didn't make it onto the agenda, but I think we should push a Bindep release with all the changes up to, but not including, the switch to pyproject.toml
19:32:57 <clarkb> then we can switch to pyproject.toml and have bindep act as a canary for how that all works with PBR's proposed updates to better support that system
19:32:59 <fungi> oh, right
19:33:12 <fungi> i can also try to find some time to push that this week
19:33:32 <fungi> related, there are a couple of documentation updates for pbr in review related to pyproject.toml support
19:33:45 <clarkb> unfortunately the python packaging world is full steam ahead into pyproject.toml as an assumed tool, and python3.12 and newer are getting clunkier without it (it also creates tons of confusion for people when they need to manually install setuptools)
19:34:21 <clarkb> the changes should continue to support the old school method as long as you preinstall setuptools. But we're also trying to make pyproject.toml work for people who are confused by that
19:34:22 <fungi> but yeah, it would be great to have one or more simple opendev projects we could point at as examples of how to use pbr that way
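For a sense of what such an example would contain, a minimal sketch of a pbr-consuming pyproject.toml follows (the version pins are illustrative, and the exact backend configuration should be checked against the pbr documentation updates mentioned above):

```toml
# Minimal PEP 517 build metadata for a pbr-based project. With this in
# place, build frontends install pbr and setuptools themselves instead
# of requiring users to preinstall setuptools on python3.12 and newer.
[build-system]
requires = ["pbr>=6.0", "setuptools>=64"]
build-backend = "pbr.build"
```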
19:35:23 <clarkb> also we updated gitea last week, which includes a "fix" for the memory leak
19:35:27 <fungi> also it was a good exercise for mapping out the remaining rough edges and future possible polish around pbr's support there, as well as fixing up its documentation
19:35:38 <clarkb> basically they disabled fuzzy search by default, as fuzzy search is apparently very memory hungry
19:35:49 <clarkb> fungi: ++
19:36:28 <clarkb> we also had to emergency apply a new user agent filter string for a valid but old version of edge that was impacting service availability
19:36:44 <clarkb> very likely more ai crawler bots not being nice
19:37:35 <clarkb> I found a hacker news post from someone else who had to take their gitea off the internet due to similar problems. The discussion on hacker news about it was interesting, as apparently different crawler bots respect robots.txt in different orthogonal ways that people have inferred over time
19:37:57 <clarkb> like apparently OpenAI will only respect entries specifically for its user agent string and not the generic top level rules
19:38:09 <clarkb> and others have noticed that crawl-delay has really inconsistent support
19:38:16 <fungi> but also there seemed to be some consensus that gitea is not designed to withstand aggressive crawlers, worse than many web applications anyway
19:38:25 <clarkb> https://news.ycombinator.com/item?id=42750420 this was the post
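To illustrate the quirk described above, operators reportedly end up duplicating generic rules under specific agent names (the paths here are placeholders, and GPTBot is given only as an example crawler user agent):

```
# Generic rules: some crawlers honor these, and crawl-delay support
# is inconsistent across bots.
User-agent: *
Disallow: /expensive/endpoint/
Crawl-delay: 10

# The same rule repeated for a specific crawler, since some bots
# reportedly only apply entries naming their own user agent.
User-agent: GPTBot
Disallow: /expensive/endpoint/
```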
19:39:47 <fungi> as well as general lament that it's starting to seem like the only viable solutions are to start relying on cdn-oriented ai filtering service providers
19:39:53 <clarkb> what else? there was a small zuul blip with a bug that would impact a subset of playbook runs. corvus took care of that over the weekend before it was a larger problem. Thank you for the quick turnaround on that
19:40:24 <corvus> i had help; someone wrote a patch first, but i did restart over the weekend. :)
19:40:38 <corvus> also, i wrote the bug in the first place :(
19:41:05 <fungi> if you don't write them, who will?
19:41:32 <clarkb> I'll give it a few more minutes, but we may get 15 minutes back for $meal today
19:41:32 <corvus> (fun story: it was the result of a particularly gruesome git conflict resolution. it's the biggest fail of git merge i've seen. i basically had to just completely reconstruct everything manually from the diffs)
19:41:40 <clarkb> oof
19:41:54 <fungi> i hate it when that happens
19:42:24 <fungi> e.g. when git merges a chunk to the wrong part of the file due to multiple context matches
19:42:41 <corvus> yep
19:42:55 <corvus> it got 1 line right and the remaining 70 lines wrong.
19:44:19 <clarkb> sounds like that is everything. Thank you everyone. We'll be back here next week, same time and location. We're also going to hang out on meetpad from 2100-2300 UTC today, tomorrow, and thursday to go over higher-level topic discussion
19:44:24 <fungi> thanks clarkb!
19:44:26 <clarkb> see you there in about 1.25 hours
19:44:31 <clarkb> #endmeeting