19:01:51 <clarkb> #startmeeting infra
19:01:51 <opendevmeet> Meeting started Tue Dec  3 19:01:51 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:51 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:51 <opendevmeet> The meeting name has been set to 'infra'
19:01:59 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/LBZPF2RDQ3MSIDIMHZZC5VIKLMYMQYDK/ Our Agenda
19:02:05 <clarkb> #topic Announcements
19:02:21 <clarkb> A reminder that we are planning to upgrade Gerrit to 3.10.3 on Friday which we will discuss in more depth later in this meeting
19:02:25 <clarkb> anything else to announce?
19:02:41 <fungi> i got nothin
19:03:38 <clarkb> #topic Zuul-launcher image builds
19:03:45 <clarkb> corvus: anything new to report on the zuul launcher image builds?
19:06:50 <clarkb> ok let's get back to this subject later if corvus pops in
19:06:55 <clarkb> #topic Backup Server Pruning
19:07:26 <clarkb> the vexxhost server is complaining about being low on disk space again
19:07:41 <clarkb> this morning we landed a change to retire an additional set of old, no longer existing services/servers on the backup server
19:08:01 <clarkb> (this was followed by a fixup change to the file matchers for the infra-prod job that manages backups, so that it deploys before the daily jobs later today)
19:08:34 <clarkb> I haven't checked yet, but the job has run successfully, which in theory means we have marked those backups as retired and the next manual prune on the vexxhost server should prune those retired nodes down to a single backup
19:09:07 <clarkb> this will reclaim disk space as we normally do with the added benefit of reducing the total backups for those nodes. Once that is working happily the last step is to purge those backups entirely and fully reclaim their disk space
19:09:48 <clarkb> so the tl;dr here is we prune as usual and make sure that works, and once it does we follow up with the purge step
19:09:48 <corvus> hello!
19:10:01 <clarkb> then we should survive for longer before we need to prune the next time
19:10:21 <clarkb> any questions/concerns/comments on this backup stuff?
19:11:05 <clarkb> #topic Zuul-launcher image builds
19:11:13 <clarkb> corvus: any updates to zuul launcher image builds?
19:11:38 <corvus> nothing much -- except that i think i'm going to work on some api stuff before i start really testing that
19:12:00 <corvus> i think it will be easier to deal with images in multiple clouds once we can actually see the status with the api and web ui :)
19:12:01 <clarkb> corvus: api internally within zuul-launcher or the zuul-jobs or both?
19:12:09 <clarkb> oh zuul rest api stuff?
19:12:14 <corvus> in zuul-web
19:12:17 <corvus> yep
19:12:18 <clarkb> got it, makes sense
19:13:20 <clarkb> anything else?
19:13:24 <corvus> could probably pause this topic from the agenda for a few weeks
19:13:31 <clarkb> can do thanks
19:13:37 <clarkb> #topic Upgrading old servers
19:13:56 <clarkb> tonyb: not sure if you are around (I know you are shuffling time zones and unsure of what that means for meeting attendance), but any updates here?
19:15:22 <clarkb> On my side of things I'm hoping to start digging into this once the Gerrit upgrade is complete
19:15:31 <clarkb> as upgrading the gerrit server is the next big item on my todo list
19:15:44 <clarkb> but I don't really have any input other than what we've already talked about previously
19:15:52 <clarkb> #topic Docker Hub Rate Limits
19:16:11 <clarkb> anecdotally it feels like we're failing less from this now
19:16:37 <clarkb> which I'll attribute to things like fetching zuul-registry from quay for buildset registry jobs, and probably also to others who have started hosting on quay / relying on docker hub less in general
19:17:07 <corvus> well, the mirror role i wrote to help us deal with this is sure still hitting the error
19:17:09 <corvus> and zuul is as well
19:17:21 <clarkb> ya it is still happening, just less frequently, at least for system-config changes
19:17:27 <corvus> reviews on https://review.opendev.org/935574 would be appreciated so we have that tool available if we want to use it
19:17:32 <clarkb> ++
19:17:54 <clarkb> #link https://review.opendev.org/935574 roles to mirror container images from docker hub to elsewhere to reduce our reliance on docker hub
19:17:59 <corvus> please do review it regardless of the -1 -- the failures should just be dockerhub.
19:19:11 <corvus> once it exists, of course we can use it to mirror docker.io/registry:2 and resolve those
19:19:19 <clarkb> there were also questions (from openstack-helm I think) in the openstack-infra channel today about updating the container roles to support uploads to quay
19:19:30 <clarkb> I pointed out that zuul is already doing so and asked them to look at what zuul is doing to make it work
19:19:40 <fungi> looks like zuul-jobs-test-mirror-container-images did pass at one point on the latest patchset
19:19:47 <clarkb> https://review.opendev.org/c/zuul/zuul-jobs/+/936909 was the change proposed to fix it (though I'm not sure any fixing is required yet)
19:20:05 <corvus> yeah, i mean, it clearly works.  :)
19:20:18 <corvus> the whole point of any role with "container" in it is that it is not docker-specific
19:20:23 <clarkb> yup
19:20:25 <corvus> if it says "docker" in the role name, it is.
19:20:30 <clarkb> and it definitely works for zuul
19:20:41 <clarkb> so we'll see if they have any updates after cross checking against what zuul is doing
19:21:35 <clarkb> #topic Docker compose plugin with podman service for servers
19:21:47 <clarkb> tonyb was going to look into this but I'm not aware of any updates
19:22:01 <clarkb> if no one else has input we can continue on
19:23:21 <clarkb> #topic Gerrit 3.10 Upgrade Planning
19:23:28 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.10 Gerrit upgrade planning document
19:23:41 <clarkb> Any feedback you may have for this plan is still very much appreciated
19:24:01 <clarkb> assuming nothing goes wrong in the next couple of days my intention is still to upgrade from 3.9.8 to 3.10.3 Friday December 6 at 1600 UTC
19:24:35 <fungi> sounds good, i plan to be around
19:24:42 <clarkb> yesterday we updated our images from 3.9.7 to 3.9.8 and 3.10.2 to 3.10.3 and updated production to 3.9.8, which means I now need to rerun the upgrade process doc on some recently held nodes to make sure nothing new has popped up with the new images
19:24:52 <clarkb> My goal is to do that this afternoon or first thing tomorrow
19:25:32 <clarkb> when we upgraded to 3.9.8 we discovered some new annoying login behavior
19:25:50 <clarkb> #link https://issues.gerritcodereview.com/issues/381996067
19:26:19 <clarkb> this problem appears to affect 3.9, 3.10, 3.11, and master, so the best we can do is live with it and help upstream fix it (I've been trying to test a fix for that between meetings but haven't fully accomplished it yet)
19:26:40 <clarkb> basically the problem is that the sign in button links aren't updated to match the url you are currently at, so after you log in you aren't returned to the page you were on
19:27:07 <clarkb> instead the login url is https://review.opendev.org/login/%2F which ends up sending you to https://review.opendev.org// (note the double /) after you login and that page has no content
19:27:27 <clarkb> you can simply navigate away from that page after you login and go back to where you were, or always hit refresh before you login, which corrects the login links on the buttons
19:27:34 <fungi> an extra / just in case you lost one
19:27:37 <corvus> that issue says someone is working on a fix, but i don't see upstream changes linked...
19:28:11 <clarkb> corvus: ya I need to respond to the issue. Paladox is the person working on a fix in https://gerrit-review.googlesource.com/c/gerrit/+/444801 and https://gerrit-review.googlesource.com/c/gerrit/+/444802 but imo these aren't real fixes
19:28:23 <clarkb> all that will do is change the login url to https://review.opendev.org/login/
19:28:35 <clarkb> it won't properly update the login urls to match the existing page you are on so that you return to that page
19:28:57 <fungi> in theory we could also collate sequential runs of / with rewrites in the apache config fronting the server (though that's potentially lossy, probably not in any way that matters)
19:29:18 <clarkb> it's more a workaround for landing on the // page, which has no content but the navigation bar, and I'm not even sure it accomplishes that (this is what I'm trying to test, but I discovered just before the meeting that the behavior of normal developer login mode and openid logins is different enough that my test wasn't accurate, so I have to reconfigure my held node)
19:29:29 <corvus> that would be equivalent to paladox's change, yeah?
19:29:42 <clarkb> corvus: yes fungi's idea is basically what paladox is attempting to do
19:30:02 <fungi> just at the apache layer instead of internal to gerrit's javascript/typescript
19:30:05 <corvus> so maybe the apache thing is worth doing just to make it a little less weird/annoying?  but i agree, having the button actually work would be nice.
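A minimal sketch of the failure mode and of the slash-collapsing workaround discussed above. It assumes, purely for illustration, that the post-login destination is formed by appending the URL-decoded suffix after /login/ to the site root; the function names are hypothetical and this is not Gerrit's or Apache's actual code. Collapsing runs of slashes is lossy in general (any path that legitimately contains // changes), which matches the caveat raised above.

```python
import re
from urllib.parse import unquote

LOGIN_PREFIX = "/login/"

def naive_redirect_target(login_path: str) -> str:
    # Hypothetical model of the observed behavior: the decoded suffix after
    # /login/ is appended to the site root, so "/login/%2F" yields "//".
    if not login_path.startswith(LOGIN_PREFIX):
        raise ValueError("not a login path")
    return "/" + unquote(login_path[len(LOGIN_PREFIX):])

def collapse_slashes(path: str) -> str:
    # Replace every run of two or more slashes with a single slash,
    # similar in spirit to a rewrite rule in the front-end web server.
    return re.sub(r"/{2,}", "/", path)

print(naive_redirect_target("/login/%2F"))                    # //
print(collapse_slashes(naive_redirect_target("/login/%2F")))  # /
print(collapse_slashes("/c//zuul///zuul/+/123"))              # /c/zuul/zuul/+/123
```

Note that collapsing slashes only avoids the empty // page; like the upstream changes, it still lands users on / rather than on the page they logged in from.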
19:30:31 <clarkb> the "good" news is that upstream gerrit is affected too
19:30:44 <fungi> and... they just didn't notice for months?
19:30:51 <clarkb> just a little less dramatically, as instead of going to // you get redirected to your personal self dashboard instead of wherever you logged in from
19:31:02 <fungi> or i guess it hasn't been that long since it appeared in a very recent point release
19:31:07 <clarkb> fungi: these are all new changes that went in on master/3.11 that got backported into 3.10.3/3.9.8
19:31:30 <clarkb> but also they don't get the super obviously wrong redirect, they get a subtly wrong redirect
19:31:43 <clarkb> but I'm hoping that upstream being affected means that a solution is something they are interested in
19:31:59 <fungi> got it, so affected users could simply have assumed it was an intentional behavior change there
19:32:43 <clarkb> in any case, as discussed yesterday, the problem seems to be something we can live with, and proceeding with upgrades as planned shouldn't make it worse (or better)
19:32:54 <clarkb> so I think we proceed to 3.10.3 as long as nothing more major pops up
19:33:22 <corvus> sgtm
19:33:23 <fungi> sounds good to me, thanks for digging into that
19:34:11 <clarkb> #topic Upgrading Gitea to 1.22.4
19:34:19 <clarkb> the last item on the prepared agenda is gitea made a point release
19:34:26 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/936198 Upgrade gitea to 1.22.4
19:34:38 <clarkb> we should probably go ahead and upgrade to that this week if we have time to monitor it
19:35:17 <clarkb> however, I don't believe this will address the memory issue in 1.22
19:36:04 <clarkb> https://github.com/go-gitea/gitea/issues/31565 as this issue is still open
19:36:25 <clarkb> maybe aim for tomorrow morning to land that. I suspect I'll have time to monitor it then
19:36:32 <fungi> i'm happy to work on that upgrade tomorrow, yeah
19:36:50 <clarkb> cool and please let me know if you see anything concerning in the change or the upstream changelog that needs addressing before then
19:36:56 <clarkb> #topic Open Discussion
19:36:59 <clarkb> Anything else?
19:37:51 <clarkb> I will be popping out early tomorrow afternoon for a school function
19:38:04 <clarkb> I think I need to be there by 4:30 so probably out around 4
19:38:09 <clarkb> (local)
19:38:26 <fungi> i expect to be around then, if anything comes up
19:40:10 <clarkb> the lodgeit captcha fix landed but it seems we didn't automatically pull and restart services on paste
19:40:19 <clarkb> we can do that manually if we like or just let the daily jobs do it
19:41:49 <clarkb> I'll give it until 19:45 and end the meeting there if nothing else comes up
19:41:59 <clarkb> I guess I should note a meeting schedule for December/January
19:42:27 <clarkb> I think we'll have meetings the 10th, 17th, and January 7th
19:42:47 <clarkb> skipping December 24 and 31 as those are both holidays for many (Christmas Eve and New Year's Eve)
19:42:55 <corvus> sounds good
19:42:58 <fungi> i'll miss the 17th as i will be operating a motor vehicle on an interstate highway at that time
19:43:04 <fungi> around for the rest
19:43:19 <clarkb> ack. My kids are still in school then so I'll be around
19:45:32 <clarkb> and we are at the previously mentioned stop time
19:45:35 <clarkb> thanks everyone!
19:45:37 <clarkb> #endmeeting