19:00:13 <clarkb> #startmeeting infra
19:00:13 <opendevmeet> Meeting started Tue Jan 16 19:00:13 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:13 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:13 <opendevmeet> The meeting name has been set to 'infra'
19:00:23 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/6MI3ENLNO7K43AW6PPVMW52K4CHSW7VQ/ Our Agenda
19:00:27 <frickler> \o
19:00:56 <clarkb> #topic Announcements
19:01:23 <clarkb> A remidner that we'll be doing an OpenInfra Live episode covering all things OpenDev Thursday at 1500 UTC
19:01:37 <clarkb> feel free to listen in and ask questions
19:02:07 <frickler> vPTG will be 2024-04-08 to -04-12
19:02:39 <clarkb> yup as mentioend in #opendev I inteded to discuss an idea for us to participate this time around at the end of the meeting
19:02:51 <clarkb> signups happen now through february 18
19:03:35 <clarkb> #topic Server Upgrades
19:03:45 <clarkb> Not sure if we've got tonyb here (it is a bit early in the day)
19:03:55 * tonyb waves
19:04:07 <clarkb> progress has been made on meetpad server upgrades. tonyb and I did testing the other day to figure out why the jvb wasn't used in the test env
19:04:20 <fungi> thanks for working through that!
19:04:21 <clarkb> tl;dr is that the jvb was trying to connect to the prod server and that made everything sad
19:04:31 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/905510 Upgrading meetpad service to jammy
19:04:42 <clarkb> this stack of changes is the current output of that testing and effort from tonyb
19:05:08 <clarkb> reviews would be great. I think we can safely land all of those changes since they shouldn't affect prod but only improve our ability to test the service in CI
19:06:02 <clarkb> tonyb: anything else to call out on this topic? definitely continue to ping me if you need extra eyeballs
19:06:56 <tonyb> I think meetpad is under control.  As I mentioned I've started thinking about the "other" servers so if people have time looking at my ideas for wiki.o.o would be good
19:07:28 <fungi> where were those ideas again?
19:07:39 <tonyb> Is there a reason to keep hound on focal? or is the main idea to get off of bionic first
19:07:51 <clarkb> tonyb: only that the priority was to ditch the older servers first
19:07:56 <clarkb> hound should run fine on jammy
19:08:08 <tonyb> fungi: https://etherpad.opendev.org/p/opendev-bionic-server-upgrades#L58
19:08:14 <tonyb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades#L58
19:08:16 <fungi> thanks!
19:08:27 * clarkb makes a note to review the notes
19:08:42 * fungi notes clarkb's note to review the notes
19:08:48 <tonyb> Okay I'll add hound to my list.  it looks like it'll be another simple one
19:09:20 <fungi> i wish gitea would gain feature parity with hound, and then we could decommission another almost redundant service
19:09:20 <corvus> i wonder if anything with gitea code search has changed
19:09:27 <clarkb> corvus: its still "weird"
19:09:37 <corvus> bummer
19:09:40 <tonyb> Assuming that none of the index/cached data is critical and can/will be regenerated on a new server
19:09:57 <fungi> tonyb: yes, there's no real state maintained there. it can just be replaced whole
19:10:09 <clarkb> the latest issue we discovered was that you need to update files for it ot notice things have changed to update the index. It doesn't just reindex everything all at once so sometimes things get missed and don't show up
19:10:26 <clarkb> this is for gitea not hound. Hound reindexes everything on startup then does incremental pulls to reindex over time
19:10:41 <fungi> also i prefer that hound gives you all matching lines from a repo, rather than just the repos which matched and one or two example matches
19:11:13 <fungi> i guess that could be a feature request for a search option or something
19:11:27 <clarkb> #topic Python Container Updates
19:11:36 <clarkb> Not to cut the previous discussion short but want to ensure we get time for everything today
19:11:46 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/905018 Drop Bullseye python3.11 images
19:11:58 <clarkb> This is a cleanup change that we can make to our image builds after zuul-registry updated
19:12:12 <clarkb> we are still waiting on zuul-operator to switch in order to remove the last bullseye image
19:12:40 <clarkb> I also used Zuul as a test platform for the 3.12 images and after fixing an import for a removed module it all seems to be working
19:13:02 <clarkb> My main concern was that we might have 3.12 changes that affect the images themselves but that doesn't appear to be the case
19:13:59 <clarkb> #topic Upgrading Zuul's DB Server
19:14:21 <clarkb> There were hardware issues? on the hosting side that caused the existing db server to be migrated
19:14:42 <clarkb> frickler: found at least one build whose final status wasn't recorded properly in the db and the short outage for that migration appeas to be the cause
19:15:08 <clarkb> I don't think this drastically changes the urgency of this move, but does give more fuel to the replace it fire I guess
19:15:31 <frickler> nice wording, I agree to that
19:15:44 <clarkb> that said I still haven't found much time to think it through. I'm somewhat inclined to go with the simplest thing closest to what we're already doing as a result. Spin up a single mysql/mariadb server with the intent of scaling it up later should we decide to
19:16:10 <tonyb> Yeah I think that
19:16:21 <tonyb> s my preference as well
19:17:25 <tonyb> it *seems* like we can add slaves after the fact with a small outage
19:17:26 <clarkb> we don't have to agree to that now, but raising objections before the next meeting would be good and we could write that down somewhere as the plan otherwise
19:19:00 <clarkb> #topic Matrix Homeserver Hosting
19:19:15 <clarkb> Last week fungi reached out to EMS about updating our hosting service/plan
19:19:44 <clarkb> fungi: is there anything to add to that? I assume that our existing service will continue running as is without outage or config changes its merely the business side that changes?
19:20:12 <fungi> yeah, it sounds like they'll just update our hosting plan, bill the foundation, and increase our quota for user count
19:20:30 <fungi> but they're supposed to get up with us by the end of the month with details
19:20:59 <clarkb> thank you for digging into that. I guess we'll practice patience until they get back to us
19:21:12 <tonyb> hopefully billing for the new plan doesn't start until ^^ happens
19:21:16 <fungi> the up side is we no longer have to be quite so careful about the number of accounts we create on it, since the limit for that's increasing from 5 to 20
19:21:32 <fungi> tonyb: yes, it will take effect when our current plan runs out, they said
19:22:18 <tonyb> cool beans
19:22:24 <fungi> and they're contacting us at least a week prior to that
19:22:40 <fungi> which is all the info i have for now
19:22:48 <clarkb> #topic OpenAFS Quota Issues
19:22:53 <fungi> folks with access to the infra-root inbox can also read the e-mail they sent me
19:23:09 <tonyb> Oh that reminds me ...
19:23:38 <tonyb> fungi: IIUC you use mutt to access that ... can you share a redacted mutt.config?
19:23:47 <clarkb> I spent some time digging into this last week
19:23:53 <fungi> tonyb: sure, we'll catch up after the meeting
19:24:00 <tonyb> perfect
19:24:06 <clarkb> There were no obvious issues with extra arches in the ubuntu-ports volume. That would've been too easy :)
19:24:32 <clarkb> However, I think we can probably start the process to cleanup ubuntu bionic in ports and maybe debian buster
19:25:19 <clarkb> The xenial indexes are still there, not getting updated and the packages don't appear to be there either. This is a minimal cleanup opportunity since it is just the indexes but we could clear those out when we clear out the others as I expect it to take the same process
19:25:21 <fungi> "ubuntu ports" in this case means arm64 packages, for clarification
19:25:38 <fungi> not amd64
19:25:41 <clarkb> right
19:25:55 <clarkb> on the ubuntu amd64 mirror volume I noticed that we actually mirror a number of source packages
19:26:11 <clarkb> unfortunately it seems like reprepro uses these to detect package updates and we can't easily remove them?
19:26:28 <clarkb> (we'd have to do a post reprepro step to delete them I think and then have indexes that point at files that don't exist which is not nice)
19:26:38 <clarkb> at least I wasn't able to find a reprepro flag to not mirror those packages
19:26:48 <fungi> well, it would be the "source indices" which aren't the same files
19:27:09 <clarkb> oh maybe we can delete those too then?
19:27:37 <clarkb> I suspect this would be a pretty good size reduction if we can figure out a good way to remove those files
19:27:55 <fungi> but also we might be veering close to violating licenses for some packages if we don't also host sources for them
19:28:25 <fungi> since we technically are redistributing them
19:28:27 <clarkb> even as a mirror of an upstream that isn't modifying content or code?
19:29:13 <fungi> then again, we omit mirroring sources for rpm-based distros right?
19:29:18 <clarkb> yes I think so
19:29:26 <fungi> so maybe don't worry about it unless someone tells us we need to
19:29:30 <clarkb> my non lawyer understanding of the gpl for example is that we'd need to provide sources if asked
19:29:37 <clarkb> so we'd go grab the source for the thing and provide that
19:29:45 <fungi> right
19:29:49 <clarkb> not that we necessarily have to provide it next to the binary package
19:30:13 <fungi> the bigger risk there is if we continue to serve packages which are no longer available elsewhere
19:30:36 <fungi> since actually fulfilling such requests would become challenging
19:30:47 <clarkb> ya I suppose that is possible but also not super likely for ubuntu in particular who has everything in source control. Might be a pain tog dig out of launchpad though
19:31:29 <clarkb> I'm also open to other ideas for improving disk consumption. The other thing I wanted to look into but haven't done yet is whether or not the stream mirrors are still providing many copies of certain packages that might be prunable
19:32:05 <frickler> but that's another good reason to drop eoled versions, I like that ;)
19:32:25 <clarkb> ya I think we can start there
19:32:31 <fungi> frickler: agreed
19:33:32 <clarkb> #topic Broken Wheel Cache/Mirror Builds
19:34:14 <clarkb> one issue was that openafs was not working on arm64 nodes for centos stream. The problem there appears to be that our images stopped getting rebuilt on nb04 which meant our test nodes had old kernels
19:34:21 <clarkb> the old kernels didn't work with new openafs packaging stuff
19:34:36 <clarkb> I claened up nb04 and we have new images now
19:35:08 <clarkb> https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/905270 merged after the iamges updated
19:35:16 <clarkb> I don't know whether or not we've built a new package yet?
19:35:46 <clarkb> but progress anyway. Would be good to finish running that down and tonyb indicated an interest in doing that
19:36:05 <tonyb> Yup.
19:37:11 <fungi> it was pretty late when i approved that, i think, so may need to wait until later today
19:37:32 <clarkb> ack
19:37:34 <clarkb> #topic Gitea disks filling
19:37:56 <clarkb> The cron job configuration merged but looking at app.ini timestamps and process timestamps I don't think we restarted gitea to pick up the changes
19:38:19 <clarkb> we've got two toehr related chagnes and I'm thinking we get those reviewed and possibly landed then we can do a manual rolling restart if still necessary
19:38:21 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/904868 update robots.txt on upstream's suggestion
19:38:27 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/905020 Disable an unneeded cron job in gitea
19:39:28 <tonyb> They both have enough votes to approve
19:39:56 <clarkb> oh cool I think I missed that when looking at gerrit
19:39:57 <tonyb> so yeah I think we're good to land and rolling restart
19:40:20 <clarkb> I think we want to approve 905020 first in case 904868 does an automated rolling restart
19:40:58 <clarkb> I hesitate to approve myself as we're supposed to haev an ice storm starting in about 3 hours
19:41:06 <clarkb> but can approve wednesday if I haven't lost power
19:41:19 <corvus> consider rescheduling the ice storm
19:41:36 <clarkb> I wish. I'm tired of this cold weather. Tomorrow will be above freezing for the first time since Friday
19:42:00 <tonyb> I can approve it and watch for updates and restarts
19:42:09 <clarkb> thanks
19:42:13 <clarkb> #topic OpenDev Service Coordinator Election
19:42:27 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TB2OFBIGWZEYC7L4MCYA46EXIX5T47TY/
19:42:39 <clarkb> I made an election schedule official on the mailing list
19:43:00 <clarkb> I continue to be happy for someone else to step into the role and will support whoever that may be
19:43:12 <clarkb> I also continue to be willing to do the work should others prefer not to
19:43:28 <clarkb> One nice thing about getting the schedule otu now is it should give everyone plenty of time to consider it :)
19:43:57 <fungi> you deserve a break, but also cycling back through those of us who were infra ptls is probably equally counter-productive
19:44:08 <fungi> a new volunteer would be ideal
19:45:17 * tonyb sees the amount of work clarkb does and is awestruck
19:45:51 <clarkb> heh I'm just chasing the electrons around. Eventually I might catch one
19:46:22 <clarkb> #topic Open Discussion
19:46:38 <clarkb> As mentioned earlier I'm thinking it would be a good idea for us to try out the PTG again
19:47:39 <tonyb> Yup I think that'd be good.
19:47:40 <clarkb> the reason for that is we've got a number of pots on the fire between general maintenance (gitea upgrades container image updates etc), fixing issues that arise, and future looking stuff that I think some focused time would be good
19:48:15 <fungi> mmm, hotpot
19:48:21 <fungi> count me in!
19:48:24 <clarkb> I wanted to bring it up because we said we could also just schedule time if we need it. I'm basically asserting that I think we could use some focused time and I'm happy to organize it outside of/before the PTG as well or as an alternative
19:49:00 <tonyb> focused time is good.
19:49:00 * frickler would certainly prefer something outside of the PTG
19:49:15 <clarkb> frickler: thats good feedback
19:49:56 <tonyb> I guess during the PTG could make it hard for us "as a team" to cover off attending other projects sessions
19:50:03 <clarkb> tonyb: ya thats the struggle
19:50:05 <tonyb> (including the TC)
19:50:46 <clarkb> given frickler's feedback I'm thinking that maybe the best thing would be to schedule a couple of days prior to the PTG (maybe as early as February) and then also sign up for the PTG and just do our best during the bigger event
19:51:24 <fungi> we can also call it ptg prep, and use it for last-minute testing of our infrastructure
19:51:35 <fungi> meetpad, etherpad, ptgbot, etc
19:51:42 <clarkb> I'll be able to look at a calendar and cross check against holidays and other events after the OIL episode. I'll try to get that out Fridayish
19:51:45 <clarkb> fungi: ++
19:51:55 <fungi> as opposed to doing it after the ptg
19:52:09 <clarkb> fungi: yup I think we should definitely do it before
19:52:27 <clarkb> even the PTG feels a bit late, but having somethigni nFebruary and something in April seems not too crazy
19:52:27 <tonyb> Works for me
19:52:56 <tonyb> I don't think I'm able t relocate to the US for the PTG this time which sucks :(
19:54:20 <clarkb> :/ we'll just figure out how to make timezones work as best we can
19:54:30 <clarkb> Anything else in our last ~6 minutes?
19:54:57 <frickler> what about the expiring ssl certs?
19:55:12 <clarkb> frickler: tonyb and I were going to do the linaro one today
19:55:23 <clarkb> I'll remind the openstack.org sysadmins about the other one
19:55:31 <frickler> there were also some more coming up recently
19:56:13 <frickler> mirror02.ord.rax.opendev.org and review.opendev.org
19:56:15 <fungi> yeah, i saw those, i suspect something started failing the letsencrypt periodic job, need to check the logs
19:56:25 <clarkb> either that or we need to restart apache to clear out old workers
19:56:40 <fungi> other possibility is apache processes clinging to old data, yep
19:57:07 <frickler> ok, so everything under control, fine
19:57:09 <clarkb> looks like certs on the mirror node are not new files so likely the jobs
19:57:29 <clarkb> its a good callout and why we alert with 30 days of warning so that we can fix whatever needs to be fixed in time :)
19:58:02 <fungi> full ack
19:58:28 <clarkb> thank you for your time everyone
19:58:35 <clarkb> I'll call it here
19:58:37 <clarkb> #endmeeting