19:00:13 <clarkb> #startmeeting infra
19:00:13 <opendevmeet> Meeting started Tue Jan 16 19:00:13 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:13 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:13 <opendevmeet> The meeting name has been set to 'infra'
19:00:23 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/6MI3ENLNO7K43AW6PPVMW52K4CHSW7VQ/ Our Agenda
19:00:27 <frickler> \o
19:00:56 <clarkb> #topic Announcements
19:01:23 <clarkb> A reminder that we'll be doing an OpenInfra Live episode covering all things OpenDev Thursday at 1500 UTC
19:01:37 <clarkb> feel free to listen in and ask questions
19:02:07 <frickler> vPTG will be 2024-04-08 to -04-12
19:02:39 <clarkb> yup as mentioned in #opendev I intended to discuss an idea for us to participate this time around at the end of the meeting
19:02:51 <clarkb> signups happen now through february 18
19:03:35 <clarkb> #topic Server Upgrades
19:03:45 <clarkb> Not sure if we've got tonyb here (it is a bit early in the day)
19:03:55 * tonyb waves
19:04:07 <clarkb> progress has been made on meetpad server upgrades. tonyb and I did testing the other day to figure out why the jvb wasn't used in the test env
19:04:20 <fungi> thanks for working through that!
19:04:21 <clarkb> tl;dr is that the jvb was trying to connect to the prod server and that made everything sad
19:04:31 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/905510 Upgrading meetpad service to jammy
19:04:42 <clarkb> this stack of changes is the current output of that testing and effort from tonyb
19:05:08 <clarkb> reviews would be great. I think we can safely land all of those changes since they shouldn't affect prod but only improve our ability to test the service in CI
19:06:02 <clarkb> tonyb: anything else to call out on this topic? definitely continue to ping me if you need extra eyeballs
19:06:56 <tonyb> I think meetpad is under control. As I mentioned I've started thinking about the "other" servers so if people have time looking at my ideas for wiki.o.o would be good
19:07:28 <fungi> where were those ideas again?
19:07:39 <tonyb> Is there a reason to keep hound on focal? or is the main idea to get off of bionic first
19:07:51 <clarkb> tonyb: only that the priority was to ditch the older servers first
19:07:56 <clarkb> hound should run fine on jammy
19:08:08 <tonyb> fungi: https://etherpad.opendev.org/p/opendev-bionic-server-upgrades#L58
19:08:14 <tonyb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades#L58
19:08:16 <fungi> thanks!
19:08:27 * clarkb makes a note to review the notes
19:08:42 * fungi notes clarkb's note to review the notes
19:08:48 <tonyb> Okay I'll add hound to my list. it looks like it'll be another simple one
19:09:20 <fungi> i wish gitea would gain feature parity with hound, and then we could decommission another almost redundant service
19:09:20 <corvus> i wonder if anything with gitea code search has changed
19:09:27 <clarkb> corvus: it's still "weird"
19:09:37 <corvus> bummer
19:09:40 <tonyb> Assuming that none of the index/cached data is critical and can/will be regenerated on a new server
19:09:57 <fungi> tonyb: yes, there's no real state maintained there. it can just be replaced whole
19:10:09 <clarkb> the latest issue we discovered was that you need to update files for it to notice things have changed to update the index. It doesn't just reindex everything all at once so sometimes things get missed and don't show up
19:10:26 <clarkb> this is for gitea not hound. Hound reindexes everything on startup then does incremental pulls to reindex over time
19:10:41 <fungi> also i prefer that hound gives you all matching lines from a repo, rather than just the repos which matched and one or two example matches
19:11:13 <fungi> i guess that could be a feature request for a search option or something
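For reference on the behaviour clarkb describes: hound builds its full index from a config.json when it starts and then polls each repo on a timer for incremental updates. A minimal sketch of such a config, using an example repo, dbpath, and poll interval rather than anything from the production deployment:

```shell
# Illustrative hound config.json; repo, dbpath, and poll interval are examples,
# not the real mirror of our codesearch deployment.
cat > config.json <<'EOF'
{
  "max-concurrent-indexers": 2,
  "dbpath": "data",
  "repos": {
    "opendev/system-config": {
      "url": "https://opendev.org/opendev/system-config.git",
      "ms-between-poll": 600000
    }
  }
}
EOF
```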
19:11:27 <clarkb> #topic Python Container Updates
19:11:36 <clarkb> Not to cut the previous discussion short but want to ensure we get time for everything today
19:11:46 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/905018 Drop Bullseye python3.11 images
19:11:58 <clarkb> This is a cleanup change that we can make to our image builds after zuul-registry updated
19:12:12 <clarkb> we are still waiting on zuul-operator to switch in order to remove the last bullseye image
19:12:40 <clarkb> I also used Zuul as a test platform for the 3.12 images and after fixing an import for a removed module it all seems to be working
19:13:02 <clarkb> My main concern was that we might have 3.12 changes that affect the images themselves but that doesn't appear to be the case
19:13:59 <clarkb> #topic Upgrading Zuul's DB Server
19:14:21 <clarkb> There were hardware issues? on the hosting side that caused the existing db server to be migrated
19:14:42 <clarkb> frickler: found at least one build whose final status wasn't recorded properly in the db and the short outage for that migration appears to be the cause
19:15:08 <clarkb> I don't think this drastically changes the urgency of this move, but does give more fuel to the replace it fire I guess
19:15:31 <frickler> nice wording, I agree to that
19:15:44 <clarkb> that said I still haven't found much time to think it through. I'm somewhat inclined to go with the simplest thing closest to what we're already doing as a result. Spin up a single mysql/mariadb server with the intent of scaling it up later should we decide to
19:16:10 <tonyb> Yeah I think that's my preference as well
19:17:25 <tonyb> it *seems* like we can add slaves after the fact with a small outage
19:17:26 <clarkb> we don't have to agree to that now, but raising objections before the next meeting would be good and we could write that down somewhere as the plan otherwise
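To flesh out tonyb's point about adding replicas later: a rough sketch of attaching a replica to an existing standalone MariaDB server after the fact, assuming mariabackup is available and using placeholder host names, paths, and credentials rather than anything agreed for the actual deployment:

```shell
# Rough sketch only; hostnames, paths, and credentials are placeholders.
# 1. On the primary, enable binary logging in my.cnf and restart mariadb
#    (this restart is the "small outage"):
#      [mysqld]
#      server_id = 1
#      log_bin   = /var/lib/mysql/mariadb-bin
# 2. Still on the primary, take a consistent snapshot and note the binlog
#    position it corresponds to:
mariabackup --backup --target-dir=/backup/zuul-db --user=root
mariabackup --prepare --target-dir=/backup/zuul-db
cat /backup/zuul-db/xtrabackup_binlog_info    # e.g. "mariadb-bin.000042 1234"
# 3. Copy the prepared snapshot into the new replica's datadir, start mariadb
#    there, then point it at the primary using the recorded position:
mysql -e "CHANGE MASTER TO MASTER_HOST='zuul-db01.example.com',
          MASTER_USER='repl', MASTER_PASSWORD='secret',
          MASTER_LOG_FILE='mariadb-bin.000042', MASTER_LOG_POS=1234;
          START SLAVE;"
mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running'
```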
19:19:00 <clarkb> #topic Matrix Homeserver Hosting
19:19:15 <clarkb> Last week fungi reached out to EMS about updating our hosting service/plan
19:19:44 <clarkb> fungi: is there anything to add to that? I assume that our existing service will continue running as is without outage or config changes, it's merely the business side that changes?
19:20:12 <fungi> yeah, it sounds like they'll just update our hosting plan, bill the foundation, and increase our quota for user count
19:20:30 <fungi> but they're supposed to get up with us by the end of the month with details
19:20:59 <clarkb> thank you for digging into that. I guess we'll practice patience until they get back to us
19:21:12 <tonyb> hopefully billing for the new plan doesn't start until ^^ happens
19:21:16 <fungi> the up side is we no longer have to be quite so careful about the number of accounts we create on it, since the limit for that's increasing from 5 to 20
19:21:32 <fungi> tonyb: yes, it will take effect when our current plan runs out, they said
19:22:18 <tonyb> cool beans
19:22:24 <fungi> and they're contacting us at least a week prior to that
19:22:40 <fungi> which is all the info i have for now
19:22:48 <clarkb> #topic OpenAFS Quota Issues
19:22:53 <fungi> folks with access to the infra-root inbox can also read the e-mail they sent me
19:23:09 <tonyb> Oh that reminds me ...
19:23:38 <tonyb> fungi: IIUC you use mutt to access that ... can you share a redacted mutt.config?
19:23:47 <clarkb> I spent some time digging into this last week
19:23:53 <fungi> tonyb: sure, we'll catch up after the meeting
19:24:00 <tonyb> perfect
19:24:06 <clarkb> There were no obvious issues with extra arches in the ubuntu-ports volume. That would've been too easy :)
19:24:32 <clarkb> However, I think we can probably start the process to clean up ubuntu bionic in ports and maybe debian buster
19:25:19 <clarkb> The xenial indexes are still there, not getting updated, and the packages don't appear to be there either. This is a minimal cleanup opportunity since it is just the indexes but we could clear those out when we clear out the others as I expect it to take the same process
19:25:21 <fungi> "ubuntu ports" in this case means arm64 packages, for clarification
19:25:38 <fungi> not amd64
19:25:41 <clarkb> right
19:25:55 <clarkb> on the ubuntu amd64 mirror volume I noticed that we actually mirror a number of source packages
19:26:11 <clarkb> unfortunately it seems like reprepro uses these to detect package updates and we can't easily remove them?
19:26:28 <clarkb> (we'd have to do a post reprepro step to delete them I think and then have indexes that point at files that don't exist which is not nice)
19:26:38 <clarkb> at least I wasn't able to find a reprepro flag to not mirror those packages
19:26:48 <fungi> well, it would be the "source indices" which aren't the same files
19:27:09 <clarkb> oh maybe we can delete those too then?
19:27:37 <clarkb> I suspect this would be a pretty good size reduction if we can figure out a good way to remove those files
19:27:55 <fungi> but also we might be veering close to violating licenses for some packages if we don't also host sources for them
19:28:25 <fungi> since we technically are redistributing them
19:28:27 <clarkb> even as a mirror of an upstream that isn't modifying content or code?
19:29:13 <fungi> then again, we omit mirroring sources for rpm-based distros right?
19:29:18 <clarkb> yes I think so
19:29:26 <fungi> so maybe don't worry about it unless someone tells us we need to
19:29:30 <clarkb> my non-lawyer understanding of the gpl for example is that we'd need to provide sources if asked
19:29:37 <clarkb> so we'd go grab the source for the thing and provide that
19:29:45 <fungi> right
19:29:49 <clarkb> not that we necessarily have to provide it next to the binary package
19:30:13 <fungi> the bigger risk there is if we continue to serve packages which are no longer available elsewhere
19:30:36 <fungi> since actually fulfilling such requests would become challenging
19:30:47 <clarkb> ya I suppose that is possible but also not super likely for ubuntu in particular who has everything in source control. Might be a pain to dig out of launchpad though
19:31:29 <clarkb> I'm also open to other ideas for improving disk consumption. The other thing I wanted to look into but haven't done yet is whether or not the stream mirrors are still providing many copies of certain packages that might be prunable
19:32:05 <frickler> but that's another good reason to drop eoled versions, I like that ;)
19:32:25 <clarkb> ya I think we can start there
19:32:31 <fungi> frickler: agreed
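On the reprepro question above: there doesn't appear to be a command-line flag for this, but what gets downloaded is driven by the Architectures: lists in reprepro's conf/distributions and conf/updates files, where "source" is treated as just another architecture. A hedged sketch, assuming a placeholder basedir and untested against the actual mirror-update configuration:

```shell
# Show which architectures (including "source") each distribution carries;
# /srv/reprepro/ubuntu stands in for wherever our reprepro basedir lives.
grep -n '^Architectures' /srv/reprepro/ubuntu/conf/distributions
#   e.g.  Architectures: amd64 i386 source
# After removing "source" from those lines (and from the matching update rules
# in conf/updates), clear the vanished entries from the package database and
# prune the files nothing references any more:
reprepro -b /srv/reprepro/ubuntu clearvanished
reprepro -b /srv/reprepro/ubuntu deleteunreferenced
```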
19:33:32 <clarkb> #topic Broken Wheel Cache/Mirror Builds
19:34:14 <clarkb> one issue was that openafs was not working on arm64 nodes for centos stream. The problem there appears to be that our images stopped getting rebuilt on nb04 which meant our test nodes had old kernels
19:34:21 <clarkb> the old kernels didn't work with new openafs packaging stuff
19:34:36 <clarkb> I cleaned up nb04 and we have new images now
19:35:08 <clarkb> https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/905270 merged after the images updated
19:35:16 <clarkb> I don't know whether or not we've built a new package yet?
19:35:46 <clarkb> but progress anyway. Would be good to finish running that down and tonyb indicated an interest in doing that
19:36:05 <tonyb> Yup.
19:37:11 <fungi> it was pretty late when i approved that, i think, so may need to wait until later today
19:37:32 <clarkb> ack
19:37:34 <clarkb> #topic Gitea disks filling
19:37:56 <clarkb> The cron job configuration merged but looking at app.ini timestamps and process timestamps I don't think we restarted gitea to pick up the changes
19:38:19 <clarkb> we've got two other related changes and I'm thinking we get those reviewed and possibly landed then we can do a manual rolling restart if still necessary
19:38:21 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/904868 update robots.txt on upstream's suggestion
19:38:27 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/905020 Disable an unneeded cron job in gitea
19:39:28 <tonyb> They both have enough votes to approve
19:39:56 <clarkb> oh cool I think I missed that when looking at gerrit
19:39:57 <tonyb> so yeah I think we're good to land and rolling restart
19:40:20 <clarkb> I think we want to approve 905020 first in case 904868 does an automated rolling restart
19:40:58 <clarkb> I hesitate to approve myself as we're supposed to have an ice storm starting in about 3 hours
19:41:06 <clarkb> but can approve wednesday if I haven't lost power
19:41:19 <corvus> consider rescheduling the ice storm
19:41:36 <clarkb> I wish. I'm tired of this cold weather. Tomorrow will be above freezing for the first time since Friday
19:42:00 <tonyb> I can approve it and watch for updates and restarts
19:42:09 <clarkb> thanks
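A rough version of the timestamp comparison clarkb mentions for checking whether a gitea backend still needs a restart; the container name filter and the app.ini path below are guesses for illustration, not the actual deployment layout:

```shell
# When the running container started vs. when the rendered config last changed.
docker ps --format '{{.Names}}\t{{.Status}}' | grep -i gitea   # e.g. "Up 3 weeks"
sudo stat -c '%y %n' /var/gitea/conf/app.ini                   # config mtime
# If app.ini is newer than the container start time, the new settings are not
# live yet and a rolling restart is still needed.
```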
19:42:13 <clarkb> #topic OpenDev Service Coordinator Election
19:42:27 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TB2OFBIGWZEYC7L4MCYA46EXIX5T47TY/
19:42:39 <clarkb> I made an election schedule official on the mailing list
19:43:00 <clarkb> I continue to be happy for someone else to step into the role and will support whoever that may be
19:43:12 <clarkb> I also continue to be willing to do the work should others prefer not to
19:43:28 <clarkb> One nice thing about getting the schedule out now is it should give everyone plenty of time to consider it :)
19:43:57 <fungi> you deserve a break, but also cycling back through those of us who were infra ptls is probably equally counter-productive
19:44:08 <fungi> a new volunteer would be ideal
19:45:17 * tonyb sees the amount of work clarkb does and is awestruck
19:45:51 <clarkb> heh I'm just chasing the electrons around. Eventually I might catch one
19:46:22 <clarkb> #topic Open Discussion
19:46:38 <clarkb> As mentioned earlier I'm thinking it would be a good idea for us to try out the PTG again
19:47:39 <tonyb> Yup I think that'd be good.
19:47:40 <clarkb> the reason for that is we've got a number of pots on the fire between general maintenance (gitea upgrades, container image updates, etc), fixing issues that arise, and future looking stuff that I think some focused time would be good
19:48:15 <fungi> mmm, hotpot
19:48:21 <fungi> count me in!
19:48:24 <clarkb> I wanted to bring it up because we said we could also just schedule time if we need it. I'm basically asserting that I think we could use some focused time and I'm happy to organize it outside of/before the PTG as well or as an alternative
19:49:00 <tonyb> focused time is good.
19:49:00 * frickler would certainly prefer something outside of the PTG
19:49:15 <clarkb> frickler: that's good feedback
19:49:56 <tonyb> I guess during the PTG could make it hard for us "as a team" to cover off attending other projects' sessions
19:50:03 <clarkb> tonyb: ya that's the struggle
19:50:05 <tonyb> (including the TC)
19:50:46 <clarkb> given frickler's feedback I'm thinking that maybe the best thing would be to schedule a couple of days prior to the PTG (maybe as early as February) and then also sign up for the PTG and just do our best during the bigger event
19:51:24 <fungi> we can also call it ptg prep, and use it for last-minute testing of our infrastructure
19:51:35 <fungi> meetpad, etherpad, ptgbot, etc
19:51:42 <clarkb> I'll be able to look at a calendar and cross check against holidays and other events after the OIL episode. I'll try to get that out Fridayish
19:51:45 <clarkb> fungi: ++
19:51:55 <fungi> as opposed to doing it after the ptg
19:52:09 <clarkb> fungi: yup I think we should definitely do it before
19:52:27 <clarkb> even the PTG feels a bit late, but having something in February and something in April seems not too crazy
19:52:27 <tonyb> Works for me
19:52:56 <tonyb> I don't think I'm able to relocate to the US for the PTG this time which sucks :(
19:54:20 <clarkb> :/ we'll just figure out how to make timezones work as best we can
19:54:30 <clarkb> Anything else in our last ~6 minutes?
19:54:57 <frickler> what about the expiring ssl certs?
19:55:12 <clarkb> frickler: tonyb and I were going to do the linaro one today
19:55:23 <clarkb> I'll remind the openstack.org sysadmins about the other one
19:55:31 <frickler> there were also some more coming up recently
19:56:13 <frickler> mirror02.ord.rax.opendev.org and review.opendev.org
19:56:15 <fungi> yeah, i saw those, i suspect something started failing the letsencrypt periodic job, need to check the logs
19:56:25 <clarkb> either that or we need to restart apache to clear out old workers
19:56:40 <fungi> other possibility is apache processes clinging to old data, yep
19:57:07 <frickler> ok, so everything under control, fine
19:57:09 <clarkb> looks like certs on the mirror node are not new files so likely the jobs
19:57:29 <clarkb> it's a good callout and why we alert with 30 days of warning so that we can fix whatever needs to be fixed in time :)
19:58:02 <fungi> full ack
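A quick way to tell the two suspected causes apart (a failing renewal job versus apache workers holding an old certificate); the hostname is one of the certs frickler listed, and the on-disk path is an assumption about where the acme tooling writes certificates:

```shell
# Expiry of the cert apache is actually serving:
echo | openssl s_client -connect mirror02.ord.rax.opendev.org:443 \
    -servername mirror02.ord.rax.opendev.org 2>/dev/null | openssl x509 -noout -enddate
# Expiry of the cert on disk (path is a guess at the LE cert location):
sudo openssl x509 -noout -enddate \
    -in /etc/letsencrypt-certs/mirror02.ord.rax.opendev.org/mirror02.ord.rax.opendev.org.cer
# Same date in both places: renewal never happened, so check the periodic
# letsencrypt job logs. On-disk cert newer: old workers are still serving the
# previous cert and a graceful reload should clear them:
sudo systemctl reload apache2
```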
19:58:28 <clarkb> thank you for your time everyone
19:58:35 <clarkb> I'll call it here
19:58:37 <clarkb> #endmeeting