19:01:56 <fungi> #startmeeting infra
19:01:56 <opendevmeet> Meeting started Tue Feb  4 19:01:56 2025 UTC and is due to finish in 60 minutes.  The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:56 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:56 <opendevmeet> The meeting name has been set to 'infra'
19:02:30 <fungi> #link as always, the agenda is at https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:02:50 <fungi> #topic Announcements
19:03:13 <fungi> i don't have any announcements in mind, did anyone have anything that needs mentioning?
19:03:18 <clarkb> not me
19:03:45 <fungi> i'll skip over the actions and specs review sections since they seem to be similarly empty
19:04:00 <fungi> #topic Zuul-launcher image builds (corvus)
19:04:27 <fungi> the agenda mentions that zuul-launcher related configs (images, labels, providers, etc) have been refactored into a zuul-providers repo
19:04:38 <fungi> #link https://opendev.org/opendev/zuul-providers
19:04:53 <fungi> also that Zuul itself is starting to explore dogfooding of the new launcher managed images in jobs zuul runs
19:05:21 <fungi> i reviewed some of those changes, but don't really have any updates myself
19:05:26 <corvus> i was hoping we could keep the jobs separate, but they have to live together with the image definitions; so the idea is we'll (eventually) put that repo in all the tenants, but only load jobs/projects in the opendev tenant.
19:06:04 <corvus> anyway, that's almost exactly what was in opendev/zuul-jobs before, so anyone working on image jobs can just retarget that repo with no changes
19:06:24 <fungi> is the colocation of build jobs and image definitions a security requirement?
19:06:30 <corvus> yep
19:06:42 <fungi> sounds good to me
19:07:14 <fungi> any next steps? what should we be reviewing/changing now?
19:07:19 <corvus> i've run a job using a node on the launcher, including the new nodescan functionality
19:07:41 <corvus> i think next i'll probably propose something like having zuul run its unit test jobs on new-style nodes
19:07:47 <corvus> so we can get a little more volume on it
19:08:21 <fungi> cool so we can do a piecemeal migration from nodepool-managed labels to zuul-launcher labels?
19:08:41 <corvus> yep  (eventually; i wouldn't say we're ready for that yet)
19:08:54 <corvus> i think we want to try to catch some more bugs first :)
19:09:00 <fungi> sure, i just meant as a future low-impact migration plan
19:09:06 <corvus> yep definitely
19:09:13 <fungi> we're not stuck with a big-bang (much)
19:09:31 <corvus> i think we will have a sliding scale, and we can slide it backwards any time when the time comes
19:09:57 <fungi> great! anything else you want to note about this?
19:10:03 <corvus> that's it
19:10:38 <fungi> in that case let's move on to the next topic (unless anyone has questions, feel free to interrupt whenever)
19:10:53 <fungi> #topic Unpinning our Grafana deployment (clarkb)
19:11:04 <fungi> seems like major progress was made on this
19:11:28 <clarkb> ya this went more quickly than I anticipated
19:11:36 <fungi> the changes linked in the agenda are merged
19:11:47 <clarkb> we're running a noble grafana02 in production on the latest grafana 10 release as of an hour ago or so
19:12:04 <clarkb> the next steps will be cleaning up the old server once we're happy with this new one
19:12:12 <clarkb> and then figuring out an upgrade to grafana 11
19:12:21 <fungi> yeah, i was about to ask
19:12:29 <clarkb> I guess be on the lookout for cleanup changes as I'll try to get those up soon
19:12:43 <fungi> so probably similar process but in-place now that we're on a noble server?
19:12:46 <clarkb> ya
19:12:55 <clarkb> the 11 upgrade will probably require updates to grafyaml or our graph definitions though due to angular being deprecated in 10
19:13:14 <clarkb> it looks like 11 may have a toggle to reenable angular if it comes to that too but I think we should try and move our graphs away from angular first
19:13:22 <fungi> so hold an 11.x test node, see if it works, then let the upgrade deploy if no obvious issues are identified
19:13:38 <clarkb> yup with also debugging of angular deprecation on that held node
19:13:44 <fungi> oh, got it, we probably need adjustments to grafyaml
19:14:46 <fungi> anything else we should know? or anyone have questions?
19:14:55 <clarkb> not from me
19:15:25 <fungi> great, thanks for working on this!
19:15:37 <fungi> #topic Upgrading Old Servers (clarkb)
19:15:53 <clarkb> mostly this is a catch all for updates around this.
19:16:02 <clarkb> We did update launch node to error if we detect fewer than 2 cpus
19:16:06 <fungi> tonyb was working on the wiki, i might have missed updates if there were any
19:16:16 <clarkb> ya not sure if tonyb has any updates for wiki specifically
19:16:38 <fungi> and we've still got cacti, storyboard, and translate as well
19:17:21 <fungi> seems like probably nothing to cover here for now, aside from the change to detect problem instances in rax xen
19:17:31 <clarkb> ++
19:17:38 <fungi> changes i guess, with the typo correction
19:17:53 <fungi> #topic Sprinting to Upgrade Servers to Focal (clarkb)
19:18:00 <fungi> related to the previous topic
19:18:20 <clarkb> this is an idea I had that came out of doing the paste and grafana server replacements. The work itself is often fairly straightforward with most issues caught in CI before we deploy anything
19:18:37 <clarkb> then the major time sink is waiting for reviews on the various changes to update dns, add to inventory, reupdate dns etc
19:19:14 <fungi> i'll note that the topic is probably a typo
19:19:22 <fungi> i guess you meant upgrade to noble
19:19:22 <clarkb> oh yes
19:19:24 <clarkb> sorry
19:19:29 <fungi> no probs
19:19:57 <clarkb> so basically I was wondering if others would be willing to focus on this next week so that we can try and speed the process up and get some of the lower hanging fruit done
19:20:05 <fungi> i should have read the notes under it before i set the topic ;)
19:20:18 <fungi> i'm around next week for a sprint. did you have a particular day or days in mind?
19:20:21 <clarkb> part of my end state goal I'm hoping for is general confidence in noble and podman before we replace the gerrit server
19:20:43 <fungi> that would certainly be good to have
19:20:45 <clarkb> no particular days, probably start monday and end when we're tired of working on this specific set of tasks
19:21:05 <clarkb> and just ask people to try and help replace servers as well as review changes to replace servers
19:21:24 <fungi> to be clear, this is essentially a blocker to moving our images off dockerhub to quay, if we want to retain speculative testing of images, right?
19:21:43 <clarkb> yes
19:21:52 <clarkb> there are many reasons to do it
19:21:52 <fungi> just making sure i've got the motivation stated
19:22:05 <clarkb> better ci, less docker hub, less old ubuntu
19:22:18 <fungi> sure, upgrades are a good idea regardless, but at least to me that's the big carrot
19:22:28 <fungi> dockerhub equals pain
19:23:34 <clarkb> I think most of the platform specific gotchas have been addressed at this point. Now we just need to do the uplift hence the ask for focused time on it
19:23:48 <corvus> i also feel like we can be flexible about reviews on essentially "rote" changes... like if there's nothing too novel about an upgrade, we've all agreed it's a good idea and it's probably okay to push that through with minimal review
19:23:58 <clarkb> I'm happy with that too
19:24:12 <corvus> and if something novel comes up, flag it for more discussion
19:24:18 <clarkb> I like that
19:24:23 <fungi> yeah, i have been doing mostly single-core approvals on those if they come from another of our sysadmins and i plan to be around to keep an eye on things
19:24:41 <fungi> so if folks are interested in making a solid dent in the random dockerhub rate limit failures for our jobs, let's try to move a bunch of stuff to new enough ubuntu next week
19:24:52 <corvus> ++
19:25:47 <fungi> anything else we want to do right now for planning on this? or questions/concerns?
19:26:03 <clarkb> nope, later this week I'll try and put together a todo list that people can pick off
19:26:05 <clarkb> thanks!
19:27:18 <fungi> that would be great
19:27:54 <fungi> #topic Switch to quay.io/opendevmirror images where possible (clarkb)
19:28:04 <fungi> seems like we have a logical progression in topics
19:28:13 <clarkb> I tried to order them that way :)
19:28:23 <fungi> prescient
19:28:30 <clarkb> made progress on this last week but still have gerrit, zuul db, and I think one other to do
19:28:45 <clarkb> corvus: any concern with just doing this for zuul or should it be coordinated to minimize the loss of build records?
19:29:14 <fungi> zuul-db only really affects zuul-web services right? or will it cause reporting failures while it's down? i can't remember now
19:29:29 <corvus> it could cause reporting failures
19:29:33 <fungi> and yeah, we'd presumably lose some builds between the cracks
19:29:34 <clarkb> it will affect the record keeping of jobs that finish while the db restarts
19:29:34 <corvus> but it will also retry
19:29:38 <corvus> so do it fast enough it may be ok
19:29:56 <fungi> so we could probably get by without pausing the whole system
19:30:00 <clarkb> I think it is a relatively quick but not instantaneous restart. On the order of 15-30 seconds?
19:30:16 <clarkb> mostly in mariadb startup costs
19:30:19 <corvus> yeah... given we're not doing a release or anything, i'd say roll the dice :)
19:30:29 <fungi> i'm willing to do it early or late in my hours, or on a weekend, to minimize impact
19:30:29 <clarkb> wfm thanks
19:30:47 <corvus> (i mean, technically, this could happen any time if a hypervisor hiccups)
19:30:58 <clarkb> good point
19:31:09 <fungi> you make a really good point, we still have making the db ha as an outstanding task
19:31:28 <fungi> and we've considered the risk low
19:31:44 <fungi> so maybe just ~whenever (within reason)
19:32:21 <fungi> we can probably knock it out later this week in that case
19:32:28 <clarkb> ++
19:32:29 <corvus> ++
19:32:45 <fungi> any other points that bear raising on this?
19:32:53 <clarkb> not from me
19:33:17 <fungi> #topic Running certcheck on bridge (fungi)
19:33:33 <fungi> it said "clarkb" on the agenda but it's really me at this point
19:33:40 <fungi> and i haven't gotten to it yet, but this is a good reminder
19:34:13 <fungi> i don't really have anything to add, other than to note that i'm holding up one of the things we could move off the old cacti server
19:34:25 <fungi> and i should really find a few minutes to get to it
19:34:37 <fungi> #topic Service Coordinator Election (clarkb)
19:34:45 <fungi> congratulations! no, wait, too soon
19:34:47 <clarkb> This is a reminder that the nomination period opens today
19:35:13 <clarkb> I'm happy to answer questions if there is interest in someone else running
19:35:34 <fungi> as am i, and all our previous leaders
19:35:50 <fungi> (i'm speaking on their behalf. we'll get mordred back here yet)
19:36:32 <clarkb> ha
19:36:57 <fungi> it's totally rewarding, and nothing like whitewashing this here picket fence
19:37:42 <fungi> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/NGS2APEFQB45OCJCQ645P5N6XCH52BXW/ February 2025 OpenDev Service Coordinator Election
19:38:08 <fungi> #topic Working through our TODO list (clarkb)
19:38:28 <clarkb> And this is a reminder that we have a todo list that came out of meetup last month
19:38:50 <clarkb> I've been trying to use it to inform what I poke at (the first few items are directly related to launch node updates and grafana server replacement)
19:38:58 <fungi> #link https://etherpad.opendev.org/p/r.cb73d0388959699f27a517446dabaa71 2025q1 meetup notes
19:39:05 <clarkb> if you're lacking things to do feel free to take a look there and dive in
19:39:39 <fungi> an excellent reminder
19:39:55 <fungi> any specific items you want to call out as priorities from there?
19:40:01 <clarkb> not really
19:40:16 <fungi> let's get cracking!
19:40:29 <fungi> #topic Open discussion
19:41:01 <fungi> freeform poetry is welcome, or whatever you feel appropriate
19:42:00 <corvus> vogon poetry?
19:42:06 <clarkb> I'm going to step out now and try to get this headache under control. thanks everyone!
19:42:11 <fungi> thy micturations are to me...
19:42:27 <fungi> feel better!
19:42:43 <fungi> these services aren't going to coordinate themselves, after all
19:42:50 <corvus> ++
19:43:26 <fungi> in that case, enjoy the remaining 15-20 minutes for your preferred pastimes
19:43:34 <fungi> thanks everyone!
19:43:55 <fungi> #endmeeting