19:00:18 <clarkb> #startmeeting infra
19:00:18 <opendevmeet> Meeting started Tue Aug 12 19:00:18 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:18 <opendevmeet> The meeting name has been set to 'infra'
19:00:24 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/B2OLLYGACZDURRJWKZSHE34JVQ5HT5RY/ Our Agenda
19:00:28 <clarkb> #topic Announcements
19:00:38 <clarkb> I didn't have anything to announce. Did anyone else?
19:01:41 <fungi> not i
19:02:17 <clarkb> ok, I'll probably keep things rolling today as we have a lot of agenda items. I believe we should be able to get through them all in our hour but want to do my best to ensure we do
19:02:21 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:02:43 <clarkb> I don't have anything new on this item. It's perpetually on my list of todos but requires I be able to sit down for a good chunk of time and not get distracted, and that hasn't happened :/
19:02:59 <clarkb> Were there any new questions, comments, or concerns on this subject?
19:03:30 <fungi> i didn't have any
19:04:20 <clarkb> #topic Upgrading old servers
19:04:28 <clarkb> This topic does have updates, largely due to the efforts of fungi
19:04:38 <clarkb> fungi: want to fill us in on the refstack and eavesdrop01 changes?
19:04:43 * fungi deletes things
19:05:11 <fungi> eavesdrop was a pretty straightforward server replacement with a cinder volume move
19:05:21 <fungi> old server has now been cleaned up along with its dns records
19:05:53 <fungi> refstack server has been imaged for archival and deleted, dns records for that cleaned up as well as our deployment automation and documentation purged
19:06:14 <fungi> the refstack service sunsetting was announced to the openstack-discuss ml in case anyone asks
19:06:34 <fungi> since it was an openstack-specific service
19:07:18 <fungi> any questions? presumably no
19:07:20 <clarkb> thank you for taking care of that. These updates are important for two reasons. The first is it concludes the upgrades for the "easy" list of servers that need upgrading. And second, this means all of our python containers are running on Ubuntu Noble now, which means we can potentially make changes to our base container image locations (which is a topic for later)
19:07:32 <clarkb> The leftovers are AFS servers, kerberos servers, graphite, and backup servers
19:07:39 <clarkb> #link https://etherpad.opendev.org/p/opendev-server-replacement-sprint
19:07:42 <clarkb> I updated the lists on ^
19:08:02 <clarkb> Of those servers I think the afs and kerberos servers may want to be done in place rather than with replacements
19:08:31 <clarkb> due to dns record management for those services as well as the size of the data, it's just easier this way I suspect. And we have redundancy that allows us to step through it one at a time
19:08:35 <fungi> i can probably arrange to drive that if it's the plan, i'm rather well-versed in in-place debuntu upgrades
19:08:50 <clarkb> then graphite and backup servers would follow our normal new server replacement process
19:08:53 <clarkb> fungi: that would be great
19:09:05 <clarkb> I suspect of those in-place upgrades the kerberos servers are easiest
19:09:13 <corvus> i agree that in-place replacement sounds best
19:09:13 <clarkb> then the afs db servers, then the afs fileservers
19:09:20 <fungi> afs servers probably won't be that hard either
19:09:47 <fungi> especially since we build our own packages
19:10:02 <fungi> so the openafs versions aren't really changing much, if at all
19:10:06 <clarkb> I'm happy to help as well
19:10:32 <clarkb> we can sync up outside of the meeting and start sketching out some plans (and not even necessarily today, just some point soon hopefully)
19:10:39 <fungi> yep, sgtm
19:10:42 <clarkb> any other inputs on this subject?
19:11:41 <clarkb> #topic Matrix for OpenDev comms
19:11:47 <clarkb> #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing
19:12:13 <clarkb> I pushed a new patchset yesterday to address the reviews I got. Thank you for the feedback. Looks like ianw has some followup feedback I should probably go ahead and address today as well
19:12:25 <clarkb> then once that new patchset is up, followup reviews would be appreciated
19:12:48 <clarkb> I think we can keep conversation in the review, I just wanted to call out that there was an update and there should be another one soon
19:13:05 <clarkb> #topic Working through our TODO list
19:13:10 <clarkb> #link https://etherpad.opendev.org/p/opendev-running-todo-list
19:13:10 <corvus> did you find out whether ems will run mjolnir?
19:13:23 <clarkb> #udno
19:13:25 <clarkb> #undo
19:13:25 <opendevmeet> Removing item from minutes: #link https://etherpad.opendev.org/p/opendev-running-todo-list
19:13:28 <clarkb> #undo
19:13:29 <opendevmeet> Removing item from minutes: #topic Working through our TODO list
19:13:47 <clarkb> corvus: I couldn't find anything in their docs indicating they do, but didn't log in to check the toggles in the dashboard
19:14:05 <clarkb> corvus: but reading up on mjolnir it seems very straightforward to run like one of our other bots, so that was what I proposed in the spec
19:14:28 <corvus> ok. we should probably check the dashboard, but ack, agree we can just run it if we need to
19:14:39 <clarkb> it's basically a container with a filesystem bind mount for data storage and a config file configuring its management room (where anyone in that room gets to control the bot)
19:15:00 <corvus> ++
19:15:12 <corvus> (that's all i had on this topic; thx)
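For the curious, a minimal sketch of the config file mentioned above. The key names are recalled from mjolnir's upstream sample config and the hostnames, room alias, and paths are placeholders, so verify everything against the mjolnir docs (and the matrixdotorg/mjolnir image) before relying on it:

```yaml
# production.yaml sketch for mjolnir (all values hypothetical)
homeserverUrl: "https://matrix.example.org"   # homeserver the bot talks to
accessToken: "<bot account access token>"
managementRoom: "#mjolnir-mgmt:example.org"   # anyone in this room controls the bot
dataPath: "/data/storage"                      # lives on the container's bind mount
```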
19:15:15 <clarkb> #topic Working through our TODO list
19:15:20 <clarkb> #link https://etherpad.opendev.org/p/opendev-running-todo-list
19:15:37 <clarkb> This is a reminder that if you get bored and want to pick up something new, starting with this list is a good start
19:16:01 <clarkb> I think I'll drop this off of the meeting agenda after this meeting though, as the list has a proper location that should be discoverable at this point, so I don't think manual reminders in the meeting are as valuable
19:16:26 <clarkb> feel free to update that list. The idea is that it's informal and somewhere we can capture the very beginnings of efforts before they become specs or changes etc
19:17:11 <clarkb> #topic Pre PTG Planning
19:17:19 <clarkb> #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document
19:17:29 <clarkb> Proposed times: Tuesday October 7 1800-2000 UTC, Wednesday October 8 1500-1700 UTC, Thursday October 9 1500-1700 UTC
19:18:03 <clarkb> I think fungi and I actually have a conflict on the 7th from 1800-1900 UTC, but we should be able to reschedule that conflict. And this way I think we can avoid having multiple meetings that day and just replace our meeting with the pre ptg
19:18:33 <fungi> right, i can be flexible
19:18:33 <clarkb> Please leave feedback on the proposed times as well as the topics (and/or add your own topic ideas)
19:19:09 <clarkb> On my own I was able to come up with a good set of ideas, which helps validate that this is a useful thing to do
19:19:16 <tonyb> I'm still planning to be in MN for the pre-PTG
19:19:43 <clarkb> tonyb: those time blocks should be CDT friendly. But let me know if they aren't for some reason
19:19:50 <fungi> clarkb: the meeting you're thinking of isn't on the 7th from what my calendar claims
19:19:56 <clarkb> fungi: oh good
19:20:01 <tonyb> They seem good to me :)
19:20:17 <fungi> that's the off-week for the meeting
19:20:41 <clarkb> and if we end up not needing all three blocks of time we can cancel and/or stop early
19:21:01 <clarkb> I just want to have a reasonable amount of time available to us upfront so that we have the opportunity to use it if necessary
19:21:24 <clarkb> and ya, please update the topics list with your own ideas
19:22:12 <clarkb> #topic Service Coordinator Election Planning
19:22:18 <clarkb> Nomination Period open from August 5, 2025 to August 19, 2025. If necessary we will hold an election from August 20, 2025 to August 27, 2025. All date ranges and times will be in UTC.
19:22:24 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YXRD23ZWJGDPZ3WESBNZNEYO7NBCXFT4/
19:22:39 <clarkb> This is your "one week remaining" warning notice
19:22:55 <clarkb> I'm more than happy to support someone in this role if there is interest. Let me know if there are any questions about what is involved
19:23:39 <fungi> also similarly happy to help support someone in that role who isn't me ;)
19:24:01 <clarkb> #topic Loss of upstream Debian bullseye-backports mirror
19:24:20 <clarkb> the workaround to have reprepro ignore errors landed, so we'll continue to update debian packages that do exist
19:25:09 <clarkb> but now we need to make a plan for the longer term. I think what I'd like to see is to have opendev base jobs stop configuring backports by default on debian, to match the upstream behavior. Then we can delete the bullseye backports content and the xenial and bionic-arm64 content
19:25:36 <clarkb> I think the process for all three of those is basically identical, so bundling them all up and doing it at once helps avoid unnecessary context switching
19:25:49 <fungi> the only question i have is whether it makes sense to announce and adjust in zuul/zuul-jobs instead
19:26:08 <fungi> because if opendev has this problem, other users of that role also may
19:26:37 <clarkb> that is a good point. I think if we do this for bullseye alone we can avoid big announcements as upstream has forced our hand. But if we do it for all of debian then we should announce
19:26:38 <fungi> but as corvus pointed out last week, that role is also on the road to deprecation if someone gets time to replace it with the new design
19:26:59 <clarkb> (and we should announce whether we do it for opendev or zuul or both)
19:27:21 <clarkb> my hunch here is that relatively little stuff is relying on backports, since it is a non-default opt-in choice
19:27:29 <fungi> yeah, i have no problem announcing whatever we do wherever we do it, just wondering if diverging from the zuul default makes sense in this case
19:27:31 <clarkb> so blast radius should be small and we should rip the bandaid off
19:28:10 <fungi> or if freezing the zuul stdlib implementation in order to reduce churn for anyone working on a replacement is better
19:28:28 <clarkb> ya, probably comes down to corvus' preference for the zuul community
19:28:43 <clarkb> if corvus thinks we should fix this at the root in zuul-jobs we'll do that, otherwise we'll modify things in opendev only
19:29:09 <corvus> i don't think anyone's working on a replacement... and i don't have strong feelings about it...
19:29:40 <corvus> i'm not 100% caught up on the issue
19:29:45 <fungi> there's also not a ton of urgency from our perspective, the workaround is functional but we'd like to be able to free up the space
19:29:59 <corvus> so i'd say: if it makes sense for everyone to change, then go ahead and do it in zuul jobs... otherwise meh
19:30:17 <clarkb> I think the main reason it applies broadly is it helps mimic upstream debian behavior by default better
19:30:26 <clarkb> which leads to fewer surprises as people are testing software
19:30:37 <corvus> sounds reasonable :)
19:30:49 <fungi> corvus: essentially upstream debian deleted the backports suite for bullseye a few weeks ago. the configure-mirrors role defaults to enabling the backports suite on debian nodes, so if we delete it from our mirrors then any bullseye jobs are going to fail unless they're overridden to disable backports in that role
19:31:42 <clarkb> sounds like we should make the change in zuul-jobs and announce it, then go from there?
19:31:57 <fungi> and yeah, disabling backports by default instead of enabling would be an easy switch in configure-mirrors, and would be closer to normal debian installs these days, but is a clear behavior change
19:32:25 <fungi> clarkb: probably the reverse. announce and *then* switch
19:32:32 <clarkb> er yes
19:32:37 <clarkb> we want to announce it before we break things
19:32:41 <fungi> agreed
19:32:48 <fungi> i can take care of that
19:32:53 <clarkb> thanks!
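For context on what "enabling the backports suite" means on a node: it amounts to one extra apt source entry pointing at the mirror. The mirror hostname below is illustrative, not one of our real mirror names:

```
# apt sources line the role effectively adds when backports are enabled
deb https://mirror.example.opendev.org/debian bullseye-backports main
```

Flipping the default just means this line is no longer written, so jobs see stock Debian behavior; a job that genuinely needs backports would opt back in explicitly.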
19:32:58 <clarkb> #topic Adding Ansible 11 to Zuul
19:33:07 <clarkb> Next up in potentially breaking changes for jobs run by zuul :)
19:33:20 <corvus> ha, but hopefully not!
19:33:22 <clarkb> we switched the zuul and opendev tenants to ansible 11 last week, and as far as I can tell things seem to be working
19:33:35 <clarkb> zuul in particular has been running a good number of jobs
19:33:54 <corvus> i haven't heard of any issues. should we switch the rest of the tenants? if so, when?
19:34:32 <clarkb> I think we should switch the other tenants. I think we can send an announcement ~today for a switch early next week? That way we can communicate to people that the change is happening and they can override the version back to 9 if things break
19:34:42 <clarkb> it's also probably just early enough in the openstack release cycle that we can work through problems
19:34:59 <fungi> yeah, much longer and we're getting close to the rush/freeze
19:35:23 <clarkb> if we didn't have the fallback to 9 in job overrides I would worry more. but we do have that fallback, so I think we should proceed
19:35:32 <corvus> sounds good
19:35:49 <clarkb> corvus: do you want to send that announcement or should I?
19:36:33 <corvus> clarkb: if you don't mind, that'd be great
19:37:00 <corvus> https://paste.opendev.org/show/bu8VuxDAKYEzai5Caik7/
19:37:02 <clarkb> ack /me scribbles a note
19:37:12 <corvus> i did just find an old copy of a message from you on the subject ^
19:37:31 <clarkb> I should be able to do that after lunch today
19:37:39 <corvus> #link old message https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/A7SOKNMQLZXZSTXUI4UOIIGOQHIQSMZ6/
19:37:45 <corvus> (yay archived-at header)
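For job owners wondering what the "override the version back to 9" fallback looks like: Zuul exposes the Ansible version as a standard job attribute, so pinning is a one-liner. The job name here is hypothetical:

```yaml
- job:
    name: example-job
    ansible-version: '9'   # stay on Ansible 9 while debugging failures under 11
```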
19:37:56 <clarkb> #topic Etherpad 2.4.2 Upgrade
19:38:02 <clarkb> #link https://github.com/ether/etherpad-lite/blob/v2.4.2/CHANGELOG.md
19:38:31 <clarkb> There are some etherpad updates. Nothing critical according to the release notes, but I've been doing my best to keep us up to date on services like this
19:38:44 <clarkb> Problem is the new versions break every skin except for colibris
19:38:52 <clarkb> #link https://github.com/ether/etherpad-lite/issues/7065 Theming updates have broken the no-skin and custom skin themes. We use no-skin.
19:39:05 <clarkb> someone else followed up on the issue I filed to point out it breaks custom skins too
19:39:18 <clarkb> Unfortunately, no feedback from the maintainers yet
19:39:52 <clarkb> I think our options are either wait more and see if they fix it, or consider using colibris. As mentioned, this doesn't appear to be urgent, but if you have 10 minutes it might be a good idea to check out colibris on the held node
19:40:06 <clarkb> 158.69.64.117 is the ip address to update in /etc/hosts for etherpad.opendev.org to do that
19:40:40 <clarkb> the biggest difference I find is that colibris mimics the sheet-of-paper approach used by google docs
19:41:08 <clarkb> this narrows the usable space of the pads. Not necessarily always a good or bad thing. Depends on context
19:41:34 <fungi> i personally find it unnecessarily wastes screen space
19:41:36 <clarkb> if you do check it out and have thoughts on whether or not it would be problematic for us, let me know. If we think it is a valid switch then we can proceed that way
19:42:38 <clarkb> fungi: ya, I feel like it may also imply that these are less ether pads and more perma pads
19:42:40 <corvus> yeah that's kind of silly, but, probably not a showstopper if we have to switch
19:42:44 <fungi> which isn't a show-stopper but feels like a regression
19:42:51 <fungi> yes, agreed
19:43:48 <clarkb> probably the thing to do is wait for now and see if they respond to my issue. Then if there are any important reasons to update sooner (security updates, major bug fixes) we can switch to colibris and do our best
19:44:10 <corvus> who needs more than 87 columns anyway? /s
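For anyone trying the held node, the override is the usual /etc/hosts trick:

```
# point the production hostname at the held node for local testing
158.69.64.117 etherpad.opendev.org
```

And the skin itself is selected by the skinName key in Etherpad's settings.json ("no-skin" is what we run today), so a switch would look roughly like this excerpt:

```json
{
  "skinName": "colibris"
}
```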
19:44:57 <clarkb> #topic Cleaning Up the Old voip.ms Account
19:45:14 <clarkb> yesterday fungi and I were made aware that the voip.ms account is making small infrequent charges
19:45:26 <clarkb> since we dropped asterisk and switched to meetpad we haven't been using this service
19:46:02 <clarkb> before we formally shut down the account/cancel the service I wanted to double check there wasn't a reason to avoid doing so. I think if we decide to tie POTS into meetpad via SIP we can always resurrect the account or create a new one at that point
19:46:43 <clarkb> oh hey, etherpad literally just responded with a question. I'll followup after the meeting
19:46:50 <corvus> i imagine the charges are for maintaining the DID (phone number); part of shutting it down will be releasing that
19:47:11 <clarkb> corvus: yes, that was what I thought. fungi thought it could also be spam calls
19:47:19 <corvus> so as long as we don't feel super attached to that number, that should be fine.
19:47:23 <clarkb> I think giving up the number is also fine. Using a new number shouldn't be a big impact
19:47:59 <corvus> not sure what the plan was, but i wouldn't expect non-terminated incoming calls to be charged
19:48:07 <clarkb> ack
19:48:19 <fungi> yeah, so probably not minutes getting eroded by random call-ups then
19:48:37 <corvus> last i looked it was something like $2/mo to maintain a did
19:48:48 <corvus> (and last i looked was 4 years ago)
19:49:24 <clarkb> to summarize, we would give up the existing phone number and do whatever the equivalent of shutting down the voip.ms account is (not sure what options they give us there)
19:49:49 <clarkb> and I'm not hearing any objections to that plan. Won't get to that until at least tomorrow, so if there are objections feel free to raise them after the meeting too
19:49:59 <corvus> sgctm
19:50:01 <corvus> sgtm
19:50:11 <fungi> wfm too
19:50:24 <clarkb> #topic Moving OpenDev's python-base/python-builder/uwsgi-base Images to Quay
19:50:43 <clarkb> As mentioned earlier in the meeting, all of our python based container images (that we build ourselves) are running on Ubuntu Noble now
19:51:03 <clarkb> that means we should be good to move the python base images to quay.io, then update all of the consumers of the images to pull them from there
19:51:21 <clarkb> and we shouldn't lose speculative image testing. However, there are a couple of caveats/concerns with that
19:52:20 <clarkb> The first is that zuul and vexxhost also use these images, so not sure how much communication we feel is necessary to ensure they don't accidentally use orphaned images. Then, these images are primarily used in the image build process, not the container deployment and testing process. That means that the existing changes to noble to use podman for deployments may not be
19:52:22 <clarkb> sufficient to ensure speculative testing continues to work
19:52:59 <clarkb> Docker build via buildx/buildkit does support alternative mirrors and is enabled by default in modern docker. But we may need extra configuration to ensure the mirrors are set?
19:53:18 <clarkb> alternatively we could switch to building with podman/buildah, and I think that should just work now after we update all the jobs
19:53:48 <clarkb> so I guess I'm asking if we think we need a deprecation announcement period before switching, and if we should continue building with docker or switch to buildah
19:54:46 <corvus> i think we should look into continuing to build with docker buildx/buildkit -- so looking into what we need to do to make the mirrors used there, if anything.
19:54:49 <fungi> i suppose uploading to both registries for a transitional period is a no-go (there was almost certainly a reason we didn't do it before now if not)
19:55:21 <clarkb> fungi: it is possible. I just want to avoid doing that long term if possible, since that would continue to make us vulnerable to docker rate limits
19:55:34 <clarkb> corvus: ok, I can dig into that a bit more to understand it better
19:55:40 <corvus> (i'm less confident in podman build and buildah; but not opposed if it works)
19:56:03 <corvus> i think a note to service-discuss should be fine. consider the zuul community notified already. :)
19:56:19 <clarkb> ack, thanks
19:56:34 <clarkb> so step 0 is become better informed on the docker build behavior, and then make an announcement and plan the switch
19:56:40 <clarkb> I'll work on that
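A note on what that "extra configuration" likely means, from memory of the buildkit docs: with the default docker driver, buildx should inherit dockerd's registry-mirrors setting from /etc/docker/daemon.json, while a docker-container builder instead takes a buildkitd.toml passed at create time. A sketch with an illustrative mirror hostname, to be verified as part of step 0:

```toml
# buildkitd.toml for a docker-container buildx builder
# (passed via: docker buildx create --config buildkitd.toml --use)
[registry."docker.io"]
  mirrors = ["mirror.example.opendev.org:8082"]
```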
19:56:45 <clarkb> #topic Open Discussion
19:56:54 <clarkb> We have a few minutes left. Anything we missed that is worth covering?
19:57:00 <corvus> and test between steps 0 and 1 to make sure the mirror stuff is right
19:57:25 <clarkb> ++
19:57:32 <corvus> it's not terrible if we have to roll back, but would be nice if we could avoid that and break our streak :)
19:58:01 <corvus> 1 quick thing
19:58:40 <corvus> there's an upcoming syntax change for zuul-launcher; i have the change to zuul-providers prepped: https://review.opendev.org/956946
19:59:06 <corvus> that's reviewed, and so is the zuul change; when the zuul change merges, i'll handle restarting zuul with it, then merging the zuul-providers change
19:59:43 <corvus> just wanted to let folks know that's upcoming
20:00:09 <clarkb> will test node boots break during the time between restarting the launcher and updating zuul-providers?
20:00:18 <clarkb> and if so, does that imply we'll need to force-merge that update?
20:00:36 <corvus> (during this unstable development period for niz, we might have the occasional backwards-incompat change like that -- basically, it's two commits in zuul, so it's just long enough for opendev to make the change without breaking).
20:01:02 <corvus> clarkb: nope ^ 3 changes total: add new syntax, update zuul-providers, remove old. so i'll sequence everything so it's never broken for opendev
20:01:07 <clarkb> got it
20:01:55 <clarkb> we're over time. Thank you everyone! We'll be back next week at the same time and location
20:01:55 <corvus> [eot]
20:01:59 <corvus> thanks!
20:02:00 <clarkb> #endmeeting