19:00:18 <clarkb> #startmeeting infra
19:00:18 <opendevmeet> Meeting started Tue Aug 12 19:00:18 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:18 <opendevmeet> The meeting name has been set to 'infra'
19:00:24 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/B2OLLYGACZDURRJWKZSHE34JVQ5HT5RY/ Our Agenda
19:00:28 <clarkb> #topic Announcements
19:00:38 <clarkb> I didn't have anything to announce. Did anyone else?
19:01:41 <fungi> not i
19:02:17 <clarkb> ok, I'll probably keep things rolling today as we have a lot of agenda items. I believe we should be able to get through them all in our hour but want to do my best to ensure we do
19:02:21 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:02:43 <clarkb> I don't have anything new on this item. It's perpetually on my list of todos but requires I be able to sit down for a good chunk of time and not get distracted, and that hasn't happened :/
19:02:59 <clarkb> Were there any new questions, comments, or concerns on this subject?
19:03:30 <fungi> i didn't have any
19:04:20 <clarkb> #topic Upgrading old servers
19:04:28 <clarkb> This topic does have updates, largely due to the efforts of fungi
19:04:38 <clarkb> fungi: want to fill us in on the refstack and eavesdrop01 changes?
19:04:43 * fungi deletes things
19:05:11 <fungi> eavesdrop was a pretty straightforward server replacement with a cinder volume move
19:05:21 <fungi> old server has now been cleaned up along with its dns records
19:05:53 <fungi> refstack server has been imaged for archival and deleted, dns records for that cleaned up as well as our deployment automation and documentation purged
19:06:14 <fungi> the refstack service sunsetting was announced to the openstack-discuss ml in case anyone asks
19:06:34 <fungi> since it was an openstack-specific service
19:07:18 <fungi> any questions? presumably no
19:07:20 <clarkb> thank you for taking care of that. These updates are important for two reasons. The first is it concludes the upgrades for the "easy" list of servers that need upgrading. And second, this means all of our python containers are running on Ubuntu Noble now, which means we can potentially make changes to our base container image locations (which is a topic for later)
19:07:32 <clarkb> The leftovers are AFS servers, kerberos servers, graphite, and backup servers
19:07:39 <clarkb> #link https://etherpad.opendev.org/p/opendev-server-replacement-sprint
19:07:42 <clarkb> I updated the lists on ^
19:08:02 <clarkb> Of those servers I think the afs and kerberos servers may want to be done in place rather than with replacements
19:08:31 <clarkb> due to dns record management for those services as well as the size of the data, it's just easier this way I suspect. And we have redundancy that allows us to step through it one at a time
19:08:35 <fungi> i can probably arrange to drive that if it's the plan, i'm rather well-versed in in-place debuntu upgrades
19:08:50 <clarkb> then graphite and backup servers would follow our normal new server replacement process
19:08:53 <clarkb> fungi: that would be great
19:09:05 <clarkb> I suspect of those in-place upgrades the kerberos servers are easiest
19:09:13 <corvus> i agree that in-place replacement sounds best
19:09:13 <clarkb> then the afs db servers, then the afs fileservers
19:09:20 <fungi> afs servers probably won't be that hard either
19:09:47 <fungi> especially since we build our own packages
19:10:02 <fungi> so the openafs versions aren't really changing much, if at all
19:10:06 <clarkb> I'm happy to help as well
19:10:32 <clarkb> we can sync up outside of the meeting and start sketching out some plans (and not even necessarily today, just some point soon hopefully)
19:10:39 <fungi> yep, sgtm
19:10:42 <clarkb> any other inputs on this subject?
19:11:41 <clarkb> #topic Matrix for OpenDev comms
19:11:47 <clarkb> #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing
19:12:13 <clarkb> I pushed a new patchset yesterday to address the reviews I got. Thank you for the feedback. Looks like ianw has some followup feedback I should probably go ahead and address today as well
19:12:25 <clarkb> then once that new patchset is up, followup reviews would be appreciated
19:12:48 <clarkb> I think we can keep conversation in the review, I just wanted to call out that there was an update and there should be another one soon
19:13:05 <clarkb> #topic Working through our TODO list
19:13:10 <clarkb> #link https://etherpad.opendev.org/p/opendev-running-todo-list
19:13:10 <corvus> did you find out whether ems will run mjolnir?
19:13:23 <clarkb> #udno
19:13:25 <clarkb> #undo
19:13:25 <opendevmeet> Removing item from minutes: #link https://etherpad.opendev.org/p/opendev-running-todo-list
19:13:28 <clarkb> #undo
19:13:29 <opendevmeet> Removing item from minutes: #topic Working through our TODO list
19:13:47 <clarkb> corvus: I couldn't find anything in their docs indicating they do, but didn't log in to check the toggles in the dashboard
19:14:05 <clarkb> corvus: but reading up on mjolnir it seems very straightforward to run like one of our other bots, so that was what I proposed in the spec
19:14:28 <corvus> ok. we should probably check the dashboard, but ack, agree we can just run it if we need to
19:14:39 <clarkb> it's basically a container with a filesystem bind mount for data storage and a config file configuring its management room (where anyone in that room gets to control the bot)
19:15:00 <corvus> ++
19:15:12 <corvus> (that's all i had on this topic; thx)
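For the curious, a minimal sketch of the config file mentioned above. The key names are recalled from mjolnir's upstream sample config and the hostnames, room alias, and paths are placeholders, so verify everything against the mjolnir docs (and the matrixdotorg/mjolnir image) before relying on it:

```yaml
# production.yaml sketch for mjolnir (all values hypothetical)
homeserverUrl: "https://matrix.example.org"   # homeserver the bot talks to
accessToken: "<bot account access token>"
managementRoom: "#mjolnir-mgmt:example.org"   # anyone in this room controls the bot
dataPath: "/data/storage"                      # lives on the container's bind mount
```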
19:15:15 <clarkb> #topic Working through our TODO list
19:15:20 <clarkb> #link https://etherpad.opendev.org/p/opendev-running-todo-list
19:15:37 <clarkb> This is a reminder that if you get bored and want to pick up something new, starting with this list is a good start
19:16:01 <clarkb> I think I'll drop this off of the meeting agenda after this meeting though, as the list has a proper location that should be discoverable at this point, so I don't think manual reminders in the meeting are as valuable
19:16:26 <clarkb> feel free to update that list. The idea is that it's informal and somewhere we can capture the very beginnings of efforts before they become specs or changes etc
19:17:11 <clarkb> #topic Pre PTG Planning
19:17:19 <clarkb> #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document
19:17:29 <clarkb> Proposed times: Tuesday October 7 1800-2000 UTC, Wednesday October 8 1500-1700 UTC, Thursday October 9 1500-1700 UTC
19:18:03 <clarkb> I think fungi and I actually have a conflict on the 7th from 1800-1900 UTC, but we should be able to reschedule that conflict. And this way I think we can avoid having multiple meetings that day and just replace our meeting with the pre ptg
19:18:33 <fungi> right, i can be flexible
19:18:33 <clarkb> Please leave feedback on the proposed times as well as the topics (and/or add your own topic ideas)
19:19:09 <clarkb> On my own I was able to come up with a good set of ideas, which helps validate that this is a useful thing to do
19:19:16 <tonyb> I'm still planning to be in MN for the pre-PTG
19:19:43 <clarkb> tonyb: those time blocks should be CDT friendly. But let me know if they aren't for some reason
19:19:50 <fungi> clarkb: the meeting you're thinking of isn't on the 7th from what my calendar claims
19:19:56 <clarkb> fungi: oh good
19:20:01 <tonyb> They seem good to me :)
19:20:17 <fungi> that's the off-week for the meeting
19:20:41 <clarkb> and if we end up not needing all three blocks of time we can cancel and/or stop early
19:21:01 <clarkb> I just want to have a reasonable amount of time available to us upfront so that we have the opportunity to use it if necessary
19:21:24 <clarkb> and ya, please update the topics list with your own ideas
19:22:12 <clarkb> #topic Service Coordinator Election Planning
19:22:18 <clarkb> Nomination Period open from August 5, 2025 to August 19, 2025. If necessary we will hold an election from August 20, 2025 to August 27, 2025. All date ranges and times will be in UTC.
19:22:24 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YXRD23ZWJGDPZ3WESBNZNEYO7NBCXFT4/
19:22:39 <clarkb> This is your "one week remaining" warning notice
19:22:55 <clarkb> I'm more than happy to support someone in this role if there is interest. Let me know if there are any questions about what is involved
19:23:39 <fungi> also similarly happy to help support someone in that role who isn't me ;)
19:24:01 <clarkb> #topic Loss of upstream Debian bullseye-backports mirror
19:24:20 <clarkb> the workaround to have reprepro ignore errors landed, so we'll continue to update debian packages that do exist
19:25:09 <clarkb> but now we need to make a plan for the longer term. I think what I'd like to see is to have opendev base jobs stop configuring backports by default on debian, to match the upstream behavior. Then we can delete the bullseye backports content and the xenial and bionic-arm64 content
19:25:36 <clarkb> I think the process for all three of those is basically identical, so bundling them all up and doing it at once helps avoid unnecessary context switching
19:25:49 <fungi> the only question i have is whether it makes sense to announce and adjust in zuul/zuul-jobs instead
19:26:08 <fungi> because if opendev has this problem, other users of that role also may
19:26:37 <clarkb> that is a good point. I think if we do this for bullseye alone we can avoid big announcements as upstream has forced our hand. But if we do it for all of debian then we should announce
19:26:38 <fungi> but as corvus pointed out last week, that role is also on the road to deprecation if someone gets time to replace it with the new design
19:26:59 <clarkb> (and we should announce whether we do it for opendev or zuul or both)
19:27:21 <clarkb> my hunch here is that relatively little stuff is relying on backports, since it is a non-default opt-in choice
19:27:29 <fungi> yeah, i have no problem announcing whatever we do wherever we do it, just wondering if diverging from the zuul default makes sense in this case
19:27:31 <clarkb> so blast radius should be small and we should rip the bandaid off
19:28:10 <fungi> or if freezing the zuul stdlib implementation in order to reduce churn for anyone working on a replacement is better
19:28:28 <clarkb> ya, probably comes down to corvus' preference for the zuul community
19:28:43 <clarkb> if corvus thinks we should fix this at the root in zuul-jobs we'll do that, otherwise we'll modify things in opendev only
19:29:09 <corvus> i don't think anyone's working on a replacement... and i don't have strong feelings about it...
19:29:40 <corvus> i'm not 100% caught up on the issue
19:29:45 <fungi> there's also not a ton of urgency from our perspective, the workaround is functional but we'd like to be able to free up the space
19:29:59 <corvus> so i'd say: if it makes sense for everyone to change, then go ahead and do it in zuul jobs... otherwise meh
19:30:17 <clarkb> I think the main reason it applies broadly is it helps mimic upstream debian behavior by default better
19:30:26 <clarkb> which leads to fewer surprises as people are testing software
19:30:37 <corvus> sounds reasonable :)
19:30:49 <fungi> corvus: essentially upstream debian deleted the backports suite for bullseye a few weeks ago. the configure-mirrors role defaults to enabling the backports suite on debian nodes, so if we delete it from our mirrors then any bullseye jobs are going to fail unless they're overridden to disable backports in that role
19:31:42 <clarkb> sounds like we should make the change in zuul-jobs and announce it, then go from there?
19:31:57 <fungi> and yeah, disabling backports by default instead of enabling would be an easy switch in configure-mirrors, and would be closer to normal debian installs these days, but is a clear behavior change
19:32:25 <fungi> clarkb: probably the reverse. announce and *then* switch
19:32:32 <clarkb> er yes
19:32:37 <clarkb> we want to announce it before we break things
19:32:41 <fungi> agreed
19:32:48 <fungi> i can take care of that
19:32:53 <clarkb> thanks!
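For context on what "enabling the backports suite" means on a node: it amounts to one extra apt source entry pointing at the mirror. The mirror hostname below is illustrative, not one of our real mirror names:

```
# apt sources line the role effectively adds when backports are enabled
deb https://mirror.example.opendev.org/debian bullseye-backports main
```

Flipping the default just means this line is no longer written, so jobs see stock Debian behavior; a job that genuinely needs backports would opt back in explicitly.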
19:32:58 <clarkb> #topic Adding Ansible 11 to Zuul
19:33:07 <clarkb> Next up in potentially breaking changes for jobs run by zuul :)
19:33:20 <corvus> ha, but hopefully not!
19:33:22 <clarkb> we switched the zuul and opendev tenants to ansible 11 last week, and as far as I can tell things seem to be working
19:33:35 <clarkb> zuul in particular has been running a good number of jobs
19:33:54 <corvus> i haven't heard of any issues. should we switch the rest of the tenants? if so, when?
19:34:32 <clarkb> I think we should switch the other tenants. I think we can send an announcement ~today for a switch early next week? That way we can communicate to people that the change is happening and they can override the version back to 9 if things break
19:34:42 <clarkb> it's also probably just early enough in the openstack release cycle that we can work through problems
19:34:59 <fungi> yeah, much longer and we're getting close to the rush/freeze
19:35:23 <clarkb> if we didn't have the fallback to 9 in job overrides I would worry more. but we do have that fallback, so I think we should proceed
19:35:32 <corvus> sounds good
19:35:49 <clarkb> corvus: do you want to send that announcement or should I?
19:36:33 <corvus> clarkb: if you don't mind, that'd be great
19:37:00 <corvus> https://paste.opendev.org/show/bu8VuxDAKYEzai5Caik7/
19:37:02 <clarkb> ack /me scribbles a note
19:37:12 <corvus> i did just find an old copy of a message from you on the subject ^
19:37:31 <clarkb> I should be able to do that after lunch today
19:37:39 <corvus> #link old message https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/A7SOKNMQLZXZSTXUI4UOIIGOQHIQSMZ6/
19:37:45 <corvus> (yay archived-at header)
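For job owners wondering what the "override the version back to 9" fallback looks like: Zuul exposes the Ansible version as a standard job attribute, so pinning is a one-liner. The job name here is hypothetical:

```yaml
- job:
    name: example-job
    ansible-version: '9'   # stay on Ansible 9 while debugging failures under 11
```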
19:37:56 <clarkb> #topic Etherpad 2.4.2 Upgrade
19:38:02 <clarkb> #link https://github.com/ether/etherpad-lite/blob/v2.4.2/CHANGELOG.md
19:38:31 <clarkb> There are some etherpad updates. Nothing critical according to the release notes, but I've been doing my best to keep us up to date on services like this
19:38:44 <clarkb> Problem is the new versions break every skin except for colibris
19:38:52 <clarkb> #link https://github.com/ether/etherpad-lite/issues/7065 Theming updates have broken the no-skin and custom skin themes. We use no-skin.
19:39:05 <clarkb> someone else followed up on the issue I filed to point out it breaks custom skins too
19:39:18 <clarkb> Unfortunately, no feedback from the maintainers yet
19:39:52 <clarkb> I think our options are either wait more and see if they fix it, or consider using colibris. As mentioned, this doesn't appear to be urgent, but if you have 10 minutes it might be a good idea to check out colibris on the held node
19:40:06 <clarkb> 158.69.64.117 is the ip address to update in /etc/hosts for etherpad.opendev.org to do that
19:40:40 <clarkb> the biggest difference I find is that colibris mimics the sheet-of-paper approach used by google docs
19:41:08 <clarkb> this narrows the usable space of the pads. Not necessarily always a good or bad thing. Depends on context
19:41:34 <fungi> i personally find it unnecessarily wastes screen space
19:41:36 <clarkb> if you do check it out and have thoughts on whether or not it would be problematic for us, let me know. If we think it is a valid switch then we can proceed that way
19:42:38 <clarkb> fungi: ya, I feel like it may also imply that these are less ether pads and more perma pads
19:42:40 <corvus> yeah that's kind of silly, but, probably not a showstopper if we have to switch
19:42:44 <fungi> which isn't a show-stopper but feels like a regression
19:42:51 <fungi> yes, agreed
19:43:48 <clarkb> probably the thing to do is wait for now and see if they respond to my issue. Then if there are any important reasons to update sooner (security updates, major bug fixes) we can switch to colibris and do our best
19:44:10 <corvus> who needs more than 87 columns anyway? /s
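For anyone trying the held node, the override is the usual /etc/hosts trick:

```
# point the production hostname at the held node for local testing
158.69.64.117 etherpad.opendev.org
```

And the skin itself is selected by the skinName key in Etherpad's settings.json ("no-skin" is what we run today), so a switch would look roughly like this excerpt:

```json
{
  "skinName": "colibris"
}
```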
19:44:57 <clarkb> #topic Cleaning Up the Old voip.ms Account
19:45:14 <clarkb> yesterday fungi and I were made aware that the voip.ms account is making small infrequent charges
19:45:26 <clarkb> since we dropped asterisk and switched to meetpad we haven't been using this service
19:46:02 <clarkb> before we formally shut down the account/cancel the service I wanted to double check there wasn't a reason to avoid doing so. I think if we decide to tie POTS into meetpad via SIP we can always resurrect the account or create a new one at that point
19:46:43 <clarkb> oh hey, etherpad literally just responded with a question. I'll followup after the meeting
19:46:50 <corvus> i imagine the charges are for maintaining the DID (phone number); part of shutting it down will be releasing that
19:47:11 <clarkb> corvus: yes, that was what I thought. fungi thought it could also be spam calls
19:47:19 <corvus> so as long as we don't feel super attached to that number, that should be fine.
19:47:23 <clarkb> I think giving up the number is also fine. Using a new number shouldn't be a big impact
19:47:59 <corvus> not sure what the plan was, but i wouldn't expect non-terminated incoming calls to be charged
19:48:07 <clarkb> ack
19:48:19 <fungi> yeah, so probably not minutes getting eroded by random call-ups then
19:48:37 <corvus> last i looked it was something like $2/mo to maintain a did
19:48:48 <corvus> (and last i looked was 4 years ago)
19:49:24 <clarkb> to summarize, we would give up the existing phone number and do whatever the equivalent of shutting down the voip.ms account is (not sure what options they give us there)
19:49:49 <clarkb> and I'm not hearing any objections to that plan. Won't get to that until at least tomorrow, so if there are objections feel free to raise them after the meeting too
19:49:59 <corvus> sgctm
19:50:01 <corvus> sgtm
19:50:11 <fungi> wfm too
19:50:24 <clarkb> #topic Moving OpenDev's python-base/python-builder/uwsgi-base Images to Quay
19:50:43 <clarkb> As mentioned earlier in the meeting, all of our python based container images (that we build ourselves) are running on Ubuntu Noble now
19:51:03 <clarkb> that means we should be good to move the python base images to quay.io, then update all of the consumers of the images to pull them from there
19:51:21 <clarkb> and we shouldn't lose speculative image testing. However, there are a couple of caveats/concerns with that
19:52:20 <clarkb> The first is that zuul and vexxhost also use these images, so not sure how much communication we feel is necessary to ensure they don't accidentally use orphaned images. Then, these images are primarily used in the image build process, not the container deployment and testing process. That means that the existing changes to noble to use podman for deployments may not be
19:52:22 <clarkb> sufficient to ensure speculative testing continues to work
19:52:59 <clarkb> Docker build via buildx/buildkit does support alternative mirrors and is enabled by default in modern docker. But we may need extra configuration to ensure the mirrors are set?
19:53:18 <clarkb> alternatively we could switch to building with podman/buildah, and I think that should just work now after we update all the jobs
19:53:48 <clarkb> so I guess I'm asking if we think we need a deprecation announcement period before switching, and if we should continue building with docker or switch to buildah
19:54:46 <corvus> i think we should look into continuing to build with docker buildx/buildkit -- so looking into what we need to do to make the mirrors used there, if anything.
19:54:49 <fungi> i suppose uploading to both registries for a transitional period is a no-go (there was almost certainly a reason we didn't do it before now if not)
19:55:21 <clarkb> fungi: it is possible. I just want to avoid doing that long term if possible, since that would continue to make us vulnerable to docker rate limits
19:55:34 <clarkb> corvus: ok, I can dig into that a bit more to understand it better
19:55:40 <corvus> (i'm less confident in podman build and buildah; but not opposed if it works)
19:56:03 <corvus> i think a note to service-discuss should be fine. consider the zuul community notified already. :)
19:56:19 <clarkb> ack, thanks
19:56:34 <clarkb> so step 0 is become better informed on the docker build behavior, and then make an announcement and plan the switch
19:56:40 <clarkb> I'll work on that
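A note on what that "extra configuration" likely means, from memory of the buildkit docs: with the default docker driver, buildx should inherit dockerd's registry-mirrors setting from /etc/docker/daemon.json, while a docker-container builder instead takes a buildkitd.toml passed at create time. A sketch with an illustrative mirror hostname, to be verified as part of step 0:

```toml
# buildkitd.toml for a docker-container buildx builder
# (passed via: docker buildx create --config buildkitd.toml --use)
[registry."docker.io"]
  mirrors = ["mirror.example.opendev.org:8082"]
```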
19:56:45 <clarkb> #topic Open Discussion
19:56:54 <clarkb> We have a few minutes left. Anything we missed that is worth covering?
19:57:00 <corvus> and test between steps 0 and 1 to make sure the mirror stuff is right
19:57:25 <clarkb> ++
19:57:32 <corvus> it's not terrible if we have to roll back, but would be nice if we could avoid that and break our streak :)
19:58:01 <corvus> 1 quick thing
19:58:40 <corvus> there's an upcoming syntax change for zuul-launcher; i have the change to zuul-providers prepped: https://review.opendev.org/956946
19:59:06 <corvus> that's reviewed, and so is the zuul change; when the zuul change merges, i'll handle restarting zuul with it, then merging the zuul-providers change
19:59:43 <corvus> just wanted to let folks know that's upcoming
20:00:09 <clarkb> will test node boots break during the time between restarting the launcher and updating zuul-providers?
20:00:18 <clarkb> and if so, does that imply we'll need to force-merge that update?
20:00:36 <corvus> (during this unstable development period for niz, we might have the occasional backwards-incompat change like that -- basically, it's two commits in zuul, so it's just long enough for opendev to make the change without breaking).
20:01:02 <corvus> clarkb: nope ^ 3 changes total: add new syntax, update zuul-providers, remove old. so i'll sequence everything so it's never broken for opendev
20:01:07 <clarkb> got it
20:01:55 <clarkb> we're over time. Thank you everyone! We'll be back next week at the same time and location
20:01:55 <corvus> [eot]
20:01:59 <corvus> thanks!
20:02:00 <clarkb> #endmeeting