nick | message | time |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Jul 29 19:00:06 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/SMACAD5RVHJU466XJGM3QKJ2GC7OT3XY/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | I'm taking next Monday off so will not be around that day | 19:00 |
fungi | ill be around | 19:01 |
clarkb | I've also drawn up a quick plan for service coordinator elections which we'll talk about later (but want people to be aware) | 19:01 |
clarkb | Anything else to announce? | 19:02 |
clarkb | sounds like no. Let's jump into the agenda then | 19:04 |
clarkb | #topic Zuul-launcher | 19:04 |
clarkb | The change to prevent mixed provider nodesets merged and deployed over the weekend. Mixing providers is now only possible if it is the only way to fulfill a request (say k8s pods and openstack vms in one nodeset) | 19:05 |
corvus | i think all the bugfixes are in now | 19:05 |
clarkb | yesterday we discovered that ovh bhs1 was fully utilized and that seems to have been a bug in handling failed node boots leading to leaks | 19:05 |
clarkb | and ya corvus restarted services with the fix for ^ this morning | 19:05 |
clarkb | the raxflex sjc3 graph looks weird still but different weird after the restart. I half suspect we maybe leaked floating ips there and we're hitting quota limits on those | 19:06 |
clarkb | I'll check on that after the meeting and lunch | 19:06 |
corvus | yeah, z-l doesn't detect floating ip quota yet | 19:06 |
clarkb | then separately last week we discovered that at least part of the problem with image builds was gitea-lb02 losing its ipv6 address constantly | 19:08 |
fungi | that was a super fun rabbit hole | 19:08 |
clarkb | the address would work for a few minutes then disappear then return an hour later and work for a bit before going away again. We replaced it with a new noble gitea-lb03 node as there was some indication it could be a jammy bug | 19:08 |
clarkb | specifically in systemd-networkd | 19:08 |
clarkb | in addition to that we improved the haproxy health checks to use the gitea health api | 19:09 |
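(Editor's note: a minimal sketch of the kind of haproxy stanza being described, checking an application health endpoint instead of a bare TCP connect. The backend name, server names, port, and the `/api/healthz` path are assumptions for illustration, not the actual system-config change.)

```
# Hypothetical haproxy backend that polls Gitea's health API;
# a backend is only kept in rotation while the check returns 200.
backend balance_git_https
    option httpchk GET /api/healthz
    http-check expect status 200
    server gitea09 gitea09.example.org:3081 check check-ssl verify none
    server gitea10 gitea10.example.org:3081 check check-ssl verify none
```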
fungi | but other servers running jammy there didn't exhibit this behavior, so odds it's a version-specific issue are slim | 19:09 |
clarkb | ya. I've kept the gitea-lb02 old server around after cleaning up system-config and dns in case vexxhost wants to dig in further | 19:10 |
clarkb | but probably at the end of this week we can clean the server up if we don't make progress on that | 19:10 |
fungi | thanks! | 19:10 |
clarkb | the last thing I've got on the agenda notes for this topic is that nodepool is gone | 19:10 |
clarkb | the servers are deleted etc | 19:10 |
clarkb | corvus: I think openstack/project-config still has nodepool/ dir contents. Is there a change to clean those up? Might be good to avoid future confusion | 19:10 |
corvus | i don't think so. i'll take a look. but i'd like to keep grafana dashboards around for a while. | 19:11 |
clarkb | ++ I think keeping those around is fine. I'm more worried about people trying to fix image builds or change max-server counts | 19:12 |
corvus | yep. i'll get rid of the rest | 19:12 |
clarkb | thanks | 19:13 |
clarkb | anything else on this topic? | 19:13 |
corvus | not from me | 19:13 |
clarkb | #topic Gerrit 3.11 Upgrade Planning | 19:15 |
clarkb | Short of testing an upgrade with our production data I've done what I think is reasonable to try and reproduce the reported offline reindexing problems with gerrit 3.11.4 | 19:15 |
clarkb | I have been unable to reproduce the problem | 19:15 |
clarkb | given that I think I'll proceed with testing the upgrade itself. Hopefully tomorrow | 19:16 |
clarkb | #link https://www.gerritcodereview.com/3.11.html | 19:16 |
clarkb | these two links are for jobs whose held nodes I'll use for testing | 19:16 |
clarkb | #link https://zuul.opendev.org/t/openstack/build/f1ca0d1f2e054829a4506ececb58bed3 | 19:16 |
clarkb | #link https://zuul.opendev.org/t/openstack/build/588723b923e94901af3065143d9df818 | 19:17 |
clarkb | the nodes ran under zuul launcher so didn't get lost in the nodepool cleanup | 19:17 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade | 19:17 |
clarkb | Unfortunately, the delays have us getting into openstack's end of release cycle activities | 19:18 |
clarkb | so planning an actual date may be painful. But we'll figure something out once I've got a better picture of the upgrade itself | 19:18 |
clarkb | any comments, concerns or feedback on this topic before we move on? | 19:19 |
fungi | i have none | 19:19 |
clarkb | #topic Upgrading old servers | 19:20 |
clarkb | The change to update matrix and irc bot container logging to journald from syslog landed on the existing server | 19:20 |
clarkb | this is a prereq to upgrading to noble and using podman as the runtime backend | 19:20 |
clarkb | seems to be working fine. I then looked briefly at how the current server is configured to get a sense of what is required to replace it. The main thing is that the logging data is on a cinder volume, which I think means we need a downtime to either move the cinder volume between hosts or sync the data from one volume to another | 19:21 |
clarkb | it's a bit tricky because for the limnoria irc bot and the matrix eavesdrop bot I think we really don't want them running concurrently on two different hosts | 19:22 |
clarkb | long story short I think we should pick a day (fridays are probably best due to when meetings occur) to stop services on the old server, land a change to configure the new server, and copy the data between them | 19:22 |
clarkb | something like boot new server with new volume, rsync data, time passes, rsync data again, stop services on old server, approve change to configure new server, rsync data, check deployment brings everything back up again | 19:23 |
clarkb | guessing at least an hour for that | 19:23 |
fungi | that would work well | 19:23 |
clarkb | cool I would volunteer to do that this Friday but I'm meeting up with folks for lunch during FOSSY so it will probably have to happen next week, or someone else can take it on | 19:24 |
fungi | i likely can | 19:24 |
clarkb | fungi: do you want to boot the new server and get things prepped for that or should I? | 19:25 |
fungi | i'll do it | 19:25 |
clarkb | perfect thanks | 19:25 |
clarkb | then for refstack you mentioned announcing it was going away then planning to proceed with that. | 19:25 |
clarkb | Any progress there? | 19:25 |
fungi | though if we're moving all services at the same time, any reason not to move the cinder volume? | 19:26 |
fungi | detach/attach instead of rsync | 19:26 |
clarkb | fungi: probably not. I always worry the cinder volumes won't detach cleanly but that is probably an overblown concern | 19:26 |
fungi | and no, haven't written up an announcement for refstack going away yet | 19:26 |
clarkb | the main reason to avoid it would be moving providers but I don't think we should do that in this case | 19:26 |
fungi | for some reason i thought we had moved logs into afs, but i hadn't looked at it recently | 19:27 |
clarkb | when you boot the new noble node don't forget to use the --config-drive flag if booting it in rax classic (its required for that image) | 19:27 |
clarkb | fungi: yes I had thought so too but we haven't | 19:27 |
fungi | yep, will do | 19:27 |
clarkb | anything else on this topic? | 19:28 |
fungi | i guess it was the meetings site hosting on static.o.o that threw me | 19:28 |
fungi | but i suppose it's proxied to eavesdrop still | 19:28 |
clarkb | ya I think the yaml2ical data is published to afs from zuul jobs? | 19:28 |
fungi | that's what it was, yep | 19:28 |
fungi | nothing else from me | 19:29 |
clarkb | #topic Vexxhost backup server inaccessible | 19:29 |
clarkb | yesterday fungi noticed the vexxhost backup server was inaccessible and backups to it were failing | 19:29 |
clarkb | grabbing the console log failed as did ping and ssh | 19:29 |
clarkb | we waited a day then corvus asked nova to reboot it today. Nothing changed except that the console log became available | 19:30 |
fungi | though nova claimed it was "active" | 19:30 |
fungi | the whole time | 19:30 |
clarkb | guilhermesp managed to take a look today and root caused it to an OVS issue | 19:30 |
clarkb | which explains why network connectivity was sad but nova saw it as active/up | 19:30 |
fungi | which would explain why a reboot didn't fix it, though i'm not sure why console logs were initially unreachable, maybe both caused by a single incident | 19:31 |
clarkb | anyway this has been corrected by guilhermesp. guilhermesp indicates that this would not have been correctable as an end user so if we see similar symptoms in the future we should file a ticket with vexxhost | 19:31 |
clarkb | manual inspection of the host seems to show things are happy again. But we should keep an eye on the infra-root inbox to double check backups aren't still erroring to it | 19:31 |
clarkb | anything else we want to call out about this incident before we move on? | 19:32 |
fungi | not i | 19:32 |
clarkb | #topic Matrix for OpenDev comms | 19:33 |
clarkb | #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing | 19:33 |
clarkb | frickler reviewed the spec. I've responded but haven't pushed a new patchset yet. Was hoping for a bit more feedback before I add in some of those suggestions | 19:33 |
clarkb | so if you have 15 minutes to read what I've written that would be great | 19:33 |
clarkb | it is probably also worth noting that the matrix.org homeserver now requires users to be at least 18 years old in response to a recent UK law change... | 19:34 |
corvus | thanks, i'll take a look (intended to earlier, but things came up!) | 19:34 |
fungi | i suppose a new development there is element's announcement that no one under 18 years of age is allowed to have a matrix.org account any longer, though using another homeserver is a potential workaround for affected users | 19:34 |
clarkb | anyone using that homeserver should've gotten a message from them with the terms of service update | 19:34 |
clarkb | fungi: yup. I'm not sure that is a huge barrier for our current user base, but something to consider | 19:35 |
corvus | that also seems like something that may ultimately not be limited to matrix | 19:35 |
clarkb | corvus: yes, apparently wikimedia is challenging the law | 19:35 |
fungi | i think the internet is about to become 18-and-up | 19:35 |
clarkb | happy to update the spec with that info too if we think it is important to capture | 19:37 |
fungi | not at this stage, i don't expect | 19:37 |
fungi | something we can deal with down the road if it becomes relevant | 19:37 |
clarkb | #topic Working through our TODO list | 19:38 |
clarkb | #link https://etherpad.opendev.org/p/opendev-january-2025-meetup | 19:38 |
clarkb | just our weekly reminder that if anyone ends up bored (ha) they can check the list for the next thing to chew on | 19:38 |
clarkb | #topic Pre PTG Planning | 19:38 |
clarkb | similarly we can figure out what our list looks like during our Pre PTG | 19:39 |
clarkb | #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document | 19:39 |
clarkb | please add topic ideas to that etherpad with things you'd like to see covered with a bit more depth than we can do day to day or in this meeting | 19:39 |
clarkb | I think we can do a retrospective on the noble + podman switch. A few bumps but overall seems to work well (as an example) | 19:40 |
clarkb | #topic Service Coordinator Election Planning | 19:40 |
clarkb | it has been almost 6 months since we last elected our service coordinator (me) | 19:41 |
clarkb | which means it is time to make a plan for the next election | 19:41 |
clarkb | Proposal: Nomination Period open from August 5, 2025 to August 19, 2025. If necessary we will hold an election from August 20, 2025 to August 27, 2025. All date ranges and times will be in UTC. | 19:41 |
clarkb | this is my proposal. It basically mimics what we've done the last several elections. | 19:41 |
fungi | wfm | 19:41 |
clarkb | I'd like to make that official today so if there are any comments, questions, or concerns please bring them up before EOD | 19:42 |
clarkb | (I'll make it official via email to the service-discuss list) | 19:42 |
clarkb | then I'd like to say I'm happy for someone else to take on more of these organizational and liaison duties if there is interest | 19:42 |
clarkb | I'm happy to hang around in a supporting role | 19:42 |
clarkb | let me know if there is interest or if you have any questions | 19:43 |
fungi | i too would be thrilled to help out supporting a new coordinator if there is one | 19:43 |
clarkb | #topic Open Discussion | 19:44 |
clarkb | Anything else? | 19:44 |
clarkb | sounds like that may be everything | 19:46 |
fungi | thanks clarkb! | 19:46 |
clarkb | thank you everyone for your time and help running opendev | 19:46 |
clarkb | we'll be back here next week at the same time and location | 19:47 |
clarkb | #endmeeting | 19:47 |
opendevmeet | Meeting ended Tue Jul 29 19:47:09 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:47 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-29-19.00.html | 19:47 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-29-19.00.txt | 19:47 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-29-19.00.log.html | 19:47 |