19:00:19 <clarkb> #startmeeting infra
19:00:19 <opendevmeet> Meeting started Tue Jun 24 19:00:19 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:19 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:19 <opendevmeet> The meeting name has been set to 'infra'
19:00:25 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YJ4OIEZ4ARUKB5DRBWTRKT3U6BNBCFME/ Our Agenda
19:00:30 <clarkb> #topic Announcements
19:00:56 <clarkb> Thank you fungi for running the last couple of meetings. I had some family stuff pop up and wasn't able to consistently be around
19:01:10 <clarkb> but I'm slowly catching back up now and finding my routine again
19:01:24 <fungi> welcome back!
19:01:49 <clarkb> the open infra summit's forum session proposals are open right now
19:02:08 <clarkb> I don't know that we need any opendev forum sessions but figured I'd mention that
19:02:11 <clarkb> anything else to announce?
19:04:10 <clarkb> alright let's jump into the agenda then
19:04:20 <clarkb> #topic Zuul-launcher transition
19:04:33 <clarkb> I think that auto hold stuff has rolled out or is rolling out
19:04:58 <clarkb> which means we're largely ready to start migrating more jobs over to zuul launcher from nodepool. Early changes to make that happen merged just before the meeting
19:05:06 <corvus> yep...
19:05:10 <corvus> and i think we want to revert it
19:05:21 <corvus> rax-flex-sjc3 is very sad
19:05:35 <corvus> and in a new way that z-l isn't handling well
19:05:43 <fungi> deploy was at 18:50 utc, for the record
19:05:50 <corvus> so i'm going to propose a revert, manually merge it, then add some exception handling.
19:06:02 <fungi> thanks for spotting that quickly
19:06:13 <clarkb> ack
19:06:22 <corvus> but yeah, we're really close :)
19:06:36 <corvus> and after we merge this change: https://review.opendev.org/952721
19:06:45 <clarkb> fwiw looking at the nodepool grafana graphs (not the zl grafana graphs) it looks similar to what we saw yesterday which was the cloud having some issues that needed to be resolved on the cloud side to get successful boots
19:06:59 <clarkb> but having zuul launcher handle that gracefully is a good idea (nodepool mostly does)
19:07:02 <corvus> we will only have ubuntu-focal and ubuntu-xenial images left in nodepool
19:07:27 <corvus> i forget -- where did we land on whether those were used, and what do we want to do before turning off nodepool?
19:08:05 <clarkb> for xenial one of the biggest users is/was opendev. But I did some cleanup of that on the opendev side and I think we can rip the bandaid off for opendev
19:08:11 <clarkb> focal is more difficult
19:08:20 <clarkb> or did you mean bionic?
19:08:38 <clarkb> in either case I think we have openstack stable stuff and translation stuff relying on either bionic or focal (maybe both)
19:09:00 <corvus> oh sorry i meant bionic
19:09:08 <corvus> https://review.opendev.org/952726 is the change i meant to link
19:09:17 <clarkb> ya I think the translation sync stuff runs on bionic due to java versions or something along those lines
19:09:25 <corvus> no wait
19:09:31 <corvus> it's bionic and focal that are left
19:10:05 <corvus> and focal?
19:10:29 <fungi> possible we could do something with a tarred-up bionic chroot or container image, i suppose
19:10:31 <clarkb> got it. I see my comment on 952726 now too. To summarize I think we can rip the bandaid off for xenial and live with the fallout. We may need to check bionic/focal things more carefully. fungi: not sure if you know how many, if any, stable branches for openstack rely on either of those.
19:10:50 <clarkb> fungi: yes I think one option for the small number of things that need the old tooling would be to use a container to run the tools
19:11:02 <corvus> fungi: that would almost certainly be *harder* than porting over the build, no?
19:11:13 <fungi> porting it over to what?
19:11:26 <corvus> don't we have working bionic and focal dib builds with nodepool-launcher?
19:11:30 <corvus> er nodepool-builder
19:11:37 <clarkb> corvus: yes
19:11:47 <clarkb> so yes another option available to us is to add those images to zuul launcher
19:12:03 <fungi> i mean using a bionic chroot with zanata tools installed on a newer (jammy or noble) test node
19:12:09 <corvus> yeah, which is probably like 5m of work
19:12:36 <fungi> if the goal is to drop the bionic nodes
19:13:16 <clarkb> I think that is a good goal (and why I think we should proceed with xenial that way). But I recognize there is some important functionality relying on it so maybe compromise is better there
19:13:30 <clarkb> if it's only 5m of work that seems like a reasonable thing to do
19:14:19 <clarkb> anything else on this topic?
19:14:24 <corvus> yeah, so if opendev is happy to continue hosting bionic and focal images, then we just need someone to copy over the config to zuul-providers
19:14:34 <fungi> right now some openstack projects are still testing on their unmaintained/victoria branches as the oldest not yet eol. victoria's official test runtime included ubuntu 20.04 (focal), so bionic should not be needed if it weren't for the translation jobs requiring older java, i think
19:15:04 <clarkb> fungi: ack that is good info
19:15:29 <clarkb> that tells me we should probably go ahead and add focal images. Then bionic is a matter of deciding about translation jobs but probably easy enough to do when we do focal
19:15:31 <fungi> in theory the bionic requirement for those jobs goes away once openstack's i18n team finishes migrating to weblate
19:16:12 <tonyb> I'm planning on helping the i18n team with that in July.
19:16:24 <corvus> i think we need an action item about the images
19:16:52 <corvus> from my pov, we sent out an email a few months ago and got some help, and now all but those 2 are done
19:16:52 <clarkb> do we have a volunteer to add focal and xenial images to zuul launcher?
19:17:02 <corvus> focal and bionic, right? :)
19:17:08 <clarkb> yes right sorry
19:17:21 <corvus> i don't think anyone has volunteered
19:17:40 <clarkb> mnasiadka isn't in here but did do a big part of that move for openstack. I can ask mnasiadka if they would be willing to do focal and bionic too
19:17:55 <corvus> ok, then if not, maybe send a "last call" email
19:17:58 <clarkb> ++
19:18:08 <clarkb> #action clarkb to ask mnasiadka about adding focal and bionic image builds to zuul launcher
19:18:26 <corvus> thanks... i'm happy to send the last-call email if needed
19:18:46 <corvus> i think that's all the decision making i need right now
19:19:16 <clarkb> let's continue on with our agenda since we're already 1/3 of the way through our time
19:19:32 <clarkb> #topic Gerrit Shutdown Process and 3.11 Upgrade
19:19:41 <fungi> for further data points, openstack unmaintained/zed is the most recent branch officially relying on focal nodes (it's been unmaintained for over a year now), and i think the openstack position on it's going to be that if opendev wants to pull the rug out from under those jobs because nobody who claims to care about those branches volunteered to make this work, then that's how
19:19:43 <fungi> it goes
19:19:48 <clarkb> I'm going to combine the next two topics into one as I'm largely driving them and don't have any real updates
19:20:14 <clarkb> I am hoping that I will finally have the ability to update the gerrit command line string to remove the h2 cache cleanup timeout change and restart gerrit this week
19:20:46 <clarkb> and use that process to test manual sighup signalling and possibly a followup to do a docker compose driven sigint as well
19:21:03 <clarkb> then assuming I get that done I'm hoping Gerrit upgrade planning and testing can start up again early july
19:21:24 <clarkb> no real updates other than this is still on my radar and I plan to look at it again as I catch up
19:21:38 <clarkb> #topic Upgrading old servers
19:21:55 <clarkb> corvus replaced zuul mergers and zuul launchers with new noble nodes
19:22:15 <corvus> planning to do schedulers and executors later this week/weekend
19:22:22 <clarkb> that seems to have largely gone well. The one hiccup was reducing the size of zuul launchers which reduced their ephemeral disk sizes, a fair chunk of which is needed to shuffle image data around
19:22:32 <clarkb> this has since been corrected with some cinder volumes
19:22:54 <corvus> also, the restart script needed a change because "docker-compose ps -q" behavior changed
19:23:09 <corvus> i think that command is only used for the zuul servers; does that sound right?
19:23:18 <clarkb> corvus: yes I think that is the only situation where we use that
19:23:33 <clarkb> I'm also like 95% of the way through replacing mirror-update02 with a noble mirror-update03
19:24:07 <clarkb> I thought I was going to delete 02 before this meeting then discovered a problem with debian reprepro config that I wanted to fix first. The fix is in place and one of the two mirror locations is happy again. The other is waiting for the next run in about 45 minutes
19:24:24 <clarkb> once that is successful I'll proceed with cleaning up the old mirror-update02 node
19:24:54 <clarkb> corvus: worth noting that mirror-update03 is using openafs to read and write data on noble which is a good indicator for executors
19:25:09 <clarkb> previously we were doing read only operations on mirrors I think
19:25:34 <tonyb> I don't know if the centos team have done it yet but they have a ticket to fix the IP situation
19:25:45 <clarkb> tonyb: yup that seems to be working now too
19:25:50 <fungi> tonyb: yeah, that's been resolved since around 13:00z
19:26:03 <tonyb> \o/
19:27:10 <clarkb> my hope is to start on the zookeeper cluster tomorrow as well
19:27:27 <clarkb> currently we have zk04, zk05, and zk06. Do you want zk07-09 or zk01-03 as the new servers?
19:27:59 <clarkb> changing things in place is maybe more complicated with zk since it uses the digits suffix as the cluster member id or whatever it is called
19:28:07 <clarkb> I think new servers should have new ids as a result
19:28:44 <clarkb> I'm guessing 01-03 is preferred. But let me know if not and I can do something else
19:28:56 <corvus> i don't feel strongly; slight pref for 1-3
19:29:02 <fungi> sounds good to me as well
19:29:10 <clarkb> other things to note: don't forget to use the --config-drive flag with launch node when booting our noble image in rax classic (it is required there but not in other clouds)
19:29:21 <clarkb> and fungi was there any refstack update?
19:29:27 <fungi> nope!
19:29:53 <clarkb> that was all I had on this topic. Anything else before I go to the next one?
19:31:09 <clarkb> #topic OFTC Matrix bridge no longer supporting new users
19:31:44 <clarkb> the last time I was around to discuss this I volunteered to write a spec to explicitly call out what we're trying to achieve with a move to matrix for opendev comms and a plan for testing that through doing it
19:31:55 <clarkb> I have not done that. But doing so is one of the things I intend to do as I catch up
19:32:12 <clarkb> is there anything new to consider from the last couple of weeks before I do that?
19:33:10 <corvus> no news i'm aware of
19:33:31 <fungi> not really. there were discussions in the kubernetes community about moving off slack, but they ruled out matrix basically straight away and most people who were involved in the conversation seemed to want to use discord, predictably
19:34:01 <clarkb> ack I'll proceed with the plan from a couple of weeks ago as soon as I dig through enough of my backlog
19:34:25 <clarkb> fungi: ya they seem to have immediately dismissed matrix as being incapable of handling their load/demand/users
19:34:41 <clarkb> mattermost and zulip people got involved in the discussion as well but they seem to have been dismissed too
19:35:28 <fungi> it could be useful to dig into the size/performance concerns they raised about matrix, though for opendev's purposes we're unlikely to get anywhere near where it would start to be impacted
19:35:37 <clarkb> ya I don't think that is a concern for us
19:35:56 <clarkb> if they can quantify that somehow (rather than just asserting it as fact) that could be useful info generally
19:36:20 <clarkb> #topic Adding CentOS 10 Stream Support to Glean, DIB, and Nodepool
19:36:24 <clarkb> glean is done
19:36:33 <clarkb> #link https://review.opendev.org/c/openstack/diskimage-builder/+/949942 DIB functional testing without Nodepool
19:37:06 <clarkb> this change and its children to have dib stop relying on nodepool for testing image builds is the next step. This will allow dib greater control over devstack and its nova cpu configuration so that centos 10 can be booted with all of the cpu flags it requires
19:37:20 <clarkb> once those are in we should be able to land support for centos 10 and rocky 10
19:37:25 <clarkb> #link https://review.opendev.org/c/openstack/diskimage-builder/+/934045 DIB support for CentOS 10 Stream
19:37:50 <clarkb> and at this point all the related changes are ready for review. I intend on reviewing those today
19:38:07 <clarkb> tonyb: any other concerns or things to call out?
19:38:38 <tonyb> The series adds devstack builds alongside the nodepool ones, but the removal is more complex due to job users in other repos so I'll rework that today so we can drop the nodepool testing
19:38:47 <tonyb> assuming that sounds fair to others
19:39:19 <clarkb> do we define those jobs in dib? I thought we were just consuming them
19:39:29 <clarkb> but yes we need to update glean as well to stay in sync
19:40:11 <tonyb> we define and use jobs like dib-nodepool-functional-openstack-$distro ; which are used in openstacksdk
19:40:30 <clarkb> gotcha
19:40:44 <tonyb> I'm less worried about glean
19:41:18 <tonyb> openstacksdk is a little harder as it has several open branches ;P
19:41:19 <fungi> ah, yeah i guess when we pulled shade out of nodepool we wanted to test that it didn't break, and then when shade was merged into openstacksdk we kept that testing
19:41:26 <clarkb> I wonder if the motivation from the openstacksdk side is ensuring they don't break nodepool or if they want to test the sorts of operations nodepool does
19:41:41 <fungi> the former, i'm almost certain
19:42:01 <clarkb> in that case cleanup is probably most appropriate at this point?
19:42:13 <clarkb> since nodepool is going away and I don't know that zuul wants to be that tightly coupled to openstacksdk
19:42:17 <fungi> i believe so
19:42:21 <clarkb> but maybe there is some middle ground I'm not considering
19:42:23 <tonyb> I'll double check with them and proceed as appropriate
19:42:31 <clarkb> sounds good thanks
19:42:59 <clarkb> #topic projects.yaml normalization
19:43:00 <tonyb> I have the patches out there to switch from nodepool to devstack but cleanup would be better
19:43:01 <corvus> yeah, i don't think that's necessary anymore
19:43:17 <clarkb> #link https://sourceforge.net/p/ruamel-yaml/tickets/546/
19:43:18 <corvus> (from a zuul perspective)
19:43:23 <clarkb> #link https://review.opendev.org/952006
19:43:26 <clarkb> corvus: ack thanks for confirming
19:43:39 <clarkb> sorry I'm moving ahead since we're now 3/4 of the way through our time
19:43:56 <clarkb> and I'm not caught up on this one so want to make sure we talk about it
19:44:00 <corvus> ++
19:44:09 <clarkb> basically it seems there is some bug in ruamel and maybe we worked around it?
19:44:30 <fungi> i believe we excluded the recent releases
19:44:42 <clarkb> 952006 says that there is some other thing that may fix things but then unfortunately links to itself rather than the actual change that I think fixed things
19:44:44 <corvus> confused by that last comment, suspect link is wrong
19:44:53 <clarkb> corvus: yes that. This is where I got lost trying to follow up on this
19:45:05 <corvus> possibly https://review.opendev.org/c/openstack/project-config/+/952315
19:45:41 <clarkb> ok so basically we're not needing to do a big normalization pass because we're using an old library version which produces output consistent with what we currently have
19:46:27 <corvus> yes, also, the output of the new version is wrong
19:46:51 <clarkb> I guess we can roll forward for now as is. Monitor the upstream bug (sourceforge!) and see if upstream is going to fix it
19:47:09 <corvus> (so the choices we feel acceptable are (1) pin and wait/hope for them to fix; (2) do some more wrapping of ruamel output to work around it)
19:47:14 <corvus> we chose 1... then 2 if that fails
19:47:22 <corvus> the option of accepting the output was rejected
19:47:46 <clarkb> got it. And ya I agree that new output is not how I would format things
19:47:58 <corvus> the bot hasn't updated that change.. so maybe that means the pin worked?
19:48:09 <clarkb> I suspect so. It runs daily iirc
19:48:18 <fungi> i believe that's the reason, yes
19:48:23 <corvus> i think contents of repo == ideal means no update is needed
19:48:59 <corvus> we should probably abandon that change now
19:49:06 <clarkb> ++
19:49:23 <corvus> done
19:49:29 <clarkb> ok now i feel caught up
19:49:31 <fungi> though as an unrelated bug, it became apparent that existing change == generated diff doesn't prevent it from pushing a new patchset
19:49:46 <clarkb> fungi: we rely on gerrit to reject it?
19:50:04 <fungi> no, commit dates are different each time so it doesn't get rejected
19:50:21 <clarkb> gotcha so we get a new patchset each day as long as there is a delta to master
19:50:26 <corvus> yep
19:51:02 <fungi> fixing that is probably a trivial line or two in the script, but nobody's had time to work out the patch
19:51:43 <clarkb> good to know. Anything else on this item?
19:52:00 <fungi> i don't think so
19:52:13 <clarkb> #topic Working through our TODO list
19:52:17 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:52:28 <clarkb> Before my schedule went upside down I said I should formalize this better
19:52:40 <clarkb> which is still my intention but until then here is a friendly reminder that this list exists in that etherpad
19:52:47 <clarkb> #topic Pre PTG Planning
19:53:01 <clarkb> which is related to the last thing on the agenda: Planning pre ptg dates
19:53:29 <clarkb> OpenStack's next release is happening October 1, the summit is October 17-19, and the actual ptg is October 27-31
19:53:54 <clarkb> considering all of that I thought that October 6-10 might work for picking 2-3 days to try and have a PTG if we don't feel that October is already full of stuff
19:54:24 <fungi> wfm
19:54:56 <clarkb> I think it was helpful to do the january thing at the beginning of the year and doing a quick checkup before the ptg again would be useful
19:55:17 <fungi> i've got a talk at all things open just prior to the summit, but the proposed pre-ptg dates are early enough not to conflict
19:55:18 <clarkb> I'll probably pencil in those dates on my own calendar but please say something if that doesn't work for some reason and we can look at late september instead potentially
19:55:25 <tonyb> Works for me. There's a solid chance I'll be in the US at that time which possibly simplifies the time-of-day selection
19:55:59 <clarkb> I suspect we'd do 2 or 3 days with blocks of a couple of hours
19:56:04 <clarkb> very similar to what we did in january
19:56:21 <clarkb> #topic Open Discussion
19:56:24 <clarkb> Anything else?
19:56:52 <fungi> i don't think i had anything
19:57:24 <clarkb> thanks again everyone for helping out when I wasn't able to be around much. Really appreciate it
19:57:35 <clarkb> and thank you for helping keep opendev up and running
19:57:36 <corvus> good to have you back :)
19:58:01 <fungi> yes, i don't mind pitching in, it's my pleasure
19:58:49 <clarkb> sounds like that may be everything. I expect to be back here same time and location next week
19:58:59 <clarkb> #endmeeting