19:00:08 <clarkb> #startmeeting infra
19:00:08 <opendevmeet> Meeting started Tue Aug 19 19:00:08 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:08 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:08 <opendevmeet> The meeting name has been set to 'infra'
19:00:11 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JQ73GENJBNTUU4QZD7SY6OO2NZ2H7WGU/ Our Agenda
19:00:37 <clarkb> #topic Announcements
19:01:14 <clarkb> I didn't have anything to announce. Did anyone else?
19:01:45 <fungi> i did not
19:02:02 <clarkb> #topic Gerrit 3.11 Upgrade Planning
19:02:18 <clarkb> This continues to be half on the back burner. Except that upstream has made our lives more difficult again by publishing new releases :)
19:02:52 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/957555 Upgrade Gerrit images to 3.10.8 and 3.11.5
19:03:08 <clarkb> Probably a good idea to land that and restart gerrit and recycle the holds so that when I do get to testing things I'm testing with up to date gerrit versions
19:03:25 <clarkb> maybe tomorrow or thursday for the gerrit restart depending on how zuul stuff goes between now and then?
19:04:09 <clarkb> the zuul upgrade and reboot playbook is looking happy so far, it just needs time, and the zuul launchers haven't run out of disk yet so they also look happy
19:04:16 <clarkb> makes me think I probably will have time for that soon
19:04:24 <clarkb> any comments or concerns around the gerrit 3.11 upgrade?
19:05:58 <clarkb> #topic Upgrading old servers
19:06:10 <clarkb> Fungi cleaned up refstack and the old eavesdrop server stuff last week
19:06:15 <clarkb> Next on the list are kerberos, openafs, graphite, and backup servers
19:06:27 <fungi> i'll try to get to kerberos/openafs later this week
19:06:43 <clarkb> fungi: that would be great. Feel free to ping me if I can help in any way
19:06:55 <fungi> those seem to all be on focal at the moment
19:06:58 <clarkb> Then I wanted to call out a milestone that we appear to have reached: there are no more bionic servers
19:07:09 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/957950 cleanup bionic node testing in system-config
19:07:53 <fungi> my plan is to stick them in emergency disable, update our testing to jammy, in-place upgrade them one by one to jammy, then maybe increase our testing again to noble and repeat, finally removing them from the disable list
19:07:54 <clarkb> This came up because ansible 11 (our new default in zuul) isn't compatible with the python version on bionic. I did a quick workaround yesterday then dug in more this morning and I believe that we don't have any servers running bionic so we can drop testing for that platform and use ansible 11
19:08:06 <corvus> huzzah!
19:08:15 <clarkb> fungi: that sounds like a great plan. I like the idea of going step by step and checkpointing just to catch any problems early
19:08:43 <fungi> at least i hope our system-config jobs will give an early warning of serious problems before attempting to upgrade
19:08:54 <clarkb> feel free to double check me on the no more bionic assertion but best I can tell the hosts in our ansible fact cache reporting bionic are no longer in our inventory
19:09:07 <clarkb> fungi: ya it should catch the more obvious stuff
19:09:10 <clarkb> its a good check
19:09:36 <clarkb> I also shut down gitea-lb02.opendev.org this morning as frickler reported its cronspam was not verifying now that its dns records are gone
19:09:57 <clarkb> we kept the server around as a debugging aid so I didn't want to delete it. But shutting it down for now seemed fine and I've done so
19:10:28 <clarkb> any other server upgrade/replacement/deletion questions comments or concerns?
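[Editor's sketch of the "no more bionic" check clarkb describes above: scan the Ansible fact cache for any host whose cached facts still report the bionic release. The cache layout assumed here (one JSON facts file per host) and the sample hostnames are assumptions for illustration, not the real contents of bridge; the snippet builds its own throwaway cache so it is self-contained.]

```shell
# Hypothetical sketch, not the actual bridge procedure.
# Assumption: the Ansible fact cache keeps one JSON file per host;
# the real cache path and layout on bridge may differ.
FACT_CACHE=$(mktemp -d)

# Fabricated sample facts, purely for illustration.
printf '{"ansible_distribution_release": "bionic"}\n' \
    > "$FACT_CACHE/old-server.example.org"
printf '{"ansible_distribution_release": "noble"}\n' \
    > "$FACT_CACHE/new-server.example.org"

# List any hosts whose cached facts still report bionic; an empty
# result would support the "no more bionic servers" assertion.
bionic_hosts=$(grep -l '"ansible_distribution_release": "bionic"' \
    "$FACT_CACHE"/* | xargs -r -n1 basename)
echo "$bionic_hosts"
```

[Any host names printed would then be cross-checked against the current inventory, as discussed above.]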
19:10:35 <corvus> that reminds me, we have a dns record for review02 but i think we're on review03 now?
19:10:50 <fungi> cacti/storyboard are xenial, looks like, and wiki is older
19:11:09 <fungi> so i agree no more bionic that i can find
19:11:29 <fungi> we have stuff older than bionic but we also can't really test it at this point
19:11:36 <clarkb> corvus: correct we're on 03 now. I'll make a note in my todo list to look at that
19:12:01 <clarkb> ya and we removed testing for that older stuff a little while ago
19:12:04 <clarkb> so that ship has sailed...
19:12:14 <fungi> review02 is also still in the emergency disable list
19:12:49 <fungi> along with gitea-lb02
19:13:17 <clarkb> they aren't in our inventory any more so should be able to be removed from the emergency file. I can check on that too
19:14:14 <clarkb> #topic Matrix for OpenDev comms
19:14:21 <clarkb> #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing
19:14:36 <clarkb> I've updated the spec with the feedback that I got. It looks like ianw is happy with it now. Anyone else care to rereview?
19:14:52 <clarkb> I guess corvus did ask about double checking the possibility of EMS hosted mjolnir which I haven't done
19:15:47 <clarkb> we should probably keep most of the discussion on this topic within the spec review
19:15:48 <corvus> yeah, still seems like a good idea, but probably doesn't radically alter the next steps which are: run mjolnir (either ourselves or via ems)
19:16:00 <clarkb> so please follow up there.
19:16:13 <corvus> though i'm still at "learn how to speel mjolnir" which is step 0
19:16:30 <clarkb> I think it is literally the hardest word to remember the spelling of and type
19:16:55 <clarkb> #topic Pre PTG Planning
19:17:00 <fungi> sadly i'm enough of a mythology geek that i have no trouble spelling it
19:17:01 <clarkb> #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document
19:17:30 <clarkb> I think we can consider this proposed schedule pretty well settled at this point as I haven't heard any feedback to the contrary
19:18:11 <fungi> sgtm, thanks!
19:18:13 <clarkb> please add agenda items to the etherpad if you have ideas for things to do or change etc
19:18:26 <clarkb> I'll continue to add items myself as I think of them
19:19:15 <clarkb> #topic Service Coordinator Election Planning
19:19:34 <clarkb> The service coordinator nomination period ends at EOD today on a UTC clock
19:19:41 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YXRD23ZWJGDPZ3WESBNZNEYO7NBCXFT4/
19:19:56 <clarkb> yesterday I went ahead and nominated myself
19:20:09 <clarkb> https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/WNUYDT47NMYC3SC5QA44OG4PWK5ENQEF/
19:20:47 <clarkb> it seemed like no one else was going for it and I wanted to make sure that I was well ahead of the deadline. If however I misread the room please speak up. I'm happy to step to the side or work together with someone etc
19:20:53 <clarkb> (or have an election)
19:21:29 <clarkb> but I suspect its a tag you're it again situation for myself. And that is fine too. That said I think some variety would be a good thing and I would happily support someone else in the role
19:22:02 <fungi> your sacrifice is appreciated
19:22:21 <clarkb> #topic Loss of upstream Debian bullseye-backports mirror
19:22:37 <clarkb> I think we had a rough outline of a plan here which included potentially breaking a small number of jobs
19:22:54 <fungi> sounds like we have a way forward for this yes, just haven't had time to start in on it
19:23:11 <clarkb> any concerns with the plan since we last spoke about it here?
19:23:47 <clarkb> (the plan is basically to clean up backports for bullseye and force jobs to find an alternative since that most accurately reflects the state with upstream)
19:23:49 <fungi> i just need to remember if we said base-jobs first or straight to zuul-jobs with an advance announcement
19:24:30 <clarkb> should be in the meeting logs (I believe it was zuul-jobs but double check)
19:24:36 <fungi> the latter is less work for greater benefit in the long run, but will take a bit longer
19:24:55 <fungi> yeah, that's what i thought
19:26:07 <clarkb> #topic Etherpad 2.4.2 Upgrade
19:26:15 <clarkb> #link https://github.com/ether/etherpad-lite/issues/7065 Theming updates have broken the no-skin and custom skin themes. We use no-skin.
19:26:29 <clarkb> I was hopeful that there would be followup after they responded to the issue last week just after our meeting
19:27:04 <clarkb> but no. Basically they said no-skin isn't expected to be used, it's only there as an example for other skins, and I responded that no-skin is/was the only skin until colibris was added and we kept it for user familiarity and density of the text
19:27:27 <fungi> it does seem like there's been some upstream turnover, to the point where the current maintainers aren't aware of what etherpad-lite used to look like
19:27:35 <clarkb> still waiting for a response indicating whether or not no-skin is no longer expected to be used
19:27:56 <clarkb> but hopefully we have an answer soon on whether or not we have to accept colibris or can continue as is
19:28:09 <fungi> but also custom skins (presumably based on no-skin) are now broken too
19:28:39 <clarkb> yes, at least one person responded that this is the case. I think they have some stuff to fix at least
19:28:43 <fungi> so no-skin isn't currently useful even for the purpose they thought it was for
19:29:05 <fungi> and if they fix that, it will likely be usable for us again too
19:29:22 <clarkb> here's hoping
19:29:29 <clarkb> #topic Moving OpenDev's python-base/python-builder/uwsgi-base Images to Quay
19:29:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/957277
19:30:15 <clarkb> last week I noted that I had some concern that we may not use speculative images in builds with this switch. Since then I dug into the docker docs, our test job roles, and the change I wrote and I think this is a non-issue
19:30:23 <clarkb> our past selves already addressed these problems
19:31:11 <clarkb> at this point I think the main concern is that we'll want to rebuild all of the python based images after this lands
19:31:16 <fungi> yay! /pats selves on back
19:31:23 <clarkb> probably don't need to rush to do that but also won't want to delay
19:31:51 <clarkb> I'm not sure there is ever a good time for something like that so its mostly calling it out as a todo once that lands so we don't forget and use the old stale images forever
19:31:58 <clarkb> so ya reviews welcome and feedback on timing too
19:32:38 <clarkb> #topic Bootstrapping rax-flex iad3
19:33:12 <clarkb> there is a third rax flex region and cloudnull has given us the go ahead to use it. Yesterday we landed a change to update our clouds.yaml files and set up the cloud launcher to preconfigure things
19:33:34 <clarkb> cloud launcher failed on auth issues and it seems we still need to login to skyline first to have the new env sync account stuff from the old env
19:33:50 <fungi> (along with increased quotas in all 3 regions)
19:33:52 <clarkb> I have done this for both of our accounts in iad3's skyline service and the openstack client can list images in both accounts now
19:34:16 <clarkb> I think I'll wait for our daily cloud launcher run to happen at ~0200 UTC and then look into bootstrapping a mirror tomorrow
19:34:30 <clarkb> we also need to set the network mtu to 1500 which is an extra step post cloud launcher
19:35:31 <clarkb> then cloudnull also suggested we reenable the rax classic regions and see if they are happier now. Looks like the first change to do that just merged
19:35:56 <clarkb> If we do see new or renewed problems with that we can use the existing email thread I started to followup
19:36:03 <clarkb> (and if anyone would prefer I do the emailing just let me know)
19:36:14 <fungi> yeah, the change for the first two regions just merged
19:36:31 <clarkb> I split them because ord and dfw had a different failure mode to iad
19:36:43 <clarkb> so want to reenable them separately for ease of reverts / debugging
19:36:48 <fungi> right, i didn't approve the second for now
19:37:30 <clarkb> #topic Open Discussion
19:37:33 <clarkb> Anything else?
19:38:22 <fungi> i'll be gone next week and half the following week
19:38:28 <clarkb> I mentioned earlier that the zuul upgrades and reboots seem happier now. I am running that playbook in a root screen on bridge out of band due to several consecutive failures the last few weeks
19:38:37 <clarkb> fungi: enjoy your time of
19:38:40 <clarkb> *off
19:38:54 <fungi> thanks!
19:41:20 <clarkb> the ansible 11 switch has gone really well I think. The main issues we have encountered are Bionic and older nodes not being supported by ansible 11 due to python version incompatibilities
19:41:41 <corvus> ++
19:41:48 <clarkb> then we also discovered skyline used a list for playbook vars: https://opendev.org/openstack/skyline-apiserver/src/branch/master/playbooks/devstack/pre.yaml#L8-L9 and ansible 11 doesn't like that
19:41:53 <fungi> and that one weird skyline job
19:41:58 <fungi> yeah that
19:42:04 <clarkb> but I'm not sure older ansible was even doing the right thing with that config. It should also be a trivial problem to fix
19:42:15 <fungi> it was likely ignored
19:42:21 <corvus> i get the idea that was like removing prehistoric syntax
19:42:52 <corvus> you think? i thought it was just a super weird old way to specify vars
19:43:02 <fungi> i should be surprised that turned up in one of the newest openstack projects, but i'm not
19:43:30 <fungi> yeah maybe it did actually work in 9
19:43:31 <clarkb> ya its possible it just worked until ansible decided it was weird and stopped being backward compatible
19:43:38 <clarkb> in any case straightforward to fix
19:44:06 <clarkb> if you see any other ansible 11 issues its good to make note of them as this sort of info can go into zuul's changelog for ansible things
19:44:10 <clarkb> helps other zuul users
19:44:44 <clarkb> ok last call. Anything else? Otherwise we can end about 15 minutes early today
19:45:14 <fungi> bindep and git-review were testing on older python versions we'll need to make some decisions about
19:45:18 <fungi> pbr too
19:45:58 <clarkb> I think for bindep and git-review we just drop the old stuff and move on. They have old releases that can run with old python
19:46:12 <clarkb> pbr is trickier and probably worth keeping python2.7 still since swift only just dropped support for that version
19:46:19 <fungi> seems like we can move pbr's py27 testing to newer platforms, but will probably need to drop 3.5-3.7 testing
19:46:22 <clarkb> (and that means updating the python2.7 test job to jammy I think)
19:46:25 <clarkb> ya exactly
19:46:59 <fungi> well, or focal
19:48:10 <clarkb> sounds like that may be everything. Thanks everyone!
19:48:20 <fungi> thanks clarkb!
19:48:34 <clarkb> I'll probably run a meeting next week despite the expected lower attendance. Its good to capture the goings on for people to review if nothing else
19:48:39 <clarkb> #endmeeting
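[Editor's note on the skyline playbook vars issue discussed in the meeting: Ansible expects play-level `vars` to be a mapping, while older Ansible tolerated a list of single-key mappings. A minimal sketch of the pattern, assuming the shape of the problem; the variable names here are made up and are not the actual contents of the linked skyline playbook:]

```yaml
# Hypothetical illustration of the rejected pattern: "vars" written
# as a list of one-key mappings, which Ansible 11 no longer accepts.
- hosts: all
  vars:
    - some_var: value
    - other_var: value

# The fix: "vars" as a single mapping, which all Ansible versions accept.
- hosts: all
  vars:
    some_var: value
    other_var: value
```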