19:00:08 #startmeeting infra
19:00:08 Meeting started Tue Aug 19 19:00:08 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:08 The meeting name has been set to 'infra'
19:00:11 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JQ73GENJBNTUU4QZD7SY6OO2NZ2H7WGU/ Our Agenda
19:00:37 #topic Announcements
19:01:14 I didn't have anything to announce. Did anyone else?
19:01:45 i did not
19:02:02 #topic Gerrit 3.11 Upgrade Planning
19:02:18 This continues to be half on the back burner, except that upstream has made our lives more difficult again by publishing new releases :)
19:02:52 #link https://review.opendev.org/c/opendev/system-config/+/957555 Upgrade Gerrit images to 3.10.8 and 3.11.5
19:03:08 Probably a good idea to land that, restart gerrit, and recycle the holds so that when I do get to testing things I'm testing with up to date gerrit versions
19:03:25 maybe tomorrow or thursday for the gerrit restart depending on how zuul stuff goes between now and then?
19:04:09 the zuul upgrade and reboot playbook is looking happy so far, it just needs time, and the zuul launchers haven't run out of disk yet so they also look happy
19:04:16 makes me think I probably will have time for that soon
19:04:24 any comments or concerns around the gerrit 3.11 upgrade?
19:05:58 #topic Upgrading old servers
19:06:10 Fungi cleaned up refstack and the old eavesdrop server stuff last week
19:06:15 Next on the list are the kerberos, openafs, graphite, and backup servers
19:06:27 i'll try to get to kerberos/openafs later this week
19:06:43 fungi: that would be great. Feel free to ping me if I can help in any way
19:06:55 those seem to all be on focal at the moment
19:06:58 Then I wanted to call out a milestone that we appear to have reached: there are no more bionic servers
19:07:09 #link https://review.opendev.org/c/opendev/system-config/+/957950 cleanup bionic node testing in system-config
19:07:53 my plan is to stick them in emergency disable, update our testing to jammy, in-place upgrade them one by one to jammy, then maybe increase our testing again to noble and repeat, finally removing them from the disable list
19:07:54 This came up because ansible 11 (our new default in zuul) isn't compatible with the python version on bionic. I did a quick workaround yesterday, then dug in more this morning, and I believe that we don't have any servers running bionic, so we can drop testing for that platform and use ansible 11
19:08:06 huzzah!
19:08:15 fungi: that sounds like a great plan. I like the idea of going step by step and checkpointing just to catch any problems early
19:08:43 at least i hope our system-config jobs will give an early warning of serious problems before attempting to upgrade
19:08:54 feel free to double check me on the no more bionic assertion, but best I can tell the hosts in our ansible fact cache reporting bionic are no longer in our inventory
19:09:07 fungi: ya it should catch the more obvious stuff
19:09:10 it's a good check
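For anyone who wants to repeat that check, something along these lines should work; the fact cache location and inventory path here are assumptions and may differ on bridge:

```shell
# Sketch: find cached hosts whose facts report bionic. The jsonfile fact
# cache stores one JSON file per host; paths below are assumed, not verified.
grep -l '"ansible_distribution_release": "bionic"' /var/cache/ansible/facts/* \
  | xargs -rn1 basename

# Then compare against the live inventory; any host printed above but
# missing here is just a stale cache entry, not a running bionic server.
ansible-inventory --list -i /etc/ansible/hosts
```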
19:09:36 I also shut down gitea-lb02.opendev.org this morning as frickler reported its cronspam was not verifying now that its dns records are gone
19:09:57 we kept the server around as a debugging aid so I didn't want to delete it. But shutting it down for now seemed fine and I've done so
19:10:28 any other server upgrade/replacement/deletion questions comments or concerns?
19:10:35 that reminds me, we have a dns record for review02 but i think we're on review03 now?
19:10:50 cacti/storyboard are xenial, looks like, and wiki is older
19:11:09 so i agree no more bionic that i can find
19:11:29 we have stuff older than bionic but we also can't really test it at this point
19:11:36 corvus: correct, we're on 03 now. I'll make a note in my todo list to look at that
19:12:01 ya and we removed testing for that older stuff a little while ago
19:12:04 so that ship has sailed...
19:12:14 review02 is also still in the emergency disable list
19:12:49 along with gitea-lb02
19:13:17 they aren't in our inventory any more so should be able to be removed from the emergency file. I can check on that too
19:14:14 #topic Matrix for OpenDev comms
19:14:21 #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing
19:14:36 I've updated the spec with the feedback that I got. It looks like ianw is happy with it now. Anyone else care to re-review?
19:14:52 I guess corvus did ask about double checking the possibility of EMS hosted mjolnir, which I haven't done
19:15:47 we should probably keep most of the discussion on this topic within the spec review
19:15:48 yeah, still seems like a good idea, but probably doesn't radically alter the next steps, which are: run mjolnir (either ourselves or via ems)
19:16:00 so please follow up there.
19:16:13 though i'm still at "learn how to spell mjolnir" which is step 0
19:16:30 I think it is literally the hardest word to remember the spelling of and type
19:16:55 #topic Pre PTG Planning
19:17:00 sadly i'm enough of a mythology geek that i have no trouble spelling it
19:17:01 #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document
19:17:30 I think we can consider this proposed schedule pretty well settled at this point as I haven't heard any feedback to the contrary
19:18:11 sgtm, thanks!
19:18:13 please add agenda items to the etherpad if you have ideas for things to do or change etc
19:18:26 I'll continue to add items myself as I think of them
19:19:15 #topic Service Coordinator Election Planning
19:19:34 The service coordinator nomination period ends at EOD today on a UTC clock
19:19:41 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YXRD23ZWJGDPZ3WESBNZNEYO7NBCXFT4/
19:19:56 yesterday I went ahead and nominated myself
19:20:09 https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/WNUYDT47NMYC3SC5QA44OG4PWK5ENQEF/
19:20:47 it seemed like no one else was going for it and I wanted to make sure that I was well ahead of the deadline. If however I misread the room please speak up. I'm happy to step to the side or work together with someone etc
19:20:53 (or have an election)
19:21:29 but I suspect it's a tag-you're-it-again situation for myself. And that is fine too. That said I think some variety would be a good thing and I would happily support someone else in the role
19:22:02 your sacrifice is appreciated
19:22:21 #topic Loss of upstream Debian bullseye-backports mirror
19:22:37 I think we had a rough outline of a plan here which included potentially breaking a small number of jobs
19:22:54 sounds like we have a way forward for this, yes, just haven't had time to start in on it
19:23:11 any concerns with the plan since we last spoke about it here?
19:23:47 (the plan is basically to clean up backports for bullseye and force jobs to find an alternative since that most accurately reflects the state with upstream)
19:23:49 i just need to remember if we said base-jobs first or straight to zuul-jobs with an advance announcement
19:24:30 should be in the meeting logs (I believe it was zuul-jobs but double check)
19:24:36 the latter is less work for greater benefit in the long run, but will take a bit longer
19:24:55 yeah, that's what i thought
19:26:07 #topic Etherpad 2.4.2 Upgrade
19:26:15 #link https://github.com/ether/etherpad-lite/issues/7065 Theming updates have broken the no-skin and custom skin themes. We use no-skin.
19:26:29 I was hopeful that there would be followup after they responded to the issue last week just after our meeting
19:27:04 but no. Basically they said no-skin isn't expected to be used, it's only there as an example for other skins, and I responded that no-skin is/was the only skin until colibris was added and we kept it for user familiarity and density of the text
19:27:27 it does seem like there's been some upstream turnover, to the point where the current maintainers aren't aware of what etherpad-lite used to look like
19:27:35 still waiting for a response indicating whether or not no-skin is no longer expected to be used
19:27:56 but hopefully we have an answer soon on whether or not we have to accept colibris or can continue as is
19:28:09 but also custom skins (presumably based on no-skin) are now broken too
19:28:39 yes, at least one person responded that this is the case, so I think they have some stuff to fix at least
19:28:43 so no-skin isn't currently useful even for the purpose they thought it was for
19:29:05 and if they fix that, it will likely be usable for us again too
19:29:22 here's hoping
19:29:29 #topic Moving OpenDev's python-base/python-builder/uwsgi-base Images to Quay
19:29:39 #link https://review.opendev.org/c/opendev/system-config/+/957277
19:30:15 last week I noted that I had some concern that we may not use speculative images in builds with this switch. Since then I dug into the docker docs, our test job roles, and the change I wrote, and I think this is a non issue
19:30:23 our past selves already addressed these problems
19:31:11 at this point I think the main concern is that we'll want to rebuild all of the python based images after this lands
19:31:16 yay! /pats selves on back
19:31:23 probably don't need to rush to do that but also won't want to delay
19:31:51 I'm not sure there is ever a good time for something like that, so it's mostly calling it out as a todo once that lands so we don't forget and use the old stale images forever
19:31:58 so ya, reviews welcome, and feedback on timing too
19:32:38 #topic Bootstrapping rax-flex iad3
19:33:12 there is a third rax flex region and cloudnull has given us the go ahead to use it. Yesterday we landed a change to update our clouds.yaml files and set up the cloud launcher to preconfigure things
19:33:34 cloud launcher failed on auth issues and it seems we still need to login to skyline first to have the new env sync account stuff from the old env
19:33:50 (along with increased quotas in all 3 regions)
19:33:52 I have done this for both of our accounts in iad3's skyline service and the openstack client can list images in both accounts now
19:34:16 I think I'll wait for our daily cloud launcher run to happen at ~0200 UTC and then look into bootstrapping a mirror tomorrow
19:34:30 we also need to set the network mtu to 1500 which is an extra step post cloud launcher
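Assuming iad3 matches the other flex regions, that post-launcher MTU step is presumably a single openstack client call; the cloud, region, and network names below are placeholders, not the actual opendev configuration:

```shell
# Sketch: set the tenant network MTU after cloud launcher runs.
# Cloud/region/network names are placeholders; substitute the real iad3 values.
openstack --os-cloud rax-flex --os-region-name IAD3 \
  network set --mtu 1500 opendev-network
```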
19:35:31 then cloudnull also suggested we reenable the rax classic regions and see if they are happier now. Looks like the first change to do that just merged
19:35:56 If we do see new or renewed problems with that we can use the existing email thread I started to follow up
19:36:03 (and if anyone would prefer I do the emailing just let me know)
19:36:14 yeah, the change for the first two regions just merged
19:36:31 I split them because ord and dfw had a different failure mode to iad
19:36:43 so want to reenable them separately for ease of reverts / debugging
19:36:48 right, i didn't approve the second for now
19:37:30 #topic Open Discussion
19:37:33 Anything else?
19:38:22 i'll be gone next week and half the following week
19:38:28 I mentioned earlier that the zuul upgrades and reboots seem happier now. I am running that playbook in a root screen on bridge out of band due to several consecutive failures over the last few weeks
19:38:37 fungi: enjoy your time of
19:38:40 *off
19:38:54 thanks!
19:41:20 the ansible 11 switch has gone really well I think. The main issues we have encountered are Bionic and older nodes not being supported by ansible 11 due to python version incompatibilities
19:41:41 ++
19:41:48 then we also discovered skyline used a list for playbook vars: https://opendev.org/openstack/skyline-apiserver/src/branch/master/playbooks/devstack/pre.yaml#L8-L9 and ansible 11 doesn't like that
19:41:53 and that one weird skyline job
19:41:58 yeah that
19:42:04 but I'm not sure older ansible was even doing the right thing with that config. It should also be a trivial problem to fix
19:42:15 it was likely ignored
19:42:21 i get the idea that was like removing prehistoric syntax
19:42:52 you think? i thought it was just a super weird old way to specify vars
19:43:02 i should be surprised that turned up in one of the newest openstack projects, but i'm not
19:43:30 yeah maybe it did actually work in 9
19:43:31 ya, it's possible it just worked until ansible decided it was weird and stopped being backward compatible
19:43:38 in any case straightforward to fix
19:44:06 if you see any other ansible 11 issues it's good to make note of them, as this sort of info can go into zuul's changelog for ansible things
19:44:10 helps other zuul users
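For the curious, a minimal sketch of what ansible 11 objects to in the linked pre.yaml and the trivial fix (variable names here are placeholders, not the actual skyline change):

```yaml
# Before: vars given as a list of single-key mappings. Newer ansible
# rejects this outright; older releases likely ignored or silently merged it.
- hosts: all
  vars:
    - devstack_branch: master
    - devstack_services: skyline

# After: vars as a plain mapping, the form ansible documents and expects.
- hosts: all
  vars:
    devstack_branch: master
    devstack_services: skyline
```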
19:44:44 ok last call. Anything else? Otherwise we can end about 15 minutes early today
19:45:14 bindep and git-review were testing on older python versions we'll need to make some decisions about
19:45:18 pbr too
19:45:58 I think for bindep and git-review we just drop the old stuff and move on. They have old releases that can run with old python
19:46:12 pbr is trickier and probably worth keeping python2.7 still since swift only just dropped support for that version
19:46:19 seems like we can move pbr's py27 testing to newer platforms, but will probably need to drop 3.5-3.7 testing
19:46:22 (and that means updating the python2.7 test job to jammy I think)
19:46:25 ya exactly
19:46:59 well, or focal
19:48:10 sounds like that may be everything. Thanks everyone!
19:48:20 thanks clarkb!
19:48:34 I'll probably run a meeting next week despite the expected lower attendance. It's good to capture the goings on for people to review if nothing else
19:48:39 #endmeeting