nick | message | time |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Jul 22 19:00:57 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7PMI6EMXUSHZA4J2CJ5XNXM3BKHH3CXH/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:02 |
clarkb | Anything to announce? | 19:02 |
clarkb | Sounds like no | 19:03 |
clarkb | #topic Zuul-launcher | 19:03 |
fungi | i didn't have anything | 19:03 |
clarkb | We can dive right in then | 19:03 |
clarkb | mixed cloud nodesets are still happening (at a lower rate) | 19:04 |
clarkb | #link https://review.opendev.org/c/zuul/zuul/+/955545 is the current proposal to work around the lack of verbose errors from openstack clouds | 19:04 |
clarkb | there is also a bugfix to move image hashing into the image upload role so that we can hash the correct format of the data | 19:04 |
clarkb | #link https://review.opendev.org/c/opendev/zuul-providers/+/955621/ | 19:04 |
corvus | yeah, afaict they're happening because we're hitting quota, but the clouds give us error messages that don't tell us that. | 19:05 |
corvus | we're unable to upload raw images until 621 lands | 19:05 |
clarkb | and at least some of those errors we expect to be included in the error response but there is a bug in nova | 19:05 |
fungi | because you need a time machine to interpret 15 years of openstack error messages as they evolved | 19:05 |
corvus | we do a lot, but some of them are just plain empty strings | 19:06 |
fungi | not much you can do with a null response | 19:06 |
corvus | so even though zuul-launcher is basically at the point where anything with "exceed" or "quota" in the string is treated as a quota error, yep, not much else we can do with nothing. | 19:06 |
clarkb | I see you noted centos 9 failures. Those issues are indicative of a broken mirror (due to upstream updating files out of the correct order) | 19:07 |
clarkb | we mirror directly from upstream now so this is upstream not updating their mirror files in the correct order aiui | 19:07 |
corvus | there's also a bug in launcher that's causing extra "pending" uploads; a fix for that is in progress | 19:07 |
corvus | clarkb: that failed a recheck too... how long should we expect those periods to last? | 19:07 |
clarkb | corvus: sometimes it's until the next mirror sync which I think happens after 4 or 6 hours | 19:09 |
corvus | oof | 19:09 |
clarkb | but sometimes we've seen it go days when upstream isn't concerned about fixing that stuff | 19:10 |
clarkb | corvus: for fixes like this I feel like we can force merge if most of the images build and we have a problem like this | 19:10 |
corvus | yeah, maybe the way to go. we could also think about making things nonvoting | 19:11 |
clarkb | though maybe that impacts record keeping for subsequent image uploads? | 19:11 |
clarkb | nonvoting is an interesting idea too since it should work when images build? | 19:11 |
clarkb | and that would make things lazy/optimistic/eventually consistent | 19:11 |
corvus | yep. either way should work. | 19:11 |
clarkb | cool. The other thing I wanted to note here is that nodepool is completely shut down at this point so if you see errors or have questions we need to refer to niz now | 19:12 |
corvus | yep, and i'm ready to delete the nodepool servers whenever | 19:12 |
corvus | how does "now" sound? :) | 19:13 |
clarkb | I think I'm ready if you are. Rolling back seems unlikely at this point as we've been able to roll forward on the majority of the workload for several weeks | 19:13 |
corvus | https://review.opendev.org/955229 is the change to remove nodepool config completely; after that merges i can issue the delete commands | 19:13 |
clarkb | and worst case we can create new nodepool servers later if there is a reason to | 19:14 |
corvus | looks like that change may need to run the gauntlet (it failed on 2 different jobs on 2 different rechecks) | 19:14 |
corvus | but i think it's ready | 19:14 |
fungi | now is good with me | 19:15 |
clarkb | it runs a lot of jobs | 19:15 |
corvus | okay i will make it so | 19:15 |
clarkb | I suspect due to the changes to inventory/ we may be able to optimize the job selection there better if it becomes a problem | 19:15 |
clarkb | anything else on the topic of nodepool in zuul? | 19:16 |
corvus | i think that's it | 19:16 |
clarkb | #topic Gerrit 3.11 Upgrade Planning | 19:16 |
clarkb | Last Friday fungi and I landed all the gerrit image backlog changes and restarted gerrit | 19:16 |
clarkb | the end result is images that we can use to test the gerrit upgrade and they are unlikely to change much between now and the actual upgrade | 19:17 |
clarkb | #link https://www.gerritcodereview.com/3.11.html | 19:17 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade | 19:17 |
fungi | yay! | 19:17 |
clarkb | If you get a moment looking over the release notes and making notes in that etherpad for things to test/check would be great | 19:17 |
clarkb | I have also held new test nodes | 19:18 |
clarkb | #link https://zuul.opendev.org/t/openstack/build/f1ca0d1f2e054829a4506ececb58bed3 | 19:18 |
clarkb | #link https://zuul.opendev.org/t/openstack/build/588723b923e94901af3065143d9df818 | 19:18 |
clarkb | these two builds are the builds with held nodes. I have not done any upgrade testing on them yet. But the 3.11 nodes might be good to interact with for any ui changes | 19:18 |
clarkb | My main concern at the moment is after the recent 3.11.4 update there are two different reports on the upstream mailing list for issues that would be problematic for us | 19:19 |
clarkb | first is offline reindexing not working (it spins forever after completing 99% of the work) | 19:19 |
clarkb | and the other is the replication plugin refusing to attempt replication to targets after some time. Even asking for a full replication run doesn't work. You have to restart gerrit to get it to try again | 19:20 |
clarkb | so I want to see if I can test if these are problems in gerrit 3.11.4 or with the specific deployments involved as part of upgrade testing | 19:20 |
clarkb | I'm hoping to start digging into this tomorrow | 19:21 |
clarkb | any questions or concerns about the gerrit upgrade situation? | 19:21 |
clarkb | #topic Upgrading old servers | 19:22 |
clarkb | The other train of thought I've kicked off this week is starting to look at replacing the eavesdrop server | 19:23 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/955544/ prep step for eavesdrop move to Noble | 19:23 |
clarkb | this change is a prep step to make the existing docker compose configs compatible with docker compose + podman on noble. It should be forward and backward compatible and is something we've applied elsewhere without issue | 19:23 |
clarkb | once that is in and happy on the old system I'll look into booting a new system and determining what the cut over looks like. I believe it's something like: shut down all the irc bots on the old server, deploy the new server, and ensure all the bots are started there and writing to afs happily again | 19:24 |
clarkb | (all the actual data is in afs iirc so we don't need to migrate volumes but I still need to double check that assertion) | 19:24 |
clarkb | fungi: any word on refstack yet? | 19:25 |
fungi | no, i'm planning to just write up an announcement and get someone to agree it's okay | 19:26 |
fungi | and then i'll send it out to openstack-discuss once i get approval | 19:27 |
clarkb | sounds good | 19:27 |
clarkb | thanks | 19:27 |
clarkb | anyone else have server replacement updates? I don't think we've done any recently but happy to have missed some :) | 19:27 |
fungi | i think nobody on the foundation staff cares what happens wrt refstack and associated git repos, it's mostly just me wanting to make sure users aren't surprised and we have some information to point at when there are questions | 19:28 |
clarkb | ++ | 19:28 |
clarkb | #topic Matrix for OpenDev Comms | 19:28 |
clarkb | #link https://review.opendev.org/c/opendev/infra-specs/+/954826 Spec outlining the motivation and plan for Matrix trialing | 19:28 |
clarkb | I wrote the spec | 19:29 |
clarkb | I tried to capture why we've used IRC, what we like about it and how Matrix helps fill those needs while also being more approachable to those who are more familiar with the modern Internet | 19:29 |
clarkb | I don't think anyone has reviewed it yet so I'm mostly hoping for some feedback | 19:30 |
clarkb | but feel free to leave that on the change itself | 19:30 |
clarkb | #topic Working through our TODO list | 19:31 |
clarkb | #link https://etherpad.opendev.org/p/opendev-january-2025-meetup | 19:31 |
clarkb | I have not migrated this into a more permanent home yet | 19:31 |
corvus | (thanks for the spec, i'll take a look at it soon!) | 19:31 |
clarkb | I did do some cleanups to the specs repo and I'm thinking maybe I can port it in there as a high level list of things that don't have the depth of detail of a list of specs | 19:32 |
clarkb | but maybe a list of stubs that could become specs if necessary and otherwise capture the need | 19:32 |
clarkb | maybe I will just try that and see if I like it | 19:32 |
clarkb | #topic Pre PTG Planning | 19:32 |
clarkb | #link https://etherpad.opendev.org/p/opendev-preptg-october-2025 Planning happening in this document | 19:32 |
clarkb | and all of that feeds into planning for our october pre ptg event | 19:32 |
clarkb | if you've got topics you want to cover feel free to add them. My plan is to port things that need discussion from that todo list into there as well as anything that is more currently topical | 19:33 |
clarkb | we have a lot of time to get ready though so no rush | 19:34 |
clarkb | #topic Open Discussion | 19:34 |
clarkb | I did want to note that as july ends we approach service coordinator election period in August | 19:34 |
clarkb | I'll start putting a plan for that next week before August actually rolls around | 19:34 |
clarkb | if you are interested in running I'm happy to help/support anyone with the interest | 19:35 |
clarkb | and then fungi you are working on updating the gitea main page content | 19:36 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/952407 | 19:36 |
clarkb | this is the resulting squashed change so that we don't do more than one rolling update of gitea | 19:36 |
fungi | yeah, that's the squashed version now that we have some consensus | 19:36 |
clarkb | I'll rereview that shortly but maybe we can get that deployed today | 19:37 |
fungi | if folks are still okay with it, then whenever we're ready for another round of gitea restarts... | 19:37 |
fungi | i'm around all day | 19:37 |
fungi | happy to help monitor the deploy | 19:37 |
clarkb | great. Anything else to discuss before we end the meeting? | 19:39 |
clarkb | sounds like that may be it. Thank you everyone | 19:40 |
clarkb | We'll be back here same time and location next week | 19:40 |
clarkb | #endmeeting | 19:40 |
opendevmeet | Meeting ended Tue Jul 22 19:40:27 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:40 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-22-19.00.html | 19:40 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-22-19.00.txt | 19:40 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-22-19.00.log.html | 19:40 |
fungi | thanks clarkb! | 19:41 |
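
The string-matching quota detection corvus described around 19:06 (treating any cloud error mentioning "exceed" or "quota" as a quota condition, with nothing to go on when the message is empty) could be sketched roughly like this. This is an illustrative sketch, not zuul-launcher's actual code; the function and marker names are made up.

```python
# Hypothetical sketch of substring-based quota error classification.
# Some clouds return unstructured (or empty) error messages, so the
# launcher can only fall back to scanning the text it does get.
QUOTA_MARKERS = ("quota", "exceed")

def is_quota_error(message):
    """Classify an error message as a quota failure by substring match.

    An empty message (as some clouds return) cannot be classified at
    all, so it falls through to a generic failure.
    """
    if not message:
        return False
    text = message.lower()
    return any(marker in text for marker in QUOTA_MARKERS)

print(is_quota_error("Quota exceeded for instances"))  # True
print(is_quota_error(""))                              # False
```

The empty-string branch is the whole problem discussed above: when a cloud returns nothing, no amount of matching helps, and the failure is treated as a generic launch error rather than a quota hit.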
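
The image hashing bugfix mentioned around 19:04 (https://review.opendev.org/c/opendev/zuul-providers/+/955621/) moves hashing into the upload role so the checksum is computed over the format actually uploaded (e.g. the converted raw image) rather than the pre-conversion artifact. A minimal sketch of such a checksum helper, with made-up names and no claim to match the actual role implementation:

```python
# Hypothetical checksum helper: hash the file that will actually be
# uploaded, streaming it in chunks so large raw images never need to
# fit in memory.
import hashlib

def image_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The key point from the discussion is *where* this runs: hashing after format conversion (inside the upload step) guarantees the recorded checksum describes the bytes the cloud receives.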
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!