19:00:14 <clarkb> #startmeeting infra
19:00:14 <opendevmeet> Meeting started Tue Jun 11 19:00:14 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:14 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:14 <opendevmeet> The meeting name has been set to 'infra'
19:00:20 <clarkb> I made it back in time to host the meeting
19:00:27 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/4BD6VNMXEEQWEEANCGX2JWB2LK6XY5RT/ Our Agenda
19:00:32 <tonyb> \o/
19:00:54 <clarkb> #topic Announcements
19:01:06 <clarkb> I'm going to be out from June 14-19
19:01:31 <clarkb> This means I'll miss next week's meeting. More than happy for y'all to have one without me or to skip it if that is easier
19:01:38 <tonyb> I'm relocating back to AU on the 14th
19:01:54 <frickler> sounds like skipping would be fine
19:02:12 <fungi> yeah, i'll be around but don't particularly need a meeting either
19:02:21 <clarkb> works for me
19:02:28 <clarkb> #agreed The June 18th meeting is Cancelled
19:02:35 <clarkb> anything else to announce?
19:02:41 <tonyb> and I have started a project via the OIF university partnerships to help us get keycloak talking to Ubuntu SSO
19:02:58 <tonyb> Also my IRC is super laggy :(
19:03:14 <clarkb> tonyb: cool! is that related to the folks saying hello in #opendev earlier today?
19:03:14 <fungi> thanks tonyb!!!
19:03:27 <tonyb> clarkb: Yes yes it is :)
19:03:29 <fungi> that's the main blocker to being able to make progress on the spec
19:04:02 <fungi> knikolla had some ideas on how to approach it and had written something similar, if you manage to catch him at some point
19:04:09 <tonyb> Yup so fingers crossed we'll be able to make solid progress in the next month
19:04:16 <tonyb> fungi: Oh awesome!
19:04:20 <fungi> he might be able to provide additional insight
19:04:43 <fungi> something about having written other simplesamlphp bridges in the past
19:05:24 <tonyb> I'll reach out
19:06:28 <clarkb> #topic Upgrading Old Servers
19:06:49 <clarkb> I've been trying to keep up with tonyb but failing to do so. I think there has been progress in getting the wiki changes into shape?
19:07:00 <clarkb> tonyb: do you still need reviews on the change for new docker compose stuff?
19:07:35 <tonyb> Yes please
19:08:13 <tonyb> Adding compose v2 should be ready pending https://review.opendev.org/c/opendev/system-config/+/921764?usp=search passing CI
19:08:36 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/920760 Please review this change which adds the newer docker compose tool to our images (which should be safe to do alongside the old tool as they are different commands)
19:08:45 <tonyb> (side note) once that passes CI I could use it to migrate list3 to the golang tools
19:09:07 <clarkb> neat
19:09:11 <clarkb> and thank you for pushing that along
19:09:21 <tonyb> mediawiki is closer: I have it building 2 versions of our custom container
19:09:30 <tonyb> the role is getting closer to failing in testinfra
19:09:44 <clarkb> what are the two different versions for?
19:09:54 <tonyb> from there I'll have questions about more complex testing
19:10:09 <fungi> two different mediawiki releases?
19:10:16 <clarkb> oh right we want to do upgrades
19:10:23 <tonyb> 1.28.2 (the current version) and 1.35.x, the version we'll need to go to next
19:10:34 <clarkb> got it thanks
19:10:37 <fungi> yeah, the plan is to deploy a container of the same version we're running, then upgrade once we're containered up
19:10:47 <tonyb> I wanted to make sure I had the Dockerfile and job more or less right
19:11:03 <clarkb> similar to how we do gerrit today with the current and next version queued up
19:11:20 <tonyb> Yup heavily inspired by gerrit :)
19:11:39 <tonyb> On noble I've been trying to move that ball along
19:11:49 <clarkb> with a mirror node based on the openafs questions
19:12:02 <tonyb> currently stuck on openafs on noble
19:12:11 <clarkb> having thought about it I think the best thing is to simply skip the ppa on noble until we need to build new packages for noble
19:12:35 <clarkb> I think we can (should?) do that for jammy too but didn't because we didn't realize we were using the distro package until well after we had been using jammy
19:12:45 <tonyb> I tried that and hit a failure later where we install openafs-dkms so I need to cover that also
19:12:55 <clarkb> at this point it's probably fine to leave jammy in that weird state where we install the ppa then ignore it. But for noble I think we can just skip the ppa
19:13:21 <tonyb> Okay we can decide on Jammy during the review
19:13:22 <clarkb> tonyb: ok the packages may have different names in noble now?
19:13:35 <fungi> it's possible you're hitting debian bug 1060896
19:13:36 <clarkb> they were the same in jammy but our ppa had an older version so apt picked the distro version instead
19:13:53 <clarkb> fungi: is that the arm bug? I think the issue would only be seen on arm
19:13:56 <fungi> #link https://bugs.debian.org/1060896 openafs-modules-dkms: module build fails for Linux 6.7: osi_file.c:178:42: error: 'struct inode' has no member named 'i_mtime'
19:14:06 <tonyb> I'll look, it only just happened.
19:14:08 <clarkb> oh no thats a different thing
19:14:10 <fungi> oh, is openafs-modules-dkms only failing on arm?
19:14:37 <fungi> 1060896 is what's keeping me from using it with linux >=6.7
19:14:40 <clarkb> fungi: it has an arm failure, the one I helped debug and posted a bug to debian for. I just couldn't remember the number. That specific issue was arm specific
19:14:53 <tonyb> Oh and I hit another issue where CentOS 9-stream has moved forward from our image so the headers don't match
19:15:10 <clarkb> ianw has also been pushing for us to use kafs in the mirror nodes. We could test that as an alternative
19:15:19 <fungi> #link https://bugs.debian.org/1069781 openafs kernel module not loadable on ARM64 Bookworm
19:15:37 <clarkb> tonyb: you can manually trigger an image rebuild in nodepool if you want to speed up rebuilding images after the mirrors update
19:16:04 <tonyb> Yeah I'm keen on that too (kAFS) but I was thinking later
19:16:23 <tonyb> clarkb: That'd be handy we can do that after the meeting
19:16:23 <clarkb> ++ especially if we can decouple using noble from that. I'm a fan of doing one thing at a time if possible
19:16:36 <fungi> if the challenges with openafs become too steep, kafs might be an easier path forward, yeah
19:16:43 <tonyb> It's a $newthing for me
19:16:49 <clarkb> tonyb: yup can help with that
19:16:56 <tonyb> cool cool
19:17:06 <clarkb> and thank you for pushing forward on all these things. Even if we don't have all the answers it is good to be aware of the potential traps ahead of us
19:17:10 <fungi> i should try using kafs locally to get around bug 1060896 keeping me from accessing afs on my local systems
19:17:32 <clarkb> anything else on the topic of upgrading servers?
19:17:46 <tonyb> Nope I think that's good
19:18:09 <clarkb> #topic Cleaning up AFS Mirrors
19:18:37 <fungi> i sense a theme
19:18:50 <clarkb> I don't have any new progress on this. I'm pulled in enough directions that this is an easy deprioritization item. That said I wanted to call out that our centos-8-stream images are basically unusable and maybe unbuildable
19:19:07 <clarkb> Upstream cleared out the repos and our mirrors faithfully reflected that empty state
19:19:21 <clarkb> that means all that's left for us to do really is to remove the nodes and image builds from nodepool
19:19:28 <tonyb> 8-stream should be okay?  I thought it was just CentOS-8 that went away?
19:19:30 <clarkb> oh and the nodeset stuff from zuul
19:19:39 <clarkb> tonyb: no, 8-stream is EOL as of a week ago
19:19:48 <tonyb> Oh #rats
19:19:55 <tonyb> Okay
19:20:02 <tonyb> where are we at with Xenial?
19:20:16 <clarkb> and they deleted the repos (I think technically the repos go somewhere else, but I don't want to set an expectation that we're going to chase eol centos repo mirroring locations if the release is eol)
19:20:36 <clarkb> with xenial devstack-gate has been wound down which had a fair bit of fallout.
19:20:58 <fungi> yeah, we could try to start mirroring from the centos archive or whatever they call it, but since the platform's eol and we've said we don't make extra effort to maintain eol platforms
19:20:58 <clarkb> I think the next steps for xenial removal are going to be removing projects from the zuul tenant config and/or merging changes to projects to drop xenial jobs
19:21:22 <tonyb> Okay so we need to correct some of that fallout?  Or can we look at removing all the xenial jobs and images?
19:21:23 <clarkb> we'll have to make educated guesses on which is appropriate (probably based on the last time anything happened in gerrit for the projects)
19:21:55 <clarkb> tonyb: I don't think we need to correct the devstack-gate removal fallout. It's long been deprecated and most of the fallout is in dead projects which will probably get dealt with when we remove them from the tenant config
19:22:08 <tonyb> Okay.
19:22:19 <clarkb> basically we should focus on that sort of cleanup first^ then we can figure out what is still on fire and if it is worth intervention at that point
19:22:26 <tonyb> Is there any process or order for removing xenial jobs?
19:22:33 <fungi> or in some very old openstack branches that have escaped eol/deletion
19:23:03 <tonyb> for example system-config-zuul-role-integration-xenial?  to pick on one I noticed today
19:23:03 <clarkb> tonyb: the most graceful way to do it is in the child jobs first then remove the parents. I think zuul will enforce this for the most part too
19:23:30 <clarkb> tonyb: for system-config in particular I think we can just proceed with cleanup. Some of my system-config xenial cleanup changes have already merged
19:23:55 <tonyb> clarkb: cool beans
19:24:00 <clarkb> if you do push changes for xenial removal please use https://review.opendev.org/q/topic:drop-ubuntu-xenial that topic
19:24:07 <tonyb> noted
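
To make the educated guesses clarkb describes above, one starting point is to check when each project last saw any activity in Gerrit. A minimal sketch, assuming the public review.opendev.org REST API and the requests library; the project list here is purely illustrative, not a real tenant inventory:

    #!/usr/bin/env python3
    """Sketch: find the most recent Gerrit activity per project.

    Gerrit query results are sorted by most recently updated, so the
    first result gives the last activity for a project.
    """
    import json
    import requests

    GERRIT = "https://review.opendev.org"
    PROJECTS = ["opendev/system-config", "openstack/devstack-gate"]  # illustrative only

    for project in PROJECTS:
        resp = requests.get(
            f"{GERRIT}/changes/",
            params={"q": f"project:{project}", "n": 1},
            timeout=30,
        )
        resp.raise_for_status()
        # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI; strip it.
        changes = json.loads(resp.text.split("\n", 1)[1])
        if changes:
            print(f"{project}: last updated {changes[0]['updated']}")
        else:
            print(f"{project}: no changes found")
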
19:24:59 <clarkb> one last thought before we move to the next topic: maybe we pause centos-8-stream image builds in nodepool now as a step 0 for that distro release. As I'm like 98% positive those image builds will fail now
19:25:09 <clarkb> I should be able to push up a change for that
19:25:32 <clarkb> #topic Gitea 1.22 Upgrade
19:25:51 <clarkb> Similar to the last topic I haven't made much progress on this after I did the simple doctor command testing a few weeks ago
19:26:09 <clarkb> I was hoping that they would have a 1.22.1 release before really diving into this as they typically do a bugfix release shortly after the main release.
19:26:29 <clarkb> there is a 1.22.1 milestone that was due like yesterday in github so I do expect that soon and will pick this back up again once available
19:26:46 <clarkb> there are also some fun bugs in that milestone list though I don't think any would really affect our use case
19:27:02 <clarkb> (things like PRs are not mergeable due to git bugs in gitea)
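
One low-effort way to notice when 1.22.1 lands is to poll the milestone clarkb mentions via GitHub's REST API. A rough sketch (unauthenticated, so subject to rate limits):

    #!/usr/bin/env python3
    """Sketch: check the status of the Gitea 1.22.1 milestone on GitHub."""
    import requests

    resp = requests.get(
        "https://api.github.com/repos/go-gitea/gitea/milestones",
        params={"state": "all", "per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    for ms in resp.json():
        if ms["title"] == "1.22.1":
            print(
                f"1.22.1: state={ms['state']} due={ms['due_on']} "
                f"open={ms['open_issues']} closed={ms['closed_issues']}"
            )
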
19:27:30 <clarkb> #topic Fixing Cloud Launcher
19:27:50 <clarkb> This was on the agenda to ensure it didn't get lost if the last fix didn't work. But tonyb reports the job was successful last night
19:27:52 <frickler> seems this has passed today
19:28:00 <clarkb> we just had to provide a proper path to the safe.directory config entries
19:28:16 <clarkb> Which is great because it means we're ready to take advantage of the tool for bootstrapping some of the openmetal stuff
19:28:28 <tonyb> Huzzah
19:28:38 <clarkb> which takes us to the next topic
19:28:45 <clarkb> #topic OpenMetal Cloud Rebuild
19:28:57 <clarkb> The inmotion cloud is gone and we have a new openmetal cloud on the same hardware!
19:29:20 <clarkb> The upside to all this is we have a much more modern platform and openstack deployment to take advantage of. The downside is we have to reenroll the cloud into our systems
19:29:43 <clarkb> I got the ball rolling and over the last 24 hours or so tonyb has really picked up todo items as I've been distracted by other stuff
19:29:55 <clarkb> #link https://etherpad.opendev.org/p/openmetal-cloud-bootstrapping captures our todo list for bootstrapping the new cloud
19:30:17 <tonyb> I think https://review.opendev.org/c/opendev/system-config/+/921765?usp=search is ready now
19:30:25 <clarkb> https://review.opendev.org/c/opendev/system-config/+/921765 is the next item that needs to happen, then we can update cloud launcher to configure security groups and ssh keys, then we can keep working down the todo list in the etherpad
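
The real work here is done by the cloud launcher's Ansible roles driven from system-config, but as a rough illustration of what "configure security groups and ssh keys" amounts to, here is a hedged openstacksdk sketch; the cloud name, group name, rule, and key path are all assumptions:

    #!/usr/bin/env python3
    """Sketch of the kind of bootstrapping the cloud launcher automates:
    a security group plus an ssh keypair. Names and paths are placeholders."""
    import openstack

    conn = openstack.connect(cloud="openmetal")  # assumes a clouds.yaml entry

    # A security group allowing inbound ssh (illustrative rule only).
    sg = conn.network.create_security_group(
        name="opendev-ssh", description="allow inbound ssh")
    conn.network.create_security_group_rule(
        security_group_id=sg.id,
        direction="ingress",
        protocol="tcp",
        port_range_min=22,
        port_range_max=22,
        remote_ip_prefix="0.0.0.0/0",
    )

    # Upload an ssh public key for launching nodes (placeholder path).
    with open("/path/to/key.pub") as f:
        conn.compute.create_keypair(name="infra-root", public_key=f.read())
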
19:31:17 <tonyb> I'm happy to work with whomever to keep that rolling
19:31:20 <clarkb> so anyway reviews welcome as we get changes up to move things along. At this point I expect the vast majority of things to go through gerrit
19:31:36 <clarkb> tonyb: thanks! I should be able to help through thursday at least if we don't get it done by then
19:31:43 <tonyb> It's a good learning opportunity
19:32:19 <tonyb> I was hoping the new mirror node would be on noble but that's far from essential
19:32:55 <clarkb> ya I wouldn't prioritize noble here if there aren't easy fixes for the openafs stuff
19:33:02 <tonyb> Yeah
19:33:14 <frickler> also easy to replace later
19:33:17 <clarkb> ++
19:33:33 <clarkb> the last thing I wanted to call out on this is that I think fungi and corvus may still have account creation for the system outstanding
19:33:40 <tonyb> I haven't been consistent about removing inmotion as I add openmetal figuring we can do a complete cleanup "at the end"
19:33:51 <clarkb> feel free to reach out to me if you have questions (I worked through the process and while not an expert did work through a couple odd things)
19:33:52 <corvus> yep sorry :)
19:34:08 <clarkb> tonyb: ya that should be fine too. I have a note in the todo list to find those things that we missed already
19:34:20 <frickler> though that's also not too relevant for operating the cloud itself
19:34:26 <tonyb> Oh and datadog ... My invitation expired and I think clarkb was going to ask about dropping it
19:34:29 <corvus> i'm not, like, blocking things right?
19:34:35 <frickler> all infra-root ssh keys should be on all nodes already
19:34:46 <clarkb> corvus: no you are not. I'm more worried that the invites might expire and we'll have to get openmetal to send new ones
19:34:56 <clarkb> but frickler  is right your ssh keys should be sufficient for the openstack side of things
19:34:59 <corvus> oh i see.  i will get that done soon then.
19:35:10 <tonyb> It looks like we can send them ourselves
19:35:18 <clarkb> tonyb: oh cool in that case it's far less urgent
19:35:42 <tonyb> it's mostly for issue tracking and "above OpenStack" issues
19:35:49 <fungi> oh, i'll try to find the invites
19:35:58 <fungi> i think i may have missed mine
19:36:09 <clarkb> fungi: mine got sorted into the spam folder, but once dug out it was usable
19:36:17 <fungi> noted, thanks
19:36:20 <clarkb> anything else on the subject of the new openmetal cloud?
19:36:36 <frickler> well ... as far as html mails with three-line-links are usable
19:36:53 <frickler> do we have to discuss details like nested virt yet?
19:37:09 <clarkb> frickler: I figured we could wait on that until after we have "normal" nodes booting
19:37:27 <clarkb> adding nested virt labels to nodepool at that point should be trivial. That said I think we can do that as long as the cloud supports it (which I believe it does)
19:38:03 <frickler> ack, will need some testing first. but let's plan to do that before going "full steam ahead"
19:38:13 <tonyb> That works
19:38:24 <clarkb> sounds good
19:38:55 <tonyb> Do we have/need a logo to add to the "sponsors" page?
19:39:10 <clarkb> tonyb: they are already on the opendev.org page
19:39:28 <clarkb> because the inmotion stuff was really openmetal for the last little while. We were just waiting for the cloud rebuild to go through the trouble of renaming everything
19:39:36 <tonyb> Cool
19:40:10 <fungi> though it's worth double-checking that the donors listed there are still current, and cover everyone we think needs to be listed
19:40:39 <clarkb> the main questions there would be do we add osuosl and linaro/equinix/worksonarm
19:40:48 <tonyb> Rax, Vexx, OVH and OpenMetal
19:41:07 <clarkb> that might be a good discussion outside of the meeting though as we have a couple more topics to get into and I can see that taking some time to get through
19:41:13 <fungi> yep
19:41:16 <fungi> not urgent
19:41:22 <tonyb> kk
19:41:30 <clarkb> #topic Improving Mailman Throughput
19:42:09 <clarkb> We increased the number of out runners from 1 to 4. This unfortunately didn't fix the openstack-discuss delivery slowness. But now that we understand things better it should prevent openstack-discuss slowness from impacting other lists so a (small) win anyway
19:42:46 <fungi> yeah, there seems to be an odd tuning mismatch between mailman and exim
19:42:47 <clarkb> fungi did more digging and found that both mailman and exim have a batch size for message delivery of 10. However it almost seems like exim's might be 9? because it complains when mailman gives it 10 messages and goes into queueing mode?
19:43:23 <fungi> yeah, exim's error says it's deferring delivery because there were more than 10 recipients, but there are exactly 10 the way mailman is batching them up (confirmed from exim logs even)
19:43:34 <clarkb> my suggestion was that we could bump the exim number quite a bit to see if it stops complaining that mailman is giving it too many messages at once. Then if that helps we can further bump the mailman number up to a closer value but give exim some headroom
19:43:53 <fungi> so i think mailman's default is chosen trying to avoid exceeding exim's default limit, except it's just hitting it instead
19:43:58 <clarkb> say bump exim up to 50 or a 100. Then bump mailman up to 45 or 95
19:44:39 <tonyb> All sounds good to me
19:44:44 <corvus> could also decrease exim's queue interval
19:44:59 <fungi> yeah, right now the delay is basically waiting for the next scheduled queue run
19:45:30 <clarkb> oh that's a piece of info I wasn't aware of. So exim will process each batch in the queue with a delay between them?
19:45:38 <fungi> not exactly
19:45:40 <corvus> (i think synchronous is better if the system can handle it, but if not, then queing and running more often could be a solution)
19:45:54 <fungi> exim will process the message ~immediately if the recipient count is below the threshold
19:46:11 <fungi> otherwise it just chucks it into the big queue and waits for the batch processing to get it later
19:46:16 <clarkb> got it
19:46:36 <clarkb> fungi: was pushing up changes to do some version of ^ something that you were planning to do?
19:46:50 <fungi> except that with our mailing lists, every message exim receives outbound is going to exceed that threshold and end up batched instead
19:47:16 <fungi> clarkb: i can, just figured it wasn't urgent and we could brainstorm in the meeting, which we've done now
19:47:27 <clarkb> thanks!
19:47:48 <clarkb> #topic Testing Rackspaces new cloud offering
19:47:50 <fungi> so with synchronous being preferable, increasing the recipient limit in exim sounds like the way to go
19:47:54 <clarkb> ++
19:48:10 <fungi> and it doesn't sound like there are any objections with 50 instead of 10
19:48:16 <clarkb> wfm
19:48:23 <fungi> (where we'll really just be submitting 45 anyway)
19:49:41 <tonyb> #makeitso
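
For illustration of the batching arithmetic discussed above, a small sketch of mailman-style recipient chunking against an exim-style recipient threshold; the subscriber list and the assumption that exim effectively defers at the limit (rather than strictly above it) are illustrative, matching the behavior described above rather than verified exim internals:

    #!/usr/bin/env python3
    """Illustration of the batching mismatch: batches of exactly 10 sit
    right on a limit of 10, while 45-recipient batches stay comfortably
    below a limit of 50."""

    def chunks(recipients, size):
        """Split a recipient list into mailman-style delivery batches."""
        for i in range(0, len(recipients), size):
            yield recipients[i:i + size]

    subscribers = [f"user{i}@example.org" for i in range(1000)]  # made-up list

    for mailman_batch, exim_limit in [(10, 10), (45, 50)]:
        sizes = [len(b) for b in chunks(subscribers, mailman_batch)]
        at_or_over = sum(1 for s in sizes if s >= exim_limit)
        print(f"mailman batch={mailman_batch} exim limit={exim_limit}: "
              f"{at_or_over}/{len(sizes)} batches land on or over the limit")
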
19:49:49 <clarkb> re rax's new thing they want us to help test. I haven't had time to ask for more info on that. However, enrolling it will be very similar to the openmetal cloud (though without setting flavors and stuff I expect) so we're ensuring the process generally works and are familiar with it by going through the process for openmetal
19:50:06 <clarkb> part of the reason I have avoided digging into this is I didn't want to start something like that then disappear for a week
19:50:19 <clarkb> but for completeness thought I'd keep it on the agenda in case anyone else had info
19:51:01 <tonyb> I was supposed to reach out to cloudnull and/or the RAX helpdesk for more information
19:51:01 <frickler> is this something they sent to all customers or is it specifically targeted to opendev?
19:51:22 <clarkb> frickler: I am not sure. The way it was sent to us was via a ticket to our nodepool ci account
19:51:28 <fungi> i am a paying rackspace customer and did not receive a personal invite, so i think it's selective
19:51:53 <clarkb> I don't think our other rax accounts got invites either. Cloudnull was a big proponent of what we did back in the osic days and may have asked his team to target us
19:52:13 <frickler> ok
19:52:48 <frickler> coincidentally I've seen a number of tempest job timeouts that were all happening on rax recently
19:53:11 <clarkb> ya I think it can potentially be useful for both us and them
19:53:15 <frickler> so assuming the new cloud also has better performance, that might be really interesting
19:53:18 <clarkb> we just need to find the time to dig into it.
19:54:00 <frickler> I can certainly look into helping with testing, just not sure about communications due to time difference
19:54:02 <fungi> yes, the aging rackspace public cloud hardware and platform has been implicated in a number of different performance discrepancies/variances in jobs for a while
19:54:38 <fungi> also with it being kvm, migrating to it eventually could mean we can stop worrying about "some of the test nodes you get might be xen"
19:55:11 <frickler> +1
19:55:12 <fungi> at least i think it's kvm, but don't have much info either
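
When test nodes on the new offering become available, a quick check from inside a node can settle the Xen-vs-KVM question. A sketch reading a couple of standard kernel-exposed paths; `systemd-detect-virt` covers the same ground if the image ships it:

    #!/usr/bin/env python3
    """Sketch: report which hypervisor a test node appears to run on.

    Run on the node itself; not exhaustive, just a first pass."""
    from pathlib import Path

    def hypervisor():
        hv_type = Path("/sys/hypervisor/type")
        if hv_type.exists():
            # Present on Xen guests (contains "xen"); usually absent on KVM.
            return hv_type.read_text().strip()
        product = Path("/sys/class/dmi/id/product_name")
        if product.exists():
            # KVM/qemu guests typically report "KVM" or a QEMU machine type here.
            return product.read_text().strip()
        return "unknown"

    print(hypervisor())
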
19:55:28 <clarkb> one thing we should clarify is what happens to the resources after the test period. If we aren't welcome to continue using it in some capacity then the amount of effort may not be justified
19:55:57 <clarkb> sounds like we need more info in general. tonyb if you don't manage to find out before I get back from vacation I'll put it on my todo list at that point as I will be less likely to just disappear :)
19:56:04 <clarkb> #topic Open Discussion
19:56:14 <clarkb> We have about 4 minutes left. Anything else that was missed in the agenda?
19:56:23 <tonyb> Perfect
19:56:38 <tonyb> Nothing from me
19:56:49 <fungi> i got nothin'
19:57:44 <frickler> just ftr I finished the pypi cleanup for openstack today
19:58:12 <frickler> but noticed some repos weren't covered yet, like dib. so a bit more work coming up possibly
19:58:20 <tonyb> frickler: Wow!
19:58:27 <tonyb> That's very impressive
19:58:30 <clarkb> frickler: ack thanks for the heads up
19:58:35 <fungi> oh, also you wanted to ask about similar cleanup for opendev-maintained packages
19:58:40 <fungi> like glean
19:59:06 <frickler> yeah, but that's not urgent, so can wait for a less full agenda
19:59:15 <clarkb> for our packages I think it would be good to have a central account that is the opendev pypi owner account
19:59:20 <fungi> though for glean specifically mordred has already indicated he has no need to be a maintainer on it any longer
19:59:44 <clarkb> we can add that account as owner on our packages then remove anyone that isn't the zuul upload account (also make the zuul upload account a normal maintainer rather than owner)
19:59:45 <frickler> yes, and not sure how many other things we actually have on pypi?
19:59:53 <fungi> yeah, the biggest challenge is that there's no api, so we can't script adding maintainers/collaborators
20:00:04 <clarkb> frickler: git-review, glean, jeepyb maybe?
20:00:09 <clarkb> oh and gerritlib possibly
20:00:17 <fungi> still a totally manual clicky process unless someone wants to fix the feature request in warehouse
20:00:22 <clarkb> fungi: ya thankfully our problem space is much smaller than openstack's
20:00:25 <fungi> agreed
20:00:30 <fungi> bindep too
20:00:36 <fungi> git-restack, etc
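
As fungi notes, there is no PyPI API for managing collaborators, so that part stays manual; the read-only JSON API can at least inventory what we publish. A sketch using the package names mentioned above (the list may be incomplete):

    #!/usr/bin/env python3
    """Sketch: inventory opendev-maintained packages on PyPI via the
    read-only JSON API. Package list taken from the discussion above."""
    import requests

    PACKAGES = ["git-review", "glean", "jeepyb", "gerritlib", "bindep", "git-restack"]

    for name in PACKAGES:
        resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=30)
        if resp.status_code != 200:
            print(f"{name}: not found on pypi")
            continue
        info = resp.json()["info"]
        print(f"{name}: latest release {info['version']}")
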
20:01:37 <clarkb> I've also confirmed that centos 8 stream image builds are failing
20:01:47 <clarkb> and we are at time
20:02:00 <clarkb> thank you everyone. Feel free to continue discussion in #opendev or on the mailing list, but I'll end the meeting here
20:02:02 <clarkb> #endmeeting