19:00:11 <clarkb> #startmeeting infra
19:00:11 <opendevmeet> Meeting started Tue Oct  8 19:00:11 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:11 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:11 <opendevmeet> The meeting name has been set to 'infra'
19:00:21 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Y55UWOXI5A5Z25CV5MKKQ7XWODWHBQ73/ Our Agenda
19:00:25 <clarkb> #topic Announcements
19:00:31 <clarkb> #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1
19:00:40 <clarkb> still plenty of time to submit for SCaLE if you are interested
19:01:07 <clarkb> also stuff is still not decided yet but it's looking likely i'll be out for at least some of next week. I'll update if anything becomes concrete
19:01:22 <clarkb> if I am not around for our meeting I'll defer to others on whether or not we should have it
19:03:07 <clarkb> #topic Switch OpenDev Zuul Tenants to Ansible 9 by default
19:03:13 <fungi> i will probably not be around for the meeting next week, as i'll be attending openinfra days north america on tuesday and wednesday (flying/driving monday and thursday)
19:03:25 <clarkb> all the more reason to skip it if I'm not around
19:03:27 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/931320
19:03:41 <clarkb> this is something that we roughly agreed yesterday to merge nowish, and we haven't seen any objections to it
19:04:07 <clarkb> I'll go ahead and approve it in the next few minutes if there are no last minute objections that we need to deal with
19:04:20 <clarkb> but tl;dr opendev and zuul have been using ansible 9 by default for some time and it generally works
19:04:27 <clarkb> testing with devstack+tempest also works
19:05:03 <clarkb> so I expect it to generally be fine
19:05:13 <fungi> lgtm!
19:05:56 <clarkb> and approved
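For context, the kind of change 931320 makes is a tenant-level default in project-config's Zuul tenant configuration; a rough sketch (the tenant name and surrounding layout are illustrative, the attribute name is from the Zuul docs):

    # zuul/main.yaml (sketch)
    - tenant:
        name: openstack
        default-ansible-version: '9'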
19:06:02 <clarkb> #topic Rocky Package Mirror Creation
19:06:31 <clarkb> Still no change on this. We can probably drop it from the agenda until a change shows up. But I wanted to remind NeilHanlon to feel free to reach out with questions if there are any
19:06:57 <fungi> sounds good, thanks
19:07:09 <clarkb> #topic Rackspace's Flex Cloud
19:07:31 <fungi> ...is awesome, really
19:07:39 <clarkb> and there has been progress here
19:07:43 * fungi is not a paid spokesperson
19:08:14 <clarkb> corvus set up swift usage using application credentials. The main missing functionality is acls to restrict credential access to specific services or actions. That may already exist; we just need to do more testing
19:08:38 <clarkb> tl;dr is there are two approaches for this. The first is to create a dedicated user for swift and then use swift acl functionality to limit access
19:09:06 <clarkb> it isn't clear if the integration between old rax user creation and new rax flex swift enables this. They said try it and let them know how it goes
19:09:06 <fungi> which we're told is expected to work, but that we should let them know if not
19:09:12 <fungi> yeah, that
19:09:41 <clarkb> the other approach is to use keystonemiddleware based acls on application credentials which would live entirely within rax-flex and not involve the old auth/user stuff at all
19:10:09 <fungi> which seems preferable if it can be made to work, since it doesn't rely on proprietary apis
19:10:09 <clarkb> however for this to work rax-flex would need to configure keystonemiddleware in swift and we are not sure if they have done that. But again it's probably something we can test to see if it works the way we expect and provide feedback if not
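A rough sketch of the two approaches, with illustrative container/credential names, untested against rax-flex:

    # A) dedicated swift user, limited via swift container ACLs
    swift post images --write-acl "OTHERPROJECT:swift-upload-user"

    # B) application credential scoped with keystone access rules
    openstack application credential create image-upload \
      --access-rules '[{"service": "object-store", "method": "PUT", "path": "/v1/**"}]'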
19:10:37 <corvus> if anyone else wants to try these things, i would welcome that
19:10:50 <clarkb> other than that swift in rax-flex is working and corvus has been pushing images to it successfully
19:10:52 <fungi> and is therefore in theory more portable to other openstack providers too
19:10:53 <clarkb> corvus: ack
19:11:01 <clarkb> which takes us to our next topic
19:11:07 <clarkb> #topic Zuul-launcher image builds
19:11:39 <clarkb> we are successfully building images and uploading them to swift with a 72 hour expiration time. The next step is to deploy the zuul-launcher service to consume that job output and upload to clouds?
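The 72 hour expiration is swift's standard object expiry mechanism; a minimal sketch assuming the plain swift CLI (container and object names are illustrative):

    # X-Delete-After is in seconds; 72 hours = 72 * 3600 = 259200
    swift upload images debian-bullseye.qcow2 -H "X-Delete-After: 259200"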
19:11:51 <clarkb> I can't remember if I have reviewed that change now but realize I really should if not
19:12:12 <clarkb> https://review.opendev.org/c/opendev/system-config/+/924188 it did merge yay
19:12:24 <clarkb> that didn't deploy a zuul-launcher yet though, just built out the tooling to configure one?
19:12:29 <corvus> the 72 hour thing is almost done
19:12:41 <corvus> yes, someone should launch.py the node soon-ish
19:13:02 <corvus> then we need this patch in zuul: https://review.opendev.org/931208
19:13:04 <fungi> i can try to look at that tomorrow, emergencies and my other work depending
19:13:09 <corvus> minimal openstack driver
19:13:35 <fungi> start with zuul-launcher01.opendev.org presumably
19:13:42 <clarkb> #link https://review.opendev.org/931208 Openstack driver for the zuul-launcher subsystem
19:13:48 <corvus> once that's all in place, then just a bit more zuul configuration change to tell it to go to work
19:13:57 <clarkb> fungi: or zl01 to mimic how nodepool was done
19:14:00 <corvus> fungi: maybe "zl01" -- short like the others
19:14:15 <fungi> er, zl01 yes, i should have looked back at the change in gerrit ;)
19:14:32 <corvus> oh yeah, we would have named it there :)
19:14:37 <fungi> it already has an entry in hiera/common.yaml
19:14:41 <fungi> wfm
19:15:23 <fungi> hopefully tomorrow morning i can get it booted and push up inventory and dns additions for that
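Booting the node would use system-config's launch tooling; a hedged sketch from memory, where the cloud, region, flavor, and image values are placeholders to be checked against launch/README:

    cd system-config/launch
    ./launch-node.py zl01.opendev.org --cloud=openstackci-rax --region=DFW \
        --flavor='8GB Standard Instance' --image='Ubuntu 24.04'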
19:15:32 <clarkb> the other bit of work is we can also add jobs to opendev/zuul-jobs for different image builds
19:15:39 <clarkb> currently we've just got debian bullseye in there
19:15:43 <clarkb> fungi: thanks!
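Adding another image build would look roughly like the existing bullseye job; in this sketch the job name, parent, and variables are all assumptions, so copy the real ones from the bullseye definition:

    - job:
        name: build-debian-bookworm-image
        parent: build-diskimage   # hypothetical parent; use whatever the bullseye job uses
        vars:
          diskimage: debian-bookworm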
19:16:20 <clarkb> anything else on this topic?
19:16:33 <corvus> not from me, thx
19:16:43 <clarkb> #topic Updating ansible+ansible-lint versions in our repos
19:16:50 <clarkb> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last current open change related to this work
19:17:03 <clarkb> this change has been open for long enough that I have had to fix arm image building twice to get it to pass ci :)
19:17:21 <clarkb> now that the openstack release is behind us, is anyone else interested in reviewing this change? I think it would be good to land it while things are stableish
19:17:39 <clarkb> that will set us up with modern tooling that we hopefully don't have to modify much for a bit
19:18:37 <clarkb> if there isn't interest in additional review maybe we should go ahead and approve it once we're satisfied ansible 9 hasn't set anything on fire
19:19:20 <fungi> yeah, the longer that sits open the more likely it is to bitrot
19:19:47 <clarkb> ok I'll probably +A this afternoon if the ansible update seems stable enough
19:19:56 <clarkb> #topic OpenStack OpenAPI spec publishing
19:20:01 <clarkb> #link https://review.opendev.org/921934
19:20:14 <clarkb> I checked this change yesterday and haven't seen any review responses yet
19:20:28 <clarkb> fungi: how do you want to approach this? Bring it up at the PTG or just reach out directly now or something else maybe?
19:21:01 <fungi> i expect it will resurface in the sdk team's ptg sessions, since that's where it last arose
19:21:14 <fungi> i'm fine dropping it from the agenda until it comes up again
19:21:30 <fungi> i mainly wanted to make sure we had some eyes on the proposal
19:21:40 <clarkb> sounds good
19:21:45 <clarkb> #topic Backup Server Pruning
19:21:58 <clarkb> The smaller of our two backup servers needs pruning roughly every 2.5 weeks at this point
19:22:05 <fungi> this has been an increasingly frequent task, albeit an easy one
19:22:28 <clarkb> a quick investigation shows there are a number of old server backups sitting on that server
19:22:46 <clarkb> I didn't du the directories but I suspect they contain at least some backup data that we could cleanup
19:22:54 <clarkb> ask01, ethercalc02, etherpad01, gitea01, lists, review-dev01, and review01
19:23:11 <clarkb> this is the list I identified. Either services that went away completely or services that had their primary server replaced with a new name
19:24:14 <clarkb> I think there are two options here to improve the disk usage and space situation on this server. We can either A) remove those server specific backup dirs and the corresponding login details and clean things up in place or B) add a new volume to the server and update the mounts so that we're backing up into a clean location
19:24:16 <fungi> we made filesystem snapshots of each server as they were deleted, so i'm okay with removing their backups at this point if we've needed nothing from them all this time
19:24:23 <clarkb> then at some point later in the future we can delete the old volume
19:24:42 <fungi> but also, yes, rotating the volume would have a similar effect
19:25:03 <fungi> keep in mind something similar will arise with the wiki replacement
19:25:15 <clarkb> if we do a volume rotation we may need to ensure that the ansible run for backup servers happens before we try to do backups, so that it reinstalls the backup target locations
19:25:48 <clarkb> so either way I suspect there is a bit of intervention to do. Personally I think the in place cleanup seems simpler and easier to test (because we don't have to fuss with external things like volumes and coordinate mount moves)
19:26:26 <clarkb> anyone else have an opinion on what they feel is safest/best for our backups?
19:28:05 <clarkb> in that case I guess we can try some in place cleanup. Probably start with a less important service like ethercalc, then decide if we should switch to the volume replacement if that doesn't go smoothly
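The in-place cleanup (option A) would amount to something like the following; the paths and usernames are illustrative guesses at the backup server layout and should be confirmed before deleting anything:

    sudo du -sh /opt/backups/borg-*              # size up each server's backups first
    sudo rm -rf /opt/backups/borg-ethercalc02    # drop a retired server's archives
    sudo userdel borg-ethercalc02                # and remove its corresponding login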
19:28:51 <clarkb> Depending on how my week goes with other stuff I'll see if I have time to do that
19:29:00 <clarkb> (too much is in the air and I'm trying to avoid overcommitting...)
19:29:15 <clarkb> #topic Mailman 3 Upgrade
19:29:21 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/930236 will upgrade our installation of mailman 3 to the latest versions
19:29:28 <fungi> speaking of overcommitting ;)
19:29:35 <clarkb> fungi has pushed a change to upgrade our mailman 3 installation to the latest version
19:29:53 <clarkb> the upgrade tasks themselves like db migration should be handled by the containers
19:30:11 <fungi> i'm happy to babysit this as it deploys and do some in-production tests in case there are any problems not caught in ci
19:30:11 <clarkb> so in theory we build new container images, deploy new configs, push it all and tell it to run and voila upgraded mailman
19:30:45 <clarkb> fungi: any new features we should be aware of?
19:30:57 <fungi> mailman is a lot of separate components, albeit aggregated into a smaller number of container images. the changelogs are sort of scattered
19:31:27 <fungi> i can try to put together a list of links to all the various changelogs if that's of interest
19:31:51 <clarkb> I don't have specific interest. More just wondering if there was anything they were advertising as new and improved
19:31:54 <fungi> though also some of this is upgrading stuff like alpine and django
19:32:22 <fungi> there were notable dmarc mitigation improvements for gmail
19:32:57 <clarkb> fungi: are they something you have to opt into or will we automatically get that in the upgrade? I think generally we haven't had too much trouble with gmail, maybe because our volume isn't too high
19:33:23 <fungi> it's to work around the fact that gmail's dkim records are a do-as-i-say-not-as-i-do situation, so making assumptions about how gmail will handle messages from gmail addresses based on what's in dns has recently turned out to be problematic
19:33:53 <fungi> it's something we'd have to add specific domains to an exclusion list for
19:34:03 <fungi> the issue hasn't hit us yet afaik
19:34:29 <clarkb> the exclusion list says "this domain is backed by gmail treat it special"?
19:34:48 <fungi> there were some additional config options for list behaviors as well which got discussed on mailman-users, but i don't recall the specifics at the moment
19:35:39 <fungi> clarkb: an override list of "if addresses are at this domain then pretend that you need to do full address rewriting even if dns says it's unnecessary"
19:35:54 <clarkb> got it
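If the knob fungi is describing is mailman core's dmarc_addresses list attribute (an assumption here; it holds regexps of From: addresses to always treat as needing mitigation), setting it per list might look like this via mailman's interactive shell, run inside the mailman-core container:

    mailman shell -l service-discuss@lists.opendev.org
    >>> m.dmarc_addresses = [r'.*@gmail\.com']
    >>> commit()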
19:36:09 <clarkb> fungi: any thoughts on timing for the upgrade?
19:36:17 <clarkb> the changes themselves look good to me at this point
19:36:28 <fungi> next week would be harder for me. this week or week-after-next are preferable
19:37:11 <clarkb> I expect to be around through at least thursday but probably also friday at this rate
19:37:17 <clarkb> but then ya next week not good for me either
19:38:10 <clarkb> should we just try and send it tomorrow? or is it better to wait for week after next since we're both traveling?
19:38:55 <fungi> tomorrow wfm
19:39:04 <clarkb> cool I'll be around and can help
19:39:15 <fungi> i'm mostly just doing procrastinated talk prep at this point
19:39:23 <fungi> you know how it is ;)
19:39:24 <clarkb> anything else mm3 related?
19:39:27 <clarkb> heh yes
19:39:36 <fungi> i didn't have anything
19:40:11 <clarkb> #topic Upgrading Old Servers
19:40:18 <clarkb> Unfortunately I don't think there is anything new here
19:40:30 <clarkb> still waiting on updated patchsets for the mediawiki stack
19:40:38 <fungi> it would be nice to reactivate work on the mediawiki changes yeah
19:40:47 <fungi> the old server is getting overrun by crawlers
19:41:33 <clarkb> I did want to make a note for tonyb that I believe some of my comments were related to avoiding restarting the container every time we run ansible against the host. fungi just made changes to the meetpad container stuff to ensure we only restart things when there are actual updates, which may be useful to refer to
19:42:11 <clarkb> and some of our other services do similar if you grep around for that sort of ansible task
19:42:31 <clarkb> #topic Docker compose plugin with Podman
19:42:59 <fungi> note that the meetpad container restart changes are not yet approved, still under review
19:43:08 <clarkb> fungi: I thought we landed them?
19:43:31 <clarkb> and we confirmed that we didn't break anything in the no-new-containers case, but have yet to get an upstream release so haven't confirmed the new-containers case
19:43:41 <fungi> oh, never mind, they did merge
19:43:51 <fungi> i'm clearly scattered lately
19:43:57 <clarkb> this topic is related to the previous one: last week we basically said it would be worthwhile to pick a simple but illustrative service like paste, upgrade its server to noble, and then set up the docker compose plugin with podman behind it
19:44:02 <fungi> don't get old, the mind is the first thing to go
19:44:26 <clarkb> the motivation behind this is that it would allow us to continue to use docker compose, which should be a smaller migration than say podman compose, while also letting us host our images in quay and get speculative images in testing
19:44:53 <clarkb> anyway I don't think anyone has started on this, but if there is interest in doing this I think this would be a good project for someone interested in helping OpenDev
19:45:09 <clarkb> in particular you should be able to model everything in zuul using updates to existing ci jobs or new ci jobs
19:45:43 <clarkb> then you'd only need an existing root to boot the replacement noble server. If anyone is listening and is interested in this I'm happy to help, and it's likely I'll end up pushing it along myself if others don't take an interest
19:46:31 <clarkb> I think it should create good overall exposure to how opendev system-config configuration management with ansible works as well as our image building process and deployments of containers via docker compose (in conjunction with ansible coordinating things)
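The proposed setup, sketched under the assumption of stock Ubuntu Noble packages (names unverified against what our ansible would actually install):

    apt-get install podman docker-compose-v2
    systemctl enable --now podman.socket             # expose a docker-compatible API socket
    export DOCKER_HOST=unix:///run/podman/podman.sock
    docker compose up -d                             # same compose CLI, containers run under podman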
19:46:49 <clarkb> #topic Open Discussion
19:46:51 <clarkb> Anything else?
19:47:28 <fungi> nothing on my end
19:48:32 <clarkb> I'll give it until 19:50 and if nothing else comes up we can end a bit early today
19:50:35 <clarkb> and that is the promised time
19:50:37 <clarkb> thanks everyone
19:50:39 <clarkb> #endmeeting