19:00:11 <clarkb> #startmeeting infra
19:00:11 <opendevmeet> Meeting started Tue Oct 8 19:00:11 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:11 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:11 <opendevmeet> The meeting name has been set to 'infra'
19:00:21 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Y55UWOXI5A5Z25CV5MKKQ7XWODWHBQ73/ Our Agenda
19:00:25 <clarkb> #topic Announcements
19:00:31 <clarkb> #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1
19:00:40 <clarkb> still plenty of time to submit for SCaLE if you are interested
19:01:07 <clarkb> also stuff is still not decided yet but it's looking likely I'll be out for at least some of next week. I'll update if anything becomes concrete
19:01:22 <clarkb> if I am not around for our meeting I'll defer to others on whether or not we should have it
19:03:07 <clarkb> #topic Switch OpenDev Zuul Tenants to Ansible 9 by default
19:03:13 <fungi> i will probably not be around for the meeting next week, as i'll be attending openinfra days north america on tuesday and wednesday (flying/driving monday and thursday)
19:03:25 <clarkb> all the more reason to skip it if I'm not around
19:03:27 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/931320
19:03:41 <clarkb> this is something that we roughly agreed to merge nowish yesterday and haven't seen any objections to
19:04:07 <clarkb> I'll go ahead and approve it in the next few minutes if there are no last minute objections that we need to deal with
19:04:20 <clarkb> but tl;dr is opendev and zuul have been using ansible 9 by default for some time and it generally works
19:04:27 <clarkb> testing with devstack+tempest also works
19:05:03 <clarkb> so I expect it to generally be fine
19:05:13 <fungi>
lgtm!
19:05:56 <clarkb> and approved
19:06:02 <clarkb> #topic Rocky Package Mirror Creation
19:06:31 <clarkb> Still no change on this. We can probably drop it from the agenda until a change shows up. But I wanted to remind NeilHanlon to feel free to reach out with questions if there are any
19:06:57 <fungi> sounds good, thanks
19:07:09 <clarkb> #topic Rackspace's Flex Cloud
19:07:31 <fungi> ...is awesome, really
19:07:39 <clarkb> and there has been progress here
19:07:43 * fungi is not a paid spokesperson
19:08:14 <clarkb> corvus set up swift usage using application credentials. The main missing functionality is acls to restrict credential access to specific services or actions. And that may exist, we just need to do more testing
19:08:38 <clarkb> tl;dr is there are two approaches for this. The first is to create a dedicated user for swift and then use swift acl functionality to limit access
19:09:06 <clarkb> it isn't clear if the integration between old rax user creation and new rax flex swift enables this. They said try it and let them know how it goes
19:09:06 <fungi> which we're told is expected to work, but that we should let them know if not
19:09:12 <fungi> yeah, that
19:09:41 <clarkb> the other approach is to use keystonemiddleware based acls on application credentials which would live entirely within rax-flex and not involve the old auth/user stuff at all
19:10:09 <fungi> which seems preferable if it can be made to work, since it doesn't rely on proprietary apis
19:10:09 <clarkb> however for this to work rax-flex would need to configure keystonemiddleware in swift and we are not sure if they have done that.
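For anyone who wants to experiment, the two approaches discussed above might look roughly like the following. This is a sketch only: we have not confirmed what rax-flex actually supports, and all user, project, and container names here are placeholders.

```shell
# Approach A: a dedicated swift-only user, limited with classic swift
# container ACLs granting read/write on a single container.
swift post zuul-images \
  --read-acl  "someproject:swift-only-user" \
  --write-acl "someproject:swift-only-user"

# Approach B: an application credential restricted with keystone access
# rules. This only takes effect if swift is fronted by keystonemiddleware
# with access rule support, which is the part we still need to verify.
openstack application credential create zuul-image-upload \
  --access-rules '[{"service": "object-store",
                    "method": "PUT",
                    "path": "/v1/AUTH_*/zuul-images/*"}]'
```

Approach B is the one that avoids proprietary apis entirely, which is why it seems preferable if rax-flex's swift honors the rules.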
But again probably something we can test to see if it works the way we expect and provide feedback if not
19:10:37 <corvus> if anyone else wants to try these things, i would welcome that
19:10:50 <clarkb> other than that swift in rax-flex is working and corvus has been pushing images to it successfully
19:10:52 <fungi> and is therefore in theory more portable to other openstack providers too
19:10:53 <clarkb> corvus: ack
19:11:01 <clarkb> which takes us to our next topic
19:11:07 <clarkb> #topic Zuul-launcher image builds
19:11:39 <clarkb> we are successfully building images and uploading them to swift with a 72 hour expiration time. The next step is to deploy the zuul-launcher service to consume that job output and upload to clouds?
19:11:51 <clarkb> I can't remember if I have reviewed that change now but realize I really should if not
19:12:12 <clarkb> https://review.opendev.org/c/opendev/system-config/+/924188 it did merge yay
19:12:24 <clarkb> that didn't deploy a zuul-launcher yet though just built out the tooling to configure one?
19:12:29 <corvus> the 72 hour thing is almost done
19:12:41 <corvus> yes, someone should launch.py the node soon-ish
19:13:02 <corvus> then we need this patch in zuul: https://review.opendev.org/931208
19:13:04 <fungi> i can try to look at that tomorrow, emergencies and my other work depending
19:13:09 <corvus> minimal openstack driver
19:13:35 <fungi> start with zuul-launcher01.opendev.org presumably
19:13:42 <clarkb> #link https://review.opendev.org/931208 Openstack driver for the zuul-launcher subsystem
19:13:48 <corvus> once that's all in place, then just a bit more zuul configuration change to tell it to go to work
19:13:57 <clarkb> fungi: or zl01 to mimic how nodepool was done
19:14:00 <corvus> fungi: maybe "zl01" -- short like the others
19:14:15 <fungi> er, zl01 yes, i should have looked back at the change in gerrit ;)
19:14:32 <corvus> oh yeah, we would have named it there :)
19:14:37 <fungi> it already has an entry in hiera/common.yaml
19:14:41 <fungi> wfm
19:15:23 <fungi> hopefully tomorrow morning i can get it booted and push up inventory and dns additions for that
19:15:32 <clarkb> the other bit of work is we can also add jobs to opendev/zuul-jobs for different image builds
19:15:39 <clarkb> currently we've just got debian bullseye in there
19:15:43 <clarkb> fungi: thanks!
19:16:20 <clarkb> anything else on this topic?
19:16:33 <corvus> not from me, thx
19:16:43 <clarkb> #topic Updating ansible+ansible-lint versions in our repos
19:16:50 <clarkb> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last current open change related to this work
19:17:03 <clarkb> this change has been open for long enough that I have had to fix arm image building twice to get it to pass ci :)
19:17:21 <clarkb> now that the openstack release is behind us is anyone else interested in reviewing this change?
I think it would be good to land it while things are stableish
19:17:39 <clarkb> that will set us up with modern tooling that we hopefully don't have to modify much for a bit
19:18:37 <clarkb> if there isn't interest in additional review maybe we should go ahead and approve it once we're satisfied ansible 9 hasn't set anything on fire
19:19:20 <fungi> yeah, the longer that sits open the more likely it is to bitrot
19:19:47 <clarkb> ok I'll probably +A this afternoon if the ansible update seems stable enough
19:19:56 <clarkb> #topic OpenStack OpenAPI spec publishing
19:20:01 <clarkb> #link https://review.opendev.org/921934
19:20:14 <clarkb> I checked this change yesterday and haven't seen any review responses yet
19:20:28 <clarkb> fungi: how do you want to approach this? Bring it up at the PTG or just reach out directly now or something else maybe?
19:21:01 <fungi> i expect it will resurface in the sdk team's ptg sessions, since that's where it last arose
19:21:14 <fungi> i'm fine dropping it from the agenda until it comes up again
19:21:30 <fungi> i mainly wanted to make sure we had some eyes on the proposal
19:21:40 <clarkb> sounds good
19:21:45 <clarkb> #topic Backup Server Pruning
19:21:58 <clarkb> The smaller of our two backup servers needs pruning roughly every 2.5 weeks at this point
19:22:05 <fungi> this has been an increasingly frequent task, albeit an easy one
19:22:28 <clarkb> a quick investigation shows there are a number of old server backups sitting on that server
19:22:46 <clarkb> I didn't du the directories but I suspect they contain at least some backup data that we could clean up
19:22:54 <clarkb> ask01, ethercalc02, etherpad01, gitea01, lists, review-dev01, and review01
19:23:11 <clarkb> this is the list I identified. Either services that went away completely or services that had their primary server replaced with a new name
19:24:14 <clarkb> I think there are two options here to improve the disk usage and space situation on this server.
We can either A) remove those server specific backup dirs and the corresponding login details and clean things up in place or B) add a new volume to the server and update the mounts so that we're backing up into a clean location
19:24:16 <fungi> we made filesystem snapshots of each server as they were deleted, so i'm okay with removing their backups at this point if we've needed nothing from them all this time
19:24:23 <clarkb> then at some point later in the future we can delete the old volume
19:24:42 <fungi> but also, yes, rotating the volume would have a similar effect
19:25:03 <fungi> keep in mind something similar will arise with the wiki replacement
19:25:15 <clarkb> if we do a volume rotation we may need to ensure that the ansible run for backup servers runs before we try to do backups to reinstall the backup target locations
19:25:48 <clarkb> so either way I suspect there is a bit of intervention to do. Personally I think the in place cleanup seems simpler and easier to test (because we don't have to fuss with external things like volumes and coordinate mount moves)
19:26:26 <clarkb> anyone else have an opinion on what they feel is safest/best for our backups?
19:28:05 <clarkb> in that case I guess we can try some in place cleanup. Probably for a less important service first like ethercalc then decide if we should switch to the volume replacement if that doesn't go smoothly
19:28:51 <clarkb> Depending on how my week goes with other stuff I'll see if I have time to do that
19:29:00 <clarkb> (too much is in the air and I'm trying to avoid overcommitting...)
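For option A, the in-place cleanup could look something like the sketch below. The paths and account naming are assumptions (one repository and one unix account per backed-up server under a directory like /opt/backups); the real layout on the backup server should be checked before running anything.

```shell
# First, size up the retired servers' backup dirs (the du clarkb skipped).
# Path layout is an assumption, not verified against the actual server.
for host in ask01 ethercalc02 etherpad01 gitea01 lists review-dev01 review01; do
  du -sh "/opt/backups/borg-${host}" 2>/dev/null
done

# Then, per retired server (starting with the less important ethercalc),
# remove the backup data, the unix account that wrote it, and its key:
# sudo rm -rf /opt/backups/borg-ethercalc02
# sudo userdel -r borg-ethercalc02
```

The destructive commands are commented out on purpose; the point is just that the cleanup is a handful of per-server steps rather than a volume migration.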
19:29:15 <clarkb> #topic Mailman 3 Upgrade
19:29:21 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/930236 will upgrade our installation of mailman 3 to the latest versions
19:29:28 <fungi> speaking of overcommitting ;)
19:29:35 <clarkb> fungi has pushed a change to upgrade our mailman 3 installation to the latest version
19:29:53 <clarkb> the upgrade tasks themselves like db migration should be handled by the containers
19:30:11 <fungi> i'm happy to babysit this as it deploys and do some in-production tests in case there are any problems not caught in ci
19:30:11 <clarkb> so in theory we build new container images, deploy new configs, push it all and tell it to run and voila upgraded mailman
19:30:45 <clarkb> fungi: any new features we should be aware of?
19:30:57 <fungi> mailman is a lot of separate components, albeit aggregated into a smaller number of container images. the changelogs are sort of scattered
19:31:27 <fungi> i can try to put together a list of links to all the various changelogs if that's of interest
19:31:51 <clarkb> I don't have specific interest. More just wondering if there was anything they were advertising as new and improved
19:31:54 <fungi> though also some of this is upgrading stuff like alpine and django
19:32:22 <fungi> there were notable dmarc mitigation improvements for gmail
19:32:57 <clarkb> fungi: are they something you have to opt into or will we automatically get that in the upgrade?
I think generally we haven't had too much trouble with gmail, maybe because our volume isn't too high
19:33:23 <fungi> to work around the fact that gmail's dkim records are a do-as-i-say-not-as-i-do situation, so making assumptions about how gmail would handle messages from gmail addresses based on what's in dns turned out to be more recently problematic
19:33:53 <fungi> it's something we'd have to add specific domains to an exclusion list for
19:34:03 <fungi> the issue hasn't hit us yet afaik
19:34:29 <clarkb> the exclusion list says "this domain is backed by gmail treat it special"?
19:34:48 <fungi> there were some additional config options for list behaviors as well which got discussed on mailman-users, but i don't recall the specifics at the moment
19:35:39 <fungi> clarkb: an override list of "if addresses are at this domain then pretend that you need to do full address rewriting even if dns says it's unnecessary"
19:35:54 <clarkb> got it
19:36:09 <clarkb> fungi: any thoughts on timing for the upgrade?
19:36:17 <clarkb> the changes themselves look good to me at this point
19:36:28 <fungi> next week would be harder for me. this week or week-after-next are preferable
19:37:11 <clarkb> I expect to be around through at least thursday but probably also friday at this rate
19:37:17 <clarkb> but then ya next week not good for me either
19:38:10 <clarkb> should we just try and send it tomorrow? or is it better to wait for week after next since we're both traveling?
19:38:55 <fungi> tomorrow wfm
19:39:04 <clarkb> cool I'll be around and can help
19:39:15 <fungi> i'm mostly just doing procrastinated talk prep at this point
19:39:23 <fungi> you know how it is ;)
19:39:24 <clarkb> anything else mm3 related?
19:39:27 <clarkb> heh yes
19:39:36 <fungi> i didn't have anything
19:40:11 <clarkb> #topic Upgrading Old Servers
19:40:18 <clarkb> Unfortunately I don't think there is anything new here
19:40:30 <clarkb> still waiting on updated patchsets for the mediawiki stack
19:40:38 <fungi> it would be nice to reactivate work on the mediawiki changes yeah
19:40:47 <fungi> the old server is getting overrun by crawlers
19:41:33 <clarkb> I did want to make a note for tonyb that I believe some of my comments were related to avoiding restarting the container every time we run ansible against the host. fungi just made changes to the meetpad container stuff to ensure we only restart things when there are actual updates which may be useful to refer to
19:42:11 <clarkb> and some of our other services do similar if you grep around for that sort of ansible task
19:42:31 <clarkb> #topic Docker compose plugin with Podman
19:42:59 <fungi> note that the meetpad container restart changes are not yet approved, still under review
19:43:08 <clarkb> fungi: I thought we landed them?
19:43:31 <clarkb> and we confirmed that we didn't break in the no new container case, but have yet to get an upstream release so haven't confirmed the new containers case
19:43:41 <fungi> oh, never mind, they did merge
19:43:51 <fungi> i'm clearly scattered lately
19:43:57 <clarkb> this topic is related to the previous one in that last week we basically said it would be worthwhile to pick a simple but illustrative service like paste, upgrade its server to noble, then set up the docker compose plugin with podman behind it
19:44:02 <fungi> don't get old, the mind is the first thing to go
19:44:26 <clarkb> the motivation behind this is it would allow us to continue to use docker compose which should be a smaller migration than say podman compose but also host our images in quay and get speculative images in testing
19:44:53 <clarkb> anyway I don't think anyone has started on this, but if there is interest in doing this I think this would be a good project for someone interested in helping OpenDev
19:45:09 <clarkb> in particular you should be able to model everything in zuul using updates to existing ci jobs or new ci jobs
19:45:43 <clarkb> then you'd only need an existing root to boot the replacement noble server. If anyone is listening and is interested in this I'm happy to help and it's likely I'll end up pushing it along myself if others don't take an interest
19:46:31 <clarkb> I think it should create good overall exposure to how opendev system-config configuration management with ansible works as well as our image building process and deployments of containers via docker compose (in conjunction with ansible coordinating things)
19:46:49 <clarkb> #topic Open Discussion
19:46:51 <clarkb> Anything else?
19:47:28 <fungi> nothing on my end
19:48:32 <clarkb> I'll give it until 19:50 and if nothing else comes up we can end a bit early today
19:50:35 <clarkb> and that is the promised time
19:50:37 <clarkb> thanks everyone
19:50:39 <clarkb> #endmeeting
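A minimal sketch of the docker-compose-on-podman setup discussed in the meeting, assuming a Noble host running containers rootful as today; the package name, socket path, and the example compose file location are assumptions to verify, not settled decisions:

```shell
# Install podman plus the compose v2 plugin (docker-compose-v2 is the
# Ubuntu Noble package name for the plugin; adjust if we install it
# some other way).
sudo apt-get install -y podman docker-compose-v2

# Enable podman's Docker-compatible API socket (default rootful path
# is /run/podman/podman.sock).
sudo systemctl enable --now podman.socket

# Point the docker compose plugin at podman's socket instead of dockerd,
# so existing compose files keep working unchanged.
export DOCKER_HOST=unix:///run/podman/podman.sock

# e.g. for a hypothetical paste deployment directory:
# docker compose -f /etc/paste-docker/docker-compose.yaml up -d
```

The appeal is that only the container runtime changes: the compose files, the ansible that invokes docker compose, and the image references (moved to quay for speculative image testing) stay largely as they are.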