19:00:31 <clarkb> #startmeeting infra
19:00:31 <opendevmeet> Meeting started Tue Apr 29 19:00:31 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:31 <opendevmeet> The meeting name has been set to 'infra'
19:00:46 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/BQPBY4QVBYNC3VTOU3HXAUTESQSC7WKZ/ Our Agenda
19:00:52 <clarkb> #topic Announcements
19:01:15 <clarkb> Due to travel obligations neither fungi nor I can attend the meeting next week. For that reason we basically decided last week to cancel the next meeting on May 6
19:01:31 <clarkb> I'm going ahead and making that official now. The May 6 meeting will be cancelled. We'll be back the week after. See you there
19:01:43 <clarkb> Anything else to announce before we dive into today's content?
19:03:02 <clarkb> #topic Zuul-launcher image builds
19:03:25 <clarkb> mnasiadka volunteered to add some arm64 images to zuul launcher. That work is in progress and I think the latest change merged earlier today
19:04:04 <clarkb> So far you should be able to use noble arm64 nodes from zuul launcher and jammy is in progress
19:04:31 <clarkb> this is great to see as it proves we can continue to do multi arch images with zuul launcher and getting help from interested parties is a huge bonus
19:04:51 <clarkb> I think the disk io improvements osuosl made semi recently help keep the image build times reasonable too. A group effort all around
19:05:10 <clarkb> I think the next steps there are to continue to add images and then also start dogfooding them
19:05:21 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/948318/ is next up
19:05:33 <clarkb> Did anyone else have nodepool in zuul updates?
19:06:38 <clarkb> #topic Container hygiene tasks
19:06:53 <clarkb> Next up we've managed to update all the images to python3.12 except ircbot/limnoria
19:06:59 <clarkb> #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12
19:07:25 <clarkb> I'll continue to link the topic link rather than a specific change in case anyone else finds cases that were missed. If you push changes with the topic we'll get them automatically on the review list that way
19:07:38 <clarkb> otherwise things have gone smoothly. Even jeepyb on the gerrit image is running under python3.12 now
19:07:46 <clarkb> fungi: any specific thoughts on when we should update limnoria?
19:07:57 <fungi> later today?
19:08:14 <clarkb> I'll be around so that should work for me
19:08:14 <fungi> i'm happy to babysit that deploy
19:08:19 <clarkb> cool
19:08:39 <fungi> should be a quiet time so won't disrupt meetings
19:08:48 <clarkb> the other hygiene task is a change to do container image builds with docker hub names forced to resolve to ipv4 addrs via /etc/hosts
19:08:55 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/948247 Force docker hub access to happen over ipv4 for better rate limits.
19:09:17 <clarkb> we already force this on the system-config-run-* job side of things to fetch the images via ipv4 when doing service tests
19:09:31 <clarkb> but we also fetch images from docker hub during the image build process and we've hit a few errors there recently due to rate limits
19:09:48 <clarkb> I think this is a good halfway step while we slowly move to quay for everything
19:10:26 <fungi> yeah, i wasn't comfortable single-core approving that too quickly, but if nobody else wants to look it over there's no need to delay further
19:10:41 <fungi> the sooner it merges, the fewer rechecks we'll need (in theory)
19:10:46 <clarkb> it's also theoretically easy to remove later if we like
19:11:17 <clarkb> but ya maybe proceed with limnoria then that and see where we are afterwards in terms of reliability
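[Editor's note] The /etc/hosts approach discussed above can be sketched as follows. This is a minimal illustration, not the actual change: the hostname list and addresses here are assumptions, and the real job presumably templates /etc/hosts via Ansible rather than Python.

```python
# Sketch: pin Docker Hub hostnames to IPv4 addresses via /etc/hosts-style
# entries, so image pulls avoid the stricter ipv6 rate limits.
import socket

# Assumed hostnames involved in Docker Hub pulls (illustrative only).
DOCKER_HUB_HOSTS = ["registry-1.docker.io", "auth.docker.io"]

def resolve_ipv4(host):
    """Return the first IPv4 address for host (performs a DNS lookup)."""
    infos = socket.getaddrinfo(host, 443, socket.AF_INET, socket.SOCK_STREAM)
    return infos[0][4][0]

def hosts_lines(addresses):
    """Format a {hostname: ipv4} mapping as /etc/hosts lines."""
    return [f"{ip} {host}" for host, ip in addresses.items()]

# Example with a documentation-range placeholder address (no DNS needed):
print(hosts_lines({"registry-1.docker.io": "203.0.113.10"}))
```

Appending these lines to /etc/hosts forces the container runtime to connect over IPv4 without touching the rest of the system's resolver behavior.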
19:11:32 <clarkb> anything else container related? (I actually have a gerrit container item but I'll save that for the next topic since it is all about gerrit)
19:12:32 <clarkb> #topic Switching Gerrit to run on Review03
19:12:43 <clarkb> #link https://etherpad.opendev.org/p/i_vt63v18c3RKX2VyCs3 Notes on the migration plan
19:12:47 <clarkb> this is basically done at this point
19:13:02 <clarkb> we've been on the new server for just over a week now and the old server is shutdown
19:13:24 <clarkb> since then we've created new projects in gerrit and switched jeepyb over to python3.12 with a new container image (and updated the stop signal to make podman happy)
19:13:37 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/882900 Migrate gerrit images to quay
19:13:52 <clarkb> at this point it should be fine for us to move our gerrit image over to quay and we'll maintain speculative testing
19:14:10 <clarkb> however, when we do that we'll rebuild the image so should plan to restart gerrit again. Which means this week may not be the best for that as I'm traveling all next week
19:14:31 <clarkb> I don't think this is urgent though and can pick it up when I am back. I just wanted to make everyone aware of that as part of some of the last followup to this server move
19:14:42 <clarkb> The other big todo remaining is deleting the review02 instance
19:15:11 <clarkb> this server is boot from volume and has a data volume. I figure we can preserve both the boot from volume disk and the data volume but delete the instance (as long as the bfv volume doesn't automatically delete; I should check that)
19:15:27 <clarkb> any concerns with that cleanup? Do you think we should delete the bfv volume and/or the data volume too at this point?
19:15:49 <fungi> sounds fine to me
19:16:08 <fungi> we have backups
19:16:17 <fungi> though we could also snapshot them
19:16:32 <clarkb> ya and either way I figured we could delete the instance then delete the volumes later too
19:16:36 <clarkb> so more time to decide we don't need them
19:17:04 <fungi> in theory snapshots go to cheaper/slower storage so are less taxing than leaving r/w volumes
19:17:05 <clarkb> given that I'll try to do that tomorrow probably. Double check the bfv volume won't delete automatically, name that volume so we know what it is later, then delete the instance
19:17:18 <clarkb> and then as further followup we can snapshot then delete the volume
19:17:52 <clarkb> #link https://www.gerritcodereview.com/3.11.html
19:18:04 <clarkb> the last gerrit related item I have is for us to start thinking about 3.11 upgrades
19:18:45 <clarkb> Ideally we'd get that done before openstack is too deep into the current release cycle
19:18:47 <clarkb> I think that is possible
19:19:15 <fungi> yeah, seems doable
19:19:19 <clarkb> we already test the upgrade and it seems to work on the surface. The big change is that you have to manage refs/meta/config through reviews now by default. We'll want to investigate how that impacts manage-projects if at all given our existing acls
19:19:31 <clarkb> I suspect that our existing acls will mean nothing changes for us and only new installs are affected
19:20:19 <clarkb> but again thats a when I get back item for me. Happy for others to dive in too. I'd probably start with holding a node and doing some manual upgrades on the test setup
19:20:34 <clarkb> #topic Upgrading old servers
19:20:47 <clarkb> I don't have any updates since I did review03
19:20:51 <clarkb> did anyone else?
19:22:10 <fungi> nope, no word yet on refstack announcement plans
19:22:37 <clarkb> ack, thanks
19:22:53 <clarkb> #topic Working through our TODO list
19:22:58 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:23:10 <clarkb> just another friendly reminder that you can find a high level backlog here
19:23:21 <clarkb> happy to discuss any of these with volunteers for picking up work if more info is needed
19:24:03 <clarkb> #topic Rotating mailman 3 logs
19:24:07 <clarkb> fungi: any news on this one?
19:24:17 <fungi> pushing it now, thanks for the reminder
19:24:26 <fungi> #link https://review.opendev.org/c/opendev/system-config/+/948478 Rotate mailman-core logs [NEW]
19:24:37 <fungi> looks like it was simpler than i expected
19:25:08 <clarkb> fungi: do we want to hold a node and let it copytruncate at least once before landing that?
19:25:34 <clarkb> I seem to recall there was concern that copytruncate wasn't sufficient in all cases? Though it seems like it should be since the file handles never change for the running process
19:25:43 <clarkb> thank you for getting that up
19:25:46 <fungi> we can, though i'm not sure it would be an effective test
19:26:05 <clarkb> I think as long as the files have data in them it should exercise it?
19:26:47 <clarkb> and then we can look at lsof to see any obviously leaked fds or something
19:26:58 <fungi> yeah, i guess we just need to confirm that the services keep running and write new loglines
19:27:03 <clarkb> yup
19:27:14 <fungi> copytruncate should avoid the risk of leaking fds
19:27:35 <fungi> since mailman isn't opening a new file
19:27:36 <clarkb> ya that was my understanding too but I remember someone in the upstream issues saying it didn't work. Maybe they were simply mistaken
19:28:02 <clarkb> also worst case we probably remove the logrotate config and then restart mailman so not a huge deal if we land it and it doesn't work as expected
19:28:23 <fungi> right
19:28:37 <fungi> easy enough to recover from
19:29:08 <clarkb> ++ ok I'm fine with proceeding then. I also need to check if *.log works there but I can do that when I review it properly
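[Editor's note] The kind of logrotate stanza under discussion looks roughly like this. The path and retention values are assumptions for illustration, not the contents of the actual change:

```
# Hypothetical path; the real change targets mailman-core's log directory.
/var/lib/mailman/core-data/var/logs/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    # copytruncate copies the log then truncates it in place, so the
    # running mailman processes keep writing to the same open file
    # descriptors and never need to reopen their logs.
    copytruncate
}
```

The trade-off with copytruncate is a small window where lines written between the copy and the truncate can be lost, which is usually acceptable for service logs.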
19:29:19 <clarkb> #topic Renewing wiki's cert
19:29:32 <clarkb> The cert expires while I'm traveling so my goal is to replace it this week
19:29:44 <clarkb> in fact I think that is a good task to do while waiting on limnoria updates later today
19:29:55 <clarkb> so I'll try to get that moving today. If it isn't done by Friday, say something please
19:30:03 <clarkb> as I want to ensure it is done this week
19:30:36 <clarkb> I'll buy a one year cert. Then we may have to renew early next year as I think march 2026 ish is when max cert validity starts to fall below a year
19:30:53 <clarkb> but these certs are cheap enough that I'm fine with that. We lose like $1-$2 worth of cert validity
19:31:26 <fungi> sounds great. terrible but great
19:31:31 <fungi> greatly terrible
19:31:50 <clarkb> #topic Occasional Log Upload Failures to OVH
19:32:10 <clarkb> I've noticed a couple mornings in the last week or so that we had very infrequent POST_FAILURE results due to ovh log uploads
19:32:20 <clarkb> both times I noticed this the blip was short and not widespread
19:32:28 <clarkb> so we never disabled that backend
19:32:43 <clarkb> I want to make note of it so that others can keep an eye out and we can debug further if things get worse
19:33:10 <clarkb> and if it does get worse we can always remove that provider
19:33:22 <clarkb> #topic Open Discussion
19:33:39 <clarkb> tonyb started https://review.opendev.org/c/openstack/project-config/+/948033 to discuss hosting rdo in opendev
19:34:07 <clarkb> probably worth reading over if you haven't yet just to call out any concerns. I noted a few things but I think they are all solvable and not blockers
19:35:37 <clarkb> also sean-k-mooney discovered that paste doesn't have utf8 4 byte support in mariadb
19:35:56 <clarkb> we may need to update lodgeit to support 4 byte in the first place then do a db migration (possibly manually)
19:36:12 <clarkb> I think going 3 byte -> 4 byte is a straightforward migration as the db just needs to allocate more disk space and there is no data loss
19:36:31 <fungi> yes
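[Editor's note] The 3-byte to 4-byte migration mentioned above would look roughly like the DDL below. Table and collation names here are assumptions; lodgeit's actual schema may differ, and the application's connection charset must change too.

```sql
-- Hypothetical table name for illustration. Converting utf8 (3-byte,
-- aka utf8mb3) to utf8mb4 is widening-only: every valid utf8mb3 value
-- is also valid utf8mb4, so existing rows survive with no data loss.
ALTER TABLE pastes
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Note: the client connection charset must also be utf8mb4 (e.g. via
-- charset=utf8mb4 in the DB URI), or 4-byte characters such as emoji
-- will still be rejected at insert time.
```

Index length limits are the usual gotcha: VARCHAR columns inside unique indexes may exceed the per-index byte limit once each character can take 4 bytes, which is why a manual migration may be needed.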
19:36:49 <clarkb> Anything else?
19:37:14 <fungi> i left a comment on the mailman log rotation change with pointers about file globbing
19:37:34 <fungi> spoiler: should be fine
19:38:00 <fungi> unless ansible wants that string quoted or something
19:38:11 <clarkb> ya my main concern is the ansible role supporting it
19:38:12 <fungi> but tests will tell us
19:38:27 <clarkb> but I think ianw fixed the issues with it
19:38:42 <clarkb> previously we used the filename as the name for the logrotate config file but now we hash them iirc
19:38:44 <fungi> oh, i hadn't realized it created problems in the past
19:38:58 <clarkb> ya because we'd get /etc/logrotate.d/*.conf
19:39:14 <fungi> got it, that's why we have e.g. /etc/logrotate.d/854d0b.conf on our servers
19:39:15 <clarkb> but now its something like $(echo '*' | sha256sum).conf
19:39:24 <clarkb> I just wanted to double check before I +2'd
19:39:46 <fungi> cool, please do!
19:40:27 <clarkb> oh yup the role readme even says it may be a wildcard so I think its fine
19:40:31 <clarkb> I just have memories of when it wasn't
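[Editor's note] The hashed-filename scheme described above can be sketched like this. The exact hashing details in the role are an assumption here; this just illustrates why a glob like `*.log` can safely name a config file once it is hashed (compare the `854d0b.conf` example mentioned earlier).

```python
# Illustrative sketch: derive a stable logrotate config filename from the
# log path glob, so wildcard characters never appear in the filename.
import hashlib

def logrotate_conf_name(log_glob):
    """Return a /etc/logrotate.d/ path named by a short hash of the glob."""
    digest = hashlib.sha256(log_glob.encode()).hexdigest()[:6]  # length is an assumption
    return f"/etc/logrotate.d/{digest}.conf"

print(logrotate_conf_name("/var/lib/mailman/web-data/logs/*.log"))
```

Because the hash is deterministic, rerunning the role with the same glob updates the same config file instead of creating duplicates.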
19:41:03 <fungi> huh, actually we already have logrotate configuration for /var/lib/mailman/web-data/logs/*.log in /etc/logrotate.d/8e2e5c.conf
19:41:38 <clarkb> ya I think maybe we didn't realize how many log files mm3 has?
19:41:56 <fungi> yeah, fixing the change now
19:42:55 <clarkb> anything else for the meeting?
19:43:36 <clarkb> I think we can end a bit early otherwise. Thanks for your time helping keep opendev up and running everyone
19:44:39 <fungi> i've got nothing else
19:44:47 <fungi> log rotation change updated though
19:44:54 <clarkb> yup I'll look again
19:44:58 <clarkb> and then I'm going to eat lunch
19:45:28 <fungi> thanks clarkb! time to cook dinner, then can approve limnoria container change
19:45:37 <clarkb> #endmeeting