19:00:31 <clarkb> #startmeeting infra
19:00:31 <opendevmeet> Meeting started Tue Apr 29 19:00:31 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:31 <opendevmeet> The meeting name has been set to 'infra'
19:00:46 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/BQPBY4QVBYNC3VTOU3HXAUTESQSC7WKZ/ Our Agenda
19:00:52 <clarkb> #topic Announcements
19:01:15 <clarkb> Due to travel obligations neither fungi nor I can attend the meeting next week. For that reason we basically decided last week to cancel the next meeting on May 6
19:01:31 <clarkb> I'm going ahead and making that official now. The May 6 meeting will be cancelled. We'll be back the week after. See you there
19:01:43 <clarkb> Anything else to announce before we dive into today's content?
19:03:02 <clarkb> #topic Zuul-launcher image builds
19:03:25 <clarkb> mnasiadka volunteered to add some arm64 images to zuul launcher. That work is in progress and I think the latest change merged earlier today
19:04:04 <clarkb> So far you should be able to use noble arm64 nodes from zuul launcher, and jammy is in progress
19:04:31 <clarkb> this is great to see as it proves we can continue to do multi-arch images with zuul launcher, and getting help from interested parties is a huge bonus
19:04:51 <clarkb> I think the disk IO improvements osuosl made semi recently help keep the image build times reasonable too. A group effort all around
19:05:10 <clarkb> I think the next steps there are to continue to add images and then also start dogfooding them
19:05:21 <clarkb> #link https://review.opendev.org/c/opendev/zuul-providers/+/948318/ is next up
19:05:33 <clarkb> Did anyone else have nodepool in zuul updates?
19:06:38 <clarkb> #topic Container hygiene tasks
19:06:53 <clarkb> Next up, we've managed to update all the images to python3.12 except ircbot/limnoria
19:06:59 <clarkb> #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12
19:07:25 <clarkb> I'll continue to link the topic rather than a specific change in case anyone else finds cases that were missed. If you push changes with the topic we'll get them automatically on the review list that way
19:07:38 <clarkb> otherwise things have gone smoothly. Even jeepyb on the gerrit image is running under python3.12 now
19:07:46 <clarkb> fungi: any specific thoughts on when we should update limnoria?
19:07:57 <fungi> later today?
19:08:14 <clarkb> I'll be around so that should work for me
19:08:14 <fungi> i'm happy to babysit that deploy
19:08:19 <clarkb> cool
19:08:39 <fungi> should be a quiet time so won't disrupt meetings
19:08:48 <clarkb> the other hygiene task is a change to do container image builds with docker hub names forced to resolve to ipv4 addrs via /etc/hosts
19:08:55 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/948247 Force docker hub access to happen over ipv4 for better rate limits.
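[Editor's note: the /etc/hosts approach discussed here can be sketched roughly as below. This is a hedged illustration, not the actual content of change 948247: the hostname list is an assumption, and the 203.0.113.x addresses are documentation-range placeholders for whatever the A records resolve to at deploy time.]

```shell
# Emit /etc/hosts-style lines that pin Docker Hub names to IPv4 addresses,
# so image fetches never go out over the IPv6 endpoints where rate limits
# bite harder. Addresses are TEST-NET-3 placeholders, not real ones.
docker_hosts() {
    printf '%s\n' \
        '203.0.113.10 registry-1.docker.io' \
        '203.0.113.11 auth.docker.io'
}
docker_hosts
```

On a real build node the output would be appended to /etc/hosts, and the addresses would come from resolving current A records (e.g. `getent ahostsv4 registry-1.docker.io`) rather than being hardcoded.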
19:09:17 <clarkb> we already force this on the system-config-run-* job side of things to fetch the images via ipv4 when doing service tests
19:09:31 <clarkb> but we also fetch images from docker hub during the image build process and we've hit a few errors there recently due to rate limits
19:09:48 <clarkb> I think this is a good halfway step while we slowly move to quay for everything
19:10:26 <fungi> yeah, i wasn't comfortable single-core approving that too quickly, but if nobody else wants to look it over there's no need to delay further
19:10:41 <fungi> the sooner it merges, the fewer rechecks we'll need (in theory)
19:10:46 <clarkb> it's also theoretically easy to remove later if we like
19:11:17 <clarkb> but ya maybe proceed with limnoria then that and see where we are afterwards in terms of reliability
19:11:32 <clarkb> anything else container related? (I actually have a gerrit container item but I'll save that for the next topic since it is all about gerrit)
19:12:32 <clarkb> #topic Switching Gerrit to run on Review03
19:12:43 <clarkb> #link https://etherpad.opendev.org/p/i_vt63v18c3RKX2VyCs3 Notes on the migration plan
19:12:47 <clarkb> this is basically done at this point
19:13:02 <clarkb> we've been on the new server for just over a week now and the old server is shut down
19:13:24 <clarkb> since then we've created new projects in gerrit and switched jeepyb over to python3.12 with a new container image (and updated the stop signal to make podman happy)
19:13:37 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/882900 Migrate gerrit images to quay
19:13:52 <clarkb> at this point it should be fine for us to move our gerrit image over to quay and we'll maintain speculative testing
19:14:10 <clarkb> however, when we do that we'll rebuild the image so should plan to restart gerrit again. Which means this week may not be the best for that as I'm traveling all next week
19:14:31 <clarkb> I don't think this is urgent though and we can pick it up when I am back. I just wanted to make everyone aware of it as part of some of the last followup to this server move
19:14:42 <clarkb> The other big todo remaining is deleting the review02 instance
19:15:11 <clarkb> this server is boot from volume and has a data volume. I figure we can preserve both the boot-from-volume disk and the data volume but delete the instance (as long as the bfv volume doesn't automatically delete; I should check that)
19:15:27 <clarkb> any concerns with that cleanup? Do you think we should delete the bfv volume and/or the data volume too at this point?
19:15:49 <fungi> sounds fine to me
19:16:08 <fungi> we have backups
19:16:17 <fungi> though we could also snapshot them
19:16:32 <clarkb> ya and either way I figured we could delete the instance then delete the volumes later too
19:16:36 <clarkb> so more time to decide we don't need them
19:17:04 <fungi> in theory snapshots go to cheaper/slower storage so are less taxing than leaving r/w volumes
19:17:05 <clarkb> given that I'll try to do that tomorrow probably. Double check the bfv volume won't delete automatically, name that volume so we know what it is later, then delete the instance
19:17:18 <clarkb> and then as further followup we can snapshot then delete the volume
19:17:52 <clarkb> #link https://www.gerritcodereview.com/3.11.html
19:18:04 <clarkb> the last gerrit related item I have is for us to start thinking about 3.11 upgrades
19:18:45 <clarkb> Ideally we'd get that done before openstack is too deep into the current release cycle
19:18:47 <clarkb> I think that is possible
19:19:15 <fungi> yeah, seems doable
19:19:19 <clarkb> we already test the upgrade and it seems to work on the surface. The big change is that you have to manage refs/meta/config through reviews now by default. We'll want to investigate how that impacts manage-projects, if at all, given our existing acls
19:19:31 <clarkb> I suspect that our existing acls will mean nothing changes for us and only new installs are affected
19:20:19 <clarkb> but again that's a when-I-get-back item for me. Happy for others to dive in too. I'd probably start with holding a node and doing some manual upgrades on the test setup
19:20:34 <clarkb> #topic Upgrading old servers
19:20:47 <clarkb> I don't have any updates since I did review03
19:20:51 <clarkb> did anyone else?
19:22:10 <fungi> nope, no word yet on refstack announcement plans
19:22:37 <clarkb> ack, thanks
19:22:53 <clarkb> #topic Working through our TODO list
19:22:58 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:23:10 <clarkb> just another friendly reminder that you can find a high level backlog here
19:23:21 <clarkb> happy to discuss any of these with volunteers for picking up work if more info is needed
19:24:03 <clarkb> #topic Rotating mailman 3 logs
19:24:07 <clarkb> fungi: any news on this one?
19:24:17 <fungi> pushing it now, thanks for the reminder
19:24:26 <fungi> #link https://review.opendev.org/c/opendev/system-config/+/948478 Rotate mailman-core logs [NEW]
19:24:37 <fungi> looks like it was simpler than i expected
19:25:08 <clarkb> fungi: do we want to hold a node and let it copytruncate at least once before landing that?
19:25:34 <clarkb> I seem to recall there was concern that copytruncate wasn't sufficient in all cases? Though it seems like it should be since the file handles never change for the running process
19:25:43 <clarkb> thank you for getting that up
19:25:46 <fungi> we can, though i'm not sure it would be an effective test
19:26:05 <clarkb> I think as long as the files have data in them it should exercise it?
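[Editor's note: the rotation change under discussion boils down to a logrotate stanza shaped roughly like the following. This is a hedged sketch; the path, cadence, and retention count are assumptions, not copied from change 948478.]

```
/var/lib/mailman/core-data/var/logs/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
```

With copytruncate, logrotate copies the live file and then truncates it in place, so mailman keeps writing to the same open file handle and never needs a restart or signal, which is exactly the property the conversation below is trying to confirm.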
19:26:47 <clarkb> and then we can look at lsof to see any obviously leaked fds or something
19:26:58 <fungi> yeah, i guess we just need to confirm that the services keep running and write new loglines
19:27:03 <clarkb> yup
19:27:14 <fungi> copytruncate should avoid the risk of leaking fds
19:27:35 <fungi> since mailman isn't opening a new file
19:27:36 <clarkb> ya that was my understanding too but I remember someone in the upstream issues saying it didn't work. Maybe they were simply mistaken
19:28:02 <clarkb> also worst case we probably remove the logrotate config and then restart mailman, so not a huge deal if we land it and it doesn't work as expected
19:28:23 <fungi> right
19:28:37 <fungi> easy enough to recover from
19:29:08 <clarkb> ++ ok I'm fine with proceeding then. I'm also needing to check if *.log works there but I can do that when I review it properly
19:29:19 <clarkb> #topic Renewing wiki's cert
19:29:32 <clarkb> The cert expires while I'm traveling so my goal is to replace it this week
19:29:44 <clarkb> in fact I think that is a good task to do while waiting on limnoria updates later today
19:29:55 <clarkb> so I'll try to get that moving today. If it isn't done by friday say something place
19:29:59 <clarkb> *say something please
19:30:03 <clarkb> as I want to ensure it is done this week
19:30:36 <clarkb> I'll buy a one year cert. Then we may have to renew early next year as I think march 2026 ish is when max cert validity starts to fall below a year
19:30:53 <clarkb> but these certs are cheap enough that I'm fine with that. We lose like $1-$2 worth of cert validity
19:31:26 <fungi> sounds great. terrible but great
19:31:31 <fungi> greatly terrible
19:31:50 <clarkb> #topic Occasional Log Upload Failures to OVH
19:32:10 <clarkb> I've noticed a couple mornings the last week or so that we had very infrequent POST_FAILURE results due to ovh log uploads
19:32:20 <clarkb> both times I noticed this the blip was short and not widespread
19:32:28 <clarkb> so we never disabled that backend
19:32:43 <clarkb> I want to make note of it so that others can keep an eye out and we can debug further if things get worse
19:33:10 <clarkb> and if it does get worse we can always remove that provider
19:33:22 <clarkb> #topic Open Discussion
19:33:39 <clarkb> tonyb started https://review.opendev.org/c/openstack/project-config/+/948033 to discuss hosting rdo in opendev
19:34:07 <clarkb> probably worth reading over if you haven't yet just to call out any concerns. I noted a few things but I think they are all solvable and not blockers
19:35:37 <clarkb> also sean-k-mooney discovered that paste doesn't have utf8 4 byte support in mariadb
19:35:56 <clarkb> we may need to update lodgeit to support 4 byte in the first place then do a db migration (possibly manually)
19:36:12 <clarkb> I think going 3 byte -> 4 byte is a straightforward migration as the db just needs to allocate more disk space and there is no data loss
19:36:31 <fungi> yes
19:36:49 <clarkb> Anything else?
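[Editor's note: the 3-byte vs 4-byte point above is easy to see from the UTF-8 encodings themselves. MariaDB's legacy utf8 charset caps characters at 3 bytes, so anything outside the Basic Multilingual Plane, like emoji, needs utf8mb4. A quick shell illustration follows; the actual ALTER TABLE conversion is deliberately omitted since lodgeit's table names would be guesses here.]

```shell
# U+2603 SNOWMAN encodes to 3 UTF-8 bytes: fits MariaDB's legacy utf8
printf '☃' | wc -c    # 3
# U+1F600 GRINNING FACE encodes to 4 UTF-8 bytes: requires utf8mb4
printf '😀' | wc -c   # 4
```

This also matches the "no data loss" point in the discussion: every valid 3-byte utf8 value is already valid utf8mb4, so the conversion only widens what the columns can store.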
19:37:14 <fungi> i left a comment on the mailman log rotation change with pointers about file globbing
19:37:34 <fungi> spoiler: should be fine
19:38:00 <fungi> unless ansible wants that string quoted or something
19:38:11 <clarkb> ya my main concern is the ansible role supporting it
19:38:12 <fungi> but tests will tell us
19:38:27 <clarkb> but I think ianw fixed the issues with it
19:38:42 <clarkb> previously we used the filename as the name for the logrotate config file but now we hash them iirc
19:38:44 <fungi> oh, i hadn't realized it created problems in the past
19:38:58 <clarkb> ya because we'd get /etc/logrotate.d/*.conf
19:39:14 <fungi> got it, that's why we have e.g. /etc/logrotate.d/854d0b.conf on our servers
19:39:15 <clarkb> but now it's something like $(echo '*' | sha256sum).conf
19:39:24 <clarkb> I just wanted to double check before I +2'd
19:39:46 <fungi> cool, please do!
19:40:27 <clarkb> oh yup the role readme even says it may be a wildcard so I think it's fine
19:40:31 <clarkb> I just have memories of when it wasn't
19:41:03 <fungi> huh, actually we already have logrotate configuration for /var/lib/mailman/web-data/logs/*.log in /etc/logrotate.d/8e2e5c.conf
19:41:38 <clarkb> ya I think maybe we didn't realize how many log files mm3 has
19:41:40 <clarkb> ?
19:41:56 <fungi> yeah, fixing the change now
19:42:55 <clarkb> anything else for the meeting?
19:43:36 <clarkb> I think we can end a bit early otherwise. Thanks for your time helping keep opendev up and running everyone
19:44:39 <fungi> i've got nothing else
19:44:47 <fungi> log rotation change updated though
19:44:54 <clarkb> yup I'll look again
19:44:58 <clarkb> and then I'm going to eat lunch
19:45:28 <fungi> thanks clarkb! time to cook dinner, then can approve limnoria container change
19:45:37 <clarkb> #endmeeting
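[Editor's note: the hashed-filename scheme discussed above can be sketched as below. This shows only the shape of the idea; whether the real Ansible role hashes the full path, the bare pattern, or something else is not confirmed here, so the resulting hex prefix won't necessarily match the 8e2e5c.conf seen on the servers.]

```shell
# Derive a logrotate config filename from the rotated path pattern, so a
# glob like "*.log" never ends up literally in a filename under
# /etc/logrotate.d (which is what the old filename-based scheme produced).
pattern='/var/lib/mailman/web-data/logs/*.log'
name=$(printf '%s' "$pattern" | sha256sum | cut -c1-6)
echo "/etc/logrotate.d/${name}.conf"
```

The hash is stable for a given pattern, so rerunning the role is idempotent while keeping shell metacharacters out of the config directory.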