Tuesday, 2025-04-29

clarkbcouple of minutes to our weekly meeting18:58
clarkbI expect it may be lightly attended today. Which is fine. I'll run through the agenda and see what happens18:58
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Apr 29 19:00:31 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/BQPBY4QVBYNC3VTOU3HXAUTESQSC7WKZ/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkbDue to travel obligations neither fungi nor I can attend the meeting next week. For that reason we basically decided last week to cancel the next meeting on May 619:01
clarkbI'm going ahead and making that official now. The May 6 meeting will be cancelled. We'll be back the week after. See you there19:01
clarkbAnything else to announce before we dive into today's content?19:01
clarkb#topic Zuul-launcher image builds19:03
clarkbmnasiadka volunteered to add some arm64 images to zuul launcher. That work is in progress and i think the latest change merged earlier today19:03
clarkbSo far you should be able to use noble arm64 nodes from zuul launcher and jammy is in progress19:04
clarkbthis is great to see as it proves we can continue to do multi arch images with zuul launcher and getting help from interested parties is a huge bonus19:04
clarkbI think the disk io improvements osuosl made semi recently help keep the image build times reasonable too. A group effort all around19:04
clarkbI think the next steps there are to continue to add images and then also start dogfooding them19:05
clarkb#link https://review.opendev.org/c/opendev/zuul-providers/+/948318/ is next up19:05
clarkbDid anyone else have nodepool in zuul updates?19:05
clarkb#topic Container hygiene tasks19:06
clarkbNext up we've managed to update all the images to python3.12 except ircbot/limnoria19:06
clarkb#link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.1219:06
clarkbI'll continue to link the topic link rather than a specific change in case anyone else finds cases that were missed. If you push changes with the topic we'll get them automatically on the review list that way19:07
clarkbotherwise things have gone smoothly. Even jeepyb on the gerrit image is running under python3.12 now19:07
clarkbfungi: any specific thoughts on when we should update limnoria?19:07
fungilater today?19:07
clarkbI'll be around so that should work for me19:08
fungii'm happy to babysit that deploy19:08
clarkbcool19:08
fungishould be a quiet time so won't disrupt meetings19:08
clarkbthe other hygiene task is a change to do container image builds with docker hub names forced to resolve to ipv4 addrs via /etc/hosts19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/948247 Force docker hub access to happen over ipv4 for better rate limits.19:08
clarkbwe already force this on the system-config-run-* job side of things to fetch the images via ipv4 when doing service tests19:09
clarkbbut we also fetch images from docker hub during the image build process and we've hit a few errors there recently due to rate limits19:09
clarkbI think this is a good halfway step while we slowly move to quay for everything19:09
fungiyeah, i wasn't comfortable single-core approving that too quickly, but if nobody else wants to look it over there's no need to delay further19:10
fungithe sooner it merges, the fewer rechecks we'll need (in theory)19:10
clarkbit's also theoretically easy to remove later if we like19:10
clarkbbut ya maybe proceed with limnoria then that and see where we are afterwards in terms of reliability19:11
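A minimal sketch of the /etc/hosts pinning idea discussed above, assuming the goal is simply to map each registry hostname to one of its IPv4 (A record) addresses so that image pulls never go out over IPv6. The helper names and the injectable resolver parameter are illustrative assumptions and are not taken from the system-config change.

```python
import socket


def ipv4_addresses(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for hostname."""
    # family=AF_INET restricts the lookup to IPv4 results only.
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def hosts_lines(hostnames, resolver=ipv4_addresses):
    """Build /etc/hosts style lines using the first IPv4 address per name.

    Names that resolve to no IPv4 address are skipped rather than guessed.
    """
    lines = []
    for name in hostnames:
        addresses = resolver(name)
        if addresses:
            lines.append(f"{addresses[0]} {name}")
    return lines
```

Calling hosts_lines() with the Docker Hub hostnames a build actually contacts would give lines that could be appended to /etc/hosts before the build starts. Which hostnames need pinning is an assumption to verify against the real change. A pinned entry goes stale if the upstream addresses change, which fits the point made above that the workaround should be easy to remove later.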
clarkbanything else container related? (I actually have a gerrit container item but I'll save that for the next topic since it is all about gerrit)19:11
clarkb#topic Switching Gerrit to run on Review0319:12
clarkb#link https://etherpad.opendev.org/p/i_vt63v18c3RKX2VyCs3 Notes on the migration plan19:12
clarkbthis is basically done at this point19:12
clarkbwe've been on the new server for just over a week now and the old server is shutdown19:13
clarkbsince then we've created new projects in gerrit and switched jeepyb over to python3.12 with a new container image (and updated the stop signal to make podman happy)19:13
clarkb#link https://review.opendev.org/c/opendev/system-config/+/882900 Migrate gerrit images to quay19:13
clarkbat this point it should be fine for us to move our gerrit image over to quay and we'll maintain speculative testing19:13
clarkbhowever, when we do that we'll rebuild the image so should plan to restart gerrit again. Which means this week may not be the best for that as I'm traveling all next week19:14
clarkbI don't think this is urgent though and can pick it up when I am back. I just wanted to make everyone aware of that as part of some of the last followup to this server move19:14
clarkbThe other big todo remaining is deleting the review02 instance19:14
clarkbthis server is boot from volume and has a data volume. I figure we can preserve both the boot from volume disk and the data volume but delete the instance (as long as the bfv volume doesn't automatically delete; I should check that)19:15
clarkbany concerns with that cleanup? Do you think we should delete the bfv volume and/or the data volume too at this point?19:15
fungisounds fine to me19:15
fungiwe have backups19:16
fungithough we could also snapshot them19:16
clarkbya and either way I figured we could delete the instance then delete the volumes later too19:16
clarkbso more time to decide we don't need them19:16
fungiin theory snapshots go to cheaper/slower storage so are less taxing than leaving r/w volumes19:17
clarkbgiven that I'll try to do that tomorrow probably. Double check the bfv volume won't delete automatically, name that volume so we know what it is later, then delete the instance19:17
clarkband then as further followup we can snapshot then delete the volume19:17
clarkb#link https://www.gerritcodereview.com/3.11.html19:17
clarkbthe last gerrit related item I have is for us to start thinking about 3.11 upgrades19:18
clarkbIdeally we'd get that done before openstack is too deep into the current release cycle19:18
clarkbI think that is possible19:18
fungiyeah, seems doable19:19
clarkbwe already test the upgrade and it seems to work on the surface. The big change is that you have to manage refs/meta/config through reviews now by default. We'll want to investigate how that impacts manage-projects if at all given our existing acls19:19
clarkbI suspect that our existing acls will mean nothing changes for us and only new installs are affected19:19
clarkbbut again that's a when I get back item for me. Happy for others to dive in too. I'd probably start with holding a node and doing some manual upgrades on the test setup19:20
clarkb#topic Upgrading old servers19:20
clarkbI don't have any updates since I did review0319:20
clarkbdid anyone else?19:20
funginope, no word yet on refstack announcement plans19:22
clarkback, thanks19:22
clarkb#topic Working through our TODO list19:22
clarkb#link https://etherpad.opendev.org/p/opendev-january-2025-meetup19:22
clarkbjust another friendly reminder that you can find a high level backlog here19:23
clarkbhappy to discuss any of these with volunteers for picking up work if more info is needed19:23
clarkb#topic Rotating mailman 3 logs19:24
clarkbfungi: any news on this one?19:24
fungipushing it now, thanks for the reminder19:24
fungi#link https://review.opendev.org/c/opendev/system-config/+/948478 Rotate mailman-core logs [NEW]19:24
fungilooks like it was simpler than i expected19:24
clarkbfungi: do we want to hold a node and let it copytruncate at least once before landing that?19:25
clarkbI seem to recall there was concern that copytruncate wasn't sufficient in all cases? Though it seems like it should be since the file handles never change for the running process19:25
clarkbthank you for getting that up19:25
fungiwe can, though i'm not sure it would be an effective test19:25
clarkbI think as long as the files have data in them it should exercise it?19:26
clarkband then we can look at lsof to see any obviously leaked fds or something19:26
fungiyeah, i guess we just need to confirm that the services keep running and write new loglines19:26
clarkbyup19:27
fungicopytruncate should avoid the risk of leaking fds19:27
fungisince mailman isn't opening a new file19:27
clarkbya that was my understanding too but I remember someone in the upstream issues saying it didn't work. Maybe they were simply mistaken19:27
clarkbalso worst case we probably remove the logrotate config and then restart mailman so not a huge deal if we land it and it doesn't work as expected19:28
fungiright19:28
fungieasy enough to recover from19:28
clarkb++ ok I'm fine with proceeding then. I'm also needing to check if *.log works there but I can do that when I review it properly19:29
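A minimal sketch of why copytruncate should not leak file descriptors, assuming the logging process keeps its log open in append mode (an assumption about mailman that would need checking). The function names and file names are hypothetical. The log is copied aside and then truncated in place, so the writer keeps using the same open handle throughout.

```python
import os
import shutil
import tempfile


def copytruncate(log_path, rotated_path):
    """Copy the live log aside, then truncate the original in place."""
    shutil.copyfile(log_path, rotated_path)
    os.truncate(log_path, 0)


def demo():
    """Rotate a log while a writer keeps its original file handle open."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "mailman.log")
        rotated_path = log_path + ".1"
        # Append mode matters: every write lands at the current end of file,
        # so writes made after truncation start again at offset 0.
        with open(log_path, "a") as writer:
            writer.write("before rotation\n")
            writer.flush()
            copytruncate(log_path, rotated_path)
            writer.write("after rotation\n")
            writer.flush()
        with open(rotated_path) as rotated, open(log_path) as live:
            return rotated.read(), live.read()
```

Here demo() returns the rotated copy holding the earlier line and the live file holding only the later one. The known weakness of this approach is the window between the copy and the truncate: anything written in that gap is lost. That may be what the upstream reports mentioned above were describing, though this is only a guess.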
clarkb#topic Renewing wiki's cert19:29
clarkbThe cert expires while I'm traveling so my goal is to replace it this week19:29
clarkbin fact I think that is a good task to do while waiting on limnoria updates later today19:29
clarkbso I'll try to get that moving today. If it isn't done by friday say something place19:29
clarkb*say something please19:29
clarkbas I want to ensure it is done this week19:30
clarkbI'll buy a one year cert. Then we may have to renew early next year as I think march 2026 ish is when max cert validity starts to fall below a year19:30
clarkbbut these certs are cheap enough that I'm fine with that. We lose like $1-$2 worth of cert validity19:30
fungisounds great. terrible but great19:31
fungigreatly terrible19:31
clarkb#topic Occasional Log Upload Failures to OVH19:31
clarkbI've noticed a couple mornings the last week or so that we had very infrequent POST_FAILURE results due to ovh log uploads19:32
clarkbboth times I noticed this the blip was short and not widespread19:32
clarkbso we never disabled that backend19:32
clarkbI want to make note of it so that others can keep an eye out and we can debug further if things get worse19:32
clarkband if it does get worse we can always remove that provider19:33
clarkb#topic Open Discussion19:33
clarkbtonyb started https://review.opendev.org/c/openstack/project-config/+/948033 to discuss hosting rdo in opendev19:33
clarkbprobably worth reading over if you haven't yet just to call out any concerns. I noted a few things but I think they are all solvable and not blockers19:34
clarkbalso sean-k-mooney discovered that paste doesn't have utf8 4 byte support in mariadb19:35
clarkbwe may need to update lodgeit to support 4 byte in the first place then do a db migration (possibly manually)19:35
clarkbI think going 3 byte -> 4 byte is a straightforward migration as the db just needs to allocate more disk space and there is no data loss19:36
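A minimal sketch of the 3 byte versus 4 byte distinction, assuming the paste database uses MariaDB's legacy utf8 (utf8mb3) character set, which stores at most three bytes per character. Characters outside the Basic Multilingual Plane, such as most emoji, need four bytes in UTF-8 and require utf8mb4. The function names are hypothetical.

```python
def max_utf8_bytes(text):
    """Return the largest number of UTF-8 bytes used by any one character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_four_byte_support(text):
    """True when text contains a character that takes four UTF-8 bytes."""
    return max_utf8_bytes(text) == 4
```

A check like this could flag which pastes would be rejected or mangled under the current schema. Every character that fits in three bytes is encoded identically under utf8mb4, which is why the migration direction discussed above should not lose data.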
fungiyes19:36
clarkbAnything else?19:36
fungii left a comment on the mailman log rotation change with pointers about file globbing19:37
fungispoiler: should be fine19:37
fungiunless ansible wants that string quoted or something19:38
clarkbya my main concern is the ansible role supporting it19:38
fungibut tests will tell us19:38
clarkbbut I think ianw fixed the issues with it19:38
clarkbpreviously we used the filename as the name for the logrotate config file but now we hash them iirc19:38
fungioh, i hadn't realized it created problems in the past19:38
clarkbya because we'd get /etc/logrotate.d/*.conf19:38
fungigot it, that's why we have e.g. /etc/logrotate.d/854d0b.conf on our servers19:39
clarkbbut now it's something like $(echo '*' | sha256sum).conf19:39
clarkbI just wanted to double check before I +2'd19:39
fungicool, please do!19:39
clarkboh yup the role readme even says it may be a wildcard so I think it's fine19:40
clarkbI just have memories of when it wasn't19:40
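A minimal sketch of the hashed config name idea described above, assuming the name is a truncated SHA-256 hex digest of the log path or glob pattern. The six character truncation mirrors the shape of names like 854d0b.conf, but the exact input string and digest length used by the real ansible role are assumptions. The function name is hypothetical.

```python
import hashlib


def logrotate_conf_name(log_path, digest_chars=6):
    """Derive a glob-safe config file name from a log path or pattern."""
    digest = hashlib.sha256(log_path.encode("utf-8")).hexdigest()
    # Hex digits only, so wildcards and slashes never reach the file name.
    return f"{digest[:digest_chars]}.conf"
```

Because the name depends only on the pattern, a glob such as /var/lib/mailman/web-data/logs/*.log always maps to the same file under /etc/logrotate.d/, and repeated runs overwrite that one file instead of creating a literal *.conf.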
fungihuh, actually we already have logrotate configuration for /var/lib/mailman/web-data/logs/*.log in /etc/logrotate.d/8e2e5c.conf19:41
clarkbya I think maybe we didn't realize how many log files mm3 has19:41
clarkb?19:41
fungiyeah, fixing the change now19:41
clarkbanything else for the meeting?19:42
clarkbI think we can end a bit early otherwise. Thanks for your time helping keep opendev up and running everyone19:43
fungii've got nothing else19:44
fungilog rotation change updated though19:44
clarkbyup I'll look again19:44
clarkband then I'm going to eat lunch19:44
fungithanks clarkb! time to cook dinner, then can approve limnoria container change19:45
clarkb#endmeeting19:45
opendevmeetMeeting ended Tue Apr 29 19:45:37 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:45
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-29-19.00.html19:45
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-29-19.00.txt19:45
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-29-19.00.log.html19:45
