19:00:47 <clarkb> #startmeeting infra
19:00:47 <opendevmeet> Meeting started Tue Mar 18 19:00:47 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:47 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:47 <opendevmeet> The meeting name has been set to 'infra'
19:00:55 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YM5MZF2IHG6P4FFTRVMNVLJHYOIBVFUD/ Our Agenda
19:02:30 <clarkb> #topic Announcements
19:02:34 <clarkb> Anything to announce?
19:02:58 <clarkb> I don't have any concrete plans but next week the kids are home from school for a break and I may try to do an easy day or two and get out of the house with them
19:05:04 <tonyb> FWIW, next week 25th I head back to AU
19:05:42 <clarkb> ack
19:05:44 <clarkb> #topic Zuul-launcher image builds
19:05:51 <clarkb> I'm not aware of any new updates on this subject
19:06:07 <clarkb> though if corvus is around I'd be curious to know if the quota handling has improved the reliability
19:06:35 <fungi> also i've still got a change up to add flex dfw3 images/quota
19:06:49 <clarkb> oh do you have a link to that?
19:06:53 <fungi> #link https://review.opendev.org/943104 Add the DFW3 region for Rackspace Flex
19:07:00 <clarkb> thanks!
19:07:17 <clarkb> I'll review that after the meeting
19:08:01 <clarkb> #topic Updating Flavors in OVH
19:08:11 <clarkb> The other related item was the OVH flavor update process
19:08:15 <clarkb> #link https://etherpad.opendev.org/p/ovh-flavors
19:08:41 <clarkb> we proposed starting that yesterday but they came back and said that timing didn't work for them. They will be in touch with timing that does work at some point but I haven't seen any proposed dates from their end
19:09:04 <clarkb> we can probably pull this off of the agenda next week. But I wanted to make sure we called out it wasn't happening and we're waiting on timing from ovh
19:09:45 <clarkb> #topic Container hygiene tasks
19:09:51 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944799 drop python 3.10 image builds
19:10:04 <clarkb> according to codesearch we don't have anything using python 3.10 images anymore
19:10:15 <clarkb> I think this is a safe change to land at any time
19:10:24 <clarkb> next up is rebuilding the images we do use
19:10:30 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944789 rebuild python 3.11 and 3.12 base images
19:10:45 <clarkb> unfortunately this change has been in uWSGI purgatory
19:11:02 <clarkb> it seems that compiling current uwsgi on top of current bookworm on aarch64 segfaults
19:11:25 <clarkb> uwsgi is only barely maintained anymore so I've begun looking at alternatives (which is the next topic)
19:11:55 <clarkb> I don't think we can safely update python-builder and python-base without also updating uwsgi-base as the wheel builds for things will see mismatched binary packages and we could break things relying on uwsgi today
19:12:25 <clarkb> so I think our main options here are either to drop uwsgi, fix uwsgi builds somehow, or retry until we get lucky? (the failures don't seem to be 100% consistent)
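For readers unfamiliar with the image stack, the coupling described above can be sketched roughly as follows (image names follow the opendevorg naming used in this discussion; the exact tag is an assumption, not taken from the repo):

    # python-builder compiles wheels (including the uwsgi wheel),
    # python-base provides the runtime layer, and uwsgi-base adds the
    # prebuilt uwsgi wheel on top of python-base, so the three images
    # have to be rebuilt together to keep binary packages matched
    FROM docker.io/opendevorg/uwsgi-base:3.11-bookworm
    # a service image such as lodgeit then inherits whichever python
    # and uwsgi builds its base images were produced with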
19:12:44 <clarkb> maybe it is worth trying to enqueue to the gate a few times to see if we can get it to go through
19:13:09 <clarkb> then finally I've got changes up to move images from python 3.11 to 3.12
19:13:11 <clarkb> #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12
19:13:24 <frickler> I didn't look at details yet, but could we use distro pkgs like devstack does?
19:13:42 <clarkb> while those changes don't strictly depend on the new image updates it might be nice to use these rebuilds to also get new base images so we address two things with one set of service restarts
19:14:06 <clarkb> frickler: the container images are based on debian bookworm. If there is a bookworm uwsgi package then yes I think that is possible
19:14:30 <clarkb> we would add uwsgi to lodgeit's bindep requirements and then switch the base container image from uwsgi-base to python-base
19:14:31 <fungi> #link https://packages.debian.org/bookworm/uwsgi
19:14:58 <clarkb> this will likely downgrade uwsgi for us but we aren't doing anything too crazy with it so that's probably fine (also uwsgi itself isn't super stateful)
19:15:08 <frickler> I think that would likely only work with distro python. so it might work for py3.11 then and we'd have to use noble for py3.12
19:15:31 <clarkb> actually it wouldn't work for either if that is the case
19:15:39 <clarkb> since both 3.11 and 3.12 are custom builds on top of debian
19:16:18 <clarkb> another option would be to try and pin uwsgi versions to something older than latest to see if we can get that to build more reliably
19:16:30 <frickler> and we do need custom python builds?
19:16:55 <clarkb> it dramatically simplifies supporting a range of pythons
19:17:13 <clarkb> since we can use the upstream python container images and not need to have different platforms for different python versions or even wait for new distro releases
19:17:51 <clarkb> I don't think uwsgi is important enough to warrant dramatically changing how we run python services
19:18:02 <clarkb> it is basically unmaintained and can't be built on modern platforms reliably
19:18:29 <clarkb> if we can get by with simple fixes to uwsgi that is fine. But I would rather invest my time in replacing uwsgi than completely rearchitecting our container setup
19:18:47 <frickler> ack
19:19:21 <clarkb> which is maybe a good segue into the next topic. Reviews on the above changes are very much appreciated too
19:19:26 <clarkb> #topic Dropping uWSGI
19:19:32 * corvus arrives late
19:20:19 <clarkb> there are a number of factors that have me thinking uWSGI is no longer a viable option for python web servers. First is that the project is minimally maintained and bug fixes are slow and build errors are common. Next is that the python world is becoming more async which often means ASGI instead of WSGI
19:20:46 <clarkb> rather than try and keep the old barely working thing going I suspect a better use of our time is looking forward and picking a modern tool that is maintained and can support asgi and wsgi
19:21:34 <clarkb> in my investigation I discovered granian which seems to fit these criteria. However, frickler and others have pointed out granian is maintained largely by one person (there are a number of contributors but the vast majority of work is one person) and there is no distro packaging for it
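As context for the granian option, a minimal sketch of what serving a WSGI app with it looks like at runtime (the module path and port here are placeholders, not taken from the proposed lodgeit change):

    # granian speaks both WSGI and ASGI; --interface selects which,
    # and the positional argument is the module:callable to serve
    granian --interface wsgi --host 0.0.0.0 --port 8000 lodgeit.application:application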
19:21:51 <clarkb> I think that the lack of distro packaging is less meaningful for us as we install from pypi and they publish x86 and arm wheels
19:22:12 <clarkb> #link https://review.opendev.org/c/opendev/lodgeit/+/944805 lodgeit granian container update
19:22:23 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944806 lodgeit system-config deployment update
19:22:33 <corvus> we only use uwsgi for lodgeit, right?
19:22:58 <clarkb> corvus: the uwsgi-base container image is lodgeit only. mailman3 also uses uwsgi but not via the base image
19:23:19 <clarkb> anyway I'm open to other alternatives we think might be more appropriate, but I do think we should try and stop using uwsgi
19:24:06 <corvus> my feeling is -- for something like lodgeit -- whatever works and requires the least time. if granian works, that's good enough for me; i'd be in favor of adding a granian image and declaring uwsgi unmaintained by us
19:24:14 <fungi> in the mm3 case uwsgi is part of upstream images we're reusing (well, rebuilding), right?
19:24:23 <clarkb> fungi: yes
19:24:52 <fungi> in which case we can probably wait for or help with upstream's decisions on whether to continue relying on it
19:25:21 <clarkb> ya I think being in sync with them and possibly starting a conversation with them about alternatives is a good thing and also less urgent
19:25:58 <clarkb> also I think our needs differ from openstack's needs. While it might be more important for openstack to use distro packaged wsgi servers that isn't as important for us
19:26:02 <frickler> can we simply re-use the mm3 container and install lodgeit in there? or is that setup too different?
19:26:36 <clarkb> I'm not sure. But it is fairly different (uses alpine instead of debian, and has django installed)
19:26:40 <fungi> https://github.com/maxking/docker-mailman is the mailman containers. they're arranged a bit differently from how we usually do our container images
19:26:55 <tonyb> I think that granian is a good step forward and the single maintainer is better than no maintainer. The switch to granian looks pretty nice
19:27:35 <clarkb> gunicorn was also mentioned in discussion in the tc channel. But gunicorn doesn't do asgi. uvicorn was mentioned but that is a separate code base aiui and doesn't do wsgi
19:28:12 <clarkb> anyway I noted on the change why I thought granian was a good option and I think being able to switch to gunicorn or similar later means this is fairly low risk
19:28:27 <tonyb> ++
19:29:02 <frickler> iiuc our uwsgi build issues were on aarch64, too? https://github.com/maxking/docker-mailman/blob/main/web/Dockerfile#L20-L21
19:29:17 <clarkb> frickler: aha that's a good find
19:29:26 <clarkb> I can pin our uwsgi image to that version and see if it works
19:29:35 <clarkb> and then we can migrate away from uwsgi on a less urgent schedule
19:29:48 <fungi> corvus summed up my position quite nicely, if it works with minimal effort, then great, lodgeit hopefully shouldn't eat a ton of our admin time
19:30:32 <clarkb> ya so I think the plan is to try that workaround from mm3, if that works land it. Concurrently we can make a plan to migrate lodgeit (as written the changes I have require coordination between container and system-config so we may have an outage)
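The mm3-style workaround amounts to pinning uwsgi to a known-good release rather than building whatever pip resolves as latest. A minimal sketch of what that could look like in the uwsgi-base Dockerfile (the version shown is a placeholder; the real pin would be copied from docker-mailman's web/Dockerfile):

    # pin uwsgi instead of letting pip pick the newest release, whose
    # compile on bookworm/aarch64 is what currently segfaults
    ARG UWSGI_VERSION=2.0.28
    RUN pip install --no-cache-dir "uwsgi==${UWSGI_VERSION}"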
19:30:44 <frickler> ack, I wasn't aware that this is "only" for lodgeit, I'd withdraw my -1 then
19:30:50 <clarkb> if the workaround doesn't work we can speed up the granian switch. If the workaround works then we can be a bit more cautious and double check for alternatives first
19:30:58 <clarkb> frickler: oh cool
19:31:04 <clarkb> that gives me a path forward, I'll work on that
19:31:11 <clarkb> thanks for listening to me on this subject
19:31:12 <fungi> and yeah, i did a search for uwsgi in the open/closed issues and pull requests for the mailman containers and saw they'd been doing work recently to get it working for aarch64
19:31:25 <fungi> so potentially related
19:32:29 <frickler> ah, yes, git blame shows this https://github.com/maxking/docker-mailman/pull/743, so rather recent
19:33:19 <clarkb> #topic Upgrading old servers
19:33:33 <clarkb> I should probably just merge this and the sprint idea topics together
19:33:56 <tonyb> clarkb: Yeah it seems to be basically the same topic at this point
19:34:12 <clarkb> I built nb05 and nb06 yesterday and got them deployed
19:34:37 <clarkb> they have built every image but bookworm, centos 9 stream, gentoo, and openeuler. The last two are paused and don't build. Just before the meeting I requested rebuilds of the first two
19:34:55 <clarkb> everything is looking ok to me so far. I'll plan to clean out the old servers late this week when we're confident we won't need them anymore
19:35:14 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944867 fix nodepool image export cron
19:35:28 <clarkb> this is a related fix for a change I made to be backward and forward compatible between docker-compose and docker compose
19:35:44 <clarkb> we should land that and make sure we are exporting things properly before deleting the old servers too
19:36:15 <clarkb> frickler: ^ you had a concern about updating cron's PATH instead but I don't think that is necessary as docker-compose should be a temporary shim to make things compatible between old and new systems
19:37:19 <clarkb> oh also good news is the nested container stuff to build rocky linux images seems to work fine
19:37:33 <clarkb> anyway I'm going to keep working on server replacements as time goes on and help is very much welcome
19:37:37 <clarkb> anyone else have updates on this topic?
19:37:52 <frickler> yes, I was only worried about what happens on nb04, but if it works there, I'm fine
19:38:29 <fungi> there was also some error you spotted on the new servers, frickler?
19:38:38 <clarkb> it should. on the old servers we pip install docker-compose which puts the executable at /usr/local/bin/docker-compose, and on noble and newer we write out a shim at /usr/local/bin/docker-compose that invokes docker compose
19:39:00 <clarkb> fungi: it's the same error on both I think. /usr/local/bin isn't in cron's path so we don't find the pip installed executable or our shim
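Both fixes under discussion come down to a one-line cron change; roughly (the schedule and subcommand here are illustrative, not the actual entry in system-config):

    # option 1 (what the proposed fix does): call the executable/shim by absolute path
    0 2 * * * /usr/local/bin/docker-compose -f /etc/nodepool-builder-compose/docker-compose.yaml ps
    # option 2 (the alternative frickler raised): extend cron's PATH so the bare name resolves
    PATH=/usr/local/bin:/usr/bin:/bin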
19:39:11 <clarkb> your fix uses the rooted path and should fix it for both I think
19:39:22 <fungi> "/etc/nodepool-builder-compose/docker-compose.yaml: `version` is obsolete"
19:39:29 <clarkb> oh that's a warning
19:39:43 <clarkb> `docker compose` doesn't use versioned docker-compose.yaml files but `docker-compose` does
19:39:46 <fungi> okay, so benign then, but something we can clean up after the transition
19:39:49 <clarkb> ya
19:40:12 <frickler> ack
19:41:26 <clarkb> #topic Running certcheck on bridge
19:41:36 <clarkb> I haven't had a chance to look into running this out of an infra-prod job
19:41:47 <clarkb> but I still intend to as I think that is likely a good way to do it
19:42:07 <clarkb> #topic Working through our TODO list
19:42:11 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:42:28 <clarkb> I marked parallel infra-prod job execution done on that list
19:42:51 <clarkb> and the python 3.12 effort from earlier is also out of that list
19:43:06 <clarkb> feel free to pick things off of there and work on them when you have time
19:43:16 <clarkb> #topic Packaging updates for bindep
19:43:35 <clarkb> fungi has two proposed changes to bindep that serve as examples for modernizing python packaging with pbr
19:43:40 <clarkb> #link https://review.opendev.org/938570 Drop requirements.txt
19:43:41 <fungi> just wanted to touch base quickly on those two remaining changes
19:43:45 <clarkb> #link https://review.opendev.org/940711 Drop auxiliary requirements files
19:43:48 <fungi> and either merge or abandon them
19:44:19 <clarkb> I'm happy to reduce the delta between us and upstream PyPA expectations and this is a good step in showing people using PBR how to do that so +2 from me
19:44:32 <fungi> once we have a decision one way or the other, we'll tag a new bindep version and then i can work on porting the various packaging updates from bindep to our other tools
19:45:03 <fungi> ideally i'd like to get a bindep release out this week
19:45:31 <frickler> I can take a closer look at those tomorrow if you want to wait for that
19:45:42 <fungi> but since they're stylistic changes for packaging to serve as a possible template for our other tools and wider pbr user base, i want to be sure we have consensus
19:45:49 <fungi> yeah, tomorrow would be great
19:46:36 <fungi> this came about in part because of problems openstack was facing using pbr, so it would be great to have a real-world example to point them to instead of something fabricated
19:47:21 <fungi> anyway, that's all i had for this topic
19:47:39 <clarkb> #topic Open Discussion
19:47:58 <clarkb> as mentioned I marked parallel infra-prod deployments done. This has dramatically sped up our deploy buildsets in many situations
19:48:08 <clarkb> thank you to everyone who helped get that over the finish line
19:48:13 <corvus> what's the mutex at?
19:48:16 <clarkb> 4
19:48:27 <frickler> a generic certcheck job in zuul-jobs might also be interesting for other #zuul users, maybe ask if someone there is interested in helping?
19:48:29 <clarkb> there is potential to bump that up but I think we'll see diminishing returns
19:48:32 <corvus> thinking of increasing it more?
19:48:37 <corvus> ok
19:49:01 <clarkb> mutex of 1 ran periodic in 2 hours or so (usually a little over). Mutex of 2 ran in about 1 hour. Mutex of 4 got it to 40 minutes
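For anyone curious where that limit lives, a cap like this is the kind of thing Zuul models as a semaphore; a minimal sketch (the semaphore and job names are placeholders, not the actual definitions in our config):

    # cap concurrent infra-prod playbook runs against bridge
    - semaphore:
        name: infra-prod-playbook
        max: 4

    # deployment jobs then reference the semaphore
    - job:
        name: infra-prod-service-example
        semaphores:
          - name: infra-prod-playbook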
19:49:01 <fungi> watching the load on bridge we could almost certainly increase it, but there are coalescent event horizons like the letsencrypt job which would need to be refactored to take advantage of much more parallelism
19:49:15 <clarkb> bumping to 6 I think we'd only get to 35 minutes at best and then ya ^
19:49:35 <clarkb> I'm happy to increase it and see what happens if people would like to do that
19:49:44 <clarkb> but it doesn't seem as urgent as the prior increases
19:50:21 <corvus> meh. sounds like time might be better spent thinking about the next blocker
19:50:24 <clarkb> I guess if anyone feels strongly about it or wants to experiment push up a change. I don't think anyone will object
19:50:25 <frickler> if we're voting I'd rather keep it conservative and stick to 4
19:50:51 <corvus> is there a reason to be conservative?
19:50:51 <fungi> i have a change open documenting how i prepped the openstack-discuss ml to switch it to moderating new subscribers by default:
19:50:54 <fungi> #link https://review.opendev.org/944893 docs: Switch a mailing list to default moderation
19:51:16 <fungi> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/L4OG3TJ5JBVS4IS7KCQKXER736PWEITB/ [administrivia] Recent change in moderation for new subscribers
19:51:20 <corvus> (it sounds like the only reason not to do it is because it won't actually help, and we'll just be waiting on other resources outside the mutex)
19:52:20 <clarkb> the main reason I can think of is limiting blast radius if we decide we need to shut off ansible runs
19:52:30 <clarkb> but the speedup tradeoff there would be worth it imo
19:52:38 <clarkb> and if not there then ya not much incentive
19:53:08 <clarkb> Gerrit meets is happening at 00:00 UTC between tuesday and wednesday. That is in just over 4 hours
19:53:19 <clarkb> if you want to participate they stream it to gerritforge's youtube channel
19:53:32 <clarkb> they will discuss gerrit caches and I intend on listening in and sending questions (likely to discord)
19:53:55 <clarkb> also the openinfra foundation is putting together a newsletter for the end of march and wants to spotlight opendev
19:54:10 <clarkb> draft work is going in https://etherpad.opendev.org/p/opendev_newsletter
19:54:35 <clarkb> feel free to add ideas or write something if you are interested or have something you want to communicate in that format. I'll probably draft things early next week if no one else beats me to it
19:55:21 <frickler> fungi: btw. did you check the load on wiki before rebooting? just wondering whether the issue really was related to that
19:55:58 <fungi> frickler: i didn't, but in the past have observed that the openid login problem persists even after load dies down
19:57:28 <fungi> it's probably restored by restarting apache and mariadb/mysql, guessing there's something going on between them that gets stuck
19:57:45 <fungi> but a reboot takes care of all of that
19:58:13 <frickler> ah, ok, maybe next time I'm confident enough to try that on my own, then
19:58:21 <fungi> feel free
19:58:55 <fungi> we're really trying not to waste too much of our time on it, and would instead prefer to spend it reviewing tonyb's replacement work
19:58:57 <frickler> well this morning I preferred a running wiki with broken login to a possibly completely broken one after a reboot
19:59:24 <fungi> fair enough
19:59:26 <clarkb> fwiw the wiki did work for me yesterday when prepping the agenda
19:59:36 <clarkb> so whatever broke it occurred between about 00:00 and when frickler noticed it
19:59:44 <fungi> same, i edited the agenda during my afternoon yesterday as well
19:59:45 <clarkb> and we are at time. Thank you everyone!
19:59:55 <fungi> thanks clarkb!
19:59:57 <frickler> actually bauzas noticed, but anyway ;)
19:59:57 <clarkb> we'll be back here same time and location next week
20:00:11 <clarkb> feel free to continue discussion in our normal comms channels
20:00:13 <clarkb> #endmeeting