19:00:47 <clarkb> #startmeeting infra
19:00:47 <opendevmeet> Meeting started Tue Mar 18 19:00:47 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:47 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:47 <opendevmeet> The meeting name has been set to 'infra'
19:00:55 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YM5MZF2IHG6P4FFTRVMNVLJHYOIBVFUD/ Our Agenda
19:02:30 <clarkb> #topic Announcements
19:02:34 <clarkb> Anything to announce?
19:02:58 <clarkb> I don't have any concrete plans but next week the kids are home from school for a break and I may try to do an easy day or two and get out of the house with them
19:05:04 <tonyb> FWIW, next week 25th I head back to AU
19:05:42 <clarkb> ack
19:05:44 <clarkb> #topic Zuul-launcher image builds
19:05:51 <clarkb> I'm not aware of any new updates on this subject
19:06:07 <clarkb> though if corvus is around I'd be curious to know if the quota handling has improved the reliability
19:06:35 <fungi> also i've still got a change up to add flex dfw3 images/quota
19:06:49 <clarkb> oh do you have a link to that?
19:06:53 <fungi> #link https://review.opendev.org/943104 Add the DFW3 region for Rackspace Flex
19:07:00 <clarkb> thanks!
19:07:17 <clarkb> I'll review that after the meeting
19:08:01 <clarkb> #topic Updating Flavors in OVH
19:08:11 <clarkb> The other related item was the OVH flavor update process
19:08:15 <clarkb> #link https://etherpad.opendev.org/p/ovh-flavors
19:08:41 <clarkb> we proposed starting that yesterday but they came back and said that timing didn't work for them. They will be in touch with timing that does work at some point but I haven't seen any proposed dates from their end
19:09:04 <clarkb> we can probably pull this off of the agenda next week. But I wanted to make sure we called out it wasn't happening and we're waiting on timing from ovh
19:09:45 <clarkb> #topic Container hygiene tasks
19:09:51 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944799 drop python 3.10 image builds
19:10:04 <clarkb> according to codesearch we don't have anything using python 3.10 images anymore
19:10:15 <clarkb> I think this is a safe change to land at any time
19:10:24 <clarkb> next up is rebuilding the images we do use
19:10:30 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944789 rebuild python 3.11 and 3.12 base images
19:10:45 <clarkb> unfortunately this change has been in uWSGI purgatory
19:11:02 <clarkb> it seems that compiling current uwsgi on top of current bookworm on aarch64 segfaults
19:11:25 <clarkb> uwsgi is only barely maintained anymore so I've begun looking at alternatives (which is the next topic)
19:11:55 <clarkb> I don't think we can safely update python-builder and python-base without also updating uwsgi-base as the wheel builds for things will see mismatched binary packages and we could break things relying on uwsgi today
19:12:25 <clarkb> so I think our main options here are either to drop uwsgi, fix uwsgi builds somehow, or retry until we get lucky? (the failures don't seem to be 100% consistent)
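For readers unfamiliar with the image stack, the coupling described above can be sketched roughly as follows (image names follow the opendevorg naming used in this discussion; the exact tag is an assumption, not taken from the repo):

    # python-builder compiles wheels (including the uwsgi wheel),
    # python-base provides the runtime layer, and uwsgi-base adds the
    # prebuilt uwsgi wheel on top of python-base, so the three images
    # have to be rebuilt together to keep binary packages matched
    FROM docker.io/opendevorg/uwsgi-base:3.11-bookworm
    # a service image such as lodgeit then inherits whichever python
    # and uwsgi builds its base images were produced with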
19:12:44 <clarkb> maybe it is worth trying to enqueue to the gate a few times to see if we can get it to go through
19:13:09 <clarkb> then finally I've got changes up to move images from python 3.11 to 3.12
19:13:11 <clarkb> #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12
19:13:24 <frickler> I didn't look at details yet, but could we use distro pkgs like devstack does?
19:13:42 <clarkb> while those changes don't strictly depend on the new image updates it might be nice to use these rebuilds to also get new base images so we address two things with one set of service restarts
19:14:06 <clarkb> frickler: the container images are based on debian bookworm. If there is a bookworm uwsgi package then yes I think that is possible
19:14:30 <clarkb> we would add uwsgi to lodgeit's bindep requirements and then switch the base container image from uwsgi-base to python-base
19:14:31 <fungi> #link https://packages.debian.org/bookworm/uwsgi
19:14:58 <clarkb> this will likely downgrade uwsgi for us but we aren't doing anything too crazy with it so that's probably fine (also uwsgi itself isn't super stateful)
19:15:08 <frickler> I think that would likely only work with distro python. so it might work for py3.11 then and we'd have to use noble for py3.12
19:15:31 <clarkb> actually it wouldn't work for either if that is the case
19:15:39 <clarkb> since both 3.11 and 3.12 are custom builds on top of debian
19:16:18 <clarkb> another option would be to try and pin uwsgi versions to something older than latest to see if we can get that to build more reliably
19:16:30 <frickler> and we do need custom python builds?
19:16:55 <clarkb> it dramatically simplifies supporting a range of pythons
19:17:13 <clarkb> since we can use the upstream python container images and not need to have different platforms for different python versions or even wait for new distro releases
19:17:51 <clarkb> I don't think uwsgi is important enough to warrant dramatically changing how we run python services
19:18:02 <clarkb> it is basically unmaintained and can't be built on modern platforms reliably
19:18:29 <clarkb> if we can get by with simple fixes to uwsgi that is fine. But I would rather invest my time in replacing uwsgi than completely rearchitecting our container setup
19:18:47 <frickler> ack
19:19:21 <clarkb> which is maybe a good segue into the next topic. Reviews on the above changes are very much appreciated too
19:19:26 <clarkb> #topic Dropping uWSGI
19:19:32 * corvus arrives late
19:20:19 <clarkb> there are a number of factors that have me thinking uWSGI is no longer a viable option for python web servers. First is that the project is minimally maintained and bug fixes are slow and build errors are common. Next is that the python world is becoming more async which often means ASGI instead of WSGI
19:20:46 <clarkb> rather than try and keep the old barely working thing going I suspect a better use of our time is looking forward and picking a modern tool that is maintained and can support asgi and wsgi
19:21:34 <clarkb> in my investigation I discovered granian which seems to fit these criteria. However, frickler and others have pointed out granian is maintained largely by one person (there are a number of contributors but the vast majority of work is one person) and there is no distro packaging for it
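As context for the granian option, a minimal sketch of what serving a WSGI app with it looks like at runtime (the module path and port here are placeholders, not taken from the proposed lodgeit change):

    # granian speaks both WSGI and ASGI; --interface selects which,
    # and the positional argument is the module:callable to serve
    granian --interface wsgi --host 0.0.0.0 --port 8000 lodgeit.application:application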
19:21:51 <clarkb> I think that the lack of distro packaging is less meaningful for us as we install from pypi and they publish x86 and arm wheels
19:22:12 <clarkb> #link https://review.opendev.org/c/opendev/lodgeit/+/944805 lodgeit granian container update
19:22:23 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944806 lodgeit system-config deployment update
19:22:33 <corvus> we only use uwsgi for lodgeit, right?
19:22:58 <clarkb> corvus: the uwsgi-base container image is lodgeit only. mailman3 also uses uwsgi but not via the base image
19:23:19 <clarkb> anyway I'm open to other alternatives we think might be more appropriate, but I do think we should try and stop using uwsgi
19:24:06 <corvus> my feeling is -- for something like lodgeit -- whatever works and requires the least time. if granian works, that's good enough for me; i'd be in favor of adding a granian image and declaring uwsgi unmaintained by us
19:24:14 <fungi> in the mm3 case uwsgi is part of upstream images we're reusing (well, rebuilding), right?
19:24:23 <clarkb> fungi: yes
19:24:52 <fungi> in which case we can probably wait for or help with upstream's decisions on whether to continue relying on it
19:25:21 <clarkb> ya I think being in sync with them and possibly starting a conversation with them about alternatives is a good thing and also less urgent
19:25:58 <clarkb> also I think our needs differ from openstack's needs. While it might be more important for openstack to use distro packaged wsgi servers that isn't as important for us
19:26:02 <frickler> can we simply re-use the mm3 container and install lodgeit in there? or is that setup too different?
19:26:36 <clarkb> I'm not sure. But it is fairly different (uses alpine instead of debian, and has django installed)
19:26:40 <fungi> https://github.com/maxking/docker-mailman is the mailman containers. they're arranged a bit differently from how we usually do our container images
19:26:55 <tonyb> I think that granian is a good step forward and the single maintainer is better than no maintainer. The switch to granian looks pretty nice
19:27:35 <clarkb> gunicorn was also mentioned in discussion in the tc channel. But gunicorn doesn't do asgi. uvicorn was mentioned but that is a separate code base aiui and doesn't do wsgi
19:28:12 <clarkb> anyway I noted on the change why I thought granian was a good option and I think being able to switch to gunicorn or similar later means this is fairly low risk
19:28:27 <tonyb> ++
19:29:02 <frickler> iiuc our uwsgi build issues were on aarch64, too? https://github.com/maxking/docker-mailman/blob/main/web/Dockerfile#L20-L21
19:29:17 <clarkb> frickler: aha that's a good find
19:29:26 <clarkb> I can pin our uwsgi image to that version and see if it works
19:29:35 <clarkb> and then we can migrate away from uwsgi on a less urgent schedule
19:29:48 <fungi> corvus summed up my position quite nicely, if it works with minimal effort, then great, lodgeit hopefully shouldn't eat a ton of our admin time
19:30:32 <clarkb> ya so I think the plan is to try that workaround from mm3, if that works land it. Concurrently we can make a plan to migrate lodgeit (as written the changes I have require coordination between container and system-config so we may have an outage)
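The mm3-style workaround amounts to pinning uwsgi to a known-good release rather than building whatever pip resolves as latest. A minimal sketch of what that could look like in the uwsgi-base Dockerfile (the version shown is a placeholder; the real pin would be copied from docker-mailman's web/Dockerfile):

    # pin uwsgi instead of letting pip pick the newest release, whose
    # compile on bookworm/aarch64 is what currently segfaults
    ARG UWSGI_VERSION=2.0.28
    RUN pip install --no-cache-dir "uwsgi==${UWSGI_VERSION}"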
19:30:44 <frickler> ack, I wasn't aware that this is "only" for lodgeit, I'd withdraw my -1 then
19:30:50 <clarkb> if the workaround doesn't work we can speed up the granian switch. If the workaround works then we can be a bit more cautious and double check for alternatives first
19:30:58 <clarkb> frickler: oh cool
19:31:04 <clarkb> that gives me a path forward, I'll work on that
19:31:11 <clarkb> thanks for listening to me on this subject
19:31:12 <fungi> and yeah, i did a search for uwsgi in the open/closed issues and pull requests for the mailman containers and saw they'd been doing work recently to get it working for aarch64
19:31:25 <fungi> so potentially related
19:32:29 <frickler> ah, yes, git blame shows this https://github.com/maxking/docker-mailman/pull/743, so rather recent
19:33:19 <clarkb> #topic Upgrading old servers
19:33:33 <clarkb> I should probably just merge this and the sprint idea topics together
19:33:56 <tonyb> clarkb: Yeah it seems to be basically the same topic at this point
19:34:12 <clarkb> I built nb05 and nb06 yesterday and got them deployed
19:34:37 <clarkb> they have built every image but bookworm, centos 9 stream, gentoo, and openeuler. The last two are paused and don't build. Just before the meeting I requested rebuilds of the first two
19:34:55 <clarkb> everything is looking ok to me so far. I'll plan to clean out the old servers late this week when we're confident we won't need them anymore
19:35:14 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/944867 fix nodepool image export cron
19:35:28 <clarkb> this is a related fix for a change I made to be backward and forward compatible between docker-compose and docker compose
19:35:44 <clarkb> we should land that and make sure we are exporting things properly before deleting the old servers too
19:36:15 <clarkb> frickler: ^ you had a concern about updating cron's PATH instead but I don't think that is necessary as docker-compose should be a temporary shim to make things compatible between old and new systems
19:37:19 <clarkb> oh also good news is the nested container stuff to build rocky linux images seems to work fine
19:37:33 <clarkb> anyway I'm going to keep working on server replacements as time goes on and help is very much welcome
19:37:37 <clarkb> anyone else have updates on this topic?
19:37:52 <frickler> yes, I was only worried about what happens on nb04, but if it works there, I'm fine
19:38:29 <fungi> there was also some error you spotted on the new servers, frickler?
19:38:38 <clarkb> it should. on the old servers we pip install docker-compose which puts the executable at /usr/local/bin/docker-compose, and on noble and newer we write out a shim at /usr/local/bin/docker-compose that invokes docker compose
19:39:00 <clarkb> fungi: it's the same error on both I think. /usr/local/bin isn't in cron's path so we don't find the pip installed executable or our shim
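Both fixes under discussion come down to a one-line cron change; roughly (the schedule and subcommand here are illustrative, not the actual entry in system-config):

    # option 1 (what the proposed fix does): call the executable/shim by absolute path
    0 2 * * * /usr/local/bin/docker-compose -f /etc/nodepool-builder-compose/docker-compose.yaml ps
    # option 2 (the alternative frickler raised): extend cron's PATH so the bare name resolves
    PATH=/usr/local/bin:/usr/bin:/bin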
19:39:11 <clarkb> your fix uses the rooted path and should fix it for both I think
19:39:22 <fungi> "/etc/nodepool-builder-compose/docker-compose.yaml: `version` is obsolete"
19:39:29 <clarkb> oh that's a warning
19:39:43 <clarkb> `docker compose` doesn't use versioned docker-compose.yaml files but `docker-compose` does
19:39:46 <fungi> okay, so benign then, but something we can clean up after the transition
19:39:49 <clarkb> ya
19:40:12 <frickler> ack
19:41:26 <clarkb> #topic Running certcheck on bridge
19:41:36 <clarkb> I haven't had a chance to look into running this out of an infra-prod job
19:41:47 <clarkb> but I still intend to as I think that is likely a good way to do it
19:42:07 <clarkb> #topic Working through our TODO list
19:42:11 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:42:28 <clarkb> I marked parallel infra-prod job execution done on that list
19:42:51 <clarkb> and the python 3.12 effort from earlier is also out of that list
19:43:06 <clarkb> feel free to pick things off of there and work on them when you have time
19:43:16 <clarkb> #topic Packaging updates for bindep
19:43:35 <clarkb> fungi has two proposed changes to bindep that serve as examples for modernizing python packaging with pbr
19:43:40 <clarkb> #link https://review.opendev.org/938570 Drop requirements.txt
19:43:41 <fungi> just wanted to touch base quickly on those two remaining changes
19:43:45 <clarkb> #link https://review.opendev.org/940711 Drop auxiliary requirements files
19:43:48 <fungi> and either merge or abandon them
19:44:19 <clarkb> I'm happy to reduce the delta between us and upstream PyPA expectations and this is a good step in showing people using PBR how to do that so +2 from me
19:44:32 <fungi> once we have a decision one way or the other, we'll tag a new bindep version and then i can work on porting the various packaging updates from bindep to our other tools
19:45:03 <fungi> ideally i'd like to get a bindep release out this week
19:45:31 <frickler> I can take a closer look at those tomorrow if you want to wait for that
19:45:42 <fungi> but since they're stylistic changes for packaging to serve as a possible template for our other tools and wider pbr user base, i want to be sure we have consensus
19:45:49 <fungi> yeah, tomorrow would be great
19:46:36 <fungi> this came about in part because of problems openstack was facing using pbr, so it would be great to have a real-world example to point them to instead of something fabricated
19:47:21 <fungi> anyway, that's all i had for this topic
19:47:39 <clarkb> #topic Open Discussion
19:47:58 <clarkb> as mentioned I marked parallel infra-prod deployments done. This has dramatically sped up our deploy buildsets in many situations
19:48:08 <clarkb> thank you to everyone who helped get that over the finish line
19:48:13 <corvus> what's the mutex at?
19:48:16 <clarkb> 4
19:48:27 <frickler> a generic certcheck job in zuul-jobs might also be interesting for other #zuul users, maybe ask if someone there is interested in helping?
19:48:29 <clarkb> there is potential to bump that up but I think we'll see diminishing returns
19:48:32 <corvus> thinking of increasing it more?
19:48:37 <corvus> ok
19:49:01 <clarkb> mutex of 1 ran periodic in 2 hours or so (usually a little over). Mutex of 2 ran in about 1 hour. Mutex of 4 got it to 40 minutes
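For anyone curious where that limit lives, a cap like this is the kind of thing Zuul models as a semaphore; a minimal sketch (the semaphore and job names are placeholders, not the actual definitions in our config):

    # cap concurrent infra-prod playbook runs against bridge
    - semaphore:
        name: infra-prod-playbook
        max: 4

    # deployment jobs then reference the semaphore
    - job:
        name: infra-prod-service-example
        semaphores:
          - name: infra-prod-playbook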
19:49:01 <fungi> watching the load on bridge we could almost certainly increase it, but there are coalescent event horizons like the letsencrypt job which would need to be refactored to take advantage of much more parallelism
19:49:15 <clarkb> bumping to 6 I think we'd only get to 35 minutes at best and then ya ^
19:49:35 <clarkb> I'm happy to increase it and see what happens if people would like to do that
19:49:44 <clarkb> but it doesn't seem as urgent as the prior increases
19:50:21 <corvus> meh. sounds like time might be better spent thinking about the next blocker
19:50:24 <clarkb> I guess if anyone feels strongly about it or wants to experiment push up a change. I don't think anyone will object
19:50:25 <frickler> if we're voting I'd rather keep it conservative and stick to 4
19:50:51 <corvus> is there a reason to be conservative?
19:50:51 <fungi> i have a change open documenting how i prepped the openstack-discuss ml to switch it to moderating new subscribers by default:
19:50:54 <fungi> #link https://review.opendev.org/944893 docs: Switch a mailing list to default moderation
19:51:16 <fungi> #link https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/L4OG3TJ5JBVS4IS7KCQKXER736PWEITB/ [administrivia] Recent change in moderation for new subscribers
19:51:20 <corvus> (it sounds like the only reason not to do it is because it won't actually help, and we'll just be waiting on other resources outside the mutex)
19:52:20 <clarkb> the main reason I can think of is limiting blast radius if we decide we need to shut off ansible runs
19:52:30 <clarkb> but the speedup tradeoff there would be worth it imo
19:52:38 <clarkb> and if not there then ya not much incentive
19:53:08 <clarkb> Gerrit meets is happening at 00:00 UTC between tuesday and wednesday. That is in just over 4 hours
19:53:19 <clarkb> if you want to participate they stream it to gerritforge's youtube channel
19:53:32 <clarkb> they will discuss gerrit caches and I intend on listening in and sending questions (likely to discord)
19:53:55 <clarkb> also the openinfra foundation is putting together a newsletter for the end of march and wants to spotlight opendev
19:54:10 <clarkb> draft work is going in https://etherpad.opendev.org/p/opendev_newsletter
19:54:35 <clarkb> feel free to add ideas or write something if you are interested or have something you want to communicate in that format. I'll probably draft things early next week if no one else beats me to it
19:55:21 <frickler> fungi: btw. did you check the load on wiki before rebooting? just wondering whether the issue really was related to that
19:55:58 <fungi> frickler: i didn't, but in the past have observed that the openid login problem persists even after load dies down
19:57:28 <fungi> it's probably restored by restarting apache and mariadb/mysql, guessing there's something going on between them that gets stuck
19:57:45 <fungi> but a reboot takes care of all of that
19:58:13 <frickler> ah, ok, maybe next time I'm confident enough to try that on my own, then
19:58:21 <fungi> feel free
19:58:55 <fungi> we're really trying not to waste too much of our time on it, and would instead prefer to spend it reviewing tonyb's replacement work
19:58:57 <frickler> well this morning I preferred a running wiki with broken login to a possibly completely broken one after a reboot
19:59:24 <fungi> fair enough
19:59:26 <clarkb> fwiw the wiki did work for me yesterday when prepping the agenda
19:59:36 <clarkb> so whatever broke it occurred between about 00:00 and when frickler noticed it
19:59:44 <fungi> same, i edited the agenda during my afternoon yesterday as well
19:59:45 <clarkb> and we are at time. Thank you everyone!
19:59:55 <fungi> thanks clarkb!
19:59:57 <frickler> actually bauzas noticed, but anyway ;)
19:59:57 <clarkb> we'll be back here same time and location next week
20:00:11 <clarkb> feel free to continue discussion in our normal comms channels
20:00:13 <clarkb> #endmeeting