19:00:31 <clarkb> #startmeeting infra
19:00:31 <opendevmeet> Meeting started Tue Oct  1 19:00:31 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:31 <opendevmeet> The meeting name has been set to 'infra'
19:01:01 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/EOYNPX5IPPB3CCX6YA574BPPQEGSYGGH/ Our Agenda
19:02:00 <clarkb> #topic Announcements
19:02:16 <clarkb> A reminder that the CFP for the SCaLE open infra day is open until November 1
19:02:22 <clarkb> #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1
19:02:54 <clarkb> and a heads up that I've got some family stuff that I'll need to work through soon and will likely be afk for a few days. Unfortunately, the timing for that isn't known yet, but I'll do my best to give an update when I do know
19:03:29 <fungi> i'm mostly around until oid-na, do what you need to do
19:04:08 <clarkb> thanks!
19:04:21 <clarkb> #topic OpenStack Release Wednesday
19:04:41 <clarkb> I wasn't sure if this deserved its own topic or a listing under announcements, so it goes first and can be both
19:04:49 <clarkb> Tomorrow openstack will be making its 2024.2 release
19:05:03 <clarkb> the process for that should start at around 10:00 UTC and end approximately 15:00 UTC
19:05:11 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/930709 will be landed prior to the release to remove a semaphore that slows things down
19:05:23 <clarkb> I think fungi will land this change today-ish to streamline the release process
19:05:32 <fungi> today's secret word is "slushy"
19:05:43 <clarkb> please be on the lookout for any problems that need addressing and avoid making changes that are risky for the release for the next 24 hours or so
19:06:11 <fungi> i'll be up and around starting at 10:00 utc hopefully, to make sure things go smoothly
19:06:55 <clarkb> I too will try to have an early start but not that early
19:07:17 <clarkb> #topic Rocky Package Mirror Creation
19:07:32 <clarkb> I kept this on the agenda because it helps remind me that it's a thing to pay attention to, but I don't see a change for it yet
19:08:14 <clarkb> #topic Rackspace's Flex Cloud
19:08:25 <clarkb> No progress from me on figuring out swift here yet
19:08:30 <fungi> it got a rave review in the tc meeting today at least
19:08:48 <clarkb> yes people are noticing the nodes are much faster. More confirmation the smaller flavor type isn't inherently an issue
19:08:58 <clarkb> Unfortunately I haven't found time to dig into the swift stuff for this cloud yet
19:09:06 <fungi> noonedeadpunk thought some jobs had broken at first, because they completed so quickly
19:09:19 <clarkb> there have been too many distractions and doing so is relatively low on the priority list as it is all new
19:10:12 <clarkb> I'd like to say I'll definitely dig into it this week, but with other stuff going on I know I can't commit to that
19:10:25 <clarkb> others should feel free to do so if they have time otherwise I'll do my best to look at it when I'm able
19:11:06 <clarkb> also corvus tracked down a zuul ci issue that was related to the lower cpu count. TL;DR is that a file leak in zuul's test suite got bad enough to hit ulimits: fewer CPUs means fewer test processes, so each process accumulated more of the leaked files
19:11:21 <clarkb> something to be aware of as neutron indicated they may be seeing similar problems
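The ulimit arithmetic behind that zuul failure can be sketched in Python. This is a minimal illustration under stated assumptions: the leaked files are taken to be spread evenly across test worker processes, the numbers are hypothetical, and the helper names are made up for this sketch rather than taken from zuul.

```python
import math
import os
import resource


def leaked_fds_per_worker(total_leaked: int, workers: int) -> int:
    """Leaked file descriptors each worker ends up holding, assuming a fixed
    total number of leaks is spread evenly across `workers` processes."""
    return math.ceil(total_leaked / workers)


def would_hit_limit(total_leaked: int, workers: int,
                    baseline_fds: int, soft_limit: int) -> bool:
    """True if a worker's normal open files plus its share of the leaks
    needs more descriptors than the soft limit allows."""
    return baseline_fds + leaked_fds_per_worker(total_leaked, workers) > soft_limit


def soft_nofile_limit() -> int:
    """Soft RLIMIT_NOFILE for the current process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def open_fd_count() -> int:
    """Open file descriptors in the current process (Linux only, via /proc).
    The count may include the descriptor used to list the directory."""
    return len(os.listdir("/proc/self/fd"))
```

With these hypothetical numbers, 4000 leaked files over 8 workers is 500 per worker, but over 4 workers it is 1000, which together with about 50 ordinary open files exceeds a soft limit of 1024. The total leak is unchanged; only the per-process share grows.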
19:11:58 <clarkb> #topic Updating ansible+ansible-lint versions in our repos
19:12:04 <clarkb> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last open change related to this work
19:12:17 <clarkb> frickler requested that this change land after the openstack release which is totally reasonable
19:12:38 <clarkb> reviews should be safe in case there is any feedback before the release happens. Otherwise it would be great to try and land this late this week after the release
19:12:59 <clarkb> TL;DR is bump linter node to noble, run under python3.12, update tools and rules to accommodate the newer runtime
19:14:17 <clarkb> #topic Zuul-launcher image builds
19:14:36 <corvus> oh dear
19:14:42 <corvus> the image build and upload framework in zuul has merged
19:14:44 <corvus> that means if we were using aws, we could try building and uploading images today
19:14:48 <corvus> but we aren't, so, here are the current work streams to get us there:
19:14:52 <corvus> 1) corvus: working on adding openstack driver
19:14:55 <corvus> 2) clarkb: getting object storage upload ready in rax-flex to use as intermediate storage
19:14:59 <corvus> 3) tonyb: adding image build jobs for more distros
19:15:02 <corvus> 4) anyone: actually spin up new launcher
19:15:08 <corvus> [there we go]
19:15:34 <clarkb> corvus: any concern with the limited config checking that we updated to after the weekend rollout of the initial update?
19:15:39 <clarkb> I think that was related to this
19:15:53 <corvus> nah, that was only a "nice to have" to help users find issues early
19:15:57 <corvus> the real check happens at runtime
19:16:13 <corvus> incidentally, i believe the fix for that should be to just permanently remove that early check.
19:16:22 <corvus> https://review.opendev.org/930942 (for reference)
19:16:32 <clarkb> got it so from a safety/correctness perspective we're good. It is just error reporting we were trying to make friendly for people
19:16:38 <corvus> yep
19:17:35 <clarkb> anything else to add? sounds like we're still making steady progress even if I'm failing to figure out swift
19:17:55 <corvus> i'm hoping to have the openstack driver ready enough for this within a few days
19:18:14 <corvus> if we don't get rax-flex swift worked out, we could use one of our existing object stores
19:18:39 <corvus> (then switch at any time when ready)
19:18:54 <clarkb> makes sense; those images should mostly be ephemeral in that container anyway.
19:19:19 <corvus> yeah, i'm expecting to give them a ttl of like 72 hours or something for early testing
19:19:52 <corvus> [that's it from me]
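The expiry rule corvus describes (a TTL of roughly 72 hours for images in intermediate storage during early testing) is simple enough to sketch. This is a minimal Python illustration that uses a hypothetical `StoredImage` record; it does not call any real zuul or Swift API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: ~72 hours, matching the early-testing TTL mentioned above.
IMAGE_TTL = timedelta(hours=72)


@dataclass
class StoredImage:
    """Hypothetical record for an image object in intermediate storage."""
    name: str
    uploaded_at: datetime  # timezone-aware upload time


def expired_images(images: list[StoredImage], now: datetime,
                   ttl: timedelta = IMAGE_TTL) -> list[str]:
    """Return the names of images whose age at `now` is at least `ttl`."""
    return [img.name for img in images if now - img.uploaded_at >= ttl]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    images = [
        StoredImage("hypothetical-noble-1", now - timedelta(hours=80)),
        StoredImage("hypothetical-noble-2", now - timedelta(hours=2)),
    ]
    print(expired_images(images, now))
```

Swift can also expire objects server-side through the `X-Delete-After` header, which would make a separate cleanup pass unnecessary. This sketch does not assume which approach zuul-launcher uses.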
19:20:06 <clarkb> #topic OpenStack OpenAPI spec publishing
19:20:23 <clarkb> I wanted to followup on this to note that frickler left a comment and I tried to expand on it. No response since.
19:20:43 <clarkb> fungi: not sure if we want to try and set up time for a synchronous discussion just to get things moving forward?
19:20:55 <clarkb> probably don't need to wait for the PTG to do that, though frickler is out until about then unfortunately
19:22:11 <clarkb> #link https://review.opendev.org/921934
19:22:13 <clarkb> is the change in question
19:22:19 <fungi> yeah, it doesn't seem urgent, the change was opened in... may?
19:22:38 <clarkb> that's part of my concern. Yes, it's probably not urgent, but we've also probably ignored it for long enough
19:23:52 <clarkb> I'd be happy to try and sit in on some more focused conversation around this to find a conclusion. Though as noted earlier my availability may be limited. I think I've written down my concerns well enough that you or others could convey them successfully though
19:25:11 <clarkb> we don't have to solve that now though. Just wanted to throw that idea out there
19:25:16 <clarkb> #topic Upgrading old servers
19:25:31 <clarkb> I don't see new updates on the mediawiki stack since I last reviewed it
19:26:18 <clarkb> I know it is super early in Australia so I don't expect tonyb is here right now. I believe that should get better after we both go through our DST switches and it will be an hour later for tonyb
19:26:34 <clarkb> anything else related to booting new servers / server upgrades?
19:28:00 <clarkb> sounds like no, but our next topic does overlap a bit
19:28:06 <clarkb> #topic Docker compose plugin with podman service for servers
19:28:15 <clarkb> #link https://review.opendev.org/923084 is a demo (in the Zuul repo) of using docker compose v2 plugin with system podman service
19:28:16 <corvus> #link https://review.opendev.org/923084 a demo (in the Zuul repo) of using docker compose v2 plugin with system podman service
19:28:28 <corvus> jinx (sorry!)
19:28:33 <clarkb> heh no problem
19:28:39 <clarkb> take it away
19:28:44 <corvus> This would let us host opendev images on quay.io (or any non-dockerhub site) and use speculative images
19:28:50 <corvus> Some caveats (seen in the change):
19:28:56 <corvus> unconfined apparmor profile to work around https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2040483
19:29:02 <corvus> buildx startup probably not an issue since we don't use docker compose image builds: https://github.com/docker/buildx/issues/344
19:29:08 <corvus> need to set DOCKER_HOST env variable (system-wide bashrc?  maybe a docker context? other options?)
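The first caveat in that list is running containers with an unconfined AppArmor profile. One way to express this per service in a compose file is `security_opt`. The sketch below builds such a file as a Python dict and renders it as JSON, which for practical purposes is valid YAML, so compose should accept it. The service name and the quay.io image path are hypothetical, and the demo change may apply the workaround some other way.

```python
import json


def compose_with_unconfined_apparmor(service: str, image: str) -> dict:
    """Build a minimal compose document for one service that opts out of
    AppArmor confinement (the workaround for LP #2040483)."""
    return {
        "services": {
            service: {
                "image": image,
                # Same effect as `docker run --security-opt apparmor=unconfined`.
                "security_opt": ["apparmor=unconfined"],
            }
        }
    }


def render(doc: dict) -> str:
    """Serialize the compose document as JSON, which YAML parsers can read."""
    return json.dumps(doc, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(compose_with_unconfined_apparmor(
        "paste", "quay.io/example/paste:latest")))
```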
19:29:24 <corvus> that looks pretty workable for us
19:29:49 <corvus> i don't think any of those 3 things are huge blockers -- and the apparmor confinement thing should work itself out eventually
19:29:49 <clarkb> ya the other caveat is that we probably can't reliably do this until noble? (maybe jammy?) just because podman installation on debuntu before then is tricky
19:29:58 <corvus> oh yeah that too :)
19:30:15 <clarkb> but ya that list of issues seems workable
19:30:21 <fungi> i think it was noble we needed
19:30:47 <corvus> so i think my questions for the group would be: 1) any [other] technical blockers?  2) does it sound like a good idea/ something we want to do?
19:30:55 <corvus> i mean, it's still podman, and we've had surprises there.
19:31:09 <clarkb> ya I think the main unknown is just how podman will continue to work over time
19:31:21 <corvus> but podman running as a systemd service should be the least problematic podman.
19:31:24 <clarkb> but the goal of hosting our images on quay instead of docker hub is still worthwhile I think
19:31:47 <clarkb> maybe a good next step here is picking a relatively self-contained service that we can update to noble then convert it over?
19:31:53 <clarkb> something like paste?
19:32:07 <corvus> sounds like a good idea to me
19:32:08 <clarkb> simple enough to be doable relatively quickly but close enough to everything else to be illustrative
19:32:53 <clarkb> and if that doesn't expose any new major issues we can proceed to swap everything else over? Probably as part of server upgrades?
19:33:09 <corvus> ++
19:33:29 <clarkb> side note: you can apparently use ipv6 address literals with docker ce now. But I think podman's insistence that they emulate docker bugs means they haven't quite done the same yet? Though there are some shared libs so maybe it just works there too
19:33:45 <clarkb> think of ^ as an example of the sort of potential problems we might run into due to using different tools
19:34:34 <corvus> good point
19:34:37 <fungi> bug-compatible with old docker releases
19:34:55 <clarkb> thank you for digging into this. The composability of these tools is a really neat feature and I think also reduces potential risk for making changes like this (as we should be able to rollback in theory (with some cost))
19:34:57 <corvus> (and testing that particular issue is challenging in our environment currently)
19:35:36 <clarkb> as far as setting the env var maybe we use a wrapper tool
19:35:46 <clarkb> then as long as we consistently use the wrapper we don't have to think about it
19:36:00 <clarkb> we could even call it `docker-compose` >_>
19:36:16 <fungi> has a nice ring to it
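clarkb's wrapper idea can be sketched as a small Python shim. It assumes the rootful `podman.socket` systemd unit listening at its usual path, `/run/podman/podman.sock`. The shim itself is hypothetical, not an existing opendev tool, and the `docker-compose` name is simply the one floated above.

```python
#!/usr/bin/env python3
"""Hypothetical `docker-compose` shim: forwards its arguments to the
docker compose v2 plugin, pointed at the system podman socket."""
import os
import sys

# Assumption: default socket path of the rootful podman.socket unit.
PODMAN_SOCKET = "unix:///run/podman/podman.sock"


def build_invocation(args: list[str],
                     environ: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return the argv and environment for `docker compose <args>`."""
    env = dict(environ)  # copy so the caller's mapping is not modified
    # An explicitly set DOCKER_HOST wins, so the shim can be overridden.
    env.setdefault("DOCKER_HOST", PODMAN_SOCKET)
    return ["docker", "compose", *args], env


def main() -> None:
    argv, env = build_invocation(sys.argv[1:], dict(os.environ))
    # Replace this process with docker so exit codes and signals pass through.
    os.execvpe(argv[0], argv, env)
```

Installed on root's PATH, with a call to `main()` at the end of the file, `docker-compose up -d` would reach podman without anyone exporting DOCKER_HOST by hand. The design choice is to keep every call site unchanged and concentrate the podman-specific setting in one place.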
19:36:37 <corvus> there's a thing called contexts... i think we might be able to use that to our advantage and have it just work
19:36:59 <corvus> (like, that becomes a permanent client configuration for root)
19:37:02 <corvus> but i haven't tested it
19:37:06 <clarkb> oh interesting
19:38:00 <corvus> ie, the config would say the current docker "context" is the one at /var/run/podman.socket
19:38:02 <corvus> so any docker command run by root would use that
19:38:12 <corvus> at least, that's my understanding based on my own imagination after reading the docs for at least 10-15 seconds
19:38:34 <clarkb> something to look at as paste (or similar) gets an updated config
19:38:42 <corvus> ++
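corvus's docker context idea, which is still untested, would look roughly like the commands below. They are generated from Python so the sketch stays self-contained. `docker context create --docker host=...` and `docker context use` are real docker CLI commands. `use` records the current context in the client config (`~/.docker/config.json` for root), and the compose v2 plugin should honor it. The context name and socket path are assumptions. Note that a DOCKER_HOST environment variable still takes precedence over the selected context.

```python
import subprocess

# Assumptions: context name and the rootful podman.socket path.
CONTEXT_NAME = "podman"
PODMAN_SOCKET = "unix:///run/podman/podman.sock"


def context_commands(name: str = CONTEXT_NAME,
                     host: str = PODMAN_SOCKET) -> list[list[str]]:
    """Commands that create a docker context for podman and make it current."""
    return [
        ["docker", "context", "create", "--docker", f"host={host}", name],
        ["docker", "context", "use", name],
    ]


def apply_commands(commands: list[list[str]]) -> None:
    """Run each command in order, stopping on the first failure.
    Intended to be run as root so the setting lands in root's client config."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Compared with a wrapper, a context is persistent client configuration. Any docker or docker compose command run by root would pick it up, with no changes to how the commands are invoked.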
19:38:53 <clarkb> anything else on this topic? I think we may end early today
19:39:06 <corvus> that's it from me
19:39:12 <clarkb> #topic Open Discussion
19:39:14 <clarkb> anything else?
19:41:25 <clarkb> sounds like that may be everything. Thank you everyone!
19:41:38 <corvus> thanks!
19:41:43 <clarkb> I won't promise we'll be back here next week as it is possible someone else will have to run the meeting
19:41:51 <clarkb> but we'll aim for that and if anything changes I'll let you know
19:41:54 <clarkb> #endmeeting