19:00:31 <clarkb> #startmeeting infra
19:00:31 <opendevmeet> Meeting started Tue Oct 1 19:00:31 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:31 <opendevmeet> The meeting name has been set to 'infra'
19:01:01 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/EOYNPX5IPPB3CCX6YA574BPPQEGSYGGH/ Our Agenda
19:02:00 <clarkb> #topic Announcements
19:02:16 <clarkb> A reminder that the CFP for the SCaLE open infra day is open until november 1
19:02:22 <clarkb> #link https://www.socallinuxexpo.org/scale/22x/events/open-infra-days CFP for Open Infra Days event at SCaLE is open until November 1
19:02:54 <clarkb> and a heads up that I've got some family stuff that I'll need to work through in the near future and will likely be afk for a few days in the near future. Unfortunately, the timing for that isn't known yet but I'll do my best to update when I do know
19:03:29 <fungi> i'm mostly around until oid-na, do what you need to do
19:04:08 <clarkb> thanks!
19:04:21 <clarkb> #topic OpenStack Release Wednesday
19:04:41 <clarkb> I wasn't sure if this deserved its own topic or a listing under announcements so it goes first and it can be both
19:04:49 <clarkb> Tomorrow openstack will be making its 2024.2 release
19:05:03 <clarkb> the process for that should start at around 10:00 UTC and end approximately 15:00 UTC
19:05:11 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/930709 will be landed prior to the release to remove a semaphore that slows things down
19:05:23 <clarkb> this change is one that fungi will land todayish I think to streamline the release process
19:05:32 <fungi> today's secret word is "slushy"
19:05:43 <clarkb> please be on the lookout for any problems that need addressing and avoid making changes that are risky for the release for the next 24 hours or so
19:06:11 <fungi> i'll be up and around starting at 10:00 utc hopefully, to make sure things go smoothly
19:06:55 <clarkb> I too will try to have an early start but not that early
19:07:17 <clarkb> #topic Rocky Package Mirror Creation
19:07:32 <clarkb> I kept this on the agenda because it helps remind me that it's a thing to pay attention to but I don't see a change for it yet
19:08:14 <clarkb> #topic Rackspace's Flex Cloud
19:08:25 <clarkb> No progress from me on figuring out swift here yet
19:08:30 <fungi> it got a rave review in the tc meeting today at least
19:08:48 <clarkb> yes people are noticing the nodes are much faster. More confirmation the smaller flavor type isn't inherently an issue
19:08:58 <clarkb> Unfortunately I haven't found time to dig into the swift stuff for this cloud yet
19:09:06 <fungi> noonedeadpunk thought some jobs had broken at first, because they completed so quickly
19:09:19 <clarkb> there have been too many distractions and doing so is relatively low on the priority list as it is all new
19:10:12 <clarkb> I'd like to say I'll definitely dig into it this week but with other stuff going on I know I can't commit to that
19:10:25 <clarkb> others should feel free to do so if they have time otherwise I'll do my best to look at it when I'm able
19:11:06 <clarkb> also corvus tracked down a zuul ci issue that was related to the lower cpu count. TL;DR is that a file leak in zuul's test suite was able to get bad enough to hit ulimits because there are fewer processes running due to fewer CPUs so more files leaked per process
19:11:21 <clarkb> something to be aware of as neutron indicated they may be seeing similar problems
19:11:58 <clarkb> #topic Updating ansible+ansible-lint versions in our repos
19:12:04 <clarkb> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/926970 is the last current open change related to this work
19:12:17 <clarkb> frickler requested that this change land after the openstack release which is totally reasonable
19:12:38 <clarkb> reviews should be safe in case there is any feedback before the release happens. Otherwise it would be great to try and land this late this week after the release
19:12:59 <clarkb> TL;DR is bump linter node to noble, run under python3.12, update tools and rules to accommodate the newer runtime
19:14:17 <clarkb> #topic Zuul-launcher image builds
19:14:26 <corvus> the image build and upload framework in zuul has merged... (full message at <https://matrix.org/oftc/media/v1/media/download/AbsFEYtLzL6EMVsMURDijbX4sw0oCDiLKWKVOxLTJDvGT5CncMMaFkgpii4wKT0M3Rv9N7YS3kIA83xGROOOBtpCeSSzjL6wAG1hdHJpeC5vcmcvYVdTTU5TSnptcVhHU011bXN0TlF0cE9u>)
19:14:36 <corvus> oh dear
19:14:42 <corvus> the image build and upload framework in zuul has merged
19:14:44 <corvus> that means if we were using aws, we could try building and uploading images today
19:14:48 <corvus> but we aren't, so, here are the current work streams to get us there:
19:14:52 <corvus> 1) corvus: working on adding openstack driver
19:14:55 <corvus> 2) clarkb: getting object storage upload ready in rax-flex to use as intermediate storage
19:14:59 <corvus> 3) tonyb: adding image build jobs for more distros
19:15:02 <corvus> 4) anyone: actually spin up new launcher
19:15:08 <corvus> [there we go]
19:15:34 <clarkb> corvus: any concern with the limited config checking that we updated to after the weekend rollout of the initial update?
19:15:39 <clarkb> I think that was related to this
19:15:53 <corvus> nah, that was only a "nice to have" to help users find issues early
19:15:57 <corvus> the real check happens at runtime
19:16:13 <corvus> incidentally, i believe the fix for that should be to just permanently remove that early check.
19:16:22 <corvus> https://review.opendev.org/930942 (for reference)
19:16:32 <clarkb> got it so from a safety/correctness perspective we're good. It is just error reporting we were trying to make friendly for people
19:16:38 <corvus> yep
19:17:35 <clarkb> anything else to add? sounds like we're still making steady progress even if I'm failing to figure out swift
19:17:55 <corvus> i'm hoping to have the openstack driver ready enough for this within a few days
19:18:14 <corvus> if we don't get rax-flex swift worked out, we could use one of our existing object stores
19:18:39 <corvus> (then switch at any time when ready)
19:18:54 <clarkb> makes sense those images should mostly be ephemeral in that container anyway.
19:19:19 <corvus> yeah, i'm expecting to give them a ttl of like 72 hours or something for early testing
19:19:52 <corvus> [that's it from me]
19:20:06 <clarkb> #topic OpenStack OpenAPI spec publishing
19:20:23 <clarkb> I wanted to followup on this to note that frickler left a comment and I tried to expand on it. No response since.
19:20:43 <clarkb> fungi: not sure if we want to try and set up time for a synchronous discussion just to get things moving forward?
19:20:55 <clarkb> probably don't need to wait for the PTG to do that, though frickler is out until about then unfortunately
19:22:11 <clarkb> #link https://review.opendev.org/921934
19:22:13 <clarkb> is the change in question
19:22:19 <fungi> yeah, it doesn't seem urgent, the change was opened in... may?
19:22:38 <clarkb> that's part of my concern. Yes probably not urgent but also we've probably ignored it for long enough
19:23:52 <clarkb> I'd be happy to try and sit in on some more focused conversation around this to find a conclusion. Though as noted earlier my availability may be limited. I think I've written down my concerns well enough that you or others could convey them successfully though
19:25:11 <clarkb> we don't have to solve that now though. Just wanted to throw that idea out there
19:25:16 <clarkb> #topic Upgrading old servers
19:25:31 <clarkb> I don't see new updates on the mediawiki stack since I last reviewed it
19:26:18 <clarkb> I know it is super early in australia so don't expect tonyb is here right now. I believe that should get better after we both DST switch and it will be an hour later for tonyb
19:26:34 <clarkb> anything else related to booting new servers / server upgrades?
19:28:00 <clarkb> sounds like no, but our next topic does overlap a bit
19:28:06 <clarkb> #topic Docker compose plugin with podman service for servers
19:28:15 <clarkb> #link https://review.opendev.org/923084 is a demo (in the Zuul repo) of using docker compose v2 plugin with system podman service
19:28:16 <corvus> #link https://review.opendev.org/923084 a demo (in the Zuul repo) of using docker compose v2 plugin with system podman service
19:28:28 <corvus> jinx (sorry!)
19:28:33 <clarkb> heh no problem
19:28:39 <clarkb> take it away
19:28:44 <corvus> This would let us host opendev images on quay.io (or any non-dockerhub site) and use speculative images
19:28:50 <corvus> Some caveats (seen in the change):
19:28:56 <corvus> unconfined apparmor profile to work around https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2040483
19:29:02 <corvus> buildx startup probably not an issue since we don't use docker compose image builds: https://github.com/docker/buildx/issues/344
19:29:08 <corvus> need to set DOCKER_HOST env variable (system-wide bashrc? maybe a docker context? other options?)
19:29:24 <corvus> that looks pretty workable for us
19:29:49 <corvus> i don't think any of those 3 things are huge blockers -- and the apparmor confinement thing should work itself out eventually
19:29:49 <clarkb> ya the other caveat is that we probably can't reliably do this until noble? (maybe jammy?) just because podman installation on debuntu before then is tricky
19:29:58 <corvus> oh yeah that too :)
19:30:15 <clarkb> but ya that list of issues seems workable
19:30:21 <fungi> i think it was noble we needed
19:30:47 <corvus> so i think my questions for the group would be: 1) any [other] technical blockers? 2) does it sound like a good idea/ something we want to do?
19:30:55 <corvus> i mean, it's still podman, and we've had surprises there.
19:31:09 <clarkb> ya I think the main unknown is just how podman will continue to work over time
19:31:21 <corvus> but podman running as a systemd service should be the least problematic podman.
19:31:24 <clarkb> but the goal of hosting our images on quay instead of docker hub is still worthwhile I think
19:31:47 <clarkb> maybe a good next step here is picking a relatively self contained service that we can update to noble then convert it over?
19:31:53 <clarkb> something like paste?
19:32:07 <corvus> sounds like a good idea to me
19:32:08 <clarkb> simple enough to be doable relatively quickly but close enough to everything else to be illustrative
19:32:53 <clarkb> and if that doesn't expose any new major issues we can proceed to swap everything else over? Probably as part of server upgrades?
19:33:09 <corvus> ++
19:33:29 <clarkb> side note: you can apparently use ipv6 address literals with docker ce now. But I think podman's insistence that they emulate docker bugs means they haven't quite done the same yet? Though there are some shared libs so maybe it just works there too
19:33:45 <clarkb> as the sort of potential problems we might run into due to using different tools think ^ as an example
19:34:34 <corvus> good point
19:34:37 <fungi> bug-compatible with old docker releases
19:34:55 <clarkb> thank you for digging into this. The composability of these tools is a really neat feature and I think also reduces potential risk for making changes like this (as we should be able to rollback in theory (with some cost))
19:34:57 <corvus> (and testing that particular issue is challenging in our environment currently)
19:35:36 <clarkb> as far as setting the env var maybe we use a wrapper tool
19:35:46 <clarkb> then as long as we consistently use the wrapper we don't have to think about it
19:36:00 <clarkb> we could even call it `docker-compose` >_>
19:36:16 <fungi> has a nice ring to it
19:36:37 <corvus> there's a thing called contexts... i think we might be able to use that to our advantage and have it just work
19:36:59 <corvus> (like, that becomes a permanent client configuration for root)
19:37:02 <corvus> but i haven't tested it
19:37:06 <clarkb> oh interesting
19:38:00 <corvus> ie, the config would say the current docker "context" is the one at /var/run/podman.socket
19:38:02 <corvus> so any docker command run by root would use that
19:38:12 <corvus> at least, that's my understanding based on my own imagination after reading the docs for at least 10-15 seconds
19:38:34 <clarkb> something to look at as paste (or similar) gets an updated config
19:38:42 <corvus> ++
19:38:53 <clarkb> anything else on this topic? I think we may end early today
19:39:06 <corvus> that's it from me
19:39:12 <clarkb> #topic Open Discussion
19:39:14 <clarkb> anything else?
19:41:25 <clarkb> sounds like that may be everything. Thank you everyone!
19:41:38 <corvus> thanks!
19:41:43 <clarkb> I won't promise we'll be back here next week as it is possible someone else will have to run the meeting
19:41:51 <clarkb> but we'll aim for that and if anything changes I'll let you know
19:41:54 <clarkb> #endmeeting
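
Note on the wrapper idea raised under "Docker compose plugin with podman service for servers": the suggestion was a small wrapper that sets DOCKER_HOST so nobody has to remember it. A minimal sketch of what that could look like follows. It is not something agreed in the meeting. The socket path is an assumption (the path quoted in the log was explicitly untested), and the function names are hypothetical.

```python
import os
import subprocess

# Assumption: location of the system podman service socket. Adjust per host;
# the meeting did not confirm a path.
PODMAN_SOCKET = "unix:///run/podman/podman.sock"


def compose_env(base_env=None, socket=PODMAN_SOCKET):
    """Return a copy of the environment with DOCKER_HOST pointing at podman.

    An already-set DOCKER_HOST is left alone so operators can override it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("DOCKER_HOST", socket)
    return env


def compose_command(args):
    """Build the docker compose v2 plugin invocation for the given arguments."""
    return ["docker", "compose", *args]


def main(argv):
    """Run the compose plugin against podman and return its exit code."""
    return subprocess.call(compose_command(argv), env=compose_env())
```

Installed as a script that ends with `sys.exit(main(sys.argv[1:]))`, this would behave like the `docker-compose` wrapper joked about above. The docker context approach corvus mentions might make a wrapper unnecessary, but that was untested at the time of the meeting.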