19:00:06 <clarkb> #startmeeting infra
19:00:06 <opendevmeet> Meeting started Tue Apr  1 19:00:06 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:06 <opendevmeet> The meeting name has been set to 'infra'
19:00:17 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/5PQX3P4NIXU6FRRQRWPTQSZNICSJJVFF/ Our Agenda
19:00:20 <clarkb> #topic Announcements
19:00:33 <clarkb> OpenStack is going to release its Epoxy 2025.1 release tomorrow
19:00:42 <clarkb> keep that in mind as we make changes over the next 24 hours or so
19:01:32 <clarkb> then the virtual PTG is being hosted next week (April 7 - 11) with meetpad being the default location for teams (they can choose to override the location if they wish)
19:01:39 <frickler> in particular hold back on https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/941246 until the release is done, please
19:02:00 <clarkb> earlier today fungi and I tested meetpad functionality and it seems to be working so I went ahead and put meetpad02 and jvb02 in the emergency file
19:02:12 <frickler> what about etherpad?
19:02:25 <clarkb> this way new container images from upstream won't unexpectedly break us mid PTG. We can remove the hosts from the emergency file Friday afternoon
19:02:34 <clarkb> frickler: etherpad uses images we build so shouldn't get auto updated
19:03:01 <frickler> ah, right
19:03:12 <fungi> (but please don't push and approve any etherpad image update changes)
19:04:04 <clarkb> ++
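For context, the emergency file is an Ansible inventory on bridge that excludes listed hosts from the periodic deployment runs, which is what keeps new upstream jitsi images from landing mid-PTG. A minimal sketch of what the meetpad entries might look like, assuming the usual disabled-group format (the path and group name here are from memory, so double check against bridge):

```yaml
# /etc/ansible/hosts/emergency.yaml (assumed path)
disabled:
  hosts:
    meetpad02.opendev.org: null
    jvb02.opendev.org: null
```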
19:04:19 <clarkb> anything else to announce?
19:04:39 <frickler> do we want to skip the meeting next week?
19:05:23 <clarkb> good question. I'm happy to host it if we like but also happy to skip if people think they will be too busy with the ptg
19:05:38 <clarkb> I have intentionally avoided scheduling opendev ptg time as we do tend to be too busy for that
19:05:58 * frickler wouldn't mind skipping. also +1 to the latter
19:06:11 <fungi> yeah, the meeting technically doesn't conflict with the ptg schedule since it's not during a ptg timeslot, but i'd be okay with skipping
19:06:25 <clarkb> lets say we'll skip then and if something important comes up I can send out an agenda and reschedule it
19:06:30 <fungi> even if just to have one fewer obligation next week
19:06:31 <clarkb> but for now we'll say there is no meeting next week
19:06:38 <frickler> I also plan to skip the meeting more regularly during the summer, but that's not set in stone yet
19:07:08 <clarkb> thanks for the heads up
19:07:39 <clarkb> #topic Zuul-launcher image builds
19:07:54 <clarkb> I think zuul has been able to dogfood zuul-launcher images and nodes a fair bit recently which is neat
19:08:11 <corvus> yeah i think the launcher is performing sufficiently well that we can expand its use
19:08:16 <corvus> i think we can switch the zuul tenant over to using it exclusively
19:08:21 <corvus> should we think about switching the opendev tenant too?
19:08:28 <clarkb> we do still need more image builds to be pushed up to https://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/image-build-jobs.yaml
19:08:49 <clarkb> corvus: no objections to switching opendev but that may need more images. Probably good motivation to get ^ done
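For anyone picking this up, adding an image means defining a build job in the linked file following the existing entries there. A hypothetical sketch only, since the parent job and variable names below are illustrative rather than copied from the repo:

```yaml
# Hypothetical sketch; mirror the real pattern in
# zuul.d/image-build-jobs.yaml rather than these names.
- job:
    name: opendev-build-diskimage-debian-trixie
    parent: opendev-build-diskimage-base  # assumed parent job name
    description: Build a debian-trixie diskimage.
    vars:
      dib_distro: debian   # illustrative dib variables
      dib_release: trixie
```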
19:09:04 <frickler> +1 to opendev
19:09:16 <clarkb> that is something I may be able to look at later this week or during the ptg next week depending on how gerrit things go
19:09:22 <fungi> i'm in favor
19:10:05 <corvus> sounds good, i'll make changes to switch the node usage over
19:10:29 <clarkb> corvus: are there any major known missing pieces to the system at this point?
19:10:29 <corvus> would be great for not-corvus to add more image jobs
19:10:42 <clarkb> or are we in the unknown unknowns right now and so more use is really what we need?
19:11:06 <corvus> i will also add the image build jobs to the periodic pipeline to make sure they're rebuilt frequently
19:11:22 <corvus> i think we need to add statsd
19:11:31 <corvus> i don't think the launcher emits any useful stats now
19:11:37 <clarkb> oh ya that would be good
19:12:09 <corvus> but other than that, as far as the sort of main-line functionality, i think it's generally there, and we're in the unknown-unknowns phase
19:12:44 <frickler> does autohold work the same as with nodepool?
19:12:45 <clarkb> sounds like we have a rough plan for next steps then. statsd, more images, periodic builds, switch opendev jobs over
19:13:00 <fungi> one question springs to mind: how do we go about building different images at different frequencies now? dedicated pipelines?
19:13:11 <corvus> frickler: probably not, that may be a missing piece
19:13:17 <corvus> fungi: yep
19:13:27 <frickler> fungi: maybe doing some in periodic-weekly would be good enough?
19:13:34 <fungi> yeah, i think so
19:13:43 <fungi> no need to over-complicate it
19:13:53 <clarkb> ++ to daily and weekly
19:14:09 <fungi> we always have the option of adding complexity later if we get bored and hate ourselves enough
19:14:48 <fungi> [masochist sysadmin stereotype]
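To make that concrete, the per-frequency split would just be the image build jobs attached to different pipelines in the project stanza; a minimal sketch with hypothetical job names:

```yaml
- project:
    periodic:
      jobs:
        - opendev-build-diskimage-ubuntu-noble  # hypothetical job names
    periodic-weekly:
      jobs:
        - opendev-build-diskimage-gentoo
```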
19:14:58 <corvus> if we switch opendev over, and autohold isn't working and we need it, we can always dynamically switch back
19:15:18 <frickler> so the switch has to be per tenant or can we revert per repo if needed?
19:15:25 <clarkb> it is per job I think
19:15:30 <corvus> basically just saying: even if we switch the tenant over, we can always change back one project/job/change even.
19:15:32 <clarkb> so should be very flexible if we need an autohold
19:15:32 <corvus> yeah per job
19:15:41 <frickler> ah, cool
19:15:51 <clarkb> I think that works as a fallback
19:15:51 <fungi> the power of zuul
19:16:15 <clarkb> anything else on this subject?
19:16:22 <corvus> new labels will be like "niz-ubuntu-noble-8gb" and if you need to switch back to nodepool, just change it to "ubuntu-noble"
19:16:25 <corvus> that's it from me
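In other words the fallback is just a label edit on the job's nodeset; a minimal sketch using corvus's example labels (the job name is hypothetical):

```yaml
- job:
    name: example-job  # hypothetical
    nodeset:
      nodes:
        - name: controller
          label: niz-ubuntu-noble-8gb  # zuul-launcher label
          # label: ubuntu-noble        # swap back for nodepool + autohold
```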
19:16:34 <frickler> not directly related, but we still have held noble builds
19:16:49 <frickler> not sure if zuul uses neutron in any tests?
19:17:00 <clarkb> there are nodepool jobs that test against a real openstack
19:17:05 <clarkb> so those might be affected if they run on noble
19:17:35 <frickler> those likely will be broken until ubuntu publishes a fixed kernel, which is planned for the week of the 14th
19:17:46 <clarkb> ack
19:17:58 <clarkb> fwiw the noble nodes I booted to replace old servers have ip6tables rules that look correct to me
19:18:00 <frickler> #link https://bugs.launchpad.net/neutron/+bug/2104134 for reference
19:18:08 <clarkb> so the brokenness must be in a very specific part of the ipv6 firewall handling
19:18:19 <frickler> yes, it is only a special ip6tables module that is missing
19:18:34 <clarkb> ack
19:18:36 <clarkb> #topic Container hygiene tasks
19:18:43 <clarkb> #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12
19:18:58 <clarkb> we updated matrix-eavesdrop and accessbot last week to python3.12
19:19:05 <clarkb> accessbot broke on an ssl thing that fungi fixed up
19:19:14 <fungi> it happens
19:19:24 <clarkb> otherwise things are happy. I think this effort will be on hiatus this week while we wait for the openstack release and I focus on other things
19:19:30 <fungi> just glad to be fixing things for a change rather than breaking them
19:19:50 <clarkb> But so far no major issues with python3.12
19:20:37 <clarkb> In related news I did try to test zuul with python3.13 to see if we can skip 3.12 for zuul entirely. Unfortunately, zuul relies on google's re2 python package which doesn't have 3.13 wheels yet and they require libabsl and pybind11 versions that are too new even for noble
19:20:53 <clarkb> if we really want to we can use bazel to build packages for us which should fetch all the deps and do the right thing
19:21:10 <clarkb> but for now I'm hopeful upstream pushes new wheels (there is a change proposed to add 3.13 wheel builds already)
19:21:48 <clarkb> #topic Booting a new Gerrit server
19:22:17 <clarkb> high on my todo list for early April is building a new gerrit server so that we can change the production server over to a newer os version
19:22:45 <clarkb> the rough plan I've got in mind is boot the new server end of this week or early next week. Then late the week after, do the production cutover (~April 17/18)
19:23:21 <clarkb> but the first step in doing that is deciding where to boot it. Each of the available options has downsides and upsides. I personally think the best option is to stay where we are and use a non boot from volume v3 flavor in vexxhost ymq
19:23:47 <clarkb> the reason for that is the most stable gerrit has ever been for us has been running on the large flavor (particularly with extra memory) in vexxhost and we don't have access to large nodes like that elsewhere
19:24:00 <clarkb> the downside to this location is ipv6 connectivity has been flaky for some isps in europe
19:24:57 <clarkb> alternatives would be rackspace classic, the main drawbacks are that we'd probably have to redeploy to rax flex sooner than later (or back to vexxhost) and flavors are smaller aiui. Or ovh. The downside to ovh is their billing is weird and sometimes things go away unexpectedly
19:25:23 <clarkb> all that to say that my vote is for vexxhost ymq and if I don't hear strong objections or suggestions otherwise I'll probably boot a review03 there within the next week
19:25:44 <fungi> yeah, part of me feels wasteful because we're probably using a flavor twice the size of what we could get away with, but also we haven't been asked to scale it down and this is the core of all our project workflows
19:26:09 <clarkb> the day to day size is smaller than we need but whenever we have to do offline reindexing we definitely benefit from the memory and cpus
19:26:09 <fungi> so anything to help with stability makes sense
19:26:27 <clarkb> so bigger is good for upgrades/downgrades and unexpected spikes in demand
19:26:47 <fungi> as for doing it in other providers, i'm not sure we have any with similarly large flavors on offer
19:26:59 <frickler> I agree the current alternatives don't look too good, so better live with the IPv6 issues and periodically nag mnaser ;=D
19:27:27 <clarkb> once the server is up (wherever it goes) the next step will be to put the server in the inventory safely (no replication config) and sync data safely from review02 (again don't copy the replication config)
19:27:29 <frickler> raxflex would be nice if it finally had IPv6
19:27:46 <fungi> agreed, for a lot of our control plane in fact
19:27:47 <clarkb> that should allow us to check everything is working before scheduling a downtime and doing a final sync over on ~April 17-18
19:28:13 <frickler> not sure if it would be an option to wait for that with gerrit? likely not with no schedule for it yet
19:28:35 <clarkb> I don't think we should wait
19:28:59 <clarkb> there is never a great time to make changes to Gerrit so we just have to pick less bad times and roll with it
19:30:06 <clarkb> we don't have to make any hard decisions right this moment. I probably won't get to this until thursday or friday at the earliest. Chew on it and let me know if you have objections or other ideas
19:30:13 <frickler> fair enough
19:30:17 <clarkb> I also wanted to note that Gerrit just made a 3.10.5 release
19:30:33 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/946050 Gerrit 3.10.5 image update
19:30:48 <clarkb> the release notes don't look urgent to me so I'm fine holding off on this until after the openstack release
19:30:53 <clarkb> but I think late this week we should try to sneak this in too
19:32:05 <clarkb> just a heads up that is on my todo list. Don't want anyone to be surprised by a gerrit update particularly during release week
19:32:12 <clarkb> #topic Upgrading old servers
19:32:56 <clarkb> Since our last meeting I ended up replacing the rax.iad and rax.dfw servers. rax.iad was to update the base os and rax.dfw was to get rescheduled to avoid network bandwidth issues
19:33:00 <clarkb> both are running noble now
19:33:30 <clarkb> haven't seen any complaints and in the case of dfw we made the region useable again in the process
19:34:03 <fungi> yeah, that was a good impulse
19:34:41 <fungi> seems like there was either something happening with the hypervisor host the old one was running on, or some arbitrary rate limit applied to the server instance's interface
19:35:01 <clarkb> for other servers in the pipeline this is the "easy" list: refstack, mirror-update, eavesdrop, zuul schedulers, and zookeeper servers
19:35:20 <clarkb> I'd appreciate any help others can offer on this especially as I'm going to shift my focus on gerrit for the next bit
19:35:24 <fungi> we haven't heard back from rackspace folks on any underlying cause yet, that i've seen
19:35:33 <clarkb> fungi: right I haven't seen any root cause
19:35:39 <clarkb> but we left the old server up so they could continue to debug
19:35:43 <clarkb> cleaning it up will happen later
19:36:56 <clarkb> Oh also refstack may be the lowest priority as I'm not sure that there is anyone maintaining the software anymore
19:37:02 <clarkb> but the others are all valid I think
19:37:34 <clarkb> and there are more on the hard list (gerrit is on that list too)
19:37:36 <fungi> also we got confirmation from the foundation staff that they no longer rely on it for anything
19:37:44 <fungi> (refstack i mean)
19:38:29 <frickler> so announce deprecation and shut off in a year instead of migrating?
19:38:36 <clarkb> frickler: ya or even sooner
19:38:45 <clarkb> that rough plan makes sense to me.
19:39:01 <frickler> sure, I just wanted to avoid a "that's too fast" response ;)
19:39:25 <clarkb> but ya my next focus on this topic is gerrit. Would be great if others can fill in some of the gaps around that
19:39:34 <clarkb> and let me know if there are questions or concerns generally on the process
19:39:39 <clarkb> #topic Running certcheck on bridge
19:39:47 <clarkb> this is on the agenda mostly so I don't forget to look at it
19:39:57 <clarkb> I don't have any updates and don't think anyone else does so probably not much to say and we can continue
19:40:04 <clarkb> but I'll wait for a minute in case I'm wrong about that
19:41:31 <clarkb> #topic Working through our TODO list
19:42:03 <clarkb> If what we've discussed above doesn't inspire you or fill your todo list full of activities, we do have a separate larger and broader list you can look at for inspiration
19:42:09 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:42:25 <clarkb> much of what we discuss week to week falls under this list but I just like to remind people we've got even more to dive into if there is interest
19:42:33 <clarkb> this applies to existing and new contributors alike
19:42:46 <clarkb> and feel free to reach out to me with questions if you have them about anything we're doing or have on that todo list
19:42:52 <clarkb> #topic Rotating mailman 3 logs
19:43:14 <clarkb> fungi: do we have any changes for this yet? speaking of autoholds, this might be a good use case for autoholds to test whether copytruncate is useable
19:43:23 <fungi> no, sorry, not yet
19:44:18 <clarkb> ack, it's been busy lately and I think this week is no different. but might be good to throw something up and get it held so that we can observe the behavior over time
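For reference, the thing to hold and observe would be a change installing a copytruncate logrotate config; a rough sketch as an Ansible task, assuming a hypothetical log path since the real mailman3 bind-mount layout should be checked first:

```yaml
- name: Install copytruncate logrotate config for mailman (sketch)
  ansible.builtin.copy:
    dest: /etc/logrotate.d/mailman3  # assumed destination
    content: |
      # Log path below is illustrative, not the real mailman3 layout.
      /var/lib/mailman/web-data/logs/*.log {
          weekly
          rotate 4
          compress
          missingok
          copytruncate
      }
```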
19:45:26 <clarkb> #topic Open Discussion
19:45:53 <clarkb> among everything else this week I'm juggling some family stuff and I'll be out Monday (so missing day 1 of the ptg)
19:46:41 <clarkb> as a perfect example I'm doing the school run this afternoon so will be out for a bit in 1.5 hours
19:46:42 <frickler> ah, that reminds me to drop the osc-placement autohold ;)
19:46:45 <fungi> i'll be around, happy to keep an eye on things when i'm not leading ptg sessions
19:47:21 <fungi> also for tomorrow's openstack release i'm going to try to be online by 10:00 utc (6am local for me) in case anything goes sideways
19:47:39 <clarkb> I won't be awake that early but when I do wake I'll check in on things and can help out if necessary too
19:48:19 * tonyb will be around for the release also
19:49:17 <fungi> much appreciated
19:50:07 <clarkb> anything else?
19:50:19 <clarkb> hopefully everyone is able to celebrate the release tomorrow
19:51:06 <clarkb> thank you everyone for your time and effort and help.
19:51:28 <fungi> i'll be hosting an openinfra.live episode on thursday where openstack community leaders will talk about new features and changes in various components
19:51:32 <clarkb> As mentioned at the beginning of the meeting we will skip next week's meeting unless something important comes up. That way you can all enjoy the PTG and not burn out on meetings as quickly
19:52:08 <tonyb> Nice .... more sleep for me :)
19:52:23 <clarkb> that too :)
19:52:36 <clarkb> #endmeeting