19:00:06 <clarkb> #startmeeting infra
19:00:06 <opendevmeet> Meeting started Tue Apr 1 19:00:06 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:06 <opendevmeet> The meeting name has been set to 'infra'
19:00:17 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/5PQX3P4NIXU6FRRQRWPTQSZNICSJJVFF/ Our Agenda
19:00:20 <clarkb> #topic Announcements
19:00:33 <clarkb> OpenStack is going to release its Epoxy 2025.1 release tomorrow
19:00:42 <clarkb> keep that in mind as we make changes over the next 24 hours or so
19:01:32 <clarkb> then the virtual PTG is being hosted next week (April 7 - 11) with meetpad being the default location for teams (they can choose to override the location if they wish)
19:01:39 <frickler> in particular hold back on https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/941246 until the release is done, please
19:02:00 <clarkb> earlier today fungi and I tested meetpad functionality and it seems to be working so I went ahead and put meetpad02 and jvb02 in the emergency file
19:02:12 <frickler> what about etherpad?
19:02:25 <clarkb> this way new container images from upstream won't unexpectedly break us mid PTG. We can remove the hosts from the emergency file Friday afternoon
19:02:34 <clarkb> frickler: etherpad uses images we build so shouldn't get auto updated
19:03:01 <frickler> ah, right
19:03:12 <fungi> (but please don't push and approve any etherpad image update changes)
19:04:04 <clarkb> ++
19:04:19 <clarkb> anything else to announce?
19:04:39 <frickler> do we want to skip the meeting next week?
19:05:23 <clarkb> good question. I'm happy to host it if we like but also happy to skip if people think they will be too busy with the ptg
19:05:38 <clarkb> I have intentionally avoided scheduling opendev ptg time as we do tend to be too busy for that
19:05:58 * frickler wouldn't mind skipping. also +1 to the latter
19:06:11 <fungi> yeah, the meeting technically doesn't conflict with the ptg schedule since it's not during a ptg timeslot, but i'd be okay with skipping
19:06:25 <clarkb> let's say we'll skip then and if something important comes up I can send out an agenda and reschedule it
19:06:30 <fungi> even if just to have one fewer obligation next week
19:06:31 <clarkb> but for now we'll say there is no meeting next week
19:06:38 <frickler> I also plan to more regularly skip the meeting during the summer time, but that's not set in stone yet
19:07:08 <clarkb> thanks for the heads up
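For context on the emergency file clarkb mentions above: it is an extra Ansible inventory on bridge whose hosts the periodic deployment runs skip, so upstream image updates can't be deployed to those servers until they are removed again. A minimal sketch of what the entries might look like; the path and group name here are assumptions, not taken from the log:

  # sketch of the emergency inventory on bridge
  # (path, e.g. /etc/ansible/hosts/emergency.yaml, and group name are assumptions)
  emergency:
    hosts:
      meetpad02.opendev.org:
      jvb02.opendev.org: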
19:07:39 <clarkb> #topic Zuul-launcher image builds
19:07:54 <clarkb> I think zuul has been able to dogfood zuul-launcher images and nodes a fair bit recently which is neat
19:08:11 <corvus> yeah i think the launcher is performing sufficiently well that we can expand its use
19:08:16 <corvus> i think we can switch the zuul tenant over to using it exclusively
19:08:21 <corvus> should we think about switching the opendev tenant too?
19:08:28 <clarkb> we do still need more image builds to be pushed up to https://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/image-build-jobs.yaml
19:08:49 <clarkb> corvus: no objections to switching opendev but that may need more images. Probably good motivation to get ^ done
19:09:04 <frickler> +1 to opendev
19:09:16 <clarkb> that is something I may be able to look at later this week or during the ptg next week depending on how gerrit things go
19:09:22 <fungi> i'm in favor
19:10:05 <corvus> sounds good, i'll make changes to switch the node usage over
19:10:29 <clarkb> corvus: are there any major known missing pieces to the system at this point?
19:10:29 <corvus> would be great for not-corvus to add more image jobs
19:10:42 <clarkb> or are we in the unknown unknowns right now and so more use is really what we need?
19:11:06 <corvus> i will also add the image build jobs to the periodic pipeline to make sure they're rebuilt frequently
19:11:22 <corvus> i think we need to add statsd
19:11:31 <corvus> i don't think the launcher emits any useful stats now
19:11:37 <clarkb> oh ya that would be good
19:12:09 <corvus> but other than that, as far as the sort of main-line functionality, i think it's generally there, and we're in the unknown-unknowns phase
19:12:44 <frickler> does autohold work the same as with nodepool?
19:12:45 <clarkb> sounds like we have a rough plan for next steps then. statsd, more images, periodic builds, switch opendev jobs over
19:13:00 <fungi> one question springs to mind: how do we go about building different images at different frequencies now? dedicated pipelines?
19:13:11 <corvus> frickler: probably not, that may be a missing piece
19:13:17 <corvus> fungi: yep
19:13:27 <frickler> fungi: maybe doing some in periodic-weekly would be good enough?
19:13:34 <fungi> yeah, i think so
19:13:43 <fungi> no need to over-complicate it
19:13:53 <clarkb> ++ to daily and weekly
19:14:09 <fungi> we always have the option of adding complexity later if we get bored and hate ourselves enough
19:14:48 <fungi> [masochist sysadmin stereotype]
19:14:58 <corvus> if we switch opendev over, and autohold isn't working and we need it, we can always dynamically switch back
19:15:18 <frickler> so the switch has to be per tenant or can we revert per repo if needed?
19:15:25 <clarkb> it is per job I think
19:15:30 <corvus> basically just saying: even if we switch the tenant over, we can always change back one project/job/change even.
19:15:32 <clarkb> so should be very flexible if we need an autohold
19:15:32 <corvus> yeah per job
19:15:41 <frickler> ah, cool
19:15:51 <clarkb> I think that works as a fallback
19:15:51 <fungi> the power of zuul
19:16:15 <clarkb> anything else on this subject?
19:16:22 <corvus> new labels will be like "niz-ubuntu-noble-8gb" and if you need to switch back to nodepool, just change it to "ubuntu-noble"
19:16:25 <corvus> that's it from me
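To make the two config-side follow-ups above concrete, here is a rough Zuul YAML sketch of attaching image build jobs to the periodic pipelines and of the per-job label switch corvus describes. The job and node names are invented for illustration; only the labels and pipeline names come from the discussion:

  # hypothetical project stanza: rebuild images daily/weekly
  - project:
      periodic:
        jobs:
          - build-ubuntu-noble-image      # illustrative job name
      periodic-weekly:
        jobs:
          - build-rarely-used-image       # illustrative job name

  # hypothetical job: changing this label back to "ubuntu-noble" returns
  # just this one job to nodepool, e.g. if an autohold is needed
  - job:
      name: example-job
      nodeset:
        nodes:
          - name: ubuntu-noble
            label: niz-ubuntu-noble-8gb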
19:16:34 <frickler> not directly related, but we still have held noble builds
19:16:49 <frickler> not sure if zuul uses neutron in any tests?
19:17:00 <clarkb> there are nodepool jobs that test against a real openstack
19:17:05 <clarkb> so those might be affected if they run on noble
19:17:35 <frickler> those likely will be broken until ubuntu publishes a fixed kernel, which is planned for the week of the 14th
19:17:46 <clarkb> ack
19:17:58 <clarkb> fwiw the noble nodes I booted to replace old servers have ip6tables rules that look correct to me
19:18:00 <frickler> #link https://bugs.launchpad.net/neutron/+bug/2104134 for reference
19:18:08 <clarkb> so the brokenness must be in a very specific part of the ipv6 firewall handling
19:18:19 <frickler> yes, it is only a special ip6tables module that is missing
19:18:34 <clarkb> ack
19:18:36 <clarkb> #topic Container hygiene tasks
19:18:43 <clarkb> #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12
19:18:58 <clarkb> we updated matrix-eavesdrop and accessbot last week to python3.12
19:19:05 <clarkb> accessbot broke on an ssl thing that fungi fixed up
19:19:14 <fungi> it happens
19:19:24 <clarkb> otherwise things are happy. I think this effort will be on hiatus this week while we wait for the openstack release and I focus on other things
19:19:30 <fungi> just glad to be fixing things for a change rather than breaking them
19:19:50 <clarkb> But so far no major issues with python3.12
19:20:37 <clarkb> In related news I did try to test zuul with python3.13 to see if we can skip 3.12 for zuul entirely. Unfortunately, zuul relies on google's re2 python package which doesn't have 3.13 wheels yet and they require libabsl and pybind11 versions that are too new even for noble
19:20:53 <clarkb> if we really want to we can use bazel to build packages for us which should fetch all the deps and do the right thing
19:21:10 <clarkb> but for now I'm hopeful upstream pushes new wheels (there is a change proposed to add 3.13 wheel builds already)
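As a side note, the wheel situation clarkb describes is easy to re-check once upstream merges the 3.13 wheel builds. A sketch, assuming the package in question is google-re2 on PyPI (which matches the description in the log); this fails today because pip is told not to build from source and no cp313 wheel exists yet:

  # succeeds only once google-re2 publishes python 3.13 wheels
  pip download google-re2 --only-binary :all: --python-version 3.13 -d /tmp/wheels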
19:21:48 <clarkb> #topic Booting a new Gerrit server
19:22:17 <clarkb> high on my todo list for early april is building a new gerrit server so that we can change the production server over to a newer os version
19:22:45 <clarkb> the rough plan I've got in mind is boot the new server end of this week, early next week. Then late the week after do the production cutover (~April 17/18)
19:23:21 <clarkb> but the first step in doing that is deciding where to boot it. Each of the available options has downsides and upsides. I personally think the best option is to stay where we are and use a non boot from volume v3 flavor in vexxhost ymq
19:23:47 <clarkb> the reason for that is the most stable gerrit has ever been for us has been running on the large flavor (particularly with extra memory) in vexxhost and we don't have access to large nodes like that elsewhere
19:24:00 <clarkb> the downside to this location is ipv6 connectivity has been flaky for some isps in europe
19:24:57 <clarkb> alternatives would be rackspace classic, the main drawbacks are that we'd probably have to redeploy to rax flex sooner than later (or back to vexxhost) and flavors are smaller aiui. Or ovh. The downside to ovh is their billing is weird and sometimes things go away unexpectedly
19:25:23 <clarkb> all that to say that my vote is for vexxhost ymq and if I don't hear strong objections or suggestions otherwise I'll probably boot a review03 there within the next week
19:25:44 <fungi> yeah, part of me feels wasteful because we're probably using a flavor twice the size of what we could get away with, but also we haven't been asked to scale it down and this is the core of all our project workflows
19:26:09 <clarkb> the day to day size is smaller than we need but whenever we have to do offline reindexing we definitely benefit from the memory and cpus
19:26:09 <fungi> so anything to help with stability makes sense
19:26:27 <clarkb> so bigger is good for upgrades/downgrades and unexpected spikes in demand
19:26:47 <fungi> as for doing it in other providers, i'm not sure we have any with similarly large flavors on offer
19:26:59 <frickler> I agree the current alternatives don't look too good, so better live with the IPv6 issues and periodically nag mnaser ;=D
19:27:27 <clarkb> once the server is up (wherever it goes) the next step will be to put the server in the inventory safely (no replication config) and sync data safely from review02 (again don't copy the replication config)
19:27:29 <frickler> raxflex would be nice if it finally had IPv6
19:27:46 <fungi> agreed, for a lot of our control plane in fact
19:27:47 <clarkb> that should allow us to check everything is working before scheduling a downtime and doing a final sync over on ~April 17-18
19:28:13 <frickler> not sure if it would be an option to wait for that with gerrit? likely not with no schedule for it yet
19:28:35 <clarkb> I don't think we should wait
19:28:59 <clarkb> there is never a great time to make changes to Gerrit so we just have to pick less bad times and roll with it
19:30:06 <clarkb> we don't have to make any hard decisions right this moment. I probably won't get to this until thursday or friday at the earliest. Chew on it and let me know if you have objections or other ideas
19:30:13 <frickler> fair enough
19:30:17 <clarkb> I also wanted to note that Gerrit just made a 3.10.5 release
19:30:33 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/946050 Gerrit 3.10.5 image update
19:30:48 <clarkb> the release notes don't look urgent to me so I'm fine holding off on this until after the openstack release
19:30:53 <clarkb> but I think late this week we should try to sneak this in too
19:32:05 <clarkb> just a heads up that is on my todo list. Don't want anyone to be surprised by a gerrit update particularly during release week
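A rough sketch of what the "sync safely" step above could look like. The host names match the log but the paths are assumptions; the key point is that the replication config never lands on the new server before cutover, so it cannot start pushing refs to the replicas prematurely:

  # pre-seed review03 from review02, excluding any replication.config
  # (gerrit home path is an assumption, not taken from the log)
  rsync -a --exclude 'replication.config' \
      review02.opendev.org:/home/gerrit2/ /home/gerrit2/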
19:32:12 <clarkb> #topic Upgrading old servers
19:32:56 <clarkb> Since our last meeting I ended up replacing the rax.iad and rax.dfw servers. rax.iad was to update the base os and rax.dfw was to get rescheduled to avoid network bandwidth issues
19:33:00 <clarkb> both are running noble now
19:33:30 <clarkb> haven't seen any complaints and in the case of dfw we made the region usable again in the process
19:34:03 <fungi> yeah, that was a good impulse
19:34:41 <fungi> seems like there was either something happening with the hypervisor host the old one was running on, or some arbitrary rate limit applied to the server instance's interface
19:35:01 <clarkb> for other servers in the pipeline this is the "easy" list: refstack, mirror-update, eavesdrop, zuul schedulers, and zookeeper servers
19:35:20 <clarkb> I'd appreciate any help others can offer on this especially as I'm going to shift my focus to gerrit for the next bit
19:35:24 <fungi> we haven't heard back from rackspace folks on any underlying cause yet, that i've seen
19:35:33 <clarkb> fungi: right I haven't seen any root cause
19:35:39 <clarkb> but we left the old server up so they could continue to debug
19:35:43 <clarkb> cleaning it up will happen later
19:36:56 <clarkb> Oh also refstack may be the lowest priority as I'm not sure that there is anyone maintaining the software anymore
19:37:02 <clarkb> but the others are all valid I think
19:37:34 <clarkb> and there are more on the hard list (gerrit is on that list too)
19:37:36 <fungi> also we got confirmation from the foundation staff that they no longer rely on it for anything
19:37:44 <fungi> (refstack i mean)
19:38:29 <frickler> so announce deprecation and shut off in a year instead of migrating?
19:38:36 <clarkb> frickler: ya or even sooner
19:38:45 <clarkb> that rough plan makes sense to me.
19:39:01 <frickler> sure, I just wanted to avoid a "that's too fast" response ;)
19:39:25 <clarkb> but ya my next focus on this topic is gerrit. Would be great if others can fill in some of the gaps around that
19:39:34 <clarkb> and let me know if there are questions or concerns generally on the process
19:39:39 <clarkb> #topic Running certcheck on bridge
19:39:47 <clarkb> this is on the agenda mostly so I don't forget to look at it
19:39:57 <clarkb> I don't have any updates and don't think anyone else does so probably not much to say and we can continue
19:40:04 <clarkb> but I'll wait for a minute in case I'm wrong about that
19:41:31 <clarkb> #topic Working through our TODO list
19:42:03 <clarkb> If what we've discussed above doesn't inspire you or fill your todo list full of activities we do have a separate larger and broader list you can look at for inspiration
19:42:09 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:42:25 <clarkb> much of what we discuss week to week falls under this list but I just like to remind people we've got even more to dive into if there is interest
19:42:33 <clarkb> this applies to existing and new contributors alike
19:42:46 <clarkb> and feel free to reach out to me with questions if you have them about anything we're doing or have on that todo list
19:42:52 <clarkb> #topic Rotating mailman 3 logs
19:43:14 <clarkb> fungi: do we have any changes for this yet? speaking of autoholds this might be a good use case for autoholds to test whether copytruncate is usable
19:43:23 <fungi> no, sorry, not yet
19:44:18 <clarkb> ack, it's been busy lately and I think this week is no different. but might be good to throw something up and get it held so that we can observe the behavior over time
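When a change for this does go up, the thing to hold and observe is presumably a logrotate stanza along these lines. copytruncate matters here because the mailman processes inside the containers keep their log files open and would otherwise continue writing to a rotated-away file; the path and schedule are assumptions, not taken from the log:

  # sketch of a copytruncate rotation for the mailman 3 logs
  /var/lib/mailman/*/logs/*.log {
      weekly
      rotate 4
      compress
      missingok
      notifempty
      copytruncate
  }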
19:45:26 <clarkb> #topic Open Discussion
19:45:53 <clarkb> among everything else this week I'm juggling some family stuff and I'll be out Monday (so missing day 1 of the ptg)
19:46:41 <clarkb> as a perfect example I'm doing the school run this afternoon so will be out for a bit in 1.5 hours
19:46:42 <frickler> ah, that reminds me to drop the osc-placement autohold ;)
19:46:45 <fungi> i'll be around, happy to keep an eye on things when i'm not leading ptg sessions
19:47:21 <fungi> also for tomorrow's openstack release i'm going to try to be online by 10:00 utc (6am local for me) in case anything goes sideways
19:47:39 <clarkb> I won't be awake that early but when I do wake up I'll check in on things and can help out if necessary too
19:48:19 * tonyb will be around for the release also
19:49:17 <fungi> much appreciated
19:50:07 <clarkb> anything else?
19:50:19 <clarkb> hopefully everyone is able to celebrate the release tomorrow
19:51:06 <clarkb> thank you everyone for your time and effort and help.
19:51:28 <fungi> i'll be hosting an openinfra.live episode on thursday where openstack community leaders will talk about new features and changes in various components
19:51:32 <clarkb> As mentioned at the beginning of the meeting we will skip next week's meeting unless something important comes up. That way you can all enjoy the PTG and not burn out on meetings as quickly
19:52:08 <tonyb> Nice .... more sleep for me :)
19:52:23 <clarkb> that too :)
19:52:36 <clarkb> #endmeeting