19:01:18 <clarkb> #startmeeting infra
19:01:18 <opendevmeet> Meeting started Tue Jun 6 19:01:18 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:18 <opendevmeet> The meeting name has been set to 'infra'
19:01:34 <clarkb> there we go was wondering what happened to the bot
19:01:37 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Y3ZTMR6ZJJDWPZPNWYB32UC2HHGFZH73/ Our Agenda
19:01:48 <clarkb> This just went out. Sorry about that, I got super nerd sniped yesterday afternoon looking into the python and openafs thing
19:02:13 <clarkb> debuginfod is really cool and useful btw
19:02:23 <clarkb> #topic Announcements
19:02:40 <clarkb> A reminder that next week we'll skip having a meeting since several of us will be attending the open infra summit
19:03:10 <clarkb> Then for June 20th (the meeting after next) I won't be able to make it as I'll be in the middle of travel. I'm happy for others to run a meeting if they like. I just can't run it myself
19:03:59 <clarkb> #topic Topics
19:04:12 <clarkb> I removed the quay topic. I think it's basically in a steady state situation now
19:04:22 <clarkb> dib functional testing is working again too
19:04:28 <clarkb> #topic Bastion Host Updates
19:04:48 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:05:13 <clarkb> tonyb appears to have reviewed this stack, thanks! Fungi mentioned he would take a look too but I don't think that has happened yet
19:05:26 <clarkb> The other thing I was thinking about recently is we should probably start looking at updating our ansible version on bridge to ansible 8 (we are currently on 7)
19:05:30 <fungi> no, i've been too distracted by other things, sorry
19:05:47 <clarkb> In theory that will be self testing and if we get the change that bumps things up to run all the system-config-run-* jobs we should get really good coverage of it
19:06:11 <clarkb> Not sure if anyone is interested in doing that. I don't think I have time for the next couple of weeks but I may start looking after if no one else beats me to it
19:06:27 <clarkb> Throwing it out there if there is interest since I think it could be a good one as our testing for it should be robust
19:06:40 <ianw> i think we're testing the git master so it should be fairly easy
19:07:06 <clarkb> ianw: I think that has been failing though. But proposing a change to move the cap from <8 to <9 should do what we need
19:07:15 <clarkb> and then ensuring all the jobs we want to trigger also trigger
19:07:45 <fungi> zuul is well behind that in what versions it supports, but shouldn't be an issue for our nested ansible calls, right?
19:07:52 <clarkb> fungi: correct
19:07:59 <clarkb> they are pretty well separated in our environment
19:08:46 <clarkb> #topic Mailman 3
19:08:55 <clarkb> fungi: any updates with your testing of the vhost fixes?
19:09:06 <fungi> nope, distractions
19:09:09 <fungi> sorry
19:09:20 <clarkb> hopefully post summit we'll all have fewer distractions
19:09:26 <clarkb> #topic Gerrit Updates
19:09:34 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/884779 Revert bind mounts for Gerrit plugin data
19:10:01 <clarkb> I pushed this change because I think I've decided that in the short term the best thing for us may be to just clear out that data when we launch new gerrit containers
19:10:13 <clarkb> I'd love feedback on that and/or the changes I pushed to manually clear things on gerrit startup
19:10:20 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk
19:10:46 <clarkb> I don't think this second change will clear all the files we need to clear but it will be easier to see what else is leaking after we land it if we decide to go that route instead. Feedback very much welcome either way or if you have alternative suggestions
19:11:08 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/885317 Update Gerrit 3.8 image to 3.8.0 final release
19:11:31 <clarkb> I also pushed this update yesterday to update our 3.8 image. This won't affect production, but will make our upgrade testing a bit more realistic
19:12:30 <clarkb> #topic Upgrading Old Servers
19:12:36 <clarkb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes
19:12:51 <clarkb> I've replaced the old zp01 server with a zp02 server. Though nothing may be using it...
19:13:04 <clarkb> Still on the list are a handful of mirror nodes, meetpad, and I think the insecure ci registry
19:13:27 <clarkb> I'm going to continue to try and pick them off one by one as I can. Probably mirror nodes will be my next target
19:13:39 <clarkb> Help welcome
19:13:58 <tonyb> can you link to the zp01 change?
19:14:02 <clarkb> tonyb: yes one sec
19:14:08 <tonyb> that'd help me grok what's needed
19:14:20 <tonyb> rather than thinking I know what to do ;P
19:14:37 <clarkb> tonyb: https://review.opendev.org/q/topic:replace-zp01+OR+topic:zp02
19:15:17 <clarkb> tonyb: you do need a root to launch the new node, but if you propose a change like https://review.opendev.org/c/opendev/system-config/+/885076 which updates the test node label type so that testing shows jammy works then I'm happy to do that with/for you
19:15:37 <tonyb> perfect
19:15:38 <clarkb> basically mock it up and see that testing shows it is happy then I can launch the node and stick it in the change in the inventory hosts file
19:16:17 <clarkb> tonyb: insecure-ci-registry would probably be a good one and/or the mirrors. Since they have minimal state on host. meetpad is a bit weird because we need to sort out how to replace the control plane for that service
19:16:38 <clarkb> Related to all this but slightly different is the update of zuul servers to jammy for potential podman use
19:16:44 <clarkb> (currently going to jammy but still using docker)
19:17:14 <tonyb> got it
19:17:33 <clarkb> All of the mergers have been replaced and 6 executors have been booted. This exposed a behavior in new openafs where lseek()ing the openafs ioctl device/file crashes the process with a kernel oops
19:18:02 <clarkb> this was a problem with zuul because it uses python open() which does an lseek under the hood. corvus replaced that open() with an os.open() which does not do any magic under the hood
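
For context on the open() vs os.open() distinction described above: Python's built-in open() wraps the file descriptor in a buffered file object, and setting up that buffering can issue an lseek() on the fd, which is the call that oopsed on the openafs ioctl file; os.open()/os.read() stay at the raw descriptor level. A minimal sketch of the difference, with a hypothetical path standing in for the actual file zuul touches (not the real zuul code):

    import os

    AFS_IOCTL = "/proc/fs/openafs/afs_ioctl"  # example path, assumption only

    def probe_with_builtin_open(path):
        # open() returns a buffered file object; the buffering layer may call
        # lseek() on the underlying fd, which is the behavior that triggered
        # the kernel oops on the openafs ioctl device.
        with open(path, "rb") as f:
            return f.read(1)

    def probe_with_os_open(path):
        # os.open()/os.read() operate directly on the file descriptor, with no
        # buffering layer and therefore no implicit lseek().
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.read(fd, 1)
        finally:
            os.close(fd)
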
19:18:19 <corvus> (actually all 12 booted)
19:18:26 <clarkb> corvus: oh cool I thought it was only 6
19:18:50 <clarkb> static02 is a jammy node running jammy openafs and that has been functional so we expect that once we launch updated executor code on the new nodes they should be happy
19:18:55 <clarkb> this was a specific corner case that was sad
19:18:56 <corvus> yeah, plan was to get it all done before monday :/
19:19:37 <ianw> so was this https://gerrit.openafs.org/#/c/14918/ ? i'm unclear if that fix wasn't there or if it's a different issue
19:20:13 <clarkb> ianw: the problem persisted in the 1.8.9 package the ppa built
19:20:25 <clarkb> I think you expected that version to include that patch? if so then I don't think that patch was the fix
19:20:40 <clarkb> however the description in that change definitely seems to match the behavior we saw
19:21:07 <corvus> (though i agree, the words in the commit message sure make that sound like it should be a fix)
19:21:10 <ianw> yeah, i perhaps miscalculated if that patch is there ...
19:21:56 <clarkb> in any case I think it is unlikely we'll be opening that structure outside of the single place zuul does it or in openafs itself
19:22:24 <clarkb> and openafs itself seems to be working on static02 so we're probably good? Let's proceed and keep an eye on it and we can always look at rebuilding new ppa packages later if necessary
19:23:11 <ianw> $ cat src/afs/LINUX/osi_ioctl.c | grep default_llseek
19:23:12 <ianw> $
19:23:25 <ianw> ... sigh ... you would have thought I'd think to check that :/
19:23:29 <corvus> (pam modules would be the other place to watch out for that)
19:23:47 <ianw> that at least explains why 1.8.9 didn't fix it for us
19:24:06 <clarkb> cool I think that gives us a path forward should this problem continue to persist. We can build a 1.8.9 with that fix backported
19:24:09 <ianw> so the counterpoint to that is that we could pull that patch in
19:24:25 <clarkb> ianw: yup or convince ubuntu to pull it in maybe
19:24:34 <ianw> sorry about that. i don't know why i didn't think of it until just then
19:24:44 <clarkb> but it doesn't seem necessary yet so I think we can proceed with distro 1.8.8 and take it from there
19:25:35 <clarkb> Anything else related to server updates?
19:26:17 <ianw> it is in the openafs-stable-1_8_x branch in the openafs git
19:26:35 <fungi> fixed in 1.9.x at all?
19:26:57 <fungi> mainly curious, i have no idea how much unrelated breakage 1.9 will mean
19:27:01 <clarkb> cool so we can fetch the backport out of that branch then and it should apply cleanly to the existing packaging
19:27:07 <ianw> that was what i looked at. but that doesn't seem to have tags
19:27:18 <clarkb> fungi: I think it merged to 1.9
19:27:26 <fungi> ahh
19:27:33 <clarkb> ya I think 1.9 is master right now?
19:28:55 <clarkb> #topic Fedora cleanup
19:29:18 <clarkb> tonyb: you've been poking at the zuul-jobs role stuff for this and corvus pointed out the possibility of using the new thing that we never actually switched to...
19:29:34 <tonyb> Yeah
19:29:36 <clarkb> tonyb: did you have thoughts on what makes sense for pushing this ahead?
19:29:57 <tonyb> I'm looking for feedback on timing and priorities
19:30:18 <fungi> (there's an openafs 1.9.1 or 1.9.2 maybe, so they're supposedly releasing from it, but maybe they aren't making tags there)
19:30:27 <tonyb> The new thing looks good but I admit I'm not in a position to judge the effort needed to make it a reality
19:30:36 <clarkb> I'll be honest my personal priority for this is low, I was just trying to find easy wins for maybe adding rocky/bookworm mirroring
19:31:01 <tonyb> I get that the fedora_mirror_enabled is a hack
19:31:13 <clarkb> for this reason I'm personally happy to take our time and add the new thing to configure mirrors. But that likely would take longer than bookworm's release date
19:31:40 <frickler> how much effort is increasing afs capacity?
19:31:44 <clarkb> tonyb: the main thing is going to be adding configuration for the new thing and adding the new thing to the base-test base jobs and then reparenting a representative sampling of jobs to base-test to ensure it is doing what we expect
19:31:56 <tonyb> corvus: It'd be good to get your thoughts on how desirable the new thing is
19:32:05 <fungi> bookworm release day is saturday, btw
19:32:08 <clarkb> frickler: you need to add volumes to existing servers (easy but increases potential for failures) or add new servers (more work)
19:33:07 <tonyb> I hesitate, could we "fork" the configure-mirrors role into openstack-jobs, remove the fedora stuff while I do the right thing in zuul?
19:33:29 <tonyb> clarkb: Yup I can totally do that.
19:33:36 <clarkb> tonyb: you'd probably need to fork it into opendev/base-jobs
19:33:39 <frickler> maybe it still would be worth it to decouple cleaning up old mirrors from setting up new stuff?
19:33:45 <clarkb> tonyb: since this will affect all opendev users
19:33:46 <corvus> well, the new thing is designed to have the kind of flexibility we apparently are now starting to require, so i think it's better for opendev and the wider community (in that it lets others actually use mirror roles in zuul-jobs which, basically, we're the only ones who can actually use right now)
19:33:57 <tonyb> Okay, same idea but wrong repo ;P
19:34:27 <clarkb> frickler: yes that is doable too. It would still be lowish priority for me though which is why I was looking for easy wins
19:34:41 <clarkb> I just don't have time to add new distro content when I can barely keep up with what we already have so my personal preference is cleanup first
19:35:02 <corvus> i think it's the classic long-term/short-term balancing act, and i don't have a good read on making that decision. so i'm just able to provide background. :)
19:35:18 <frickler> iirc the patch to add bookworm mirroring is already present, just needs afs capacity
19:35:29 <tonyb> corvus: okay.
19:35:41 <clarkb> frickler: that and cleanup of buster
19:36:14 <clarkb> I think if I were trying to lead this I would look at doing the new mirror setup thing, test it with base-test, update base, clean up fedora mirroring, then decide if we need to adjust capacity from there or not
19:36:25 <clarkb> because from where I'm sitting we can't keep up so reducing effort first is a win
19:36:52 <clarkb> but if others want to push bookworm ahead and do something different i'm ok with that too
19:37:07 <tonyb> Okay. We'll make that the plan.
19:37:20 <clarkb> also we can do the two things concurrently which is nice
19:37:24 <clarkb> they don't conflict with each other
19:37:44 <tonyb> I'll put myself on the hook for making that happen .... as long as I can count on help/support for doing the new mirror_info thing
19:38:01 <clarkb> yup I can continue to help
19:38:14 <tonyb> perfect
19:39:38 <clarkb> #topic Storyboard
19:39:58 <clarkb> I think fungi has been keeping up with the updates there. But anything I've missed worth calling out?
19:40:09 <fungi> nothing new since last week, afaik
19:41:13 <clarkb> #topic Open Infra Summit
19:42:13 <clarkb> I sent out an email trying to organize a low key gathering for those of us that will be there (for Zuul and OpenDev but really no one is counting). The beer garden worked really well in berlin and about 2.5km away from the summit venue is a brewery with a ton of outdoor picnic table type setups
19:42:35 <clarkb> I'm hoping it doesn't rain (current forecast says it will rain in the morning but be dry in the afternoon) and we can go hang out Thursday at 6ish there
19:43:10 <clarkb> It looks like it may be a little cool. 67F/19C as a high and it will probably be a bit cooler in the evening so fingers crossed that still works out
19:43:20 <clarkb> if it gets too cold or rainy we'll figure it out then
19:44:02 <corvus> that's warmer and drier than here now... maybe i should open a beer
19:44:03 <clarkb> Also a reminder that next week is the summit. I expect it will get quiet around here.
19:44:54 <clarkb> #topic Open Discussion
19:44:56 <clarkb> Anything else?
19:45:52 <frickler> I'm slowly working my way through zuul config error cleanups
19:46:10 <corvus> related to the ongoing work to clean up errors, there are some zuul changes arriving soon that will hopefully help with that. new layout for the config errors page, ability to filter, and display of warnings.
19:46:35 <corvus> (i also have adding sorting to that list, but that will make more sense after the warnings arrive, so that may be a few changes away still)
19:46:40 <frickler> got the consent from the TC now to force-merge things that get stalled due to failing CI
19:46:47 <clarkb> frickler: I pushed a DNM change to confirm that github reports there are no valid merge methods for ansible and testinfra :/
19:47:02 <clarkb> frickler: but I think your removal of the project listings is working fine so we don't need to dig into that with any urgency
19:47:39 <frickler> clarkb: yes, I saw that. some day zuul will likely need to switch to graphql for that
19:48:22 <clarkb> or figure out if different perms are now required
19:49:05 <frickler> corvus: I've switched to using the json from the API, will that also change?
19:49:48 <corvus> frickler: so far only new fields
19:50:53 <clarkb> I think I can give everyone 10 minutes back
19:51:01 <clarkb> feel free to continue discussion in #opendev or on the mailing list
19:51:09 <clarkb> thank you for your time and help!
19:51:11 <clarkb> #endmeeting