Tuesday, 2023-06-06

clarkbalmost meeting time18:59
fricklero/19:00
ianwo/19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Jun  6 19:01:18 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkbthere we go was wondering what happened to the bot19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Y3ZTMR6ZJJDWPZPNWYB32UC2HHGFZH73/ Our Agenda19:01
clarkbThis just went out. Sorry about that, I got super nerd sniped yesterday afternoon looking into the python and openafs thing19:01
clarkbdebuginfod is really cool and useful btw19:02
clarkb#topic Announcements19:02
clarkbA reminder that next week we'll skip having a meeting since several of us will be attending the open infra summit19:02
clarkbThen for June 20th (the meeting after next) I won't be able to make it as I'll be in the middle of travel. I'm happy for others to run a meeting if they like. I just can't run it myself19:03
clarkb#topic Topics19:03
clarkbI removed the quay topic. I think it's basically in a steady state situation now19:04
clarkbdib functional testing is working again too19:04
clarkb#topic Bastion Host Updates19:04
clarkb#link https://review.opendev.org/q/topic:bridge-backups19:04
clarkbtonyb appears to have reviewed this stack, thanks! Fungi mentioned he would take a look too but I don't think that has happened yet19:05
clarkbThe other thing I was thinking about recently is we should probably start looking at updating our ansible version on bridge to ansible 8 (we are currently 7)19:05
fungino, i've been too distracted by other things, sorry19:05
clarkbIn theory that will be self testing and if we get the change that bumps things up to run all the system-config-run-* jobs we should get really good coverage of it19:05
clarkbNot sure if anyone is interested in doing that. I don't think I have time for the next couple of weeks but I may start looking after if no one else beats me to it19:06
clarkbThrowing it out there if there is interest since I think it could be a good one as our testing for it should be robust19:06
ianwi think we're testing the git master so it should be fairly easy19:06
clarkbianw: I think that has been failing though. But proposing a change to move the cap from <8 to <9 should do what we need19:07
clarkband then ensuring all the jobs we want to trigger also trigger19:07
fungizuul is well behind that in what versions it supports, but shouldn't be an issue for our nested ansible calls, right?19:07
clarkbfungi: correct19:07
clarkbthey are pretty well separated in our environment19:07
clarkb#topic Mailman 319:08
clarkbfungi: any updates with your testing of the vhost fixes?19:08
funginope, distractions19:09
fungisorry19:09
clarkbhopefully post summit we'll all have fewer distractions19:09
clarkb#topic Gerrit Updates19:09
clarkb#link https://review.opendev.org/c/opendev/system-config/+/884779 Revert bind mounts for Gerrit plugin data19:09
clarkbI pushed this change because I think I've decided that in the short term the best thing for us may be to just clear out that data when we launch new gerrit containers19:10
clarkbI'd love feedback on that and/or the changes I pushed to manually clear things on gerrit startup19:10
clarkb#link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk19:10
clarkbI don't think this second change will clear all the files we need to clear but it will be easier to see what else is leaking after we land it if we decide to go that route instead. Feedback very much welcome either way or if you have alternative suggestions19:10
clarkb#link https://review.opendev.org/c/opendev/system-config/+/885317 Update Gerrit 3.8 image to 3.8.0 final release19:11
clarkbI also pushed this update yesterday to update our 3.8 image. This won't affect production, but will make our upgrade testing a bit more realistic19:11
clarkb#topic Upgrading Old Servers19:12
clarkb#link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes19:12
clarkbI've replaced the old zp01 server with a zp02 server. Though nothing may be using it...19:12
clarkbStill on the list are a handful of mirror nodes, meetpad, and I think the insecure ci registry19:13
clarkbI'm going to continue to try and pick them off one by one as I can. Probably mirror nodes will be my next target19:13
clarkbHelp welcome19:13
tonybcan you link to the zp01 change?19:13
clarkbtonyb: yes one sec19:14
tonybthat'd help me grok what's needed19:14
tonybrather than thinking I know what to do ;P19:14
clarkbtonyb: https://review.opendev.org/q/topic:replace-zp01+OR+topic:zp0219:14
clarkbtonyb: you do need a root to launch the new node, but if you propose a change like https://review.opendev.org/c/opendev/system-config/+/885076 which updates the test node label type so that testing shows jammy works then I'm happy to do that with/for you19:15
tonybperfect19:15
clarkbbasically mock it up and see that testing shows it is happy then I can launch the node and stick it in the change in the inventory hosts file19:15
clarkbtonyb: insecure-ci-registry would probably be a good one and/or the mirrors, since they have minimal state on host. meetpad is a bit weird because we need to sort out how to replace the control plane for that service19:16
clarkbRelated to all this but slightly different is the update of zuul servers to jammy for potential podman use19:16
clarkb(currently going to jammy but still using docker)19:16
tonybgot it19:17
clarkbAll of the mergers have been replaced and 6 executors have been booted. This exposed a behavior in new openafs where lseek()ing the openafs ioctl device/file crashes the process with a kernel oops19:17
clarkbthis was a problem with zuul because it uses python open() which does an lseek under the hood. corvus replaced that open() with an os.open() which does not do any magic under the hood19:18
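The difference between the two calls described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Zuul change: the helper names and the temporary file are hypothetical stand-ins, and a real executor would be opening the openafs ioctl file rather than a scratch file. The builtin open() wraps the descriptor in buffered I/O objects that query the file position when they are constructed, while os.open() hands back a bare integer descriptor and leaves any seeking to the caller.

```python
import os
import tempfile


def read_via_builtin(path, size=16):
    # Builtin open() returns a buffered file object; building that object
    # asks the underlying descriptor for its position (an lseek).
    with open(path, "rb") as f:
        return f.read(size)


def read_via_fd(path, size=16):
    # os.open() returns a raw integer file descriptor with no buffering
    # layer on top, so nothing seeks unless the caller does so explicitly.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)


# Hypothetical stand-in file, used only so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

via_builtin = read_via_builtin(demo_path)
via_fd = read_via_fd(demo_path)
os.unlink(demo_path)
```

On an ordinary file both helpers return the same bytes; the distinction only matters for special files where a seek itself is harmful.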
corvus(actually all 12 booted)19:18
clarkbcorvus: oh cool I thought it was only 619:18
clarkbstatic02 is a jammy node running jammy openafs and that has been functional so we expect that once we launch updated executor code on the new nodes they should be happy19:18
clarkbthis was a specific corner case that was sad19:18
corvusyeah, plan was to get it all done before monday :/19:18
ianwso was this https://gerrit.openafs.org/#/c/14918/ ?  i'm unclear if that fix wasn't there or if it's a different issue19:19
clarkbianw: the problem persisted in the 1.8.9 package the ppa built19:20
clarkbI think you expected that version to include that patch? if so then I don't think that patch was the fix19:20
clarkbhowever the description in that change definitely seems to match the behavior we saw19:20
corvus(though i agree, the words in the commit message sure make that sound like it should be a fix)19:21
ianwyeah, i perhaps miscalculated if that patch is there ... 19:21
clarkbin any case I think it is unlikely we'll be opening that structure outside of the single place zuul does it or in openafs itself19:21
clarkband openafs itself seems to be working on static02 so we're probably good? Let's proceed and keep an eye on it and we can always look at rebuilding new ppa packages later if necessary19:22
ianw$ cat src/afs/LINUX/osi_ioctl.c | grep default_llseek19:23
ianw$19:23
ianw... sigh ... you would have thought I'd think to check that :/19:23
corvus(pam modules would be the other place to watch out for that)19:23
ianwthat at least explains why 1.8.9 didn't fix it for us19:23
clarkbcool I think that gives us a path forward should this problem continue to persist. We can build a 1.8.9 with that fix backported19:24
ianwso the counterpoint to that is that we could pull that patch in 19:24
clarkbianw: yup or convince ubuntu to pull it in maybe19:24
ianwsorry about that.  i don't know why i didn't think of it until just then19:24
clarkbbut it doesn't seem necessary yet so I think we can proceed with distro 1.8.8 and take it from there19:24
clarkbAnything else related to server updates?19:25
ianwit is in the openafs-stable-1_8_x branch in the openafs git19:26
fungifixed in 1.9.x at all?19:26
fungimainly curious, i have no idea how much unrelated breakage 1.9 will mean19:26
clarkbcool so we can fetch the backport out of that branch then and it should apply cleanly to the existing packaging19:27
ianwthat was what i looked at.  but that doesn't seem to have tags19:27
clarkbfungi: I think it merged to 1.919:27
fungiahh19:27
clarkbya I think 1.9 is master right now?19:27
clarkb#topic Fedora cleanup19:28
clarkbtonyb: you've been poking at the zuul-jobs role stuff for this and corvus pointed out the possibility of using the new thing that we never actually switched to...19:29
tonybYeah19:29
clarkbtonyb: did you have thoughts on what makes sense for pushing this ahead?19:29
tonybI'm looking for feedback on timing and priorities19:29
fungi(there's an openafs 1.9.1 or 1.9.2 maybe, so they're supposedly releasing from it, but maybe they aren't making tags there)19:30
tonybThe new thing looks good but I admit I'm not in a position to judge the effort needed to make it a reality19:30
clarkbI'll be honest my personal priority for this is low, I was just trying to find easy wins for maybe adding rocky/bookworm mirroring19:30
tonybI get that the fedora_mirror_enabled is a hack19:31
clarkbfor this reason I'm personally happy to take our time and add the new thing to configure mirrors. But that likely would take longer than bookworms release date19:31
fricklerhow much effort is increasing afs capacity?19:31
clarkbtonyb: the main thing is going to be adding configuration for the new thing and adding the new thing to the base-test base jobs and then reparenting a representative sampling of jobs to base-tests to ensure it is doing what we expect19:31
tonybcorvus: It'd be good to get your thoughts on how desirable the new thing is19:31
fungibookworm release day is saturday, btw19:32
clarkbfrickler: you need to add volumes to existing servers (easy but increases potential for failures) or add new servers (more work)19:32
tonybI hesitate, could we "fork" the configure-mirrors role into openstack-jobs, remove the fedora stuff while I do the right thing in zuul?19:33
tonybclarkb: Yup I can totally do that.19:33
clarkbtonyb: you'd probably need to fork it into opendev/base-jobs19:33
fricklermaybe it still would be worth to decouple cleaning up old mirrors from setting up new stuff?19:33
clarkbtonyb: since this will affect all opendev users19:33
corvuswell, the new thing is designed to have the kind of flexibility we apparently are now starting to require, so i think it's better for opendev and the wider community (in that it lets others actually use mirror roles in zuul-jobs which, basically, we're the only ones who can actually use right now)19:33
tonybOkay, same idea but wrong repo ;P19:33
clarkbfrickler: yes that is doable too. It would still be lowish priority for me though which is why I was looking for easy wins19:34
clarkbI just don't have time to add new distro content when I can barely keep up with what we already have so my personal preference is cleanup first19:34
corvusi think it's the classic long-term/short-term balancing act, and i don't have a good read on making that decision.  so i'm just able to provide background.  :)19:35
frickleriirc the patch to add bookworm mirroring is already present, just needs afs capacity19:35
tonybcorvus: okay.19:35
clarkbfrickler: that and cleanup of buster19:35
clarkbI think if I were trying to lead this I would look at doing the new mirror setup thing, test it with base-test, update base, clean up fedora mirroring, then decide if we need to adjust capacity from there or not19:36
clarkbbecause from where I'm sitting we can't keep up so reducing effort first is a win19:36
clarkbbut if others want to push bookworm ahead and do something different i'm ok with that too19:36
tonybOkay.  We'll make that the plan.19:37
clarkbalso we can do the two things concurrently which is nice19:37
clarkbthey don't conflict with each other19:37
tonybI'll put myself on the hook for making that happen .... as long as I can count on help/support for doing the new mirror_info thing19:37
clarkbyup I can continue to help19:38
tonybperfect19:38
clarkb#topic Storyboard19:39
clarkbI think fungi has been keeping up with the updates there. But anything I've missed worth calling out?19:39
funginothing new since last week, afaik19:40
clarkb#topic Open Infra Summit19:41
clarkbI sent out an email trying to organize a low key gathering for those of us that will be there (for Zuul and OpenDev but really no one is counting). The beer garden worked really well in berlin and about 2.5km away from the summit venue is a brewery with a ton of outdoor picnic table type setups19:42
clarkbI'm hoping it doesn't rain (current forecast says it will rain in the morning but be dry in the afternoon) and we can go hang out Thursday at 6ish there19:42
clarkbIt looks like it may be a little cool. 67F/19C as a high and it will probably be a bit cooler in the evening so fingers crossed that still works out19:43
clarkbif it gets too cold or rainy we'll figure it out then19:43
corvusthat's warmer and drier than here now... maybe i should open a beer19:44
clarkbAlso a reminder that next week is the summit. I expect it will get quiet around here.19:44
clarkb#topic Open Discussion19:44
clarkbAnything else?19:44
fricklerI'm slowly working my way through zuul config error cleanups19:45
corvusrelated to the ongoing work to clean up errors, there are some zuul changes arriving soon that will hopefully help with that.  new layout for the config errors page, ability to filter, and display of warnings.19:46
corvus(i also have adding sorting to that list, but that will make more sense after the warnings arrive, so that may be a few changes away still)19:46
fricklergot the consent from the TC now to force-merge things that get stalled due to failing CI19:46
clarkbfrickler: I pushed a DNM change to confirm that github reports there are no valid merge methods for ansible and testinfra :/19:46
clarkbfrickler: but I think your removal of the project listings is working fine so we don't need to dig into that with any urgency19:47
fricklerclarkb: yes, I saw that. some day zuul will likely need to switch to graphql for that19:47
clarkbor figure out if different perms are now required19:48
fricklercorvus: I've switched to using the json from the API, will that also change?19:49
corvusfrickler: so far only new fields19:49
clarkbI think I can give everyone 10 minutes back19:50
clarkbfeel free to continue discussion in #opendev or on the mailing list19:51
clarkbthank you for your time and help!19:51
clarkb#endmeeting19:51
opendevmeetMeeting ended Tue Jun  6 19:51:11 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:51
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-06-06-19.01.html19:51
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-06-06-19.01.txt19:51
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-06-06-19.01.log.html19:51
fungithanks clarkb!19:51
