Tuesday, 2023-07-18

clarkbJust about meeting time18:59
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Jul 18 19:01:06 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JVMGLDPDLQW5L3FFIKWILIJU5DJS77ES/ Our Agenda19:01
clarkb#topic Announcements19:01
clarkbA minor announcement. I'm not actually here today. The only reason that this works out is the super early relative to local time hour of the meeting. But the lowest tide of our trip occurs in ~5 hours so we're taking advantage of that for "tide pooling"19:02
clarkband it's a whole production to get the boat out before it gets stuck in the mud19:02
fungii can only imagine19:02
tonybSounds like fun19:03
clarkbYa I think everyone is looking forward to it. But I definitely won't be around after the meeting today19:03
clarkb#topic Bastion Host Updates19:03
clarkb#link https://review.opendev.org/q/topic:bridge-backups19:03
clarkbI think this one still deserves multiple core/root reviewers if we can manage it19:04
clarkbfungi: frickler fyi if you have time19:04
fungioh yep19:05
clarkb#topic Mailman 319:05
clarkbThe 429 spam seems to have gone away as quickly as it started. I don't think we made changes for that yet so the other end must've gotten bored19:06
fungino appreciable progress. life is starting to get out of the way and i'm working on catching back up to where i left off (new held node, et cetera)19:06
fungii'm wondering if documenting manual steps for adding a new domain is simpler than trying to orchestrate django for now19:06
clarkbfungi: considering the number of domains I think that is workable19:07
clarkbwe are at ~6 today?19:07
fungigiven for the current ones we have manual import steps to perform anyway19:07
fungiyeah19:07
clarkbthat works for me. We have manual steps elsewhere too19:08
fungii'll shift my focus to working out those steps through the webui in that case19:08
clarkbsounds good. Anything else mailman related?19:08
fungithe existing wip changes are still good for either approach19:08
funginothing from me19:08
clarkb#topic Gerrit Updates19:09
clarkbThere are a few Gerrit items I've merged into one block here19:09
clarkb#link https://review.opendev.org/c/opendev/system-config/+/885317 Build gerrit 3.7.4 and 3.8.1 images19:09
clarkbThe first is Gerrit did a whole bunch of releases over the weekend19:09
clarkb3.7.4 and 3.8.1 are both new and that change updates our image builds to match19:09
clarkbWe run 3.7.3 in prod so 3.7.4 will be our prod update and 3.8.1 will be used for 3.8 testing and 3.7 -> 3.8 upgrade testing19:10
clarkbI made a note about a recorded breaking change that I'm pretty sure doesn't affect us19:10
clarkbNote we need to manually replace the container for gerrit after that lands. It won't be automatic19:11
clarkbNext is the leaking replication task files19:11
clarkb#link https://review.opendev.org/c/opendev/system-config/+/884779 Revert Gerrit replication task bind mount19:11
clarkbis one option and one that we might want to combine with the 3.7.4 container replacement19:11
clarkbsince that will give Gerrit a fresh ephemeral directory for those files then we can manually clean up the old bind mount location19:12
clarkbthe alternative is my somewhat hacky changes to add a startup script that scans all the json files and prunes them19:12
clarkbUnfortunately no updates to my gerrit issues filed for this and they changed bug trackers so I'm not even sure my old links will work19:12
clarkbFinally the rejection of implicit merges19:13
clarkb#link https://review.opendev.org/c/opendev/system-config/+/885318 Merge this to reflect change to All-Projects once made19:13
clarkbfungi: Not sure if you were still planning to push that update to All-Projects19:13
fungioh, yes i can do that19:13
fungirelated to gerrit, zuul (as of... yesterday?) has support for the kafka event plugin too, wonder if we should consider working toward using that or stick with ssh event stream (we'd presumably still need to support the latter for existing third-party ci systems anyway, but looks like there are some resiliency benefits if we switch our zuul's connection to kafka)19:14
clarkbthe main issue with kafka is going to be running it19:14
fungiyep19:14
corvusthe gerrit folks have been frowning at ssh for a while.... but i don't think they have plans to remove it19:14
clarkbit is a fairly large and complicated system aiui (they even deleted zookeeper and now do that all internally)19:15
fungithat's why it's a bit of an open question19:15
corvuswhen developing the zuul stuff, i used the bitnami all-in-one container19:15
corvusi didn't look into it much, but it might be easy enough if we want a simple system...19:15
funginot something we need to decide any time soon, mainly just curious19:15
corvusbut if we want multi-host, yeah, probably more work19:15
clarkban all in one container won't give us much extra resiliency when compared to ssh though. Except that we could potentially restart kafka less often than gerrit19:15
corvusclarkb: exactly19:16
corvusalso, did they delete all the zk stuff?  or just augment it with more complexity?19:16
corvusi still saw a lot of "set up zk" instructions...19:16
corvus(which i didn't follow on account of using the bitnami aio, so i don't really know)19:16
fungias we all know, the solution to complexity is to layer on more complexity ;)19:16
clarkbcorvus: my understanding is that kafka deleted the zk dependency or is working to that in order to do simpler/cheaper/quicker elections internally19:16
frickleras long as we only have a single gerrit, aio kafka sounds fine19:17
clarkbAnything else Gerrit related?19:18
tonybJust quickly19:18
corvusand that reminds me, fyi, the reason gerrit supports kafka is mostly to support multi-master stuff... so that's a potential stepping stone ...19:18
tonybShould I base the python updates on your 3.7.4 review for ordering?19:18
fungigood point about the path to multi-master gerrit. i mainly saw kafka as a way to avoid losing gerrit events if zuul gets disconnected19:19
clarkbtonyb: Yes, I think we should try to update Gerrit first since they tend to have good bugfixes and bullseye is still supported for a bit making bookworm less urgent but still an important update19:19
tonyb++19:20
clarkb#topic Server Upgrades19:21
clarkbThe 12 zuul executors are all running Jammy now19:21
corvusi reckon i'll delete the old ones today19:21
clarkbcool was just going to ask about that19:21
clarkbI need to look at cleaning up the old ci registry too. Probably a tomorrow task at this point19:22
clarkbOther than that I didn't have any news here. Anyone else have updates?19:22
* tonyb will watch how corvus does it and then copy it for the ci-registry19:22
corvusoh i think all the changes are done now19:22
corvusonly thing left is manually deleting them using openstack cli19:23
tonybalthough I expect the actual server destruction will be done by y'all19:23
ianwif anyone is picking up mirrors, i do think it's worth going back to re-evaluate kafs with them19:23
clarkbya server destruction is a bit manual19:23
tonybOh okay19:23
clarkbianw: ya mirrors and meetpad are next up on the todo list19:23
fungiianw: great reminder about kafs, thanks19:23
corvusianw: any reason in particular, or just lets check in since it's been a while?19:23
tonybianw I can keep you in the loop on that19:23
clarkbianw: for kafs you were thinking we could just deploy a node and then use dns to flip back and forth as necessary?19:24
ianwcorvus: it's come a long way; i've started using it locally and it's working fine19:24
fungijammy upgrades means newer kernel means newer kafs19:24
ianwyeah, i have some changes up to implement it with a flag; we could put up a trial host and do some load testing with a dns switch19:25
ianwwhat i'm not 100% on is the caching layers19:25
corvuskk19:25
ianwand that's the type of thing we'd probably need some real loads to tune19:25
ianwbut ultimately the reason is if we can get away from external builds of openafs that would be nice19:26
clarkbthough with jammy the openafs version there seems to be working too (at least for now)19:26
clarkbbut agreed adds more flexibility across platforms and updates etc19:27
ianwyeah, it's never a problem till it is :)19:27
ianwi'm willing to bring up a node, etc., but will require more than just my eyes19:27
clarkbmaybe put it in one of the rax regions since that is sizeable enough for data collection and feeling confident elsewhere will be happy too19:28
fungiopenafs dkms builds are back to broken in debian/unstable (seems to be related to linux 6.1 or maybe newer compiler/klibc), so i'm tempted to give kafs a whirl there19:29
clarkb#topic Fedora cleanup19:30
clarkbI haven't had time to look at the mirroring stuff since we last met. tonyb do you have anything to add?19:30
tonybNo progress from me.  I need to write up how I think the mirroring setup should work for review19:30
clarkband feel free to ping me with questions and point me to the write up when ready19:31
clarkb#topic Storyboard19:31
clarkbI haven't seen anything new here either but figured I would check19:31
funginope19:31
clarkb#topic Gitea 1.20 Upgrade19:32
clarkbWe did the 1.19.4 upgrade of Gitea last week19:32
clarkbWas straightforward as expected19:32
clarkb1.20 is a bit more involved unfortunately19:32
clarkb#link https://review.opendev.org/c/opendev/system-config/+/886993 Gitea 1.20 change19:32
clarkbI finally got our test suite to pass, but there are a number of todos I've noted in the commit message about stuff we should check19:32
clarkbThe main frustrations I've hit so far 1) oauth2 is a disabled feature but we still need to configure all of its jwt stuff to avoid startup errors which means more config and state on disk that we don't use but is required19:33
clarkband second 2) they have changed their WORK_DIR/WORK_PATH expectations for the second time and we need to go through that and ensure we aren't orphaning data in our containers' ephemeral disk areas and instead have all that covered by bind mounts19:34
clarkbfor 2) the idea I had was we could hold a node and compare the resulting bindmounts and gitea dir locations with our prod stuff to make sure they roughly align and if they do we should be food19:34
clarkb*good19:34
corvus(this meeting moved from lunch to breakfast for clarkb)19:35
clarkbFrom a feature perspective this release doesn't seem to add anything flashy which is probably good as we don't have to wrangle features on top of this19:35
clarkbanyway I think reviews may be helpful at this point looking over the change log from gitea and ensuring we haven't missed anything important. And I'll try to work through those TODOs as I'm able and update the change19:36
clarkband feel free to add more todos if you find items that need to be addressed19:36
clarkb#topic Etherpad 1.9.119:37
clarkb#link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.119:37
clarkbBetter news here. I think I sorted out that the username and user color problems are due to a change of handling falsey boolean config entries to null entries in config19:38
clarkbI made that update to our settings.json on the old held node and seemed to fix it19:38
fungiawesome19:38
clarkbI should have a new held node somewhere built from the code update based on that manual update19:38
clarkbso we need to retest and check that it actually helps19:38
clarkbalso numbered lists seem to work for us19:38
clarkband they appear to have updated the git tag so we don't need to use a random git sha19:39
clarkbI'm hopeful that after round two of checking we'll be in a good spot to land the update19:39
clarkb#topic Python Container Updates19:40
clarkbTypically we talk about this in the context of updating python versions but due to the recent Debian bookworm release we're doing base OS container updates instead19:40
clarkb#link https://review.opendev.org/q/topic:bookworm-python19:41
clarkb#link https://review.opendev.org/q/topic:force-base-image-build19:41
clarkbtonyb had two specific questions listed on the agenda.19:41
clarkbThe first is a question of updating openstacksdk's old dockerfile and I think we should19:41
clarkbwe can't merge that change but can propose it to them and hopefully they approve it19:42
tonybThey're okay to do whatever we suggest and the change is up for review19:42
frickler#link https://review.opendev.org/c/openstack/python-openstackclient/+/88874419:42
clarkband secondly, should we manually clean up the leaked zuul change_* tags in docker hub19:42
clarkbah cool. then ya I think they should update their base image. It should be pretty safe since this is just a client tool we run on all the operating systems with minimal OS integration19:43
clarkbFor leaked zuul change_* tags I wonder if we should write a script to clean those up and have it run against all our images19:43
clarkbthe script could check gerrit's api to see which changes are no longer open and then delete those tags19:44
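The selection step of the cleanup script clarkb describes could be sketched roughly as below. This is only an illustration under stated assumptions: the tag layout `change_<number>_<tag>` is assumed rather than confirmed in the meeting, the function name `stale_change_tags` is hypothetical, and the set of open change numbers is taken as an input instead of being queried from Gerrit's API. The deletion of the tags on Docker Hub is deliberately left out.

```python
import re

# Assumed tag layout: change_<change number>_<original tag>,
# for example change_887006_latest. This layout is an assumption.
CHANGE_TAG = re.compile(r"^change_(\d+)_")


def stale_change_tags(tags, open_changes):
    """Return the change_* tags whose change number is not in open_changes.

    tags: iterable of tag names for one image.
    open_changes: set of change numbers (ints) that are still open.
    Tags that do not match the assumed layout are never selected.
    """
    stale = []
    for tag in tags:
        match = CHANGE_TAG.match(tag)
        if match and int(match.group(1)) not in open_changes:
            stale.append(tag)
    return stale
```

Keeping the selection separate from the Gerrit query and the registry deletion would make it possible to review the list of candidate tags before anything is removed.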
tonybWith the SDK there is a "meta" question about tags, we've stopped pushing 3.x as tags; should we restart so that consumers can just use whatever we "suggest"?19:44
tonybthose tags are pretty old (buster) which isn't great19:45
clarkbtonyb: I think with the buster -> bullseye transition we decided that was not explicit enough19:45
clarkbend users were expected to switch to the specific OS version tags, but I'm not surprised some were missed19:45
tonybOkay as long as it's been considered19:47
clarkbwe didn't remove the old tags to give people the ability to transition but maybe we should consider cleaning them up eventually19:47
tonybDo we have any way to see how many pulls a tag is getting?19:48
clarkbI don't know if docker exposes that to us19:48
tonybI wondered if it was something the org owner could see19:48
clarkb(quay does)19:48
tonybI did a grep/codesearch but that only helps for opendev19:49
clarkbhttps://github.com/docker/hub-feedback/issues/104719:49
clarkbseems like we could fetch the total pulls at intervals and calculate the delta ourselves19:49
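The "fetch the total at intervals and calculate the delta" idea could look something like the sketch below. It assumes the cumulative totals have already been collected somehow (how they are fetched is not covered here), and the function name `pull_deltas` and the sample layout are hypothetical. It only shows the delta arithmetic.

```python
def pull_deltas(samples):
    """Turn cumulative (timestamp, total_pulls) samples into per-interval deltas.

    samples: iterable of (timestamp, total_pulls) pairs; timestamps must sort
    chronologically (ISO 8601 strings or datetime objects both work).
    Returns a list of (timestamp, pulls_since_previous_sample) pairs.
    """
    ordered = sorted(samples)
    return [
        (later_ts, later_total - earlier_total)
        for (_, earlier_total), (later_ts, later_total) in zip(ordered, ordered[1:])
    ]
```

Because the input is a running total, this only gives the number of pulls between samples for whatever the total covers; it does not by itself break pulls down per tag.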
clarkbcorvus: I also wanted to mention that zuul/nodepool etc can probably look at bookworm now as the base images are present19:50
clarkbI think that will allow zuul to clean up at least one backported package install19:50
clarkb(bwrap?)19:50
corvusah cool thx19:50
tonybcorvus: FWIW I have zuul containers on my list to tackle19:51
clarkbtonyb: what we could do if we want to be super careful is tag :3.9 as :3.9-deprecated and delete :3.919:51
clarkbthen if anyone screams they can switch to the new tag and know that they should update to something else soon19:51
tonybclarkb: that'd be cool19:52
tonybMy first round of changes will just be to s/bullseye/bookworm/19:52
clarkbtonyb: let's revisit that once we're happily on bookworm and we can go back and clean things up. Also need to look at removing 3.9 builds too19:52
clarkb++19:52
clarkbdefinitely an iterative process here19:53
tonyband then do any python version bumps after that19:53
tonyband I was kinda thinking of doing 3.9 to 3.10 and then 3.10 to 3.1119:53
tonybdepending on my perception of risk / downtime19:53
clarkbThe main drawback to 3.11 is ease of testing, but now that bookworm itself is 3.11 that is less of a concern19:54
clarkb(previously you had to install extra packages on ubuntu and I think fedora/centos/rhel were all 3.10 as the newest?)19:54
clarkbdefinitely less of an issue today19:54
clarkb#topic Open Discussion19:54
clarkbWe have about 5 minutes left and I wanted to make sure we didn't miss anything else that may be important19:55
clarkbAnything else?19:55
tonybnope.19:55
fungii got nothin'19:55
tonybI can use the time to make coffee19:56
clarkbdo that!19:56
ianwsounds like clarkb gets an early mark to get the boat ready :)19:56
clarkbthank you for your time everyone!19:56
tonyband not be late for my next meeting 19:56
clarkbWe'll be back next week19:56
tonybhave fun clarkb 19:56
clarkb#endmeeting19:57
opendevmeetMeeting ended Tue Jul 18 19:57:00 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:57
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-07-18-19.01.html19:57
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-07-18-19.01.txt19:57
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-07-18-19.01.log.html19:57
clarkbI'll do my best!19:57
