clarkb | Just about meeting time | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Jul 18 19:01:06 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JVMGLDPDLQW5L3FFIKWILIJU5DJS77ES/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | A minor announcement. I'm not actually here today. The only reason that this works out is the super early (relative to local time) hour of the meeting. But the lowest tide of our trip occurs in ~5 hours so we're taking advantage of that for "tide pooling" | 19:02 |
clarkb | and it's a whole production to get the boat out before it gets stuck in the mud | 19:02 |
fungi | i can only imagine | 19:02 |
tonyb | Sounds like fun | 19:03 |
clarkb | Ya I think everyone is looking forward to it. But I definitely won't be around after the meeting today | 19:03 |
clarkb | #topic Bastion Host Updates | 19:03 |
clarkb | #link https://review.opendev.org/q/topic:bridge-backups | 19:03 |
clarkb | I think this one still deserves multiple core/root reviewers if we can manage it | 19:04 |
clarkb | fungi: frickler fyi if you have time | 19:04 |
fungi | oh yep | 19:05 |
clarkb | #topic Mailman 3 | 19:05 |
clarkb | The 429 spam seems to have gone away as quickly as it started. I don't think we made changes for that yet so the other end must've gotten bored | 19:06 |
fungi | no appreciable progress. life is starting to get out of the way and i'm working on catching back up to where i left off (new held node, et cetera) | 19:06 |
fungi | i'm wondering if documenting manual steps for adding a new domain is simpler than trying to orchestrate django for now | 19:06 |
clarkb | fungi: considering the number of domains I think that is workable | 19:07 |
clarkb | we are at ~6 today? | 19:07 |
fungi | given for the current ones we have manual import steps to perform anyway | 19:07 |
fungi | yeah | 19:07 |
clarkb | that works for me. We have manual steps elsewhere too | 19:08 |
fungi | i'll shift my focus to working out those steps through the webui in that case | 19:08 |
clarkb | sounds good. Anything else mailman related? | 19:08 |
fungi | the existing wip changes are still good for either approach | 19:08 |
fungi | nothing from me | 19:08 |
clarkb | #topic Gerrit Updates | 19:09 |
clarkb | There are a few Gerrit items I've merged into one block here | 19:09 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/885317 Build gerrit 3.7.4 and 3.8.1 images | 19:09 |
clarkb | The first is Gerrit did a whole bunch of releases over the weekend | 19:09 |
clarkb | 3.7.4 and 3.8.1 are both new and that change updates our image builds to match | 19:09 |
clarkb | We run 3.7.3 in prod so 3.7.4 will be our prod update and 3.8.1 will be used for 3.8 testing and 3.7 -> 3.8 upgrade testing | 19:10 |
clarkb | I made a note about a recorded breaking change that I'm pretty sure doesn't affect us | 19:10 |
clarkb | Note we need to manually replace the container for gerrit after that lands. It won't be automatic | 19:11 |
clarkb | Next is the leaking replication task files | 19:11 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/884779 Revert Gerrit replication task bind mount | 19:11 |
clarkb | is one option and one that we might want to combine with the 3.7.4 container replacement | 19:11 |
clarkb | since that will give Gerrit a fresh ephemeral directory for those files, then we can manually clean up the old bind mount location | 19:12 |
clarkb | the alternative is my somewhat hacky changes to add a startup script that scans all the json files and prunes them | 19:12 |
clarkb | Unfortunately no updates to my gerrit issues filed for this and they changed bug trackers so I'm not even sure my old links will work | 19:12 |
clarkb | Finally the rejection of implicit merges | 19:13 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/885318 Merge this to reflect change to All-Projects once made | 19:13 |
clarkb | fungi: Not sure if you were still planning to push that update to All-Projects | 19:13 |
fungi | oh, yes i can do that | 19:13 |
fungi | related to gerrit, zuul (as of... yesterday?) has support for the kafka event plugin too, wonder if we should consider working toward using that or stick with ssh event stream (we'd presumably still need to support the latter for existing third-party ci systems anyway, but looks like there are some resiliency benefits if we switch our zuul's connection to kafka) | 19:14 |
clarkb | the main issue with kafka is going to be running it | 19:14 |
fungi | yep | 19:14 |
corvus | the gerrit folks have been frowning at ssh for a while.... but i don't think they have plans to remove it | 19:14 |
clarkb | it is a fairly large and complicated system aiui (they even deleted zookeeper and now do that all internally) | 19:15 |
fungi | that's why it's a bit of an open question | 19:15 |
corvus | when developing the zuul stuff, i used the bitnami all-in-one container | 19:15 |
corvus | i didn't look into it much, but it might be easy enough if we want a simple system... | 19:15 |
fungi | not something we need to decide any time soon, mainly just curious | 19:15 |
corvus | but if we want multi-host, yeah, probably more work | 19:15 |
clarkb | an all in one container won't give us much extra resiliency when compared to ssh though. Except that we could potentially restart kafka less often than gerrit | 19:15 |
corvus | clarkb: exactly | 19:16 |
corvus | also, did they delete all the zk stuff? or just augment it with more complexity? | 19:16 |
corvus | i still saw a lot of "set up zk" instructions... | 19:16 |
corvus | (which i didn't follow on account of using the bitnami aio, so i don't really know) | 19:16 |
fungi | as we all know, the solution to complexity is to layer on more complexity ;) | 19:16 |
clarkb | corvus: my understanding is that kafka removed the zk dependency or is working toward that in order to do simpler/cheaper/quicker elections internally | 19:16 |
frickler | as long as we only have a single gerrit, aio kafka sounds fine | 19:17 |
clarkb | Anything else Gerrit related? | 19:18 |
tonyb | Just quickly | 19:18 |
corvus | and that reminds me, fyi, the reason gerrit supports kafka is mostly to support multi-master stuff... so that's potentially a stepping stone ... | 19:18 |
tonyb | Should I base the python updates on your 3.7.4 review for ordering? | 19:18 |
fungi | good point about the path to multi-master gerrit. i mainly saw kafka as a way to avoid losing gerrit events if zuul gets disconnected | 19:19 |
clarkb | tonyb: Yes, I think we should try to update Gerrit first since they tend to have good bugfixes and bullseye is still supported for a bit making bookworm less urgent but still an important update | 19:19 |
tonyb | ++ | 19:20 |
clarkb | #topic Server Upgrades | 19:21 |
clarkb | The 12 zuul executors are all running Jammy now | 19:21 |
corvus | i reckon i'll delete the old ones today | 19:21 |
clarkb | cool was just going to ask about that | 19:21 |
clarkb | I need to look at cleaning up the old ci registry too. Probably a tomorrow task at this point | 19:22 |
clarkb | Other than that I didn't have any news here. Anyone else have updates? | 19:22 |
* tonyb | will watch how corvus does it and then copy it for the ci-registry | 19:22 |
corvus | oh i think all the changes are done now | 19:22 |
corvus | only thing left is manually deleting them using openstack cli | 19:23 |
tonyb | although I expect the actual server destruction will be done by y'all | 19:23 |
ianw | if anyone is picking up mirrors, i do think it's worth going back to re-evaluate kafs with them | 19:23 |
clarkb | ya server destruction is a bit manual | 19:23 |
tonyb | Oh okay | 19:23 |
clarkb | ianw: ya mirrors and meetpad are next up on the todo list | 19:23 |
fungi | ianw: great reminder about kafs, thanks | 19:23 |
corvus | ianw: any reason in particular, or just lets check in since it's been a while? | 19:23 |
tonyb | ianw I can keep you in the loop on that | 19:23 |
clarkb | ianw: for kafs you were thinking we could just deploy a node and then use dns to flip back and forth as necessary? | 19:24 |
ianw | corvus: it's come a long way; i've started using it locally and it's working fine | 19:24 |
fungi | jammy upgrades means newer kernel means newer kafs | 19:24 |
ianw | yeah, i have some changes up to implement it with a flag; we could put up a trial host and do some load testing with a dns switch | 19:25 |
ianw | what i'm not 100% on is the caching layers | 19:25 |
corvus | kk | 19:25 |
ianw | and that's the type of thing we'd probably need some real loads to tune | 19:25 |
ianw | but ultimately the reason is if we can get away from external builds of openafs that would be nice | 19:26 |
clarkb | though with jammy the openafs version there seems to be working too (at least for now) | 19:26 |
clarkb | but agreed adds more flexibility across platforms and updates etc | 19:27 |
ianw | yeah, it's never a problem till it is :) | 19:27 |
ianw | i'm willing to bring up a node, etc., but will require more than just my eyes | 19:27 |
clarkb | maybe put it in one of the rax regions since that is sizeable enough for data collection, and for feeling confident that elsewhere will be happy too | 19:28 |
fungi | openafs dkms builds are back to broken in debian/unstable (seems to be related to linux 6.1 or maybe newer compiler/klibc), so i'm tempted to give kafs a whirl there | 19:29 |
clarkb | #topic Fedora cleanup | 19:30 |
clarkb | I haven't had time to look at the mirroring stuff since we last met. tonyb do you have anything to add? | 19:30 |
tonyb | No progress from me. I need to write up how I think the mirroring setup should work for review | 19:30 |
clarkb | and feel free to ping me with questions and point me to the write up when ready | 19:31 |
clarkb | #topic Storyboard | 19:31 |
clarkb | I haven't seen anything new here either but figured I would check | 19:31 |
fungi | nope | 19:31 |
clarkb | #topic Gitea 1.20 Upgrade | 19:32 |
clarkb | We did the 1.19.4 upgrade of Gitea last week | 19:32 |
clarkb | Was straightforward as expected | 19:32 |
clarkb | 1.20 is a bit more involved unfortunately | 19:32 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/886993 Gitea 1.20 change | 19:32 |
clarkb | I finally got our test suite to pass, but there are a number of todos I've noted in the commit message about stuff we should check | 19:32 |
clarkb | The main frustrations I've hit so far: 1) oauth2 is a disabled feature but we still need to configure all of its jwt stuff to avoid startup errors, which means more config and state on disk that we don't use but is required | 19:33 |
clarkb | and 2) they have changed their WORK_DIR/WORK_PATH expectations for the second time and we need to go through that and ensure we aren't orphaning data in our containers' ephemeral disk areas and instead have all that covered by bind mounts | 19:34 |
clarkb | for 2) the idea I had was we could hold a node and compare the resulting bindmounts and gitea dir locations with our prod stuff to make sure they roughly align, and if they do we should be good | 19:34 |
corvus | (this meeting moved from lunch to breakfast for clarkb) | 19:35 |
clarkb | From a feature perspective this release doesn't seem to add anything flashy which is probably good as we don't have to wrangle features on top of this | 19:35 |
clarkb | anyway I think reviews may be helpful at this point, looking over the change log from gitea and ensuring we haven't missed anything important. And I'll try to work through those TODOs as I'm able and update the change | 19:36 |
clarkb | and feel free to add more todos if you find items that need to be addressed | 19:36 |
clarkb | #topic Etherpad 1.9.1 | 19:37 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.1 | 19:37 |
clarkb | Better news here. I think I sorted out that the username and user color problems are due to a change in handling of falsey boolean config entries vs null entries in config | 19:38 |
clarkb | I made that update to our settings.json on the old held node and seemed to fix it | 19:38 |
fungi | awesome | 19:38 |
clarkb | I should have a new held node somewhere built from the code update based on that manual update | 19:38 |
clarkb | so we need to retest and check that it actually helps | 19:38 |
clarkb | also numbered lists seem to work for us | 19:38 |
clarkb | and they appear to have updated the git tag so we don't need to use a random git sha | 19:39 |
clarkb | I'm hopeful that after round two of checking we'll be in a good spot to land the update | 19:39 |
clarkb | #topic Python Container Updates | 19:40 |
clarkb | Typically we talk about this in the context of updating python versions but due to the recent Debian bookworm release we're doing base OS container updates instead | 19:40 |
clarkb | #link https://review.opendev.org/q/topic:bookworm-python | 19:41 |
clarkb | #link https://review.opendev.org/q/topic:force-base-image-build | 19:41 |
clarkb | tonyb had two specific questions listed on the agenda. | 19:41 |
clarkb | The first is a question of updating openstacksdk's old dockerfile and I think we should | 19:41 |
clarkb | we can't merge that change but can propose it to them and hopefully they approve it | 19:42 |
tonyb | They're okay to do whatever we suggest and the change is up for review | 19:42 |
frickler | #link https://review.opendev.org/c/openstack/python-openstackclient/+/888744 | 19:42 |
clarkb | and secondly, should we manually clean up the leaked zuul change_* tags in docker hub | 19:42 |
clarkb | ah cool. then ya I think they should update their base image. It should be pretty safe since this is just a client tool we run on all the operating systems with minimal OS integration | 19:43 |
clarkb | For leaked zuul change_* tags I wonder if we should write a script to clean those up and have it run against all our images | 19:43 |
clarkb | the script could check gerrit's api to see which changes are no longer open and then delete those tags | 19:44 |
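The pruning script clarkb describes could be sketched roughly like this. This is a hypothetical sketch, not an actual OpenDev tool: the `change_<number>` tag naming pattern and the `change_is_open` callback (which in a real run would query the Gerrit REST API's `GET /changes/<number>` endpoint and inspect the `status` field) are assumptions for illustration.

```python
import re

# Assumed pattern for tags leaked by speculative image builds:
# "change_<gerrit change number>_<suffix>".
CHANGE_TAG_RE = re.compile(r"^change_(\d+)")

def tags_to_prune(tags, change_is_open):
    """Return the change_* tags whose Gerrit change is no longer open.

    tags: iterable of tag names from the image repository.
    change_is_open: callable mapping a change number (int) to bool;
    a real implementation would back this with the Gerrit REST API.
    """
    prune = []
    for tag in tags:
        m = CHANGE_TAG_RE.match(tag)
        if m and not change_is_open(int(m.group(1))):
            prune.append(tag)
    return prune
```

The actual tag deletion would then go through the Docker Hub API per pruned tag; keeping the selection logic as a pure function like this makes it easy to dry-run against a repository before deleting anything.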
tonyb | With the SDK there is a "meta" question about tags: we've stopped pushing 3.x as tags, should we restart so that consumers can just use whatever we "suggest" | 19:44 |
tonyb | those tags are pretty old (buster) which isn't great | 19:45 |
clarkb | tonyb: I think with the buster -> bullseye transition we decided that was not explicit enough | 19:45 |
clarkb | end users were expected to switch to the specific OS version tags, but I'm not surprised some were missed | 19:45 |
tonyb | Okay as long as it's been considered | 19:47 |
clarkb | we didn't remove the old tags to give people the ability to transition but maybe we should consider cleaning them up eventually | 19:47 |
tonyb | Do we have any way to see how many pulls a tag is getting? | 19:48 |
clarkb | I don't know if docker exposes that to us | 19:48 |
tonyb | I wondered if it was something the org owner could see | 19:48 |
clarkb | (quay does) | 19:48 |
tonyb | I did a grep/codesearch but that only helps for opendev | 19:49 |
clarkb | https://github.com/docker/hub-feedback/issues/1047 | 19:49 |
clarkb | seems like we could fetch the total pulls at intervals and calculate the delta ourselves | 19:49 |
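The interval-sampling idea could look like the sketch below. Docker Hub only exposes a cumulative `pull_count` per repository (via `GET https://hub.docker.com/v2/repositories/<namespace>/<name>/`), not per-tag counts, so per-interval activity has to be derived from periodic snapshots; the snapshot format here is an assumption for illustration.

```python
def pull_deltas(snapshots):
    """Turn cumulative pull counts into per-interval deltas.

    snapshots: [(timestamp, cumulative_pull_count), ...] in
    chronological order, e.g. one sample per day scraped from the
    Docker Hub repositories API.
    Returns [(timestamp, pulls_since_previous_snapshot), ...].
    """
    deltas = []
    for (_, prev), (ts, curr) in zip(snapshots, snapshots[1:]):
        deltas.append((ts, curr - prev))
    return deltas
```

Note this is repository-wide, so it can show whether an image is still in use at all, but not which tag is being pulled, which is the limit tonyb ran into.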
clarkb | corvus: I also wanted to mention that zuul/nodepool etc can probably look at bookworm now as the base images are present | 19:50 |
clarkb | I think that will allow zuul to clean up at least one backported package install | 19:50 |
clarkb | (bwrap?) | 19:50 |
corvus | ah cool thx | 19:50 |
tonyb | corvus: FWIW I have zuul containers on my list to tackle | 19:51 |
clarkb | tonyb: what we could do if we want to be super careful is tag :3.9 as :3.9-deprecated and delete :3.9 | 19:51 |
clarkb | then if anyone screams they can switch to the new tag and know that they should update to something else soon | 19:51 |
tonyb | clarkb: that'd be cool | 19:52 |
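The careful retag-then-delete flow clarkb suggests could be scripted along these lines. A sketch only: the image name in the usage is hypothetical, and the final remote-tag deletion is out of scope for the docker CLI, so it is left as a note rather than a command.

```python
def deprecation_plan(image, tag):
    """Return the docker CLI invocations to publish <tag>-deprecated.

    This covers the pull/retag/push half of the flow. Deleting the old
    remote tag afterwards cannot be done with the docker CLI; it needs
    the Docker Hub API (DELETE /v2/repositories/<image>/tags/<tag>/)
    or the web UI.
    """
    src = f"{image}:{tag}"
    dst = f"{image}:{tag}-deprecated"
    return [
        ["docker", "pull", src],
        ["docker", "tag", src, dst],
        ["docker", "push", dst],
    ]
```

Returning the command list instead of running it (e.g. via `subprocess.run`) makes it trivial to review the plan for every image before anything is pushed, which fits the "super careful" intent: anyone still pulling `:3.9` gets a loud hint via the `-deprecated` tag rather than a sudden 404.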
tonyb | My first round of changes will just be to s/bullseye/bookworm/ | 19:52 |
clarkb | tonyb: let's revisit that once we're happily on bookworm and we can go back and clean things up. Also need to look at removing 3.9 builds too | 19:52 |
clarkb | ++ | 19:52 |
clarkb | definitely an iterative process here | 19:53 |
tonyb | and then do any python version bumps after that | 19:53 |
tonyb | and I was kinda thinking of doing 3.9 to 3.10 and then 3.10 to 3.11 | 19:53 |
tonyb | depending on my perception of risk / downtime | 19:53 |
clarkb | The main drawback to 3.11 is ease of testing, but now that bookworm itself is 3.11 that is less of a concern | 19:54 |
clarkb | (previously you had to install extra packages on ubuntu and I think fedora/centos/rhel were all 3.10 as the newest?) | 19:54 |
clarkb | definitely less of an issue today | 19:54 |
clarkb | #topic Open Discussion | 19:54 |
clarkb | We have about 5 minutes left and I wanted to make sure we didn't miss anything else that may be important | 19:55 |
clarkb | Anything else? | 19:55 |
tonyb | nope. | 19:55 |
fungi | i got nothin' | 19:55 |
tonyb | I can use the time to make coffee | 19:56 |
clarkb | do that! | 19:56 |
ianw | sounds like clarkb gets an early mark to get the boat ready :) | 19:56 |
clarkb | thank you for your time everyone! | 19:56 |
tonyb | and not be late for my next meeting | 19:56 |
clarkb | We'll be back next week | 19:56 |
tonyb | have fun clarkb | 19:56 |
clarkb | #endmeeting | 19:57 |
opendevmeet | Meeting ended Tue Jul 18 19:57:00 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-07-18-19.01.html | 19:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-07-18-19.01.txt | 19:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-07-18-19.01.log.html | 19:57 |
clarkb | I'll do my best! | 19:57 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!