19:01:35 <clarkb> #startmeeting infra
19:01:35 <opendevmeet> Meeting started Tue Jul 25 19:01:35 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:35 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:35 <opendevmeet> The meeting name has been set to 'infra'
19:01:50 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/LODRYKSZXR4VEE2OJF2HMJN5E5GXH3LF/ Our Agenda
19:02:00 <frickler> o/
19:02:02 <clarkb> #topic Announcements
19:02:41 <clarkb> I'll not be able to make the August 8 meeting due to travel
19:02:54 <clarkb> but I intend on running next week's meeting.
19:03:14 <clarkb> Seems like we also have service coordinator election type stuff coming up. I need to go look at timelines set in the last election
19:03:17 <fungi> i could chair on the 8th if others want a meeting
19:04:42 <clarkb> thanks. We can sort that out when we get there I guess
19:05:01 <clarkb> also the bugs are really happy with my laptop screen again
19:05:07 <clarkb> #topic Bastion Host Updates
19:05:27 <clarkb> Nothing new here other than requesting reviews on the stack that implements shared encryption key backups
19:05:34 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:06:05 <clarkb> #topic Mailman 3
19:06:24 <clarkb> Looks like there are some recent developments here. Fungi can you get us up to date?
19:07:25 <fungi> as discussed in last week's meeting, we're rolling forward with manual creation of a django site in its admin webui and association of the corresponding mailman mail domain. i stuck some notes in comments on #link https://review.opendev.org/867981
19:07:42 <fungi> i meant to stick them on a different child change. anyway i'll get that into a docs change
19:08:12 <fungi> i manually created a django site (per the notes there) for lists.zuul-ci.org and then set that as the site for the lists.zuul-ci.org mail domain
19:08:49 <fungi> i also manually applied #link https://review.opendev.org/867987 to production and restarted the containers just to make sure what we're seeing matches the earlier tests
19:09:24 <fungi> really just the edit to /var/lib/mailman/web/settings.py
19:09:33 <clarkb> fungi: will the other domains in mm3 get similar treatment once you are happy with the zuul results? Also, where does the dummy domain fit into the future planning here?
19:09:57 <fungi> i think we can dispense with the dummy domain i was originally planning to add
19:10:17 <fungi> things seem so far to be working as intended with lists.opendev.org as our first site/domain
19:10:51 <fungi> but keep an eye out for any subtle oddities in list post headers or post moderation
19:11:03 <fungi> or inconsistencies i haven't spotted in the web interfaces
19:11:07 <clarkb> cool that helps simplify things
19:11:58 <fungi> if all goes well, and once we get 867987 deployed officially, we can decide to either upgrade to latest mailman 3 with #link https://review.opendev.org/869210 or migrate more sites first
19:12:21 <clarkb> fwiw the domain listing at https://lists.opendev.org seems to show what we want after your manual changes
19:12:32 <fungi> i'm in favor of upgrading first, before we migrate more sites
19:12:46 <clarkb> makes sense. I'll need to review that change I guess
19:12:51 <fungi> just to reduce the blast radius if the upgrade goes sideways
19:13:44 <fungi> anyway, that's it for my update. nice to have a little progress on this again, hopefully it will pick up steam in the coming weeks
19:13:53 <clarkb> to summarize then we need to land 867987 to sync up with your manual changes. Then we can upgrade mm3. Then we can schedule more list moves
19:14:11 <fungi> that would be my recommendation, yes
19:14:39 <clarkb> sounds good, thanks
19:15:11 <fungi> and i saw your comments on 867987, will address those shortly
19:15:34 <clarkb> #topic Gerrit Updates
19:16:00 <clarkb> fungi updated All-Projects to reject implicit merges
19:16:27 <clarkb> be on the lookout for people finding the new behavior to be undesired. They should be able to override the setting in their own project acls but I think a global reject is safest
19:16:27 <fungi> yeah, just a heads up to keep an eye out for any problems people may encounter, though i don't expect any
19:16:57 <clarkb> Then separately we have ~3 interrelated items that we should try and coordinate around
19:17:30 <clarkb> 1) Gerrit 3.7.4 update https://review.opendev.org/c/opendev/system-config/+/885317 2) jeepyb updates needing a new gerrit image deployment and 3) replication task leaks
19:18:16 <clarkb> I think my preferred course of action would be to land 885317 and build new images if we are happy with that change. Then schedule a restart to deploy 885317 which will also address 2). During that restart we should manually clear out the replication task files (or move them aside etc)
19:19:02 <clarkb> if that sounds reasonable I should be able to help with that around 2100 UTC tomorrow or the next day etc so that we can do it during a quieter gerrit period
19:19:42 <fungi> i can do 21z tomorrow, sure
19:20:43 <clarkb> that assumes 885317 lands before then, but that seems reasonably safe to assume
19:21:06 <clarkb> I'll check in tomorrow morning then and we can take it from there. fungi maybe you can take a look at the gerrit replication stuff before then too just to have a bit of familiarity with it?
19:21:26 <fungi> sure. where are the notes on that again?
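The "move them aside" cleanup discussed for the replication task leak could look roughly like the sketch below. This is a hedged demo, not the actual procedure: it operates on a throwaway directory standing in for the real host-side path on review02 (/home/gerrit2/review_site/data/replication/ref-updates/), and the timestamped backup-directory naming is an assumption, so it is safe to run anywhere.

```shell
# Demo of "move leaked replication task files aside" rather than deleting
# them, so they can be inspected later. TASK_DIR is a throwaway stand-in
# for /home/gerrit2/review_site/data/replication/ref-updates/ on review02.
TASK_DIR=$(mktemp -d)
touch "$TASK_DIR/0000001.task" "$TASK_DIR/0000002.task"   # fake leaked tasks

# Move everything into a timestamped sibling directory while Gerrit is down.
BACKUP_DIR="${TASK_DIR}-aside-$(date +%Y%m%d%H%M%S)"
mkdir -p "$BACKUP_DIR"
find "$TASK_DIR" -mindepth 1 -maxdepth 1 -type f -exec mv {} "$BACKUP_DIR"/ \;

echo "moved $(find "$BACKUP_DIR" -type f | wc -l) task files"
```

On the real server the same move would happen with Gerrit stopped, so the plugin cannot pick the files back up mid-cleanup.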
19:21:48 <fungi> never mind, i see the links in the agenda
19:21:56 <clarkb> probably mostly in the changes i wrote around it and the upstream issues
19:22:10 <clarkb> /home/gerrit2/review_site/data/replication/ref-updates/ is the host side path for the contents on review02
19:22:11 <fungi> yep, will refresh my noodle on those
19:22:17 <clarkb> they are in a different path on the container side
19:22:24 <clarkb> thanks
19:22:51 <clarkb> That was all I had for Gerrit.
19:22:58 <clarkb> #topic Server Upgrades
19:23:19 <clarkb> I continue to have no progress here myself. I saw that corvus deleted the old zuul executors though so that is good cleanup
19:23:42 <corvus> yup, all gone, and zuul still thinks it has 12 executors so i think i got the right ones :)
19:23:50 <clarkb> excellent
19:24:09 <corvus> that's zm, and ze upgraded to jammy now
19:24:29 <corvus> also the registry is jammy
19:24:49 <clarkb> only the schedulers remain in zuul land
19:24:55 <fungi> and the lb
19:24:56 <corvus> yep
19:25:39 <clarkb> Anything else on the subject of server upgrades?
19:26:26 <clarkb> #topic Fedora Cleanup
19:27:01 <clarkb> Sometime in the last couple days (the dateline makes it confusing) tonyb and I discussed making progress on this
19:27:20 <clarkb> Basically plan is to copy roles as necessary, make modifications, point base-test at modified roles and test from there
19:27:30 <clarkb> specifically for mirror selection updates to fedora
19:27:39 <clarkb> so that we can stop mirroring fedora before cleaning up the images entirely
19:28:03 <clarkb> I don't know how far tonyb got on that. I know tony wanted to test things a bit locally before pushing stuff up (since our mirrors are public anyway this is something that can be done)
19:29:08 <clarkb> Hopefully we'll have changes we can review soon
19:29:14 <clarkb> #topic Storyboard
19:29:39 <clarkb> Anything new to report here? Should we drop it out of the meeting agenda until we've had time to process any next steps?
19:30:00 <frickler> yes, I was thinking the same
19:30:10 <fungi> wfm unless something changes
19:30:12 <clarkb> The needed gerrit restart is related to updating issues correctly but only the lp side I think
19:30:31 <fungi> right, the sb equivalent has remained working
19:30:50 <clarkb> ack
19:30:55 <clarkb> Let's move on then
19:30:59 <fungi> because it uses the its plugin instead of hook scripts in jeepyb
19:31:27 <clarkb> #topic Gitea 1.20 upgrade
19:32:00 <clarkb> Gitea has released a 1.20.1 version now (they appear to have backpedalled on allowing all url types in markdown by default for security reasons)
19:32:30 <clarkb> I'm still trying to work through the breaking changes list and a list of unallowed url types was on the todo. Hopefully 1.20.1 makes that simpler
19:32:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/886993 Gitea 1.20 change.
19:33:01 <clarkb> Unfortunately I'm struggling to collect the access log files which is why that change currently fails in ci. I think I need to hold the node and check directly
19:33:18 <clarkb> (Access log format changes in 1.20 and I want to confirm we still get something useful from it)
19:33:59 <clarkb> The other item on the todo list that will need help is the theme color selection change. I'm not quite parsing the intended update given by the release notes
19:34:11 <clarkb> we need to set some sort of attributes in the base template or something
19:34:39 <clarkb> No progress on the oauth2 jwt stuff unfortunately
19:34:50 <clarkb> at least not according to release notes. I guess I can test that when I hold a node
19:35:08 <clarkb> #topic Etherpad 1.9.1
19:35:18 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.1
19:35:35 <clarkb> There is a held node if you look at the child change of ^ and pull up the logs for the failed test run (sorry I don't have that in front of me right now)
19:35:54 <clarkb> After making the config change from false to null values for color and username the UI appears to act like it did in 1.8.x
19:36:24 <clarkb> I did some simple testing with the held node and I think that 1.9.1 change is ready for review/testing by others and hopefully merging
19:37:14 <clarkb> #topic Python Container Image Updates
19:37:23 <clarkb> #link https://review.opendev.org/q/topic:bookworm-python
19:37:30 <clarkb> #link https://review.opendev.org/q/topic:force-base-image-build
19:37:39 <clarkb> I think this is moving along well. Mostly needs reviews at this point
19:38:02 <clarkb> Note the limnoria image update should be approved during a period where we don't have any meetings running as the image update will restart the bot and interrupt any meetings
19:38:58 <fungi> yeah, i keep forgetting to do that at an appropriate time, i only ever remember about it at inconvenient times
19:39:30 <clarkb> heh ya I theoretically have time here but so many distractions
19:40:20 <clarkb> I've also made a note in the agenda that we should consider deleting our old bare python version tags so that image builds for anything still using them fail forcing a move to the distro + python version tags
19:40:36 <clarkb> and a zuul-jobs tool that can scan docker hub and gerrit to determine if change tags are leaked and can be cleaned up
19:41:06 <clarkb> I can probably look at that second thing as it is probably similarish to the script we used to sync from docker hub to quay and back again
19:41:33 <clarkb> #topic Meetpad LE Cert Refresh
19:42:04 <clarkb> As fungi pointed out we have a cert but the handler that syncs from the acme path on disk to the jitsi meet service path on disk didn't run so we don't have a new cert where we need it
19:42:13 <clarkb> this means simply restarting services won't fix this
19:42:47 <clarkb> instead we should determine what is necessary to rerun the acme system in its entirety so that it either succeeds and we win or it fails and we can debug further
19:42:53 <fungi> frickler was the one who spotted that actually
19:43:27 <clarkb> alternatively we can manually copy the file and manually restart services
19:43:46 <frickler> I'm not sure if the rerun will work if LE thinks the cert has already been refreshed
19:43:51 <clarkb> ianw: if you get a chance to see this, what do you think is the preferred way to trigger a cert refresh from scratch
19:44:04 <frickler> so the second option would be my preferred choice
19:44:04 <clarkb> frickler: ya there are some things we need to do to force the system to rerun again
19:44:25 <clarkb> if ianw remembers those without us needing to dig into the acme stuff we should probably document it and attempt that with meetpad
19:44:35 <fungi> in theory, the staged cert being absent would look like a newly-deployed server to acme.sh, right? which should force it to request a new cert?
19:44:43 <clarkb> I think there is some record file it keeps that we can either edit a timestamp in or delete the file then it retriggers
19:45:00 <fungi> ahh, so there's more retained state than just the cert
19:45:04 <clarkb> fungi: maybe? I think it keeps some state info in a config/metadata file though
19:45:11 <clarkb> it may check both things
19:45:58 <clarkb> side note we should follow up with acme.sh on fixing that bug that prevented us from upgrading to latest resulting in backporting patches instead
19:47:06 <clarkb> I'll try to take a look at that stuff and if I hit a wall we can do the manual sync and restart for now
19:47:15 <clarkb> #topic Open Discussion
19:47:18 <clarkb> Anything else?
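The manual fallback discussed for meetpad (copy the already-issued cert from the acme path to the service path, then restart) could be sketched as below. Everything here is a stand-in: the real acme.sh output path and jitsi-meet cert path are not stated in the discussion, so throwaway directories are used, and the restart command at the end is a guess shown only as a comment.

```shell
# Sketch of the manual cert sync the handler would normally perform:
# copy the fresh cert from the acme.sh side to the service side only if
# it is newer. Throwaway directories stand in for the real (unstated)
# paths on meetpad so this demo is runnable anywhere.
ACME_DIR=$(mktemp -d)      # stand-in for the acme.sh cert output path
SERVICE_DIR=$(mktemp -d)   # stand-in for the jitsi-meet cert path
echo "new-cert" > "$ACME_DIR/fullchain.pem"
echo "old-cert" > "$SERVICE_DIR/fullchain.pem"
touch -d '2001-01-01' "$SERVICE_DIR/fullchain.pem"   # make the service copy stale

# Copy only when the acme copy is newer, mirroring the handler's intent.
if [ "$ACME_DIR/fullchain.pem" -nt "$SERVICE_DIR/fullchain.pem" ]; then
  cp "$ACME_DIR/fullchain.pem" "$SERVICE_DIR/fullchain.pem"
fi
cat "$SERVICE_DIR/fullchain.pem"

# On the real host this would be followed by restarting the containers,
# e.g. something like (command and compose file path are guesses):
#   docker-compose -f /etc/jitsi-meet/docker-compose.yaml restart
```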
19:49:03 <fungi> oh, one change
19:49:19 <fungi> #link https://review.opendev.org/888901 Add 32GB Ubuntu Focal and Jammy nodes
19:49:38 <fungi> that basically just duplicates the existing bionic nodes in vexxhost ca-ymq-1
19:49:56 <fungi> so probably uncontroversial, but i figure it might need another set of eyes/opinions
19:50:17 <clarkb> ya I think we can go ahead and approve that. I'll do that as soon as I'm done with the meeting
19:51:07 <fungi> cool
19:51:48 <clarkb> sounds like that may be everything then. Thank you everyone!
19:51:56 <clarkb> #endmeeting