19:00:16 <clarkb> #startmeeting infra
19:00:16 <opendevmeet> Meeting started Tue Nov 12 19:00:16 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:16 <opendevmeet> The meeting name has been set to 'infra'
19:00:27 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/HQSCECQODT5XIHWR633MLLITCB3FG243/ Our Agenda
19:00:44 <clarkb> sorry for getting the agenda out late this week. I was out yesterday so sent it first thing today
19:01:08 <clarkb> #topic Announcements
19:01:37 <clarkb> I didn't have anything
19:02:09 <clarkb> I expect we'll have our regularly scheduled meeting next week and the week after. There is a slight possibility the one the week after may run into thanksgiving plans and get skipped but I have no plans that would do so at this point
19:02:17 <clarkb> #topic Zuul-launcher image builds
19:02:46 <clarkb> corvus: last week you indicated that we needed additional changes in zuul as well as needing testing for raw image upload/download timing
19:02:50 <clarkb> any updates on those items?
19:03:04 <corvus> nothing since last week
19:03:29 <corvus> slowly working on the upstream zuul changes; not started on the opendev-specific changes
19:04:12 <clarkb> thanks. So to recap on the opendev side we can add more image builds (in addition to bullseye) and test image builds using raw images instead of qcow2 to see what the timing of that looks like
19:04:50 <corvus> yep -- though note that one of the needed upstream changes is making the zuul-launcher download faster; so raw upload timings would be useful, but download timings are not optimized yet.
19:05:47 <clarkb> got it
19:05:51 <clarkb> anything else on this subject?
19:06:01 <corvus> not from me
19:06:16 <clarkb> #topic Backup Server Pruning
19:06:59 <clarkb> as mentioned last week ianw pushed up a change to make automated retirement and purging of backups from ansible possible. It requires us to explicitly list things to retire then purge but is nice for record keeping in comparison to what I did manually
19:07:08 <clarkb> the underlying process of removing backups ends up being very similar though
19:07:16 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/933700 Backup deletions managed through Ansible
19:07:34 <clarkb> this is the primary change to do that. I'm happy with it but did write a followup to fix a minor issue we're already hitting after my manual cleanups
19:07:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/934768 Handle backup verification for purged backups
19:08:21 <clarkb> maybe we can get those reviewed and landed then continue with cleanups using this system? I suspect I'll have to manually bring ethercalc02 into the same state on the other backup server but that isn't a big deal
19:09:03 <clarkb> if we're happy with that I'll update my documentation change as well to refer to this system
19:09:19 <clarkb> or just abandon it if it doesn't serve a purpose
19:10:14 <fungi> worth noting, it seems like the "backup inconsistency" warnings we keep receiving for ethercalc are sent weekly
19:10:27 <clarkb> fungi: yup that second change should address that
19:10:28 <fungi> so it wasn't just a one-time event
19:10:32 <fungi> cool
19:10:43 <clarkb> (I have to touch .retired in the ethercalc02 dir to make that work but then it should be handled)
19:11:52 <clarkb> #topic Upgrading old servers
19:12:08 <clarkb> I don't think there is anything new on this topic. But I'll leave it open for a minute or two for others to jump in with updates if we have them
19:14:01 <clarkb> #topic Docker compose plugin with podman service for servers
19:14:21 <clarkb> similar situation with this topic. I'm unaware of any updates but will leave it open for a couple minutes if there are any
19:16:27 <clarkb> #topic Enabling mailman3 bounce processing
19:16:34 <clarkb> there are updates on this topic.
19:16:56 <clarkb> As promised I configured service-discuss to enable bounce processing. I didn't change any of the defaults as they seemed like a reasonable place to start.
19:17:19 <clarkb> This list is very low traffic so nothing really happened until I sent the meeting agenda email earlier today. That resulted in two list members having non zero scores
19:17:52 <clarkb> This is a good indication that bounce processing is unlikely to take immediate drastic action but also a more active list like openstack-discuss might more quickly trend towards removing people
19:18:06 <clarkb> in any case I think I'm comfortable with proceeding with enabling this on more (all?) lists if others are
19:18:17 <frickler> +1
19:18:58 <clarkb> fungi: you moderate many lists any preference on approach here? Should we try to do it on a list by list basis via moderator action or something else?
19:20:27 <fungi> i think it's probably fine to roll out to more lists at this point
19:21:18 <clarkb> fungi: did you want to pick some you moderate and do that? I can apply it to the other opendev lists
19:21:26 <clarkb> corvus: I guess you might consider doing it for the zuul lists too
19:21:53 <fungi> sure, i can
19:21:57 <corvus> do we need to manually do it for all lists?  is there a default for new lists?
19:22:28 <clarkb> corvus: we currently don't configure it when creating new lists, but in theory we can update list creation to enable it on them. But ya we don't really manage list settings post creation
19:22:33 <fungi> it's possible the default for new lists is already on, and this is migrated lists we're changing? i'd need to look
19:22:40 <clarkb> oh ya that could be too.
19:23:16 <corvus> ack... my view is that this should be enabled for all existing and new lists without further delay (but, also, without urgency).
19:23:22 <clarkb> but ya we should enable it by default on new lists as part of this process. I'm less sure if there is value in automating enablement in existing lists
19:23:28 <clarkb> corvus: ack thanks
19:23:33 <corvus> so whatever the best method to achieve that is... :)
19:24:13 <fungi> i'll take a look at the settings for the new list added by change 924432 in july
19:24:31 <fungi> but it'll take me a few minutes since i need to use the superuser on it
19:25:43 <fungi> "Process Bounces" is "no" there
19:26:00 <clarkb> ok so default is off on list creation but it should be possible to switch that to on somehow
19:26:13 <fungi> so i guess we have it off by default for new lists, i'll see if we have that set in config
19:26:19 <clarkb> and then we leave existing lists alone so we'll need to either manually toggle them or write some tool to go update each of them separately
19:26:48 <clarkb> corvus: more generally I think it should be safe for list moderators to toggle the setting on the lists they manage
19:26:58 <clarkb> and if we do automate it for everything we should noop on those
19:27:24 <corvus> ack; i'll manually flip the bit for zuul lists
19:28:43 <corvus> erm, what's the option name? :)
19:29:21 <clarkb> corvus: you go to settings then bounce processing then flip it to enabled
19:29:31 <corvus> https://imgur.com/FwNWrOI
19:29:32 <corvus> that one?
19:29:46 <clarkb> yes
19:29:57 <corvus> ok; that was already set for zuul lists
19:30:08 <clarkb> oh more data backing up that this should be fine
19:30:40 <clarkb> anyway we can move on I think we've got rough agreement to go ahead and do this, we can update new list creation then sort out existing lists as we go
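The "enable everywhere, noop where already on" approach agreed above could be automated roughly like this (a hedged sketch: the `mailmanclient` usage is an assumption from memory, and `needs_enable` is a made-up helper; `process_bounces` is the Mailman 3 setting seen in the screenshot discussion):

```python
# Sketch: turn on bounce processing for every list via the Mailman 3
# REST API, skipping lists (like the zuul ones) where it is already set.
# mailmanclient usage is an assumption; needs_enable() is hypothetical.

def needs_enable(settings):
    """Return True if a list's settings still have bounce processing off."""
    return not settings.get("process_bounces", False)

def enable_bounce_processing(client):
    """Flip process_bounces on for all lists reachable via the client."""
    for mlist in client.lists:
        settings = mlist.settings
        if needs_enable(settings):
            settings["process_bounces"] = True
            settings.save()

# Hypothetical usage against a local Mailman 3 REST API:
#   from mailmanclient import Client
#   enable_bounce_processing(
#       Client("http://localhost:8001/3.1", "restadmin", "restpass"))
```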
19:30:45 <clarkb> #topic Intermediate Insecure CI Registry Pruning
19:31:03 <clarkb> The intermediate registry / insecure ci registry seems to be more stable now after updating its installation
19:31:26 <clarkb> after discussing needed pruning last week we did some code review and found a bugfix as well as added a dry run option
19:31:44 <clarkb> the plan is still to do a proper pruning on Friday as announced but I'm curious if we want to do a dry run first say tomorrow?
19:32:00 <clarkb> or do we think it is safest to do the dry run in the announced window which is ~Friday
19:32:56 <corvus> i think dry-run now sounds good
19:33:21 <clarkb> ok I'll work on figuring that out probably first thing tomorrow. (I only took one day off yesterday but I returned to a fairly big backlog I'm digging through today)
19:33:30 <corvus> i mean, worst case if i completely botched it is that we accidentally delete some temporary registry things and there's a small blip in jobs which can be corrected by rechecking.
19:33:31 <frickler> this might also be affected by the rax identity issues, though?
19:33:44 <clarkb> frickler: oh yes good point
19:33:51 <clarkb> so waiting for that to resolve is a good idea too
19:34:42 <clarkb> hopefully by tomorrow morning that will be happy and I'll start a dry run in screen on the registry node
19:34:55 <clarkb> I am hopeful that the bugfix means this will just work (tm)
19:35:25 <corvus> yep, it's a plausible explanation and i think we can reset to "assume it works and debug what doesn't"
19:35:42 <clarkb> #topic Gerrit 3.10 Upgrade Planning
19:35:48 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.10 Gerrit upgrade planning document
19:36:18 <clarkb> I would like to announce this upgrade soon if possible. Does the currently pencilled in time of Friday December 6 starting at 1600 UTC not work for some reason?
19:37:26 <fungi> wfm
19:37:49 <clarkb> ya I'm not hearing any concerns I'll work on sending that email out later today unless something comes up before then
19:38:14 <clarkb> Other than announcing the upgrade the other current todo is simply going through the etherpad and ensuring we're comfortable with the changes/updates and any mitigations we might have
19:38:59 <clarkb> so please look over the etherpad, add notes or concerns if you have them and I'll try to regularly review it and address them
19:39:11 <clarkb> otherwise this seems like we're on track for upgrading on the 6th
19:40:28 <clarkb> #topic RTD Build Trigger Requests
19:40:47 <clarkb> after more debugging ianw is of the opinion that we're getting hit by client fingerprinting in the CDN layer
19:41:03 <clarkb> one thought is that simply using a different client (like curl) may sidestep the issue
19:41:05 <clarkb> #link https://review.opendev.org/c/zuul/zuul-jobs/+/934243 switch to curl instead of ansible uri module
19:41:45 <ianw> yeah, they linked to  a post about their anti-bot things
19:42:06 <ianw> https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/
19:42:23 <clarkb> #link https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/
19:42:52 <clarkb> I'm actually not sure if curl is in our executor images
19:43:06 <clarkb> we should check that before approving I guess but otherwise that seems like a reasonable workaround to me
19:43:07 <fungi> it's present on the executors themselves at least (i checked)
19:43:27 <clarkb> this would run within the container I think
19:43:38 <ianw> i think that job might start a node but not use it?
19:43:50 <fungi> ah, i'm always confused as to whether ansible is running things from inside the zuul-executor container or the distro
19:44:12 <fungi> ianw: yeah, that came up separately, for some reason it has a default nodeset instead of an empty one
19:44:18 <clarkb> https://opendev.org/openstack/project-config/src/branch/master/playbooks/publish/trigger-rtd.yaml#L2 it runs on localhost in the job not sure if the job has a nodeset
19:44:44 <corvus> docker run --rm -it quay.io/zuul-ci/zuul-executor:latest bash
19:44:44 <corvus> root@8f99eae3145d:/# curl
19:44:44 <corvus> curl: try 'curl --help' or 'curl --manual' for more information
19:44:54 <clarkb> https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L1108-L1127
19:44:58 <clarkb> it does have a nodeset I think
19:45:14 <clarkb> corvus: ok cool so it should work then we can also use an empty nodeset in the job
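The workaround proposed in 934243 is roughly of this shape (a hedged sketch, not the actual change; the variable names and task wording are illustrative, and the endpoint shown is the Read the Docs API v3 build-trigger URL):

```yaml
# Sketch: trigger the RTD build with curl rather than the Ansible uri
# module, so the CDN sees a different client fingerprint. Variable
# names (rtd_api_token, rtd_project) are illustrative, not the job's.
- name: Trigger Read the Docs build via curl
  command: >-
    curl --fail --silent --show-error
    --request POST
    --header "Authorization: Token {{ rtd_api_token }}"
    "https://readthedocs.org/api/v3/projects/{{ rtd_project }}/builds/"
  delegate_to: localhost
```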
19:46:13 <clarkb> #topic promote-openstack-manuals-developer fails with Ansible errors
19:46:25 <clarkb> This is a different type of job error that occurs due to variables being undefined
19:46:30 <clarkb> #link https://zuul.opendev.org/t/openstack/build/1a84db5d173c4777b9d730923721b04a Example failure
19:47:01 <clarkb> part of the problem here is that this promote job works for a special set of developer docs that aren't part of the main docs.openstack.org site so they have special everything and in this case something isn't quite right
19:47:36 <clarkb> I suspect that fixing this is going to require someone to page in all of the things that make this different and then apply that perspective to the jobs and add the missing bits?
19:48:02 <frickler> note that the last known successful run of that job was > 3y ago. and I assume the regression may have been triggered by a change in zuul almost that old
19:48:42 <frickler> https://opendev.org/zuul/zuul/commit/be50a6ca42c41c0608dd02930a01123afd4e6064
19:48:42 <clarkb> ya at this point I doubt that we want to dig through the deltas in what changed to break it; instead we just need to understand it properly, determine why it is broken, and roll forward
19:49:17 <clarkb> personally I've always argued that developer.openstack.org should've been docs.openstack.org/developer and use the same systems as exist for docs
19:49:37 <clarkb> this is I think evidence for why this is a good idea but also that ship sailed a long time ago and the best thing is to simply figure out what needs to be changed to make it work
19:51:23 <clarkb> is anyone interested in debugging this further? I know fungi took a stab at it.
19:51:37 <clarkb> also I'm not sure there is anything opendev or zuul specific about it other than that is where the failure is originating
19:51:49 <clarkb> it should be solvable by anyone reading the jobs and error message?
19:51:50 <fungi> i've unfortunately already paged out 99% of the context there. pulled in too many directions
19:52:27 <frickler> the error comes from some data that iiuc is coming from a secret
19:52:28 <fungi> the only reason i even started looking at it was because i noticed the site mentioned and linked to trystack.org which hasn't existed for years, and i wanted to get rid of that dead link
19:52:36 <frickler> so not easy to debug for an outsider IMO
19:52:52 <clarkb> frickler: outsiders have the same info that we do when it comes to secrets in zuul though?
19:52:59 <clarkb> like I don't generally go off and decrypt things
19:53:09 <clarkb> (in fact I'm not sure I ever have)
19:53:15 <frickler> it may be needed in this case, though
19:53:44 <frickler> but anyway, I can look further into this, but with low priority
19:53:45 <clarkb> yes it is possible this would be that case
19:53:50 <clarkb> ok thanks
19:53:57 <clarkb> #topic Open Discussion
19:53:59 <clarkb> Anything else?
19:54:44 <frickler> someone mentioned connectivity issues to vexxhost IPv6 earlier today
19:54:57 <fungi> the openinfra foundation gained control of the openinfra.org domain and is going to start working to relocate their various sites out of the google-controlled .dev tld
19:55:08 <frickler> I didn't look closer yet, but it seems to be a recurrence of the issue we had earlier
19:55:10 <fungi> i'm putting together a plan to migrate lists.openinfra.dev to lists.openinfra.org
19:55:13 <clarkb> and jayf mentioned connectivity issues that I think were actually slowness (connections are logged in sshd log but things took longer than expected)
19:55:55 <frickler> I did confirm the "no route to host" from AS3320 (Deutsche Telekom, german incumbent ISP)
19:56:00 <fungi> i'm shooting for migrating that mailman site the first week of december, so probably sending an announcement to the foundation mailing list about it on monday. i'll circulate an etherpad with the planned steps in the coming days
19:56:27 <clarkb> fungi: thanks for the heads up
19:56:45 <frickler> fungi: can we keep a redirect from the old site?
19:56:56 <fungi> the change on the mailman side is pretty simple because it's all in mariadb, so just some update queries (ideally with the services temporarily offline)
19:57:10 <fungi> frickler: yeah, i plan to keep the old urls and addresses working indefinitely
19:57:10 <frickler> or do they want to drop the .dev domain?
19:57:30 <fungi> they just want .org to be the official domain, but will retain control of the .dev one
19:57:37 <frickler> cool +1
19:57:59 <fungi> there are no plans to drop it, so we can keep redirects and address aliases for ~ever
19:58:33 <fungi> i'll take care of the changes to add the redirects, aliases, config update, et cetera
19:58:44 <corvus> #link   https://review.opendev.org/c/openstack/project-config/+/934832 Fix openstack developer docs promote job [NEW]
19:58:56 <corvus> i did that with no special knowledge
19:59:07 <corvus> i just read the error message and looked at the secret def
19:59:09 <corvus> hope it works
20:00:07 <clarkb> and we are at time
20:00:10 <clarkb> thank you everyone
20:00:16 <clarkb> We'll be back next week same time and location
20:00:18 <clarkb> #endmeeting