19:00:16 #startmeeting infra
19:00:16 Meeting started Tue Nov 12 19:00:16 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:16 The meeting name has been set to 'infra'
19:00:27 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/HQSCECQODT5XIHWR633MLLITCB3FG243/ Our Agenda
19:00:44 sorry for getting the agenda out late this week. I was out yesterday so sent it first thing today
19:01:08 #topic Announcements
19:01:37 I didn't have anything
19:02:09 I expect we'll have our regularly scheduled meeting next week and the week after. There is a slight possibility the one the week after may run into thanksgiving plans and get skipped but I have no plans that would do so at this point
19:02:17 #topic Zuul-launcher image builds
19:02:46 corvus: last week you indicated that we needed additional changes in zuul as well as needing testing for raw image upload/download timing
19:02:50 any updates on those items?
19:03:04 nothing since last week
19:03:29 slowly working on the upstream zuul changes; not started on the opendev-specific changes
19:04:12 thanks. So to recap on the opendev side we can add more image builds (in addition to bullseye) and test image builds using raw images instead of qcow2 to see what the timing of that looks like
19:04:50 yep -- though note that one of the needed upstream changes is making the zuul-launcher download faster; so raw upload timings would be useful, but download timings are not optimized yet.
19:05:47 got it
19:05:51 anything else on this subject?
19:06:01 not from me
19:06:16 #topic Backup Server Pruning
19:06:59 as mentioned last week ianw pushed up a change to make automated retirement and purging of backups from ansible possible.
It requires us to explicitly list things to retire then purge but is nice for record keeping in comparison to what I did manually
19:07:08 the underlying process of removing backups ends up being very similar though
19:07:16 #link https://review.opendev.org/c/opendev/system-config/+/933700 Backup deletions managed through Ansible
19:07:34 this is the primary change to do that. I'm happy with it but did write a followup to fix a minor issue we're already hitting after my manual cleanups
19:07:39 #link https://review.opendev.org/c/opendev/system-config/+/934768 Handle backup verification for purged backups
19:08:21 maybe we can get those reviewed and landed then continue with cleanups using this system? I suspect I'll have to manually bring ethercalc02 into the same state on the other backup server but that isn't a big deal
19:09:03 if we're happy with that I'll update my documentation change as well to refer to this system
19:09:19 or just abandon it if it doesn't serve a purpose
19:10:14 worth noting, it seems like the "backup inconsistency" warnings we keep receiving for ethercalc are sent weekly
19:10:27 fungi: yup that second change should address that
19:10:28 so it wasn't just a one-time event
19:10:32 cool
19:10:43 (I have to touch .retired in the ethercalc02 dir to make that work but then it should be handled)
19:11:52 #topic Upgrading old servers
19:12:08 I don't think there is anything new on this topic. But I'll leave it open for a minute or two for others to jump in with updates if we have them
19:14:01 #topic Docker compose plugin with podman service for servers
19:14:21 similar situation with this topic. I'm unaware of any updates but will leave it open for a couple minutes if there are any
19:16:27 #topic Enabling mailman3 bounce processing
19:16:34 there are updates on this topic.
19:16:56 As promised I configured service-discuss to enable bounce processing. I didn't change any of the defaults as they seemed like a reasonable place to start.
19:17:19 This list is very low traffic so nothing really happened until I sent the meeting agenda email earlier today. That resulted in two list members having non-zero scores
19:17:52 This is a good indication that bounce processing is unlikely to take immediate drastic action but also a more active list like openstack-discuss might more quickly trend towards removing people
19:18:06 in any case I think I'm comfortable with proceeding with enabling this on more (all?) lists if others are
19:18:17 +1
19:18:58 fungi: you moderate many lists any preference on approach here? Should we try to do it on a list by list basis via moderator action or something else?
19:20:27 i think it's probably fine to roll out to more lists at this point
19:21:18 fungi: did you want to pick some you moderate and do that? I can apply it to the other opendev lists
19:21:26 corvus: I guess you might consider doing it for the zuul lists too
19:21:53 sure, i can
19:21:57 do we need to manually do it for all lists? is there a default for new lists?
19:22:28 corvus: we currently don't configure it when creating new lists, but in theory we can update list creation to enable it on them. But ya we don't really manage list settings post creation
19:22:33 it's possible the default for new lists is already on, and this is migrated lists we're changing? i'd need to look
19:22:40 oh ya that could be too.
19:23:16 ack... my view is that this should be enabled for all existing and new lists without further delay (but, also, without urgency).
19:23:22 but ya we should enable it by default on new lists as part of this process. I'm less sure if there is value in automating enablement in existing lists
19:23:28 corvus: ack thanks
19:23:33 so whatever the best method to achieve that is...
:)
19:24:13 i'll take a look at the settings for the new list added by change 924432 in july
19:24:31 but it'll take me a few minutes since i need to use the superuser on it
19:25:43 "Process Bounces" is "no" there
19:26:00 ok so default is off on list creation but it should be possible to switch that to on somehow
19:26:13 so i guess we have it off by default for new lists, i'll see if we have that set in config
19:26:19 and then we leave existing lists alone so we'll need to either manually toggle them or write some tool to go update each of them separately
19:26:48 corvus: more generally I think it should be safe for list moderators to toggle the setting in the lists they manage
19:26:58 and if we do automate it for everything we should noop on those
19:27:24 ack; i'll manually flip the bit for zuul lists
19:28:43 erm, what's the option name? :)
19:29:21 corvus: you go to settings then bounce processing then flip it to enabled
19:29:31 https://imgur.com/FwNWrOI
19:29:32 that one?
19:29:46 yes
19:29:57 ok; that was already set for zuul lists
19:30:08 oh more data backing up that this should be fine
19:30:40 anyway we can move on I think we've got rough agreement to go ahead and do this, we can update new list creation then sort out existing lists as we go
19:30:45 #topic Intermediate Insecure CI Registry Pruning
19:31:03 The intermediate registry / insecure ci registry seems to be more stable now after updating its installation
19:31:26 after discussing needed pruning last week we did some code review and found a bugfix as well as added a dry run option
19:31:44 the plan is still to do a proper pruning on Friday as announced but I'm curious if we want to do a dry run first say tomorrow?
19:32:00 or do we think it is safest to do the dry run in the announced window which is ~Friday
19:32:56 i think dry-run now sounds good
19:33:21 ok I'll work on figuring that out probably first thing tomorrow.
(I only took one day off yesterday but I returned to a fairly big backlog I'm digging through today)
19:33:30 i mean, worst case if i completely botched it is that we accidentally delete some temporary registry things and there's a small blip in jobs which can be corrected by rechecking.
19:33:31 this might also be affected by the rax identity issues, though?
19:33:44 frickler: oh yes good point
19:33:51 so waiting for that to resolve is a good idea too
19:34:42 hopefully by tomorrow morning that will be happy and I'll start a dry run in screen on the registry node
19:34:55 I am hopeful that the bugfix means this will just work (tm)
19:35:25 yep, it's a plausible explanation and i think we can reset to "assume it works and debug what doesn't"
19:35:42 #topic Gerrit 3.10 Upgrade Planning
19:35:48 #link https://etherpad.opendev.org/p/gerrit-upgrade-3.10 Gerrit upgrade planning document
19:36:18 I would like to announce this upgrade soon if possible. Does the currently pencilled in time of Friday December 6 starting at 1600 UTC not work for some reason?
19:37:26 wfm
19:37:49 ya I'm not hearing any concerns I'll work on sending that email out later today unless something comes up before then
19:38:14 Other than announcing the upgrade the other current todo is simply going through the etherpad and ensuring we're comfortable with the changes/updates and any mitigations we might have
19:38:59 so please look over the etherpad, add notes or concerns if you have them and I'll try to regularly review it and address them
19:39:11 otherwise this seems like we're on track for upgrading on the 6th
19:40:28 #topic RTD Build Trigger Requests
19:40:47 after more debugging ianw is of the opinion that we're getting hit by client fingerprinting in the CDN layer
19:41:03 one thought is that simply using a different client (like curl) may sidestep the issue
19:41:05 #link https://review.opendev.org/c/zuul/zuul-jobs/+/934243 switch to curl instead of ansible uri module
19:41:45 yeah, they linked to a post about their anti-bot things
19:42:06 https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/
19:42:23 #link https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/
19:42:52 I'm actually not sure if curl is in our executor images
19:43:06 we should check that before approving I guess but otherwise that seems like a reasonable workaround to me
19:43:07 it's present on the executors themselves at least (i checked)
19:43:27 this would run within the container I think
19:43:38 i think that job might start a node but not use it?
19:43:50 ah, i'm always confused as to whether ansible is running things from inside the zuul-executor container or the distro
19:44:12 ianw: yeah, that came up separately, for some reason it has a default nodeset instead of an empty one
19:44:18 https://opendev.org/openstack/project-config/src/branch/master/playbooks/publish/trigger-rtd.yaml#L2 it runs on localhost in the job not sure if the job has a nodeset
19:44:44 docker run --rm -it quay.io/zuul-ci/zuul-executor:latest bash
19:44:44 root@8f99eae3145d:/# curl
19:44:44 curl: try 'curl --help' or 'curl --manual' for more information
19:44:54 https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L1108-L1127
19:44:58 it does have a nodeset I think
19:45:14 corvus: ok cool so it should work then we can also use an empty nodeset in the job
19:46:13 #topic promote-openstack-manuals-developer fails with Ansible errors
19:46:25 This is a different type of job error that occurs due to variables being undefined
19:46:30 #link https://zuul.opendev.org/t/openstack/build/1a84db5d173c4777b9d730923721b04a Example failure
19:47:01 part of the problem here is that this promote job works for a special set of developer docs that aren't part of the main docs.openstack.org site so have special everything and in this case something isn't quite right
19:47:36 I suspect that fixing this is going to require someone to page in all of the things that make this different and then apply that perspective to the jobs and add the missing bits?
19:48:02 note that the last known successful run of that job was > 3y ago.
and I assume the regression may have been triggered by a change in zuul almost that old
19:48:42 https://opendev.org/zuul/zuul/commit/be50a6ca42c41c0608dd02930a01123afd4e6064
19:48:42 ya at this point I doubt that we're trying to dig down deltas in what changed to break it and instead just need to understand it properly and determine why it is broken and roll forward
19:49:17 personally I've always argued that developer.openstack.org should've been docs.openstack.org/developer and use the same systems as exist for docs
19:49:37 this is I think evidence for why this is a good idea but also that ship sailed a long time ago and the best thing is to simply figure out what needs to be changed to make it work
19:51:23 is anyone interested in debugging this further? I know fungi took a stab at it.
19:51:37 also I'm not sure there is anything opendev or zuul specific about it other than that is where the failure is originating
19:51:49 it should be solvable by anyone reading the jobs and error message?
19:51:50 i've unfortunately already paged out 99% of the context there. pulled in too many directions
19:52:27 the error comes from some data that iiuc is coming from a secret
19:52:28 the only reason i even started looking at it was because i noticed the site mentioned and linked to trystack.org which hasn't existed for years, and i wanted to get rid of that dead link
19:52:36 so not easy to debug for an outsider IMO
19:52:52 frickler: outsiders have the same info that we do when it comes to secrets in zuul though?
19:52:59 like I don't generally go off and decrypt things
19:53:09 (in fact I'm not sure I ever have)
19:53:15 it may be needed in this case, though
19:53:44 but anyway, I can look further into this, but with low priority
19:53:45 yes it is possible this would be that case
19:53:50 ok thanks
19:53:57 #topic Open Discussion
19:53:59 Anything else?
19:54:44 someone mentioned connectivity issues to vexxhost IPv6 earlier today
19:54:57 the openinfra foundation gained control of the openinfra.org domain and is going to start working to relocate their various sites out of the google-controlled .dev tld
19:55:08 I didn't look closer yet, but seems to be a recurring issue of what we had earlier
19:55:10 i'm putting together a plan to migrate lists.openinfra.dev to lists.openinfra.org
19:55:13 and jayf mentioned connectivity issues that I think were actually slowness (connections are logged in sshd log but things took longer than expected)
19:55:55 I did confirm the "no route to host" from AS3320 (Deutsche Telekom, German incumbent ISP)
19:56:00 i'm shooting for migrating that mailman site the first week of december, so probably sending an announcement to the foundation mailing list about it on monday. i'll circulate an etherpad with the planned steps in the coming days
19:56:27 fungi: thanks for the heads up
19:56:45 fungi: can we keep a redirect from the old site?
19:56:56 the change on the mailman side is pretty simple because it's all in mariadb, so just some update queries (ideally with the services temporarily offline)
19:57:10 frickler: yeah, i plan to keep the old urls and addresses working indefinitely
19:57:10 or do they want to drop the .dev domain?
19:57:30 they just want .org to be the official domain, but will retain control of the .dev one
19:57:37 cool +1
19:57:59 there are no plans to drop it, so we can keep redirects and address aliases for ~ever
19:58:33 i'll take care of the changes to add the redirects, aliases, config update, et cetera
19:58:44 #link https://review.opendev.org/c/openstack/project-config/+/934832 Fix openstack developer docs promote job [NEW]
19:58:56 i did that with no special knowledge
19:59:07 i just read the error message and looked at the secret def
19:59:09 hope it works
20:00:07 and we are at time
20:00:10 thank you everyone
20:00:16 We'll be back next week same time and location
20:00:18 #endmeeting
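
Note on the mailman3 bounce processing topic: the idea raised there of writing "some tool to go update each of them separately", which should "noop" on lists that already have the setting enabled, could look roughly like the sketch below. Nothing in it was discussed in the meeting. The REST root URL, the credentials, and the `process_bounces` attribute name are assumptions about a typical Mailman 3 core deployment and would need to be checked against the real installation before use.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed values: Mailman core's REST API is often reachable on
# localhost:8001 under API version 3.1. The credentials are placeholders.
API_ROOT = "http://localhost:8001/3.1"
API_USER = "restadmin"
API_PASS = "restpass"


def lists_needing_bounce_processing(configs):
    """Return the list IDs whose config does not enable process_bounces.

    `configs` maps list_id -> config dict. Lists that already have the
    setting enabled are left out, so rerunning the tool is a no-op for them.
    """
    return sorted(
        list_id
        for list_id, config in configs.items()
        if not config.get("process_bounces", False)
    )


def _request(method, path, data=None):
    """Send one authenticated request to the (assumed) REST endpoint."""
    body = urllib.parse.urlencode(data).encode() if data else None
    req = urllib.request.Request(API_ROOT + path, data=body, method=method)
    token = base64.b64encode(f"{API_USER}:{API_PASS}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def enable_bounce_processing(dry_run=True):
    """Enable process_bounces on every list that does not have it yet."""
    entries = _request("GET", "/lists").get("entries", [])
    configs = {
        entry["list_id"]: _request("GET", f"/lists/{entry['list_id']}/config")
        for entry in entries
    }
    for list_id in lists_needing_bounce_processing(configs):
        if dry_run:
            # Report only, mirroring the dry-run-first approach used elsewhere.
            print("would enable bounce processing on", list_id)
        else:
            _request("PATCH", f"/lists/{list_id}/config",
                     {"process_bounces": "True"})
```

Calling `enable_bounce_processing()` with the default `dry_run=True` only reports which lists would change. The selection logic is kept in a separate pure function so it can be checked without a running Mailman instance.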