19:01:25 <clarkb> #startmeeting infra
19:01:25 <opendevmeet> Meeting started Tue Jul 11 19:01:25 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:25 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:25 <opendevmeet> The meeting name has been set to 'infra'
19:01:49 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/FV2S3YE62K34SWSZRQNISEERZU3IR5A7/ Our Agenda
19:02:15 <clarkb> #topic Announcements
19:02:25 <clarkb> I did make it to UTC+11
19:03:14 <clarkb> I'm finding that the best time to sit at a computer is something like 01:00/02:00 UTC and later simply due to weather. But we'll see as I get more settled; this is only day 5 or something there
19:04:36 <clarkb> #topic Topics
19:04:43 <clarkb> #topic Bastion Host Updates
19:04:56 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:05:12 <clarkb> Looks like this set of changes from ianw could still use some infra root review
19:05:42 <clarkb> if we can get that review done we can plan the sharing of the individual key portions
19:06:17 <clarkb> #topic Mailman 3
19:06:32 <clarkb> fungi: any updates on the vhosting? then we can talk about the http 429 error emails
19:06:38 <fungi> no new progress, though a couple of things to bring up yeah
19:06:59 <clarkb> go for it on the new things
19:07:03 <fungi> the first you mentioned, i'm looking to see if there's a way to create fallback error page templates for django
19:07:19 <fungi> but perhaps someone more familiar with django knows?
19:07:35 <fungi> i know we can create specific error page templates for each status
19:08:12 <fungi> so we could create a 429 error page template, but what i'm unsure about is if there's a way to have an error page template that applies to any error response which doesn't have its own separate template
19:08:49 <fungi> i think i recall tonyb mentioning some familiarity with django so i might pick his brain later if so
19:09:06 <clarkb> I'm unsure myself
19:09:14 <fungi> assuming my web searches and documentation digging turn up little of value
19:09:23 <clarkb> a default would be nice if possible but I suspect adding a 429 file would be a big improvement alone
19:09:53 <tonyb> I don't think it was me
19:10:01 <fungi> oh too bad
19:10:50 <tonyb> sorry I'll do better :P
19:10:55 <fungi> the other item is we've had a couple of (mild) spam incidents on the rust-vmm ml, similar to what hit zuul-discuss a few months back. for now it's just been one address i initially unsubscribed and then they resubscribed and sent more, so after the second time i switched the default moderation policy for their address to discard instead of unsubscribing them
19:11:58 <fungi> but still might consider switching the default moderation policy for all users on that list to moderate and then individually updating them to accept after they send good messages
19:12:17 <fungi> that is if the problem continues
19:12:38 <clarkb> I'm good with that, but ideally we can find a moderator in that community to do the filtering
19:12:50 <clarkb> I'm not sure we should be filtering for random lists like that.
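Following up on the fallback error template question above: one possible approach, sketched very roughly, would be a small Django middleware that renders a single catch-all template for any error response that has no body. This is only a hedged illustration, not anything deployed or proposed; the class name and the generic_error.html template are hypothetical, and if the 429 is generated before a request ever reaches Django this would not apply.

    # Rough illustration only: render one catch-all template for any error
    # response that has no body, so statuses without their own template
    # (like 429) still get a friendly page. The template name is made up.
    from django.http import HttpResponse
    from django.template import loader

    class FallbackErrorPageMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            response = self.get_response(request)
            if response.status_code >= 400 and not response.content:
                template = loader.get_template("generic_error.html")
                body = template.render({"status_code": response.status_code}, request)
                return HttpResponse(body, status=response.status_code)
            return response

A dedicated 429 template, as discussed, would still be the simpler first step; something like the above would only cover whatever has no dedicated template.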
19:13:44 <fungi> well, yes i stepped in as a moderator since i was already subscribed and the only current community moderator had gone on sabbatical, but we found another volunteer to take over now
19:14:03 <clarkb> great
19:14:06 <fungi> my concern is it seems like the killer feature of mm3, the ability for people to post via http, increases the spam risk as well
19:14:50 <fungi> which is going to mean a potentially increased amount of work for list moderators
19:15:13 <clarkb> Though two? incidents in ~6 months isn't too bad
19:16:08 <fungi> yeah, basically
19:16:21 <fungi> but these are also very low-volume and fairly low-profile lists
19:16:39 <fungi> so i don't know how that may translate to some of the more established lists once they get migrated
19:16:48 <fungi> something to keep an eye out for
19:16:55 <clarkb> there is probably only one way to find out unfortunately
19:17:00 <fungi> agreed
19:17:05 <fungi> anyway, that's all i had on this topic
19:17:15 <clarkb> #topic Gerrit Updates
19:17:47 <clarkb> We are still building a Gerrit 3.8 RC image. This is only used for testing the 3.7 to 3.8 upgrade as well as general gerrit tests on the 3.8 version, but it would be good to fix that
19:17:59 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/885317?usp=dashboard Build final 3.8.0 release images
19:18:15 <clarkb> Additionally the Gerrit replication tasks stuff is still ongoing
19:18:42 <clarkb> I think my recommendation at this point is that we revert the bind mount for the task data so that when we periodically update our gerrit image and replace the gerrit container those files get automatically cleaned up
19:18:55 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/884779?usp=dashboard Stop bind mounting replication task file location
19:19:32 <clarkb> If we can get reviews on one or both of those then we can coordinate the moves on the server itself to ensure we're using the latest image and also cleaning up the leaked files etc
19:19:46 <fungi> what's the impact to container restarts?
19:19:59 <fungi> if we down/up the gerrit container, do we lose queued replication events?
19:20:27 <clarkb> fungi: yes. This was the case until very recently when I swapped out the giteas though, so we were living with that for a while already
19:21:08 <clarkb> The tradeoff here is that having many leaked files on disk is potentially problematic when that number gets large enough. Also these bad replication tasks produce errors on gerrit startup that flood the logs
19:21:26 <clarkb> we'd be trading replication resiliency for better service resiliency I think
19:22:53 <clarkb> fungi: that said having another set of eyes look over the situation may produce additional ideas. The alternative I've got is the gerrit container startup script updates that try to clean up the leaked files for us. I don't think the script will clear all the files currently, but having a smaller set to look at will help identify the additional ones
19:23:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Clear leaked replication tasks at gerrit startup using a script
19:24:24 <clarkb> I'm happy to continue down that path as well, it's just the riskiest option and the one needing the most effort
19:24:26 <fungi> thanks, makes sense
19:24:37 <clarkb> risky because we are automating file deletions
19:25:03 <clarkb> for a todo here maybe fungi can take a look this week and next week we can pick an option and proceed from there?
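For a sense of what the startup-script option above involves, a very rough, purely illustrative sketch follows; the real logic lives in change 880672 and differs from this, and both the events directory path and the age-based criterion here are assumptions rather than how leaked tasks are actually identified.

    # Illustrative only: prune stale replication task files when the gerrit
    # container starts. The directory path and the age threshold are guesses,
    # not what change 880672 actually does.
    import pathlib
    import time

    events_dir = pathlib.Path("/var/gerrit/data/replication/ref-updates/waiting")
    max_age = 7 * 24 * 3600  # one week, in seconds

    now = time.time()
    for task_file in events_dir.glob("*"):
        if task_file.is_file() and now - task_file.stat().st_mtime > max_age:
            task_file.unlink()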
19:25:47 <clarkb> The other Gerrit item is disallowing implicit merges across branches in our All-Projects ACL
19:26:04 <clarkb> I can't think of any reason to not do this and I don't recall any objections to this in prior meetings where this was discussed
19:26:15 <fungi> yeah, i should be able to
19:26:28 <clarkb> receive.rejectImplicitMerges is the config option to reject those when set to true
19:26:32 <fungi> did i propose a change for that? i can't even remember now
19:26:48 <clarkb> fungi: I don't think so, since it has to be done directly in All-Projects then simply recorded in our docs?
19:26:55 <clarkb> there may be a change to do the recording bit /me looks
19:27:19 <clarkb> https://review.opendev.org/c/opendev/system-config/+/885318
19:27:39 <clarkb> so ya if you have time to push that All-Projects update I think you can +A the change to record it in our docs
19:28:39 <fungi> oh, cool
19:28:45 <fungi> i guess someone did propose that
19:29:23 <clarkb> that was all I had for gerrit. Anything else before we move on?
19:31:04 <clarkb> #topic Server Upgrades
19:31:17 <clarkb> I'm not aware of any changes here since we last met
19:31:44 <clarkb> tonyb helped push the insecure ci registry upgrade through. I may still need to delete the old server; I can't recall right now if I did that
19:32:09 <corvus> ze04-ze06 upgraded to jammy today
19:32:23 <clarkb> I think tonyb is looking at other things now in order to diversify the bootstrapping process as an OpenDev contributor, so I'll try to look at some of the remaining stragglers myself as I have time
19:32:26 <clarkb> corvus: excellent
19:32:44 <tonyb> the cleanup is removing it (01) from the inventory and then infra-root deleting the 01 vm?
19:32:59 <clarkb> tonyb: correct
19:33:12 <tonyb> Okay
19:34:11 <clarkb> #topic Fedora Cleanup
19:34:38 <clarkb> tonyb: I've lost track of where we were in the mirror configuration stuff. Are there changes you need reviews on or input on direction?
19:35:21 <tonyb> I need to update the mirror setup with the new mirrorinfo variable
19:35:59 <clarkb> tonyb: is that something where some dedicated time to work through it would be helpful? if so we can probably sort that out with newly overlapping timezones
19:36:44 <tonyb> Yeah, that's a good idea. I understand the concept of what needs to happen but I'm in danger of overthinking it
19:37:04 <clarkb> ok let's sync up when it isn't first thing in the morning for both of us and take it from there
19:37:24 <tonyb> great
19:37:28 <clarkb> #topic Quo vadis Storyboard
19:38:16 <fungi> i think i switched a neutron deliverable repo over to inactive and updated its description to point to lp last week? openstack/networking-odl
19:38:18 <clarkb> One thing I noticed the other day is that some projects like starlingx are still creating subprojects in storyboard. We haven't told them to stop and I'm not sure we should, but they were confused that it seems to take some time to do that creation. I think we are only creating new storyboard projects once a day
19:38:56 <clarkb> At this point I'm not sure there is much benefit in having project-config updates trigger the storyboard job more quickly
19:39:16 <clarkb> But it was a thing people noticed so I'm mentioning it here
19:39:35 <fungi> there was also some discussion about sb in the #openstack-sdks channel, in particular a user was surprised to discover that an unescaped <script> tag in a story description resulted in truncating the text once displayed. fairly easy to avoid, but turned into some conversation about why getting a fix for such things implemented would be tough with our current deployment
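As an aside on the <script> tag issue just mentioned: the underlying fix is the usual one of escaping untrusted text before it is placed into the page. A tiny generic illustration using Python's standard library follows, purely to show the idea; it is not StoryBoard's actual rendering code.

    # Generic illustration: escape user-provided text so markup like <script>
    # is displayed literally instead of being interpreted by the browser.
    import html

    description = "watch out for <script>alert('boo')</script> in input"
    print(html.escape(description))
    # watch out for &lt;script&gt;alert(&#x27;boo&#x27;)&lt;/script&gt; in input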
19:39:56 <tonyb> Can we run the create, I assume via cron, twice (#gasp) a day?
19:40:35 <clarkb> tonyb: I believe the job that does it is infra-prod-remote-puppet-else
19:40:43 <clarkb> which at this point should mostly just be storyboard?
19:40:53 <clarkb> we have removed the vast majority of any remaining puppet
19:40:56 <tonyb> Ahhh okay
19:41:27 <clarkb> so ya we'd basically run that job more often or when necessary to decrease the wait time
19:42:51 <clarkb> anything else storyboard related?
19:43:18 <fungi> i got nothin'
19:43:27 <clarkb> #topic Gitea Upgrades
19:43:39 <clarkb> Gitea 1.19.4 exists and fungi has pushed an update to upgrade us
19:43:51 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887734?usp=dashboard Upgrade Gitea to 1.19.4
19:44:06 <fungi> seems to be fairly minor for our purposes, fixes mostly to stuff we disable anyway
19:44:10 <clarkb> These bugfix point upgrades tend to be pretty safe and straightforward, though this one has a small template update
19:44:15 <fungi> so also probably not urgent
19:44:58 <clarkb> The other gitea upgrade to think about is the 1.20 update. They only have RC releases so far and no changelog, so also not urgent
19:45:08 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/886993?usp=dashboard Begin process to upgrade gitea to 1.20
19:45:18 <clarkb> The 1.20 upgrade should happen after we upgrade to 1.19.latest
19:46:08 <clarkb> fungi: I think I may be able to be around with overlap in your timezone (early tomorrow morning for me, later afternoon for you) if we want to land that 1.19.4 change and monitor
19:46:19 <clarkb> I'll ping you tomorrow if I manage that and we can take it from there?
19:46:44 <fungi> yeah, sure that works
19:47:01 <fungi> i should be at the keyboard by 1200z
19:47:04 <clarkb> I don't expect trouble but good to have people around if necessary
19:47:12 <clarkb> 1900 is probably about as early as I can manage :)
19:47:27 <clarkb> though maybe my evening overlaps with 1200, I need to math that out
19:47:39 <fungi> oh, you said early in your timezone not the other way around
19:47:39 <clarkb> #topic Etherpad Upgrade
19:47:45 <clarkb> fungi: ya
19:48:07 <clarkb> After a long release drought Etherpad made a 1.9.1 release
19:48:34 <fungi> i assume the commit we've been running on is included in that release
19:48:36 <clarkb> At first the tagged sha didn't actually build and I was forced to use a commit that fixed the build issues after the release. But I think they updated/replaced the tag and now it seems to work
19:48:46 <fungi> aha, cool
19:48:48 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887006?usp=dashboard Etherpad 1.9.1
19:49:01 <clarkb> I updated that change to use the tag again and it did not fail
19:49:30 <clarkb> Now there is an issue where numbered lists don't properly increment the list number values, so every entry is 1., basically making it a weird bulleted list
19:49:39 <tonyb> Going back to gitea (sorry), is there any merit to landing the bullseye -> bookworm update in the same window?
19:50:06 <clarkb> tonyb: no I think we should decouple those if we can. Basically swap gitea to the new debian with a fixed gitea version
19:50:13 <tonyb> okay
19:50:33 <clarkb> tonyb: the gitea upgrades are very low impact (we roll through them one by one and shouldn't lose any replication events and the haproxy should handle http requests too)
19:50:47 <tonyb> Okay
19:50:52 <clarkb> if the gitea upgrades were a bit more impactful then we should consider combining, but they are super transparent to users
19:51:14 <clarkb> Going back to etherpad, I think that this is also not very urgent given the known list bug
19:51:39 <clarkb> I also haven't held a node yet to interact with it, which is probably a good idea to double check that we don't have any plugin interactions that will create problems for us
19:52:25 <clarkb> Reviews welcome and I'll try to get a held node up soon
19:53:09 <clarkb> #topic Open Discussion
19:53:12 <clarkb> Anything else?
19:53:35 <tonyb> Just to note that we started the bullseye to bookworm updates
19:53:42 <fungi> yay!
19:54:26 <tonyb> the first set of services I tried failed due to, what I think is, missing requires to get the speculative images in the buildset registry
19:54:46 <clarkb> ya I looked at that briefly and wasn't able to understand what was missing. It looks like we have what we need
19:55:04 <tonyb> Hopefully with some push and TZ overlap we can make solid progress
19:55:09 <clarkb> I feel like this comes up semi regularly though and I need to be better about writing down what the issue was/improving my understanding
19:55:36 <clarkb> corvus: any chance you might have a few minutes to look at that?
19:55:52 <clarkb> https://zuul.opendev.org/t/openstack/build/ab79e98cdd0242649cbc50593e87dae1/log/job-output.txt#723 is the failure
19:56:20 <corvus> yeah i'll take a look and follow up in #opendev
19:56:42 <clarkb> thank you
19:57:00 <tonyb> Thanks
19:58:45 <clarkb> sounds like that is everything for now. Thank you everyone!
19:58:51 <clarkb> #endmeeting