19:01:06 <clarkb> #startmeeting infra
19:01:06 <opendevmeet> Meeting started Tue Apr 18 19:01:06 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 <opendevmeet> The meeting name has been set to 'infra'
19:01:12 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/246L5WVFVKR4XU6PIQRILQ6Z4PPG6NDZ/ Our Agenda
19:01:19 <clarkb> #topic Announcements
19:02:30 <clarkb> I didn't have any announcements
19:02:35 <clarkb> #topic Topics
19:02:38 <clarkb> We can jump right in
19:02:47 <clarkb> #topic Migrating container images to quay.io
19:03:28 <clarkb> Last week the promotion of container images through the intermediate registry ran successfully against zuul-client (this is the image that we've been using to test changes to jobs/playbooks/roles)
19:04:23 <corvus> i plan on copying over zuul images and updating zuul repos this week
19:04:25 <clarkb> I suspect that we are really close to taking action on this in OpenDev. In particular I expect the next set of tasks to be roughly: copying existing image data from docker hub to quay, updating our jobs and possibly rebuilding images to test things, and figuring out how to auto-provision public repos in quay for new images
19:04:53 <clarkb> corvus: were you planning to do the copy of images for opendev as well?
19:05:09 <corvus> was just planning on zuul
19:05:12 <clarkb> corvus: and does the script handle deltas if we were to copy things today then rerun it again quickly before images moved?
19:05:36 <corvus> i think so
19:06:10 <corvus> with a quick change to omit :latest, could also run it again after the move, but probably no point to doing that
19:06:43 <clarkb> ok cool, I can look at running it for opendev this week as an initial step, with a plan to sync up any deltas as we actually move images on the zuul job side
19:07:02 <ianw> i know we figured out from old blog posts the bits to pre-make a public image on quay, but did we codify that in zuul-jobs yet?
19:07:05 <corvus> i think i shared the latest version of the script (which handles org renames and multi-arch) earlier in #opendev; i don't have the link handy right now though
19:07:30 <clarkb> ianw: I pushed a role for it https://review.opendev.org/c/zuul/zuul-jobs/+/877834 but I'm not sure where we would inject that in the current job setup
19:08:05 <clarkb> ianw: maybe it would be a separate job that runs before the promote job
19:08:10 <clarkb> to decouple things cleanly?
19:08:16 <ianw> ahh right yes i remember that now :)
19:08:27 <corvus> or could be a pre-run playbook for an inherited job
19:08:40 <clarkb> corvus: oh ya that should work too due to the nesting order
19:09:04 <ianw> it could be. since this requires an API key, my thinking was that you'd already have to have an api key to use the tag-based promotion path anyway
19:09:55 <clarkb> ianw: ya, though the creation api token needs very little in terms of permissions, so doing it through the intermediate registry with a very limited key may still make sense
19:10:18 <clarkb> I can try to take a look at where to add this later this week too. I basically need to get through the etherpad (and possibly gitea?) stuff, then I have a lot more time
19:10:30 <corvus> incidentally, i haven't heard anything more from the quay people about the zuul org. that's a little disappointing. :/
19:10:42 <ianw> ++ i'm happy to help out too. agree we can sort out details later
19:10:51 <clarkb> sounds good.
19:11:34 <corvus> i don't think we need the api creation role for the zuul projects; we don't make new container images very often
19:11:44 <clarkb> Also as a side note, it doesn't look like docker hub accidentally did the april 14 doom change (we didn't expect them to, but images have continued updating on docker hub since)
19:12:06 <corvus> i mean, once it shows up and is settled, i'm not opposed to having it there; just that it's not in the critical path for now.
19:12:42 <clarkb> ya we could manually create them in opendev too, but I think we end up adding/removing images often enough that it would be annoying
19:12:53 <ianw> no they definitely backtracked on that one
19:12:58 <ianw> docker i mean
19:13:04 <corvus> yep, every python version... etc.
19:13:12 <clarkb> ianw: ya I know. They announced it too. I just wanted to make sure that reality panned out that way, and it appears to have done so
19:13:46 <corvus> trust but verify. also, maybe don't actually trust.
19:14:26 <clarkb> alright, anything else on this? Hopefully we've got some exciting updates next week
19:14:53 <corvus> heh i'm hoping for boring updates :)
19:15:20 <ianw> nope -- irrespective of changes upstream, i think we've got something nicer giving us options to point at multiple places
19:15:23 <clarkb> exciting because it's done (at least for zuul), not due to any fireworks :)
19:15:33 <clarkb> #topic Bastion Host Updates
19:15:34 <corvus> ++
19:15:57 <clarkb> The only thing I'm aware of here is the multi-way encrypted backups stack, which still needs reviews
19:16:27 <clarkb> Launch node appears to be managing reverse dns in rax now, the openstack command in the venv we install can talk to rax, and the dns helper output all appears to work now when launching nodes
19:17:01 <ianw> yeah i used it to launch some dns nodes and it finally worked to give me all output :)
19:17:47 <clarkb> any other bridge-related items?
19:17:55 <ianw> i even thought, wow, this is close to being something that could be a zuul job ... :)
19:18:34 <fungi> that would certainly be a cool future
19:19:05 <clarkb> #topic Mailman 3
19:19:31 <clarkb> We'll keep moving along. I noticed some activity on the changes related to mailman 3 vhosting this morning but suspect it is still too early to have much to report?
19:19:44 <fungi> i've got a fresh held lists node from today (104.130.219.137) which includes the changes in 867986 and 867987, and am starting to try out the recommended commands on it this week for django site creation and association in postorius
19:19:49 <fungi> #link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/I5MLJAESRXQARS3MZHF75YQCBY2OUL6G/ Re: Multi-domain oddities in Hyperkitty and Postorius
19:20:20 <fungi> but yeah, no actual progress to report
19:20:40 <clarkb> hopefully we'll have good news next week
19:20:48 <clarkb> #topic Gerrit Updates
19:21:18 <clarkb> There has been some movement on ACL synchronization to better align our project-config acl files with what is in Gerrit now, after the 3.7 upgrade migration
19:21:25 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/880115 Update project-config acls to match post migration acls in Gerrit
19:21:49 <clarkb> ianw: ^ the only reason I haven't +2'd that is you indicated we could land a couple of changes together to correct some of the concerns (post-review in particular)
19:22:08 <clarkb> I haven't seen that followup change yet, but I think what you've got in that first one is fine as long as we do have a followup
19:22:20 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/879906 Gerrit also normalized indentation of config files; we should consider bringing this in sync
19:22:28 <ianw> ohh yes sorry, i had that up in emacs and got distracted on dns yesterday. will do
19:22:46 <clarkb> This second change is going to modify every single acl though, and should be coordinated with a manual run of manage-projects I think
19:22:51 <clarkb> (that first one is small enough it should be fine to land)
19:23:22 <clarkb> But the idea here is that since Gerrit seems to insist on hard tabs in the config files, we should do the same thing to reduce deltas, making it easier to read diffs and understand changes when upgrades happen
19:23:45 <clarkb> I think this is less urgent, but something we should eventually get to.
19:24:23 <ianw> yeah, i mean ideally we don't have changes on upgrade that get us out of sync, but if we do, it's easier to look at without also reformatting the whole thing
19:25:09 <fungi> #link https://review.opendev.org/879906 Indent Gerrit ACL options
19:25:21 <fungi> oh that was linked already
19:25:41 <clarkb> ya I think we should get ianw's first update in and then look at the tabs situation
19:25:57 <clarkb> since tabs are less necessary and more painful to get applied cleanly
19:25:59 <fungi> it's still wip because i'm unconvinced we should enforce it, given people already struggle with the current acl normalization checks
19:26:30 <clarkb> and it gives us time to decide if we think it should be enforced. I'm really sad we don't think people can properly add tabs to files :(
19:26:43 <clarkb> I also started on trying to address the leaked replication files on disk
19:26:52 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk
19:27:00 <clarkb> This is tested now (though somewhat artificially)
19:27:14 <corvus> would it maybe help to add comments to files telling people to use tabs?
19:27:34 <clarkb> corvus: I think fungi's concern is that editors don't always treat the tab key as a hard tab
19:27:39 <corvus> they're not hard to deal with, but it's sometimes unclear whether one should...
19:27:41 <clarkb> which will confuse people when their changes fail
19:27:53 <fungi> oh, there's an idea, insert a boilerplate comment block summarizing the normalization enforcement rules
19:28:22 <clarkb> it certainly can't hurt to have that info available when making edits to those files
19:28:27 <corvus> could also maybe look at editor config lines...
19:28:35 <fungi> though i don't know if gerrit strips comments, it would at least be easier to ignore that block in the diffs
19:28:58 <ianw> it does also give you a diff to correct your mistake
19:29:36 <ianw> (it == the normalization path)
19:29:52 <corvus> oh yeah i was assuming comments work... i don't know.
19:30:04 <fungi> yeah, mainly hoping to avoid more review round-trips
19:30:10 <clarkb> if any of the current files in project-config have comments in them, we could check pretty easily
19:30:45 <clarkb> looks like they don't :(
19:31:05 <fungi> none do, no
19:31:08 <clarkb> anyway I think that is the less urgent of the two sides of the acl normalization cleanups, so we can tackle that once we're happy with the functional side
19:31:20 <fungi> tangentially, it came up in the gerrit matrix/discord today that 3.8 has a new plugin system for code linking, so our overrides to swap gitiles out with gitea may need revisiting during the next upgrade
19:32:19 <clarkb> for the replication tasks cleanup workaround, I'm fairly certain it is ok for those files to be deleted while gerrit is shut down because we weren't bind mounting the data dir previously. My change adds a script to the gerrit images that inspects the json content and deletes the ones we know leak, leaving behind the tasks we want to replicate
19:32:40 <clarkb> it is possible that it may also leave behind other classes of task that can't be replicated, but there are 11k-ish files currently, and skimming them I've found these three types so far
19:32:54 <clarkb> it will be easier to see any potential fourth type after we deal with these first
19:33:31 <clarkb> Feedback on whether or not this seems like a good approach would be good, in addition to reviews checking whether it does what it says on the tin
19:34:03 <fungi> what is upstream's position on the bug, or has anyone weighed in yet?
19:34:22 <clarkb> I haven't filed a bug on this one yet but probably should
19:34:30 <clarkb> now that I understand it a bit better.
19:34:59 <clarkb> Another user said that changing group perms for the replication targets didn't change it for them, though that was my hunch (we replicate as if we are the anonymous user)
19:35:11 <ianw> it tried it and failed, right? it does seem like it should unlink the file after that
19:35:35 <clarkb> ianw: yes, however it also does retries and I suspect that is the problem here
19:36:01 <clarkb> the plugin probably can't tell the difference between a failure due to gerrit acls (or some other internal mechanism) saying no and a network failure or temporary rejection from the remote
19:36:18 <ianw> ahh, yeah that sounds very likely
19:36:30 <fungi> i would totally buy that explanation
19:36:37 <clarkb> if I find time I can dig into the plugin implementation itself
19:37:00 <clarkb> in the meantime I suspect that what I've proposed is a safe way to manage the blast radius of leaking files to disk over time and to avoid errors in gerrit's error log at startup
19:37:01 <ianw> but also fixable, luckily clarkb is our honorary on-staff Java developer
19:37:05 <clarkb> heh
19:37:19 <clarkb> I'm happy for reviewers to say "this should be fixed upstream; we don't want this hacky workaround" too
19:37:44 <ianw> it's only a git revert away from removal though
19:37:49 <clarkb> and the last gerrit-related item I had was a reminder that we should clean up the 3.6 image at some point, add a 3.8 image, and update our upgrade job
19:38:12 <ianw> that might be a fun one to test quay creation
19:38:19 <clarkb> I don't think we'll revert at this point. I'm happy for us to remove the 3.6 image now
19:38:26 <clarkb> oh ya that could be a good one for adding 3.8 maybe
19:39:05 <clarkb> I can follow up on this as I dig into the quay stuff more later this week to see if it makes sense in that process somewhere
19:39:10 <clarkb> #topic Upgrading Servers
19:39:32 <clarkb> static.opendev.org and the ~40-something other names it hosts are now on a jammy static02 host. static01 is removed and out of dns too
19:40:30 <clarkb> I've got a new etherpad02 server up and running and tested a data migration from etherpad01 to etherpad02. It takes about 30 minutes to dump the db and 30 minutes to restore it, plus time to copy the data between hosts and double check you aren't doing something silly. I notified service-announce that there would be a 90 minute outage of etherpad tomorrow at 22:00 UTC to do the
19:40:32 <clarkb> actual move
19:40:45 <clarkb> #link https://paste.opendev.org/show/brRuhPssVLSi4UnF5hcN/ The etherpad move plan.
19:41:08 <clarkb> This is the plan I wrote down based on my local notes from testing the process. I put it in paste and not etherpad because etherpad will be shut down during this process to avoid data ending up in the wrong location
19:41:36 <fungi> that's some serious foresight
19:41:37 <clarkb> Please review that if you have time before tomorrow at 22:00 UTC. It is relatively straightforward, but extra eyeballs making sure I haven't done something silly are appreciated
19:42:02 <clarkb> ianw has also made progress on replacing nameservers
19:42:14 <clarkb> #link https://etherpad.opendev.org/p/2023-opendev-dns
19:42:31 <ianw> clarkb: plan lgtm. you could also use "zcat dump.gz | ..." :)
19:42:58 <fungi> yeah, i'm good with what you have there
19:43:01 <clarkb> This etherpad has links to changes. I've reviewed most of those changes and left questions on a couple of them. One of them also failed testing for a valid reason (I left a note indicating what I think the fix is)
19:43:03 <clarkb> thanks!
19:43:16 <ianw> yep thanks for going through that. and good catch on updating the other zones too
19:43:32 <clarkb> ianw: I also left notes on the etherpad about a few things I noticed were missing. We have 3 other zone files to update, and reverse dns for the vexxhost nameserver would ideally be set
19:43:45 <ianw> i'm actually thinking maybe we template in the nameservers, but i'll think about that
19:43:50 <clarkb> ianw: I approved the change to update le testing to jammy as well; not sure if that merged or not
19:43:55 <fungi> i'll probably be mia between 18:30 and 20:30 or so, but will definitely be back by 22:00 for the maintenance
19:44:44 <clarkb> Next week I'll probably start looking at jitsi meet or mirror nodes. Say something if you want to help and have a preference for what is left to do
19:45:27 <clarkb> #topic AFS volume utilization
19:45:48 <clarkb> we have crept up to 92.2% from 91.7% since last week
19:46:41 <clarkb> if that growth rate holds we'll have about 15 weeks before there is a problem. Which is about 3-ish months?
19:46:47 <ianw> i still haven't got back to wheel clearouts or f36 (sigh, now f38 is out anyway)
19:47:13 <clarkb> ack, I think we have time. But we should probably look into those tasks sooner rather than later to see where we end up disk-wise and make our next decisions from there
19:48:14 <clarkb> #topic Gitea 1.19
19:48:30 <clarkb> I've got a change up to upgrade to gitea 1.19.1
19:48:35 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade opendev.org to 1.19.1
19:48:48 <clarkb> however, ianw rightly pointed out that the changes to api interaction are weird and likely unnecessary
19:49:18 <ianw> yeah i just noted a comment on that about removing the auth for more endpoints
19:49:28 <clarkb> digging into this, ianw filed two bugs against gitea: previously anonymous apis now requiring auth, and improper headers on responses indicating auth is required
19:49:38 <ianw> #link https://github.com/go-gitea/gitea/issues/24159#issuecomment-1513620323
19:49:45 <ianw> oh, and also a pull request now
19:49:50 <clarkb> ianw: yes, it looks like the change to add scoped tokens to their api system made far-reaching changes to the api that are problematic
19:50:57 <clarkb> thank you for looking into that. I was just focused on making it work and didn't even consider they might have problems (in particular we were passing auth creds with the request, so it wasn't clear to me that we went from public to private)
19:51:21 <clarkb> Anyway I think we can wait for 1.19.2 to fix this, to avoid any potential breakage for users anonymously talking to our gitea api
19:51:34 <clarkb> (we could scan the request logs for evidence of this if we were in a hurry)
19:51:50 <clarkb> but 1.19.2 should be out soon enough I hope, and we don't currently have an urgent need to upgrade.
19:52:36 <clarkb> Reviews on that change would be helpful though, as I expect minimal deltas between now and 1.19.2 when available (just cleaning up our api requests to reflect that they don't need auth anymore)
19:52:37 <ianw> yeah i agree that's unlikely -- nobody complained yet. if we want to just go with it that's fine, but i think we should revert the user/pass/auth force when we can so we notice any further regressions when it's fixed
19:53:25 <clarkb> #topic Storyboard
19:53:46 <clarkb> have we seen any more requests to mark things RO?
19:54:04 <clarkb> Mostly curious if the moves by some projects have been showing up on our radar yet
19:55:16 <clarkb> I'll take that as a no :)
19:55:19 <clarkb> #topic Open Discussion
19:55:21 <clarkb> Anything else?
19:55:33 <fungi> i have a handful of project moves off storyboard i need to clean up behind
19:55:48 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/880570?usp=dashboard
19:56:02 <fungi> see recent open/merged changes for gerrit/projects.yaml for a list of relevant exodus
19:56:05 <ianw> and a follow-on is a quick one to update links on the main page
19:56:17 <ianw> also i agree that usp= thing is very annoying
19:57:29 <clarkb> I noticed yesterday, I think it was, that google uses usp in google docs
19:57:43 <clarkb> so ya it's basically something they appear to have added to gerrit for their purposes, but without any open source usage of it
19:59:03 <clarkb> they did at least confirm that it is unused in gerrit 3.7. You would need to write a plugin or something like that to consume the info
19:59:30 <corvus> are they interested in disabling it, or are they like "just ignore it"?
19:59:47 <clarkb> corvus: they say it has no effect in open source gerrit and you should ignore it
19:59:49 <clarkb> to tl;dr
20:00:11 <ianw> except everyone knows how i copied the review link
20:00:29 <ianw> which, i admit, i don't really care about, but, why do you need to know?
20:00:57 <clarkb> right, it betrays info about the context where you copied links (email, dashboards, related changes, etc)
20:01:16 <clarkb> we are at time. Thank you everyone! we'll be back next week, same time and location
20:01:36 <clarkb> feel free to pick up or continue the conversation in #opendev or on the mailing list if we want to continue to discuss any of these items
20:01:38 <clarkb> #endmeeting