clarkb | meeting time | 19:00 |
---|---|---|
fungi | ahoy! | 19:01 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Apr 4 19:01:27 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/T2HXV6JPAKXGQREUBECMYYVGEQC2PYNY/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:02 |
clarkb | I have no announcements other than PTG is over and openstack release has happened. We should be clear to make changes regularly | 19:02 |
clarkb | #topic Topics | 19:03 |
clarkb | #topic Docker Hub team shutdown | 19:03 |
clarkb | Good news everyone! They have reverted the team shutdown and will no longer be making this change | 19:03 |
clarkb | That means our deadline 10 days from now is no longer present, but I think we are all in agreement we should move anyway | 19:03 |
clarkb | #link https://review.opendev.org/q/topic:tag-deletion Changes to handle publication to registries generically. | 19:04 |
corvus | they just took longer than the 24h we allocated for them to decide that ... | 19:04 |
clarkb | ya took them about a week | 19:04 |
clarkb | This stack of changes from ianw is the current set of work around being able to move to our generic container roles which we would point at quay | 19:04 |
clarkb | I need to rereview them but have had meetings all morning. I'll try to get to that soon | 19:05 |
fungi | it's not surprising how quickly people will abandon a platform when it's already burned most of their good will | 19:05 |
ianw | i think this gives us ultimate flexibility in the roles, which is pretty cool. we can use a promote pipeline like we have now with tags; a little bit of work and we'll have the ability to upload from intermediate registry | 19:06 |
ianw | both have different trade-offs which are documented in the changes | 19:06 |
clarkb | yup and I think zuul users may choose one or the other depending on their specific needs | 19:06 |
clarkb | for opendev I'd like us to try the intermediate registry approach first since that relies on no registry specific features | 19:06 |
clarkb | (though we'd have the weird creation step either way for new images in quay) | 19:07 |
clarkb | anyway reviews on that stack are the next step and figuring out the intermediate registry promotion process the step after that | 19:07 |
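(As a rough illustration of the intermediate-registry approach mentioned above: promotion then amounts to copying the already-built image from the buildset/intermediate registry to the final registry, which is why it needs no registry-specific tag APIs. The sketch below is not the zuul-jobs role implementation under review; the registry hosts, image names, and TLS flag are placeholders/assumptions.)

```python
# Hypothetical sketch of an intermediate-registry promote step: copy the image
# that was pushed to the buildset/intermediate registry into the final registry.
# Registry hosts and repository names below are made up for illustration.
import subprocess

INTERMEDIATE = "intermediate-registry.example.org:5000/opendev/gitea:change-123456-latest"
FINAL = "quay.io/opendevorg/gitea:latest"

def promote(src: str, dst: str) -> None:
    # skopeo copies images between registries without a local docker daemon;
    # --src-tls-verify=false only matters for a self-signed intermediate registry.
    subprocess.run(
        ["skopeo", "copy", "--src-tls-verify=false",
         f"docker://{src}", f"docker://{dst}"],
        check=True,
    )

if __name__ == "__main__":
    promote(INTERMEDIATE, FINAL)
```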
clarkb | Anything else to call out re docker hub? | 19:07 |
ianw | yep i personally would like to get that merged and try it with zuul-client, and then work on the promote from intermediate registry | 19:08 |
corvus | i plan on reviewing that today; fyi i'll be afk wed-fri this week. | 19:08 |
clarkb | sounds good, I should be able to rereview today as well | 19:08 |
ianw | we can probably do similar and switch zuul-client to that as well, use it as a test. because it pushes to real repos it is a bit hard to test 100% outside that | 19:08 |
corvus | i'm happy to use zuul-client as a test for both paths and can help with that | 19:09 |
clarkb | #topic Bastion Host Updates | 19:10 |
clarkb | I don't think there is really anything new here at this point? There was the launch env and rax rdns stuff but I've got that listed under booting new servers later | 19:10 |
ianw | no; i'm not sure the backup roles have really had full review, so i haven't done anything there | 19:11 |
clarkb | ack we've been plenty busy with other items | 19:11 |
clarkb | #topic Mailman 3 | 19:11 |
fungi | with end of quarter, openstack release and ptg finally in the rear view mirror i'm hoping to be able to resume work on this, but travelling for vacation all next week will likely delay things a little longer | 19:11 |
fungi | no new updates though | 19:11 |
clarkb | thanks | 19:11 |
fungi | we did have a brief related item | 19:11 |
fungi | i raised a question on the mm3-users ml | 19:12 |
fungi | #link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/CYSMH4H2VC3P5JIOFJIPRJ32QKQNITJS/ | 19:12 |
fungi | quick summary, there's a hyperkitty bug which currently prevents list owners from deleting posts | 19:12 |
fungi | if you log into hyperkitty with the admin creds from our private ansible hostvars, you can do it | 19:13 |
fungi | it's not intuitive, you basically need to go from the whole thread view to the single-message view before "delete this message" shows up as a button | 19:14 |
clarkb | but possible at least | 19:14 |
fungi | also the reply included a recommended alternative moderation workflow which may be worth consideration (basically moderate all users and then set them individually to unmoderated the first time they post something that isn't spam) | 19:14 |
clarkb | This is what mailman's own lists use, if I remember right from going through the initial setup there | 19:15 |
fungi | it's similar to how we "patrol" edits on the wiki from new users | 19:15 |
fungi | well, i say "we" but i think it's probably just me patrolling the wiki these days | 19:15 |
clarkb | I think we worry about that if the problem becomes more widespread | 19:16 |
clarkb | it's been one issue in a few months? | 19:16 |
fungi | yes, with mm3 having the option to post from the webui it seems likely that we may run into it more | 19:16 |
fungi | but i agree it's not critical for the moment | 19:16 |
fungi | also we could bulk-set all current subscribers to unmoderated on a list if we switched moderation workflow | 19:17 |
clarkb | oh that is good to know | 19:17 |
fungi | or i think they probably already will be, since the setting is that new subscribers are initially moderated | 19:17 |
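(For context on that suggestion: bulk-flipping existing subscribers to an unmoderated action could look roughly like the sketch below using mailmanclient, the Mailman 3 REST client. The REST URL, credentials, and list name are placeholders, and the exact attribute handling is an assumption about this workflow rather than a tested recipe.)

```python
# Rough sketch: set every current member of a list to an unmoderated action
# via the Mailman 3 REST API. URL, credentials, and list name are placeholders.
from mailmanclient import Client

client = Client("http://localhost:8001/3.1", "restadmin", "restpass")
mlist = client.get_list("service-discuss@lists.opendev.org")

for member in mlist.members:
    # "defer" means fall through to the normal (unmoderated) posting chain;
    # the list's default action for brand-new members would stay "moderate".
    member.moderation_action = "defer"
    member.save()
```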
fungi | anyway, just wanted to catch folks up on that development, i didn't have anything more on the topic | 19:18 |
clarkb | #topic Gerrit 3.7 Upgrade and Project Renames | 19:18 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.7 | 19:18 |
clarkb | We plan to do a gerrit upgrade and project renames on April 6 starting at 22:00 UTC | 19:18 |
clarkb | There were a few items related to this I wanted to cover in the meeting to make sure we're happy with proceeding | 19:18 |
clarkb | First up: Do we have any strong opinions on renaming first or upgrading first? I think if we upgrade first the general process flow is easier with zuul and reindexing | 19:19 |
clarkb | ianw: I think your etherpad implies doing the upgrade first as well? | 19:19 |
clarkb | We have not done any project renames under gerrit 3.6 yet so I don't think we need to prefer doing renames first for this reason | 19:20 |
ianw | that was my thinking, seems like we can put everything in emergency and make sure manage-projects isn't running | 19:20 |
fungi | also it's easier to postpone the renames if things go sideways with the upgrade | 19:20 |
clarkb | ianw: ok keep in mind that the rename playbook requires things not be in emergency | 19:21 |
clarkb | but I think we can do the upgrade first, land the change to reflect that in configs, and have nothing in emergency then proceed with renames and be happy | 19:21 |
ianw | hrm, i'm not sure the current procedure reflects that | 19:22 |
clarkb | seems like upgrade then renames is the order of operations we are happy with. | 19:22 |
ianw | https://docs.opendev.org/opendev/system-config/latest/gerrit.html#renaming-a-project | 19:22 |
clarkb | ianw: I think it does. Step 19 on line 156 removes things from emergency before renaming | 19:22 |
clarkb | ianw: oh! we must not use !disabled in the rename playbook | 19:23 |
ianw | yeah, that's what i was thinking, but i didn't check | 19:23 |
clarkb | yup that's the case so what I said above is not correct | 19:23 |
clarkb | so I think we need to edit step 19 to keep things in emergency until we're completely done? | 19:23 |
ianw | so there's step "18.5" before 19 which is "step to rename procedure" | 19:24 |
ianw | so the idea was to do it with everything in emergency | 19:24 |
clarkb | sounds good | 19:24 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/rename_repos.yaml | 19:25 |
ianw | doesn't look at disabled, so i think the assumption is correct | 19:25 |
clarkb | Next up was comfort levels with landing three different rename changes after we are done and our in-order processing of those changes. This has the potential to recreate projects under their old names in gerrit and gitea | 19:26 |
clarkb | On the Gerrit side of things fungi came to the realization that our jeepyb cache file should prevent this from happening. I'm still not sure what prevents it on the gitea side | 19:26 |
clarkb | possibly because the redirects exist for those names gitea would fail to create the names? | 19:26 |
fungi | but also it's a warning to us not to invalidate jeepyb's cache during a rename ;) | 19:27 |
clarkb | I'm starting to lean towards squashing the rename changes together for simplicity though then we don't have to worry about it | 19:27 |
clarkb | but I wanted to see if we have a preference. I think if we stick to separate changes we would need to run down what the potential gitea behavior is and double check the jeepyb cache file has appropriate data in it for all involved projects | 19:28 |
ianw | are we talking about the force-merge of the project-config changes? | 19:28 |
clarkb | ianw: yes | 19:28 |
clarkb | the other option is to leave everything in emergency until after those jobs completely finish then let hourly/daily jobs ensure we're in sync after landing the separate changes. | 19:28 |
ianw | step 7 of https://docs.opendev.org/opendev/system-config/latest/gerrit.html#renaming-a-project ? | 19:29 |
clarkb | So three options: 1) land as three changes and let things run normally if we understand why old project names won't be recreated 2) land as three changes but keep hosts in emergency preventing the jobs for first two changes from running 3) squash into single change and then concerns go away | 19:29 |
clarkb | ianw: yes step 7 | 19:30 |
ianw | is it three changes or two changes? | 19:30 |
clarkb | ianw: basically I think we've come to a bit of a realization that if we land more than one project-config rename change that we've gotten lucky in the past that we haven't accidentally recreated projects | 19:30 |
clarkb | ianw: it's three now as of a couple of hours ago | 19:30 |
ianw | i've accounted for virtualpdu and xstatic-angular-something | 19:30 |
clarkb | ovn-bgp-agent is the latest | 19:31 |
ianw | oh right, ok, need to update for that | 19:31 |
fungi | yeah, they just added it today | 19:31 |
fungi | i tried to iterate with them on it quickly to give us time to work it in | 19:31 |
fungi | we likely need to decide on a cut-off for further additions | 19:32 |
fungi | which could be as soon as now, i suppose | 19:32 |
ianw | (that's fine, was just a bit confused :) | 19:32 |
clarkb | I think that in the gitea case our checks for creating new projects may see the redirect, and/or trying to create a project where a redirect exists is an error. But I don't know for sure and feel like I'm running out of time to check that. Maybe we should provisionally plan to do 2) or 3) and only do 1) if we manage to run down gitea and jeepyb cache state? | 19:32 |
ianw | i was working under the assumption of 2) ... basically force merge everything, and manage-projects only runs after all 3 are committed | 19:33 |
fungi | i'm good with any of those options | 19:33 |
clarkb | ianw: ok that would be option 2) | 19:33 |
ianw | but i guess the point is zuul is starting manage-projects and we're relying on it not working as things are in emergency right? | 19:33 |
clarkb | ianw: by default if we take things out of the emergency file manage-projects will run for each of them in order | 19:33 |
clarkb | yes exactly | 19:34 |
clarkb | but if we leave them all in emergency zuul can run the jobs and they will noop because they can't talk to the hosts. | 19:34 |
clarkb | I'm happy with that particularly if you were already planning to go with that | 19:34 |
ianw | yeah, i cargo-culted the checklist from the system-config docs, but it makes a bit more sense to me now. i can update the checklist to be a bit more explicit | 19:35 |
clarkb | sounds good | 19:35 |
ianw | and i'll double check the manage-projects playbook to make sure things won't run | 19:35 |
clarkb | next up is calling out there is a third rename request. I should've brought that up first :) | 19:35 |
fungi | i suppose we could hack around it by exiting from emergency mode between the penultimate deploy failing and submitting the ultimate rename change | 19:36 |
ianw | heh :) i'll also add that in today | 19:36 |
clarkb | I think we've covered that and we can ensure all our notes and records change is updated | 19:36 |
clarkb | oh since we've decided to keep the changes separate we may want to rebase them in order to address any merge conflicts | 19:36 |
ianw | fungi: that could work | 19:36 |
ianw | clarkb: i can do that and make sure they stack | 19:37 |
clarkb | thanks! | 19:37 |
clarkb | And the last question was whether or not the revert path for 3.7 -> 3.6 has been tested | 19:37 |
clarkb | I think you tested this recently based on the plugins checking you did yesterday? | 19:37 |
ianw | yep, https://23.253.56.187/ is a reverted host | 19:37 |
ianw | basically some manual git fiddling to revert the index metadata, and then init and reindex everything | 19:38 |
ianw | as noted, that puts some initially worrying things into the logs that make you think you've got the wrong plugins | 19:38 |
ianw | but we decided that what's really happening is that if you've got multiple tags in a plugin, bazel must be choosing the highest one | 19:39 |
ianw | to stamp in as a version | 19:39 |
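(For the curious, the shape of that revert is roughly: revert the schema/index version metadata in the site's git data, then re-run init and a full offline reindex with the older war. The sketch below is hedged; the site and war paths are invented for illustration and this is not the recorded runbook from the test host.)

```python
# Very rough shape of a 3.7 -> 3.6 revert on a test host: after reverting the
# version metadata in the site's git data (the "manual git fiddling" above),
# re-run init and a full offline reindex with the older war.
# Paths are hypothetical placeholders.
import subprocess

SITE = "/home/gerrit2/review_site"    # hypothetical Gerrit site directory
WAR = "/home/gerrit2/gerrit-3.6.war"  # hypothetical 3.6 war

for args in (
    ["java", "-jar", WAR, "init", "-d", SITE, "--batch", "--no-auto-start"],
    ["java", "-jar", WAR, "reindex", "-d", SITE],
):
    subprocess.run(args, check=True)
```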
clarkb | great I think from a process and planning perspective this is coming together. Really just need to update our notes and records change | 19:39 |
clarkb | and get the changes rebased so they can merge cleanly | 19:39 |
ianw | ++ will do | 19:39 |
clarkb | Oh I will note I did some due diligence around the xstatic rename to ensure we aren't hijacking a project like moin did with a different xstatic package to us | 19:40 |
ianw | i'll add a note on the rebase too, so that when we copy the checklist for next time we have that | 19:40 |
clarkb | and there was a discussion on a github issue with them that basically boiled down to: there are packages moin cares about and packages horizon cares about, splitting them to make that clear is happening, and this is part of that | 19:40 |
clarkb | all that to say I think we are good to rename the xstatic repo | 19:40 |
clarkb | I'll try to take a look over everything again tomorrow too | 19:41 |
clarkb | and I think that may be it? Thank you everyone for helping put this together. Gerrit stuff is always fun :) | 19:41 |
ianw | this has been a big one! | 19:41 |
clarkb | #topic Upgrading Old Servers | 19:42 |
clarkb | #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes | 19:42 |
clarkb | ianw: the nameserver changes have landed for the most part | 19:42 |
clarkb | #link https://etherpad.opendev.org/p/2023-opendev-dns | 19:42 |
clarkb | I suspect docker/etc/etc have made this go slowly (I ran into similar issues) | 19:43 |
clarkb | But is there anything to do on this next? maybe reconvene next week after gerrit is done? | 19:43 |
ianw | yeah i haven't got back to that, sorry. i just need some clear air to think about it :) | 19:43 |
clarkb | understood | 19:43 |
ianw | but yeah, still very high on the todo list | 19:43 |
clarkb | Yesterday I picked up my todo list here and launched a replacement static server and a replacement etherpad server | 19:43 |
clarkb | #link https://review.opendev.org/q/topic:add-static02 static and etherpad replacements | 19:43 |
clarkb | reviews on these changes are very much appreciated. I think everything should be good to go except for reverse dns records for these two servers. Neither does email so that isn't critical and fixing that is in progress | 19:44 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/879388 Fix rax rdns | 19:44 |
clarkb | The updated scripting to do rdns automatically was tied into launch node and this change fixes that. I think once this change lands we can run the rdns command for the new servers to have it update the records. Worst case I can do it through the web ui | 19:45 |
clarkb | I also discovered in this process that the launch env openstack client could not list rax volumes (though it could attach them) | 19:45 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/879387 Reinstall launch env periodically | 19:45 |
clarkb | this change should address that as I think we decided that we didn't reinstall the env after we fixed the dep versions for listing volumes in rax? | 19:46 |
clarkb | fungi: ianw: ^ I'm still behind on that one if you have anything to add | 19:46 |
ianw | i think that's it; fungi did confirm a fresh venv seemed to work | 19:46 |
ianw | the other thing we can do is just rm it and start it again | 19:46 |
fungi | i think we can just move /user/launcher-venv aside and let deploy recreate it in place as a test | 19:46 |
fungi | or just merge the change | 19:46 |
clarkb | I need to review the change before I decide on that. But I suspect just merging the change is fine since we don't rely on this launch env for regular processing | 19:47 |
ianw | i'm happy to monitor it and make sure it's doing what i think it's going to do :) | 19:47 |
clarkb | And ya reviews on the new static server in particular would be good. I stacked things due to the dns updates and etherpad is more involved | 19:48 |
clarkb | #topic AFS volume quotas | 19:48 |
ianw | lgtm and i'll review the etherpad group stuff after a tea :) | 19:48 |
clarkb | thanks! | 19:48 |
clarkb | I don't have much new here other than to point out that the utilization is slowly climbing up over time :) | 19:49 |
clarkb | some of that may be all of the openstack release artifacts too so not just mirror volumes | 19:49 |
fungi | but seems like we might free up some space with the wheel cache cleanup too | 19:49 |
clarkb | But let's keep an eye on it so that we can intervene before it becomes an emergency | 19:49 |
clarkb | ++ | 19:49 |
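(A minimal sketch of one low-tech way to watch utilization, for anyone following along: poll volume quotas with the OpenAFS fs tool. The paths are examples only and the parsing assumes the usual listquota column layout with a numeric quota, not "no limit".)

```python
# Hedged sketch: report AFS volume usage by parsing `fs listquota` output.
# Paths are examples; quota and usage figures are in 1K blocks.
import subprocess

PATHS = [
    "/afs/openstack.org/mirror/ubuntu",
    "/afs/openstack.org/mirror/wheel/ubuntu-22.04-x86_64",
]

for path in PATHS:
    out = subprocess.run(["fs", "listquota", path],
                         capture_output=True, text=True, check=True).stdout
    # second line looks like: "<volume> <quota> <used> <%used> <partition%>"
    name, quota, used, *_ = out.splitlines()[1].split()
    pct = 100 * int(used) / int(quota)
    print(f"{name}: {used}/{quota} KB ({pct:.0f}%)")
```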
clarkb | #topic Gitea 1.19 | 19:50 |
clarkb | moving along now as we are running out of time | 19:50 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade opendev.org to 1.19.0 | 19:50 |
clarkb | There is no 1.19.1 yet last I checked and in the past we have waited for the next bugfix release to land before upgrading gitea. I'm not in a hurry for this reason. But I think if we review the change now it is unlikely there will be a big delta when .1 arrives. Also if we like we can update to 1.19.0 before .1 releases | 19:50 |
ianw | maybe first thing after gerrit upgrade? just in case? | 19:51 |
clarkb | The major update to this release is gitea actions which is experimental and we disable | 19:51 |
clarkb | ++ | 19:51 |
clarkb | #topic Quo vadis Storyboard | 19:52 |
clarkb | As predicted a number of projects decided to move off of storyboard at the PTG. We may be asked to mark the projects read only in storyboard once they have moved | 19:52 |
clarkb | It does feel like the individual projects could do a bit more to work together to create a more consistent process but I don't think we can force them to do that or mandate anything. Just continue to encourage them to talk to one another I guess | 19:53 |
fungi | i have a stack of projects i need to switch to inactive and update descriptions on now | 19:53 |
fungi | just haven't gotten to that yet | 19:53 |
clarkb | #topic Open Discussion | 19:54 |
clarkb | I actually meant to put the wheel mirror stuff on the agenda then spaced it. | 19:54 |
clarkb | But I think we can talk about that now if there was anything important to bring up that isn't already on the mailing list | 19:54 |
ianw | yeah, i'm still trying to come up with a concrete plan | 19:54 |
fungi | your preliminary exploration seems promising though | 19:55 |
ianw | but i think the first thing we should probably action is double-checking the audit tool is right, and cleaning up | 19:55 |
fungi | i need to find time to reply on the ml thread | 19:55 |
clarkb | at the very least I think the cleanup can probably be run before any long term plan is committed to | 19:55 |
clarkb | ianw: ++ | 19:55 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/879239 | 19:55 |
ianw | that has some links to the output, etc. | 19:55 |
fungi | yeah, i feel like we know what's safe to delete now, even if we haven't settled on what's safe to stop building yet | 19:55 |
fungi | also, this is one of those times where vos backup may come in handy | 19:56 |
ianw | right, we can make sure we're not putting back anything we don't want (i.e. the prune is working) | 19:56 |
frickler | regarding afs, is there a way we could increase capacity further by adding more volumes or servers? or are we facing a hard limit there? just to know possible options | 19:56 |
ianw | and evaluate what clarkb was talking about in that we don't need to carry .whl's that don't build against libraries really | 19:56 |
clarkb | frickler: yes we can add up to 14 cinder volumes of 1TB each to each server | 19:57 |
fungi | for some stuff it's pretty straightforward. if we decide to delete all pure python wheels on the premise that they're trivially rebuildable from sdist in jobs, we might want to vos backup so we can quickly switch the volume state back if our assumptions turn out to be incorrect | 19:57 |
clarkb | frickler: then we add them to the vicepa pool or add a vicepb or something. I would have to look at the existing server setup to see how the existing 3TB is organized | 19:57 |
corvus | and no known limit to the number of servers | 19:57 |
ianw | it's all in vicepa | 19:57 |
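(To make the growth option concrete: the cloud-side half of adding capacity is roughly "create another 1TB cinder volume and attach it to the fileserver", sketched below with openstacksdk. The cloud, server, and volume names are placeholders; growing the LVM pool behind /vicepa and rebalancing AFS volumes with vos are host-side steps not shown here.)

```python
# Hedged sketch of adding capacity to an AFS fileserver: create a 1TB cinder
# volume and attach it. Cloud/server/volume names and size are placeholders.
# Follow-up host-side work (pvcreate/vgextend on the new device, growing the
# filesystem under /vicepa, then vos move/addsite to rebalance) is not shown.
import openstack

conn = openstack.connect(cloud="openstackci-rax")        # placeholder cloud name
server = conn.get_server("afs01.dfw.openstack.org")      # placeholder server
volume = conn.create_volume(size=1024, wait=True,
                            name="afs01.dfw.openstack.org/vicepa-extra")
conn.attach_volume(server, volume, wait=True)
print("attached", volume.id)
```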
fungi | yeah, we discussed previously that adding more servers probably helps distribute the points of failure a bit, vs the situation we ended up in with the static file server that had 14 attached volumes and fell over if you sneezed | 19:58 |
ianw | i think we can definitely grow if we need to, but "more storage == more problems" :) | 19:58 |
ianw | heh, yeah | 19:58 |
clarkb | yup and we do have three servers only one of which is under massive pressure. We might want to double check if we can drop content from that server or rebalance somehow | 19:58 |
clarkb | (though I suspect with only three servers the number of organizations is small) | 19:59 |
clarkb | we definitely have options though which is a good thing | 19:59 |
fungi | i would want to add a server if we're rebalancing, because we probably need to be able to continue functioning when one dies | 19:59 |
clarkb | and we are officially at time | 20:00 |
clarkb | Thank you everyone! | 20:00 |
ianw | anyway, i'm happy to run the wheel cleanup, but yeah, after someone else has looked at it :) | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Apr 4 20:00:29 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.log.html | 20:00 |
fungi | thanks! | 20:00 |
* clarkb | hunts down lunch. I'll dig into the various reviews afterwards | 20:01 |
clarkb | actually let me do the container one now since it should go quickly | 20:01 |