Tuesday, 2023-11-14

clarkb	almost meeting time	18:59
clarkb	#startmeeting infra	19:00
opendevmeet	Meeting started Tue Nov 14 19:00:26 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.	19:00
opendevmeet	Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.	19:00
opendevmeet	The meeting name has been set to 'infra'	19:00
clarkb	#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/NIDXZX7JT4MQJOUS7GKI5PPRMDIIY6FI/ Our Agenda	19:00
clarkb	The agenda went out late because I wasn't around yesterday, but we do have an agenda	19:00
clarkb	#topic Announcements	19:01
clarkb	Next week is a big US holiday. That said, I expect to be around for the beginning of the week and plan to host our weekly meeting Tuesday	19:01
clarkb	But be aware that by Thursday I expect it to be very quiet	19:01
clarkb	#topic Mailman 3	19:02
clarkb	fungi: I think you dug up some more info on the template file parse error? And basically mailman3 is missing some file that they need to add after django removed it from their library?	19:02
fungi	the bug we talked about yesterday turns out to be legitimate, yes	19:03
fungi	er, last week i mean	19:03
tonyb	time flies	19:03
clarkb	to confirm we are running all of the versions of the software we expect, but a new bug has surfaced and we aren't seeing an old bug due to accidental use of old libraries	19:03
fungi	yeah, and this error really just means django isn't pre-compressing some html templates, so they're a little bigger on the wire to users	19:04
clarkb	in that case I guess we're probably going to ignore this until the next mm3 upgrade?	19:05
fungi	#link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/36U5NY725FNJSGRNELFOJLLEZQIS2L3Y/ mailman-web compress - Invalid template socialaccount/login_cancelled.html	19:05
fungi	yeah, it seems safe to just ignore and then we can plan to do a mid-release update when it gets fixed if we want, or wait until the next release	19:05
clarkb	should we drop this agenda item from next week's meeting then?	19:06
clarkb	I believe this was the last open item for mm3	19:06
fungi	i think so, yes. we can add upgrades to the agenda as needed in the future	19:06
clarkb	sounds good. Thanks again for working through all of this for us	19:06
clarkb	#topic Server Upgrades	19:06
fungi	thanks for your patience and help!	19:06
clarkb	we added tonyb to the root list last week and promptly put him to work booting new servers :)	19:07
tonyb	\o/	19:07
clarkb	mirror01.ord.rax is being replaced with a new mirror02.ord.rax server courtesy of tonyb	19:07
clarkb	#link https://review.opendev.org/c/opendev/zone-opendev.org/+/900922	19:07
clarkb	#link https://review.opendev.org/c/opendev/system-config/+/900923	19:07
clarkb	These changes should get the server all deployed, then we can confirm it is happy before updating DNS to slip over the mirror.ord.rax CNAMEs	19:08
clarkb	I think the plan is to work through this one first and then start doing others	19:08
tonyb	After a good session booting mirror02 I managed to clip some of the longer strings and so the reviews took me longer to publish	19:08
clarkb	tonyb: I did run two variations of ssh-keyscan in order to double check the data	19:08
tonyb	clarkb: Thanks	19:08
clarkb	I think it is correct and noted that in my reviews when I noticed the note about the copy paste problems	19:08
clarkb	feel free to continue asking questions and poking for reviews. This is really helpful	19:09
tonyb	I started writing a "standalone" tool for handling the volume setup as the mirrors nodes are a little different	19:09
tonyb	Yup I certainly will do.	19:10
clarkb	tonyb: ++ to having the mirror volumes a bit more automated	19:10
fungi	agreed, we have enough following that pattern that it could be worthwhile	19:11
fungi	note that not all mirror servers get that treatment though, some have sufficiently large rootfs we just leave it as-is and don't create additional volumes	19:11
tonyb	I think that's about that for the mirror nodes. It's mostly carefully following the bouncing ball at this stage	19:11
clarkb	cool. I'm happy to do another run-through too if we like. I feel like that was helpful for everyone as it made problems with cinder volume creation apparent and so on	19:12
tonyb	fungi: Yup. and as we can't always predict the device name in the guest it won't be fully automated or integrated; it's just to document/simplify the creation work we did on the meetpad	19:13
fungi	i too am happy to do another onboarding call, maybe for other activities	19:13
* tonyb too.		19:13
clarkb	anything else on this topic?	19:14
tonyb	not from me	19:14
clarkb	#topic Python Container Updates	19:14
clarkb	Unfortunately I haven't really had time to look at the failures here in more detail. I saw tonyb asking questions about them though, were you looking?	19:15
clarkb	#link https://review.opendev.org/c/zuul/zuul-operator/+/881245 Is the zuul-operator canary change	19:15
clarkb	specifically we need that change to begin passing in zuul-operator before we can land the updates for the docker image in that repo	19:15
tonyb	I am looking at it	19:16
tonyb	I spoke to dpawlik about status and background	19:16
corvus	i suspect something has bitrotted with cert-manager; but with the switch in k8s setup away from docker, we don't have the right logs collected to see it, so that's probably the first task	19:16
tonyb	No substantial progress but I'm finding my feet there	19:17
corvus	(in other words, the old k8s setup got us all container logs via docker, but the new setup needs to get them from k8s and explicitly fetch from all namespaces)	19:17
clarkb	gotcha	19:17
clarkb	because we are no longer using docker under k8s	19:17
corvus	yep	19:18
clarkb	I agree, addressing log collection seems like a good next step	19:18
tonyb	Okay that's good to know.	19:18
clarkb	#topic Gitea 1.21	19:19
clarkb	1.21.0 has been released	19:20
clarkb	#link https://github.com/go-gitea/gitea/blob/v1.21.0/CHANGELOG.md we have a changelog	19:20
fungi	(and there was much rejoicing)	19:20
clarkb	#link https://review.opendev.org/c/opendev/system-config/+/897679 Upgrade change needs updating now that we have changelog info	19:20
clarkb	so ya the next step here is to go over the changelog and make sure our change is modified properly to handle their breaking changes	19:20
clarkb	I haven't even looked at the changelog yet	19:21
clarkb	but doing so and modifying that change is on my todo	19:21
clarkb	*todo list	19:21
clarkb	In the past we've often not upgraded until the .1 release anyway due to them very quickly releasing bugfixes	19:21
fungi	nobody ever wants to go first	19:22
clarkb	between that and the gerrit upgrade and then thanksgiving I'm not sure this is urgent, but also don't want it to get forgotten	19:22
fungi	i agree that the next two weeks are probably not a great time to merge it, but i'll review at least	19:22
clarkb	sounds good. Should have something to look at in the next day or so	19:23
frickler	I'm wondering about the key length thing, how much effort would it be to use longer keys?	19:23
tonyb	FWIW I'll review it too and, probably, ask "why do we $x" questions ;P	19:23
clarkb	frickler: we need to generate a new key, add it to the gerrit user in gitea (this step may be manual; currently I think we only automate this at user creation time) and then add the key to gerrit and restart gerrit to pick it up	19:24
clarkb	frickler: I suspect that if we switch to ed25519 then we can have it sit next to the existing rsa key in gerrit and we don't have to coordinate any moves	19:24
clarkb	if we replace shorter rsa key with longer rsa key then we'd need a bit more coordination	19:24
fungi	well, we could have multiple rsa keys too, right?	19:25
clarkb	fungi: I don't think gerrit will find multiple rsa keys	19:25
clarkb	but I'm not sure of that. We can test that on a held node I guess	19:25
fungi	oh, right, local filename collision	19:25
fungi	we can do two different keytypes because they use separate filenames	19:26
clarkb	yup	19:26
clarkb	I can look into that more closely as I page the gitea upgrade stuff back in	19:26
fungi	i was thinking in the webui, not gerrit as a client	19:26
fungi	so yeah, i agree parallel keys probably makes the transition smoother than having to swap one out in a single step	19:27
clarkb	speaking of Gerrit:	19:27
clarkb	#topic Gerrit 3.8 Upgrade	19:27
fungi	though i guess if we add the old and new keys to gitea first, then we could swap rsa for rsa on the gerrit side	19:27
fungi	but might need a restart	19:27
clarkb	it will need a restart of gerrit in all cases iirc	19:27
clarkb	because it reads the keys on startup	19:27
clarkb	For the Gerrit upgrade I'm planning on going through the etherpad again tomorrow	19:28
clarkb	#link https://etherpad.opendev.org/p/gerrit-upgrade-3.8	19:28
clarkb	I want to make sure I understand the screen logging magic a bit better	19:28
clarkb	but also would appreciate reviews of that plan if you haven't read it yet	19:28
fungi	also for the sake of the minutes...	19:28
fungi	#link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/XT26HFG2FOZL3UHZVLXCCANDZ3TJZM7Q/ Upgrading review.opendev.org to Gerrit 3.8 on November 17, 2023	19:28
fungi	i figured you were going to include that in the announcements at the beginning	19:28
clarkb	as far as coordination goes on the day of I expect I can drive things, but maybe fungi you can do some of the earlier stuff like adding hosts to emergency files and sending #status notice notices	19:29
clarkb	I'll let you know if my expectations for that change, but I don't expect them to	19:29
fungi	happy to. i think i'm driving christine to an eye appointment, but can do basic stuff from my phone in the car	19:30
fungi	(also the appointment is about 2 minutes from the house)	19:31
clarkb	seems like we are in good shape. And I'll triple check myself before Friday anyway	19:31
tonyb	I can potentially do some of the "non-destructive" early work	19:31
clarkb	tonyb: oh! we should add you to the statusbot acls	19:31
fungi	we should add tonyb to statusbot	19:31
clarkb	and generally irc acls	19:31
tonyb	but that may make more work than doing it	19:31
fungi	hah, jinx!	19:31
tonyb	hehe	19:32
fungi	tonyb: it's work that needs doing sometime anyway	19:32
tonyb	so who owes who a soda?	19:32
fungi	i can take care of it	19:32
tonyb	kk	19:32
fungi	i owe everyone soda anyway	19:32
tonyb	LOL	19:32
clarkb	openstack/project-config/accessbot/channels.yaml is one file that needs editing	19:32
fungi	still repaying from my ptl days	19:32
tonyb	I can do that.	19:33
clarkb	I'm not actually sure where statusbot gets its user list. Does it just check for opers in the channel it is in?	19:33
fungi	i'll look into it	19:33
corvus	i think it's a config file	19:34
clarkb	nope it's statusbot_auth_nicks in system-config/inventory/service/group_vars/eavesdrop.yaml	19:34
clarkb	tonyb: ^ so that file too	19:34
fungi	thanks, i was almost there	19:34
tonyb	gotcha	19:34
clarkb	anything else Gerrit upgrade related?	19:34
fungi	i'm getting slow this afternoon, must be time to start on dinner	19:35
clarkb	it's basically lunch time. I'm starving	19:35
tonyb	Coffee o'clock and then a run. ... and then lunch	19:35
clarkb	alright next up	19:36
clarkb	#topic Ironic Bug Dashboard	19:36
clarkb	#link https://github.com/dtantsur/ironic-bug-dashboard	19:36
clarkb	The ironic team is asking if we would be willing to run an instance of their bug dashboard tool for them	19:36
fungi	JayF: you were going to speak to this one?	19:36
JayF	So some context; this is an old bug dashboard. No auth needed. Simplest python app ever.	19:36
fungi	otherwise i can	19:36
JayF	We've run it in various places we've just done custom-ly, before doing that again with our move to LP, we thought we'd ask about getting it a real home.	19:37
JayF	No dependencies. Literally just needs a place to run, and I think dtantsur wrote a dockerfile for it the other day, too	19:37
clarkb	My major concern is that running a service for a single project feels very inefficient from our side. If someone wanted to resurrect the openstack bug dashboard instead I feel like that might be a little different?	19:37
fungi	so options are for adding it to opendev officially (deployment via ansible/container image building and testinfra tests), or us booting a vm for them to manage themselves	19:38
tonyb	The docs show using podman etc so yeah I think that's been done	19:38
clarkb	additionally teams like tripleo have had one tool and ironic has one apparently and so on. I think it is inefficient for the project teams too	19:38
fungi	for historical reference, "the openstack bug dashboard" was called "bugday"	19:38
JayF	clarkb: I talked to dtantsur; we are extremely willing to take patches (and will email the list about this existing again once we get it a home) if other teams want to use it	19:38
frickler	JayF: so would you be willing to run this yourself if we give you a vm with a DNS record?	19:40
JayF	fungi: it's extremely likely if infra says no, and we host it out of band, we'd do something similar to the second option (just get a VM somewhere and run it manually)	19:40
JayF	frickler: replace instances of "you" and "yourself" with ironic community as appropriate and the answer is "yes", with specific contacts being dtantsur and I to start	19:40
JayF	frickler: if you all had no answer for us, nonzero chance this ended up as a container in my homelab :)	19:41
frickler	that would be an easy start and we could see how it develops	19:41
clarkb	so basically the idea behind openstack infra and now opendev was that we'd avoid doing stuff like this and instead create a commons where projects could work together to address common problems	19:41
fungi	yeah, when this came up yesterday in #openstack-ironic i mentioned the current situation with the opensearch-backed log ingestion service dpawlik set up	19:42
clarkb	where we've struggled is when projects do things like this specific tool and blaze their own trail. This takes away potential commons resources as well as multiplies effort required	19:42
JayF	From an infra standpoint; I'm with you.	19:42
JayF	This is why it's an opportunistic ask with almost an expectation that "no" was a likely answer.	19:42
clarkb	I think that if we were to host it it would need to be a more generic tool for OpenDev users and not ironic specific. I have fewer concerns with handing over a VM	19:42
JayF	From a community standpoint; that was storyboard; we adopted it; it disappeared; we are trying to dig out from that mistake	19:42
frickler	iiuc the tool is open to be used by other projects, they just need to amend it accordingly	19:43
fungi	i do think we want to encourage people to collaborate on infrastructure that supports their projects when there is a will to do so	19:43
JayF	and I do not want to burn more time trying to go down alternate "work together" paths in pursuit of that goal	19:43
clarkb	JayF: the problem is that all the cases of not working together are why we have massive debt	19:43
clarkb	ironic is not the only project trying to deal with storyboard for example	19:44
JayF	clarkb: I have lots of examples of cases of us working together that also have massive debt; so I'm not sure I agree with all of the root causing, but I do understand what you're getting at and like I said, if the answer is no, it's no.	19:44
fungi	basically the risk is that the opendev sysadmins are the last people standing when whoever put some service together disappears and there are still users	19:44
clarkb	and despite my prodding very little collaboration between teams with the same problems has occurred as an example	19:44
fungi	so we get to be the ones who tell users "sorry, nobody's keeping this running any more"	19:44
clarkb	the infra sig continues to field questions about how to set up LP	19:45
clarkb	stuff that should have ideally been far more coordinated among the groups moving	19:45
fungi	i mostly just remind folks that we don't run launchpad, and it has documentation	19:45
clarkb	and I can't shake the feeling that an ironic bug dashboard is just an extension of these problems and we'll end up being asked to run a different tool for nova and then a different one for sdks and so on	19:46
JayF	This is off topic for the meeting, but the coordination is always the most difficult part ime; which is why for Ironic's LP migration it finally started moving when I stopped trying so hard to pull half of openstack with me.	19:46
clarkb	when what we need as a group is rough agreement on what a tool should be and then run that. And as mentioned before this tool did exist	19:46
clarkb	but it too ran into disrepair and was no longer maintained and we shut it off	19:46
JayF	It sounds like consensus is no though; so for this topic you all can move on. I wouldn't want you all to host it unless everyone was onboard, anyway.	19:47
clarkb	I don't think we necessarily need to resurrect bugday the code base, but I think if opendev hosts something it should be bugday the spiritual successor tool and not an ironic specific tool	19:47
fungi	i think it can be "not yet" instead of just "no"?	19:47
fungi	also i'm not opposed to booting a vm for them to run it on themselves, while they work on building consensus across other teams to possibly make it useful beyond ironic's use case	19:48
JayF	I just sent an email to the mailing list, last week, about how a cornerstone library to OpenStack is rotting and ~nobody noticed. I'm skeptical someone is going to take up the banner of uniting bug dashboards across openstack.	19:48
JayF	fungi: I do not commit to building such a consensus. I commit to being open to accepting patches.	19:48
fungi	with the expectation that if opendev is going to officially take it on, then there will need to be more of a cross-project interest (and of course configuration management and tests)	19:49
clarkb	ya I'm far less concerned with booting a VM and adding a DNS record	19:49
JayF	fungi: not trying to be harsh; just trying to set a reasonable expectation to be clear :)	19:49
JayF	my plate is overflowing and I can't fit another ounce on it	19:49
fungi	sure. and we've all been there more than once, i can assure you ;)	19:49
fungi	JayF: so there are some options and stipulations you can take back to the ironic team for further discussion, i guess	19:50
JayF	If you want to give us a VM and a DNS name, that will work for us. If not, I'll go get equivalent from my downstream/personal resources and my next steps are the same either way	19:51
corvus	i'm not sure i'm a fan of the "boot a vm and hand it over" approach	19:51
corvus	if a vm is going to be handed over, i don't see why that's an opendev/infra team ask... i don't feel like we're here to hand out vms, we're here to help facilitate collaboration. anyone can propose a patch to run a service if the service fits the mission. so if it does fit the mission, that's how it should be run. and if it doesn't, then it shouldn't be an opendev conversation.	19:51
fungi	should we not have provided the vm for the log ingestion system that loosely replaced the old logstash system? mistake in your opinion, or failed experiment, or...?	19:53
corvus	i thought that ran on aws or something	19:53
clarkb	the opensearch cluster runs in aws, but there is a node that fetches logs and sends them to opensearch that dpawlik is managing	19:54
fungi	the backend does, but the custom log ingestion glue to zuul's interface is on a vm we booted for the systadmins	19:54
fungi	er, s/systadmins/admins of that service/	19:54
corvus	i was unaware of that, and yeah, i think that's the wrong approach. for one, the fact that i'm a root member unaware of it and it's not documented in https://docs.opendev.org/opendev/system-config/latest/ seems like a red flag. :)	19:54
corvus	that seems like something that fits the mission and should be run in the usual manner to me	19:56
clarkb	ya better documentation of the exceptional node(s) is a good idea	19:56
fungi	and possibly also deciding as a group that exceptions are a bad idea	19:56
corvus	i think the wiki is an instructive example here too	19:56
JayF	One thing I'll note that is a key difference about the service I proposed (and I suspect that logstash service) is their stateless nature.	19:57
fungi	the main takeaway we had from the wiki is that we made it clear we would not take responsibility for the services running the log search service	19:57
JayF	It doesn't address the basic philosophical questions; but it does draw a different picture than something like the wiki does.	19:57
fungi	and that if the people maintaining it go away, we'll just turn it off with no notice	19:57
corvus	yeah, in both new cases running them is operationally dead simple	19:58
clarkb	(side note I think the original plan was to run the ingestion on the cluster itself but then realized that you can't really do that with the opensearch as a service)	19:59
corvus	i must have gotten the first version of the memo and not the update	19:59
clarkb	because they delete and replace servers or something for upgrades. It's basically an appliance	20:00
clarkb	we are at time.	20:00
clarkb	#topic Upgrade Server Pruning	20:00
clarkb	#undo	20:00
opendevmeet	Removing item from minutes: #topic Upgrade Server Pruning	20:00
clarkb	#topic Backup Server Backup Pruning	20:00
clarkb	really quickly before we end I wanted to note that the rax backup server needs its backups pruned due to disk utilization	20:01
clarkb	Maybe that is something tonyb wants to do with another root (ianw set it up and documented and scripted it well so it's mostly a matter of going through the motions)	20:01
tonyb	Yup happy to.	20:01
clarkb	#topic Open Discussion	20:01
fungi	i'm also happy to help tonyb if there are questions about backup pruning	20:02
clarkb	We don't really have time for this but feel free to take discussion to #opendev or service-discuss@lists.opendev.org to bring up extra stuff and/or keep talking about the boot a VM and hand it over stuff	20:02
tonyb	fungi: thanks.	20:02
clarkb	and happy 1700000000 day	20:02
fungi	woo!	20:02
clarkb	I think we are about 2 hours away?	20:02
clarkb	something like that	20:02
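For reference, "1700000000 day" refers to the moment the Unix timestamp reaches 1700000000. A minimal sketch of the arithmetic behind the "about 2 hours away" estimate, assuming the remark was made at 20:02 UTC as logged (the variable names here are illustrative only):

```python
from datetime import datetime, timezone

# Unix timestamp being celebrated in the log.
EPOCH_MILESTONE = 1_700_000_000

# Convert the timestamp to an aware UTC datetime.
milestone = datetime.fromtimestamp(EPOCH_MILESTONE, tz=timezone.utc)

# Assumption: time of the remark, taken from the 20:02 column in the log.
said_at = datetime(2023, 11, 14, 20, 2, tzinfo=timezone.utc)

remaining = milestone - said_at

print(milestone.isoformat())  # 2023-11-14T22:13:20+00:00
print(remaining)              # 2:11:20
```

So the estimate holds: the milestone fell a little over two hours after the meeting ended.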
clarkb	thank you everyone for your time!	20:03
clarkb	#endmeeting	20:03
opendevmeet	Meeting ended Tue Nov 14 20:03:06 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)	20:03
opendevmeet	Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.html	20:03
opendevmeet	Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.txt	20:03
opendevmeet	Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.log.html	20:03

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!