nick | message | time |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Jan 9 19:00:25 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7IXDFVY34MYBW3WO2EEU3AIGOLAL6WRB/ Our Agenda | 19:00 |
clarkb | It's been a little while since we had these regularly | 19:00 |
clarkb | #topic Announcements | 19:01 |
clarkb | The OpenInfra Foundation Individual Board member election is happening now. Look for your ballot via email and vote. | 19:02 |
clarkb | This election also includes bylaw amendments to make the bylaws less openstack specific | 19:02 |
clarkb | If you expected to have a ballot and can't find it, please reach out. There may have been email delivery problems | 19:02 |
clarkb | Separately we're going to feature OpenDev on the OpenInfra Live stream/podcast/show (I'm not sure exactly how you'd classify it) | 19:03 |
clarkb | That will happen on January 18th at 1500 UTC? | 19:04 |
clarkb | I know the day is correct but not positive on the time. Feel free to tune in | 19:04 |
corvus | clarkb: i think the kids are calling it a "realplayer tv show" now ;) | 19:04 |
fungi | also some streaming platforms have the ability for you to heckle us and ask questions | 19:04 |
clarkb | #topic Topics | 19:06 |
clarkb | #topic Server Upgrades | 19:07 |
clarkb | I believe that tonyb has gotten all of the mirror nodes upgraded at this point | 19:07 |
clarkb | Not sure if tonyb is around for the meeting, but I think the plan was to look at meetpad servers next | 19:07 |
tonyb | Correct | 19:08 |
tonyb | I started looking at meetpad, One thing that worries me a little is I can't quite see how we add the jvb nodes to meetpad | 19:08 |
clarkb | tonyb: it should be automated via configuration somehow | 19:09 |
clarkb | tonyb: I can look into that after the meeting | 19:09 |
tonyb | it seems to just be "magic" and I don't want any new jvb nodes to auto-register with the existing meetpad | 19:09 |
tonyb | clarkb: Thanks | 19:09 |
clarkb | tonyb: yes it should be magic and it happens via xmpp iirc | 19:09 |
fungi | we've scaled up and down if you look at git history | 19:09 |
tonyb | Ah okay. | 19:10 |
clarkb | so ya one approach would be to have a new jvb join the old meetpad, then replace the old meetpad and have the new jvb join the new thing. Or update config management to allow two side by side installations then update dns | 19:10 |
clarkb | we'll need to sort out how the magic happens in order to make a decision on approach I think | 19:10 |
tonyb | That was my thinking | 19:10 |
corvus | (i think a rolling replacement sounds good, but i haven't thought about it deeply) | 19:12 |
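For context, the "magic" registration being discussed follows the usual docker-jitsi-meet pattern: each JVB authenticates to a shard's prosody over XMPP and joins an internal MUC, so which meetpad a JVB registers with is controlled entirely by its environment. A sketch, assuming a docker-jitsi-meet style compose file; the variable names follow that project's convention and may not match the actual config management:

```yaml
# hypothetical jvb service environment -- illustrative only
jvb:
  environment:
    - XMPP_SERVER=meetpad01.example.opendev.org  # which shard this JVB registers with
    - XMPP_AUTH_DOMAIN=auth.meet.jitsi
    - XMPP_INTERNAL_MUC_DOMAIN=internal-muc.meet.jitsi
    - JVB_AUTH_USER=jvb
    - JVB_AUTH_PASSWORD=changeme                 # must match the shard's prosody account
```

Under that assumption, pointing `XMPP_SERVER` (with matching credentials) at a new shard is what makes a fresh JVB auto-register there instead of with the existing meetpad, which is the knob either replacement approach would rely on.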
tonyb | I also looked at mediawiki and I'm reasonably close to starting that server. translate looks like we'll just turn it off when i18n are ready, but I'm trying to help them with new weblate tools | 19:12 |
corvus | (just mostly that since we're not changing any software versions, we'd expect it to work) | 19:12 |
tonyb | so that leaves cacti and storyboard to look at | 19:12 |
clarkb | tonyb: we've got a spec to add a prometheus and some agents on servers to replace cacti which is one option there | 19:12 |
clarkb | but maybe the easiest thing right now is to just uplift cacti? I don't know | 19:13 |
fungi | cacti was in theory going to be retired in favor of prometheus | 19:13 |
fungi | yeah that | 19:13 |
clarkb | I think the main issue with prometheus was figuring out the agent stuff. Running the service to collect the data is straightforward | 19:13 |
tonyb | Okay, I know ianw was thinking prometheus would be a good place for me to start so I'd be happy to look at that | 19:14 |
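As a sketch of the collection side mentioned above (hostnames are placeholders, not our inventory): prometheus pulls metrics from a node-exporter-style agent on each server, so the "straightforward" service piece is mostly a scrape target list:

```yaml
# minimal prometheus scrape configuration; targets are illustrative
scrape_configs:
  - job_name: node
    scrape_interval: 60s
    static_configs:
      - targets:
          - mirror01.example.opendev.org:9100  # node_exporter's default port
          - gitea09.example.opendev.org:9100
```

The open "agent stuff" question is then about getting an exporter deployed and firewalled on every server, not about this config.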
clarkb | alright let's move on, we have a fair number of things to discuss, and it sounds like we're continuing to make progress there. Thanks! | 19:14 |
clarkb | #topic Python container updates | 19:15 |
clarkb | The zuul registry service migrated to bookworm images so I've proposed a change to drop the bullseye images it was relying on | 19:15 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/905018 Drop Bullseye python3.11 images | 19:15 |
clarkb | That leaves us with zuul-operator on the bullseye python3.10 images as our last bullseye container images | 19:15 |
clarkb | #topic Upgrading Zuul's DB server | 19:16 |
clarkb | I realized while prepping for this meeting that I had completely spaced on this. | 19:16 |
tonyb | It happens at this time of year ;P | 19:16 |
clarkb | However, coincidentally hacker news had a post about postgres options recently | 19:16 |
clarkb | #link https://www.crunchydata.com/blog/an-overview-of-distributed-postgresql-architectures a recent rundown of postgresql options | 19:17 |
clarkb | I haven't read the article yet, but figured I should as a good next step on this item | 19:17 |
clarkb | did anyone else have new input to add? | 19:17 |
* tonyb shakes head | 19:18 |
clarkb | #topic EMS discontinuing legacy consumer hosting plans | 19:19 |
clarkb | fungi indicated that at the last meeting the general consensus was that we should investigate a switch to the newer plans | 19:19 |
clarkb | fungi: have we done any discussion about this on the foundation side yet? I'm guessing we need a general ack there then we can reach out to element about changing the deployment type? | 19:20 |
fungi | they indicated in the notice that they'd let folks on the old plan have a half-normal minimum user license | 19:20 |
fungi | i did some cursory talking to wes about it and it sounded like they'd be able to work it in for 2024 | 19:21 |
fungi | we would have to pay for a full year up front though | 19:21 |
clarkb | I don't expect we'll stop using matrix anytime soon | 19:21 |
clarkb | so that seems fine from a usage standpoint | 19:21 |
fungi | right, since we're supporting multiple openinfra projects with it, the cost is fairly easy to justify | 19:22 |
clarkb | fungi: in that case I guess we should reach out to Element. IIRC the email gave a contact for the conversion | 19:22 |
clarkb | maybe double check with wes that nothing has changed in the last few weeks before sending that email | 19:22 |
* clarkb scribbles a note to do this stuff | 19:22 |
fungi | will do | 19:22 |
tonyb | Also gives us this year to test self-hosting a homeserver | 19:23 |
fungi | we've still got about a month to sort it | 19:23 |
clarkb | right we have until February 7 | 19:23 |
frickler | do we really want to test self-hosting? also, would we get an export from element that would allow moving and keeping rooms and history? | 19:24 |
corvus | no export is needed; the system is fully distributed | 19:24 |
clarkb | they provided a link to a migration document in the email too | 19:24 |
clarkb | trying to find it | 19:25 |
fungi | but they do have a settings export we can use too | 19:25 |
clarkb | https://ems-docs.element.io/books/element-cloud-documentation/page/migrate-from-ems-to-self-hosted | 19:25 |
fungi | basically the homeserver config | 19:25 |
frickler | so you start a new homeserver with the same name and the rooms just magically migrate? | 19:25 |
tonyb | frickler: I think it's something to investigate during the year. Gives us more information for making a long term decision | 19:25 |
clarkb | we "own" the room names so ti would largely be history and room config to worry about aiui | 19:25 |
corvus | the rooms and their contents exist on all matrix servers involved in the federation (typically homeservers of users in those rooms) | 19:26 |
corvus | if the history is exported, cool, but in theory i think a replacement server should be able to grab the history from any other server | 19:27 |
clarkb | oh interesting. So if you stand up a new server and have the well known file say it is the :opendev.org homeserver then clients will talk to the new server. That new server will sync out of the federated state the history of its rooms | 19:28 |
corvus | that's what i'd expect. i have not tested it. | 19:28 |
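The delegation described above is the standard .well-known mechanism from the Matrix specification: a small file served from the base domain tells other servers where the opendev.org homeserver actually lives. The hostname below is illustrative:

```json
{ "m.server": "matrix.example.opendev.org:443" }
```

That document would be served at `https://opendev.org/.well-known/matrix/server` for server-to-server federation; the client-facing variant lives at `/.well-known/matrix/client` with an `m.homeserver`/`base_url` key. Repointing these files is what would switch traffic to a replacement server.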
clarkb | ack. Also looks like we can copy databases per the ems migration doc should that be necessary | 19:29 |
corvus | (you'd just need to use one of the other room ids initially) | 19:29 |
corvus | but i'm still in no rush to self-host. | 19:29 |
clarkb | in any case figuring that out is a next step. First up is figuring out a year of hosting | 19:29 |
clarkb | and if that is reasonable. Which I can help coordinate with fungi at the foundation and talking to element | 19:29 |
clarkb | #topic Followup on haproxy update being broken | 19:30 |
clarkb | There was a lot of info under this item but the two main points seem to be "should we be more explicit about the versions of docker images we consume" and "should we prune less aggressively" | 19:30 |
corvus | (like, i'm not looking at ems as an interim step based on our conversations so far -- but i agree that keeping aware of future options is good) | 19:30 |
clarkb | I think for haproxy in particular we can and should probably stick with their lts tag | 19:31 |
fungi | i think we mostly covered the haproxy topic at the last meeting, but happy to revisit since not everyone was present | 19:31 |
corvus | ++lts tag | 19:31 |
clarkb | fungi: ack. I wanted to bring up one thing primarily on pruning | 19:31 |
clarkb | One gotcha with pruning is that it seems to be based on the image build/creation time not when you started using the newer image(s) | 19:31 |
fungi | right, note that we hadn't actually pruned the old haproxy image we downgraded to, when i did the manual config change and pulled, it didn't need to retrieve the image | 19:32 |
clarkb | and so it is a bit of a clunky tool, but better than nothing for images like haproxy for example where we could easily revert | 19:32 |
clarkb | I'm happy for us to extend the time we keep images, but also be aware of this limitation with the pruning command | 19:32 |
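Concretely, the gotcha is that the prune filter compares against the image's build time, not the time we stopped using it. A sketch of the command involved (not our actual playbooks):

```shell
# removes unused local images whose *creation* time is older than 30 days.
# an image built months ago but replaced only yesterday is pruned immediately,
# so widening the window is the only safety margin this filter provides.
docker image prune --all --force --filter "until=720h"
```

That asymmetry is why extending the retention window helps, but can never guarantee the previously-running image is still on disk.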
corvus | i'm ambivalent about pruning because i'm not worried about not being able to pull an old version from a registry on demand | 19:33 |
fungi | the main thing it might offer is insurance against upstreams deleting their images | 19:33 |
fungi | but i don't think that's actually been an issue we've encountered yet? | 19:33 |
frickler | one concern of mine was being able to find out which last version it actually was that we were running | 19:33 |
corvus | i'm not eager to run an image that upstream has deleted either | 19:33 |
fungi | frickler: yes, if we could add some more verbosity around our image management, that could help | 19:34 |
clarkb | frickler: we could update our ansible runs to do something like a docker ps -a and docker image list | 19:34 |
clarkb | and record that in our deployment logs | 19:34 |
fungi | even if it's just something that periodically interrogates docker for image ids and logs them to a file | 19:34 |
fungi | or yeah that | 19:34 |
frickler | maybe even somewhere more persistent than zuul build logs would be good | 19:35 |
corvus | i agree with frickler that leaving an image sitting around for some number of days provides a good indication of what we were probably running before | 19:35 |
clarkb | ok so the outstanding need is better records of what docker images we ran during which timeframes | 19:36 |
corvus | (we could stick version numbers in prometheus; it's not great for that though, but it's okay as long as they don't change too often) | 19:36 |
clarkb | ya this will probably require a bit more brainstorming | 19:36 |
corvus | (the only way to do that with prometheus increases the cardinality of metrics with each new version number) | 19:36 |
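corvus's cardinality point in miniature (a pure-Python illustration, not prometheus code): a label value is part of a time series' identity, so every new version string mints a brand-new series, and the series count grows with each upgrade instead of staying constant:

```python
# each distinct (metric name, label set) pair is a separate time series
observed_versions = ["2.8.3", "2.8.4", "2.8.4", "2.9.0"]

series = set()
for version in observed_versions:
    # series identity = metric name + sorted label pairs
    series.add(("haproxy_version_info", (("host", "lb01"), ("version", version))))

# one logical metric, but three series: cardinality grows per version bump
print(len(series))
```

This is fine while versions change rarely, which is the "okay as long as they don't change too often" caveat above.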
clarkb | maybe start with the simple thing of having ansible record a bit more info then try and improve on that for longer term retention | 19:37 |
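The "simple thing" could look like the following (a sketch; the log path and exact format strings are assumptions about details the deployment would choose):

```shell
# append a timestamped inventory of running containers and local images,
# run from cron or at the end of each ansible deploy
{
  echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  docker ps --format '{{.Names}} {{.Image}} {{.ID}}'
  docker image ls --format '{{.Repository}}:{{.Tag}} {{.ID}} created={{.CreatedSince}}'
} >> /var/log/container-image-history.log
```

Grepping that log would answer frickler's "which version were we actually running last Tuesday" question without depending on zuul build log retention.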
clarkb | I'll continue on as we have a few more items to discuss | 19:38 |
clarkb | #topic Followup on haproxy update being broken | 19:38 |
clarkb | Similar to the last one I'm not sure if this reached a conclusion but two things worth mentioning have happened recently. First zuul's doc quota was increased | 19:38 |
frickler | that's the topic we just had? | 19:38 |
clarkb | bah yes | 19:39 |
clarkb | #undo | 19:39 |
opendevmeet | Removing item from minutes: #topic Followup on haproxy update being broken | 19:39 |
clarkb | #topic AFS Quota issues | 19:39 |
clarkb | copy and paste failure | 19:39 |
* fungi is now much less confused | 19:39 |
clarkb | Second is that there are some early discussions around having openeuler be more involved with opendev and possibly contributing some CI resources | 19:39 |
frickler | the zuul project quota was increased (not doc I think) | 19:39 |
clarkb | frickler: ya it hosts the zuul docs iirc | 19:40 |
clarkb | and website? | 19:40 |
frickler | IIUC the release artefacts | 19:40 |
clarkb | There may be an opportunity to leverage this interest in collaboration to clean up the openeuler mirrors and give feedback to them on the growth problems | 19:40 |
corvus | everything under zuul-ci.org is on one volume | 19:40 |
fungi | zuul's docs are part of its project website | 19:40 |
fungi | yeah that | 19:40 |
corvus | and i increased it to 5gb | 19:40 |
clarkb | ahah | 19:40 |
clarkb | essentially work with the interested parties to improve the situation around mirrors for openeuler and maybe our CI quotas | 19:41 |
clarkb | responding to their latest queries about the sizes of VMs and how many is on my todo list after meetings and lunch | 19:41 |
clarkb | (you know we write that stuff down in a document but 100% of the time the questions get asked anyway) | 19:42 |
frickler | do you have a reference to those openeuler discussions or are they private for now? | 19:42 |
corvus | they have an openstack cloud? | 19:42 |
clarkb | frickler: I think keeping the email discussion small while we sort out if it is even possible is good, but once we know if it will go somewhere we can do that more publicly | 19:43 |
clarkb | corvus: yes sounds like it? We tried to be explicit that what we need is an openstack api endpoint and accounts that can provision VMs | 19:43 |
frickler | yeah, I just wanted to know whether I missed something somewhere | 19:43 |
fungi | for transparency: openeuler representatives were in discussion with openinfra foundation staff members and offered to supply system resources, so the foundation staff are trying to put them in touch with us to determine more scope around it | 19:43 |
fungi | it's all been private discussions so far | 19:43 |
corvus | neat | 19:44 |
clarkb | were there other outstanding afs quota concerns to discuss? | 19:44 |
fungi | since openstack is a primary use case for their distro, they have a vested interest in helping test openstack upstream on it | 19:44 |
frickler | some other mirror volumes need watching | 19:45 |
clarkb | for centos stream I seem to recall digging around in those mirrors and we end up with lots of packages with many versions | 19:45 |
frickler | centos-stream and ubuntu-ports look very close to their limit | 19:46 |
clarkb | in theory we only need the newest 2 to avoid installation failures | 19:46 |
clarkb | we could potentially write a smarter syncing script that scanned through and deleted older versions | 19:46 |
clarkb | for ubuntu ports I had thought we were still syncing old versions of the distro that we could delete but we aren't so I'm not sure what we can do there | 19:46 |
clarkb | are we syncing more than arm64 packages maybe? like 32bit arm and or ppc? I think not | 19:47 |
clarkb | I don't think we have time to solve that in this meeting. Lets continue on as we have ~3 more topics to cover | 19:48 |
clarkb | #topic Broken wheel build issues | 19:48 |
frickler | I don't know, I just noticed these issues when checking whether we have room to mirror rocky | 19:48 |
clarkb | frickler: ack | 19:48 |
fungi | it's also possible that dropping old releases from our config isn't cleaning up the old packages associated with them | 19:49 |
clarkb | fungi: oh interesting. Worth double checking | 19:49 |
clarkb | for wheels I think we can stop building and mirroring them at any time because pip will prefer new sdists over old wheels right? so we don't even need to update the pip.conf in our test nodes | 19:49 |
fungi | correct | 19:49 |
clarkb | fungi: ^ you probably know off the top of your head if that is the case. But that would be my main concern is that we start testing older stuff accidentally if we stop building wheels | 19:50 |
fungi | unless you pass the pip option to prefer "binary" packages (wheels) | 19:50 |
clarkb | right | 19:50 |
fungi | but it's not on by default | 19:50 |
fungi | i'd treat that as a case of caveat emptor | 19:50 |
clarkb | in that case I think it is reasonable to send email to the service announce list indicating we plan to stop running those jobs in the future (say beginning of february) ask if anyone is interested in keeping them alive and if not jobs will fallback to building from source | 19:50 |
clarkb | the fallback is slower and may require some bindep file updates but it isn't going to hard stop anyone from getting work done on centos distros | 19:51 |
fungi | wfm | 19:51 |
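The pip behavior being relied on, sketched as commands (the package name is a placeholder): by default pip picks the best matching *version* first and only prefers a wheel within that version, so a newer sdist beats an older mirrored wheel; `--prefer-binary` flips that and is the non-default case fungi flags above.

```shell
# default resolution: the newest acceptable version wins,
# even if it is only available as an sdist
pip install somepackage

# prefer-binary: an older wheel is chosen over a newer sdist
pip install --prefer-binary somepackage

# the same knob in pip.conf form:
#   [install]
#   prefer-binary = true
```

Since the mirror's test nodes use the default behavior, retiring the wheel builds degrades gracefully to source builds rather than silently pinning jobs to stale versions.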
frickler | will we also clean out existing wheels at the same time? maybe keep the afs volume but not publish anymore? | 19:51 |
clarkb | frickler: I think we should keep the content for a bit as some of the existing wheels may be up to date for a while | 19:52 |
fungi | we could probably do it in phases | 19:52 |
frickler | ok | 19:52 |
clarkb | since pip's behavior is acceptable by default here we can still take advantage of the remaining benefit from the mirror for a bit | 19:52 |
clarkb | then maybe after 6-12 months clean it up | 19:52 |
clarkb | alright next topic | 19:53 |
clarkb | #topic Gitea repo-archives filling server disk | 19:53 |
fungi | fwiw, the python ecosystem has gotten a lot better about making cross-platform wheels for releases of things now, and in a more timely fashion | 19:53 |
fungi | so our pre-built wheels are far less necessary | 19:53 |
clarkb | when you ask gitea for a repo archive (tarball/zip/.bundle) it caches that on disk | 19:53 |
clarkb | then once a day it runs an internal cron task (using a go library implementation of cron, not system cron) to clean up any repo archives that are more than a day old | 19:54 |
fungi | oh, yeah this is a fun one. i'd somehow already pushed it to the back of my mind | 19:54 |
frickler | can we disable that functionality? we do have our own tarballs instead (at least for openstack)? | 19:54 |
corvus | i'm guessing people do that a lot to get releases even though like zero opendev projects make releases that way? | 19:54 |
corvus | what frickler said :) | 19:54 |
fungi | s/people/web crawlers/ i think | 19:55 |
clarkb | upstream indicated it could be web crawlers | 19:55 |
clarkb | so their suggestion was to update our robots.txt | 19:55 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/904868 update robots.txt on upstream's suggestion | 19:55 |
clarkb | and no we can't disable the feature | 19:55 |
clarkb | at least I haven't found a way to do that | 19:55 |
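Since gitea serves archives at paths like `/{org}/{repo}/archive/{ref}.tar.gz`, the robots.txt approach amounts to a wildcard disallow. A sketch only; the actual rules are in the linked review:

```
User-agent: *
Disallow: /*/*/archive/
```

Note the `*` path wildcard is an extension honored by major crawlers rather than part of the original robots.txt convention, so this reduces archive generation from well-behaved bots but is not an enforcement mechanism.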
clarkb | the problem is the daily cleanup isn't actually cleaning up everything more than a day old | 19:55 |
clarkb | I've spent a bit of time rtfs'ing and looking at the database and I can't figure out why it is broken but you can see on gitea12 that it falls about 4 hours behind each time it runs so we end up leaking and filling the disk | 19:56 |
clarkb | In addition to reducing the number of archives generated by asking bots to leave them alone we can also run a cron job that simply deletes all archives | 19:56 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/904874 Run weekly removal of all cached repo archives | 19:56 |
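A sketch of what such a weekly cleanup could look like (the archive path is an assumption about where the gitea data volume is mounted; the real implementation is in the linked review):

```shell
# weekend crontab entry: drop every cached repo archive. gitea regenerates
# archives on demand, so the worst case is one slow request after cleanup.
0 4 * * 6  find /var/gitea/data/repo-archive -type f -delete
```

Deleting everything sidesteps the broken age-based bookkeeping entirely, which is why it is safer than trying to patch the daily task.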
frickler | does gitea break if we make the cache non-writeable? | 19:56 |
clarkb | frickler: I haven't tested that but I would assume so. I would expect a 500 error when you request the archive | 19:57 |
frickler | which would also be like disabling it kind of | 19:57 |
fungi | i suppose it depends on your definition of "break" ;) | 19:57 |
clarkb | since we are already trying to delete archives more than a day old deleting all archives once a week on the weekend seems safe | 19:57 |
clarkb | and when you ask it to delete all archives it does successfully delete all archives | 19:57 |
clarkb | I would prefer we not intentionally create 500 errors | 19:58 |
clarkb | there are valid reasons to get repo archives | 19:58 |
clarkb | I also noticed when looking at the cron jobs that gitea has a phone home to check if it is running the latest release cron job | 19:58 |
corvus | the cron might have a small window of breakage, but should immediately work on a retry so lgtm | 19:59 |
clarkb | I pushed https://review.opendev.org/c/opendev/system-config/+/905020 to disable that cron job because I hate the idea of a phone home for that | 19:59 |
clarkb | our hour is up and I have to context switch to another meeting | 20:00 |
clarkb | #topic Service Coordinator Election | 20:00 |
clarkb | really quickly before I end the meeting I wanted to call out that we're approaching the service coordinator election timeframe. I need to dig up emails to determine when I said that would happen (I believe it is end of january early february) | 20:01 |
clarkb | nothing for anyone to do at this point other than consider if they wish to assume the role and nominate themselves. And I'll work to get things official via email | 20:01 |
tonyb | If it matches openstack PTL/TC elections then they'll start in Feb | 20:01 |
clarkb | tonyb: its slightly offset | 20:01 |
tonyb | okay | 20:01 |
clarkb | #topic Open Discussion | 20:01 |
clarkb | Anything else important before we call the meeting? | 20:02 |
tonyb | nope | 20:03 |
clarkb | sounds like no. Thank you everyone for your time and help running the opendev services! | 20:03 |
clarkb | we'll be back next week same time and location | 20:03 |
clarkb | #endmeeting | 20:03 |
opendevmeet | Meeting ended Tue Jan 9 20:03:27 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:03 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-01-09-19.00.html | 20:03 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-01-09-19.00.txt | 20:03 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-01-09-19.00.log.html | 20:03 |
corvus | thanks clarkb ! | 20:03 |