Wednesday, 2024-07-03

11:02 *** thuvh1 is now known as thuvh
12:50 <fbo[m]> Hi, I see that the AFS centos mirror has been outdated for 7 days https://grafana.opendev.org/d/9871b26303/afs?orgId=1. We are mirroring from there and some alerts are ringing on our monitoring. So I was wondering if you were aware of it.
12:53 <fungi> fbo[m]: the mirroring logs are public, let me get a link, they likely indicate the problem
12:54 <fungi> fbo[m]: i guess not in this case, looks like the last time it tried was a week ago based on timestamps in https://static.opendev.org/mirror/logs/rsync-mirrors/centos.log
12:54 <fungi> i'll see if we have any cron failures in system logs
12:56 <fungi> fbo[m]: oh, my mistake, i should have been looking at https://static.opendev.org/mirror/logs/rsync-mirrors/centos-stream.log
12:57 <fungi> we don't mirror the "centos" (non-stream) repository any longer, the last thing it contained was centos stream 8 which is now eol and was emptied upstream by centos themselves: https://static.opendev.org/mirror/centos/8-stream/readme
12:58 <fungi> so for a while we were rsync'ing that single readme file, and then a week ago we turned off the rsync for it
12:59 <fungi> apparently "deprecate" is centos community parlance for "delete"
12:59 <fungi> (though they supposedly moved its frozen corpse to another location we don't mirror)
12:59 <fbo[m]> Ok, that makes sense, so perhaps from our side we should make an exception for that and avoid ringing an alert for mirror.centos
13:00 <fungi> yeah, we're in the process of deleting all centos-8-stream nodes from nodepool and job definitions
13:00 <fungi> they're all currently broken anyway since there's no longer a way for them to install packages
13:01 <fbo[m]> ... we are doing the same ;)
13:01 <fbo[m]> thanks fungi for the help!
13:03 <fungi> you're welcome!
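A rough sketch of the kind of freshness check and per-mirror exception fbo[m] describes above. This is not anyone's actual monitoring configuration: the 7-day threshold and the EXCLUDED set are hypothetical, and only the rsync-mirrors log URLs come from the conversation.

```python
# Hypothetical mirror-freshness check with an exception for retired mirrors
# (e.g. the non-stream "centos" tree discussed above). Not production code.
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from urllib.request import urlopen

LOG_BASE = "https://static.opendev.org/mirror/logs/rsync-mirrors/"
MAX_AGE = timedelta(days=7)        # hypothetical alerting threshold
EXCLUDED = {"centos"}              # mirrors we deliberately stop alerting on


def check_mirror(name: str) -> None:
    if name in EXCLUDED:
        print(f"{name}: excluded from alerting")
        return
    # Use the HTTP Last-Modified header of the rsync log as a freshness proxy
    with urlopen(f"{LOG_BASE}{name}.log") as resp:
        last_modified = parsedate_to_datetime(resp.headers["Last-Modified"])
    age = datetime.now(timezone.utc) - last_modified
    if age > MAX_AGE:
        print(f"ALERT: {name} mirror log is {age.days} days old")
    else:
        print(f"{name}: ok ({age.days} days old)")


for mirror in ("centos", "centos-stream"):
    check_mirror(mirror)
```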
13:27 *** gthiemon1e is now known as gthiemonge
14:24 <frickler> seems we have collected a stack of about 30 stuck-in-deleting nodes in rax-dfw + -ord again. also it looks like rax-ord maxes out well below the configured server limit (about 175 instead of 195) https://grafana.opendev.org/d/a8667d6647/nodepool3a-rackspace?orgId=1&from=now-2d&to=now
14:43 <clarkb> frickler: I think that lower limit is due to quotas and isn't necessarily a bug. Basically nodepool will respect quotas if the cloud reduces them
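A rough illustration of clarkb's point, not nodepool's actual implementation: the effective ceiling is whichever of the configured max-servers or the cloud-side quota (instances, cores, RAM) runs out first. The quota and flavor numbers below are invented to reproduce a 175-of-195 plateau like the one frickler observed.

```python
# Illustration only: effective capacity when a cloud quota is tighter than
# the configured server limit. Numbers are made up; this is not nodepool code.
from dataclasses import dataclass


@dataclass
class Quota:
    instances: int
    cores: int
    ram_mb: int


def effective_max_servers(max_servers: int, quota: Quota,
                          cores_per_node: int, ram_mb_per_node: int) -> int:
    by_instances = quota.instances
    by_cores = quota.cores // cores_per_node
    by_ram = quota.ram_mb // ram_mb_per_node
    return min(max_servers, by_instances, by_cores, by_ram)


# Configured for 195 servers, but a hypothetical cores quota of 1400 with
# 8-core flavors only allows 175 concurrent nodes.
print(effective_max_servers(195,
                            Quota(instances=300, cores=1400, ram_mb=3_000_000),
                            cores_per_node=8, ram_mb_per_node=8192))  # -> 175
```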
15:35 <clarkb> on the held gitea 1.22 node tarball downloads work, so ya I don't think it is any mitigation-specific thing we've done. I actually suspect there is a bug in 1.21.11
15:35 <clarkb> it could possibly be a db-related issue though, since the 1.22 deployment would use a more fixed-up db state? however I would've expected errors instead of http 200 results in that case
15:47 <clarkb> so anyway I think we can proceed with upgrade plans to see if the problem is addressed, and if not then see if the db doctoring helps, and if not then do a deeper debug
18:35 <opendevreview> James E. Blair proposed opendev/system-config master: Use jaeger all-in-one v1 image  https://review.opendev.org/c/opendev/system-config/+/923439
18:36 <corvus> infra-root: the zuul quickstart job started bombing today and i suspect that's the cause.  i made an identical change to zuul; if that fixes the issue, then i think we should merge that change.  if jaeger is unhappy about that (ie, because it ends up being a downgrade) then i think i should just delete the data and start over.  i don't think we care about long-term retention there.
18:37 <corvus> (i do think we should wait for the zuul results to confirm first)
18:37 <opendevreview> Ghanshyam proposed openstack/project-config master: Retire kuryr-kubernetes and tempest plugin: end gate and update acl  https://review.opendev.org/c/openstack/project-config/+/923072
18:38 <fungi> corvus: sounds good. thanks for the patch!
19:57 <clarkb> corvus: +2 from me, I agree we can start over with minimal impact if it comes to that
19:57 <clarkb> I've also approved the change on the zuul side: https://review.opendev.org/c/zuul/zuul/+/923438 since it did pass
19:57 <clarkb> fungi: maybe you have time to review 923439?
20:33 <fungi> i was watching for the zuul change to finish tests, yeah. looking now
20:34 <fungi> both lgtm
20:52 <clarkb> I don't know if everyone is on libera these days, but they walloped a message about needing to upgrade znc if you use it due to an rce bug
20:53 <clarkb> I don't use znc myself so not really clued into all the details, but figured I would mention it here in case it was useful to anyone using znc
20:55 <fungi> yeah, it crossed the oss-security ml as well, and debian already issued a dsa too
21:18 <opendevreview> Merged opendev/system-config master: Use jaeger all-in-one v1 image  https://review.opendev.org/c/opendev/system-config/+/923439
22:14 <opendevreview> Steve Baker proposed openstack/diskimage-builder master: CentOS-7 EOL: remove jobs  https://review.opendev.org/c/openstack/diskimage-builder/+/923450
22:40 <fungi> deploy of the jaeger change succeeded, ftr
22:41 <clarkb> I seem to get results back in the service web ui too
22:42 <clarkb> there may still be issues with the "downgrade" but my spot check seems to indicate this is working
22:43 <clarkb> it's possible the change happened super recently and occurred after our daily run last night, and so we didn't actually downgrade
22:43 <clarkb> that would make sense to me actually given that zuul has been fairly active and would've noticed the problem quickly
22:47 <corvus> yeah, that was my hope :)
