Wednesday, 2024-07-03

11:02 *** thuvh1 is now known as thuvh
12:50 <fbo[m]> Hi, I see that the AFS centos mirror has been outdated for 7 days https://grafana.opendev.org/d/9871b26303/afs?orgId=1. We are mirroring from there and some alerts are ringing on our monitoring. So I was wondering if you were aware of it.
12:53 <fungi> fbo[m]: the mirroring logs are public, let me get a link, they likely indicate the problem
12:54 <fungi> fbo[m]: i guess not in this case, looks like the last time it tried was a week ago based on timestamps in https://static.opendev.org/mirror/logs/rsync-mirrors/centos.log
12:54 <fungi> i'll see if we have any cron failures in system logs
12:56 <fungi> fbo[m]: oh, my mistake, i should have been looking at https://static.opendev.org/mirror/logs/rsync-mirrors/centos-stream.log
12:57 <fungi> we don't mirror the "centos" (non-stream) repository any longer, the last thing it contained was centos stream 8 which is now eol and was emptied upstream by centos themselves: https://static.opendev.org/mirror/centos/8-stream/readme
12:58 <fungi> so for a while we were rsync'ing that single readme file, and then a week ago we turned off the rsync for it
12:59 <fungi> apparently "deprecate" is centos community parlance for "delete"
12:59 <fungi> (though they supposedly moved its frozen corpse to another location we don't mirror)
12:59 <fbo[m]> Ok, that makes sense, so perhaps from our side we should make an exception for that and avoid ringing an alert for mirror.centos
13:00 <fungi> yeah, we're in the process of deleting all centos-8-stream nodes from nodepool and job definitions
13:00 <fungi> they're all currently broken anyway since there's no longer a way for them to install packages
13:01 <fbo[m]> ... we are doing the same ;)
13:01 <fbo[m]> thanks fungi for the help!
13:03 <fungi> you're welcome!
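A rough sketch of the kind of freshness check and per-mirror exception fbo[m] describes above. This is not anyone's actual monitoring configuration: the 7-day threshold and the EXCLUDED set are hypothetical, and only the rsync-mirrors log URLs come from the conversation.

```python
# Hypothetical mirror-freshness check with an exception for retired mirrors
# (e.g. the non-stream "centos" tree discussed above). Not production code.
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from urllib.request import urlopen

LOG_BASE = "https://static.opendev.org/mirror/logs/rsync-mirrors/"
MAX_AGE = timedelta(days=7)        # hypothetical alerting threshold
EXCLUDED = {"centos"}              # mirrors we deliberately stop alerting on


def check_mirror(name: str) -> None:
    if name in EXCLUDED:
        print(f"{name}: excluded from alerting")
        return
    # Use the HTTP Last-Modified header of the rsync log as a freshness proxy
    with urlopen(f"{LOG_BASE}{name}.log") as resp:
        last_modified = parsedate_to_datetime(resp.headers["Last-Modified"])
    age = datetime.now(timezone.utc) - last_modified
    if age > MAX_AGE:
        print(f"ALERT: {name} mirror log is {age.days} days old")
    else:
        print(f"{name}: ok ({age.days} days old)")


for mirror in ("centos", "centos-stream"):
    check_mirror(mirror)
```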
13:27 *** gthiemon1e is now known as gthiemonge
14:24 <frickler> seems we have collected a stack of about 30 stuck-in-deleting nodes in rax-dfw + -ord again. also it looks like rax-ord maxes out well below the configured server limit (about 175 instead of 195) https://grafana.opendev.org/d/a8667d6647/nodepool3a-rackspace?orgId=1&from=now-2d&to=now
14:43 <clarkb> frickler: I think that lower limit is due to quotas and isn't necessarily a bug. Basically nodepool will respect quotas if the cloud reduces them
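A rough illustration of clarkb's point, not nodepool's actual implementation: the effective ceiling is whichever of the configured max-servers or the cloud-side quota (instances, cores, RAM) runs out first. The quota and flavor numbers below are invented to reproduce a 175-of-195 plateau like the one frickler observed.

```python
# Illustration only: effective capacity when a cloud quota is tighter than
# the configured server limit. Numbers are made up; this is not nodepool code.
from dataclasses import dataclass


@dataclass
class Quota:
    instances: int
    cores: int
    ram_mb: int


def effective_max_servers(max_servers: int, quota: Quota,
                          cores_per_node: int, ram_mb_per_node: int) -> int:
    by_instances = quota.instances
    by_cores = quota.cores // cores_per_node
    by_ram = quota.ram_mb // ram_mb_per_node
    return min(max_servers, by_instances, by_cores, by_ram)


# Configured for 195 servers, but a hypothetical cores quota of 1400 with
# 8-core flavors only allows 175 concurrent nodes.
print(effective_max_servers(195,
                            Quota(instances=300, cores=1400, ram_mb=3_000_000),
                            cores_per_node=8, ram_mb_per_node=8192))  # -> 175
```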
15:35 <clarkb> on the held gitea 1.22 node tarball downloads work, so ya I don't think it is any mitigation-specific thing we've done. I actually suspect there is a bug in 1.21.11
15:35 <clarkb> it could possibly be a db-related issue though, since the 1.22 deployment would use a more fixed-up db state? however I would've expected errors instead of http 200 results in that case
15:47 <clarkb> so anyway I think we can proceed with upgrade plans to see if the problem is addressed, and if not then see if the db doctoring helps, and if not then do a deeper debug
18:35 <opendevreview> James E. Blair proposed opendev/system-config master: Use jaeger all-in-one v1 image  https://review.opendev.org/c/opendev/system-config/+/923439
18:36 <corvus> infra-root: the zuul quickstart job started bombing today and i suspect that's the cause.  i made an identical change to zuul; if that fixes the issue, then i think we should merge that change.  if jaeger is unhappy about that (ie, because it ends up being a downgrade) then i think i should just delete the data and start over.  i don't think we care about long-term retention there.
18:37 <corvus> (i do think we should wait for the zuul results to confirm first)
18:37 <opendevreview> Ghanshyam proposed openstack/project-config master: Retire kuryr-kubernetes and tempest plugin: end gate and update acl  https://review.opendev.org/c/openstack/project-config/+/923072
18:38 <fungi> corvus: sounds good. thanks for the patch!
19:57 <clarkb> corvus: +2 from me, I agree we can start over with minimal impact if it comes to that
19:57 <clarkb> I've also approved the change on the zuul side: https://review.opendev.org/c/zuul/zuul/+/923438 since it did pass
19:57 <clarkb> fungi: maybe you have time to review 923439?
20:33 <fungi> i was watching for the zuul change to finish tests, yeah. looking now
20:34 <fungi> both lgtm
20:52 <clarkb> I don't know if everyone is on libera these days, but they walloped a message about needing to upgrade znc if you use it due to an rce bug
20:53 <clarkb> I don't use znc myself so not really clued into all the details, but figured I would mention it here in case it was useful to anyone using znc
20:55 <fungi> yeah, it crossed the oss-security ml as well, and debian already issued a dsa too
21:18 <opendevreview> Merged opendev/system-config master: Use jaeger all-in-one v1 image  https://review.opendev.org/c/opendev/system-config/+/923439
22:14 <opendevreview> Steve Baker proposed openstack/diskimage-builder master: CentOS-7 EOL: remove jobs  https://review.opendev.org/c/openstack/diskimage-builder/+/923450
22:40 <fungi> deploy of the jaeger change succeeded, ftr
22:41 <clarkb> I seem to get results back in the service web ui too
22:42 <clarkb> there may still be issues with the "downgrade" but my spot check seems to indicate this is working
22:43 <clarkb> it's possible the change happened super recently and occurred after our daily run last night, and so we didn't actually downgrade
22:43 <clarkb> that would make sense to me actually given that zuul has been fairly active and would've noticed the problem quickly
22:47 <corvus> yeah, that was my hope :)
