*** thuvh1 is now known as thuvh | 11:02 | |
fbo[m] | Hi, I see that the AFS centos mirror has been out of date for 7 days: https://grafana.opendev.org/d/9871b26303/afs?orgId=1. We mirror from there and some alerts are firing on our monitoring, so I was wondering if you were aware of it. | 12:50 |
fungi | fbo[m]: the mirroring logs are public, let me get a link, they likely indicate the problem | 12:53 |
fungi | fbo[m]: i guess not in this case, looks like the last time it tried was a week ago based on timestamps in https://static.opendev.org/mirror/logs/rsync-mirrors/centos.log | 12:54 |
fungi | i'll see if we have any cron failures in system logs | 12:54 |
fungi | fbo[m]: oh, my mistake, i should have been looking at https://static.opendev.org/mirror/logs/rsync-mirrors/centos-stream.log | 12:56 |
fungi | we don't mirror the "centos" (non-stream) repository any longer, the last thing it contained was centos stream 8 which is now eol and was emptied upstream by centos themselves: https://static.opendev.org/mirror/centos/8-stream/readme | 12:57 |
fungi | so for a while we were rsync'ing that single readme file, and then a week ago we turned off the rsync for it | 12:58 |
fungi | apparently "deprecate" is centos community parlance for "delete" | 12:59 |
fungi | (though they supposedly moved its frozen corpse to another location we don't mirror) | 12:59 |
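To make the mirroring mechanism described above concrete, here is a minimal sketch of the kind of periodic rsync job fungi is talking about, written as an Ansible cron task. The schedule, upstream host, log path, and AFS paths are illustrative assumptions, not opendev's actual job definition.

```yaml
# Hypothetical sketch of a periodic mirror rsync; every value here is an
# assumption for illustration, not the real opendev configuration.
- name: Periodically rsync the centos-stream mirror volume
  ansible.builtin.cron:
    name: rsync-mirror-centos-stream
    minute: "0"
    hour: "*/2"                       # assumed cadence: every two hours
    job: >-
      rsync -rlptDvz --delete
      rsync://UPSTREAM-MIRROR-HOST/centos-stream/
      /afs/.example.org/mirror/centos-stream/
      >> /var/log/rsync-mirrors/centos-stream.log 2>&1
```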
fbo[m] | Ok, that makes sense, so perhaps on our side we should make an exception for that and avoid raising an alert for mirror.centos | 12:59 |
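The exception fbo[m] mentions could be expressed as a Prometheus-style alerting rule that simply excludes the retired volume. This is only a sketch under assumptions: the metric name `mirror_last_sync_timestamp_seconds` and the `volume` label are hypothetical, not taken from any real exporter or from fbo's monitoring.

```yaml
# Hypothetical alerting rule; metric and label names are invented for
# illustration. The point is only to stop alerting on the retired volume.
groups:
  - name: afs-mirror-staleness
    rules:
      - alert: MirrorVolumeStale
        expr: >
          time() - mirror_last_sync_timestamp_seconds{volume!="mirror.centos"}
          > 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Mirror volume {{ $labels.volume }} has not synced in over a day"
```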
fungi | yeah, we're in the process of deleting all centos-8-stream nodes from nodepool and job definitions | 13:00 |
fungi | they're all currently broken anyway since there's no longer a way for them to install packages | 13:00 |
fbo[m] | ... we are doing the same ;) | 13:01 |
fbo[m] | thanks fungi for the help ! | 13:01 |
fungi | you're welcome! | 13:03 |
*** gthiemon1e is now known as gthiemonge | 13:27 | |
frickler | seems we have collected a stack of about 30 stuck-in-deleting nodes in rax-dfw + -ord again. also it looks like rax-ord maxes out well below the configured server limit (about 175 instead of 195) https://grafana.opendev.org/d/a8667d6647/nodepool3a-rackspace?orgId=1&from=now-2d&to=now | 14:24 |
clarkb | frickler: I think that lower limit is due to quotas and isn't necessarily a bug. Basically nodepool will respect quotas if the cloud reduces them | 14:43 |
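As a rough illustration of clarkb's point: `max-servers` in the nodepool provider config is only a ceiling, and nodepool also stays under whatever quota the cloud reports, so the effective limit can sit below the configured number. The snippet below is a hypothetical provider stanza with made-up label and flavor values, not the real rax-ord configuration.

```yaml
# Hypothetical nodepool provider snippet; names and numbers are illustrative.
providers:
  - name: rax-ord
    region-name: ORD
    pools:
      - name: main
        max-servers: 195   # configured ceiling; the cloud-side quota can
                           # still cap launches lower (e.g. around 175)
        labels:
          - name: ubuntu-jammy
            diskimage: ubuntu-jammy
            flavor-name: performance1-8
```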
clarkb | on the held gitea 1.22 node tarball downloads work so ya I don't think it is any mitigation specific thing we've done. I actually suspect there is a bug in 1.21.11 | 15:35 |
clarkb | it could possibly be a db related issue though since the 1.22 deployment would use a more fixed up db state? however I would've expected errors instead of http 200 results in that case | 15:35 |
clarkb | so anyway I think we can proceed with upgrade plans to see if the problem is addressed and if not then see if the db doctoring helps and if not then do a deeper debug | 15:47 |
opendevreview | James E. Blair proposed opendev/system-config master: Use jaeger all-in-one v1 image https://review.opendev.org/c/opendev/system-config/+/923439 | 18:35 |
corvus | infra-root: the zuul quickstart job started bombing today and i suspect that's the cause. i made an identical change to zuul; if that fixes the issue, then i think we should merge that change. if jaeger is unhappy about that (ie, because it ends up being a downgrade) then i think i should just delete the data and start over. i don't think we care about long-term retention there. | 18:36 |
corvus | (i do think we should wait for the zuul results to confirm first) | 18:37 |
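For context on what pinning to the v1 image looks like in practice, a docker-compose style sketch is below. This is an assumption-laden illustration rather than the actual opendev/system-config service definition; the only point it shows is holding the image tag at the v1 series instead of letting it float to latest.

```yaml
# Hypothetical compose service pinning jaeger to the v1 all-in-one image;
# ports and other details are illustrative.
services:
  jaeger:
    image: docker.io/jaegertracing/all-in-one:1
    restart: always
    ports:
      - "16686:16686"   # query/web UI
      - "4317:4317"     # OTLP gRPC ingest
```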
opendevreview | Ghanshyam proposed openstack/project-config master: Retire kuryr-kubernetes and tempest plugin: end gate and update acl https://review.opendev.org/c/openstack/project-config/+/923072 | 18:37 |
fungi | corvus: sounds good. thanks for the patch! | 18:38 |
clarkb | corvus: +2 from me I agree we can start over with minimal impact if it comes to that | 19:57 |
clarkb | I've also approved the change on the zuul side: https://review.opendev.org/c/zuul/zuul/+/923438 since it did pass | 19:57 |
clarkb | fungi: maybe you have time to review 923439? | 19:57 |
fungi | i was watching for the zuul change to finish tests, yeah. looking now | 20:33 |
fungi | both lgtm | 20:34 |
clarkb | I don't know if everyone is on libera these days but they walloped a message about needing to upgrade znc if you use it due to an rce bug | 20:52 |
clarkb | I don't use znc myself so not really clued into all the details but figured I would mention it here in case it was useful to anyone using znc | 20:53 |
fungi | yeah, it crossed oss-security ml as well, and debian already issued a dsa too | 20:55 |
opendevreview | Merged opendev/system-config master: Use jaeger all-in-one v1 image https://review.opendev.org/c/opendev/system-config/+/923439 | 21:18 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: CentOS-7 EOL: remove jobs https://review.opendev.org/c/openstack/diskimage-builder/+/923450 | 22:14 |
fungi | deploy of the jaeger change succeeded, ftr | 22:40 |
clarkb | I seem to get results back in the service web ui too | 22:41 |
clarkb | there may still be issues with the "downgrade" but my spot check seems to indicate this is working | 22:42 |
clarkb | it's possible the change happened very recently and occurred after our daily run last night, and so we didn't actually downgrade | 22:43 |
clarkb | that would make sense to me actually given that zuul has been fairly active and would've noticed the problem quickly | 22:43 |
corvus | yeah, that was my hope :) | 22:47 |