tkajinam | o/ I wonder if we can move this forward? https://review.opendev.org/c/openstack/project-config/+/907954 | 06:21 |
---|---|---|
tkajinam | a few of the remaining patches to drop puppet-qdr | 06:21 |
amoralej | clarkb, yep, 18:43 run fixed centos8, thanks for your help! | 08:47 |
mnasiadka | Started seeing timeouts on the caching proxy, mainly in rax: 500 Server Error for http+docker://localhost/v1.44/images/create?tag=master-ubuntu-jammy&fromImage=mirror-int.iad.rax.opendev.org%3A4447%2Fopenstack.kolla%2Ffluentd: Internal Server Error ("Get \"https://mirror-int.iad.rax.opendev.org:4447/v2/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)") | 08:57 |
mnasiadka | infra-root: can you have a look? | 08:57 |
*** tobias-urdin4 is now known as tobias-urdin | 10:23 | |
*** ralonsoh__ is now known as ralonsoh | 12:25 | |
*** carloss_ is now known as carloss | 13:15 | |
fungi | [2024-03-08 01:00:24.337] "GET /v2/openstack.kolla/rabbitmq/blobs/sha256:26862c3518e8de33fa5ada8cdc1795d1a60326464c12c899537b07d064e6bc40 HTTP/1.1" 500 141 - "-" "docker/25.0.4 go/go1.21.8 git-commit/061aa95 kernel/5.14.0-362.18.1.el9_3.0.1.x86_64 os/linux arch/amd64 UpstreamClient(docker-sdk-python/6.1.3)" | 13:44 |
fungi | that's the most recent 500 response from mirror(-int).iad.rax port 4447 | 13:44 |
fungi | mnasiadka: what time was your example? | 13:44 |
opendevreview | Merged openstack/project-config master: Retire puppet-qdr: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/907954 | 13:47 |
mnasiadka | fungi: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/ansible/genconfig | 14:03 |
mnasiadka | fungi: we’ve seen that in kayobe as well I think. I'm on mobile right now, but will find occurrences in an hour if needed | 14:04 |
fungi | mnasiadka: it looks like there is some sort of proxy webserver on localhost based on that log. could the 500 be coming from there rather than from the regional cache (mirror) server? | 14:07 |
fungi | do you have access or proxy logs for whatever that local webserver is? | 14:09 |
mnasiadka | fungi: we use docker and set registry mirrors - no other webservers in place | 14:21 |
mnasiadka | See https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/system_configs/docker/daemon.json | 14:21 |
mnasiadka | Wonder if that’s some quay.io timeout issue or something in rax | 14:22 |
fungi | mnasiadka: what's the "localhost" in the url then? | 14:23 |
mnasiadka | Docker API :-) | 14:24 |
fungi | can the "docker api" return a 500 error? | 14:25 |
mnasiadka | Ansible module using docker-py to talk to Docker API | 14:25 |
mnasiadka | On pull timeout it seems, but maybe it’s some new Docker bug | 14:25 |
mnasiadka | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/system_logs/docker.txt - this is the docker log | 14:28 |
fungi | looks like https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/system_logs/docker.txt might be a clearer source of info | 14:28 |
fungi | hah, you beat me to it by moments ;) | 14:28 |
mnasiadka | Well, I know the log locations in Kolla even when woken in the middle of the night ;) | 14:28 |
fungi | i'm sort of slow looking at logs because this is also openstack weekly release meeting time | 14:29 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire puppet-sahara: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/910455 | 14:54 |
fungi | mnasiadka: i guess this is about the point in the docker log that corresponds to the genconfig log error, though i don't immediately see that it complains about anything: | 14:57 |
fungi | Mar 08 08:49:47 primary dockerd[9535]: time="2024-03-08T08:49:47.389984156Z" level=debug msg="Trying to pull mirror-int.iad.rax.opendev.org:4447/openstack.kolla/fluentd from https://mirror-int.iad.rax.opendev.org:4447" | 14:57 |
fungi | am i reading that right | 14:57 |
fungi | ? | 14:58 |
fungi | aha, i had to scroll down more: | 15:00 |
fungi | Mar 08 08:50:02 primary dockerd[9535]: time="2024-03-08T08:50:02.391974305Z" level=warning msg="Error getting v2 registry: Get \"https://mirror-int.iad.rax.opendev.org:4447/v2/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" | 15:00 |
fungi | so looks like a 15-second timeout probably | 15:01 |
fungi | mnasiadka: correlating to logs on the mirror server, the last time i see that node's ip address (10.209.0.58) request anything with "fluentd" in the name was half an hour earlier: | 15:05 |
fungi | 10.209.0.58 - - [2024-03-08 08:17:29.054] "GET /v2/openstack.kolla/fluentd/blobs/sha256:9514c7d8dbfd8d9cd5508701931a94b2e3196186868f1a445a4a6d2fa232592d HTTP/1.1" 302 1915 - "-" "docker/25.0.4 go/go1.21.8 git-commit/061aa95 kernel/5.15.0-97-generic os/linux arch/amd64 UpstreamClient(docker-sdk-python/6.1.3)" | 15:05 |
Clark[m] | Note the timeout you posted was just for the API root. No specific image in that path | 15:06 |
fungi | good point | 15:06 |
fungi | 10.209.0.58 - - [2024-03-08 08:49:47.404] "GET /v2/ HTTP/1.1" 401 4 Response status 401 "-" "docker/25.0.4 go/go1.21.8 git-commit/061aa95 kernel/5.15.0-97-generic os/linux arch/amd64 UpstreamClient(docker-sdk-python/6.1.3)" | 15:07 |
fungi | that seems to correspond | 15:07 |
fungi | and was the last request the server saw from that client ip address | 15:07 |
Clark[m] | I wonder if the token is expiring and docker client tooling isn't noticing so it sends the old token | 15:13 |
mnasiadka | token? that repo is public | 15:41 |
fungi | mnasiadka: yes, but the docker protocol requires "authentication" for everything and then makes up tokens on the fly | 15:41 |
fungi | which is why it's so painful to proxy | 15:42 |
fungi | every request is treated as authenticated even if you're not actually using login credentials because the repo is public | 15:43 |
mnasiadka | oh boy | 15:43 |
fungi | it is one of the worst designs for a file transfer protocol i've ever seen | 15:44 |
mnasiadka | might be that a proper pull through registry would be better, if that keeps reappearing - we (Kolla) would need to think about something | 15:45 |
Clark[m] | I'm not sure we can say either way yet. If the problem is/was on the quay side a different tool is unlikely to help | 15:55 |
Clark[m] | Quay's status page indicates no problems at least | 15:55 |
fungi | but even just intermittent connectivity problems between rackspace and red hat could account for the observed behaviors | 15:56 |
Clark[m] | Have any zuul jobs hit issues? Zuul and nodepool are hosted in quay | 16:00 |
fungi | not that i've noticed, but i haven't been watching any very closely today | 16:02 |
opendevreview | Clark Boylan proposed zuul/zuul-jobs master: Drop CentOS 7 test jobs https://review.opendev.org/c/zuul/zuul-jobs/+/912280 | 16:23 |
opendevreview | Clark Boylan proposed openstack/project-config master: Drop CentOS 7 wheel builds and bindep fallback check job https://review.opendev.org/c/openstack/project-config/+/912283 | 16:31 |
clarkb | I've just pushed three new topic:drop-centos-7 changes which do supporting infrastructure cleanup. I believe all three should be mergeable at this point in time | 16:37 |
clarkb | though maybe I got the order between project-config and ozj wrong /me tries to parse the error | 16:38 |
fungi | i think the project-config change just has to merge before the ozj change will pass | 16:38 |
fungi | due to being a trusted config repo | 16:38 |
clarkb | oh yup, the order I've got is correct but the depends-on won't speculatively test it | 16:39 |
clarkb | I got confused because I initially read the error as saying openstack/requirements was the problem but that was me reading the message incorrectly | 16:39 |
fungi | i still need to put together the keystone et al changes to stop using the devstack centos7 nodeset | 16:40 |
*** blarnath is now known as d34dh0r53 | 18:09 | |
clarkb | fungi: any objections to proceeding with https://review.opendev.org/c/openstack/project-config/+/912283 ? | 19:09 |
clarkb | I'd like to get that in to make sure the ozj change is mergeable and merge that too if possible | 19:09 |
fungi | clarkb: none, i went ahead and approved it | 19:16 |
opendevreview | Merged openstack/project-config master: Drop CentOS 7 wheel builds and bindep fallback check job https://review.opendev.org/c/openstack/project-config/+/912283 | 19:23 |
fungi | hopefully the other change will pass now | 19:23 |
fungi | i've rechecked it | 19:24 |
Clark[m] | Thanks, having some early lunch | 19:38 |
fungi | we may want to remove x/monitorstack from the tenant config, it's got references to ansible-role-functional-centos-7 | 19:40 |
Clark[m] | ++ | 19:41 |
fungi | last commit was a homepage update 5 years ago | 19:41 |
fungi | i'll push up that change | 19:41 |
Clark[m] | Thanks | 19:42 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Drop gating for x/monitorstack https://review.opendev.org/c/openstack/project-config/+/912304 | 19:47 |
fungi | looks like it's just the tenant config entry that needs to go | 19:47 |
opendevreview | Merged openstack/project-config master: Drop gating for x/monitorstack https://review.opendev.org/c/openstack/project-config/+/912304 | 20:05 |
clarkb | heh I rechecked the ozj change before ^ deployed. I should know better /me tries to practice patience | 20:10 |
fungi | yeah, saw that | 20:10 |
clarkb | once these jobs are cleaned up I think we're ready to remove the nodeset from base-jobs and the images from nodepool on the announced date. And just hope that as much cleanup as possible happens before then | 20:14 |
clarkb | and then xenial, that will be a fun one | 20:19 |
clarkb | one upside that I didn't anticipate is it is giving us good reason to pare down the zuul project list | 20:20 |
clarkb | which should in theory reduce the number of errors and other unwanted interactions from cleanups | 20:20 |
clarkb | for xenial one of the fun cleanups is going to be our puppet stuff. But we did retire a lot of them so maybe the web won't be too large | 20:29 |
clarkb | ok now the ozj job is failing because the aarch centos 8 openafs build fails due to the kernel being too old again. I'll restore the content of the playbook that builds that stuff to include centos 7 even though it isn't required any longer to make this mergeable | 20:34 |
clarkb | we will have to clean that up later though as it looks for artifacts and those will no longer be built | 20:35 |
fungi | noted, thanks | 20:35 |
clarkb | we've got plenty of disk on the arm nodepool builder, which rules out disk being the problem | 20:38 |
clarkb | the last build log indicates success. I suspect that the big centos 8 stream updates that tristanC[m] and amoralej noticed have resulted in new kernels and we just need to rebuild that image which should happen nowish | 20:39 |
clarkb | we appear to do weekly builds and that image is just about a week old. So I think we can probably try to clean up that job on Monday and see if the openafs packages build then | 20:39 |
fungi | wfm | 20:48 |
opendevreview | Clark Boylan proposed openstack/project-config master: Remove old infra team puppet testing https://review.opendev.org/c/openstack/project-config/+/912309 | 21:07 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove old infra team puppet testing https://review.opendev.org/c/opendev/system-config/+/912311 | 21:08 |
clarkb | I'm mostly pushing these changes up for early comment. I don't think we are in a rush yet since centos 7 is still in progress and I expect a lot of other stuff to have random ties to xenial | 21:08 |
clarkb | but I think that is roughly what it will look like for us to stop testing puppet on xenial (and largely at all) in our world | 21:09 |
clarkb | I think we can use this as a stepping stone to retiring more puppet-* repos too | 21:10 |
clarkb | like puppet-redis? I have no idea what would be using that at this point. Maybe it was for openstackid which we excised | 21:11 |
clarkb | oh and ethercalc etc | 21:12 |
opendevreview | Amy Marrich proposed opendev/irc-meetings master: Moving the meeting up https://review.opendev.org/c/opendev/irc-meetings/+/912314 | 21:53 |
opendevreview | Merged opendev/irc-meetings master: Moving the meeting up https://review.opendev.org/c/opendev/irc-meetings/+/912314 | 22:13 |
clarkb | that reminds me we lose an hour of sleep tomorrow night | 22:26 |
fungi | yes we do | 22:46 |
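
The log correlation fungi walks through between 14:57 and 15:07 can be sketched in Python. This is a rough illustration, not tooling used by the OpenDev team: the function names `dockerd_time` and `access_entry` and both regular expressions are hypothetical, the log lines are shortened copies of the ones pasted in the channel, and it assumes the mirror's access log timestamps are in UTC like the dockerd ones.

```python
import re
from datetime import datetime, timezone

# dockerd lines carry an RFC 3339 timestamp with nanosecond precision.
DOCKERD_TIME = re.compile(r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z"')

# Access log shape as pasted in the channel: client IP, bracketed timestamp,
# quoted request line, then the status code (hypothetical pattern).
ACCESS_LINE = re.compile(
    r'^(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def dockerd_time(line):
    """Return the dockerd timestamp of a log line, truncated to microseconds."""
    match = DOCKERD_TIME.search(line)
    if match is None:
        raise ValueError("no dockerd timestamp found")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    micros = int(match.group(2)[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def access_entry(line):
    """Parse one mirror access log line (timestamps assumed to be UTC)."""
    match = ACCESS_LINE.match(line)
    if match is None:
        raise ValueError("unrecognized access log line")
    stamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return {
        "ip": match.group("ip"),
        "time": stamp.replace(tzinfo=timezone.utc),
        "path": match.group("path"),
        "status": int(match.group("status")),
    }


pull_start = (
    'Mar 08 08:49:47 primary dockerd[9535]: '
    'time="2024-03-08T08:49:47.389984156Z" level=debug msg="Trying to pull"'
)
pull_error = (
    'Mar 08 08:50:02 primary dockerd[9535]: '
    'time="2024-03-08T08:50:02.391974305Z" level=warning '
    'msg="Error getting v2 registry"'
)
mirror_line = (
    '10.209.0.58 - - [2024-03-08 08:49:47.404] "GET /v2/ HTTP/1.1" 401 4 '
    'Response status 401 "-" "docker/25.0.4"'
)

# Time between the pull attempt and the client-side timeout.
elapsed = (dockerd_time(pull_error) - dockerd_time(pull_start)).total_seconds()

# Gap between the client starting the pull and the mirror logging the request.
entry = access_entry(mirror_line)
skew = (entry["time"] - dockerd_time(pull_start)).total_seconds()

print(f"client gave up after {elapsed:.3f}s")
print(f"mirror logged {entry['path']} with {entry['status']} {skew:.3f}s after the pull started")
```

The elapsed time between the two dockerd lines is what suggests the 15-second timeout, and the small gap to the mirror's `GET /v2/` entry is what makes that request the likely match for the failed pull.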