Wednesday, 2025-11-12

<clarkb> corvus: thanks!  00:04
<vlotorev[m]> Is there an html archive of messages for the opendev and zuul rooms? I remember there were archives when IRC was used.  13:56
<vlotorev[m]> * Hi, is there an html archive of messages for the opendev and zuul rooms? I remember there were archives when IRC was used.  14:12
<fungi> vlotorev[m]: #opendev (oftc irc channel) logs are at https://meetings.opendev.org/irclogs/%23opendev/  14:23
<fungi> #zuul:opendev.org (matrix room) logs are at https://meetings.opendev.org/irclogs/%23zuul/  14:23
<opendevreview> Ron Stone proposed openstack/project-config master: Update StarlingX docs promote job for R11 release  https://review.opendev.org/c/openstack/project-config/+/966868  14:39
<fungi> #status notice Zuul job log URLs for storage.*.cloud.ovh.net are temporarily returning an access denied/payment required error, but the provider has been engaged and is working to correct it  14:48
<opendevstatus> fungi: sending notice  14:48
-opendevstatus- NOTICE: Zuul job log URLs for storage.*.cloud.ovh.net are temporarily returning an access denied/payment required error, but the provider has been engaged and is working to correct it  14:48
<opendevstatus> fungi: finished sending notice  14:51
<fungi> related discussion took place in #openstack-infra rather than here, but since it looks like it may take a little longer to get corrected on the ovh side i'll push a change to stop uploading logs there temporarily  14:52
<fungi> or maybe not, sounds like it should already be fixed, but i'm still seeing payment required errors at the affected swift content urls  14:57
<fungi> okay, one of the urls i was testing now has content again, so i think we're fine  15:00
<fungi> and you mentioned the new voucher was added on 2025-11-10, which was 7 days after we were notified the credits ran out, but that was apparently not fast enough to avoid suspension  15:26
<fungi> er, wrong channel, sorry!  15:26
<clarkb> fungi: reading the scrollback in the -infra channel I guess we don't have to worry about suspension of that account removing resources (like mirrors) that we now need to rebuild, because those are hosted in the other account?  15:47
<clarkb> fungi: but also it looks like the swift data wasn't removed, just access to it while things got sorted out?  15:47
<fungi> yes  15:47
<fungi> i also tested that the mirrors were still reachable  15:47
<clarkb> ya I just checked them via ping and my browser and they are both still there  15:48
<fungi> i don't know yet whether any jobs failed with log upload problems, though i can assume some promote jobs or speculative container builds may have failed to fetch dependency artifacts  15:48
<clarkb> ya I think we can probably live with that. I was mostly worried we'd be reenabling a cloud that isn't functional for new jobs  15:49
<opendevreview> Merged openstack/project-config master: Update StarlingX docs promote job for R11 release  https://review.opendev.org/c/openstack/project-config/+/966868  16:06
<mnasiadka> I think there's something wrong with the VexxHost region "caching proxy" - see https://zuul.opendev.org/t/openstack/build/90515811e0e14734af1c2d7733259341/log/primary/logs/ansible/pull#790  16:11
<clarkb> mnasiadka: looks like that responded with a 403 forbidden  16:12
<clarkb> which could be the response from the upstream source that we've forwarded, or some issue with the proxy  16:12
<mnasiadka> Well, the repository in quay.io is public, let me try to pull that image on my laptop  16:13
<fungi> yeah, odds are quay responded to our proxy's request with a 403 forbidden, possibly because they think that proxy isn't a real client  16:14
<mnasiadka> I can try to stop using that proxy as a docker/podman registry mirror in Kolla-Ansible  16:14
<mnasiadka> Yeah, pulling on my laptop works  16:14
<clarkb> mnasiadka: `403 282 cache miss: attempting entity save` from the apache logs for those requests  16:15
<clarkb> I think that means the data was not in the cache, so we checked upstream, got the 403, and then attempted to insert the 282 byte 403 response into the cache  16:16
<mnasiadka> Most probably, maybe it's time to stop using the cache/proxy for podman/docker - giving https://review.opendev.org/c/openstack/kolla-ansible/+/966880 a go - let's see if that fixes things  16:17
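(Editor's sketch: one way a pull-through mirror for quay.io is commonly configured on the podman side via containers-registries.conf; the mirror hostname and port below are hypothetical. Docker's analogous registry-mirrors setting in daemon.json only applies to Docker Hub, which narrows the options on the docker side.)

    # write a drop-in telling podman to try the mirror before quay.io
    cat > /etc/containers/registries.conf.d/quay-mirror.conf <<'EOF'
    [[registry]]
    location = "quay.io"

    [[registry.mirror]]
    location = "mirror.example.org:4443"
    EOF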
<clarkb> mnasiadka: I notice there is some sort of auth material in the url string. Maybe that token needs to be refreshed or something  16:17
<mnasiadka> There shouldn't be any auth material, this is a purely public repository at quay.io  16:18
<mnasiadka> Maybe it's some docker bug (again)  16:18
<clarkb> `200 32120 cache hit` is what a successful response looks like (for the same url even, just several hours prior)  16:18
<clarkb> mnasiadka: docker uses anonymous authentication. So even publicly available data requires you to make an auth request to get a token that is then used to download the data  16:19
<clarkb> I think it is done this way for tracking purposes, but if things go beyond timeouts then maybe we hit problems like this?  16:19
<fungi> one of the most absurd protocol designs i've ever seen, btw  16:19
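(Editor's sketch of the anonymous token dance clarkb describes, using the standard registry v2 auth endpoints; the repository name is hypothetical and jq is assumed for brevity.)

    # 1) anonymous auth request: exchange nothing for a short-lived pull token
    TOKEN=$(curl -s "https://quay.io/v2/auth?service=quay.io&scope=repository:example/image:pull" | jq -r .token)

    # 2) use the token to actually fetch the data
    curl -s -H "Authorization: Bearer $TOKEN" \
         -H "Accept: application/vnd.oci.image.manifest.v1+json" \
         "https://quay.io/v2/example/image/manifests/latest"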
<mnasiadka> Oh boy, docker-ce 29.0.0  16:20
* mnasiadka feels trouble  16:20
<mnasiadka> Anyway, thanks for checking  16:20
<clarkb> the user agent is docker-sdk-python/7.1.0 fwiw  16:21
<clarkb> not sure if or how that aligns with docker-ce 29  16:21
<mnasiadka> I think Ansible uses docker-py for some initial connection and then it uses the docker engine to pull the image - I'll see which one fixes it - stopping usage of the proxy or pinning docker-ce to <29  16:23
<clarkb> mnasiadka: [2025-11-12 11:48:21.480] last successful request and [2025-11-12 12:08:34.755] first 403  16:26
<clarkb> not sure if those timestamps help  16:26
<mnasiadka> clarkb: I think that's the moment 29.0.0 was rolled out, looking at various release metadata  16:31
*** rcruise_ is now known as rcruise  16:35
<clarkb> ok cool, more evidence that is the source of the issue  16:46
<clarkb> infra-root I've just tested using ssh -D 1080 to gitea-lb03, then setting up my firefox browser to proxy requests to gitea09-13.opendev.org through that socks proxy using a simple configuration script, and that all works as expected  16:48
<clarkb> I think this was the last thing I personally wanted to check before upgrading gitea. Let me know if there are other concerns with the gitea upgrade or the change to do the upgrade. But I suspect that if we're happy with the new release we can proceed with the upgrade  16:49
<clarkb> I should be able to monitor that today too  16:49
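(Editor's sketch of the kind of setup described above: the ssh -D port and backend hostnames come from the log itself, the file path is hypothetical. In Firefox the PAC file would be referenced via the "Automatic proxy configuration URL" setting.)

    # open a SOCKS5 tunnel through the load balancer
    ssh -N -D 1080 gitea-lb03.opendev.org &

    # a "simple configuration script" (PAC): route only the gitea
    # backends through the tunnel, everything else goes direct
    cat > ~/proxy.pac <<'EOF'
    function FindProxyForURL(url, host) {
        if (shExpMatch(host, "gitea*.opendev.org"))
            return "SOCKS5 localhost:1080";
        return "DIRECT";
    }
    EOF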
<fungi> i'm heading out to run some lunch errands, should hopefully be back around 19:00 utc  17:15
<opendevreview> Merged zuul/zuul-jobs master: promote-docker-image: some notes on manual replication  https://review.opendev.org/c/zuul/zuul-jobs/+/872237  17:22
<opendevreview> Jan Gutter proposed zuul/zuul-jobs master: Update ensure-helm role to add more functionality  https://review.opendev.org/c/zuul/zuul-jobs/+/962794  18:21
<clarkb> https://158.69.67.86/opendev/system-config is the held gitea 1.25.1 node. https://review.opendev.org/c/opendev/system-config/+/965960 is the upgrade change, which includes a link to the changelog  18:59
<clarkb> reviews very much appreciated  18:59
<fungi> clarkb: the dockerfile diff on 965960 brings it in sync with an upstream dockerfile?  20:08
<fungi> mostly wondering about the strange bash autocomplete module that was previously referenced but is now cleaned up  20:09
<fungi> i'm around to help keep an eye on the gitea upgrade now if someone else is  20:11
<clarkb> fungi: yes, upstream dropped those files. I did not dig in further to see if they were copied by some other process or removed completely  20:17
<fungi> are you good with trying to upgrade nowish or want to wait until you've had lunch?  20:18
<clarkb> fungi: it should take about an hour to gate the change, so now should be fine  20:20
<fungi> okay, approved. fire in the hole  20:21
<clarkb> fungi: I see a +2 but not approval  20:22
<fungi> ah yeah, i did alt+2 instead of alt+3, fixed now  20:23
<opendevreview> Piotr Parczewski proposed zuul/zuul-jobs master: Drop Python 2 support  https://review.opendev.org/c/zuul/zuul-jobs/+/966977  21:11
<fungi> clarkb: looks like james denton replied to your message a few minutes ago and we have more quota in iad3 now. i'll work on a change to bring it back in  21:28
<fungi> i'm trying to bring the mirror there back online now too  21:33
<fungi> er, s/there/in dfw3/  21:33
<clarkb> we should double check the limits report first, but that is great  21:34
<clarkb> fungi: the email doesn't address the dfw3 mirror fwiw  21:34
<clarkb> (I don't think I sent email about that but maybe we should)  21:34
<fungi> openstack console log show indicates openafs-client.service is taking a while trying to start  21:34
<fungi> but at least it's not stuck in error  21:35
<clarkb> fungi: if you're able to get a console log that is better than what I had  21:35
<fungi> and it came up!  21:35
<clarkb> but ya I don't think that email indicates anything about fixing that mirror, so if it is better that is by chance  21:35
<clarkb> fungi: the next thing is to see if the afs cache is functional. Maybe we need to rerun the fs flush command thing, or manually clear it  21:35
<fungi> /dev/mapper/main-afscache on /var/cache/openafs type ext4 (rw,relatime,nobarrier,errors=remount-ro)  21:35
<fungi> it did manage to mount the volume at least  21:35
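(Editor's sketch of the standard OpenAFS client cache-recovery options alluded to above; these are generic commands, not a record of what was actually run on the mirror.)

    # invalidate everything currently in the client cache
    fs flushall

    # or, more drastically, rebuild the disk cache with the client stopped
    systemctl stop openafs-client
    rm -rf /var/cache/openafs/*
    systemctl start openafs-client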
<clarkb> limits do appear to have been increased in iad3  21:36
<clarkb> fungi: I think the errors were in syslog/dmesg before it went down hard  21:36
<clarkb> about the specific blocks being sad  21:36
<fungi> i'm able to `cat /afs/openstack.org/mirror/debian/dists/trixie/Release` from the server  21:36
<fungi> that's new since it first went offline  21:37
<fungi> no afs-related errors in dmesg output at the moment  21:37
<clarkb> so probably some underlying cloud issue preventing access to the data that was subsequently addressed, and now things are happy again  21:39
<opendevreview> Jeremy Stanley proposed opendev/zuul-providers master: Increase quota in Rackspace Flex IAD3  https://review.opendev.org/c/opendev/zuul-providers/+/966978  21:39
<opendevreview> Jeremy Stanley proposed opendev/zuul-providers master: Reenable Rackspace Flex DFW3  https://review.opendev.org/c/opendev/zuul-providers/+/966979  21:42
<fungi> i see we were doing 75 instances in dfw3 but 60 in sjc3  21:42
<corvus> is there a core limit? i think clarkb mentioned that a while back, but i'm looking at the graphs and i don't see one there, which means i didn't see zuul seeing it...  21:46
<clarkb> yes, we are limited by memory, core count, and instances  21:47
<clarkb> I think instances is higher than the memory and core counts can support with the flavors we're choosing  21:47
<clarkb> I've already closed that window for iad3, let me pull the info up again  21:47
<corvus> oh, huh, i see it in graphite now. neat. i'll update the graphs.  21:48
<fungi> should i adjust those two changes to reflect that?  21:48
<clarkb> looks like iad3 can support 94 instances by memory, 100 by cores and instances quota  21:49
<opendevreview> James E. Blair proposed openstack/project-config master: Grafana: add core graphs to rackspace-flex  https://review.opendev.org/c/openstack/project-config/+/966981  21:49
<corvus> maybe we should set it to 90?  21:50
<clarkb> corvus: do you need to run the script to generate the new output and add that in ^?  21:50
<clarkb> corvus: ya I think 90 should be safe based on my napkin math  21:50
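(Editor's sketch of the napkin math: effective capacity is the minimum of what the memory, core, and instance quotas each allow for the chosen flavor. All numbers below are hypothetical, not the real quotas.)

    ram_quota_mb=768000; cores_quota=800; instance_quota=100  # hypothetical quotas
    flavor_ram_mb=8192; flavor_cores=8                        # hypothetical flavor size
    echo "by ram:       $(( ram_quota_mb / flavor_ram_mb ))"  # 93
    echo "by cores:     $(( cores_quota / flavor_cores ))"    # 100
    echo "by instances: ${instance_quota}"                    # 100
    # capacity is the minimum of the three; the provider limit is then set a
    # little below that for headroom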
<opendevreview> James E. Blair proposed openstack/project-config master: Grafana: add core graphs to rackspace-flex  https://review.opendev.org/c/openstack/project-config/+/966981  21:50
<corvus> clarkb: yes :)  21:50
<fungi> clarkb: corvus: in all 3 regions, or just iad3?  21:50
<clarkb> SJC3 is more constrained and its limit is 64 by memory  21:51
<fungi> okay, so 60 makes sense there i guess  21:51
<clarkb> fungi: DFW3 and IAD3 have quotas that would support 90. SJC3 does not  21:51
<corvus> 90 90 60 sgtm  21:51
<fungi> so we should lower dfw from 75 to 60, raise iad3 and sjc3 from 60 to 90?  21:52
<clarkb> fungi: no, raise dfw from 75 to 90. Keep sjc3 at 60. Bump iad3 to 90  21:52
<fungi> ah, okay  21:53
<opendevreview> Jeremy Stanley proposed opendev/zuul-providers master: Reenable Rackspace Flex DFW3  https://review.opendev.org/c/opendev/zuul-providers/+/966979  21:54
<clarkb> the gitea upgrade should be merging soon  21:54
<opendevreview> Jeremy Stanley proposed opendev/zuul-providers master: Increase quota in Rackspace Flex IAD3  https://review.opendev.org/c/opendev/zuul-providers/+/966978  21:55
<fungi> okay, i think that covers the instances adjustments now too  21:55
<clarkb> I've got my socks proxy running and system load looks reasonable across the backends  21:55
<clarkb> I just realized that the git clone check won't work with the socks proxy without some magic  21:56
<clarkb> so I may need a direct port forward to test that too  21:56
<clarkb> GIT_SSL_NO_VERIFY=1 git clone https://localhost:portforwardport/opendev/system-config currently works, so I can use that approach post upgrade  21:58
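(Editor's sketch of that check with hypothetical local and backend port numbers; GIT_SSL_NO_VERIFY is needed because the backend's certificate will not match localhost.)

    # forward a local port to one backend's https port, then clone through it
    ssh -N -L 8443:localhost:3081 gitea09.opendev.org &
    GIT_SSL_NO_VERIFY=1 git clone https://localhost:8443/opendev/system-config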
<opendevreview> Merged opendev/zuul-providers master: Reenable Rackspace Flex DFW3  https://review.opendev.org/c/opendev/zuul-providers/+/966979  21:59
<opendevreview> Merged opendev/zuul-providers master: Increase quota in Rackspace Flex IAD3  https://review.opendev.org/c/opendev/zuul-providers/+/966978  22:01
<clarkb> gitea deployment will be behind the hourly deployments at this point  22:02
<fungi> and then some  22:12
<clarkb> it should be done soon. I checked a few minutes ago and it was finally doing the testinfra testing  22:13
<fungi> yeah, logs are uploading now  22:16
<opendevreview> Merged opendev/system-config master: Update Gitea to 1.25  https://review.opendev.org/c/opendev/system-config/+/965960  22:17
<clarkb> here we go  22:17
<fungi> system-config-promote-image-gitea is failing?  22:18
<clarkb> image promotion is failing  22:18
<fungi> "Task Get dockerhub JWT token failed running on host localhost"  22:19
<clarkb> possibly related to the new release maybe?  22:19
<fungi> yeah, we no_log that so there's no additional detail to go on, sadly  22:19
<clarkb> https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/promote-docker-image/tasks/main.yaml#L11 is the code that we're running. It does the equivalent of a curl request to get a token  22:20
<clarkb> I can try reproducing that locally using my personal credentials  22:20
<corvus> https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/promote-docker-image/tasks/promote-retag-inner.yaml  22:20
<corvus> jan recently reminded us that ianw wrote some helpful comments  22:21
<clarkb> cool, thanks. The curl command there looks different than what we do to get the token in ansible though  22:21
<corvus> yeah, i think that's focused on the next steps after the login that failed this time  22:22
<clarkb> the credentials in my password store are not working for login to docker hub  22:24
<clarkb> part of me wonders if they are just having an auth problem  22:25
<corvus> i will try mine  22:25
<corvus> "docker login" worked; trying with curl  22:26
<clarkb> interesting. My error message even said "please try again later". I wonder if they expired my account or something  22:26
<clarkb> trying to reset my password hits the same error  22:27
<fungi> project-config-grafana for 966981 hit an unspecified apt cache update error (thanks ansible) too, on noble  22:29
<corvus> got a token through curl  22:29
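(Editor's sketch of roughly the curl equivalent of the failing task, assuming the hub.docker.com v2 login endpoint is what the role's token fetch amounts to; credentials elided, jq assumed.)

    # POST credentials to Docker Hub's login API; the response contains a JWT
    curl -s -H "Content-Type: application/json" \
         -d '{"username": "<user>", "password": "<pass>"}' \
         https://hub.docker.com/v2/users/login/ | jq -r .token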
<clarkb> I have managed to confirm, looking in my email, that the address I supplied is the one mapped to the account, so I don't think I've got the wrong account details  22:29
<clarkb> corvus: I'm half wondering if we should simply reenqueue the buildset; we might get better luck this time  22:30
<corvus> yeah, worth a try  22:30
<corvus> i'm logged in, i'll do it  22:30
<clarkb> thanks  22:30
<corvus> https://zuul.opendev.org/t/openstack/status/pipeline/deploy  22:31
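(Editor's sketch of what a buildset reenqueue with zuul-client looks like; the tenant, pipeline, and change number come from the log, while the patchset number is hypothetical.)

    # reenqueue the merged change into the deploy pipeline
    zuul-client enqueue --tenant openstack --pipeline deploy \
        --project opendev/system-config --change 965960,1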
<clarkb> success this time  22:31
<clarkb> so ya I suspect there is some low rate auth failure problem, or maybe it is specific to some cdn backend  22:31
<corvus> chaos monkey active  22:32
<clarkb> ok gitea09 should be up now. I'm going to test it  22:34
<fungi> fun  22:34
<clarkb> web ui seems to work and my git clone through the port forward is also working  22:35
<clarkb> 10 is upgraded now too. It definitely takes a few more steps to keep an eye on progress and check functionality now that the firewall rules are more aggressive, but that isn't preventing me from doing any of the typical checks I would do.  22:36
<clarkb> all of the backends are upgraded at this point  22:44
<clarkb> git clone tests can hit the load balancer now if anyone else wants to test, and regardless of backend you'll get the new code  22:44
<clarkb> and I'll look at tracking down evidence of replication as soon as the deployment job is complete  22:44
<clarkb> https://zuul.opendev.org/t/openstack/buildset/a8f573ebfdbb4e069721ffc4ee74dce0 deployment is a success here  22:45
<fungi> i cloned bindep just fine  22:45
<clarkb> https://www.dockerstatus.com/ shows the issue now too  22:45
<clarkb> it didn't when I first checked, so we must've caught this pretty early  22:45
<fungi> seems so  22:46
<clarkb> no new patchsets since all backends were updated  22:47
<clarkb> fungi: re mirror.dfw3.raxflex.opendev.org, I put it in the emergency file while it was shut down. We should remove it now that it's back up again  22:47
<fungi> oh, doing now  22:48
<fungi> and done  22:49
<clarkb> thanks  22:49
<clarkb> now I'm digging around to see if I have any changes I can push  22:51
<clarkb> nevermind, https://review.opendev.org/c/x/tobiko/+/964443 has a new patchset  22:51
<clarkb> and https://opendev.org/x/tobiko/commit/c0e7066105bb3561858eb4b97f3438963a7da7ab loads after clicking the gitea button in gerrit, so I think replication is working  22:52
<clarkb> I think this is looking good for the moment  22:59
<clarkb> I'll plan to approve the two gerrit changes related to fixing the bind mount and bumping to the latest bugfix release tomorrow morning, so that everything is ready to go later in the day when tonyb is around  23:00
<fungi> sounds good, i should be around all day too  23:32

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!