| clarkb | corvus: thanks! | 00:04 |
|---|---|---|
| vlotorev[m] | Is there html archive for messages for opendev and zuul room? I remember there were archives when IRC was used. | 13:56 |
| fungi | vlotorev[m]: #opendev (oftc irc channel) logs are at https://meetings.opendev.org/irclogs/%23opendev/ | 14:23 |
| fungi | #zuul:opendev.org (matrix room) logs are at https://meetings.opendev.org/irclogs/%23zuul/ | 14:23 |
| opendevreview | Ron Stone proposed openstack/project-config master: Update StarlingX docs promote job for R11 release https://review.opendev.org/c/openstack/project-config/+/966868 | 14:39 |
| fungi | #status notice Zuul job log URLs for storage.*.cloud.ovh.net are temporarily returning an access denied/payment required error, but the provider has been engaged and is working to correct it | 14:48 |
| opendevstatus | fungi: sending notice | 14:48 |
| -opendevstatus- | NOTICE: Zuul job log URLs for storage.*.cloud.ovh.net are temporarily returning an access denied/payment required error, but the provider has been engaged and is working to correct it | 14:48 |
| opendevstatus | fungi: finished sending notice | 14:51 |
| fungi | related discussion took place in #openstack-infra rather than here, but since it looks like it may take a little longer to get corrected on the ovh side i'll push a change to stop uploading logs there temporarily | 14:52 |
| fungi | or maybe not, sounds like it should already be fixed, but i'm still seeing payment required errors at the affected swift content urls | 14:57 |
| fungi | okay, one of the urls i was testing now has content again, so i think we're fine | 15:00 |
| fungi | and you mentioned the new voucher was added on 2025-11-10, which was 7 days after we were notified the credits ran out, but that was apparently not fast enough to avoid suspension | 15:26 |
| fungi | er, wrong channel, sorry! | 15:26 |
| clarkb | fungi: reading the scrollback in the -infra channel I guess we don't have to worry about suspension of that account removing resources (like mirrors) that we now need to rebuild because those are hosted in the other account? | 15:47 |
| clarkb | fungi: but also it looks like the swift data wasn't removed just access to it while things got sorted out? | 15:47 |
| fungi | yes | 15:47 |
| fungi | i also tested that the mirrors were still reachable | 15:47 |
| clarkb | ya I just checked them via ping and my browser and they are both still there | 15:48 |
| fungi | i don't know yet whether any jobs failed with log upload problems, though can assume some promote jobs or speculative container builds may have failed to fetch dependency artifacts | 15:48 |
| clarkb | ya I think we can probably live with that. I was mostly worried we'd be reenabling a cloud that isn't functional for new jobs | 15:49 |
| opendevreview | Merged openstack/project-config master: Update StarlingX docs promote job for R11 release https://review.opendev.org/c/openstack/project-config/+/966868 | 16:06 |
| mnasiadka | I think there’s something wrong with VexxHost region "caching proxy" - see https://zuul.opendev.org/t/openstack/build/90515811e0e14734af1c2d7733259341/log/primary/logs/ansible/pull#790 | 16:11 |
| clarkb | mnasiadka: looks like that responded with a 403 forbidden | 16:12 |
| clarkb | which could be the response from the upstream source that we've forwarded or some issue with the proxy | 16:12 |
| mnasiadka | Well, the repository in quay.io is public, let me try to pull that image on my laptop | 16:13 |
| fungi | yeah, odds are quay responded to our proxy's request with a 403 forbidden, possibly because they think that proxy isn't a real client | 16:14 |
| mnasiadka | I can try to stop using that proxy as docker/podman registry mirror in Kolla-Ansible | 16:14 |
| mnasiadka | Yeah, pulling on my laptop works | 16:14 |
| clarkb | mnasiadka: `403 282 cache miss: attempting entity save` from the apache logs for those requests | 16:15 |
| clarkb | I think that means the data was not in the cache and we checked upstream and got the 403 then attempt to insert the 282 byte 403 response into the cache | 16:16 |
| mnasiadka | Most probably, maybe it’s time to stop using cache/proxy for podman/docker - giving https://review.opendev.org/c/openstack/kolla-ansible/+/966880 a go - let’s see if that fixes things | 16:17 |
| clarkb | mnasiadka: I notice there is some sort of auth material in the url string. Maybe that token needs to be refreshed or something | 16:17 |
| mnasiadka | There shouldn’t be any auth material, this is a purely public repository at quay.io | 16:18 |
| mnasiadka | Maybe it’s some docker bug (again) | 16:18 |
| clarkb | `200 32120 cache hit` is what a successful response looks like (for the same url even just several hours prior) | 16:18 |
| clarkb | mnasiadka: docker uses anonymous authentication. So even publicly available data requires you to make an auth request to get a token that is then used to download the data | 16:19 |
| clarkb | I think it is done this way for tracking purposes but if things go beyond timeouts then maybe we hit problems like this? | 16:19 |
| fungi | one of the most absurd protocol designs i've ever seen, btw | 16:19 |
| mnasiadka | Oh boy, docker-ce 29.0.0 | 16:20 |
| * | mnasiadka feels trouble | 16:20 |
| mnasiadka | Anyway, thanks for checking | 16:20 |
| clarkb | the user agent is docker-sdk-python/7.1.0 fwiw | 16:21 |
| clarkb | not sure if or how that aligns with docker-ce 29 | 16:21 |
| mnasiadka | I think Ansible uses docker-py for some initial connection and then it uses docker engine to pull the image - I’ll see which one fixes it - stopping usage of proxy or pinning docker-ce to <29 | 16:23 |
| clarkb | mnasiadka: [2025-11-12 11:48:21.480] last successful request and [2025-11-12 12:08:34.755] first 403 | 16:26 |
| clarkb | not sure if those timestamps help | 16:26 |
| mnasiadka | clarkb: I think that’s the moment the 29.0.0 was rolled out, looking at various release metadata | 16:31 |
| *** | rcruise_ is now known as rcruise | 16:35 |
| clarkb | ok cool more evidence that is the source of the issue | 16:46 |
| clarkb | infra-root I've just tested using ssh -D 1080 to gitea-lb03 then setting up my firefox browser to proxy requests to gitea09-13.opendev.org through that socks proxy using a simple configuration script and that all works as expected | 16:48 |
| clarkb | I think this was the last thing I personally wanted to check before upgrading gitea. Let me know if there are other concerns with the gitea upgrade or the change to do the upgrade. But I suspect that if we're happy with the new release we can proceed with the upgrade | 16:49 |
| clarkb | I should be able to monitor that today too | 16:49 |
| fungi | i'm heading out to run some lunch errands, should hopefully be back around 19:00 utc | 17:15 |
| opendevreview | Merged zuul/zuul-jobs master: promote-docker-image: some notes on manual replication https://review.opendev.org/c/zuul/zuul-jobs/+/872237 | 17:22 |
| opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Update ensure-helm role to add more functionality https://review.opendev.org/c/zuul/zuul-jobs/+/962794 | 18:21 |
| clarkb | https://158.69.67.86/opendev/system-config is the held gitea 1.25.1 node. https://review.opendev.org/c/opendev/system-config/+/965960 is the upgrade change which includes a link to the changelog | 18:59 |
| clarkb | reviews very much appreciated | 18:59 |
| fungi | clarkb: the dockerfile diff on 965960 brings it in sync with an upstream dockerfile? | 20:08 |
| fungi | mostly wondering about the strange bash autocomplete module that was previously referenced but now cleaned up | 20:09 |
| fungi | i'm around to help keep an eye on the gitea upgrade now if someone else is | 20:11 |
| clarkb | fungi: yes upstream dropped those files. I did not dig in further to see if they were copied by some other process or removed completely | 20:17 |
| fungi | are you good with trying to upgrade nowish or want to wait until you've had lunch? | 20:18 |
| clarkb | fungi: it should take about an hour to gate the change so now should be fine | 16:20 |
| fungi | okay, approved. fire in the hole | 20:21 |
| clarkb | fungi: I see a +2 but not approval | 20:22 |
| fungi | ah yeah, i did alt+2 instead of alt+3, fixed now | 20:23 |
| opendevreview | Piotr Parczewski proposed zuul/zuul-jobs master: Drop Python 2 support https://review.opendev.org/c/zuul/zuul-jobs/+/966977 | 21:11 |
| fungi | clarkb: looks like james denton replied to your message a few minutes ago and we have more quota in iad3 now. i'll work on a change to bring it back in | 21:28 |
| fungi | i'm trying to bring the mirror there back online now too | 21:33 |
| fungi | er, s/there/in dfw3/ | 21:33 |
| clarkb | we should double check the limits report first but that is great | 21:34 |
| clarkb | fungi: the email doesn't address the dfw3 mirror fwiw | 21:34 |
| clarkb | (I don't think I sent email about that but maybe we should) | 21:34 |
| fungi | openstack console log show indicates openafs-client.service is taking a while trying to start | 21:34 |
| fungi | but at least it's not stuck in error | 21:35 |
| clarkb | fungi: if you're able to get a console log that is better than what I had | 21:35 |
| fungi | and it came up! | 21:35 |
| clarkb | but ya I don't think that email indicates anything about fixing that mirror so if it is better that is by chance | 21:35 |
| clarkb | fungi: the next thing is to see if the afs cache is functional. Maybe we need to rerun the fs flush command thing or manually clear it | 21:35 |
| fungi | /dev/mapper/main-afscache on /var/cache/openafs type ext4 (rw,relatime,nobarrier,errors=remount-ro) | 21:35 |
| fungi | it did manage to mount the volume at least | 21:35 |
| clarkb | limits do appear to have been increased in iad3 | 21:36 |
| clarkb | fungi: I think the errors were in syslog/dmesg before it went down hard | 21:36 |
| clarkb | about the specific blocks being sad | 21:36 |
| fungi | i'm able to `cat /afs/openstack.org/mirror/debian/dists/trixie/Release` from the server | 21:36 |
| fungi | that's new since it first went offline | 21:37 |
| fungi | no afs-related errors in dmesg output at the moment | 21:37 |
| clarkb | so probably some underlying cloud issue preventing access to the data that was subsequently addressed and now things are happy again | 21:39 |
| opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Increase quota in Rackspace Flex IAD3 https://review.opendev.org/c/opendev/zuul-providers/+/966978 | 21:39 |
| opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Reenable Rackspace Flex DFW3 https://review.opendev.org/c/opendev/zuul-providers/+/966979 | 21:42 |
| fungi | i see we were doing 75 instances in dfw3 but 60 in sjc3 | 21:42 |
| corvus | is there a core limit? i think clarkb mentioned that a while back, but i'm looking at the graphs and i don't see one there which means i didn't see zuul seeing it... | 21:46 |
| clarkb | yes we are limited by memory, core count, and instances | 21:47 |
| clarkb | I think instances is higher than memory and core counts can support with the flavors we're choosing | 21:47 |
| clarkb | I've already closed that window for iad3 let me pull the info up again | 21:47 |
| corvus | oh, huh i see it in graphite now. neat. i'll update the graphs. | 21:48 |
| fungi | should i adjust those two changes to reflect? | 21:48 |
| clarkb | looks like iad3 can support 94 instances by memory. 100 by cores and instances quota | 21:49 |
| opendevreview | James E. Blair proposed openstack/project-config master: Grafana: add core graphs to rackspace-flex https://review.opendev.org/c/openstack/project-config/+/966981 | 21:49 |
| corvus | maybe we should set it to 90? | 21:50 |
| clarkb | corvus: do you need to run the script to generate the new output and add that in ^? | 21:50 |
| clarkb | corvus: ya I think 90 should be safe based on my napkin math | 21:50 |
| opendevreview | James E. Blair proposed openstack/project-config master: Grafana: add core graphs to rackspace-flex https://review.opendev.org/c/openstack/project-config/+/966981 | 21:50 |
| corvus | clarkb: yes :) | 21:50 |
| fungi | clarkb: corvus: in all 3 regions, or just iad3? | 21:50 |
| clarkb | SJC3 is more constrained and its limit is 64 by memory | 21:51 |
| fungi | okay, so 60 makes sense there i guess | 21:51 |
| clarkb | fungi: DFW3 and IAD3 have quotas that would support 90. SJC3 does not | 21:51 |
| corvus | 90 90 60 sgtm | 21:51 |
| fungi | so we should lower dfw from 75 to 60, raise iad3 and sjc3 from 60 to 90? | 21:52 |
| clarkb | fungi: no raise dfw from 75 to 90. Keep sjc3 at 60. Bump iad3 to 90 | 21:52 |
| fungi | ah, okay | 21:53 |
| opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Reenable Rackspace Flex DFW3 https://review.opendev.org/c/opendev/zuul-providers/+/966979 | 21:54 |
| clarkb | the gitea upgrade should be merging soon | 21:54 |
| opendevreview | Jeremy Stanley proposed opendev/zuul-providers master: Increase quota in Rackspace Flex IAD3 https://review.opendev.org/c/opendev/zuul-providers/+/966978 | 21:55 |
| fungi | okay, i think that covers the instances adjustments now too | 21:55 |
| clarkb | I've got my socks proxy running and system load looks reasonable across the backends | 21:55 |
| clarkb | I just realized that the git clone check won't work with socks proxy without some magic | 21:56 |
| clarkb | so I may need a direct port forward to test that too | 21:56 |
| clarkb | GIT_SSL_NO_VERIFY=1 git clone https://localhost:portforwardport/opendev/system-config currently works so I can use that approach post upgrade | 21:58 |
| opendevreview | Merged opendev/zuul-providers master: Reenable Rackspace Flex DFW3 https://review.opendev.org/c/opendev/zuul-providers/+/966979 | 21:59 |
| opendevreview | Merged opendev/zuul-providers master: Increase quota in Rackspace Flex IAD3 https://review.opendev.org/c/opendev/zuul-providers/+/966978 | 22:01 |
| clarkb | gitea deployment will be behind the hourly deployments at this point | 22:02 |
| fungi | and then some | 22:12 |
| clarkb | it should be done soon. I checked a few minutes ago and it was doing testinfra testing finally | 22:13 |
| fungi | yeah, logs are uploading now | 22:16 |
| opendevreview | Merged opendev/system-config master: Update Gitea to 1.25 https://review.opendev.org/c/opendev/system-config/+/965960 | 22:17 |
| clarkb | here we go | 22:17 |
| fungi | system-config-promote-image-gitea is failing? | 22:18 |
| clarkb | image promotion is failing | 22:18 |
| fungi | "Task Get dockerhub JWT token failed running on host localhost" | 22:19 |
| clarkb | possibly related to the new release maybe? | 22:19 |
| fungi | yeah, we no_log that so no additional detail to go on sadly | 22:19 |
| clarkb | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/promote-docker-image/tasks/main.yaml#L11 is the code that we're running. It does the equivalent of a curl request to get a token | 22:20 |
| clarkb | I can try reproducing that locally using my personal credentials | 22:20 |
| corvus | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/promote-docker-image/tasks/promote-retag-inner.yaml | 22:20 |
| corvus | jan recently reminded us that ianw wrote some helpful comments | 22:21 |
| clarkb | cool thanks. The curl command there looks different than what we do to get the token in ansible though | 22:21 |
| corvus | yeah, i think that's focused on the next steps after the login that failed this time | 22:22 |
| clarkb | the credentials in my password store are not working for login to docker hub | 22:24 |
| clarkb | part of me wonders if they are just having an auth problem | 22:25 |
| corvus | i will try mine | 22:25 |
| corvus | "docker login" worked; trying with curl | 22:26 |
| clarkb | interesting. My error message even said "please try again later" I wonder if they expired my account or something | 22:26 |
| clarkb | trying to reset my password hits the same error | 22:27 |
| fungi | project-config-grafana for 966981 hit an unspecified apt cache update error (thanks ansible) too, on noble | 22:29 |
| corvus | got a token through curl | 22:29 |
| clarkb | I have managed to confirm that the email address I supplied is the one mapped to the account looking in email so I don't think I've got the wrong account details | 22:29 |
| clarkb | corvus: I'm half wondering if we simply reenqueue the buildset if we might get better luck this time | 22:30 |
| corvus | yeah worth a try | 22:30 |
| corvus | i'm logged in, i'll do it | 22:30 |
| clarkb | thanks | 22:30 |
| corvus | https://zuul.opendev.org/t/openstack/status/pipeline/deploy | 22:31 |
| clarkb | success this time | 22:31 |
| clarkb | so ya I suspect there is some low rate auth failure problem or maybe it is specific to some cdn backend | 22:31 |
| corvus | chaos monkey active | 22:32 |
| clarkb | ok gitea09 should be up now. I'm going to test it | 22:34 |
| fungi | fun | 22:34 |
| clarkb | web ui seems to work and my git clone through port forward is also working | 22:35 |
| clarkb | 10 is upgraded now too. Definitely a few more steps to keep an eye on progress and check functionality now that the firewall rules are more aggressive but that isn't preventing me from doing any of the typical checks I would do. | 22:36 |
| clarkb | all of the backends are upgraded at this point | 22:44 |
| clarkb | git clone tests can hit the load balancer now if anyone else wants to test and regardless of backend you'll get the new code | 22:44 |
| clarkb | and I'll look at tracking down evidence of replication as soon as the deployment job is complete | 22:44 |
| clarkb | https://zuul.opendev.org/t/openstack/buildset/a8f573ebfdbb4e069721ffc4ee74dce0 deployment is a success here | 22:45 |
| fungi | i cloned bindep just fine | 22:45 |
| clarkb | https://www.dockerstatus.com/ shows the issue now too | 22:45 |
| clarkb | it didn't when I first checked so we must've caught this pretty early | 22:45 |
| fungi | seems so | 22:46 |
| clarkb | no new patchsets since all backends were updated | 22:47 |
| clarkb | fungi: re mirror.dfw3.raxflex.opendev.org I put it in the emergency file while it was shut down. We should remove it now that it's back up again | 22:47 |
| fungi | oh, doing now | 22:48 |
| fungi | and done | 22:49 |
| clarkb | thanks | 22:49 |
| clarkb | now I'm digging around to see if I have any changes I can push | 22:51 |
| clarkb | nevermind https://review.opendev.org/c/x/tobiko/+/964443 has a new patchset | 22:51 |
| clarkb | and https://opendev.org/x/tobiko/commit/c0e7066105bb3561858eb4b97f3438963a7da7ab loads after clicking the gitea button in gerrit so I think replication is working | 22:52 |
| clarkb | I think this is looking good for the moment | 22:59 |
| clarkb | I'll plan to approve the two gerrit changes related to fixing the bind mount and bumping to the latest bugfix release tomorrow morning so that everything is ready to go later in the day when tonyb is around | 23:00 |
| fungi | sounds good, i should be around all day too | 23:32 |
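
The SOCKS-proxy browser check clarkb describes at 16:48 (`ssh -D 1080` to gitea-lb03 plus a "simple configuration script" in Firefox) can be sketched roughly as below. The load-balancer and backend hostnames come from the channel; the PAC file path and the exact host-matching regex are illustrative assumptions, not copied from the real setup:

```shell
# Sketch of the browser-through-SOCKS check discussed above (16:48).
#
# 1. Open a dynamic (SOCKS5) forward through the gitea load balancer:
#      ssh -f -N -D 1080 gitea-lb03.opendev.org
#
# 2. Give Firefox a proxy auto-config (PAC) script that routes only the
#    gitea09-13 backends through the local SOCKS tunnel:
cat > /tmp/gitea-backends.pac <<'EOF'
function FindProxyForURL(url, host) {
  // Send gitea09-13.opendev.org via the SOCKS tunnel, everything else direct.
  if (/^gitea(09|1[0-3])\.opendev\.org$/.test(host)) {
    return "SOCKS5 localhost:1080";
  }
  return "DIRECT";
}
EOF
grep -c SOCKS5 /tmp/gitea-backends.pac
```

As noted at 21:56, git itself doesn't read a PAC file, so the post-upgrade clone check used a plain local forward (`ssh -L`) with `GIT_SSL_NO_VERIFY=1` instead, since the backend's certificate won't match "localhost".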
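
The "anonymous authentication" dance clarkb describes at 16:19 (a token request even for public data, before the actual pull) looks roughly like the following against Docker Hub; `library/alpine` is just an illustrative public repository, and quay.io follows the same Registry v2 pattern with its own auth endpoint:

```shell
# Registry v2 anonymous auth: even a public image needs a bearer token
# before manifests/blobs can be fetched. A 403 on step 2, like the one in
# the proxy logs above, means the registry refused the request anyway.
REPO="library/alpine"   # illustrative public repository

# Step 1: anonymous token request (no credentials, but still a required round trip).
TOKEN=$(curl -fsSL \
  "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${REPO}:pull" \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["token"])')

# Step 2: present the token when fetching the manifest.
curl -fsSL -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://registry-1.docker.io/v2/${REPO}/manifests/latest"
```

This two-step flow is also why a caching proxy in the middle can end up caching a short 403 body, as seen in the apache log line at 16:15.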
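
The "Get dockerhub JWT token" task that failed at 22:19 is, per the discussion, the equivalent of a curl POST to Docker Hub's login API. A hedged sketch, assuming the documented `hub.docker.com/v2/users/login` endpoint; the credentials are placeholders and the actual role's request details may differ:

```shell
# Build the JSON login body the way the promote job's token request is
# described above; PLACEHOLDER values stand in for the real credentials.
BODY=$(python3 -c 'import json; print(json.dumps(
    {"username": "PLACEHOLDER", "password": "PLACEHOLDER"}))')

# With real credentials this POST returns {"token": "<jwt>"} on success,
# which the job then uses for the retag/promotion API calls:
#   curl -fsS -H "Content-Type: application/json" \
#     -d "$BODY" https://hub.docker.com/v2/users/login
echo "$BODY"
```

An intermittent failure here, as observed, points at the auth service rather than the job, which matches the dockerstatus.com incident noted at 22:45.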
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!