fungi | oof, this job doesn't get run very often... https://zuul.opendev.org/t/openstack/builds?job_name=promote-openstack-manuals-developer | 02:43 |
---|---|---|
fungi | looks like it has bitrotted from some change we made to opendev/base-jobs/playbooks/docs/promote.yaml sometime in the past several years | 02:44 |
fungi | 'ansible.utils.unsafe_proxy.AnsibleUnsafeText object' has no attribute 'path' | 02:45 |
fungi | anybody remember what we changed? | 02:45 |
ianw | fungi: good luck | 02:53 |
ianw | fungi / frickler: re working in curl but not ansible, i have serious deja vu about this and reading far too much about 401 auth calls or something | 02:54 |
fungi | ianw: there's an inline comment from you in the playbook talking about forcing basic auth due to the api not doing a 401 | 02:56 |
ianw | i feel like i even filed an ansible bug .. it was something about urllib ... i'm trying to find it :) | 02:57 |
ianw | here's one issue about a cross-site issue -> https://github.com/readthedocs/readthedocs.org/issues/4986 | 02:59 |
ianw | i think maybe i'm thinking of -> https://github.com/go-gitea/gitea/issues/24160 | 03:07 |
ianw | but ... using that python there, that uses urllib | 03:12 |
ianw | i get back a 403 | 03:13 |
ianw | ... maybe cause it's not a POST ... I dunno. but in short urllib v curl has definitely been an issue | 03:14 |
frickler | fungi: this seems related to https://review.opendev.org/c/opendev/base-jobs/+/798945 , afaict that change was never running with a real job, so it might be just wrong or incomplete at least? | 06:01 |
frickler | or to be more precise, the promote-openstack-manuals-developer job never ran on the api-site repo after that change. the page displaying all occurrences of that without the project filter just times out for me | 06:06 |
frickler | ianw: the current status seems to be more like: working in curl + ansible, but not in zuul | 06:07 |
*** elodilles_pto is now known as elodilles | 07:33 | |
ianw | frickler: do you think you'd have time to work with slaweq on holding a node? | 11:12 |
frickler | ianw: to do what? giving access to a held node that includes the opendevci rtd secret sounds questionable | 11:17 |
slaweq | frickler maybe you could then go to that held node and try to trigger the same playbook manually without `no_log: true` to check what the error there really is | 11:20 |
slaweq | I don't need to go there by myself but without knowing what the error is it's hard to fix it :) | 11:21 |
frickler | slaweq: I don't know how to trigger what zuul does manually. I did run the playbook locally on Friday and it worked without an issue | 11:22 |
slaweq | frickler maybe it is just some infra related issue and it works fine from other machines, but will not work from the ci node? | 11:23 |
slaweq | do you think it's worth trying e.g. curl from the held node? | 11:23 |
ianw | yeah i'm trying to think how to trigger it. does it run executor only anyway | 11:24 |
ianw | ... https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_553/f33c1d8defdb24fc56d4cb0449cccb8df6c17c6c/post/trigger-readthedocs-webhook/553fbd4/job-output.txt | 11:24 |
frickler | yes, I was just checking the latter. no node to hold | 11:24 |
ianw | it doesn't | 11:24 |
ianw | i think that the uri: module does not log the password anyway. but perhaps we could turn off "no_log" and just watch it closely, and if it does, just change the password | 11:26 |
ianw | it might be the easiest way to see what is coming back from ansible | 11:26 |
slaweq | ianw if that would work, that could be maybe the easiest way to go with it | 11:32 |
slaweq | but can we just test with "depends-on" or do we actually need to merge it? | 11:32 |
ianw | this will need to merge because it's in a post pipeline | 11:41 |
ianw | it would be worth logging the rtd_project_name too | 12:00 |
ianw | ok, i think it's ... something to do with noble | 12:13 |
ianw | using "ansible-playbook [core 2.17.5]" | 12:13 |
ianw | when running the uri: ping via a noble host, i get "Status code was 403 and not [200]: HTTP Error 403: Forbidden" | 12:14 |
ianw | this means "force_basic_auth" isn't working | 12:14 |
ianw | when i run it locally on my fedora host, it works | 12:15 |
ianw | same playbook, just via noble host and via local connection | 12:17 |
frickler | hmm, that's the same ansible-core version that I was testing with. just on trixie, I can try to test from a noble vm later today | 12:17 |
ianw | frickler: that would be great to confirm. i can't see any ansible bugs relating to this | 12:18 |
ianw | but also, hitting an api that requires setting force_basic_auth from noble might be a corner case restricted to us :) | 12:18 |
ianw | there's https://github.com/ansible/ansible/commit/11efa5c48bc0175770814af8efc446a90d3293c8 to "reduce complexity" | 12:22 |
ianw | changes that reduce complexity are always a bit suspicious :) | 12:23 |
ianw | i've got to turn in. frickler if you can confirm the ping via noble doesn't obey forcing the auth headers, that would be a great data point | 12:23 |
opendevreview | Dmitriy Rabotyagov proposed opendev/system-config master: Add MariaDB community repository to cache mirrors https://review.opendev.org/c/opendev/system-config/+/931882 | 12:32 |
fungi | ianw: frickler: i'm not sure what node we would even hold for that rtd webhook trigger. isn't that task run on the executor? the job seems to be nodeless. we could try reproducing it by running with the ansible venv from an executor though | 12:47 |
frickler | ok, I can reproduce the failure on noble. going to try bookworm for a comparison, then maybe instead of doing a lot of ansible debugging we could just switch the nodeset for that job | 14:23 |
fungi | frickler: what nodeset? the task runs on the executor doesn't it? | 14:24 |
fungi | though i guess we could give it a nonempty nodeset and set the task to the node | 14:25 |
frickler | ah, good point. didn't help though anyway | 14:30 |
frickler | so on my trixie workstation it is working, both with ansible from a venv and with distro ansible. noble and bookworm seem broken. | 14:31 |
frickler | fungi: the job does have a nodeset associated, which was confusing me at first. not sure if that's oversight or whether it is actually used earlier in the job, just not for the trigger | 14:32 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 14:38 |
fungi | yeah, generally if we're using a secret for a uri module task we just do that from the executor for simplicity | 14:43 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 14:50 |
opendevreview | yatin proposed openstack/project-config master: [neutron-dashboard] drop enforce-scope-old-defaults https://review.opendev.org/c/openstack/project-config/+/934052 | 15:01 |
corvus | https://zuul.opendev.org/t/opendev/build/944e17ac26f841b5a91f07e37ac79de6 is the first successful niz build (and it ran on a niz image too) | 15:39 |
Clark[m] | nice! | 15:40 |
Clark[m] | it looks like my irc bouncer node crashed over the weekend (cloud reports hypervisor had a sad) I think I'm going to take this as an opportunity to do an OS upgrade and so on which means I'll likely be off of IRC proper until that is done | 15:40 |
Clark[m] | I've got the board meeting first then I'll focus on getting that done | 15:40 |
corvus | still missing some significant chunks of functionality, but it means we've got much of the basics in place. | 15:41 |
Clark[m] | in the meantime I can be reached via this matrix account | 15:41 |
slaweq | fungi frickler IIUC the issue with rtd.org is just on the Noble nodes so I can abandon my patches related to this as those won't be needed anymore, right? | 15:44 |
fungi | slaweq: there are no "nodes" involved in the failing task, but yes it seems to be related to something that changed behavior where that task runs on the zuul executors | 15:47 |
fungi | possibly a regression in a specific version of urllib | 15:48 |
Clark[m] | the zuul executors have run under python3.11 for quite some time | 15:52 |
Clark[m] | in a debian bookworm container image (but not using bookworm's python) | 15:53 |
fungi | yeah, i wonder if something has changed with the python3.11-bookworm base image back in september | 15:53 |
fungi | Clark[m]: wait... the ansible run on executors isn't run from a container though is it? looks like we're using /var/lib/zuul/ansible/9 | 16:00 |
corvus | that's in the zuul executor container | 16:01 |
corvus | so in terms of dependencies/libs/etc, it'll be running with whatever bookworm system deps were in the zuul-executor container image build and whatever ansible and python deps were in the ansible venv install created within that container build | 16:02 |
fungi | ah, /var/lib/zuul is mapped from the host into the executor container and then executed from inside the container using the python interpreter in that container? | 16:02 |
corvus | if folks have a local reproducer, it's easy to try with the production container image. 1 sec and i'll make a command | 16:03 |
fungi | (but also under bwrap) | 16:03 |
corvus | fungi: no, /var/lib/zuul is not on the host at all, it's in the container image | 16:03 |
fungi | huh, why do we map that as a volume in the docker-compose file then? | 16:03 |
fungi | maybe i still don't fully understand how containers work | 16:04 |
corvus | yeah why do we, am i wrong? | 16:04 |
fungi | i was mainly going by the ansible process paths showing up in the host ps output | 16:04 |
fungi | i had assumed we ran them from files outside the container so that zuul-manage-ansible could update the venvs since the container image contents themselves would be read-only | 16:05 |
corvus | i agree that the docker-compose file binds that in... though it should still be running with the python binary in the container | 16:06 |
corvus | yes that makes sense, but i wonder what we need to install... | 16:06 |
corvus | (i'm not going to bother creating a reproducer command right now since it looks like the ansible installed on our executors is not what's in the image) | 16:07 |
fungi | corvus: anyway, as for a local reproducer, it relies on a zuul secret. i think frickler and ianw were running with a copy of the necessary credentials | 16:08 |
Clark[m] | var lib zuul has git repo state and that is probably why we bind mount? | 16:08 |
Clark[m] | but maybe we need to be more specific so that the container data also shows up alongside? | 16:09 |
corvus | fungi: ok i misread, let's back up | 16:10 |
corvus | the /var/lib/zuul/ansible directory bind-mounted from the host into the container is not the zuul installation, it's the staging area for zuul's ansible modules. | 16:10 |
slaweq | fungi ok, thx for confirmation | 16:11 |
corvus | so we are running ansible from the container image, and it would be useful for folks with a local reproducer to try with the container image. | 16:11 |
corvus | fungi: Clark as Clark suggests, we're mostly interested in bind-mounting the git repos, and we get that ansible dir along with it. it's fine, we don't care, and the functionality is no different either way. | 16:12 |
corvus | docker run --rm -it quay.io/zuul-ci/zuul-executor:latest /usr/local/lib/zuul/ansible/9/bin/ansible-playbook --version is an example command that will run exactly the ansible (along with debian system) that's on the executors. | 16:14 |
frickler | fyi this is my reproducer, stripped down version of what is in our role. I think ppl like slaweq could also use it with their personal account instead of openstackci to verify https://paste.opendev.org/show/bi9Im6d8mUjuizDaotgp/ | 16:14 |
corvus | (^ that /usr/local/lib/zuul path is the actual ansible installation in the image) | 16:14 |
corvus | yeah, combine a reproducer like that with that docker command i just sent and that should be exactly the conditions in production | 16:15 |
clarkb | Made it back | 17:00 |
clarkb | I appear to have lost scrollback in the process ( can probably manually dig it up if necessary though). If there is anything important I've missed over the weekend please remind me :) | 17:04 |
clarkb | I had to convert apt keys to the new system to make apt happy after upgrading the system | 17:08 |
clarkb | not a big deal but also something they could've automated .... | 17:08 |
fungi | keys for third-party repos? | 17:18 |
clarkb | yes | 17:18 |
clarkb | in this case weechat | 17:18 |
fungi | aha | 17:19 |
fungi | hopefully there was an upgrade note letting you know it would be needed | 17:19 |
clarkb | there wasn't but after reenabling the apt source apt update complained and pointed me at the deprecation notice in the apt-key man page | 17:19 |
clarkb | or maybe it was noted in the docs but the docs were huge so I skimmed | 17:19 |
clarkb | I am tempted to move weechat into a container so that I don't have to do the dance of uninstalling it and reinstalling it every debian upgrade but honestly that's straightforward so not a big deal | 17:22 |
fungi | is the current version packaged in debian still too old? | 17:24 |
clarkb | probably not but I'm happy sticking with upstream to avoid it becoming too old in a year or whatever | 17:25 |
fungi | looks like bookworm still has 3.8 while trixie is up to 4.4.2 now | 17:25 |
clarkb | I should've checked weechat changelogs though it rewrote a ton of my config automatically with a warning before it did so that I can't downgrade now | 17:25 |
clarkb | so even if I wanted to downgrade it would be painful | 17:26 |
fungi | and 4.4.3 was released on wednesday last week | 17:26 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Repair promote-openstack-manuals-developer https://review.opendev.org/c/openstack/project-config/+/934068 | 18:08 |
fungi | frickler: ^ i'm guessing that's what the openstack/api-site promote job needs? | 18:08 |
opendevreview | Clark Boylan proposed opendev/system-config master: Configure native Gerrit log rotation and cleanup https://review.opendev.org/c/opendev/system-config/+/934075 | 18:34 |
clarkb | this is a change that has come out of Gerrit 3.10 upgrade prep. I've actually tracked down a number of TODOs in the etherpad if you want to timeslider and see what I've found on a few topics | 18:35 |
clarkb | I think everything under Areas of Concern has been run down. Next up is the breaking changes list | 18:38 |
clarkb | and now I'm through that list | 19:00 |
opendevreview | Merged openstack/project-config master: Repair promote-openstack-manuals-developer https://review.opendev.org/c/openstack/project-config/+/934068 | 19:12 |
fungi | i'll reenqueue the openstack/api-docs promote jobs to see if that solved the problem | 19:12 |
fungi | no, that change was apparently wrong, https://zuul.opendev.org/t/openstack/build/1a84db5d173c4777b9d730923721b04a is the reenqueued promote and has the same error | 19:24 |
ianw | ok i can reproduce with the container image | 19:41 |
ianw | following playbook in /tmp/test.yaml -> https://paste.opendev.org/show/bUGxGsO4D9OIhO6Qvn5f/ | 19:42 |
fungi | ianw: awesome! i guess the next thing is to identify how ansible's environment there differs | 19:42 |
ianw | podman run --rm -v /tmp/test.yaml:/tmp/test.yaml:Z -it quay.io/zuul-ci/zuul-executor:latest /usr/local/lib/zuul/ansible/9/bin/ansible-playbook -i localhost, /tmp/test.yaml | 19:42 |
ianw | "msg": "Status code was 403 and not [200]: HTTP Error 403: Forbidden" | 19:43 |
fungi | so seems like it is indeed the uri module passing credentials differently in that environment. at least we know we were on the right track | 19:43 |
ianw | sigh, "apt-get update" fails in an interactive podman container with setgid() errors | 19:45 |
ianw | leaving me unable to install an editor | 19:46 |
fungi | that needs user namespaces, right? | 19:46 |
fungi | or, no, never mind. it's probably specific caps you need to allow for setuid/setgid calls | 19:46 |
ianw | i ... don't know ... and not sure i want to go down this rabbit hole :/ | 19:47 |
fungi | ianw: fwiw, i usually just track down the path in the host system to the file i want to edit inside the unpacked container image while it's running | 19:47 |
fungi | and run the editor outside the container | 19:48 |
fungi | basically there's a chroot somewhere while the container is running, though i guess if you're only instantiating the container on-demand and it's cleaned up immediately after your command completes, that doesn't help | 19:49 |
ianw | it works as root but not rootless | 19:49 |
fungi | i'd probably invoke a shell session from the container and leave it open, then edit the file(s) in its chroot from the host context | 19:50 |
* fungi uses blunt instruments like `sudo find /var/lib/docker/ -name somefile` to track it down, not sure what the podman equivalent path might be | 19:56 | |
ianw | you are freaking kidding me | 20:13 |
ianw | so i've installed mitmproxy and screen in the container | 20:13 |
ianw | https_proxy=https://localhost:8080 /usr/local/lib/zuul/ansible/9/bin/ansible-playbook -i localhost, /tmp/test.yaml ... mitmproxy intercepts it, POST looks totally fine with auth header and works | 20:14 |
ianw | drop the https_proxy and it fails!!!! | 20:14 |
fungi | this is the definition of a heisenbug, yes? | 20:15 |
clarkb | weekly reminder to update the meeting agenda or let me know of any content you want added/removed/edited | 20:40 |
clarkb | I did some edits last week and I think it is fairly up to date though | 20:40 |
corvus | maybe we should try compiling it with something other than Borland C++ | 20:43 |
fungi | rewrite it in turbo pascal | 20:44 |
ianw | i can inject a syntaxerror before the urllib_request.urlopen | 20:45 |
ianw | that shows me that there is an auth header set | 20:46 |
ianw | but then it's a black hole :/ | 20:46 |
ianw | it is really really annoying to try and debug ansible modules | 20:46 |
ianw | raise SyntaxError(request.headers) | 20:48 |
ianw | SyntaxError: {'User-agent': 'ansible-httpget', 'Authorization': b' ... | 20:48 |
ianw | as another data point /usr/local/lib/zuul/ansible/8/bin/ansible-playbook -vvv -i localhost, /tmp/test.yaml also fails | 20:50 |
fungi | so ansible 8 and 9? i guess that points to a linked library in common between them | 20:51 |
ianw | ... which is like httplib | 20:51 |
ianw | https://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/urls.py ... it is quite a nightmare | 20:54 |
corvus | ianw: i often write to a file in tmp inside the module | 21:01 |
ianw | ... i give up. that it looks right to mitmproxy, and works going _through_ mitmproxy ... it makes me wonder if it's something cloudflare doesn't like | 21:30 |
ianw | someone else should confirm that i'm not totally on the wrong path that it does indeed fail with the zuul ansible and the playbook above, but you'll need to get the openstackci password to put into it | 21:30 |
clarkb | do we want to put ^ on the meeting agenda? | 23:43 |
clarkb | I'll put it on there since it is interesting | 23:58 |
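
Note on the `force_basic_auth` discussion above: the point of that option is to send the Basic `Authorization` header on the very first request, instead of waiting for the server to answer with a 401 challenge. An API that replies 403 (or anything other than 401) never triggers the challenge path, so credentials are never sent. The following is a minimal sketch of the two behaviours using plain Python `urllib`. It is not Ansible's implementation; the function names, the example URL and the credentials are hypothetical and only serve as illustration.

```python
import base64
import urllib.request


def build_preemptive_basic_auth_request(url, username, password, data=None):
    """Build a Request that carries Basic credentials on the first attempt.

    Hypothetical helper: this mirrors the idea behind forcing basic auth,
    not the code Ansible actually runs.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url, data=data)
    # The header is set up front, so no 401 round trip is needed.
    request.add_header("Authorization", f"Basic {token}")
    return request


def build_challenge_only_opener(url, username, password):
    """Build an opener that only sends credentials after a 401 challenge.

    If the server answers 403 instead of 401, this handler never retries
    with credentials and the caller simply sees the 403.
    """
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_preemptive_basic_auth_request(
        "https://example.invalid/api/", "someuser", "somepassword"
    )
    print(req.get_header("Authorization").split(" ", 1)[0])
```

Nothing in the sketch opens a connection; it only shows where the header comes from in each case. Comparing what actually goes over the wire in the failing environment would still need a proxy or packet capture, as tried above with mitmproxy.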