Monday, 2024-11-04

fungioof, this job doesn't get run very often... https://zuul.opendev.org/t/openstack/builds?job_name=promote-openstack-manuals-developer02:43
fungilooks like it has bitrotted from some change we made to opendev/base-jobs/playbooks/docs/promote.yaml sometime in the past several years02:44
fungi'ansible.utils.unsafe_proxy.AnsibleUnsafeText object' has no attribute 'path'02:45
fungianybody remember what we changed?02:45
ianwfungi: good luck02:53
ianwfungi / frickler: re working in curl but not ansible, i have serious deja vu about this and reading far too much about 401 auth calls or something02:54
fungiianw: there's an inline comment from you in the playbook talking about forcing basic auth due to the api not doing a 40102:56
ianwi feel like i even filed an ansible bug .. it was something about urllib ... i'm trying to find it :)02:57
ianwhere's one issue about a cross-site issue -> https://github.com/readthedocs/readthedocs.org/issues/498602:59
ianwi think maybe i'm thinking of -> https://github.com/go-gitea/gitea/issues/2416003:07
ianwbut ... using that python there, that uses urllib03:12
ianwi get back a 40303:13
ianw... maybe cause it's not a POST ... I dunno.  but in short urllib v curl has definitely been an issue 03:14
fricklerfungi: this seems related to https://review.opendev.org/c/opendev/base-jobs/+/798945 , afaict that change was never running with a real job, so it might be just wrong or incomplete at least?06:01
frickleror to be more precise, the promote-openstack-manuals-developer job never ran on the api-site repo after that change. the page displaying all occurrences of that without the project filter just times out for me06:06
fricklerianw: the current status seems to be more like: working in curl + ansible, but not in zuul06:07
*** elodilles_pto is now known as elodilles07:33
ianwfrickler: do you think you'd have time to work with slaweq on holding a node?  11:12
fricklerianw: to do what? giving access to a held node that includes the opendevci rtd secret sounds questionable11:17
slaweqfrickler maybe you could then go to that held node and try to trigger the same playbook manually without `no_log: true` to check what the error there really is11:20
slaweqI don't need to go there by myself but without knowing what the error is it's hard to fix it :)11:21
fricklerslaweq: I don't know how to trigger what zuul does manually. I did run the playbook locally on Friday and it worked without an issue11:22
slaweqfrickler maybe it is just some infra related issue and it works fine from other machines, but will not work from the ci node?11:23
slaweqdo you think it's worth trying e.g. curl from the held node?11:23
ianwyeah i'm trying to think how to trigger it.  does it run executor-only anyway11:24
ianw... https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_553/f33c1d8defdb24fc56d4cb0449cccb8df6c17c6c/post/trigger-readthedocs-webhook/553fbd4/job-output.txt11:24
frickleryes, I was just checking the latter. no node to hold11:24
ianwit doesn't11:24
ianwi think that the uri: module does not log the password anyway.  but perhaps we could turn off "no_log" and just watch it closely, and if it does, just change the password11:26
ianwit might be the easiest way to see what is coming back from ansible11:26
slaweqianw if that would work, that could be maybe the easiest way to go with it11:32
slaweqbut can we just test with "depends-on" or do we actually need to merge it?11:32
ianwthis will need to merge because it's in a post pipeline11:41
ianwit would be worth logging the rtd_project_name too12:00
ianwok, i think it's ... something to do with noble12:13
ianwusing "ansible-playbook [core 2.17.5]"12:13
ianwwhen running the uri: ping via a noble host, i get "Status code was 403 and not [200]: HTTP Error 403: Forbidden"12:14
ianwthis means "force_basic_auth" isn't working12:14
ianwwhen i run it locally on my fedora host, it works12:15
ianwsame playbook, just via a noble host and via local connection12:17
fricklerhmm, that's the same ansible-core version that I was testing with. just on trixie, I can try to test from a noble vm later today12:17
ianwfrickler: that would be great to confirm.  i can't see any ansible bugs relating to this12:18
ianwbut also, hitting an api that requires setting force_basic_auth from noble might be a corner case restricted to us :)12:18
ianwthere's https://github.com/ansible/ansible/commit/11efa5c48bc0175770814af8efc446a90d3293c8 to "reduce complexity"12:22
ianwchanges that reduce complexity are always a bit suspicious :)12:23
ianwi've got to turn in.  frickler if you can confirm the ping via noble doesn't obey forcing the auth headers, that would be a great data point12:23
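[editor's note] For readers following the force_basic_auth thread above: as fungi's reference to the inline playbook comment suggests, the API does not answer with a 401 challenge, so the client has to send credentials on the first request. A minimal Python sketch of what sending the header preemptively looks like with plain urllib follows. The helper names are hypothetical, the sample credentials are the well-known RFC 7617 example, and this is not Ansible's actual implementation, only an illustration of the idea.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of a Basic Authorization header (hypothetical helper)."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_preemptive_request(url: str, user: str, password: str, data: bytes = None):
    """Build a request that carries credentials up front.

    urllib's HTTPBasicAuthHandler normally waits for a 401 challenge before
    sending credentials. Setting the header directly avoids depending on that
    challenge, which is the situation described in the log.
    """
    req = urllib.request.Request(url, data=data)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


if __name__ == "__main__":
    # Placeholder URL and credentials; nothing is sent over the network here.
    req = build_preemptive_request("https://example.invalid/api", "Aladdin", "open sesame", data=b"")
    print(req.get_method(), req.get_header("Authorization"))
```

Printing the header of a locally built request like this is one way to compare what a working host and a failing host would put on the wire, without exposing the real secret.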
opendevreviewDmitriy Rabotyagov proposed opendev/system-config master: Add MariaDB community repository to cache mirrors  https://review.opendev.org/c/opendev/system-config/+/93188212:32
fungiianw: frickler: i'm not sure what node we would even hold for that rtd webhook trigger. isn't that task run on the executor? the job seems to be nodeless. we could try reproducing it by running with the ansible venv from an executor though12:47
fricklerok, I can reproduce the failure on noble. going to try bookworm for a comparison, then maybe instead of doing a lot of ansible debugging we could just switch the nodeset for that job14:23
fungifrickler: what nodeset? the task runs on the executor doesn't it?14:24
fungithough i guess we could give it a nonempty nodeset and set the task to the node14:25
fricklerah, good point. didn't help though anyway14:30
fricklerso on my trixie workstation it is working, both with ansible from a venv and with distro ansible. noble and bookworm seem broken.14:31
fricklerfungi: the job does have a nodeset associated, which was confusing me at first. not sure if that's oversight or whether it is actually used earlier in the job, just not for the trigger14:32
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404514:38
fungiyeah, generally if we're using a secret for a uri module task we just do that from the executor for simplicity14:43
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404514:50
opendevreviewyatin proposed openstack/project-config master: [neutron-dashboard] drop enforce-scope-old-defaults  https://review.opendev.org/c/openstack/project-config/+/93405215:01
corvushttps://zuul.opendev.org/t/opendev/build/944e17ac26f841b5a91f07e37ac79de6 is the first successful niz build (and it ran on a niz image too)15:39
Clark[m]nice!15:40
Clark[m]it looks like my irc bouncer node crashed over the weekend (cloud reports hypervisor had a sad) I think I'm going to take this as an opportunity to do an OS upgrade and so on which means I'll likely be off of IRC proper until that is done15:40
Clark[m]I've got the board meeting first then I'll focus on getting that done15:40
corvusstill missing some significant chunks of functionality, but it means we've got much of the basics in place.15:41
Clark[m]in the meantime I can be reached via this matrix account15:41
slaweqfungi frickler IIUC the issue with rtd.org is just on the Noble nodes so I can abandon my patches related to this as those won't be needed anymore, right?15:44
fungislaweq: there are no "nodes" involved in the failing task, but yes it seems to be related to something that changed behavior where that task runs on the zuul executors15:47
fungipossibly a regression in a specific version of urllib15:48
Clark[m]the zuul executors have run under python3.11 for quite some time15:52
Clark[m]in a debian bookworm container image (but not using bookworm's python)15:53
fungiyeah, i wonder if something has changed with the python3.11-bookworm base image back in september15:53
fungiClark[m]: wait... the ansible run on executors isn't run from a container though is it? looks like we're using /var/lib/zuul/ansible/916:00
corvusthat's in the zuul executor container16:01
corvusso in terms of dependencies/libs/etc, it'll be running with whatever bookworm system deps were in the zuul-executor container image build and whatever ansible and python deps were in the ansible venv install created within that container build16:02
fungiah, /var/lib/zuul is mapped from the host into the executor container and then executed from inside the container using the python interpreter in that container?16:02
corvusif folks have a local reproducer, it's easy to try with the production container image.  1 sec and i'll make a command16:03
fungi(but also under bwrap)16:03
corvusfungi: no, /var/lib/zuul is not on the host at all, it's in the container image16:03
fungihuh, why do we map that as a volume in the docker-compose file then?16:03
fungimaybe i still don't fully understand how containers work16:04
corvusyeah why do we, am i wrong?16:04
fungii was mainly going by the ansible process paths showing up in the host ps output16:04
fungii had assumed we ran them from files outside the container so that zuul-manage-ansible could update the venvs since the container image contents themselves would be read-only16:05
corvusi agree that the docker-compose file binds that in... though it should still be running with the python binary in the container16:06
corvusyes that makes sense, but i wonder what we need to install...16:06
corvus(i'm not going to bother creating a reproducer command right now since it looks like the ansible installed on our executors is not what's in the image)16:07
fungicorvus: anyway, as for a local reproducer, it relies on a zuul secret. i think frickler and ianw were running with a copy of the necessary credentials16:08
Clark[m]var lib zuul has git repo state and that is probably why we bind mount?16:08
Clark[m]but maybe we need to be more specific so that the container data also shows up alongside?16:09
corvusfungi: ok i misread, let's back up16:10
corvusthe /var/lib/zuul/ansible directory bind-mounted from the host into the container is not the zuul installation, it's the staging area for zuul's ansible modules.16:10
slaweqfungi ok, thx for confirmation16:11
corvusso we are running ansible from the container image, and it would be useful for folks with a local reproducer to try with the container image.16:11
corvusfungi: Clark as Clark suggests, we're mostly interested in bind-mounting the git repos, and we get that ansible dir along with it.  it's fine, we don't care, and the functionality is no different either way.16:12
corvusdocker run --rm -it quay.io/zuul-ci/zuul-executor:latest /usr/local/lib/zuul/ansible/9/bin/ansible-playbook --version is an example command that will run exactly the ansible (along with debian system) that's on the executors.16:14
fricklerfyi this is my reproducer, stripped down version of what is in our role. I think ppl like slaweq could also use it with their personal account instead of openstackci to verify https://paste.opendev.org/show/bi9Im6d8mUjuizDaotgp/16:14
corvus(^ that /usr/local/lib/zuul path is the actual ansible installation in the image)16:14
corvusyeah, combine a reproducer like that with that docker command i just sent and that should be exactly the conditions in production16:15
clarkbMade it back17:00
clarkbI appear to have lost scrollback in the process ( can probably manually dig it up if necessary though). If there is anything important I've missed over the weekend please remind me :)17:04
clarkbI had to convert apt keys to the new system to make apt happy after upgrading the system17:08
clarkbnot a big deal but also something they could've automated ....17:08
fungikeys for third-party repos?17:18
clarkbyes17:18
clarkbin this case weechat17:18
fungiaha17:19
fungihopefully there was an upgrade note letting you know it would be needed17:19
clarkbthere wasn't but after reenabling the apt source apt update complained and pointed me at the deprecation notice in the apt-key man page17:19
clarkbor maybe it was noted in the docs but the docs were huge so I skimmed17:19
clarkbI am tempted to move weechat into a container so that I don't have to do the dance of uninstalling it and reinstalling it every debian upgrade but honestly that's straightforward so not a big deal17:22
fungiis the current version packaged in debian still too old?17:24
clarkbprobably not but I'm happy sticking with upstream to avoid it becoming too old in a year or whatever17:25
fungilooks like bookworm still has 3.8 while trixie is up to 4.4.2 now17:25
clarkbI should've checked weechat changelogs though. it rewrote a ton of my config automatically, with a warning before it did so that I can't downgrade now17:25
clarkbso even if I wanted to downgrade it would be painful17:26
fungiand 4.4.3 was released on wednesday last week17:26
opendevreviewJeremy Stanley proposed openstack/project-config master: Repair promote-openstack-manuals-developer  https://review.opendev.org/c/openstack/project-config/+/93406818:08
fungifrickler: ^ i'm guessing that's what the openstack/api-site promote job needs?18:08
opendevreviewClark Boylan proposed opendev/system-config master: Configure native Gerrit log rotation and cleanup  https://review.opendev.org/c/opendev/system-config/+/93407518:34
clarkbthis is a change that has come out of Gerrit 3.10 upgrade prep. I've actually tracked down a number of TODOs in the etherpad if you want to timeslider and see what I've found on a few topics18:35
clarkbI think everything under Areas of Concern has been run down. Next up is the breaking changes list18:38
clarkband now I'm through that list19:00
opendevreviewMerged openstack/project-config master: Repair promote-openstack-manuals-developer  https://review.opendev.org/c/openstack/project-config/+/93406819:12
fungii'll reenqueue the openstack/api-docs promote jobs to see if that solved the problem19:12
fungino, that change was apparently wrong, https://zuul.opendev.org/t/openstack/build/1a84db5d173c4777b9d730923721b04a is the reenqueued promote and has the same error19:24
ianwok i can reproduce with the container image19:41
ianwfollowing playbook in /tmp/test.yaml -> https://paste.opendev.org/show/bUGxGsO4D9OIhO6Qvn5f/19:42
fungiianw: awesome! i guess the next thing is to identify how ansible's environment there differs19:42
ianwpodman run --rm -v /tmp/test.yaml:/tmp/test.yaml:Z -it quay.io/zuul-ci/zuul-executor:latest /usr/local/lib/zuul/ansible/9/bin/ansible-playbook  -i localhost, /tmp/test.yaml19:42
ianw"msg": "Status code was 403 and not [200]: HTTP Error 403: Forbidden"19:43
fungiso seems like it is indeed the uri module passing credentials differently in that environment. at least we know we were on the right track19:43
ianwsigh, "apt-get update" fails in an interactive podman container with setgid() errors19:45
ianwleaving me unable to install an editor19:46
fungithat needs user namespaces, right?19:46
fungior, no, never mind. it's probably specific caps you need to allow for setuid/setgid calls19:46
ianwi ... don't know ... and not sure i want to go down this rabbit hole :/19:47
fungiianw: fwiw, i usually just track down the path in the host system to the file i want to edit inside the unpacked container image while it's running19:47
fungiand run the editor outside the container19:48
fungibasically there's a chroot somewhere while the container is running, though i guess if you're only instantiating the container on-demand and it's cleaned up immediately after your command completes, that doesn't help19:49
ianwit works as root but not rootless19:49
fungii'd probably invoke a shell session from the container and leave it open, then edit the file(s) in its chroot from the host context19:50
* fungi uses blunt instruments like `sudo find /var/lib/docker/ -name somefile` to track it down, not sure what the podman equivalent path might be19:56
ianwyou are freaking kidding me20:13
ianwso i've installed mitmproxy and screen in the container20:13
ianwhttps_proxy=https://localhost:8080  /usr/local/lib/zuul/ansible/9/bin/ansible-playbook  -i localhost, /tmp/test.yaml ... mitmproxy intercepts it, POST looks totally fine with auth header and works20:14
ianwdrop the https_proxy and it fails!!!!20:14
fungithis is the definition of a heisenbug, yes?20:15
clarkbweekly reminder to update the meeting agenda or let me know of any content you want added/removed/edited20:40
clarkbI did some edits last week and I think it is fairly up to date though20:40
corvusmaybe we should try compiling it with something other than Borland C++20:43
fungirewrite it in turbo pascal20:44
ianwi can inject a syntaxerror before the urllib_request.urlopen20:45
ianwthat shows me that there is an auth header set20:46
ianwbut then it's a black hole :/20:46
ianwit is really really annoying to try and debug ansible modules20:46
ianw    raise SyntaxError(request.headers)20:48
ianwSyntaxError: {'User-agent': 'ansible-httpget', 'Authorization': b' ...20:48
ianwas another data point /usr/local/lib/zuul/ansible/8/bin/ansible-playbook -vvv  -i localhost, /tmp/test.yaml also fails20:50
fungiso ansible 8 and 9? i guess that points to a linked library in common between them20:51
ianw... which is like httplib20:51
ianwhttps://github.com/ansible/ansible/blob/devel/lib/ansible/module_utils/urls.py ... it is quite a nightmare20:54
corvusianw: i often write to a file in tmp inside the module21:01
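[editor's note] corvus's tip above, writing to a file in tmp from inside the module, can be sketched roughly as follows. The helper name `debug_dump`, the log filename, and the redaction rule are all hypothetical choices for illustration; the only idea taken from the log is appending debug records to a temporary file because module output is otherwise hard to see.

```python
import json
import os
import tempfile
import time


def debug_dump(label, value, path=None):
    """Append one JSON line describing `value` to a debug file (hypothetical helper).

    Dict entries whose key looks like an auth header are masked so that a
    secret such as a Basic auth token does not end up on disk.
    """
    if path is None:
        path = os.path.join(tempfile.gettempdir(), "module-debug.log")
    if isinstance(value, dict):
        value = {
            k: ("<redacted>" if str(k).lower() == "authorization" else v)
            for k, v in value.items()
        }
    record = {"ts": time.time(), "label": label, "value": repr(value)}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


if __name__ == "__main__":
    # Example: record request headers just before the HTTP call would be made.
    print(debug_dump("headers", {"User-agent": "ansible-httpget", "Authorization": "Basic abc"}))
```

Compared with raising an exception to surface state, appending to a file lets the request continue, so the response can be inspected in the same run.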
ianw... i give up.  that it looks right to mitmproxy, and works going _through_ mitmproxy ... it makes me wonder if it's something cloudflare doesn't like21:30
ianwsomeone else should confirm that i'm not totally on the wrong path that it does indeed fail with the zuul ansible and the playbook above, but you'll need to get the openstackci password to put into it21:30
clarkbdo we want to put ^ on the meeting agenda?23:43
clarkbI'll put it on there since it is interesting23:58
