@joao15130:matrix.org | Hello all. | 13:41 |
---|---|---|
I'm trying to run a job which was fine before, but after a few changes, it doesn't work anymore and no logs are captured by Zuul. | ||
The only thing I see is | ||
--- END OF STREAM --- | ||
displaying continuously without any errors. | ||
Didn't find anything relevant in the executor. I tried to enable the verbose mode in the container with no effect. | ||
Do you have any idea? | ||
@joao15130:matrix.org | It's like the job is running but no streaming happens. | 13:46 |
@dfajfer:fsfe.org | honestly, there could be a few things but I'm having deja vu about you writing this one | 13:47 |
@joao15130:matrix.org | Yes but it was for something else | 13:47 |
@joao15130:matrix.org | this time, nothing appears on the console | 13:47 |
@joao15130:matrix.org | I remember seeing debug output in the zuul_executor container, how can I enable it? I tried running `zuul-executor verbose` in the container, but it doesn't change anything in the logs | 13:49 |
@joao15130:matrix.org | I'm wondering... I've changed the file structure of my repository where the whole config resides | 13:56 |
@joao15130:matrix.org | In a job where we have this definition | 13:57 |
`pre-run: playbooks/base/pre.yaml` | ||
@joao15130:matrix.org | does the playbook directory need to reside in `zuul-config` or `zuul-config/zuul.d` ? | 13:58 |
@fungicide:matrix.org | joao15130: in opendev we call the start-zuul-console role from a pre-run playbook in the base job that all other jobs inherit from: https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/pre.yaml#L21 | 14:13 |
@fungicide:matrix.org | https://zuul-ci.org/docs/zuul-jobs/latest/general-roles.html#role-start-zuul-console | 14:14 |
@joao15130:matrix.org | ok so in that case the playbooks reside in the upper level | 14:14 |
@joao15130:matrix.org | ``` | 14:14 |
- hosts: localhost | ||
  roles: | ||
    - role: emit-job-header | ||
      zuul_log_path_shard_build: true | ||
    - ensure-output-dirs | ||
- hosts: all | ||
  pre_tasks: | ||
    - name: Start zuul console daemon | ||
      zuul_console: | ||
  roles: | ||
    - add-build-sshkey | ||
    - prepare-workspace-git | ||
    - validate-host | ||
    - log-inventory | ||
``` | ||
@joao15130:matrix.org | that's the playbook the job calls | 14:15 |
@fungicide:matrix.org | that looks similar to https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/start-zuul-console/tasks/main.yaml so in theory you're embedding the same ansible module invocation i guess | 14:17 |
-@gerrit:opendev.org- Simon Westphahl proposed: | 14:17 | |
- [zuul/zuul] 941434: Refactor to a common tpe event connector https://review.opendev.org/c/zuul/zuul/+/941434 | ||
- [zuul/zuul] 941435: Make Gerrit event pre-processor multi-threaded https://review.opendev.org/c/zuul/zuul/+/941435 | ||
@joao15130:matrix.org | and it worked fine until last week, I did a few changes but something seems to be broken now | 14:17 |
@joao15130:matrix.org | cause I see in the target VM that the job is running in the background | 14:17 |
@joao15130:matrix.org | and I see the build-sshkey added so it confirms that the playbook is being processed | 14:20 |
@joao15130:matrix.org | I remember seeing debug output in the zuul_executor container, how can I enable it? I tried running `zuul-executor verbose` in the container, but it doesn't change anything in the logs | 14:23 |
-@gerrit:opendev.org- Simon Westphahl proposed: | 14:24 | |
- [zuul/zuul] 941434: Refactor to a common tpe event connector https://review.opendev.org/c/zuul/zuul/+/941434 | ||
- [zuul/zuul] 941435: Make Gerrit event pre-processor multi-threaded https://review.opendev.org/c/zuul/zuul/+/941435 | ||
@joao15130:matrix.org | which comes from the doc: | 14:26 |
*To enable or disable running Ansible in verbose mode (with the -vvv argument to ansible-playbook) run zuul-executor verbose and zuul-executor unverbose.* | ||
@fungicide:matrix.org | joao15130: is the finger gateway service running, and is it able to reach 7900/tcp on your executors? | 14:27 |
@joao15130:matrix.org | what is the finger gateway? My setup is based on the quick-start tutorial and I don't remember seeing this type of container | 14:28 |
@fungicide:matrix.org | https://zuul-ci.org/docs/zuul/latest/configuration.html#finger-gateway but maybe you don't need one if you don't have separate executors | 14:30 |
@joao15130:matrix.org | I just have one executor | 14:32 |
@fungicide:matrix.org | the zuul_console streams ansible output which is read back by the executor and then served up on a socket which the finger gateway connects to, and the zuul web interface connects to the finger gateway and requests the stream corresponding to the specific build you're trying to view | 14:32 |
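The last hop of that chain can be exercised by hand, bypassing zuul-web. The following is only a minimal sketch: it assumes the executor's log streamer on `finger_port` (7900 by default) accepts a finger-style request, namely the build UUID followed by a newline, and then writes the console log until it closes the connection. The function name `fetch_build_stream` is hypothetical, not part of Zuul.

```python
import socket


def fetch_build_stream(host: str, port: int, build_uuid: str,
                       timeout: float = 5.0) -> bytes:
    """Send a finger-style query and return everything the server streams back.

    Assumption: the server expects the build UUID terminated by a newline
    and closes the connection once the log is complete.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_uuid.encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Something like `fetch_build_stream("localhost", 7900, "<build uuid>")`, run from inside the executor container, would then show whether the streamer answers at all. A refused connection points at the executor side rather than at zuul-web.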
@joao15130:matrix.org | never deployed finger gateway and it was working before. | 14:34 |
@joao15130:matrix.org | That's weird | 14:34 |
@fungicide:matrix.org | well, if you only have one executor then maybe zuul-web knows to connect directly to it rather than needing a multiplexer | 14:36 |
@joao15130:matrix.org | yeah that's possible | 14:36 |
@joao15130:matrix.org | The job has finished and I'm able to see all the logs just like the job was executing normally | 14:37 |
@joao15130:matrix.org | but no streaming happened | 14:37 |
@fungicide:matrix.org | is 7900/tcp listening on your executor? | 14:38 |
@joao15130:matrix.org | give me 2 minutes, it's a container and no ss or netstat tool is available | 14:39 |
@joao15130:matrix.org | this is what I have on the container: | 14:42 |
``` | ||
root@8b9c297c23f6:/# netstat -laputen | ||
Active Internet connections (servers and established) | ||
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name | ||
tcp 0 0 10.89.0.14:40426 151.101.54.132:80 TIME_WAIT 0 0 - | ||
tcp 0 0 10.89.0.14:60140 10.89.0.5:2281 ESTABLISHED 0 204400 2/python | ||
tcp 0 0 10.89.0.14:35394 151.101.54.132:80 TIME_WAIT 0 0 - | ||
``` | ||
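When neither `ss` nor `netstat` is installed in a container, the same question (is anything accepting connections on 7900/tcp?) can be answered with the Python interpreter that the Zuul image already ships. This is a minimal sketch; the helper name `port_is_open` is made up for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `port_is_open("127.0.0.1", 7900)` inside the executor container returning `False` would be consistent with the listing above, where no socket is in the LISTEN state.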
@joao15130:matrix.org | and what's defined for this container at start-up: | 14:43 |
``` | ||
executor: | ||
  privileged: true | ||
  environment: | ||
    - http_proxy | ||
    - https_proxy | ||
    - no_proxy=${no_proxy},gerrit,scheduler | ||
    - ZUUL_MYSQL_PASSWORD=secret | ||
  image: quay.io/zuul-ci/zuul-executor | ||
  volumes: | ||
    - "./etc_zuul/:/etc/zuul/:z" | ||
    - "./playbooks/:/var/playbooks/:z" | ||
    - "sshkey:/var/ssh:z" | ||
    - "logs:/srv/static/logs:z" | ||
    - "certs:/var/certs:z" | ||
    - "lib-zuul-executor:/var/lib/zuul:z" | ||
  command: "sh -c '/var/playbooks/wait-to-start-certs.sh && exec zuul-executor -f'" | ||
  networks: | ||
    - zuul | ||
``` | ||
@fungicide:matrix.org | i guess you're not altering the finger_port in configuration? https://zuul-ci.org/docs/zuul/latest/configuration.html#attr-executor.finger_port | 14:51 |
@fungicide:matrix.org | the base quickstart example doesn't seem to alter it | 14:51 |
@joao15130:matrix.org | indeed, I don't alter it | 14:53 |
@fungicide:matrix.org | when the executor service starts, you should see it log "Starting log streamer" like at https://zuul.opendev.org/t/zuul/build/85d75564949e401182070afa8ad1c1d5/log/container_logs/executor.log#2 | 14:53 |
@joao15130:matrix.org | good catch. | 14:57 |
``` | ||
2025-02-12 14:56:46,695 INFO zuul.Executor: Starting log streamer | ||
Traceback (most recent call last): | ||
File "/usr/local/bin/zuul-executor", line 8, in <module> | ||
sys.exit(main()) | ||
^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/zuul/cmd/executor.py", line 133, in main | ||
Executor().main() | ||
File "/usr/local/lib/python3.11/site-packages/zuul/cmd/__init__.py", line 267, in main | ||
self.run() | ||
File "/usr/local/lib/python3.11/site-packages/zuul/cmd/executor.py", line 107, in run | ||
self.start_log_streamer() | ||
File "/usr/local/lib/python3.11/site-packages/zuul/cmd/executor.py", line 65, in start_log_streamer | ||
streamer = zuul.lib.log_streamer.LogStreamer( | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/zuul/lib/log_streamer.py", line 173, in __init__ | ||
self.server = LogStreamerServer((host, port), | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/zuul/lib/log_streamer.py", line 162, in __init__ | ||
super(LogStreamerServer, self).__init__(*args, **kwargs) | ||
File "/usr/local/lib/python3.11/site-packages/zuul/lib/streamer_utils.py", line 112, in __init__ | ||
socketserver.ThreadingTCPServer.__init__(self, *args, **kwargs) | ||
File "/usr/local/lib/python3.11/socketserver.py", line 452, in __init__ | ||
self.socket = socket.socket(self.address_family, | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/socket.py", line 232, in __init__ | ||
_socket.socket.__init__(self, family, type, proto, fileno) | ||
OSError: [Errno 97] Address family not supported by protocol | ||
``` | ||
@joao15130:matrix.org | what can be the reason? | 14:57 |
@joao15130:matrix.org | IPV6 is enabled on the host | 14:57 |
@joao15130:matrix.org | In fact no, ipv6 is disabled | 14:58 |
@clarkb:matrix.org | corvus: https://review.opendev.org/c/zuul/zuul-jobs/+/939823 failed to merge after more rechecks yesterday evening. Do you want to do the honors or should I? I'm happy to do it but will be another hour or so (which is probably fine) | 14:59 |
@fungicide:matrix.org | joao15130: likely the fork is trying to bind to 7900/tcp on both 0.0.0.0 and :: but i'll need to take a closer look, there's a chance that's configurable | 15:00 |
@fungicide:matrix.org | joao15130: aha, no, looks like it assumes binding to :: will work in all cases, and that the kernel is transparently dual-stack | 15:01 |
@fungicide:matrix.org | so yes, i think you'll need ipv6 enabled in sysconfig, even if you don't have any global v6 networking established | 15:03 |
@fungicide:matrix.org | i think this is default for most kernels these days, but more details about your platform might help us figure out if the code needs to change | 15:03 |
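The failure mode described here can be reproduced outside Zuul. The sketch below assumes, as the traceback suggests, that the streamer opens an `AF_INET6` socket and binds it to `::`. It tries the same thing on an ephemeral port (port 0, so it cannot collide with a running executor) and reports which step fails. The name `try_dual_stack_bind` is hypothetical.

```python
import socket


def try_dual_stack_bind(port: int = 0):
    """Try to create an IPv6 socket and bind it to [::] in dual-stack mode.

    Returns (ok, detail). Creating the socket is the step that raised
    "Address family not supported by protocol" in the traceback above.
    """
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError as exc:
        return False, f"socket creation failed: {exc}"
    try:
        # IPV6_V6ONLY=0 asks the kernel to accept IPv4 clients on this socket too.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        sock.bind(("::", port))
        return True, "bound to [::]"
    except OSError as exc:
        return False, f"bind failed: {exc}"
    finally:
        sock.close()
```

On a host where the kernel has no IPv6 support available, the first branch is the one expected to trigger, matching the `OSError: [Errno 97]` seen in the executor log.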
@joao15130:matrix.org | my system is Ubuntu 24.04 | 15:04 |
@joao15130:matrix.org | no specific network change | 15:04 |
@joao15130:matrix.org | let me try to enable ipV6 | 15:04 |
@fungicide:matrix.org | on a default ubuntu 24.04 install you should find that sysctl -a|grep disable_ipv6 is 0 for everything | 15:09 |
@fungicide:matrix.org | same for sysctl net.ipv6.bindv6only which is what allows binding to :: to also work for v4 addresses | 15:10 |
@fungicide:matrix.org | if disable_ipv6 is set for any interfaces, then there's probably some custom configuration toggling that at boot | 15:11 |
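Where the `sysctl` binary is not available (for example inside a minimal container), the same values can be read straight from `/proc/sys`. This is a rough sketch rather than a replacement for `sysctl -a`; the function name `read_ipv6_sysctls` and its `proc_sys` parameter are illustrative assumptions, the latter existing mainly to make the function testable.

```python
from pathlib import Path


def read_ipv6_sysctls(proc_sys: str = "/proc/sys") -> dict:
    """Collect net.ipv6.bindv6only and every per-interface disable_ipv6 flag.

    Returns an empty dict when the net/ipv6 directory is missing, which may
    be the case when the kernel has no IPv6 support loaded at all.
    """
    base = Path(proc_sys) / "net" / "ipv6"
    values = {}
    if not base.is_dir():
        return values
    bindv6only = base / "bindv6only"
    if bindv6only.is_file():
        values["net.ipv6.bindv6only"] = int(bindv6only.read_text().strip())
    # One disable_ipv6 file per interface, plus the "all" and "default" entries.
    for flag in sorted(base.glob("conf/*/disable_ipv6")):
        iface = flag.parent.name
        values[f"net.ipv6.conf.{iface}.disable_ipv6"] = int(flag.read_text().strip())
    return values
```

Any `disable_ipv6` entry with the value 1, or an empty result, would be worth comparing against a stock install.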
@joao15130:matrix.org | I've just enabled IPv6 | 15:12 |
@joao15130:matrix.org | and I got: | 15:12 |
``` | ||
2025-02-12T15:12:13+00:00 Wait for certs to be present | ||
2025-02-12 15:12:14,034 INFO zuul.Executor: Starting log streamer | ||
``` | ||
@joao15130:matrix.org | with no errors | 15:12 |
@joao15130:matrix.org | let me trigger a new job | 15:13 |
@joao15130:matrix.org | and it's working fine | 15:13 |
@joao15130:matrix.org | I understand what happened now. We've been told to use a preshipped image from our IT dept | 15:14 |
@joao15130:matrix.org | this image has IPV6 disabled by default | 15:14 |
@joao15130:matrix.org | thank you fungi ! | 15:14 |
@fungicide:matrix.org | my pleasure, glad we got it sorted out | 15:20 |
@fungicide:matrix.org | separately, that any it department in this day and age is so frightened of ipv6 that they break modern software by disabling it is pretty absurd, but unfortunately not surprising | 15:21 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 941299: Fix busy loop in zuul console https://review.opendev.org/c/zuul/zuul/+/941299 | 16:07 | |
@clarkb:matrix.org | corvus: I'm finding myself distracted by haproxy. I'm happy for you to merge the zuul-jobs buildx fix otherwise I'll get to it when not distracted | 16:27 |
@jim:acmegating.com | Clark: ack, we'll coordinate here :) | 16:34 |
@jim:acmegating.com | Clark: merging now | 16:58 |
-@gerrit:opendev.org- corvus.admin merged on behalf of Yaguang Tang: [zuul/zuul-jobs] 939823: Install ca-certificates in the buildx image https://review.opendev.org/c/zuul/zuul-jobs/+/939823 | 17:01 | |
@clarkb:matrix.org | thanks | 17:20 |
@clarkb:matrix.org | corvus: that does seem to have made https://review.opendev.org/c/zuul/nodepool/+/941294 happier, but it failed in a single unittest. I'll recheck as it looks like maybe zookeeper got angry and that was fallout | 18:29 |
@jim:acmegating.com | not unlikely | 18:58 |
@mnaser:matrix.org | Is there a nice talk that someone has given about doing CD with Zuul? I know OpenDev is doing a lot of it, but mostly like.. do you use the project key as a deploy key... do you add_host for a bastion host and run ansible in there, using semaphores, deploying post-merge, etc etc | 19:16 |
@mnaser:matrix.org | we're slowly moving things out of github actions and trying to figure out how to best make this work | 19:16 |
@fungicide:matrix.org | in opendev we basically run "nested" ansible through a bastion server | 19:19 |
@mnaser:matrix.org | ah ok got it, so vm spins up, add_host and then run ansible from there | 19:19 |
@mnaser:matrix.org | thats fair enough, helps decouple from the zuul ansible, but also i was wondering if the deploys happened in gate or a post-submit pipeline | 19:20 |
@fungicide:matrix.org | we have a "deploy" pipeline | 19:20 |
@fungicide:matrix.org | which is triggered post-merge | 19:21 |
@fungicide:matrix.org | we have nearly identical "run" and "deploy" jobs, the "run" jobs have ephemeral nodesets with one of them being a throwaway stand-in for the bastion and those are used in pre-merge (check, gate) pipelines | 19:23 |
@fungicide:matrix.org | that allows us to reuse the same playbooks to test and deploy to production, substituting test-only vars for the normal secrets and private inventory host/group vars that store our production creds | 19:24 |
@mnaser:matrix.org | i see, for us the run might be a little bit more difficult, since we're using it to deploy entire clouds.. its kinda hard to simulate that :( | 19:24 |
@mnaser:matrix.org | but i guess i can see us running linters and other things, since its probably hard for us to build and test a whole cloud (well, we try, but in some envs it would depend on external ceph clusters or physical storage arrays) | 19:25 |
@fungicide:matrix.org | well, our jobs are service-specific, so only deploy or test the subset of our services that are relevant to the files being changed at that point in time | 19:26 |
@fungicide:matrix.org | we don't test or deploy everything all the time | 19:26 |
@mnaser:matrix.org | Oh interesting, I think that might be difficult for us since there are a lot of pieces working together, such as roles that depend on other roles, so the file filters might not work as nicely (or we might miss things) | 19:27 |
@mnaser:matrix.org | and I guess in testing.. well if a nova change happens then we need glance and keystone and all the whole other crew to validate | 19:27 |
@clarkb:matrix.org | unless you try to update how we collect logs and install docker/podman then everything wants to run | 19:27 |
@fungicide:matrix.org | hah, true that | 19:28 |
@fungicide:matrix.org | hello dockerhub | 19:28 |
@mnaser:matrix.org | lol! yeah, I feel you on that :( and somehow with all the IPs you have access to, it still hits limits | 19:30 |
@clarkb:matrix.org | mnaser: because ipv6 | 19:32 |
@clarkb:matrix.org | they treat an entire /64 as one ip | 19:32 |
@clarkb:matrix.org | ironic that docker refused to care about ipv6 for years, then just before they enforce stricter rate limits, add functionality for ipv6, making the explosions more spectacular | 19:32 |
@fungicide:matrix.org | at least with clients in clouds that issue individual v6 addresses out of a common /64 pool | 19:34 |
@clarkb:matrix.org | corvus: 941294 passes testing now. Might be worth a quick review and approval? | 20:15 |
-@gerrit:opendev.org- Aurelio Jargas proposed: [zuul/zuul-jobs] 941490: Add role: `ensure-python-command`, refactor similar roles https://review.opendev.org/c/zuul/zuul-jobs/+/941490 | 21:54 | |
@aureliojargas:matrix.org | Clark corvus the other day we talked about deduplicating the code from all the `ensure-<python-command>` roles. That's my shot at it 👆️ | 22:18 |
@aureliojargas:matrix.org | I've documented it in the commit message, but also added some comments to the diff to explain some decisions. | 22:19 |
@aureliojargas:matrix.org | At the end it is almost a full refactor, but I needed to do things differently in two specific places: not overloading a var as being both input and output, and improving the "already installed" command detection. | 22:20 |
@aureliojargas:matrix.org | The deduped roles are: `ensure-nox`, `ensure-poetry`, `ensure-pyproject-build`, `ensure-twine`, `ensure-uv` (I've left `ensure-tox` out because of the Python 2 thingy we discussed the other day) | 22:24 |
@jim:acmegating.com | Aurelio Jargas: thanks! | 22:37 |