Wednesday, 2025-02-12

@joao15130:matrix.orgHello all.13:41
I'm trying to run a job which was fine before, but after a few changes it doesn't work anymore and no logs are captured by Zuul.
The only thing I see is
--- END OF STREAM ---
displaying continuously without any errors.
Didn't find anything relevant in the executor. I tried to enable the verbose mode in the container with no effect.
Do you have any idea?
@joao15130:matrix.orgIt's like the job is running but no streaming happens.13:46
@dfajfer:fsfe.orghonestly, there could be a few things but I'm having deja vu about you writing this one13:47
@joao15130:matrix.orgYes but it was for something else 13:47
@joao15130:matrix.orgthis time, nothing appears on the console13:47
@joao15130:matrix.orgI remember seeing debug output in the zuul_executor container; how can I enable it? I tried running zuul-executor verbose in the container, but it doesn't change anything in the logs13:49
@joao15130:matrix.orgI'm wondering... I've changed the file structure of my repository where the whole config resides13:56
@joao15130:matrix.orgIn a job where we have this definition 13:57
`pre-run: playbooks/base/pre.yaml`
@joao15130:matrix.orgdoes the playbooks directory need to reside in `zuul-config` or `zuul-config/zuul.d` ?13:58
@fungicide:matrix.orgjoao15130: in opendev we call the start-zuul-console role from a pre-run playbook in the base job that all other jobs inherit from: https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/pre.yaml#L21 14:13
@fungicide:matrix.orghttps://zuul-ci.org/docs/zuul-jobs/latest/general-roles.html#role-start-zuul-console 14:14
@joao15130:matrix.orgok so in that case the playbooks reside in the upper level14:14
@joao15130:matrix.org```14:14
- hosts: localhost
  roles:
    - role: emit-job-header
      zuul_log_path_shard_build: true
    - ensure-output-dirs
- hosts: all
  pre_tasks:
    - name: Start zuul console daemon
      zuul_console:
  roles:
    - add-build-sshkey
    - prepare-workspace-git
    - validate-host
    - log-inventory
```
@joao15130:matrix.orgthat's the playbook the job calls14:15
@fungicide:matrix.orgthat looks similar to https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/start-zuul-console/tasks/main.yaml so in theory you're embedding the same ansible module invocation i guess14:17
-@gerrit:opendev.org- Simon Westphahl proposed:14:17
- [zuul/zuul] 941434: Refactor to a common tpe event connector https://review.opendev.org/c/zuul/zuul/+/941434
- [zuul/zuul] 941435: Make Gerrit event pre-processor multi-threaded https://review.opendev.org/c/zuul/zuul/+/941435
@joao15130:matrix.organd it worked fine until last week, I did a few changes but something seems to be broken now14:17
@joao15130:matrix.orgcause I see in the target VM that the job is running in the background14:17
@joao15130:matrix.organd I see the build-sshkey added so it confirms that the playbook is being processed14:20
@joao15130:matrix.orgI remember seeing debug output in the zuul_executor container; how can I enable it? I tried running zuul-executor verbose in the container, but it doesn't change anything in the logs14:23
-@gerrit:opendev.org- Simon Westphahl proposed:14:24
- [zuul/zuul] 941434: Refactor to a common tpe event connector https://review.opendev.org/c/zuul/zuul/+/941434
- [zuul/zuul] 941435: Make Gerrit event pre-processor multi-threaded https://review.opendev.org/c/zuul/zuul/+/941435
@joao15130:matrix.orgwhich comes from the doc:14:26
*To enable or disable running Ansible in verbose mode (with the -vvv argument to ansible-playbook) run zuul-executor verbose and zuul-executor unverbose.*
@fungicide:matrix.orgjoao15130: is the finger gateway service running, and is it able to reach 7900/tcp on your executors?14:27
@joao15130:matrix.orgwhat is the finger gateway? My setup is based upon the quick-start tutorial and I don't remember seeing this type of container14:28
@fungicide:matrix.orghttps://zuul-ci.org/docs/zuul/latest/configuration.html#finger-gateway but maybe you don't need one if you don't have separate executors14:30
@joao15130:matrix.orgI just have one executor14:32
@fungicide:matrix.orgthe zuul_console streams ansible output which is read back by the executor and then served up on a socket which the finger gateway connects to, and the zuul web interface connects to the finger gateway and requests the stream corresponding to the specific build you're trying to view14:32
@joao15130:matrix.orgnever deployed finger gateway and it was working before.14:34
@joao15130:matrix.orgThat's weird14:34
@fungicide:matrix.orgwell, if you only have one executor then maybe zuul-web knows to connect directly to it rather than needing a multiplexer14:36
@joao15130:matrix.orgyeah that's possible14:36
@joao15130:matrix.orgThe job has finished and I'm able to see all the logs just like the job was executing normally14:37
@joao15130:matrix.orgbut no streaming happened14:37
@fungicide:matrix.orgis 7900/tcp listening on your executor?14:38
@joao15130:matrix.orggive me 2 min, it's a container and no ss or netstat tool is available14:39
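When neither ss nor netstat is installed in the container, the same question (is anything accepting connections on 7900/tcp?) can be answered with a few lines of Python, which the executor image already ships. This is only a minimal sketch: the helper name `is_port_open` is made up for illustration, and it assumes the check is run from inside the executor container, so that `localhost` is the executor itself.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it only tests that something
    accepts connections, not that it is the Zuul log streamer.
    """
    try:
        # create_connection tries every address the host name resolves to
        # (IPv4 and IPv6), so it also works against a listener bound to "::".
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable networks.
        return False
```

For example, `is_port_open("localhost", 7900)` run inside the executor container should return `False` if the log streamer never started listening.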
@joao15130:matrix.orgthis is what I have on the container:14:42
```
root@8b9c297c23f6:/# netstat -laputen
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 10.89.0.14:40426 151.101.54.132:80 TIME_WAIT 0 0 -
tcp 0 0 10.89.0.14:60140 10.89.0.5:2281 ESTABLISHED 0 204400 2/python
tcp 0 0 10.89.0.14:35394 151.101.54.132:80 TIME_WAIT 0 0 -
```
@joao15130:matrix.organd here's what's defined for this container at start-up:14:43
```
  executor:
    privileged: true
    environment:
      - http_proxy
      - https_proxy
      - no_proxy=${no_proxy},gerrit,scheduler
      - ZUUL_MYSQL_PASSWORD=secret
    image: quay.io/zuul-ci/zuul-executor
    volumes:
      - "./etc_zuul/:/etc/zuul/:z"
      - "./playbooks/:/var/playbooks/:z"
      - "sshkey:/var/ssh:z"
      - "logs:/srv/static/logs:z"
      - "certs:/var/certs:z"
      - "lib-zuul-executor:/var/lib/zuul:z"
    command: "sh -c '/var/playbooks/wait-to-start-certs.sh && exec zuul-executor -f'"
    networks:
      - zuul
```
@fungicide:matrix.orgi guess you're not altering the finger_port in configuration? https://zuul-ci.org/docs/zuul/latest/configuration.html#attr-executor.finger_port 14:51
@fungicide:matrix.orgthe base quickstart example doesn't seem to alter it14:51
@joao15130:matrix.orgindeed, I don't alter it14:53
@fungicide:matrix.orgwhen the executor service starts, you should see it log "Starting log streamer" like at https://zuul.opendev.org/t/zuul/build/85d75564949e401182070afa8ad1c1d5/log/container_logs/executor.log#2 14:53
@joao15130:matrix.orggood catch.14:57
```
2025-02-12 14:56:46,695 INFO zuul.Executor: Starting log streamer
Traceback (most recent call last):
  File "/usr/local/bin/zuul-executor", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zuul/cmd/executor.py", line 133, in main
    Executor().main()
  File "/usr/local/lib/python3.11/site-packages/zuul/cmd/__init__.py", line 267, in main
    self.run()
  File "/usr/local/lib/python3.11/site-packages/zuul/cmd/executor.py", line 107, in run
    self.start_log_streamer()
  File "/usr/local/lib/python3.11/site-packages/zuul/cmd/executor.py", line 65, in start_log_streamer
    streamer = zuul.lib.log_streamer.LogStreamer(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zuul/lib/log_streamer.py", line 173, in __init__
    self.server = LogStreamerServer((host, port),
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zuul/lib/log_streamer.py", line 162, in __init__
    super(LogStreamerServer, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/zuul/lib/streamer_utils.py", line 112, in __init__
    socketserver.ThreadingTCPServer.__init__(self, *args, **kwargs)
  File "/usr/local/lib/python3.11/socketserver.py", line 452, in __init__
    self.socket = socket.socket(self.address_family,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/socket.py", line 232, in __init__
    _socket.socket.__init__(self, family, type, proto, fileno)
OSError: [Errno 97] Address family not supported by protocol
```
@joao15130:matrix.orgwhat can be the reason?14:57
@joao15130:matrix.orgIPV6 is enabled on the host14:57
@joao15130:matrix.orgIn fact no, ipv6 is disabled14:58
@clarkb:matrix.orgcorvus: https://review.opendev.org/c/zuul/zuul-jobs/+/939823 failed to merge after more rechecks yesterday evening. Do you want to do the honors or should I? I'm happy to do it but will be another hour or so (which is probably fine)14:59
@fungicide:matrix.orgjoao15130: likely the fork is trying to bind to 7900/tcp on both 0.0.0.0 and :: but i'll need to take a closer look, there's a chance that's configurable15:00
@fungicide:matrix.orgjoao15130: aha, no, looks like it assumes binding to :: will work in all cases, and that the kernel is transparently dual-stack15:01
@fungicide:matrix.orgso yes, i think you'll need ipv6 enabled in sysconfig, even if you don't have any global v6 networking established 15:03
@fungicide:matrix.orgi think this is default for most kernels these days, but more details about your platform might help us figure out if the code needs to change15:03
@joao15130:matrix.orgmy system is Ubuntu 24.04 15:04
@joao15130:matrix.orgno specific network change15:04
@joao15130:matrix.orglet me try to enable ipV615:04
@fungicide:matrix.orgon a default ubuntu 24.04 install you should find that sysctl -a|grep disable_ipv6 is 0 for everything15:09
@fungicide:matrix.orgsame for sysctl net.ipv6.bindv6only which is what allows binding to :: to also work for v4 addresses15:10
@fungicide:matrix.orgif disable_ipv6 is set for any interfaces, then there's probably some custom configuration toggling that at boot15:11
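The checks described above can be gathered into one small script for hosts or containers where the `sysctl` tool is missing. It is a rough sketch under stated assumptions, not how Zuul itself probes anything: `read_sysctl` and `can_open_ipv6_listener` are hypothetical helper names, the values are read from `/proc/sys` directly, and the bind test uses an ephemeral port (0) rather than 7900 so it does not collide with a running executor. It mimics the step that failed in the traceback, creating an `AF_INET6` socket and binding it to `::`.

```python
import socket
from pathlib import Path


def read_sysctl(name, proc_root="/proc/sys"):
    """Return a sysctl value as a stripped string, or None if it is absent.

    A missing entry (for example no net/ipv6 tree at all) is itself a hint
    that the kernel has no IPv6 support available.
    """
    path = Path(proc_root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def can_open_ipv6_listener(port=0):
    """Try to create an AF_INET6 socket and bind it to '::'.

    Returns (True, None) on success or (False, error) on failure.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::", port))
        return True, None
    except OSError as exc:
        return False, exc


if __name__ == "__main__":
    for key in ("net.ipv6.conf.all.disable_ipv6", "net.ipv6.bindv6only"):
        print(key, "=", read_sysctl(key))
    ok, err = can_open_ipv6_listener()
    print("IPv6 wildcard bind:", "ok" if ok else f"failed ({err})")
```

On a host where IPv6 has been switched off, the second helper is expected to report an `OSError`, possibly the same "Address family not supported by protocol" seen in the executor log, depending on how IPv6 was disabled.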
@joao15130:matrix.orgI've just enabled IPv615:12
@joao15130:matrix.organd I got:15:12
```
2025-02-12T15:12:13+00:00 Wait for certs to be present
2025-02-12 15:12:14,034 INFO zuul.Executor: Starting log streamer
```
@joao15130:matrix.orgwith no errors15:12
@joao15130:matrix.orglet me trigger a new job15:13
@joao15130:matrix.organd it's working fine15:13
@joao15130:matrix.orgI understand what happened now: we've been told to use a preshipped image from our IT dept15:14
@joao15130:matrix.orgthis image has IPV6 disabled by default15:14
@joao15130:matrix.orgthank you fungi !15:14
@fungicide:matrix.orgmy pleasure, glad we got it sorted out15:20
@fungicide:matrix.orgseparately, that any it department in this day and age is so frightened of ipv6 that they break modern software by disabling it is pretty absurd, but unfortunately not surprising15:21
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 941299: Fix busy loop in zuul console https://review.opendev.org/c/zuul/zuul/+/941299 16:07
@clarkb:matrix.orgcorvus: I'm finding myself distracted by haproxy. I'm happy for you to merge the zuul-jobs buildx fix otherwise I'll get to it when not distracted16:27
@jim:acmegating.comClark: ack, we'll coordinate here :)16:34
@jim:acmegating.comClark: merging now16:58
-@gerrit:opendev.org- corvus.admin merged on behalf of Yaguang Tang: [zuul/zuul-jobs] 939823: Install ca-certificates in the buildx image https://review.opendev.org/c/zuul/zuul-jobs/+/939823 17:01
@clarkb:matrix.orgthanks17:20
@clarkb:matrix.orgcorvus:  that does seem to have made https://review.opendev.org/c/zuul/nodepool/+/941294 happier, but it failed in a single unittest. I'll recheck as it looks like maybe zookeeper got angry and that was fallout18:29
@jim:acmegating.comnot unlikely18:58
@mnaser:matrix.orgIs there a nice talk that someone has given about doing CD with Zuul? I know OpenDev is doing a lot of it, but mostly like.. do you use the project key as a deploy key... do you add_host for a bastion host and run ansible in there, using semaphores, deploying post-merge, etc etc19:16
@mnaser:matrix.orgwe're slowly moving things out of github actions and trying to figure out how to best make this work19:16
@fungicide:matrix.orgin opendev we basically run "nested" ansible through a bastion server19:19
@mnaser:matrix.orgah ok got it, so vm spins up, add_host and then run ansible from there19:19
@mnaser:matrix.orgthats fair enough, helps decouple from the zuul ansible, but also i was wondering if the deploys happened in gate or a post-submit pipeline19:20
@fungicide:matrix.orgwe have a "deploy" pipeline19:20
@fungicide:matrix.orgwhich is triggered post-merge19:21
@fungicide:matrix.orgwe have nearly identical "run" and "deploy" jobs, the "run" jobs have ephemeral nodesets with one of them being a throwaway stand-in for the bastion and those are used in pre-merge (check, gate) pipelines19:23
@fungicide:matrix.orgthat allows us to reuse the same playbooks to test and deploy to production, substituting test-only vars for the normal secrets and private inventory host/group vars that store our production creds19:24
@mnaser:matrix.orgi see, for us the run might be a little bit more difficult, since we're using it to deploy entire clouds.. its kinda hard to simulate that :(19:24
@mnaser:matrix.orgbut i guess i can see us running linters and other things, since its probably hard for us to build and test a whole cloud (well, we try, but in some envs it would depend on external ceph clusters or physical storage arrays)19:25
@fungicide:matrix.orgwell, our jobs are service-specific, so only deploy or test the subset of our services that are relevant to the files being changed at that point in time19:26
@fungicide:matrix.orgwe don't test or deploy everything all the time19:26
@mnaser:matrix.orgOh interesting, I think that might be difficult for us since there is a lot of pieces working together, such as roles that depend on other roles, so the file filters might not work as nicely (or we might miss things)19:27
@mnaser:matrix.organd I guess in testing.. well if a nova change happens then we need glance and keystone and all the whole other crew to validate 19:27
@clarkb:matrix.orgunless you try to update how we collect logs and install docker/podman, then everything wants to run19:27
@fungicide:matrix.orghah, true that19:28
@fungicide:matrix.orghello dockerhub19:28
@mnaser:matrix.orglol! yeah, I feel you on that :( and somehow with all the IPs you have access to, it still hits limits19:30
@clarkb:matrix.orgmnaser: because ipv619:32
@clarkb:matrix.orgthey treat an entire /64 as one ip19:32
@clarkb:matrix.orgironic that docker refused to care about ipv6 for years, then just before they enforce stricter rate limits they add functionality for ipv6, making the explosions more spectacular19:32
@fungicide:matrix.orgat least with clients in clouds that issue individual v6 addresses out of a common /64 pool19:34
@clarkb:matrix.orgcorvus: 941294 passes testing now. Might be worth a quick review and approval?20:15
-@gerrit:opendev.org- Aurelio Jargas proposed: [zuul/zuul-jobs] 941490: Add role: `ensure-python-command`, refactor similar roles https://review.opendev.org/c/zuul/zuul-jobs/+/941490 21:54
@aureliojargas:matrix.orgClark corvus the other day we talked about deduplicating the code from all the `ensure-<python-command>` roles. That's my shot at it 👆️22:18
@aureliojargas:matrix.orgI've documented it in the commit message, but also added some comments to the diff to explain some decisions.22:19
@aureliojargas:matrix.orgIn the end it is almost a full refactor, but I needed to do things differently in two specific places: not overloading a var as being both input and output, and improving the "already installed" command detection.22:20
@aureliojargas:matrix.orgThe deduped roles are: `ensure-nox`, `ensure-poetry`, `ensure-pyproject-build`, `ensure-twine`, `ensure-uv` (I've left `ensure-tox` out because of the Python 2 thingy we discussed the other day)22:24
@jim:acmegating.comAurelio Jargas: thanks!22:37