@picog:matrix.org | Hi guys. | 06:00 |
---|---|---|
I'm still seeing zuul containers crash every night, would love to understand what is happening here. | ||
``` | ||
podman ps -a | ||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES | ||
f41107fe744e docker.io/gerritcodereview/gerrit:latest 22 hours ago Exited (137) 3 hours ago 0.0.0.0:8080->8080/tcp, 0.0.0.0:29418->29418/tcp zuul_gerrit_1 | ||
3b95446473e3 docker.io/library/zookeeper:latest sh -c /var/playbo... 22 hours ago Up 22 hours zuul_zk_1 | ||
3f30fca7aaa4 docker.io/library/mariadb:latest mariadbd 22 hours ago Up 22 hours zuul_mysql_1 | ||
c98938fba7bc localhost/zuul_logs:latest httpd-foreground 22 hours ago Up 22 hours 0.0.0.0:8000->80/tcp zuul_logs_1 | ||
c3ae94ef5c28 quay.io/zuul-ci/zuul-executor:latest /usr/local/lib/zu... 22 hours ago Exited (2) 22 hours ago zuul_gerritconfig_1 | ||
5bc728574e98 quay.io/zuul-ci/nodepool-launcher:latest sh -c /var/playbo... 22 hours ago Up 22 hours 0.0.0.0:8005->8005/tcp zuul_launcher_1 | ||
ccc15f05723a quay.io/zuul-ci/zuul-scheduler:latest sh -c /var/playbo... 22 hours ago Exited (139) 3 hours ago zuul_scheduler_1 | ||
ddfce79beac9 quay.io/zuul-ci/zuul-web:latest sh -c /var/playbo... 22 hours ago Up 22 hours 0.0.0.0:9000->9000/tcp zuul_web_1 | ||
a718dabae24b localhost/zuul_executor:latest sh -c /var/playbo... 22 hours ago Up 22 hours zuul_executor_1 | ||
``` | ||
gerrit container logs (Can't see anything interesting) | ||
``` | ||
[2024-04-23T16:08:56.233Z] [HTTP POST /a/changes/zuul-config~master~I875745a421c5eba6457e10b93d2c56e43373aff7/revisions/8096783323b3f1b2ef39ec67d0bb (zuul from 10.89.0.1)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.commentAddedHook resolved to /var/gerrit/hooks/comment-added [CONTEXT project="zuul-config" request="REST /changes/*/revisions/*/review" ] | ||
[2024-04-23T16:38:23.222Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.submitHook resolved to /var/gerrit/hooks/submit [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ] | ||
[2024-04-23T16:38:23.507Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.changeMergedHook resolved to /var/gerrit/hooks/change-merged [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ] | ||
[2024-04-23T16:52:17.489Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: new: 1 (\) [CONTEXT ratelimit_period="1 MINUTES [skipped: 7]" ] | ||
[2024-04-23T19:34:59.257Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: refs: 1, new: 1 [CONTEXT ratelimit_period="1 MINUTES [skipped: 4]" ] | ||
``` | ||
gerrit_config: | ||
``` | ||
podman logs --tail 20 c3ae94ef5c28 | ||
ok: [localhost] | ||
TASK [Create temp dir for Gerrit config update] ******************************** | ||
changed: [localhost] | ||
TASK [Set All-Project repo location] ******************************************* | ||
ok: [localhost] | ||
TASK [Checkout All-Projects config] ******************************************** | ||
changed: [localhost] | ||
TASK [Copy new All-Projects config into place] ********************************* | ||
ok: [localhost] | ||
TASK [Update All-Projects config in Gerrit] ************************************ | ||
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -e\ngit config user.email 'admin@example.com'\ngit commit -a -m 'update config'\ngit push http://admin:secret@gerrit:8080/All-Projects +HEAD:refs/meta/config\n", "delta": "0:00:00.026840", "end": "2024-04-23 07:40:07.222905", "msg": "non-zero return code", "rc": 1, "start": "2024-04-23 07:40:07.196065", "stderr": "", "stderr_lines": [], "stdout": "Not currently on any branch.\nnothing to commit, working tree clean", "stdout_lines": ["Not currently on any branch.", "nothing to commit, working tree clean"]} | ||
PLAY RECAP ********************************************************************* | ||
localhost : ok=12 changed=3 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0 | ||
``` | ||
Scheduler quits because it can't connect to gerrit? | ||
``` | ||
podman logs --tail 20 ccc15f05723a | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: Traceback (most recent call last): | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 115, in run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: self.watcher_election.run(self._run) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/zk/election.py", line 28, in run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: return super().run(func, *args, **kwargs) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/kazoo/recipe/election.py", line 54, in run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: func(*args, **kwargs) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 80, in _run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: client.connect(self.hostname, | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 377, in connect | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: to_try = list(self._families_and_addresses(hostname, port)) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 202, in _families_and_addresses | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: addrinfos = socket.getaddrinfo( | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: for res in _socket.getaddrinfo(host, port, family, type, proto, flags): | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: socket.gaierror: [Errno -2] Name or service not known | ||
``` | ||
Sorry for the long message. Should I file this info somewhere else? | ||
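A minimal sketch for reading the two pieces of evidence in the message above; it is not part of Zuul or Podman. It assumes the usual convention that an exit status above 128 means the process was ended by signal number `status - 128`, and it repeats the `socket.getaddrinfo` call that fails in the traceback. The hostname `gerrit` is taken from the push URL in the gerritconfig log and port 29418 from the `podman ps -a` output; whether the scheduler is configured with exactly that name is an assumption. The helper names `describe_exit_status` and `can_resolve` are hypothetical.

```python
import signal
import socket


def describe_exit_status(status: int) -> str:
    """Translate a container exit status into a short description."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            return f"exit status {status} (no matching signal)"
        return f"terminated by {name} (128 + {signum})"
    return f"exited on its own with status {status}"


def can_resolve(hostname: str, port: int) -> bool:
    """Return True if the name resolves, using the same call as the traceback."""
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        # Same exception class as "Name or service not known" in the log.
        return False
    return True


if __name__ == "__main__":
    # Statuses copied from the `podman ps -a` listing.
    for name, status in [
        ("zuul_gerrit_1", 137),
        ("zuul_scheduler_1", 139),
        ("zuul_gerritconfig_1", 2),
    ]:
        print(f"{name}: {describe_exit_status(status)}")

    # Assumption: the scheduler reaches Gerrit under the name "gerrit".
    print("gerrit resolves:", can_resolve("gerrit", 29418))
```

Run from inside the scheduler container's network, the last line gives a quick check of whether the `gerrit` name is resolvable at that moment. One plausible reading, not confirmed by the logs, is that the name stops resolving once the Gerrit container itself has exited.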
@picog:matrix.org | * Hi guys. | 06:03 |
I'm still seeing zuul containers crash every night, would love to understand what is happening here. | ||
``` | ||
podman ps -a | ||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES | ||
f41107fe744e docker.io/gerritcodereview/gerrit:latest 22 hours ago Exited (137) 3 hours ago 0.0.0.0:8080->8080/tcp, 0.0.0.0:29418->29418/tcp zuul_gerrit_1 | ||
3b95446473e3 docker.io/library/zookeeper:latest sh -c /var/playbo... 22 hours ago Up 22 hours zuul_zk_1 | ||
3f30fca7aaa4 docker.io/library/mariadb:latest mariadbd 22 hours ago Up 22 hours zuul_mysql_1 | ||
c98938fba7bc localhost/zuul_logs:latest httpd-foreground 22 hours ago Up 22 hours 0.0.0.0:8000->80/tcp zuul_logs_1 | ||
c3ae94ef5c28 quay.io/zuul-ci/zuul-executor:latest /usr/local/lib/zu... 22 hours ago Exited (2) 22 hours ago zuul_gerritconfig_1 | ||
5bc728574e98 quay.io/zuul-ci/nodepool-launcher:latest sh -c /var/playbo... 22 hours ago Up 22 hours 0.0.0.0:8005->8005/tcp zuul_launcher_1 | ||
ccc15f05723a quay.io/zuul-ci/zuul-scheduler:latest sh -c /var/playbo... 22 hours ago Exited (139) 3 hours ago zuul_scheduler_1 | ||
ddfce79beac9 quay.io/zuul-ci/zuul-web:latest sh -c /var/playbo... 22 hours ago Up 22 hours 0.0.0.0:9000->9000/tcp zuul_web_1 | ||
a718dabae24b localhost/zuul_executor:latest sh -c /var/playbo... 22 hours ago Up 22 hours zuul_executor_1 | ||
``` | ||
gerrit container logs (Can't see anything interesting) | ||
``` | ||
[2024-04-23T16:08:56.233Z] [HTTP POST /a/changes/zuul-config~master~I875745a421c5eba6457e10b93d2c56e43373aff7/revisions/8096783323b3f1b2ef39ec67d0bb (zuul from 10.89.0.1)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.commentAddedHook resolved to /var/gerrit/hooks/comment-added [CONTEXT project="zuul-config" request="REST /changes/*/revisions/*/review" ] | ||
[2024-04-23T16:38:23.222Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.submitHook resolved to /var/gerrit/hooks/submit [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ] | ||
[2024-04-23T16:38:23.507Z] [HTTP POST /changes/zuul-config~261/revisions/1/submit (pgeyer from 172.16.20.10)] INFO com.googlesource.gerrit.plugins.hooks.HookFactory : hooks.changeMergedHook resolved to /var/gerrit/hooks/change-merged [CONTEXT SUBMISSION_ID="261-1713890303193-993ea61c" project="zuul-config" request="REST /changes/*/revisions/*/submit" ] | ||
[2024-04-23T16:52:17.489Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: new: 1 (\) [CONTEXT ratelimit_period="1 MINUTES [skipped: 7]" ] | ||
[2024-04-23T19:34:59.257Z] [SSH git-receive-pack /zuul-config (pgeyer)] INFO com.google.gerrit.server.git.MultiProgressMonitor : Processing changes: refs: 1, new: 1 [CONTEXT ratelimit_period="1 MINUTES [skipped: 4]" ] | ||
``` | ||
gerrit_config: | ||
``` | ||
podman logs --tail 20 c3ae94ef5c28 | ||
ok: [localhost] | ||
TASK [Create temp dir for Gerrit config update] ******************************** | ||
changed: [localhost] | ||
TASK [Set All-Project repo location] ******************************************* | ||
ok: [localhost] | ||
TASK [Checkout All-Projects config] ******************************************** | ||
changed: [localhost] | ||
TASK [Copy new All-Projects config into place] ********************************* | ||
ok: [localhost] | ||
TASK [Update All-Projects config in Gerrit] ************************************ | ||
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -e\ngit config user.email 'admin@example.com'\ngit commit -a -m 'update config'\ngit push http://admin:secret@gerrit:8080/All-Projects +HEAD:refs/meta/config\n", "delta": "0:00:00.026840", "end": "2024-04-23 07:40:07.222905", "msg": "non-zero return code", "rc": 1, "start": "2024-04-23 07:40:07.196065", "stderr": "", "stderr_lines": [], "stdout": "Not currently on any branch.\nnothing to commit, working tree clean", "stdout_lines": ["Not currently on any branch.", "nothing to commit, working tree clean"]} | ||
PLAY RECAP ********************************************************************* | ||
localhost : ok=12 changed=3 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0 | ||
``` | ||
Scheduler quits because it can't connect to gerrit? | ||
``` | ||
podman logs --tail 20 ccc15f05723a | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: Traceback (most recent call last): | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 115, in run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: self.watcher_election.run(self._run) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/zk/election.py", line 28, in run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: return super().run(func, *args, **kwargs) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/kazoo/recipe/election.py", line 54, in run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: func(*args, **kwargs) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/zuul/driver/gerrit/gerriteventssh.py", line 80, in _run | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: client.connect(self.hostname, | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 377, in connect | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: to_try = list(self._families_and_addresses(hostname, port)) | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/site-packages/paramiko/client.py", line 202, in _families_and_addresses | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: addrinfos = socket.getaddrinfo( | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: for res in _socket.getaddrinfo(host, port, family, type, proto, flags): | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
2024-04-24 02:40:59,180 ERROR zuul.GerritConnection.ssh: socket.gaierror: [Errno -2] Name or service not known | ||
``` | ||
Sorry for the long message; should I file this info somewhere else?
@picog:matrix.org | Maybe the gerrit_config container failing is irrelevant; I see it exits soon after starting even when I restart everything. Perhaps it's just used to do a one-time setup? | 06:57 |
@picog:matrix.org | * Oh, perhaps this is the real reason? | 07:11 |
``` | ||
zuul (sd-parse-elf)[3228707]: Could not parse number of program headers from core file: invalid `Elf' handle | Apr 24 04:41 | |
zuul (sd-parse-elf)[3228707]: Could not parse number of program headers from core file: invalid `Elf' handle | Apr 24 04:41 | |
zuul (sd-parse-elf)[3228707]: Could not parse number of program headers from core file: invalid `Elf' handle | Apr 24 04:41 | |
zuul systemd-coredump[3228705]: [🡕] Process 3103907 (zuul-scheduler) of user 0 dumped core. | Apr 24 04:41 | |
Module /usr/local/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-55260171.so.1 without build-id. | ||
Module /usr/local/lib/python3.11/site-packages/confluent_kafka.libs/librdkafka-55260171.so.1 | ||
Module /usr/local/lib/python3.11/site-packages/confluent_kafka/cimpl.cpython-311-x86_64-linux-gnu.so without build-id. | ||
Module /usr/local/lib/python3.11/site-packages/confluent_kafka/cimpl.cpython-311-x86_64-linux-gnu.so | ||
Module /usr/local/lib/python3.11/site-packages/google/_upb/_message.abi3.so without build-id. | ||
Module /usr/local/lib/python3.11/site-packages/google/_upb/_message.abi3.so | ||
Stack trace of thread 88: | ||
#0 0x00007f8311ee583c n/a (/usr/lib/x86_64-linux-gnu/libc.so.6 + 0x8883c) | ||
ELF object binary architecture: AMD x86-64 | ||
``` | ||
-@gerrit:opendev.org- Felix Edel proposed: | 07:57 | |
- [zuul/zuul] 916744: Visualize branches in ChangeQueues https://review.opendev.org/c/zuul/zuul/+/916744 | ||
- [zuul/zuul] 916867: Implement admin actions (promote, dequeue) in new QueueItem component https://review.opendev.org/c/zuul/zuul/+/916867 | ||
-@gerrit:opendev.org- Felix Edel proposed: [zuul/zuul] 916867: Implement admin actions (promote, dequeue) in new QueueItem component https://review.opendev.org/c/zuul/zuul/+/916867 | 08:02 | |
-@gerrit:opendev.org- Christian Mueller proposed: [zuul/nodepool] 916801: WIP: enable EC2 Fleet API https://review.opendev.org/c/zuul/nodepool/+/916801 | 09:30 | |
@fungicide:matrix.org | we don't bound the confluent-kafka version in our image builds, so based on a reading of https://pypi.org/project/confluent-kafka/2.3.0/#files we should presumably be installing confluent_kafka-2.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl in our container images | 12:44 |
@fungicide:matrix.org | that's the latest version since october 2023 | 12:44 |
@fungicide:matrix.org | and that wheel does bundle a pre-built copy of `confluent_kafka.libs/librdkafka-55260171.so.1` | 12:55 |
@fungicide:matrix.org | there was a protobuf release much more recently, but still nearly a month ago so seems unlikely this is a new change there | 13:03 |
@picog:matrix.org | It's probably going to be quite hard to get a debug build, right? If I switch out the python command to python3-dbg, would that help? | 13:03 |
@fungicide:matrix.org | i'd need to look into where the base images get their python builds from. the images themselves are based on debian bookworm, but i get the impression the cpython interpreter there is not from debian's own python3 packages, they may provide a separate image layer for debugging symbols | 13:07 |
@picog:matrix.org | It looked as if it was built from source, yes | 13:07 |
@fungicide:matrix.org | for approximately how long have you been observing this failure? a few days? weeks? longer? | 13:08 |
@picog:matrix.org | It seems to happen nightly, but this is a fairly new setup, so not longer than a week. | 13:09 |
@fungicide:matrix.org | have you checked dmesg for signs of oom killer activity or the like? | 13:09 |
@picog:matrix.org | I found the segfault when I looked through journalctl, saw nothing else. | 13:10 |
@picog:matrix.org | Are there any specific hardware requirements for zuul? I'm running in a VM that reports a skylake cpu (probably from the host) | 13:12 |
``` | ||
Architecture: x86_64 | ||
CPU op-mode(s): 32-bit, 64-bit | ||
Address sizes: 40 bits physical, 48 bits virtual | ||
Byte Order: Little Endian | ||
CPU(s): 2 | ||
On-line CPU(s) list: 0,1 | ||
Vendor ID: GenuineIntel | ||
BIOS Vendor ID: QEMU | ||
Model name: Intel Core Processor (Skylake, IBRS) | ||
BIOS Model name: pc-q35-2.11 CPU @ 2.0GHz | ||
BIOS CPU family: 1 | ||
CPU family: 6 | ||
Model: 94 | ||
Thread(s) per core: 1 | ||
Core(s) per socket: 1 | ||
Socket(s): 2 | ||
Stepping: 3 | ||
BogoMIPS: 4399.99 | ||
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti intel_ppin ibrs ibpb tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat vnmi | ||
``` | ||
@jim:acmegating.com | I build debug images of acme enterprise zuul for acme gating customers. here is the upstream change i use to do it: https://review.opendev.org/897859 -- you can pull that change locally if you want to make your own debug image. | 13:13 |
@picog:matrix.org | Very nice. | 13:15 |
@fungicide:matrix.org | comparing timestamps, it looks like there's about a 15-second delay between getting disconnected from gerrit in the scheduler container log and the coredump reported in the system journal, i suppose those are close enough together to potentially be related. wonder if the coredump is merely a side effect of how the process stopped | 13:21 |
@fungicide:matrix.org | Pico: could it be something simple, like some process (package upgrades?) restarting docker? | 13:22 |
@fungicide:matrix.org | i've seen unexpected docker restarts kill every running container | 13:23 |
@fungicide:matrix.org | your podman ps says both gerrit and zuul-scheduler containers stopped around the same time | 13:24 |
@fungicide:matrix.org | though i guess if you're running podman then there's no dockerd to get restarted | 13:30 |
@picog:matrix.org | I see this https://zuul-ci.org/docs/zuul/latest/tutorials/quick-start.html suggests 2GB of ram, but that's probably the bare minimum. | 13:31 |
I'm bumping this up to 4, or 8? | ||
@fungicide:matrix.org | regardless, it seems like the sequence is that connections to gerrit's ssh socket stopped working (perhaps when its container suddenly stopped), and then shortly thereafter the process in the zuul-scheduler container died ungracefully (possibly due to some outside signal) | 13:31 |
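The exit statuses in the `podman ps -a` listing fit that sequence, assuming podman follows the usual shell convention of reporting 128 plus the signal number when a process is killed by a signal. A minimal sketch of that decoding (`describe_exit_status` is a hypothetical helper, not part of podman or zuul):

```python
import signal


def describe_exit_status(code):
    """Describe an exit status as shown in the STATUS column of `podman ps -a`.

    Assumes the 128 + signal-number convention; a process could also
    exit with such a value on its own, so treat the result as a hint.
    """
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


# Statuses taken from the listing: gerrit, scheduler, gerritconfig.
for status in (137, 139, 2):
    print(status, describe_exit_status(status))
```

On Linux, 137 maps to SIGKILL (what the kernel OOM killer sends) and 139 maps to SIGSEGV, which matches the coredump reported for zuul-scheduler.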
@picog:matrix.org | > <@fungicide:matrix.org> have you checked dmesg for signs of oom killer activity or the like? | 13:33 |
I didn't check properly before I responded, sorry. I see multiple events | ||
@fungicide:matrix.org | Pico: do you have anything tracking memory usage on that vm? maybe some internally-scheduled process caused memory usage to balloon, but why there wouldn't be oom killer messages in the journal i'm not sure | 13:33 |
@fungicide:matrix.org | oh, you see multiple oom killer events? yeah there's your next breadcrumb at least | 13:33 |
@picog:matrix.org | There are, I was just manually scrolling through the logs and didn't see it, now that I grep, there are a few. | 13:34 |
@fungicide:matrix.org | do they seem to coincide with when the containers stopped? do the process names correspond to things that would be running in the containers? but yes, regardless you'll want to get that sorted | 13:35 |
@fungicide:matrix.org | you could increase your available memory, but if there's something wrong causing utilization to grow unbounded then it may not do more than delay the issue | 13:36 |
@picog:matrix.org | Yeah, let me bump it up slightly and then keep an eye on it | 13:37 |
@fungicide:matrix.org | if you can afford to have more than one vm, you might be better off moving your zuul-executor process to a separate one | 13:37 |
@fungicide:matrix.org | it could be some job the executor is running around that time consuming all your memory, for example | 13:37 |
@picog:matrix.org | Funny, I think this explains another issue I was having with playbooks just stopping without any failure reason | 13:38 |
@fungicide:matrix.org | the various component services for zuul are designed to be able to be distributed across the network and can scale horizontally that way to increase capacity | 13:38 |
@picog:matrix.org | Out of memory: Killed process 3288515 (ansible-playboo) total-vm:1300052kB, anon-rss:885364kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:2272kB oom_score_adj:0 | 13:38 |
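Lines like the one above are easier to correlate with container stop times once they are pulled out of the kernel log. A rough sketch, assuming the log text comes from something like `dmesg` or `journalctl -k`; the pattern is derived only from the single sample line quoted here, so other kernel versions may format the message differently, and `parse_oom_kills` is a hypothetical helper name:

```python
import re

# Field layout taken from the one sample line in this log.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(log_text):
    """Return (pid, command, anon_rss_kb) for every OOM kill line found."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_RE.search(line)
        if match:
            kills.append(
                (int(match["pid"]), match["comm"], int(match["anon_rss_kb"]))
            )
    return kills


sample = (
    "Out of memory: Killed process 3288515 (ansible-playboo) "
    "total-vm:1300052kB, anon-rss:885364kB, file-rss:0kB, shmem-rss:0kB, "
    "UID:0 pgtables:2272kB oom_score_adj:0"
)
print(parse_oom_kills(sample))
```

The command name is truncated by the kernel (`ansible-playboo`), so matching against container process names needs to allow for that.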
@fungicide:matrix.org | but the most volatile one, resource wise, is the executor of course since what it consumes will depend on job payloads | 13:39 |
@picog:matrix.org | Thanks a lot, I was on a bit of a wild goose chase | 13:39 |
@fungicide:matrix.org | and if your executor is on a separate vm, you get a little bit of added protection from a job eating every last byte of ram and killing all your other services on the same system | 13:40 |
@fungicide:matrix.org | the executor does have resource governors that can be used to try to prevent that from happening, but they're not bulletproof | 13:41 |
@picog:matrix.org | I will try that thanks | 13:45 |
@picog:matrix.org | Probably a stupid question, but don't immediately see the answer, where do I specify the hostname of the executor if it's running in a different vm? | 13:58 |
@jim:acmegating.com | Pico: you point the components at the same zookeeper cluster. basically they should just all have the same zuul.conf. they will figure out how to talk to each other that way. | 13:59 |
@picog:matrix.org | Okay, interesting. | 14:00 |
@jim:acmegating.com | though obviously, your zuul.conf shouldn't point to a zookeeper on localhost in that case; that's the hostname you'll need to set. | 14:10 |
-@gerrit:opendev.org- Christian Mueller proposed: [zuul/nodepool] 916801: WIP: enable EC2 Fleet API https://review.opendev.org/c/zuul/nodepool/+/916801 | 15:23 | |
@picog:matrix.org | > <@jim:acmegating.com> though obviously, your zuul.conf shouldn't point to a zookeeper on localhost in that case; that's the hostname you'll need to set. | 15:42 |
There is a single zookeeper instance right? | ||
@jim:acmegating.com | Pico: yes; there is a single zookeeper quorum (which is a cluster of zk servers acting in concert). in the zuul quickstart, that is configured as a quorum with a single server. like all parts of zuul, that can be scaled as necessary. | 16:03 |
@picog:matrix.org | > <@jim:acmegating.com> Pico: yes; there is a single zookeeper quorum (which is a cluster of zk servers acting in concert). in the zuul quickstart, that is configured as a quorum with a single server. like all parts of zuul, that can be scaled as necessary. | 16:06 |
``` | ||
[zookeeper] | ||
hosts=zk:2281 | ||
tls_cert=/var/certs/certs/client.pem | ||
tls_key=/var/certs/keys/clientkey.pem | ||
tls_ca=/var/certs/certs/cacert.pem | ||
``` | ||
So if I instantiate a new vm for the executor, I just need to edit this "zk" to point to the old zk instance which will now listen on the host network. | ||
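A quick way to sanity-check the `[zookeeper]` section on the new executor VM is to read the `hosts` entry and flag anything that only makes sense on the original machine. This is a sketch under assumptions: it treats `hosts` as a comma-separated list of `host:port` entries, does not handle IPv6 literals, and `zookeeper_hosts` and `loopback_hosts` are hypothetical helper names, not zuul functions:

```python
import configparser

LOOPBACK_NAMES = {"localhost", "127.0.0.1", "::1"}


def zookeeper_hosts(conf_text):
    """Return [(host, port), ...] from the [zookeeper] hosts option."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    entries = parser.get("zookeeper", "hosts").split(",")
    hosts = []
    for entry in entries:
        # Split on the last colon so the port is always the final field.
        host, _, port = entry.strip().rpartition(":")
        hosts.append((host, int(port)))
    return hosts


def loopback_hosts(hosts):
    """Return the host names that point back at the local machine."""
    return [host for host, _ in hosts if host in LOOPBACK_NAMES]


sample_conf = """
[zookeeper]
hosts=zk:2281
tls_cert=/var/certs/certs/client.pem
tls_key=/var/certs/keys/clientkey.pem
tls_ca=/var/certs/certs/cacert.pem
"""
hosts = zookeeper_hosts(sample_conf)
print(hosts, loopback_hosts(hosts))
```

The check only catches loopback names; whether a name such as `zk` resolves from the second VM still has to be verified on that VM.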
@f2ked:matrix.org | sadly, this still does not work. | 17:39 |
the nodes even make the `/tmp/console-*.log` files. | ||
the executor logs do not mention console or the port number | ||
could it be the web server? | ||
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 916344: Gerrit: skip ref-updated /meta events https://review.opendev.org/c/zuul/zuul/+/916344 | 19:50 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 21:17 | |
- [zuul/nodepool] 916949: Add min-retention-time to metastatic driver https://review.opendev.org/c/zuul/nodepool/+/916949 | ||
- [zuul/nodepool] 916950: Add max-age to metastatic driver https://review.opendev.org/c/zuul/nodepool/+/916950 | ||
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 916008: Demote launch/delete timeeouts to warnings https://review.opendev.org/c/zuul/nodepool/+/916008 | 23:08 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] 914947: Temporarily pin urllib3 != 2.1.0 https://review.opendev.org/c/zuul/zuul/+/914947 | 23:10 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: [zuul/zuul] 914947: Temporarily pin urllib3 != 2.1.0 https://review.opendev.org/c/zuul/zuul/+/914947 | 23:11 | |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 916343: Demote launch keyscan exceptions to warnings https://review.opendev.org/c/zuul/nodepool/+/916343 | 23:15 |