@yoctozepto:matrix.org | morning Zuulers; any preference regarding https://lists.zuul-ci.org/archives/list/zuul-discuss@lists.zuul-ci.org/thread/WUWBM5F3PXXDLKK6JNSP4UR4VTWDNPZ4/ ? I do not know how you handle such decisions to be honest... | 06:34 |
---|---|---|
-@gerrit:opendev.org- Flavio Percoco Premoli proposed: [zuul/zuul] 885455: Use built-in URL data type instead of custom parse https://review.opendev.org/c/zuul/zuul/+/885455 | 06:36 | |
@muneerefx:matrix.org | Hi all, I am muneer and new to this technology | 06:43 |
@flaper87:matrix.org | > <@yoctozepto:matrix.org> morning Zuulers; any preference regarding https://lists.zuul-ci.org/archives/list/zuul-discuss@lists.zuul-ci.org/thread/WUWBM5F3PXXDLKK6JNSP4UR4VTWDNPZ4/ ? I do not know how you handle such decisions to be honest... | 06:44 |
I'm not in the mailing list (yet?) so dropping a comment here. I think it's safe, at this point, to drop support for Helm 2 (assuming there are no other things using it). Helm 3 has been around for what feels like ages at this point and it's been a long while since I ran into a Helm 2-only chart. $0.02 | ||
@yoctozepto:matrix.org | thanks flaper87 my point exactly :-) | 06:45 |
@yoctozepto:matrix.org | and I recommend you join the mailing list; it's good for async discussions | 06:45 |
@yoctozepto:matrix.org | although this time I received no reply :D | 06:46 |
@yoctozepto:matrix.org | and I seem to be impatient... | 06:46 |
@yoctozepto:matrix.org | :D | 06:46 |
@muneerefx:matrix.org | any video for learning zuul | 07:08 |
@flaper87:matrix.org | > <@muneerefx:matrix.org> any video for learning zuul | 07:10 |
I think the `docker-compose.yaml` in the examples dir is quite useful. I'd highly recommend going through the tutorial: https://zuul-ci.org/docs/zuul/latest/tutorials/quick-start.html | ||
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 885423: Clear tree cache queues on disconnect https://review.opendev.org/c/zuul/nodepool/+/885423 | 07:11 | |
@yoctozepto:matrix.org | muneerefx: it also depends on whether you want to run/maintain/deploy/administer zuul or "just use" zuul maintained by somebody else | 07:13 |
@jjbeckman:matrix.org | Hi folks, | 07:44 |
I've done a little troubleshooting regarding my problem where simple jobs (e.g. `echo foo`) are taking over 30 seconds in my Kubernetes-based Zuul setup. | ||
After adding `-v=9` to this `kubectl port-forward` command, mysteriously, the 30+ second delay went away. Removing `-v=9` brings the issue back. | ||
https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L412 | ||
While this does not make sense to me at this point, could `kubectl port-forward` (which I understand is required to stream logs to the web UI) not working correctly be the reason I have this 30+ second delay? | ||
By the way, the web console has never worked for me, I just get multiple lines of `--- END OF STREAM ---`. | ||
@avass:matrix.vassast.org | > <@jjbeckman:matrix.org> Hi folks, | 08:10 |
> | ||
> I've done a little troubleshooting regarding my problem where simple jobs (e.g. `echo foo`) are taking over 30 seconds in my Kubernetes-based Zuul setup. | ||
> | ||
> After adding `-v=9` to this `kubectl port-forward` command, mysteriously, the 30+ second delay went away. Removing `-v=9` brings the issue back. | ||
> https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L412 | ||
> | ||
> While this does not make sense to me at this point, could `kubectl port-forward` (which I understand is required to stream logs to the web UI) not working correctly be the reason I have this 30+ second delay? | ||
> | ||
> By the way, the web console has never worked for me, I just get multiple lines of `--- END OF STREAM ---`. | ||
Sounds like an issue with the zuul executor not being able to connect to the log streaming port | ||
@avass:matrix.vassast.org | Port 7900 | 08:11 |
https://zuul-ci.org/docs/zuul/4.3.0/discussion/components.html#overview | ||
Not exactly sure how that works with kubernetes | ||
@avass:matrix.vassast.org | Did you get logs when you increased verbosity on kubectl port-forward, or did that only remove the 30s timeout? | 08:14 |
@jjbeckman:matrix.org | Hi Albin, thanks for the advice. I see, let me look into whether the executor is unable to connect to port 7900. | 08:16 |
> Did you get logs when you increased verbosity on kubectl port-forward, or did that only remove the 30s timeout? | ||
I wasn't able to tell any difference in the executor logs. Just that the jobs were completing in seconds, rather than 30+ seconds... which I know, doesn't make sense. | ||
@avass:matrix.vassast.org | So it's not only logs in zuul-web, but the actual job in the zuul-executor takes 30 seconds? | 08:18 |
@jjbeckman:matrix.org | > So it's not only logs in zuul-web, but the actual job in the zuul-executor takes 30 seconds? | 08:19 |
Yes, the duration shown in zuul-web and the duration shown in the executor logs match. | ||
@avass:matrix.vassast.org | In any case my thought is that some part times out, likely because of a lack of logs, and increasing verbosity creates some kind of logs which gets past the step of waiting for logs somewhere | 08:20 |
@jjbeckman:matrix.org | Here is an example. | 08:21 |
``` | ||
2023-05-31 07:19:58,953 DEBUG zuul.AnsibleJob.output: [e: 60cdeba0-ff83-11ed-8b5c-6d2bd0e764a2] [build: 0a404664fcc746308cd49be128bfe325] Ansible output: b'TASK [Test] ********************************************************************' | ||
2023-05-31 07:20:30,877 DEBUG zuul.AnsibleJob.output: [e: 60cdeba0-ff83-11ed-8b5c-6d2bd0e764a2] [build: 0a404664fcc746308cd49be128bfe325] Ansible output: b'ok: [debian-bullseye] => {"changed": false, "cmd": ["echo", "foo"], "delta": "0:00:00.005171", "end": "2023-05-31 07:20:00.479927", "msg": "", "rc": 0, "start": "2023-05-31 07:20:00.474756", "stderr": "", "stderr_lines": [], "stdout": "foo", "stdout_lines": ["foo"], "zuul_log_id": "1a9f276b-d811-b3b3-b464-00000000000c-1-debianbullseye"}' | ||
``` | ||
As you can see, "Test" takes 32 seconds to complete, but `delta` is only 0.005171 seconds. Something that doesn't appear in the logs is happening... | ||
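A minimal sketch of the arithmetic behind that observation, using only the timestamps from the two log lines above. It assumes the executor clock (the log prefix) and the node clock (the `start`/`end` fields in the Ansible result) are roughly in sync; the variable and function names are illustrative, not anything from Zuul.

```python
from datetime import datetime

# Timestamps copied from the two executor log lines above.
TASK_HEADER_LOGGED = "2023-05-31 07:19:58,953"
TASK_RESULT_LOGGED = "2023-05-31 07:20:30,877"
# "start" and "end" fields from the Ansible result on the second line.
CMD_START = "2023-05-31 07:20:00.474756"
CMD_END = "2023-05-31 07:20:00.479927"


def parse_log_ts(text: str) -> datetime:
    """Parse an executor log timestamp (comma before the milliseconds)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def parse_ansible_ts(text: str) -> datetime:
    """Parse an Ansible result timestamp (dot before the microseconds)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


# Wall-clock time between the task header and the task result in the log.
wall_clock = (parse_log_ts(TASK_RESULT_LOGGED)
              - parse_log_ts(TASK_HEADER_LOGGED)).total_seconds()
# Time the command itself took on the node (this is the reported "delta").
command = (parse_ansible_ts(CMD_END)
           - parse_ansible_ts(CMD_START)).total_seconds()
# Time between the command finishing and the result reaching the log.
# Only meaningful under the clock-sync assumption stated above.
unexplained = (parse_log_ts(TASK_RESULT_LOGGED)
               - parse_ansible_ts(CMD_END)).total_seconds()

print(f"wall clock: {wall_clock:.3f}s")
print(f"command:    {command:.6f}s")
print(f"after command finished: {unexplained:.3f}s")
```

Under that assumption, nearly all of the delay falls after the command has already finished on the node, which would be consistent with something waiting on the result or the log stream rather than on the command.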
@jjbeckman:matrix.org | > In any case my thought is that some part times out, likely because of a lack of logs, and increasing verbosity creates some kind of logs which gets past the step of waiting for logs somewhere | 08:22 |
I see... I guess that's a possibility... | ||
@avass:matrix.vassast.org | In any case here's a link to the Ansible callback responsible for log streaming: https://opendev.org/zuul/zuul/src/commit/2bdc98b6d3f0c565813e6f2d234866539ba7337a/zuul/ansible/base/callback/zuul_stream.py#L344 | 08:27 |
@jjbeckman:matrix.org | Thanks a lot Albin! Let me look into what you have shared. | 08:29 |
@flaper87:matrix.org | jjbeckman: I recently went through an issue with the console. The solution was to make sure it is the *very first* task you run. It's got to be the very first one so it can start and the executor can run the `port-forward` to connect to it. If, for whatever reason, the `port-forward` is run before the console starts, then you won't get anything on the console | 08:43 |
@jjbeckman:matrix.org | Hi flaper87, thanks for your advice. | 09:15 |
> The solution was to make sure it is the very first task you run. | ||
I'm a bit confused with this bit though. Zuul automatically configures Ansible to execute `kubectl port-forward` as far as I can see by reading the source code, and as a user, I am not able to specify where to execute it in the playbooks. Hope that makes sense... | ||
@jjbeckman:matrix.org | Unsure if related, but I confirmed that from `fingergw`, accessing `executor:7900` takes exactly 10 seconds. Really slow. | 09:16 |
``` | ||
root@zuul-fingergw-795bb7884c-fsvt7:/# time openssl s_client -connect zuul-executor:7900 | ||
CONNECTED(00000003) | ||
... | ||
--- | ||
real 0m10.010s | ||
user 0m0.005s | ||
sys 0m0.000s | ||
``` | ||
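To narrow down where a delay like this comes from, one option is to time name resolution and the TCP connect separately. The sketch below is only an illustration: `time_connect` is a hypothetical helper, it does no TLS handshake (unlike `openssl s_client`), and whether either phase explains the 10 seconds seen here is not known from this log.

```python
import socket
import time


def time_connect(host: str, port: int, timeout: float = 30.0) -> dict:
    """Return seconds spent resolving `host` and opening a TCP connection."""
    t0 = time.monotonic()
    # Phase 1: name resolution only.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.monotonic()
    # Phase 2: TCP connect to the first resolved address.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)
    t2 = time.monotonic()
    return {"resolve_s": t1 - t0, "connect_s": t2 - t1}


# Example, using the host and port from the message above:
#   print(time_connect("zuul-executor", 7900))
```

If `resolve_s` dominates, the slowness is in DNS rather than in the executor's finger port; if both are small, the time is being spent after the connection is established.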
-@gerrit:opendev.org- Tobias Henkel proposed: [zuul/nodepool] 885736: Fix typo that crashes playback worker when under load https://review.opendev.org/c/zuul/nodepool/+/885736 | 09:18 | |
@flaper87:matrix.org | mmh, the executor should execute the port-forward on its own. | 09:35 |
@flaper87:matrix.org | https://opendev.org/zuul/zuul/src/commit/0bd76048d10e12d1c914b199582f46f12fd3f732/zuul/executor/server.py#L414 | 09:36 |
@flaper87:matrix.org | grep for `forward` in the executor logs to see if there was an error while trying to run it | 09:37 |
@jjbeckman:matrix.org | Yes, hence my confusion with the advice "make sure it is the first task". How can I change the behavior of what is built in to Zuul? | 09:38 |
@jjbeckman:matrix.org | > grep for forward in the executor logs to see if there was an error while trying to run it | 09:38 |
``` | ||
2023-06-09 09:26:33,083 INFO zuul.ExecutorServer: [e: b0926af0-06a7-11ee-8b60-c4887a70d41f] [build: 0eb0ad20e3db465e9edb43a4e955696d] Started Kubectl port forward on port 46709 | ||
2023-06-09 09:27:33,274 DEBUG zuul.ExecutorServer: [e: b0926af0-06a7-11ee-8b60-c4887a70d41f] [build: 0eb0ad20e3db465e9edb43a4e955696d] Rest of kubectl port forward output was: Forwarding from [::1]:46709 -> 19885 | ||
2023-06-09 09:27:33,274 DEBUG zuul.ExecutorServer: E0609 09:26:41.638892 6409 portforward.go:406] an error occurred forwarding 46709 -> 19885: error forwarding port 19885 to pod 60a688d104bccf842deab568632e4497b36f632f297b85ee21a417f8289bc206, uid : failed to execute portforward in network namespace "/var/run/netns/cni-dcf196fd-bc77-2c54-096d-c417f951d682": failed to connect to localhost:19885 inside namespace "60a688d104bccf842deab568632e4497b36f632f297b85ee21a417f8289bc206", IPv4: dial tcp4 127.0.0.1:19885: connect: connection refused IPv6 dial tcp6: address localhost: no suitable address found | ||
2023-06-09 09:27:33,274 DEBUG zuul.ExecutorServer: E0609 09:26:41.639241 6409 portforward.go:234] lost connection to pod | ||
2023-06-09 09:28:03,250 INFO zuul.ExecutorServer: [e: e9b9097e-06a7-11ee-9712-dda02c6cdd52] [build: 5ca2a2417409418f9a9b16267b8bd910] Started Kubectl port forward on port 33573 | ||
2023-06-09 09:29:03,740 DEBUG zuul.ExecutorServer: [e: e9b9097e-06a7-11ee-9712-dda02c6cdd52] [build: 5ca2a2417409418f9a9b16267b8bd910] Rest of kubectl port forward output was: Forwarding from [::1]:33573 -> 19885 | ||
2023-06-09 09:29:03,740 DEBUG zuul.ExecutorServer: E0609 09:28:12.200585 7553 portforward.go:406] an error occurred forwarding 33573 -> 19885: error forwarding port 19885 to pod 54d4f6853fc013ced5c1dfd183f6f0a448558085f2edbf914014952238562a09, uid : failed to execute portforward in network namespace "/var/run/netns/cni-6d458b82-4fac-e9fa-1acf-79e66a42368c": failed to connect to localhost:19885 inside namespace "54d4f6853fc013ced5c1dfd183f6f0a448558085f2edbf914014952238562a09", IPv4: dial tcp4 127.0.0.1:19885: connect: connection refused IPv6 dial tcp6: address localhost: no suitable address found | ||
2023-06-09 09:29:03,740 DEBUG zuul.ExecutorServer: E0609 09:28:12.200912 7553 portforward.go:234] lost connection to pod | ||
``` | ||
@jjbeckman:matrix.org | I see errors, but they only occur after the pipeline has been executed. | 09:39 |
@jjbeckman:matrix.org | And don't explain why each job is so slow. | 09:39 |
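One thing worth checking in the output above: the kubectl lines carry their own klog-style prefix (`E0609 09:26:41.638892`), which is separate from the timestamp the executor puts on the line when it dumps the "rest of kubectl port forward output". The sketch below compares the two for the first build. It assumes kubectl runs on the executor (so both timestamps share a clock) and takes the year from the executor timestamp, since the klog prefix has none; the log lines are abridged and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Abridged copies of two lines from the executor log above.
STARTED = ("2023-06-09 09:26:33,083 INFO zuul.ExecutorServer: "
           "Started Kubectl port forward on port 46709")
ERROR = ("2023-06-09 09:27:33,274 DEBUG zuul.ExecutorServer: "
         "E0609 09:26:41.638892 6409 portforward.go:406] "
         "an error occurred forwarding 46709 -> 19885")

# klog prefix: severity letter, mmdd, hh:mm:ss.uuuuuu, then a thread/process id.
KLOG = re.compile(
    r"\b[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+\d+\s")


def zuul_time(line: str) -> datetime:
    """Timestamp the executor logger put at the start of the line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def klog_time(line: str, year: int):
    """Timestamp embedded by kubectl itself, or None if there is none."""
    match = KLOG.search(line)
    if match is None:
        return None
    month, day, hour, minute, second, micro = map(int, match.groups())
    return datetime(year, month, day, hour, minute, second, micro)


started_at = zuul_time(STARTED)
failed_after = klog_time(ERROR, started_at.year) - started_at
flushed_after = zuul_time(ERROR) - started_at

print(f"kubectl reported the error after {failed_after.total_seconds():.3f}s")
print(f"executor logged it after {flushed_after.total_seconds():.3f}s")
```

If that reading of the prefixes is right, the connection to port 19885 was refused a few seconds after the port-forward started, and the message only reached the executor log about a minute later when the remaining output was written out.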
@flaper87:matrix.org | This is a bit annoying and I've been meaning to send a PR. The job is slow because the executor keeps looping while it waits for the console, and it only considers the task complete once it gives up on the stream. By fixing my zuul_console issue I took jobs from 7 mins to 1 min :) | 09:42 |
@flaper87:matrix.org | Zuul console needs to run first so that it is launched in the "node" (pod) before the executor attempts to launch the port-forward | 09:42 |
@flaper87:matrix.org | Did you check that the console is actually running in the pod? | 09:42 |
@jjbeckman:matrix.org | I... see... I would very much like to solve my zuul_console issue as well :) | 09:44 |
@jjbeckman:matrix.org | > Zuul console needs to run first so that it is launched in the "node" (pod) before the executor attempts to launch the port-forward | 09:44 |
So I need to tweak the executor source code? | ||
@jjbeckman:matrix.org | > Did you check that the console is actually running in the pod? | 09:44 |
I wasn't aware this was a thing. There should be a zuul_console process running in the node pod? | ||
@flaper87:matrix.org | This is what my pre.yaml for the base task looks like: | 09:46 |
@flaper87:matrix.org | well, at least a portion of it | 09:46 |
@flaper87:matrix.org | Once you have that, you can exec into one of the CI pods and run `ps aux` to get all the processes. You should see the console one | 09:47 |
@flaper87:matrix.org | Maybe get the output of listening ports to see if it's actually listening on the port | 09:47 |
@flaper87:matrix.org | `ss -lp | grep 19885` | 09:48 |
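If `ss` is not installed in the CI pod, a rough equivalent is to try connecting to the port from inside the pod. This is a sketch that assumes Python is available there; `is_listening` is a hypothetical helper, and the IPv4 loopback address is chosen because that is what the port-forward error above was dialing.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        # (for example ECONNREFUSED when nothing is listening).
        return sock.connect_ex((host, port)) == 0


# Example, run inside the CI pod, with the console port from the logs above:
#   print(is_listening(19885))
```

A `False` result here would match the "connection refused" seen by `kubectl port-forward`, meaning the console was not listening at the time of the check.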
@jjbeckman:matrix.org | Ahhh, I see what you mean now, thanks so much. | 09:50 |
@jjbeckman:matrix.org | I need to run now, but will definitely try what you've suggested. | 09:50 |
@yoctozepto:matrix.org | > <@yoctozepto:matrix.org> morning Zuulers; any preference regarding https://lists.zuul-ci.org/archives/list/zuul-discuss@lists.zuul-ci.org/thread/WUWBM5F3PXXDLKK6JNSP4UR4VTWDNPZ4/ ? I do not know how you handle such decisions to be honest... | 12:21 |
refreshing the message in case there are more eyes on this channel at this time | ||
@avass:matrix.vassast.org | I think it makes sense to remove helm v2 since support stopped ~3 years ago. | 12:26 |
@yoctozepto:matrix.org | thanks, Albin | 12:31 |
@fungicide:matrix.org | not kubernetes, but similar situation in opendev as well: https://opendev.org/opendev/base-jobs/src/commit/ca59b60/playbooks/base/pre.yaml#L45 | 13:38 |
@jim:acmegating.com | muneerefx: https://www.youtube.com/watch?v=vb0Iuf-6wHs&pp=ygUHenV1bCBjaQ%3D%3D is based on the tutorial | 13:43 |
@jim:acmegating.com | jjbeckman: https://zuul-ci.org/docs/zuul/latest/operation.html#log-streaming has some reference information | 13:44 |
-@gerrit:opendev.org- Zuul merged on behalf of Tobias Henkel: [zuul/nodepool] 885736: Fix typo that crashes playback worker when under load https://review.opendev.org/c/zuul/nodepool/+/885736 | 14:29 |