-@gerrit:opendev.org- Tristan Cacqueray https://matrix.to/#/@tristanc_:matrix.org proposed: [zuul/zuul-jobs] 847111: Update ensure-ghc to use ghcup https://review.opendev.org/c/zuul/zuul-jobs/+/847111 | 00:31 | |
@tristanc_:matrix.org | Clark: here is the change to stop using the fedora packages for ensure-haskell ^ | 00:33 |
---|---|---|
@tristanc_:matrix.org | And here was the announce: https://lists.zuul-ci.org/archives/list/zuul-discuss@lists.zuul-ci.org/thread/AWC4PQAC24Z7NFORVXR4HIDS5JE4UWNK/ | 00:34 |
-@gerrit:opendev.org- Tristan Cacqueray https://matrix.to/#/@tristanc_:matrix.org proposed: [zuul/zuul-jobs] 847111: Update ensure-ghc to use ghcup https://review.opendev.org/c/zuul/zuul-jobs/+/847111 | 00:40 | |
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 929900: Respect fail-fast setting for node failures https://review.opendev.org/c/zuul/zuul/+/929900 | 05:46 | |
-@gerrit:opendev.org- Simon Westphahl proposed: | 05:50 | |
- [zuul/zuul] 928913: Retry node launches on failure https://review.opendev.org/c/zuul/zuul/+/928913 | ||
- [zuul/zuul] 929267: Avoid launcher busy-looping due to constant wake https://review.opendev.org/c/zuul/zuul/+/929267 | ||
- [zuul/zuul] 929273: Add support for 'min-ready' provider nodes https://review.opendev.org/c/zuul/zuul/+/929273 | ||
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: | 05:50 | |
- [zuul/zuul] 928959: Inherit some attributes from provider configs https://review.opendev.org/c/zuul/zuul/+/928959 | ||
- [zuul/zuul] 928971: Add missing AWS image configuration options https://review.opendev.org/c/zuul/zuul/+/928971 | ||
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 929273: Add support for 'min-ready' provider nodes https://review.opendev.org/c/zuul/zuul/+/929273 | 08:13 | |
-@gerrit:opendev.org- Fredrik Medley proposed: [zuul/zuul] 929407: Inject queue_item_id and enqueue_time to inventory zuul.items https://review.opendev.org/c/zuul/zuul/+/929407 | 10:32 | |
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 930289: Increase threadpool for ImageUploader https://review.opendev.org/c/zuul/nodepool/+/930289 | 10:59 | |
@jkkadgar:matrix.org | When using the ansible "until" keyword, the zuul live log output will only show the first try and then is silent for all the retries. The final console log will show the last successful retry in console output. This causes confusion for long-running tasks we have that retry when a user is looking at the live log. Default ansible behavior shows "FAILED - RETRYING ... (5 retries left)." Has this been noticed by anyone, and/or is there a workaround? | 15:42 |
@clarkb:matrix.org | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/build-container-image/tasks/push.yaml here is a playbook that uses until that runs fairly often that we should have example logs for however I suspect that 99% of the time it succeeds on the first try | 15:46 |
@clarkb:matrix.org | and no I haven't noticed that behavior myself | 15:46 |
@clarkb:matrix.org | https://opendev.org/zuul/zuul-registry/src/branch/master/playbooks/functional-test/docker-buildset.yaml#L28-L36 here's another that might be easier to trigger multiple retries for. I suspect we don't have any zuul-registry logs from the last 30 days, but you could push a trivial change to get that to run, then inspect the results to try and reproduce | 15:49 |
@jangutter:matrix.org | I've noticed that in ansible loops too, but I can't remember exactly if it was when including something or so. I think I saw it quite a while ago, looping over an include_role. | 15:52 |
@jangutter:matrix.org | Streaming large amount of things on the console feels like an antipattern to me. Sadly we do it a lot - partly in order to check if things are "still alive and running". | 15:54 |
@jkkadgar:matrix.org | I am simply expecting Zuul to do the same thing as running ansible locally. Ansible will output "FAILED - RETRYING: Loop until job is complete (5 retries left)" in the console, but I think the Zuul console streamer probably ignores that output? | 15:57 |
@jangutter:matrix.org | I'll need to check the code, but it might be hooking into one of the task reporting plugins that only get information when the task is complete. | 15:58 |
@clarkb:matrix.org | there are also test cases for console streaming in zuul's test suite that might be easier to manipulate to reproduce and debug with | 15:59 |
@clarkb:matrix.org | https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/action/command.py#L41-L43 might explain it though? | 16:01 |
@jangutter:matrix.org | Yeah, that would explain it on the command task. For "retries" there's possibly a different but similar explanation. | 16:02 |
@clarkb:matrix.org | command tasks are the only ones supported by the streaming stuff | 16:02 |
@jkkadgar:matrix.org | Ahh interesting, I am wondering if that was intentional for some other reason | 16:02 |
@clarkb:matrix.org | everything else is reported after the fact | 16:02 |
@jangutter:matrix.org | Ah, that answers that. | 16:03 |
@jangutter:matrix.org | So retries aren't streamed. | 16:03 |
@clarkb:matrix.org | https://opendev.org/zuul/zuul/commit/f8677cddb9dd397081acc30c47128b83b4061ae1 | 16:03 |
@clarkb:matrix.org | though that change is simply trying to improve existing behavior so doesn't explain why this is done | 16:04 |
@clarkb:matrix.org | https://opendev.org/zuul/zuul/commit/1607143b213db0d63ab93389dda9939d3bcde7ec that is the reason I think | 16:05 |
@clarkb:matrix.org | oh wait, no, that is still moving code around. Quite the thread to pull on | 16:05 |
@clarkb:matrix.org | https://opendev.org/zuul/zuul/commit/0ae7a157e2dc3c8c8779bc7b98ab7e24da40e087 | 16:06 |
@clarkb:matrix.org | jkkadgar: it almost sounds like you're running an older zuul prior to that commit based on having the first one logged. But the commit message says there that the reason is there aren't hook points for loop iteration starts | 16:07 |
@jkkadgar:matrix.org | I am on 10.2.0 currently | 16:07 |
@jkkadgar:matrix.org | Ok so based off what I read it sounds like without significant rework this isn't possible to get | 16:08 |
@jangutter:matrix.org | Yeah - the zuul executor, in general, does not have a 1:1 correlation with an ansible-playbook command's output. It tends to use the result of the task (where all the info is collected) and has the special in-task streaming output for the command task in certain cases. The integration is deep enough that it's a bit complex to untangle. | 16:09 |
@clarkb:matrix.org | also I don't think ansible itself does live streaming. It only outputs when tasks or loop iterations complete. | 16:11 |
@jangutter:matrix.org | I think it prints retries, but I'm not sure where exactly that output callback happens. | 16:11 |
@clarkb:matrix.org | it might be worth checking that newer ansible hasn't added new hook points for loop iterations to make this possible. That seems unlikely but it could have happened | 16:12 |
@clarkb:matrix.org | jkkadgar: re versions I wonder if until has the old loop behavior prior to that last commit I found. I suspect that it is handled like a loop but maybe the task vars don't indicate it that way | 16:14 |
@clarkb:matrix.org | since 10.2 should be new enough to have that commit | 16:14 |
@jkkadgar:matrix.org | Yea that is odd. I think for now I may be able to work around this for the tasks where I need it by wrapping them in some block statement that would output after a single task run and then retry | 16:17 |
@jangutter:matrix.org | Ah, we have an old version of Zuul and I can confirm that retries are streamed, so we see things like this in the log: | 16:17 |
``` | ||
2024-09-24 12:02:43.409395 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (120 retries left). | ||
2024-09-24 12:03:43.956320 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (119 retries left). | ||
2024-09-24 12:04:44.478419 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (118 retries left). | ||
2024-09-24 12:05:45.013461 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (117 retries left). | ||
``` | ||
@jangutter:matrix.org | So at least for Zuul 4.2.0 this worked... | 16:18 |
@clarkb:matrix.org | jangutter: if you read the commit message, it's likely incurring a long timeout for each iteration? | 16:19 |
@clarkb:matrix.org | not sure if the minute between iterations there is expected or part of the built-in minimum to make that work | 16:19 |
@jangutter:matrix.org | It's expected yeah, retry with a timeout of 60s | 16:20 |
@jangutter:matrix.org | Basically it's the stupid way of waiting for an external task to complete. | 16:20 |
@jangutter:matrix.org | The task is basically running a command against an external API asking "are you done yet", so the executor is effectively pausing the playbook till it's done. | 16:24 |
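
Editor's sketch, not something proposed in the channel: since the discussion above notes that command tasks are the only ones covered by live streaming, one possible workaround is to move the retry loop out of Ansible's `until` and into a small script run by a single command task, printing one line per attempt. The function name `poll`, its parameters, and the message wording are all hypothetical; whether each line reaches the Zuul live log promptly is an assumption that would need checking on a real deployment.

```python
import sys
import time


def poll(check, retries, delay, out=sys.stdout, sleep=time.sleep):
    """Call check() up to `retries` times and report every attempt.

    `check` is a hypothetical caller-supplied function that returns True
    once the external job is finished. Returns True on the first success,
    False if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        if check():
            print(f"OK after {attempt} attempt(s)", file=out, flush=True)
            return True
        left = retries - attempt
        # Flush on every line so the output is not held back in a buffer
        # until the script exits.
        print(f"FAILED - RETRYING ({left} retries left)", file=out, flush=True)
        if left:
            sleep(delay)
    return False
```

A wrapper script would call `poll` with a real check (for example, a request to the external API asking whether the job is done) and exit non-zero when it returns False, so the command task fails the same way an exhausted `until` loop would.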
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 17:30 | |
- [zuul/zuul] 926448: Implement validation reporter https://review.opendev.org/c/zuul/zuul/+/926448 | ||
- [zuul/zuul] 928101: Use uploaded image id when launching nodes https://review.opendev.org/c/zuul/zuul/+/928101 | ||
- [zuul/zuul] 928873: Adjust some AWS tests https://review.opendev.org/c/zuul/zuul/+/928873 | ||
- [zuul/zuul] 928950: Add some image upload debugging https://review.opendev.org/c/zuul/zuul/+/928950 | ||
- [zuul/zuul] 928874: Add aws snapshot image upload test https://review.opendev.org/c/zuul/zuul/+/928874 | ||
- [zuul/zuul] 928875: Add testing for AWS image import method https://review.opendev.org/c/zuul/zuul/+/928875 | ||
- [zuul/zuul] 928876: Test AWS ebs-direct image upload https://review.opendev.org/c/zuul/zuul/+/928876 | ||
- [zuul/zuul] 928959: Inherit some attributes from provider configs https://review.opendev.org/c/zuul/zuul/+/928959 | ||
- [zuul/zuul] 928971: Add missing AWS image configuration options https://review.opendev.org/c/zuul/zuul/+/928971 | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: | 17:30 | |
- [zuul/zuul] 927376: Complete preliminary provider node lifecycle https://review.opendev.org/c/zuul/zuul/+/927376 | ||
- [zuul/zuul] 927388: Handle create state-machine errors as node failure https://review.opendev.org/c/zuul/zuul/+/927388 | ||
- [zuul/zuul] 928785: Store multiple provider node launch attempts https://review.opendev.org/c/zuul/zuul/+/928785 | ||
- [zuul/zuul] 928913: Retry node launches on failure https://review.opendev.org/c/zuul/zuul/+/928913 | ||
- [zuul/zuul] 929267: Avoid launcher busy-looping due to constant wake https://review.opendev.org/c/zuul/zuul/+/929267 | ||
- [zuul/zuul] 929273: Add support for 'min-ready' provider nodes https://review.opendev.org/c/zuul/zuul/+/929273 | ||
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 930387: Provide config error information for dependency cycles https://review.opendev.org/c/zuul/zuul/+/930387 | 21:46 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 930387: Provide config error information for dependency cycles https://review.opendev.org/c/zuul/zuul/+/930387 | 22:18 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 930389: Support docker-compose v2 in test setup script https://review.opendev.org/c/zuul/nodepool/+/930389 | 22:33 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 930389: Support docker-compose v2 in test setup script https://review.opendev.org/c/zuul/nodepool/+/930389 | 22:43 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 930390: Update node external_id in ZK on retries https://review.opendev.org/c/zuul/nodepool/+/930390 | 22:45 |