Tuesday, 2024-09-24

-@gerrit:opendev.org- Tristan Cacqueray https://matrix.to/#/@tristanc_:matrix.org proposed: [zuul/zuul-jobs] 847111: Update ensure-ghc to use ghcup https://review.opendev.org/c/zuul/zuul-jobs/+/847111 00:31
@tristanc_:matrix.org Clark: here is the change to stop using the fedora packages for ensure-haskell ^ 00:33
@tristanc_:matrix.org And here was the announce: https://lists.zuul-ci.org/archives/list/zuul-discuss@lists.zuul-ci.org/thread/AWC4PQAC24Z7NFORVXR4HIDS5JE4UWNK/ 00:34
-@gerrit:opendev.org- Tristan Cacqueray https://matrix.to/#/@tristanc_:matrix.org proposed: [zuul/zuul-jobs] 847111: Update ensure-ghc to use ghcup https://review.opendev.org/c/zuul/zuul-jobs/+/847111 00:40
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 929900: Respect fail-fast setting for node failures https://review.opendev.org/c/zuul/zuul/+/929900 05:46
-@gerrit:opendev.org- Simon Westphahl proposed: 05:50
- [zuul/zuul] 928913: Retry node launches on failure https://review.opendev.org/c/zuul/zuul/+/928913
- [zuul/zuul] 929267: Avoid launcher busy-looping due to constant wake https://review.opendev.org/c/zuul/zuul/+/929267
- [zuul/zuul] 929273: Add support for 'min-ready' provider nodes https://review.opendev.org/c/zuul/zuul/+/929273
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: 05:50
- [zuul/zuul] 928959: Inherit some attributes from provider configs https://review.opendev.org/c/zuul/zuul/+/928959
- [zuul/zuul] 928971: Add missing AWS image configuration options https://review.opendev.org/c/zuul/zuul/+/928971
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 929273: Add support for 'min-ready' provider nodes https://review.opendev.org/c/zuul/zuul/+/929273 08:13
-@gerrit:opendev.org- Fredrik Medley proposed: [zuul/zuul] 929407: Inject queue_item_id and enqueue_time to inventory zuul.items https://review.opendev.org/c/zuul/zuul/+/929407 10:32
-@gerrit:opendev.org- Benedikt Löffler proposed: [zuul/nodepool] 930289: Increase threadpool for ImageUploader https://review.opendev.org/c/zuul/nodepool/+/930289 10:59
@jkkadgar:matrix.org When using the ansible "until" keyword, the zuul live log output will only show the first try and then is silent for all the retries. The final console log will show the last successful retry in console output. This causes confusion for long-running tasks we have that retry when a user is looking at the live log. Default ansible behavior shows "FAILED - RETRYING ... (5 retries left)." Has this been noticed by anyone, and is there a workaround? 15:42
@clarkb:matrix.org https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/build-container-image/tasks/push.yaml here is a playbook that uses until and runs fairly often, so we should have example logs for it; however, I suspect that 99% of the time it succeeds on the first try 15:46
@clarkb:matrix.org and no I haven't noticed that behavior myself 15:46
@clarkb:matrix.org https://opendev.org/zuul/zuul-registry/src/branch/master/playbooks/functional-test/docker-buildset.yaml#L28-L36 here's another that might be easier to trigger multiple retries for. I suspect we don't have any zuul-registry logs from the last 30 days, but you could push a trivial change to get that to run, then inspect the results to try and reproduce 15:49
@jangutter:matrix.org I've noticed that in ansible loops too, but I can't remember exactly if it was when including something or so. I think I saw it quite a while ago, looping over an include_role. 15:52
@jangutter:matrix.org Streaming large amount of things on the console feels like an antipattern to me. Sadly we do it a lot - partly in order to check if things are "still alive and running". 15:54
@jkkadgar:matrix.org I am simply expecting Zuul to do the same thing as running ansible locally. Ansible will output: "FAILED - RETRYING: Loop until job is complete (5 retries left)" in the console, but I think the Zuul console streamer probably ignores that output maybe? 15:57
@jangutter:matrix.org I'll need to check the code, but it might be hooking into one of the task reporting plugins that only get information when the task is complete. 15:58
@clarkb:matrix.org there are also test cases for console streaming in zuul's test suite that might be easier to manipulate to reproduce and debug with 15:59
@clarkb:matrix.org https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/action/command.py#L41-L43 might explain it though? 16:01
@jangutter:matrix.org Yeah, that would explain it on the command task. For "retries" there's possibly a different but similar explanation. 16:02
@clarkb:matrix.org command tasks are the only ones supported by the streaming stuff 16:02
@jkkadgar:matrix.org Ahh interesting, I am wondering if that was intentional for some other reason 16:02
@clarkb:matrix.org everything else is reported after the fact 16:02
@jangutter:matrix.org Ah, that answers that. 16:03
@jangutter:matrix.org So retries aren't streamed. 16:03
@clarkb:matrix.org https://opendev.org/zuul/zuul/commit/f8677cddb9dd397081acc30c47128b83b4061ae1 16:03
@clarkb:matrix.org though that change is simply trying to improve existing behavior so doesn't explain why this is done 16:04
@clarkb:matrix.org https://opendev.org/zuul/zuul/commit/1607143b213db0d63ab93389dda9939d3bcde7ec that is the reason I think 16:05
@clarkb:matrix.org oh wait no, that is still moving code around. Quite the thread to pull on 16:05
@clarkb:matrix.org https://opendev.org/zuul/zuul/commit/0ae7a157e2dc3c8c8779bc7b98ab7e24da40e087 16:06
@clarkb:matrix.org jkkadgar: it almost sounds like you're running an older zuul prior to that commit based on having the first one logged. But the commit message there says the reason is that there aren't hook points for loop iteration starts 16:07
@jkkadgar:matrix.org I am on 10.2.0 currently 16:07
@jkkadgar:matrix.org Ok so based off what I read it sounds like without significant rework this isn't possible to get 16:08
@jangutter:matrix.org Yeah - the zuul executor, in general, does not have a 1:1 correlation with an ansible-playbook command's output. It tends to use the result of the task (where all the info is collected) and has the special in-task streaming output for the command task in certain cases. The integration is deep enough that it's a bit complex to untangle. 16:09
@clarkb:matrix.org also I don't think ansible itself does live streaming. It only outputs when tasks or loop iterations complete. 16:11
@jangutter:matrix.org I think it prints retries, but I'm not sure where exactly that output callback happens. 16:11
@clarkb:matrix.org it might be worth checking that newer ansible hasn't added new hook points for loop iterations to make this possible. That seems unlikely but it could have happened 16:12
@clarkb:matrix.org jkkadgar: re versions I wonder if until has the old loop behavior prior to that last commit I found. I suspect that it is handled like a loop but maybe the task vars don't indicate it that way 16:14
@clarkb:matrix.org since 10.2 should be new enough to have that commit 16:14
@jkkadgar:matrix.org Yea that is odd, I think for now I may be able to work around this for the tasks where I need it by wrapping them in some block statement that would output after a single task run and then retry 16:17
@jangutter:matrix.org Ah, we have an old version of Zuul and I can confirm that retries are streamed, so we see things like this in the log: 16:17
```
2024-09-24 12:02:43.409395 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (120 retries left).
2024-09-24 12:03:43.956320 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (119 retries left).
2024-09-24 12:04:44.478419 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (118 retries left).
2024-09-24 12:05:45.013461 | kolladev | FAILED - RETRYING: [localhost]: Poll until rebuild is finished (usually takes ~ 45 minutes) (117 retries left).
```
@jangutter:matrix.org So at least for Zuul 4.2.0 this worked... 16:18
@clarkb:matrix.org jangutter: if you read the commit message it's likely incurring a long timeout for each iteration? 16:19
@clarkb:matrix.org not sure if the minute between iterations there is expected or part of the built-in minimum to make that work 16:19
@jangutter:matrix.org It's expected yeah, retry with a timeout of 60s 16:20
@jangutter:matrix.org Basically it's the stupid way of waiting for an external task to complete. 16:20
@jangutter:matrix.org The task is basically running a command against an external API asking "are you done yet", so the executor is effectively pausing the playbook till it's done. 16:24
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: 17:30
- [zuul/zuul] 926448: Implement validation reporter https://review.opendev.org/c/zuul/zuul/+/926448
- [zuul/zuul] 928101: Use uploaded image id when launching nodes https://review.opendev.org/c/zuul/zuul/+/928101
- [zuul/zuul] 928873: Adjust some AWS tests https://review.opendev.org/c/zuul/zuul/+/928873
- [zuul/zuul] 928950: Add some image upload debugging https://review.opendev.org/c/zuul/zuul/+/928950
- [zuul/zuul] 928874: Add aws snapshot image upload test https://review.opendev.org/c/zuul/zuul/+/928874
- [zuul/zuul] 928875: Add testing for AWS image import method https://review.opendev.org/c/zuul/zuul/+/928875
- [zuul/zuul] 928876: Test AWS ebs-direct image upload https://review.opendev.org/c/zuul/zuul/+/928876
- [zuul/zuul] 928959: Inherit some attributes from provider configs https://review.opendev.org/c/zuul/zuul/+/928959
- [zuul/zuul] 928971: Add missing AWS image configuration options https://review.opendev.org/c/zuul/zuul/+/928971
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Simon Westphahl: 17:30
- [zuul/zuul] 927376: Complete preliminary provider node lifecycle https://review.opendev.org/c/zuul/zuul/+/927376
- [zuul/zuul] 927388: Handle create state-machine errors as node failure https://review.opendev.org/c/zuul/zuul/+/927388
- [zuul/zuul] 928785: Store multiple provider node launch attempts https://review.opendev.org/c/zuul/zuul/+/928785
- [zuul/zuul] 928913: Retry node launches on failure https://review.opendev.org/c/zuul/zuul/+/928913
- [zuul/zuul] 929267: Avoid launcher busy-looping due to constant wake https://review.opendev.org/c/zuul/zuul/+/929267
- [zuul/zuul] 929273: Add support for 'min-ready' provider nodes https://review.opendev.org/c/zuul/zuul/+/929273
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 930387: Provide config error information for dependency cycles https://review.opendev.org/c/zuul/zuul/+/930387 21:46
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 930387: Provide config error information for dependency cycles https://review.opendev.org/c/zuul/zuul/+/930387 22:18
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 930389: Support docker-compose v2 in test setup script https://review.opendev.org/c/zuul/nodepool/+/930389 22:33
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 930389: Support docker-compose v2 in test setup script https://review.opendev.org/c/zuul/nodepool/+/930389 22:43
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 930390: Update node external_id in ZK on retries https://review.opendev.org/c/zuul/nodepool/+/930390 22:45
