Friday, 2025-09-26

-@gerrit:opendev.org- Benjamin Schanzel proposed: [zuul/zuul] 962177: web: Upgrade re-ansi dependency to latest 0.7.4 https://review.opendev.org/c/zuul/zuul/+/962177 08:41
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: 12:27
- [zuul/zuul] 960921: Handle launch failures with subnodes https://review.opendev.org/c/zuul/zuul/+/960921
- [zuul/zuul] 960924: Always associate nodes with providers https://review.opendev.org/c/zuul/zuul/+/960924
- [zuul/zuul] 960927: Launcher: add max-age https://review.opendev.org/c/zuul/zuul/+/960927
- [zuul/zuul] 961292: Launcher: handle reused node failure https://review.opendev.org/c/zuul/zuul/+/961292
- [zuul/zuul] 961557: Assign unassigned building nodes to requests https://review.opendev.org/c/zuul/zuul/+/961557
- [zuul/zuul] 962145: Use a subnode for request assignment https://review.opendev.org/c/zuul/zuul/+/962145
@clarkb:matrix.org tristanC: https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_designing this document has general information on designing zookeeper deployments, including the need for an odd number of nodes 15:26
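
Note on the odd-number recommendation above: ZooKeeper stays available only while a strict majority of the ensemble is up. The sketch below is plain arithmetic, not ZooKeeper code, and the function names are made up for illustration. It shows why going from 3 to 4 servers does not let the ensemble survive any more failures.

```python
# Illustrative majority-quorum arithmetic; function names are hypothetical.

def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"servers={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

With a single server nothing can fail without an outage, and an even-sized ensemble tolerates no more failures than the next smaller odd size.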
-@gerrit:opendev.org- Christoph Kulla proposed: [zuul/zuul] 962376: Always display the branch name in project tabs https://review.opendev.org/c/zuul/zuul/+/962376 15:40
@tristanc_:matrix.org Clark: thanks, we are looking into setting up 3 replicas. But I'm still surprised that restarting a single-node (single-replica) ZooKeeper causes nodepool-launcher to delete nodes that are in use. I would understand if the node were building, because of the ephemeral lock, but the in-use node status should persist in the ZooKeeper database. 15:45
@clarkb:matrix.org in-use nodes are also locked 15:50
@clarkb:matrix.org I think that what happens is when an in-use node is unlocked it is considered to be finished/no longer used and can be deleted 15:51
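
Note on the locking behaviour described above: the following is a toy in-memory model, not nodepool or ZooKeeper code. The class `ToyZooKeeper`, the function `deletable_nodes`, the paths, and the `in-use` state string are all assumptions made for illustration. It assumes locks are ephemeral entries tied to a client session, and that a cleanup pass treats any unlocked in-use node as finished, as suggested in the discussion.

```python
# Toy model only: all names and paths here are hypothetical.

class ToyZooKeeper:
    def __init__(self):
        self.persistent = {}  # path -> data; survives session loss
        self.ephemeral = {}   # lock path -> owning session id

    def set_data(self, path, data):
        self.persistent[path] = data

    def acquire_lock(self, path, session):
        """Take an ephemeral lock; fail if someone already holds it."""
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session
        return True

    def expire_session(self, session):
        """Ephemeral entries disappear together with their session."""
        self.ephemeral = {
            p: s for p, s in self.ephemeral.items() if s != session
        }

    def is_locked(self, path):
        return path in self.ephemeral


def deletable_nodes(zk):
    """Assumed cleanup rule: in-use but unlocked means 'finished'."""
    return sorted(
        path for path, data in zk.persistent.items()
        if data["state"] == "in-use" and not zk.is_locked(path + "/lock")
    )


if __name__ == "__main__":
    zk = ToyZooKeeper()
    zk.set_data("/nodes/0001", {"state": "in-use"})
    zk.acquire_lock("/nodes/0001/lock", session="executor-1")
    print(deletable_nodes(zk))   # lock held, nothing to delete
    zk.expire_session("executor-1")
    print(deletable_nodes(zk))   # lock gone, node looks finished
```

In this model the persistent state still says in-use after the session is lost, but the lock that protected the node is gone, so the cleanup rule can no longer tell a running build from a finished one.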
@tristanc_:matrix.org yes, but what we observed is that when we restart ZooKeeper with a running build, nodepool-launcher deletes the node and sometimes spawns another one with the same IP, which results in a weird Ansible panic where it is unable to continue, and somehow the job becomes a zombie in the status page. 15:57
@tristanc_:matrix.org perhaps that was always the case, and we never had that issue because previously, to perform a service upgrade, we would stop everything. But now we are trying to do a "graceful upgrade", without service interruption, and we are having trouble with restarting ZK 15:59
