corvus | swest: you've looked into treecache a lot -- can you take a look at this test failure: https://f26c34eeb08a0feb3125-e1a0fd98d652767a6b80e489a96366df.ssl.cf1.rackcdn.com/796583/2/check/zuul-tox-py38/2e1df22/testr_results.html ? | 03:51 |
---|---|---|
corvus | swest: that's from https://zuul.opendev.org/t/zuul/build/2e1df22f21c64ef3a1baf2f81f7aeec4 . it has extra debugging information i added in https://review.opendev.org/796739 | 03:52 |
corvus | swest: it looks to me like the component registry tree cache is processing events out of order, and it processes the node added event (type 0) after the node deleted event (type 2). | 03:53 |
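Here is a minimal sketch of why that ordering matters. It is not Zuul's or kazoo's code: it is a toy cache that applies node events in the order it receives them. It relies only on the event numbering mentioned above (type 0 = added, type 2 = deleted), which matches kazoo's `TreeEvent.NODE_ADDED` and `TreeEvent.NODE_REMOVED`. The function name and the component path are hypothetical. If the delete is processed before the add, the cache keeps a stale entry for a component that no longer exists.

```python
# Event numbers as mentioned above: type 0 = added, type 2 = deleted.
NODE_ADDED = 0
NODE_REMOVED = 2


def apply_events(events):
    """Apply (type, path) events to a toy registry dict in the order given."""
    registry = {}
    for event_type, path in events:
        if event_type == NODE_ADDED:
            registry[path] = "component data"
        elif event_type == NODE_REMOVED:
            registry.pop(path, None)  # nothing to remove if the add is still pending
    return registry


path = "/components/executor/host1"  # hypothetical path, for illustration only
in_order = [(NODE_ADDED, path), (NODE_REMOVED, path)]
out_of_order = [(NODE_REMOVED, path), (NODE_ADDED, path)]

print(apply_events(in_order))      # {}
print(apply_events(out_of_order))  # {'/components/executor/host1': 'component data'}
```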
*** zbr is now known as Guest2488 | 05:03 | |
tobiash[m] | corvus, clarkb I think https://review.opendev.org/c/zuul/zuul/+/795597 might have caused a regression in our system. Jobs that were on a crashed executor are no longer retried; instead they transition directly to the LOST state and are then failed. | 05:07 |
*** marios is now known as marios|ruck | 05:44 | |
*** raukadah is now known as chandankumar | 05:46 | |
opendevreview | Simon Westphahl proposed zuul/zuul master: Add missing timestamp to change mgmt events https://review.opendev.org/c/zuul/zuul/+/796761 | 06:25 |
*** rpittau|afk is now known as rpittau | 07:22 | |
tobiash[m] | corvus: this is a fix for a serialized gate for manually re-enqueued items when using the mqtt reporter ^ | 07:24 |
*** jpena|off is now known as jpena | 07:31 | |
*** raukadah is now known as chandankumar | 08:28 | |
opendevreview | Simon Westphahl proposed zuul/zuul master: Add missing timestamp to change mgmt events https://review.opendev.org/c/zuul/zuul/+/796761 | 08:51 |
opendevreview | Matthieu Huin proposed zuul/zuul master: Remove another shebang and remove useless exec bits https://review.opendev.org/c/zuul/zuul/+/729240 | 10:02 |
opendevreview | Matthieu Huin proposed zuul/zuul master: Remove another shebang and remove useless exec bits https://review.opendev.org/c/zuul/zuul/+/729240 | 10:33 |
opendevreview | Matthieu Huin proposed zuul/zuul master: Revert to pyJWT 1.7.1 https://review.opendev.org/c/zuul/zuul/+/796817 | 10:33 |
masterpe[m] | I'm trying to upgrade zuul, but I get the error: "Error: pg_config executable not found." | 10:39 |
masterpe[m] | I get this error when I run sudo pip3 install . | 10:40 |
masterpe[m] | I think I found it: libpq-dev is missing from the bindep.txt file | 10:44 |
masterpe[m] | After the upgrade I get the error: "Database configuration is required". I remember hearing that a database is now required, but is there any documentation on that? | 10:51 |
guillaumec | masterpe: https://zuul-ci.org/docs/zuul/reference/releasenotes.html#relnotes-4-0-0 "database" , https://zuul-ci.org/docs/zuul/reference/releasenotes.html#relnotes-4-0-0-deprecation-notes | 11:03 |
*** jpena is now known as jpena|lunch | 11:34 | |
swest | corvus: I was able to reproduce this in a small test case and it seems to be a bug in the TreeCache: https://gist.github.com/westphahl/60097e6aa0f9c2d32f517eba5f5baefe | 11:37 |
swest | corvus: if the async processing of the UPDATED event takes a little longer (simulated by a print statement in my test), the DELETED event is processed first, which produces a wrong TreeEvent and therefore an inconsistent state. | 11:41 |
fungi | tobiash[m]: we noticed the lost state as well. i think the explanation is that the scheduler is currently paying attention to zookeeper to track that, and once we shift the results off gearman we'll get retries back? | 12:32 |
*** jpena|lunch is now known as jpena | 12:32 | |
masterpe[m] | The latest zuul requires at least zookeeper 3.5.1. I'm using Ubuntu 18.04, and Bionic still ships 3.4.13-5build1. Is there a ppa that I need to use? | 12:33 |
fungi | masterpe[m]: i'm not aware of any deb packages for zookeeper. ubuntu slurps their packages in from debian/sid, which has been frozen for a while in preparation for the bullseye release... i think pretty much everyone is either deploying the upstream zookeeper release tarballs (with dependencies satisfied from distribution packages), or more commonly running a zookeeper docker container | 12:38 |
fungi | to get zk >=3.5.1 i mean | 12:38 |
masterpe[m] | But if I switch my setup to docker, do I need to remove gerrit from the docker-compose file? | 12:49 |
masterpe[m] | I'm going to use a gitlab setup. | 12:50 |
guillaumec | masterpe: https://review.opendev.org/c/zuul/zuul/+/795540 , "ansible-playbook -v playbooks/tutorial/run-localtest-gitlab-quick-start.yaml" | 12:52 |
masterpe[m] | Gitlab is not managed by me, and I already have a running config with zuul and gitlab. But this config is fairly old. | 12:55 |
masterpe[m] | But I will give this a look. | 12:55 |
masterpe[m] | So in my setup gitlab will not be included in the docker compose; it will be an "external" service. | 12:58 |
masterpe[m] | But https://review.opendev.org/c/zuul/zuul/+/795540/8/doc/source/examples/gitlab-docker-compose.yaml is a nice example I can use. | 12:58 |
corvus | tobiash, fungi: yes -- that's a regression, but it's better than what was caused by https://review.opendev.org/782939 which made builds on crashed executors stuck and never report. once we finish the other half of the move to zk, we'll have the old behavior back. that's tested and confirmed in https://review.opendev.org/794687 which is ready for review. | 13:20 |
corvus | swest: that is a very weird bug; the mixup of the event type and data suggests it's not just an event sequencing issue. do you know the mechanism? and do you think it's worth fixing, or should we replace it with our own implementation (borrowed from the config cache or executor api)? | 13:28 |
*** marios|ruck is now known as marios|call | 14:02 | |
swest | corvus: the problem is that the delete event is forwarded directly to the event listener, whereas the updated event first triggers an async get of the data and only dispatches the event to the listener after that. | 14:07 |
swest | corvus: I haven't checked if there is a way to fix the treecache to dispatch events to the listener in the same order they arrive from ZK | 14:07 |
corvus | swest: wow, that's unfortunate. i wonder why they fetch the data async | 14:24 |
swest | corvus: when the del event is forwarded they will just attach whatever (old) data is currently in the cache. | 14:25 |
swest | https://github.com/python-zk/kazoo/blob/6337fd6f72b59fb20886f980f2e0d6d41525dc35/kazoo/recipe/cache.py#L253:L266 | 14:27 |
corvus | swest: i'm inclined to say let's file a bug on kazoo and switch component registry to our own implementation | 14:28 |
corvus | swest: you want to file the bug (since you have that nice reproducer) and i'll start on the replacement? | 14:29 |
corvus | handling watches is fresh in my mind from digging into the executor api :) | 14:29 |
*** marios|call is now known as marios|ruck | 14:44 | |
*** rpittau is now known as rpittau|afk | 16:09 | |
*** sshnaidm is now known as sshnaidm|afk | 16:16 | |
*** marios|ruck is now known as marios|out | 16:28 | |
*** jpena is now known as jpena|off | 16:36 | |
opendevreview | James E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper https://review.opendev.org/c/zuul/zuul/+/788988 | 18:17 |
opendevreview | James E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler https://review.opendev.org/c/zuul/zuul/+/794687 | 18:17 |
opendevreview | James E. Blair proposed zuul/zuul master: Handle errors in the executor main loop https://review.opendev.org/c/zuul/zuul/+/796583 | 18:17 |
swest | corvus: yep, can file the bug tomorrow | 18:46 |
opendevreview | James E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper https://review.opendev.org/c/zuul/zuul/+/788988 | 20:04 |
opendevreview | James E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler https://review.opendev.org/c/zuul/zuul/+/794687 | 20:04 |
opendevreview | James E. Blair proposed zuul/zuul master: Handle errors in the executor main loop https://review.opendev.org/c/zuul/zuul/+/796583 | 20:04 |
opendevreview | James E. Blair proposed zuul/zuul master: Replace TreeCache in component registry https://review.opendev.org/c/zuul/zuul/+/796582 | 22:14 |
opendevreview | James E. Blair proposed zuul/zuul master: Add ExecutorApi https://review.opendev.org/c/zuul/zuul/+/770902 | 22:15 |
opendevreview | James E. Blair proposed zuul/zuul master: Change zone handling in ExecutorApi https://review.opendev.org/c/zuul/zuul/+/787833 | 22:15 |
opendevreview | James E. Blair proposed zuul/zuul master: Switch to string constants in BuildRequest https://review.opendev.org/c/zuul/zuul/+/791849 | 22:15 |
opendevreview | James E. Blair proposed zuul/zuul master: Clean up Executor API build request locking and add tests https://review.opendev.org/c/zuul/zuul/+/788624 | 22:15 |
opendevreview | James E. Blair proposed zuul/zuul master: Fix race with watches in ExecutorAPI https://review.opendev.org/c/zuul/zuul/+/792300 | 22:15 |
opendevreview | James E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper https://review.opendev.org/c/zuul/zuul/+/788988 | 22:15 |
opendevreview | James E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler https://review.opendev.org/c/zuul/zuul/+/794687 | 22:15 |
opendevreview | James E. Blair proposed zuul/zuul master: Handle errors in the executor main loop https://review.opendev.org/c/zuul/zuul/+/796583 | 22:15 |
corvus | swest, felixedel: 796582 applies the executor api zk watcher/cache pattern (as it exists at the end of that stack) to the component registry. i really like it -- it seems robust and simple to understand. | 22:17 |
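The contents of 796582 are not shown here. To illustrate why a watch-driven cache can be immune to event ordering, the sketch below assumes a reconcile-on-callback pattern. Each watch callback re-reads the current child list and syncs the local cache against it, instead of applying individual add/remove deltas. (For example, kazoo's `ChildrenWatch` calls its function with the current list of children.) `reconcile` and the component names are hypothetical, and the real change may be structured differently.

```python
def reconcile(cache, children, fetch):
    """Sync `cache` (name -> data) with the current list of child nodes.

    The whole child list is re-read on every callback, so the outcome
    depends only on the latest state, not on the order in which earlier
    add/delete notifications were handled.
    """
    current = set(children)
    for name in list(cache):
        if name not in current:
            del cache[name]  # component has gone away
    for name in current:
        if name not in cache:
            data = fetch(name)
            if data is not None:  # None: node vanished between list and read
                cache[name] = data
    return cache


registry = {}
reconcile(registry, ["executor-1", "merger-1"], fetch=lambda n: {"name": n})
print(sorted(registry))  # ['executor-1', 'merger-1']

# executor-1 deregisters; whenever the callback runs, the cache converges.
reconcile(registry, ["merger-1"], fetch=lambda n: {"name": n})
print(sorted(registry))  # ['merger-1']
```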