frickler | corvus: zuul shows https://review.opendev.org/825849 to be in the check queue since 38h, so likely not related to the zuul restart, but maybe worth looking at anyway? otherwise gibi likely would have to rebase or something like that, since the "recheck" didn't help when checks are still assumed to be running | 09:22 |
---|---|---|
fungi | frickler: that seems like it may coincide with when i downed all the executor containers on friday and rebooted the servers | 12:59 |
fungi | i wonder if one of the builds didn't get rescheduled | 12:59 |
*** clarkb is now known as Guest287 | 13:55 | |
fungi | last mention of that build (97381c05a56d4d6b99bf55a8b1a30a9b) was 2022-01-21 19:31:05,974 on ze07.opendev.org when i started the executor container after the reboot | 14:48 |
fungi | "Held status set to False" and then "Deleting stale jobdir" | 14:48 |
fungi | last mention in the scheduler logs was 2022-01-21 18:31:36,245 on zuul01 when the build was being started | 14:53 |
fungi | the executor debug log has this, which i think is what should have recorded the build getting prematurely terminated? | 14:55 |
fungi | 2022-01-21 19:19:48,906 DEBUG zuul.ExecutorQueue: [e: d8d565d62f874a67846ae1170f993e9c] [build: 97381c05a56d4d6b99bf55a8b1a30a9b] Updating request <BuildRequest 97381c05a56d4d6b99bf55a8b1a30a9b, job=tempest-ipv6-only, state=completed, path=/zuul/executor/unzoned/requests/97381c05a56d4d6b99bf55a8b1a30a9b zone=None> | 14:55 |
fungi | so if that request is marked complete, why haven't the schedulers acted on it? | 14:58 |
*** diablo_rojo_phone is now known as Guest292 | 15:16 | |
*** Guest287 is now known as clarkb | 16:44 | |
clarkb | fungi: I would assume some sort of miss of a zk state change | 16:44 |
clarkb | either because it didn't happen or the scheduler didn't process it properly? | 16:45 |
clarkb | I was going to push a change to compare gerrit config files but it seems ianw already did that in the 3.4 upgrade prep work (based on the etherpad content) | 16:47 |
opendevreview | Clark Boylan proposed opendev/bindep master: DNM This is a simple reproducer change for a setuptools bug https://review.opendev.org/c/opendev/bindep/+/825973 | 17:02 |
opendevreview | Clark Boylan proposed opendev/bindep master: DNM This is a simple reproducer change for a setuptools bug https://review.opendev.org/c/opendev/bindep/+/825973 | 17:04 |
*** rlandy__ is now known as rlandy | 17:34 | |
corvus | there is a known race condition with executors crashing and leaving jobs stuck. you can probably just dequeue/enqueue the change to fix. | 18:54 |
fungi | got it, so nothing really to look into which warrants leaving the item enqueued? | 18:55 |
opendevreview | Neil Hanlon proposed openstack/diskimage-builder master: Add new container element - Rocky Linux https://review.opendev.org/c/openstack/diskimage-builder/+/825957 | 22:13 |
opendevreview | Neil Hanlon proposed openstack/diskimage-builder master: Add new container element - Rocky Linux https://review.opendev.org/c/openstack/diskimage-builder/+/825957 | 22:20 |
fungi | frickler: i was tempted to try out the authenticated webui for this, but laziness won over since i'm on the sofa with a system that doesn't have easy access to my web id, so i ran zuul-client dequeue and then enqueue on one of the schedulers with the following parameters and fresh builds are now in progress: --tenant=openstack --pipeline=check --project=openstack/placement --change=825849,1 | 22:22 |
fungi | the closest solution for a normal user would be to abandon and restore the change on gerrit, which should hopefully trigger similar actions | 22:23 |
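
The dequeue/enqueue pair fungi describes at 22:22 can be sketched as follows. This is a minimal illustration, not a record of the exact commands that were run: the log gives only the subcommand names and the four parameters, so the helper name `requeue_commands` is hypothetical, and the script only prints the command lines rather than executing them. It assumes `zuul-client` is installed and already configured with credentials for the tenant.

```python
import shlex


def requeue_commands(tenant, pipeline, project, change):
    """Build argv lists for a dequeue followed by an enqueue of one change.

    Hypothetical helper: both commands share the same four parameters,
    taken from the values quoted in the log.
    """
    common = [
        f"--tenant={tenant}",
        f"--pipeline={pipeline}",
        f"--project={project}",
        f"--change={change}",
    ]
    # Dequeue first so the stuck item is dropped, then enqueue to start fresh builds.
    return [["zuul-client", action, *common] for action in ("dequeue", "enqueue")]


if __name__ == "__main__":
    # Parameters as quoted by fungi; nothing is executed, only printed for review.
    for argv in requeue_commands(
        "openstack", "check", "openstack/placement", "825849,1"
    ):
        print(shlex.join(argv))
```

Printing rather than running keeps the sketch safe to try anywhere; an operator would paste the two lines into a shell on a host where `zuul-client` can reach the scheduler.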