clarkb | fungi: also it is the testinfra addition that is causing images to be built. A bad idea, but an idea nonetheless: remove that and put it in a separate change? | 00:03 |
---|---|---|
corvus | i'm going to restart the scheduler on zuul01 | 00:12 |
corvus | db migration in progress | 00:17 |
corvus | migration complete; zuul01 is reconfiguring | 00:22 |
*** persia is now known as Guest566 | 00:24 | |
clarkb | ya 831064 looks like it will work. I suspect this is specifically a problem when two jobs try to push the same blob at roughly the same time and the blob being large makes it easier to meet that criterion (as the upload takes longer) | 00:27 |
corvus | work==fail testing as expected? | 00:29 |
corvus | zuul01 is up; restarting zuul02 now, including web/finger | 00:29 |
Clark[m] | corvus: no it succeeded because I removed the 3.5 Gerrit build job which keeps it and the 3.4 build from pushing at the same time and inducing the error | 00:33 |
Clark[m] | I think that we notice with Gerrit because we build two near identical images with layer overlap and some of those layers are large which means there is more opportunity to conflict with uploads in the registry | 00:33 |
corvus | got it | 00:36 |
corvus | there is an error due to the upgrade; i'm triaging the severity now | 00:37 |
corvus | okay, the fix is posted in #zuul; the error will prevent items currently in pipelines from being reported. | 00:46 |
corvus | infra-root: if we want, we can merge the zuul fix and i can do a scheduler restart with it. that process will probably take 45 minutes minimum, but it would both confirm the fix and allow us to resume with no loss of service. i'm happy to do that if we are okay with the pause in service in the meantime. | 00:47 |
corvus | if we don't want to do that, then we'll need to accept the loss of the current queues, and i can just delete them. i prefer option 1. | 00:48 |
corvus | it's worth noting that changes enqueued after the restart will not be affected by this bug | 00:50 |
Clark[m] | I can take a look really quick just need to pause dinner prep | 00:56 |
clarkb | yup lgtm. I think we can proceed with your plan | 00:58 |
corvus | btw, if you mouseover the word 'queued' on the status page now, it'll tell you the node request id it's waiting for, or if it's waiting for an executor. | 01:03 |
corvus | and.... since you can't copy that, i guess you could write it down on a piece of paper with pencil, then type it in.... :) | 01:04 |
corvus | or use links | 01:04 |
opendevreview | James E. Blair proposed openstack/project-config master: Add Zuul performance metrics dashboard https://review.opendev.org/c/openstack/project-config/+/831071 | 02:12 |
*** mazzy5098812929580857 is now known as mazzy509881292958085 | 03:05 | |
corvus | the fix merged; i'm restarting zuul01 now | 05:41 |
corvus | and zuul02 at the same time | 05:42 |
corvus | bunch of reports going to gerrit now. looks like that fixed the issue. | 05:47 |
corvus | there is a ux bug in zuul which shows items in pipelines which are not really there... i have proposed a fix (see #zuul room) | 18:13 |
*** dhill is now known as Guest648 | 22:14 |