ianw | fyi xenial is broken due to zuul console changes i made, that i forgot had to run on python 3.5 as well. fix in https://review.opendev.org/c/zuul/zuul/+/853073 | 01:22 |
fungi | thanks for the heads up, reviewing | 01:45 |
fungi | looks like it's already approved too | 01:48 |
ianw | thanks! i think the executors will need a restart to get new containers with the new version of the console | 02:05 |
ianw | i've pulled the new image, and am running a restart on executors in a root screen | 03:35 |
*** ysandeep is now known as ysandeep|holiday | 06:35 | |
*** arne_wiebalck_ is now known as arne_wiebalck | 06:47 | |
*** chandankumar is now known as chkumar|ruck | 07:37 | |
tkajinam | Hi. centos7 jobs are failing now because zuul console can't start in python 2. Do we have any way to work around this problem ? https://zuul.opendev.org/t/openstack/build/d2573574636f4d71a3f47a3221cd6414/log/job-output.txt#134-149 | 08:16 |
frickler | ianw: seems executors got stopped but not restarted? (just checked ze03+04) | 08:58 |
fungi | tkajinam: looks like https://review.opendev.org/853073 is expected to fix that once our executors are done restarting onto it | 09:38 |
tkajinam | fungi, thanks for the link. I'll rerun ci later after the executors are restarted. | 09:41 |
ianw | frickler: hrm it does it in two batches, 01-06 then 07-12, maybe it was still in the first phase? | 10:01 |
ianw | they seem ok for me now | 10:01 |
fungi | they don't normally get batched up, no. the restart playbook just runs through executors one at a time, putting the next one into graceful shutdown as soon as the previous is started | 10:39 |
fungi | though maybe there's a different playbook in use for this? looks like 07-12 are all paused at the same time: https://zuul.opendev.org/components | 10:40 |
fungi | actually, 12 seems to be missing entirely at the moment | 10:41 |
fungi | anyway, in theory none of the executors accepting jobs are running the problem version of zuul any longer, so hopefully previously broken jobs are back to passing now | 10:43 |
fungi | tkajinam: it's probably safe to recheck your earlier failed results at this point | 10:44 |
fungi | yeah, i guess this playbook waits for the entire batch of 6 to be offline before restarting any of them | 11:05 |
fungi | so we're just waiting on 07-09 to stop now | 11:05 |
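The batching behaviour fungi describes above can be pictured roughly as below. This is only an illustrative Python sketch of what the zuul_rolling_restart.yaml playbook appears to do, not its actual contents; the docker-compose path, compose service name, and exact commands are assumptions.

    import subprocess

    # ze01-ze12 are the production executors; a batch size of 6 matches the
    # behaviour observed above (07-12 all paused at the same time).
    EXECUTORS = ['ze%02d.opendev.org' % n for n in range(1, 13)]
    BATCH_SIZE = 6
    COMPOSE = 'docker-compose -f /etc/zuul-executor/docker-compose.yaml'  # assumed path

    def ssh(host, command):
        subprocess.check_call(['ssh', host, command])

    for start in range(0, len(EXECUTORS), BATCH_SIZE):
        batch = EXECUTORS[start:start + BATCH_SIZE]
        # ask every executor in the batch to stop gracefully (finish its jobs);
        # 'executor' as the compose service name is an assumption
        for host in batch:
            ssh(host, '%s exec executor zuul-executor graceful' % COMPOSE)
        # wait until every container in the batch has actually exited
        for host in batch:
            ssh(host, 'while docker ps | grep -q zuul-executor; do sleep 30; done')
        # then bring the whole batch back up on the freshly pulled image
        for host in batch:
            ssh(host, '%s up -d' % COMPOSE)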
*** dviroel|out is now known as dviroel | 11:38 | |
fungi | and all 12 executors are running again | 11:57 |
tkajinam | https://zuul.opendev.org/t/openstack/build/0e2f658c270542bbae990d2d3e27e8f8 | 11:58 |
tkajinam | Executor: ze05.opendev.org | 11:58 |
tkajinam | this job was picked up by ze05 but failed with the same error | 11:59 |
fungi | ianw: ^ "conn.send(f'{ZUUL_CONSOLE_PROTO_VERSION}\n'.encode('utf-8'))" so we've still got an f-string there? did we restart on an old image? | 12:03 |
fungi | zuul/zuul-executor latest 8816566c45da 13 hours ago 1.99GB | 12:04 |
fungi | ianw: i see the problem. zuul-promote-image failed on 853073 so the latest tag on dockerhub is for the version prior to your fix | 12:05 |
fungi | "Task Get dockerhub token failed running on host localhost" | 12:07 |
fungi | there hasn't been another change merged since that, so i'll try to reenqueue it | 12:08 |
fungi | seems to have succeeded that time | 12:12 |
fungi | now we need to re-pull images and redo the restart | 12:12 |
fungi | rerunning the pull playbook in that existing bridge screen | 12:13 |
fungi | zuul/zuul-executor latest 8ec19cce6431 10 hours ago 1.99GB | 12:16 |
fungi | that should be what we want now | 12:17 |
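One way to double-check that the re-pull picked up the promoted image and that a restarted executor container is actually running it; this is an illustrative sketch, and the container name 'zuul-executor' is an assumption about the deployment:

    import subprocess

    def docker(*args):
        return subprocess.check_output(('docker',) + args).decode().strip()

    # image id behind the freshly pulled tag
    pulled = docker('image', 'inspect', 'zuul/zuul-executor:latest',
                    '--format', '{{.Id}}')
    # image id the running executor container was created from
    running = docker('inspect', 'zuul-executor', '--format', '{{.Image}}')
    print('pulled :', pulled)
    print('running:', running)
    print('up to date' if pulled == running else 'container still on an old image')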
fungi | restarting zuul_rolling_restart.yaml now | 12:17 |
fungi | 01-06 are paused in graceful shutdown | 12:18 |
*** Guest5 is now known as rcastillo | 13:06 | |
*** rcastillo is now known as rcastillo|rover | 13:06 | |
fungi | just waiting for 03 to finish and then we'll be onto the second half and the problem *should* finally be fixed | 13:55 |
tristanC | fungi: is the zuul_rolling_restart.yaml playbook available somewhere? | 14:00 |
Clark[m] | tristanC: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_rolling_restart.yaml the zuul_reboot.yaml playbook may also be interesting | 14:08 |
Clark[m] | Based on zuul's component listing, 01-06 are up on a newer version than 07-12, and 07-12 are paused. | 14:11 |
Clark[m] | tkajinam I would expect jobs that start now to succeed. Can you confirm this? | 14:11 |
Clark[m] | I need to finish booting my morning then I can look more closely at this, but if all went as expected I suspect this is happy now | 14:12 |
* tkajinam triggered recheck | 14:12 | |
*** Guest167 is now known as dasm | 14:15 | |
tkajinam | Clark[m], yeah it's passing now. one c7 job was picked up by ze05 and the task to set up zuul console passed without failure | 14:18 |
tkajinam | fungi, thank you for your prompt help ! | 14:20 |
clarkb | FYI https://gerrit-review.googlesource.com/c/gerrit/+/342836 will break our zuul-results-summary plugin in Gerrit. They are asking for early feedback on slack and I've let them know about our use of that extension point | 15:25 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 15:34 |
*** dviroel is now known as dviroel|lunch | 15:49 | |
*** chkumar|ruck is now known as chandankumar | 15:51 | |
*** marios is now known as marios|out | 15:54 | |
*** dviroel|lunch is now known as dviroel | 17:01 | |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 17:06 |
opendevreview | Dr. Jens Harbott proposed opendev/system-config master: reprepro: mirror Ubuntu UCA Zed for Jammy https://review.opendev.org/c/opendev/system-config/+/853189 | 17:29 |
clarkb | ok that last mm3 patchset fixes the issue where we don't have any list archives or lists in hyperkitty yet (in the default install an hourly cron job populates those; we run the jobs manually during CI so the data gets updated regardless of timing) | 17:41 |
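For reference, hyperkitty's periodic jobs are django-extensions "runjobs" jobs, so running them by hand in a CI job would look roughly like this; the container name and manage.py invocation are assumptions about the deployment, not the actual system-config tasks:

    import subprocess

    # populate/refresh hyperkitty's lists, archives and index without waiting
    # for the hourly cron to fire
    for schedule in ('minutely', 'quarter_hourly', 'hourly', 'daily'):
        subprocess.check_call(
            ['docker', 'exec', 'mailman-web',
             'python3', 'manage.py', 'runjobs', schedule])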
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Ensure-java: update apt cache https://review.opendev.org/c/zuul/zuul-jobs/+/853190 | 17:54 |
clarkb | we are waiting on one last executor to finish up then 07-12 should update to the new image as well | 17:57 |
clarkb | ze09 specifically | 17:57 |
clarkb | all zuul executors are running the same updated version now. They are 4 commits ahead of the mergers and schedulers | 18:53 |
*** dviroel is now known as dviroel|brb | 19:31 | |
opendevreview | Merged zuul/zuul-jobs master: Ensure-java: update apt cache https://review.opendev.org/c/zuul/zuul-jobs/+/853190 | 20:19 |
clarkb | infra-root is there anything else that needs to be added to the meeting agenda? | 20:49 |
ianw | clarkb/fungi: thanks for that :/ i'm 0 for 2 on restarting executors to pick up fixes now i think | 21:47 |
fungi | it was easy to overlook. i didn't notice the promote failure until i went looking for why the bug seemed to still be there | 21:51 |
ianw | last time i checked the promote but forgot to pull the images. this time i had it in my head "make sure you pull the images", but thinking back i just assumed the new images had been pushed | 21:56 |
fungi | as for why it failed, hard to tell but looked like a rando failure getting an api token from dockerhub | 22:01 |
*** dviroel|brb is now known as dviroel | 22:08 | |
*** dviroel is now known as dviroel|out | 22:12 | |
*** dasm is now known as dasm|off | 22:28 | |
clarkb | meeting agenda sent | 22:57 |