Monday, 2022-08-15

01:22 <ianw> fyi xenial is broken due to zuul console changes i made, that i forgot had to run on python 3.5 as well.  fix in
01:45 <fungi> thanks for the heads up, reviewing
01:48 <fungi> looks like it's already approved too
02:05 <ianw> thanks!  i think the executors will need a restart to get new containers with the new version of the console
03:35 <ianw> i've pulled the new image, and am running a restart on executors in a root screen
06:35 *** ysandeep is now known as ysandeep|holiday
06:47 *** arne_wiebalck_ is now known as arne_wiebalck
07:37 *** chandankumar is now known as chkumar|ruck
<tkajinam> Hi. centos7 jobs are failing now because zuul console can't start in python 2. Do we have any way to work around this problem?
08:58 <frickler> ianw: seems executors got stopped but not restarted? (just checked ze03+04)
09:38 <fungi> tkajinam: looks like is expected to fix that once our executors are done restarting onto it
09:41 <tkajinam> fungi, thanks for the link. I'll rerun ci later after executors are restarted.
10:01 <ianw> frickler: hrm it does it in two batches, 0-6 then 7-12, maybe it was still in the first phase?
10:01 <ianw> they seem ok for me now
10:39 <fungi> they don't normally get batched up, no. the restart playbook just runs through executors one at a time, putting the next one into graceful shutdown as soon as the previous is started
<fungi> though maybe there's a different playbook in use for this? looks like 07-12 are all paused at the same time:
10:41 <fungi> actually, 12 seems to be missing entirely at the moment
10:43 <fungi> anyway, in theory none of the executors accepting jobs are running the problem version of zuul any longer, so hopefully previously broken jobs are back to passing now
10:44 <fungi> tkajinam: it's probably safe to recheck your earlier failed results at this point
11:05 <fungi> yeah, i guess this playbook waits for the entire batch of 6 to be offline before restarting any of them
11:05 <fungi> so we're just waiting on 07-09 to stop now
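The batching behaviour described above can be sketched as follows. This is an illustrative model only, not the contents of the actual zuul_rolling_restart.yaml playbook; the host names and the batch size of six are taken from the discussion:

```python
# Sketch (assumption, not the real Ansible logic) of why ze07-ze12 all
# pause at the same time: the playbook appears to process the executors
# in two batches of six, stopping every host in a batch before starting
# any of them again.
executors = ["ze%02d.opendev.org" % n for n in range(1, 13)]

def batches(hosts, size=6):
    """Split hosts into consecutive fixed-size batches."""
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]

for batch in batches(executors):
    for host in batch:
        print("graceful stop:", host)  # the whole batch goes offline first
    for host in batch:
        print("start:", host)          # only then are they started again
```

With a per-host strategy (stop one, start it, move on) at most one executor is ever down; with batches of six, half the fleet is paused at once, matching what was observed.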
11:38 *** dviroel|out is now known as dviroel
11:57 <fungi> and all 12 executors are running again
11:58 <tkajinam> Executor: ze05.opendev.org
11:59 <tkajinam> this job was picked up by ze05 but failed with the same error
12:03 <fungi> ianw: ^ "conn.send(f'{ZUUL_CONSOLE_PROTO_VERSION}\n'.encode('utf-8'))" so we've still got an f-string there? did we restart on an old image?
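The quoted line is the bug: f-strings are a Python 3.6+ feature, so that code fails on the Python 2.7 and 3.5 interpreters found on centos7 and xenial nodes. A version-agnostic rewrite might look like the sketch below; the protocol version value and the connection object are stand-ins for illustration, not zuul's actual code:

```python
ZUUL_CONSOLE_PROTO_VERSION = 1  # stand-in value for illustration


class FakeConn:
    """Minimal stand-in for a socket-like object with a send(bytes) method."""

    def __init__(self):
        self.sent = b""

    def send(self, data):
        self.sent += data


def send_proto_version(conn):
    # str.format() works on Python 2.7, 3.5 and later, unlike the
    # f-string in the line quoted above.
    conn.send('{0}\n'.format(ZUUL_CONSOLE_PROTO_VERSION).encode('utf-8'))


conn = FakeConn()
send_proto_version(conn)
print(conn.sent)  # b'1\n'
```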
12:04 <fungi> zuul/zuul-executor    latest    8816566c45da   13 hours ago    1.99GB
12:05 <fungi> ianw: i see the problem. zuul-promote-image failed on 853073 so the latest tag on dockerhub is for the version prior to your fix
12:07 <fungi> "Task Get dockerhub token failed running on host localhost"
12:08 <fungi> there hasn't been another change merged since that, so i'll try to reenqueue it
12:12 <fungi> seems to have succeeded that time
12:12 <fungi> now we need to re-pull images and redo the restart
12:13 <fungi> rerunning the pull playbook in that existing bridge screen
12:16 <fungi> zuul/zuul-executor    latest    8ec19cce6431   10 hours ago    1.99GB
12:17 <fungi> that should be what we want now
12:17 <fungi> restarting zuul_rolling_restart.yaml now
12:18 <fungi> 01-06 are paused in graceful shutdown
13:06 *** Guest5 is now known as rcastillo
13:06 *** rcastillo is now known as rcastillo|rover
13:55 <fungi> just waiting for 03 to finish and then we'll be onto the second half and the problem *should* finally be fixed
14:00 <tristanC> fungi: is the zuul_rolling_restart.yaml playbook available somewhere?
14:08 <Clark[m]> tristanC: the zuul_reboot.yaml playbook may also be interesting
14:11 <Clark[m]> Based on zuul's component listing 01-06 are up on a newer version than 07-12 and 07-12 are paused.
14:11 <Clark[m]> tkajinam: I would expect jobs that start now to succeed. Can you confirm this?
14:12 <Clark[m]> I need to finish booting my morning then I can look more closely at this, but if all went as expected I suspect this is happy now
14:12 * tkajinam triggered recheck
14:15 *** Guest167 is now known as dasm
14:18 <tkajinam> Clark[m], yeah it's passing now. one c7 job was picked up by ze05 and the task to set up zuul console passed without failure
14:20 <tkajinam> fungi, thank you for your prompt help!
15:25 <clarkb> FYI will break our zuul-results-summary plugin in Gerrit. They are asking for early feedback on slack and I've let them know about our use of that extension point
<opendevreview> Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server
15:49 *** dviroel is now known as dviroel|lunch
15:51 *** chkumar|ruck is now known as chandankumar
15:54 *** marios is now known as marios|out
17:01 *** dviroel|lunch is now known as dviroel
<opendevreview> Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server
<opendevreview> Dr. Jens Harbott proposed opendev/system-config master: reprepro: mirror Ubuntu UCA Zed for Jammy
17:41 <clarkb> ok that last mm3 patchset fixes the issue where we don't have any list archives or lists in hyperkitty yet (there is an hourly cron job that runs to populate that in the default install; we manually run the jobs during CI to ensure we update despite timing)
<opendevreview> James E. Blair proposed zuul/zuul-jobs master: Ensure-java: update apt cache
17:57 <clarkb> we are waiting on one last executor to finish up then 07-12 should update to the new image as well
17:57 <clarkb> ze09 specifically
18:53 <clarkb> all zuul executors are running the same updated version now. They are 4 commits ahead of the mergers and schedulers
19:31 *** dviroel is now known as dviroel|brb
<opendevreview> Merged zuul/zuul-jobs master: Ensure-java: update apt cache
20:49 <clarkb> infra-root is there anything else that needs to be added to the meeting agenda?
21:47 <ianw> clarkb/fungi: thanks for that :/  i'm 0 for 2 on restarting executors to pick up fixes now i think
21:51 <fungi> it was easy to overlook. i didn't notice the promote failure until i went looking for why the bug seemed to still be there
21:56 <ianw> last time i checked the promote, but forgot to pull the images.  i had it in my head this time "make sure you pull the images" but thinking back I just assumed the new images were pushed this time
22:01 <fungi> as for why it failed, hard to tell but looked like a rando failure getting an api token from dockerhub
22:08 *** dviroel|brb is now known as dviroel
22:12 *** dviroel is now known as dviroel|out
22:28 *** dasm is now known as dasm|off
22:57 <clarkb> meeting agenda sent

Generated by 2.17.3 by Marius Gedminas