ianw | fyi xenial is broken due to zuul console changes i made, that i forgot had to run on python 3.5 as well. fix in https://review.opendev.org/c/zuul/zuul/+/853073 | 01:22 |
fungi | thanks for the heads up, reviewing | 01:45 |
fungi | looks like it's already approved too | 01:48 |
ianw | thanks! i think the executors will need a restart to get new containers with the new version of the console | 02:05 |
ianw | i've pulled the new image, and am running a restart on executors in a root screen | 03:35 |
*** ysandeep is now known as ysandeep|holiday | 06:35 | |
*** arne_wiebalck_ is now known as arne_wiebalck | 06:47 | |
*** chandankumar is now known as chkumar|ruck | 07:37 | |
tkajinam | Hi. centos7 jobs are failing now because zuul console can't start in python 2. Do we have any way to work around this problem ? https://zuul.opendev.org/t/openstack/build/d2573574636f4d71a3f47a3221cd6414/log/job-output.txt#134-149 | 08:16 |
frickler | ianw: seems executors got stopped but not restarted? (just checked ze03+04) | 08:58 |
fungi | tkajinam: looks like https://review.opendev.org/853073 is expected to fix that once our executors are done restarting onto it | 09:38 |
tkajinam | fungi, thanks for the link. I'll rerun ci later after the executors are restarted. | 09:41 |
ianw | frickler: hrm it does it in two batches, 01-06 then 07-12, maybe it was still in the first phase? | 10:01 |
ianw | they seem ok for me now | 10:01 |
fungi | they don't normally get batched up, no. the restart playbook just runs through executors one at a time, putting the next one into graceful shutdown as soon as the previous is started | 10:39 |
fungi | though maybe there's a different playbook in use for this? looks like 07-12 are all paused at the same time: https://zuul.opendev.org/components | 10:40 |
fungi | actually, 12 seems to be missing entirely at the moment | 10:41 |
fungi | anyway, in theory none of the executors accepting jobs are running the problem version of zuul any longer, so hopefully previously broken jobs are back to passing now | 10:43 |
fungi | tkajinam: it's probably safe to recheck your earlier failed results at this point | 10:44 |
fungi | yeah, i guess this playbook waits for the entire batch of 6 to be offline before restarting any of them | 11:05 |
fungi | so we're just waiting on 07-09 to stop now | 11:05 |
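The batching behaviour fungi describes above can be pictured roughly as below. This is only an illustrative Python sketch of what the zuul_rolling_restart.yaml playbook appears to do, not its actual contents; the docker-compose path, compose service name, and exact commands are assumptions.

    import subprocess

    # ze01-ze12 are the production executors; a batch size of 6 matches the
    # behaviour observed above (07-12 all paused at the same time).
    EXECUTORS = ['ze%02d.opendev.org' % n for n in range(1, 13)]
    BATCH_SIZE = 6
    COMPOSE = 'docker-compose -f /etc/zuul-executor/docker-compose.yaml'  # assumed path

    def ssh(host, command):
        subprocess.check_call(['ssh', host, command])

    for start in range(0, len(EXECUTORS), BATCH_SIZE):
        batch = EXECUTORS[start:start + BATCH_SIZE]
        # ask every executor in the batch to stop gracefully (finish its jobs);
        # 'executor' as the compose service name is an assumption
        for host in batch:
            ssh(host, '%s exec executor zuul-executor graceful' % COMPOSE)
        # wait until every container in the batch has actually exited
        for host in batch:
            ssh(host, 'while docker ps | grep -q zuul-executor; do sleep 30; done')
        # then bring the whole batch back up on the freshly pulled image
        for host in batch:
            ssh(host, '%s up -d' % COMPOSE)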
*** dviroel|out is now known as dviroel | 11:38 | |
fungi | and all 12 executors are running again | 11:57 |
tkajinam | https://zuul.opendev.org/t/openstack/build/0e2f658c270542bbae990d2d3e27e8f8 | 11:58 |
tkajinam | Executor: ze05.opendev.org | 11:58 |
tkajinam | this job was picked up by ze05 but failed with the same error | 11:59 |
fungi | ianw: ^ "conn.send(f'{ZUUL_CONSOLE_PROTO_VERSION}\n'.encode('utf-8'))" so we've still got an f-string there? did we restart on an old image? | 12:03 |
fungi | zuul/zuul-executor latest 8816566c45da 13 hours ago 1.99GB | 12:04 |
fungi | ianw: i see the problem. zuul-promote-image failed on 853073 so the latest tag on dockerhub is for the version prior to your fix | 12:05 |
fungi | "Task Get dockerhub token failed running on host localhost" | 12:07 |
fungi | there hasn't been another change merged since that, so i'll try to reenqueue it | 12:08 |
fungi | seems to have succeeded that time | 12:12 |
fungi | now we need to re-pull images and redo the restart | 12:12 |
fungi | rerunning the pull playbook in that existing bridge screen | 12:13 |
fungi | zuul/zuul-executor latest 8ec19cce6431 10 hours ago 1.99GB | 12:16 |
fungi | that should be what we want now | 12:17 |
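One way to double-check that the re-pull picked up the promoted image and that a restarted executor container is actually running it; this is an illustrative sketch, and the container name 'zuul-executor' is an assumption about the deployment:

    import subprocess

    def docker(*args):
        return subprocess.check_output(('docker',) + args).decode().strip()

    # image id behind the freshly pulled tag
    pulled = docker('image', 'inspect', 'zuul/zuul-executor:latest',
                    '--format', '{{.Id}}')
    # image id the running executor container was created from
    running = docker('inspect', 'zuul-executor', '--format', '{{.Image}}')
    print('pulled :', pulled)
    print('running:', running)
    print('up to date' if pulled == running else 'container still on an old image')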
fungi | restarting zuul_rolling_restart.yaml now | 12:17 |
fungi | 01-06 are paused in graceful shutdown | 12:18 |
*** Guest5 is now known as rcastillo | 13:06 | |
*** rcastillo is now known as rcastillo|rover | 13:06 | |
fungi | just waiting for 03 to finish and then we'll be onto the second half and the problem *should* finally be fixed | 13:55 |
tristanC | fungi: is the zuul_rolling_restart.yaml playbook available somewhere? | 14:00 |
Clark[m] | tristanC: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul_rolling_restart.yaml the zuul_reboot.yaml playbook may also be interesting | 14:08 |
Clark[m] | Based on zuul's component listing, 01-06 are up on a newer version than 07-12, and 07-12 are paused. | 14:11 |
Clark[m] | tkajinam I would expect jobs that start now to succeed. Can you confirm this? | 14:11 |
Clark[m] | I need to finish booting my morning then I can look more closely at this, but if all went as expected I suspect this is happy now | 14:12 |
* tkajinam triggered recheck | 14:12 | |
*** Guest167 is now known as dasm | 14:15 | |
tkajinam | Clark[m], yeah it's passing now. one c7 job was picked up by ze05 and the task to set up zuul console passed without failure | 14:18 |
tkajinam | fungi, thank you for your prompt help ! | 14:20 |
clarkb | FYI https://gerrit-review.googlesource.com/c/gerrit/+/342836 will break our zuul-results-summary plugin in Gerrit. They are asking for early feedback on slack and I've let them know about our use of that extension point | 15:25 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 15:34 |
*** dviroel is now known as dviroel|lunch | 15:49 | |
*** chkumar|ruck is now known as chandankumar | 15:51 | |
*** marios is now known as marios|out | 15:54 | |
*** dviroel|lunch is now known as dviroel | 17:01 | |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 17:06 |
opendevreview | Dr. Jens Harbott proposed opendev/system-config master: reprepro: mirror Ubuntu UCA Zed for Jammy https://review.opendev.org/c/opendev/system-config/+/853189 | 17:29 |
clarkb | ok that last mm3 patchset fixes the issue where we don't have any list archives or lists in hyperkitty yet (in the default install an hourly cron job populates those; we run the jobs manually during CI so the data gets updated regardless of timing) | 17:41 |
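For reference, hyperkitty's periodic jobs are django-extensions "runjobs" jobs, so running them by hand in a CI job would look roughly like this; the container name and manage.py invocation are assumptions about the deployment, not the actual system-config tasks:

    import subprocess

    # populate/refresh hyperkitty's lists, archives and index without waiting
    # for the hourly cron to fire
    for schedule in ('minutely', 'quarter_hourly', 'hourly', 'daily'):
        subprocess.check_call(
            ['docker', 'exec', 'mailman-web',
             'python3', 'manage.py', 'runjobs', schedule])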
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Ensure-java: update apt cache https://review.opendev.org/c/zuul/zuul-jobs/+/853190 | 17:54 |
clarkb | we are waiting on one last executor to finish up then 07-12 should update to the new image as well | 17:57 |
clarkb | ze09 specifically | 17:57 |
clarkb | all zuul executors are running the same updated version now. They are 4 commits ahead of the mergers and schedulers | 18:53 |
*** dviroel is now known as dviroel|brb | 19:31 | |
opendevreview | Merged zuul/zuul-jobs master: Ensure-java: update apt cache https://review.opendev.org/c/zuul/zuul-jobs/+/853190 | 20:19 |
clarkb | infra-root is there anything else that needs to be added to the meeting agenda? | 20:49 |
ianw | clarkb/fungi: thanks for that :/ i'm 0 for 2 on restarting executors to pick up fixes now i think | 21:47 |
fungi | it was easy to overlook. i didn't notice the promote failure until i went looking for why the bug seemed to still be there | 21:51 |
ianw | last time i checked the promote but forgot to pull the images. this time i had it in my head "make sure you pull the images", but thinking back i just assumed the new images had been pushed | 21:56 |
fungi | as for why it failed, hard to tell but looked like a rando failure getting an api token from dockerhub | 22:01 |
*** dviroel|brb is now known as dviroel | 22:08 | |
*** dviroel is now known as dviroel|out | 22:12 | |
*** dasm is now known as dasm|off | 22:28 | |
clarkb | meeting agenda sent | 22:57 |