Tuesday, 2022-03-29

*** soniya29 is now known as soniya29|rover		04:12
*** ysandeep|out is now known as ysandeep		04:35
*** soniya29|rover is now known as soniya29|rover|afk		05:13
*** marios is now known as marios|ruck		05:22
*** soniya29|rover|afk is now known as soniya29|rover		05:31
*** jpena|off is now known as jpena		07:38
*** ysandeep is now known as ysandeep|lunch		07:43
*** soniya29|rover is now known as soniya29|rover|afk		08:42
*** ysandeep|lunch is now known as ysandeep		08:56
*** soniya29|rover|afk is now known as soniya29|rover		09:23
*** yoctozepto_ is now known as yoctozepto		09:29
*** marios is now known as marios|ruck		09:45
*** rlandy|out is now known as rlandy		10:36
*** dviroel|out is now known as dviroel		11:28
frickler	does anyone know why we have "AllowEncodedSlashes On" in our zuul-web proxy config? https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/zuul-web/templates/zuul.vhost.j2#L19	12:30
fungi	that may have been copied from gerrit, which did need it	12:34
fungi	i'm not sure if zuul does or doesn't need to support passing encoded slashes	12:35
frickler	o.k., so I'll try to run without it for my local setup and see what happens, thx	12:39
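For context on the question above: an "encoded slash" is a `/` that appears as `%2F` inside a single URL path segment, and Apache refuses such URLs by default unless `AllowEncodedSlashes` is enabled. A minimal sketch of the encoding, using the project name `openstack/octavia` that appears later in this log. Whether zuul-web actually emits such paths is left open in the conversation, and `encode_path_segment` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote


def encode_path_segment(value: str) -> str:
    """Percent-encode a value so it fits in one URL path segment.

    safe="" makes quote() encode "/" as well (by default it leaves "/" alone).
    """
    return quote(value, safe="")


# Illustrative value only; taken from a project name mentioned in this log.
encoded = encode_path_segment("openstack/octavia")
print(encoded)
print(unquote(encoded))
```

A proxy that rejects `%2F` would refuse any request path containing the first printed value, which is the behaviour frickler's local test without the directive would expose.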
*** ysandeep is now known as ysandeep|afk		12:43
*** soniya29|rover is now known as soniya29|rover|call		12:45
*** ysandeep|afk is now known as ysandeep		12:58
*** soniya29|rover|call is now known as soniya29|rover|afk		13:21
*** soniya29|rover|afk is now known as soniya29|rover		13:38
iurygregory	Hey folks, does anyone know if we can update the schedule for the ptg using the ptg bot?	13:57
fungi	iurygregory: update the schedule in which way?	13:59
fungi	like you have things to add to the schedule, or things to remove?	14:00
iurygregory	https://ptg.opendev.org/ptg.html we have 13 UTC booked for example in Ocata	14:00
iurygregory	I was planning to move all ironic meetings to the Ocata room since it will be available (to make it easier for the contributors also)	14:01
iurygregory	and I also need to book 21-22 UTC on Tuesday =)	14:02
fungi	iurygregory: you probably want the book and unbook commands listed here: https://opendev.org/openstack/ptgbot/src/branch/master/README.rst#book	14:02
fungi	also remember the bot is in the #openinfra-events channel	14:02
iurygregory	ohhhhhhhhhhhh	14:02
iurygregory	this was the info I was looking for hehe	14:03
iurygregory	I was checking an old channel probably	14:03
iurygregory	tks fungi!	14:03
fungi	np	14:03
*** pojadhav is now known as pojadhav|dinner		14:27
gthiemonge	Hey Folks, we have a lot of NODE_FAILUREs with the Octavia centos-8-stream jobs: https://zuul.openstack.org/builds?job_name=octavia-v2-dsvm-scenario-centos-8-stream&project=openstack%2Foctavia&skip=0	14:48
gthiemonge	do you have more information on it?	14:49
fungi	gthiemonge: more generally, seems to be mostly hitting octavia? https://zuul.openstack.org/builds?result=NODE_FAILURE	14:51
gthiemonge	fungi: yeah but only centos jobs (c8, c8s, c9s... and *-fips jobs are also based on centos)	14:53
fungi	https://zuul.openstack.org/job/octavia-v2-dsvm-scenario-centos-8 uses nodeset octavia-single-node-centos-8 which requires the nested-virt-centos-8 label	14:53
fungi	so odds are we're having trouble booting that anywhere that offers it	14:53
fungi	i have to jump to a meeting in a moment but can try to look through logs in a bit	14:53
fungi	a month or so back, ericsson cancelled the citycloud account they were donating to us, which was one of the only providers where we set the nested-virt labels. maybe the other(s) are having some issue today	14:55
fungi	gthiemonge: oh, maybe you want nested-virt-centos-8-stream instead of nested-virt-centos-8	14:56
clarkb	certainly centos-8 should be gone	14:56
gthiemonge	fungi: some stable jobs are still using centos-8, we can try to fix it	14:57
fungi	nested-virt-centos-8 got removed when https://review.opendev.org/827184 merged at the beginning of last month	14:57
fungi	have your jobs been failing for two months?	14:58
clarkb	gthiemonge: no, centos-8 is completely removed from our ci system now	14:58
clarkb	any fixing involves not using centos-8	14:58
fungi	we re-added a "centos-8" alias to the centos-8-stream label as a workaround for a catch-22 where zuul wouldn't allow you to remove a label which already didn't exist	14:59
gthiemonge	clarkb: yeah I mean: fixing the jobs by switching to c8s	14:59
fungi	we might have to temporarily do the same to nested-virt-centos-8 if the removal changes fail with a config error	14:59
clarkb	fungi: I think for these more specific flavors it makes less sense to do that as they weren't widely used	15:01
clarkb	and users were already expected to debug and address issues with them	15:01
fungi	just bypass zuul to merge the removals instead?	15:04
clarkb	you shouldn't need to bypass zuul to remove jobs?	15:05
clarkb	I guess if this is another osa-like situation then maybe?	15:06
clarkb	more just thinking that the nested virt flavors are explicitly "use at your own risk and please help address problems with them" so doing the hack to keep things just working by magically swapping out the instance type seems wrong	15:06
fungi	wasn't the reason we added the alias that osa couldn't remove the jobs because zuul wanted the original state to be coherent?	15:06
clarkb	fungi: it was specifically because we removed the nodeset iirc and that was incoherent	15:08
clarkb	but in this case it seems to be NODE_FAILURE implying the nodeset is valid somehow?	15:08
fungi	but yeah, maybe we need a way to be able to force removal of jobs which depend on resources like node labels which we're removing, so that the configuration doesn't end up in an unresolvable state. then projects can add jobs back with a working configuration if they want	15:08
fungi	there is no nested-virt-centos-8 label defined in project-config, at least	15:09
fungi	is the catch-22 that zuul wants to run any jobs which are being altered, so needs to be able to resolve a coherent original state in order to make the comparison?	15:11
fungi	if that's the case, yeah i agree this situation seems to be different since it's actually trying to run	15:12
clarkb	fungi: I don't recall the cause of the need for the old config, but basically it needs a valid old config to update the new config. And in the osa cases we had invalid old configs because the nodeset was removed iirc	15:12
fungi	gthiemonge: anyway, the only master branch reference to that label seems to be here (according to codesearch), so see if just adding -stream to the end fixes it: https://opendev.org/openstack/octavia-tempest-plugin/src/branch/master/zuul.d/jobs.yaml#L35	15:16
gthiemonge	fungi: thanks, I will try that	15:17
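The fix fungi suggests amounts to swapping one node label for another in the job's nodeset. A minimal sketch of that substitution on an in-memory structure: the nodeset name and both labels come from the discussion above, while the dict layout, the node name `controller`, and the `switch_label` helper are assumptions made for illustration and are not copied from the octavia-tempest-plugin repository.

```python
from copy import deepcopy

OLD_LABEL = "nested-virt-centos-8"         # removed label, per the log
NEW_LABEL = "nested-virt-centos-8-stream"  # replacement suggested by fungi


def switch_label(nodeset: dict, old: str = OLD_LABEL, new: str = NEW_LABEL) -> dict:
    """Return a copy of the nodeset with every node using `old` moved to `new`."""
    updated = deepcopy(nodeset)
    for node in updated.get("nodes", []):
        if node.get("label") == old:
            node["label"] = new
    return updated


# Hypothetical nodeset; only the nodeset name and labels appear in the log.
nodeset = {
    "name": "octavia-single-node-centos-8",
    "nodes": [{"name": "controller", "label": "nested-virt-centos-8"}],
}

print(switch_label(nodeset))
```

In practice the change is a one-line edit to the YAML file fungi links; the sketch only makes explicit that nothing but the label string changes.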
*** dviroel is now known as dviroel|lunch		15:17
clarkb	another option is to try rocky (just putting it out there as stream has been a struggle at times due to its rolling nature)	15:18
fungi	sure, that's available too	15:18
*** ysandeep is now known as ysandeep|out		15:19
clarkb	infra-root if https://review.opendev.org/c/opendev/system-config/+/835307 looks good to you that may be a good one to land today	15:21
clarkb	updates gitea again (this time to a bug fix release)	15:21
*** pojadhav|dinner is now known as pojadhav		15:21
*** soniya29|rover is now known as soniya29|rover|dinner		15:44
*** dviroel|lunch is now known as dviroel		16:21
*** sfinucan is now known as stephenfin		16:31
*** soniya29|rover|dinner is now known as soniya29|rover		16:31
*** marios|ruck is now known as marios|out		16:34
clarkb	just occurred to me that updating gitea is probably fine tomorrow after the openstack release just to minimize risk	16:43
*** jpena is now known as jpena|off		16:46
corvus	clarkb, fungi: the zuul bugfixes landed. would you prefer: a) we do a fast restart of zuul on master to get as much runtime in today as possible; b) a rolling restart of zuul on master which will complete around north america EOD, c) continue running monkeypatched until wednesday morning north america?	17:07
corvus	(a fast restart == terminate all running jobs and restart, but otherwise maintain queue positions: hard restart executors and rolling restart schedulers)	17:08
clarkb	I think we should avoid terminating running jobs as the openstack release is coming together right now	17:11
clarkb	b and c both seem fine? I guess there were other changes that landed too so maybe the monkey patch is safest until tomorrow and openstack release is done?	17:11
corvus	there is a speedup to reconfiguration which could be a big help during/after a release, unless it breaks, in which case it would not help.	17:12
clarkb	fungi can probably gauge the help that would provide better than me	17:13
clarkb	I think all of the branching is done and now it's mostly a matter of landing some fixups and tagging things	17:13
corvus	there is also the second bugfix (which could potentially cause changes with the same topic and a git dependency not to be enqueued)	17:13
corvus	those are the two main differences between what we're running and master	17:13
fungi	yeah, tomorrow is going to be lots of tag creation	17:39
fungi	so probably unaffected?	17:40
fungi	all the scramble to merge things is basically done at this point	17:40
fungi	and as clarkb notes, the new stable branches were created weeks back	17:40
fungi	i'd be okay with a quick restart at this point, since it sounds like the updates are more likely to be helpful than problematic. if we didn't have known bugs we were fixing with this, i'd be more inclined to prefer stability	17:47
clarkb	fungi: by quick do you mean option a? or b?	17:48
fungi	option a	17:48
fungi	so we know sooner if it's destabilizing anything	17:48
fungi	but this might stop me from needing to restart schedulers on limited caffeine tomorrow when there's a bunch of queued up tag events not getting picked up and enqueued	17:49
clarkb	got it	17:49
fungi	though we may want to wait until 835322 and 835323 report in check	17:51
fungi	that's basically the last bits the release managers are readying for tomorrow	17:51
corvus	okay, i'll plan to do option a after my lunch (+2 hours from now max)	18:00
fungi	sounds great, thanks!	18:26
corvus	looks like those have reported; i'm stuffing my face now, but will start the process shortly	18:55
fungi	yep, perfect	18:57
corvus	performing the hard stop/start of the executors/mergers now	19:12
corvus	they're all back up now... i'm just letting the schedulers resettle before i start rolling them	19:21
corvus	it's useful to have 2 schedulers right now :)	19:22
fungi	awesome	19:22
fungi	yep	19:23
corvus	okay, i think they're more or less caught up with all the restarts; i'll restart zuul02 now	19:29
corvus	02 is up; going to restart 01 now	19:43
corvus	01 is down; we're now running data model v6	19:46
clarkb	That is what brings the performance improvement	19:47
corvus	yep -- possibly the second tenant reconfiguration after this (the first one will write the data, the second should use it)	19:48
clarkb	looks like those osa failures are related to the setuptools 61.0.0 problem	19:48
clarkb	which means unrelated to the zuul stuff	19:48
corvus	we're running unnecessary cat jobs for zuul01 right now because the latest tenant configs don't have the performance improving data; they will after the next tenant config	20:00
corvus	zuul02 is processing normally	20:01
corvus	01 is up now	20:05
*** dviroel is now known as dviroel|out		20:50
corvus	typical reconfiguration time before today: 2022-03-29 16:11:13,135 INFO zuul.Scheduler: Tenant reconfiguration complete for openstack (duration: 654.228 seconds)	22:00
corvus	with new optimizations: 2022-03-29 21:56:36,835 INFO zuul.Scheduler: Tenant reconfiguration complete for openstack (duration: 37.417 seconds)	22:00
fungi	WOW	22:02
fungi	20x speedup	22:02
clarkb	that is impressive	22:43
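The speedup can be read directly off the two scheduler log lines corvus pasted. A minimal sketch that extracts the durations and computes the ratio. The log lines are copied verbatim from above, while the regular expression and the `duration_seconds` helper are illustrative assumptions, not part of Zuul.

```python
import re

# The two scheduler log lines quoted by corvus, copied verbatim.
LOG_LINES = [
    "2022-03-29 16:11:13,135 INFO zuul.Scheduler: Tenant reconfiguration "
    "complete for openstack (duration: 654.228 seconds)",
    "2022-03-29 21:56:36,835 INFO zuul.Scheduler: Tenant reconfiguration "
    "complete for openstack (duration: 37.417 seconds)",
]

DURATION_RE = re.compile(r"\(duration: ([0-9.]+) seconds\)")


def duration_seconds(line: str) -> float:
    """Extract the reconfiguration duration from one log line."""
    match = DURATION_RE.search(line)
    if match is None:
        raise ValueError(f"no duration found in: {line!r}")
    return float(match.group(1))


before, after = (duration_seconds(line) for line in LOG_LINES)
speedup = before / after
print(f"{before:.3f}s -> {after:.3f}s, about {speedup:.1f}x faster")
```

On these two samples the ratio works out to roughly 17.5x, which fungi rounds up to 20x.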
clarkb	fungi: thank you for https://review.opendev.org/c/opendev/system-config/+/834877 that is a good message to reassert	22:48
clarkb	I guess if we land ^ there are a few more changes we can abandon	22:55
fungi	the individual whose chance prompted me to write that responded that it was clearly articulated and would have saved them some time	22:57
fungi	er, whose change	22:57
clarkb	ya I think the struggle is that we did at one time intend on that use case	22:57
fungi	with the puppet modules, yes, at least the ones we put in their own repos	22:58
clarkb	and so there is a fuzzy transition period where we basically stopped getting the help we needed to make that use case possible where things may have been stuck in limbo	22:58
fungi	i don't think i ever had the impression that things in system-config were intended to be reusable though	22:58
fungi	examples, yes	22:59
clarkb	fungi: I think that is why system-config/roles exists fwiw	22:59
clarkb	in addition to system-config/playbooks/roles	22:59
clarkb	the top level roles dir is something ansible understands if you install the repo via galaxy or some such	22:59
fungi	i would probably move them to other repos if we were to do that	23:00
clarkb	ya or shift to playbooks/roles	23:00
corvus	(i have always advocated putting nothing in system-config/roles)	23:00
