Tuesday, 2022-03-29

*** soniya29 is now known as soniya29|rover04:12
*** ysandeep|out is now known as ysandeep04:35
*** soniya29|rover is now known as soniya29|rover|afk05:13
*** marios is now known as marios|ruck05:22
*** soniya29|rover|afk is now known as soniya29|rover05:31
*** jpena|off is now known as jpena07:38
*** ysandeep is now known as ysandeep|lunch07:43
*** soniya29|rover is now known as soniya29|rover|afk08:42
*** ysandeep|lunch is now known as ysandeep08:56
*** soniya29|rover|afk is now known as soniya29|rover09:23
*** yoctozepto_ is now known as yoctozepto09:29
*** marios is now known as marios|ruck09:45
*** rlandy|out is now known as rlandy10:36
*** dviroel|out is now known as dviroel11:28
fricklerdoes anyone know why we have "AllowEncodedSlashes On" in our zuul-web proxy config? https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/zuul-web/templates/zuul.vhost.j2#L19 12:30
fungithat may have been copied from gerrit, which did need it12:34
fungii'm not sure if zuul does or doesn't need to support passing encoded slashes12:35
fricklero.k., so I'll try to run without it for my local setup and see what happens, thx12:39
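For context on the question above: Apache's AllowEncodedSlashes directive controls whether a request path containing a percent-encoded slash (%2F) is accepted. The sketch below only illustrates what such an encoded slash looks like on the client side, using Python's standard library. The project name is borrowed from a URL later in this log as an example value; whether zuul-web actually relies on such paths is left open in the discussion.

```python
from urllib.parse import quote, unquote

# Example value containing a slash (borrowed from a URL later in this log).
project = "openstack/octavia"

# With safe="" the slash is percent-encoded, so the whole name
# can travel as a single URL path segment.
encoded = quote(project, safe="")
print(encoded)           # openstack%2Foctavia

# Decoding restores the original name.
print(unquote(encoded))  # openstack/octavia
```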
*** ysandeep is now known as ysandeep|afk12:43
*** soniya29|rover is now known as soniya29|rover|call12:45
*** ysandeep|afk is now known as ysandeep12:58
*** soniya29|rover|call is now known as soniya29|rover|afk13:21
*** soniya29|rover|afk is now known as soniya29|rover13:38
iurygregoryHey folks, does anyone know if we can update the schedule for the ptg using the ptg bot?13:57
fungiiurygregory: update the schedule in which way?13:59
fungilike you have things to add to the schedule, or things to remove?14:00
iurygregoryhttps://ptg.opendev.org/ptg.html we have 13 UTC booked for example in Ocata14:00
iurygregoryI was planning to move all ironic meetings to the Ocata room since it will be available (to make it easier for the contributors also)14:01
iurygregoryand I also need to book 21-22 UTC on Tuesday =)14:02
fungiiurygregory: you probably want the book and unbook commands listed here: https://opendev.org/openstack/ptgbot/src/branch/master/README.rst#book 14:02
fungialso remember the bot is in the #openinfra-events channel14:02
iurygregorythis was the info I was looking for hehe14:03
iurygregoryI was checking an old channel probably14:03
iurygregorytks fungi!14:03
*** pojadhav is now known as pojadhav|dinner14:27
gthiemongeHey Folks, we have a lot of NODE_FAILUREs with the Octavia centos-8-stream jobs: https://zuul.openstack.org/builds?job_name=octavia-v2-dsvm-scenario-centos-8-stream&project=openstack%2Foctavia&skip=0 14:48
gthiemongedo you have more information on it?14:49
fungigthiemonge: more generally, seems to be mostly hitting octavia? https://zuul.openstack.org/builds?result=NODE_FAILURE 14:51
gthiemongefungi: yeah but only centos jobs (c8, c8s, c9s... and *-fips jobs are also based on centos)14:53
fungihttps://zuul.openstack.org/job/octavia-v2-dsvm-scenario-centos-8 uses nodeset octavia-single-node-centos-8 which requires the nested-virt-centos-8 label14:53
fungiso odds are we're having trouble booting that anywhere that offers it14:53
fungii have to jump to a meeting in a moment but can try to look through logs in a bit14:53
fungia month or so back, ericsson cancelled the citycloud account they were donating to us, which was one of the only providers where we set the nested-virt labels. maybe the other(s) are having some issue today14:55
fungigthiemonge: oh, maybe you want nested-virt-centos-8-stream instead of nested-virt-centos-814:56
clarkbcertainly centos-8 should be gone14:56
gthiemongefungi: some stable jobs are still using centos-8, we can try to fix it14:57
funginested-virt-centos-8 got removed when https://review.opendev.org/827184 merged at the beginning of last month14:57
fungihave your jobs been failing for two months?14:58
clarkbgthiemonge: no, centos-8 is completely removed from our ci system now14:58
clarkbany fixing involves not using centos-814:58
fungiwe re-added a "centos-8" alias to the centos-8-stream label as a workaround for a catch-22 where zuul wouldn't allow you to remove a label which already didn't exist14:59
gthiemongeclarkb: yeah I mean: fixing the jobs by switching to c8s14:59
fungiwe might have to temporarily do the same to nested-virt-centos-8 if the removal changes fail with a config error14:59
clarkbfungi: I think for these more specific flavors it makes less sense to do that as they weren't widely used15:01
clarkband users were already expected to debug and address issues with them15:01
fungijust bypass zuul to merge the removals instead?15:04
clarkbyou shouldn't need to bypass zuul to remove jobs?15:05
clarkbI guess if this is another osa like situation then maybe?15:06
clarkbmore just thinking that the nested virt flavors are explicitly "use at your own risk and please help address problems with them" so doing the hack to keep just working by magically swapping out the instance type seems wrong15:06
fungiwasn't the reason we added the alias that osa couldn't remove the jobs because zuul wanted the original state to be coherent?15:06
clarkbfungi: it was specifically because we removed the nodeset iirc and that was incoherent15:08
clarkbbut in this case it seems to be NODE_FAILURE implying the nodeset is valid somehow?15:08
fungibut yeah, maybe we need a way to be able to force removal of jobs which depend on resources like node labels which we're removing, so that the configuration doesn't end up in an unresolvable state. then projects can add jobs back with a working configuration if they want15:08
fungithere is no nested-virt-centos-8 label defined in project-config, at least15:09
fungiis the catch-22 that zuul wants to run any jobs which are being altered, so needs to be able to resolve a coherent original state in order to make the comparison?15:11
fungiif that's the case, yeah i agree this situation seems to be different since it's actually trying to run15:12
clarkbfungi: I don't recall the cause of the need for the old config, but basically it needs a valid old config to update the new config. And in the osa cases we had invalid old configs because the nodeset was removed iirc15:12
fungigthiemonge: anyway, the only master branch reference to that label seems to be here (according to codesearch), so see if just adding -stream to the end fixes it: https://opendev.org/openstack/octavia-tempest-plugin/src/branch/master/zuul.d/jobs.yaml#L35 15:16
gthiemongefungi: thanks, I will try that15:17
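The fix suggested above amounts to pointing the nodeset at a label that still exists. A minimal sketch of that substitution follows, assuming a simplified in-memory form of a nodeset: the nodeset and label names come from the discussion, while the node name "controller", the dict layout, and the helper migrate_labels are hypothetical and only for illustration. The real change would be made in the job's YAML.

```python
# Hypothetical in-memory form of a Zuul nodeset. The node name
# "controller" is an assumption; the nodeset and label names are
# the ones mentioned in the discussion.
nodeset = {
    "name": "octavia-single-node-centos-8",
    "nodes": [{"name": "controller", "label": "nested-virt-centos-8"}],
}

# Removed label -> replacement suggested in the chat.
RETIRED_LABELS = {"nested-virt-centos-8": "nested-virt-centos-8-stream"}


def migrate_labels(nodeset, replacements):
    """Return a copy of the nodeset with retired labels swapped out."""
    nodes = [
        {**node, "label": replacements.get(node["label"], node["label"])}
        for node in nodeset["nodes"]
    ]
    return {**nodeset, "nodes": nodes}


updated = migrate_labels(nodeset, RETIRED_LABELS)
print(updated["nodes"][0]["label"])  # nested-virt-centos-8-stream
```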
*** dviroel is now known as dviroel|lunch15:17
clarkbanother option is to try rocky (just putting it out there as stream has been a struggle at times due to its rolling nature)15:18
fungisure, that's available too15:18
*** ysandeep is now known as ysandeep|out15:19
clarkbinfra-root if https://review.opendev.org/c/opendev/system-config/+/835307 looks good to you that may be a good one to land today15:21
clarkbupdates gitea again (this time to a bug fix release)15:21
*** pojadhav|dinner is now known as pojadhav15:21
*** soniya29|rover is now known as soniya29|rover|dinner15:44
*** dviroel|lunch is now known as dviroel16:21
*** sfinucan is now known as stephenfin16:31
*** soniya29|rover|dinner is now known as soniya29|rover16:31
*** marios|ruck is now known as marios|out16:34
clarkbjust occurred to me that updating gitea is probably fine tomorrow after the openstack release just to minimize risk16:43
*** jpena is now known as jpena|off16:46
corvusclarkb fungi the zuul bugfixes landed. would you prefer: a) we do a fast restart of zuul on master to get as much runtime in today as possible; b) a rolling restart of zuul on master which will complete around north america EOD, c) continue running monkeypatched until wednesday morning north america?17:07
corvus(a fast restart == terminate all running jobs and restart, but otherwise maintain queue positions: hard restart executors and rolling restart schedulers)17:08
clarkbI think we should avoid terminating running jobs as the openstack release is coming together right now17:11
clarkbb and c both seem fine? I guess there were other changes that landed too so maybe the monkey patch is safest until tomorrow and openstack release is done?17:11
corvusthere is a speedup to reconfiguration which could be a big help during/after a release, unless it breaks, in which case it would not help.17:12
clarkbfungi can probably gauge the help that would provide better than me17:13
clarkbI think all of the branching is done and now it's mostly a matter of landing some fixups and tagging things17:13
corvusthere is also the second bugfix (which could potentially cause changes with the same topic and a git dependency not to be enqueued)17:13
corvusthose are the two main differences between what we're running and master17:13
fungiyeah, tomorrow is going to be lots of tag creation17:39
fungiso probably unaffected?17:40
fungiall the scramble to merge things is basically done at this point17:40
fungiand as clarkb notes, the new stable branches were created weeks back17:40
fungii'd be okay with a quick restart at this point, since it sounds like the updates are more likely to be helpful than problematic. if we didn't have known bugs we were fixing with this, i'd be more inclined to prefer stability17:47
clarkbfungi: by quick do you mean option a? or b?17:48
fungioption a17:48
fungiso we know sooner if it's destabilizing anything17:48
fungibut this might stop me from needing to restart schedulers on limited caffeine tomorrow when there's a bunch of queued up tag events not getting picked up and enqueued17:49
clarkbgot it17:49
fungithough we may want to wait until 835322 and 835323 report in check17:51
fungithat's basically the last bits the release managers are readying for tomorrow17:51
corvusokay, i'll plan to do option a after my lunch (+2 hours from now max)18:00
fungisounds great, thanks!18:26
corvuslooks like those have reported; i'm stuffing my face now, but will start the process shortly18:55
fungiyep, perfect18:57
corvusperforming the hard stop/start of the executors/mergers now19:12
corvusthey're all back up now... i'm just letting the schedulers resettle before i start rolling them19:21
corvusit's useful to have 2 schedulers right now :)19:22
corvusokay, i think they're more or less caught up with all the restarts; i'll restart zuul02 now19:29
corvus02 is up; going to restart 01 now19:43
corvus01 is down; we're now running data model v619:46
clarkbThat is what brings the performance improvement19:47
corvusyep -- possibly the second tenant reconfiguration after this (the first one will write the data, the second should use it)19:48
clarkblooks like those osa failures are related to the setuptools 61.0.0 problem19:48
clarkbwhich means unrelated to the zuul stuff19:48
corvuswe're running unnecessary cat jobs for zuul01 right now because the latest tenant configs don't have the performance improving data; they will after the next tenant config20:00
corvuszuul02 is processing normally20:01
corvus01 is up now20:05
*** dviroel is now known as dviroel|out20:50
corvustypical reconfiguration time before today: 2022-03-29 16:11:13,135 INFO zuul.Scheduler: Tenant reconfiguration complete for openstack (duration: 654.228 seconds)22:00
corvuswith new optimizations: 2022-03-29 21:56:36,835 INFO zuul.Scheduler: Tenant reconfiguration complete for openstack (duration: 37.417 seconds)22:00
fungi20x speedup22:02
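As a quick check of the figures quoted above, the two scheduler log lines can be parsed and compared. This is only a sketch over the two lines pasted in the channel (the regular expression is an assumption about their format); it puts the ratio at roughly 17.5x, which "20x" rounds up.

```python
import re

# The two scheduler log lines quoted in the channel.
lines = [
    "2022-03-29 16:11:13,135 INFO zuul.Scheduler: Tenant reconfiguration "
    "complete for openstack (duration: 654.228 seconds)",
    "2022-03-29 21:56:36,835 INFO zuul.Scheduler: Tenant reconfiguration "
    "complete for openstack (duration: 37.417 seconds)",
]

# Pull the duration in seconds out of each line.
DURATION = re.compile(r"\(duration: ([0-9.]+) seconds\)")
before, after = (float(DURATION.search(line).group(1)) for line in lines)

speedup = before / after
print(f"{speedup:.1f}x")  # 17.5x
```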
clarkbthat is impressive22:43
clarkbfungi: thank you for https://review.opendev.org/c/opendev/system-config/+/834877 that is a good message to reassert22:48
clarkbI guess if we land ^ there are a few more changes we can abandon22:55
fungithe individual whose chance prompted me to write that responded that it was clearly articulated and would have saved them some time22:57
fungier, whose change22:57
clarkbya I think the struggle is that we did at one time intend on that use case22:57
fungiwith the puppet modules, yes, at least the ones we put in their own repos22:58
clarkband so there is a fuzzy transition period where we basically stopped getting the help we needed to make that use case possible where things may have been stuck in limbo22:58
fungii don't think i ever had the impression that things in system-config were intended to be reusable though22:58
fungiexamples, yes22:59
clarkbfungi: I think that is why system-config/roles exists fwiw22:59
clarkbin addition to system-config/playbooks/roles22:59
clarkbthe top level roles dir is something ansible understands if you install the repo via galaxy or some such22:59
fungii would probably move them to other repos if we were to do that23:00
clarkbya or shift to playbooks/roles23:00
corvus(i have always advocated putting nothing in system-config/roles)23:00
