*** soniya29 is now known as soniya29|rover | 04:12 | |
*** ysandeep|out is now known as ysandeep | 04:35 | |
*** soniya29|rover is now known as soniya29|rover|afk | 05:13 | |
*** marios is now known as marios|ruck | 05:22 | |
*** soniya29|rover|afk is now known as soniya29|rover | 05:31 | |
*** jpena|off is now known as jpena | 07:38 | |
*** ysandeep is now known as ysandeep|lunch | 07:43 | |
*** soniya29|rover is now known as soniya29|rover|afk | 08:42 | |
*** ysandeep|lunch is now known as ysandeep | 08:56 | |
*** soniya29|rover|afk is now known as soniya29|rover | 09:23 | |
*** yoctozepto_ is now known as yoctozepto | 09:29 | |
*** marios is now known as marios|ruck | 09:45 | |
*** rlandy|out is now known as rlandy | 10:36 | |
*** dviroel|out is now known as dviroel | 11:28 | |
frickler | does anyone know why we have "AllowEncodedSlashes On" in our zuul-web proxy config? https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/zuul-web/templates/zuul.vhost.j2#L19 | 12:30 |
fungi | that may have been copied from gerrit, which did need it | 12:34 |
fungi | i'm not sure if zuul does or doesn't need to support passing encoded slashes | 12:35 |
frickler | o.k., so I'll try to run without it for my local setup and see what happens, thx | 12:39 |
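A note on the directive being discussed: AllowEncodedSlashes is a core Apache httpd setting that controls whether request paths containing an encoded slash (%2F) are accepted or rejected with a 404 (the default). A minimal sketch of how it sits in a reverse-proxy vhost like the template linked above; the hostname, backend port, and use of "nocanon" here are illustrative, not the actual template values:

    <VirtualHost *:443>
        ServerName zuul.example.org
        # Accept %2F in request paths instead of returning 404 for them
        AllowEncodedSlashes On
        # "nocanon" keeps Apache from re-canonicalizing the encoded path
        # before handing it to the proxied zuul-web backend
        ProxyPass / http://127.0.0.1:9000/ nocanon
        ProxyPassReverse / http://127.0.0.1:9000/
    </VirtualHost>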
*** ysandeep is now known as ysandeep|afk | 12:43 | |
*** soniya29|rover is now known as soniya29|rover|call | 12:45 | |
*** ysandeep|afk is now known as ysandeep | 12:58 | |
*** soniya29|rover|call is now known as soniya29|rover|afk | 13:21 | |
*** soniya29|rover|afk is now known as soniya29|rover | 13:38 | |
iurygregory | Hey folks, does anyone know if we can update the schedule for the ptg using the ptg bot? | 13:57 |
fungi | iurygregory: update the schedule in which way? | 13:59 |
fungi | like you have things to add to the schedule, or things to remove? | 14:00 |
iurygregory | https://ptg.opendev.org/ptg.html we have 13 UTC booked for example in Ocata | 14:00 |
iurygregory | I was planning to move all ironic meetings to the Ocata room since it will be available (to make it easier for the contributors also) | 14:01 |
iurygregory | and I also need to book 21-22 UTC on Tuesday =) | 14:02 |
fungi | iurygregory: you probably want the book and unbook commands listed here: https://opendev.org/openstack/ptgbot/src/branch/master/README.rst#book | 14:02 |
fungi | also remember the bot is in the #openinfra-events channel | 14:02 |
iurygregory | ohhhhhhhhhhhh | 14:02 |
iurygregory | this was the info I was looking for hehe | 14:03 |
iurygregory | I was checking an old channel probably | 14:03 |
iurygregory | tks fungi! | 14:03 |
fungi | np | 14:03 |
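For context on the book/unbook commands fungi points to: the bot is driven by messages in #openinfra-events prefixed with a track hashtag. The slot names below are placeholders; the real room/slot identifiers come from the published schedule and the README linked above:

    #ironic book Ocata-TueB2
    #ironic unbook Ocata-TueB2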
*** pojadhav is now known as pojadhav|dinner | 14:27 | |
gthiemonge | Hey Folks, we have a lot of NODE_FAILUREs with the Octavia centos-8-stream jobs: https://zuul.openstack.org/builds?job_name=octavia-v2-dsvm-scenario-centos-8-stream&project=openstack%2Foctavia&skip=0 | 14:48 |
gthiemonge | do you have more information on it? | 14:49 |
fungi | gthiemonge: more generally, seems to be mostly hitting octavia? https://zuul.openstack.org/builds?result=NODE_FAILURE | 14:51 |
gthiemonge | fungi: yeah but only centos jobs (c8, c8s, c9s... and *-fips jobs are also based on centos) | 14:53 |
fungi | https://zuul.openstack.org/job/octavia-v2-dsvm-scenario-centos-8 uses nodeset octavia-single-node-centos-8 which requires the nested-virt-centos-8 label | 14:53 |
fungi | so odds are we're having trouble booting that anywhere that offers it | 14:53 |
fungi | i have to jump to a meeting in a moment but can try to look through logs in a bit | 14:53 |
fungi | a month or so back, ericsson cancelled the citycloud account they were donating to us, which was one of the only providers where we set the nested-virt labels. maybe the other(s) are having some issue today | 14:55 |
fungi | gthiemonge: oh, maybe you want nested-virt-centos-8-stream instead of nested-virt-centos-8 | 14:56 |
clarkb | certainly centos-8 should be gone | 14:56 |
gthiemonge | fungi: some stable jobs are still using centos-8, we can try to fix it | 14:57 |
fungi | nested-virt-centos-8 got removed when https://review.opendev.org/827184 merged at the beginning of last month | 14:57 |
fungi | have your jobs been failing for two months? | 14:58 |
clarkb | gthiemonge: no, centos-8 is completely removed from our ci system now | 14:58 |
clarkb | any fixing involves not using centos-8 | 14:58 |
fungi | we re-added a "centos-8" alias to the centos-8-stream label as a workaround for a catch-22 where zuul wouldn't allow you to remove a label which already didn't exist | 14:59 |
gthiemonge | clarkb: yeah I mean: fixing the jobs by switching to c8s | 14:59 |
fungi | we might have to temporarily do the same to nested-virt-centos-8 if the removal changes fail with a config error | 14:59 |
clarkb | fungi: I think for these more specific flavors it makes less sense to do that as they weren't widely used | 15:01 |
clarkb | and users were already expected to debug and address issues with them | 15:01 |
fungi | just bypass zuul to merge the removals instead? | 15:04 |
clarkb | you shouldn't need to bypass zuul to remove jobs? | 15:05 |
clarkb | I guess if this is another osa-like situation then maybe? | 15:06 |
clarkb | more just thinking that the nested virt flavors are explicitly "use at your own risk and please help address problems with them" so doing the hack to keep things just working by magically swapping out the instance type seems wrong | 15:06 |
fungi | wasn't the reason we added the alias that osa couldn't remove the jobs because zuul wanted the original state to be coherent? | 15:06 |
clarkb | fungi: it was specifically because we removed the nodeset iirc and that was incoherent | 15:08 |
clarkb | but in this case it seems to be NODE_FAILURE implying the nodeset is valid somehow? | 15:08 |
fungi | but yeah, maybe we need a way to be able to force removal of jobs which depend on resources like node labels which we're removing, so that the configuration doesn't end up in an unresolvable state. then projects can add jobs back with a working configuration if they want | 15:08 |
fungi | there is no nested-virt-centos-8 label defined in project-config, at least | 15:09 |
fungi | is the catch-22 that zuul wants to run any jobs which are being altered, so needs to be able to resolve a coherent original state in order to make the comparison? | 15:11 |
fungi | if that's the case, yeah i agree this situation seems to be different since it's actually trying to run | 15:12 |
clarkb | fungi: I don't recall the cause of the need for the old config, but basically it needs a valid old config to update the new config. And in the osa cases we had invalid old configs because the nodeset was removed iirc | 15:12 |
fungi | gthiemonge: anyway, the only master branch reference to that label seems to be here (according to codesearch), so see if just adding -stream to the end fixes it: https://opendev.org/openstack/octavia-tempest-plugin/src/branch/master/zuul.d/jobs.yaml#L35 | 15:16 |
gthiemonge | fungi: thanks, I will try that | 15:17 |
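A sketch of the change fungi is suggesting, i.e. pointing the nodeset at the stream label; the nodeset and node names here are taken from the discussion above, so treat the exact YAML as illustrative rather than the actual patch:

    - nodeset:
        name: octavia-single-node-centos-8-stream
        nodes:
          - name: controller
            label: nested-virt-centos-8-stream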
*** dviroel is now known as dviroel|lunch | 15:17 | |
clarkb | another option is to try rocky (just putting it out there as stream has been a struggle at times due to its rolling nature) | 15:18 |
fungi | sure, that's available too | 15:18 |
*** ysandeep is now known as ysandeep|out | 15:19 | |
clarkb | infra-root if https://review.opendev.org/c/opendev/system-config/+/835307 looks good to you that may be a good one to land today | 15:21 |
clarkb | updates gitea again (this time to a bug fix release) | 15:21 |
*** pojadhav|dinner is now known as pojadhav | 15:21 | |
*** soniya29|rover is now known as soniya29|rover|dinner | 15:44 | |
*** dviroel|lunch is now known as dviroel | 16:21 | |
*** sfinucan is now known as stephenfin | 16:31 | |
*** soniya29|rover|dinner is now known as soniya29|rover | 16:31 | |
*** marios|ruck is now known as marios|out | 16:34 | |
clarkb | just occurred to me that updating gitea is probably fine tomorrow after the openstack release just to minimize risk | 16:43 |
*** jpena is now known as jpena|off | 16:46 | |
corvus | clarkb fungi: the zuul bugfixes landed. would you prefer: a) we do a fast restart of zuul on master to get as much runtime in today as possible; b) a rolling restart of zuul on master which will complete around north america EOD, c) continue running monkeypatched until wednesday morning north america? | 17:07 |
corvus | (a fast restart == terminate all running jobs and restart, but otherwise maintain queue positions: hard restart executors and rolling restart schedulers) | 17:08 |
clarkb | I think we should avoid terminating running jobs as the openstack release is coming together right now | 17:11 |
clarkb | b and c both seem fine? I guess there were other changes that landed too so maybe the monkey patch is safest until tomorrow and the openstack release is done? | 17:11 |
corvus | there is a speedup to reconfiguration which could be a big help during/after a release, unless it breaks, in which case it would not help. | 17:12 |
clarkb | fungi can probably gauge the help that would provide better than me | 17:13 |
clarkb | I think all of the branching is done and now it's mostly a matter of landing some fixups and tagging things | 17:13 |
corvus | there is also the second bugfix (which could potentially cause changes with the same topic and a git dependency not to be enqueued) | 17:13 |
corvus | those are the two main differences between what we're running and master | 17:13 |
fungi | yeah, tomorrow is going to be lots of tag creation | 17:39 |
fungi | so probably unaffected? | 17:40 |
fungi | all the scramble to merge things is basically done at this point | 17:40 |
fungi | and as clarkb notes, the new stable branches were created weeks back | 17:40 |
fungi | i'd be okay with a quick restart at this point, since it sounds like the updates are more likely to be helpful than problematic. if we didn't have known bugs we were fixing with this, i'd be more inclined to prefer stability | 17:47 |
clarkb | fungi: by quick do you mean option a? or b? | 17:48 |
fungi | option a | 17:48 |
fungi | so we know sooner if it's destabilizing anything | 17:48 |
fungi | but this might stop me from needing to restart schedulers on limited caffeine tomorrow when there's a bunch of queued up tag events not getting picked up and enqueued | 17:49 |
clarkb | got it | 17:49 |
fungi | though we may want to wait until 835322 and 835323 report in check | 17:51 |
fungi | that's basically the last bits the release managers are readying for tomorrow | 17:51 |
corvus | okay, i'll plan to do option a after my lunch (+2 hours from now max) | 18:00 |
fungi | sounds great, thanks! | 18:26 |
corvus | looks like those have reported; i'm stuffing my face now, but will start the process shortly | 18:55 |
fungi | yep, perfect | 18:57 |
corvus | performing the hard stop/start of the executors/mergers now | 19:12 |
corvus | they're all back up now... i'm just letting the schedulers resettle before i start rolling them | 19:21 |
corvus | it's useful to have 2 schedulers right now :) | 19:22 |
fungi | awesome | 19:22 |
fungi | yep | 19:23 |
corvus | okay, i think they're more or less caught up with all the restarts; i'll restart zuul02 now | 19:29 |
corvus | 02 is up; going to restart 01 now | 19:43 |
corvus | 01 is down; we're now running data model v6 | 19:46 |
clarkb | That is what brings the performance improvement | 19:47 |
corvus | yep -- possibly the second tenant reconfiguration after this (the first one will write the data, the second should use it) | 19:48 |
clarkb | looks like those osa failures are related to the setuptools 61.0.0 problem | 19:48 |
clarkb | which means unrelated to the zuul stuff | 19:48 |
corvus | we're running unnecessary cat jobs for zuul01 right now because the latest tenant configs don't have the performance improving data; they will after the next tenant config | 20:00 |
corvus | zuul02 is processing normally | 20:01 |
corvus | 01 is up now | 20:05 |
*** dviroel is now known as dviroel|out | 20:50 | |
corvus | typical reconfiguration time before today: 2022-03-29 16:11:13,135 INFO zuul.Scheduler: Tenant reconfiguration complete for openstack (duration: 654.228 seconds) | 22:00 |
corvus | with new optimizations: 2022-03-29 21:56:36,835 INFO zuul.Scheduler: Tenant reconfiguration complete for openstack (duration: 37.417 seconds) | 22:00 |
fungi | WOW | 22:02 |
fungi | 20x speedup | 22:02 |
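For reference, the two durations quoted above work out to 654.228 s / 37.417 s ≈ 17.5, so the "20x" is a round-number reading of roughly a seventeen-fold speedup.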
clarkb | that is impressive | 22:43 |
clarkb | fungi: thank you for https://review.opendev.org/c/opendev/system-config/+/834877 that is a good message to reassert | 22:48 |
clarkb | I guess if we land ^ there are a few more changes we can abandon | 22:55 |
fungi | the individual whose chance prompted me to write that responded that it was clearly articulated and would have saved them some time | 22:57 |
fungi | er, whose change | 22:57 |
clarkb | ya I think the struggle is that we did at one time intend to support that use case | 22:57 |
fungi | with the puppet modules, yes, at least the ones we put in their own repos | 22:58 |
clarkb | and so there is a fuzzy transition period where we basically stopped getting the help we needed to make that use case possible, where things may have been stuck in limbo | 22:58 |
fungi | i don't think i ever had the impression that things in system-config were intended to be reusable though | 22:58 |
fungi | examples, yes | 22:59 |
clarkb | fungi: I think that is why system-config/roles exists fwiw | 22:59 |
clarkb | in addition to system-config/playbook/roles | 22:59 |
clarkb | the top level roles dir is something ansible understands if you install the repo via galaxy or some such | 22:59 |
fungi | i would probably move them to other repos if we were to do that | 23:00 |
clarkb | ya or shift to playbooks/roles | 23:00 |
corvus | (i have always advocated putting nothing in system-config/roles) | 23:00 |
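The layout being debated, roughly sketched (directory names come from the discussion above; the comments summarize the distinction clarkb and corvus are drawing, not official documentation):

    system-config/
    ├── roles/               # top-level dir ansible can discover if the repo itself is installed, e.g. via galaxy
    └── playbooks/
        └── roles/           # roles resolved only relative to the repo's own playbooks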