-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 00:03 | |
-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 00:15 | |
-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 00:25 | |
-@gerrit:opendev.org- Michael Kelly proposed: [zuul/zuul-jobs] 861799: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 00:31 | |
-@gerrit:opendev.org- Michael Kelly proposed: | 03:35 | |
- [zuul/zuul-operator] 853592: Allow the specification of storageClassName in PVCs https://review.opendev.org/c/zuul/zuul-operator/+/853592 | ||
- [zuul/zuul-operator] 853695: Prefix zuul-specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853695 | ||
- [zuul/zuul-operator] 853696: Prefix nodepool specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853696 | ||
- [zuul/zuul-operator] 861488: helm: Add a basic helm chart for zuul-operator https://review.opendev.org/c/zuul/zuul-operator/+/861488 | ||
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390 | ||
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279 | ||
@michael_kelly_anet:matrix.org | tristanC: and corvus can one (or both) of you folks take another look at https://review.opendev.org/c/zuul/zuul-operator/+/853592/12 and maybe give it workflow +2? | 03:36 |
---|---|---|
-@gerrit:opendev.org- Michael Kelly proposed: | 04:28 | |
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279 | ||
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191 | ||
-@gerrit:opendev.org- Michael Kelly proposed: | 04:32 | |
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390 | ||
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191 | ||
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279 | ||
@iwienand:matrix.org | i can not let this zuul-sphinx issue rest because nothing about it seems to make sense :( | 07:17 |
@iwienand:matrix.org | upstream docutils have been helpful and engaged with the issue, but it's still not clear to me what's going on. i've bisected it down to one change in sphinx | 07:17 |
@iwienand:matrix.org | https://github.com/sphinx-doc/sphinx/issues/10951 | 07:18 |
@iwienand:matrix.org | i can't replicate with docutils >0.17 ... but we can't use that until sphinx_rtd_theme makes their next release (they are pinned to <0.18, but their changelog says next release is slated to remove that) | 07:19 |
@iwienand:matrix.org | that might be enough to convince us to just pin to sphinx 5.2.3 until that release. i just didn't want to end up in a situation where zuul-sphinx has bit-rotted and we can never move on to new sphinx versions | 07:20 |
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 863186: Fix skipped builds filter in web ui https://review.opendev.org/c/zuul/zuul/+/863186 | 14:37 | |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 863326: Fix config-errors dedicated page https://review.opendev.org/c/zuul/zuul/+/863326 | 15:48 | |
@clarkb:matrix.org | I'm not sure ^ is a complete or accurate fix. I'm hoping the preview site will help debug. | 15:50 |
@jim:acmegating.com | Clark: that wfm locally. | 16:02 |
@jim:acmegating.com | zuul-maint: ^ i think we can/should sneak that into the 8.0.1 release too if you want to +3 it. i can make the release later today. | 16:03 |
@clarkb:matrix.org | cool, the bit I was less sure about was whether or not I needed to handle the ready flag and clearing state. But I think because the page is tenant specific we don't need to do that | 16:03 |
@jim:acmegating.com | fyi, we're about to perform a zk upgrade on opendev's zuul, join #_oftc_#opendev:matrix.org to follow along/participate | 16:06 |
-@gerrit:opendev.org- Michael Kelly proposed: | 16:23 | |
- [zuul/zuul-operator] 853592: Allow the specification of storageClassName in PVCs https://review.opendev.org/c/zuul/zuul-operator/+/853592 | ||
- [zuul/zuul-operator] 853695: Prefix zuul-specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853695 | ||
- [zuul/zuul-operator] 853696: Prefix nodepool specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853696 | ||
- [zuul/zuul-operator] 861488: helm: Add a basic helm chart for zuul-operator https://review.opendev.org/c/zuul/zuul-operator/+/861488 | ||
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390 | ||
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191 | ||
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279 | ||
-@gerrit:opendev.org- Michael Kelly proposed: | 16:24 | |
- [zuul/zuul-operator] 853592: Allow the specification of storageClassName in PVCs https://review.opendev.org/c/zuul/zuul-operator/+/853592 | ||
- [zuul/zuul-operator] 853695: Prefix zuul-specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853695 | ||
- [zuul/zuul-operator] 853696: Prefix nodepool specific resources with instance name https://review.opendev.org/c/zuul/zuul-operator/+/853696 | ||
- [zuul/zuul-operator] 861488: helm: Add a basic helm chart for zuul-operator https://review.opendev.org/c/zuul/zuul-operator/+/861488 | ||
- [zuul/zuul-operator] 862390: helm: Add cert-manager as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/862390 | ||
- [zuul/zuul-operator] 863191: helm: Add pxc-operator as optional dependency https://review.opendev.org/c/zuul/zuul-operator/+/863191 | ||
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279 | ||
@clarkb:matrix.org | > <@jim:acmegating.com> Clark: that wfm locally. | 18:03 |
The site preview also shows it working for anyone else doing reviews. | ||
@jim:acmegating.com | i'm going to go ahead and +w that due to the breakage; retro-reviews still welcome | 18:04 |
-@gerrit:opendev.org- Joshua Watt proposed: [zuul/zuul] 863068: executor: Skip line mapping for special Gerrit files https://review.opendev.org/c/zuul/zuul/+/863068 | 20:02 | |
@clarkb:matrix.org | corvus: this isn't super urgent but would be nice to have fixed https://review.opendev.org/c/zuul/zuul-jobs/+/863098 allows the named-checkzone command to run via $PATH as the install dir changes across ubuntu versions | 20:29 |
@jim:acmegating.com | Clark: looks like no test job for that... that should be a pretty easy use of 'simple-role-test'... i think we'd just need to stick a sample zonedb file in the repo... | 20:39 |
@jim:acmegating.com | basically just: | 20:40 |
run: test-playbooks/simple-role-test.yaml | ||
vars: {zone_db_files: [path]} | ||
@clarkb:matrix.org | Ya, the change it broke has a depends on to show it works on jammy at least. But that doesn't cover the old /usr/sbin path I guess | 20:41 |
@clarkb:matrix.org | I'll take a look at making a test job | 20:42 |
@jim:acmegating.com | thanks, that'd be great. i really like having at least simple-role-test jobs for every role if we can. maybe that role predated that. | 20:42 |
@jim:acmegating.com | we can throw the multi-platform tag (whatever it's called) on it too to get versions for all the platforms we care about | 20:43 |
@jim:acmegating.com | (it's both regression and future-proofing, as if it had existed, it would have caught this issue ahead of time) | 20:44 |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 863098: Fix check zone role for Jammy https://review.opendev.org/c/zuul/zuul-jobs/+/863098 | 21:04 | |
@clarkb:matrix.org | ok that shows older ubuntu working which is great but debian fails I think because they don't have /usr/sbin in the path by default. I'll reduce this to ubuntu and if we decide to add more platforms later we can make it more robust then | 21:18 |
@clarkb:matrix.org | oh except I guess it may have run on debian before since it hardcoded /usr/sbin. Maybe I need to address that for debian now | 21:19 |
-@gerrit:opendev.org- Michael Kelly proposed: | 21:20 | |
- [zuul/zuul-operator] 861279: bug: Select scheduler pod based on instance name on update https://review.opendev.org/c/zuul/zuul-operator/+/861279 | ||
- [zuul/zuul-operator] 863439: doc: Re-write install doc to use helm chart https://review.opendev.org/c/zuul/zuul-operator/+/863439 | ||
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 863098: Fix check zone role for Jammy https://review.opendev.org/c/zuul/zuul-jobs/+/863098 | 21:21 | |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 863098: Fix check zone role for Jammy https://review.opendev.org/c/zuul/zuul-jobs/+/863098 | 21:26 | |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 863440: Fix deduplication exceptions in pipeline processing https://review.opendev.org/c/zuul/zuul/+/863440 | 21:28 | |
@clarkb:matrix.org | the diff between latest patchset and previous one on that zuul-jobs change is maybe my least favorite ansible behavior | 21:28 |
@clarkb:matrix.org | anyway that appears to be working now. yay for testing | 21:29 |
@clarkb:matrix.org | corvus: is 863440 related to the job timeouts we're seeing? | 21:34 |
@jim:acmegating.com | aroo? | 21:34 |
@clarkb:matrix.org | I noticed them with the py311 stuff first and attributed it to the interpreter. But now it seems py38 and py310 may be getting stuck too? | 21:34 |
@clarkb:matrix.org | https://zuul.opendev.org/t/zuul/stream/ee2643c5980240c98f80606bc547a9cf?logfile=console.log for example | 21:34 |
@clarkb:matrix.org | when I looked for py311 it appeared the job actually ran a full complement of tests but the stestr process wasn't completing | 21:34 |
@clarkb:matrix.org | and that was about as far as I got before other things distracted me | 21:35 |
@clarkb:matrix.org | but if you look at that job it hasn't done anything for almost 25 minutes | 21:35 |
@jim:acmegating.com | Clark: i'm unfamiliar with the issue you're describing. 440 is to fix a bug that should only apply to job deduplication, which should never be seen in opendev due to the lack of circular dep support | 21:35 |
@clarkb:matrix.org | got it | 21:35 |
@clarkb:matrix.org | just talking out loud here: python38 and python310 run on different distro releases. This and the fact that the problem affects multiple python versions implies it isn't a platform problem but more likely to be a problem with our tools or the zuul test suite itself | 21:43 |
@clarkb:matrix.org | I've ssh'd onto the python310 test node and there seems to be a python process stuck on a read for fd 18. trying to determine what fd 18 is now | 21:45 |
@clarkb:matrix.org | Looks to be multiprocessing related | 21:48 |
@clarkb:matrix.org | https://paste.opendev.org/show/bIfaYeDOEgM6Zz8gqEry/ I'm not sure what to make of that yet but the nodes will be deleted soonish so making a record here | 21:52 |
@clarkb:matrix.org | those deprecation warning flags are something we set via tox.ini | 21:53 |
@clarkb:matrix.org | 8023's parent is 8022 which is a shell process that the stestr command seems to invoke. I think 9070 is the process actually running tests and 8023 is a subprocess of stestr coordinating things? | 21:56 |
@clarkb:matrix.org | stestr's last release was over a month ago. | 21:56 |
-@gerrit:opendev.org- Sebastian Gonzalez Pintor proposed wip: [zuul/zuul] 862983: [WIP] Add soft version of fail-fast https://review.opendev.org/c/zuul/zuul/+/862983 | 22:06 | |
-@gerrit:opendev.org- Sebastian Gonzalez Pintor proposed wip: [zuul/zuul] 862983: [WIP] Add soft version of fail-fast https://review.opendev.org/c/zuul/zuul/+/862983 | 22:09 | |
@clarkb:matrix.org | I've attempted to hold the node that paste was generated from | 22:12 |
@jim:acmegating.com | Clark: to be clear, we're looking at stestr processes hanging? not a zuul executor bug or something? | 22:13 |
@clarkb:matrix.org | corvus: It is theoretically possible that it could be a zuul executor issue, but I don't believe it is because the strace on the child process shows it waiting on a read from the parent process. That indicates to me that it is the test framework/tooling itself that is stuck not the code under test | 22:13 |
@clarkb:matrix.org | the zuul tests shouldn't know anything about that interprocess communication between the top level test runner and the per cpu actual test runners | 22:14 |
@jim:acmegating.com | right. mostly just confirming that we're talking about the contents of a job, not an operational zuul problem | 22:14 |
@clarkb:matrix.org | I've pinged mtreinish in #opendev about it (stestr maintainer) and have attempted to put a hold in for that node. Hopefully we can use that to debug this properly | 22:15 |
@clarkb:matrix.org | corvus: the node is held now if you want to take a look | 22:20 |
@clarkb:matrix.org | oh except | 22:20 |
@clarkb:matrix.org | oh nevermind. I thought maybe the processes had died due to ansible connection going away | 22:20 |
@clarkb:matrix.org | but they are still there | 22:20 |
@clarkb:matrix.org | there is a defunct python process child of 8023. I wonder if it is actually hung up trying to reap that and thus not processing things to allow 9070 to proceed? | 22:22 |
@jim:acmegating.com | Clark: the kazoo rolledbackerror exceptions are interesting. i don't believe we expect to see those normally; the system is probably extremely overloaded. they may be causing test failures in such a way that the tests never exit. | 22:27 |
@jim:acmegating.com | we have had a ... complex relationship with unit test timeouts | 22:29 |
@clarkb:matrix.org | hrm we can rollback the concurrency to 6 from 7 to try and mitigate some of that load | 22:36 |
@clarkb:matrix.org | that will make python38 jobs even longer but py310 should stay within reasonable bounds | 22:36 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 862978: Add playbook semaphores https://review.opendev.org/c/zuul/zuul/+/862978 | 23:03 |
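
Editorial note on the stestr hang discussed above (21:45 to 22:22): the two manual checks described there, working out what a file descriptor refers to and spotting a defunct child, can be sketched in Python on a Linux node. This is a minimal illustration only, not what was actually run on the held node. The helper names `describe_fd`, `process_state`, and `is_defunct` are hypothetical, and the sketch assumes `/proc` is mounted and that the caller has permission to read the target process's entries.

```python
import os


def describe_fd(pid: int, fd: int) -> str:
    """Return what /proc says a file descriptor points at (Linux only).

    Pipes show up as 'pipe:[inode]', sockets as 'socket:[inode]',
    and regular files as their path.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def process_state(pid: int) -> str:
    """Return the one-letter process state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as stat_file:
        data = stat_file.read()
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the last ')' before reading the state field.
    return data.rsplit(")", 1)[1].split()[0]


def is_defunct(pid: int) -> bool:
    """True if the process is a zombie (state 'Z'), i.e. exited but not reaped."""
    return process_state(pid) == "Z"
```

With the PIDs quoted in the log, `describe_fd(9070, 18)` would report what the stuck read is waiting on. If two processes report the same `pipe:[inode]` value, they hold ends of the same pipe, which is one way to confirm the parent/child relationship inferred from strace.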