dansmith | gmann: I've been seeing a bunch of these sorts of timeouts lately: https://zuul.opendev.org/t/openstack/build/d638073a5bb7457db0d5498065810086/log/job-output.txt#20890 | 13:56 |
---|---|---|
dansmith | usually in this second batch we run after the main set | 13:56 |
dansmith | I'm wondering if the sshable-ness checks have just slowed us down enough that we're legit running out of time? The tests don't seem wedged; they're progressing, we just time out | 13:57 |
dansmith | we also really suffer from a poor distribution of the tests among workers in the parallel phase: we end up handing a bunch of very slow tests to just one worker, which increases our wall-clock time | 13:59 |
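To make the wall-clock point concrete, here is a minimal sketch comparing a duration-blind round-robin split with the classic greedy longest-processing-time-first (LPT) heuristic, which uses per-test timings to balance workers. It illustrates the general technique only; it is not stestr's scheduler code, and the test names and timings are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def partition_round_robin(tests: List[str], workers: int) -> List[List[str]]:
    """Deal tests out in order, ignoring how long each one takes."""
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for i, test in enumerate(tests):
        buckets[i % workers].append(test)
    return buckets


def partition_lpt(durations: Dict[str, float], workers: int) -> List[List[str]]:
    """Greedy longest-processing-time-first: hand each test, slowest first,
    to whichever worker currently has the least total assigned time."""
    heap: List[Tuple[float, int]] = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for test in sorted(durations, key=durations.get, reverse=True):
        load, w = heapq.heappop(heap)
        buckets[w].append(test)
        heapq.heappush(heap, (load + durations[test], w))
    return buckets


def makespan(buckets: List[List[str]], durations: Dict[str, float]) -> float:
    """Wall-clock time of the parallel phase = the busiest worker's total."""
    return max(sum(durations[t] for t in bucket) for bucket in buckets)


# Hypothetical timings (seconds): two slow tests among six fast ones.
durations = {
    "slow_a": 100.0, "fast_1": 10.0, "fast_2": 10.0, "fast_3": 10.0,
    "slow_b": 100.0, "fast_4": 10.0, "fast_5": 10.0, "fast_6": 10.0,
}
rr = partition_round_robin(list(durations), 4)
lpt = partition_lpt(durations, 4)
print(makespan(rr, durations), makespan(lpt, durations))  # 200.0 100.0
```

With four workers, the round-robin split happens to put both slow tests on worker 0 (200 s), while LPT spreads them and the run is bounded by the single slowest test (100 s). LPT only helps if the timing data it relies on is reasonably accurate.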
*** ralonsoh is now known as ralonsoh_afk | 16:56 | |
gmann | dansmith: I am not sure the ssh checks are what slow down the tests. | 18:13 |
gmann | on the per-worker test distribution: I think if any test is slow or has become slow, we can mark it as slow. That is the separation we use to keep the normal integration jobs from slowing down; tests marked slow run in the tempest-slow job | 18:14 |
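Tempest tags such tests with `@decorators.attr(type='slow')`, and the tag can then be used to filter them out of the regular jobs. The sketch below is a hedged illustration of a regex-based split, assuming the attribute appears in the test id's bracketed suffix (e.g. `...test_x[id-1234,slow]`); the actual regex used by the tempest-slow job is not reproduced here, and the ids are hypothetical.

```python
import re
from typing import List, Tuple

# Assumes attributes are rendered into the test id's bracketed suffix,
# e.g. "...test_x[id-1234,slow]"; the exact tempest-slow regex may differ.
SLOW_TAG_RE = re.compile(r"\[[^\]]*\bslow\b[^\]]*\]")


def split_by_slow_tag(test_ids: List[str]) -> Tuple[List[str], List[str]]:
    """Return (regular, slow) test ids based on a 'slow' tag in the id."""
    regular: List[str] = []
    slow: List[str] = []
    for test_id in test_ids:
        (slow if SLOW_TAG_RE.search(test_id) else regular).append(test_id)
    return regular, slow


ids = [
    "tempest.api.compute.test_a.TestA.test_fast[id-0001]",
    "tempest.scenario.test_b.TestB.test_migrate[compute,id-0002,slow]",
    "tempest.api.volume.test_c.TestC.test_slowness_check[id-0003]",
]
regular, slow = split_by_slow_tag(ids)
print(len(regular), len(slow))  # 2 1
```

The word boundaries keep a method name such as `test_slowness_check` from being mistaken for a slow-tagged test.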
gmann | but yes, I agree it would be good if the parallel run could be better optimized. However, I have observed that we do not have exact data on which tests (not marked as slow) are consistently slow | 18:14 |
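One way to get that data is to aggregate per-test durations across many runs and flag tests whose median time stays high. A minimal sketch, assuming the per-run `{test_id: seconds}` dictionaries have already been extracted from each job's results (that collection step is left out); `consistently_slow` and its thresholds are hypothetical.

```python
from statistics import median
from typing import Dict, List, Set


def consistently_slow(
    history: List[Dict[str, float]],
    threshold_s: float,
    min_runs: int,
    already_slow: Set[str],
) -> List[str]:
    """Test ids whose median duration over the recorded runs exceeds
    threshold_s, that appear in at least min_runs runs, and that are not
    already tagged slow. Sorted slowest first."""
    samples: Dict[str, List[float]] = {}
    for run in history:
        for test_id, secs in run.items():
            samples.setdefault(test_id, []).append(secs)
    flagged = [
        test_id
        for test_id, secs in samples.items()
        if len(secs) >= min_runs
        and median(secs) > threshold_s
        and test_id not in already_slow
    ]
    return sorted(flagged, key=lambda t: median(samples[t]), reverse=True)


# Hypothetical per-run durations (seconds) from three jobs.
history = [
    {"test_a": 5.0, "test_b": 120.0, "test_c": 200.0, "test_d": 300.0},
    {"test_a": 6.0, "test_b": 20.0, "test_c": 190.0, "test_d": 10.0},
    {"test_a": 4.0, "test_b": 130.0, "test_c": 210.0, "test_d": 12.0},
]
print(consistently_slow(history, 60.0, 3, {"test_c"}))  # ['test_b']
```

Using the median rather than the mean keeps a single bad run (like `test_d`'s 300 s spike) from flagging a test that is normally fast.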
dansmith | gmann: I just mean there are some classes that are slow in general, and since they get scheduled on one worker, we spend some time being very linear towards the end | 18:15 |
dansmith | if you look at the worker numbers for the last bunch of tests that run, they're all the same | 18:15 |
dansmith | https://zuul.opendev.org/t/openstack/build/d638073a5bb7457db0d5498065810086/log/job-output.txt#20654-20862 | 18:15 |
dansmith | all one worker | 18:15 |
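Spotting this by eye works, but the worker id can also be pulled out of the console output. The sketch below assumes result lines look like `{worker} test.id [12.345s] ... status`, which matches the usual stestr/subunit-trace console style but should be checked against the real job log; the sample lines are hypothetical, not copied from the linked job.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed console format: "... {worker} test.id [12.345s] ... status"
RESULT_RE = re.compile(r"\{(\d+)\} (\S+) \[(\d+(?:\.\d+)?)s\] \.\.\. (\w+)")


def parse_results(lines: Iterable[str]) -> List[Tuple[int, str, float, str]]:
    """Extract (worker, test_id, seconds, status) from each matching line."""
    results = []
    for line in lines:
        m = RESULT_RE.search(line)
        if m:
            results.append(
                (int(m.group(1)), m.group(2), float(m.group(3)), m.group(4))
            )
    return results


def tail_workers(results: List[Tuple[int, str, float, str]], n: int) -> Counter:
    """Count which workers ran the last n completed tests."""
    return Counter(worker for worker, _, _, _ in results[-n:])


# Hypothetical log excerpt.
sample = [
    "controller | {0} tempest.api.compute.test_a.TestA.test_one [3.2s] ... ok",
    "controller | {2} tempest.api.compute.test_b.TestB.test_two [4.0s] ... ok",
    "controller | {1} tempest.api.volume.test_c.TestC.test_three [95.1s] ... ok",
    "controller | {1} tempest.api.volume.test_c.TestC.test_four [88.7s] ... ok",
    "controller | {1} tempest.api.volume.test_c.TestC.test_five [102.3s] ... ok",
]
print(tail_workers(parse_results(sample), 3))  # Counter({1: 3})
```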
gmann | if you look here, workers 1 and 3 got some slow-running tests: https://5ba55f6ea55b4a9ca392-2c6cda48ae2944413654c9e504ee9baf.ssl.cf1.rackcdn.com/879500/11/check/tempest-integrated-compute/d638073/controller/logs/stackviz/index.html#/stdin/timeline | 18:17 |
dansmith | ah yeah, cool (/me didn't know about this) | 18:18 |
gmann | we could group the tests, but I am afraid that could cause more timeouts if all the slow tests end up stuck on a single worker for some reason | 18:19 |
gmann | or if a single test gets stuck, all the others on that worker wait behind it | 18:19 |
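The risk described above can be sketched with a tiny simulation. It assumes each worker's test list is fixed up front and run sequentially (no work stealing), and models a hung test as a very long duration; the test names, durations, and the 400 s deadline are all hypothetical.

```python
from typing import Dict, List


def unfinished(buckets: List[List[str]], durations: Dict[str, float],
               deadline: float) -> List[str]:
    """Tests that would not finish before `deadline` when each worker runs
    its own bucket sequentially (static partitioning, no work stealing)."""
    late = []
    for bucket in buckets:
        clock = 0.0
        for test in bucket:
            clock += durations[test]
            if clock > deadline:
                late.append(test)
    return late


# Hypothetical: four slow tests from one class; s1 hangs (modelled as 1000 s).
durations = {"s1": 1000.0, "s2": 100.0, "s3": 100.0, "s4": 100.0}
grouped = [["s1", "s2", "s3", "s4"], [], [], []]  # whole class on one worker
spread = [["s1"], ["s2"], ["s3"], ["s4"]]          # one slow test per worker
print(unfinished(grouped, durations, 400.0))  # ['s1', 's2', 's3', 's4']
print(unfinished(spread, durations, 400.0))   # ['s1']
```

Grouping keeps related tests together (stestr's `group_regex` option exists for this), but as the sketch shows, it also means one stuck test can take its whole group past the job timeout.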
gmann | kopecmartin: this is ready https://review.opendev.org/c/openstack/tempest/+/884952 | 21:40 |