Friday, 2023-06-02

dansmith: gmann: I've been seeing a bunch of these sorts of timeouts lately: https://zuul.opendev.org/t/openstack/build/d638073a5bb7457db0d5498065810086/log/job-output.txt#20890 [13:56]
dansmith: usually in this second batch we run after the main set [13:56]
dansmith: I'm wondering if the sshable-ness has just slowed us down enough that we're legit running out of time? tests don't seem wedged, they're progressing, we just time out [13:57]
dansmith: we also really suffer from a poor distribution of the tests among workers in the parallel phase.. where we end up handing a bunch of very slow tests to just one worker, which ends up increasing our wallclock time [13:59]
*** ralonsoh is now known as ralonsoh_afk [16:56]
gmann: dansmith: I am not sure if ssh things slow down the tests. [18:13]
gmann: on worker per tests, I think if any test is slow or became slow we can mark that slow which is a separation we do to not slow things in normal integration tests. that slow test run in tempest-slow job [18:14]
gmann: but yes I agree if parallel run can be more optimized it will be good but I have observed we do not have exact data on what test (not marked as slow) is consistently slow [18:14]
dansmith: gmann: I just mean there are some classes that are slow in general, and since they get scheduled on one worker, we spend some time being very linear towards the end [18:15]
dansmith: if you look at the worker numbers for the last bunch of tests that run, they're all the same [18:15]
dansmith: https://zuul.opendev.org/t/openstack/build/d638073a5bb7457db0d5498065810086/log/job-output.txt#20654-20862 [18:15]
dansmith: all one worker [18:15]
gmann: if we see here, 1 and 3 got some slow running tests https://5ba55f6ea55b4a9ca392-2c6cda48ae2944413654c9e504ee9baf.ssl.cf1.rackcdn.com/879500/11/check/tempest-integrated-compute/d638073/controller/logs/stackviz/index.html#/stdin/timeline [18:17]
dansmith: ah yeah, cool (/me didn't know about this) [18:18]
gmann: we can do grouping in tests to run but I am afraid that can cause more timeout if they all stuck as slow run test on single worker due to some reason [18:19]
gmann: or a single test stuck then all other waiting [18:19]
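
[Editor's note] The wallclock effect discussed above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the durations are made-up numbers, and split_in_order and split_longest_first are hypothetical helpers written for this note. It does not show how stestr or tempest actually partition tests. It only shows why a run finishes late when the slow test classes all land on one worker.

```python
import heapq


def makespan(buckets):
    """Wallclock time of a parallel run: the busiest worker decides it."""
    return max(sum(bucket) for bucket in buckets)


def split_in_order(durations, workers):
    """Hand out contiguous chunks, so slow neighbours stay on one worker."""
    size = -(-len(durations) // workers)  # ceiling division
    return [durations[i:i + size] for i in range(0, len(durations), size)]


def split_longest_first(durations, workers):
    """Greedy balancing: give the next-longest item to the least-loaded worker."""
    heap = [(0, i) for i in range(workers)]  # (current load, worker index)
    buckets = [[] for _ in range(workers)]
    for d in sorted(durations, reverse=True):
        load, i = heapq.heappop(heap)
        buckets[i].append(d)
        heapq.heappush(heap, (load + d, i))
    return buckets


# Hypothetical per-class durations in seconds; the three slow ones sit together.
durations = [30, 40, 20, 30, 25, 35, 300, 280, 260]

print(makespan(split_in_order(durations, 3)))       # 840
print(makespan(split_longest_first(durations, 3)))  # 350
```

With these made-up numbers the total work is 1020 seconds in both cases, but the in-order split leaves two workers idle while one works through all three slow classes. Balancing like this needs reliable per-test timing data, which is the gap gmann points out at 18:14.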
gmann: kopecmartin: this is ready https://review.opendev.org/c/openstack/tempest/+/884952 [21:40]
