| opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Drop job to test unmaintained 2024.1 branch https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/966146 | 08:39 |
|---|---|---|
| chandankumar | sean-k-mooney: Hello, from the comment on this review https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955472/19#message-4f4e7b93f43f51783bcd69721fbc379815313be0, many scenario tests are taking around 4 mins. Based on your comment, each test should not take more than 90 sec; is that true for scenario tests also? | 10:58 |
| chandankumar | I was checking tempest-full scenario tests timing https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_973/openstack/9733ab5e575a40cb904254a54aecc464/job-output.txt, there are many tests taking more than 200 sec. | 10:58 |
| chandankumar | https://paste.openstack.org/raw/bkRds7KRdJqhyIA5wtxg/ - logs from tempest-full run | 10:59 |
| chandankumar | Now I am confused about what the time limit for a scenario test would be. | 11:00 |
| chandankumar | I also checked https://docs.openstack.org/tempest/latest/field_guide/scenario.html there is no mention of timing | 11:00 |
| sean-k-mooney | chandankumar: so anything over about 90 seconds would be considered a candidate for the slow tag in tempest. they can go a little longer, to the 150-200 range, but if all our tests are in that range we effectively can't add any more tests | 11:10 |
| sean-k-mooney | the reason this is a problem is our tests are running serially | 11:11 |
| sean-k-mooney | in a downstream context it's even more troubling | 11:12 |
| sean-k-mooney | 21 tests in 3420.3461 sec. | 11:13 |
| sean-k-mooney | would add effectively an hour to any downstream job we wanted to include them in and that is not viable | 11:13 |
| sean-k-mooney | i realised after i left that comment that your new tests are not directly the problem | 11:13 |
| sean-k-mooney | but the watcher tests are too slow in general | 11:14 |
| sean-k-mooney | so before we continue extending the tempest suite we need to resolve this | 11:14 |
| sean-k-mooney | we might be able to merge the existing open tests but we likely can't afford to write new ones until we address this | 11:14 |
| sean-k-mooney | chandankumar: the standard live migration tests are around 80 seconds for context https://zuul.opendev.org/t/openstack/build/efa22e7b9068419c8e3fb80157d6c470/log/job-output.txt#29663 | 11:16 |
| sean-k-mooney | so our version taking triple that time is a problem, especially when the normal cold/live migration tests can run in parallel | 11:17 |
| chandankumar | sean-k-mooney: I agree, we can fix the existing tests. | 11:17 |
| sean-k-mooney | this is partly why i brought up the topic of functional testing and exploring using the nova fake driver instead | 11:18 |
| sean-k-mooney | but in reality this is also a problem of how watcher is structured internally and the scalability of watcher | 11:19 |
| chandankumar | Let me see how much we can reduce the timing on existing tests, and if a test still exceeds 90 sec then add the slow attribute | 11:27 |
| jgilaber | chandankumar, I have this patch https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/964800 that prints how long each function call in the tests took, I rechecked it to get fresh numbers | 11:28 |
| jgilaber | previous runs showed the tests running much faster | 11:28 |
| sean-k-mooney | jgilaber: they are still in the 50 minute mark for 20 tests | 11:34 |
| sean-k-mooney | there seems to be a lot of variance in the jobs | 11:34 |
| jgilaber | yes, they are still slow, but about 3x faster | 11:34 |
| sean-k-mooney | oh some nodes | 11:34 |
| jgilaber | which is annoying because there is no clear cause for that difference | 11:34 |
| sean-k-mooney | what are they downstream in our component jobs | 11:35 |
| sean-k-mooney | or on github | 11:35 |
| sean-k-mooney | the more data sources we have for typical time the better here | 11:35 |
| chandankumar | 2025-10-28 14:47:56.268306 | controller | {0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [101.170922s] ... ok (from joan's patch) and in my one (2025-11-04 22:25:43.322287 | controller | {0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy | 11:35 |
| chandankumar | [175.034769s] ... ok) | 11:35 |
| chandankumar | it's a massive jump | 11:35 |
| chandankumar | the node seems to be same on both the jos | 11:37 |
| chandankumar | *jobs | 11:37 |
| sean-k-mooney | did you check the provider | 11:37 |
| chandankumar | both providers seem to be the same https://paste.openstack.org/raw/bE1TeQUZjcBEabU1uH1e/ | 11:39 |
| chandankumar | only difference is the job name | 11:39 |
| chandankumar | If I check watcher-tempest-strategies SUCCESS - https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f46/openstack/f46a2428a5824342a363958ed0696c80/job-output.txt from my patch 2025-11-04 21:58:34.191700 | controller | {0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [93.777421s] ... ok | 11:40 |
| jgilaber | from a github job "{0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [104.708551s] ... ok" | 11:41 |
| jgilaber | seems to be on the "faster" side as well | 11:41 |
| chandankumar | I am not sure we need to tune something on job side. | 11:41 |
| chandankumar | the datasource job is taking too much time | 11:42 |
| chandankumar | *devstack datasource | 11:42 |
| sean-k-mooney | ah right so this is running on the old xen rax cluster | 11:43 |
| sean-k-mooney | that is known to both be slow and have kernel issues when using ubuntu 24.04 | 11:43 |
| jgilaber | our downstream tests are significantly faster: "watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [57.760390s] ... ok" | 11:43 |
| sean-k-mooney | sometimes it ends up with only 1 cpu in the guest instead of 8 | 11:44 |
| sean-k-mooney | we can check this in the ansible host info | 11:44 |
| sean-k-mooney | we look for ansible_processor_vcpus | 11:45 |
| sean-k-mooney | in zuul-info/host-info.controller.yaml | 11:45 |
| chandankumar | ansible_processor_vcpus: 8 | 11:46 |
| chandankumar | both of the jobs have same vcpus in devstack | 11:48 |
| sean-k-mooney | ok so its not that then | 11:48 |
| sean-k-mooney | 2025-10-28 14:33:27.629 | stack.sh completed in 1191 seconds. vs 2025-11-04 21:44:26.900 | stack.sh completed in 775 seconds. | 11:51 |
| sean-k-mooney | so there does seem to be a large performance delta between the nodes | 11:51 |
| sean-k-mooney | https://pb.teim.app/?aea4fedaa333be72#3hqT6dh6fqKAAdEsfHSjYkTwLAUuhySfAKESJZpa5qrQ | 11:53 |
| sean-k-mooney | this might just be a case of getting slow nodes from time to time | 11:54 |
| chandankumar | in watcher-tempest-strategies we do not enable the prometheus devstack plugin https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f46/openstack/f46a2428a5824342a363958ed0696c80/controller/logs/_.localrc_auto.txt | 11:54 |
| jgilaber | that is consistent with what I saw after timing the calls, creating an instance taking ~15s even in the "faster" case | 11:54 |
| jgilaber | what is osc in the devstack timings? | 11:55 |
| sean-k-mooney | the openstack client | 11:55 |
| sean-k-mooney | so it's when we are creating flavors or images etc. | 11:55 |
| sean-k-mooney | all of the network/project/flavor/image creation just uses the openstack client | 11:56 |
| jgilaber | ack, so same behaviour we see in our test I think | 11:56 |
| sean-k-mooney | yep more or less | 11:56 |
| sean-k-mooney | the fact we do see some faster execution upstream and on rdo | 11:56 |
| sean-k-mooney | means this is run-to-run specific and not a general problem for all executions | 11:56 |
| sean-k-mooney | that's somewhat reassuring at least | 11:57 |
| sean-k-mooney | it implies that this may be more related to the infra than the implementation | 11:57 |
| sean-k-mooney | lets chat about it more in the irc meeting tomorrow | 12:00 |
| sean-k-mooney | we can likely proceed with the existing patches but we need to spend some time digging into this more | 12:01 |
| jgilaber | sounds good! | 12:04 |
| chandankumar | sean-k-mooney: need one more +2 on this https://review.opendev.org/c/openstack/watcher-dashboard/+/962898, thank you! | 13:12 |
| chandankumar | dviroel: need +2 on this https://review.opendev.org/c/openstack/watcher-dashboard/+/963778 , please add it to your list, thank you! | 13:12 |
| jgilaber | the jobs watcher-tempest-functional-2025-1 and watcher-tempest-functional-2024-2 are broken, failing with | 13:32 |
| jgilaber | 'The specified regex doesn't match with anythingERROR: InvocationError for command /opt/stack/tempest/.tox/tempest/bin/tempest run --regex watcher_tempest_plugin.tests.api --concurrency=6 (exited with code 1)' | 13:32 |
| jgilaber | watcher-tempest-functional-2025-2 is green, I wonder if it could be related to https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/966121 because of the timing | 13:33 |
| jgilaber | although I can't see how | 13:33 |
| jgilaber | the plugin seems to be installed correctly in all jobs | 13:33 |
| jgilaber | https://zuul.opendev.org/t/openstack/builds?job_name=watcher-tempest-functional-2025-1&project=openstack/watcher-tempest-plugin | 13:34 |
| sean-k-mooney | it should not be, as they should be using ubuntu 22.04 or 24.04 | 13:35 |
| sean-k-mooney | so they should be using 3.10 or 3.12 | 13:35 |
| sean-k-mooney | they also passed on that change | 13:35 |
| jgilaber | all are using 3.12, it might be just a coincidence | 13:35 |
| opendevreview | Merged openstack/watcher-dashboard master: Minor fixes from previous code reviews https://review.opendev.org/c/openstack/watcher-dashboard/+/962898 | 13:42 |
| opendevreview | Merged openstack/watcher master: Add second instance of watcher-decision-engine in the compute node https://review.opendev.org/c/openstack/watcher/+/964546 | 14:42 |
| opendevreview | Merged openstack/watcher-dashboard master: Improve audit creation model and add text wrapping https://review.opendev.org/c/openstack/watcher-dashboard/+/963778 | 19:28 |
| opendevreview | Douglas Viroel proposed openstack/watcher master: WIP - Adds support for threading mode in applier https://review.opendev.org/c/openstack/watcher/+/966226 | 20:46 |
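
The timing comparison in the discussion above was done by reading individual result lines out of `job-output.txt`. A minimal sketch of automating that check follows, assuming the result lines keep the `{worker} test_id [seconds] ... status` shape quoted in the log. The function name `find_slow_tests`, the regex, and the `SAMPLE` list are hypothetical helpers written for illustration and are not part of tempest or the watcher-tempest-plugin; the 90 second default reflects the "about 90 seconds" rule of thumb mentioned in the chat, not a documented tempest limit. For scale, the quoted "21 tests in 3420.3461 sec" works out to roughly 163 seconds per test on average.

```python
import re

# Assumed shape of a per-test result line in job-output.txt, e.g.
# "... | controller | {0} pkg.module.Class.test_name [101.170922s] ... ok"
RESULT_LINE = re.compile(
    r"\{\d+\}\s+(?P<test>\S+)\s+\[(?P<secs>\d+(?:\.\d+)?)s\]\s+\.\.\.\s+(?P<status>\w+)"
)


def find_slow_tests(lines, threshold=90.0):
    """Return (test_id, seconds) pairs above threshold, slowest first."""
    slow = []
    for line in lines:
        match = RESULT_LINE.search(line)
        if match and float(match["secs"]) > threshold:
            slow.append((match["test"], float(match["secs"])))
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Timings quoted in the log for the same test across different runs.
TEST = (
    "watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance."
    "TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy"
)
SAMPLE = [
    f"2025-10-28 14:47:56.268306 | controller | {{0}} {TEST} [101.170922s] ... ok",
    f"2025-11-04 22:25:43.322287 | controller | {{0}} {TEST} [175.034769s] ... ok",
    f"2025-11-04 21:58:34.191700 | controller | {{0}} {TEST} [93.777421s] ... ok",
    f"{{0}} {TEST} [104.708551s] ... ok",  # github job
    f"{{0}} {TEST} [57.760390s] ... ok",   # downstream job
]

if __name__ == "__main__":
    for test_id, seconds in find_slow_tests(SAMPLE):
        print(f"{seconds:10.3f}s  {test_id}")
```

Run against a full `job-output.txt` (for example `find_slow_tests(open(path))`), this would list every test over the threshold, which could help decide which tests to speed up or mark with the slow attribute.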