Wednesday, 2025-11-05

opendevreviewJoan Gilabert proposed openstack/watcher-tempest-plugin master: Drop job to test unmaintained 2024.1 branch  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/96614608:39
chandankumarsean-k-mooney: Hello, from the comment on this review https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955472/19#message-4f4e7b93f43f51783bcd69721fbc379815313be0, many scenario tests are taking around 4 mins. Based on your comment, each test should not take 90 sec, is it true for scenario tests also?10:58
chandankumarI was checking tempest-full scenario tests timing https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_973/openstack/9733ab5e575a40cb904254a54aecc464/job-output.txt, there are many tests taking more than 200 sec.10:58
chandankumarhttps://paste.openstack.org/raw/bkRds7KRdJqhyIA5wtxg/ - logs from tempest-full run10:59
chandankumarNow I am confused what would be the time limit for a scenario tests?11:00
chandankumarI also checked https://docs.openstack.org/tempest/latest/field_guide/scenario.html there is no mention of timing 11:00
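The per-test timings being discussed can be pulled out of a job-output.txt mechanically. A minimal sketch, assuming the standard stestr result-line format shown in the pastes above (the helper name and the 90 s threshold are illustrative choices, not part of tempest):

```python
import re

# Result lines in job-output.txt look like:
#   {0} path.to.TestClass.test_name [101.170922s] ... ok
RESULT_RE = re.compile(
    r'\{\d+\}\s+(?P<test>\S+)\s+\[(?P<secs>[\d.]+)s\]\s+\.\.\.\s+(?P<status>\w+)')

def slow_tests(log_text, threshold=90.0):
    """Return (test, seconds) pairs over the threshold, slowest first."""
    hits = []
    for line in log_text.splitlines():
        m = RESULT_RE.search(line)
        if m and float(m.group('secs')) > threshold:
            hits.append((m.group('test'), float(m.group('secs'))))
    return sorted(hits, key=lambda t: -t[1])

sample = (
    "{0} watcher_tempest_plugin.tests.scenario.test_x.TestX.test_a [101.170922s] ... ok\n"
    "{1} watcher_tempest_plugin.tests.api.test_y.TestY.test_b [12.5s] ... ok\n"
)
print(slow_tests(sample))  # [('watcher_tempest_plugin.tests.scenario.test_x.TestX.test_a', 101.170922)]
```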
sean-k-mooneychandankumar: so anything over about 90 seconds would be considered a candidate for the slow tag in tempest. they can go a little longer, into the 150-200s range, but if all our tests are in that range we effectively can't add any more tests11:10
sean-k-mooneythe reason this is a problem is that our tests are running serially11:11
sean-k-mooney in a downstream context it's even more troubling11:12
sean-k-mooney21 tests in 3420.3461 sec.11:13
sean-k-mooneythat would add effectively an hour to any downstream job we wanted to include them in, and that is not viable11:13
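The figure quoted above works out to roughly 163 s per test on average, well above the ~90 s slow-tag threshold:

```python
# Back-of-envelope from the numbers quoted above: 21 tests in 3420.3461 s.
total_seconds, num_tests = 3420.3461, 21
avg = total_seconds / num_tests
print(round(avg, 1))  # 162.9 s per test, nearly double the ~90 s slow-tag threshold
```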
sean-k-mooneyi realised after i left that comment that your new tests are not directly the problem11:13
sean-k-mooneybut the watcher tests are too slow in general11:14
sean-k-mooneyso before we continue extending the tempest suite we need to resolve this11:14
sean-k-mooneywe might be able to merge the existing open tests but we likely can't afford to write new ones until we address this11:14
sean-k-mooneychandankumar: for context, the standard live migration tests are around 80 seconds https://zuul.opendev.org/t/openstack/build/efa22e7b9068419c8e3fb80157d6c470/log/job-output.txt#2966311:16
sean-k-mooneyso our version taking triple that time is a problem, especially since the normal cold/live migration tests can run in parallel11:17
chandankumarsean-k-mooney: I agree, we can fix the existing tests.11:17
sean-k-mooneythis is partly why i brought up the topic of functional testing and exploring using the nova fake driver instead11:18
sean-k-mooneybut in reality this is also a problem of how watcher is structured internally, and of the scalability of watcher11:19
chandankumarLet me see how much we can reduce the timing of the existing tests; if a test still exceeds 90 sec then we add the slow attribute11:27
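For reference, tempest marks such tests with `tempest.lib.decorators.attr(type='slow')`. A minimal self-contained sketch of how that tagging works — the `attr` below is a stand-in mimicking the real decorator (which, via testtools, records attributes in `__testtools_attrs` so runners can include or exclude tagged tests):

```python
def attr(**kwargs):
    """Minimal stand-in for tempest.lib.decorators.attr: record the test
    'type' so a runner can select or skip it (e.g. the 'slow' attribute)."""
    def decorator(f):
        if 'type' in kwargs:
            attrs = getattr(f, '__testtools_attrs', set())
            attrs.add(kwargs['type'])
            f.__testtools_attrs = attrs
        return f
    return decorator

# In a real plugin this would be:
#   from tempest.lib import decorators
#   @decorators.attr(type='slow')
@attr(type='slow')
def test_execute_host_maintenance_strategy():
    pass

print('slow' in test_execute_host_maintenance_strategy.__testtools_attrs)  # True
```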
jgilaberchandankumar, I have this patch https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/964800 that prints how long each function call in the tests took, I rechecked it to get fresh numbers11:28
jgilaberprevious runs showed the tests running much faster11:28
sean-k-mooneyjgilaber: they are still at the 50 minute mark for 20 tests11:34
sean-k-mooneythere seems to be a lot of variance in the jobs11:34
jgilaberyes, they are still slow, but about 3x faster11:34
sean-k-mooneyon some nodes11:34
jgilaberwhich is annoying because there is no clear cause for that difference11:34
sean-k-mooneywhat are they downstream in our component jobs11:35
sean-k-mooneyor on github11:35
sean-k-mooneythe more data sources we have for typical times the better here11:35
chandankumar2025-10-28 14:47:56.268306 | controller | {0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [101.170922s] ... ok (from joan's patch) and in mine (2025-11-04 22:25:43.322287 | controller | {0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [175.034769s] ... ok)11:35
chandankumarit's a massive jump11:35
chandankumarthe node seems to be the same on both the jobs11:37
sean-k-mooneydid you check the provider11:37
chandankumarboth providers seem to be the same https://paste.openstack.org/raw/bE1TeQUZjcBEabU1uH1e/11:39
chandankumaronly difference is the job name11:39
chandankumarIf I check watcher-tempest-strategies SUCCESS - https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f46/openstack/f46a2428a5824342a363958ed0696c80/job-output.txt from my patch 2025-11-04 21:58:34.191700 | controller | {0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [93.777421s] ... ok11:40
jgilaberfrom a github job "{0} watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [104.708551s] ... ok"11:41
jgilaberseems to be on the "faster" side as well11:41
chandankumarI am not sure if we need to tune something on the job side.11:41
chandankumarthe devstack datasource job is taking too much time11:42
sean-k-mooneyah right, so this is running on the old xen rax cluster11:43
sean-k-mooneythat is known to both be slow and to have kernel issues when using ubuntu 24.0411:43
jgilaberour downstream tests are significantly faster: "watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance.TestExecuteHostMaintenanceStrategy.test_execute_host_maintenance_strategy [57.760390s] ... ok"11:43
sean-k-mooneysometimes it ends up with only 1 cpu in the guest instead of 811:44
sean-k-mooney we can check this in the ansible host info11:44
sean-k-mooneywe look for ansible_processor_vcpus11:45
sean-k-mooneyin zuul-info/host-info.controller.yaml11:45
chandankumar  ansible_processor_vcpus: 811:46
chandankumarboth of the jobs have same vcpus in devstack11:48
sean-k-mooney ok, so it's not that then11:48
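The `ansible_processor_vcpus` check above can be scripted against `zuul-info/host-info.controller.yaml`. A small sketch that scans for the key without a YAML dependency (the helper name is illustrative):

```python
def vcpus_from_host_info(text):
    """Scan host-info YAML text for the ansible_processor_vcpus fact."""
    for line in text.splitlines():
        key, _, value = line.strip().partition(':')
        if key == 'ansible_processor_vcpus':
            return int(value)
    return None

# Minimal excerpt of what zuul-info/host-info.controller.yaml contains:
sample = "ansible_architecture: x86_64\nansible_processor_vcpus: 8\n"
print(vcpus_from_host_info(sample))  # 8
```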
sean-k-mooney2025-10-28 14:33:27.629 | stack.sh completed in 1191 seconds. vs 2025-11-04 21:44:26.900 | stack.sh completed in 775 seconds.11:51
sean-k-mooneyso there does seem to be a large performance delta between the nodes11:51
sean-k-mooneyhttps://pb.teim.app/?aea4fedaa333be72#3hqT6dh6fqKAAdEsfHSjYkTwLAUuhySfAKESJZpa5qrQ11:53
sean-k-mooneythis might just be a case of getting slow nodes from time to time11:54
chandankumarin watcher-tempest-strategies we do not enable the prometheus devstack plugin https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f46/openstack/f46a2428a5824342a363958ed0696c80/controller/logs/_.localrc_auto.txt11:54
jgilaberthat is consistent with what I saw after timing the calls, creating an instance taking ~15s even in the "faster" case11:54
jgilaberwhat is osc in the devstack timings?11:55
sean-k-mooneythe openstack client11:55
sean-k-mooneyso it's when we are creating flavors or images etc.11:55
sean-k-mooneyall of the network/project/flavor/image creation just uses the openstack client11:56
jgilaberack, so same behaviour we see in our test I think11:56
sean-k-mooneyyep more or less11:56
sean-k-mooneythe fact we do see some faster executions upstream and on rdo11:56
sean-k-mooneymeans this is run-to-run specific and not a general problem for all executions11:56
sean-k-mooneythat's somewhat reassuring at least11:57
sean-k-mooneyit implies that this may be more related to the infra than the implementation11:57
sean-k-mooneylets chat about it more in the irc meeting tomorrow12:00
sean-k-mooneywe can likely proceed with the existing patches but we need to spend some time digging into this more12:01
jgilabersounds good!12:04
chandankumarsean-k-mooney: need one more +2 on this https://review.opendev.org/c/openstack/watcher-dashboard/+/962898, thank you!13:12
chandankumardviroel: need +2 on this https://review.opendev.org/c/openstack/watcher-dashboard/+/963778 , please add it to your list, thank you!13:12
jgilaberthe jobs watcher-tempest-functional-2025-1 and watcher-tempest-functional-2024-2 are broken, failing with 13:32
jgilaber'The specified regex doesn't match with anything' followed by 'ERROR: InvocationError for command /opt/stack/tempest/.tox/tempest/bin/tempest run --regex watcher_tempest_plugin.tests.api --concurrency=6 (exited with code 1)'13:32
jgilaberwatcher-tempest-functional-2025-2 is green, I wonder if it could be related to https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/966121 because of the timing13:33
jgilaberalthough I can't see how13:33
jgilaberthe plugin seems to be installed correctly in all jobs13:33
jgilaberhttps://zuul.opendev.org/t/openstack/builds?job_name=watcher-tempest-functional-2025-1&project=openstack/watcher-tempest-plugin13:34
sean-k-mooneyit should not be, as they should be using ubuntu 22.04 or 24.0413:35
sean-k-mooneyso they should be using python 3.10 or 3.1213:35
sean-k-mooneythey also passed on that change13:35
jgilaberall are using 3.12, it might be just a coincidence13:35
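The failure mode here is tempest's test selection finding zero matches for `--regex watcher_tempest_plugin.tests.api`. The behaviour can be sketched as follows — a simplified stand-in for the selection step, not tempest's actual code:

```python
import re

def select_tests(test_ids, regex):
    """Simplified stand-in for `tempest run --regex`: keep matching test ids
    and fail, as the broken jobs do, when nothing matches."""
    selected = [t for t in test_ids if re.search(regex, t)]
    if not selected:
        raise RuntimeError("The specified regex doesn't match with anything")
    return selected

ids = ["watcher_tempest_plugin.tests.scenario.test_x.TestX.test_a"]
print(select_tests(ids, r"watcher_tempest_plugin\.tests"))
```

An empty match set therefore points at the test list itself — e.g. the plugin's api tests not being discovered on those branches — rather than at the runner.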
opendevreviewMerged openstack/watcher-dashboard master: Minor fixes from previous code reviews  https://review.opendev.org/c/openstack/watcher-dashboard/+/96289813:42
opendevreviewMerged openstack/watcher master: Add second instance of watcher-decision-engine in the compute node  https://review.opendev.org/c/openstack/watcher/+/96454614:42
opendevreviewMerged openstack/watcher-dashboard master: Improve audit creation model and add text wrapping  https://review.opendev.org/c/openstack/watcher-dashboard/+/96377819:28
opendevreviewDouglas Viroel proposed openstack/watcher master: WIP - Adds support for threading mode in applier  https://review.opendev.org/c/openstack/watcher/+/96622620:46

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!