Friday, 2025-10-24

opendevreview | Thomas Goirand proposed openstack/watcher master: Fix term.py with 0.22.2  https://review.opendev.org/c/openstack/watcher/+/964762  [10:40]
opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add option to SKIP Actions  https://review.opendev.org/c/openstack/watcher-dashboard/+/958209  [12:06]
chandankumar | dviroel: Hello, please add this review to your list https://review.opendev.org/c/openstack/watcher-dashboard/+/958209, thank you!  [12:06]
dviroel | chandankumar: ack  [12:10]
opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Remove legacy integration test framework  https://review.opendev.org/c/openstack/watcher-dashboard/+/964775  [13:13]
opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add option to SKIP Actions  https://review.opendev.org/c/openstack/watcher-dashboard/+/958209  [13:15]
dviroel | sean-k-mooney: hi o/ - when you have some time, can you revisit https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/956116 ? I removed some previously added config options to make it simpler  [13:20]
opendevreview | Joan Gilabert proposed openstack/watcher master: Fix zone migration to accept dst_pool or dst_type  https://review.opendev.org/c/openstack/watcher/+/964776  [13:27]
sean-k-mooney | oh, sure, I'll review it now  [14:05]
sean-k-mooney | dviroel: +2 with comments  [14:33]
sean-k-mooney | read them tomorrow or next week, since this is a long weekend  [14:34]
dviroel | thanks sean-k-mooney, I will take a look  [14:45]
chandankumar | sean-k-mooney: dviroel hello, when you get time, please have a look at these two https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/956004 and https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/955472 thank you!  [15:09]
sean-k-mooney | I'm reviewing the second one currently  [15:10]
opendevreview | Merged openstack/watcher-tempest-plugin master: Add tests for extended compute datamodel  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/956116  [15:11]
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume migrate with zone migration  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/958644  [15:11]
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Test zone migration volume and compute migrations  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/962702  [15:11]
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add extra checks to zone migration retype test  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/963559  [15:11]
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add test for volume migrate with zone migration  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/958644  [15:19]
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Test zone migration volume and compute migrations  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/962702  [15:19]
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Add extra checks to zone migration retype test  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/963559  [15:19]
sean-k-mooney | jgilaber: dviroel: we have a problem with our tempest plugin  [15:41]
sean-k-mooney | 20 tests in 3194.4666 sec.  [15:41]
sean-k-mooney | that is nowhere near OK  [15:41]
sean-k-mooney | the way we are building the tests, we are waiting far, far too long and reconciling things far too slowly  [15:41]
jgilaber | hmm, I thought with notifications the waiting would not be too bad  [15:43]
sean-k-mooney | 20 tests in 50 minutes is extreme  [15:43]
sean-k-mooney | watcher_tempest_plugin.tests.scenario.test_execute_zone_migration.TestExecuteZoneMigrationStrategyVolume.test_execute_zone_migration_volume_retype [211.483003s]  [15:43]
sean-k-mooney | the zone migration tests are some of the slowest  [15:43]
sean-k-mooney | anything over 90 seconds should be a cause for alarm, but we are at double or triple that in many cases  [15:44]
sean-k-mooney | the problem is that a lot of the tests are polling and have embedded sleeps  [15:49]
sean-k-mooney | we run tests sequentially, so that pattern can't scale  [15:50]
sean-k-mooney | if we are polling like that, we need to use much, much shorter intervals, like 0.5-1.0 seconds, not 10-15  [15:50]
jgilaber | sean-k-mooney, you mean in snippets like https://github.com/openstack/watcher-tempest-plugin/blob/99388fae9603a71564456757149dbad2d004c0cd/watcher_tempest_plugin/tests/scenario/base.py#L230 ?  [15:53]
sean-k-mooney | yep  [15:54]
sean-k-mooney | no test is meant to take more than 5 minutes total by default, I believe, so a single step being allowed to wait up to 10 minutes is not correct  [15:54]
sean-k-mooney | https://github.com/openstack/watcher-tempest-plugin/blob/99388fae9603a71564456757149dbad2d004c0cd/watcher_tempest_plugin/tests/scenario/base.py#L255-L261  [15:54]
sean-k-mooney | there is a reason why these default to 0.5 seconds  [15:55]
sean-k-mooney | often we set it to 0.2 or similar  [15:55]
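For illustration, a minimal polling helper along the lines being suggested: check a condition every `interval` seconds (0.5 s by default, matching the defaults referenced above) and stop at a hard deadline, so no single step can eat the whole test budget. `wait_until` is a hypothetical name, not code from watcher-tempest-plugin; tempest itself ships a comparable helper, `call_until_true`, in `tempest.lib.common.utils.test_utils`.

```python
import time


def wait_until(predicate, timeout=60.0, interval=0.5):
    """Poll ``predicate`` until it returns True or ``timeout`` seconds pass.

    Hypothetical helper: returns True if the condition was met, False if
    the deadline expired first.
    """
    # monotonic() is immune to wall-clock jumps, so the deadline is reliable.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Short sleeps keep reaction time low; never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

With a 0.5 s interval the caller notices a state change within about half a second, whereas a 10-15 s interval can add up to 15 s of dead time to every wait.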
sean-k-mooney | anything above about 30 seconds is considered a slow test for tempest; 30-90 is OK for a scenario test, but if we are getting into the 200-300 second range, that's a problem  [15:56]
sean-k-mooney | jgilaber: part of the reason why we are injecting data is so we can speed up the tests and have them run consistently  [15:57]
sean-k-mooney | we can discuss this in the testing PTG session next week  [15:58]
sean-k-mooney | but we can't keep adding tempest tests that take that long to run, and we should try to optimise the existing ones  [15:58]
jgilaber | sounds good, I'm trying to get some timings from the tempest logs  [15:58]
sean-k-mooney | https://zuul.opendev.org/t/openstack/build/a96614ce7797470e949eabf56c15632f/log/job-output.txt#54694  [15:59]
sean-k-mooney | it's not logged in the tempest logs by default, though we can get that info from stestr with a tweak to the job  [15:59]
sean-k-mooney | the test time is, however, in the raw job output  [15:59]
jgilaber | {0} watcher_tempest_plugin.tests.scenario.test_execute_zone_migration.TestExecuteZoneMigrationStrategy.test_execute_zone_migration_with_destination_host [218.771057s] ... ok  [16:01]
jgilaber | {0} watcher_tempest_plugin.tests.scenario.test_execute_zone_migration.TestExecuteZoneMigrationStrategy.test_execute_zone_migration_without_destination_host [234.584126s] ... ok  [16:01]
jgilaber | {0} watcher_tempest_plugin.tests.scenario.test_execute_zone_migration.TestExecuteZoneMigrationStrategyVolume.test_execute_zone_migration_volume_retype [211.483003s] ... ok  [16:01]
jgilaber | all three tests for zone migration have very similar runtimes  [16:02]
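A small sketch for pulling per-test durations out of the raw job output and bucketing them by the thresholds discussed here (over 30 s is slow for tempest, 30-90 s is acceptable for a scenario test, over 90 s is cause for alarm). The regular expression assumes result lines shaped like the ones pasted above (`{worker} test_id [N.NNNs] ... status`); the function names are hypothetical.

```python
import re

# Assumed line shape: "{0} pkg.module.Class.test_name [218.771057s] ... ok"
TIMING_RE = re.compile(
    r"\{\d+\}\s+(?P<test>\S+)\s+\[(?P<secs>[\d.]+)s\]\s+\.\.\.\s+(?P<status>\w+)"
)

# Thresholds taken from the discussion above (scenario-test budget).
SLOW_SECS = 30.0
ALARM_SECS = 90.0


def classify(seconds):
    """Bucket a single test duration."""
    if seconds > ALARM_SECS:
        return "alarm"
    if seconds > SLOW_SECS:
        return "slow-but-ok"
    return "fast"


def slow_tests(log_text):
    """Return (test_id, seconds, verdict) tuples, slowest first."""
    results = []
    for match in TIMING_RE.finditer(log_text):
        secs = float(match.group("secs"))
        results.append((match.group("test"), secs, classify(secs)))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Run against the three zone migration lines pasted above, all three would be classified as `alarm`.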
dviroel | we can add some additional debug logging in the wait/sleep parts, to identify where/why  [16:04]
dviroel | waiting for instances in the model is common in most of the scenario tests  [16:04]
jgilaber | the most likely culprit for zone migration is wait_for_instances_in_model, since, for example, the volume retype test creates servers/volumes directly with tempest lib methods  [16:06]
jgilaber | not the methods from the plugin base  [16:06]
jgilaber | and the waits for the audits/action plan  [16:07]
dviroel | the collector run period is 2 min, but we should get model updates from notifications faster  [16:09]
jgilaber | I'm creating a quick and dirty patch with timing logs to check which calls take longer  [16:19]
opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: [DNM] Log timing for function calls  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/964800  [16:24]
jgilaber | ^^ should confirm our suspicions  [16:24]
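A minimal sketch of the kind of timing instrumentation being described, assuming a plain decorator around the wait helpers is enough. `log_timing` is hypothetical and is not the contents of the [DNM] patch linked above.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def log_timing(func):
    """Log how long each call to ``func`` takes (hypothetical helper)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if the call raises, so slow failures are logged too.
            elapsed = time.monotonic() - start
            LOG.info("%s took %.3fs", func.__qualname__, elapsed)

    return wrapper
```

Wrapping, for example, `wait_for_instances_in_model` would show how much of each test's runtime goes to waiting for the model versus waiting for audits and action plans.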
dviroel | reducing the sleep time in the model check should help, but I would like to see if there is something else, yes  [16:35]
jgilaber | I'll check back on Monday, I'm leaving for today o/  [16:35]
dviroel | sure, thanks jgilaber  [16:35]
dviroel | have a good weekend  [16:35]
sean-k-mooney | dviroel: we are injecting metrics and enabling notifications so as not to depend on that  [16:52]
sean-k-mooney | but ya, the collector interval could be part of the problem  [16:52]
sean-k-mooney | my expectation is that the creation of the test VM and the notification for it should take a couple of seconds at most  [16:53]
sean-k-mooney | say 10 seconds from when we do the API request  [16:54]
sean-k-mooney | so I'm expecting to have the VM in our model very shortly after that  [16:54]
dviroel | right, unless we have something not working properly on the notifications side  [16:54]
sean-k-mooney | yep  [16:54]
dviroel | the only test that needs the collector sync is the one that I just added, to get pinned_az info  [16:54]
sean-k-mooney | so before we add more tests to the plugin and start getting job timeouts  [16:54]
sean-k-mooney | we will need to dig into that  [16:55]
dviroel | since pinned_az is not in notifications from nova, which we could also fix in future releases  [16:55]
dviroel | agree  [16:55]
sean-k-mooney | dviroel: ack, I think this is a preexisting issue  [16:55]
sean-k-mooney | as in, we have other tests that are quite long  [16:55]
sean-k-mooney | I looked at nova, and its live migration scenario tests are all in the 50-80 second range  [16:56]
sean-k-mooney | I would expect our zone migration ones to be in a similar ballpark  [16:56]
dviroel | ack  [16:57]
sean-k-mooney | it might be slightly longer, but ideally not by a lot  [16:57]
dviroel | the model update waiting can be the thing  [16:57]
dviroel | it first waits for the instance to be in the model  [16:57]
dviroel | and at the end it waits for the instance to be deleted from the model  [16:57]
dviroel | it waits two times for the model  [16:58]
dviroel | + migrations and other things  [16:58]
sean-k-mooney | well, the other issue, I think, is https://github.com/openstack/watcher/blob/74efcbf9992b0ffee1fcd5bc72b8b4f7963a4166/watcher/common/cinder_helper.py#L151  [16:59]
sean-k-mooney | https://github.com/openstack/watcher/blob/74efcbf9992b0ffee1fcd5bc72b8b4f7963a4166/watcher/common/nova_helper.py#L162-L176  [16:59]
sean-k-mooney | we have lots of places in watcher that are injecting hard-coded sleeps  [17:00]
sean-k-mooney | https://github.com/openstack/watcher/blob/74efcbf9992b0ffee1fcd5bc72b8b4f7963a4166/watcher/common/nova_helper.py#L218-L221  [17:00]
sean-k-mooney | every time.sleep we have in the APIs like that is technical debt  [17:00]
sean-k-mooney | we should be doing an exponential backoff with an upper bound  [17:02]
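A minimal sketch of polling with exponential backoff and an upper bound, assuming the caller is checking external state such as a migration's status. `backoff_intervals` and `poll_with_backoff` are hypothetical names; this is not the code in `nova_helper.py` or `cinder_helper.py`. Retry libraries such as tenacity provide an equivalent policy (`wait_exponential` with a `max`).

```python
import time


def backoff_intervals(initial=0.5, factor=2.0, max_interval=10.0):
    """Yield capped exponential delays: 0.5, 1, 2, 4, 8, 10, 10, ..."""
    delay = initial
    while True:
        yield delay
        # The upper bound stops the gaps from growing without limit.
        delay = min(delay * factor, max_interval)


def poll_with_backoff(check, timeout=300.0, **backoff):
    """Call ``check()`` until it returns something other than None.

    Hypothetical helper: raises TimeoutError once ``timeout`` seconds pass.
    Extra keyword arguments are forwarded to ``backoff_intervals``.
    """
    deadline = time.monotonic() + timeout
    for delay in backoff_intervals(**backoff):
        result = check()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout}s")
        # Never sleep past the overall deadline.
        time.sleep(min(delay, remaining))
```

The short initial delay keeps fast operations fast, while the cap stops a long migration from being polled with ever-growing gaps.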
dviroel | yeah, makes sense  [17:04]
dviroel | each part has its own sleep  [17:04]
sean-k-mooney | including the applier's executor loop  [17:04]
sean-k-mooney | https://github.com/openstack/watcher/blob/74efcbf9992b0ffee1fcd5bc72b8b4f7963a4166/watcher/applier/workflow_engine/base.py#L247  [17:04]
sean-k-mooney | we really need to rewrite this to use threading events or futures or similar  [17:05]
sean-k-mooney | all these 1 second sleeps are quickly going to add up  [17:05]
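To illustrate the futures approach, a hypothetical sketch (not watcher's applier code) that runs actions on a `concurrent.futures.ThreadPoolExecutor` and uses `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)`, so the loop wakes the moment an action finishes instead of on the next 1 second tick. It assumes each action is a plain callable keyed by name.

```python
import concurrent.futures as cf
import time


def run_actions(actions, max_workers=4, timeout=600.0):
    """Run named callables concurrently; return {name: result}.

    Hypothetical sketch: cf.wait() blocks until at least one future
    completes (or the remaining budget runs out) instead of sleeping
    for a fixed interval between checks.
    """
    deadline = time.monotonic() + timeout
    results = {}
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fn): name for name, fn in actions.items()}
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Caveat: leaving the `with` block still waits for threads
                # already running; real code needs cooperative cancellation.
                raise TimeoutError(f"{len(pending)} action(s) still running")
            done, _ = cf.wait(pending, timeout=remaining,
                              return_when=cf.FIRST_COMPLETED)
            for fut in done:
                name = pending.pop(fut)
                # result() re-raises the action's exception, if it had one.
                results[name] = fut.result()
    return results
```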
sean-k-mooney | this is kind of the other half of the scalability question that amoralej is looking into  [17:06]
sean-k-mooney | we need to enable horizontal scalability, but we also need to address the core of the executor loop and make that more efficient too  [17:07]
dviroel | ack, there will be a session for the applier in general too, where we should cover this part  [17:09]
dviroel | there is also a critical part in the model collector, with sleeps, that will require threading events  [17:10]
sean-k-mooney | ya, so we should move away from sleep to events or futures, which support timeouts when you wait on them  [17:11]
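A minimal sketch of replacing a sleep loop with `threading.Event`, assuming one side (for example a collector thread) produces a model and other code waits for it. `Event.wait(timeout)` returns `True` as soon as the event is set and `False` if the timeout expires, so waiters neither oversleep nor spin. `ModelReady` is a hypothetical name, not a watcher class.

```python
import threading


class ModelReady:
    """Hypothetical wrapper: signal waiters when a data model is built."""

    def __init__(self):
        self._event = threading.Event()
        self.model = None

    def publish(self, model):
        # Producer side, e.g. a collector thread finishing a sync.
        self.model = model
        self._event.set()  # wakes every current waiter immediately

    def wait(self, timeout):
        # Consumer side: the model, or None if the timeout expires first.
        if self._event.wait(timeout):
            return self.model
        return None
```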
sean-k-mooney | now, if we need to look up external data, like polling a migration for completion  [17:11]
sean-k-mooney | then sure, we still need to poll  [17:11]
sean-k-mooney | but we should not do that on a fixed interval  [17:12]
dviroel | yeah  [17:33]
