Monday, 2026-06-22

opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework  https://review.opendev.org/c/openstack/watcher-dashboard/+/97035305:00
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework  https://review.opendev.org/c/openstack/watcher-dashboard/+/97035305:01
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework  https://review.opendev.org/c/openstack/watcher-dashboard/+/97035305:20
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework  https://review.opendev.org/c/openstack/watcher-dashboard/+/97035306:42
sean-k-mooneyamoralej: dviroel so i did a little digging over the weekend and i believe i know why the inline comments were not working08:25
sean-k-mooneythey actually are in most runs08:25
amoralejand can be fixed?08:25
sean-k-mooneyya i think i have fixed it08:25
sean-k-mooneyso basically gerrit allows you to comment on files that are not modified in the review but github does not08:25
sean-k-mooneyso zuul has an internal check for that and drops the comments08:26
sean-k-mooneywhen using the old robot comment api08:26
sean-k-mooneyit only dropped it for the relevant file08:26
sean-k-mooneybut with the new api it drops all the inline comments08:26
sean-k-mooneyso i tweaked the review bot with a filtering layer08:26
sean-k-mooneyto drop any comment not on the changed files on my side08:27
sean-k-mooneyi also changed the prompt08:27
sean-k-mooneyso instead of commenting on file x saying this function is broken by a change in file y it will now comment on file y to say it broke file x08:27
sean-k-mooneyso instead of pointing out what broke it should point at the thing that broke it08:28
sean-k-mooneyso the same info should be reported just in a different way08:28
amoralejmakes sense, seems to be working in https://review.opendev.org/c/openstack/watcher/+/99417808:28
sean-k-mooneywe can see how it goes and if i need to tweak it more or not but if you see it not reporting inline just let me know and i can take a look at that specific example and see what went wrong08:29
amoralejsure, i will, thanks!08:29
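A minimal sketch of the filtering layer sean-k-mooney describes above: drop any inline comment whose file is not part of the change before the comments are posted. The InlineComment type, the function name, and the file paths are hypothetical and only illustrate the idea; the actual review bot code is not shown in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InlineComment:
    """Hypothetical shape of one inline review comment."""
    path: str      # file the comment is attached to
    line: int
    message: str


def filter_to_changed_files(comments, changed_files):
    """Split comments into (kept, dropped) by whether their file is in the change."""
    changed = set(changed_files)
    kept = [c for c in comments if c.path in changed]
    dropped = [c for c in comments if c.path not in changed]
    return kept, dropped


# Illustrative data only: file_y.py is modified by the change, file_x.py is not.
comments = [
    InlineComment("file_y.py", 12, "this change breaks a caller in file_x.py"),
    InlineComment("file_x.py", 40, "this function is broken by file_y.py"),
]
kept, dropped = filter_to_changed_files(comments, changed_files=["file_y.py"])
print([c.path for c in kept])     # comments that are safe to post inline
print([c.path for c in dropped])  # comments filtered out before posting
```

Returning the dropped comments as well makes it possible to log them or fold them into a summary message instead of losing them silently.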
sean-k-mooneythere is one down side or maybe upside i dont know 08:31
sean-k-mooneywith the new standard comments api08:31
sean-k-mooneythe comments are reported as resolved by default08:32
sean-k-mooneyand i dont think i can change that  without modifying zuul08:32
amoraleji think we can live with that08:32
sean-k-mooneyack, we just need to get in the habit of clicking into them and confirming if they are actionable08:33
sean-k-mooneyif they are not then we just leave them resolved, if they are we can just comment saying that.08:33
opendevreviewJoan Gilabert proposed openstack/watcher-tempest-plugin master: Remove SSH validation from created servers  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/99424608:54
opendevreviewJoan Gilabert proposed openstack/watcher master: [DNM] Test tempest changes  https://review.opendev.org/c/openstack/watcher/+/99424708:56
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework  https://review.opendev.org/c/openstack/watcher-dashboard/+/97035309:11
sean-k-mooneyamoralej: looks like z.ai updated the models in the plan im using so i need to update them in ci. which is good in that the review will be done with glm 5.2 now but it also means the current jobs i kicked off are failing since the model i was trying to use is not available any more09:29
amoralejwe'll see if new ones are better09:30
sean-k-mooneyim not sure how big of a jump 5.1 to 5.2 will be but ya09:30
sean-k-mooneyi need to bump the quick model from glm 4.7 flash to 4.709:31
sean-k-mooneynot that it really needs it but i think that is the main issue, i could be wrong09:31
sean-k-mooneyflash is their free tier so it might still be included and failing on that glm 5.1 request09:32
dviroelo/11:08
*** haleyb|out is now known as haleyb13:28
opendevreviewchandan kumar proposed openstack/watcher-dashboard master: Add Playwright integration test for skip action workflow  https://review.opendev.org/c/openstack/watcher-dashboard/+/97659413:46
amoralejHi, I asked AI about patterns of jobs timing out, these were the main findings https://etherpad.opendev.org/p/watcher-timeouts14:31
amoralejso apparently it's related to certain infra provider14:32
amoralejwhere jobs are much slower14:32
amoralejin all phases of the job14:32
amoralejresults seem consistent, that may justify increasing job timeouts14:32
jgilaberthat makes sense to me. I also did some digging in some job runs and found that time goes mostly to creating resources (servers, floating ips, volumes, etc)14:38
jgilaberhttps://etherpad.opendev.org/p/watcher_tempest_slowness14:38
sean-k-mooneyamoralej: which ones14:38
dviroelinteresting results..14:38
jgilaberI found a potential small optimization for the zone migration test https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994246 but nothing huge14:39
sean-k-mooneyso raxflex14:39
dviroeland ovh14:39
sean-k-mooneyis the newest provider and significantly faster than the rest14:39
sean-k-mooneyalmost all our capacity is on rax legacy and ovh14:39
sean-k-mooneyso we dont consider ovh in general as a slow provider14:39
sean-k-mooneyrax legacy is14:39
sean-k-mooneyovh typically is faster than rax legacy and vexxhost is typically faster than ovh but not as fast as raxflex14:41
sean-k-mooneylooking at those numbers rax/ovh stacking in 28 mins is not an indicator of a low node14:42
sean-k-mooney*slow node14:42
sean-k-mooneywe would only consider it a slow node if that controller was over about 45 mins14:42
sean-k-mooneyrax flex is comparable to baremetal for what its worth but this does not point to a provider problem14:43
sean-k-mooneythe slower providers are more likely to time out but the jobs and tests are still too slow in general14:44
sean-k-mooneyif we look at jgilaber's link at tests like | SkippedActionsInstances | **4m 47s** | 124 |14:45
sean-k-mooneythat's well over the 90-120s maximum we would expect for a test, let alone the class cleanup14:46
sean-k-mooneyso we should try and understand why test class cleanup is taking so long14:46
amoralejremember we are forced to run tempest tests sequentially14:48
dviroeljgilaber: is this cleanup the sum of indiviual tests cleanups? Maybe the numbers are high based on the amount of tests in each class14:49
sean-k-mooneyim pretty sure its because we are waiting for the instance to be removed from the model14:49
sean-k-mooneyhttps://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/test_execute_skipped_actions.py#L15114:49
sean-k-mooneydviroel: so i think those numbers are the total time for all the cleanup from all tests in the class14:49
jgilaberdviroel, yes, claude added all the individual cleanup time, I would not put too much stock in that14:50
sean-k-mooneybut we are waiting for the models to be updated with the instance delete14:50
sean-k-mooneywhich is adding latency14:50
sean-k-mooney...14:51
sean-k-mooneyhttps://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/base.py#L117914:51
sean-k-mooneywe are doing a 15 second sleep14:51
sean-k-mooneythat's the problem14:51
dviroelack, we can reduce this 15s14:51
dviroelit is too big 14:51
sean-k-mooneylets set it to 0.5-1s14:51
sean-k-mooneyor ideally make it a config option14:52
sean-k-mooneyand also use it here https://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/base.py#L115514:52
sean-k-mooneyso we should have a model_poll_interval config option14:52
sean-k-mooneyand use that in all the places we check14:52
dviroelyeah, a single config option to be used in these methods14:53
sean-k-mooneywe also have a bunch of places where we do 2 second pauses https://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/base.py#L97814:53
sean-k-mooneythose might need to have a different option14:53
amoralejthere are 41 scenario tests14:54
sean-k-mooneyyep which is pretty small14:54
amoralejreducing 10 secs per test would be ~ 7 minutes14:54
sean-k-mooneybut if we sleep once due to that that's 10 minutes14:54
sean-k-mooneyya14:54
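amoralej's estimate above, written out as arithmetic: 41 scenario tests, each saving 10 seconds, run one after another. The function name is hypothetical; the inputs are the numbers quoted in the chat.

```python
SCENARIO_TESTS = 41  # count quoted by amoralej above


def saved_minutes(seconds_per_test, tests=SCENARIO_TESTS):
    """Total time saved across a serial run, in minutes."""
    return seconds_per_test * tests / 60


# 41 tests * 10 s = 410 s, i.e. roughly the "~ 7 minutes" mentioned above.
print(round(saved_minutes(10), 1))  # 6.8
```

The estimate only holds because the tests run serially, so per-test savings add up directly on the job's wall-clock time.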
amoralejyes, but if it takes 10 minutes to remove the created stuff, we need to wait14:55
sean-k-mooneyso we are waiting on addition and removal 14:55
sean-k-mooneywe do but it should not14:56
sean-k-mooneydeleting a vm or volume should only take 5-10 seconds14:56
sean-k-mooneyas you noted we are running serially so the devstack cluster is effectively idle14:58
sean-k-mooneythere are some existing tempest config options we can use or use as a reference https://github.com/openstack/tempest/blob/master/tempest/config.py#L329-L33114:59
sean-k-mooneystandard tempest polls vm/volume/image creation on a 1 second interval15:00
dviroelack, is someone already working on this? we can propose a patch and check CI results15:01
dviroelready to trigger claude here if nobody started15:01
sean-k-mooneynot right now but if someone does not push up a patch ill have an agent do it in a bit15:01
amoralejI'm trying to dig more into one specific test in a slow job to find out where the time is spent15:03
sean-k-mooneydviroel: i would add 2 config options, one for polling the model and one for polling everything else, and default both to 1 second intervals for now. ideally we would make the timeout configurable as well so interval and timeout for both api request and model15:04
amoralejbut reducing polling time seems reasonable15:04
dviroelsean-k-mooney: ack, makes sense15:04
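A rough sketch of the proposal above: one interval/timeout pair for polling the model and one for everything else, both feeding a single wait helper instead of hardcoded sleeps. The interval names model_poll_interval and resource_check_interval appear in this log; the timeout names, the timeout defaults, and the helper itself are assumptions for illustration, not the code from the actual patch.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class PollSettings:
    # Interval names come from the discussion; the timeout names and all
    # timeout default values are assumptions for illustration only.
    model_poll_interval: float = 1.0
    model_poll_timeout: float = 300.0
    resource_check_interval: float = 1.0
    resource_check_timeout: float = 300.0


class PollTimeout(Exception):
    """Raised when the condition is still false once the timeout has passed."""


def wait_for(predicate, interval, timeout, sleep=time.sleep):
    """Call predicate() every `interval` seconds until it is true or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            raise PollTimeout(f"condition not met within {timeout}s")
        sleep(interval)


settings = PollSettings()
# Stand-in for "instance is gone from the model": true on the third check.
# A short interval is used here so the demo finishes quickly.
checks = iter([False, False, True])
wait_for(lambda: next(checks), interval=0.01, timeout=5.0)
```

With a helper like this, the waits on the model would pass settings.model_poll_interval and settings.model_poll_timeout, and the waits on servers, volumes and similar resources would pass the resource_check pair, so a CI job could tune each independently.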
sean-k-mooneywe are mostly relying on notifications but we have tuned the collector period and ceilometer in our job config 15:06
sean-k-mooneyhttps://github.com/openstack/watcher/blob/master/.zuul.yaml#L98-L12515:06
sean-k-mooneywe might actually want to consider removing the collector override or make it longer now15:07
sean-k-mooneyto avoid the lock contention with the notifications15:07
sean-k-mooneythe upstream default is to run the collectors once an hour15:09
sean-k-mooneywe are running them every 2 minutes15:09
sean-k-mooneyso we are going to have a lot of lock contention on the data model15:09
amoraleji think we have notifications in most or all jobs ?15:09
sean-k-mooneyi think in all15:10
sean-k-mooneywe originally made that short before we hooked them up15:10
sean-k-mooneyso maybe we can just remove those overrides as well15:10
sean-k-mooneythey were important when we were losing notifications due to the rebuild15:10
sean-k-mooneybut that should not happen now15:11
dviroelyeah, that's true15:13
sean-k-mooneyhttps://bugs.launchpad.net/watcher/+bug/213885715:13
sean-k-mooneywould have to be backported to stable for us to rely on that15:13
sean-k-mooneybut we plan to do that anyway15:14
sean-k-mooneydviroel: if you are working on it lets do it in 2 commits, one for removing the collector override and another for the polling interval15:15
amoralejhttps://etherpad.opendev.org/p/watcher-timeouts#L4015:15
sean-k-mooneybetween those two i think we might see a non trivial cleanup15:15
amoralejthat's analysis for one specific test15:15
sean-k-mooneytest_execute_workload_stabilization_strategy_ram_bfv 15:17
dviroelsean-k-mooney: i am working on the intervals + timeout config15:17
amoralejyeah, i took one of the longest15:17
sean-k-mooneydviroel: ok ill just create a quick patch to remove the job config for the collectors15:18
sean-k-mooneyill push that independently and we can see what ci says15:18
dviroelack15:19
opendevreviewsean mooney proposed openstack/watcher master: remove collector job config  https://review.opendev.org/c/openstack/watcher/+/99431415:22
sean-k-mooneyamoralej: 75 seconds is a lot for a live migration in this type of test15:23
sean-k-mooneyespecially for bfv as there is no disk to copy15:23
sean-k-mooneybut we could try speeding that up perhaps by enabling multi threaded live migration 15:24
sean-k-mooneyit really just depends on why it takes 75 seconds15:24
sean-k-mooneyi think that is 75 seconds for the migration to be visible in the model15:24
dviroelwould nova logs help on more details in this case, amoralej? 15:25
sean-k-mooneyif you have the request id you should be able to track the migration in nova with that15:25
dviroelyeah, maybe is part of the time to update the model15:25
sean-k-mooneyi think this is also related to the hardcoded sleep we have in the applier15:26
amoralejin a passing job, actionplan execution took 50 secs15:26
amoralej5515:27
sean-k-mooneyhttps://github.com/openstack/watcher/blob/master/watcher/conf/nova.py#L4315:29
sean-k-mooneyso we are checking the migration state every 5 seconds15:29
sean-k-mooneyill drop that down to 0.515:30
sean-k-mooneyin one test that wont make much of a difference but it will help overall15:30
sean-k-mooneyif it saved 3 seconds on every test that would be 2 minutes on the job15:31
sean-k-mooneyoh nevermind its already 1 in ci15:32
sean-k-mooneythats fine15:32
amoralejyes, it's 1sec15:36
sean-k-mooneyim not really seeing any other relevant config option to tune15:37
sean-k-mooneyhttps://github.com/openstack/watcher/blob/d0f2173ce9a9f08818ee75a426a0b129203c343d/watcher/common/nova_helper.py#L749 https://github.com/openstack/watcher/blob/d0f2173ce9a9f08818ee75a426a0b129203c343d/watcher/applier/actions/change_node_power_state.py#L11815:38
sean-k-mooneywe have a couple of hardcoded time.sleep() calls15:39
sean-k-mooneythat we should make configurable eventually15:39
sean-k-mooneybut thats not the issue here15:39
amoralejchecking at those tests, the one timing out, 77secs for the action plan, it takes ~ 30 sec to migrate each vm from nova15:44
amoralejmay we tune nova to allow concurrent migrations?15:44
opendevreviewDouglas Viroel proposed openstack/watcher-tempest-plugin master: Make test polling intervals and timeouts configurable  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/99432315:45
sean-k-mooneywe can15:45
sean-k-mooneywe can also enable multiple threads15:45
sean-k-mooneyfor the migration or both15:45
sean-k-mooneyi can do that quickly15:45
amoralejnova takes several seconds to reply to the migrate POST call, btw, 6-7 secs, just in case it indicates something15:46
sean-k-mooneywe have a bunch of host/volume cleanup to do on the source node after the libvirt migration completes15:46
sean-k-mooneyso im not necessarily concerned by that 15:46
sean-k-mooneyif it was double that sure15:46
sean-k-mooneywe have to call neutron and cinder in this case + do cleanup on ovs and the host for the volume15:47
sean-k-mooneyso anything under about 10-15 seconds is fine in ci15:47
sean-k-mooneyill tune nova to do faster live migration as a follow up patch so that i dont kick the current job out of ci15:48
amoralejhttps://etherpad.opendev.org/p/watcher-timeouts#L14315:48
amoralejthat's the execution of the actionplan itself, 75secs15:48
amoralejfor a passing job, it's a similar sequence, just faster on every step to do it in 55secs15:49
amoralejgiven that we are launching the migrations in parallel from watcher, parallelizing in nova may be the greater win, if the compute nodes can handle it15:50
sean-k-mooneyok ya so ill allow say 3 migrations in parallel and 2 threads per migration and enable post copy migration15:50
opendevreviewsean mooney proposed openstack/watcher master: [ci] make live migration faster  https://review.opendev.org/c/openstack/watcher/+/99432716:01
sean-k-mooneyamoralej: ^ lets see how that goes, i could make it more aggressive but we run the risk of having memory issues16:02
amoralejsure, thanks, let's see how that goes16:03
opendevreviewsean mooney proposed openstack/watcher master: [ci] make live migration faster  https://review.opendev.org/c/openstack/watcher/+/99432717:21
sean-k-mooneyamoralej: hum so interesting watcher_tempest_plugin.tests.scenario.test_data_model.TestDataModelWithExtendedAttributes.test_data_model_with_extended_attributes17:32
sean-k-mooneyonly works based on the collector17:32
sean-k-mooneywe are not enriching the instance with extended attributes from the notifications17:32
dviroelsean-k-mooney: hum, which one? maybe pinned az is not available in notifications?17:46
sean-k-mooneyhttps://12b5c9445590716cbe81-2eb50734132c0e56282483bcdf57bf8a.ssl.cf2.rackcdn.com/openstack/8a6bf633ba574b2582331ec89542f578/testr_results.html17:48
sean-k-mooneyTraceback (most recent call last):17:48
sean-k-mooney  File "/opt/stack/tempest/.tox/tempest/lib/python3.12/site-packages/watcher_tempest_plugin/tests/scenario/test_data_model.py", line 155, in test_data_model_with_extended_attributes17:48
sean-k-mooney    self.wait_for_instances_attributes_in_model(17:48
sean-k-mooney  File "/opt/stack/tempest/.tox/tempest/lib/python3.12/site-packages/watcher_tempest_plugin/tests/scenario/base.py", line 1157, in wait_for_instances_attributes_in_model17:48
sean-k-mooney    raise Exception("Attributes were not updated in the model.")17:48
sean-k-mooneyException: Attributes were not updated in the model.17:48
sean-k-mooneyit does not say17:48
dviroelI think that pinned az is not available in instance notifications17:49
sean-k-mooneythat is probably correct. 17:49
sean-k-mooneywe likely need to call nova to backfill it when not found or make the test a little less strict17:49
sean-k-mooneywe could also update nova to add it going forward17:50
dviroelin notification processing?17:50
sean-k-mooneybut we would not backport that17:50
sean-k-mooneyyes17:50
sean-k-mooneyin the notification processing17:50
sean-k-mooneywe would have to call the api if the extra notifications are enabled17:50
sean-k-mooney*extra attributes17:50
dviroeli was thinking on adding scheduler_hints to nova notification, it is another one that does not exist, but not yet added to watcher too17:50
dviroelright17:51
sean-k-mooneysince we are not using either yet17:51
sean-k-mooneywe could just add both on master17:51
sean-k-mooneyin nova17:51
dviroelthe workaround would be to call nova api to get this info17:52
sean-k-mooneyyes17:52
dviroelat the same time, we are not using it in the code17:52
sean-k-mooneyonly when we configure it17:52
sean-k-mooneyright so for now i think its ok to just wait until we need it17:52
sean-k-mooneyand ideally update nova to add the info before then17:53
dviroelyes17:53
sean-k-mooneyfor now ill flip the order and put the live migration tuning ahead of the collector removal17:54
dviroelsome results from configurable poll intervals, now set to 1s by default17:54
dviroelhttps://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994323/1#message-c8007e41b3a3f76153bd70f1bf53364a047b1a6e17:54
sean-k-mooneyi think that is faster 17:56
sean-k-mooney1 hour 40 ish on rax17:56
sean-k-mooney126 for the stable job17:57
sean-k-mooneydviroel: we will obviously need to see how it runs over time but that looks promising17:57
dviroelyeah, it seems that improved a bit17:58
* dviroel amoralej: for you to take a look tomorrow: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994323/1#message-c8007e41b3a3f76153bd70f1bf53364a047b1a6e17:59
sean-k-mooneywe could set resource_check_interval to 0.5 in ci17:59
sean-k-mooneymany of the checks were already doing that17:59
sean-k-mooneybut i think we could merge this as is17:59
dviroelyeah, there were lots of 0.5, 1, 2 and 5 around :) 18:13
sean-k-mooneyi think 1 second is a good polling interval to start with and we can tune more after we get some more results18:17
sean-k-mooneylol  Read SKILL.md (github:yeet skill)18:25
sean-k-mooneythats codex's default github skill18:28
opendevreviewDouglas Viroel proposed openstack/watcher-tempest-plugin master: Make test polling intervals and timeouts configurable  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/99432319:08
opendevreviewIvan Anfimov proposed openstack/watcher-dashboard master: Add directory for locale  https://review.opendev.org/c/openstack/watcher-dashboard/+/99437723:40
opendevreviewIvan Anfimov proposed openstack/watcher-dashboard master: Add directory for locale  https://review.opendev.org/c/openstack/watcher-dashboard/+/99437723:40
