| opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework https://review.opendev.org/c/openstack/watcher-dashboard/+/970353 | 05:00 |
|---|---|---|
| opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework https://review.opendev.org/c/openstack/watcher-dashboard/+/970353 | 05:01 |
| opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework https://review.opendev.org/c/openstack/watcher-dashboard/+/970353 | 05:20 |
| opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework https://review.opendev.org/c/openstack/watcher-dashboard/+/970353 | 06:42 |
| sean-k-mooney | amoralej: dviroel: so i did a little digging over the weekend and i believe i know why the inline comments were not working | 08:25 |
| sean-k-mooney | they actually are in most runs | 08:25 |
| amoralej | and can be fixed? | 08:25 |
| sean-k-mooney | ya i think i have fixed it | 08:25 |
| sean-k-mooney | so basically gerrit allows you to comment on files that are not modified in the review but github does not | 08:25 |
| sean-k-mooney | so zuul has an internal check for that and drops the comments | 08:26 |
| sean-k-mooney | when using the old robot comment api | 08:26 |
| sean-k-mooney | it only dropped it for the relevant file | 08:26 |
| sean-k-mooney | but with the new api it drops all the inline comments | 08:26 |
| sean-k-mooney | so i tweaked the review bot with a filtering layer | 08:26 |
| sean-k-mooney | to drop any comment not on the changed files on my side | 08:27 |
| sean-k-mooney | i also changed the prompt | 08:27 |
| sean-k-mooney | so instead of commenting on file x saying this function is broken by a change in file y, it will now comment on file y to say it broke file x | 08:27 |
| sean-k-mooney | so instead of pointing out what broke, it should point at the thing that broke it | 08:28 |
| sean-k-mooney | so the same info should be reported, just in a different way | 08:28 |
| amoralej | makes sense, seems to be working in https://review.opendev.org/c/openstack/watcher/+/994178 | 08:28 |
| sean-k-mooney | we can see how it goes and whether i need to tweak it more, but if you see it not reporting inline just let me know and i can take a look at that specific example and see what went wrong | 08:29 |
| amoralej | sure, i will, thanks! | 08:29 |
| sean-k-mooney | there is one downside, or maybe upside, i don't know | 08:31 |
| sean-k-mooney | with the new standard comments api | 08:31 |
| sean-k-mooney | the comments are reported as resolved by default | 08:32 |
| sean-k-mooney | and i don't think i can change that without modifying zuul | 08:32 |
| amoralej | i think we can live with that | 08:32 |
| sean-k-mooney | ack, we just need to get in the habit of clicking into them and confirming if they are actionable | 08:33 |
| sean-k-mooney | if they are not, we just leave them resolved; if they are, we can just comment saying that. | 08:33 |
| opendevreview | Joan Gilabert proposed openstack/watcher-tempest-plugin master: Remove SSH validation from created servers https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994246 | 08:54 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: [DNM] Test tempest changes https://review.opendev.org/c/openstack/watcher/+/994247 | 08:56 |
| opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add Playwright-based E2E testing framework https://review.opendev.org/c/openstack/watcher-dashboard/+/970353 | 09:11 |
| sean-k-mooney | amoralej: looks like z.ai updated the models in the plan i'm using so i need to update them in ci. which is good in that the review will be done with glm 5.2 now, but it also means the current jobs i kicked off are failing since the model i was trying to use is not available any more | 09:29 |
| amoralej | we'll see if new ones are better | 09:30 |
| sean-k-mooney | i'm not sure how big of a jump 5.1 to 5.2 will be but ya | 09:30 |
| sean-k-mooney | i need to bump the quick model from glm 4.7 flash to 4.7 | 09:31 |
| sean-k-mooney | not that it really needs it, but i think that is the main issue; i could be wrong | 09:31 |
| sean-k-mooney | flash is their free tier so it might still be included and failing on that glm 5.1 request | 09:32 |
| dviroel | o/ | 11:08 |
| *** | haleyb\|out is now known as haleyb | 13:28 |
| opendevreview | chandan kumar proposed openstack/watcher-dashboard master: Add Playwright integration test for skip action workflow https://review.opendev.org/c/openstack/watcher-dashboard/+/976594 | 13:46 |
| amoralej | Hi, I asked an AI about patterns in jobs timing out; these were the main findings https://etherpad.opendev.org/p/watcher-timeouts | 14:31 |
| amoralej | so apparently it's related to certain infra providers | 14:32 |
| amoralej | where jobs are much slower | 14:32 |
| amoralej | in all phases of the job | 14:32 |
| amoralej | results seem consistent; that may justify increasing job timeouts | 14:32 |
| jgilaber | that makes sense to me. I also did some digging in some job runs and found that most of the time goes into creating resources (servers, floating ips, volumes, etc) | 14:38 |
| jgilaber | https://etherpad.opendev.org/p/watcher_tempest_slowness | 14:38 |
| sean-k-mooney | amoralej: which ones | 14:38 |
| dviroel | interesting results.. | 14:38 |
| jgilaber | I found a potential small optimization for the zone migration test https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994246 but nothing huge | 14:39 |
| sean-k-mooney | so raxflex | 14:39 |
| dviroel | and ovh | 14:39 |
| sean-k-mooney | is the newest provider and significantly faster than the rest | 14:39 |
| sean-k-mooney | almost all our capacity is on rax legacy and ovh | 14:39 |
| sean-k-mooney | so we don't consider ovh in general a slow provider | 14:39 |
| sean-k-mooney | rax legacy is | 14:39 |
| sean-k-mooney | ovh is typically faster than rax legacy, and vexhost is typically faster than ovh but not as fast as raxflex | 14:41 |
| sean-k-mooney | looking at those numbers, rax/ovh stacking in 28 mins is not an indicator of a slow node | 14:42 |
| sean-k-mooney | we would only consider it a slow node if that controller was over about 45 mins | 14:42 |
| sean-k-mooney | raxflex is comparable to baremetal for what it's worth, but this does not point to a provider problem | 14:43 |
| sean-k-mooney | the slower providers are more likely to time out, but the job and tests are still too slow in general | 14:44 |
| sean-k-mooney | if we look at jgilaber's link, at tests like \| SkippedActionsInstances \| **4m 47s** \| 124 \| | 14:45 |
| sean-k-mooney | that's well over the 90-120s maximum we would expect for a test, let alone the class cleanup | 14:46 |
| sean-k-mooney | so we should try and understand why test class cleanup is taking so long | 14:46 |
| amoralej | remember we are forced to run tempest tests sequentially | 14:48 |
| dviroel | jgilaber: is this cleanup the sum of individual test cleanups? Maybe the numbers are high because of the number of tests in each class | 14:49 |
| sean-k-mooney | i'm pretty sure it's because we are waiting for the instance to be removed from the model | 14:49 |
| sean-k-mooney | https://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/test_execute_skipped_actions.py#L151 | 14:49 |
| sean-k-mooney | dviroel: so i think those numbers are the total time for all the cleanup from all tests in the class | 14:49 |
| jgilaber | dviroel: yes, claude added up all the individual cleanup times, I would not put too much stock in that | 14:50 |
| sean-k-mooney | but we are waiting for the models to be updated with the instance delete | 14:50 |
| sean-k-mooney | which is adding latency | 14:50 |
| sean-k-mooney | ... | 14:51 |
| sean-k-mooney | https://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/base.py#L1179 | 14:51 |
| sean-k-mooney | we are doing a 15 second sleep | 14:51 |
| sean-k-mooney | that's the problem | 14:51 |
| dviroel | ack, we can reduce this 15s | 14:51 |
| dviroel | it is too big | 14:51 |
| sean-k-mooney | let's set it to 0.5-1s | 14:51 |
| sean-k-mooney | or ideally make it a config option | 14:52 |
| sean-k-mooney | and also use it here https://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/base.py#L1155 | 14:52 |
| sean-k-mooney | so we should have a model_poll_interval config option | 14:52 |
| sean-k-mooney | and use that in all the places we check | 14:52 |
| dviroel | yeah, a single config option to be used in these methods | 14:53 |
| sean-k-mooney | we also have a bunch of places where we do 2 second pauses https://opendev.org/openstack/watcher-tempest-plugin/src/branch/master/watcher_tempest_plugin/tests/scenario/base.py#L978 | 14:53 |
| sean-k-mooney | those might need to have a different option | 14:53 |
| amoralej | there are 41 scenario tests | 14:54 |
| sean-k-mooney | yep which is pretty small | 14:54 |
| amoralej | reducing 10 secs per test would save ~7 minutes | 14:54 |
| sean-k-mooney | but if we sleep once due to that, that's 10 minutes | 14:54 |
| sean-k-mooney | ya | 14:54 |
| amoralej | yes, but if it takes 10 minutes to remove the created stuff, we need to wait | 14:55 |
| sean-k-mooney | so we are waiting on addition and removal | 14:55 |
| sean-k-mooney | we do, but it should not take that long | 14:56 |
| sean-k-mooney | deleting a vm or volume should only take 5-10 seconds | 14:56 |
| sean-k-mooney | as you noted, we are running serially so the devstack cluster is effectively idle | 14:58 |
| sean-k-mooney | there are some existing tempest config options we can use or use as a reference https://github.com/openstack/tempest/blob/master/tempest/config.py#L329-L331 | 14:59 |
| sean-k-mooney | standard tempest polls vm/volume/image creation on a 1 second interval | 15:00 |
| dviroel | ack, is someone already working on this? we can propose a patch and check CI results | 15:01 |
| dviroel | ready to trigger claude here if nobody started | 15:01 |
| sean-k-mooney | not right now, but if someone does not push up a patch i'll have an agent do it in a bit | 15:01 |
| amoralej | I'm trying to dig more into one specific test in a slow job to find out where the time is spent | 15:03 |
| sean-k-mooney | dviroel: i would add 2 config options, one for polling the model and one for polling everything else, and default both to 1 second intervals for now. ideally we would make the timeout configurable as well, so interval and timeout for both api requests and the model | 15:04 |
| amoralej | but reducing polling time seems reasonable | 15:04 |
| dviroel | sean-k-mooney: ack, makes sense | 15:04 |
| sean-k-mooney | we are mostly relying on notifications but we have tuned the collector period and ceilometer in our job config | 15:06 |
| sean-k-mooney | https://github.com/openstack/watcher/blob/master/.zuul.yaml#L98-L125 | 15:06 |
| sean-k-mooney | we might actually want to consider removing the collector override or making it longer now | 15:07 |
| sean-k-mooney | to avoid the lock contention with the notifications | 15:07 |
| sean-k-mooney | the upstream default is to run the collectors once an hour | 15:09 |
| sean-k-mooney | we are running them every 2 minutes | 15:09 |
| sean-k-mooney | so we are going to have a lot of lock contention on the data model | 15:09 |
| amoralej | i think we have notifications in most or all jobs? | 15:09 |
| sean-k-mooney | i think in all | 15:10 |
| sean-k-mooney | we originally made that short before we hooked them up | 15:10 |
| sean-k-mooney | so maybe we can just remove those overrides as well | 15:10 |
| sean-k-mooney | they were important when we were losing notifications due to the rebuild | 15:10 |
| sean-k-mooney | but that should not happen now | 15:11 |
| dviroel | yeah, that's true | 15:13 |
| sean-k-mooney | https://bugs.launchpad.net/watcher/+bug/2138857 | 15:13 |
| sean-k-mooney | would have to be backported to stable for us to rely on that | 15:13 |
| sean-k-mooney | but we plan to do that anyway | 15:14 |
| sean-k-mooney | dviroel: if you're working on it, let's do it in 2 commits, one for removing the collector override and another for the polling interval | 15:15 |
| amoralej | https://etherpad.opendev.org/p/watcher-timeouts#L40 | 15:15 |
| sean-k-mooney | between those two i think we might see a non-trivial cleanup | 15:15 |
| amoralej | that's analysis for one specific test | 15:15 |
| sean-k-mooney | test_execute_workload_stabilization_strategy_ram_bfv | 15:17 |
| dviroel | sean-k-mooney: i am working on the intervals + timeout config | 15:17 |
| amoralej | yeah, i took one of the longest | 15:17 |
| sean-k-mooney | dviroel: ok i'll just create a quick patch to remove the job config for the collectors | 15:18 |
| sean-k-mooney | i'll push that independently and we can see what ci says | 15:18 |
| dviroel | ack | 15:19 |
| opendevreview | sean mooney proposed openstack/watcher master: remove collector job config https://review.opendev.org/c/openstack/watcher/+/994314 | 15:22 |
| sean-k-mooney | amoralej: 75 seconds is a lot for a live migration in this type of test | 15:23 |
| sean-k-mooney | especially for bfv as there is no disk to copy | 15:23 |
| sean-k-mooney | but we could try speeding that up, perhaps by enabling multi-threaded live migration | 15:24 |
| sean-k-mooney | it really just depends on why it takes 75 seconds | 15:24 |
| sean-k-mooney | i think that is 75 seconds for the migration to be visible in the model | 15:24 |
| dviroel | would nova logs help on more details in this case, amoralej? | 15:25 |
| sean-k-mooney | if you have the request id you should be able to track the migration in nova with that | 15:25 |
| dviroel | yeah, maybe part of it is the time to update the model | 15:25 |
| sean-k-mooney | i think this is also related to the hardcoded sleep we have in the applier | 15:26 |
| amoralej | in a passing job, actionplan execution took 50 secs | 15:26 |
| amoralej | 55 | 15:27 |
| sean-k-mooney | https://github.com/openstack/watcher/blob/master/watcher/conf/nova.py#L43 | 15:29 |
| sean-k-mooney | so we are checking the migration state every 5 seconds | 15:29 |
| sean-k-mooney | ill drop that down to 0.5 | 15:30 |
| sean-k-mooney | in one test that won't make much of a difference but it will help overall | 15:30 |
| sean-k-mooney | if it saved 3 seconds on every test that would be 2 minutes on the job | 15:31 |
| sean-k-mooney | oh never mind, it's already 1 in ci | 15:32 |
| sean-k-mooney | thats fine | 15:32 |
| amoralej | yes, it's 1sec | 15:36 |
| sean-k-mooney | i'm not really seeing any other relevant config options to tune | 15:37 |
| sean-k-mooney | https://github.com/openstack/watcher/blob/d0f2173ce9a9f08818ee75a426a0b129203c343d/watcher/common/nova_helper.py#L749 https://github.com/openstack/watcher/blob/d0f2173ce9a9f08818ee75a426a0b129203c343d/watcher/applier/actions/change_node_power_state.py#L118 | 15:38 |
| sean-k-mooney | we have a couple of hardcoded time.sleep() calls | 15:39 |
| sean-k-mooney | that we should make configurable eventually | 15:39 |
| sean-k-mooney | but that's not the issue here | 15:39 |
| amoralej | looking at those tests, in the one timing out the action plan takes 77 secs, and it takes ~30 sec for nova to migrate each vm | 15:44 |
| amoralej | could we tune nova to allow concurrent migrations? | 15:44 |
| opendevreview | Douglas Viroel proposed openstack/watcher-tempest-plugin master: Make test polling intervals and timeouts configurable https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994323 | 15:45 |
| sean-k-mooney | we can | 15:45 |
| sean-k-mooney | we can also enable multiple threads | 15:45 |
| sean-k-mooney | for the migration or both | 15:45 |
| sean-k-mooney | i can do that quickly | 15:45 |
| amoralej | nova takes several seconds to reply to the migrate POST call, btw, 6-7 secs, just in case it indicates something | 15:46 |
| sean-k-mooney | we have a bunch of host/volume cleanup to do on the source node after the libvirt migration completes | 15:46 |
| sean-k-mooney | so i'm not necessarily concerned by that | 15:46 |
| sean-k-mooney | if it was double that sure | 15:46 |
| sean-k-mooney | we have to call neutron and cinder in this case + do cleanup on ovs and the host for the volume | 15:47 |
| sean-k-mooney | so anything under about 10-15 seconds is fine in ci | 15:47 |
| sean-k-mooney | i'll tune nova to do faster live migration as a follow-up patch so that i don't kick the current job out of ci | 15:48 |
| amoralej | https://etherpad.opendev.org/p/watcher-timeouts#L143 | 15:48 |
| amoralej | that's the execution of the actionplan itself, 75secs | 15:48 |
| amoralej | for a passing job, it's a similar sequence, just faster on every step, finishing in 55 secs | 15:49 |
| amoralej | given that we are launching the migrations in parallel from watcher, parallelizing in nova may be the bigger win, if the compute nodes can handle it | 15:50 |
| sean-k-mooney | ok ya so i'll allow, say, 3 migrations in parallel and 2 threads per migration, and enable post-copy migration | 15:50 |
| opendevreview | sean mooney proposed openstack/watcher master: [ci] make live migration faster https://review.opendev.org/c/openstack/watcher/+/994327 | 16:01 |
| sean-k-mooney | amoralej: ^ let's see how that goes. i could make it more aggressive but we run the risk of having memory issues | 16:02 |
| amoralej | sure, thanks, let's see how that goes | 16:03 |
| opendevreview | sean mooney proposed openstack/watcher master: [ci] make live migration faster https://review.opendev.org/c/openstack/watcher/+/994327 | 17:21 |
| sean-k-mooney | amoralej: hum so interesting watcher_tempest_plugin.tests.scenario.test_data_model.TestDataModelWithExtendedAttributes.test_data_model_with_extended_attributes | 17:32 |
| sean-k-mooney | only works based on the collector | 17:32 |
| sean-k-mooney | we are not enriching the instance with extended attributes from the notifications | 17:32 |
| dviroel | sean-k-mooney: hum, which one? maybe pinned az is not available in notifications? | 17:46 |
| sean-k-mooney | https://12b5c9445590716cbe81-2eb50734132c0e56282483bcdf57bf8a.ssl.cf2.rackcdn.com/openstack/8a6bf633ba574b2582331ec89542f578/testr_results.html | 17:48 |
| sean-k-mooney | Traceback (most recent call last): | 17:48 |
| sean-k-mooney | File "/opt/stack/tempest/.tox/tempest/lib/python3.12/site-packages/watcher_tempest_plugin/tests/scenario/test_data_model.py", line 155, in test_data_model_with_extended_attributes | 17:48 |
| sean-k-mooney | self.wait_for_instances_attributes_in_model( | 17:48 |
| sean-k-mooney | File "/opt/stack/tempest/.tox/tempest/lib/python3.12/site-packages/watcher_tempest_plugin/tests/scenario/base.py", line 1157, in wait_for_instances_attributes_in_model | 17:48 |
| sean-k-mooney | raise Exception("Attributes were not updated in the model.") | 17:48 |
| sean-k-mooney | Exception: Attributes were not updated in the model. | 17:48 |
| sean-k-mooney | it does not say | 17:48 |
| dviroel | I think that pinned az is not available in instance notifications | 17:49 |
| sean-k-mooney | that is probably correct. | 17:49 |
| sean-k-mooney | we likely need to call nova to backfill it when not found, or make the test a little less strict | 17:49 |
| sean-k-mooney | we could also update nova to add it going forward | 17:50 |
| dviroel | in notification processing? | 17:50 |
| sean-k-mooney | but we would not backport that | 17:50 |
| sean-k-mooney | yes | 17:50 |
| sean-k-mooney | in the notification processing | 17:50 |
| sean-k-mooney | we would have to call the api if the extra attributes are enabled | 17:50 |
| dviroel | i was thinking of adding scheduler_hints to nova notifications; it is another one that does not exist there, and it's not yet added to watcher either | 17:50 |
| dviroel | right | 17:51 |
| sean-k-mooney | since we are not using either yet | 17:51 |
| sean-k-mooney | we could just add both on master | 17:51 |
| sean-k-mooney | in nova | 17:51 |
| dviroel | the workaround would be to call nova api to get this info | 17:52 |
| sean-k-mooney | yes | 17:52 |
| dviroel | at the same time, we are not using it in the code | 17:52 |
| sean-k-mooney | only when we configure it | 17:52 |
| sean-k-mooney | right, so for now i think it's ok to just wait until we need it | 17:52 |
| sean-k-mooney | and ideally update nova to add the info before then | 17:53 |
| dviroel | yes | 17:53 |
| sean-k-mooney | for now i'll flip the order and put the live migration tuning ahead of the collector removal | 17:54 |
| dviroel | some results from configurable poll intervals, now set to 1s by default | 17:54 |
| dviroel | https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994323/1#message-c8007e41b3a3f76153bd70f1bf53364a047b1a6e | 17:54 |
| sean-k-mooney | i think that is faster | 17:56 |
| sean-k-mooney | 1 hour 40 ish on rax | 17:56 |
| sean-k-mooney | 126 for the stable job | 17:57 |
| sean-k-mooney | dviroel: we will obviously need to see how it runs over time but that looks promising | 17:57 |
| dviroel | yeah, it seems that improved a bit | 17:58 |
| * dviroel | amoralej: for you to take a look tomorrow: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994323/1#message-c8007e41b3a3f76153bd70f1bf53364a047b1a6e | 17:59 |
| sean-k-mooney | we could set resource_check_interval to 0.5 in ci | 17:59 |
| sean-k-mooney | many of the checks were already doing that | 17:59 |
| sean-k-mooney | but i think we could merge this as is | 17:59 |
| dviroel | yeah, there were lots of 0.5, 1, 2 and 5 around :) | 18:13 |
| sean-k-mooney | i think 1 second is a good polling interval to start with and we can tune more after we get some more results | 18:17 |
| sean-k-mooney | lol Read SKILL.md (github:yeet skill) | 18:25 |
| sean-k-mooney | that's codex's default github skill | 18:28 |
| opendevreview | Douglas Viroel proposed openstack/watcher-tempest-plugin master: Make test polling intervals and timeouts configurable https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994323 | 19:08 |
| opendevreview | Ivan Anfimov proposed openstack/watcher-dashboard master: Add directory for locale https://review.opendev.org/c/openstack/watcher-dashboard/+/994377 | 23:40 |
| opendevreview | Ivan Anfimov proposed openstack/watcher-dashboard master: Add directory for locale https://review.opendev.org/c/openstack/watcher-dashboard/+/994377 | 23:40 |
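
The review-bot filtering layer described at 08:26-08:27 drops every inline comment that targets a file the change does not modify, so zuul does not discard the whole batch. A minimal Python sketch of that idea follows. It is a hypothetical illustration, not the review bot's actual code. The `filter_inline_comments` name is invented. The `{file_path: [comment, ...]}` shape is only loosely modeled on the path-keyed `comments` map in Gerrit's `ReviewInput`.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical comment shape: {file_path: [comment_dict, ...]}, loosely
# modeled on the path-keyed "comments" map of Gerrit's ReviewInput.
Comments = Dict[str, List[dict]]


def filter_inline_comments(
    comments: Comments, changed_files: Iterable[str]
) -> Tuple[Comments, Comments]:
    """Split inline comments into (kept, dropped).

    Comments on files the change does not modify are moved to `dropped`,
    so one off-diff comment cannot cause the whole batch to be rejected.
    """
    changed = set(changed_files)
    kept: Comments = {}
    dropped: Comments = {}
    for path, entries in comments.items():
        # Route all comments for this path into the matching bucket.
        bucket = kept if path in changed else dropped
        bucket.setdefault(path, []).extend(entries)
    return kept, dropped


if __name__ == "__main__":
    review = {
        "x.py": [{"line": 10, "message": "broken by the change in y.py"}],
        "y.py": [{"line": 3, "message": "this change breaks x.py"}],
    }
    kept, dropped = filter_inline_comments(review, ["y.py"])
    print(sorted(kept), sorted(dropped))  # prints ['y.py'] ['x.py']
```

The function returns the dropped comments instead of silently discarding them. This leaves room to log them, or to re-target them at the file that caused the breakage, which is what the prompt change in the log does.
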
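
From 14:51 onward, the channel agrees to replace fixed sleeps, such as the 15 s wait in base.py, with a configurable poll interval and timeout that default to 1 s. That change amounts to a helper like the one below. It is a minimal sketch under stated assumptions. `wait_until` and its `interval`/`timeout` parameters are hypothetical and are not the option names used in https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/994323. Tempest's own `build_interval`/`build_timeout` options follow the same interval-plus-timeout pattern.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    interval: float = 1.0,
    timeout: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `predicate` every `interval` seconds until it returns True.

    Returns the elapsed time. Raises TimeoutError once `timeout` seconds
    have passed without the predicate succeeding.
    """
    start = clock()
    while True:
        if predicate():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"condition not met after {elapsed:.1f}s")
        # Sleep one interval, but never past the deadline.
        sleep(min(interval, timeout - elapsed))
```

The `clock` and `sleep` arguments are injectable, so the loop can be exercised without real waiting. This design also explains the arithmetic in the log. One fixed 15 s sleep in each of the 41 scenario tests adds about 41 × 15 s ≈ 10 min. A 1 s interval caps the overshoot per wait at roughly 1 s, or about 41 s across the suite.
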
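
At 17:49-17:52 the channel raised, and then deliberately deferred, one option for extended attributes that instance notifications do not carry, such as the pinned AZ. The idea was to backfill them by calling the nova API during notification processing. A sketch of that fallback follows. It assumes a hypothetical `enrich_from_notification` helper and a caller-supplied `fetch_from_api` callable, which stands in for a real nova client request. The attribute names in the demo are illustrative only.

```python
from typing import Callable, Dict, Iterable, Optional

Attrs = Dict[str, object]


def enrich_from_notification(
    payload: Attrs,
    wanted: Iterable[str],
    fetch_from_api: Optional[Callable[[], Attrs]] = None,
) -> Attrs:
    """Return the wanted attributes, backfilling missing ones via an API call.

    `fetch_from_api` is called at most once, and only when at least one
    wanted attribute is absent from the notification payload.
    """
    keys = list(wanted)
    # Take whatever the notification already provides.
    result: Attrs = {k: payload[k] for k in keys if k in payload}
    missing = [k for k in keys if k not in result]
    if missing and fetch_from_api is not None:
        fetched = fetch_from_api()
        for key in missing:
            if key in fetched:
                result[key] = fetched[key]
    return result


if __name__ == "__main__":
    payload = {"name": "vm1"}  # notification lacks the pinned AZ
    attrs = enrich_from_notification(
        payload, ["name", "pinned_az"], lambda: {"pinned_az": "az1"}
    )
    print(attrs)  # prints {'name': 'vm1', 'pinned_az': 'az1'}
```

The API is consulted only when something is missing. This matches the point at 17:50 that the extra call would only be needed when extended attributes are enabled.
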
Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!