Friday, 2025-12-05

opendevreviewchandan kumar proposed openstack/watcher-tempest-plugin master: Wait for all audits to finish before cleanup  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/96936608:04
opendevreviewchandan kumar proposed openstack/watcher-tempest-plugin master: Consolidate and improve Zuul CI job definitions  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/96824708:10
opendevreviewchandan kumar proposed openstack/watcher-tempest-plugin master: Wait for all audits to finish before cleanup  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/96936608:10
opendevreviewAlfredo Moralejo proposed openstack/watcher stable/2025.2: Make VM resize timeout configurable with migration defaults  https://review.opendev.org/c/openstack/watcher/+/96987708:58
opendevreviewJoan Gilabert proposed openstack/watcher master: Fix zone migration dest pool and type selection  https://review.opendev.org/c/openstack/watcher/+/96471809:02
opendevreviewAlfredo Moralejo proposed openstack/watcher master: Add documentation section for actions  https://review.opendev.org/c/openstack/watcher/+/96802509:41
opendevreviewAlfredo Moralejo proposed openstack/watcher master: Skip migrate actions in pre_condition phase  https://review.opendev.org/c/openstack/watcher/+/96669909:41
opendevreviewDavid proposed openstack/watcher master: [DNM] Testing nodeset with three nodes (two computes + 1 controller)  https://review.opendev.org/c/openstack/watcher/+/96733109:59
amoralejIt'd be great if we could get https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/968247/ and https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969366/ merged. That will fix the flaky test we are hitting10:27
morenodo/11:17
amoralejafter merging https://review.opendev.org/c/openstack/watcher/+/967693 we are hitting https://bugs.launchpad.net/watcher/+bug/213393411:18
amoralejwhich morenod has just reported11:18
amoralejapparently devstack disables keep-alive in httpd for nova-api but operators set to 5 secs11:18
amoralejwhich seems to be badly interacting with the default value we are setting for retries, 5 also11:19
amoralejI wonder if we should try to disable connection pooling from the client side entirely11:19
amoralejor modifying the nova.interval in the operator11:21
amoralejactually, in devstack jobs we are setting it to 1, so it would not fail anyway11:21
jgilaberstill going through the bug report, but this looks like a problem with the installer, rather than watcher code?11:25
amoralejI'm not sure, tbh11:26
amoralejmaking watcher depend on a specific keepalive configuration in nova seems fragile11:27
jgilaberthe patch only increases the wait time, so the problem, IIUC is that apache keeps connections open for 5 seconds, and before we were calling nova every second11:29
jgilaberso we'd do 4 calls reusing the same connection and for the last one we would get a new one from the pool11:29
amoralejActually, iiuc, the problem is that the client uses connection pooling to reuse connections11:30
amoralejso, when it tries a second get call on the same tcp connection the api server closes connection11:30
jgilaberpython-novaclient manages that pool right?11:31
amoraleji'd say via keystoneauth111:31
amoralejapparently, there is some retry logic that may work if we set migration_interval > keepalive time, but we have exactly the same value and there seems to be some kind of race condition11:33
amoralejI'm testing by reducing interval to 3 in https://github.com/openstack-k8s-operators/watcher-operator/pull/31311:34
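The timing argument above can be written down as a toy model. This is only a sketch of the reasoning in the discussion, not code from watcher, keystoneauth or Apache; the function name `classify_reuse` and the `margin` parameter are hypothetical. It assumes the server closes a connection once it has been idle for the keepalive timeout, and that a client which finds the socket already closed opens a new one.

```python
def classify_reuse(idle_seconds, keepalive_timeout, margin=0.1):
    """Classify what happens when a pooled connection is reused after idling.

    margin is a hypothetical tolerance (in seconds) for "about the same time".
    """
    if idle_seconds < keepalive_timeout - margin:
        # The server still holds the socket open, so reuse succeeds.
        return "reused"
    if idle_seconds > keepalive_timeout + margin:
        # The server closed the socket a while ago; the client can notice
        # that before sending and open a fresh connection instead.
        return "reconnected"
    # The server may close the socket while the request is in flight.
    return "race"


# Polling interval of 5 against a keepalive timeout of 5: the risky case.
print(classify_reuse(5, 5))
# Polling interval of 3 (the value being tested) or 1 (devstack jobs).
print(classify_reuse(3, 5), classify_reuse(1, 5))
```

Under these assumptions, only an interval close to the keepalive timeout lands in the risky window, which fits the observation that the devstack jobs (interval 1) do not fail.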
dviroelabout https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969366 and https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2133777 -> this looks like a watcher bug, since we can see that the audit goes from CANCELLED to ONGOING.11:42
morenodmy +1 is to disable connection reuse, so that a future keepalive change that happens to match our sleep won't make us fail again11:44
opendevreviewDavid proposed openstack/watcher master: [DNM] Testing nodeset with three nodes (two computes + 1 controller)  https://review.opendev.org/c/openstack/watcher/+/96733111:48
amoralejdviroel, you are right, a previous GET call shows it as CANCELLED and the following delete fails with an error about it being ONGOING :/11:57
jgilabermorenod, can we do that from the watcher side? I'm trying to understand how this is configured through keystoneauth1 and I don't see any config options12:00
morenodI think there is a way to close the connection so that the next call is forced to start a new one12:01
amoraleji dunno12:02
amoralejiiuc keystoneauth has https://github.com/openstack/keystoneauth/blob/4f0414d864bd790aa6dc54e55308a94653fbcfb4/keystoneauth1/session.py#L1624 which builds on top of requests.adapters.HTTPAdapter12:07
amoralejso, i think it passes any parameter accepted by https://requests.readthedocs.io/en/latest/_modules/requests/adapters/#HTTPAdapter12:08
amoralejso we may add options in https://github.com/openstack/watcher/blob/master/watcher/common/clients.py#L81-L8312:09
amoralejour issue seems similar to https://bugs.launchpad.net/charm-helpers/+bug/194701012:13
amoralejfor that, they patch the installer to increase keepalive timeout to 75 seconds in the apis12:14
jgilaberthat does seem like the same problem12:17
morenodbut I think the error is deeper than nova or keystone. it is related to the http connection pool created by urllib3. so if we set it not to reuse the connection, or to close the connection on each iteration, we won't depend on a keepalive timeout number12:17
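The "close the connection on each iteration" idea can be illustrated with the standard library alone. This is a sketch, not how watcher, keystoneauth1 or urllib3 are configured: `get_without_reuse`, `_StatusHandler` and `demo` are hypothetical names, and the local server is only a stand-in for an API endpoint. Each poll opens its own TCP connection and sends `Connection: close`, so no idle connection is left to race the server's keepalive timeout.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def get_without_reuse(host, port, path):
    """Issue one GET on a fresh TCP connection and close it afterwards."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        # "Connection: close" asks the server not to keep the socket alive.
        conn.request("GET", path, headers={"Connection": "close"})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


class _StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an API endpoint; always answers ACTIVE."""

    protocol_version = "HTTP/1.1"  # keep-alive capable, like a real API server

    def do_GET(self):
        body = b"ACTIVE"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Poll a throwaway local server three times, one connection per poll."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return [get_without_reuse(host, port, "/servers/x") for _ in range(3)]
    finally:
        server.shutdown()
        server.server_close()
```

The cost of this design is a new TCP handshake (and a TLS handshake, where TLS is used) for every poll, which is the price of not depending on the server's keepalive value.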
sean-k-mooneysorry i missed that conversation about keepalive and connection reuse12:32
sean-k-mooneyis there a summary?12:32
sean-k-mooneyare you just talking about http sessions and reusing the same connection to the server to send multiple messages?12:34
sean-k-mooneywatcher should be robust to connection closure and retry in those cases12:34
sean-k-mooneybut also if we are moving to the sdk then we really should not need to care about this beyond a simple retry 12:35
sean-k-mooneylow level parameters like whether connections are reused are not something we should generally be configuring in our application logic in watcher12:37
sean-k-mooneyso we should not try to reach that deeply into the internals.12:38
sean-k-mooneywe have to assume that all rest api requests can hit temporary connection issues and make watcher robust to transient errors like this 12:39
amoralejso, another option may be to use try/except around nova.server.get as in https://github.com/openstack/watcher/blob/master/watcher/common/nova_helper.py#L359 and ignore connection errors12:55
amoralejgiven that we are in a retry loop, it'd check status in next retry12:55
jgilaberthat sounds good to me13:01
morenod+113:02
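The option agreed on above (ignore connection errors inside the existing retry loop) could look roughly like this. It is a minimal sketch, not the code in nova_helper.py: `wait_for_status`, `get_status` and the injectable `sleep` are hypothetical, and Python's built-in `ConnectionError` stands in for whatever exception the real client raises.

```python
import time


def wait_for_status(get_status, expected, retries=5, interval=1.0,
                    sleep=time.sleep):
    """Poll get_status() until it returns expected or retries run out."""
    for attempt in range(retries):
        try:
            if get_status() == expected:
                return True
        except ConnectionError:
            # Transient failure (e.g. the server closed a reused connection):
            # skip this attempt, the next iteration checks the status again.
            pass
        if attempt < retries - 1:
            sleep(interval)
    return False


# Usage: the first call fails with a connection error, later calls succeed.
answers = iter([ConnectionError("closed by peer"), "resizing", "active"])


def fake_get_status():
    item = next(answers)
    if isinstance(item, Exception):
        raise item
    return item


print(wait_for_status(fake_get_status, "active", interval=0))
```

Because a failed call simply consumes one attempt, a persistent outage still ends with a `False` result once the retries are exhausted, rather than looping forever.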
amoralejdviroel, wrt the issue in the tempest test, I've reported the bug in https://bugs.launchpad.net/watcher/+bug/2134046 thanks for your help dviroel++15:15
amoralejI reproduced it locally (I had to add a sleep somewhere to help trigger the race condition)15:15
opendevreviewAlfredo Moralejo proposed openstack/watcher master: Skip stop actions in pre_condition phase  https://review.opendev.org/c/openstack/watcher/+/96995015:17
opendevreviewJoan Gilabert proposed openstack/watcher-specs master: Add zone migration parameter rename spec  https://review.opendev.org/c/openstack/watcher-specs/+/96594315:54
dviroelthanks amoralej 16:06
opendevreviewAlfredo Moralejo proposed openstack/watcher-tempest-plugin master: Skip test api.admin.test_audit.TestShowListAudit.test_list_with_limit  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/96996616:30
dviroelamoralej: ^ wrong bug number in the skip16:48
opendevreviewAlfredo Moralejo proposed openstack/watcher-tempest-plugin master: Skip test api.admin.test_audit.TestShowListAudit.test_list_with_limit  https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/96996617:05