| opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: Wait for all audits to finish before cleanup https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969366 | 08:04 |
|---|---|---|
| opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: Consolidate and improve Zuul CI job definitions https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/968247 | 08:10 |
| opendevreview | chandan kumar proposed openstack/watcher-tempest-plugin master: Wait for all audits to finish before cleanup https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969366 | 08:10 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher stable/2025.2: Make VM resize timeout configurable with migration defaults https://review.opendev.org/c/openstack/watcher/+/969877 | 08:58 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Fix zone migration dest pool and type selection https://review.opendev.org/c/openstack/watcher/+/964718 | 09:02 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher master: Add documentation section for actions https://review.opendev.org/c/openstack/watcher/+/968025 | 09:41 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher master: Skip migrate actions in pre_condition phase https://review.opendev.org/c/openstack/watcher/+/966699 | 09:41 |
| opendevreview | David proposed openstack/watcher master: [DNM] Testing nodeset with three nodes (two computes + 1 controller) https://review.opendev.org/c/openstack/watcher/+/967331 | 09:59 |
| amoralej | It'd be great if we could get https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/968247/ and https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969366/ merged. That will fix the flaky test we are hitting | 10:27 |
| morenod | o/ | 11:17 |
| amoralej | after merging https://review.opendev.org/c/openstack/watcher/+/967693 we are hitting https://bugs.launchpad.net/watcher/+bug/2133934 | 11:18 |
| amoralej | which morenod has just reported | 11:18 |
| amoralej | apparently devstack disables keep-alive in httpd for nova-api but operators set to 5 secs | 11:18 |
| amoralej | which seems to be badly interacting with the default value we are setting for retries, 5 also | 11:19 |
| amoralej | I wonder if we should try to disable connection pooling on the client side entirely | 11:19 |
| amoralej | or modifying the nova.interval in the operator | 11:21 |
| amoralej | actually, in devstack jobs we are setting it to 1, so it would not fail anyway | 11:21 |
| jgilaber | still going through the bug report, but this looks like a problem with the installer, rather than watcher code? | 11:25 |
| amoralej | I'm not sure, tbh | 11:26 |
| amoralej | making watcher depend on a specific keepalive configuration in nova seems fragile | 11:27 |
| jgilaber | the patch only increases the wait time, so the problem, IIUC is that apache keeps connections open for 5 seconds, and before we were calling nova every second | 11:29 |
| jgilaber | so we'd do 4 calls reusing the same connection and for the last one we would get a new one from the pool | 11:29 |
| amoralej | Actually, iiuc, the problem is that the client uses connection pooling to reuse connections | 11:30 |
| amoralej | so, when it tries a second get call on the same tcp connection the api server closes connection | 11:30 |
| jgilaber | python-novaclient manages that pool right? | 11:31 |
| amoralej | i'd say via keystoneauth1 | 11:31 |
| amoralej | apparently, there is some retry logic that may work if we set migration_interval > keepalivetime, but we have exactly the same value for both and there seems to be some kind of race condition | 11:33 |
| amoralej | I'm testing by reducing interval to 3 in https://github.com/openstack-k8s-operators/watcher-operator/pull/313 | 11:34 |
| dviroel | about https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969366 and https://bugs.launchpad.net/watcher-tempest-plugin/+bug/2133777 -> this looks like a watcher bug, since we can see that the audit goes from CANCELLED to ONGOING. | 11:42 |
| morenod | my +1 is to disable connection reuse, so we don't depend on the keepalive, and a future change that makes it match our sleep won't make us fail again | 11:44 |
| opendevreview | David proposed openstack/watcher master: [DNM] Testing nodeset with three nodes (two computes + 1 controller) https://review.opendev.org/c/openstack/watcher/+/967331 | 11:48 |
| amoralej | dviroel, you are right, a previous GET call shows it as CANCELLED and following delete fails with error about it being ONGOING :/ | 11:57 |
| jgilaber | morenod, can we do that from the watcher side? I'm trying to understand how this is configured through keystoneauth1 and I don't see any config options | 12:00 |
| morenod | I think there is a way to close the connection, so the next call is forced to start a new one | 12:01 |
| amoralej | i dunno | 12:02 |
| amoralej | iiuc keystoneauth has https://github.com/openstack/keystoneauth/blob/4f0414d864bd790aa6dc54e55308a94653fbcfb4/keystoneauth1/session.py#L1624 which builds on top of requests.adapters.HTTPAdapter | 12:07 |
| amoralej | so, i think it passes any parameter accepted by https://requests.readthedocs.io/en/latest/_modules/requests/adapters/#HTTPAdapter | 12:08 |
| amoralej | so we may add options in https://github.com/openstack/watcher/blob/master/watcher/common/clients.py#L81-L83 | 12:09 |
| amoralej | our issue seems similar to https://bugs.launchpad.net/charm-helpers/+bug/1947010 | 12:13 |
| amoralej | for that, they patch the installer to increase keepalive timeout to 75 seconds in the apis | 12:14 |
| jgilaber | that does seem like the same problem | 12:17 |
| morenod | but I think the error is deeper than nova or keystone. it is related to the http connection pool created by urllib3. so if we set it not to reuse the connection, or to close the connection on each iteration, we won't depend on a keepalive timeout number | 12:17 |
| sean-k-mooney | sorry i missed that conversation about keepalive and connection reuse | 12:32 |
| sean-k-mooney | is there a summary? | 12:32 |
| sean-k-mooney | are you just talking about http sessions and reusing the same connection to the server to send multiple messages? | 12:34 |
| sean-k-mooney | watcher should be robust to connection closure and retry in those cases | 12:34 |
| sean-k-mooney | but also if we are moving to the sdk then we really should not need to care about this beyond a simple retry | 12:35 |
| sean-k-mooney | low-level parameters like whether connections are reused are not something we should generally be configuring in our application logic in watcher | 12:37 |
| sean-k-mooney | so we should not try to reach that deeply into the internals. | 12:38 |
| sean-k-mooney | we have to assume that all rest api requests can hit temporary connection issues and make watcher robust to transient errors like this | 12:39 |
| amoralej | so, another option may be to use try/catch in nova.server.get as in https://github.com/openstack/watcher/blob/master/watcher/common/nova_helper.py#L359 and ignore connection errors | 12:55 |
| amoralej | given that we are in a retry loop, it'd check status in next retry | 12:55 |
| jgilaber | that sounds good to me | 13:01 |
| morenod | +1 | 13:02 |
| amoralej | dviroel, wrt the issue in the tempest test, I've reported the bug in https://bugs.launchpad.net/watcher/+bug/2134046 thanks for your help dviroel++ | 15:15 |
| amoralej | I reproduced it locally (I had to add a sleep somewhere to force the race condition a bit) | 15:15 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher master: Skip stop actions in pre_condition phase https://review.opendev.org/c/openstack/watcher/+/969950 | 15:17 |
| opendevreview | Joan Gilabert proposed openstack/watcher-specs master: Add zone migration parameter rename spec https://review.opendev.org/c/openstack/watcher-specs/+/965943 | 15:54 |
| dviroel | thanks amoralej | 16:06 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher-tempest-plugin master: Skip test api.admin.test_audit.TestShowListAudit.test_list_with_limit https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969966 | 16:30 |
| dviroel | amoralej: ^ wrong bug number in the skip | 16:48 |
| opendevreview | Alfredo Moralejo proposed openstack/watcher-tempest-plugin master: Skip test api.admin.test_audit.TestShowListAudit.test_list_with_limit https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/969966 | 17:05 |
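
The approach agreed around 12:55 (ignore connection errors on the status check and let the surrounding retry loop check again) can be sketched roughly as below. This is a minimal illustration and not the watcher code: `wait_for_status`, its parameters, and the `get_server` callable are hypothetical names, and the built-in `ConnectionError` stands in for whatever exception the real client stack raises when the API server closes an idle keep-alive connection.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def wait_for_status(get_server, server_id, wanted, retries=5, interval=1.0,
                    sleep=time.sleep):
    """Poll a server's status, tolerating transient connection errors.

    `get_server` is a hypothetical callable that returns a dict with a
    "status" key, or raises ConnectionError on a transport failure.
    """
    for attempt in range(1, retries + 1):
        try:
            server = get_server(server_id)
        except ConnectionError as exc:
            # Assumed transient (e.g. a stale pooled connection that the
            # server already closed): skip this check and let the next
            # iteration of the loop ask again.
            LOG.warning("attempt %d: ignoring connection error: %s",
                        attempt, exc)
        else:
            if server["status"] == wanted:
                return True
        sleep(interval)
    return False


if __name__ == "__main__":
    # Fake client: the first call fails, later calls succeed.
    responses = iter([ConnectionError("connection reset by peer"),
                      {"status": "RESIZE"},
                      {"status": "ACTIVE"}])

    def fake_get_server(server_id):
        item = next(responses)
        if isinstance(item, Exception):
            raise item
        return item

    print(wait_for_status(fake_get_server, "vm-1", "ACTIVE",
                          sleep=lambda seconds: None))
```

A failed check still consumes one retry here, so a persistent outage ends the loop with `False` instead of spinning forever. That keeps the connection handling out of the application logic, in line with the point made at 12:37.
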