*** mriedem has quit IRC | 00:06 | |
*** tssurya has quit IRC | 01:03 | |
openstackgerrit | licanwei proposed openstack/watcher master: Fix API version header https://review.opendev.org/658237 | 01:52 |
---|---|---|
openstackgerrit | Merged openstack/watcher master: Fix bandit runs with 1.6.0 https://review.opendev.org/658089 | 02:08 |
openstackgerrit | licanwei proposed openstack/watcher master: Add force field to api-ref https://review.opendev.org/658244 | 02:53 |
openstackgerrit | licanwei proposed openstack/python-watcherclient master: Add force option https://review.opendev.org/657375 | 03:01 |
openstackgerrit | Merged openstack/watcher master: update wsme types https://review.opendev.org/657218 | 03:49 |
openstackgerrit | Merged openstack/watcher master: Fix reraising of exceptions https://review.opendev.org/657643 | 03:49 |
*** adisky__ has joined #openstack-watcher | 05:22 | |
openstackgerrit | Merged openstack/watcher master: Allow for global datasources preference from config https://review.opendev.org/645294 | 06:04 |
openstackgerrit | Dantali0n proposed openstack/watcher master: Improve exceptions and logging in ds manager https://review.opendev.org/658127 | 06:32 |
openstackgerrit | chenker proposed openstack/watcher master: Bandit's version should be equal the min value in test-requriment https://review.opendev.org/658273 | 06:32 |
openstackgerrit | Dantali0n proposed openstack/watcher master: Improve exceptions and logging in ds manager https://review.opendev.org/658127 | 06:51 |
openstackgerrit | Merged openstack/watcher master: Use the common logging setup function in devstack runs https://review.opendev.org/657651 | 07:04 |
openstackgerrit | Merged openstack/watcher master: Add tempest voting https://review.opendev.org/656457 | 07:15 |
openstackgerrit | Dantali0n proposed openstack/watcher master: [WIP] Grafana proxy datasource to retrieve metrics https://review.opendev.org/649341 | 07:17 |
openstackgerrit | Dantali0n proposed openstack/watcher master: [wip] formal datasource interface implementation https://review.opendev.org/656622 | 07:23 |
*** adiantum has joined #openstack-watcher | 08:58 | |
*** adiantum has quit IRC | 09:22 | |
*** adiantum has joined #openstack-watcher | 09:27 | |
openstackgerrit | sumitjami proposed openstack/watcher master: Allow using file to override metric map https://review.opendev.org/657374 | 09:36 |
*** adisky__ has quit IRC | 10:48 | |
*** adisky__ has joined #openstack-watcher | 10:49 | |
*** zhurong has quit IRC | 10:50 | |
openstackgerrit | Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7 https://review.opendev.org/658345 | 12:32 |
openstackgerrit | Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7 https://review.opendev.org/658345 | 12:33 |
openstackgerrit | sumitjami proposed openstack/watcher master: pass default_config_dirs variable for config initialization. https://review.opendev.org/658348 | 12:39 |
*** adisky__ has quit IRC | 12:47 | |
openstackgerrit | Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7 https://review.opendev.org/658345 | 13:02 |
*** mriedem has joined #openstack-watcher | 13:05 | |
openstackgerrit | Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7 https://review.opendev.org/658345 | 13:40 |
*** ianychoi_ is now known as ianychoi | 14:00 | |
mriedem | i'm seeing a failure in the watcher-tempest-workload_balancing job and it looks like a race in the tempest test, live migration fails because the neutron port on the server is deleted while the server is being live migrated | 14:26 |
mriedem | it actually looks like the instance itself is being destroyed before the live migration is complete, which is what deletes the port | 14:27 |
mriedem | http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/compute1/logs/screen-n-cpu.txt.gz#_May_10_10_18_19_106717 | 14:27 |
mriedem | May 10 10:18:19.106717 ubuntu-bionic-vexxhost-sjc1-0006079177 nova-compute[19758]: INFO nova.compute.manager [None req-0e809061-4c10-4e82-9c17-20ebf2ca3832 tempest-TestExecuteWorkloadBalancingStrategy-1723816204 tempest-TestExecuteWorkloadBalancingStrategy-1723816204] [instance: 27813a13-39dc-490d-86f3-c1877c07a010] Took 0.28 seconds to destroy the instance on the hypervisor. | 14:27 |
mriedem | http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/compute1/logs/screen-n-cpu.txt.gz#_May_10_10_18_21_174978 | 14:27 |
mriedem | May 10 10:18:21.174978 ubuntu-bionic-vexxhost-sjc1-0006079177 nova-compute[19758]: WARNING nova.virt.libvirt.driver [None req-4e88ad59-55d7-4cc2-9deb-532900c94ab6 None None] [instance: 27813a13-39dc-490d-86f3-c1877c07a010] Error monitoring migration: Failed to activate binding for port 67f1aaa7-8966-448b-9655-56e19a01fb62 and host ubuntu-bionic-vexxhost-sjc1-0006079176.: PortBindingActivationFailed: Failed to activate bind | 14:27 |
mriedem | for port 67f1aaa7-8966-448b-9655-56e19a01fb62 and host ubuntu-bionic-vexxhost-sjc1-0006079176. | 14:27 |
mriedem | you can see the nova-api request to delete the server here: http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/controller/logs/screen-n-api.txt.gz#_May_10_10_18_18_867714 | 14:28 |
mriedem | May 10 10:18:18.867714 ubuntu-bionic-vexxhost-sjc1-0006079176 devstack@n-api.service[6484]: INFO nova.api.openstack.requestlog [None req-0e809061-4c10-4e82-9c17-20ebf2ca3832 tempest-TestExecuteWorkloadBalancingStrategy-1723816204 tempest-TestExecuteWorkloadBalancingStrategy-1723816204] 38.108.68.233 "DELETE /compute/v2.1/servers/27813a13-39dc-490d-86f3-c1877c07a010" status: 204 len: 0 microversion: 2.1 time: 0.264338 | 14:28 |
mriedem | the live migration of that server starts here http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/controller/logs/screen-n-api.txt.gz#_May_10_10_17_58_008645 | 14:31 |
mriedem | May 10 10:17:58.008645 ubuntu-bionic-vexxhost-sjc1-0006079176 devstack@n-api.service[6484]: DEBUG nova.compute.api [None req-908b9e12-08da-4f48-bbb2-f3b01c320d92 admin admin] [instance: 27813a13-39dc-490d-86f3-c1877c07a010] Going to try to live migrate instance to ubuntu-bionic-vexxhost-sjc1-0006079176 {{(pid=6485) live_migrate /opt/stack/nova/nova/compute/api.py:4540}} | 14:31 |
mriedem | so live migration starts at 10:17:58.008645, the server delete request is at 10:18:18.867714, the guest is deleted from the hypervisor by 10:18:19.106717, and live migration fails at 10:18:21.174978 | 14:32 |
mriedem | seems the watcher tempest plugin / test isn't waiting for the live migration that watcher kicked off to actually complete | 14:32 |
mriedem | so, the test creates an audit template from the workload_stabilization strategy, creates an audit from the template, and then polls until the audit is finished - and it's the audit that would kick off the live migration right? so i guess the race is in the workload_stabilization strategy not waiting for the live migration to complete? | 14:37 |
mriedem | looks like at this point the audit status is SUCCEEDED: http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/job-output.txt.gz#_2019-05-10_10_18_22_112059 | 14:43 |
mriedem | 2019-05-10 10:18:09,494 12011 DEBUG [tempest.lib.common.utils.test_utils] Call partial returns true in 4.405191 seconds | 14:43 |
mriedem | i'm confused because i don't see this in the watcher logs https://opendev.org/openstack/watcher/src/branch/master/watcher/common/nova_helper.py#L282 | 15:05 |
mriedem | or when it's waiting https://opendev.org/openstack/watcher/src/branch/master/watcher/common/nova_helper.py#L308 | 15:05 |
mriedem | ah i see why i can't see those logs, | 15:21 |
mriedem | May 10 10:06:34.673237 ubuntu-bionic-vexxhost-sjc1-0006079176 watcher-applier[3306]: DEBUG watcher.common.service [-] default_log_levels = ['amqp=WARN', 'amqplib=WARN', 'qpid.messaging=INFO', 'oslo.messaging=INFO', 'sqlalchemy=WARN', 'keystoneclient=INFO', 'stevedore=INFO', 'eventlet.wsgi.server=WARN', 'iso8601=WARN', 'requests=WARN', 'neutronclient=WARN', 'glanceclient=WARN', 'watcher.openstack.common=WARN', ' | 15:21 |
mriedem | heduler=WARN'] {{(pid=3306) log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2577}} | 15:21 |
mriedem | 'watcher.openstack.common=WARN', | 15:21 |
mriedem | that makes debugging this pretty hard... | 15:21 |
mriedem | anyway, reported a bug https://bugs.launchpad.net/watcher/+bug/1828598 since we'll probably continue to see that in the gate | 15:22 |
openstack | Launchpad bug 1828598 in watcher "test_execute_workload_stabilization intermittently fails because server is deleted before live migration is complete" [Undecided,New] | 15:22 |
*** josecastroleon has quit IRC | 15:32 | |
openstackgerrit | Matt Riedemann proposed openstack/watcher master: Remove watcher.openstack.common=WARN from _DEFAULT_LOG_LEVELS https://review.opendev.org/658399 | 15:34 |
mriedem | ^ should help with debugging gate failures ^ | 15:34 |
openstackgerrit | Matt Riedemann proposed openstack/watcher master: docs: fix link to install guide from user guide https://review.opendev.org/658401 | 15:38 |
Dantalion | mriedem: In general there is quite a bit of flaky behavior in some of the tempest jobs; I haven't had time to look at it properly yet. It is the main reason I said we should improve the documentation on tempest when making jobs voting. But now that the jobs are voting we will have to fix them anyway, as they will start blocking patches, which is fine since they should be reliable and working anyway. | 15:41 |
mriedem | yeah, being able to tell what the nova helper is doing wrt monitoring the live migration is the first step there | 15:41 |
mriedem | i can definitely see that the server is deleted before the live migration is complete, which means the audit must think it's done too early for whatever reason | 15:42 |
Dantalion | I am off on holiday for the weekend so probably won't have time to look at it before monday | 15:42 |
mriedem | sure, https://review.opendev.org/#/c/658399/ fixes the logging issue if you want to hit that quick | 15:43 |
Dantalion | I'll look at it before leaving the office | 15:43 |
Dantalion | Have a great weekend~ | 15:43 |
mriedem | you too | 15:46 |
mriedem | fyi there is a nova change proposed to change the default notification format to unversioned https://review.opendev.org/#/c/603079/ which might break watcher (which i think relies on versioned notifications) | 16:13 |
openstackgerrit | Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7 https://review.opendev.org/658345 | 16:19 |
openstackgerrit | Dantali0n proposed openstack/python-watcherclient master: Remove python 2.7 build job https://review.opendev.org/658345 | 17:47 |
*** tssurya_ has joined #openstack-watcher | 19:23 | |
openstackgerrit | Dantali0n proposed openstack/python-watcherclient master: Limit sphinx version for python 2.7 https://review.opendev.org/658345 | 19:40 |
*** mriedem has quit IRC | 20:44 | |
*** tssurya_ is now known as tssurya | 22:03 |
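
The race mriedem describes at 14:32 is easier to see as offsets from the start of the live migration. The following is a minimal sketch that only does the arithmetic on the four timestamps quoted in the log; the event labels and the helper name `offsets_from_start` are made up for illustration, and the year 2019 is taken from the job-output link.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps as quoted in the channel; labels are illustrative only.
events = [
    ("live migration requested", "2019-05-10 10:17:58.008645"),
    ("server DELETE request", "2019-05-10 10:18:18.867714"),
    ("guest destroyed on hypervisor", "2019-05-10 10:18:19.106717"),
    ("live migration fails", "2019-05-10 10:18:21.174978"),
]


def offsets_from_start(events):
    """Return (label, seconds since the first event) for each event."""
    start = datetime.strptime(events[0][1], FMT)
    return [
        (label, (datetime.strptime(stamp, FMT) - start).total_seconds())
        for label, stamp in events
    ]


for label, offset in offsets_from_start(events):
    print(f"{offset:10.6f}s  {label}")
```

Read this way, the DELETE arrives roughly 21 seconds after the migration request and about 2.3 seconds before the migration is reported as failed, which matches the conclusion in the log that the server is removed while the migration is still in flight.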