16:00:06 #startmeeting neutron_ci
16:00:07 Meeting started Tue Sep 3 16:00:06 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:10 The meeting name has been set to 'neutron_ci'
16:00:23 welcome back after the short break :)
16:00:30 thanks!
16:00:34 hi
16:01:13 let's wait 1 or 2 minutes for njohnston and others
16:01:26 o/ (though I will probably leave soon)
16:02:51 ok, let's start then
16:03:06 first of all: Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:14 please open it now so that it will be ready later :)
16:03:22 #topic Actions from previous meetings
16:03:36 ralonsoh will continue working on error patterns and open bugs for functional tests
16:03:52 yes, I found today something that could be an error
16:03:57 related to pyroute2
16:04:15 tomorrow, reviewing the CI tests, I'll report an error if there is one
16:04:25 no other patterns found this week
16:05:04 just for information: the possible error is in test_get_devices_info_veth_different_namespaces
16:05:08 that's all
16:05:24 thx ralonsoh
16:05:35 I also saw today 2 failures which I wanted to raise here, but let's do it later in the functional tests section
16:05:50 next one:
16:05:52 mlavalle will continue debugging https://bugs.launchpad.net/neutron/+bug/1838449
16:05:53 Launchpad bug 1838449 in neutron "Router migrations failing in the gate" [Medium,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:06:33 last night I left my latest comments here: https://bugs.launchpad.net/neutron/+bug/1838449
16:07:49 mlavalle: so, based on Your comment, it looks like it could have been introduced by https://review.opendev.org/#/c/597567, right?
16:08:04 as it is some race condition in the case when a router is updated
16:08:19 well, not necessarily that patch
16:08:46 there are 2 other patches that have touched the "related routers" code later
16:08:55 which I also mention in the bug
16:09:41 the naive solution would be for each test case in the test_migration script to create its router in separate nets / subnets
16:09:54 that would fix our tests
16:10:02 but this is a real bug IMO
16:10:07 which we need to fix
16:10:10 right?
16:10:52 so different routers from those tests are using the same networks/subnets now?
16:10:53 and backport :)
16:11:06 I am assuming that
16:11:09 isn't it that there is one migration "per test"?
16:11:19 and then a new network/subnet is created for each test
16:11:39 well, if that is the case, the problem is even worse
16:12:07 why would the router under migration have related routers?
16:13:40 but here: https://github.com/openstack/neutron-tempest-plugin/blob/master/neutron_tempest_plugin/scenario/test_migration.py#L124 it looks like every test has got its own network and subnet
16:13:52 IMO it's created here: https://github.com/openstack/neutron-tempest-plugin/blob/master/neutron_tempest_plugin/scenario/test_migration.py#L129
16:14:25 this method is defined here https://github.com/openstack/neutron-tempest-plugin/blob/d11f4ec31ab1cf7965671817f2733c362765ebb1/neutron_tempest_plugin/scenario/base.py#L173
16:14:41 see my comment just before yours please ^^^^
16:14:46 we agree
16:15:23 yes, so it seems that it is "even worse" :)
16:16:04 I have a question that I want to confirm....
16:16:44 shoot
16:17:07 the related routers ids are sent to the L3 agent in the router update rpc message, right?
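
For readers following along, here is a minimal, self-contained sketch of the behaviour being discussed: an agent-side decision that keeps a router's local state when the update payload carries "related routers". The payload shape and the function name are hypothetical illustrations only; they do not reproduce neutron's actual RPC schema or L3 agent code.

    # Illustrative sketch only: payload shape and names are hypothetical and
    # do not reproduce neutron's real RPC schema or L3 agent code.
    def should_remove_locally(update):
        """Decide whether the agent should drop its local state for a router.

        An update with admin_state_up=False (as during a migration) would
        normally make the agent remove the router namespace, but when the
        server also reports related routers the update is handled as a
        resync of the whole group and the local router is kept.
        """
        if update.get('related_routers'):
            return False  # resync the group instead of deleting locally
        return not update.get('admin_state_up', True)

    # Example: a migration-style update with admin_state_up set to False.
    update = {'id': '<router-uuid>',
              'admin_state_up': False,
              'related_routers': []}
    print(should_remove_locally(update))  # True -> local namespace torn down
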
16:18:16 yes (IIRC)
16:18:35 but a router delete should also be sent in such case to the controller, no?
16:18:38 in other words, this line https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L720
16:18:56 returns a non empty list, correct
16:19:46 in the case of a router migration, what is sent from the server to the agent is a router update
16:19:56 because we are not deleting the router
16:20:13 we are just setting admin_state_up to false
16:20:43 let's move on
16:20:48 I can test that locally
16:20:54 but, according to Your paste:
16:21:00 http://paste.openstack.org/show/769795/
16:21:18 router 2cd0c0f0-75ab-444c-afe1-be7c6a1a7702 was deleted on compute
16:21:27 and was only updated on controller
16:21:29 why?
16:21:31 yes, but that is the result of processing an update
16:21:41 an update message
16:21:47 not a delete message
16:22:16 true
16:22:18 of that I am 100% sure
16:23:07 the difference is that when the update message contains "related routers" updates
16:23:17 it is processed in a different manner
16:23:31 and therefore the router is not deleted locally in the agent
16:24:14 deleting the router locally means removing the network namespace from that agent, even though the router still exists
16:24:15 but what are related routers in such case?
16:24:34 that is exactly the conundrum :-)
16:24:48 :)
16:25:04 why does the router under migration have related routers?
16:25:12 maybe You should send a patch with some additional debug logging to get that info later
16:25:27 that is exactly the plan
16:25:30 ok
16:26:35 #action mlavalle to continue investigating router migrations issue
16:26:41 fine ^^?
16:26:50 yeap
16:26:53 ok
16:27:01 thx a lot mlavalle for working on this
16:27:09 let's move on
16:27:11 :-)
16:27:13 next action
16:27:15 njohnston will get the new location for periodic jobs logs
16:27:29 I did that at the end of the last meeting
16:27:46 and then contacted the glance folks about their broken job that was affecting all the postgresql jobs
16:28:18 njohnston: do You have a link to those logs then?
16:28:30 just to add it to the CI meeting agenda for the future :)
16:29:40 you can look here: http://zuul.openstack.org/builds?pipeline=periodic-stable&project=openstack%2Fneutron
16:30:00 or in the buildsets view http://zuul.openstack.org/buildsets?project=openstack%2Fneutron&pipeline=periodic
16:31:24 * mlavalle needs to step out 5 minutes
16:31:26 brb
16:31:32 thx a lot
16:31:48 and thx a lot for taking care of the broken postgresql job
16:32:04 ok, let's move on to the next topic
16:32:06 #topic Stadium projects
16:32:12 first
16:32:14 Python 3 migration
16:32:20 Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:32:36 I saw today that we are in pretty good shape with networking-odl now, thanks to lajoskatona
16:32:55 and we made some progress with networking-bagpipe
16:33:15 so the only one "not touched" yet is networking-midonet
16:33:38 anything else You want to add regarding python 3 migration?
16:33:53 No, I think that covers it, thanks!
16:34:39 thx
16:34:44 so next is
16:35:04 tempest-plugins migration
16:35:09 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:35:17 any updates on this one?
16:36:10 I don't have any; I don't see tidwellr and mlavalle is AFK, they are who I would look to for updates.
16:36:22 njohnston: right
16:36:26 so let's move on then
16:36:48 #topic Grafana
16:36:56 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:38:13 I see that neutron-functional-python27 in the gate queue was failing quite often last week
16:38:27 was there any known problem with it?
16:38:29 do You know?
16:38:55 no sorry
16:39:01 ok, maybe it wasn't any big problem as there were only a few runs of this job then
16:39:11 so relatively speaking it wasn't many failures
16:39:42 * mlavalle is back
16:40:04 among other things, I see that neutron-tempest-plugin-scenario-openvswitch in the check queue is failing quite often
16:40:55 any volunteer to check this job more deeply?
16:41:04 if not, I will assign it to myself for this week
16:41:17 I can take a look, but I don't know when
16:41:32 I have a very busy agenda now
16:42:01 ralonsoh: thx
16:42:10 but if You are busy, I can take a look into this
16:42:21 so I will assign it to myself
16:42:27 to not overload You :)
16:42:30 ok?
16:42:38 thanks!
16:43:11 #action slaweq to check reasons of failures of neutron-tempest-plugin-scenario-openvswitch job
16:43:50 among other things I noticed in grafana, neutron-functional-with-uwsgi has a high failure rate, but it's non-voting so no big problem (yet)
16:44:19 I hope we will be able to focus more on stabilizing the uwsgi jobs in the next cycle :)
16:44:37 unit test failures look pretty high - 40% to 50% today
16:45:25 njohnston: correct
16:45:43 but again, please note that it's "just" 6 runs today
16:45:59 so it doesn't have to be a big problem
16:46:06 ok
16:46:08 :-)
16:46:27 and I think that I saw some patches with failures related to the patch itself
16:46:33 so let's keep an eye on it :)
16:46:42 ok?
16:47:05 sounds good. We'll have better data later - I see 10 neutron jobs in the queue right now
16:47:20 agree
16:47:38 FYI for those that don't know, you can put "neutron" in http://zuul.openstack.org/status to see all neutron jobs
16:48:08 btw, test_get_devices_info_veth_different_namespaces is a problem now
16:48:24 I see many CI jobs failing because of this
16:48:33 do we have a new pyroute2 version?
16:48:41 thx njohnston
16:48:57 ok, so since ralonsoh has already started on it, let's move to the next topic :)
16:48:59 #topic fullstack/functional
16:49:19 so yes, I need to check what is happening with this test
16:49:20 and indeed I saw this test failing today also, first I thought it was maybe related to the patch on which it was run
16:49:27 I'm on it now
16:49:33 but now it's clear it's some bigger issue
16:49:46 two examples of failure:
16:49:48 https://a23f52ac6d169d81429a-a52e23b005b6607e27c6770fa63e26fe.ssl.cf1.rackcdn.com/679462/1/gate/neutron-functional/6d6a4c1/testr_results.html.gz
16:49:50 https://e33ddd780e29e3545bf9-6c7fec3fffbf24afb7394804bcdecfae.ssl.cf5.rackcdn.com/679399/6/check/neutron-functional/bc96527/testr_results.html.gz
16:50:05 yes, at least it is consistent
16:50:36 ohh, so it's now failing 100% of the time?
16:50:45 yes
16:50:57 ralonsoh: will You report a bug for it?
16:51:01 yes
16:51:04 thx
16:51:31 #action ralonsoh to report bug and investigate failing test_get_devices_info_veth_different_namespaces functional test
16:51:45 please set it as critical :)
16:51:52 ok
16:52:11 thx
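
For reference, a rough, self-contained example of the kind of pyroute2 namespace query that the failing test exercises (reading device info from inside a separate network namespace). It needs root privileges, the namespace name "test-ns" is just a placeholder, and this is only an illustration, not the functional test itself.

    # Rough illustration of what the failing test exercises: reading device
    # info from inside another network namespace via pyroute2. Needs root;
    # "test-ns" is a placeholder (NetNS creates it if it does not exist).
    from pyroute2 import NetNS

    ns = NetNS('test-ns')
    try:
        for link in ns.get_links():
            print(link.get_attr('IFLA_IFNAME'), link.get_attr('IFLA_MTU'))
    finally:
        ns.close()
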
16:52:28 as I said before, I also saw 2 other failures in functional tests today
16:52:41 1. neutron.tests.functional.agent.test_firewall.FirewallTestCase.test_rule_ordering_correct
16:52:47 https://019ab552bc17f89947ce-f1e24edd0ae51a8de312c1bf83189630.ssl.cf2.rackcdn.com/670177/7/check/neutron-functional-python27/74e7c20/testr_results.html.gz
16:52:51 I saw it for the first time
16:53:00 did You maybe get something similar before?
16:53:16 no, first time
16:53:52 ok, that's because we need https://review.opendev.org/#/c/679428/
16:54:41 ok, I will check tomorrow whether this failed test_rule_ordering_correct test wasn't related to the patch on which it was running
16:54:54 #action slaweq to check reason of failure of neutron.tests.functional.agent.test_firewall.FirewallTestCase.test_rule_ordering_correct
16:55:17 ralonsoh: we need https://review.opendev.org/#/c/679428/ to fix the issue with test_get_devices_info_veth_different_namespaces ?
16:55:20 or for what?
16:55:23 no no
16:55:34 the last one, test_rule_ordering_correct
16:55:50 the error in this test
16:55:51 File "neutron/agent/linux/ip_lib.py", line 941, in list_namespace_pids
16:55:51 return privileged.list_ns_pids(namespace)
16:55:55 ahh, ok
16:55:57 :)
16:56:05 right
16:56:27 mlavalle: njohnston: if You will have some time, please review https://review.opendev.org/#/c/679428/ :)
16:56:37 slaweq ralonsoh: +2+w
16:56:43 thanks!
16:56:46 njohnston: thx :)
16:56:49 You're fast
16:56:55 he is indeed
16:57:05 LOL
16:57:16 ok, and the second failed test which I saw:
16:57:17 neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_vrrp_subprocess
16:57:29 https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_66/677166/11/check/neutron-functional-python27/864c837/testr_results.html.gz
16:57:55 but I think I already saw something similar before
16:58:04 same problem
16:58:20 ahh, right
16:58:33 different stack trace but there is list_ns_pids in it too :)
16:58:52 ok, so we should be better off in functional tests with Your patch
16:59:13 we are almost out of time
16:59:20 so quickly, one last thing for today
16:59:22 #topic Tempest/Scenario
16:59:30 Recently I noticed that we are testing all jobs with MySQL 5.7
16:59:32 So I asked on the ML about Mariadb: http://lists.openstack.org/pipermail/openstack-discuss/2019-August/008925.html
16:59:34 And I will need to add a periodic job with mariadb to Neutron
16:59:41 are You ok with such a job?
16:59:43 +1
16:59:44 sure
17:00:13 mlavalle? I hope You are fine too with such a job :)
17:00:25 I'm ok
17:00:28 #action slaweq to add mariadb periodic job
17:00:36 thx
17:00:41 so we are out of time now
17:00:43 bye!
17:00:44 thx for attending
17:00:47 o/
17:00:47 #endmeeting