16:00:40 #startmeeting neutron_ci
16:00:41 Meeting started Tue Jul 30 16:00:40 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:42 hi again
16:00:44 The meeting name has been set to 'neutron_ci'
16:00:46 hi
16:00:46 o/
16:01:31 I know that haleyb and bcafarel will be late so I think we can start
16:01:40 I hope njohnston will join us soon :)
16:01:47 slaweq: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
16:01:53 #undo
16:01:55 #topic Actions from previous meetings
16:02:05 first one:
16:02:07 mlavalle to report bug with router migrations
16:02:24 I didn't report the bug but I started working on fixing it
16:02:32 :)
16:02:33 I'll report it today
16:02:37 thx a lot
16:02:54 do You have any ideas what is the root cause of this failure?
16:03:07 haven't got to the root yet
16:03:23 ok, so please report it this week so that we can track it
16:03:28 #action mlavalle to report bug with router migrations
16:03:33 but the problem is that the tests are failing because once the router is updated with...
16:03:50 admin_state_up False
16:04:29 the router service ports (at least the one used for the interface) never go down
16:04:57 I remember we had such an issue with some kinds of routers in the past already
16:04:57 I can see it being removed from the hypervisor where it was originally scheduled
16:05:29 but the server doesn't catch it
16:05:39 that's where I am right now
16:06:13 ok, I think You should maybe look into the ovs-agent, as this agent is IMHO responsible for updating port status to DOWN or UP
16:06:32 yeap, that's where I am looking presently
16:06:38 great
16:06:40 thx mlavalle
16:07:12 ok, lets move on
16:07:16 next action item
16:07:18 ralonsoh to report bug with qos scenario test failures
16:07:38 #link https://bugs.launchpad.net/neutron/+bug/1838068
16:07:39 Launchpad bug 1838068 in neutron ""QoSTest:test_qos_basic_and_update" failing in DVR node scenario" [Undecided,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:07:47 and patch: #link https://review.opendev.org/#/c/673023/
16:08:00 in 20 secs:
16:08:34 force-stop the ns process, close the socket from the test machine and set a socket timeout, to recheck again if there is still time
16:08:45 that's all
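For context, a minimal sketch of the stop-and-retry pattern ralonsoh describes above, assuming the "ns process" is a TCP server reachable from the test machine; the address, port, timeout values and helper name are illustrative, not taken from the actual patch:

    import socket
    import time

    SERVER_IP = "10.1.0.3"        # assumed address of the ns process
    SERVER_PORT = 1234            # assumed port the ns process listens on
    DEADLINE = time.time() + 120  # illustrative overall test budget

    def read_from_server():
        """Retry the read until the overall deadline expires."""
        while time.time() < DEADLINE:
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            client.settimeout(10)  # per-attempt socket timeout
            try:
                client.connect((SERVER_IP, SERVER_PORT))
                return client.recv(1024)
            except socket.timeout:
                continue  # recheck again if there is still time
            finally:
                client.close()  # always close the test-side socket
        raise AssertionError("no answer from the ns process in time")

If an attempt times out, the test-side socket is closed and, as long as the overall deadline has not passed, the read is simply tried again.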
16:09:26 late hi o/ (as promised)
16:09:35 I hope this will help ralonsoh :)
16:09:38 thx for the patch
16:09:47 no problem!
16:09:49 btw. I forgot at the beginning: http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
16:09:55 please open it to be ready later :)
16:10:06 ok, next one
16:10:09 slaweq to take a look at issue with dvr and metadata: https://bugs.launchpad.net/neutron/+bug/1830763
16:10:10 Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:10:20 I did, my findings are in https://bugs.launchpad.net/neutron/+bug/1830763/comments/13 and I proposed patch https://review.opendev.org/#/c/673331/
16:10:51 long story short: I found out that there is a race condition and sometimes one L3 agent can have created 2 "floating ip agent gateway" ports for a network
16:11:24 and that later causes an error in the L3 agent during configuration of one of the routers, and metadata is not reachable in this router
16:11:54 so the workaround which I proposed now should help to solve this problem in the gate, as we are always using only a single controller node there
16:12:12 and this can also be backported to stable branches if needed
16:12:38 but the proper fix will IMO require some db changes to provide a correct constraint on the db level for that kind of ports
16:12:43 I will work on it later
16:13:45 so constrain the db, and if you get a duplicate when creating the gateway, ignore it?
16:13:57 o/ sorry I am late
16:15:05 mlavalle: basically yes, something like that
16:15:19 ack
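As a side note, a short sketch of the kind of db-level guard slaweq and mlavalle discuss above, assuming SQLAlchemy; the table, column and function names are illustrative, and the real Neutron schema and fix may look quite different:

    from sqlalchemy import Column, String, UniqueConstraint
    from sqlalchemy.exc import IntegrityError
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class FipAgentGwPort(Base):
        """Illustrative model of a "floating ip agent gateway" port."""
        __tablename__ = 'fip_agent_gw_ports'
        port_id = Column(String(36), primary_key=True)
        network_id = Column(String(36), nullable=False)
        agent_id = Column(String(36), nullable=False)
        # at most one gateway port per (network, L3 agent) pair - this
        # is the constraint that would close the race at the db level
        __table_args__ = (
            UniqueConstraint('network_id', 'agent_id',
                             name='uniq_fip_agent_gw_port'),
        )

    def get_or_create_gw_port(session, port):
        """Insert the port; if a concurrent request already created one
        for the same network/agent pair, reuse the existing row."""
        try:
            with session.begin_nested():  # SAVEPOINT around the insert
                session.add(port)
        except IntegrityError:
            return session.query(FipAgentGwPort).filter_by(
                network_id=port.network_id,
                agent_id=port.agent_id).one()
        return port

With such a constraint in place the duplicate insert fails fast, and the caller can fall back to the row that won the race, which is roughly the "ignore the duplicate" behaviour mlavalle suggests.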
16:16:28 ok, next one
16:16:29 ralonsoh to try a patch to reduce the number of workers in FT
16:17:05 slaweq, I've seen that the number of problems we have now in zuul is lower than 2/3 weeks ago
16:17:17 and this patch will slow down the FT execution
16:17:22 can we hold this patch?
16:17:38 ralonsoh: so do You think that we should just wait and see how it will be in the future?
16:17:43 yes
16:18:07 +1 for that, I also didn't see many of such issues last week
16:18:09 reducing the number of workers (from 8 to 7) will reduce the speed of FT execution a lot
16:18:37 ralonsoh: do You know by how much it will slow down the job?
16:18:56 almost proportionally to the core reduction
16:19:20 in this case, 12.5%
16:20:03 so second question: do You know by how much it may improve the stability of the tests? :)
16:20:22 slaweq, I can't answer this question
16:20:34 ralonsoh: I thought so :)
16:20:52 ok, lets maybe keep it as our last possible thing to do
16:21:08 thx ralonsoh for checking that
16:21:11 next one
16:21:12 np!
16:21:12 ralonsoh to report a bug and investigate failed test neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_bw_limit_qos_port_removed
16:21:30 that's the previous one
16:22:08 nope, my bad
16:22:16 no sorry, I didn't have time for this one
16:22:26 ok, no problem
16:22:35 can I assign it to You for next week then?
16:22:44 just to report it at least :)
16:22:46 I hope I have time, yes
16:22:50 thx
16:23:03 #action ralonsoh to report a bug about failed test neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_bw_limit_qos_port_removed
16:23:08 thx ralonsoh
16:23:14 ok, that's all from last week
16:23:19 any questions/comments?
16:24:06 ok, so lets move on
16:24:15 #topic Stadium projects
16:24:23 Python 3 migration
16:24:25 Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:24:38 I think we covered that really well in the neutron team meeting
16:24:39 we already discussed that on the neutron meeting today
16:24:44 right njohnston
16:24:50 slaweq++
16:24:54 :)
16:25:03 so lets move quickly to the second part of this topic
16:25:05 tempest-plugins migration
16:25:07 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:25:15 any progress on this?
16:26:19 I'll update the second part of the fwaas change today; escalations got the better of me this week
16:27:03 sure, thx njohnston
16:27:18 I know that tidwellr is also making some progress on neutron-dynamic-routing recently
16:27:40 I saw some recent updates by tidwellr on https://review.opendev.org/#/c/652099/ (though it's still in zuul checks)
16:27:52 yep, it is
16:27:57 and I will try to make progress with vpn
16:28:46 so we are covered on this topic and I hope we will be ready with this at the end of the T cycle
16:28:54 not sure if it already got into the latest revisions, but we should make sure all these newly moved plugins have a config switch to enable/disable them
16:29:06 (as added for the 3 completed ones for the 0.4.0 release)
16:29:15 bcafarel: good point
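For reference, such a switch usually follows the tempest "service_available" convention; a minimal sketch assuming oslo.config, where the option name "fwaas" and the default are illustrative rather than what the plugin actually ships:

    from oslo_config import cfg

    # hypothetical switch for the migrated tests; the option name and
    # default are assumptions, not the plugin's actual configuration
    fwaas_opt = cfg.BoolOpt(
        'fwaas',
        default=False,
        help='Whether the FWaaS tests from the plugin should run')

    def register_opts(conf):
        # plugins register options via their TempestPlugin hook; jobs
        # that do not deploy the service just leave the switch at False
        conf.register_opt(fwaas_opt, group='service_available')

Tests can then skip themselves when the corresponding CONF.service_available option is False, so a whole migrated suite can be disabled from tempest.conf.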
16:30:42 ok, I think we can move on to the next topic then
16:30:44 #topic Grafana
16:30:51 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:32:04 we didn't have many commits in the gate queue recently so there is no data there for the last few days
16:32:14 but lets look at the check queue graphs
16:32:28 interesting that in the last couple of hours there is a spike in failures across multiple charts in the check queue. I wonder if someone is just pushing some really crappy changes.
16:33:25 i was nowhere near the check queue :)
16:33:32 lol
16:34:35 njohnston: IMHO it is just getting back to normal, as there was almost nothing running during the weekend
16:34:50 oh, that makes sense
16:35:10 njohnston: but that is only my theory - lets keep an eye on it for the next days :)
16:35:18 sounds good
16:35:30 is the midonet co-gating job healthy? it's been 100% failure (non-voting)
16:35:40 haleyb: no, it's not
16:35:56 yeah, I think we mentioned that last week, yamamoto needs to take a look
16:36:15 I think he also mentioned he doesn't have much time
16:37:32 I will take a look into this job this week and try to check if it's always the same test(s) which are failing or maybe various ones
16:37:37 and will report bug(s) for that
16:37:45 sounds good to You?
16:37:54 yes
16:37:55 yes, good for me
16:38:15 #action slaweq to check midonet job and report bug(s) related to it
16:38:37 other than that I think we are in pretty good shape recently
16:39:36 any other questions/comments about grafana?
16:40:32 not from me
16:40:36 ok, so lets move on then
16:40:39 #topic fullstack/functional
16:41:00 I was looking into some recent patches looking for failures and I found only a few of them
16:41:06 first, functional tests
16:41:11 http://logs.openstack.org/12/672612/4/check/neutron-functional/e357646/testr_results.html.gz
16:41:18 failure in neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase
16:41:52 but this looks like an issue related to host load, and we talked about it with ralonsoh already
16:41:55 right ralonsoh?
16:42:02 I think so
16:42:15 yes, that's right
16:42:32 and the second issue which I found:
16:42:34 neutron.tests.functional.services.trunk.drivers.openvswitch.agent.test_trunk_manager.TrunkManagerTestCase
16:42:38 http://logs.openstack.org/03/670203/10/check/neutron-functional/80d0831/testr_results.html.gz
16:43:35 looking at the logs from this test: http://logs.openstack.org/03/670203/10/check/neutron-functional/80d0831/controller/logs/dsvm-functional-logs/neutron.tests.functional.services.trunk.drivers.openvswitch.agent.test_trunk_manager.TrunkManagerTestCase.test_connectivity.txt.gz
16:43:42 I don't see anything obvious
16:44:47 but as it happened only once, lets just keep an eye on it for now
16:44:50 do You agree?
16:44:54 yes
16:44:54 that's curious: the test is claiming that the ping process was not spawned, but it was
16:44:55 ++
16:46:04 ralonsoh: true
16:46:31 that is strange
16:47:30 if I have some time, I will take a deeper look into this
16:47:47 maybe at least to add some more logs which will help debugging such issues in the future
16:48:39 any other issues related to functional tests?
16:48:43 or can we continue?
16:49:45 ok, lets move on
16:49:54 I don't have anything new related to fullstack for today
16:49:58 so next topic
16:50:03 #topic Tempest/Scenario
16:50:37 first of all, my patch to tempest https://review.opendev.org/#/c/672715/ is merged
16:51:00 so I hope it should be much better now with the SSH failures caused by failing to get public-keys
16:51:08 \o/
16:51:21 if You see such errors now, let me know - I will investigate again
16:51:38 Thanks!
16:51:55 this should help for all jobs which inherit from devstack-tempest
16:52:07 so it will not solve the problem in e.g. tripleo based jobs
16:52:15 (just saying :))
16:52:30 but in the neutron u/s gate we should be much better now, I hope
16:52:33 ok
16:52:48 from other things, I spotted one new error in the API tests:
16:52:55 http://logs.openstack.org/30/670930/3/check/neutron-tempest-plugin-api/5a731da/testr_results.html.gz
16:53:02 it is a failure in neutron_tempest_plugin.api.test_port_forwardings.PortForwardingTestJSON
16:53:19 looks like an issue in the test to me, I will report a bug and work on it
16:53:28 ok for You?
16:53:52 sure
16:54:17 thx
16:54:33 #action slaweq to report and try to fix bug in neutron_tempest_plugin.api.test_port_forwardings.PortForwardingTestJSON
16:54:38 ok
16:54:46 and that's all from my side for today
16:54:59 anything else You want to discuss today?
16:55:04 not from me
16:55:22 catching up on the recent activity, so nothing from me either :)
16:55:48 ok, thx for attending
16:55:50 nice recent findings btw (race condition, memcached in nova, ...)
16:55:54 and have a nice week
16:55:58 thx bcafarel :)
16:56:03 o/
16:56:07 #endmeeting