16:00:40 <slaweq> #startmeeting neutron_ci
16:00:41 <openstack> Meeting started Tue Jul 30 16:00:40 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:42 <slaweq> hi again
16:00:44 <openstack> The meeting name has been set to 'neutron_ci'
16:00:46 <ralonsoh> hi
16:00:46 <mlavalle> o/
16:01:31 <slaweq> I know that haleyb and bcafarel will be late so I think we can start
16:01:40 <slaweq> I hope njohnston will join us soon :)
16:01:47 <openstack> slaweq: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
16:01:53 <slaweq> #undo
16:01:55 <slaweq> #topic Actions from previous meetings
16:02:05 <slaweq> first one:
16:02:07 <slaweq> mlavalle to report bug with router migrations
16:02:24 <mlavalle> I didn't report the bug but I started working on fixing it
16:02:32 <slaweq> :)
16:02:33 <mlavalle> I'll report it today
16:02:37 <slaweq> thx a lot
16:02:54 <slaweq> do You have any ideas what is the root cause of this failure?
16:03:07 <mlavalle> haven't got to the root yet
16:03:23 <slaweq> ok, so please report it this week so that we can track it
16:03:28 <slaweq> #action mlavalle to report bug with router migrations
16:03:33 <mlavalle> but the problem is that the tests are failing because once the router is updated with...
16:03:50 <mlavalle> admin_state_up False
16:04:29 <mlavalle> the router service ports (at least the one used for the interface) never get down
16:04:57 <slaweq> I remember we had such issue with some kind of routers in the past already
16:04:57 <mlavalle> I can see it being removed from the hypervisor where it was originally scheduled
16:05:29 <mlavalle> but the server doesn't catch it
16:05:39 <mlavalle> that's where I am right now
16:06:13 <slaweq> ok, I think You should look into ovs-agent maybe as this agent is IMHO responsible for updating port status to DOWN or UP
16:06:32 <mlavalle> yeap, that's where I am looking presently
16:06:38 <slaweq> great
16:06:40 <slaweq> thx mlavalle
16:07:12 <slaweq> ok, lets move on
16:07:16 <slaweq> next action item
16:07:18 <slaweq> ralonsoh to report bug with qos scenario test failures
16:07:38 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1838068
16:07:39 <openstack> Launchpad bug 1838068 in neutron ""QoSTest:test_qos_basic_and_update" failing in DVR node scenario" [Undecided,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:07:47 <ralonsoh> and patch: #link https://review.opendev.org/#/c/673023/
16:08:00 <ralonsoh> in 20 secs:
16:08:34 <ralonsoh> force to stop the ns process, close the socket from the test machine and set a socket timeout, to recheck it again if there is still time
16:08:45 <ralonsoh> that's all
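For context on the QoS fix above: the approach ralonsoh describes (stop the namespace process, close the socket on the test side, set a socket timeout and retry while time remains) can be sketched roughly as follows. This is only an illustrative Python outline with made-up helper names, not the code in https://review.opendev.org/#/c/673023/:

    import socket
    import time

    def recv_with_retries(host, port, payload, total_budget=60, sock_timeout=10):
        """Keep reopening the connection and retrying until the time budget runs out."""
        deadline = time.time() + total_budget
        while time.time() < deadline:
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.settimeout(sock_timeout)  # never block forever on recvfrom()
            try:
                sock.sendto(payload, (host, port))
                data, _addr = sock.recvfrom(4096)
                return data
            except socket.timeout:
                continue  # no reply in time; close the socket and retry if budget allows
            finally:
                sock.close()
        raise RuntimeError('no reply within %s seconds' % total_budget)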
16:09:26 <bcafarel> late hi o/ (as promised)
16:09:35 <slaweq> I hope this will help ralonsoh :)
16:09:38 <slaweq> thx for the patch
16:09:47 <ralonsoh> no problem!
16:09:49 <slaweq> btw. I forgot at the beginning: http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
16:09:55 <slaweq> please open it to be ready later :)
16:10:06 <slaweq> ok, next one
16:10:09 <slaweq> slaweq to take a look at issue with dvr and metadata: https://bugs.launchpad.net/neutron/+bug/1830763
16:10:10 <openstack> Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:10:20 <slaweq> I did, my findings are in https://bugs.launchpad.net/neutron/+bug/1830763/comments/13 and I proposed patch https://review.opendev.org/#/c/673331/
16:10:51 <slaweq> long story short: I found out that there is race condition and sometimes one L3 agent can have created 2 "floating ip agent gateway" ports for network
16:11:24 <slaweq> and that cause later error in L3 agent during configuration of one of routers and metadata is not reachable in this router
16:11:54 <slaweq> so workaround which I proposed now should help to solve this problem in gate as we are always using only single controller node there
16:12:12 <slaweq> and this can be also backported to stable branches if needed
16:12:38 <slaweq> but proper fix will IMO require some db changes to provide correct constraint on db level for that kind of ports
16:12:43 <slaweq> I will work on it later
16:13:45 <mlavalle> so constraint the db and if when creating the gateway you get a duplicate ignore?
16:13:57 <njohnston> o/ sorry I am late
16:15:05 <slaweq> mlavalle: basically yes, something like that
16:15:19 <mlavalle> ack
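To illustrate the fix direction slaweq and mlavalle agree on above (a unique constraint at the DB level for the floating IP agent gateway port, with a duplicate insert from the race simply ignored), a minimal SQLAlchemy sketch could look like the following. The table, column and function names here are hypothetical, not the actual Neutron schema:

    from sqlalchemy import Column, String, UniqueConstraint, create_engine
    from sqlalchemy.exc import IntegrityError
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class FipAgentGwPort(Base):
        """Hypothetical model: one gateway port per (network, L3 agent) pair."""
        __tablename__ = 'fip_agent_gw_ports'
        port_id = Column(String(36), primary_key=True)
        network_id = Column(String(36), nullable=False)
        agent_id = Column(String(36), nullable=False)
        # The DB-level constraint: the database itself rejects a second
        # gateway port for the same network/agent combination.
        __table_args__ = (
            UniqueConstraint('network_id', 'agent_id',
                             name='uniq_fip_gw_port_per_network_agent'),
        )

    def create_gw_port(session, port):
        """Insert the port; if a racing request already created one, keep the existing row."""
        try:
            session.add(port)
            session.commit()
        except IntegrityError:
            session.rollback()  # duplicate caused by the race: ignore it

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = Session(engine)
    create_gw_port(session, FipAgentGwPort(port_id='p1', network_id='n1', agent_id='a1'))
    create_gw_port(session, FipAgentGwPort(port_id='p2', network_id='n1', agent_id='a1'))  # ignored

In Neutron itself this would presumably go through oslo.db and an Alembic migration rather than create_all(), but the shape of the fix is the same.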
16:16:28 <slaweq> ok, next one
16:16:29 <slaweq> ralonsoh to try a patch to reduce the number of workers in FT
16:17:05 <ralonsoh> slaweq, I've seen that the problems we have now in zuul is lower than 2/3 weeks ago
16:17:17 <ralonsoh> and this patch will slow down the FT execution
16:17:22 <ralonsoh> can we hold this patch?
16:17:38 <slaweq> ralonsoh: so do You think that we should just wait to see how it will be in the future?
16:17:43 <ralonsoh> yes
16:18:07 <slaweq> +1 for that, I also didn't see many of such issues last week
16:18:09 <ralonsoh> reducing the number of workers (from 8 to 7) will reduce a lot the speed of FT execution
16:18:37 <slaweq> ralonsoh: do You know by how much it will slow down the job?
16:18:56 <ralonsoh> almost proportionally to the core reduction
16:19:20 <ralonsoh> in this case, 12.5%
16:20:03 <slaweq> so second question: do You know by how much it may improve stability of tests? :)
16:20:22 <ralonsoh> slaweq, I can't answer this question
16:20:34 <slaweq> ralonsoh: I thought that :)
16:20:52 <slaweq> ok, lets maybe keep it as our last possible thing to do
16:21:08 <slaweq> thx ralonsoh for checking that
16:21:11 <slaweq> next one
16:21:12 <ralonsoh> np!
16:21:12 <slaweq> ralonsoh to report a bug and investigate failed test neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_bw_limit_qos_port_removed
16:21:30 <ralonsoh> that's the previous one
16:22:08 <ralonsoh> nope, my bad
16:22:16 <ralonsoh> no sorry, I didn't have time for this one
16:22:26 <slaweq> ok, no problem
16:22:35 <slaweq> can I assign it to You for next week then?
16:22:44 <slaweq> just to report it at least :)
16:22:46 <ralonsoh> I hope I have time, yes
16:22:50 <slaweq> thx
16:23:03 <slaweq> #action ralonsoh to report a bug about failed test neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_bw_limit_qos_port_removed
16:23:08 <slaweq> thx ralonsoh
16:23:14 <slaweq> ok, that's all from last week
16:23:19 <slaweq> any questions/comments?
16:24:06 <slaweq> ok, so lets move on
16:24:15 <slaweq> #topic Stadium projects
16:24:23 <slaweq> Python 3 migration
16:24:25 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:24:38 <njohnston> I think we covered that really well in the neutron team meeting
16:24:39 <slaweq> we already discussed that on neutron meeting today
16:24:44 <slaweq> right njohnston
16:24:50 <njohnston> slaweq++
16:24:54 <slaweq> :)
16:25:03 <slaweq> so lets move quickly to second part of this topic
16:25:05 <slaweq> tempest-plugins migration
16:25:07 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:25:15 <slaweq> any progress on this?
16:26:19 <njohnston> I'll update the second part of the fwaas change today; escalations got the better of me this week
16:27:03 <slaweq> sure, thx njohnston
16:27:18 <slaweq> I know that tidwellr is also making some progress on neutron-dynamic-routing recently
16:27:40 <bcafarel> I saw some recent updates by tidwellr on https://review.opendev.org/#/c/652099/ (though it's still in zuul checks)
16:27:52 <slaweq> yep, it is
16:27:57 <mlavalle> and I will try to make progress with vpn
16:28:46 <slaweq> so we are covered on this topic and I hope we will be ready with this at the end of T cycle
16:28:54 <bcafarel> not sure if it got already in latest revisions, but we should make sure all these new moved plugins have a config switch to enable/disable them
16:29:06 <bcafarel> (as added for the 3 completed ones for 0.4.0 release)
16:29:15 <slaweq> bcafarel: good point
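On bcafarel's point about a config switch for the moved plugins: the usual pattern is a boolean option registered by the tempest plugin, which the tests check in their skip logic so the suite can be turned off where the service is not deployed. A minimal sketch with oslo.config is below; the group, option and class names are hypothetical and may differ from what neutron-tempest-plugin actually uses:

    from oslo_config import cfg

    # Option group and switch registered by the plugin's config module.
    fwaas_group = cfg.OptGroup(name='fwaas_plugin_options',
                               title='FWaaS tempest plugin options')
    fwaas_opts = [
        cfg.BoolOpt('run_fwaas_tests',
                    default=False,
                    help='Run the FWaaS API and scenario tests.'),
    ]

    def register_opts(conf):
        conf.register_group(fwaas_group)
        conf.register_opts(fwaas_opts, group=fwaas_group)

    # A test class would then skip itself when the switch is off, e.g. in a
    # tempest BaseTestCase subclass:
    #
    #     @classmethod
    #     def skip_checks(cls):
    #         super().skip_checks()
    #         if not CONF.fwaas_plugin_options.run_fwaas_tests:
    #             raise cls.skipException('FWaaS tests are disabled')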
16:30:42 <slaweq> ok, I think we can move on to the next topic then
16:30:44 <slaweq> #topic Grafana
16:30:51 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:32:04 <slaweq> we didn't have many commits in gate queue recently so there is no data in last few days there
16:32:14 <slaweq> but lets look at check queue graphs
16:32:28 <njohnston> interesting that in the last couple of hours there is a spike in failures across multiple charts in the check queue. I wonder if someone is just pushing some really crappy changes.
16:33:25 <haleyb> i was nowhere near the check queue :)
16:33:32 <njohnston> lol
16:34:35 <slaweq> njohnston: IMHO it is just getting back to normal as there was almost nothing running during the weekend
16:34:50 <njohnston> oh, that makes sense
16:35:10 <slaweq> njohnston: but that is only my theory - lets keep an eye on it for next days :)
16:35:18 <njohnston> sounds good
16:35:30 <haleyb> is the midonet co-gating job healthy? it's been 100% failure (non-voting)
16:35:40 <slaweq> haleyb: no, it's not
16:35:56 <njohnston> yeah, I think we mentioned that last week, yamamoto needs to take a look
16:36:15 <mlavalle> I think he also mentioned he doesn't have much time
16:37:32 <slaweq> I will take a look into this job this week and try to check if it's always the same test(s) which are failing or maybe various ones
16:37:37 <slaweq> and will report bug(s) for that
16:37:45 <slaweq> sounds good for You?
16:37:54 <mlavalle> yes
16:37:55 <haleyb> yes, good for me
16:38:15 <slaweq> #action slaweq to check midonet job and report bug(s) related to it
16:38:37 <slaweq> other than that I think we are in pretty good shape recently
16:39:36 <slaweq> any other questions/comments about grafana?
16:40:32 <mlavalle> not from me
16:40:36 <slaweq> ok, so lets move on then
16:40:39 <slaweq> #topic fullstack/functional
16:41:00 <slaweq> I was looking into some recent patches looking for some failures and I found only a few of them
16:41:06 <slaweq> first functional tests
16:41:11 <slaweq> http://logs.openstack.org/12/672612/4/check/neutron-functional/e357646/testr_results.html.gz
16:41:18 <slaweq> failure in neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase
16:41:52 <slaweq> but this looks like issue related to host load, and we talked about it with ralonsoh already
16:41:55 <slaweq> right ralonsoh?
16:42:02 <ralonsoh> I think so
16:42:15 <ralonsoh> yes, that's right
16:42:32 <slaweq> and second issue which I found:
16:42:34 <slaweq> neutron.tests.functional.services.trunk.drivers.openvswitch.agent.test_trunk_manager.TrunkManagerTestCase
16:42:38 <slaweq> http://logs.openstack.org/03/670203/10/check/neutron-functional/80d0831/testr_results.html.gz
16:43:35 <slaweq> looking at logs from this test: http://logs.openstack.org/03/670203/10/check/neutron-functional/80d0831/controller/logs/dsvm-functional-logs/neutron.tests.functional.services.trunk.drivers.openvswitch.agent.test_trunk_manager.TrunkManagerTestCase.test_connectivity.txt.gz
16:43:42 <slaweq> I don't see anything obvious
16:44:47 <slaweq> but as it happened only once, lets just keep an eye on it for now
16:44:50 <slaweq> do You agree?
16:44:54 <njohnston> yes
16:44:54 <ralonsoh> that's curious: the test is claiming that the ping process was not spawned, but it was
16:44:55 <mlavalle> ++
16:46:04 <slaweq> ralonsoh: true
16:46:31 <slaweq> that is strange
16:47:30 <slaweq> if I will have some time, I will take deeper look into this
16:47:47 <slaweq> maybe at least to add some more logs which will help debugging such issues in the future
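On the "add some more logs" idea for the trunk manager failure: one cheap option is to log the command and the pid right where the test helper spawns the ping, so a later "process was not spawned" assertion leaves a trace to compare against. A rough, generic sketch using only the standard library, not the actual neutron test helper:

    import logging
    import shlex
    import subprocess

    LOG = logging.getLogger(__name__)

    def spawn_ping(target, count=3):
        """Start a ping and log enough context to debug 'not spawned' failures."""
        cmd = 'ping -c %d %s' % (count, target)
        LOG.debug('Spawning: %s', cmd)
        proc = subprocess.Popen(shlex.split(cmd),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        # If an assertion later claims the process never started, this line in
        # the test log shows whether it really ran and under which pid.
        LOG.debug('ping spawned with pid %s', proc.pid)
        return proc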
16:48:39 <slaweq> any other issues related to functional tests?
16:48:43 <slaweq> or can we continue?
16:49:45 <slaweq> ok, lets move on
16:49:54 <slaweq> I don't have anything new related to fullstack for today
16:49:58 <slaweq> so next topic
16:50:03 <slaweq> #topic Tempest/Scenario
16:50:37 <slaweq> first of all, my patch to tempest https://review.opendev.org/#/c/672715/ is merged
16:51:00 <slaweq> so I hope it should be much better with SSH failures due to failing to get public-keys now
16:51:08 <njohnston> \o/
16:51:21 <slaweq> if You will see such errors now, let me know - I will investigate again
16:51:38 <mlavalle> Thanks!
16:51:55 <slaweq> this should help for all jobs which inherit from devstack-tempest
16:52:07 <slaweq> so it will not solve problem in e.g. tripleo based jobs
16:52:15 <slaweq> (just saying :))
16:52:30 <slaweq> but in neutron u/s gate we should be much better now I hope
16:52:33 <slaweq> ok
16:52:48 <slaweq> from other things I spotted one new error in API tests:
16:52:55 <slaweq> http://logs.openstack.org/30/670930/3/check/neutron-tempest-plugin-api/5a731da/testr_results.html.gz
16:53:02 <slaweq> it is failure in neutron_tempest_plugin.api.test_port_forwardings.PortForwardingTestJSON
16:53:19 <slaweq> looks like issue in test for me, I will report bug and work on it
16:53:28 <slaweq> ok for You?
16:53:52 <mlavalle> sure
16:54:17 <slaweq> thx
16:54:33 <slaweq> #action slaweq to report and try to fix bug in neutron_tempest_plugin.api.test_port_forwardings.PortForwardingTestJSON
16:54:38 <slaweq> ok
16:54:46 <slaweq> and that's all from my side for today
16:54:59 <slaweq> anything else You want to discuss today?
16:55:04 <mlavalle> not from me
16:55:22 <bcafarel> catching up on the recent activity, so nothing from me either :)
16:55:48 <slaweq> ok, thx for attending
16:55:50 <bcafarel> nice recent findings btw (race condition, memcached in nova, ...)
16:55:54 <slaweq> and have a nice week
16:55:58 <slaweq> thx bcafarel :)
16:56:03 <slaweq> o/
16:56:07 <slaweq> #endmeeting