16:00:16 #startmeeting neutron_ci
16:00:16 Meeting started Tue Jun 18 16:00:16 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:20 The meeting name has been set to 'neutron_ci'
16:00:21 hi (again)
16:00:22 o/
16:00:27 o/
16:00:55 Agenda for the meeting is on https://github.com/openstack/neutron/tree/master/neutron
16:01:04 #undo
16:01:12 Agenda for the meeting is on https://etherpad.openstack.org/p/neutron-ci-meetings
16:01:12 hi
16:01:14 hi
16:01:54 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:09 #topic Actions from previous meetings
16:02:19 we have only one action from last week
16:02:26 mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763) reproducing most common failure: test_connectivity_through_2_routers
16:02:27 bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:02:38 I am currently working on this
16:03:13 we recently had a similar (maybe the same) issue in our D/S CI on a release based on Stein
16:03:15 just 10 minutes ago, I was able to get a successful execution of that test case in my local environment, with a pdb break at the end of the test
16:04:23 sorry, I may have a spotty connection, will try to follow
16:04:24 and it is easy to reproduce a failure, where I will also set a pdb break before it tears down. so now I will be able to run tcpdump and compare iptables to see where the problem is
16:04:34 that's the status at this point
16:05:11 great - I may ping You about this during the week to see if You found something and if we can use it in our d/s :)
16:05:24 yeap, cool
16:05:33 thx a lot for working on it mlavalle :)
16:05:46 #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763) reproducing most common failure: test_connectivity_through_2_routers
16:05:47 bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:05:57 I will assign it for next week just as a reminder, ok?
16:06:03 of course
16:06:05 thx
16:06:20 keep the pressure on that guy
16:06:29 sure :)
16:06:47 anything else You want to add/ask regarding actions from last week?
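
To make the reproduction recipe mlavalle describes above concrete — pausing the scenario test just before teardown so the live routers and namespaces can be inspected with tcpdump and iptables — here is a minimal sketch. The helper names are hypothetical stand-ins for the real test steps; only the placement of the break matters:

```python
# Illustrative only: pause a scenario test right before cleanup runs, so the
# topology stays up for inspection with tcpdump / iptables on the test hosts.
import pdb


def _setup_two_routers_and_vms():
    pass  # hypothetical stand-in for the real topology setup


def _check_connectivity():
    pass  # hypothetical stand-in for the ping/SSH check that sometimes fails


def test_connectivity_through_2_routers():
    _setup_two_routers_and_vms()
    _check_connectivity()
    # Break here, before any cleanup callbacks fire, to inspect live state:
    pdb.set_trace()
```
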
16:07:04 not me
16:07:26 ok, let's move on then
16:07:30 #topic Stadium projects
16:07:39 Python 3 migration
16:07:41 Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:08:33 Train-2 is when I was planning on getting really serious about this, so I am gearing up to work on it later this week
16:08:46 thx njohnston
16:09:03 I also started sending some patches and I will try to continue this in the coming weeks
16:09:47 actually, looking at the etherpad, there are not many projects left to switch
16:10:08 \o/
16:10:24 this week I will push neutron-fwaas patches
16:10:45 #action slaweq to send patch to switch neutron-fwaas to python 3
16:10:56 ^^ this way I will have more pressure to do it :)
16:11:15 yeah, let's also keep pressure on that other lazy guy
16:11:21 LOL
16:11:26 +1
16:11:43 but I think I'm the only lazy guy here :P
16:12:00 ok, let's move on then
16:12:02 tempest-plugins migration
16:12:04 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:13:15 I'll take another look at fwaas later this week as well
16:13:32 here we are good with networking-sfc (thx bcafarel) and almost done with networking-bgpvpn
16:14:03 we still need to do vpnaas, fwaas and networking-dynamic-routing
16:14:11 I have slacked off with vpnaas
16:14:19 hoping to get back to it soon
16:14:49 sure mlavalle - that is not urgent for sure - more like "nice to have" :)
16:15:15 I would rather finish the python 3 transition first if I had to choose
16:16:14 anything else anyone wants to add regarding stadium projects?
16:16:24 not me
16:16:45 I wonder if once we're done we could double-check the migration by adding a DNM change that deletes the python 2.7 version and see how all our jobs do
16:17:26 python 2.7 version of what? of the python binaries on the host?
16:17:31 yes
16:18:01 just to make absolutely sure we have everything
16:18:09 njohnston: yes, that is a good idea and we can also do it for neutron jobs
16:19:06 njohnston: can You add such a note at the top of the etherpad maybe?
16:19:13 so we don't forget about it
16:19:36 sure thing!
16:19:40 thx njohnston
16:19:45 ok, let's move on
16:19:48 next topic
16:19:50 #topic Grafana
16:19:52 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:20:37 during the last couple of days our check queue was broken (the neutron-tempest-plugin-scenario-linuxbridge job) so there is not much in the gate queue graphs
16:21:10 now it's getting back to normal thx to the fix from liuyulong
16:21:37 also the rally job was failing quite often
16:21:49 because of server creation?
16:21:55 yes, and that should now be fixed with https://review.opendev.org/#/c/665614/
16:22:16 and it's going back to low numbers today
16:23:00 cool
16:23:24 from other things, the networking-ovn job is almost switched to voting - njohnston can You update the dashboard when it is merged in neutron?
16:23:35 will do
16:23:45 thx a lot
16:23:49 #action njohnston update dashboard when ovn job becomes voting
16:24:28 and one last note from me about grafana - functional and fullstack jobs are still quite unstable :/
16:25:06 I have a couple of examples ready for the latter part of the meeting :)
16:25:32 anything else strange/worth discussing that You see on grafana?
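
Returning to njohnston's DNM idea above (verify the python 3 migration by removing python 2.7 from the job nodes and watching the jobs): one hypothetical way a DNM patch could assert the node is really clean is a small fail-fast script run after the removal step. Nothing here is an existing neutron job hook — it is just a sketch of the check:

```python
# Hypothetical sanity check for the DNM experiment: fail fast if any
# python 2 interpreter is still reachable on PATH after removal.
import shutil
import sys

leftovers = [b for b in ("python2", "python2.7") if shutil.which(b)]
if leftovers:
    sys.exit("python 2 binaries still present: %s" % ", ".join(leftovers))
print("no python 2 interpreters found; jobs are running on python 3 only")
```
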
16:25:53 not from me
16:26:48 ok, so let's talk about these unstable jobs then
16:26:50 #topic fullstack/functional
16:26:54 first, functional tests
16:27:16 the tests I saw failing most often last week were the neutron.tests.functional.agent.linux.test_ip_lib.IpMonitorTestCase tests, like:
16:27:22 http://logs.openstack.org/36/662236/3/check/neutron-functional/3214fdd/testr_results.html.gz
16:27:32 sorry for that
16:27:33 https://review.opendev.org/#/c/664889/
16:27:35 but that should be fixed with the patch from ralonsoh https://review.opendev.org/#/c/664889/
16:27:46 ralonsoh: You don't need to be sorry :)
16:28:00 thx for the quick fix
16:28:35 other than that, I saw some failures which happened once (at least I didn't find more examples)
16:28:44 e.g. tests from the module neutron.tests.functional.test_server
16:28:50 http://logs.openstack.org/12/640812/2/check/neutron-functional/35dc53f/testr_results.html.gz
16:30:27 looking at the logs from those tests, I see http://logs.openstack.org/12/640812/2/check/neutron-functional/35dc53f/controller/logs/dsvm-functional-logs/neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers.txt.gz#_2019-06-17_20_37_09_085
16:30:43 do You know if it is normal to send SIG_UNBLOCK there?
16:31:35 I don't know
16:31:42 ok, in another test (which passed) it's exactly the same
16:32:45 any ideas about why it could fail?
16:34:19 hard to say
16:34:30 without digging deeper
16:34:39 I know :/
16:34:49 if that happens again, I will open a bug for it
16:35:21 where is the signal coming from?
16:35:46 it's from the test: https://github.com/openstack/neutron/blob/master/neutron/tests/functional/test_server.py#L133
16:35:58 it failed exactly at L159
16:36:57 ahh, it's waiting
16:37:09 yes, 5 seconds
16:37:16 should be enough IMO
16:37:29 yes, looks like plenty
16:37:46 it takes 8 secs to stop the process
16:37:55 from 37:09 to 37:17
16:38:25 ralonsoh: so do You think it could be "just" an overloaded node?
16:38:35 maybe, I'll take a look at this one
16:38:45 I'll open a low priority bug, just to track it
16:38:46 thx ralonsoh
16:38:47 ok?
16:38:53 sure, that is a good idea
16:39:16 next one on my list is: neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
16:39:23 http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/testr_results.html.gz
16:40:20 not only this one, but I've seen this error in other tests
16:40:29 the namespace not created
16:40:38 logs from this one are in:
16:40:40 http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_base_process.txt.gz#_2019-06-17_10_02_30_829
16:42:02 Just a guess, but maybe it is something in pyroute2, I'll review the latest patches
16:42:31 so I will also report it as a bug, to track those issues
16:42:41 ok?
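
A rough sketch of the kind of wait discussed above for test_server (not the exact neutron code — the readiness check is a caller-supplied, hypothetical callable): the test signals the server and then polls within a fixed window, so a node loaded enough to make the stop alone take ~8 seconds would blow past a ~5 second budget:

```python
# Illustrative only: why a fixed 5-second wait can fail on an overloaded node.
import os
import signal
import time


def assert_workers_restart(server_pid, workers_restarted, timeout=5):
    """Signal the server, then poll until its workers come back or we fail.

    'workers_restarted' is a hypothetical readiness check; the real test
    inspects the worker processes directly.
    """
    os.kill(server_pid, signal.SIGHUP)     # ask the server to restart workers
    deadline = time.monotonic() + timeout  # the ~5 s window discussed above
    while time.monotonic() < deadline:
        if workers_restarted():
            return
        time.sleep(0.1)
    raise AssertionError(
        "workers did not restart within %s seconds" % timeout)
```
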
16:42:44 +1
16:43:14 what is strange is that many commands in this namespace were executed earlier: http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/controller/logs/journal_log.txt.gz#_Jun_17_10_01_41
16:44:12 I will open a bug and try to look deeper into this
16:44:35 #action slaweq to open bug related to missing namespace issue in functional tests
16:44:56 #action ralonsoh to open bug related to failed test_server functional tests
16:45:16 and the last one on my list
16:45:18 neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase
16:45:30 http://logs.openstack.org/40/665640/4/check/neutron-functional-python27/e1df845/testr_results.html.gz
16:46:08 in the logs for this test there is not much http://logs.openstack.org/40/665640/4/check/neutron-functional-python27/e1df845/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_protection_dead_reference_removal.txt.gz
16:46:10 :/
16:46:42 and there should be more there
16:48:18 slaweq, maybe (I need to check those test cases better) we should execute them in temp namespaces
16:48:31 slaweq, I don't think we are doing this now
16:48:38 (if it's possible)
16:48:52 ralonsoh: but You mean to run "tox" in a namespace?
16:48:57 no no
16:49:07 like in the ip_lib commands
16:49:15 create everything in a namespace
16:49:24 a temp one, to avoid interference
16:49:31 ahh, ok
16:49:36 it's the best way to isolate a test case
16:49:39 yes, we probably should
16:49:42 (if possible)
16:49:49 ok, I'll take this one
16:49:51 that makes sense
16:50:01 but are You talking about these linuxbridge arp tests? or the previous one?
16:50:11 LB arp
16:50:22 yes, those we should run in namespaces if possible IMO
16:50:59 ralonsoh: so You will take care of it, right?
16:51:00 I'll create a bug for this
16:51:02 yes
16:51:19 great, thx a lot
16:51:31 so that would be all regarding functional tests from me
16:51:43 do You have anything else?
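
To illustrate the isolation ralonsoh proposes above — run each test's networking inside a throwaway namespace so concurrent tests cannot interfere — a minimal sketch using pyroute2 (which neutron already depends on). The real tests would wire this up through a fixture; the naming scheme here is illustrative:

```python
# Minimal per-test namespace isolation sketch using pyroute2 (requires root).
import uuid

from pyroute2 import netns

name = "func-test-" + uuid.uuid4().hex[:11]
netns.create(name)          # throwaway namespace for this test only
try:
    # set up bridges/ports and run the ARP-protection checks inside 'name',
    # e.g. by prefixing shell commands with 'ip netns exec <name>'
    pass
finally:
    netns.remove(name)      # guarantee cleanup even if the test fails
```

A fixture built this way gives each test case its own network stack, which is exactly the "avoid interference" property discussed above.
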
16:51:47 not me
16:52:13 ok, let's quickly go through the fullstack issues
16:52:22 first one (I spotted it at least twice)
16:52:24 neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_north_south_traffic
16:52:29 http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/testr_results.html.gz
16:52:31 http://logs.openstack.org/29/664629/1/check/neutron-fullstack/e99aa1c/testr_results.html.gz
16:53:52 there are a lot of errors in the ovs agent logs: http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?level=ERROR
16:55:36 CRITICAL neutron [req-c0ac0a6c-e65a-46c3-a123-82239299ec08 - - - - -] Unhandled error: RuntimeError: No datapath_id on bridge br-ethaa0e64ff9
16:55:56 and the same errors in the second case http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?level=ERROR
16:56:06 sorry, this is the same link as above
16:56:42 http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?#_2019-06-18_01_53_52_665
16:56:46 http://logs.openstack.org/29/664629/1/check/neutron-fullstack/e99aa1c/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--02-04-44-968982_log.txt.gz?level=ERROR
16:56:56 that is the log from the second example
16:57:00 and the same error
16:57:03 so right before that the bridge was recreated
16:57:37 anyone want to volunteer and check this issue more deeply?
16:57:52 looks like that can be a "real" bug, not only a test issue
16:58:03 *might be
16:58:44 too much on my plate right now. maybe next week
16:58:55 ok, so I will report this bug for now and we will see later
16:59:02 +1
16:59:11 mlavalle: sure, I also will not have time to work on it this week
16:59:34 #action slaweq to report bug regarding failing neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_north_south_traffic tests
16:59:44 ok, and I think we are out of time now
16:59:48 thx for attending
16:59:51 o/
16:59:52 see You next week
16:59:54 o/
16:59:56 o/
16:59:56 #endmeeting
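
On the "No datapath_id on bridge" error discussed just before the close: since the bridge had just been recreated, one plausible direction would be to poll for the datapath_id with a deadline instead of reading it once and raising. This is a hedged sketch, not neutron's actual fix — the helper name and timeout are hypothetical:

```python
# Hypothetical retry helper for the race discussed above: a freshly
# (re)created bridge may not have its datapath_id populated yet, so poll
# rather than failing on the first read.
import subprocess
import time


def wait_for_datapath_id(bridge, timeout=10):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.check_output(
            ["ovs-vsctl", "get", "Bridge", bridge, "datapath_id"]).strip()
        if out != b"[]":                  # '[]' means not populated yet
            return out.strip(b'"')
        time.sleep(0.5)
    raise RuntimeError("No datapath_id on bridge %s" % bridge)
```
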