16:00:16 <slaweq> #startmeeting neutron_ci
16:00:16 <openstack> Meeting started Tue Jun 18 16:00:16 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:20 <openstack> The meeting name has been set to 'neutron_ci'
16:00:21 <slaweq> hi (again)
16:00:22 <mlavalle> o/
16:00:27 <njohnston> o/
16:00:55 <slaweq> Agenda for the meeting is on https://github.com/openstack/neutron/tree/master/neutron
16:01:04 <slaweq> #undo
16:01:12 <slaweq> Agenda for the meeting is on https://etherpad.openstack.org/p/neutron-ci-meetings
16:01:12 <bcafarel> hi
16:01:14 <ralonsoh> hi
16:01:54 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:09 <slaweq> #topic Actions from previous meetings
16:02:19 <slaweq> we have only one action from the previous week
16:02:26 <slaweq> mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763) reproducing most common failure: test_connectivity_through_2_routers
16:02:27 <openstack> bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:02:38 <mlavalle> I am currently working on this
16:03:13 <slaweq> we recently had a similar (maybe the same) issue in our D/S CI on a release based on Stein
16:03:15 <mlavalle> just 10 minutes ago, I was able to get a successful execution of that test case in my local environment, with a pdb break at the end of the test
16:04:23 <bcafarel> sorry, I may have a spotty connection, will try to follow
16:04:24 <mlavalle> and it is easy to reproduce a failure, where I will also set a pdb break before it tears down. so now I will be able to run tcpdump and compare iptables to see where the problem is
16:04:34 <mlavalle> that's the status at this point
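For readers following along: the approach mlavalle describes amounts to pausing the test right before its cleanups run, so the router namespaces and iptables rules stay inspectable on the live environment. A minimal sketch of the idea, with a placeholder connectivity check rather than the actual test body:

    import pdb

    def test_connectivity_through_2_routers(self):
        # ... create the networks, the two routers and the servers ...

        # Placeholder for the real connectivity assertion:
        self._check_east_west_connectivity()  # hypothetical helper

        # Pause before the addCleanup() callbacks tear everything
        # down, so tcpdump and iptables can be inspected while the
        # failing (or passing) topology is still in place:
        pdb.set_trace()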
16:05:11 <slaweq> great - I may ping You about this during this week to see if You found something and if we can use it in our d/s :)
16:05:24 <mlavalle> yeap, cool
16:05:33 <slaweq> thx a lot for working on it mlavalle :)
16:05:46 <slaweq> #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763) reproducing most common failure: test_connectivity_through_2_routers
16:05:47 <openstack> bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:05:57 <slaweq> I will assign it for next week just as a reminder, ok?
16:06:03 <mlavalle> of course
16:06:05 <slaweq> thx
16:06:20 <mlavalle> keep the pressure on that guy
16:06:29 <slaweq> sure :)
16:06:47 <slaweq> anything else You want to add/ask regarding actions from last week?
16:07:04 <mlavalle> not me
16:07:26 <slaweq> ok, let's move on then
16:07:30 <slaweq> #topic Stadium projects
16:07:39 <slaweq> Python 3 migration
16:07:41 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:08:33 <njohnston> Train-2 is when I was planning on getting really serious about this, so I am gearing up to work on it later this week
16:08:46 <slaweq> thx njohnston
16:09:03 <slaweq> I also started sending some patches and I will try to continue this during the next weeks
16:09:47 <slaweq> actually, looking at the etherpad, there are not many projects left to switch
16:10:08 <mlavalle> \o/
16:10:24 <slaweq> this week I will push neutron-fwaas patches
16:10:45 <slaweq> #action slaweq to send patch to switch neutron-fwaas to python 3
16:10:56 <slaweq> ^^ this way I will have more pressure to do it :)
16:11:15 <mlavalle> yeah, let's also keep pressure on that other lazy guy
16:11:21 <slaweq> LOL
16:11:26 <njohnston> +1
16:11:43 <slaweq> but I think I'm the only lazy guy here :P
16:12:00 <slaweq> ok, let's move on then
16:12:02 <slaweq> tempest-plugins migration
16:12:04 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:13:15 <njohnston> I'll take another look at fwaas later this week as well
16:13:32 <slaweq> here we are good with networking-sfc (thx bcafarel) and almost done with networking-bgpvpn
16:14:03 <slaweq> we still need to do vpnaas, fwaas and networking-dynamic-routing
16:14:11 <mlavalle> I have slacked off with vpnaas
16:14:19 <mlavalle> hoping to get back to it soon
16:14:49 <slaweq> sure mlavalle - that is not urgent for sure - more like "nice to have" :)
16:15:15 <slaweq> I would rather finish the python 3 transition first if I had to choose
16:16:14 <slaweq> anything else anyone wants to add regarding stadium projects?
16:16:24 <mlavalle> not me
16:16:45 <njohnston> I wonder if, once we're done, we could double-check the migration by adding a DNM change that deletes the python 2.7 version and see how all our jobs do
16:17:26 <slaweq> python 2.7 version of what? of the python binaries on the host?
16:17:31 <njohnston> yes
16:18:01 <njohnston> just to make absolutely sure we have everything
16:18:09 <slaweq> njohnston: yes, that is a good idea and we can also do it for neutron jobs
16:19:06 <slaweq> njohnston: can You add such a note at the top of the etherpad maybe?
16:19:13 <slaweq> so we don't forget about it
16:19:36 <njohnston> sure thing!
16:19:40 <slaweq> thx njohnston
16:19:45 <slaweq> ok, let's move on
16:19:48 <slaweq> next topic
16:19:50 <slaweq> #topic Grafana
16:19:52 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:20:37 <slaweq> during the last couple of days our check queue was broken (neutron-tempest-plugin-scenario-linuxbridge job), so there is not much in the gate queue graphs
16:21:10 <slaweq> now it's getting back to normal thx to the fix from liuyulong
16:21:37 <slaweq> also the rally job was failing quite often
16:21:49 <mlavalle> because of server creation?
16:21:55 <slaweq> yes, and that should now be fixed with https://review.opendev.org/#/c/665614/
16:22:16 <slaweq> and it's going back to low numbers today
16:23:00 <mlavalle> cool
16:23:24 <slaweq> among other things, the networking-ovn job is almost switched to voting - njohnston, can You update the dashboard when that is merged in neutron?
16:23:35 <njohnston> will do
16:23:45 <slaweq> thx a lot
16:23:49 <njohnston> #action njohnston update dashboard when ovn job becomes voting
16:24:28 <slaweq> and the last note from me about grafana - functional and fullstack jobs are still quite unstable :/
16:25:06 <slaweq> I have a couple of examples ready for the latter part of the meeting :)
16:25:32 <slaweq> do You see anything else strange/worth discussing on grafana?
16:25:53 <mlavalle> not from me
16:26:48 <slaweq> ok, so let's talk about these unstable jobs then
16:26:50 <slaweq> #topic fullstack/functional
16:26:54 <slaweq> first, functional tests
16:27:16 <slaweq> the failures I saw most often last week were in neutron.tests.functional.agent.linux.test_ip_lib.IpMonitorTestCase tests, like:
16:27:22 <slaweq> http://logs.openstack.org/36/662236/3/check/neutron-functional/3214fdd/testr_results.html.gz
16:27:32 <ralonsoh> sorry for that
16:27:33 <ralonsoh> https://review.opendev.org/#/c/664889/
16:27:35 <slaweq> but that should be fixed with the patch from ralonsoh https://review.opendev.org/#/c/664889/
16:27:46 <slaweq> ralonsoh: You don't need to be sorry :)
16:28:00 <slaweq> thx for the quick fix
16:28:35 <slaweq> other than that, I saw some failures which happened once (at least I didn't find more examples)
16:28:44 <slaweq> e.g. tests from the module neutron.tests.functional.test_server
16:28:50 <slaweq> http://logs.openstack.org/12/640812/2/check/neutron-functional/35dc53f/testr_results.html.gz
16:30:27 <slaweq> looking at the logs from those tests, I see http://logs.openstack.org/12/640812/2/check/neutron-functional/35dc53f/controller/logs/dsvm-functional-logs/neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers.txt.gz#_2019-06-17_20_37_09_085
16:30:43 <slaweq> do You know if it is normal to send SIG_UNBLOCK there?
16:31:35 <mlavalle> I don't know
16:31:42 <slaweq> ok, in another test (which passed) it's exactly the same
16:32:45 <slaweq> any ideas about why it could fail?
16:34:19 <mlavalle> hard to say
16:34:30 <mlavalle> without digging deeper
16:34:39 <slaweq> I know :/
16:34:49 <slaweq> if that happens again, I will open a bug for it
16:35:21 <mlavalle> where is the signal coming from?
16:35:46 <slaweq> it's from the test: https://github.com/openstack/neutron/blob/master/neutron/tests/functional/test_server.py#L133
16:35:58 <slaweq> it failed exactly at L159
16:36:57 <mlavalle> ahh, it's waiting
16:37:09 <slaweq> yes, 5 seconds
16:37:16 <slaweq> should be enough IMO
16:37:29 <mlavalle> yes, looks like plenty
16:37:46 <ralonsoh> it takes 8 secs to stop the process
16:37:55 <ralonsoh> from 37:09 to 37:17
16:38:25 <slaweq> ralonsoh: so do You think it could be "just" an overloaded node?
16:38:35 <ralonsoh> maybe, I'll take a look at this one
16:38:45 <ralonsoh> I'll open a low priority bug, just to track it
16:38:46 <slaweq> thx ralonsoh
16:38:47 <ralonsoh> ok?
16:38:53 <slaweq> sure, that is a good idea
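As background on the SIG_UNBLOCK question: unblocking a signal in the current thread before relying on it is normal housekeeping, so that log line by itself is not suspicious; the suspect is the fixed wait afterwards. A rough sketch of the pattern such a restart-on-SIGHUP check follows (illustrative names, not the actual test_server.py code):

    import os
    import signal
    import time

    # Make sure SIGHUP is deliverable in this thread; a call like this
    # is what shows up as SIG_UNBLOCK in the test logs.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGHUP})

    os.kill(server_pid, signal.SIGHUP)  # server_pid: hypothetical

    # The test allows roughly 5 seconds for the workers to restart; on
    # an overloaded CI node (ralonsoh measured ~8s above) that ceiling
    # can be too low even though the restart itself works.
    deadline = time.time() + 5
    while time.time() < deadline:
        if workers_have_restarted():  # hypothetical check
            break
        time.sleep(0.1)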
16:39:16 <slaweq> the next one on my list is: neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
16:39:23 <slaweq> http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/testr_results.html.gz
16:40:20 <ralonsoh> not only this one - I've seen this error in other tests
16:40:29 <ralonsoh> the namespace is not created
16:40:38 <slaweq> logs from this one are in:
16:40:40 <slaweq> http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_base_process.txt.gz#_2019-06-17_10_02_30_829
16:42:02 <ralonsoh> Just a guess, but maybe it's something in pyroute2, I'll review the latest patches
16:42:31 <slaweq> so I will also report it as a bug to track those issues
16:42:41 <slaweq> ok?
16:42:44 <ralonsoh> +1
16:43:14 <slaweq> what is strange is that many commands in this namespace were executed earlier: http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/controller/logs/journal_log.txt.gz#_Jun_17_10_01_41
16:44:12 <slaweq> I will open a bug and try to look deeper into this
16:44:35 <slaweq> #action slaweq to open bug related to missing namespace issue in functional tests
16:44:56 <slaweq> #action ralonsoh to open bug related to failed test_server functional tests
16:45:16 <slaweq> and the last one on my list:
16:45:18 <slaweq> neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase
16:45:30 <slaweq> http://logs.openstack.org/40/665640/4/check/neutron-functional-python27/e1df845/testr_results.html.gz
16:46:08 <slaweq> in the logs for this test there is not much: http://logs.openstack.org/40/665640/4/check/neutron-functional-python27/e1df845/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_protection_dead_reference_removal.txt.gz
16:46:10 <slaweq> :/
16:46:42 <slaweq> and there should be more there
16:48:18 <ralonsoh> slaweq, maybe (I need to check those test cases better) we should execute them in temp namespaces
16:48:31 <ralonsoh> slaweq, I don't think we are doing this now
16:48:38 <ralonsoh> (if it's possible)
16:48:52 <slaweq> ralonsoh: but You mean to run "tox" in a namespace?
16:48:57 <ralonsoh> no no
16:49:07 <ralonsoh> like in the ip_lib commands
16:49:15 <ralonsoh> create everything in a namespace
16:49:24 <ralonsoh> a temp one, to avoid interference
16:49:31 <slaweq> ahh, ok
16:49:36 <ralonsoh> it's the best way to isolate a test case
16:49:39 <slaweq> yes, we probably should
16:49:42 <ralonsoh> (if possible)
16:49:49 <ralonsoh> ok, I'll take this one
16:49:51 <mlavalle> that makes sense
16:50:01 <slaweq> but are You talking about these linuxbridge arp tests? or the previous one?
16:50:11 <ralonsoh> LB arp
16:50:22 <slaweq> yes, those we should run in namespaces if possible IMO
16:50:59 <slaweq> ralonsoh: so You will take care of it, right?
16:51:00 <ralonsoh> I'll create a bug for this
16:51:02 <ralonsoh> yes
16:51:19 <slaweq> great, thx a lot
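To make the temp-namespace idea concrete: neutron's functional test tree already has fixtures for throwaway namespaces, so the isolation could look roughly like this (a sketch assuming net_helpers.NamespaceFixture; the real wiring of the LB arp tests may differ):

    from neutron.tests.common import net_helpers
    from neutron.tests.functional import base


    class LinuxBridgeARPSpoofTestCase(base.BaseSudoTestCase):

        def setUp(self):
            super(LinuxBridgeARPSpoofTestCase, self).setUp()
            # A temporary namespace that is deleted when the test ends,
            # so concurrently running cases cannot interfere with each
            # other's interfaces and ebtables rules.
            self.namespace = self.useFixture(
                net_helpers.NamespaceFixture()).name
            # ... create the test bridge and ports inside self.namespace ...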
16:51:31 <slaweq> so that would be all regarding functional tests from me
16:51:43 <slaweq> do You have anything else?
16:51:47 <mlavalle> not me
16:52:13 <slaweq> ok, let's quickly go through fullstack issues
16:52:22 <slaweq> the first one (I spotted it at least twice):
16:52:24 <slaweq> neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_north_south_traffic
16:52:29 <slaweq> http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/testr_results.html.gz
16:52:31 <slaweq> http://logs.openstack.org/29/664629/1/check/neutron-fullstack/e99aa1c/testr_results.html.gz
16:53:52 <slaweq> there are a lot of errors in the ovs agent logs: http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?level=ERROR
16:55:36 <haleyb> CRITICAL neutron [req-c0ac0a6c-e65a-46c3-a123-82239299ec08 - - - - -] Unhandled error: RuntimeError: No datapath_id on bridge br-ethaa0e64ff9
16:55:56 <slaweq> and the same errors in the second case http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?level=ERROR
16:56:06 <slaweq> sorry, this is the same link as above
16:56:42 <haleyb> http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?#_2019-06-18_01_53_52_665
16:56:46 <slaweq> http://logs.openstack.org/29/664629/1/check/neutron-fullstack/e99aa1c/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--02-04-44-968982_log.txt.gz?level=ERROR
16:56:56 <slaweq> that is the log from the second example
16:57:00 <slaweq> and the same error
16:57:03 <haleyb> so right before that the bridge was recreated
16:57:37 <slaweq> anyone want to volunteer to check this issue more deeply?
16:57:52 <slaweq> looks like that might be a "real" bug, not only a test issue
16:58:44 <mlavalle> too much on my plate right now. maybe next week
16:58:55 <slaweq> ok, so I will report this bug for now and we will see later
16:59:02 <mlavalle> +1
16:59:11 <slaweq> mlavalle: sure, I also won't have time to work on it this week
16:59:34 <slaweq> #action slaweq to report bug regarding failing neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_north_south_traffic tests
16:59:44 <slaweq> ok, and I think we are out of time now
16:59:48 <slaweq> thx for attending
16:59:51 <mlavalle> o/
16:59:52 <slaweq> see You next week
16:59:54 <slaweq> o/
16:59:56 <njohnsto_> o/
16:59:56 <slaweq> #endmeeting