16:00:04 #startmeeting neutron_ci
16:00:09 Meeting started Tue May 28 16:00:04 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:11 o/
16:00:12 The meeting name has been set to 'neutron_ci'
16:00:15 hi
16:00:19 o/
16:01:19 hi
16:01:29 ok, let's start
16:01:37 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:01:44 please open for later :)
16:01:59 #topic Actions from previous meetings
16:02:09 #undo
16:02:10 Removing item from minutes: #topic Actions from previous meetings
16:02:14 sorry, I forgot
16:02:18 LOL
16:02:21 agenda for today's meeting
16:02:23 #link https://etherpad.openstack.org/p/neutron-ci-meetings
16:02:28 and now let's start
16:02:31 #topic Actions from previous meetings
16:02:42 mlavalle to continue debugging reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:03:00 I didn't devote as much time as I wanted but I made some progress
16:03:58 looking at many patches, one common test case that fails is test_connectivity_through_2_routers
16:04:12 so I filed a bug: https://bugs.launchpad.net/neutron/+bug/1830763
16:04:13 Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:04:19 assigned it to myself
16:04:37 it's a test written by me :/
16:04:46 and added a Kibana query: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22test_connectivity_through_2_routers%5C%22%20AND%20build_status:%5C%22FAILURE%5C%22%20AND%20build_branch:%5C%22master%5C%22%20AND%20build_name:%5C%22neutron-tempest-plugin-dvr-multinode-scenario%5C%22%20AND%20project:%5C%22openstack%2Fneutron%5C%22
16:05:06 so I will be focusing on this one over the next few days
16:05:21 that's all I have to say about this
16:05:40 ok, thx for update mlavalle
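Editor's note: the Kibana query pasted at 16:04:46 is a set of field:"value" terms joined with AND and then percent-encoded. A minimal sketch of how such a link could be assembled follows. The function name build_logstash_url and the constant LOGSTASH_DASHBOARD are illustrative assumptions, not part of any OpenStack tooling; only the base URL and the field values come from the link in the log.

```python
from urllib.parse import quote

# Base dashboard URL, copied from the link pasted in the meeting.
LOGSTASH_DASHBOARD = "http://logstash.openstack.org/#/dashboard/file/logstash.json"


def build_logstash_url(fields):
    """Return a dashboard URL whose query ANDs together field:\\"value\\" terms.

    Hypothetical helper for illustration only.
    """
    # Wrap each value in backslash-escaped double quotes, as in the pasted link.
    terms = [f'{name}:\\"{value}\\"' for name, value in fields.items()]
    query = " AND ".join(terms)
    # Keep ':' literal and percent-encode spaces, quotes, backslashes and '/'.
    return f"{LOGSTASH_DASHBOARD}?query={quote(query, safe=':')}"


url = build_logstash_url(
    {
        "message": "test_connectivity_through_2_routers",
        "build_status": "FAILURE",
        "build_branch": "master",
        "build_name": "neutron-tempest-plugin-dvr-multinode-scenario",
        "project": "openstack/neutron",
    }
)
print(url)
```

Changing the message or build_name entry would give the equivalent query for another test or job, assuming the dashboard accepts the same query syntax.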
16:05:56 do You mind if I will assign it as an action for next week to You?
16:06:08 please do
16:06:19 #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763)
16:06:20 bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:06:21 thx
16:06:35 ok, so next one
16:06:36 mlavalle to talk with nova folks about slow responses for metadata requests
16:06:46 so I decided not to
16:07:09 analyzing in detail some logs and the code
16:07:29 and after conversation with slaweq, we decided that the problem doesn't seem to be on the Nova side
16:07:39 yes
16:07:44 I agree :)
16:09:50 mlavalle: do You want to explain what You found in logs there?
16:10:19 correlating the code with the logs, we found that the time elapsed
16:10:42 between sending the request for keypairs to Nova and getting the response was less than 2 secs
16:11:06 that's it in a nutshell
16:11:44 but from VM PoV there is (probably) more than 10 seconds for this request and that's why it fails
16:11:57 yeap
16:12:32 some time ago I started some patch to add zuul role to fetch journal log: https://review.opendev.org/#/c/643733/
16:12:33 patch 643733 - zuul/zuul-jobs - Add role to fetch journal log from test node - 3 patch sets
16:12:42 but I never had time to work on this
16:13:02 today I respun this patch as it may help with this issue also
16:13:15 because e.g.
haproxy logs are in journal log probably
16:13:46 yeah
16:13:50 good idea
16:13:56 so I will assign this to myself as an action for next week :)
16:14:14 #action slaweq to continue work on fetch-journal-log zuul role
16:14:27 that way I will force myself to spend some time on it :)
16:14:43 ok, let's move forward
16:14:46 next one
16:14:48 slaweq to reopen bug related to failures of neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
16:14:56 Done, bug https://bugs.launchpad.net/neutron/+bug/1798475 reopened
16:14:58 Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] - Assigned to LIU Yulong (dragon889)
16:15:05 I also sent a patch to mark this test as unstable again https://review.opendev.org/#/c/660592/
16:15:05 patch 660592 - neutron - Mark fullstack test_ha_router_restart_agents_no_pa... - 1 patch set
16:15:16 please check this patch if You will have some time :)
16:15:33 and the last one was:
16:15:35 ralonsoh to propose patch with additional logging to help debug https://bugs.launchpad.net/neutron/+bug/1799555
16:15:36 Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:16:08 https://review.opendev.org/#/c/660785/
16:16:09 patch 660785 - neutron - Add debug information to AutoScheduler and BaseSch... - 4 patch sets
16:16:42 thx ralonsoh
16:16:52 I will review it tonight or tomorrow morning
16:17:00 thanks!
16:17:03 it's also in my pile
16:17:16 +1
16:17:45 ok
16:17:49 that's all from last week
16:17:53 questions/comments?
16:18:09 none from me
16:18:46 ok, let's move on then
16:18:48 #topic Stadium projects
16:18:54 Python 3 migration
16:19:02 Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:19:09 njohnston: any update on this?
16:19:51 nothing at present, no.
16:20:43 I think that thanks to tidwellr we have migrated neutron-dynamic-routing to py3
16:20:50 it's done in https://review.opendev.org/#/c/657409/
16:20:51 patch 657409 - neutron-dynamic-routing - Convert CI jobs to python 3 (MERGED) - 8 patch sets
16:21:00 so I will update etherpad
16:21:04 excellent
16:21:54 Great
16:22:17 I will try to pick up one of the projects this week if I will have couple of free minutes :)
16:22:37 ok, next thing related to stadium projects
16:22:39 tempest-plugins migration
16:22:43 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:23:05 for bgpvpn we merged first patch https://review.opendev.org/#/c/652991/
16:23:06 patch 652991 - neutron-tempest-plugin - Rehome tempest tests from networking-bgpvpn repo (MERGED) - 14 patch sets
16:23:17 and I have a question regarding this patch
16:23:45 I unfortunately added neutron-tempest-plugin-bgpvpn-bagpipe job to "neutron-tempest-plugin-jobs" template
16:23:55 thus it's now run on every neutron patch
16:24:01 do we want it like that?
16:24:13 I think that it shouldn't be in this template, right?
16:24:18 I don't think so
16:24:37 yes, I thought that when I realized today that it's running in neutron gate too
16:24:38 I agree, I don't think so
16:24:40 so I will change that
16:24:56 #action slaweq to remove neutron-tempest-plugin-bgpvpn-bagpipe from "neutron-tempest-plugin-jobs" template
16:25:18 apart from that, there is also a second patch for bgpvpn project: https://review.opendev.org/#/c/657793/
16:25:19 patch 657793 - networking-bgpvpn - Rehome tempest tests to neutron-tempest-plugin repo - 1 patch set
16:25:27 please review it if You will have some time
16:25:38 especially mlavalle as You probably have +2 power in this repo :)
16:25:56 I do, indeed
16:26:01 ths
16:26:03 *thx
16:26:22 the tests for the migration of neutron-fwaas tempest tests are still failing, but I am concerned that is because fwaas is broken for other reasons, as slaweq noticed in the work to move the neutron-fwaas-fullstack job to zuulv3 https://review.opendev.org/644526
16:26:23 patch 644526 - neutron-fwaas - Switch neutron-fwaas-fullstack job to zuulv3 syntax - 20 patch sets
16:27:10 njohnston: but what I noticed in my patch is that fullstack job in fwaas repo is broken
16:27:14 njohnston: I intend to send an email to Sridhar and xgerman
16:27:23 all other jobs are working fine there
16:27:30 about their involvement with fwaas
16:27:49 I don't even have a working email address for xgerman since he left Rackspace
16:27:54 after they respond, we can proceed to send a plea for help to the general ML
16:28:09 I have a way to find it
16:28:15 ok cool
16:28:33 mlavalle has got his secret PTL's tools to find it :P
16:28:33 anyway I will keep digging to see if I can find the root cause of the issue at least
16:28:58 * mlavalle wishes that was the case, LOL
16:29:11 :)
16:29:18 i have german's email if you need it
16:29:27 cool
16:29:28 ok, njohnston if You would need any help with these zuul issues, please ping me
16:29:45 ha, so haleyb is this secret PTL's tool :P
16:29:55 :)
16:30:30 ok, so
moving on
16:30:40 there is also networking-sfc project
16:30:46 and first patch is merged https://review.opendev.org/#/c/653012
16:30:47 patch 653012 - neutron-tempest-plugin - Migrate networking-sfc tests to neutron-tempest-pl... (MERGED) - 10 patch sets
16:31:03 bcafarel said that second patch https://review.opendev.org/#/c/653747 is also ready for review
16:31:03 patch 653747 - networking-sfc - Complete move of networking-sfc tempest tests to t... - 21 patch sets
16:31:12 so please add it to Your review list :)
16:31:39 especially people who have +2 in this repo
16:32:32 ok
16:32:43 which I do
16:32:44 any other questions/comments related to stadium projects?
16:33:04 just to say I didn't make much progress with vpnaas this week
16:33:20 I'll try again over the next few days
16:33:29 sure, no rush :)
16:33:53 we don't have any deadline for this
16:34:28 (but You can turn on hulk some day if it will take too long ;D)
16:34:49 * mlavalle shudders just thinking about it
16:35:14 LOL
16:35:23 ok, let's move on
16:35:31 #topic Grafana
16:35:37 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:37:07 I don't see any urgent issues there
16:37:09 not too bad
16:37:15 all looks "as usual"
16:37:21 and indeed not very bad :)
16:37:34 it clearly shows the effect of the long weekend in the US
16:37:46 we still have some failures in tempest jobs and in functional/fullstack jobs but nothing very bad
16:38:04 mostly functional, the way I see it
16:38:10 is that what happened to the unit tests graph in the gate queue - disproportionate impact of job failures due to low volume?
16:39:09 njohnston: I think so - note that the highest failure rate is when it was run only 3 times
16:39:23 makes sense
16:39:27 so it could even be some DNM patch with broken tests :)
16:39:49 let's see in next couple of days how it will be
16:40:53 ok, let's move on then
16:40:58 #topic fullstack/functional
16:41:16 I was looking at some recent failed jobs
16:41:31 and I found fullstack failure on neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
16:41:39 which I already proposed to mark as unstable again
16:41:48 and bug is reopened for this
16:42:39 but also liuyulong|away has a patch which may help for this https://review.opendev.org/#/c/660905/
16:42:40 patch 660905 - neutron - Set neutron-keepalived-state-change proctitle - 1 patch set
16:42:59 so I think that we don't need to talk about this failed test too much here
16:43:20 ok
16:43:22 regarding functional tests I found 2 issues
16:43:29 one in neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers:
16:43:35 http://logs.openstack.org/69/655969/3/check/neutron-functional/2a0533c/testr_results.html.gz
16:45:28 according to logstash it happened twice in last week
16:45:36 so it's not too much for now
16:45:41 yeah
16:45:51 IMO let's keep an eye on it and we will see how it will be
16:45:56 what do You think?
16:46:16 yes
16:46:47 and the second issue, which I found at least twice in last couple of days
16:46:55 neutron.tests.functional.agent.test_l2_ovs_agent.TestOVSAgent.test_assert_br_phys_patch_port_ofports_dont_change
16:47:01 http://logs.openstack.org/87/658787/6/check/neutron-functional-python27/100ec44/testr_results.html.gz
16:47:03 http://logs.openstack.org/05/660905/1/check/neutron-functional/f48d9de/testr_results.html.gz
16:48:21 so, this one is also not happening very often for now
16:48:32 let's also keep an eye on it and we will see how it will be
16:48:34 ok?
16:48:37 cool
16:48:49 ok
16:48:57 other than that I think it's good
16:49:31 regarding tempest/scenario tests the issue which we have is well known (ssh problems) and we already talked about it
16:49:41 yeap
16:49:55 so, I have one last topic for today
16:49:59 #topic Open discussion
16:50:12 I wanted to ask about one thing here
16:50:36 recently I found that we don't have any API/scenario tests for port forwarding in neutron-tempest-plugin repo
16:50:57 only scenario?
16:51:05 so we have some e.g. functional tests of course but we are missing any end-to-end tests
16:51:18 so I started adding such tests in neutron-tempest-plugin repo
16:51:31 but I wanted to ask if we can do something to avoid such things in future
16:51:52 I proposed some small update to reviewers guide https://review.opendev.org/661770
16:51:53 patch 661770 - neutron - Add short info about tempest API/scenario tests to... - 2 patch sets
16:51:58 so please check it
16:52:11 but maybe there is something else that we can do also
16:52:15 what do You think?
16:52:32 Nice!
16:52:35 Thanks!
16:52:35 yeah, earlier today I noticed that it doesn't look like we have any kind of testing for vlan trunking that makes sure it works when instances get migrated
16:52:54 njohnston: yep, so it's second thing :/
16:52:59 it should be a criterion for completion of the feature
16:53:09 slaweq++
16:53:32 njohnston: I agree, that's why I added this note to reviewers guide
16:53:41 but maybe it should be written also somewhere else?
16:53:55 I don't know TBH :)
16:56:09 I think that's enough
16:56:26 ok, thx mlavalle and njohnston for opinions :)
16:56:34 i had one question
16:56:39 and we should all remember about this during reviews
16:56:43 haleyb: sure
16:57:12 i might have actually thought of an answer, but i'm trying to fix one of the OVN periodic jobs
16:57:15 https://review.opendev.org/#/c/661065/
16:57:16 patch 661065 - neutron - Fix OVS build issue on Fedora - 1 patch set
16:57:29 but they only run on the master branch
16:57:48 i was wondering if there was a way to trigger that job on any change
16:58:08 or should i just add it to a test patch in the regular job run
16:58:29 haleyb: You can add it to check queue in zuul config file
16:58:36 and then it will be run on any patch
16:58:55 slaweq: yes, that's what i thought too just a second ago
16:59:04 we never much look at those periodic jobs
16:59:22 it's here https://github.com/openstack/networking-ovn/blob/master/zuul.d/project.yaml
16:59:45 haleyb: for neutron I'm usually looking before ci meeting if they are not failing too much
16:59:55 but except that, never :)
17:00:13 :(
17:00:17 ok, it's time to end meeting
17:00:17 time is up
17:00:21 #endmeeting