16:00:04 <slaweq> #startmeeting neutron_ci
16:00:09 <openstack> Meeting started Tue May 28 16:00:04 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:11 <njohnston_> o/
16:00:12 <openstack> The meeting name has been set to 'neutron_ci'
16:00:15 <slaweq> hi
16:00:19 <mlavalle> o/
16:01:19 <ralonsoh> hi
16:01:29 <slaweq> ok, let's start
16:01:37 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:01:44 <slaweq> please open for later :)
16:01:59 <slaweq> #topic Actions from previous meetings
16:02:09 <slaweq> #undo
16:02:10 <openstack> Removing item from minutes: #topic Actions from previous meetings
16:02:14 <slaweq> sorry, I forgot
16:02:18 <mlavalle> LOL
16:02:21 <slaweq> agenda for today's meeting
16:02:23 <slaweq> #link https://etherpad.openstack.org/p/neutron-ci-meetings
16:02:28 <slaweq> and now let's start
16:02:31 <slaweq> #topic Actions from previous meetings
16:02:42 <slaweq> mlavalle to continue debugging reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:03:00 <mlavalle> I didn't devote as much time as I wanted but I made some progress
16:03:58 <mlavalle> looking at many patches, one common test case that fails is test_connectivity_through_2_routers
16:04:12 <mlavalle> so I filed a bug: https://bugs.launchpad.net/neutron/+bug/1830763
16:04:13 <openstack> Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:04:19 <mlavalle> assigned it to myself
16:04:37 <slaweq> it's a test written by me :/
16:04:46 <mlavalle> and added a Kibana query: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22test_connectivity_through_2_routers%5C%22%20AND%20build_status:%5C%22FAILURE%5C%22%20AND%20build_branch:%5C%22master%5C%22%20AND%20build_name:%5C%22neutron-tempest-plugin-dvr-multinode-scenario%5C%22%20AND%20project:%5C%22openstack%2Fneutron%5C%22
16:05:06 <mlavalle> so I will be focusing on this one over the next few days
16:05:21 <mlavalle> that's all I have to say about this
16:05:40 <slaweq> ok, thx for the update mlavalle
16:05:56 <slaweq> do You mind if I assign it to You as an action for next week?
16:06:08 <mlavalle> please do
16:06:19 <slaweq> #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763)
16:06:20 <openstack> bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:06:21 <slaweq> thx
16:06:35 <slaweq> ok, so next one
16:06:36 <slaweq> mlavalle to talk with nova folks about slow responses for metadata requests
16:06:46 <mlavalle> so I decided not to
16:07:09 <mlavalle> after analyzing some logs and the code in detail
16:07:29 <mlavalle> and after conversation with slaweq, we decided that the problem doesn't seem to be on the Nova side
16:07:39 <slaweq> yes
16:07:44 <slaweq> I agree :)
16:09:50 <slaweq> mlavalle: do You want to explain what You found in the logs there?
16:10:19 <mlavalle> correlating the code with the logs, we found that the time elapsed
16:10:42 <mlavalle> between sending the request for keypairs to Nova and getting the response was less than 2 secs
16:11:06 <mlavalle> that's it in a nutshell
16:11:44 <slaweq> but from the VM PoV this request (probably) takes more than 10 seconds and that's why it fails
16:11:57 <mlavalle> yeap
16:12:32 <slaweq> some time ago I started a patch to add a zuul role to fetch the journal log: https://review.opendev.org/#/c/643733/
16:12:33 <patchbot> patch 643733 - zuul/zuul-jobs - Add role to fetch journal log from test node - 3 patch sets
16:12:42 <slaweq> but I never had time to work on this
16:13:02 <slaweq> I respun this patch today as it may help with this issue also
16:13:15 <slaweq> because e.g. haproxy logs are probably in the journal log
16:13:46 <mlavalle> yeah
16:13:50 <mlavalle> good idea
16:13:56 <slaweq> so I will assign this to myself as an action for next week :)
16:14:14 <slaweq> #action slaweq to continue work on fetch-journal-log zuul role
16:14:27 <slaweq> that way I will force myself to spend some time on it :)
16:14:43 <slaweq> ok, let's move forward
16:14:46 <slaweq> next one
16:14:48 <slaweq> slaweq to reopen bug related to failures of neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
16:14:56 <slaweq> Done, bug https://bugs.launchpad.net/neutron/+bug/1798475 reopened
16:14:58 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] - Assigned to LIU Yulong (dragon889)
16:15:05 <slaweq> I also sent a patch to mark this test as unstable again https://review.opendev.org/#/c/660592/
16:15:05 <patchbot> patch 660592 - neutron - Mark fullstack test_ha_router_restart_agents_no_pa... - 1 patch set
16:15:16 <slaweq> please check this patch if You have some time :)
16:15:33 <slaweq> and the last one was:
16:15:35 <slaweq> ralonsoh to propose patch with additional logging to help debug https://bugs.launchpad.net/neutron/+bug/1799555
16:15:36 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:16:08 <ralonsoh> https://review.opendev.org/#/c/660785/
16:16:09 <patchbot> patch 660785 - neutron - Add debug information to AutoScheduler and BaseSch... - 4 patch sets
16:16:42 <slaweq> thx ralonsoh
16:16:52 <slaweq> I will review it tonight or tomorrow morning
16:17:00 <ralonsoh> thanks!
16:17:03 <mlavalle> it's also in my pile
16:17:16 <njohnston> +1
16:17:45 <slaweq> ok
16:17:49 <slaweq> that's all from last week
16:17:53 <slaweq> questions/comments?
16:18:09 <mlavalle> none from me
16:18:46 <slaweq> ok, let's move on then
16:18:48 <slaweq> #topic Stadium projects
16:18:54 <slaweq> Python 3 migration
16:19:02 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:19:09 <slaweq> njohnston: any update on this?
16:19:51 <njohnston> nothing at present, no.
16:20:43 <slaweq> I think that thanks to tidwellr we have migrated neutron-dynamic-routing to py3
16:20:50 <slaweq> it's done in https://review.opendev.org/#/c/657409/
16:20:51 <patchbot> patch 657409 - neutron-dynamic-routing - Convert CI jobs to python 3 (MERGED) - 8 patch sets
16:21:00 <slaweq> so I will update the etherpad
16:21:04 <njohnston> excellent
16:21:54 <mlavalle> Great
16:22:17 <slaweq> I will try to pick up one of the projects this week if I have a couple of free minutes :)
16:22:37 <slaweq> ok, next thing related to stadium projects
16:22:39 <slaweq> tempest-plugins migration
16:22:43 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:23:05 <slaweq> for bgpvpn we merged the first patch https://review.opendev.org/#/c/652991/
16:23:06 <patchbot> patch 652991 - neutron-tempest-plugin - Rehome tempest tests from networking-bgpvpn repo (MERGED) - 14 patch sets
16:23:17 <slaweq> and I have a question regarding this patch
16:23:45 <slaweq> I unfortunately added the neutron-tempest-plugin-bgpvpn-bagpipe job to the "neutron-tempest-plugin-jobs" template
16:23:55 <slaweq> thus it's now run on every neutron patch
16:24:01 <slaweq> do we want it like that?
16:24:13 <slaweq> I think that it shouldn't be in this template, right?
16:24:18 <mlavalle> I don't think so
16:24:37 <slaweq> yes, I thought that when I realized today that it's running in the neutron gate too
16:24:38 <njohnston> I agree, I don't think so
16:24:40 <slaweq> so I will change that
16:24:56 <slaweq> #action slaweq to remove neutron-tempest-plugin-bgpvpn-bagpipe from "neutron-tempest-plugin-jobs" template
16:25:18 <slaweq> except that, there is also a second patch for the bgpvpn project: https://review.opendev.org/#/c/657793/
16:25:19 <patchbot> patch 657793 - networking-bgpvpn - Rehome tempest tests to neutron-tempest-plugin repo - 1 patch set
16:25:27 <slaweq> please review it if You have some time
16:25:38 <slaweq> especially mlavalle as You probably have +2 power in this repo :)
16:25:56 <mlavalle> I do, indeed
16:26:01 <slaweq> ths
16:26:03 <slaweq> *thx
16:26:22 <njohnston> the tests for the migration of neutron-fwaas tempest tests are still failing, but I am concerned that is because fwaas is broken for other reasons, as slaweq noticed in the work to move the neutron-fwaas-fullstack job to zuulv3 https://review.opendev.org/644526
16:26:23 <patchbot> patch 644526 - neutron-fwaas - Switch neutron-fwaas-fullstack job to zuulv3 syntax - 20 patch sets
16:27:10 <slaweq> njohnston: but what I noticed in my patch is that the fullstack job in the fwaas repo is broken
16:27:14 <mlavalle> njohnston: I intend to send an email to Sridhar and xgerman
16:27:23 <slaweq> all other jobs are working fine there
16:27:30 <mlavalle> about their involvement with fwaas
16:27:49 <njohnston> I don't even have a working email address for xgerman since he left Rackspace
16:27:54 <mlavalle> after they respond, we can proceed to send a plea for help to the general ML
16:28:09 <mlavalle> I have a way to find it
16:28:15 <njohnston> ok cool
16:28:33 <slaweq> mlavalle has got his secret PTL's tools to find it :P
16:28:33 <njohnston> anyway I will keep digging to see if I can find the root cause of the issue at least
16:28:58 * mlavalle wishes that was the case, LOL
16:29:11 <slaweq> :)
16:29:18 <haleyb> i have german's email if you need it
16:29:27 <mlavalle> cool
16:29:28 <slaweq> ok, njohnston if You need any help with these zuul issues, please ping me
16:29:45 <slaweq> ha, so haleyb is this secret PTL's tool :P
16:29:55 <haleyb> :)
16:30:30 <slaweq> ok, so moving on
16:30:40 <slaweq> there is also the networking-sfc project
16:30:46 <slaweq> and the first patch is merged https://review.opendev.org/#/c/653012
16:30:47 <patchbot> patch 653012 - neutron-tempest-plugin - Migrate networking-sfc tests to neutron-tempest-pl... (MERGED) - 10 patch sets
16:31:03 <slaweq> bcafarel said that the second patch https://review.opendev.org/#/c/653747 is also ready for review
16:31:03 <patchbot> patch 653747 - networking-sfc - Complete move of networking-sfc tempest tests to t... - 21 patch sets
16:31:12 <slaweq> so please add it to Your review list :)
16:31:39 <slaweq> especially people who have +2 in this repo
16:32:32 <slaweq> ok
16:32:43 <mlavalle> which I do
16:32:44 <slaweq> any other questions/comments related to stadium projects?
16:33:04 <mlavalle> just to say I didn't make much progress with vpnaas this week
16:33:20 <mlavalle> I'll try again over the next few days
16:33:29 <slaweq> sure, no rush :)
16:33:53 <slaweq> we don't have any deadline for this
16:34:28 <slaweq> (but You can turn on hulk some day if it takes too long ;D)
16:34:49 * mlavalle shudders just thinking about it
16:35:14 <slaweq> LOL
16:35:23 <slaweq> ok, let's move on
16:35:31 <slaweq> #topic Grafana
16:35:37 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:37:07 <slaweq> I don't see any urgent issues there
16:37:09 <mlavalle> not too bad
16:37:15 <slaweq> all looks "as usual"
16:37:21 <slaweq> and indeed not very bad :)
16:37:34 <mlavalle> it clearly shows the effect of the long weekend in the US
16:37:46 <slaweq> we still have some failures in tempest jobs and in functional/fullstack jobs but nothing very bad
16:38:04 <mlavalle> mostly functional, the way I see it
16:38:10 <njohnston> is that what happened to the unit tests graph in the gate queue - disproportionate impact of job failures due to low volume?
16:39:09 <slaweq> njohnston: I think so - note that the highest failure rate is when it was run only 3 times
16:39:23 <njohnston> makes sense
16:39:27 <slaweq> so it could even be some DNM patch with broken tests :)
16:39:49 <slaweq> let's see how it goes over the next couple of days
16:40:53 <slaweq> ok, let's move on then
16:40:58 <slaweq> #topic fullstack/functional
16:41:16 <slaweq> I was looking at some recent failed jobs
16:41:31 <slaweq> and I found a fullstack failure in neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
16:41:39 <slaweq> which I already proposed to mark as unstable again
16:41:48 <slaweq> and the bug is reopened for this
16:42:39 <slaweq> but liuyulong|away also has a patch which may help with this https://review.opendev.org/#/c/660905/
16:42:40 <patchbot> patch 660905 - neutron - Set neutron-keepalived-state-change proctitle - 1 patch set
16:42:59 <slaweq> so I think that we don't need to talk about this failed test too much here
16:43:20 <mlavalle> ok
16:43:22 <slaweq> regarding functional tests, I found 2 issues
16:43:29 <slaweq> one in neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers:
16:43:35 <slaweq> http://logs.openstack.org/69/655969/3/check/neutron-functional/2a0533c/testr_results.html.gz
16:45:28 <slaweq> according to logstash it happened twice in the last week
16:45:36 <slaweq> so it's not too much for now
16:45:41 <mlavalle> yeah
16:45:51 <slaweq> IMO let's keep an eye on it and we will see how it goes
16:45:56 <slaweq> what do You think?
16:46:16 <mlavalle> yes
16:46:47 <slaweq> and the second issue, which I found at least twice in the last couple of days
16:46:55 <slaweq> neutron.tests.functional.agent.test_l2_ovs_agent.TestOVSAgent.test_assert_br_phys_patch_port_ofports_dont_change
16:47:01 <slaweq> http://logs.openstack.org/87/658787/6/check/neutron-functional-python27/100ec44/testr_results.html.gz
16:47:03 <slaweq> http://logs.openstack.org/05/660905/1/check/neutron-functional/f48d9de/testr_results.html.gz
16:48:21 <slaweq> so, this one is also not happening very often for now
16:48:32 <slaweq> let's also keep an eye on it and we will see how it goes
16:48:34 <slaweq> ok?
16:48:37 <mlavalle> cool
16:48:49 <slaweq> ok
16:48:57 <slaweq> other than that I think it's good
16:49:31 <slaweq> regarding tempest/scenario tests, the issue which we have is well known (ssh problems) and we already talked about it
16:49:41 <mlavalle> yeap
16:49:55 <slaweq> so, I have one last topic for today
16:49:59 <slaweq> #topic Open discussion
16:50:12 <slaweq> I wanted to ask about one thing here
16:50:36 <slaweq> recently I found that we don't have any API/scenario tests for port forwarding in the neutron-tempest-plugin repo
16:50:57 <mlavalle> only scenario?
16:51:05 <slaweq> so we have some functional tests of course, but we are missing any end-to-end tests
16:51:18 <slaweq> so I started adding such tests in the neutron-tempest-plugin repo
16:51:31 <slaweq> but I wanted to ask if we can do something to avoid such things in the future
16:51:52 <slaweq> I proposed a small update to the reviewers guide https://review.opendev.org/661770
16:51:53 <patchbot> patch 661770 - neutron - Add short info about tempest API/scenario tests to... - 2 patch sets
16:51:58 <slaweq> so please check it
16:52:11 <slaweq> but maybe there is something else we can do also
16:52:15 <slaweq> what do You think?
16:52:32 <mlavalle> Nice!
16:52:35 <mlavalle> Thanks!
16:52:35 <njohnston> yeah, earlier today I noticed that it doesn't look like we have any kind of testing for vlan trunking that makes sure it works when instances get migrated
16:52:54 <slaweq> njohnston: yep, so that's a second thing :/
16:52:59 <njohnston> it should be a criterion for completion of the feature
16:53:09 <njohnston> slaweq++
16:53:32 <slaweq> njohnston: I agree, that's why I added this note to the reviewers guide
16:53:41 <slaweq> but maybe it should be written also somewhere else?
16:53:55 <slaweq> I don't know TBH :)
16:56:09 <mlavalle> I think that's enough
16:56:26 <slaweq> ok, thx mlavalle and njohnston for the opinions :)
16:56:34 <haleyb> i had one question
16:56:39 <slaweq> and we should all remember this during reviews
16:56:43 <slaweq> haleyb: sure
16:57:12 <haleyb> i might have actually thought of an answer, but i'm trying to fix one of the OVN periodic jobs
16:57:15 <haleyb> https://review.opendev.org/#/c/661065/
16:57:16 <patchbot> patch 661065 - neutron - Fix OVS build issue on Fedora - 1 patch set
16:57:29 <haleyb> but they only run on the master branch
16:57:48 <haleyb> i was wondering if there was a way to trigger that job on any change
16:58:08 <haleyb> or should i just add it to a test patch in the regular job run
16:58:29 <slaweq> haleyb: You can add it to the check queue in the zuul config file
16:58:36 <slaweq> and then it will be run on any patch
16:58:55 <haleyb> slaweq: yes, that's what i thought too just a second ago
16:59:04 <haleyb> we never look much at those periodic jobs
16:59:22 <slaweq> it's here https://github.com/openstack/networking-ovn/blob/master/zuul.d/project.yaml
16:59:45 <slaweq> haleyb: for neutron I usually look before the CI meeting to check that they are not failing too much
16:59:55 <slaweq> but except for that, never :)
17:00:13 <haleyb> :(
17:00:17 <slaweq> ok, it's time to end the meeting
17:00:17 <haleyb> time is up
17:00:21 <slaweq> #endmeeting
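
To illustrate the check-queue suggestion slaweq gives above: a periodic-only job can also be listed under the check pipeline in the project's zuul.d/project.yaml, and zuul's speculative config means the job then runs on proposed patches as well. The snippet below is only a minimal sketch, not the actual networking-ovn configuration; the job name is a placeholder for whichever periodic job needs to be exercised, and the real file also carries templates and other pipelines.

    - project:
        check:
          jobs:
            # placeholder name - substitute the real periodic job to exercise;
            # listing it here makes zuul run it on every proposed change
            - networking-ovn-tempest-dsvm-ovs-master
        periodic:
          jobs:
            # the job normally lives only here, so it runs on a timer against master
            - networking-ovn-tempest-dsvm-ovs-master

Once debugging is done, the job can be dropped from the check pipeline again; the alternative haleyb mentions, adding it temporarily to a DNM test patch, avoids touching the merged project config at all.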