16:00:47 <slaweq> #startmeeting neutron_ci
16:00:48 <openstack> Meeting started Tue May 21 16:00:47 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:49 <slaweq> hi
16:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:51 <ralonsoh> hi
16:00:52 <openstack> The meeting name has been set to 'neutron_ci'
16:00:53 <njohnston> o/
16:01:44 <slaweq> let's wait a few more minutes for mlavalle and others
16:02:09 <bcafarel> just passing by to say hi before I leave :)
16:02:17 <slaweq> hi bcafarel :)
16:02:32 <bcafarel> hi and bye!
16:02:50 <njohnston> a tout a l'heure bcafarel
16:02:53 <haleyb> hi
16:02:59 <slaweq> ok, let's start
16:03:02 <slaweq> first of all
16:03:05 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:14 <slaweq> please load it now to have it ready later :)
16:03:42 <slaweq> and one small announcement - I moved the agenda of this meeting to the etherpad: https://etherpad.openstack.org/p/neutron-ci-meetings
16:03:58 <slaweq> so You can take a look at it and add anything You want to it :)
16:04:21 <slaweq> first topic for today
16:04:23 <slaweq> #topic Actions from previous meetings
16:04:27 <slaweq> and first action
16:04:33 <slaweq> mlavalle to continue debugging reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:05:01 <slaweq> I think that mlavalle is not here now
16:05:11 <slaweq> so let's skip to actions assigned to other people
16:05:26 <slaweq> ralonsoh to debug issue with neutron_tempest_plugin.api.admin.test_network_segment_range test
16:06:06 <ralonsoh> slaweq, sorry again but I didn't find anything yet
16:06:19 <slaweq> ok, no problem
16:06:27 <slaweq> I don't think it is a very urgent issue for now
16:06:54 <slaweq> can I assign it to You for next week also?
16:07:12 <ralonsoh> yes but I'll need a bit of help
16:07:17 <ralonsoh> I can't find the problem there
16:07:45 <slaweq> TBH I haven't seen this issue recently
16:08:05 <mlavalle> slaweq: I have to look at an internal issue
16:08:07 <slaweq> so maybe let's just wait until it happens again, then report a proper bug on launchpad and work on it
16:08:16 <slaweq> ralonsoh: how about that?
16:08:19 <ralonsoh> slaweq, perfect
16:08:24 <slaweq> ralonsoh: ok, thx
16:08:29 <slaweq> mlavalle: sure, no problem :)
16:09:59 <slaweq> ok, so let's go back to mlavalle's actions now
16:10:08 <slaweq> mlavalle to continue debugging reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:10:14 <slaweq> any updates on this one?
16:10:20 <mlavalle> not much time doing that
16:10:54 <slaweq> ok, can I assign it to You for next week also?
16:10:59 <mlavalle> yes
16:11:02 <slaweq> #action mlavalle to continue debugging reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:11:03 <slaweq> thx :)
16:11:13 <slaweq> so next one
16:11:14 <slaweq> mlavalle to talk with nova folks about slow responses for metadata requests
16:11:27 <mlavalle> didn't have time, sorry :-)
16:11:31 <mlavalle> ;-(
16:11:38 <slaweq> no problem :)
16:11:52 <slaweq> can I assign it to You for next week then?
16:11:56 <mlavalle> yes
16:12:12 <slaweq> #action mlavalle to talk with nova folks about slow responses for metadata requests
16:12:13 <slaweq> thx
16:12:20 <slaweq> and the last one:
16:12:22 <slaweq> slaweq to fix number of tempest-slow-py3 jobs in grafana
16:12:32 <slaweq> I didn't but haleyb fixed this
16:12:34 <slaweq> thx haleyb :)
16:12:45 <slaweq> any questions/comments?
16:12:50 <haleyb> :)
16:13:12 <haleyb> i don't know what i did though
16:13:47 <slaweq> You fixed the number of jobs in the grafana dashboard's config :)
16:14:11 <slaweq> I don't have a link to the patch now but it was merged for sure already
16:14:41 <haleyb> oh yes, that one
16:14:50 <slaweq> yep
16:14:59 <slaweq> ok, let's move forward then
16:15:02 <slaweq> next topic
16:15:04 <slaweq> #topic Stadium projects
16:15:10 <slaweq> Python 3 migration
16:15:18 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:15:23 <slaweq> any updates on this?
16:16:04 <mlavalle> I have to talk to yamamoto about this, in regards to midonet
16:17:43 <njohnston> did not have a chance this week to work on it; current state of the fwaas tests in neutron-tempest-plugin is that they wait forever and all tests die with timeouts
16:18:08 <slaweq> that's bad :/
16:18:14 <slaweq> so fwaas isn't py3 ready yet?
16:18:32 <njohnston> I think it is py3 ready, it just has massive issues in other areas
16:18:40 <slaweq> ahh ok :)
16:20:12 <slaweq> ok, so I think that we can move forward as there is not a lot of update on py3 migration
16:20:19 <slaweq> tempest-plugins migration
16:20:20 <njohnston> yeah
16:20:25 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:20:47 <njohnston> I realize I gave my tempest plugin update in the py3 section, sorry about that.
16:21:04 <slaweq> here I know that bcafarel made some progress and his first patch is even merged
16:21:10 <slaweq> so he is the first one :)
16:21:14 <slaweq> njohnston: no problem :)
16:21:42 <slaweq> for networking-bgpvpn I have patches ready for review:
16:21:48 <slaweq> Step 1: https://review.openstack.org/652991
16:21:50 <slaweq> Step 2: https://review.opendev.org/#/c/657793/
16:22:42 <slaweq> so I kindly ask for reviews :)
16:22:57 <mlavalle> ok
16:23:00 <slaweq> any other comments/questions on this topic?
16:23:01 <njohnston> I'll take a look
16:23:07 <slaweq> thx guys
16:23:34 <mlavalle> not from me
16:24:25 <slaweq> ok, so let's move on then
16:24:34 <slaweq> #topic Grafana
16:24:51 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate (was already given at the beginning)
16:26:17 <slaweq> there are some gaps from the weekend but except that I think that our CI looks quite good in the last days
16:26:40 <njohnston> agreed
16:26:41 <slaweq> and I also saw many patches merged even without rechecking, which is surprising :)
16:26:59 <mlavalle> yes, it looks good
16:27:12 <njohnston> yep!
16:28:00 <slaweq> even ssh problems happened less often recently but I'm not sure if that is just "by accident" or there was e.g. some change in infra which helped with it somehow maybe
16:28:59 <slaweq> one problem which I still see is fullstack and (a bit less but still) functional tests
16:29:07 <slaweq> all of them are at quite high failure rates
16:29:51 <njohnston> fullstack is high, around 30%
16:29:58 <slaweq> yep
16:30:03 <slaweq> functional was like that last week too
16:30:40 <njohnston> but in grafana, in the check queue, functional seems much lower, between 5 and 11%, so hopefully we fixed something compared to last week
16:30:49 <slaweq> but then we merged https://review.opendev.org/#/c/657849/ from ralonsoh and I think that helped a lot
16:31:53 <slaweq> so let's talk about fullstack tests now
16:31:59 <slaweq> #topic fullstack/functional
16:32:32 <slaweq> I was checking results of fullstack jobs from the last couple of days
16:32:46 <slaweq> and I found a couple of failure examples
16:33:01 <slaweq> one is (again) a problem with neutron.tests.fullstack.test_l3_agent.TestHAL3Agent
16:33:08 <slaweq> like in http://logs.openstack.org/78/653378/7/check/neutron-fullstack-with-uwsgi/d8b47f9/testr_results.html.gz
16:33:18 <slaweq> but I think I saw it more times during the last couple of days
16:34:50 <slaweq> I can find and reopen the bug related to this
16:35:09 <slaweq> but I don't think I will have time to look into this in the next days
16:35:17 <slaweq> so maybe someone else will want to look
16:35:54 <slaweq> #action slaweq to reopen bug related to failures of neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
16:36:12 <slaweq> I will ask liuyulong tomorrow if he can take a look at this once again
16:36:32 <slaweq> from other errors I also saw a failed neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent test
16:36:38 <slaweq> http://logs.openstack.org/87/658787/5/check/neutron-fullstack/7d35c49/testr_results.html.gz
16:36:53 <slaweq> ralonsoh: I know You were looking into such a failure some time ago, right?
16:37:20 <ralonsoh> slaweq, I don't remember this one
16:38:19 <slaweq> ralonsoh: ha, found it: https://bugs.launchpad.net/neutron/+bug/1799555
16:38:20 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:38:34 <ralonsoh> slaweq, yes, I was looking at the patch
16:38:37 <ralonsoh> https://review.opendev.org/#/c/643079/
16:38:59 <ralonsoh> same as before, I didn't find anything relevant to solve the bug
16:39:26 <ralonsoh> I'll take a look at those logs
16:39:37 <slaweq> maybe we should add some additional logging to the master branch
16:39:54 <slaweq> it may help us investigate when the same issue happens again
16:40:01 <slaweq> what do You think about it?
16:40:06 <ralonsoh> slaweq, I'll propose a patch for this
16:40:13 <slaweq> ralonsoh++ thx
16:40:34 <slaweq> #action ralonsoh to propose patch with additional logging to help debug https://bugs.launchpad.net/neutron/+bug/1799555
16:40:35 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:41:29 <slaweq> there was also one failure of the test_min_bw_qos_policy_rule_lifecycle test:
16:41:35 <slaweq> http://logs.openstack.org/46/453346/11/check/neutron-fullstack/7c6c94b/testr_results.html.gz
16:42:26 <slaweq> and there are some errors in the log there http://logs.openstack.org/46/453346/11/check/neutron-fullstack/7c6c94b/controller/logs/dsvm-fullstack-logs/TestMinBwQoSOvs.test_min_bw_qos_policy_rule_lifecycle_egress,openflow-cli_/neutron-openvswitch-agent--2019-05-21--00-19-19-155407_log.txt.gz?level=ERROR
16:42:44 <ralonsoh> slaweq, but I think that was solved
16:42:52 <ralonsoh> slaweq, I'll review it too
16:43:04 <slaweq> results are from 2019-05-21 00:19
16:43:55 <slaweq> but it is quite an old patch so it might be that this is just an old error
16:44:05 <slaweq> it happened on this patch https://review.opendev.org/#/c/453346/
16:44:40 <ralonsoh> yes, I saw this. I think the patch I applied some weeks ago solved this
16:44:57 <slaweq> ok, so we should be good with this one then :)
16:45:02 <slaweq> thx ralonsoh for confirmation
16:45:08 <ralonsoh> there should be no complaint about deleting a non-existing QoS rule
16:45:24 <slaweq> ok, so let's move on to functional tests now
16:45:37 <slaweq> I saw 2 "new" issues there
16:45:48 <slaweq> first is the test_ha_router_failover test failing again
16:45:56 <slaweq> http://logs.openstack.org/61/659861/1/check/neutron-functional/3708673/testr_results.html.gz - I reported it as new bug https://bugs.launchpad.net/neutron/+bug/1829889
16:45:58 <openstack> Launchpad bug 1829889 in neutron "_assert_ipv6_accept_ra method should wait until proper settings will be configured" [Medium,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:46:03 <slaweq> I will take care of this one
16:46:14 <slaweq> I think I already know what the issue is there
16:46:26 <slaweq> and I described it in the bug report
16:46:35 <slaweq> and the second issue which I found is:
16:46:37 <slaweq> neutron.tests.functional.agent.linux.test_bridge_lib.FdbInterfaceTestCase
16:46:42 <slaweq> http://logs.openstack.org/84/647484/9/check/neutron-functional/6709666/testr_results.html.gz
16:46:53 <slaweq> for which njohnston reported a bug https://bugs.launchpad.net/neutron/+bug/1829890
16:46:54 <openstack> Launchpad bug 1829890 in neutron "neutron-functional CI job fails with InterfaceAlreadyExists error" [Undecided,New]
16:47:27 <slaweq> it failed on patch https://review.opendev.org/#/c/647484
16:47:37 <slaweq> and I saw it only on this patch
16:47:50 <slaweq> but it doesn't look like it's related to this patch IMO
16:47:55 <njohnston> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Interface%20interface%20already%20exists%5C%22
16:48:10 <njohnston> and expand out to 7 days
16:48:24 <njohnston> I get 11 failures on 5 different changes
16:48:58 <slaweq> ok, so it is an issue, thx njohnston
16:49:06 <slaweq> is there any volunteer to look into this one?
16:49:30 <mlavalle> I have some catching up to do this week
16:49:46 <mlavalle> but if by next week nobody volunteers, I'll jump in
16:50:01 <slaweq> ok, I will add it to my todo for this week, but I'm not sure if I will have time
16:50:18 <slaweq> so I will not assign it to myself for now, maybe there will be someone else who wants to take it
16:50:22 <njohnston> I'm in the same boat; I have a number of things ahead of it, but if I get a chance I'll jump in
16:51:14 <slaweq> ok
16:51:23 <slaweq> so that's all regarding fullstack/functional jobs
16:51:30 <slaweq> any questions/comments?
16:51:46 <mlavalle> not from me
16:52:01 <slaweq> ok
16:52:16 <slaweq> let's then move on quickly to the next topic
16:52:17 <slaweq> #topic Tempest/Scenario
16:52:40 <slaweq> first of all, I want to mention that we quite often have failures with errors like
16:52:44 <slaweq> "Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'"
16:52:57 <slaweq> it causes errors in devstack deployment and the job is failing
16:53:03 <slaweq> it's not related to neutron directly
16:53:18 <slaweq> and AFAIK infra people are aware of this
16:53:45 <slaweq> and the second thing which I want to mention
16:53:46 <slaweq> I made a patch with a summary of tempest jobs
16:53:54 <slaweq> https://review.opendev.org/#/c/660349/ - please review it,
16:54:26 <slaweq> it's a follow-up to the discussion from Denver
16:54:41 <mlavalle> nice!
16:54:56 <slaweq> when this is merged, my plan is to maybe switch some of those tempest jobs to neutron-tempest-plugin jobs
16:55:31 <slaweq> as IMHO we don't need to test tempest-full-xxx jobs with every possible config like dvr/l3ha/lb/ovs/....
16:55:57 <slaweq> we can IMHO run tempest-full with some default config and then test neutron related things with other configurations
16:56:08 <njohnston> cool idea, I like it
16:56:27 <slaweq> but I want to have this list merged first and then use it as a list of "todo" :)
16:56:39 <mlavalle> ok
16:56:44 <slaweq> there is also a list of grenade jobs in this patch
16:56:55 <slaweq> and speaking about grenade jobs, I sent an email some time ago
16:57:02 <slaweq> http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006146.html
16:57:17 <slaweq> please read it, and tell me what You think about it
16:57:39 <slaweq> I will also ask gmann and other qa folks about their opinion on this
16:58:05 <slaweq> and that's all from my side :)
16:58:15 <slaweq> any questions/comments?
16:58:21 <slaweq> we have about 1 minute left
16:58:31 <mlavalle> not from me
16:58:36 <njohnston> yeah I was waiting for gmann's response to that email
16:58:56 <slaweq> ok, I will ping him also :)
16:59:05 <njohnston> thanks slaweq
16:59:14 <slaweq> so if there is nothing else
16:59:17 <slaweq> thx for attending
16:59:22 <slaweq> and have a nice week :)
16:59:28 <slaweq> #endmeeting