16:00:15 #startmeeting neutron_ci
16:00:16 Meeting started Tue Nov 21 16:00:15 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:20 The meeting name has been set to 'neutron_ci'
16:00:23 o/
16:00:25 o/
16:01:00 hi folks. giving 2 mins for others to join.
16:01:09 o/
16:01:46 hi
16:02:47 #topic Actions from prev meeting
16:03:00 "ihrachys to pull oslo folks into reviewing rootwrap patch"
16:03:09 the patch for oslo.rootwrap was merged
16:03:29 https://review.openstack.org/514547
16:03:35 and new oslo.rootwrap released
16:03:41 also upper-constraints updated
16:03:51 so at this point, gate (master) should have the fix
16:04:23 if you still see this particular failure (either eventlet error in an agent, or commands receiving output of previous commands), please speak up
16:04:39 oh, I saw this morning fullstack failure rate about 70%, so maybe that was it :)
16:04:40 I have backports for the fix for stable: https://review.openstack.org/#/q/Id9d38832c67f2d81d382cda797a48fee943a27f1
16:04:48 but I wanted to give it some time to prove itself
16:04:59 jlibosva, yeah it went down somewhat
16:05:14 at 65% right now
16:05:25 hello, sorry for being late
16:05:32 slaweq, hey!
16:05:47 slaweq: you in Paris?
16:05:56 yes
16:06:00 slaweq, fyi the rootwrap issue should be fixed in master. if you see it, then the patch didn't help.
16:06:16 ok, thx for info
16:06:21 next is "mlavalle to track down "TypeError: None is not str() or unicode()!" error in dhcp agent fullstack tests"
16:06:38 I believe l3 team was going to look into it
16:06:39 haleyb proposed this fix https://review.openstack.org/#/c/520710/
16:07:28 haleyb: is this the right patch?
16:07:41 yes it is
16:08:16 didn't know haleyb is mlavalle's sockpuppet account
16:08:23 sorry, couldn't resist
16:08:53 so it's WIP, is it because you can't reproduce with new oslo?
16:10:05 sorry, someone rang bell and dog went crazy
16:10:38 ihrachys: right, i don't see it in the logs, but haven't looked in logstash
16:11:15 haleyb, note stable branches are still affected so you will need to filter them out
16:11:41 oh actually no, maybe posting the patch to stable will reveal it easier
16:11:50 because it still doesn't have the packages
16:13:04 ok, i can do that while i'm searching logstash
16:13:14 ok those were all actions we had
16:13:19 #topic Grafana
16:13:24 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:13:40 before we dive into data... why is it that gate coverage dashboard is empty?
16:13:45 http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen
16:13:54 probably a name change?
16:13:58 for the job
16:14:42 ihrachys: that can happen if it's never failed
16:15:17 hm ok I see
16:16:20 yeah the name in project-config seems to be correct
16:16:20 the unit tests in the gate had that "No datapoints" until something in infra blew up
16:16:52 ok. speaking of data...
16:17:17 as we already mentioned, fullstack went somewhat down, now at 65%
16:17:32 still a long way to go but it's the right direction
16:17:37 we sat at 100% for a while
16:18:03 scenarios are back at 100%, both flavors
16:18:59 dvr-ha is at steady 30% and I suspect we don't make progress to make it voting
16:19:28 regarding scenarios - I checked this morning and only consistently failing tests are now east-west fip tests
16:19:59 I was tempted to use unstable_test decorator to see the stability of others :)
16:20:12 we have not made any progress on that bug ^^ yet, will have to look at re-assigning
16:20:23 is it https://bugs.launchpad.net/neutron/+bug/1717302 ?
16:20:23 Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed]
16:20:37 yes
16:20:51 does it affect both flavors though? this one seems dvr/ha specific
16:21:09 I checked multinode-dvr only
16:21:40 or you mean we have non-ha routers there?
16:21:40 yeah. I picked a random linuxbridge run but it failed with timeout: http://logs.openstack.org/04/492404/19/check/legacy-tempest-dsvm-neutron-scenario-linuxbridge/0decc9a/job-output.txt.gz#_2017-11-21_04_36_12_038942
16:21:44 so no per test logs
16:22:13 ok here is a better run: http://logs.openstack.org/83/521683/3/check/legacy-tempest-dsvm-neutron-scenario-linuxbridge/46e952f/logs/testr_results.html.gz
16:23:11 ah right, linuxbridge is affected by https://bugs.launchpad.net/neutron/+bug/1719711
16:23:11 Launchpad bug 1719711 in neutron "iptables failed to apply when binding a port with AGENT.debug_iptables_rules enabled" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
16:23:23 haleyb, no luck with that one?
16:24:12 ihrachys: no, i have had other priorities, if anyone wants to pick up i can help once we can reproduce it better
16:25:03 haleyb, it reproduces in gate just fine. do you mean you couldn't reproduce locally?
16:25:57 ihrachys: right, i only saw it locally once
16:26:45 I see. maybe if you don't have time for it, unassign yourself so that others are aware it's free
16:27:14 as for floating ip bug, is it still in scope for l3 subteam to figure out the fix?
16:27:19 or we need someone else too
16:27:36 it is still in scope
16:28:04 ok thanks
16:29:09 let's have a look at fullstack now
16:29:13 #topic Fullstack
16:29:49 example failure: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/testr_results.html.gz
16:31:33 so connectivity failures seem to be because port hasn't transitioned to ACTIVE
16:31:50 I checked agent logs here: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetwork.test_connectivity_VXLAN,openflow-native_/ and I don't see any clear errors/traces though
16:31:56 also not in neutron-server
16:32:14 I also wonder why the test_connectivity test wasn't skipped
16:32:23 why should it?
16:32:33 have we merged the decorator already?
16:32:38 we haven't?
16:32:40 wait :)
16:32:41 I think we did
16:33:08 yeah we did https://review.openstack.org/514660
16:33:38 I need to investigate whether the decorator works with scenarios then
16:34:19 #action jlibosva to figure out why unstable_test didn't work for fullstack scenario case
16:34:33 another possibility is it just doesn't work :)
16:34:43 we don't have a test for it
16:34:57 we should have some fake test that raises an Exception
16:35:02 with the decorator applied
16:35:06 that would prove it works
16:35:39 and we can then do same for scenarios
16:35:43 but I'm pretty sure I saw it was working
16:36:30 http://logs.openstack.org/60/514660/4/check/legacy-neutron-dsvm-fullstack/587b7ff/job-output.txt.gz#_2017-11-14_17_25_56_281752
16:36:35 e.g. here
16:37:09 hm
16:37:20 could it be that the output is included nevertheless
16:37:55 no it lists all 3 as failed: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/job-output.txt.gz#_2017-11-20_22_26_56_096106
16:38:15 no, this one http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/testr_results.html.gz lists 2 as skipped
16:38:19 just one is failed ...
16:38:33 does the decorator not work if the failure is in the class setup instead of the test itself?
16:38:34 in example which I gave it's marked as skipped: http://logs.openstack.org/60/514660/4/check/legacy-neutron-dsvm-fullstack/587b7ff/logs/testr_results.html.gz
16:38:48 frickler: yes, in class setup it won't work
16:38:57 yeah :)
16:38:58 frickler++
16:39:05 it didn't even build the env
16:39:17 huh ok good :)
16:40:13 jlibosva, btw we banned the test case because of https://bugs.launchpad.net/neutron/+bug/1728948 but seems like what we see is not related to this bug
16:40:13 Launchpad bug 1728948 in neutron "fullstack: test_connectivity fails due to dhclient crash" [High,New] - Assigned to Jakub Libosvar (libosvar)
16:40:25 jlibosva, what's the plan? make it work for classes somehow?
16:41:10 ihrachys: no, I don't think the decorator should work for classes. If env is not built, it's very severe and I don't think we should skip
16:42:19 ok. we need to report a bug for the failure. anyone dare?
16:42:25 I can investigate
16:42:36 at least I don't see it in gate-failure list, maybe it's somewhere
16:42:40 jlibosva, thanks!
16:43:04 #action jlibosva to investigate / report a bug for env deployment failure in fullstack because of port down
16:43:50 the other failure in the logs, test_controller_timeout_does_not_break_connectivity_sigkill, is not in setUp though
16:44:28 failed in block_until_boot
16:44:45 could be the dhclient crash?
16:45:17 ah, no. it's a port status to become active
16:46:16 yeah. I think we had a bug reported for the test case failure before
16:46:43 this: https://bugs.launchpad.net/neutron/+bug/1673531
16:46:43 Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [Undecided,Fix released]
16:46:50 I should reopen it
16:47:46 #action ihrachys to investigate latest https://bugs.launchpad.net/neutron/+bug/1673531 failures
16:47:46 Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [High,Confirmed] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:48:13 and finally, we have test_dscp_marking_packets(openflow-native) failing there
16:48:18 with: neutron.tests.common.agents.l2_extensions.TcpdumpException: No packets marked with DSCP = 16 received from 10.0.0.8 to 10.0.0.11
16:49:46 I would need to look into what the test case does but
16:50:01 only ovs agents are running there, and neutron-server has this:
16:50:01 http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/dsvm-fullstack-logs/TestDscpMarkingQoSOvs.test_dscp_marking_packets_openflow-native_/neutron-server--2017-11-20--22-12-38-673206.txt.gz?level=TRACE
16:50:58 I can try to look at this one if you want
16:51:07 by the looks of it, it's as expected that no l3 / dhcp is there
16:51:27 slaweq, yes please. I think we should start with reporting a bug so that we can capture details there.
16:51:46 ok, I will report bug for it
16:51:46 could it be that it got notification from ovsdb monitor? :)
16:51:51 #action slaweq to investigate / report a bug for test_dscp_marking_packets fullstack failure
16:52:02 jlibosva, what do you mean
16:52:57 ihrachys: nothing, brain fart. ignore me
16:52:59 it's server log
16:53:58 slaweq, irrespective of what actually fails, I think it may also make sense to look if we can suppress those warnings. not having l3 / dhcp is, in theory, a reasonable setup (well, maybe not for dhcp because enable_dhcp is not implemented with a service plugin, but surely you can have a setup without router service plugin)
16:54:41 #topic Tempest plugin
16:54:49 ok
16:54:57 as you know, we transition to a new repo for tempest tests
16:55:12 etherpad tracking patches: https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:55:38 tl;dr we remove jobs from neutron, move them to new repo, inherit them from neutron repo
16:55:59 I still haven't heard about what we do for stable branches
16:56:07 there are some patches attempting removal of legacy jobs
16:56:12 but they are still to be used for stable
16:56:23 if you have answer please speak up
16:56:25 I did some research on that
16:56:44 Please look at https://docs.openstack.org/infra/manual/zuulv3.html#stable-branches
16:57:25 The summary is that each stable branch has to have its own .zuul.yaml and playbooks
16:57:33 ok. so we need to move legacy jobs into stable
16:57:38 before removing them in infra rpeos
16:57:39 *repos
16:57:40 correct
16:57:56 I also had a conversation with the infra team and they confirmed
16:58:07 can we in the meantime filter legacy out for master as we did with zuulv2?
16:58:22 http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-11-21.log.html#t2017-11-21T15:29:19
16:58:35 I think we can do that
16:58:41 ok
16:59:03 final thing I'd like to mention is that apparently there was a breakage in new tempest repo by one of new scenarios
16:59:12 because create_server changed its signature somewhat
16:59:19 so the fix is https://review.openstack.org/#/c/521919/ please review
16:59:36 assuming it helps. if not, we have a revert here: https://review.openstack.org/#/c/521898/
16:59:48 though I think we can live without a revert for some time because it's not voting anywhere.
17:00:11 (we don't even have those jobs anywhere right now because we cleaned them up in neutron repo already)
17:00:16 anyway, we are out of time
17:00:18 thanks everyone
17:00:20 #endmeeting