16:00:15 <ihrachys> #startmeeting neutron_ci
16:00:16 <openstack> Meeting started Tue Nov 21 16:00:15 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:20 <openstack> The meeting name has been set to 'neutron_ci'
16:00:23 <mlavalle> o/
16:00:25 <jlibosva> o/
16:01:00 <ihrachys> hi folks. giving 2 mins for others to join.
16:01:09 <frickler> o/
16:01:46 <haleyb> hi
16:02:47 <ihrachys> #topic Actions from prev meeting
16:03:00 <ihrachys> "ihrachys to pull oslo folks into reviewing rootwrap patch"
16:03:09 <ihrachys> the patch for oslo.rootwrap was merged
16:03:29 <ihrachys> https://review.openstack.org/514547
16:03:35 <ihrachys> and new oslo.rootwrap released
16:03:41 <ihrachys> also upper-constraints updated
16:03:51 <ihrachys> so at this point, gate (master) should have the fix
16:04:23 <ihrachys> if you still see this particular failure (either eventlet error in an agent, or commands receiving output of previous commands), please speak up
16:04:39 <jlibosva> oh, I saw this morning fullstack failure rate about 70%, so maybe that was it :)
16:04:40 <ihrachys> I have backports for the fix for stable: https://review.openstack.org/#/q/Id9d38832c67f2d81d382cda797a48fee943a27f1
16:04:48 <ihrachys> but I wanted to give it some time to prove itself
16:04:59 <ihrachys> jlibosva, yeah it went down somewhat
16:05:14 <ihrachys> at 65% right now
16:05:25 <slaweq> hello, sorry for late
16:05:32 <ihrachys> slaweq, hey!
16:05:47 <mlavalle> slaweq: you in Paris?
16:05:56 <slaweq> yes
16:06:00 <ihrachys> slaweq, fyi the rootwrap issue should be fixed in master. if you see it, then the patch didn't help.
16:06:16 <slaweq> ok, thx for info
16:06:21 <ihrachys> next is "mlavalle to track down "TypeError: None is not str() or unicode()!" error in dhcp agent fullstack tests"
16:06:38 <ihrachys> I believe l3 team was going to look into it
16:06:39 <mlavalle> haleyb proposed this fix https://review.openstack.org/#/c/520710/
16:07:28 <mlavalle> haleyb: is this the right patch?
16:07:41 <mlavalle> yes it is
16:08:16 <ihrachys> didn't know haleyb is mlavalle's sockpuppet account
16:08:23 <ihrachys> sorry, couldn't resist
16:08:53 <ihrachys> so it's WIP, is it because you can't reproduce with new oslo?
16:10:05 <haleyb> sorry, someone rang bell and dog went crazy
16:10:38 <haleyb> ihrachys: right, i don't see it in the logs, but haven't looked in logstash
16:11:15 <ihrachys> haleyb, note stable branches are still affected so you will need to filter them out
16:11:41 <ihrachys> oh actually no, maybe posting the patch to stable will reveal it easier
16:11:50 <ihrachys> because it still doesn't have the packages
16:13:04 <haleyb> ok, i can do that while i'm searching logstash
16:13:14 <ihrachys> ok those were all actions we had
16:13:19 <ihrachys> #topic Grafana
16:13:24 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:13:40 <ihrachys> before we dive into data... why is it that gate coverage dashboard is empty?
16:13:45 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen
16:13:54 <ihrachys> probably a name change?
16:13:58 <ihrachys> for the job
16:14:42 <haleyb> ihrachys: that can happen if it's never failed
16:15:17 <ihrachys> hm ok I see
16:16:20 <ihrachys> yeah the name in project-config seems to be correct
16:16:20 <haleyb> the unit tests in the gate had that "No datapoints" until something in infra blew up
16:16:52 <ihrachys> ok. speaking of data...
16:17:17 <ihrachys> as we already mentioned, fullstack went somewhat down, now at 65%
16:17:32 <ihrachys> still a long way to go but it's the right direction
16:17:37 <ihrachys> we sat at 100% for a while
16:18:03 <ihrachys> scenarios are back at 100%, both flavors
16:18:59 <ihrachys> dvr-ha is at steady 30% and I suspect we don't make progress to make it voting
16:19:28 <jlibosva> regarding scenarios - I checked this morning and only consistently failing tests are now east-west fip tests
16:19:59 <jlibosva> I was tempted to use unstable_test decorator to see the stability of others :)
16:20:12 <haleyb> we have not made any progress on that bug ^^ yet, will have to look at re-assigning
16:20:23 <ihrachys> is it https://bugs.launchpad.net/neutron/+bug/1717302 ?
16:20:23 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed]
16:20:37 <jlibosva> yes
16:20:51 <ihrachys> does it affect both flavors though? this one seems dvr/ha specific
16:21:09 <jlibosva> I checked multinode-dvr only
16:21:40 <jlibosva> or you mean we have non-ha routers there?
16:21:40 <ihrachys> yeah. I picked a random linuxbridge run but it failed with timeout: http://logs.openstack.org/04/492404/19/check/legacy-tempest-dsvm-neutron-scenario-linuxbridge/0decc9a/job-output.txt.gz#_2017-11-21_04_36_12_038942
16:21:44 <ihrachys> so no per test logs
16:22:13 <ihrachys> ok here is a better run: http://logs.openstack.org/83/521683/3/check/legacy-tempest-dsvm-neutron-scenario-linuxbridge/46e952f/logs/testr_results.html.gz
16:23:11 <ihrachys> ah right, linuxbridge is affected by https://bugs.launchpad.net/neutron/+bug/1719711
16:23:11 <openstack> Launchpad bug 1719711 in neutron "iptables failed to apply when binding a port with AGENT.debug_iptables_rules enabled" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
16:23:23 <ihrachys> haleyb, no luck with that one?
16:24:12 <haleyb> ihrachys: no, i have had other priorities, if anyone wants to pick up i can help once we can reproduce it better
16:25:03 <ihrachys> haleyb, it reproduces in gate just fine. do you mean you couldn't reproduce locally?
16:25:57 <haleyb> ihrachys: right, i only saw it locally once
16:26:45 <ihrachys> I see. maybe if you don't have time for it, unassign yourself so that others are aware it's free
16:27:14 <ihrachys> as for floating ip bug, is it still in scope for l3 subteam to figure out the fix?
16:27:19 <ihrachys> or we need someone else too
16:27:36 <mlavalle> it is still in scope
16:28:04 <ihrachys> ok thanks
16:29:09 <ihrachys> let's have a look at fullstack now
16:29:13 <ihrachys> #topic Fullstack
16:29:49 <ihrachys> example failure: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/testr_results.html.gz
16:31:33 <ihrachys> so connectivity failures seem to be because port hasn't transitioned to ACTIVE
16:31:50 <ihrachys> I checked agent logs here: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetwork.test_connectivity_VXLAN,openflow-native_/ and I don't see any clear errors/traces though
16:31:56 <ihrachys> also not in neutron-server
16:32:14 <jlibosva> I also wonder why the test_connectivity test wasn't skipped
16:32:23 <ihrachys> why should it?
16:32:33 <ihrachys> have we merged the decorator already?
16:32:38 <jlibosva> we haven't?
16:32:40 <jlibosva> wait :)
16:32:41 <mlavalle> I think we did
16:33:08 <ihrachys> yeah we did https://review.openstack.org/514660
16:33:38 <jlibosva> I need to investigate whether the decorator works with scenarios then
16:34:19 <ihrachys> #action jlibosva to figure out why unstable_test didn't work for fullstack scenario case
16:34:33 <ihrachys> another possibility is it just doesn't work :)
16:34:43 <ihrachys> we don't have a test for it
16:34:57 <ihrachys> we should have some fake test that raises an Exception
16:35:02 <ihrachys> with the decorator applied
16:35:06 <ihrachys> that would prove it works
16:35:39 <ihrachys> and we can then do same for scenarios
16:35:43 <slaweq> but I'm pretty sure I saw it was working
16:36:30 <slaweq> http://logs.openstack.org/60/514660/4/check/legacy-neutron-dsvm-fullstack/587b7ff/job-output.txt.gz#_2017-11-14_17_25_56_281752
16:36:35 <slaweq> e.g. here
16:37:09 <ihrachys> hm
16:37:20 <ihrachys> could it be that the output is included nevertheless
16:37:55 <ihrachys> no it lists all 3 as failed: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/job-output.txt.gz#_2017-11-20_22_26_56_096106
16:38:15 <jlibosva> no, this one http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/testr_results.html.gz lists 2 as skipped
16:38:19 <jlibosva> just one is failed ...
16:38:33 <frickler> does the decorator not work if the failure is in the class setup instead of the test itself?
16:38:34 <slaweq> in example which I gave it's marked as skipped: http://logs.openstack.org/60/514660/4/check/legacy-neutron-dsvm-fullstack/587b7ff/logs/testr_results.html.gz
16:38:48 <jlibosva> frickler: yes, in class setup it won't work
16:38:57 <jlibosva> yeah :)
16:38:58 <jlibosva> frickler++
16:39:05 <jlibosva> it didn't even build the env
16:39:17 <ihrachys> huh ok good :)
16:40:13 <ihrachys> jlibosva, btw we banned the test case because of https://bugs.launchpad.net/neutron/+bug/1728948 but seems like what we see is not related to this bug
16:40:13 <openstack> Launchpad bug 1728948 in neutron "fullstack: test_connectivity fails due to dhclient crash" [High,New] - Assigned to Jakub Libosvar (libosvar)
16:40:25 <ihrachys> jlibosva, what's the plan? make it work for classes somehow?
16:41:10 <jlibosva> ihrachys: no, I don't think the decorator should work for classes. If env is not built, it's very severe and I don't think we should skip
16:42:19 <ihrachys> ok. we need to report a bug for the failure. anyone dare?
16:42:25 <jlibosva> I can investigate
16:42:36 <ihrachys> at least I don't see it in gate-failure list, maybe it's somewhere
16:42:40 <ihrachys> jlibosva, thanks!
16:43:04 <ihrachys> #action jlibosva to investigate / report a bug for env deployment failure in fullstack because of port down
16:43:50 <ihrachys> the other failure in the logs, test_controller_timeout_does_not_break_connectivity_sigkill, is not in setUp though
16:44:28 <ihrachys> failed in block_until_boot
16:44:45 <jlibosva> could be the dhclient crash?
16:45:17 <jlibosva> ah, no. it's a port status to become active
16:46:16 <ihrachys> yeah. I think we had a bug reported for the test case failure before
16:46:43 <ihrachys> this: https://bugs.launchpad.net/neutron/+bug/1673531
16:46:43 <openstack> Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [Undecided,Fix released]
16:46:50 <ihrachys> I should reopen it
16:47:46 <ihrachys> #action ihrachys to investigate latest https://bugs.launchpad.net/neutron/+bug/1673531 failures
16:47:46 <openstack> Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [High,Confirmed] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:48:13 <ihrachys> and finally, we have test_dscp_marking_packets(openflow-native) failing there
16:48:18 <ihrachys> with: neutron.tests.common.agents.l2_extensions.TcpdumpException: No packets marked with DSCP = 16 received from 10.0.0.8 to 10.0.0.11
16:49:46 <ihrachys> I would need to look into what the test case does but
16:50:01 <ihrachys> only ovs agents are running there, and neutron-server has this:
16:50:01 <ihrachys> http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/dsvm-fullstack-logs/TestDscpMarkingQoSOvs.test_dscp_marking_packets_openflow-native_/neutron-server--2017-11-20--22-12-38-673206.txt.gz?level=TRACE
16:50:58 <slaweq> I can try to look at this one if you want
16:51:07 <ihrachys> by the looks of it, it's as expected that no l3 / dhcp is there
16:51:27 <ihrachys> slaweq, yes please. I think we should start with reporting a bug so that we can capture details there.
16:51:46 <slaweq> ok, I will report bug for it
16:51:46 <jlibosva> could it be that it got notification from ovsdb monitor? :)
16:51:51 <ihrachys> #action slaweq to investigate / report a bug for test_dscp_marking_packets fullstack failure
16:52:02 <ihrachys> jlibosva, what do you mean
16:52:57 <jlibosva> ihrachys: nothing, brain fart. ignore me
16:52:59 <jlibosva> it's server log
16:53:58 <ihrachys> slaweq, irrespective of what actually fails, I think it may also make sense to look if we can suppress those warnings. not having l3 / dhcp is, in theory, a reasonable setup (well, maybe not for dhcp because enable_dhcp is not implemented with a service plugin, but surely you can have a setup without router service plugin)
16:54:41 <ihrachys> #topic Tempest plugin
16:54:49 <slaweq> ok
16:54:57 <ihrachys> as you know, we transition to a new repo for tempest tests
16:55:12 <ihrachys> etherpad tracking patches: https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:55:38 <ihrachys> tl;dr we remove jobs from neutron, move them to new repo, inherit them from neutron repo
16:55:59 <ihrachys> I still haven't heard about what we do for stable branches
16:56:07 <ihrachys> there are some patches attempting removal of legacy jobs
16:56:12 <ihrachys> but they are still to be used for stable
16:56:23 <ihrachys> if you have answer please speak up
16:56:25 <mlavalle> I did some research on that
16:56:44 <mlavalle> Please look at https://docs.openstack.org/infra/manual/zuulv3.html#stable-branches
16:57:25 <mlavalle> The summary is that each stable branch has to have its own .zuul.yaml and playbooks
16:57:33 <ihrachys> ok. so we need to move legacy jobs into stable
16:57:38 <ihrachys> before removing them in infra rpeos
16:57:39 <ihrachys> *repos
16:57:40 <mlavalle> correct
16:57:56 <mlavalle> I also had a conversation with the infra team and they confirmed
16:58:07 <ihrachys> can we in the meantime filter legacy out for master as we did with zuulv2?
16:58:22 <mlavalle> http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-11-21.log.html#t2017-11-21T15:29:19
16:58:35 <mlavalle> I think we can do that
16:58:41 <ihrachys> ok
16:59:03 <ihrachys> final thing I'd like to mention is that apparently there was a breakage in new tempest repo by one of new scenarios
16:59:12 <ihrachys> because create_server changed its signature somewhat
16:59:19 <ihrachys> so the fix is https://review.openstack.org/#/c/521919/ please review
16:59:36 <ihrachys> assuming it helps. if not, we have a revert here: https://review.openstack.org/#/c/521898/
16:59:48 <ihrachys> though I think we can live without a revert for some time because it's not voting anywhere.
17:00:11 <ihrachys> (we don't even have those jobs anywhere right now because we cleaned them up in neutron repo already)
17:00:16 <ihrachys> anyway, we are out of time
17:00:18 <ihrachys> thanks everyone
17:00:20 <ihrachys> #endmeeting