16:00:15 <ihrachys> #startmeeting neutron_ci
16:00:16 <openstack> Meeting started Tue Nov 21 16:00:15 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:20 <openstack> The meeting name has been set to 'neutron_ci'
16:00:23 <mlavalle> o/
16:00:25 <jlibosva> o/
16:01:00 <ihrachys> hi folks. giving 2 mins for others to join.
16:01:09 <frickler> o/
16:01:46 <haleyb> hi
16:02:47 <ihrachys> #topic Actions from prev meeting
16:03:00 <ihrachys> "ihrachys to pull oslo folks into reviewing rootwrap patch"
16:03:09 <ihrachys> the patch for oslo.rootwrap was merged
16:03:29 <ihrachys> https://review.openstack.org/514547
16:03:35 <ihrachys> and new oslo.rootwrap released
16:03:41 <ihrachys> also upper-constraints updated
16:03:51 <ihrachys> so at this point, gate (master) should have the fix
16:04:23 <ihrachys> if you still see this particular failure (either eventlet error in an agent, or commands receiving output of previous commands), please speak up
16:04:39 <jlibosva> oh, I saw this morning fullstack failure rate about 70%, so maybe that was it :)
16:04:40 <ihrachys> I have backports for the fix for stable: https://review.openstack.org/#/q/Id9d38832c67f2d81d382cda797a48fee943a27f1
16:04:48 <ihrachys> but I wanted to give it some time to prove itself
16:04:59 <ihrachys> jlibosva, yeah it went down somewhat
16:05:14 <ihrachys> at 65% right now
16:05:25 <slaweq> hello, sorry for being late
16:05:32 <ihrachys> slaweq, hey!
16:05:47 <mlavalle> slaweq: you in Paris?
16:05:56 <slaweq> yes
16:06:00 <ihrachys> slaweq, fyi the rootwrap issue should be fixed in master. if you see it, then the patch didn't help.
16:06:16 <slaweq> ok, thx for info
16:06:21 <ihrachys> next is "mlavalle to track down "TypeError: None is not str() or unicode()!" error in dhcp agent fullstack tests"
16:06:38 <ihrachys> I believe l3 team was going to look into it
16:06:39 <mlavalle> haleyb proposed this fix https://review.openstack.org/#/c/520710/
16:07:28 <mlavalle> haleyb: is this the right patch?
16:07:41 <mlavalle> yes it is
16:08:16 <ihrachys> didn't know haleyb is mlavalle's sockpuppet account
16:08:23 <ihrachys> sorry, couldn't resist
16:08:53 <ihrachys> so it's WIP, is it because you can't reproduce with new oslo?
16:10:05 <haleyb> sorry, someone rang bell and dog went crazy
16:10:38 <haleyb> ihrachys: right, i don't see it in the logs, but haven't looked in logstash
16:11:15 <ihrachys> haleyb, note stable branches are still affected so you will need to filter them out
16:11:41 <ihrachys> oh actually no, maybe posting the patch to stable will reveal it easier
16:11:50 <ihrachys> because it still doesn't have the packages
16:13:04 <haleyb> ok, i can do that while i'm searching logstash
16:13:14 <ihrachys> ok those were all actions we had
16:13:19 <ihrachys> #topic Grafana
16:13:24 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:13:40 <ihrachys> before we dive into data... why is it that gate coverage dashboard is empty?
16:13:45 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen
16:13:54 <ihrachys> probably a name change?
16:13:58 <ihrachys> for the job
16:14:42 <haleyb> ihrachys: that can happen if it's never failed
16:15:17 <ihrachys> hm ok I see
16:16:20 <ihrachys> yeah the name in project-config seems to be correct
16:16:20 <haleyb> the unit tests in the gate had that "No datapoints" until something in infra blew up
16:16:52 <ihrachys> ok. speaking of data...
16:17:17 <ihrachys> as we already mentioned, fullstack went somewhat down, now at 65%
16:17:32 <ihrachys> still a long way to go but it's the right direction
16:17:37 <ihrachys> we sat at 100% for a while
16:18:03 <ihrachys> scenarios are back at 100%, both flavors
16:18:59 <ihrachys> dvr-ha is at a steady 30% and I suspect we aren't making progress toward making it voting
16:19:28 <jlibosva> regarding scenarios - I checked this morning and the only consistently failing tests are now the east-west fip tests
16:19:59 <jlibosva> I was tempted to use unstable_test decorator to see the stability of others :)
16:20:12 <haleyb> we have not made any progress on that bug ^^ yet, will have to look at re-assigning
16:20:23 <ihrachys> is it https://bugs.launchpad.net/neutron/+bug/1717302 ?
16:20:23 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed]
16:20:37 <jlibosva> yes
16:20:51 <ihrachys> does it affect both flavors though? this one seems dvr/ha specific
16:21:09 <jlibosva> I checked multinode-dvr only
16:21:40 <jlibosva> or you mean we have non-ha routers there?
16:21:40 <ihrachys> yeah. I picked a random linuxbridge run but it failed with timeout: http://logs.openstack.org/04/492404/19/check/legacy-tempest-dsvm-neutron-scenario-linuxbridge/0decc9a/job-output.txt.gz#_2017-11-21_04_36_12_038942
16:21:44 <ihrachys> so no per test logs
16:22:13 <ihrachys> ok here is a better run: http://logs.openstack.org/83/521683/3/check/legacy-tempest-dsvm-neutron-scenario-linuxbridge/46e952f/logs/testr_results.html.gz
16:23:11 <ihrachys> ah right, linuxbridge is affected by https://bugs.launchpad.net/neutron/+bug/1719711
16:23:11 <openstack> Launchpad bug 1719711 in neutron "iptables failed to apply when binding a port with AGENT.debug_iptables_rules enabled" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
16:23:23 <ihrachys> haleyb, no luck with that one?
16:24:12 <haleyb> ihrachys: no, i have had other priorities, if anyone wants to pick up i can help once we can reproduce it better
16:25:03 <ihrachys> haleyb, it reproduces in gate just fine. do you mean you couldn't reproduce locally?
16:25:57 <haleyb> ihrachys: right, i only saw it locally once
16:26:45 <ihrachys> I see. maybe if you don't have time for it, unassign yourself so that others are aware it's free
16:27:14 <ihrachys> as for floating ip bug, is it still in scope for l3 subteam to figure out the fix?
16:27:19 <ihrachys> or we need someone else too
16:27:36 <mlavalle> it is still in scope
16:28:04 <ihrachys> ok thanks
16:29:09 <ihrachys> let's have a look at fullstack now
16:29:13 <ihrachys> #topic Fullstack
16:29:49 <ihrachys> example failure: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/testr_results.html.gz
16:31:33 <ihrachys> so connectivity failures seem to be because port hasn't transitioned to ACTIVE
16:31:50 <ihrachys> I checked agent logs here: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetwork.test_connectivity_VXLAN,openflow-native_/ and I don't see any clear errors/traces though
16:31:56 <ihrachys> also not in neutron-server
16:32:14 <jlibosva> I also wonder why the test_connectivity test wasn't skipped
16:32:23 <ihrachys> why should it?
16:32:33 <ihrachys> have we merged the decorator already?
16:32:38 <jlibosva> we haven't?
16:32:40 <jlibosva> wait :)
16:32:41 <mlavalle> I think we did
16:33:08 <ihrachys> yeah we did https://review.openstack.org/514660
16:33:38 <jlibosva> I need to investigate whether the decorator works with scenarios then
16:34:19 <ihrachys> #action jlibosva to figure out why unstable_test didn't work for fullstack scenario case
16:34:33 <ihrachys> another possibility is it just doesn't work :)
16:34:43 <ihrachys> we don't have a test for it
16:34:57 <ihrachys> we should have some fake test that raises an Exception
16:35:02 <ihrachys> with the decorator applied
16:35:06 <ihrachys> that would prove it works
16:35:39 <ihrachys> and we can then do same for scenarios
16:35:43 <slaweq> but I'm pretty sure I saw it was working
16:36:30 <slaweq> http://logs.openstack.org/60/514660/4/check/legacy-neutron-dsvm-fullstack/587b7ff/job-output.txt.gz#_2017-11-14_17_25_56_281752
16:36:35 <slaweq> e.g. here
16:37:09 <ihrachys> hm
16:37:20 <ihrachys> could it be that the output is included nevertheless
16:37:55 <ihrachys> no it lists all 3 as failed: http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/job-output.txt.gz#_2017-11-20_22_26_56_096106
16:38:15 <jlibosva> no, this one http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/testr_results.html.gz lists 2 as skipped
16:38:19 <jlibosva> just one is failed ...
16:38:33 <frickler> does the decorator not work if the failure is in the class setup instead of the test itself?
16:38:34 <slaweq> in the example I gave it's marked as skipped: http://logs.openstack.org/60/514660/4/check/legacy-neutron-dsvm-fullstack/587b7ff/logs/testr_results.html.gz
16:38:48 <jlibosva> frickler: yes, in class setup it won't work
16:38:57 <jlibosva> yeah :)
16:38:58 <jlibosva> frickler++
16:39:05 <jlibosva> it didn't even build the env
16:39:17 <ihrachys> huh ok good :)
16:40:13 <ihrachys> jlibosva, btw we banned the test case because of https://bugs.launchpad.net/neutron/+bug/1728948 but seems like what we see is not related to this bug
16:40:13 <openstack> Launchpad bug 1728948 in neutron "fullstack: test_connectivity fails due to dhclient crash" [High,New] - Assigned to Jakub Libosvar (libosvar)
16:40:25 <ihrachys> jlibosva, what's the plan? make it work for classes somehow?
16:41:10 <jlibosva> ihrachys: no, I don't think the decorator should work for classes. If the env is not built, that's very severe and I don't think we should skip
16:42:19 <ihrachys> ok. we need to report a bug for the failure. anyone dare?
16:42:25 <jlibosva> I can investigate
16:42:36 <ihrachys> at least I don't see it in gate-failure list, maybe it's somewhere
16:42:40 <ihrachys> jlibosva, thanks!
16:43:04 <ihrachys> #action jlibosva to investigate / report a bug for env deployment failure in fullstack because of port down
16:43:50 <ihrachys> the other failure in the logs, test_controller_timeout_does_not_break_connectivity_sigkill, is not in setUp though
16:44:28 <ihrachys> failed in block_until_boot
16:44:45 <jlibosva> could be the dhclient crash?
16:45:17 <jlibosva> ah, no. it's waiting for the port status to become active
16:46:16 <ihrachys> yeah. I think we had a bug reported for the test case failure before
16:46:43 <ihrachys> this: https://bugs.launchpad.net/neutron/+bug/1673531
16:46:43 <openstack> Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [Undecided,Fix released]
16:46:50 <ihrachys> I should reopen it
16:47:46 <ihrachys> #action ihrachys to investigate latest https://bugs.launchpad.net/neutron/+bug/1673531 failures
16:47:46 <openstack> Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [High,Confirmed] - Assigned to Ihar Hrachyshka (ihar-hrachyshka)
16:48:13 <ihrachys> and finally, we have test_dscp_marking_packets(openflow-native) failing there
16:48:18 <ihrachys> with: neutron.tests.common.agents.l2_extensions.TcpdumpException: No packets marked with DSCP = 16 received from 10.0.0.8 to 10.0.0.11
16:49:46 <ihrachys> I would need to look into what the test case does but
16:50:01 <ihrachys> only ovs agents are running there, and neutron-server has this:
16:50:01 <ihrachys> http://logs.openstack.org/71/520371/7/check/legacy-neutron-dsvm-fullstack/ad585a2/logs/dsvm-fullstack-logs/TestDscpMarkingQoSOvs.test_dscp_marking_packets_openflow-native_/neutron-server--2017-11-20--22-12-38-673206.txt.gz?level=TRACE
16:50:58 <slaweq> I can try to look at this one if you want
16:51:07 <ihrachys> by the looks of it, it's as expected that no l3 / dhcp is there
16:51:27 <ihrachys> slaweq, yes please. I think we should start with reporting a bug so that we can capture details there.
16:51:46 <slaweq> ok, I will report bug for it
16:51:46 <jlibosva> could it be that it got notification from ovsdb monitor? :)
16:51:51 <ihrachys> #action slaweq to investigate / report a bug for test_dscp_marking_packets fullstack failure
16:52:02 <ihrachys> jlibosva, what do you mean
16:52:57 <jlibosva> ihrachys: nothing, brain fart. ignore me
16:52:59 <jlibosva> it's server log
16:53:58 <ihrachys> slaweq, irrespective of what actually fails, I think it may also make sense to look if we can suppress those warnings. not having l3 / dhcp is, in theory, a reasonable setup (well, maybe not for dhcp because enable_dhcp is not implemented with a service plugin, but surely you can have a setup without router service plugin)
16:54:41 <ihrachys> #topic Tempest plugin
16:54:49 <slaweq> ok
16:54:57 <ihrachys> as you know, we are transitioning to a new repo for tempest tests
16:55:12 <ihrachys> etherpad tracking patches: https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:55:38 <ihrachys> tl;dr we remove jobs from neutron, move them to new repo, inherit them from neutron repo
16:55:59 <ihrachys> I still haven't heard about what we do for stable branches
16:56:07 <ihrachys> there are some patches attempting removal of legacy jobs
16:56:12 <ihrachys> but they are still to be used for stable
16:56:23 <ihrachys> if you have answer please speak up
16:56:25 <mlavalle> I did some research on that
16:56:44 <mlavalle> Please look at https://docs.openstack.org/infra/manual/zuulv3.html#stable-branches
16:57:25 <mlavalle> The summary is that each stable branch has to have its own .zuul.yaml and playbooks
16:57:33 <ihrachys> ok. so we need to move legacy jobs into stable
16:57:38 <ihrachys> before removing them in infra rpeos
16:57:39 <ihrachys> *repos
16:57:40 <mlavalle> correct
16:57:56 <mlavalle> I also had a conversation with the infra team and they confirmed
16:58:07 <ihrachys> can we in the meantime filter legacy out for master as we did with zuulv2?
16:58:22 <mlavalle> http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-11-21.log.html#t2017-11-21T15:29:19
16:58:35 <mlavalle> I think we can do that
16:58:41 <ihrachys> ok
16:59:03 <ihrachys> final thing I'd like to mention is that apparently there was a breakage in new tempest repo by one of new scenarios
16:59:12 <ihrachys> because create_server changed its signature somewhat
16:59:19 <ihrachys> so the fix is https://review.openstack.org/#/c/521919/ please review
16:59:36 <ihrachys> assuming it helps. if not, we have a revert here: https://review.openstack.org/#/c/521898/
16:59:48 <ihrachys> though I think we can live without a revert for some time because it's not voting anywhere.
17:00:11 <ihrachys> (we don't even have those jobs anywhere right now because we cleaned them up in neutron repo already)
17:00:16 <ihrachys> anyway, we are out of time
17:00:18 <ihrachys> thanks everyone
17:00:20 <ihrachys> #endmeeting