16:00:44 <slaweq> #startmeeting neutron_ci
16:00:44 <openstack> Meeting started Tue Apr 24 16:00:44 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <slaweq> hi
16:00:47 <openstack> The meeting name has been set to 'neutron_ci'
16:00:48 <mlavalle> o/
16:01:03 <mlavalle> slaweq: long time no see ;-)
16:01:12 <slaweq> yeah :)
16:01:46 <jlibosva> o/
16:01:50 <haleyb> hi
16:02:20 <slaweq> ok, so let's start
16:02:36 <openstack> slaweq: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
16:02:43 <slaweq> sorry
16:02:47 <slaweq> wrong copy paste :)
16:02:48 <slaweq> #topic Actions from previous meetings
16:02:57 <njohnston> o/
16:03:15 <mlavalle> Nice to see you in this meeting also Nate
16:03:19 <slaweq> hi njohnston - long time :)
16:03:29 <jlibosva> I greet you njohnston :)
16:03:34 <slaweq> so first action was:
16:03:34 <slaweq> * slaweq will check old gate-failure bugs
16:04:14 <slaweq> I went through the list of bugs and sent a summary email during the weekend: http://lists.openstack.org/pipermail/openstack-dev/2018-April/129625.html
16:04:47 <slaweq> basically I set some bugs as incomplete
16:04:49 <mlavalle> that was a very nice thing to do
16:04:55 <mlavalle> good initiative
16:04:58 <slaweq> thx
16:05:21 <slaweq> please check this list - maybe some bugs which I wasn't sure about can be closed too
16:05:43 <slaweq> some of them are really old :)
16:06:00 <slaweq> ok, next one was: * jlibosva take a look on failing ovsfw blink functional test
16:06:06 <jlibosva> I did
16:06:15 <jlibosva> the fix is about to get merged: https://review.openstack.org/#/c/562220/
16:07:04 <slaweq> it's already +W so almost done :)
16:07:27 <slaweq> thx jlibosva
16:07:35 <slaweq> ok, next one: * slaweq will check failed SG fullstack test
16:07:49 <slaweq> I didn't have time to look at it yet - sorry
16:08:01 <slaweq> I will do it this week
16:08:07 <slaweq> #action slaweq will check failed SG fullstack test
16:08:29 <slaweq> and the last one was: ihar will check if job outputs are indexed in logstash
16:08:41 <slaweq> but I think ihar is not here now
16:08:48 <slaweq> do You know if he did something with that?
16:08:50 <jlibosva> he did and he showed me a patch that should fix it
16:08:54 <jlibosva> sec
16:09:25 <mlavalle> yeah, I remember him mentioning something in the Neutron channel
16:09:44 <jlibosva> this https://review.openstack.org/#/c/562042/
16:10:52 <jlibosva> the py35 job is failing with http://logs.openstack.org/42/562042/2/check/tox-py35-on-zuul/1288651/job-output.txt.gz#_2018-04-23_17_05_56_544588
16:11:13 <jlibosva> clarkb: hi, we're looking at your patch ^^ is this py35 failure legitimate or some intermittent issue?
16:12:09 <jlibosva> seems like clarkb is not here :) slaweq I think we can move on, we have some eyes on it :)
16:12:23 <mlavalle> he'll show up
16:12:27 <clarkb> ya I'm lurking
16:12:31 <slaweq> jlibosva: ok, thx for update about it
16:12:32 <mlavalle> see
16:12:34 <jlibosva> \o/
16:12:36 <slaweq> hi clarkb :)
16:12:52 <mlavalle> uncanny ability to know where he is needed
16:13:19 <clarkb> oh that one. The problem was actually elsewhere (I think I should still fix the consistency problem though)
16:13:29 <clarkb> I believe we are indexing neutron jobs properly for a few days now /me digs up the actual fix
16:13:39 <clarkb> would be good if you can double check though
16:14:00 <clarkb> https://review.openstack.org/#/c/562070/ was the fix
16:14:55 <jlibosva> I'll check, thanks clarkb :)
16:16:33 <slaweq> clarkb: from a quick check it looks like it's working fine
16:16:37 <slaweq> thx
16:17:17 <slaweq> ok, that was all actions from previous week
16:17:19 <slaweq> #topic Grafana
16:17:24 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:19:24 <slaweq> looking at the last 7 days, there was one spike on 22.04 where many jobs were failing at 100%
16:19:56 <slaweq> but as all jobs dropped back down at the same time too, I suppose it was something not related to neutron directly :)
16:20:16 <slaweq> there is a problem with one periodic job but I will talk about it later
16:20:35 <slaweq> and also we had quite a high failure rate for functional tests last week
16:20:35 <njohnston> and the 100% peak happens right before a one-hour discontinuity so I bet there was a problem elsewhere
16:20:51 <slaweq> njohnston++
16:21:51 <slaweq> for functional tests, the few failures I checked were caused by the problem with the blink firewall test which jlibosva already fixed :)
16:22:08 <slaweq> so that should be good now
16:22:35 <slaweq> do You see anything else worth mentioning there? :)
16:23:58 <jlibosva> the tempest-plugin-dvr-multinode is at 36%, is it all because of the trunk tests failing occasionally?
16:24:44 <slaweq> jlibosva: yes, I have that as one of the next topics :)
16:24:48 <jlibosva> ok
16:24:50 <jlibosva> sorry :)
16:24:55 <slaweq> no problem :)
16:25:07 <mlavalle> jumping the gun, jlibosva ;-)
16:25:07 <slaweq> so let's talk about scenario jobs now
16:25:15 <slaweq> #topic Scenarios
16:25:42 <slaweq> as jlibosva pointed out, neutron-tempest-plugin-dvr-multinode-scenario is still at around 40% of failures.
16:25:57 <slaweq> and I checked failures from about the last 2 days
16:26:04 <slaweq> I found 2 culprits:
16:26:13 <slaweq> * neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle - happens more often, like:
16:26:25 <slaweq> * http://logs.openstack.org/03/560703/7/check/neutron-tempest-plugin-dvr-multinode-scenario/1f67afd/logs/testr_results.html.gz
16:26:25 <slaweq> * http://logs.openstack.org/17/553617/19/check/neutron-tempest-plugin-dvr-multinode-scenario/a13a6fd/logs/testr_results.html.gz
16:26:25 <slaweq> * http://logs.openstack.org/84/533284/5/check/neutron-tempest-plugin-dvr-multinode-scenario/1c09aa6/logs/testr_results.html.gz
16:26:51 <slaweq> and second, neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity - which happens less frequently, like:
16:26:56 <slaweq> * http://logs.openstack.org/90/545490/9/check/neutron-tempest-plugin-dvr-multinode-scenario/c1ed535/logs/testr_results.html.gz
16:27:16 <slaweq> I didn't find any other failures of this job in the last few days
16:27:31 <slaweq> jlibosva: is this related to what You were debugging already?
16:27:38 <mlavalle> so yeah, mostly TrunkTest
16:27:41 <jlibosva> slaweq: yes but I didn't get far yet
16:27:51 <jlibosva> slaweq: I mostly tried to get a reproducer
16:29:14 <slaweq> jlibosva: I can help and take a look at the logs if You want
16:29:28 <slaweq> maybe we will find something
16:29:40 <jlibosva> slaweq: I welcome any help I can get :)
16:29:42 <slaweq> is there any bug related to it reported already?
16:29:44 <jlibosva> :)
16:30:17 <jlibosva> I don't think there is
16:30:33 <slaweq> so I will report it today
16:30:35 <slaweq> fine?
16:30:38 <jlibosva> ok
16:31:00 <slaweq> #action slaweq will report bug about failing trunk tests in dvr multinode scenario
16:31:16 <slaweq> maybe we should also mark those tests as unstable until we fix it?
16:31:24 <slaweq> what do You think about that?
16:31:35 <jlibosva> yep, makes sense
16:31:46 <jlibosva> we know the failure rate of it now so let's disable it
16:32:03 <slaweq> mlavalle: ok with that?
16:32:21 <mlavalle> yeah, that's ok
16:32:25 <slaweq> ok
16:32:42 <slaweq> jlibosva: will You do it or should I, after I create the bug report?
16:32:59 <jlibosva> slaweq: I can do it if you want
16:33:04 <slaweq> sure, thx
16:33:26 <slaweq> #action jlibosva will mark trunk scenario tests as unstable for now
16:33:27 <slaweq> thx
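A note on "mark those tests as unstable": in practice this means decorating the flaky tests so that a failure is reported as a skip instead of an error while the bug stays open. Below is a minimal sketch of such a decorator in plain Python; it only illustrates the idea and is not the actual helper used in the neutron tree, and all names in it are illustrative.

```python
import functools
import unittest


def unstable_test(bug_reference):
    """Report failures of a known-flaky test as skips instead of errors.

    The test still runs, so its result stays visible in the job logs,
    but a failure no longer breaks the gate while the bug is open.
    """
    def decorator(test_method):
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            try:
                return test_method(self, *args, **kwargs)
            except unittest.SkipTest:
                raise
            except Exception as exc:
                raise unittest.SkipTest(
                    "Test marked as unstable, %s: %s" % (bug_reference, exc))
        return wrapper
    return decorator


class TrunkTestExample(unittest.TestCase):

    @unstable_test("bug report for the trunk subport lifecycle failures")
    def test_trunk_subport_lifecycle(self):
        # The real scenario test body would go here; any failure it raises
        # is converted into a skip by the decorator above.
        self.assertTrue(True)
```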
16:33:52 <slaweq> speaking about other scenario jobs, IMO all is quite fine
16:34:02 <slaweq> no urgent problems that I'm aware of :)
16:34:29 <mlavalle> cool
16:35:06 <slaweq> so I think we can move on to the next topic
16:35:11 <slaweq> #topic Fullstack
16:35:43 <slaweq> I know that there is still this issue with SG tests sometimes and I will check it
16:36:10 <slaweq> but it's not a very common failure and fullstack tests are mostly fine
16:36:25 <slaweq> in the gate queue fullstack is even at 0% most of the time :)
16:36:38 <jlibosva> that's so great :)
16:36:40 <slaweq> I wanted to mention 2 things related to fullstack also:
16:36:46 <jlibosva> really great job slaweq for making it work!
16:36:54 <slaweq> jlibosva: it wasn't only me :)
16:37:06 <njohnston> +100
16:37:06 <slaweq> great job by the team
16:37:43 <slaweq> so I wanted to mention that jlibosva did a patch with a new firewall test: https://review.openstack.org/#/c/563159/ which waits for reviews :)
16:38:04 <njohnston> checking it out
16:38:32 <slaweq> and second thing, I reported "bug" https://bugs.launchpad.net/neutron/+bug/1765519 to add fullstack tests which will cover API operations on shared networks
16:38:33 <openstack> Launchpad bug 1765519 in neutron "Add fullstack tests for shared networks API" [Wishlist,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:38:49 <slaweq> we had such tests in tempest but we had to remove them from there
16:38:59 <slaweq> so now there are only UT for it
16:39:22 <slaweq> do You think it is reasonable to add such fullstack tests? or is it a waste of time maybe?
16:40:04 <jlibosva> I think it's worth it to prevent regressions
16:40:13 <mlavalle> I think so
16:40:16 <jlibosva> since we have fullstack voting, it's a legitimate testing framework now
16:40:40 <slaweq> ok, if I have 2 votes for it, I will do it :) Thx
16:41:07 <slaweq> do You have anything to add regarding fullstack?
16:41:50 <jlibosva> nope
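The coverage discussed for bug 1765519 boils down to: one project creates a network with shared=True and a different project must be able to see and use it. A rough sketch of that check follows, written against python-neutronclient with placeholder credentials and endpoint; it is not the neutron fullstack framework itself, where the clients would come from the test environment.

```python
import unittest

from keystoneauth1.identity import v3
from keystoneauth1 import session
from neutronclient.v2_0 import client as neutron_client


def make_client(username, password, project_name,
                auth_url='http://127.0.0.1/identity/v3'):
    # Placeholder credentials and endpoint; a real fullstack test would get
    # its clients from the test environment instead of keystone directly.
    auth = v3.Password(auth_url=auth_url, username=username,
                       password=password, project_name=project_name,
                       user_domain_id='default', project_domain_id='default')
    return neutron_client.Client(session=session.Session(auth=auth))


class SharedNetworkAPITest(unittest.TestCase):

    def test_shared_network_visible_to_other_project(self):
        admin = make_client('admin', 'secret', 'admin')
        demo = make_client('demo', 'secret', 'demo')

        # One project creates a network marked as shared.
        net = admin.create_network(
            {'network': {'name': 'shared-net', 'shared': True}})['network']
        self.addCleanup(admin.delete_network, net['id'])

        # A different project must be able to list it.
        visible_ids = [n['id'] for n in demo.list_networks()['networks']]
        self.assertIn(net['id'], visible_ids)
```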
16:42:00 <slaweq> so, next topic
16:42:10 <slaweq> #topic Rally
16:42:32 <slaweq> rally is still between 10 and 30% of failures
16:42:57 <slaweq> from the few recent failures I checked, it's always because the global timeout was reached, like:
16:43:05 <slaweq> * http://logs.openstack.org/24/558724/11/check/neutron-rally-neutron/d891678/job-output.txt.gz
16:43:05 <slaweq> * http://logs.openstack.org/90/545490/9/check/neutron-rally-neutron/9de921a/job-output.txt.gz
16:43:39 <slaweq> so maybe it would be worth reporting a bug and checking why it takes so long sometimes?
16:43:49 <mlavalle> yes
16:43:58 <mlavalle> with the infra team, I guess
16:44:05 <slaweq> or maybe it's always very close to the limit and we should just increase it?
16:44:22 <mlavalle> why not give it a try?
16:44:51 <slaweq> do we have any volunteer to check that? :)
16:45:14 <slaweq> if not, I can check it
16:45:50 <mlavalle> slaweq: if you are out of bandwidth, let me know and I can help
16:45:54 <slaweq> ok, I will check how long it takes for "ok" runs and will report a bug for that
16:46:04 <slaweq> thx mlavalle :)
16:46:34 <slaweq> I think I will have time to check it :)
16:46:40 <mlavalle> ok
16:46:52 <slaweq> #action slaweq to check rally timeouts and report a bug about that
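One quick way to do the "how long do the ok runs take" check is to compare the first and last timestamps in each job-output.txt against the job timeout. Below is a minimal sketch, assuming the usual Zuul v3 console log prefix ("YYYY-MM-DD HH:MM:SS.ffffff | ..."); the helper name is made up.

```python
#!/usr/bin/env python
"""Estimate how long a job ran from its job-output.txt timestamps."""
import datetime
import re
import sys

TIMESTAMP = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')


def job_duration(path):
    """Return the delta between the first and last timestamped log lines."""
    first = last = None
    with open(path) as log:
        for line in log:
            match = TIMESTAMP.match(line)
            if not match:
                continue
            stamp = datetime.datetime.strptime(
                match.group(1), '%Y-%m-%d %H:%M:%S')
            if first is None:
                first = stamp
            last = stamp
    if first is None:
        raise ValueError('no timestamps found in %s' % path)
    return last - first


if __name__ == '__main__':
    # e.g. python job_duration.py job-output.txt another-job-output.txt
    for log_file in sys.argv[1:]:
        print('%s: %s' % (log_file, job_duration(log_file)))
```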
16:47:02 <slaweq> ok, next topic
16:47:03 <slaweq> #topic Periodic
16:47:13 <slaweq> we have one job failing again since a few days
16:47:22 <slaweq> it's neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4
16:47:37 <slaweq> example of failure: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron-dynamic-routing/master/neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4/708499c/job-output.txt.gz
16:47:58 <slaweq> IMO possible culprit: https://review.openstack.org/#/c/560465/ but I didn't check it deeply so it might in fact only be a trigger for some other failure :)
16:50:40 <slaweq> any ideas?
16:50:42 <haleyb> slaweq: that probably is causing the failure since it changed that code
16:51:18 <haleyb> i'm not sure if federico is on irc
16:52:04 <slaweq> he is probably offline
16:52:33 <slaweq> ok, I will report a bug for that and will try to catch him on irc
16:52:35 <haleyb> i can take a look and/or ping him instead of reverting
16:52:40 <haleyb> you win :)
16:52:48 <slaweq> haleyb: ok :)
16:53:21 <haleyb> it's periodic so non-voting right?
16:53:40 <slaweq> yes, it's a periodic job for neutron-dynamic-routing
16:54:01 <slaweq> so I don't think it is very urgent but it still should be fixed somehow :)
16:54:13 <haleyb> yes, agreed
16:54:37 <slaweq> #action slaweq will report a bug and talk with Federico about issue with neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4
16:54:56 <slaweq> the other periodic jobs are working fine
16:55:12 <slaweq> failures which we had during the last week weren't related to neutron
16:55:26 <slaweq> I think we can move to the last topic finally :)
16:55:48 <slaweq> #topic others
16:56:37 <slaweq> Just in case someone missed it, I wanted to mention that we had an issue with stable/queens jobs during the last week: https://bugs.launchpad.net/neutron/+bug/1765008
16:56:38 <openstack> Launchpad bug 1765008 in tripleo "Tempest API tests failing for stable/queens branch" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami)
16:56:54 <slaweq> it should be already fixed
16:57:11 <slaweq> there is also patch https://review.openstack.org/#/c/562364/ proposed to avoid such regressions
16:57:51 <slaweq> IIUC with this patch we will run jobs for both neutron master and stable/queens branches for each patch in the neutron-tempest-plugin repo
16:58:13 <slaweq> and we will have to remember to add the same set of jobs for the next stable branches in the future
16:58:19 <mlavalle> pike also?
16:58:23 <slaweq> no
16:58:35 <slaweq> pike and ocata are using the old tempest tests from the neutron repo
16:58:42 <slaweq> so they are not affected
16:58:59 <slaweq> only master and queens are running tempest tests from the neutron-tempest-plugin repo
16:59:58 <slaweq> ok, as we are almost out of time, I think we can end the meeting now
17:00:04 <slaweq> thanks everyone for attending
17:00:08 <slaweq> and see You next week :)
17:00:09 <slaweq> bye
17:00:11 <jlibosva> thanks, bye o/
17:00:14 <slaweq> #endmeeting