16:00:44 <slaweq> #startmeeting neutron_ci
16:00:44 <openstack> Meeting started Tue Apr 24 16:00:44 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <slaweq> hi
16:00:47 <openstack> The meeting name has been set to 'neutron_ci'
16:00:48 <mlavalle> o/
16:01:03 <mlavalle> slaweq: long time no see ;-)
16:01:12 <slaweq> yeah :)
16:01:46 <jlibosva> o/
16:01:50 <haleyb> hi
16:02:20 <slaweq> ok, so let's start
16:02:36 <openstack> slaweq: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
16:02:43 <slaweq> sorry
16:02:47 <slaweq> wrong copy paste :)
16:02:48 <slaweq> #topic Actions from previous meetings
16:02:57 <njohnston> o/
16:03:15 <mlavalle> Nice to see you in this meeting also Nate
16:03:19 <slaweq> hi njohnston - long time :)
16:03:29 <jlibosva> I greet you njohnston :)
16:03:34 <slaweq> so first action was:
16:03:34 <slaweq> * slaweq will check old gate-failure bugs
16:04:14 <slaweq> I went through the list of bugs and sent a summary email during the weekend: http://lists.openstack.org/pipermail/openstack-dev/2018-April/129625.html
16:04:47 <slaweq> basically I set some bugs as incomplete
16:04:49 <mlavalle> that was a very nice thing to do
16:04:55 <mlavalle> good initiative
16:04:58 <slaweq> thx
16:05:21 <slaweq> please check this list - maybe some bugs which I wasn't sure about can be closed too
16:05:43 <slaweq> some of them are really old :)
16:06:00 <slaweq> ok, next one was: * jlibosva take a look on failing ovsfw blink functional test
16:06:06 <jlibosva> I did
16:06:15 <jlibosva> the fix is about to get merged: https://review.openstack.org/#/c/562220/
16:07:04 <slaweq> it's already +W so almost done :)
16:07:27 <slaweq> thx jlibosva
16:07:35 <slaweq> ok, next one: * slaweq will check failed SG fullstack test
16:07:49 <slaweq> I didn't have time to look at it yet - sorry
16:08:01 <slaweq> I will do it this week
16:08:07 <slaweq> #action slaweq will check failed SG fullstack test
16:08:29 <slaweq> and the last one was: ihar will check if job outputs are indexed in logstash
16:08:41 <slaweq> but I think ihar is not here now
16:08:48 <slaweq> do You know if he did something with that?
16:08:50 <jlibosva> he did and he showed me a patch that should fix it
16:08:54 <jlibosva> sec
16:09:25 <mlavalle> yeah, I remember him mentioning something in the Neutron channel
16:09:44 <jlibosva> this https://review.openstack.org/#/c/562042/
16:10:52 <jlibosva> the py35 job is failing with http://logs.openstack.org/42/562042/2/check/tox-py35-on-zuul/1288651/job-output.txt.gz#_2018-04-23_17_05_56_544588
16:11:13 <jlibosva> clarkb: hi, we're looking at your patch ^^ is this py35 failure legitimate or some intermittent issue?
16:12:09 <jlibosva> seems like clarkb is not here :) slaweq I think we can move on, we have some eyes on it :)
16:12:23 <mlavalle> he'll show up
16:12:27 <clarkb> ya I'm lurking
16:12:31 <slaweq> jlibosva: ok, thx for update about it
16:12:32 <mlavalle> see
16:12:34 <jlibosva> \o/
16:12:36 <slaweq> hi clarkb :)
16:12:52 <mlavalle> uncanny ability to know where he is needed
16:13:19 <clarkb> oh that one. The problem was actually elsewhere (I think I should still fix the consistency problem though)
16:13:29 <clarkb> I believe we are indexing neutron jobs properly for a few days now /me digs up the actual fix
16:13:39 <clarkb> would be good if you can double check though
16:14:00 <clarkb> https://review.openstack.org/#/c/562070/ was the fix
16:14:55 <jlibosva> I'll check, thanks clarkb :)
16:16:33 <slaweq> clarkb: from a quick check it looks like it's working fine
16:16:37 <slaweq> thx
16:17:17 <slaweq> ok, that was all actions from previous week
16:17:19 <slaweq> #topic Grafana
16:17:24 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:19:24 <slaweq> looking at the last 7 days, there was one spike on 22.04 where many jobs were failing at 100%
16:19:56 <slaweq> but as all jobs dropped back down at the same time too, I suppose it was something not related to neutron directly :)
16:20:16 <slaweq> there is a problem with one periodic job but I will talk about it later
16:20:35 <slaweq> and also we had quite a high failure rate for functional tests last week
16:20:35 <njohnston> and the 100% peak happens right before a one-hour discontinuity so I bet there was a problem elsewhere
16:20:51 <slaweq> njohnston++
16:21:51 <slaweq> for functional tests, the few failures I checked were caused by the problem with the blink firewall test which jlibosva already fixed :)
16:22:08 <slaweq> so that should be good now
16:22:35 <slaweq> do You see anything else worth mentioning there? :)
16:23:58 <jlibosva> the tempest-plugin-dvr-multinode is at 36%, is it all because of the trunk tests failing occasionally?
16:24:44 <slaweq> jlibosva: yes, I have that as one of the next topics :)
16:24:48 <jlibosva> ok
16:24:50 <jlibosva> sorry :)
16:24:55 <slaweq> no problem :)
16:25:07 <mlavalle> jumping the gun, jlibosva ;-)
16:25:07 <slaweq> so let's talk about scenario jobs now
16:25:15 <slaweq> #topic Scenarios
16:25:42 <slaweq> as jlibosva pointed out, neutron-tempest-plugin-dvr-multinode-scenario is still at around 40% of failures.
16:25:57 <slaweq> and I checked failures from about the last 2 days
16:26:04 <slaweq> I found 2 culprits:
16:26:13 <slaweq> * neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle - happens more often, like:
16:26:25 <slaweq> * http://logs.openstack.org/03/560703/7/check/neutron-tempest-plugin-dvr-multinode-scenario/1f67afd/logs/testr_results.html.gz
16:26:25 <slaweq> * http://logs.openstack.org/17/553617/19/check/neutron-tempest-plugin-dvr-multinode-scenario/a13a6fd/logs/testr_results.html.gz
16:26:25 <slaweq> * http://logs.openstack.org/84/533284/5/check/neutron-tempest-plugin-dvr-multinode-scenario/1c09aa6/logs/testr_results.html.gz
16:26:51 <slaweq> and second, neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity - which happens less frequently, like:
16:26:56 <slaweq> * http://logs.openstack.org/90/545490/9/check/neutron-tempest-plugin-dvr-multinode-scenario/c1ed535/logs/testr_results.html.gz
16:27:16 <slaweq> I didn't find any other failures of this job in the last few days
16:27:31 <slaweq> jlibosva: is this related to what You were debugging already?
16:27:38 <mlavalle> so yeah, mostly TrunkTest
16:27:41 <jlibosva> slaweq: yes but I didn't get far yet
16:27:51 <jlibosva> slaweq: I mostly tried to get a reproducer
16:29:14 <slaweq> jlibosva: I can help and take a look at the logs if You want
16:29:28 <slaweq> maybe we will find something
16:29:40 <jlibosva> slaweq: I welcome any help I can get :)
16:29:42 <slaweq> is there any bug related to it reported already?
16:29:44 <jlibosva> :)
16:30:17 <jlibosva> I don't think there is
16:30:33 <slaweq> so I will report it today
16:30:35 <slaweq> fine?
16:30:38 <jlibosva> ok
16:31:00 <slaweq> #action slaweq will report bug about failing trunk tests in dvr multinode scenario
16:31:16 <slaweq> maybe we should also mark those tests as unstable until we fix it?
16:31:24 <slaweq> what do You think about that?
16:31:35 <jlibosva> yep, makes sense
16:31:46 <jlibosva> we know the failure rate of it now so let's disable it
16:32:03 <slaweq> mlavalle: ok with that?
16:32:21 <mlavalle> yeah, that's ok
16:32:25 <slaweq> ok
16:32:42 <slaweq> jlibosva: will You do it or should I, after I create the bug report?
16:32:59 <jlibosva> slaweq: I can do it if you want
16:33:04 <slaweq> sure, thx
16:33:26 <slaweq> #action jlibosva will mark trunk scenario tests as unstable for now
16:33:27 <slaweq> thx
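A note on "mark those tests as unstable": in practice this means decorating the flaky tests so that a failure is reported as a skip instead of an error while the bug stays open. Below is a minimal sketch of such a decorator in plain Python; it only illustrates the idea and is not the actual helper used in the neutron tree, and all names in it are illustrative.

```python
import functools
import unittest


def unstable_test(bug_reference):
    """Report failures of a known-flaky test as skips instead of errors.

    The test still runs, so its result stays visible in the job logs,
    but a failure no longer breaks the gate while the bug is open.
    """
    def decorator(test_method):
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            try:
                return test_method(self, *args, **kwargs)
            except unittest.SkipTest:
                raise
            except Exception as exc:
                raise unittest.SkipTest(
                    "Test marked as unstable, %s: %s" % (bug_reference, exc))
        return wrapper
    return decorator


class TrunkTestExample(unittest.TestCase):

    @unstable_test("bug report for the trunk subport lifecycle failures")
    def test_trunk_subport_lifecycle(self):
        # The real scenario test body would go here; any failure it raises
        # is converted into a skip by the decorator above.
        self.assertTrue(True)
```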
16:33:52 <slaweq> speaking about other scenario jobs, IMO all is quite fine
16:34:02 <slaweq> no urgent problems that I'm aware of :)
16:34:29 <mlavalle> cool
16:35:06 <slaweq> so I think we can move on to the next topic
16:35:11 <slaweq> #topic Fullstack
16:35:43 <slaweq> I know that there is still this issue with SG tests sometimes and I will check it
16:36:10 <slaweq> but it's not a very common failure and fullstack tests are mostly fine
16:36:25 <slaweq> in the gate queue fullstack is even at 0% most of the time :)
16:36:38 <jlibosva> that's so great :)
16:36:40 <slaweq> I wanted to mention 2 things related to fullstack also:
16:36:46 <jlibosva> really great job slaweq for making it work!
16:36:54 <slaweq> jlibosva: it wasn't only me :)
16:37:06 <njohnston> +100
16:37:06 <slaweq> great job by the team
16:37:43 <slaweq> so I wanted to mention that jlibosva did a patch with a new firewall test: https://review.openstack.org/#/c/563159/ which waits for reviews :)
16:38:04 <njohnston> checking it out
16:38:32 <slaweq> and second thing, I reported "bug" https://bugs.launchpad.net/neutron/+bug/1765519 to add fullstack tests which will cover API operations on shared networks
16:38:33 <openstack> Launchpad bug 1765519 in neutron "Add fullstack tests for shared networks API" [Wishlist,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:38:49 <slaweq> we had such tests in tempest but we had to remove them from there
16:38:59 <slaweq> so now there are only UT for it
16:39:22 <slaweq> do You think it is reasonable to add such fullstack tests? or is it a waste of time maybe?
16:40:04 <jlibosva> I think it's worth it to prevent regressions
16:40:13 <mlavalle> I think so
16:40:16 <jlibosva> since we have fullstack voting, it's a legitimate testing framework now
16:40:40 <slaweq> ok, if I have 2 votes for it, I will do it :) Thx
16:41:07 <slaweq> do You have anything to add regarding fullstack?
16:41:50 <jlibosva> nope
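The coverage discussed for bug 1765519 boils down to: one project creates a network with shared=True and a different project must be able to see and use it. A rough sketch of that check follows, written against python-neutronclient with placeholder credentials and endpoint; it is not the neutron fullstack framework itself, where the clients would come from the test environment.

```python
import unittest

from keystoneauth1.identity import v3
from keystoneauth1 import session
from neutronclient.v2_0 import client as neutron_client


def make_client(username, password, project_name,
                auth_url='http://127.0.0.1/identity/v3'):
    # Placeholder credentials and endpoint; a real fullstack test would get
    # its clients from the test environment instead of keystone directly.
    auth = v3.Password(auth_url=auth_url, username=username,
                       password=password, project_name=project_name,
                       user_domain_id='default', project_domain_id='default')
    return neutron_client.Client(session=session.Session(auth=auth))


class SharedNetworkAPITest(unittest.TestCase):

    def test_shared_network_visible_to_other_project(self):
        admin = make_client('admin', 'secret', 'admin')
        demo = make_client('demo', 'secret', 'demo')

        # One project creates a network marked as shared.
        net = admin.create_network(
            {'network': {'name': 'shared-net', 'shared': True}})['network']
        self.addCleanup(admin.delete_network, net['id'])

        # A different project must be able to list it.
        visible_ids = [n['id'] for n in demo.list_networks()['networks']]
        self.assertIn(net['id'], visible_ids)
```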
16:42:00 <slaweq> so, next topic
16:42:10 <slaweq> #topic Rally
16:42:32 <slaweq> rally is still between 10 and 30% of failures
16:42:57 <slaweq> from the few recent failures I checked, it's always because the global timeout was reached, like:
16:43:05 <slaweq> * http://logs.openstack.org/24/558724/11/check/neutron-rally-neutron/d891678/job-output.txt.gz
16:43:05 <slaweq> * http://logs.openstack.org/90/545490/9/check/neutron-rally-neutron/9de921a/job-output.txt.gz
16:43:39 <slaweq> so maybe it would be worth reporting a bug and checking why it takes so long sometimes?
16:43:49 <mlavalle> yes
16:43:58 <mlavalle> with the infra team, I guess
16:44:05 <slaweq> or maybe it's always very close to the limit and we should just increase it?
16:44:22 <mlavalle> why not give it a try?
16:44:51 <slaweq> do we have any volunteer to check that? :)
16:45:14 <slaweq> if not, I can check it
16:45:50 <mlavalle> slaweq: if you are out of bandwidth, let me know and I can help
16:45:54 <slaweq> ok, I will check how long it takes for "ok" runs and will report a bug for that
16:46:04 <slaweq> thx mlavalle :)
16:46:34 <slaweq> I think I will have time to check it :)
16:46:40 <mlavalle> ok
16:46:52 <slaweq> #action slaweq to check rally timeouts and report a bug about that
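One quick way to do the "how long do the ok runs take" check is to compare the first and last timestamps in each job-output.txt against the job timeout. Below is a minimal sketch, assuming the usual Zuul v3 console log prefix ("YYYY-MM-DD HH:MM:SS.ffffff | ..."); the helper name is made up.

```python
#!/usr/bin/env python
"""Estimate how long a job ran from its job-output.txt timestamps."""
import datetime
import re
import sys

TIMESTAMP = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')


def job_duration(path):
    """Return the delta between the first and last timestamped log lines."""
    first = last = None
    with open(path) as log:
        for line in log:
            match = TIMESTAMP.match(line)
            if not match:
                continue
            stamp = datetime.datetime.strptime(
                match.group(1), '%Y-%m-%d %H:%M:%S')
            if first is None:
                first = stamp
            last = stamp
    if first is None:
        raise ValueError('no timestamps found in %s' % path)
    return last - first


if __name__ == '__main__':
    # e.g. python job_duration.py job-output.txt another-job-output.txt
    for log_file in sys.argv[1:]:
        print('%s: %s' % (log_file, job_duration(log_file)))
```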
16:47:02 <slaweq> ok, next topic
16:47:03 <slaweq> #topic Periodic
16:47:13 <slaweq> we have one job failing again since a few days
16:47:22 <slaweq> it's neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4
16:47:37 <slaweq> example of failure: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron-dynamic-routing/master/neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4/708499c/job-output.txt.gz
16:47:58 <slaweq> IMO possible culprit: https://review.openstack.org/#/c/560465/ but I didn't check it deeply so it might in fact only be a trigger for some other failure :)
16:50:40 <slaweq> any ideas?
16:50:42 <haleyb> slaweq: that probably is causing the failure since it changed that code
16:51:18 <haleyb> i'm not sure if federico is on irc
16:52:04 <slaweq> he is probably offline
16:52:33 <slaweq> ok, I will report a bug for that and will try to catch him on irc
16:52:35 <haleyb> i can take a look and/or ping him instead of reverting
16:52:40 <haleyb> you win :)
16:52:48 <slaweq> haleyb: ok :)
16:53:21 <haleyb> it's periodic so non-voting right?
16:53:40 <slaweq> yes, it's a periodic job for neutron-dynamic-routing
16:54:01 <slaweq> so I don't think it is very urgent but it still should be fixed somehow :)
16:54:13 <haleyb> yes, agreed
16:54:37 <slaweq> #action slaweq will report a bug and talk with Federico about issue with neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4
16:54:56 <slaweq> the other periodic jobs are working fine
16:55:12 <slaweq> failures which we had during the last week weren't related to neutron
16:55:26 <slaweq> I think we can move to the last topic finally :)
16:55:48 <slaweq> #topic others
16:56:37 <slaweq> Just in case someone missed it, I wanted to mention that we had an issue with stable/queens jobs during the last week: https://bugs.launchpad.net/neutron/+bug/1765008
16:56:38 <openstack> Launchpad bug 1765008 in tripleo "Tempest API tests failing for stable/queens branch" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami)
16:56:54 <slaweq> it should be already fixed
16:57:11 <slaweq> there is also patch https://review.openstack.org/#/c/562364/ proposed to avoid such regressions
16:57:51 <slaweq> IIUC with this patch we will run jobs for both neutron master and stable/queens branches for each patch in the neutron-tempest-plugin repo
16:58:13 <slaweq> and we will have to remember to add the same set of jobs for the next stable branches in the future
16:58:19 <mlavalle> pike also?
16:58:23 <slaweq> no
16:58:35 <slaweq> pike and ocata are using the old tempest tests from the neutron repo
16:58:42 <slaweq> so they are not affected
16:58:59 <slaweq> only master and queens are running tempest tests from the neutron-tempest-plugin repo
16:59:58 <slaweq> ok, as we are almost out of time, I think we can end the meeting now
17:00:04 <slaweq> thanks everyone for attending
17:00:08 <slaweq> and see You next week :)
17:00:09 <slaweq> bye
17:00:11 <jlibosva> thanks, bye o/
17:00:14 <slaweq> #endmeeting