16:00:44 #startmeeting neutron_ci
16:00:44 Meeting started Tue Apr 24 16:00:44 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:45 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 hi
16:00:47 The meeting name has been set to 'neutron_ci'
16:00:48 o/
16:01:03 slaweq: long time no see ;-)
16:01:12 yeah :)
16:01:46 o/
16:01:50 hi
16:02:20 ok, so let's start
16:02:36 slaweq: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
16:02:43 sorry
16:02:47 wrong copy paste :)
16:02:48 #topic Actions from previous meetings
16:02:57 o/
16:03:15 Nice to see you in this meeting also Nate
16:03:19 hi njohnston - long time :)
16:03:29 I greet you njohnston :)
16:03:34 so first action was:
16:03:34 * slaweq will check old gate-failure bugs
16:04:14 I went through the list of old gate-failure bugs and sent a summary email during the weekend: http://lists.openstack.org/pipermail/openstack-dev/2018-April/129625.html
16:04:47 basically I set some bugs as incomplete
16:04:49 that was a very nice thing to do
16:04:55 good initiative
16:04:58 thx
16:05:21 please check this list - maybe some bugs which I wasn't sure about can be closed also
16:05:43 some of them are really old :)
16:06:00 ok, next one was: * jlibosva take a look on failing ovsfw blink functional test
16:06:06 I did
16:06:15 the fix is about to get merged: https://review.openstack.org/#/c/562220/
16:07:04 it's already +W so almost done :)
16:07:27 thx jlibosva
16:07:35 ok, next one: * slaweq will check failed SG fullstack test
16:07:49 I didn't have time to look at it yet - sorry
16:08:01 I will do it this week
16:08:07 #action slaweq will check failed SG fullstack test
16:08:29 and the last one was: ihar will check if job output are indexed in logstash
16:08:41 but I think ihar is not here now
16:08:48 do You know if he did something with that?
16:08:50 he did and he showed me a patch that should fix it
16:08:54 sec
16:09:25 yeah, I remember him mentioning something in the Neutron channel
16:09:44 this https://review.openstack.org/#/c/562042/
16:10:52 the py35 job is failing with http://logs.openstack.org/42/562042/2/check/tox-py35-on-zuul/1288651/job-output.txt.gz#_2018-04-23_17_05_56_544588
16:11:13 clarkb: hi, we're looking at your patch ^^ is this py35 failure legitimate or some intermittent issue?
16:12:09 seems like clarkb is not here :) slaweq I think we can move on, we have some eyes on it :)
16:12:23 he'll show up
16:12:27 ya I'm lurking
16:12:31 jlibosva: ok, thx for update about it
16:12:32 see
16:12:34 \o/
16:12:36 hi clarkb :)
16:12:52 uncanny ability to know where he is needed
16:13:19 oh that one. The problem was actually elsewhere (I think I should still fix the consistency problem though)
16:13:29 I believe we are indexing neutron jobs properly for a few days now /me digs up the actual fix
16:13:39 would be good if you can double check though
16:14:00 https://review.openstack.org/#/c/562070/ was the fix
16:14:55 I'll check, thanks clarkb :)
16:16:33 clarkb: from quick check it looks that it's working fine
16:16:37 thx
16:17:17 ok, that was all actions from previous week
16:17:19 #topic Grafana
16:17:24 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:19:24 looking for last 7 days, there was one spike at 22.04 where many jobs was failing 100%
16:19:56 but as all jobs were dropped down in same time also, I suppose it was something not related to neutron directly :)
16:20:16 there is problem with one periodic job but I will talk about it later
16:20:35 and also we had quite high failure rate for functional tests last week
16:20:35 and the 100% peak happens right before a one-hour discontinuity so I bet there was a problem elsewhere
16:20:51 njohnston++
16:21:51 functional tests, as I checked few failures was caused by problem with blink firewall which jlibosva already fixed :)
16:22:08 so that should be good now
16:22:35 do You see there anything worth mentioning also? :)
16:23:58 the tempest-plugin-dvr-multinode is at 36%, is it all because of the trunk tests failing occasionally?
16:24:44 jlibosva: yes, I have that for one of next topics :)
16:24:48 ok
16:24:50 sorry :)
16:24:55 no problem :)
16:25:07 jumping the gun, jlibosva ;-)
16:25:07 so let's talk about scenario jobs now
16:25:15 #topic Scenarios
16:25:42 as jlibosva pointed, neutron-tempest-plugin-dvr-multinode-scenario is still around 40% of failures.
16:25:57 and I checked failures from about last 2 days
16:26:04 I found 2 culprits:
16:26:13 * neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle - happens more often, like:
16:26:25 * http://logs.openstack.org/03/560703/7/check/neutron-tempest-plugin-dvr-multinode-scenario/1f67afd/logs/testr_results.html.gz
16:26:25 * http://logs.openstack.org/17/553617/19/check/neutron-tempest-plugin-dvr-multinode-scenario/a13a6fd/logs/testr_results.html.gz
16:26:25 * http://logs.openstack.org/84/533284/5/check/neutron-tempest-plugin-dvr-multinode-scenario/1c09aa6/logs/testr_results.html.gz
16:26:51 and second neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity - which happens less frequent, like:
16:26:56 * http://logs.openstack.org/90/545490/9/check/neutron-tempest-plugin-dvr-multinode-scenario/c1ed535/logs/testr_results.html.gz
16:27:16 I didn't found any other failures of this job in last few days
16:27:31 jlibosva: is this related to what You were debugging already?
16:27:38 so yeah, mostly TrunkTest
16:27:41 slaweq: yes but I didn't get far yet
16:27:51 slaweq: I mostly tried to get a reproducer
16:29:14 jlibosva: I can help and take a look on logs if You want
16:29:28 maybe we will find something
16:29:40 slaweq: I welcome any help I can get a):
16:29:42 is there any bug related to it reported already?
16:29:44 :)
16:30:17 I don't think there is
16:30:33 so I will report it today
16:30:35 fine?
16:30:38 ok
16:31:00 #action slaweq will report bug about failing trunk tests in dvr multinode scenario
16:31:16 maybe we should also mark those tests as unstable until we will fix it?
16:31:24 what do You think about that?
16:31:35 yep, makes sense
16:31:46 we know the failure rate of it now so let's disable it
16:32:03 mlavalle: ok with that?
16:32:21 yeah, that's ok
16:32:25 ok
16:32:42 jlibosva: will You do it or should I after I will create a bug report?
16:32:59 slaweq: I can do it if you want
16:33:04 sure, thx
16:33:26 #action jlibosva will mark trunk scenario tests as unstable for now
16:33:27 thx
16:33:52 speaking about other scenario jobs, IMO all is quite fine
16:34:02 no any urgent problems which I'm aware of :)
16:34:29 cool
16:35:06 so I think we can move on to next topic
16:35:11 #topic Fullstack
16:35:43 I know that there is still this issue with SG tests sometimes and I will check it
16:36:10 but it's not very common failure and fullstack tests are mostly fine
16:36:25 in gate queue fullstack is at 0% for most of the time even :)
16:36:38 that's so great :)
16:36:40 I wanted to mention 2 things related to fullstack also:
16:36:46 really great job slaweq for making it work!
16:36:54 jlibosva: it wasn't only me :)
16:37:06 +100
16:37:06 great job of the team
16:37:43 so I wanted to mention that jlibosva did a patch with new firewall test: https://review.openstack.org/#/c/563159/ which waits for reviews :)
16:38:04 checking it out
16:38:32 and second thing, I reported "bug" https://bugs.launchpad.net/neutron/+bug/1765519 to add fullstack test which will cover API operations on shared networks
16:38:33 Launchpad bug 1765519 in neutron "Add fullstack tests for shared networks API" [Wishlist,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:38:49 we had such tests in tempest but we had to remove them from tempest
16:38:59 so now there are only UT for it
16:39:22 do You think it is reasonable to add such fullstack tests? or it's wasting of time maybe?
16:40:04 I think it's worth it to prevent regressions
16:40:13 I think so
16:40:16 since we have fullstack voting, it's a legitimate testing framework now
16:40:40 ok, if I have 2 votes for it, I will do it :) Thx
16:41:07 do You have anything to add regarding to fullstack?
16:41:50 nope
16:42:00 so, next topic
16:42:10 #topic Rally
16:42:32 rally is still between 10 and 30% of failures
16:42:57 as I checked few recent failures, it's always because of global timeout reached, like:
16:43:05 * http://logs.openstack.org/24/558724/11/check/neutron-rally-neutron/d891678/job-output.txt.gz
16:43:05 * http://logs.openstack.org/90/545490/9/check/neutron-rally-neutron/9de921a/job-output.txt.gz
16:43:39 so maybe it could be worth to report a bug and check why it takes so long sometimes?
16:43:49 yes
16:43:58 with the infra team, I guess
16:44:05 or maybe it's always very close to the limit and we just should increase it?
16:44:22 why not give it a try?
16:44:51 do we have any volunteer to check that? :)
16:45:14 if not, I can check it
16:45:50 slaweq: if you are out of bandwidth, let me know and I can help
16:45:54 ok, I will check how long it takes for "ok" runs and will report a bug for that
16:46:04 thx mlavalle :)
16:46:34 I think I will have time to check it :)
16:46:40 ok
16:46:52 #action slaweq to check rally timeouts and report a bug about that
16:47:02 ok, next topic
16:47:03 #topic Periodic
16:47:13 we have one job failing since few days again
16:47:22 it's neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4
16:47:37 example of failure: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron-dynamic-routing/master/neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4/708499c/job-output.txt.gz
16:47:58 IMO possible culprit: https://review.openstack.org/#/c/560465/ but I didn’t check it deeply so it might be only trigger for some other failure in fact :)
16:50:40 any ideas?
16:50:42 slaweq: that probably is causing the failure since it changed that code
16:51:18 i'm not sure if federico is on irc
16:52:04 he is probably offline
16:52:33 ok, I will report a bug for that and will try to catch him on irc
16:52:35 i can take a look and/or ping him instead of reverting
16:52:40 you win :)
16:52:48 haleyb: ok :)
16:53:21 it's periodic so non-voting right?
16:53:40 yes, it's periodic job for neutron-dynamic-routing
16:54:01 so I don't think it is very urgent but still should be fixed somehow :)
16:54:13 yes, agreed
16:54:37 #action slaweq will report a bug and talk with Federico about issue with neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4
16:54:56 for other periodic jobs it's working fine
16:55:12 failures which we had during last week wasn't related to neutron
16:55:26 I think we can move to last topic finally :)
16:55:48 #topic others
16:56:37 Just if someone missed it, I wanted to mention that we had issue with stable/queens jobs during last week: https://bugs.launchpad.net/neutron/+bug/1765008
16:56:38 Launchpad bug 1765008 in tripleo "Tempest API tests failing for stable/queens branch" [Critical,In progress] - Assigned to Gabriele Cerami (gcerami)
16:56:54 it should be already fixed
16:57:11 there is also patch https://review.openstack.org/#/c/562364/ proposed to avoid such regressions
16:57:51 IIUC with this patch we will run jobs for both neutron master and stable/queens branches for each patch in neutron-tempest-plugin repo
16:58:13 and we will have to remember to add same set of jobs for next stable branches in future
16:58:19 pike also?
16:58:23 no
16:58:35 pike and ocata are using old tempest tests from neutron repo
16:58:42 so are not affected
16:58:59 only master and queens are running tempest tests from neutron-tempest-plugin repo
16:59:58 ok, as we are almost out of time, I think we can end meeting now
17:00:04 thanks everyone for attending
17:00:08 and see You next week :)
17:00:09 bye
17:00:11 thanks, bye o/
17:00:14 #endmeeting
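
Editor's note on "marking tests as unstable" (the action agreed for the trunk scenario tests at 16:33:26): in the neutron test tree this is usually done with a decorator that still runs the flaky test but converts an unexpected failure into a skip, so the intermittent trunk failures stop blocking the gate while the bug is investigated. Below is a minimal, self-contained Python sketch of that pattern; the helper name, module location, test class and bug reference are illustrative assumptions, not the exact change jlibosva merged.

# Illustrative sketch only: an "unstable test" decorator in the spirit of the
# helper used in neutron's tests. Names and the bug reference are assumptions.
import functools
import unittest


def unstable_test(reason):
    """Run the test, but turn any failure into a skip with the given reason.

    The flaky test stays visible in results (it can still pass), without its
    intermittent failures blocking the gate.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except Exception as exc:
                raise unittest.SkipTest(
                    'Marked as unstable (%s), failure was: %s' % (reason, exc))
        return wrapper
    return decorator


class TrunkLifecycleExample(unittest.TestCase):
    """Hypothetical stand-in for the flaky trunk scenario test."""

    @unstable_test("trunk subport lifecycle fails on DVR multinode job")
    def test_trunk_subport_lifecycle(self):
        # Real scenario test logic would go here; a failure is reported as a skip.
        self.assertTrue(True)


if __name__ == '__main__':
    unittest.main()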