16:00:39 #startmeeting neutron_ci
16:00:39 Meeting started Tue Jul 10 16:00:39 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:43 The meeting name has been set to 'neutron_ci'
16:00:53 o/
16:02:17 I don't run this meeting regularly, so you will have to excuse my clumsiness
16:02:32 #topic Actions from previous meetings
16:02:36 hi
16:02:54 njohnston to look into adding Grafana dashboard for stable branches
16:03:04 any updates on this one?
16:03:54 I have a change queued for it, but I wanted to check with slaweq and he is on PTO
16:04:22 I figure once he gets back I'll make sure I understand his comment correctly and then get this going
16:04:46 did he leave a comment in the change?
16:04:53 yes
16:05:25 what's the url of the change? I am pretty sure he will read the log of this meeting...
16:05:40 that way he will know he has homework....
16:05:42 https://review.openstack.org/#/c/578191/
16:06:22 Although since ajaeger concurred I might just go ahead, since ajaeger's comment cleared up the ambiguity I had
16:06:40 I agree
16:06:48 just go ahead
16:07:10 ok I will do that right after this meeting then
16:07:21 thanks for moving this forward!
16:07:40 np
16:07:45 #action slaweq will report and investigate py35 timeouts
16:08:32 Before leaving for vacation, he filed a bug for it: https://bugs.launchpad.net/neutron/+bug/1779077
16:08:32 Launchpad bug 1779077 in neutron "Unit test jobs (py35) fails with timeout often" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:09 Preparing for this meeting earlier today, I looked for instances of this issue and couldn't find any
16:09:20 Have any of you seen it happening?
16:09:38 not off the top of my head
16:10:09 how about you haleyb?
16:10:24 no, haven't seen that
16:10:42 deprecation warnings like in that log always worry me though, as each takes time
16:11:50 ok, to help slaweq prioritize his todos when he catches up next week, I'll leave a note in the bug indicating we discussed it today and we haven't seen it lately
16:11:53 hang on
16:12:28 most recent one I see is in the graphql change: https://review.openstack.org/#/c/578191/ PS8
16:12:43 but that was on 6/28
16:13:32 ok, I left a note on the bug
16:13:52 #action slaweq reports a bug with tempest timeouts
16:14:46 He filed this bug: https://bugs.launchpad.net/neutron/+bug/1779075
16:14:46 Launchpad bug 1779075 in neutron "Tempest jobs fails because of timeout" [Medium,Confirmed]
16:14:57 It has no owner at this point in time
16:15:01 any takers?
16:15:49 ok, moving on
16:16:07 #topic Grafana
16:16:21 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:00 I don't see much out of the ordinary
16:18:32 what do others think?
16:19:18 sorry, still loading
16:19:28 take your time :-)
16:19:38 yeah it takes a while to load the 7 day view
16:20:32 looks like yesterday we had a bad spike in functional test failures in the gate, up to 50%
16:20:36 tempest-full still at 15% failure in the gate, but that just started and is working down?
16:21:13 yes, functional too, so I guess that will subside
16:21:35 yeah I see that functional spike
16:21:51 but it seems to be coming down
16:22:10 and it seems like there is a spike in most of the panels starting about 2 hours ago
16:22:32 probably worth re-checking tomorrow
16:23:30 looks like almost all of the check queue panels, none of the gate queue panels, show the spike
16:24:40 so it might be a bunch of patches hitting gerrit
16:24:49 yeah
16:24:53 bringing some failures
16:25:02 let's keep an eye on it
16:26:13 ok, let's move on
16:26:27 #topic Tempest tests
16:27:24 Since about two weeks ago we have seen this bug: https://bugs.launchpad.net/tempest/+bug/1775947
16:27:24 Launchpad bug 1775947 in tempest "tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest failing" [Medium,Confirmed] - Assigned to Deepak Mourya (mourya007)
16:27:38 I saw it several times over the past few days
16:27:56 and again while investigating earlier today in preparation for this meeting
16:28:11 Left a note in the bug itself
16:28:47 There is a related patch for Tempest: https://review.openstack.org/# /c/578765/
16:29:02 oooops
16:29:09 https://review.openstack.org/#/c/578765/
16:29:29 that's probably better
16:30:13 the action item here is probably to bug the QA guys to just merge the patch
16:30:31 it has a +2 and two +1s
16:30:40 I'll bug them
16:30:56 I think Felipe's comments are valid but they can be a follow-on change
16:31:33 #action mlavalle to follow up with QA team to merge https://review.openstack.org/#/c/578765/
16:32:26 ok, moving on
16:32:43 #topic Grenade
16:33:12 In the last meeting slaweq asked me about http://logs.openstack.org/03/563803/9/check/neutron-grenade-dvr-multinode/13338d9/logs/testr_results.html.gz
16:34:40 another timeout?
16:35:52 I actually dug deeper and found a problem with one of my multiple port binding patches
16:36:06 I corrected it here: https://review.openstack.org/#/c/414251/69/neutron/objects/ports.py@477
16:37:18 and I verified that the issue is gone
16:37:25 interesting
16:37:36 Matt also confirmed he is not seeing it anymore
16:38:23 btw, https://review.openstack.org/#/c/414251 is ready for review
16:38:42 I think it's good to merge, so if you have time, please take a look
16:39:04 ok moving on
16:39:36 There was one final action item from last meeting: #action njohnston to ask infra team if TIMED_OUT is included in FAILURE for grafana graphs
16:39:48 did you have a chance to talk to them?
16:40:22 My apologies, I completely forgot about this.
16:40:31 no problem
16:40:59 whenever you have a chance
16:41:09 #topic Open Agenda
16:41:23 Any other topics we should discuss today?
16:41:32 i had a question
16:41:48 i recently tagged all the neutron repos
16:42:13 weren't we going to tag neutron-tempest-plugin too? i'm trying to remember if i forgot something
16:42:13 stable branches, right?
16:42:20 right
16:42:35 we never concluded that conversation
16:42:40 downstream ci wants to know :)
16:43:05 I asked the release team
16:43:16 and they pretty much said it was up to us
16:44:20 let me dig in my todos and I'll get back to you
16:44:32 then i guess we should, unless there was some objection i don't remember about branchless, etc
16:44:53 the only objection came from amotoki
16:45:11 exactly around the branchless nature of that repo
16:45:43 mlavalle: ack. just an fyi that some downstream tools we use only pull when a new tag appears, so haven't pulled since an initial setup was done
16:46:15 but I remember that other projects are releasing / tagging their tempest plugin repos
16:46:28 I'll get back to you soon
16:46:39 Update from the infra team: corvus says TIMED_OUT is not counted in FAILURE. So we might need to brainstorm how to report volume of TIMED_OUT errors.
16:47:23 mlavalle: np, i just got pinged about it this morning
16:47:37 (to clarify, i believe every build result is reported as its own statsd/graphite metric)
16:48:30 njohnston, corvus: thanks for the update
16:48:52 corvus: right, I just want to make sure that if we see a failure line at 0, or 50, it matches how many are actually failing, so it might need some adjustment to the grafana to accommodate some extra graphite metrics
16:49:01 so is there a timedout stat in addition to success/failure? been a while since i dug into graphite
16:50:19 corvus: ^^^^ any guidance from the top of your head?
16:50:38 haleyb: yes, there should be (also NODE_FAILURE, RETRY_FAILURE, POST_FAILURE, etc)
16:50:55 but those will only show up in graphite for a particular job if it's ever reported one
16:51:04 (they're counters, so we don't send them unless they exist)
16:51:56 * njohnston will plumb graphite
16:51:58 right, I remember seeing empty graphs because a job had never failed before
16:52:16 njohnston: good luck :)
16:52:33 njohnston: thanks!
16:52:50 Thanks for attending
16:53:07 Next week slaweq will be back driving this meeting
16:53:15 #endmeeting
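Note on the TIMED_OUT discussion above: graphite-web exposes a /metrics/find endpoint, so the per-result counters corvus describes can be listed for a given job before adjusting the Grafana targets. The snippet below is a minimal sketch of that check, assuming a graphite.openstack.org endpoint and an illustrative Zuul statsd path for the neutron-functional check job; both are assumptions and should be verified against the real metric tree in project-config.

```python
# Minimal sketch (not part of the meeting log): list the per-result counters
# (SUCCESS, FAILURE, TIMED_OUT, ...) that graphite knows about for one job.
# Counters are only created once a result has been reported at least once,
# which is why some of them may be missing entirely.
# ASSUMPTIONS: the graphite host and the metric path below are illustrative.
import json
import urllib.parse
import urllib.request

GRAPHITE = "http://graphite.openstack.org"  # assumed endpoint
# Hypothetical metric prefix for the neutron-functional job in the check queue.
QUERY = ("stats_counts.zuul.tenant.openstack.pipeline.check."
         "project.git_openstack_org.openstack_neutron.master."
         "job.neutron-functional.*")


def find_result_counters(query=QUERY):
    """Return the result counter names present in graphite for the job."""
    url = "{}/metrics/find?{}".format(
        GRAPHITE, urllib.parse.urlencode({"query": query}))
    with urllib.request.urlopen(url) as resp:
        nodes = json.load(resp)  # default response is a list of tree nodes
    return sorted(node["text"] for node in nodes)


if __name__ == "__main__":
    for result in find_result_counters():
        print(result)
```

Any result names returned here (for example TIMED_OUT) could then be summed alongside FAILURE in the dashboard's graphite targets so the failure-rate lines also reflect timeouts, which is the adjustment haleyb and njohnston discuss above.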