16:00:39 <mlavalle> #startmeeting neutron_ci
16:00:39 <openstack> Meeting started Tue Jul 10 16:00:39 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:43 <openstack> The meeting name has been set to 'neutron_ci'
16:00:53 <njohnston_> o/
16:02:17 <mlavalle> I don't run this meeting regularly, so you will have to excuse my clumsiness
16:02:32 <mlavalle> #topic Actions from previous meetings
16:02:36 <haleyb> hi
16:02:54 <mlavalle> njohnston to look into adding Grafana dashboard for stable branches
16:03:04 <mlavalle> any updates on this one?
16:03:54 <njohnston> I have a change queued up for it, but I wanted to check with slaweq first, and he is on PTO
16:04:22 <njohnston> I figure once he gets back I'll make sure I understand his comment correctly and then get this going
16:04:46 <mlavalle> did he leave a comment in the change?
16:04:53 <njohnston> yes
16:05:25 <mlavalle> what's the url of the change? I am pretty sure he will read the log of this meeting...
16:05:40 <mlavalle> that way he will know he has homework....
16:05:42 <njohnston> https://review.openstack.org/#/c/578191/
16:06:22 <njohnston> Although since ajaeger concurred I might just go ahead, since ajaeger's comment cleared up the ambiguity I had
16:06:40 <mlavalle> I agree
16:06:48 <mlavalle> just go ahead
16:07:10 <njohnston> ok I will do that right after this meeting then
16:07:21 <mlavalle> thanks for moving this forward!
16:07:40 <njohnston> np
16:07:45 <mlavalle> #action slaweq will report and investigate py35 timeouts
16:08:32 <mlavalle> Before leaving for vacation, he filed a bug for it: https://bugs.launchpad.net/neutron/+bug/1779077
16:08:32 <openstack> Launchpad bug 1779077 in neutron "Unit test jobs (py35) fails with timeout often" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:09 <mlavalle> Preparing for this meeting earlier today I looked for instances of this issue and couldn't find any
16:09:20 <mlavalle> Have any of you seen it happening?
16:09:38 <njohnston> not off the top of my head
16:10:09 <mlavalle> how about you haleyb?
16:10:24 <haleyb> no, haven't seen that
16:10:42 <haleyb> deprecation warnings like in that log always worry me though, as each takes time
16:11:50 <mlavalle> ok, to help slaweq prioritize his todos when he catches up next week, I'll leave a note in the bug indicating we discussed it today and haven't seen it lately
16:11:53 <mlavalle> hang on
16:12:28 <njohnston> most recent one I see is in the graphql change: https://review.openstack.org/#/c/578191/ PS8
16:12:43 <njohnston> but that was on 6/28
16:13:32 <mlavalle> ok, I left a note on the bug
16:13:52 <mlavalle> #action slaweq reports a bug with tempest timeouts
16:14:46 <mlavalle> He filed this bug https://bugs.launchpad.net/neutron/+bug/1779075
16:14:46 <openstack> Launchpad bug 1779075 in neutron "Tempest jobs fails because of timeout" [Medium,Confirmed]
16:14:57 <mlavalle> It has no owner at this point in time
16:15:01 <mlavalle> any takers?
16:15:49 <mlavalle> ok, moving on
16:16:07 <mlavalle> #topic Grafana
16:16:21 <mlavalle> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:00 <mlavalle> I don't see much out of the ordinary
16:18:32 <mlavalle> what do others think?
16:19:18 <haleyb> sorry, still loading
16:19:28 <mlavalle> take your time :-)
16:19:38 <njohnston> yeah it takes a while to load the 7 day view
16:20:32 <njohnston> looks like yesterday we had a bad spike on functional test failures in the gate, up to 50%
16:20:36 <haleyb> tempest-full still at 15% failure in the gate, but that just started and is working down?
16:21:13 <haleyb> yes, functional too, so guess that will subside
16:21:35 <mlavalle> yeah I see that functional spike
16:21:51 <mlavalle> but it seems to be coming down
16:22:10 <njohnston> and it seems like there is a spike in most of the panels starting about 2 hours ago
16:22:32 <mlavalle> probably worth re-checking tomorrow
16:23:30 <njohnston> looks like almost all of the check queue panels, but none of the gate queue panels, show the spike
16:24:40 <mlavalle> so it might be a bunch of patches hitting gerrit
16:24:49 <njohnston> yeah
16:24:53 <mlavalle> bringing some failures
16:25:02 <mlavalle> let's keep an eye on it
16:26:13 <mlavalle> ok, let's move on
16:26:27 <mlavalle> #topic Tempest tests
16:27:24 <mlavalle> For about two weeks we have been seeing this bug https://bugs.launchpad.net/tempest/+bug/1775947
16:27:24 <openstack> Launchpad bug 1775947 in tempest "tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest failing" [Medium,Confirmed] - Assigned to Deepak Mourya (mourya007)
16:27:38 <mlavalle> I saw it several times over the past few days
16:27:56 <mlavalle> and again while investigating earlier today in preparation for this meeting
16:28:11 <mlavalle> Left a note in the bug itself
16:28:47 <mlavalle> There is a related patch for Tempest: https://review.openstack.org/# /c/578765/
16:29:02 <mlavalle> oooops
16:29:09 <mlavalle> https://review.openstack.org/#/c/578765/
16:29:29 <mlavalle> that's probably better
16:30:13 <mlavalle> the action item here is probably to bug the QA guys to just merge the patch
16:30:31 <mlavalle> it has a +2 and two +1s
16:30:40 <mlavalle> I'll bug them
16:30:56 <njohnston> I think Felipe's comments are valid but they can be a follow-on change
16:31:33 <mlavalle> #action mlavalle to follow up with QA team to merge https://review.openstack.org/#/c/578765/
16:32:26 <mlavalle> ok, moving on
16:32:43 <mlavalle> #topic Grenade
16:33:12 <mlavalle> In the last meeting slaweq asked me about http://logs.openstack.org/03/563803/9/check/neutron-grenade-dvr-multinode/13338d9/logs/testr_results.html.gz
16:34:40 <haleyb> another timeout?
16:35:52 <mlavalle> I actually dug deeper and found a problem with one of my multiple port binding patches
16:36:06 <mlavalle> I corrected it here: https://review.openstack.org/#/c/414251/69/neutron/objects/ports.py@477
16:37:18 <mlavalle> and I verified that the issue is gone
16:37:25 <njohnston> interesting
16:37:36 <mlavalle> Matt also confirmed he is not seeing it anymore
16:38:23 <mlavalle> btw, https://review.openstack.org/#/c/414251 is ready for review
16:38:42 <mlavalle> I think it's good to merge, so if you have time, please take a look
16:39:04 <mlavalle> ok moving on
16:39:36 <mlavalle> There was one final action item from last meeting: #action njohnston to ask infra team if TIMED_OUT is included in FAILURE for grafana graphs
16:39:48 <mlavalle> did you have a chance to talk to them?
16:40:22 <njohnston> My apologies, I completely forgot about this.
16:40:31 <mlavalle> no problem
16:40:59 <mlavalle> whenever you have a chance
16:41:09 <mlavalle> #topic Open Agenda
16:41:23 <mlavalle> Any other topics we should discuss today?
16:41:32 <haleyb> i had a question
16:41:48 <haleyb> i recently tagged all the neutron repos
16:42:13 <haleyb> weren't we going to tag neutron-tempest-plugin too? i'm trying to remember if i forgot something
16:42:13 <mlavalle> stable branches, right?
16:42:20 <haleyb> right
16:42:35 <mlavalle> we never concluded that conversation
16:42:40 <haleyb> downstream ci wants to know :)
16:43:05 <mlavalle> I asked the release team
16:43:16 <mlavalle> and they pretty much said it was up to us
16:44:20 <mlavalle> let me dig in my todos and I'll get back to you
16:44:32 <haleyb> then i guess we should, unless there was some objection i don't remember about branchless, etc
16:44:53 <mlavalle> the only objection came from amotoki
16:45:11 <mlavalle> exactly around the branchless nature of that repo
16:45:43 <haleyb> mlavalle: ack. just an fyi that some downstream tools we use only pull when a new tag appears, so haven't pulled since the initial setup was done
16:46:15 <mlavalle> but I remember that other projects are releasing / tagging their tempest plugin repos
16:46:28 <mlavalle> I'll get back to you soon
16:46:39 <njohnston> Update from the infra team: corvus says TIMED_OUT is not counted in FAILURE. So we might need to brainstorm how to report the volume of TIMED_OUT errors.
16:47:23 <haleyb> mlavalle: np, i just got pinged about it this morning
16:47:37 <corvus> (to clarify, i believe every build result is reported as its own statsd/graphite metric)
16:48:30 <mlavalle> njohnston, corvus: thanks for the update
16:48:52 <njohnston> corvus: right, just want to make sure that if we see a failure line at 0, or 50, it matches how many are actually failing, so it might need some adjustment to the grafana config to accommodate some extra graphite metrics
16:49:01 <haleyb> so is there a timedout stat in addition to success/failure? been a while since i dug into graphite
16:50:19 <mlavalle> corvus: ^^^^ any guidance off the top of your head?
16:50:38 <corvus> haleyb: yes, there should be (also NODE_FAILURE, RETRY_FAILURE, POST_FAILURE, etc)
16:50:55 <corvus> but those will only show up in graphite for a particular job if it has ever reported one
16:51:04 <corvus> (they're counters, so we don't send them unless they exist)
16:51:56 * njohnston will plumb graphite
16:51:58 <haleyb> right, i remember seeing empty graphs because a job had never failed before
16:52:16 <haleyb> njohnston: good luck :)
16:52:33 <mlavalle> njohnston: thanks!
16:52:50 <mlavalle> Thanks for attending
16:53:07 <mlavalle> Next week slaweq will be back driving this meeting
16:53:15 <mlavalle> #endmeeting
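
The TIMED_OUT thread above reduces to a concrete dashboard tweak: Zuul reports each build result (SUCCESS, FAILURE, TIMED_OUT, NODE_FAILURE, ...) as its own statsd counter, so a panel that queries only the FAILURE counter under-reports broken runs. Below is a minimal grafyaml sketch of the kind of adjustment njohnston describes; the metric path, job name, and panel layout are illustrative assumptions (the exact statsd prefix depends on the Zuul deployment), not taken from the actual neutron-failure-rate dashboard:

    - title: Functional job failure rate (check queue)
      span: 4
      targets:
        # transformNull() maps a never-emitted counter to 0, since Zuul only
        # sends a result counter once that result has actually occurred
        - target: alias(movingAverage(asPercent(transformNull(stats_counts.zuul.pipeline.check.job.neutron-functional.FAILURE), sumSeries(transformNull(stats_counts.zuul.pipeline.check.job.neutron-functional.{SUCCESS,FAILURE,TIMED_OUT}))), '12hours'), 'Failure')
        # graph timeouts as their own series instead of assuming FAILURE covers them
        - target: alias(movingAverage(asPercent(transformNull(stats_counts.zuul.pipeline.check.job.neutron-functional.TIMED_OUT), sumSeries(transformNull(stats_counts.zuul.pipeline.check.job.neutron-functional.{SUCCESS,FAILURE,TIMED_OUT}))), '12hours'), 'Timed out')

Per corvus's caveat, a job that has never timed out has no TIMED_OUT series in graphite at all, which is why the transformNull() wrapping matters: without it the panel renders the same empty graph haleyb mentions.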