16:00:39 <mlavalle> #startmeeting neutron_ci
16:00:39 <openstack> Meeting started Tue Jul 10 16:00:39 2018 UTC and is due to finish in 60 minutes.  The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:43 <openstack> The meeting name has been set to 'neutron_ci'
16:00:53 <njohnston_> o/
16:02:17 <mlavalle> I don't run this meeting regularly, so you will have to excuse my clumsiness
16:02:32 <mlavalle> #topic Actions from previous meetings
16:02:36 <haleyb> hi
16:02:54 <mlavalle> njohnston to look into adding Grafana dashboard for stable branches
16:03:04 <mlavalle> any updates on this one?
16:03:54 <njohnston> I have a change queued up for it, but I wanted to check with slaweq first, and he is on PTO
16:04:22 <njohnston> I figure once he gets back I'll make sure I understand his comment correctly and then get this going
16:04:46 <mlavalle> did he leave a comment in the change?
16:04:53 <njohnston> yes
16:05:25 <mlavalle> what's the URL of the change? I am pretty sure he will read the log of this meeting...
16:05:40 <mlavalle> that way he will know he has homework....
16:05:42 <njohnston> https://review.openstack.org/#/c/578191/
16:06:22 <njohnston> Although since ajaeger concurred I might just go ahead, since ajaeger's comment cleared up the ambiguity I had
16:06:40 <mlavalle> I agree
16:06:48 <mlavalle> just go ahead
16:07:10 <njohnston> ok I will do that right after this meeting then
16:07:21 <mlavalle> thanks for moving this forward!
16:07:40 <njohnston> np
16:07:45 <mlavalle> #action slaweq will report and investigate py35 timeouts
16:08:32 <mlavalle> Before leaving for vacation, he filed a bug for it: https://bugs.launchpad.net/neutron/+bug/1779077
16:08:32 <openstack> Launchpad bug 1779077 in neutron "Unit test jobs (py35) fails with timeout often" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:09 <mlavalle> Preparing for this meeting earlier today I looked for instances of this issue and couldn't find any
16:09:20 <mlavalle> Have any of you seen it happening?
16:09:38 <njohnston> not off the top of my head
16:10:09 <mlavalle> how about you haleyb?
16:10:24 <haleyb> no, haven't seen that
16:10:42 <haleyb> deprecation warnings like in that log always worry me though, as each takes time
16:11:50 <mlavalle> ok, to help slaweq prioritize his todos when he catches up next week, I'll leave a note in the bug indicating we discussed it today and we haven't seen it lately
16:11:53 <mlavalle> hang on
16:12:28 <njohnston> most recent one I see is in the graphql change: https://review.openstack.org/#/c/578191/ PS8
16:12:43 <njohnston> but that was on 6/28
16:13:32 <mlavalle> ok, I left a note on the bug
16:13:52 <mlavalle> #action slaweq reports a bug with tempest timeouts
16:14:46 <mlavalle> He filed this bug https://bugs.launchpad.net/neutron/+bug/1779075
16:14:46 <openstack> Launchpad bug 1779075 in neutron "Tempest jobs fails because of timeout" [Medium,Confirmed]
16:14:57 <mlavalle> It has no owner at this point in time
16:15:01 <mlavalle> any takers?
16:15:49 <mlavalle> ok, moving on
16:16:07 <mlavalle> #topic Grafana
16:16:21 <mlavalle> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:00 <mlavalle> I don't see much out of the ordinary
16:18:32 <mlavalle> what do others think?
16:19:18 <haleyb> sorry, still loading
16:19:28 <mlavalle> take your time :-)
16:19:38 <njohnston> yeah it takes a while to load the 7 day view
16:20:32 <njohnston> looks like yesterday we had a bad spike on functional test failures in the gate, up to 50%
16:20:36 <haleyb> tempest-full still at 15% failure in the gate, but that just started and is working down?
16:21:13 <haleyb> yes, functional too, so guess that will subside
16:21:35 <mlavalle> yeah I see that functional spike
16:21:51 <mlavalle> but it seems to be coming down
16:22:10 <njohnston> and it seems like there is a spike in most of the panels starting about 2 hours ago
16:22:32 <mlavalle> probably worth re-checking tomorrow
16:23:30 <njohnston> looks like almost all of the check queue panels, none of the gate queue panels, show the spike
16:24:40 <mlavalle> so it might be a bunch of patches hitting gerrit
16:24:49 <njohnston> yeah
16:24:53 <mlavalle> bringing some failures
16:25:02 <mlavalle> let's keep an eye on it
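(For context, the failure-rate panels being read here are built from per-job graphite counters. Below is a minimal sketch of the same computation; the graphite.openstack.org render endpoint and the stats_counts.zuul.pipeline.<pipeline>.job.<job>.<result> metric path are assumptions for illustration, not confirmed zuul internals.)

    # Hedged sketch, not the actual dashboard definition: compute a job's
    # failure rate the way a Grafana panel would, straight from graphite.
    # Endpoint and metric path below are assumptions.
    import requests

    GRAPHITE = "http://graphite.openstack.org/render"  # assumed endpoint

    def failure_rate(job, pipeline="check", hours=168):
        """Percent of builds that failed: FAILURE / (FAILURE + SUCCESS)."""
        base = "stats_counts.zuul.pipeline.%s.job.%s" % (pipeline, job)  # assumed path
        resp = requests.get(GRAPHITE, params={
            "format": "json",
            "from": "-%dh" % hours,
            "target": ["%s.FAILURE" % base, "%s.SUCCESS" % base],
        })
        resp.raise_for_status()
        totals = {}
        for series in resp.json():
            result = series["target"].rsplit(".", 1)[-1]
            totals[result] = sum(v for v, _ in series["datapoints"] if v)
        failed = totals.get("FAILURE", 0)
        total = failed + totals.get("SUCCESS", 0)
        return 100.0 * failed / total if total else 0.0

    print(failure_rate("neutron-functional", pipeline="gate"))  # example job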
16:26:13 <mlavalle> ok, let's move on
16:26:27 <mlavalle> #topic Tempest tests
16:27:24 <mlavalle> For about two weeks we have been seeing this bug https://bugs.launchpad.net/tempest/+bug/1775947
16:27:24 <openstack> Launchpad bug 1775947 in tempest "tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest failing" [Medium,Confirmed] - Assigned to Deepak Mourya (mourya007)
16:27:38 <mlavalle> I saw it several times over the past few days
16:27:56 <mlavalle> and saw it again while investigating earlier today, preparing for this meeting
16:28:11 <mlavalle> Left a note in the bug itself
16:28:47 <mlavalle> There is a related patch for Tempest:  https://review.openstack.org/# /c/578765/
16:29:02 <mlavalle> oooops
16:29:09 <mlavalle> https://review.openstack.org/#/c/578765/
16:29:29 <mlavalle> that's probably better
16:30:13 <mlavalle> the action item here is probably to bug the QA guys to just merge the patch
16:30:31 <mlavalle> it has a +2 and two +1s
16:30:40 <mlavalle> I'll bug them
16:30:56 <njohnston> I think Felipe's comments are valid but they can be a follow-on change
16:31:33 <mlavalle> #action mlavalle to follow up with QA team to merge https://review.openstack.org/#/c/578765/
16:32:26 <mlavalle> ok, moving on
16:32:43 <mlavalle> #topic Grenade
16:33:12 <mlavalle> In the last meeting slaweq asked me about http://logs.openstack.org/03/563803/9/check/neutron-grenade-dvr-multinode/13338d9/logs/testr_results.html.gz
16:34:40 <haleyb> another timeout?
16:35:52 <mlavalle> I actually dug deeper and found a problem with one of my multiple port binding patches
16:36:06 <mlavalle> I corrected it here: https://review.openstack.org/#/c/414251/69/neutron/objects/ports.py@477
16:37:18 <mlavalle> and I verified that the issue is gone
16:37:25 <njohnston> interesting
16:37:36 <mlavalle> Matt also confirmed he is not seeing it anymore
16:38:23 <mlavalle> btw, https://review.openstack.org/#/c/414251 is ready for review
16:38:42 <mlavalle> I think it's good to merge, so if you have time, please take a look
16:39:04 <mlavalle> ok moving on
16:39:36 <mlavalle> There was one final action item from last meeting: #action njohnston to ask infra team if TIMED_OUT is included in FAILURE for grafana graphs
16:39:48 <mlavalle> did you have a chance to talk to them?
16:40:22 <njohnston> My apologies I completely forgot about this.
16:40:31 <mlavalle> no problem
16:40:59 <mlavalle> whenever you have a chance
16:41:09 <mlavalle> #topic Open Agenda
16:41:23 <mlavalle> Any other topics we should discuss today?
16:41:32 <haleyb> i had a question
16:41:48 <haleyb> i recently tagged all the neutron repos
16:42:13 <haleyb> weren't we going to tag neutron-tempest-plugin too?  i'm trying to remember if i forgot something
16:42:13 <mlavalle> stable branches, right?
16:42:20 <haleyb> right
16:42:35 <mlavalle> we never concluded that conversation
16:42:40 <haleyb> downstream ci wants to know :)
16:43:05 <mlavalle> I asked the release team
16:43:16 <mlavalle> and they pretty much said it was up to us
16:44:20 <mlavalle> let me dig in my todos and I'll get back to you
16:44:32 <haleyb> then i guess we should, unless there was some objection i don't remember about branchless, etc
16:44:53 <mlavalle> the only objection came from amotoki
16:45:11 <mlavalle> exactly around the branchless nature of that repo
16:45:43 <haleyb> mlavalle: ack.  just an fyi that some downstream tools we use only pull when a new tag appears, so haven't pulled since an initial setup was done
16:46:15 <mlavalle> but I remember that other projects are releasing / tagging their tempest plugin repos
16:46:28 <mlavalle> I'll get back to you soon
16:46:39 <njohnston> Update from the infra team: corvus says TIMED_OUT is not counted in FAILURE.  So we might need to brainstorm how to report volume of TIMED_OUT errors.
16:47:23 <haleyb> mlavalle: np, i just got pinged about it this morning
16:47:37 <corvus> (to clarify, i believe every build result is reported as its own statsd/graphite metric)
16:48:30 <mlavalle> njohnston, corvus: thanks for the update
16:48:52 <njohnston> corvus: right, just want to make sure that if we see a failure line at 0, or 50, that it matches how many are actually failing, so it might need some adjustment to the grafana to accommodate some extra graphite metrics
16:49:01 <haleyb> so is there a timedout stat in addition to success/failure?  been a while since i dug into graphite
16:50:19 <mlavalle> corvus: ^^^^ any guidance from the top of your head?
16:50:38 <corvus> haleyb: yes, should be (also NODE_FAILURE, RETRY_FAILURE, POST_FAILURE, etc)
16:50:55 <corvus> but those will only show up in graphite for a particular job if it's ever reported one
16:51:04 <corvus> (they're counters, so we don't send them unless they exist)
16:51:56 * njohnston will plumb graphite
16:51:58 <haleyb> right, remember seeing empty graphs because a job had never failed before
16:52:16 <haleyb> njohnston: good luck :)
16:52:33 <mlavalle> njohnston: thanks!
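(A rough sketch of the graphite plumbing njohnston volunteered for, covering the extra result counters corvus listed. Same assumed endpoint and metric path as the sketch above; per corvus's note, series a job has never reported simply come back missing and count as zero here.)

    # Hedged sketch: sum every build-result counter for a job so TIMED_OUT,
    # NODE_FAILURE, etc. can be compared against FAILURE. Endpoint and
    # metric path are the same assumptions as the earlier sketch.
    import requests

    GRAPHITE = "http://graphite.openstack.org/render"  # assumed endpoint
    RESULTS = ("SUCCESS", "FAILURE", "TIMED_OUT",
               "NODE_FAILURE", "RETRY_FAILURE", "POST_FAILURE")

    def result_counts(job, pipeline="gate", hours=168):
        base = "stats_counts.zuul.pipeline.%s.job.%s" % (pipeline, job)  # assumed path
        resp = requests.get(GRAPHITE, params={
            "format": "json",
            "from": "-%dh" % hours,
            "target": ["%s.%s" % (base, r) for r in RESULTS],
        })
        resp.raise_for_status()
        counts = dict.fromkeys(RESULTS, 0)  # never-reported counters stay 0
        for series in resp.json():
            result = series["target"].rsplit(".", 1)[-1]
            counts[result] = sum(v for v, _ in series["datapoints"] if v)
        return counts

    # e.g. check whether TIMED_OUT is big enough to deserve its own panel
    # (hypothetical job name, for illustration only):
    print(result_counts("neutron-grenade-dvr-multinode"))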
16:52:50 <mlavalle> Thanks for attending
16:53:07 <mlavalle> Next week slaweq will be back driving this meeting
16:53:15 <mlavalle> #endmeeting