16:00:03 <mlavalle> #startmeeting neutron_ci
16:00:04 <openstack> Meeting started Tue Aug 27 16:00:03 2019 UTC and is due to finish in 60 minutes.  The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:08 <openstack> The meeting name has been set to 'neutron_ci'
16:00:12 <njohnston> o/
16:01:07 <ralonsoh> hi
16:02:01 <bcafarel> hi again
16:02:35 <mlavalle> I am not slaweq so it will be hard to sub for him but I'll do my best
16:03:11 <mlavalle> Please remember to open the grafana dashboard now: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:36 <mlavalle> #topic Actions from previous meetings
16:03:48 <mlavalle> ralonsoh to review the CI (functional tests), search for error patterns and open bugs if needed
16:03:55 <mlavalle> ralonsoh: ^^^^
16:03:56 <ralonsoh> yes
16:04:02 <ralonsoh> I found this one
16:04:23 <ralonsoh> #link https://review.opendev.org/#/c/678262
16:04:27 <ralonsoh> well, this is the patch
16:04:38 <ralonsoh> and, btw, liuyulong found another error
16:04:48 <ralonsoh> I'm working on the patch right now
16:05:00 <ralonsoh> I'll push the second one after the meeting
16:05:10 <mlavalle> thanks for the update!
16:05:12 <ralonsoh> that's all
16:05:33 <mlavalle> mlavalle will continue debugging router migration bug: https://bugs.launchpad.net/neutron/+bug/1838449
16:05:35 <openstack> Launchpad bug 1838449 in neutron "Router migrations failing in the gate" [Medium,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:05:52 <mlavalle> I have continued working on this one. Here's the situation
16:06:22 <mlavalle> 1) Before starting the migration, the test sets the router's admin_state_up to false
16:07:06 <mlavalle> 2) The server is expected to set the router's ports to DOWN after that
16:07:30 <mlavalle> 3) Problem is that, in some cases, those ports are not set to DOWN
16:08:29 <mlavalle> 4) This is because one of the L3 agents involved is not supposed to get any routers here: https://github.com/openstack/neutron/blob/4f6b8bb3e55dfa564e87d90e4c1257d0153ef141/neutron/agent/l3/agent.py#L730 but in the failing runs it does
16:09:34 <mlavalle> so the agent never removes the router's interfaces. I am investigating why the server is returning the router. I have this DNM patch: https://review.opendev.org/#/c/677098/
16:09:57 <mlavalle> problem is that so far, all the migrations have succeeded after my re-checks
16:10:09 <mlavalle> so I'll keep pulling the thread
16:10:15 <mlavalle> any questions?
16:10:35 <ralonsoh> mlavalle, can you guess why this is happening?
16:10:47 <ralonsoh> or what is generating this?
16:11:08 <mlavalle> I am suspecting the "related routers" mechanism
16:11:21 <mlavalle> and some sort of race condition
16:11:51 <ralonsoh> thanks!
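(For reference, the flow being debugged boils down to the sketch below: admin-down the router, then wait for its ports to report DOWN. This is only an illustration using python-neutronclient-style calls; the helper name and timeouts are made up, and it is not the actual tempest scenario code.)

    import time

    def wait_for_router_ports_down(client, router_id, timeout=60, interval=5):
        # Step 1: set the router's admin_state_up to False before migrating it
        client.update_router(router_id, {'router': {'admin_state_up': False}})
        deadline = time.time() + timeout
        while time.time() < deadline:
            # Steps 2/3: the server should flip all of the router's ports to DOWN
            # once the L3 agents stop managing the router
            ports = client.list_ports(device_id=router_id)['ports']
            if ports and all(p['status'] == 'DOWN' for p in ports):
                return
            time.sleep(interval)
        raise AssertionError('router %s still has ports not in DOWN state' % router_id)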
16:12:10 <mlavalle> njohnston to look at errors in the neutron-tempest-postgres-full periodic job
16:12:30 <njohnston> So I need to get in touch with #openstack-infra about that
16:12:40 <njohnston> With the move to Swift I don't know where the new log location is
16:12:51 <njohnston> But if you look at the old location: http://logs.openstack.org/periodic/opendev.org/openstack/neutron/master/neutron-tempest-postgres-full/
16:13:09 <njohnston> you can see that there was one successful run on 8/24, which is the only run registered since 8/15
16:13:26 <njohnston> So I am guessing the issues can be found in the new log location
16:13:34 <mlavalle> ok
16:13:40 <mlavalle> so to summarize:
16:13:43 <njohnston> So I will send an email to the ML to see where the new location is, and continue from there
16:14:45 <mlavalle> #action ralonsoh will continue working on error patterns and open bugs for functional tests
16:14:54 <ralonsoh> ok
16:15:31 <mlavalle> #action mlavalle will continue debugging https://bugs.launchpad.net/neutron/+bug/1838449
16:15:33 <openstack> Launchpad bug 1838449 in neutron "Router migrations failing in the gate" [Medium,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:15:57 <mlavalle> #action njohnston will get the new location for periodic jobs logs
16:16:12 <mlavalle> is that correct?
16:16:28 <ralonsoh> I think so
16:17:11 <mlavalle> ok
16:17:14 <mlavalle> #topic Stadium projects
16:17:45 <mlavalle> For Python 3 migration I think we had a good update during the Neutron meeting
16:17:45 <njohnston> We covered the python 3 topic for stadium pretty well in the neutron team meeting earlier
16:18:04 <mlavalle> lajoskatona made progress on networking-odl last week. He needs help merging: https://review.opendev.org/#/q/status:open+project:openstack/networking-odl+branch:master+topic:use-latest-odl
16:18:19 <mlavalle> networking-bagpipe: merged releasenotes: https://review.opendev.org/#/c/677648/.
16:18:26 <mlavalle> did I miss anything?
16:19:26 <mlavalle> I'll take that as a no
16:19:45 <mlavalle> in regards to tempest-plugins migration
16:19:54 <mlavalle> let's look at the etherpad
16:20:06 <mlavalle> https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:20:27 <njohnston> we were able to mark fwaas done last week thanks to slawek
16:20:54 <mlavalle> yeap
16:21:12 <mlavalle> and neutron-dynamic-routing and neutron-vpnaas are still in progress
16:21:21 <mlavalle> I will work on vpnaas this week
16:21:56 <mlavalle> tidwellr is working on dynamic routing
16:22:34 <mlavalle> ok, let's look at:
16:22:36 <mlavalle> #topic Grafana
16:24:21 <mlavalle> well, we had a peak yesterday in tempest and grenade
16:24:36 <mlavalle> but things seem to be getting back to normal
16:25:14 <mlavalle> any observations?
16:25:14 <njohnston> either something was broken and got fixed, or the 4 runs were all just for the same broken job
16:25:24 <mlavalle> yeap
16:25:38 <njohnston> The openstack-tox-docs jobs look high, peaking at 20%, but based on the volume of jobs that's only 2 or 3 failures - which is not really much at all.  I think that same story applies for most of the other panels in the check queue.
16:25:56 <bcafarel> yes recent runs look good for tempest/grenade
16:26:20 <bcafarel> and docs job may be amotoki's series of pdf docs reviews
16:26:29 <mlavalle> that's true
16:27:10 <njohnston> I am worried about data loss for the gate queue; I know I have seen more things merge than the quantitative graphs would have us believe
16:27:43 <njohnston> data loss as in Grafana/elasticsearch data loss
16:27:58 <mlavalle> should we take this as a warning and watch this week?
16:28:04 <clarkb> elasticsearch is a best effort due to sheer volume of data
16:28:10 <clarkb> grafana should be fairly accurate though
16:28:18 <mlavalle> or should we do something more urgent?
16:28:35 <clarkb> also we had a gap in elasticsearch after the switch to swift hosted logs, we've since plugged it but there is a hole there
16:29:30 <njohnston> the only thing that suffers is the utility of grafana, so it's only as big a deal as we deem it to be.  I think let's just check for a week, and if we can show that we're losing significant amounts of job results then we look to infra.  But better to have facts instead of suppositions and suspicions.
16:29:53 <mlavalle> fair enough
16:30:29 <mlavalle> ok, let's move on then
16:30:43 <mlavalle> #topic fullstack/functional
16:30:57 <amotoki> regarding the openstack-tox-docs failure, I am not sure it is really triggered by the pdf docs reviews, but we are using a neutron review to check the infra job change. let's see what happens next week.
16:31:28 <mlavalle> I found this https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_21/678021/4/check/neutron-functional/7e999b2/testr_results.html.gz
16:31:51 <mlavalle> njohnston: this is what you were talking about during the Neutron meeting, right?
16:32:59 <njohnston> hmm, did I mention that?  My mind is blanking.
16:33:31 <mlavalle> you didn't mention this failure
16:33:45 <mlavalle> but you mentioned the sighup stuff for oslo
16:34:06 <bcafarel> is that oslo issue a recent one?
16:34:26 <njohnston> Ah, yes!  I'm not sure how oslo.service works for functional tests, but it might be at fault.
16:35:17 <njohnston> I don't have a precise answer as far as when the oslo issue started, but my impression is that it has been happening for some time.
16:35:41 <mlavalle> ok, you wanted to make sure it's the same stuff
16:35:42 <bcafarel> ack that was my general impression too
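(As a rough illustration of the SIGHUP concern: a functional test typically spawns the agent as a child process and sends it SIGHUP, expecting oslo.service to reload rather than exit. The sketch below is hypothetical; the helper name, sleeps, and command are placeholders, not neutron's actual functional test code.)

    import os
    import signal
    import subprocess
    import time

    def assert_survives_sighup(cmd):
        # cmd would be something like ['neutron-l3-agent', '--config-file', ...]
        proc = subprocess.Popen(cmd)
        time.sleep(5)                      # give the service time to start
        os.kill(proc.pid, signal.SIGHUP)   # oslo.service should trap this and reload
        time.sleep(5)
        # the process must still be running; exiting here would indicate the bug
        assert proc.poll() is None, 'process exited on SIGHUP instead of reloading'
        proc.terminate()
        proc.wait()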
16:36:24 <mlavalle> ok, let's move on
16:36:39 <mlavalle> #topic Open discussion
16:36:48 <mlavalle> anything else we should discuss today?
16:38:15 <njohnston> I got an email back about the periodic jobs location - thanks clarkb
16:38:34 <njohnston> and looking at it, it looks like it is a sqlalchemy error on glance db migrations
16:39:04 <njohnston> http://paste.openstack.org/show/765665/
16:39:18 <bcafarel> so one action point (periodic jobs logs location) already down :)
16:39:18 <njohnston> so I will raise that with the glance folks
16:39:52 <njohnston> and possibly zzzeek
16:40:15 <mlavalle> ok, thanks
16:40:22 <mlavalle> thanks for attending
16:40:27 <mlavalle> #endmeeting