16:00:03 <mlavalle> #startmeeting neutron_ci
16:00:04 <openstack> Meeting started Tue Aug 27 16:00:03 2019 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:08 <openstack> The meeting name has been set to 'neutron_ci'
16:00:12 <njohnston> o/
16:01:07 <ralonsoh> hi
16:02:01 <bcafarel> hi again
16:02:35 <mlavalle> I am not slaweq so it will be hard to sub for him but I'll do my best
16:03:11 <mlavalle> Please remember to open the grafana dashboard now: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:36 <mlavalle> #topic Actions from previous meetings
16:03:48 <mlavalle> ralonsoh to review the CI (functional tests), search for error patterns and open bugs if needed
16:03:55 <mlavalle> ralonsoh: ^^^^
16:03:56 <ralonsoh> yes
16:04:02 <ralonsoh> I found this one
16:04:23 <ralonsoh> #link https://review.opendev.org/#/c/678262
16:04:27 <ralonsoh> well, this is the patch
16:04:38 <ralonsoh> and, btw, liuyulong found another error
16:04:48 <ralonsoh> I'm working on the patch right now
16:05:00 <ralonsoh> I'll push the second one after the meeting
16:05:10 <mlavalle> thanks for the update!
16:05:12 <ralonsoh> that's all
16:05:33 <mlavalle> mlavalle will continue debugging router migration bug: https://bugs.launchpad.net/neutron/+bug/1838449
16:05:35 <openstack> Launchpad bug 1838449 in neutron "Router migrations failing in the gate" [Medium,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:05:52 <mlavalle> I have continued working on this one. Here's the situation
16:06:22 <mlavalle> 1) Before starting the migration, the test sets the router's admin_state_up to false
16:07:06 <mlavalle> 2) The server is expected to set the router's ports to DOWN after that
16:07:30 <mlavalle> 3) The problem is that, in some cases, those ports are not set to DOWN
16:08:29 <mlavalle> 4) This is because one of the L3 agents involved is not supposed to get any routers here: https://github.com/openstack/neutron/blob/4f6b8bb3e55dfa564e87d90e4c1257d0153ef141/neutron/agent/l3/agent.py#L730
16:09:34 <mlavalle> so the agent never removes the router's interfaces. I am investigating why the server is returning the router. I have this DNM patch: https://review.opendev.org/#/c/677098/
16:09:57 <mlavalle> the problem is that so far, all the migrations have succeeded after my rechecks
16:10:09 <mlavalle> so I'll keep pulling the thread
16:10:15 <mlavalle> any questions?
16:10:35 <ralonsoh> mlavalle, can you guess why this is happening?
16:10:47 <ralonsoh> or what is generating this?
16:11:08 <mlavalle> I am suspecting the "related routers" mechanism
16:11:21 <mlavalle> and some sort of race condition
16:11:51 <ralonsoh> thanks!
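[Editor's note: a minimal sketch of the check described in steps 1-3 above, added for illustration. This is not the actual neutron-tempest-plugin migration test; it assumes an authenticated python-neutronclient Client, and the helper names, timeout, and polling interval are invented for this example.]

    import time

    from neutronclient.v2_0 import client as neutron_client


    def wait_for_router_ports_down(client, router_id, timeout=60, interval=2):
        """Poll until every port owned by the router reports status DOWN."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            ports = client.list_ports(device_id=router_id)['ports']
            if ports and all(p['status'] == 'DOWN' for p in ports):
                return
            time.sleep(interval)
        raise AssertionError('router %s still has ports not in DOWN state'
                             % router_id)


    def disable_router_before_migration(client, router_id):
        # Step 1: set the router's admin_state_up to False before migrating.
        client.update_router(router_id, {'router': {'admin_state_up': False}})
        # Steps 2/3: the server is expected to set the router's ports to DOWN;
        # the gate failure discussed above is this kind of wait timing out.
        wait_for_router_ports_down(client, router_id)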
16:12:10 <mlavalle> njohnston to look at errors in the neutron-tempest-postgres-full periodic job
16:12:30 <njohnston> So I need to get in touch with #openstack-infra about that
16:12:40 <njohnston> With the move to Swift I don't know where the new log location is
16:12:51 <njohnston> But if you look at the old location: http://logs.openstack.org/periodic/opendev.org/openstack/neutron/master/neutron-tempest-postgres-full/
16:13:09 <njohnston> you can see that there was one successful run on 8/24, which is the only run registered since 8/15
16:13:26 <njohnston> So I am guessing the issues can be found in the new log location
16:13:34 <mlavalle> ok
16:13:40 <mlavalle> so to summarize:
16:13:43 <njohnston> So I will send an email to the ML to see where the new location is, and continue
16:14:45 <mlavalle> #action ralonsoh will continue working on error patterns and open bugs for functional tests
16:14:54 <ralonsoh> ok
16:15:31 <mlavalle> #action mlavalle will continue debugging https://bugs.launchpad.net/neutron/+bug/1838449
16:15:33 <openstack> Launchpad bug 1838449 in neutron "Router migrations failing in the gate" [Medium,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:15:57 <mlavalle> #action njohnston will get the new location for periodic jobs logs
16:16:12 <mlavalle> is that correct?
16:16:28 <ralonsoh> I think so
16:17:11 <mlavalle> ok
16:17:14 <mlavalle> #topic Stadium projects
16:17:45 <mlavalle> For Python 3 migration I think we had a good update during the Neutron meeting
16:17:45 <njohnston> We covered the python 3 topic for stadium pretty well in the neutron team meeting earlier
16:18:04 <mlavalle> lajoskatona made progress on networking-odl last week. He needs help merging: https://review.opendev.org/#/q/status:open+project:openstack/networking-odl+branch:master+topic:use-latest-odl
16:18:19 <mlavalle> networking-bagpipe: merged releasenotes: https://review.opendev.org/#/c/677648/
16:18:26 <mlavalle> did I miss anything?
16:19:26 <mlavalle> I'll take that as a no
16:19:45 <mlavalle> in regards to the tempest-plugins migration
16:19:54 <mlavalle> let's look at the etherpad
16:20:06 <mlavalle> https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:20:27 <njohnston> we were able to mark fwaas done last week thanks to slaweq
16:20:54 <mlavalle> yeap
16:21:12 <mlavalle> and neutron-dynamic-routing and neutron-vpnaas are still in progress
16:21:21 <mlavalle> I will work on vpnaas this week
16:21:56 <mlavalle> tidwellr is working on dynamic routing
16:22:34 <mlavalle> ok, let's look at:
16:22:36 <mlavalle> #topic Grafana
16:24:21 <mlavalle> well, we had a peak yesterday in tempest and grenade
16:24:36 <mlavalle> but things seem to be getting back to normal
16:25:14 <mlavalle> any observations?
16:25:14 <njohnston> either something was broken and got fixed, or the 4 runs were all just for the same broken job
16:25:24 <mlavalle> yeap
16:25:38 <njohnston> The openstack-tox-docs jobs look high, peaking at 20%, but based on the volume of jobs that's only 2 or 3 failures - which is not really much at all. I think the same story applies for most of the other panels in the check queue.
16:25:56 <bcafarel> yes recent runs look good for tempest/grenade
16:26:20 <bcafarel> and the docs job may be amotoki's series of pdf docs reviews
16:26:29 <mlavalle> that's true
16:27:10 <njohnston> I am worried about data loss for the gate queue, I know I have seen more things merge than the quantitative graphs would have us believe
16:27:43 <njohnston> data loss as in Grafana/elasticsearch data loss
16:27:58 <mlavalle> should we take this as a warning and watch this week?
16:28:04 <clarkb> elasticsearch is best effort due to the sheer volume of data
16:28:10 <clarkb> grafana should be fairly accurate though
16:28:18 <mlavalle> or should we do something more urgent?
16:28:35 <clarkb> also we had a gap in elasticsearch after the switch to swift hosted logs, we've since plugged it but there is a hole there
16:29:30 <njohnston> the only thing that suffers is the utility of grafana, so it's only as big a deal as we deem it to be. I think let's just check for a week, and if we can show that we're losing significant amounts of job results then we look to infra. But better to have facts instead of suppositions and suspicions.
16:29:53 <mlavalle> fair enough
16:30:29 <mlavalle> ok, let's move on then
16:30:43 <mlavalle> #topic fullstack/functional
16:30:57 <amotoki> regarding the openstack-tox-docs failure, I am not sure it is really triggered by the pdf docs reviews, but we use a neutron review to check the infra job review. let's see what happens next week.
16:31:28 <mlavalle> I found this https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_21/678021/4/check/neutron-functional/7e999b2/testr_results.html.gz
16:31:51 <mlavalle> njohnston: this is what you were talking about during the Neutron meeting, right njohnston?
16:32:59 <njohnston> hmm, did I mention that? My mind is blanking.
16:33:31 <mlavalle> you didn't mention this failure
16:33:45 <mlavalle> but you mentioned the sighup stuff for oslo
16:34:06 <bcafarel> is that oslo issue a recent one?
16:34:26 <njohnston> Ah, yes! I'm not sure how oslo.service works for functional tests, but it might be at fault.
16:35:17 <njohnston> I don't have a precise answer as far as when the oslo issue started, but my impression is that it has been happening for some time.
16:35:41 <mlavalle> ok, you wanted to make sure it's the same stuff
16:35:42 <bcafarel> ack that was my general impression too
16:36:24 <mlavalle> ok, let's move on
16:36:39 <mlavalle> #topic Open discussion
16:36:48 <mlavalle> anything else we should discuss today?
16:38:15 <njohnston> I got an email back about the periodic jobs location - thanks clarkb
16:38:34 <njohnston> and looking at it, it looks like it is a sqlalchemy error on glance db migrations
16:39:04 <njohnston> http://paste.openstack.org/show/765665/
16:39:18 <bcafarel> so one action point (periodic jobs logs location) already down :)
16:39:18 <njohnston> so I will raise that with the glance folks
16:39:52 <njohnston> and possibly zzzeek
16:40:15 <mlavalle> ok, thanks
16:40:22 <mlavalle> thanks for attending
16:40:27 <mlavalle> #endmeeting