16:00:03 #startmeeting neutron_ci
16:00:04 Meeting started Tue Aug 27 16:00:03 2019 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:08 The meeting name has been set to 'neutron_ci'
16:00:12 o/
16:01:07 hi
16:02:01 hi again
16:02:35 I am not slaweq so it will be hard to sub for him but I'll do my best
16:03:11 Please remember to open the grafana dashboard now: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:36 #topic Actions from previous meetings
16:03:48 alonsoh to review the CI (functional tests), search for error patterns and open bugs if needed
16:03:55 ralonsoh: ^^^^
16:03:56 yes
16:04:02 I found this one
16:04:23 #link https://review.opendev.org/#/c/678262
16:04:27 well, this is the patch
16:04:38 and, btw, liuyulong found another error
16:04:48 I'm working on the patch right now
16:05:00 I'll push the second one after the meeting
16:05:10 thanks for the update!
16:05:12 that's all
16:05:33 mlavalle will continue debugging router migration bug: https://bugs.launchpad.net/neutron/+bug/1838449
16:05:35 Launchpad bug 1838449 in neutron "Router migrations failing in the gate" [Medium,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:05:52 I have continued working on this one. Here's the situation
16:06:22 1) Before starting the migration, the test sets the router's admin_state_up to false
16:07:06 2) The server is expected to set the router's ports to DOWN after that
16:07:30 3) The problem is that, in some cases, those ports are not set to DOWN
16:08:29 4) This is because one of the L3 agents involved is not supposed to get any routers here: https://github.com/openstack/neutron/blob/4f6b8bb3e55dfa564e87d90e4c1257d0153ef141/neutron/agent/l3/agent.py#L730
16:09:34 so the agent never removes the router's interfaces. I am investigating why the server is returning the router. I have this DNM patch: https://review.opendev.org/#/c/677098/
16:09:57 the problem is that, so far, all the migrations have succeeded after my rechecks
16:10:09 so I'll keep pulling the thread
16:10:15 any questions?
16:10:35 mlavalle, can you guess why this is happening?
16:10:47 or what is generating this?
16:11:08 I am suspecting the "related routers" mechanism
16:11:21 and some sort of race condition
16:11:51 thanks!
16:12:10 njohnston to look at errors in the neutron-tempest-postgres-full periodic job
16:12:30 So I need to get in touch with #openstack-infra about that
16:12:40 With the move to Swift I don't know where the new log location is
16:12:51 But if you look at the old location: http://logs.openstack.org/periodic/opendev.org/openstack/neutron/master/neutron-tempest-postgres-full/
16:13:09 you can see that there was one successful run on 8/24, which is the only run registered since 8/15
16:13:26 So I am guessing the issues can be found in the new log location
16:13:34 ok
16:13:40 so to summarize:
16:13:43 So I will send an email to the ML to see where the new location is, and continue
16:14:45 #action ralonsoh will continue working on error patterns and open bugs for functional tests
16:14:54 ok
16:15:31 #action mlavalle will continue debugging https://bugs.launchpad.net/neutron/+bug/1838449
16:15:33 Launchpad bug 1838449 in neutron "Router migrations failing in the gate" [Medium,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:15:57 #action njohnston will get the new location for periodic jobs logs
16:16:12 is that correct?
16:16:28 I think so
16:17:11 ok
16:17:14 #topic Stadium projects
16:17:45 For Python 3 migration I think we had a good update during the Neutron meeting
16:17:45 We covered the python 3 topic for stadium pretty well in the neutron team meeting earlier
16:18:04 lajoskatona made progress on networking-odl last week. He needs help merging: https://review.opendev.org/#/q/status:open+project:openstack/networking-odl+branch:master+topic:use-latest-odl
16:18:19 networking-bagpipe: merged releasenotes: https://review.opendev.org/#/c/677648/
16:18:26 did I miss anything?
16:19:26 I'll take that as a no
16:19:45 in regards to tempest-plugins migration
16:19:54 let's look at the etherpad
16:20:06 https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:20:27 we were able to mark fwaas done last week thanks to slawek
16:20:54 yeap
16:21:12 and neutron-dynamic-routing and neutron-vpnaas are still in progress
16:21:21 I will work on vpnaas this week
16:21:56 tidwellr is working on dynamic routing
16:22:34 ok, let's look at:
16:22:36 #topic Grafana
16:24:21 well, we had a spike yesterday in tempest and grenade
16:24:36 but things seem to be getting back to normal
16:25:14 any observations?
16:25:14 either something was broken and got fixed, or the 4 runs were all just for the same broken job
16:25:24 yeap
16:25:38 The openstack-tox-docs jobs look high, peaking at 20%, but based on the volume of jobs that's only 2 or 3 failures - which is not really much at all. I think the same story applies for most of the other panels in the check queue.
16:25:56 yes, recent runs look good for tempest/grenade
16:26:20 and the docs job failures may be due to amotoki's series of pdf docs reviews
16:26:29 that's true
16:27:10 I am worried about data loss for the gate queue, I know I have seen more things merge than the quantitative graphs would have us believe
16:27:43 data loss as in Grafana/elasticsearch data loss
16:27:58 should we take this as a warning and watch this week?
16:28:04 elasticsearch is a best effort due to the sheer volume of data
16:28:10 grafana should be fairly accurate though
16:28:18 or should we do something more urgent?
16:28:35 also we had a gap in elasticsearch after the switch to swift hosted logs, we've since plugged it but there is a hole there
16:29:30 the only thing that suffers is the utility of grafana, so it's only as big a deal as we deem it to be. I think let's just check for a week, and if we can show that we're losing significant amounts of job results then we look to infra. But better to have facts instead of suppositions and suspicions.
16:29:53 fair enough
16:30:29 ok, let's move on then
16:30:43 #topic fullstack/functional
16:30:57 regarding the openstack-tox-docs failure, I am not sure it is really triggered by the pdf docs reviews, but we use a neutron review to check the infra job review. let's see what happens next week.
16:31:28 I found this https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_21/678021/4/check/neutron-functional/7e999b2/testr_results.html.gz
16:31:51 njohnston: this is what you were talking about during the Neutron meeting, right?
16:32:59 hmm, did I mention that? My mind is blanking.
16:33:31 you didn't mention this failure
16:33:45 but you mentioned the sighup stuff for oslo
16:34:06 is that oslo issue a recent one?
16:34:26 Ah, yes! I'm not sure how oslo.service works for functional tests, but it might be at fault.
16:35:17 I don't have a precise answer as far as when the oslo issue started, but my impression is that it has been happening for some time.
16:35:41 ok, you wanted to make sure it's the same stuff
16:35:42 ack, that was my general impression too
16:36:24 ok, let's move on
16:36:39 #topic Open discussion
16:36:48 anything else we should discuss today?
16:38:15 I got an email back about the periodic jobs location - thanks clarkb
16:38:34 and looking at it, it looks like it is a sqlalchemy error on glance db migrations
16:39:04 http://paste.openstack.org/show/765665/
16:39:18 so one action point (periodic jobs logs location) already down :)
16:39:18 so I will raise that with the glance folks
16:39:52 and possibly zzzeek
16:40:15 ok, thanks
16:40:22 thanks for attending
16:40:27 #endmeeting
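
The snippet below is a minimal, hypothetical sketch of the check described under bug 1838449 earlier in the log (set admin_state_up=False, then expect the router's ports to go DOWN); it is not the actual tempest test code. It assumes the openstacksdk package, a "devstack" entry in clouds.yaml, and an illustrative router named "router-under-test".

    # Hypothetical sketch, not taken from the meeting or from the gate test.
    import time

    import openstack

    conn = openstack.connect(cloud='devstack')  # assumption: clouds.yaml entry name


    def wait_for_router_ports_down(router_id, timeout=60, interval=2):
        """Return True once every port owned by the router reports status DOWN."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            ports = list(conn.network.ports(device_id=router_id))
            if ports and all(port.status == 'DOWN' for port in ports):
                return True
            time.sleep(interval)
        return False


    router = conn.network.find_router('router-under-test')  # illustrative name
    conn.network.update_router(router, admin_state_up=False)
    if not wait_for_router_ports_down(router.id):
        # This is the symptom discussed above: the ports stay ACTIVE because one
        # of the L3 agents keeps receiving the router from the server and never
        # removes its interfaces.
        print('router ports did not transition to DOWN')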