16:00:22 #startmeeting neutron_ci
16:00:23 Meeting started Tue Jun 26 16:00:22 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:26 hi
16:00:27 o/
16:00:28 The meeting name has been set to 'neutron_ci'
16:00:35 o/
16:01:21 haleyb: are You around?
16:02:09 * slaweq is at the airport so sorry if there are some issues with the connection
16:02:38 ok, maybe haleyb will join later
16:02:41 let's start
16:02:41 going on vacation?
16:02:56 no, I'm going back home from an internal training
16:03:02 ahhh
16:03:07 have a safe flight
16:03:11 thx :)
16:03:13 #topic Actions from previous meetings
16:03:27 we have only one action from last week
16:03:29 njohnston to look into adding Grafana dashboard for stable branches
16:04:19 I have started working on that but got pulled into some other things; I plan to push something for people to review later this week
16:04:30 Apologies for the delay
16:04:33 ok, thx for the update
16:04:40 no problem, it's not very urgent :)
16:04:46 #action njohnston to look into adding Grafana dashboard for stable branches
16:04:55 You have it for next week :)
16:05:00 no apologies needed. we all have sponsors who help us pay the bills
16:05:01 thanks
16:05:13 and they have their priorities
16:05:22 mlavalle: well said :)
16:05:43 #topic Grafana
16:05:52 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:07:11 generally it doesn't look very bad this week IMO
16:07:39 but I don't know why there is a "gap" in the graphs from yesterday and Sunday
16:07:40 The tempest linux bridge job has had some failures
16:08:14 mlavalle: are You talking about this "spike" in the gate queue?
16:08:24 yeah
16:08:57 I was checking that and I found a few issues not related to neutron, like e.g. 503 errors from cinder: http://logs.openstack.org/63/323963/61/check/neutron-tempest-linuxbridge/1d56164/logs/testr_results.html.gz
16:09:25 yes, I agree. What I've seen is something about Nova tagging tests
16:09:32 and I think that this high failure rate is also because there were not many runs counted this time
16:10:02 as I went through some of the failed jobs yesterday and there weren't many failed examples of this job
16:10:17 ok, let's keep an eye on it and see what happens over the next few days
16:10:25 about the tagging test failure, I have it as one of the next topics :)
16:10:32 mlavalle: I agree
16:11:00 maybe I am just pissed off with this job because it has failed in some of my patches ;-)
16:11:08 so let's now talk about some specific jobs
16:11:11 mlavalle: LOL
16:11:13 maybe
16:11:16 #topic Unit tests
16:11:38 I added UT as a topic today because I found a few failures with timeouts
16:11:54 like e.g.: http://logs.openstack.org/58/565358/14/check/openstack-tox-py35/aa30b12/job-output.txt.gz or http://logs.openstack.org/03/563803/9/check/openstack-tox-py35/a50de4a/job-output.txt.gz
16:12:22 and it was always py35
16:12:42 or maybe You saw something similar with py27 but I didn't spot it yet
16:12:53 mhhh, let me see
16:12:56 I think that this should be investigated
16:13:42 no, I saw a timeout a few minutes ago, but it wasn't unit tests
16:14:32 I have seen tempest get killed due to timeouts as well, for example: http://logs.openstack.org/61/566961/4/check/neutron-tempest-iptables_hybrid/c70896b/job-output.txt.gz#_2018-06-26_08_55_07_638631
16:15:52 I think that we should report this one as a bug, check how long a "good" run takes and then maybe figure out what the issue is
16:16:14 sounds like a plan
16:16:20 and I will try to investigate these UT tests
16:16:37 #action slaweq will report and investigate py35 timeouts
16:16:55 njohnston: tempest is probably a different issue
16:17:13 I know that it happens sometimes and we probably should also check it
16:18:33 njohnston: would You like to investigate it?
16:19:47 ok, I will report it as a bug too and I will see if I'm able to check it this week
16:20:12 #action slaweq reports a bug with tempest timeouts, like on http://logs.openstack.org/61/566961/4/check/neutron-tempest-iptables_hybrid/c70896b/job-output.txt.gz
16:20:27 ok, let's go to the next one
16:20:29 #topic Functional
16:20:50 I saw at least a few times something like: * http://logs.openstack.org/58/574058/10/check/neutron-functional/f38d685/logs/testr_results.html.gz
16:21:04 it might be related to https://bugs.launchpad.net/neutron/+bug/1687027
16:21:04 Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed]
16:21:31 it has happened at least a few times this week, so I think we should also dig into it and maybe find a way to fix it :)
16:24:00 and those are different tests failing but with a similar error; a second example from the last few days: * http://logs.openstack.org/61/567461/4/check/neutron-functional/81d69a4/logs/testr_results.html.gz
16:24:35 any volunteer to debug this issue?
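(For reference on the timeout failures discussed above: a minimal sketch of the per-test timeout pattern that oslotest/testtools-based suites such as neutron's typically use. OS_TEST_TIMEOUT arms a SIGALRM-based fixture, so a slow test is interrupted mid-call and what gets reported is often a secondary traceback, e.g. the IndexError above, rather than a plain "timed out". The class and test names below are illustrative only, not neutron's actual base class.)

    import os

    import fixtures
    import testtools


    class TimeoutExampleTest(testtools.TestCase):
        """Illustrative only; not neutron's real base test class."""

        def setUp(self):
            super(TimeoutExampleTest, self).setUp()
            # OS_TEST_TIMEOUT is the per-test budget in seconds; 0 disables it.
            timeout = int(os.environ.get('OS_TEST_TIMEOUT', 0))
            if timeout > 0:
                # gentle=True raises fixtures.TimeoutException inside the test
                # instead of killing the whole worker process, which is why a
                # timed-out test often surfaces as an unrelated-looking error.
                self.useFixture(fixtures.Timeout(timeout, gentle=True))

        def test_slow_operation(self):
            # Anything here that runs past OS_TEST_TIMEOUT gets interrupted.
            self.assertTrue(True)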
16:24:40 o/
16:24:45 thx mlavalle :)
16:25:02 I assigned it to myself
16:25:09 #action mlavalle will check the functional tests issue, https://bugs.launchpad.net/neutron/+bug/1687027
16:25:09 Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:25:13 ok, thx a lot
16:25:51 I will not talk about fullstack tests today as the failure rate has been quite good recently and I didn't have time to dig into that one often-failing test yet
16:25:59 so the next topic is
16:26:04 #topic Scenarios
16:26:30 As mlavalle mentioned before, there is some issue with the tagging test from tempest
16:26:42 bug report: https://bugs.launchpad.net/tempest/+bug/1775947
16:26:42 Launchpad bug 1775947 in tempest "tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest failing" [Medium,Confirmed] - Assigned to Deepak Mourya (mourya007)
16:26:59 yeah I've seen it more in api tests
16:27:00 an example of the failure is e.g. on http://logs.openstack.org/44/575444/4/gate/neutron-tempest-linuxbridge/118cc97/logs/testr_results.html.gz
16:27:05 not as much in scenario tests
16:27:10 ahh, yes
16:27:14 sorry, my bad
16:27:25 I mixed them up as it's "tempest" in both cases :)
16:27:32 it's in the tempest api tests
16:27:43 so let's change the topic to
16:27:43 and yes, I see it mostly in the linuxbridge job
16:27:52 #topic tempest tests
16:27:54 :)
16:28:22 I didn't check if it's only related to the linuxbridge job
16:28:47 what I saw often was something like "process exited with errcode 137" on the VM
16:28:59 while doing "curl" to the metadata service
16:29:24 right, I've also seen the message about the metadata service
16:31:20 for now it is reported against tempest, as I suspected at first glance that it might be an issue with removing the port from the instance "too fast" while doing curl from the VM to the metadata API
16:31:32 but now I'm not sure if that is the (only) issue
16:31:52 I will try to investigate it more as I was already checking it a bit
16:32:05 but I'm not sure if I will be able to do it this week
16:32:27 #action slaweq will investigate the tagging tempest test issue
16:33:18 another issue which I found and wanted to raise here is a failure in the neutron-tempest-dvr job:
16:33:20 http://logs.openstack.org/14/529814/16/check/neutron-tempest-dvr/d9696d5/logs/testr_results.html.gz
16:33:41 I think I spotted it only once but maybe You saw it somewhere too?
16:34:14 No, I haven't seen it
16:34:25 But I'll keep an eye open for it
16:35:13 thx mlavalle
16:36:12 I also found a timeout issue with the tempest-full-py3 job: http://logs.openstack.org/61/566961/4/gate/tempest-full-py3/4ca37f8/job-output.txt.gz
16:36:51 but I think that's the same issue as the one pointed out by njohnston earlier
16:36:59 so it's already assigned to me :)
16:37:04 lol
16:37:20 anything else to add here?
16:37:27 not from me
16:37:35 ok, moving to the next topic then
16:37:37 #topic Rally
16:37:47 There was an issue related to rally last week (a spike on grafana) but it was fixed quickly on the rally side after a talk with andreykurilin
16:37:52 it's just FYI :)
16:38:02 ack
16:38:07 today boden spotted a new issue with rally for the stable/queens branch
16:38:14 https://bugs.launchpad.net/neutron/+bug/1778714
16:38:14 Launchpad bug 1778714 in neutron "neutron-rally-neutron fails with `NeutronTrunks.create_and_list_trunk_subports` in in any platform" [Critical,New]
16:38:25 but it looks like the patch which he proposed fixes this problem
16:38:48 is that a neutron patch?
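(Side note on the "errcode 137" mentioned in the tempest discussion above: exit code 137 is 128+9, i.e. the process was killed with SIGKILL, for example by a timeout wrapper or by the guest itself, rather than curl failing on its own. Conceptually the device-tagging test just keeps polling the metadata service from inside the guest, roughly like the sketch below; this is an illustration, not the actual tempest code, and the URL, timeouts, and function name are assumptions.)

    import time
    import urllib.request

    # Standard OpenStack metadata endpoint; the real test drives curl over SSH
    # inside the guest instead of running Python there.
    METADATA_URL = 'http://169.254.169.254/openstack/latest/meta_data.json'


    def wait_for_metadata(timeout=60, interval=5):
        """Poll the metadata service until it answers or the deadline passes."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                with urllib.request.urlopen(METADATA_URL, timeout=10) as resp:
                    return resp.read().decode('utf-8')
            except OSError:
                # Not reachable yet (or the port was already unplugged);
                # keep retrying until the deadline.
                time.sleep(interval)
        raise RuntimeError('metadata service did not answer in %ss' % timeout)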
16:38:49 I also talked with andreykurilin on the rally channel and he confirmed to me just before the meeting that this should help :)
16:39:14 yes: https://review.openstack.org/#/c/578104/
16:39:15 patch 578104 - neutron (stable/queens) - Use rally 0.12.1 release for stable/pike branch.
16:39:57 ok, added to my pile
16:40:13 thx mlavalle
16:40:35 so I guess we can go to the next topic now
16:40:45 #topic Grenade
16:40:56 I also found an issue in the neutron-grenade-dvr-multinode job:
16:41:01 http://logs.openstack.org/03/563803/9/check/neutron-grenade-dvr-multinode/13338d9/logs/testr_results.html.gz
16:41:09 did You see it already?
16:41:34 I think I did, yeah
16:41:50 so it's also something which we should check
16:42:19 This specific failure is the instance failing to become active
16:42:25 it looks to me like something which can potentially be related to neutron
16:42:44 the L2 agent could not configure the port, or the server didn't send the notification to nova
16:43:21 or maybe it's just a misconfiguration problem, as the error message states that it didn't become active in 196 seconds
16:43:38 and IIRC nova-compute waits for the port to become active for 300 seconds by default
16:47:07 the strange thing for me is that this instance uuid is not in the nova-compute logs at all
16:47:28 but I don't know exactly if it should be there
16:48:41 mlavalle: but in the neutron-server logs there are some errors related to dvr somehow: http://logs.openstack.org/03/563803/9/check/neutron-grenade-dvr-multinode/13338d9/logs/screen-q-svc.txt.gz?level=ERROR
16:48:53 check the first 3 lines
16:49:02 do You think that it might be related?
16:51:10 mlavalle: are You still here? :)
16:51:14 mhhh
16:51:18 I was looking
16:51:23 ahh, ok :)
16:51:32 at first glance I don't think so
16:51:53 the messages refer to DVR ports
16:52:09 and multiple port bindings only apply to compute ports
16:52:20 but I will keep an eye open anyways
16:52:26 thanks for bringing it up
16:52:30 thx
16:53:01 ok, that's all from me regarding CI failures for this week
16:53:15 do You have anything else?
16:53:33 nope, thanks for the thorough update, as usual
16:53:43 Quick question, do the grafana graphs include just jobs that end in ERROR, or do they also include the increasing number of jobs that end in TIMED_OUT status as reported back in Zuul?
16:53:59 njohnston: good question
16:54:06 I don't know to be honest
16:54:36 don't know either
16:54:43 better ask in the infra channel
16:55:14 according to the config file of the dashboard: https://git.openstack.org/cgit/openstack-infra/project-config/tree/grafana/neutron.yaml
16:55:30 it's FAILURE but I have no idea how this value is calculated :)
16:55:39 right
16:55:47 OK, I will ask in the infra channel and report back
16:55:54 njohnston: thx a lot
16:56:14 if You don't have anything else, I have one more thing to tell You
16:56:17 #action njohnston to ask the infra team if TIMED_OUT is included in FAILURE for grafana graphs
16:56:30 go ahead
16:56:40 for the next 2 weeks I will be on PTO (\o/) so I will not be able to run this meeting
16:57:00 You will need to find someone else to replace me or just cancel the meetings
16:57:28 I will be able to chair it again on 17.07
16:57:33 Next week will be slow anyway with the July 4th holiday
16:57:48 so it will probably be best to cancel it
16:57:58 but I can run it on the 10th
16:58:12 ok, I will send an email about canceling next week's meeting
16:58:25 and thx for handling the 10th of July :)
16:58:29 if you trust I can handle it, that is
16:58:33 thanks mlavalle
16:58:46 #action slaweq will cancel next week's meeting
16:58:59 mlavalle: You will do it better than me for sure :)
16:59:12 doubt it. I'll do it anyway
16:59:13 ok, thx for attending today
16:59:14 * haleyb waves in the final minute, just got back and will go through the logs
16:59:24 hi haleyb :)
16:59:33 see You all
16:59:35 bye slaweq :)
16:59:39 what airport are you at, slaweq?
16:59:41 #endmeeting