16:00:35 #startmeeting neutron_ci
16:00:36 Meeting started Tue Jul 24 16:00:35 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:40 The meeting name has been set to 'neutron_ci'
16:00:41 hi
16:01:17 o/
16:01:34 hi mlavalle
16:01:38 njohnston: are You around?
16:02:31 I think that there will be only 2 of us today mlavalle :)
16:02:51 slaweq: always nice to talk to you :-)
16:03:04 so let's start, maybe someone will join later
16:03:12 thx :)
16:03:28 and it's also very nice to talk to You :)
16:03:40 so let's do it quickly
16:03:42 #topic Actions from previous meetings
16:03:50 njohnston to see if we can set the default time period on the grafana dashboard to now-7d
16:04:04 I have no idea if he did something with that
16:04:29 but it's not urgent for sure, so we can wait for next week then
16:04:40 next one
16:04:43 slaweq to talk about the issue with test_create_server_with_scheduler_hint_group_anti_affinity with the nova-neutron liaison
16:05:01 did you get a hold of him?
16:05:04 I talked with Sean last week, they were aware of this issue
16:05:22 Sean told me that patch https://review.openstack.org/#/c/583347/ should solve this
16:05:23 patch 583347 - nova - Update RequestSpec.instance_uuid during scheduling (MERGED)
16:05:27 and this patch was merged today
16:05:37 ah, that's cool
16:05:37 so I hope it will be better now :)
16:06:08 Great!
16:06:08 because it happened quite often in multinode jobs recently
16:06:15 and the last action was:
16:06:17 * mlavalle/haleyb to check dvr (dvr-ha) env and shelve/unshelve server
16:06:32 I didn't have time to do that
16:06:47 I guess haleyb also didn't have time before his PTO :)
16:07:02 can I assign it to You for next week also?
16:07:07 I'll try to follow up at the end of the week
16:07:15 yes, assign it to me
16:07:29 #action mlavalle to check dvr (dvr-ha) env and shelve/unshelve server
16:07:31 thx
16:07:55 I found that it happened once or twice in a non-DVR job also
16:08:16 http://logs.openstack.org/31/584431/2/check/tempest-full/e789052/testr_results.html.gz
16:08:22 ahhh
16:08:35 sorry
16:08:40 ignore this link please
16:08:55 it is the same test that failed, but now I checked and there is a different reason there
16:08:59 so it's not the same issue
16:09:41 ack
16:09:49 but there is another one: http://logs.openstack.org/26/575326/13/check/neutron-tempest-multinode-full/6886894/logs/testr_results.html.gz
16:10:02 this looks at first glance like a similar issue
16:10:08 and it's not dvr
16:11:07 but in the neutron-tempest-dvr-ha-multinode-full job it happens very often - I found 16 examples when I was checking patches from the last few days
16:11:11 yeah, looks similar
16:11:38 ok, if You need any help on that, ping me :)
16:11:48 I'm not a DVR expert but I can try to help maybe
16:12:09 moving on to next topic?
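
Side note on the affinity failures discussed above: the sketch below is a rough reconstruction, with python-novaclient, of what a scheduler-hint anti-affinity scenario exercises - two servers booted into an anti-affinity server group, so the scheduler has to place them on different compute hosts. It is not the actual tempest test; the auth values and the image, flavor and network IDs are placeholders.

    # Illustrative sketch only, not the tempest test itself. The auth values
    # and the IMAGE/FLAVOR/NETWORK IDs are placeholders to fill in.
    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller/identity/v3', username='demo',
        password='secret', project_name='demo',
        user_domain_id='default', project_domain_id='default')
    nova = client.Client('2.1', session=session.Session(auth=auth))

    # An anti-affinity server group asks the scheduler to keep its members
    # on distinct compute hosts.
    group = nova.server_groups.create(name='ci-anti-affinity',
                                      policies=['anti-affinity'])

    for i in range(2):
        nova.servers.create(
            name='anti-affinity-%d' % i,
            image='IMAGE_ID',
            flavor='FLAVOR_ID',
            nics=[{'net-id': 'NETWORK_ID'}],
            # the scheduler hint that ties the server to the group
            scheduler_hints={'group': group.id})
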
16:12:32 ok
16:13:04 #topic Grafana
16:13:21 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:14:09 SLOOOOOOOOOOOOW
16:14:17 yes
16:14:32 but generally speaking it looks quite good this week
16:14:50 almost all jobs are below a 20% failure rate
16:15:24 * mlavalle notices that in the 7-day view, there is a gap
16:16:01 yes, but it's only for gate jobs
16:16:12 for the check queue there is no such gap
16:16:19 sorry, it is there too
16:16:41 but I know that there were some issues with infra recently
16:16:50 the logs server was down or something like that
16:16:57 so maybe that's the reason
16:18:06 https://wiki.openstack.org/wiki/Infrastructure_Status
16:18:18 it was 2018-07-19
16:18:24 yeah, that may be the reason
16:19:25 other than the gap, it seems our CI is behaving well
16:19:34 yep, quite good imo
16:19:57 so I went today through the results of CI jobs for the latest neutron patches
16:20:08 I checked the whole first page from gerrit :)
16:20:22 and I checked what the reasons of the failures were
16:20:32 I think we can now talk about some of them
16:20:40 #topic Scenarios
16:20:43 * mlavalle thanks slaweq for doing that
16:21:14 most of those issues are related to scenario jobs in fact
16:21:53 I found some issues with unit tests and others, but when checking them, it looked like the failures were related to the patch they ran on
16:22:09 so let's go through the tempest jobs then
16:22:29 ok
16:22:31 the first job is neutron-tempest-multinode-full, which is failing quite often
16:23:06 in most of the cases it failed because of the affinity test issue mentioned at the beginning of the meeting
16:23:25 so we should see a marked improvement soon
16:23:37 here yes, I hope so
16:24:09 other than that there were issues with volumes, live-migration, tagging devices and shelve/unshelve instance
16:24:23 so except this issue with shelve - not related to neutron
16:24:45 about the issue with tagging devices, like: * http://logs.openstack.org/73/565773/3/check/neutron-tempest-multinode-full/6748276/logs/testr_results.html.gz
16:24:49 and this shelve issue is the one you assigned to me above, right?
16:24:54 right :)
16:25:00 ok
16:25:25 about the tagging issue, my patch to tempest was merged and I found in the logs that it's a volume which isn't removed from the metadata
16:25:43 I even sent an email about that a few days ago to ask for some help from the Nova and Cinder side
16:25:53 I saw it
16:26:05 I hope they will fix this issue as it looks like a real bug, not only a test issue
16:27:28 and that's all about this job basically - the issues are not related to neutron
16:27:38 \o/
16:27:39 the next job is neutron-tempest-dvr-ha-multinode-full
16:27:56 here I also found some issues with volumes and tagging devices
16:28:12 also really a lot of failures of the shelve/unshelve tests
16:28:27 and one test which may be related to neutron: * http://logs.openstack.org/48/580548/2/check/neutron-tempest-dvr-ha-multinode-full/dc195b7/logs/testr_results.html.gz
16:28:49 it looks like there was an issue with ssh to the FIP here
16:32:01 from the instance's console log it looks like there was no fixed IP configured on it at all
16:32:36 mhhh
16:35:48 it will require some more debugging IMO
16:36:02 I don't see anything wrong that could be related to this issue
16:36:33 nova properly started and then unpaused this instance, so neutron should have sent the notification that the port is active
16:36:44 so the L2 agent (probably) did its job right
16:37:25 do You see anything related to this issue there maybe?
16:37:57 no
16:38:45 do you want me to spend some time on it?
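
Regarding the "no fixed IP in the console log" observation above, a quick triage step is to take the guest console output captured in the tempest failure details and check whether the DHCP client ever got a lease. The helper below is only a sketch and assumes a CirrOS guest; the "Sending discover" / "Lease of ... obtained" patterns are an assumption and may differ between CirrOS versions.

    # Rough triage helper: save the guest console output (as captured in the
    # tempest failure details) to a file and run this script against it.
    import re
    import sys

    def dhcp_summary(console_text):
        # Count DHCP discovers and extract any obtained leases.
        discovers = re.findall(r'Sending discover', console_text)
        leases = re.findall(r'Lease of (\S+) obtained', console_text)
        if leases:
            return 'DHCP lease obtained for: %s' % ', '.join(leases)
        if discovers:
            return ('%d DHCP discovers sent but no lease obtained - '
                    'dataplane or DHCP agent suspect' % len(discovers))
        return 'no DHCP activity at all - guest networking never started'

    if __name__ == '__main__':
        with open(sys.argv[1]) as f:
            print(dhcp_summary(f.read()))
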
16:38:51 I will try to investigate it more if I have some time for it
16:39:11 You are already very busy so I will take care of it
16:39:37 thanks
16:40:09 #action slaweq to check why there was no network connection in the neutron-tempest-dvr-ha-multinode-full test
16:40:43 from other jobs, there was also one issue with ssh to an instance in neutron-tempest-iptables_hybrid: * http://logs.openstack.org/07/574907/20/gate/neutron-tempest-iptables_hybrid/1594a88/logs/testr_results.html.gz
16:42:37 but here it might be related to a volume, as it's a "boot from volume" test or something like that
16:42:52 I will also check whether that is related to neutron or not
16:42:58 ack
16:43:17 #action slaweq to check neutron-tempest-iptables_hybrid ssh connection issue
16:43:30 in other tests I found only some failures related to volumes
16:43:42 and one timeout in * neutron-tempest-plugin-dvr-multinode-scenario
16:43:53 http://logs.openstack.org/83/584383/2/check/neutron-tempest-plugin-dvr-multinode-scenario/2e3af27/job-output.txt.gz
16:44:17 but as this timeout happened only once, I don't think we should do something with it now
16:44:26 ok
16:44:29 and that's all from my side for today
16:44:42 so, all in all, we are in good shape, aren't we?
16:44:49 basically I think that we are quite good now with CI :)
16:44:56 Great
16:45:10 good job slaweq
16:45:20 I also think that there are a bit fewer rechecks on patches recently
16:45:41 and patches are merged a bit faster now IMO
16:45:52 thx mlavalle but it's not because of me :)
16:46:13 it's the whole team's work :)
16:46:16 you help a lot
16:46:28 thx :)
16:46:40 I don't have anything else for today
16:46:53 do You want to talk about something else maybe?
16:46:59 me neither, let's go and try to nail this release ;-)
16:47:14 sure
16:47:16 thx a lot
16:47:24 #endmeeting
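
For the shelve/unshelve action item above, a minimal manual check against a DVR (or DVR-HA) devstack could look like the outline below: shelve and unshelve an existing server, then confirm its floating IP answers again. It reuses the placeholder credentials from the earlier sketch; SERVER_ID and FIP are hypothetical values, and the ping at the end is deliberately simplistic. This is only an outline of the manual check, not a tempest test.

    # Outline of the manual shelve/unshelve check; SERVER_ID and FIP are
    # placeholders for a server that is already booted with a floating IP.
    import subprocess
    import time

    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller/identity/v3', username='demo',
        password='secret', project_name='demo',
        user_domain_id='default', project_domain_id='default')
    nova = client.Client('2.1', session=session.Session(auth=auth))

    SERVER_ID = 'SERVER_UUID'
    FIP = '172.24.4.10'

    def wait_for(statuses):
        # Poll until the server reaches one of the expected states.
        for _ in range(60):
            if nova.servers.get(SERVER_ID).status in statuses:
                return
            time.sleep(5)
        raise RuntimeError('timed out waiting for %s' % (statuses,))

    nova.servers.shelve(SERVER_ID)
    # Whether the server stops at SHELVED or goes straight to
    # SHELVED_OFFLOADED depends on the shelved_offload_time config.
    wait_for({'SHELVED', 'SHELVED_OFFLOADED'})
    nova.servers.unshelve(SERVER_ID)
    wait_for({'ACTIVE'})

    # After unshelve, the routers should wire the floating IP again.
    subprocess.check_call(['ping', '-c', '3', FIP])
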