16:00:05 #startmeeting neutron_ci
16:00:09 Meeting started Tue May 15 16:00:05 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:09 hello
16:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:12 The meeting name has been set to 'neutron_ci'
16:00:32 o/
16:00:39 hi njohnston
16:00:47 o/
16:00:50 o/
16:02:01 I just pinged ihar and haleyb in the neutron channel too
16:02:11 maybe they will join us as well
16:02:22 but I think we can start
16:02:28 #topic Actions from previous meetings
16:02:37 mlavalle will check why trunk tests are failing in dvr multinode scenario
16:02:50 slaweq: didn't have time last week, sorry
16:03:32 so moving to next week?
16:03:39 yes, please
16:04:02 #action mlavalle will check why trunk tests are failing in dvr multinode scenario
16:04:08 Thanks
16:04:08 thx
16:04:16 slaweq will continue debugging slow rally tests issue
16:04:28 I also didn't have time to debug it
16:04:36 #action slaweq will continue debugging slow rally tests issue
16:04:43 slaweq to fix neutron-dynamic-routing scenario tests bug
16:05:04 fressi sent fixes: https://review.openstack.org/#/c/567736/ and https://review.openstack.org/#/c/567742/
16:05:17 both are merged now, so I hope it should be better
16:05:37 next one is:
16:05:37 slaweq to make 2 scenario jobs gating
16:05:49 ovsfw job done: https://review.openstack.org/#/c/567055/
16:06:14 neutron-tempest-plugin-scenario-linuxbridge done: https://review.openstack.org/#/c/567057/
16:06:24 eh, sorry for being late
16:06:29 hi ihar
16:06:40 the second one was merged today
16:07:09 I also made a patch to add those new jobs to the dashboard: https://review.openstack.org/#/c/567062/ and it's waiting for review now
16:07:26 mlavalle: it would be good if You could +1 it :)
16:07:57 Done
16:08:13 thx
16:08:25 but more importantly, ihar +1ed it also
16:08:38 thx ihar too :)
16:08:40 :)))
16:09:04 * slaweq wishes that 2 x +1 could be counted as +2 :)
16:09:26 and the last one from the previous week is:
16:09:28 haleyb will debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:09:29 Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:51 but haleyb is probably not here
16:10:00 and I think that he didn't do anything with that
16:10:07 it's assigned to you btw
16:10:15 ihar: yes, I know
16:10:32 but last week haleyb said that he would try to check it, so the action was assigned to him
16:10:33 :)
16:11:02 yeah I get it, just saying it should probably be assigned to him to nag him some more
16:11:07 I was trying to reproduce it on top of the master branch with some DNM patch with extra logging, but I couldn't
16:11:51 but he has some patch which causes this issue 100% of the time AFAIR - maybe I will just try to debug on top of his patch and will find something
16:12:01 I will assign it to myself now :)
16:12:02 yeah, if you assign it to him, you'll have more nagging power
16:12:10 ;-)
16:12:32 #action slaweq to debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:12:34 Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:12:49 I know that haleyb is very busy, so maybe I will try to check it
16:13:11 ok, those were all the actions from the previous week
16:13:22 let's move to the next topic then
16:13:26 #topic Grafana
16:13:32 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:15:43 how come not a single failure in fullstack / functional for 4 days!
16:16:08 (there are some lately, but there was a long trench)
16:16:27 actually 5 days
16:16:35 ihar: are You talking about the gate queue?
16:16:45 I think it was very similar here in the past
16:16:47 yea
16:16:57 oh right, it's gate, maybe nothing was in the gate
16:17:21 but speaking about fullstack - it was at a high failure rate - around 40-50% at the end of last week - but as I went through failed patches from last week I didn't find one (or 2) general reasons for such failures
16:18:04 so, entropy?
16:18:23 also, when I went through different patches today looking for some example failures, I saw quite a few patches marked WIP or something like that where many jobs were red
16:18:55 did they have fullstack failures?
16:18:59 so I didn't check the reasons for failures in such jobs, and maybe that had some impact on the graphs as well
16:19:22 mlavalle: yes, I'm talking about patches which had failures on (almost) everything - fullstack also :)
16:19:35 example of a wip job failure?
16:19:44 ok, yes, maybe they had an impact
16:20:01 ihar: give me a sec
16:20:42 e.g. https://review.openstack.org/#/c/533850/
16:21:26 https://review.openstack.org/#/c/567621/
16:21:33 https://review.openstack.org/#/c/549168/
16:21:42 what's the 'wip' you refer to?
16:22:30 ok, those are not marked as WIP, but I assumed that if every job is red then there is probably something wrong with the patch
16:23:10 I don't know the other one, but 567621 is from a co-worker of mine and it is wip
16:23:21 I'll ask him to mark it so
16:23:28 ok, gotcha. or there is an infra issue.
16:23:52 ihar: right, but then it would probably be seen on the graphs as well :)
16:24:30 mlavalle: I just wanted to say that I didn't check failures of one of the jobs in such a patch :)
16:24:52 ack
16:25:28 talking about fullstack, there were also failures like https://review.openstack.org/#/c/499908/ which are clearly related to the patch
16:26:05 so, to sum up, I didn't find anything new which would repeat many times :)
16:26:29 but I found one issue: http://logs.openstack.org/82/568282/3/check/neutron-fullstack/21baf5f/logs/testr_results.html.gz
16:26:38 which is new for me
16:27:18 so for now, just for the record - there was such an issue at least once and we should check whether it repeats more often :)
16:28:22 any questions related to fullstack tests (or grafana)?
16:28:43 I'm good
16:28:57 no
16:29:04 so let's move on
16:29:09 #topic Scenarios
16:29:54 the neutron-tempest-plugin-dvr-multinode-scenario failure rate is better now, after marking a few tests as unstable
16:29:57 but it's still not perfect
16:30:00 sorry, i had a meeting conflict and didn't see your ping, will try and watch out of one eye
16:30:11 haleyb: sure :)
16:30:44 I found one example of a failure for the dvr-multinode scenario: http://logs.openstack.org/86/567086/5/check/neutron-tempest-plugin-dvr-multinode-scenario/59d052d/job-output.txt.gz#_2018-05-15_09_04_47_980425
16:31:25 hm, AuthenticationException
16:31:37 interesting, it's usually timeouts
16:32:01 means that metadata didn't provide the ssh fingerprint
16:32:28 the test sucks, it doesn't dump instance console output
16:32:37 that would give us an idea of what fails there
16:32:42 yes, there is no console log there
16:32:44 but probably metadata is down?
16:33:36 the logs of the metadata agent look fine
16:33:56 at first glance at least :)
16:34:45 it will look just fine if the instance can't reach it
16:34:47 :)
16:35:02 but some instances reached it for sure
16:36:52 I think the next step is to make the test case dump the console
16:37:01 I can try to read the test code and check if there is maybe anything that could be changed/improved to catch the console log in such a case
16:37:02 (and probably there are more test cases that have the same issue)
16:37:23 most likely
16:37:37 are You fine with that?
16:37:37 it would be nice of you if you can do it
16:37:42 :)
16:38:01 note the dumping is often done in connect check methods
16:38:07 not directly in the test case
16:38:12 #action slaweq to check why some scenario tests don't log instance console log
16:38:26 yes, I think I saw it somewhere before already :)
16:39:35 ok, I have one more thing about scenario tests
16:39:41 and it is from frickler
16:40:28 he asked me last week to discuss whether we can switch the neutron-tempest-plugin-designate-scenario job to be voting (gating later) as it has looked quite stable for a long time
16:40:57 so I'm asking what You think about it?
16:41:10 if the failure rate is good, yes, let's do it
16:41:19 I think yes, it's a good move. it's very stable.
16:41:20 http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=9&fullscreen
16:41:43 yes, it's below 10% for most of the time, at least in the last 30 days
16:41:49 so I also think that it is stable
16:41:54 yes, let's do it
16:42:06 ok, so I will switch it to be voting for now
16:42:18 and later we can think about making it gating also - right?
16:42:37 yes
16:42:46 #action slaweq to switch neutron-tempest-plugin-designate-scenario to be voting
16:42:48 thx
16:42:59 ok, moving on
16:43:00 #topic Rally
16:43:00 thanks to you and frickler !
16:43:19 thx frickler :)
16:43:34 in rally tests there is nothing new
16:43:57 the only failures I found are still related to the global timeout, e.g.: http://logs.openstack.org/12/470912/43/check/neutron-rally-neutron/101415f/job-output.txt.gz
16:44:13 so I need to investigate it
16:44:21 this week
16:44:42 next topic is
16:44:42 #topic Periodic
16:45:02 2 jobs were failing during the last week:
16:45:36 neutron-tempest-postgres-full - failed a few times, but as I checked the logs, the failures were related to some error with volumes or nova service list (or something like that)
16:45:47 so I didn't debug it more
16:46:12 neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4 - this one should be fixed since today as fressi's patches are merged
16:46:16 psql is traditionally unstable. I don't think anyone has the taste to fix it, so as long as it passes from time to time it's good enough.
16:46:35 ihar: so it's good enough :)
16:47:05 and that was all about periodic jobs
16:47:13 anything to add? questions?
16:47:21 not from me
16:47:37 ok, so the last topic for today
16:47:38 #topic Open discussion
16:47:56 I just wanted to ask if You are fine with cancelling next week's meeting due to the summit?
16:48:03 if so, I will send an email about that
16:48:13 I was going to suggest it
16:48:17 :)
16:48:24 I am good with fewer meetings
16:48:29 I have a patch to raise
16:48:41 for unit tests
16:48:43 https://review.openstack.org/#/c/568390/
16:48:52 it is *not* ready and I will try to get it ready today
16:49:08 but if not, someone had better pick it up from me, because the next time I will get to it is next Monday
16:49:14 and it fixes a unit test gate issue
16:49:27 ok ihar, go ahead :)
16:49:30 (should be easy to resolve the last standing failure there though, just a heads-up)
16:49:41 ignore my last sentence :)
16:49:43 ok, thanks for the heads up
16:50:30 ihar: I can help You with it if You want
16:50:57 I have one item I wanted to share
16:50:58 just take it over if I don't get it done today
16:51:05 ihar: sure
16:51:18 jlibosva: go on
16:51:31 we have a community goal to switch to python3
16:51:38 we used to have functional and fullstack experimental jobs
16:52:08 we spent some time in the past fixing bugs so that the functional suite can be executed fine without any errors
16:52:19 we had one last thing missing, which was caused by a bug in oslo.service
16:52:27 aaaaand, this bug has been fixed after a year or so :)
16:52:38 \o/
16:52:39 but we lost the experimental jobs
16:52:44 so I have a patch to add them back - https://review.openstack.org/#/c/568282/
16:53:01 in the meantime, there are new issues in the functional job; I'm working on fixes. I think there are only two issues
16:53:28 I think it would be worth having the functional job passing and then moving it to the check queue - one more thing that could make zuul vote -1, right? :)
16:53:34 what are your opinions?
16:53:47 maybe as non-voting for the beginning
16:53:57 +1
16:54:00 I agree with your proposal
16:54:08 I mean, it will create yet another job and it seems we have a few already in Neutron :-/
16:54:08 of course non-voting first, but otherwise I just +w'd the jobs
16:54:14 ihar: thanks
16:54:20 as slaweq says, non-voting first
16:54:24 ok, I'll be working on patches for it, thanks :)
16:54:27 more jobs are fine if we manage the failures
16:54:34 thx jlibosva
16:54:40 gaining some Python 3 knowledge on the fly
16:54:44 arguably the project is doing great so far
16:55:55 ok, does someone want to talk about anything else?
16:56:11 if not, we can enjoy our free 4 minutes :)
16:56:33 4 mins is fine with me
16:56:36 cheers
16:56:41 Thanks
16:56:42 thx for attending
16:56:45 thx, bye
16:56:49 we will see each other in 2 weeks then
16:56:52 see You
16:56:52 https://review.openstack.org/#/c/568282/
16:56:58 #undo
16:56:59 Removing item from minutes: #link https://review.openstack.org/#/c/568282/
16:57:04 #endmeeting
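The "make the test case dump the console on SSH failure" idea from the Scenarios topic can be sketched roughly as follows. This is a minimal, self-contained illustration of the pattern, not tempest's actual API: `SSHAuthError`, `connect`, and `get_console_output` are hypothetical stand-ins for paramiko's `AuthenticationException` and the real tempest/nova helpers, and the wrapper would live in the connect-check helper rather than the test case itself, as noted in the discussion.

```python
class SSHAuthError(Exception):
    """Hypothetical stand-in for paramiko's AuthenticationException."""


def check_ssh_with_console_dump(connect, get_console_output, log=print):
    """Run an SSH connectivity check; if authentication fails, dump the
    instance console output before re-raising, so the CI job log shows
    why auth failed (e.g. the instance never reached the metadata
    service and so never got its SSH key installed)."""
    try:
        return connect()
    except SSHAuthError:
        # The console log usually reveals whether cloud-init reached
        # 169.254.169.254 and whether the key was ever injected.
        log("Instance console output:\n%s" % get_console_output())
        raise
```

The point of the design is that the failure is still re-raised (the test still fails), but the job log now carries the guest-side evidence that was missing from the AuthenticationException traceback in the linked job output.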