16:00:05 <slaweq> #startmeeting neutron_ci
16:00:09 <openstack> Meeting started Tue May 15 16:00:05 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:09 <slaweq> hello
16:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:12 <openstack> The meeting name has been set to 'neutron_ci'
16:00:32 <njohnston> o/
16:00:39 <slaweq> hi njohnston
16:00:47 <mlavalle> o/
16:00:50 <jlibosva> o/
16:02:01 <slaweq> I just pinged ihar and haleyb also in neutron channel
16:02:11 <slaweq> maybe they will join us also
16:02:22 <slaweq> but I think that we can start
16:02:28 <slaweq> #topic Actions from previous meetings
16:02:37 <slaweq> mlavalle will check why trunk tests are failing in dvr multinode scenario
16:02:50 <mlavalle> slaweq: didn't have time last week. sorry
16:03:32 <slaweq> so moving to next week?
16:03:39 <mlavalle> yes, please
16:04:02 <slaweq> #action mlavalle will check why trunk tests are failing in dvr multinode scenario
16:04:08 <mlavalle> Thanks
16:04:08 <slaweq> thx
16:04:16 <slaweq> slaweq will continue debugging slow rally tests issue
16:04:28 <slaweq> I also didn't have time to debug it
16:04:36 <slaweq> #action slaweq will continue debugging slow rally tests issue
16:04:43 <slaweq> slaweq to fix neutron-dynamic-routing scenario tests bug
16:05:04 <slaweq> fressi sent fixes: https://review.openstack.org/#/c/567736/ and https://review.openstack.org/#/c/567742/
16:05:17 <slaweq> both are merged now so I hope it should be better
16:05:37 <slaweq> next one is:
16:05:37 <slaweq> slaweq to make 2 scenario jobs gating
16:05:49 <slaweq> ovsfw job done: https://review.openstack.org/#/c/567055/
16:06:14 <slaweq> neutron-tempest-plugin-scenario-linuxbridge done: https://review.openstack.org/#/c/567057/
16:06:24 <ihar> eh sorry for late
16:06:29 <slaweq> hi ihar
16:06:40 <slaweq> second one was merged today
16:07:09 <slaweq> I also did patch to add those new jobs to dashboard: https://review.openstack.org/#/c/567062/ and it's waiting for review now
16:07:26 <slaweq> mlavalle: would be good if You could +1 it :)
16:07:57 <mlavalle> Done
16:08:13 <slaweq> thx
16:08:25 <mlavalle> but more importatly, ihar +1ed it also
16:08:38 <slaweq> thx ihar too :)
16:08:40 <ihar> :)))
16:09:04 * slaweq wish that 2 x +1 could be counted as +2 :)
16:09:26 <slaweq> and the last from previous week is:
16:09:28 <slaweq> haleyb will debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:09:29 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:51 <slaweq> but haleyb is not here probably
16:10:00 <slaweq> and I think that he didn't do anything with that
16:10:07 <ihar> it's assigned to you btw
16:10:15 <slaweq> ihar: yes, I know
16:10:32 <slaweq> but last week haleyb told that he will try to check so action was assigned to him
16:10:33 <slaweq> :)
16:11:02 <ihar> yeah I get, just saying should prolly be assigned to him to nag him some more
16:11:07 <slaweq> I was trying to reproduce it on top of master branch with some DNM patch with extra logging but I couldn't
16:11:51 <slaweq> but he has some patch which cause this issue 100% times AFAIR - maybe I will just try to debug on top of his patch and will find something
16:12:01 <slaweq> I will assign it to me now :)
16:12:02 <mlavalle> yeah, if you assign it to him, you'll have more nagging power
16:12:10 <mlavalle> ;-)
16:12:32 <slaweq> #action slaweq to debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:12:34 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:12:49 <slaweq> I know that haleyb is very busy so I will try to check it maybe
16:13:11 <slaweq> ok, that were all actions from previous week
16:13:22 <slaweq> let's move to next topic then
16:13:26 <slaweq> #topic Grafana
16:13:32 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:15:43 <ihar> how come not a single failure in fullstack / functional for 4 days!
16:16:08 <ihar> (there are some lately, but there was a long trench)
16:16:27 <ihar> actually 5 days
16:16:35 <slaweq> ihar: are You talking about gate queue?
16:16:45 <slaweq> I think it was very similar in the past here
16:16:47 <ihar> yea
16:16:57 <ihar> oh right it's gate, maybe nothing was in gate
16:17:21 <slaweq> but speaking about Fullstack - it was on high failure rate - around 40-50% at the end of last week - but as I went through failed patches from last week I didn’t found one (or 2) general reasons of such failures,
16:18:04 <mlavalle> so, entropy?
16:18:23 <slaweq> also when I today went through different patches and looking for some example failures I saw quite many jobs marked WIP or something like that when many jobs were red
16:18:55 <mlavalle> did they have fullstack failures?
16:18:59 <slaweq> so I didn't check reasons of failures in such jobs and maybe it had some impact on graphs also
16:19:22 <slaweq> mlavalle: yes, I'm talking about patches which had failures on (almost) everything - fullstack also :)
16:19:35 <ihar> example of a wip job failure?
16:19:44 <mlavalle> ok, yes, maybe they had an impact
16:20:01 <slaweq> ihar: give me a sec
16:20:42 <slaweq> e.g. https://review.openstack.org/#/c/533850/
16:21:26 <slaweq> https://review.openstack.org/#/c/567621/
16:21:33 <slaweq> https://review.openstack.org/#/c/549168/
16:21:42 <ihar> what's 'wip' you refer to
16:22:30 <slaweq> ok, those are not marked as WIP but I assumed that if every job is red then there is probably something wrong with patch
16:23:10 <mlavalle> I don't know the oher one, but 567621 is from a co-worker of mine and it is wip
16:23:21 <mlavalle> I'll ask him to mark it so
16:23:28 <ihar> ok gotcha. or there is an infra issue.
16:23:52 <slaweq> ihar: right, but then it would be probably seen on graphs as well :)
16:24:30 <slaweq> mlavalle: I just wanted to talk that I didn't check failures of one of jobs in such patch :)
16:24:52 <mlavalle> ack
16:25:28 <slaweq> talking about fullstack, there were also failures like: https://review.openstack.org/#/c/499908/ which is clearly related to patch
16:26:05 <slaweq> so, to sum up I didn't found anything new which would repeat many times :)
16:26:29 <slaweq> but I found one issue: http://logs.openstack.org/82/568282/3/check/neutron-fullstack/21baf5f/logs/testr_results.html.gz
16:26:38 <slaweq> which is new for me
16:27:18 <slaweq> so for now just for the record - there was such issue at least once and we should check if it will not repeat more times :)
16:28:22 <slaweq> any questions related to fullstack tests (or grafana)?
16:28:43 <mlavalle> I'm good
16:28:57 <ihar> no
16:29:04 <slaweq> so let's move on
16:29:09 <slaweq> #topic Scenarios
16:29:54 <slaweq> Neutron-tempest-plugin-dvr-multinode-scenario failures rate is better now, after marking few tests as unstable
16:29:57 <slaweq> but it's still not perfect
16:30:00 <haleyb> sorry, i had a meeting conflict and didn't see your ping, will try and watch out of one eye
16:30:11 <slaweq> haleyb: sure :)
16:30:44 <slaweq> I found one example of failure for dvr-multinode scenario: http://logs.openstack.org/86/567086/5/check/neutron-tempest-plugin-dvr-multinode-scenario/59d052d/job-output.txt.gz#_2018-05-15_09_04_47_980425
16:31:25 <ihar> hm, AuthenticationException
16:31:37 <ihar> interesting, it's usually timeouts
16:32:01 <ihar> means that metadata didn't provide the ssh fingerprint
16:32:28 <ihar> the test suck, it doesn't dump instance console output
16:32:37 <ihar> that would give us an idea what fails there
16:32:42 <slaweq> yes, there is no console log there
16:32:44 <ihar> but probably metadata down?
16:33:36 <slaweq> logs of metadata agent looks fine
16:33:56 <slaweq> at first glance at least :)
16:34:45 <ihar> it will look just fine if instance can't reach it
16:34:47 <ihar> :)
16:35:02 <slaweq> but some instances reached it for sure
16:36:52 <ihar> I think the next step is make the test case dump console
16:37:01 <slaweq> I can try to read test code and check if there is maybe anything what could be changed/improved to catch console log in such case
16:37:02 <ihar> (and probably there are more test cases that have the same issue)
16:37:23 <mlavalle> most likely
16:37:37 <slaweq> are You fine with that?
16:37:37 <ihar> that would be nice of you if you can do it
16:37:42 <slaweq> :)
16:38:01 <ihar> note the dumping often times is done in connect check methods
16:38:07 <ihar> not directly in test case
16:38:12 <slaweq> #action slaweq to check why some scenario tests don't log instance console log
16:38:26 <slaweq> yes, I think I saw it before already somewhere :)
16:39:35 <slaweq> ok, I have one more thing about scenario tests
16:39:41 <slaweq> and it is from frickler
16:40:28 <slaweq> he asked me last week to discuss if we can switch neutron-tempest-plugin-designate-scenario job to be voting (gating later) as it looks quite stable since long time
16:40:57 <slaweq> so I'm asking what You think about it?
16:41:10 <mlavalle> if failure rate is good, yes, let's do it
16:41:19 <ihar> I think yes it's a good move. it's very stable.
16:41:20 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=9&fullscreen
16:41:43 <slaweq> yes, it's below 10 % for most of the time at least in last 30 days
16:41:49 <slaweq> so I also think that it is stable
16:41:54 <mlavalle> yes, let';s do it
16:42:06 <slaweq> ok, so I will switch it to be voting for now
16:42:18 <slaweq> and later we can think about making it gating also - right?
16:42:37 <mlavalle> yes
16:42:46 <slaweq> #action slaweq to switch neutron-tempest-plugin-designate-scenario to be voting
16:42:48 <slaweq> thx
16:42:59 <slaweq> ok, moving on
16:43:00 <slaweq> #topic Rally
16:43:00 <mlavalle> thanks to you and frickler !
16:43:19 <slaweq> thx frickler :)
16:43:34 <slaweq> in rally tests there is nothing new
16:43:57 <slaweq> still only failures which I found are related to global timeout, e.g.: http://logs.openstack.org/12/470912/43/check/neutron-rally-neutron/101415f/job-output.txt.gz
16:44:13 <slaweq> so I need to investigate it
16:44:21 <slaweq> this week
16:44:42 <slaweq> next topic is
16:44:42 <slaweq> #topic Periodic
16:45:02 <slaweq> 2 jobs were failing during last week:
16:45:36 <slaweq> neutron-tempest-postgres-full - failed few times but as I checked logs, failures related to some error with volumes or nova service list (or something like that)
16:45:47 <slaweq> so I didn't debug it more
16:46:12 <slaweq> Neutron-dynamic-routing-dsvm-tempest-with-ryu-master-scenario-ipv4 - this one should be fixed since today as fressi's patches are merged
16:46:16 <ihar> psql is traditionally unstable. I don't think anyone has taste to fix it so as long as it passes from time to time it's good enough.
16:46:35 <slaweq> ihar: so it's good enough :)
16:47:05 <slaweq> and that was all about periodic jobs
16:47:13 <slaweq> anything to add? questions?
16:47:21 <mlavalle> not from me
16:47:37 <slaweq> ok, so last topic for today
16:47:38 <slaweq> #topic Open discussion
16:47:56 <slaweq> I just wanted to ask if You are fine to cancel next week's meeting due to summit?
16:48:03 <slaweq> if so I will sent an email about that
16:48:13 <mlavalle> I was going to suggest it
16:48:17 <slaweq> :)
16:48:24 <ihar> I am good for less meetings
16:48:29 <ihar> I have a patch to raise
16:48:41 <ihar> for unit tests
16:48:43 <ihar> https://review.openstack.org/#/c/568390/
16:48:52 <ihar> it is *not* ready and I will try to get it ready today
16:49:08 <ihar> but if not someone better pick it up from me because next time I will get to it the next Monday
16:49:14 <ihar> and it fixes a unit test gate issue
16:49:27 <slaweq> ok ihar, go ahead :)
16:49:30 <ihar> (should be easy to resolve the last standing failure there though, just a heads-up)
16:49:41 <slaweq> ignore my last sentence :)
16:49:43 <mlavalle> ok, thanks for the heads up
16:50:30 <slaweq> ihar: I can help You with it if You want
16:50:57 <jlibosva> I have one item I wanted to share
16:50:58 <ihar> just take it over if I don't get it done today
16:51:05 <slaweq> ihar: sure
16:51:18 <slaweq> jlibosva: go on
16:51:31 <jlibosva> We have a community goal to switch to python3
16:51:38 <jlibosva> we used to have functional and fullstack experimental jobs
16:52:08 <jlibosva> we spent some time in the past fixing bugs so functional suite can be executed fine without any errors
16:52:19 <jlibosva> we had one last thing missing, which was caused by bug in oslo.service
16:52:27 <jlibosva> aaaaand, this bug has been fixed after a year or so :)
16:52:38 <slaweq> \o/
16:52:39 <jlibosva> but we lost the experimental jobs
16:52:44 <jlibosva> so I have a patch to add them back - https://review.openstack.org/#/c/568282/
16:53:01 <jlibosva> in the meantime, there are new issues in functional job, I'm working on fixes. I think there are two issues only
16:53:28 <jlibosva> I think it would be worth to have the functional job passing and then move it to check queue - one more thing that could make the zuul vote -1, right? :)
16:53:34 <jlibosva> what are your opinions?
16:53:47 <slaweq> maybe as non-voting for the beginning
16:53:57 <njohnston> +1
16:54:00 <mlavalle> I agree with your proposal
16:54:08 <jlibosva> I mean, it will create yet another job and it seems we have a few already in Neutron :-/
16:54:08 <ihar> of course non voting first but otherwise I just +w'd the jobs
16:54:14 <jlibosva> ihar: thanks
16:54:20 <mlavalle> as slaweq says, non voting first
16:54:24 <jlibosva> ok, I'll be working on patches for it, thanks :)
16:54:27 <ihar> more jobs is fine if we manage failures
16:54:34 <slaweq> thx jlibosva
16:54:40 <jlibosva> gaining some Python 3 knowledge on the fly
16:54:44 <ihar> arguably the project is doing great so far
16:55:55 <slaweq> ok, someone wants to talk about anything else?
16:56:11 <slaweq> if no, we can enjoy our free 4 minutes :)
16:56:33 <ihar> 4 mins is fine with me
16:56:36 <ihar> cheers
16:56:41 <mlavalle> Thanks
16:56:42 <slaweq> thx for attending
16:56:45 <jlibosva> thx bye
16:56:49 <slaweq> we will see in 2 weeks than
16:56:52 <slaweq> see You
16:56:52 <slaweq> https://review.openstack.org/#/c/568282/
16:56:58 <slaweq> #undo
16:56:59 <openstack> Removing item from minutes: #link https://review.openstack.org/#/c/568282/
16:57:04 <slaweq> #endmeeting
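
For reference on the console-dump action item discussed under #topic Scenarios above: below is a minimal sketch of the approach ihar suggests (dump the instance console from the connect-check helper rather than from each test case). This is not the actual neutron-tempest-plugin code; the mixin and method names are hypothetical, and only tempest's ServersClient.get_console_output(), tempest.lib.common.ssh.Client, and oslo.log are assumed as real APIs.

```python
# Sketch only: a hypothetical helper mixin for tempest-based scenario tests
# that dumps the guest console before re-raising SSH failures, so an
# AuthenticationException (e.g. metadata never delivered the key) still
# leaves evidence in the job logs.
from oslo_log import log as logging

LOG = logging.getLogger(__name__)


class ConsoleDumpMixin(object):
    """Hypothetical mixin; assumes self.os_primary.servers_client exists."""

    def log_console_output(self, servers):
        # Log the nova console of every server used by the test, so
        # cloud-init/metadata problems are visible even when SSH never works.
        for server in servers or []:
            output = self.os_primary.servers_client.get_console_output(
                server['id'])['output']
            LOG.debug('Console output for server %s:\n%s',
                      server['id'], output)

    def check_connectivity(self, ssh_client, servers):
        # Wrap the existing connectivity check (ssh_client is assumed to be
        # a tempest.lib.common.ssh.Client) and dump consoles on any failure,
        # keeping the dump in the connect-check method instead of each test.
        try:
            ssh_client.test_connection_auth()
        except Exception:
            self.log_console_output(servers)
            raise
```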