16:01:47 <ihrachys> #startmeeting neutron_ci
16:01:49 <openstack> Meeting started Tue Mar 13 16:01:47 2018 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:52 <openstack> The meeting name has been set to 'neutron_ci'
16:01:56 <mlavalle> o/
16:02:13 <jlibosva> o/
16:02:16 <ihrachys> I will need to drop off today in the middle of the meeting so I would like someone to take over the chair from there
16:02:28 <ihrachys> volunteers welcome
16:02:37 <ihrachys> #topic Actions from prev meeting
16:02:45 <ihrachys> "slaweq to check why it takes too long to raise interfaces in linuxbridge scenarios job, and to compare with dvr"
16:03:24 <ihrachys> slaweq, around?
16:03:57 <mlavalle> he was here 10 minutes ago
16:04:03 <mlavalle> we finished the QoS meeting
16:04:05 <jlibosva> I think we agreed last time it was because of multinode?
16:04:23 <jlibosva> that dvr is multinode while linuxbridge is allinone
16:05:08 <ihrachys> yeah though I am not sure if we made any progress in terms of patches to bump timeout
16:05:31 <slaweq> hi
16:05:33 <jlibosva> iirc it was merged
16:05:34 <slaweq> sorry for being late
16:05:51 <ihrachys> https://review.openstack.org/550832 ?
16:05:59 <jlibosva> yep
16:06:08 <slaweq> ihrachys: only thing I could do was increase the ssh timeout
16:06:21 <ihrachys> ok great.
I also noticed on grafana that the linuxbridge job is very stable now
16:06:30 <ihrachys> so that probably actually helped
16:06:31 <slaweq> from the logs I checked it didn't look like there was any issue with the neutron agents or anything like that
16:06:41 <ihrachys> great work slaweq
16:06:50 <slaweq> thx
16:06:59 <mlavalle> he got an ovation for this in the Neutron meeting
16:07:07 <slaweq> LOL
16:07:11 <mlavalle> standing ovation
16:07:19 <ihrachys> next was "ihrachys to check grafana stats several days later when dust settles"
16:07:21 <jlibosva> indeed :)
16:07:27 <ihrachys> that's about fullstack / dvr instability
16:07:47 <ihrachys> dvr scenarios are still problematic, but fullstack seems to be more stable than functional now for a week or so
16:07:57 <slaweq> \o/
16:08:03 <ihrachys> looks like if we keep functional voting we should definitely have fullstack too
16:08:10 <ihrachys> we can check charts later
16:08:20 <ihrachys> "jlibosva to look into agent startup failure and missing logs in: http://logs.openstack.org/83/549283/1/check/neutron-fullstack/cbad08a/logs/"
16:08:45 <jlibosva> I tried to reproduce the l3 agent failing to start locally but had no success
16:08:59 <jlibosva> I posted a patch to log failed processes though, I think it merged
16:09:13 <jlibosva> https://review.openstack.org/#/c/550566/
16:09:29 <slaweq> maybe it was some issue in some other package and was fixed in the meantime
16:10:04 <ihrachys> nice, we will revisit it the next time it hits
16:10:26 <ihrachys> these are all the AIs we had
16:10:30 <ihrachys> #topic Grafana
16:10:37 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:11:15 <ihrachys> as I said before, fullstack looks very nice.
there were some failure bumps during the week but they match with other bumps for other jobs
16:11:25 <ihrachys> now fullstack at 6% or smth like that
16:11:36 <ihrachys> in contrast functional is almost 20%
16:12:19 <ihrachys> and as for tempest, linuxbridge scenarios are looking very good
16:12:36 <ihrachys> like 2-3% now
16:12:52 <ihrachys> in contrast to dvr scenarios that are still 35%
16:13:04 <slaweq> I hope it will be like that now
16:13:20 <slaweq> as I was checking during weekend with this longer ssh timeout it was passing every time
16:13:22 <ihrachys> and dvr-ha job is also at the same high failure rate
16:13:50 <slaweq> I think that I saw some trunk related test in dvr failing often
16:14:18 <ihrachys> another candidate with reasonable behavior is -ovsfw- though it will need more monitoring I believe
16:14:29 <ihrachys> we can dive in each type
16:14:32 <ihrachys> #topic Fullstack
16:14:50 <ihrachys> considering that the job is quite stable for a while, should we make it vote now?
16:14:59 <ihrachys> we are at the start of cycle so that's good
16:15:28 <slaweq> sounds good for me
16:15:31 <slaweq> we can try
16:15:42 <jlibosva> would that require to also include fullstack in gate queue?
16:16:04 <ihrachys> well we can experiment with partial enablement like we did with functional of course
16:16:27 <ihrachys> not that I am saying I would suggest doing it
16:17:10 <ihrachys> mlavalle, thoughts
16:17:43 <mlavalle> I'd say go for it
16:17:59 <ihrachys> slaweq, do you want the honor of posting the patch?
16:18:26 <slaweq> ihrachys: I would love to :)
16:18:28 <slaweq> thx
16:18:40 <ihrachys> #action slaweq to enable voting for fullstack
16:18:52 <ihrachys> mlavalle, jlibosva so do we go with both queues or check only?
16:19:05 <mlavalle> let's start with check
16:19:10 <slaweq> ++
16:19:16 <ihrachys> ok
16:19:17 <slaweq> let's do it in small steps
16:19:31 <ihrachys> we will revisit gate in like 2 weeks then
16:19:37 <jlibosva> that's what my question was above, I think there is a "rule" that voting jobs should be in the gate queue too, isn't there?
16:19:48 <ihrachys> jlibosva, there is a tradition for sure
16:19:58 <ihrachys> but we had functional not in gate but in check for some time
16:20:17 <ihrachys> we can post and see if infra objects
16:20:22 <ihrachys> I don't mind enabling it in both queues
16:20:56 <slaweq> there is also neutron-rally-neutron which is not in gate currently
16:21:01 <slaweq> and is voting in the check queue
16:21:42 <ihrachys> I will skip discussion of fullstack failures for today
16:21:52 <ihrachys> #topic Rally
16:22:07 <ihrachys> do we know what's the background behind rally not gating?
16:22:20 <ihrachys> is it because it breaks from time to time due to instabilities in rally itself?
16:22:53 <ihrachys> I remember there were multiple cases when a patch in rally broke our gates
16:22:58 <ihrachys> so maybe that
16:23:21 <ihrachys> I am not too excited to enable gating for it
16:23:27 <ihrachys> I mean, the gate queue
16:23:32 <mlavalle> I don't know the details, but it must be that
16:23:57 <ihrachys> ok. I would focus for now on other jobs where we have better understanding and control.
16:24:03 <ihrachys> #topic Scenarios
16:24:08 <clarkb> glance recently removed rally because it wasn't working at all for them
16:24:12 <clarkb> (just a datapoint)
16:24:26 <mlavalle> good to know, clarkb
16:24:57 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=8&fullscreen
16:25:32 <ihrachys> considering the linuxbridge scenarios job is in good shape (failure spikes are reflected in other jobs, it's at 5% right now), do we want to consider it for voting too?
16:25:55 <mlavalle> yes
16:26:02 <mlavalle> let's go for it
16:26:07 <mlavalle> early in the cycle
16:26:18 <ihrachys> objections?
16:26:31 <slaweq> also only for check queue for now?
16:26:47 * ihrachys waves at rossella_s
16:27:14 <ihrachys> I would imagine yes, check queue for now
16:27:16 <rossella_s> ihrachys, hi!
16:27:29 <ihrachys> ok seems like no objections
16:27:34 <mlavalle> hola rossella_s
16:27:48 <ihrachys> #action slaweq to enable voting for linuxbridge scenarios
16:27:59 <slaweq> ok, sure
16:28:11 <rossella_s> mlavalle, hola :)
16:28:18 <slaweq> hi rossella_s
16:28:27 <ihrachys> slaweq, can you take over the meeting from here? I gotta bail out.
16:28:45 <slaweq> sure
16:28:47 <ihrachys> #chair slaweq
16:28:48 <openstack> Current chairs: ihrachys slaweq
16:28:55 <slaweq> do You have link to agenda somewhere?
16:29:02 <ihrachys> there is no agenda :)
16:29:19 <mlavalle> we can follow previous meetings
16:29:24 <ihrachys> but basically look at dvr scenarios that would be the main thing :)
16:29:27 <slaweq> ok, sure
16:29:36 <ihrachys> ok bye folks!
16:29:36 <slaweq> I will take care
16:29:41 <slaweq> bye ihrachys
16:29:54 <jlibosva> o/
16:30:14 <slaweq> so moving on with scenario jobs
16:30:29 <slaweq> dvr and dvr-ha are still on high failure ratio
16:31:10 <slaweq> about dvr when I was testing ssh timeout patch I had few times same errors like on http://logs.openstack.org/32/550832/2/check/neutron-tempest-plugin-dvr-multinode-scenario/6186573/logs/testr_results.html.gz
16:32:00 <slaweq> or http://logs.openstack.org/32/550832/2/check/neutron-tempest-plugin-dvr-multinode-scenario/136c7ef/logs/testr_results.html.gz
16:32:10 <slaweq> but every time it was failure with trunk port
16:32:16 <jlibosva> is it just test_subport_connectivity or also lifecycle?
16:32:30 <jlibosva> ah, the second link answers my question :) sorry
16:32:33 <mlavalle> yeah, it's trunk ports
16:32:34 <slaweq> jlibosva: in the second link it was lifecycle
16:32:47 <slaweq> any ideas about that?
16:33:07 <slaweq> in the third one there were even both of them: http://logs.openstack.org/32/550832/2/check/neutron-tempest-plugin-dvr-multinode-scenario/6d09fd8/logs/testr_results.html.gz
16:33:52 <slaweq> I just found another example of the same tests failing
16:34:06 <slaweq> do You have time/resources to check that this week?
16:34:49 <slaweq> I don't have any dvr environment and don't have experience with either dvr or trunk so it might be hard for me
16:36:43 <haleyb> I know dvr but not trunk, do you just need a multi-node environment?
16:36:55 <jlibosva> I know trunk and a bit of dvr :)
16:37:03 <slaweq> so we have a winner :)
16:37:30 <jlibosva> I can have a look and if I have issues with dvr, I'll poke haleyb
16:37:45 <slaweq> thx jlibosva
16:38:14 <slaweq> #action jlibosva to take a look at the dvr trunk tests issue
16:39:04 <slaweq> I'm trying to find some failed neutron-tempest-dvr-ha-multinode-full test now as this one also fails around 50% of the time
16:40:17 <slaweq> I found one http://logs.openstack.org/14/529814/5/check/neutron-tempest-dvr-ha-multinode-full/d8cfbdf/logs/testr_results.html.gz
16:40:30 <slaweq> but this doesn't look related to neutron
16:40:52 <slaweq> do You maybe have any other example?
16:41:24 <haleyb> that might be an old bug? https://bugs.launchpad.net/neutron/+bug/1717302
16:41:25 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
16:44:06 <slaweq> haleyb: will You continue work on it?
16:44:19 <jlibosva> do we have a logstash query for the above?
16:44:38 <slaweq> I don't know
16:45:20 <mlavalle> No we don't have one
16:45:56 <haleyb> slaweq: i have not reproduced it locally yet
16:46:50 <jlibosva> maybe this could be used? http://bit.ly/2Fz6FPs
16:47:32 <jlibosva> it looks to me like the issue is gone
16:48:01 <slaweq> jlibosva: are You sure that logs from this job are properly indexed in logstash?
16:48:08 <mlavalle> no hits over the past 7 days
16:48:12 <slaweq> AFAIR there was some issue with fullstack for example
16:48:36 <jlibosva> it's a normal tempest job and afaik all service logs are indexed
16:48:44 <slaweq> ok, it should be
16:49:01 <jlibosva> or is the error coming from neutron-keepalived-state-change?
16:49:06 <slaweq> ok, I will try to find more examples of failures for this job
16:49:20 <jlibosva> maybe that one is not indexed, I don't know :)
16:49:37 <slaweq> and if I find something that happens often I will just report it as a bug
16:49:41 <slaweq> what do You think?
16:49:53 <mlavalle> sounds good
16:50:21 <slaweq> #action slaweq check reasons of failures of neutron-tempest-dvr-ha-multinode-full
16:51:09 <slaweq> so, other tempest jobs are below 10% failure so that's fine IMO
16:52:02 <slaweq> periodic jobs look fine also
16:52:31 <slaweq> do You have anything to add?
16:53:00 <mlavalle> I don't
16:53:22 <slaweq> #topic Open discussion
16:53:50 <slaweq> I just want to say that I found today that our scenario jobs are still using the neutron-legacy lib
16:54:00 <slaweq> so I want to switch them to lib/neutron
16:54:25 <slaweq> are You fine with that?
or is there any reason why it's like that and I shouldn't change it?
16:55:32 <mlavalle> not that I'm aware of
16:56:08 <slaweq> ok, so I will send a patch and check how it works then
16:56:42 <slaweq> #action slaweq switch scenario jobs to lib/neutron
16:57:25 <slaweq> ok, so if You don't have anything else I think we are done for today
16:58:14 * jlibosva nods
16:58:27 <slaweq> I surely wasn't as good a chair as ihrachys, but I hope it went okay :)
16:58:36 <jlibosva> you did really great :)
16:58:46 <slaweq> thank You
16:58:54 <slaweq> bye
16:58:57 <slaweq> #endmeeting
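
Editor's note: the voting changes agreed in this meeting (fullstack and the linuxbridge scenario job voting in the check queue only) are made in the project's Zuul configuration. The fragment below is only an illustrative sketch of what such a patch tends to look like; the file path and the linuxbridge job name are assumptions, not taken from the actual review.

```yaml
# .zuul.yaml fragment (sketch). In Zuul v3 a job votes unless
# "voting: false" is set on it, so enabling voting in check means
# removing that flag; the gate queue is simply left unchanged,
# to be revisited in ~2 weeks as discussed.
- project:
    check:
      jobs:
        # before: - neutron-fullstack: {voting: false}
        - neutron-fullstack                          # now voting in check
        # job name assumed for illustration:
        - neutron-tempest-plugin-scenario-linuxbridge
    # gate: ...  (intentionally not touched yet)
```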