16:00:27 <ihrachys> #startmeeting neutron_ci
16:00:28 <openstack> Meeting started Tue Aug 22 16:00:27 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:32 <openstack> The meeting name has been set to 'neutron_ci'
16:00:42 <ihrachys> jlibosva, o/
16:00:43 <jlibosva> o/
16:00:55 <ihrachys> #topic Actions from prev week
16:01:06 <ihrachys> "jlibosva to look through uncategorized/latest gate tempest failures (15% failure rate atm)"
16:01:25 * jlibosva slacked, didn't happen
16:01:44 <ihrachys> ok, we will revisit whether it's still needed in Grafana section
16:01:54 <ihrachys> "ihrachys to capture os.kill calls in func tests and see if any of those kill test threads"
16:02:19 <ihrachys> I haven't done THAT but since last kernel update we don't seem to see the failure
16:02:29 <ihrachys> jlibosva, right?
16:02:45 <jlibosva> right, and also at the end of last meeting we confirmed it's the kernel that kills the process
16:02:47 <ihrachys> I haven't seen it personally for a week + grafana board is very clean
16:03:00 <jlibosva> I haven't checked via logstash
16:03:22 <jlibosva> but I'm able to reproduce that failure on my ubuntu box with the -83 kernel
16:03:36 <ihrachys> -83, is it the old or new?
16:03:42 <jlibosva> I can update and confirm it's not reproducible
16:03:46 <jlibosva> ah, sorry. 83 is the old one
16:03:48 <jlibosva> I think :)
16:03:51 <ihrachys> ok
16:04:12 <ihrachys> let's close https://bugs.launchpad.net/neutron/+bug/1707933 if/when it's confirmed (I personally feel ok doing it without additional step, but you decide)
16:04:14 <openstack> Launchpad bug 1707933 in neutron "functional tests timeout after a test worker killed" [Critical,Confirmed] - Assigned to Jakub Libosvar (libosvar)
16:04:31 <ihrachys> "jlibosva to tweak gate not to create default subnetpool and enable test_convert_default_subnetpool_to_non_default"
16:04:48 <jlibosva> didn't do either :(
16:05:13 <ihrachys> #action jlibosva to tweak gate not to create default subnetpool and enable test_convert_default_subnetpool_to_non_default
16:05:33 <ihrachys> #topic Grafana
16:05:38 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:07:03 <ihrachys> of gating jobs, the only one that stands out is gate-grenade-dsvm-neutron-multinode-ubuntu-xenial
16:07:14 <ihrachys> not a catastrophe, ~10%
16:07:32 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen
16:08:35 <ihrachys> apart from that, usual check queue violators - fullstack and scenarios - are at 100%
16:08:42 <ihrachys> for fullstack, I still suck
16:09:00 <ihrachys> for scenarios, I believe jlibosva tried to gather troops to triage/fix remaining known failures
16:09:13 <ihrachys> jlibosva, was there success in that?
16:09:47 <jlibosva> Anil is looking at the router migration tests - but no outcome yet
16:10:42 <ihrachys> I also see dvr-ha job failing at 100%. I don't think that was the case a while ago.
16:11:48 <ihrachys> we may need Brian back to have a closer look at it
16:12:05 <ihrachys> he was working on making it voting
16:12:11 <anilvenkata> jlibosva, ihrachys migration tests are failing intermittently
16:12:45 <anilvenkata> jlibosva, ihrachys if they were failing consistently, it would be easier to fix
16:12:46 <jlibosva> anilvenkata: are you able to reproduce it?
16:12:49 <ihrachys> anilvenkata, ok. any ideas how we can make them not fail? what's missing to understand what happens there?
16:13:12 <anilvenkata> maybe I need to add more checks before the test tries to ssh
16:13:31 <anilvenkata> checks like HA port is active, etc
16:14:26 <anilvenkata> that will make sure the dataplane path is properly configured before the test tries to ssh
16:14:43 <ihrachys> yeah. another thing to look at would be whether any helpful info is missing in agent logs.
16:15:17 <anilvenkata> I will also look at that
16:15:20 <anilvenkata> thanks Ihar
16:15:30 <ihrachys> #topic anilvenkata to add more control plane checks for migrated router ports tests before ssh'ing
16:15:35 <ihrachys> oops
16:15:36 <ihrachys> #undo
16:15:36 <openstack> Removing item from minutes: #topic anilvenkata to add more control plane checks for migrated router ports tests before ssh'ing
16:15:40 <ihrachys> #action anilvenkata to add more control plane checks for migrated router ports tests before ssh'ing
16:15:57 <anilvenkata> :)
16:17:04 <ihrachys> I also noticed today that we still keep linuxbridge grenade job in dashboard even though it's experimental now
16:17:09 <ihrachys> patch to remove it from there: https://review.openstack.org/#/c/496287/
16:17:36 <ihrachys> jlibosva, back to the earlier action item you had, it doesn't seem it's a pressing need right now to classify tempest failures.
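The extra checks anilvenkata proposes (e.g. waiting until the HA port is active before the test tries to ssh) amount to polling a condition until it holds or a timeout expires. A minimal Python sketch of that pattern follows; `wait_until_true`, `wait_for_port_active` and the `get_port_status` callable are hypothetical names used for illustration, not Neutron's actual test helpers or API.

```python
import time


def wait_until_true(predicate, timeout=60.0, sleep=1.0):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Raises TimeoutError if the condition never becomes true.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(sleep)


def wait_for_port_active(get_port_status, port_id, timeout=60.0):
    # get_port_status is a hypothetical callable standing in for whatever
    # client call the test uses to read a port's status string.
    wait_until_true(lambda: get_port_status(port_id) == "ACTIVE",
                    timeout=timeout, sleep=0.1)
```

Failing with an explicit timeout at this step separates "the port never became active" (a control plane problem) from "ssh failed" (a dataplane problem), which is the distinction that makes intermittent failures easier to triage.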
16:18:06 <jlibosva> okies
16:18:10 <ihrachys> #topic Bugs
16:18:20 <ihrachys> I started cleaning up the list: https://bugs.launchpad.net/neutron/+bugs?field.tag=gate-failure
16:18:35 <ihrachys> closed a bunch of those that no longer reproduce / no logs
16:18:41 <ihrachys> we can reopen if they happen again
16:19:17 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1687709
16:19:19 <openstack> Launchpad bug 1687709 in neutron "fullstack: ovs-agents remove trunk bridges that don't belong to them" [High,Confirmed] - Assigned to Jakub Libosvar (libosvar)
16:19:29 <ihrachys> jlibosva, is it correct that ovs isolation work should help with that?
16:19:37 <jlibosva> I'm gonna close the functional killer bug, no hits in last week
16:19:50 <ihrachys> jlibosva, yep, go for it
16:19:51 <jlibosva> ihrachys: yes, that's still something I'd like to implement but can't find spare time
16:20:06 <ihrachys> jlibosva, is everything ready on ovsdbapp side for that?
16:20:30 <jlibosva> my plan is not to use the sandbox ovs, so ovsdbapp is no longer a requirement
16:20:46 <jlibosva> the isolation will use a normal ovsdb-server process running in a namespace instead
16:21:04 <jlibosva> as we need actual access to the kernel datapath, which the sandbox mocks
16:21:17 <ihrachys> oh ok
16:22:06 <ihrachys> I haven't walked through all bugs yet; I am going to continue cleaning up the gate-failure list this week
16:22:15 <ihrachys> #action ihrachys to complete gate-failure cleanup
16:23:13 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1712278
16:23:14 <openstack> Launchpad bug 1712278 in neutron "Default qos policy doesn't work when creating network" [High,In progress] - Assigned to Hirofumi Ichihara (ichihara-hirofumi)
16:23:22 <ihrachys> we seem to have a patch here: https://review.openstack.org/#/c/496139/
16:23:54 <ihrachys> I am not sure how it happened. don't we have test coverage for that?
16:24:55 <jlibosva> on a higher level, maybe tempest or fullstack - but both are non-voting
16:25:16 <ihrachys> oh right.
16:25:40 <jlibosva> hmm, actually api tests should be able to catch that
16:26:37 <ihrachys> for some reason, logstash shows osc functional test failures only
16:26:42 <ihrachys> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'QosPolicyDefault'%20object%20has%20no%20attribute%20'translate'%5C%22
16:27:30 <jlibosva> and api tests don't test default qos policy
16:27:37 <jlibosva> only fullstack
16:28:01 <ihrachys> I don't see fullstack hits in logstash
16:28:41 <jlibosva> do we collect server logs for fullstack?
16:29:07 <ihrachys> oh; we probably collect but don't index
16:29:15 <ihrachys> we index test runner logs only
16:29:26 <ihrachys> that explains it then
16:29:48 <ihrachys> would probably make sense to ask to add api tests to the fix
16:30:17 <ihrachys> oh you did already
16:30:19 <ihrachys> ok
16:30:26 <jlibosva> yeah :)
16:30:43 <jlibosva> also looking at the post-gate hook, if I understand the find correctly, we should also index the server and agent logs
16:30:45 <jlibosva> for fullstack
16:32:11 <ihrachys> hm, right. so which test do you think covers the functionality?
16:32:14 <ihrachys> fullstack
16:32:32 <jlibosva> yep, so maybe it doesn't hit that specific
16:32:38 <ihrachys> oh I see TestQoSPolicyIsDefault
16:32:49 <jlibosva> I checked the index file and server and agents are included
16:32:53 <jlibosva> ihrachys++ :)
16:33:31 <jlibosva> so we probably don't create a network after we have the default policy
16:33:33 <ihrachys> I don't think we create a network using default policy in any of those test cases
16:33:36 <ihrachys> right
16:34:06 <ihrachys> ok, that makes sense
16:34:56 <ihrachys> #topic Other patches
16:35:06 <ihrachys> jlibosva, anything of interest not mentioned already?
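The logstash link shared at 16:26:42 keeps its search string URL-encoded inside the dashboard fragment, which makes it hard to read. A small standard-library sketch that decodes it (the function name `logstash_query` is illustrative only) shows it searches the `message` field for the exact `AttributeError: 'QosPolicyDefault' object has no attribute 'translate'` text:

```python
from urllib.parse import unquote, urlsplit

url = ("http://logstash.openstack.org/#dashboard/file/logstash.json?query="
       "message%3A%5C%22AttributeError%3A%20'QosPolicyDefault'%20object"
       "%20has%20no%20attribute%20'translate'%5C%22")


def logstash_query(url):
    # The dashboard keeps its parameters after '#', so urlsplit() puts
    # "...logstash.json?query=..." in the fragment, not in .query.
    fragment = urlsplit(url).fragment
    _, _, query = fragment.partition("query=")
    # Percent-decoding turns %3A, %5C, %22, %20 into ':', '\', '"', ' '.
    return unquote(query)


print(logstash_query(url))
```

Since only test runner logs were indexed for fullstack at the time (per the discussion), a query like this could only match jobs whose indexed logs contain the traceback, which is consistent with seeing osc functional hits only.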
16:35:31 <jlibosva> just that if you look at functional gate failures in the last 7 days, it draws a batman :)
16:35:47 <jlibosva> nothing else from me
16:35:51 <ihrachys> :))
16:36:39 <ihrachys> ok, I will note that I noticed some time ago that grenade failures don't always trigger corresponding elastic patterns, and I think I came up with the fix here: https://review.openstack.org/#/c/493987/
16:37:01 <ihrachys> elastic bot doesn't wait for all service logs to upload if it's grenade
16:37:26 <ihrachys> I have nothing else
16:37:27 <jlibosva> oh, one more thing I have
16:37:30 <ihrachys> shoot
16:37:53 <jlibosva> I noticed that the bot now reports elastic rechecks in the upstream channel :) it was discussed in this meeting a while ago
16:38:20 <ihrachys> yeah. it doesn't seem very consistent though, I failed to grasp when it does and when it doesn't
16:38:37 <ihrachys> (same feeling I have for gerrit comments from the bot)
16:39:04 <ihrachys> ok, we can close the session I guess
16:39:06 <ihrachys> thanks jlibosva
16:39:07 <jlibosva> yep
16:39:09 <ihrachys> #endmeeting