16:00:27 #startmeeting neutron_ci
16:00:28 Meeting started Tue Aug 22 16:00:27 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:32 The meeting name has been set to 'neutron_ci'
16:00:42 jlibosva, o/
16:00:43 o/
16:00:55 #topic Actions from prev week
16:01:06 "jlibosva to look through uncategorized/latest gate tempest failures (15% failure rate atm)"
16:01:25 * jlibosva slacked, didn't happen
16:01:44 ok, we will revisit whether it's still needed in Grafana section
16:01:54 "ihrachys to capture os.kill calls in func tests and see if any of those kill test threads"
16:02:19 I haven't done THAT but since last kernel update we don't seem to see the failure
16:02:29 jlibosva, right?
16:02:45 right, and also at the end of last meeting we confirmed it's kernel who kills the process
16:02:47 I haven't seen it personally for a week + grafana board is very clean
16:03:00 I haven't checked via logstash
16:03:22 but I'm able to reproduce that failure on my ubuntu box with the -83 kernel
16:03:36 -83, is it the old or new?
16:03:42 I can update and confirm it's not reproducible
16:03:46 ah, sorry.
83 is the old one
16:03:48 I think :)
16:03:51 ok
16:04:12 let's close https://bugs.launchpad.net/neutron/+bug/1707933 if/when it's confirmed (I personally feel ok doing it without additional step, but you decide)
16:04:14 Launchpad bug 1707933 in neutron "functional tests timeout after a test worker killed" [Critical,Confirmed] - Assigned to Jakub Libosvar (libosvar)
16:04:31 "jlibosva to tweak gate not to create default subnetpool and enable test_convert_default_subnetpool_to_non_default"
16:04:48 didn't do either :(
16:05:13 #action jlibosva to tweak gate not to create default subnetpool and enable test_convert_default_subnetpool_to_non_default
16:05:33 #topic Grafana
16:05:38 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:07:03 of gating jobs, the only one that stands out is gate-grenade-dsvm-neutron-multinode-ubuntu-xenial
16:07:14 not a catastrophe, ~10%
16:07:32 http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen
16:08:35 apart from that, usual check queue violators - fullstack and scenarios - are at 100%
16:08:42 for fullstack, I still suck
16:09:00 for scenarios, I believe jlibosva tried to gather troops to triage/fix remaining known failures
16:09:13 jlibosva, was there success in that?
16:09:47 Anil is looking at the router migration tests - but no outcome yet
16:10:42 I also see dvr-ha job failing at 100%. I don't think that was the case a while ago.
16:11:48 we may need Brian back to have a closer look at it
16:12:05 he was working on making it voting
16:12:11 jlibosva, ihrachys migration tests are failing intermittently
16:12:45 jlibosva, ihrachys If consistently failing then it would be easier to fix
16:12:46 anilvenkata: are you able to reproduce it?
16:12:49 anilvenkata, ok. any ideas how can we make them not fail? what's missing to understand what happens there?
16:13:12 maybe I need to add more checks before the test tries to ssh
16:13:31 checks like HA port is active, etc
16:14:26 that will make sure the dataplane path is properly configured, before the test tries to ssh
16:14:43 yeah. another thing to look at would be, whether any helpful info is missing in agent logs.
16:15:17 I will also look at that
16:15:20 thanks Ihar
16:15:30 #topic anilvenkata to add more control plane checks for migrated router ports tests before ssh'ing
16:15:35 oops
16:15:36 #undo
16:15:36 Removing item from minutes: #topic anilvenkata to add more control plane checks for migrated router ports tests before ssh'ing
16:15:40 #action anilvenkata to add more control plane checks for migrated router ports tests before ssh'ing
16:15:57 :)
16:17:04 I also noticed today that we still keep linuxbridge grenade job in dashboard even though it's experimental now
16:17:09 patch to remove it from there: https://review.openstack.org/#/c/496287/
16:17:36 jlibosva, back to the earlier action item you had, it doesn't seem it's a pressing need right now to classify tempest failures.
16:18:06 okies
16:18:10 #topic Bugs
16:18:20 I started cleaning up the list: https://bugs.launchpad.net/neutron/+bugs?field.tag=gate-failure
16:18:35 closed a bunch of those that no longer reproduce / no logs
16:18:41 we can reopen if they happen again
16:19:17 https://bugs.launchpad.net/neutron/+bug/1687709
16:19:19 Launchpad bug 1687709 in neutron "fullstack: ovs-agents remove trunk bridges that don't belong to them" [High,Confirmed] - Assigned to Jakub Libosvar (libosvar)
16:19:29 jlibosva, is it correct that ovs isolation work should help with that?
16:19:37 I'm gonna close the functional killer bug, no hits in last week
16:19:50 jlibosva, yep, go for it
16:19:51 ihrachys: yes, that's still something I'd like to implement but can't find spare time
16:20:06 jlibosva, is everything ready on ovsdbapp side for that?
16:20:30 my plan is not to use the sandbox ovs, so ovsdbapp is no longer a requirement
16:20:46 the isolation will use normal ovsdb-server process running in namespace instead
16:21:04 as we need an actual access to kernel datapath, which sandbox mocks
16:21:17 oh ok
16:22:06 I haven't walked through all bugs yet; I am going to continue cleaning up the gate-failure list this week
16:22:15 #action ihrachys to complete gate-failure cleanup
16:23:13 https://bugs.launchpad.net/neutron/+bug/1712278
16:23:14 Launchpad bug 1712278 in neutron "Default qos policy doesn't work when creating network" [High,In progress] - Assigned to Hirofumi Ichihara (ichihara-hirofumi)
16:23:22 we seem to have a patch here: https://review.openstack.org/#/c/496139/
16:23:54 I am not sure how it happened. don't we have test coverage for that?
16:24:55 on higher level, maybe tempest or fullstack - but both are non-voting
16:25:16 oh right.
16:25:40 hmm, actually api tests should be able to catch that
16:26:37 for some reason, logstash shows osc functional test failures only
16:26:42 http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'QosPolicyDefault'%20object%20has%20no%20attribute%20'translate'%5C%22
16:27:30 and api tests don't test default qos policy
16:27:37 only fullstack
16:28:01 I don't see fullstack hits in logstash
16:28:41 do we collect server logs for fullstack?
16:29:07 oh; we probably collect but don't index
16:29:15 we index test runner logs only
16:29:26 that explains then
16:29:48 would probably make sense to ask to add api tests to the fix
16:30:17 oh you did already
16:30:19 ok
16:30:26 yeah :)
16:30:43 also looking at the post-gate hook, if I understand the find correctly, we should index also the server and agent logs
16:30:45 for fullstack
16:32:11 hm, right. so which test do you think covers the functionality?
16:32:14 fullstack
16:32:32 yep, so maybe it doesn't hit that specific
16:32:38 oh I see TestQoSPolicyIsDefault
16:32:49 I checked the index file and server and agents are included
16:32:53 ihrachys++ :)
16:33:31 so we probably don't create a network after we have the default policy
16:33:33 I don't think we create a network using default policy in any of those test cases
16:33:36 right
16:34:06 ok, that makes sense
16:34:56 #topic Other patches
16:35:06 jlibosva, anything of interest not mentioned already?
16:35:31 just that if you look at functional gate failures in the last 7 days, it draws a batman :)
16:35:47 nothing else from me
16:35:51 :))
16:36:39 ok, I will note that I noticed some time ago that grenade failures don't always trigger corresponding elastic patterns, and I think I came up with the fix here: https://review.openstack.org/#/c/493987/
16:37:01 elastic bot doesn't wait for all service logs to upload if it's grenade
16:37:26 I have nothing else
16:37:27 oh, one more thing I have
16:37:30 shoot
16:37:53 I noticed that bot now reports elastic rechecks at the upstream channel :) it was discussed in this meeting a while ago
16:38:20 yeah. it doesn't seem very consistent though, I failed to grasp when it does and when it doesn't
16:38:37 (same feeling I have for gerrit comments from the bot)
16:39:04 ok, we can close the session I guess
16:39:06 thanks jlibosva
16:39:07 yep
16:39:09 #endmeeting
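
Note on the 16:13:12 to 16:15:40 discussion (control plane checks before ssh'ing): the idea of waiting for the HA port to become active before the test tries to ssh can be sketched roughly as below. This is only an illustration, not the patch anilvenkata took as an action item. The helper names (wait_until, ports_ready, check_then_connect), the port dict shape with a "status" key, and the list_ports/connect callables are all assumptions made up for the example.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is truthy or `timeout` seconds have passed.

    Returns True on success and False on timeout. `clock` and `sleep` are
    injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def ports_ready(ports):
    """True when there is at least one port and all report status ACTIVE.

    Assumption: each port is a dict with a "status" key.
    """
    return bool(ports) and all(p.get("status") == "ACTIVE" for p in ports)


def check_then_connect(list_ports, connect, timeout=60.0, interval=1.0,
                       sleep=time.sleep):
    """Run connect() only after the ports returned by list_ports() are ready.

    `list_ports` and `connect` are hypothetical callables supplied by the
    test; here they stand in for "query router ports" and "ssh to the VM".
    """
    ready = wait_until(lambda: ports_ready(list_ports()),
                       timeout, interval, sleep=sleep)
    if not ready:
        raise TimeoutError("router ports did not become ACTIVE in time")
    return connect()
```

The point of the design is that a timeout here fails the test with a control plane message, instead of a less informative ssh failure later on.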