16:00:27 #startmeeting neutron_ci
16:00:30 o/
16:00:32 Meeting started Tue Mar 28 16:00:27 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:35 * ihrachys waves at everyone
16:00:35 The meeting name has been set to 'neutron_ci'
16:00:57 o/
16:01:44 o/
16:01:51 let's review action items from the previous meeting
16:01:55 aka shame on ihrachys
16:01:59 and jlibosva
16:02:01 "ihrachys fix e-r bot not reporting in irc channel"
16:02:29 hasn't happened; I gotta track that in my trello I guess
16:02:33 #action ihrachys fix e-r bot not reporting in irc channel
16:02:38 "ihrachys to fix the grafana board to include gate-tempest-dsvm-neutron-dvr-multinode-full-ubuntu-xenial-nv"
16:02:44 nope, hasn't happened
16:02:54 will follow up on it after the meeting, it should not take much time :-x
16:03:19 actually, mlavalle, maybe you could take that since you were to track the dvr failure rate?
16:03:44 that's a matter of editing grafana/neutron.yaml in project-config, not a huge task
16:03:49 ihrachys: sure. I don't know how, but will find out
16:04:00 that's a good learning opportunity then
16:04:05 cool
16:04:08 you can ask me for details in the neutron channel
16:04:11 and thanks
16:04:13 will do
16:04:19 #action mlavalle to fix the grafana board to include gate-tempest-dsvm-neutron-dvr-multinode-full-ubuntu-xenial-nv
16:04:43 I spent time looking at this
16:05:11 all the failures I can see are due to hosts not being available for the tests
16:05:31 or losing connection with the hypervisor
16:05:55 the other failures I see are due to the patchsets in the check queue
16:06:23 as a next step I'll be glad to talk to the infra team about this
16:06:23 ok I see
16:06:41 we may revisit that once we have data (grafana) back
16:07:00 next was "jlibosva to figure out the plan for py3 gate transition and report back"
16:07:17 didn't sync yet. Although it's quite important, I won't be able to make a plan till the next meeting as I'll be off most of the time. So I target for now+2 weeks :)
16:08:09 mlavalle: yes please do ping us in -infra after the meeting if you can (I've been trying to get things under control failure-wise, want to make sure we aren't missing something)
16:08:26 clarkb: will do
16:08:29 ok let's punt py3 for now till jlibosva is back
16:09:00 unless someone wants to take a pitch on writing a proposal for py3 coverage in gate
16:10:58 ok
16:11:05 #topic State of the Gate
16:11:10 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:12:01 gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial is the only gate job that seems to show a high failure rate
16:12:07 it's 8% right now
16:12:17 anyone aware of what happens there?
16:12:39 * electrocucaracha is checking
16:12:59 any chance it's still the echo from the spike before?
16:13:19 I see this example: http://logs.openstack.org/17/412017/5/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/c2fab50/console.html#_2017-03-24_18_31_35_087822
16:13:28 jlibosva: looks rather flat from grafana
16:13:36 timeout?
16:14:37 I don't see too many patches merged lately, could be a one-off
16:15:59 #topic Fullstack voting progress
16:16:16 jlibosva: surely fullstack is still at 100% failure rate, but are we making progress?
16:16:26 do we have a grasp of all the failures there?
16:17:09 re linuxbridge - I found the latest failure: http://logs.openstack.org/71/450771/1/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/df0eaa2/console.html#_2017-03-28_14_05_34_665809
16:17:22 ihrachys: there is still the patch for the iptables firewall
16:17:41 https://review.openstack.org/441353 ?
16:17:56 yes, that one
16:18:07 still probably a WIP
16:18:38 I see test_securitygroup failing with it, as in http://logs.openstack.org/53/441353/8/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/fe9f205/testr_results.html.gz
16:18:47 does it suggest it's not solving it?
16:19:05 it's probably introducing another regression
16:19:11 :-)
16:19:17 as it fixes the iptables driver but breaks iptables_hybrid
16:19:24 whack-a-mole
16:19:27 they are closely related and both use the conntrack manager
16:20:08 kevinbenton: fyi, seems like we need the conntrack patch in to move forward with fullstack
16:20:21 jlibosva: apart from this failure, anything pressing? or is it the last one?
16:20:28 no, two others :)
16:20:31 rather :(
16:20:46 https://bugs.launchpad.net/neutron/+bug/1673531 - introduced recently
16:20:46 Launchpad bug 1673531 in neutron "fullstack test_controller_timeout_does_not_break_connectivity_sigkill(GRE and l2pop,openflow-native_ovsdb-cli) failure" [Undecided,New]
16:21:02 by merging tests for keeping data plane connectivity while the agent is restarted
16:21:47 I also saw another failure in a trunk test where patch ports between tbr- and br-int are not cleaned up properly after the trunk is deleted.
16:22:01 I haven't investigated that one and I don't think I reported an LP bug yet
16:22:39 I should probably raise the test_controller_timeout_does_not_break_connectivity_sigkill one at the upgrades meeting since it's directly related to upgrade scenarios
16:22:59 It's unclear to me if it's fullstack or the agent
16:23:53 http://logs.openstack.org/98/446598/1/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/2e0f93e/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetworkOnOvsBridgeControllerStop.test_controller_timeout_does_not_break_connectivity_sigkill_GRE-and-l2pop,openflow-native_ovsdb-cli_/neutron-openvswitch-agent--2017-03-16--16-06-05-730632.txt.gz?level=TRACE
16:24:08 looks like multiple agents trying to add the same manager?
16:24:28 since we don't isolate ovs, and we run two hosts, maybe that's why
16:25:05 gotta get otherwiseguy looking at it. the bug may be in the code that is now in ovsdbapp.
16:25:36 that's weird
16:25:41 it has the vsctl ovsdb_interface
16:25:46 I thought the manager is for native
16:25:55 it's chicken and egg
16:26:06 you can't do native before you register the port
16:26:17 so if the connection fails, we call the CLI to add the manager port
16:26:21 and then repeat the native attempt
16:26:24 but there is no native whatsoever
16:26:49 it's a vsctl test
16:27:17 oh
16:27:36 a reasonable question is then, why do we open the port
16:27:38 right?
16:27:50 but anyway, if it tries to create a new manager and it's already there, it shouldn't influence the functionality, right?
16:28:11 depending on what the agent does with the failure.
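
The fallback described in the messages above (register the OVSDB manager via the CLI when the native connection cannot come up, then retry the native attempt) boils down to roughly the pattern below. This is a minimal Python sketch, not the actual neutron/ovsdbapp code: connect_native, add_manager_via_vsctl and connect_with_cli_fallback are hypothetical names standing in for the real helper mentioned later in the log, native.helpers.enable_connection_uri.

    import subprocess
    import time


    def add_manager_via_vsctl(conn_uri):
        """Register an OVSDB manager target (e.g. 'ptcp:6640:127.0.0.1')
        using the ovs-vsctl CLI. Hypothetical stand-in for the real helper
        (native.helpers.enable_connection_uri) discussed in the log."""
        subprocess.check_call(['ovs-vsctl', '--timeout=10',
                               'set-manager', conn_uri])


    def connect_with_cli_fallback(connect_native, conn_uri, retry_delay=1):
        """Chicken-and-egg workaround: the native OVSDB connection needs a
        manager registered on the switch, but registering one is itself an
        OVSDB operation, so the CLI handles the initial registration and
        the native attempt is then repeated."""
        try:
            return connect_native(conn_uri)
        except Exception:
            # No manager listening yet (or the connection failed for some
            # other reason); register the target via the CLI and retry.
            add_manager_via_vsctl(conn_uri)
            time.sleep(retry_delay)
            return connect_native(conn_uri)

The race suspected above comes from several fullstack "agents" attempting this registration against the same, non-isolated OVS instance at roughly the same time.
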
16:28:23 not sure if the failure happens on this iteration, or somewhere later
16:30:21 yeah, seems like the failure happens 30sec+ after that error
16:30:27 probably not directly related
16:31:04 I'm looking at the code right now and the ovsdb monitor calls native.helpers.enable_connection_uri
16:32:06 https://review.openstack.org/#/c/441447/
16:32:26 yea, was actually looking for this exact patch
16:32:40 but by that time the fullstack test wasn't in tree yet
16:34:44 oh so basically polling.py always passes cfg.CONF.OVS.ovsdb_connection
16:34:56 and since it has a default value, it always triggers the command
16:35:15 I think there are several issues here. one is - we don't need that at all for vsctl
16:35:25 another being - multiple calls may race
16:35:40 neither is directly related to the fullstack failure
16:36:20 #action ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri
16:36:34 we could hack fullstack to filelock the call
16:36:42 I don't think that's correct
16:36:45 to avoid races - it can't happen in the real world
16:37:11 because we don't run multiple monitors?
16:37:20 we don't run multiple ovs agents
16:38:32 yeah, seems like the only place we call the code path is the ovs agent
16:39:53 I would still prefer a code-level fix for that, but it would work if we lock too
16:40:06 the only other place where it's used is vsphere, in some dvs_neutron_agent ... http://codesearch.openstack.org/?q=get_polling_manager&i=nope&files=&repos=
16:40:12 but dunno what that is
16:40:32 this code looks like the ovs agent copy-pasted :)
16:41:05 but it doesn't seem this code reimplements the agent
16:41:21 the question would be whether the DVS agent can be used with the OVS agent on the same node
16:42:34 * mlavalle has to step out
16:42:38 ok let's move to the next topic
16:42:48 #topic Functional job state
16:43:03 http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=7&fullscreen
16:43:25 there are still spikes in the recent past, up to 80%
16:43:36 not sure what that was, I suspect some general gate breakage
16:43:43 now it's at a reasonable 10%
16:43:53 (note it's the check queue so there may be valid breakages)
16:44:24 of all patches, I am aware of this fix for func test stability: https://review.openstack.org/#/c/446991/
16:44:37 jlibosva: maybe you can have a look
16:45:00 I will
16:45:19 also note that almost the whole previous week the rate was around 20%, which is still not ideal
16:46:28 yeah. sadly I am consumed this week by misc stuff so won't be able to have a look.
16:48:25 #topic Other gate failures
16:48:29 https://bugs.launchpad.net/neutron/+bugs?field.tag=gate-failure
16:48:41 we could monitor the trend for next week and we'll see
16:48:45 * ihrachys looks through the list to see if anything could benefit from review attention
16:49:10 this patch may be interesting since the late pecan switch: https://review.openstack.org/#/c/447781/
16:49:18 but it needs dasanind to respin it with a test included
16:49:41 I think the bug hit tempest sometimes.
16:50:46 any bugs worth raising?
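
The filelock hack floated above (16:36:34) could look roughly like the following sketch. It is an illustration only, assuming the fullstack "hosts" are separate processes sharing a single OVS on one node; the lock path and the wrapped call are hypothetical, and this is not the code-level fix the discussion leaned towards.

    import fcntl
    from contextlib import contextmanager


    @contextmanager
    def file_lock(path):
        """Serialize a critical section across processes on the same node
        by holding an exclusive flock on a shared lock file."""
        with open(path, 'w') as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)


    # Hypothetical usage around the racy manager registration:
    #
    #     with file_lock('/tmp/fullstack-ovsdb-manager.lock'):
    #         helpers.enable_connection_uri(cfg.CONF.OVS.ovsdb_connection)

As noted in the discussion, real deployments run a single OVS agent per node, so the race is specific to the fullstack environment; in-tree, something like oslo.concurrency's external file locks would be the more idiomatic way to do this, but a code-level fix was preferred.
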
16:51:02 oh there is one from tmorin with a fix here: https://review.openstack.org/#/c/450865/
16:51:12 I haven't checked the fix yet, so I am not sure if it's the right thing
16:51:57 he just enabled quotas explicitly and it worked
16:52:11 need to check how the quota ovo disrupted normal behavior
16:52:12 I don't think the change we landed was intended to break subprojects ;)
16:52:19 gotta find a fix on the neutron side
16:52:47 yea, that would be the right fix
16:53:10 ok let's move on
16:53:14 #topic Open discussion
16:53:36 https://review.openstack.org/#/c/439114/ from manjeets_ still waits for +W from infra
16:53:43 I see Clark already +2d it, nice
16:54:00 I asked clark yesterday for review
16:54:16 maybe need to post once more on infra
16:54:35 yeah, thanks for following up on it
16:54:51 apart from that, anything CI-related worth mentioning here?
16:55:27 I noticed that qos is skipped in the api job
16:55:42 e.g. http://logs.openstack.org/91/446991/2/check/gate-neutron-dsvm-api-ubuntu-xenial/044a331/testr_results.html.gz
16:55:49 test_qos
16:55:50 one question - I was looking at functional tests
16:55:57 don't see much for qos
16:56:43 I see trunk is covered in functional but not qos
16:57:09 jlibosva: I think we had a skip somewhere there
16:57:22 http://logs.openstack.org/91/446991/2/check/gate-neutron-dsvm-api-ubuntu-xenial/044a331/console.html#_2017-03-24_11_12_55_029613
16:57:36 apparently the driver (ovs?) doesn't support it
16:57:41 I dug into it a bit and found that settings from local.conf are not propagated to tempest.conf - but with this patch I see it works ... maybe it
16:57:53 jlibosva: oh there was another thing related
16:57:55 yeah, that's probably something other than what I saw - seems fixed by now
16:58:08 https://review.openstack.org/#/c/449182/
16:58:20 that should fix the issue with changes not being propagated from hooks into tempest.conf
16:58:33 so now we have only 2 skips, and they seem to be legit
16:58:37 ihrachys: yeah, that's probably it :)
16:59:10 it was weird for me though as I actually saw crudini being called - anyway, it's solved. Thanks ihrachys :)
16:59:17 np
16:59:31 manjeets_: there are func tests for qos too
16:59:41 manjeets_: I will give links in the neutron channel since we are at the top of the hour
16:59:45 thanks everyone and keep it up!
16:59:47 #endmeeting