16:00:23 #startmeeting neutron_ci
16:00:24 Meeting started Tue Sep 17 16:00:23 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:25 hi
16:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:29 The meeting name has been set to 'neutron_ci'
16:00:29 hi
16:01:31 o/
16:01:42 let's wait 1 or 2 minutes for others
16:01:48 I just pinged people on the neutron channel
16:03:34 o/
16:03:44 ok, let's start
16:03:51 first thing
16:03:51 hi
16:04:00 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:04:21 please open it now to give it time to load
16:04:22 #topic Actions from previous meetings
16:04:28 mlavalle to continue investigating router migrations issue
16:04:53 I submitted a new revision of the patch with log statements
16:05:13 I haven't hit the failure so far
16:05:23 so rechecking until I hit it
16:05:49 let's hope it will fail fast :)
16:06:00 yeap, lol
16:06:00 #action mlavalle to continue investigating router migrations issue
16:06:13 let's check next week then
16:06:16 ok, next one
16:06:21 ralonsoh to report bug and check issue with ovsdb errors in functional tests
16:06:50 slaweq, the problem was not related to ovsdb
16:06:58 but I can't find the actual bug now
16:07:11 ralonsoh, I know, I filed it
16:07:23 slaweq, give me some time, please continue
16:07:35 ok :)
16:07:50 that's all from last week (not too much this time)
16:07:55 next topic then
16:08:00 #topic Stadium projects
16:08:16 we already talked about python3 in yesterday's neutron meeting
16:08:41 to sum up, we still have a few projects to finish
16:08:52 as a reminder: Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:09:17 any updates from the team about python3?
16:10:50 Nothing from me
16:11:04 ok, so next topic related to stadium
16:11:05 can I interrupt?
16:11:06 tempest-plugins migration
16:11:13 sure ralonsoh
16:11:20 about the previous CI issue
16:11:41 it wasn't the ovsdb, but the namespaces creation/deletion and the ip_lib
16:11:49 again?
16:11:56 o/ late hi (again, sorry)
16:11:58 with https://review.opendev.org/#/c/679428/ we'll see how the CI gets better
16:12:03 that's all
16:12:05 thanks
16:12:33 ok, this patch was merged a few days ago, so it should probably be better now
16:12:55 ralonsoh: will You propose backports of this patch maybe? or don't the stable branches need it?
16:13:06 slaweq, I'll check this now
16:13:40 ralonsoh: thx a lot
16:13:52 I'll keep an eye on these backports :)
16:14:05 #action ralonsoh to check if https://review.opendev.org/#/c/679428/ should be backported to stable branches
16:14:28 slaweq, no, not needed
16:14:39 ahh, that was fast :)
16:14:43 thx ralonsoh
16:15:03 ok, going back to tempest-plugins migration
16:15:09 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:15:21 neutron-dynamic-routing - patch ready for review https://review.opendev.org/#/c/652099
16:15:30 tidwellr: thx a lot for your work on this
16:15:40 I'll do my best to finish vpn over the next few days
16:15:48 mlavalle: thx
16:15:58 then we should be done, I hope
16:16:52 ok, so I think we can move on
16:16:54 #topic Grafana
16:17:00 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:28 after a few weeks of very bad CI I think we are finally getting back to business
16:18:43 at least the most urgent issues are fixed or worked around
16:19:14 it seems that rally is getting back to normal - we should probably switch it to be voting again this week
16:19:32 I will take a look at it for a few days and will propose a patch to make it voting again
16:19:49 yes, to me the biggest ongoing negative that stands out is the consistent 20% failure rate on neutron-tempest-plugin-scenario-openvswitch
16:20:07 other than that, it's not bad
16:21:30 njohnston: yes, this one is failing quite often
16:21:48 and unfortunately I didn't see one common reason for those failures
16:22:00 many random tests AFAICT
16:22:07 I agree, that is what I saw as well
16:22:46 I will try to spend some more time checking those failures more carefully now
16:23:16 but speaking about this job, I have one test which failed a couple of times
16:23:19 :)
16:23:52 do we want to talk about anything else related to grafana, or can we move to the next topic and talk about scenario jobs then?
16:24:11 go ahead
16:24:11 let's move on
16:24:15 ok
16:24:21 #topic Tempest/Scenario
16:24:40 so, as I said, I found neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update failed at least 3 times
16:24:54 in different jobs, but the failures look similar
16:25:00 https://9a0240ca9f61a595b570-86672578d4e6ceb498f2d932b0da6815.ssl.cf1.rackcdn.com/633871/20/check/neutron-tempest-plugin-scenario-openvswitch/772f7a4/testr_results.html.gz
16:25:02 https://1fd93ff32a555bc48a73-5fe9d093373d887f2b09d5c4b981e1db.ssl.cf2.rackcdn.com/652099/34/check/neutron-tempest-plugin-scenario-openvswitch-rocky/4897a52/testr_results.html.gz
16:25:04 https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cf0/679510/1/check/neutron-tempest-plugin-scenario-openvswitch/cf055c6/testr_results.html.gz
16:25:12 ohh, not even
16:25:16 all in the openvswitch job :)
16:25:20 but once in rocky
16:27:08 slaweq, maybe (maybe) we are hitting the max BW in the CI system
16:27:14 it's testing:
16:27:14 port=self.NC_PORT, expected_bw=QoSTest.LIMIT_BYTES_SEC * 3)
16:27:26 that means this is testing BW*3
16:27:36 ralonsoh: but e.g. in https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cf0/679510/1/check/neutron-tempest-plugin-scenario-openvswitch/cf055c6/testr_results.html.gz it failed on _create_file_for_bw_tests()
16:27:51 and the job is singlenode
16:27:53 yes, not this one
16:28:03 so all this traffic is only on one vm
16:28:07 one host
16:28:27 I don't think we are hitting any limit
16:28:54 slaweq, let me review this again
16:29:05 I submitted a patch I5ce1a34f7d5d635002baa1e5b14c288e6d2bc43e some weeks ago for this
16:29:06 ralonsoh: look at those 2 lines:
16:29:07 2019-09-12 23:01:40,063 3293 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.18 successfully created
16:29:09 2019-09-12 23:17:26,625 3293 ERROR [paramiko.transport] Socket exception: Connection timed out (110)
16:29:21 it waited 16 minutes there
16:29:42 and there is nothing in between, those are 2 consecutive lines in the log from this test
16:30:29 this command is just creating an empty file in the VM
16:30:29 in another of those failed tests:
16:30:31 2019-09-17 06:18:59,385 627 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': ''}
16:30:33 Body: {"bandwidth_limit_rule": {"max_kbps": 3000, "max_burst_kbps": 3000}}
16:30:35 Response - Headers: {'content-type': 'application/json', 'content-length': '137', 'x-openstack-request-id': 'req-c94e94c1-fb8b-4ef4-b107-e7e9d0a2cf44', 'date': 'Tue, 17 Sep 2019 06:18:59 GMT', 'connection': 'close', 'status': '200', 'content-location': 'http://10.209.96.230:9696/v2.0/qos/policies/43a40f58-3685-4b92-969b-f8ba7e3c7fad/bandwidth_limit_rules/f320a803-bb1f-4f85-89c8-38888dc5805d'}
16:30:37 Body: b'{"bandwidth_limit_rule": {"max_kbps": 3000, "max_burst_kbps": 3000, "direction": "egress", "id": "f320a803-bb1f-4f85-89c8-38888dc5805d"}}'
16:30:39 2019-09-17 06:34:47,297 627 ERROR [paramiko.transport] Socket exception: Connection timed out (110)
16:30:52 similar, nothing for 16 minutes and then a timeout
16:31:31 anyone got cycles to investigate this?
16:31:34 ok, we can reduce the ssh connection timeout
16:31:38 I can try it
16:31:48 thx ralonsoh
16:31:56 will You report a bug or should I?
16:32:06 slaweq, I'll do it
16:32:09 thx
16:32:28 #action ralonsoh to report bug and investigate issue with neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update
16:33:04 another problem which I see from time to time in various jobs is the issue with 'Multiple possible networks found, use a Network ID to be more specific.'
16:33:11 like https://58a87e825b9766115d07-cec36eea8e90c9127fc5a72b798cfeab.ssl.cf2.rackcdn.com/670177/9/check/networking-ovn-tempest-dsvm-ovs-release/b58638a/testr_results.html.gz
16:33:42 I can take this one too
16:33:44 I think we should check the tempest code and make sure that when it spawns instances, it always explicitly chooses a network
16:34:10 ralonsoh: do You have cycles for that too?
16:34:15 yes
16:34:21 We don't want to overload You :)
16:34:27 np
16:34:32 great, thx
16:34:50 ralonsoh: I think we should also report a bug for that, but maybe against tempest
16:34:57 for sure
16:34:59 will You do it?
16:35:04 yes
16:35:09 thx
16:35:22 #action ralonsoh to report bug with "Multiple possible networks found"
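
For reference, a rough sketch of the tempest-side change discussed above: whenever a test boots a server in a project that owns more than one network, pass the network explicitly so Nova never has to guess. This is only an illustration, not existing tempest code; it assumes a test class that exposes the usual os_primary clients, and the mixin name, helper name and server-name prefix are made up.

    from tempest import config
    from tempest.lib.common.utils import data_utils

    CONF = config.CONF


    class ExplicitNetworkMixin(object):
        """Hypothetical mixin for a tempest test class with os_primary clients."""

        def _boot_server_on_explicit_network(self, network_id):
            # Passing the network explicitly avoids Nova's 400 response
            # "Multiple possible networks found, use a Network ID to be
            # more specific." when the tenant has several networks.
            body = self.os_primary.servers_client.create_server(
                name=data_utils.rand_name('explicit-net-server'),
                imageRef=CONF.compute.image_ref,
                flavorRef=CONF.compute.flavor_ref,
                networks=[{'uuid': network_id}])
            return body['server']

The same idea applies to any helper that ends up calling servers_client.create_server without a networks argument.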
16:35:46 that's all I have regarding scenario jobs for today
16:35:57 anything else You want to add here?
16:37:07 I got nothing
16:37:43 ok, let's move on
16:37:58 #topic fullstack/functional
16:38:12 for functional tests I found one new issue:
16:38:32 in neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_clear_all_filters
16:38:35 https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_eff/682391/1/check/neutron-functional-python27/eff0cab/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_clear_all_filters.txt.gz
16:38:48 You can see there: AttributeError: 'str' object has no attribute 'content_type'
16:39:27 and it was on a patch totally unrelated to this part of the code
16:39:36 but I saw it only once so far
16:40:52 any ideas what could be the root cause?
16:41:21 is this a testresult lib error?
16:41:36 hmm
16:41:37 testtools
16:41:38 it seems so
16:42:01 we can keep this log but I wouldn't spend too much time on this one
16:42:21 maybe let's just be aware of it and check that it doesn't happen again (too often :P)
16:42:26 sure
16:42:35 ok for everyone?
16:43:04 yes
16:43:24 thx
16:43:29 +1
16:43:33 I also have one "new" fullstack issue
16:43:34 https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_847/681893/1/gate/neutron-fullstack/847f4d9/testr_results.html.gz
16:43:41 but I also found it only once
16:44:37 same here, to the locker until we hit another one (but this one is more important)
16:45:04 ralonsoh: but here in the ovs agent logs https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_847/681893/1/gate/neutron-fullstack/847f4d9/controller/logs/dsvm-fullstack-logs/TestMinBwQoSOvs.test_bw_limit_qos_port_removed_egress_/neutron-openvswitch-agent--2019-09-17--05-35-29-575569_log.txt.gz
16:45:13 there is an error: RuntimeError: OVSDB Error: {"details":"cannot delete Queue row a1470780-1834-48d9-afd0-6fe41fcbb027 because of 1 remaining reference(s)","error":"referential integrity violation"}
16:45:30 does it ring a bell for You maybe?
16:45:43 yes and no
16:45:46 that was solved
16:45:47 lol
16:46:09 you can't delete a queue if it's assigned to a qos register
16:46:18 ralonsoh: but it happened on this patch https://review.opendev.org/#/c/681893
16:46:19 same topology as in the neutron DB
16:46:27 which is from this week
16:46:42 I know, I know
16:46:50 so maybe some regression, or a corner case which wasn't addressed?
16:46:52 and this change is not related to qos
16:46:58 corner case
16:47:06 nope, it is only a zuul config change
16:47:46 the point is this test is actually testing that a qos rule (and corresponding queues) are removed
16:47:46 ralonsoh: do You want me to report this bug in LP?
16:47:54 yes, please
16:48:04 I'll take a look at this one but maybe not this week
16:48:09 ok, I will report it and we can track it there
16:48:12 ping me with the LP ID
16:48:46 #action slaweq to report bug with fullstack test_bw_limit_qos_port_removed test
16:48:50 ralonsoh: sure
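
For reference, a minimal sketch of the ordering constraint behind that OVSDB error: a Queue row can only be destroyed once nothing references it, so the QoS row's queues entry has to be dropped first, ideally in the same transaction. This is just an illustration using ovsdbapp's generic db_clear/db_destroy commands, not the neutron agent code; the OVSDB endpoint and the QoS UUID are placeholders, and only the Queue UUID is taken from the error above.

    from ovsdbapp.backend.ovs_idl import connection
    from ovsdbapp.schema.open_vswitch import impl_idl

    # Placeholders: a local OVSDB endpoint and the QoS row created by the test.
    OVSDB_ENDPOINT = 'tcp:127.0.0.1:6640'
    QOS_UUID = '...'  # QoS row still holding the queue in its 'queues' map
    QUEUE_UUID = 'a1470780-1834-48d9-afd0-6fe41fcbb027'  # from the error above

    idl = connection.OvsdbIdl.from_server(OVSDB_ENDPOINT, 'Open_vSwitch')
    api = impl_idl.OvsdbIdl(connection.Connection(idl, timeout=10))

    # Destroying the Queue while the QoS row still points at it is exactly
    # what OVSDB rejects with "referential integrity violation". Drop the
    # reference first, then the Queue, in one transaction.
    with api.transaction(check_error=True) as txn:
        txn.add(api.db_clear('QoS', QOS_UUID, 'queues'))
        txn.add(api.db_destroy('Queue', QUEUE_UUID))

Destroying the QoS row itself before (or together with) the Queue satisfies the same constraint.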
16:49:16 ok, that's all I have for You about functional and fullstack jobs today
16:49:24 anything else You want to add?
16:50:23 ok, if not, let's move on
16:50:45 I checked periodic jobs today and all of them have been working really well recently
16:50:56 great!
16:51:05 so we are good with periodic jobs
16:51:14 and that's all I have for today
16:51:32 do You want to talk about anything else related to CI today?
16:51:39 if not we can finish earlier :)
16:51:48 let's do it
16:52:05 by "it" You mean "finish earlier" right?
16:52:07 :D
16:52:18 yeap
16:52:22 ok, thx for attending
16:52:24 o/
16:52:25 o/
16:52:25 bye!
16:52:27 #endmeeting