16:00:23 <slaweq> #startmeeting neutron_ci
16:00:24 <openstack> Meeting started Tue Sep 17 16:00:23 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:25 <slaweq> hi
16:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:29 <openstack> The meeting name has been set to 'neutron_ci'
16:00:29 <ralonsoh> hi
16:01:31 <njohnston> o/
16:01:42 <slaweq> lets wait 1 or 2 minutes for others
16:01:48 <slaweq> I just pinged people on the neutron channel
16:03:34 <mlavalle> o/
16:03:44 <slaweq> ok, lets start
16:03:51 <slaweq> first thing
16:03:51 <haleyb> hi
16:04:00 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:04:21 <slaweq> please open it now to give it time to load
16:04:22 <slaweq> #topic Actions from previous meetings
16:04:28 <slaweq> mlavalle to continue investigating router migrations issue
16:04:53 <mlavalle> I submitted a new revision to the patch with log statements
16:05:13 <mlavalle> I haven't hit the failure so far
16:05:23 <mlavalle> so rechecking until I hit it
16:05:49 <slaweq> lets hope it will fail fast :)
16:06:00 <mlavalle> yeap, lol
16:06:00 <slaweq> #action mlavalle to continue investigating router migrations issue
16:06:13 <slaweq> lets check next week then
16:06:16 <slaweq> ok, next one
16:06:21 <slaweq> ralonsoh to report bug and check issue with ovsdb errors in functional tests
16:06:50 <ralonsoh> slaweq, the problem was not related to ovsdb
16:06:58 <ralonsoh> but I can't find the real bug now
16:07:11 <ralonsoh> I know I filed it
16:07:23 <ralonsoh> slaweq, give me some time, please continue
16:07:35 <slaweq> ok :)
16:07:50 <slaweq> that's all from last week (not too much this time)
16:07:55 <slaweq> next topic then
16:08:00 <slaweq> #topic Stadium projects
16:08:16 <slaweq> about python3 we already talked at yesterday's neutron meeting
16:08:41 <slaweq> to sum up, we still have a few projects to finish
16:08:52 <slaweq> as a reminder: Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:09:17 <slaweq> any updates from the team about python3?
16:10:50 <njohnston> Nothing from me
16:11:04 <slaweq> ok, so next topic related to stadium
16:11:05 <ralonsoh> can I interrupt?
16:11:06 <slaweq> tempest-plugins migration
16:11:13 <slaweq> sure ralonsoh
16:11:20 <ralonsoh> about the previous CI issue
16:11:41 <ralonsoh> it wasn't the ovsdb, but the namespaces creation/deletion and the ip_lib
16:11:49 <slaweq> again?
16:11:56 <bcafarel> o/ late hi (again, sorry)
16:11:58 <ralonsoh> with https://review.opendev.org/#/c/679428/ we'll see how the CI gets better
16:12:03 <ralonsoh> that's all
16:12:05 <ralonsoh> thanks
16:12:33 <slaweq> ok, this patch was merged a few days ago, so it should be better now probably
16:12:55 <slaweq> ralonsoh: will You propose backports of this patch maybe? or stable branches don't need it?
16:13:06 <ralonsoh> slaweq, I'll check this now
16:13:40 <slaweq> ralonsoh: thx a lot
16:13:52 <bcafarel> I'll keep an eye on these backports :)
16:14:05 <slaweq> #action ralonsoh to check if https://review.opendev.org/#/c/679428/ should be backported to stable branches
16:14:28 <ralonsoh> slaweq, no, not needed
16:14:39 <slaweq> ahh, that was fast :)
16:14:43 <slaweq> thx ralonsoh
16:15:03 <slaweq> ok, going back to tempest-plugins migration
16:15:09 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:15:21 <slaweq> neutron-dynamic-routing - patch ready for review https://review.opendev.org/#/c/652099
16:15:30 <slaweq> tidwellr: thx a lot for work on this
16:15:40 <mlavalle> I'll do my best to finish vpn over the next few days
16:15:48 <slaweq> mlavalle: thx
16:15:58 <slaweq> then we should be done I hope
16:16:52 <slaweq> ok, so I think we can move on
16:16:54 <slaweq> #topic Grafana
16:17:00 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:28 <slaweq> after a few weeks of very bad CI I think we are getting back to business finally
16:18:43 <slaweq> at least the most urgent issues are fixed or worked around
16:19:14 <slaweq> It seems that rally is coming back - we should probably switch it to be voting again this week
16:19:32 <slaweq> I will take a look at it for a few days and will propose a patch to make it voting again
16:19:49 <njohnston> yes, to me the biggest ongoing negative that stands out is the consistent 20% failure rate on neutron-tempest-plugin-scenario-openvswitch
16:20:07 <njohnston> other than that it's not bad
16:21:30 <slaweq> njohnston: yes, this one is failing quite often
16:21:48 <slaweq> and unfortunately I didn't see one common reason for those failures
16:22:00 <slaweq> many random tests AFAICT
16:22:07 <njohnston> I agree, that is what I saw as well
16:22:46 <slaweq> I will try to spend some more time on checking those failures more carefully now
16:23:16 <slaweq> but speaking about this job I have one failing test which failed a couple of times
16:23:19 <slaweq> :)
16:23:52 <slaweq> do we want to talk about something else related to grafana or can we move to the next topic and talk about scenario jobs then?
16:24:11 <njohnston> go ahead
16:24:11 <mlavalle> let's move on
16:24:15 <slaweq> ok
16:24:21 <slaweq> #topic Tempest/Scenario
16:24:40 <slaweq> so, as I said, I found neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update failed at least 3 times
16:24:54 <slaweq> in different jobs but the failures look similar
16:25:00 <slaweq> https://9a0240ca9f61a595b570-86672578d4e6ceb498f2d932b0da6815.ssl.cf1.rackcdn.com/633871/20/check/neutron-tempest-plugin-scenario-openvswitch/772f7a4/testr_results.html.gz
16:25:02 <slaweq> https://1fd93ff32a555bc48a73-5fe9d093373d887f2b09d5c4b981e1db.ssl.cf2.rackcdn.com/652099/34/check/neutron-tempest-plugin-scenario-openvswitch-rocky/4897a52/testr_results.html.gz
16:25:04 <slaweq> https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cf0/679510/1/check/neutron-tempest-plugin-scenario-openvswitch/cf055c6/testr_results.html.gz
16:25:12 <slaweq> ohh, even not
16:25:16 <slaweq> all in the openvswitch job :)
16:25:20 <slaweq> but once in rocky
16:27:08 <ralonsoh> slaweq, maybe (maybe) we are hitting the max BW in the CI system
16:27:14 <ralonsoh> it's testing:
16:27:14 <ralonsoh> port=self.NC_PORT, expected_bw=QoSTest.LIMIT_BYTES_SEC * 3)
16:27:26 <ralonsoh> that means, this is testing BW*3
16:27:36 <slaweq> ralonsoh: but e.g. in https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cf0/679510/1/check/neutron-tempest-plugin-scenario-openvswitch/cf055c6/testr_results.html.gz it failed on _create_file_for_bw_tests()
16:27:51 <slaweq> and the job is singlenode
16:27:53 <ralonsoh> yes, not this one
16:28:03 <slaweq> so all this traffic is only on one vm
16:28:07 <slaweq> one host
16:28:27 <slaweq> I don't think we are hitting any limit
16:28:54 <ralonsoh> slaweq, let me review this again
16:29:05 <ralonsoh> I submitted a patch I5ce1a34f7d5d635002baa1e5b14c288e6d2bc43e some weeks ago for this
16:29:06 <slaweq> ralonsoh: look at those 2 lines:
16:29:07 <slaweq> 2019-09-12 23:01:40,063 3293 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.18 successfully created
16:29:09 <slaweq> 2019-09-12 23:17:26,625 3293 ERROR [paramiko.transport] Socket exception: Connection timed out (110)
16:29:21 <slaweq> it waited 16 minutes there
16:29:42 <slaweq> and there is nothing in between, those are 2 consecutive lines in the log from this test
16:30:29 <ralonsoh> this command is just creating an empty file in the VM
16:30:29 <slaweq> in another of those failed tests:
16:30:31 <slaweq> 2019-09-17 06:18:59,385 627 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
16:30:33 <slaweq> Body: {"bandwidth_limit_rule": {"max_kbps": 3000, "max_burst_kbps": 3000}}
16:30:35 <slaweq> Response - Headers: {'content-type': 'application/json', 'content-length': '137', 'x-openstack-request-id': 'req-c94e94c1-fb8b-4ef4-b107-e7e9d0a2cf44', 'date': 'Tue, 17 Sep 2019 06:18:59 GMT', 'connection': 'close', 'status': '200', 'content-location': 'http://10.209.96.230:9696/v2.0/qos/policies/43a40f58-3685-4b92-969b-f8ba7e3c7fad/bandwidth_limit_rules/f320a803-bb1f-4f85-89c8-38888dc5805d'}
16:30:37 <slaweq> Body: b'{"bandwidth_limit_rule": {"max_kbps": 3000, "max_burst_kbps": 3000, "direction": "egress", "id": "f320a803-bb1f-4f85-89c8-38888dc5805d"}}'
16:30:39 <slaweq> 2019-09-17 06:34:47,297 627 ERROR [paramiko.transport] Socket exception: Connection timed out (110)
16:30:52 <slaweq> similar, nothing for 16 minutes and then a timeout
16:31:31 <slaweq> anyone got cycles to investigate this?
16:31:34 <ralonsoh> ok, we can reduce the ssh connection timeout
16:31:38 <ralonsoh> I can try it
16:31:48 <slaweq> thx ralonsoh
16:31:56 <slaweq> will You report a bug or should I?
16:32:06 <ralonsoh> slaweq, I'll do it
16:32:09 <slaweq> thx
16:32:28 <slaweq> #action ralonsoh to report bug and investigate issue with neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update
16:33:04 <slaweq> another problem which I see from time to time in various jobs is the issue with 'Multiple possible networks found, use a Network ID to be more specific.'
16:33:11 <slaweq> like https://58a87e825b9766115d07-cec36eea8e90c9127fc5a72b798cfeab.ssl.cf2.rackcdn.com/670177/9/check/networking-ovn-tempest-dsvm-ovs-release/b58638a/testr_results.html.gz
16:33:42 <ralonsoh> I can take this one too
16:33:44 <slaweq> I think we should check the tempest code and make sure that if it spawns instances, it always explicitly chooses a network
16:34:10 <slaweq> ralonsoh: do You have cycles for that too?
16:34:15 <ralonsoh> yes
16:34:21 <slaweq> We don't want to overload You :)
16:34:27 <ralonsoh> np
16:34:32 <slaweq> great, thx
16:34:50 <slaweq> ralonsoh: I think we should also report a bug for that, but maybe against tempest
16:34:57 <ralonsoh> for sure
16:34:59 <slaweq> will You do it?
16:35:04 <ralonsoh> yes
16:35:09 <slaweq> thx
16:35:22 <slaweq> #action ralonsoh to report bug with "Multiple possible networks found"
16:35:46 <slaweq> that's all I have regarding scenario jobs for today
16:35:57 <slaweq> anything else You want to add here?
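[Editor's note: the "Multiple possible networks found" fix discussed above — having tempest always name the network explicitly when it boots a server — can be sketched as below. This is a hedged illustration with hypothetical helper names, not actual tempest code; only the shape of the Nova create-server request body follows the real Compute API.]

```python
# Hypothetical sketch (not actual tempest code) of the fix discussed above.
# When a tenant has more than one network and no network is named in the
# create-server request, Nova fails with "Multiple possible networks found,
# use a Network ID to be more specific." Passing an explicit network ID in
# the request body removes the ambiguity.

def build_server_create_body(name, image_ref, flavor_ref, network_id):
    """Build a Nova create-server request body with an explicit network."""
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
            # Always name the network explicitly; never rely on Nova's
            # "there is only one network" fallback.
            "networks": [{"uuid": network_id}],
        }
    }

body = build_server_create_body(
    "qos-test-vm", "cirros-image-id", "m1.tiny-id",
    "43a40f58-3685-4b92-969b-f8ba7e3c7fad")
```

The helper would be called wherever a scenario test spawns an instance, so the chosen network is always pinned regardless of how many networks the test tenant happens to have.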
16:37:07 <mlavalle> I got nothing
16:37:43 <slaweq> ok, lets move on
16:37:58 <slaweq> #topic fullstack/functional
16:38:12 <slaweq> for functional tests I found one new issue:
16:38:32 <slaweq> in neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_clear_all_filters
16:38:35 <slaweq> https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_eff/682391/1/check/neutron-functional-python27/eff0cab/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_clear_all_filters.txt.gz
16:38:48 <slaweq> You can see there AttributeError: 'str' object has no attribute 'content_type'
16:39:27 <slaweq> and it was on a patch totally not related to this part of code
16:39:36 <slaweq> but I saw it only once so far
16:40:52 <slaweq> any ideas what could be the root cause?
16:41:21 <ralonsoh> is this a testresult lib error?
16:41:36 <slaweq> hmm
16:41:37 <ralonsoh> testtools
16:41:38 <slaweq> it seems so
16:42:01 <ralonsoh> we can keep this log but I wouldn't spend too much time on this one
16:42:21 <slaweq> maybe let's just be aware of it and check that it doesn't happen again (too often :P)
16:42:26 <ralonsoh> sure
16:42:35 <slaweq> ok for everyone?
16:43:04 <ralonsoh> yes
16:43:24 <slaweq> thx
16:43:29 <njohnston> +1
16:43:33 <slaweq> I have also one "new" fullstack issue
16:43:34 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_847/681893/1/gate/neutron-fullstack/847f4d9/testr_results.html.gz
16:43:41 <slaweq> but also found it only once
16:44:37 <ralonsoh> same here, to the locker until we hit another one (but this one is more important)
16:45:04 <slaweq> ralonsoh: but here in the ovs agent logs https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_847/681893/1/gate/neutron-fullstack/847f4d9/controller/logs/dsvm-fullstack-logs/TestMinBwQoSOvs.test_bw_limit_qos_port_removed_egress_/neutron-openvswitch-agent--2019-09-17--05-35-29-575569_log.txt.gz
16:45:13 <slaweq> there is an error: RuntimeError: OVSDB Error: {"details":"cannot delete Queue row a1470780-1834-48d9-afd0-6fe41fcbb027 because of 1 remaining reference(s)","error":"referential integrity violation"}
16:45:30 <slaweq> does it ring a bell for You maybe?
16:45:43 <ralonsoh> yes and no
16:45:46 <ralonsoh> that was solved
16:45:47 <mlavalle> lol
16:46:09 <ralonsoh> you can't delete a queue if it's assigned to a qos register
16:46:18 <slaweq> ralonsoh: but it happened on this patch https://review.opendev.org/#/c/681893
16:46:19 <ralonsoh> same topology as in the neutron DB
16:46:27 <slaweq> which is from this week
16:46:42 <ralonsoh> I know, I know
16:46:50 <slaweq> so maybe some regression, or a corner case which wasn't addressed?
16:46:52 <ralonsoh> and this change is not related to qos
16:46:58 <ralonsoh> corner case
16:47:06 <slaweq> nope, it is only a zuul config change
16:47:46 <ralonsoh> the point is this test is actually testing that a qos rule (and corresponding queues) are removed
16:47:46 <slaweq> ralonsoh: do You want me to report this bug in LP?
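[Editor's note: the referential integrity violation discussed above can be illustrated with a toy model. This is a sketch using plain dicts, not the real ovsdbapp API: ovsdb-server refuses to delete a Queue row while any QoS row's queues column still references it, so deletion has to clear the reference first.]

```python
# Toy model of OVSDB referential integrity (plain dicts, not ovsdbapp).
# A Queue row cannot be deleted while a QoS row still references it;
# the QoS->Queue reference has to be cleared before the row is removed.

def delete_queue(qos_rows, queue_rows, queue_id):
    """Delete a Queue row, enforcing referential integrity like ovsdb-server."""
    refs = sum(queue_id in qos["queues"] for qos in qos_rows.values())
    if refs:
        raise RuntimeError(
            'OVSDB Error: cannot delete Queue row %s because of %d '
            'remaining reference(s)' % (queue_id, refs))
    del queue_rows[queue_id]

def delete_queue_safely(qos_rows, queue_rows, queue_id):
    """Clear every QoS->Queue reference first, then delete the Queue row."""
    for qos in qos_rows.values():
        qos["queues"] = [q for q in qos["queues"] if q != queue_id]
    delete_queue(qos_rows, queue_rows, queue_id)

qos = {"qos1": {"queues": ["a1470780"]}}
queues = {"a1470780": {"max-rate": 3000000}}
delete_queue_safely(qos, queues, "a1470780")  # succeeds after clearing the ref
```

Deleting in the unsafe order (row first, reference second) raises exactly the kind of "remaining reference(s)" error the fullstack log shows; a race between the agent clearing the reference and deleting the row would produce it intermittently.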
16:47:54 <ralonsoh> yes, please
16:48:04 <ralonsoh> I'll take a look at this one but maybe not this week
16:48:09 <slaweq> ok, I will report it and we can track it there
16:48:12 <ralonsoh> ping me with the LP ID
16:48:46 <slaweq> #action slaweq to report bug with fullstack test_bw_limit_qos_port_removed test
16:48:50 <slaweq> ralonsoh: sure
16:49:16 <slaweq> ok, that's all about functional and fullstack jobs I have for You today
16:49:24 <slaweq> anything else You want to add?
16:50:23 <slaweq> ok, if not lets move on
16:50:45 <slaweq> I checked periodic jobs today and all of them work really well recently
16:50:56 <mlavalle> great!
16:51:05 <slaweq> so we are good with periodic
16:51:14 <slaweq> and that's all I have for today
16:51:32 <slaweq> do You want to talk about anything else related to CI today?
16:51:39 <slaweq> if not we can finish earlier :)
16:51:48 <mlavalle> let's do it
16:52:05 <slaweq> by "it" You mean "finish earlier" right?
16:52:07 <slaweq> :D
16:52:18 <mlavalle> yeap
16:52:22 <slaweq> ok, thx for attending
16:52:24 <slaweq> o/
16:52:25 <njohnston> o/
16:52:25 <ralonsoh> bye!
16:52:27 <slaweq> #endmeeting