16:00:23 <slaweq> #startmeeting neutron_ci
16:00:24 <openstack> Meeting started Tue Sep 17 16:00:23 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:25 <slaweq> hi
16:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:29 <openstack> The meeting name has been set to 'neutron_ci'
16:00:29 <ralonsoh> hi
16:01:31 <njohnston> o/
16:01:42 <slaweq> let's wait 1 or 2 minutes for others
16:01:48 <slaweq> I just pinged people on neutron channel
16:03:34 <mlavalle> o/
16:03:44 <slaweq> ok, let's start
16:03:51 <slaweq> first thing
16:03:51 <haleyb> hi
16:04:00 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:04:21 <slaweq> please open it now to give it time to load
16:04:22 <slaweq> #topic Actions from previous meetings
16:04:28 <slaweq> mlavalle to continue investigating router migrations issue
16:04:53 <mlavalle> I submitted a new revision to the patch with log statements
16:05:13 <mlavalle> I haven't hit the failure so far
16:05:23 <mlavalle> so rechecking until I hit it
16:05:49 <slaweq> let's hope it fails fast :)
16:06:00 <mlavalle> yeap, lol
16:06:00 <slaweq> #action mlavalle to continue investigating router migrations issue
16:06:13 <slaweq> let's check next week then
16:06:16 <slaweq> ok, next one
16:06:21 <slaweq> ralonsoh to report bug and check issue with ovsdb errors in functional tests
16:06:50 <ralonsoh> slaweq, the problem was not related to ovsdb
16:06:58 <ralonsoh> but I can't find the real bug now
16:07:11 <ralonsoh> I know I filed it
16:07:23 <ralonsoh> slaweq, give me some time, please continue
16:07:35 <slaweq> ok :)
16:07:50 <slaweq> that's all from last week (not too much this time)
16:07:55 <slaweq> next topic then
16:08:00 <slaweq> #topic Stadium projects
16:08:16 <slaweq> we already talked about python3 at yesterday's neutron meeting
16:08:41 <slaweq> to sum up, we still have a few projects to finish
16:08:52 <slaweq> as a reminder: Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:09:17 <slaweq> any updates from the team about python3?
16:10:50 <njohnston> Nothing from me
16:11:04 <slaweq> ok, so next topic related to stadium
16:11:05 <ralonsoh> can I interrupt?
16:11:06 <slaweq> tempest-plugins migration
16:11:13 <slaweq> sure ralonsoh
16:11:20 <ralonsoh> about the previous CI issue
16:11:41 <ralonsoh> it wasn't the ovsdb, but the namespaces creation/deletion and the ip_lib
16:11:49 <slaweq> again?
16:11:56 <bcafarel> o/ late hi (again, sorry)
16:11:58 <ralonsoh> with https://review.opendev.org/#/c/679428/ we'll see how the CI gets better
16:12:03 <ralonsoh> that's all
16:12:05 <ralonsoh> thanks
16:12:33 <slaweq> ok, this patch was merged a few days ago, so it should probably be better now
16:12:55 <slaweq> ralonsoh: will You propose backports of this patch maybe? or stable branches don't need it?
16:13:06 <ralonsoh> slaweq, I'll check this now
16:13:40 <slaweq> ralonsoh: thx a lot
16:13:52 <bcafarel> I'll keep an eye on these backports :)
16:14:05 <slaweq> #action ralonsoh to check if https://review.opendev.org/#/c/679428/ should be backported to stable branches
16:14:28 <ralonsoh> slaweq, no, not needed
16:14:39 <slaweq> ahh, that was fast :)
16:14:43 <slaweq> thx ralonsoh
16:15:03 <slaweq> ok, going back to tempest-plugins migration
16:15:09 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:15:21 <slaweq> neutron-dynamic-routing - patch ready for review https://review.opendev.org/#/c/652099
16:15:30 <slaweq> tidwellr: thx a lot for work on this
16:15:40 <mlavalle> I'll do my best to finish vpn over the next few days
16:15:48 <slaweq> mlavalle: thx
16:15:58 <slaweq> then we should be done I hope
16:16:52 <slaweq> ok, so I think we can move on
16:16:54 <slaweq> #topic Grafana
16:17:00 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:28 <slaweq> after a few weeks of very bad CI I think we are finally getting back to business
16:18:43 <slaweq> at least the most urgent issues are fixed or worked around
16:19:14 <slaweq> It seems that rally is recovering - we should probably switch it back to voting this week
16:19:32 <slaweq> I will keep an eye on it for a few days and will propose a patch to make it voting again
16:19:49 <njohnston> yes, to me the biggest ongoing negative that stands out is the consistent 20% failure rate on neutron-tempest-plugin-scenario-openvswitch
16:20:07 <njohnston> other than that it's not bad
16:21:30 <slaweq> njohnston: yes, this one is failing quite often
16:21:48 <slaweq> and unfortunately I didn't see one common reason for those failures
16:22:00 <slaweq> many random tests AFAICT
16:22:07 <njohnston> I agree, that is what I saw as well
16:22:46 <slaweq> I will try to spend some more time checking those failures more carefully now
16:23:16 <slaweq> but speaking about this job I have one failing test which failed couple of times
16:23:19 <slaweq> :)
16:23:52 <slaweq> do we want to talk about anything else related to grafana or can we move to the next topic and talk about scenario jobs then?
16:24:11 <njohnston> go ahead
16:24:11 <mlavalle> let's move on
16:24:15 <slaweq> ok
16:24:21 <slaweq> #topic Tempest/Scenario
16:24:40 <slaweq> so, as I said, I found neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update failed at least 3 times
16:24:54 <slaweq> in different jobs but the failures look similar
16:25:00 <slaweq> https://9a0240ca9f61a595b570-86672578d4e6ceb498f2d932b0da6815.ssl.cf1.rackcdn.com/633871/20/check/neutron-tempest-plugin-scenario-openvswitch/772f7a4/testr_results.html.gz
16:25:02 <slaweq> https://1fd93ff32a555bc48a73-5fe9d093373d887f2b09d5c4b981e1db.ssl.cf2.rackcdn.com/652099/34/check/neutron-tempest-plugin-scenario-openvswitch-rocky/4897a52/testr_results.html.gz
16:25:04 <slaweq> https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cf0/679510/1/check/neutron-tempest-plugin-scenario-openvswitch/cf055c6/testr_results.html.gz
16:25:12 <slaweq> ohh, actually not
16:25:16 <slaweq> all in the openvswitch job :)
16:25:20 <slaweq> but once in rocky
16:27:08 <ralonsoh> slaweq, maybe (maybe) we are hitting the max BW in the CI system
16:27:14 <ralonsoh> it's testing:
16:27:14 <ralonsoh> port=self.NC_PORT, expected_bw=QoSTest.LIMIT_BYTES_SEC * 3)
16:27:26 <ralonsoh> that means, this is testing BW*3
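[Editor's note: the check ralonsoh quotes boils down to comparing a measured transfer rate against the configured limit with some slack for CI jitter. A minimal sketch of that assertion logic follows; the TOLERANCE_FACTOR and LIMIT_BYTES_SEC values here are illustrative assumptions, not the exact neutron-tempest-plugin constants.]

```python
# Illustrative sketch of a bandwidth-limit check like the one in
# test_qos_basic_and_update: the measured rate must not exceed the
# expected bandwidth times a tolerance factor.

TOLERANCE_FACTOR = 1.5  # hypothetical slack for CI jitter


def check_bw(total_bytes_read, elapsed_seconds, expected_bw):
    """Return True if the measured rate stays within the tolerated limit."""
    measured = total_bytes_read / elapsed_seconds
    return measured <= expected_bw * TOLERANCE_FACTOR


LIMIT_BYTES_SEC = 125000  # hypothetical configured limit

# e.g. after the rule was updated, the test expects roughly 3x the limit
print(check_bw(400000, 1.0, LIMIT_BYTES_SEC * 3))  # within tolerance
print(check_bw(800000, 1.0, LIMIT_BYTES_SEC * 3))  # exceeds tolerance
```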
16:27:36 <slaweq> ralonsoh: but e.g. in https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_cf0/679510/1/check/neutron-tempest-plugin-scenario-openvswitch/cf055c6/testr_results.html.gz it failed on _create_file_for_bw_tests()
16:27:51 <slaweq> and job is singlenode
16:27:53 <ralonsoh> yes, not this one
16:28:03 <slaweq> so all this traffic is only on one vm
16:28:07 <slaweq> one host
16:28:27 <slaweq> I don't think we are hitting any limit
16:28:54 <ralonsoh> slaweq, let me review this again
16:29:05 <ralonsoh> I submitted a patch I5ce1a34f7d5d635002baa1e5b14c288e6d2bc43e some weeks ago for this
16:29:06 <slaweq> ralonsoh: look at those 2 lines:
16:29:07 <slaweq> 2019-09-12 23:01:40,063 3293 INFO     [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.18 successfully created
16:29:09 <slaweq> 2019-09-12 23:17:26,625 3293 ERROR    [paramiko.transport] Socket exception: Connection timed out (110)
16:29:21 <slaweq> it waited 16 minutes there
16:29:42 <slaweq> and there is nothing between, those are 2 lines one after another in log from this test
16:30:29 <ralonsoh> this command is just creating an empty file in the VM
16:30:29 <slaweq> in another of those failed tests:
16:30:31 <slaweq> 2019-09-17 06:18:59,385 627 DEBUG    [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
16:30:33 <slaweq> Body: {"bandwidth_limit_rule": {"max_kbps": 3000, "max_burst_kbps": 3000}}
16:30:35 <slaweq> Response - Headers: {'content-type': 'application/json', 'content-length': '137', 'x-openstack-request-id': 'req-c94e94c1-fb8b-4ef4-b107-e7e9d0a2cf44', 'date': 'Tue, 17 Sep 2019 06:18:59 GMT', 'connection': 'close', 'status': '200', 'content-location': 'http://10.209.96.230:9696/v2.0/qos/policies/43a40f58-3685-4b92-969b-f8ba7e3c7fad/bandwidth_limit_rules/f320a803-bb1f-4f85-89c8-38888dc5805d'}
16:30:37 <slaweq> Body: b'{"bandwidth_limit_rule": {"max_kbps": 3000, "max_burst_kbps": 3000, "direction": "egress", "id": "f320a803-bb1f-4f85-89c8-38888dc5805d"}}'
16:30:39 <slaweq> 2019-09-17 06:34:47,297 627 ERROR    [paramiko.transport] Socket exception: Connection timed out (110)
16:30:52 <slaweq> similarly, nothing for 16 minutes and then a timeout
16:31:31 <slaweq> anyone got cycles to investigate this?
16:31:34 <ralonsoh> ok, we can reduce the ssh connection timeout
16:31:38 <ralonsoh> I can try it
16:31:48 <slaweq> thx ralonsoh
16:31:56 <slaweq> will You report a bug or should I?
16:32:06 <ralonsoh> slaweq, I'll do it
16:32:09 <slaweq> thx
16:32:28 <slaweq> #action ralonsoh to report bug and investigate issue with neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update
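[Editor's note: the fix discussed here is to stop a single hung socket from blocking the test for ~16 minutes. One common pattern is a short per-attempt timeout plus an overall deadline; the helper below is a hypothetical illustration of that idea, not the actual tempest ssh client code.]

```python
import time


def connect_with_deadline(connect, deadline=60, interval=5, clock=time.monotonic):
    """Call connect() until it succeeds or `deadline` seconds elapse.

    `connect` should raise on failure (as paramiko's SSHClient.connect
    does when given a short `timeout=` value), so a dead host fails each
    attempt quickly instead of hanging for many minutes.
    """
    start = clock()
    last_error = None
    while clock() - start < deadline:
        try:
            return connect()
        except Exception as exc:  # sketch only: retry on any failure
            last_error = exc
            time.sleep(interval)
    raise TimeoutError("ssh not reachable within %ss" % deadline) from last_error
```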
16:33:04 <slaweq> another problem which I see from time to time in various jobs is the issue with 'Multiple possible networks found, use a Network ID to be more specific.'
16:33:11 <slaweq> like https://58a87e825b9766115d07-cec36eea8e90c9127fc5a72b798cfeab.ssl.cf2.rackcdn.com/670177/9/check/networking-ovn-tempest-dsvm-ovs-release/b58638a/testr_results.html.gz
16:33:42 <ralonsoh> I can take this one too
16:33:44 <slaweq> I think we should check the tempest code and make sure that if it spawns instances, it always explicitly chooses a network
16:34:10 <slaweq> ralonsoh: do You have cycles for that too?
16:34:15 <ralonsoh> yes
16:34:21 <slaweq> We don't want to overload You :)
16:34:27 <ralonsoh> np
16:34:32 <slaweq> great, thx
16:34:50 <slaweq> ralonsoh: I think we should also report a bug for that, but maybe against tempest
16:34:57 <ralonsoh> for sure
16:34:59 <slaweq> will You do it?
16:35:04 <ralonsoh> yes
16:35:09 <slaweq> thx
16:35:22 <slaweq> #action ralonsoh to report bug with "Multiple possible networks found"
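[Editor's note: the behaviour slaweq asks for is that, when a tenant has more than one network, a server-create helper must be given an explicit network ID instead of letting the backend guess. The sketch below models that rule; the helper and its error string are illustrative (the message mirrors the failure above), not actual tempest code.]

```python
def pick_network(networks, network_id=None):
    """Return the network to boot on, refusing to guess among several.

    `networks` is a list of dicts with an "id" key, as returned by a
    network list call; `network_id` is the explicit choice, if any.
    """
    if network_id is not None:
        matches = [n for n in networks if n["id"] == network_id]
        if not matches:
            raise ValueError("network %s not found" % network_id)
        return matches[0]
    if len(networks) != 1:
        # Same situation as the CI failure: more than one candidate and
        # no explicit choice, so refuse rather than pick arbitrarily.
        raise ValueError(
            "Multiple possible networks found, use a Network ID "
            "to be more specific.")
    return networks[0]
```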
16:35:46 <slaweq> that's all I have regarding scenario jobs for today
16:35:57 <slaweq> anything else You want to add here?
16:37:07 <mlavalle> I got nothing
16:37:43 <slaweq> ok, lets move on
16:37:58 <slaweq> #topic fullstack/functional
16:38:12 <slaweq> for functional tests I found one new issue:
16:38:32 <slaweq> in neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_clear_all_filters
16:38:35 <slaweq> https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_eff/682391/1/check/neutron-functional-python27/eff0cab/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_clear_all_filters.txt.gz
16:38:48 <slaweq> You can see there AttributeError: 'str' object has no attribute 'content_type'
16:39:27 <slaweq> and it was on a patch totally unrelated to this part of the code
16:39:36 <slaweq> but I saw it only once so far
16:40:52 <slaweq> any ideas what could be the root cause?
16:41:21 <ralonsoh> is this a testresult lib error?
16:41:36 <slaweq> hmm
16:41:37 <ralonsoh> testtools
16:41:38 <slaweq> it seems so
16:42:01 <ralonsoh> we can keep this log but I wouldn't spend too much time on this one
16:42:21 <slaweq> maybe let's just be aware of it and check that it doesn't happen again (too often :P)
16:42:26 <ralonsoh> sure
16:42:35 <slaweq> ok for everyone?
16:43:04 <ralonsoh> yes
16:43:24 <slaweq> thx
16:43:29 <njohnston> +1
16:43:33 <slaweq> I have also one "new" fullstack issue
16:43:34 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_847/681893/1/gate/neutron-fullstack/847f4d9/testr_results.html.gz
16:43:41 <slaweq> but also found it only once
16:44:37 <ralonsoh> same here, file it away until we hit another one (but this one is more important)
16:45:04 <slaweq> ralonsoh: but here in ovs agent logs https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_847/681893/1/gate/neutron-fullstack/847f4d9/controller/logs/dsvm-fullstack-logs/TestMinBwQoSOvs.test_bw_limit_qos_port_removed_egress_/neutron-openvswitch-agent--2019-09-17--05-35-29-575569_log.txt.gz
16:45:13 <slaweq> there is error: RuntimeError: OVSDB Error: {"details":"cannot delete Queue row a1470780-1834-48d9-afd0-6fe41fcbb027 because of 1 remaining reference(s)","error":"referential integrity violation"}
16:45:30 <slaweq> does it ring a bell for You maybe?
16:45:43 <ralonsoh> yes and no
16:45:46 <ralonsoh> that was solved
16:45:47 <mlavalle> lol
16:46:09 <ralonsoh> you can't delete a queue if it's assigned to a qos register
16:46:18 <slaweq> ralonsoh: but it happened on this patch https://review.opendev.org/#/c/681893
16:46:19 <ralonsoh> same topology as in neutron DB
16:46:27 <slaweq> which is from this week
16:46:42 <ralonsoh> I know, I know
16:46:50 <slaweq> so maybe some regression, or a corner case which wasn't addressed?
16:46:52 <ralonsoh> and this change is not related to qos
16:46:58 <ralonsoh> corner case
16:47:06 <slaweq> nope, it is only zuul config change
16:47:46 <ralonsoh> the point is this test is actually testing that a qos rule (and the corresponding queues) are removed
16:47:46 <slaweq> ralonsoh: do You want me to report this bug in LP?
16:47:54 <ralonsoh> yes, please
16:48:04 <ralonsoh> I'll take a look at this one but maybe not this week
16:48:09 <slaweq> ok, I will report it and we can track it there
16:48:12 <ralonsoh> ping me with the LP ID
16:48:46 <slaweq> #action slaweq to report bug with fullstack test_bw_limit_qos_port_removed test
16:48:50 <slaweq> ralonsoh: sure
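[Editor's note: a toy model of the OVSDB error ralonsoh explains: a Queue row cannot be deleted while a QoS row still references it, so the references must be dropped first. The dict-based "rows" here are purely illustrative, not real ovsdbapp code.]

```python
class ReferentialIntegrityError(RuntimeError):
    """Models OVSDB's "referential integrity violation" error."""


def delete_queue(qos_rows, queue_rows, queue_id):
    """Delete a Queue row, failing if any QoS row still references it."""
    refs = [qos for qos in qos_rows if queue_id in qos["queues"]]
    if refs:
        raise ReferentialIntegrityError(
            "cannot delete Queue row %s because of %d remaining "
            "reference(s)" % (queue_id, len(refs)))
    del queue_rows[queue_id]


def delete_queue_safely(qos_rows, queue_rows, queue_id):
    """Drop references from every QoS row first, then delete the queue."""
    for qos in qos_rows:
        qos["queues"].discard(queue_id)
    delete_queue(qos_rows, queue_rows, queue_id)
```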
16:49:16 <slaweq> ok, that's all I have about functional and fullstack jobs for You today
16:49:24 <slaweq> anything else You want to add?
16:50:23 <slaweq> ok, if not let's move on
16:50:45 <slaweq> I checked the periodic jobs today and all of them have been working really well recently
16:50:56 <mlavalle> great!
16:51:05 <slaweq> so we are good with periodic
16:51:14 <slaweq> and that's all from what I have for today
16:51:32 <slaweq> do You want to talk about anything else related to CI today?
16:51:39 <slaweq> if not we can finish earlier :)
16:51:48 <mlavalle> let's do it
16:52:05 <slaweq> by "it" You mean "finish earlier" right?
16:52:07 <slaweq> :D
16:52:18 <mlavalle> yeap
16:52:22 <slaweq> ok, thx for attending
16:52:24 <slaweq> o/
16:52:25 <njohnston> o/
16:52:25 <ralonsoh> bye!
16:52:27 <slaweq> #endmeeting