15:00:56 #startmeeting neutron_ci
15:00:57 Meeting started Tue Apr 6 15:00:56 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:58 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:58 hi
15:01:00 The meeting name has been set to 'neutron_ci'
15:01:15 hi
15:01:19 o/
15:01:22 Hi
15:02:11 ok, let's start
15:02:19 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:27 Please open now :)
15:04:24 #topic Actions from previous meetings
15:04:38 ralonsoh to check failed qos scenario test
15:05:01 no, sorry, I just started. I was busy with the py38/FTs timeouts
15:05:07 ralonsoh: sure
15:05:09 np
15:05:17 can I assign it to You for next week too?
15:05:20 sure
15:05:23 #action ralonsoh to check failed qos scenario test
15:05:24 thx
15:05:28 next one
15:05:30 ralonsoh to check https://bugs.launchpad.net/neutron/+bug/1921866
15:05:32 Launchpad bug 1917793 in neutron "duplicate for #1921866 [HA] keepalived_state_change does not finish "handle_initial_state" execution" [Critical,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:05:55 I pushed a patch to mitigate it
15:05:57 one sec
15:06:12 #link https://review.opendev.org/c/openstack/neutron/+/779024
15:06:32 (already merged)
15:06:43 thx, so we should be good with that one :)
15:07:01 next one then
15:07:03 slaweq to check failed start metadata proxy issue
15:07:08 Bug https://bugs.launchpad.net/neutron/+bug/1922684
15:07:09 Launchpad bug 1922684 in neutron "Functional dhcp agent tests fails to spawn metadata proxy" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:07:20 and proposed fix https://review.opendev.org/c/openstack/neutron/+/784903
15:07:40 ralonsoh: I saw You had some questions about it
15:07:48 thanks
15:08:04 we can discuss it in the patch
15:08:19 let me try to quickly explain it here
15:08:22 sure
15:09:39 first of all, You can easily reproduce it if You raise exceptions.ProcessExecutionError somewhere in fill_dhcp_udp_checksums() method in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1762
15:09:53 this is what really happens in those failed tests
15:10:04 so during iptables-restore command there is an exception raised
15:10:17 and this is handled properly by dhcp driver
15:11:22 but when it tries to call setup() method again https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1664 it fails on ensure_device_is_ready: https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1692
15:11:44 it happens like that because in the test we prepare network object with port prepared
15:12:05 and that "fake" port is used in the first call of setup() method
15:12:17 exactly here https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:12:36 so "port" is exactly what the test expects it to be
15:13:11 but we also mock get_dhcp_port() from the plugin rpc api class in that test
15:13:22 so in first call of setup() method it will:
15:13:41 1. get correct port in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:14:05 2. update network.ports[0] to be mock instead of port which was returned in 1)
15:14:14 3. fail on iptables call
15:14:24 and now second call of setup()
15:14:33 1. get wrong (mock) port in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:14:47 2. fails at https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1692
15:15:11 I'm not sure if that's clear for You now
15:15:15 ok, I'll check it locally, I still don't get it
15:15:44 ok
15:15:51 we can continue in the review later
15:16:12 that's all regarding actions from last week
15:16:18 let's move on
15:16:22 #topic Stadium projects
15:16:29 lajoskatona: any updates?
15:16:39 except midonet as it's not stadium project anymore ;)
15:16:40 nothing, to tell the truth
15:17:14 as I saw this morning things are going in, so no issue, at least as I checked
15:17:25 ok, thx for taking care of it
15:17:31 #topic Stable branches
15:17:40 bcafarel: any updates?
15:17:51 except the issue with py2 (again) in older branches
15:18:03 main issue I spoiled in previous meeting is py2 bug indeed
15:18:19 as it breaks up to ussuri included the list of ok branches got short :)
15:18:19 is there any LP for that bug already?
15:18:39 I had opened one for neutron, but closed it as dup (gmann opened one for devstack)
15:18:46 https://bugs.launchpad.net/devstack/+bug/1922736
15:18:47 Launchpad bug 1922736 in devstack "Stable stein|train py2 devstack based jobs are broken on py2 interpreter" [Critical,Confirmed]
15:18:53 as it is rather generic issue not just for us
15:19:42 thx bcafarel
15:20:14 * slaweq wonders when we will need to stop testing all py2 branches in u/s
15:20:35 well, train had still both IIRC
15:20:57 so expect a few other "oh yes we should cap this one too"
15:21:08 :)
15:21:28 something else, easier regarding the stable branches
15:21:43 we need to update our grafana dashboards to include stable/wallaby
15:21:49 bcafarel: will You take care of it?
15:22:33 sigh sorry I pushed doc update to note this as release step and then forgot about actually doing it
15:22:40 LOL
15:22:48 slaweq: let's add it as topic for next week so I do not keep forgetting :)
15:22:54 thx
15:23:14 #action bcafarel to update grafana dashboards with stable/wallaby
15:23:24 ok, next topic
15:23:26 #topic Grafana
15:23:56 here things look pretty ok this week IMO
15:24:02 I don't see any major issues
15:24:42 well, py38 and FTs were a bit unstable, too many timeouts
15:24:52 ralonsoh: true
15:25:11 but You proposed some patches to address, at least py38 issues, right?
15:25:19 https://review.opendev.org/c/openstack/neutron/+/784771
15:25:21 and for FTs
15:25:25 https://review.opendev.org/c/openstack/neutron/+/784771
15:25:38 sorry: https://review.opendev.org/c/openstack/neutron/+/784889
15:26:12 ok, both are approved already
15:26:22 let's see if it will be better with those patches merged
15:27:00 seeing the times for the offline_migration tests it should help
15:27:30 mysql tests take around 10 mins, all of them
15:27:52 hopefully
15:27:57 I'm trying to merge in one single test, to avoid executing the migration again and again
15:29:02 ++
15:29:45 ok, let's talk about some specific issues
15:29:51 #topic functional
15:29:59 I found one new issue for today
15:30:05 https://78bb45d7d79a62b0c924-1d8800dfbc4b22202783e69a87ac00ba.ssl.cf1.rackcdn.com/783647/6/check/neutron-functional-with-uwsgi/83ffba0/testr_results.html
15:30:10 it's failed test_get_egress_min_bw_for_port
15:30:27 fail
15:30:27 [x]
15:30:27
15:30:27 ft1.22: neutron.tests.functional.agent.common.test_ovs_lib.BaseOVSTestCase.test_get_egress_min_bw_for_port testtools.testresult.real._StringException: Traceback (most recent call last):
15:30:27 File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 708, in wait_until_true
15:30:29 eventlet.sleep(sleep)
15:30:31 File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/greenthread.py", line 36, in sleep
15:30:34 hub.switch()
15:30:38 File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
15:30:41 return self.greenlet.switch()
15:30:43 eventlet.timeout.Timeout: 5 seconds
15:30:45 During handling of the above exception, another exception occurred:
15:30:47 Traceback (most recent call last):
15:30:49 File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 158, in _check_value
15:30:52 common_utils.wait_until_true(part_check_value, timeout=5, sleep=1)
15:30:54 File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 713, in wait_until_true
15:30:57 raise WaitTimeout(_("Timed out after %d seconds") % timeout)
15:30:59 neutron.common.utils.WaitTimeout: Timed out after 5 seconds
15:31:01 During handling of the above exception, another exception occurred:
15:31:03 Traceback (most recent call last):
15:31:05 File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
15:31:09 return f(self, *args, **kwargs)
15:31:11 File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 452, in test_get_egress_min_bw_for_port
15:31:14 self._check_value(2800, self.ovs.get_egress_min_bw_for_port,
15:31:16 File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 160, in _check_value
15:31:19 self.fail('Expected value: %s, retrieved value: %s' %
15:31:21 File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/unittest2/case.py", line 690, in fail
15:31:24 raise self.failureException(msg)
15:31:26 AssertionError: Expected value: 2800, retrieved value: 1700
15:31:28
15:31:30 sorry!!!
15:31:32 what I wanted to point out is the retrieved value, 1700
15:31:34 this could be due to an overloaded host
15:31:48 :)
15:31:57 wrong copy paste ;P
15:33:01 ralonsoh: but how overloaded host can impact that?
15:33:29 because it cannot transmit at the requested speed
15:33:45 but it's not checking actual bandwidth
15:33:56 it's just checking what is set in ovs IMO
15:34:20 sorry! you are right
15:34:33 ok, indeed this is an error
15:34:43 https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/tests/functional/agent/common/test_ovs_lib.py#L452
15:34:48 it failed in that line
15:35:00 so just "update_minimum_bandwidth_queue()"
15:35:12 and then wait 5 seconds until it will be really set
15:35:42 this is the most trivial check
15:35:56 but maybe we should use different ports in each test
15:36:11 as now it seems that 1700 was set in different test: https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/tests/functional/agent/common/test_ovs_lib.py#L374
15:36:28 we do, we are generating a new port uuid per test
15:36:41 so from where 1700 came?
15:38:12 ups, the queue number
15:38:25 maybe we need to make the queue number random
15:38:28 I'll check it
15:38:32 thx
15:38:50 queue number is always 1
15:39:00 it may be that there is race between those tests
15:40:00 #action ralonsoh to check failed test_get_egress_min_bw_for_port functional test
15:40:29 ok, that's basically all what I had for today
15:40:40 I really didn't find many new issues in our jobs this week
15:40:53 not complaining that you did not :)
15:41:01 +1
15:41:09 one last thing from me for today
15:41:14 https://review.opendev.org/q/topic:secure-rbac+project:openstack/neutron+status:open
15:41:20 please review those patches
15:41:33 I'm pushing new UT for API policies
15:41:39 slaweq++ nice
15:41:44 (and finding new bugs all the time :/)
15:41:54 so those tests are useful IMO
15:42:10 I know that those patches are huge but please review them :)
15:42:54 and that's all what I have for today
15:43:04 do You have anything else You want to talk about today?
15:43:21 https://bugs.launchpad.net/neutron/+bug/1915341
15:43:22 Launchpad bug 1915341 in neutron "neutron-linuxbridge-agent not starting due to nf_tables rules" [Critical,New]
15:43:30 but this could be discussed in the PTG
15:43:40 in a nutshell: this problem is related to nft API
15:44:00 if they use legacy ebtables (same as in our CI), the problem is gone
15:44:19 I'm trying to fix it for legacy and ebtables-nft (new API)
15:44:25 so this is why I can't reproduce it?
15:44:32 probably
15:44:41 you can force the new api
15:44:42 one sec
15:44:58 https://review.opendev.org/c/openstack/neutron/+/775413/11/roles/nftables/tasks/main.yaml
15:45:08 this is the patch I'm using to test it
15:45:41 but this is just a heads-up, we'll talk about the future of linux bridge and nft in the PTG
15:45:43 thanks, I check it
15:45:44 I'll add a topic
15:45:48 (that's all)
15:45:57 thx for topic proposal
15:46:33 I already added something about linuxbridge agent to the etherpad
15:46:47 but please add Your notes to it too :)
15:47:27 ralonsoh: regarding bug https://bugs.launchpad.net/neutron/+bug/1915341 do You think we should have note about it somewhere in our docs?
15:47:28 Launchpad bug 1915341 in neutron "neutron-linuxbridge-agent not starting due to nf_tables rules" [Critical,New]
15:47:51 slaweq, yes, we should add this in the documentation
15:47:59 I'll do it
15:48:07 ralonsoh++ thx a lot
15:48:26 #action ralonsoh to update LB installation guide with info about legacy ebtables
15:49:28 with that I think we can finish today's meeting
15:49:47 thx for attending
15:49:51 o/
15:49:53 bye
15:49:54 o/
15:49:56 #endmeeting