15:00:56 <slaweq> #startmeeting neutron_ci
15:00:57 <openstack> Meeting started Tue Apr 6 15:00:56 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:58 <slaweq> hi
15:01:00 <openstack> The meeting name has been set to 'neutron_ci'
15:01:15 <ralonsoh> hi
15:01:19 <bcafarel> o/
15:01:22 <lajoskatona> Hi
15:02:11 <slaweq> ok, let's start
15:02:19 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:27 <slaweq> Please open now :)
15:04:24 <slaweq> #topic Actions from previous meetings
15:04:38 <slaweq> ralonsoh to check failed qos scenario test
15:05:01 <ralonsoh> no, sorry, I just started. I was busy with the py38/FTs timeouts
15:05:07 <slaweq> ralonsoh: sure
15:05:09 <slaweq> np
15:05:17 <slaweq> can I assign it to You for next week too?
15:05:20 <ralonsoh> sure
15:05:23 <slaweq> #action ralonsoh to check failed qos scenario test
15:05:24 <slaweq> thx
15:05:28 <slaweq> next one
15:05:30 <slaweq> ralonsoh to check https://bugs.launchpad.net/neutron/+bug/1921866
15:05:32 <openstack> Launchpad bug 1917793 in neutron "duplicate for #1921866 [HA] keepalived_state_change does not finish "handle_initial_state"execution" [Critical,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:05:55 <ralonsoh> I pushed a patch to mitigate it
15:05:57 <ralonsoh> one sec
15:06:12 <ralonsoh> #link https://review.opendev.org/c/openstack/neutron/+/779024
15:06:32 <ralonsoh> (already merged)
15:06:43 <slaweq> thx, so we should be good with that one :)
15:07:01 <slaweq> next one then
15:07:03 <slaweq> slaweq to check failed start metadata proxy issue
15:07:08 <slaweq> Bug https://bugs.launchpad.net/neutron/+bug/1922684
15:07:09 <openstack> Launchpad bug 1922684 in neutron "Functional dhcp agent tests fails to spawn metadata proxy" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:07:20 <slaweq> and proposed fix https://review.opendev.org/c/openstack/neutron/+/784903
15:07:40 <slaweq> ralonsoh: I saw You had some questions about it
15:07:48 <ralonsoh> thanks
15:08:04 <ralonsoh> we can discuss it in the patch
15:08:19 <slaweq> let me try to quickly explain it here
15:08:22 <ralonsoh> sure
15:09:39 <slaweq> first of all, You can easily reproduce it if You will raise exceptions.ProcessExecutionError somewhere in fill_dhcp_udp_checksums() method in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1762
15:09:53 <slaweq> this is what happens really in those failed tests
15:10:04 <slaweq> so during iptables-restore command there is exception raised
15:10:17 <slaweq> and this is handled properly by dhcp driver
15:11:22 <slaweq> but when it tries call setup() method again https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1664 it fails on ensure_device_is_ready: https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1692
15:11:44 <slaweq> it happens like that because in the test we prepare network object with port prepared
15:12:05 <slaweq> and that "fake" port is used in the first call of setup() method
15:12:17 <slaweq> exactly here https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:12:36 <slaweq> so "port" is exactly what test expects that it will be
15:13:11 <slaweq> but we also mock get_dhcp_port() from the plugin rpc api class in that test
15:13:22 <slaweq> so in first call of setup() method it will:
15:13:41 <slaweq> 1. get correct port in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:14:05 <slaweq> 2. update network.ports[0] to be mock instead of port which was returned in 1)
15:14:14 <slaweq> 3. fail on iptables call
15:14:24 <slaweq> and now second call of setup()
15:14:33 <slaweq> 1. get wrong (mock) port in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:14:47 <slaweq> 2. fails at https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1692
15:15:11 <slaweq> I'm not sure if that's clear for You now
15:15:15 <ralonsoh> ok, I'll check it locally, I still don't get it
15:15:44 <slaweq> ok
15:15:51 <slaweq> we can continue in the review later
15:16:12 <slaweq> that's all regarding actions from last week
15:16:18 <slaweq> let's move on
15:16:22 <slaweq> #topic Stadium projects
15:16:29 <slaweq> lajoskatona: any updates?
15:16:39 <slaweq> except midonet as it's not stadium project anymore ;)
15:16:40 <lajoskatona> nothing to tell the truth
15:17:14 <lajoskatona> as I saw this morning things are going in, so no issue at least as I checked
15:17:25 <slaweq> ok, thx for taking care of it
15:17:31 <slaweq> #topic Stable branches
15:17:40 <slaweq> bcafarel: any updates?
15:17:51 <slaweq> except the issue with py2 (again) in older branches
15:18:03 <bcafarel> main issue I spoiled in previous meeting is py2 bug indeed
15:18:19 <bcafarel> as it breaks up to ussuri included the list of ok branches got short :)
15:18:19 <slaweq> is there any LP for that bug already?
15:18:39 <bcafarel> I had opened one for neutron, but closed it as dup (gmann opened one for devstack)
15:18:46 <bcafarel> https://bugs.launchpad.net/devstack/+bug/1922736
15:18:47 <openstack> Launchpad bug 1922736 in devstack "Stable stein|train py2 devstack based jobs are broken on py2 interpreter" [Critical,Confirmed]
15:18:53 <bcafarel> as it is rather generic issue not just for us
15:19:42 <slaweq> thx bcafarel
15:20:14 * slaweq wonders when we will need to stop testing all py2 branches in u/s
15:20:35 <bcafarel> well, train had still both IIRC
15:20:57 <bcafarel> so expect a few other "oh yes we should cap this one too"
15:21:08 <slaweq> :)
15:21:28 <slaweq> something else, easier regarding the stable branches
15:21:43 <slaweq> we need to update our grafana dashboards to include stable/wallaby
15:21:49 <slaweq> bcafarel: will You take care of it?
15:22:33 <bcafarel> sigh sorry I pushed doc update to note this as release step and then forgot about actually doing it
15:22:40 <slaweq> LOL
15:22:48 <bcafarel> slaweq: let's add it as topic for next week so I do not keep forgetting :)
15:22:54 <slaweq> thx
15:23:14 <slaweq> #action bcafarel to update grafana dashboards with stable/wallaby
15:23:24 <slaweq> ok, next topic
15:23:26 <slaweq> #topic Grafana
15:23:56 <slaweq> here things look pretty ok this week IMO
15:24:02 <slaweq> I don't see any major issues
15:24:42 <ralonsoh> well, py38 and FTs were a bit unstable, too many timeouts
15:24:52 <slaweq> ralonsoh: true
15:25:11 <slaweq> but You proposed some patches to address, at least py38 issues, right?
15:25:19 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/784771
15:25:21 <ralonsoh> and for FTs
15:25:25 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/784771
15:25:38 <ralonsoh> sorry: https://review.opendev.org/c/openstack/neutron/+/784889
15:26:12 <slaweq> ok, both are approved already
15:26:22 <slaweq> lets see if it will be better with those patches merged
15:27:00 <bcafarel> seeing the times for the offline_migration tests it should help
15:27:30 <ralonsoh> mysql tests take around 10 mins, all of them
15:27:52 <slaweq> hopefully
15:27:57 <ralonsoh> I'm trying to merge in one single test, to avoid executing the migration again and again
15:29:02 <slaweq> ++
15:29:45 <slaweq> ok, lets talk about some specific issues
15:29:51 <slaweq> #topic functional
15:29:59 <slaweq> I found one new issue for today
15:30:05 <slaweq> https://78bb45d7d79a62b0c924-1d8800dfbc4b22202783e69a87ac00ba.ssl.cf1.rackcdn.com/783647/6/check/neutron-functional-with-uwsgi/83ffba0/testr_results.html
15:30:10 <slaweq> it's failed test_get_egress_min_bw_for_port
15:30:27 <ralonsoh> fail
15:30:27 <ralonsoh> [x]
15:30:27 <ralonsoh>
15:30:27 <ralonsoh> ft1.22: neutron.tests.functional.agent.common.test_ovs_lib.BaseOVSTestCase.test_get_egress_min_bw_for_porttesttools.testresult.real._StringException: Traceback (most recent call last):
15:30:27 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 708, in wait_until_true
15:30:29 <ralonsoh> eventlet.sleep(sleep)
15:30:31 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/greenthread.py", line 36, in sleep
15:30:34 <ralonsoh> hub.switch()
15:30:38 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
15:30:41 <ralonsoh> return self.greenlet.switch()
15:30:43 <ralonsoh> eventlet.timeout.Timeout: 5 seconds
15:30:45 <ralonsoh> During handling of the above exception, another exception occurred:
15:30:47 <ralonsoh> Traceback (most recent call last):
15:30:49 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 158, in _check_value
15:30:52 <ralonsoh> common_utils.wait_until_true(part_check_value, timeout=5, sleep=1)
15:30:54 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 713, in wait_until_true
15:30:57 <ralonsoh> raise WaitTimeout(_("Timed out after %d seconds") % timeout)
15:30:59 <ralonsoh> neutron.common.utils.WaitTimeout: Timed out after 5 seconds
15:31:01 <ralonsoh> During handling of the above exception, another exception occurred:
15:31:03 <ralonsoh> Traceback (most recent call last):
15:31:05 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
15:31:09 <ralonsoh> return f(self, *args, **kwargs)
15:31:11 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 452, in test_get_egress_min_bw_for_port
15:31:14 <ralonsoh> self._check_value(2800, self.ovs.get_egress_min_bw_for_port,
15:31:16 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 160, in _check_value
15:31:19 <ralonsoh> self.fail('Expected value: %s, retrieved value: %s' %
15:31:21 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/unittest2/case.py", line 690, in fail
15:31:24 <ralonsoh> raise self.failureException(msg)
15:31:26 <ralonsoh> AssertionError: Expected value: 2800, retrieved value: 1700
15:31:28 <ralonsoh>
15:31:30 <ralonsoh> sorry!!!
15:31:32 <ralonsoh> what I wanted to point out is the retrieved value, 1700
15:31:34 <ralonsoh> this could be due to an overloaded host
15:31:48 <slaweq> :)
15:31:57 <slaweq> wrong copy paste ;P
15:33:01 <slaweq> ralonsoh: but how overloaded host can impact that?
15:33:29 <ralonsoh> because it cannot transmit at the requested speed
15:33:45 <slaweq> but it's not checking actual bandwidth
15:33:56 <slaweq> it's just checking what is set in ovs IMO
15:34:20 <ralonsoh> sorry! you are right
15:34:33 <ralonsoh> ok, indeed this is an error
15:34:43 <slaweq> https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/tests/functional/agent/common/test_ovs_lib.py#L452
15:34:48 <slaweq> it failed in that line
15:35:00 <slaweq> so just "update_minimum_bandwidth_queue()"
15:35:12 <slaweq> and then wait 5 seconds until it will be really set
15:35:42 <ralonsoh> this is the most trivial check
15:35:56 <slaweq> but maybe we should use different ports in each test
15:36:11 <slaweq> as now it seems that 1700 was set in different test: https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/tests/functional/agent/common/test_ovs_lib.py#L374
15:36:28 <ralonsoh> we do, we are generating a new port uuid per test
15:36:41 <slaweq> so from where 1700 came?
15:38:12 <ralonsoh> oops, the queue number
15:38:25 <ralonsoh> maybe we need to make the queue number random
15:38:28 <ralonsoh> I'll check it
15:38:32 <slaweq> thx
15:38:50 <slaweq> queue number is always 1
15:39:00 <slaweq> it may be that there is race between those tests
15:40:00 <slaweq> #action ralonsoh to check failed test_get_egress_min_bw_for_port functional test
15:40:29 <slaweq> ok, that's basically all what I had for today
15:40:40 <slaweq> I really didn't find many new issues in our jobs this week
15:40:53 <bcafarel> not complaining that you did not :)
15:41:01 <lajoskatona> +1
15:41:09 <slaweq> one last thing from me for today
15:41:14 <slaweq> https://review.opendev.org/q/topic:secure-rbac+project:openstack/neutron+status:open
15:41:20 <slaweq> please review those patches
15:41:33 <slaweq> I'm pushing new UT for API policies
15:41:39 <bcafarel> slaweq++ nice
15:41:44 <slaweq> (and finding new bugs all the time :/)
15:41:54 <slaweq> so those tests are useful IMO
15:42:10 <slaweq> I know that those patches are huge but please review them :)
15:42:54 <slaweq> and that's all what I have for today
15:43:04 <slaweq> do You have anything else You want to talk about today?
15:43:21 <ralonsoh> https://bugs.launchpad.net/neutron/+bug/1915341
15:43:22 <openstack> Launchpad bug 1915341 in neutron "neutron-linuxbridge-agent not starting due to nf_tables rules" [Critical,New]
15:43:30 <ralonsoh> but this could be discussed in the PTG
15:43:40 <ralonsoh> in a nutshell: this problem is related to nft API
15:44:00 <ralonsoh> if they use legacy ebtables (same as in our CI), the problem is gone
15:44:19 <ralonsoh> I'm trying to fix it for legacy and ebtables-nft (new API)
15:44:25 <lajoskatona> so this is why I can't reproduce it ?
15:44:32 <ralonsoh> probably
15:44:41 <ralonsoh> you can force the new api
15:44:42 <ralonsoh> one sec
15:44:58 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/775413/11/roles/nftables/tasks/main.yaml
15:45:08 <ralonsoh> this is the patch I'm using to test it
15:45:41 <ralonsoh> but this is just a heads-up, we'll talk about the future of linux bridge and nft in the PTG
15:45:43 <lajoskatona> thanks, I check it
15:45:44 <ralonsoh> I'll add a topic
15:45:48 <ralonsoh> (that's all)
15:45:57 <slaweq> thx for topic proposal
15:46:33 <slaweq> I already added something about linuxbridge agent to the etherpad
15:46:47 <slaweq> but please add Your notes to it too :)
15:47:27 <slaweq> ralonsoh: regarding bug https://bugs.launchpad.net/neutron/+bug/1915341 do You think we should have note about it somewhere in our docs?
15:47:28 <openstack> Launchpad bug 1915341 in neutron "neutron-linuxbridge-agent not starting due to nf_tables rules" [Critical,New]
15:47:51 <ralonsoh> slaweq, yes, we should add this in the documentation
15:47:59 <ralonsoh> I'll do it
15:48:07 <slaweq> ralonsoh++ thx a lot
15:48:26 <slaweq> #action ralonsoh to update LB installation guide with info about legacy ebtables
15:49:28 <slaweq> with that I think we can finish today's meeting
15:49:47 <slaweq> thx for attending
15:49:51 <bcafarel> o/
15:49:53 <ralonsoh> bye
15:49:54 <slaweq> o/
15:49:56 <slaweq> #endmeeting
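
Note on the "functional" topic: the meeting left the "expected 2800, retrieved 1700" failure as an open hypothesis, namely that every test uses queue number 1, so two tests may race on the same queue. The Python sketch below only illustrates that hypothesis. FakeQueueTable, run_two_tests and the chosen interleaving are hypothetical stand-ins, not neutron's ovs_lib API, and the actual cause was still to be checked per the #action for ralonsoh.

```python
import itertools


class FakeQueueTable:
    """Hypothetical stand-in for a queue table: queue number -> min bandwidth.

    This is NOT neutron's ovs_lib; it only models "last writer wins"
    when two callers share the same queue number.
    """

    def __init__(self):
        self._min_bw = {}

    def update_min_bw(self, queue_num, min_bw):
        self._min_bw[queue_num] = min_bw

    def get_min_bw(self, queue_num):
        return self._min_bw.get(queue_num)


def run_two_tests(table, queue_for_a, queue_for_b):
    """Interleave two 'tests' in one assumed order (B writes before A reads)."""
    table.update_min_bw(queue_for_a, 2800)  # test A sets its value
    table.update_min_bw(queue_for_b, 1700)  # test B writes in between
    return table.get_min_bw(queue_for_a)    # value test A then reads back


# Both tests hard-code queue number 1, so test A sees test B's value.
shared = run_two_tests(FakeQueueTable(), 1, 1)

# Each test takes its own queue number, so test A sees its own value.
unique_ids = itertools.count(1)
isolated = run_two_tests(FakeQueueTable(), next(unique_ids), next(unique_ids))

print(shared, isolated)
```

Under this model, a unique (or random, as ralonsoh suggested) queue number per test removes the collision; whether the real tests share state this way is an assumption that the meeting did not confirm.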