15:01:04 <ykarel> #startmeeting neutron_ci
15:01:04 <opendevmeet> Meeting started Tue Sep  5 15:01:04 2023 UTC and is due to finish in 60 minutes.  The chair is ykarel. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:04 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:04 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:01:09 <ralonsoh> hello
15:01:12 <lajoskatona> o/
15:01:12 <ykarel> ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:01:19 <ykarel> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:01:20 <mtomaska> o/
15:02:02 <mlavalle> o/
15:02:46 <ykarel> k let's start with the topic as some of the folks are on PTO
15:02:48 <ykarel> #topic Actions from previous meetings
15:02:58 <ykarel> lajoskatona to check failures for bagpipe and check use of master neutron in stadium projects
15:03:32 <lajoskatona> I still struggle with bagpipe
15:03:40 <lajoskatona> it is only with sqlalchemy2
15:04:16 <ykarel> yes that's what i recall
15:04:27 <lajoskatona> In the meantime I also found that the bagpipe tempest job is running with focal, so I changed the nodeset to jammy: https://review.opendev.org/c/openstack/networking-bagpipe/+/893709
15:04:34 <ykarel> looking at periodic i saw some issue in another job, but that can be checked in a later section
15:04:57 <ykarel> yeap i noticed that failure too, thx for fixing it
15:05:11 <ykarel> other was https://zuul.openstack.org/builds?job_name=networking-bagpipe-openstack-tox-py310-with-sqlalchemy-main&project=openstack/networking-bagpipe
15:05:42 <lajoskatona> the issue is only with the sfc driver of bagpipe, so quite complex for me at least to understand the root cause
15:06:13 <lajoskatona> yes that is the sqlalchemy2 issue
15:06:29 <Continuity_> ralonsoh: yes i have allowed all ICMP and SSH for testing.
15:06:31 <lajoskatona> ralonsoh: what is the timeline for the sqlalchemy2 introduction?
15:06:39 <slaweq> o/
15:06:45 <slaweq> sorry I was on different meeting
15:06:53 <ykarel> lajoskatona, do we have bug already for this issue?
15:06:55 <ralonsoh> it will be introduced, most probably, at the beginning of C
15:07:00 <lajoskatona> ok
15:07:07 <ralonsoh> there is a requirements patch proposed already
15:07:40 <ralonsoh> Continuity_, we can talk after the meeting
15:07:42 <lajoskatona> ralonsoh: not for bagpipe, I used as reference the big sqlalchemy2 bug
15:08:09 <lajoskatona> I can open a bagpipe bug to track it
15:08:16 <ralonsoh> yeah, much better
15:08:20 <ykarel> +1
15:08:21 <lajoskatona> ack
15:08:26 <ralonsoh> to be honest, I don't know what is failing in this CI
15:09:09 <ykarel> let's check this offline
15:09:13 <ykarel> #topic Stable branches
15:09:40 <ykarel> Bernard is out this week, but stable branches look good considering patches are merging
15:10:17 <ykarel> i didn't notice any consistent failure on stable branches
15:10:35 <ykarel> anything to add for stable branches?
15:12:00 <ykarel> sounds all good then, moving to next topic
15:12:03 <ykarel> #topic Stadium projects
15:12:25 <ykarel> lajoskatona, anything else apart from that sqlalchemy and focal issues for stadium projects?
15:12:27 <lajoskatona> except bagpipe, things seem to be green
15:12:56 <ykarel> k good
15:13:01 <ykarel> #link https://zuul.openstack.org/builds?job_name=networking-bagpipe-openstack-tox-py310-with-sqlalchemy-main&project=openstack/networking-bagpipe
15:13:12 <ykarel> #link https://zuul.openstack.org/builds?job_name=networking-bagpipe-tempest&project=openstack%2Fnetworking-bagpipe&branch=master&skip=0
15:13:25 <ykarel> #topic Grafana
15:13:31 <ykarel> https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:13:48 <ykarel> let's take a minute to see if we observe anything abnormal there
15:14:31 <ykarel> i see a spike at tempest job, but that was a known issue already fixed
15:14:50 <slaweq> today morning there was some "spike" but on all jobs
15:14:51 <slaweq> so it's probably not an issue on our side really
15:15:02 <ralonsoh> neutron-ovn-tempest-ipv6-only-ovs-release had a 66% of failures
15:15:34 <ykarel> yes i noticed there were quite a few failures this morning but most of them were related to a series of patches pushed together for l3-ovn iirc
15:15:43 <ralonsoh> right, perfect then
15:16:12 <ralonsoh> yes, I see the same spikes in other jobs
15:18:25 <ykarel> yeap, let's move to next section, will keep monitoring it if something new comes in
15:18:35 <ykarel> #topic Rechecks
15:19:01 <ykarel> it was better last week
15:19:30 <ykarel> there were some known issues last week which might have resulted in those rechecks
15:19:56 <ykarel> bare rechecks were also not many, 3/17, so good
15:20:03 <ykarel> let's keep avoiding bare rechecks
15:20:16 <ykarel> #topic Unit tests
15:20:30 <mlavalle> what's 3/17?
15:20:40 <ralonsoh> 3 out of 17
15:20:42 <ykarel> 17 total rechecks, 3 out of them were bare
15:20:49 <ykarel> yeap
15:20:49 <mlavalle> ahhh!
15:20:52 <mlavalle> LOL
15:21:30 <ykarel> #info There was issue with unit test job running with sqlalchemy/alembic main branches
15:21:42 <ykarel> It's already fixed with https://review.opendev.org/c/openstack/neutron/+/893602
15:21:52 <ralonsoh> +1
15:22:00 <ykarel> #topic fullstack/functional
15:22:36 <ykarel> neutron.tests.functional.services.trunk.drivers.ovn.test_trunk_driver.TestOVNTrunkDriver.test_subport_delete
15:22:45 <ykarel> AttributeError: 'NoneType' object has no attribute 'status'
15:23:01 <ykarel> Seen twice, in master and wallaby, and the failure looks related to a patch that already merged
15:23:21 <ykarel> so some race in the test, as it's not happening always
15:23:33 <ykarel> #link https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b18/892890/2/gate/neutron-functional-with-uwsgi/b18c02c/testr_results.html
15:24:04 <ykarel> #link https://e872331dabdf974ff450-5a66e2fcfa24aae6b75c2058251d7e58.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-uwsgi-fips/c431c66/testr_results.html
15:25:00 <ykarel> https://review.opendev.org/#/q/I2370ea2f96e2e31dbd43bf232a63394388e6945f
15:25:07 <ralonsoh> I'll ping Arnau to check these errors, look related to ^
15:25:24 <ykarel> i see this being reverted, so likely the failures would also go away
15:25:33 <ralonsoh> ah no
15:25:40 <ralonsoh> this could be another issue
15:25:50 <ralonsoh> in any case, we are reverting the patch you mentioned
15:25:57 <ralonsoh> so please wait until the next week
15:26:11 <ykarel> k +1, will keep an eye
15:26:20 <ykarel> if it still happens will open a bug
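[Editor's note: the `AttributeError: 'NoneType' object has no attribute 'status'` above is the classic shape of a lookup racing with object creation or deletion. A minimal sketch of the usual fix, polling until the object exists instead of reading it once; `wait_until`, `get_trunk`, and `_store` are hypothetical stand-ins (neutron has a similar helper, `neutron.common.utils.wait_until_true`, but this sketch is self-contained):]

```python
import time


def wait_until(predicate, timeout=5, interval=0.1):
    """Poll ``predicate`` until it returns a truthy value or time runs out.

    Returns the truthy value, or raises TimeoutError.  Using a helper like
    this instead of a single lookup avoids races where the object has not
    been created (or was just deleted) at the exact moment of the check.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %ss" % timeout)


# Hypothetical store standing in for the OVN trunk lookup that returned None.
_store = {}


def get_trunk(trunk_id):
    return _store.get(trunk_id)


# Simulate the object appearing, then assert on it only after it exists.
_store["t1"] = {"status": "ACTIVE"}
trunk = wait_until(lambda: get_trunk("t1"))
assert trunk["status"] == "ACTIVE"
```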
15:26:36 <ykarel> next one is neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_read_queue_change_state
15:26:42 <ykarel> AssertionError: Text not found in file /tmp/tmpuh6gesvz/tmp50ei9qfp/log_file: "Initial status of router".
15:26:49 <ykarel> https://157ba513c840b85e5d0e-e65fbda5c4a8fc14eb81d398bd7b0a80.ssl.cf5.rackcdn.com/892896/1/gate/neutron-functional-with-uwsgi/0acdecd/testr_results.html
15:27:57 <ralonsoh> this is something recurrent, the monitor doesn't start in some tests
15:28:16 <ykarel> k so it's already a known issue?
15:28:17 <ralonsoh> I would need to investigate how to make these tests more stable
15:28:29 <ralonsoh> yes, that has been happening for years
15:28:33 <ralonsoh> not very often
15:28:44 <ykarel> ok i see 3 failures in last 15 days across branches
15:29:18 <ykarel> k thanks, will keep an eye on this and if it starts happening more frequently will open a bug for it
15:29:34 <ralonsoh> sure
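[Editor's note: the `Text not found in file` failure above is a one-shot read of a log the monitor daemon writes asynchronously. A minimal sketch of a deadline-based re-read, the usual way to make such checks tolerant of slow daemon startup; `wait_for_text_in_file` is a hypothetical helper, not the test's actual code:]

```python
import os
import tempfile
import time


def wait_for_text_in_file(path, text, timeout=5, interval=0.1):
    """Re-read ``path`` until ``text`` appears, instead of asserting once.

    A single read races with the daemon that writes the line; polling with
    a deadline makes the check tolerant of slow startup.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path) as f:
                if text in f.read():
                    return True
        time.sleep(interval)
    raise AssertionError('Text not found in file %s: "%s"' % (path, text))


# Demonstrate with a temporary file standing in for the monitor's log file.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("Initial status of router: primary\n")
    log_path = f.name

assert wait_for_text_in_file(log_path, "Initial status of router")
os.unlink(log_path)
```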
15:30:02 <ykarel> neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_router_gateway_port_binding_host_id
15:30:11 <ykarel> Timeout exception with self.mech_driver.nb_ovn.ovsdb_connection.stop()
15:30:20 <ykarel> #link https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2a0/892897/1/gate/neutron-functional-with-uwsgi/2a03caa/testr_results.html
15:31:22 <ykarel> any idea for this?
15:31:54 <ralonsoh> no, just a random timeout during the cleanup phase
15:31:59 <ykarel> seen 4 hits as per opensearch across stable/zed and yoga
15:32:07 <ralonsoh> in the same test?
15:34:11 <ykarel> no, a different test also hit it, like test_gateway_chassis_rebalance_max_chassis
15:34:41 <ykarel> but that's also in same class neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter
15:36:08 <ralonsoh> let me open a LP bug for this one. We can try, maybe, checking if the ovsdb_connection is still open during the cleanup
15:36:23 <ykarel> k thanks
15:36:28 <ralonsoh> if the connection is not active, then we don't need to stop it
15:36:41 <ykarel> #action ralonsoh to open bug for Timeout exception with self.mech_driver.nb_ovn.ovsdb_connection.stop()
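[Editor's note: a minimal sketch of the idea ralonsoh proposes above, stopping the OVSDB connection during cleanup only if it is still running. `OvsdbConnectionStub` and `safe_stop` are hypothetical stand-ins for the test's real connection object, not neutron's actual API:]

```python
class OvsdbConnectionStub:
    """Hypothetical stand-in for the OVSDB connection used in the tests."""

    def __init__(self):
        self.is_running = True

    def stop(self, timeout=10):
        if not self.is_running:
            raise RuntimeError("stop() called on an inactive connection")
        self.is_running = False


def safe_stop(connection, timeout=10):
    """Stop the connection only if it is still active.

    Skipping stop() for an already-closed connection avoids hanging (and
    eventually timing out) in the cleanup phase.
    """
    if getattr(connection, "is_running", False):
        connection.stop(timeout)
        return True
    return False


conn = OvsdbConnectionStub()
assert safe_stop(conn) is True    # first stop actually stops it
assert safe_stop(conn) is False   # second call is a no-op, no timeout
```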
15:37:27 <ykarel> neutron.tests.fullstack.test_connectivity.TestUninterruptedConnectivityOnL2AgentRestart.test_l2_agent_restart(LB,VLANs)
15:37:36 <ykarel> #link https://2df199e43476e3c732e7-3130556d487e5cec46a1ba3d1eaa7fda.ssl.cf5.rackcdn.com/892890/2/gate/neutron-fullstack-with-uwsgi/209726b/testr_results.html
15:37:45 <ykarel> #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da8/892869/3/check/neutron-fullstack-with-uwsgi/da88e54/testr_results.html
15:38:24 <ykarel> anyone recall this failure? from meeting history i see it was seen in the past too
15:38:56 <ralonsoh> no, sorry
15:39:20 <slaweq> this is related to LB so we can mark this test as unstable or simply skip it if it's not stable
15:39:40 <slaweq> as LB has been marked "experimental" for some time
15:39:44 <lajoskatona> +1
15:39:47 <slaweq> I can propose patch for that
15:40:26 <ykarel> k thanks, i think i saw it in stable branches
15:40:46 <ykarel> #action slaweq to check failures with fullstack test test_l2_agent_restart
15:40:51 <ykarel> thx slaweq
15:41:07 <ykarel> neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement(Open vSwitch agent)
15:41:37 <lajoskatona> do you have link for this one? I can check it
15:41:42 <ykarel> lajoskatona, may be you recall something for ^?
15:41:55 <ykarel> #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4fb/893543/1/check/neutron-fullstack-with-uwsgi/4fba5fb/testr_results.html
15:42:03 <ykarel> #link https://c7cd1f2001ec5f5b729f-4854ed941a7816d1225c43ae9b456d0e.ssl.cf2.rackcdn.com/893143/1/check/neutron-fullstack-with-uwsgi/9137417/testr_results.html
15:42:16 <lajoskatona> I will check this one
15:42:21 <ykarel> thx lajoskatona
15:42:34 <ykarel> #action lajoskatona to check failures with fullstack test test_configurations_are_synced_towards_placement
15:42:51 <ykarel> also generic one
15:43:13 <ykarel> we are recently seeing quite frequent timeouts in fullstack job https://zuul.openstack.org/builds?job_name=neutron-fullstack-with-uwsgi&project=openstack%2Fneutron&result=TIMED_OUT&skip=0
15:43:34 <ykarel> even after the timeout increase to 3 hours as part of the isolated-db-per-test patch
15:44:17 <ralonsoh> we can start removing the LB tests, for example
15:44:34 <lajoskatona> for this ralonsoh opened the bug: https://bugs.launchpad.net/neutron/+bug/2033651
15:44:46 <lajoskatona> so we can track the patches under the lp
15:44:52 <ykarel> yeap +1
15:45:15 <ykarel> ok let's move to next topic
15:45:18 <ykarel> #topic Tempest/Scenario
15:45:38 <ykarel> there was an issue but already fixed with https://review.opendev.org/c/openstack/nova/+/893502
15:45:47 <ykarel> #topic Periodic
15:46:03 <ykarel> #link https://zuul.openstack.org/builds?job_name=devstack-tobiko-neutron&branch=master&skip=0
15:46:12 <ykarel> the job was running with ubuntu focal
15:46:30 <ykarel> there is already a patch to move it to jammy with https://review.opendev.org/c/x/devstack-plugin-tobiko/+/893662
15:46:48 <ykarel> #link https://zuul.openstack.org/builds?job_name=neutron-ovn-tempest-ipv6-only-ovs-master&job_name=neutron-ovn-tempest-ovs-master-centos-9-stream&project=openstack%2Fneutron&skip=0
15:47:33 <ykarel> ovs/ovn source deploy jobs with OVN_BRANCH=main are broken https://bugs.launchpad.net/neutron/+bug/2034096
15:47:46 <ykarel> #link https://review.opendev.org/c/openstack/neutron/+/893700
15:48:04 <ykarel> that's it for periodic, please review the above fixes
15:48:11 <ralonsoh> +2 (high priority one)
15:48:57 <ykarel> #topic On Demand
15:49:11 <ykarel> anything else to discuss?
15:51:49 <lajoskatona> nothing from me
15:51:57 <slaweq> nope
15:52:09 <ykarel> k thanks everyone, let's close then and give everyone a few minutes back
15:52:11 <mlavalle> nothing
15:52:15 <ykarel> #endmeeting