15:01:04 #startmeeting neutron_ci
15:01:04 Meeting started Tue Sep 5 15:01:04 2023 UTC and is due to finish in 60 minutes. The chair is ykarel. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:04 The meeting name has been set to 'neutron_ci'
15:01:09 hello
15:01:12 o/
15:01:12 ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:01:19 Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:01:20 o/
15:02:02 o/
15:02:46 k let's start with the topics as some of the folks are on PTO
15:02:48 #topic Actions from previous meetings
15:02:58 lajoskatona to check failures for bagpipe and check use of master neutron in stadium projects
15:03:32 I still struggle with bagpipe
15:03:40 it is only with sqlalchemy2
15:04:16 yes that's what I recall
15:04:27 In the meantime I also found that the bagpipe tempest job is running with focal, so I changed the nodeset to jammy: https://review.opendev.org/c/openstack/networking-bagpipe/+/893709
15:04:34 looking at periodic I saw some issue in another job, but that can be checked in a later section
15:04:57 yeap I noticed that failure too, thx for fixing it
15:05:11 the other was https://zuul.openstack.org/builds?job_name=networking-bagpipe-openstack-tox-py310-with-sqlalchemy-main&project=openstack/networking-bagpipe
15:05:42 the issue is only with the sfc driver of bagpipe, so it's quite complex, for me at least, to understand the root cause
15:06:13 yes that is the sqlalchemy2 issue
15:06:29 ralonsoh: yes I have allowed all ICMP and SSH for testing.
15:06:31 ralonsoh: what is the timeline for the sqlalchemy2 introduction?
15:06:39 o/
15:06:45 sorry I was in a different meeting
15:06:53 lajoskatona, do we have a bug already for this issue?
15:06:55 it will be introduced, most probably, at the beginning of C
15:07:00 ok
15:07:07 there is a requirements patch proposed already
15:07:40 Continuity_, we can talk after the meeting
15:07:42 ralonsoh: not for bagpipe, I used the big sqlalchemy2 bug as reference
15:08:09 I can open a bagpipe bug to track it
15:08:16 yeah, much better
15:08:20 +1
15:08:21 ack
15:08:26 to be honest, I don't know what is failing in this CI
15:09:09 let's check this offline
15:09:13 #topic Stable branches
15:09:40 Bernard is out this week, but stable branches look good considering patches are merging
15:10:17 I didn't notice any consistent failure on stable branches
15:10:35 anything to add for stable branches?
15:12:00 sounds all good then, moving to the next topic
15:12:03 #topic Stadium projects
15:12:25 lajoskatona, anything else apart from those sqlalchemy and focal issues for stadium projects?
15:12:27 except bagpipe, things seem to be green
15:12:56 k good
15:13:01 #link https://zuul.openstack.org/builds?job_name=networking-bagpipe-openstack-tox-py310-with-sqlalchemy-main&project=openstack/networking-bagpipe
15:13:12 #link https://zuul.openstack.org/builds?job_name=networking-bagpipe-tempest&project=openstack%2Fnetworking-bagpipe&branch=master&skip=0
15:13:25 #topic Grafana
15:13:31 https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:13:48 let's give it a minute to see if we observe anything abnormal there
15:14:31 I see a spike in the tempest job, but that was a known issue already fixed
15:14:50 this morning there was some "spike", but on all jobs
15:14:51 so it's probably not an issue on our side really
15:15:02 neutron-ovn-tempest-ipv6-only-ovs-release had a 66% failure rate
15:15:34 yes I noticed there were quite a few failures this morning, but most of them were related to a series of patches pushed together for l3-ovn iirc
15:15:43 right, perfect then
15:16:12 yes, I see the same spikes in other jobs
15:18:25 yeap, let's move to the next section, will keep monitoring it in case something new comes in
15:18:35 #topic Rechecks
15:19:01 it was better last week
15:19:30 there were some known issues last week which might have resulted in those rechecks
15:19:56 bare rechecks were also not many, 3/17, so good
15:20:03 let's keep avoiding bare rechecks
15:20:16 #topic Unit tests
15:20:30 what's 3/17?
15:20:40 3 out of 17
15:20:42 17 total rechecks, 3 out of them were bare
15:20:49 yeap
15:20:49 ahhh!
15:20:52 LOL
15:21:30 #info There was an issue with the unit test job running with sqlalchemy/alembic main branches
15:21:42 It's already fixed with https://review.opendev.org/c/openstack/neutron/+/893602
15:21:52 +1
15:22:00 #topic fullstack/functional
15:22:36 neutron.tests.functional.services.trunk.drivers.ovn.test_trunk_driver.TestOVNTrunkDriver.test_subport_delete
15:22:45 AttributeError: 'NoneType' object has no attribute 'status'
15:23:01 Seen twice, in master/wallaby, and the failure looks related to a patch that already merged
15:23:21 so some race in the test, as it's not happening always
15:23:33 #link https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b18/892890/2/gate/neutron-functional-with-uwsgi/b18c02c/testr_results.html
15:24:04 #link https://e872331dabdf974ff450-5a66e2fcfa24aae6b75c2058251d7e58.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-uwsgi-fips/c431c66/testr_results.html
15:25:00 https://review.opendev.org/#/q/I2370ea2f96e2e31dbd43bf232a63394388e6945f
15:25:07 I'll ping Arnau to check these errors, they look related to ^
15:25:24 I see this being reverted, so likely the failures will also go away
15:25:33 ah no
15:25:40 this could be another issue
15:25:50 in any case, we are reverting the patch you mentioned
15:25:57 so please wait until the next week
15:26:11 k +1, will keep an eye
15:26:20 if it still happens I will open a bug
15:26:36 next one is neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_read_queue_change_state
15:26:42 AssertionError: Text not found in file /tmp/tmpuh6gesvz/tmp50ei9qfp/log_file: "Initial status of router".
15:26:49 https://157ba513c840b85e5d0e-e65fbda5c4a8fc14eb81d398bd7b0a80.ssl.cf5.rackcdn.com/892896/1/gate/neutron-functional-with-uwsgi/0acdecd/testr_results.html
15:27:57 this is something recurrent, the monitor doesn't start in some tests
15:28:16 k so it's already a known issue?
15:28:17 I would need to investigate how to make these tests more stable
15:28:29 yes, that has been happening for years
15:28:33 not very often
15:28:44 ok I see 3 failures in the last 15 days across branches
15:29:18 k thanks, will keep an eye on this and if it starts happening more frequently I will open a bug for it
15:29:34 sure
15:30:02 neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_router_gateway_port_binding_host_id
15:30:11 Timeout exception with self.mech_driver.nb_ovn.ovsdb_connection.stop()
15:30:20 #link https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2a0/892897/1/gate/neutron-functional-with-uwsgi/2a03caa/testr_results.html
15:31:22 any idea for this?
15:31:54 no, just a random timeout during the cleanup phase
15:31:59 seen 4 hits as per opensearch, across stable/zed and yoga
15:32:07 in the same test?
15:34:11 no, a different test also hit it, like test_gateway_chassis_rebalance_max_chassis
15:34:41 but that's also in the same class neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter
15:36:08 let me open an LP bug for this one. We can try, maybe, checking if the ovsdb_connection is still open during the cleanup
15:36:23 k thanks
15:36:28 if the connection is not active, then we don't need to stop it
15:36:41 #action ralonsoh to open bug for Timeout exception with self.mech_driver.nb_ovn.ovsdb_connection.stop()
15:37:27 neutron.tests.fullstack.test_connectivity.TestUninterruptedConnectivityOnL2AgentRestart.test_l2_agent_restart(LB,VLANs)
15:37:36 #link https://2df199e43476e3c732e7-3130556d487e5cec46a1ba3d1eaa7fda.ssl.cf5.rackcdn.com/892890/2/gate/neutron-fullstack-with-uwsgi/209726b/testr_results.html
15:37:45 #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da8/892869/3/check/neutron-fullstack-with-uwsgi/da88e54/testr_results.html
15:38:24 anyone recall this failure? from the meeting history I see it was seen in the past too
15:38:56 no, sorry
15:39:20 this is related to LB so we can mark this test as unstable or simply skip it if it's not stable
15:39:40 as LB has been marked as "experimental" for some time
15:39:44 +1
15:39:47 I can propose a patch for that
15:40:26 k thanks, I think I saw it in stable branches
15:40:46 #action slaweq to check failures with fullstack test test_l2_agent_restart
15:40:51 thx slaweq
15:41:07 neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement(Open vSwitch agent)
15:41:37 do you have a link for this one? I can check it
15:41:42 lajoskatona, maybe you recall something for ^?
15:41:55 #link https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4fb/893543/1/check/neutron-fullstack-with-uwsgi/4fba5fb/testr_results.html
15:42:03 #link https://c7cd1f2001ec5f5b729f-4854ed941a7816d1225c43ae9b456d0e.ssl.cf2.rackcdn.com/893143/1/check/neutron-fullstack-with-uwsgi/9137417/testr_results.html
15:42:16 I will check this one
15:42:21 thx lajoskatona
15:42:34 #action lajoskatona to check failures with fullstack test test_configurations_are_synced_towards_placement
15:42:51 also a generic one
15:43:13 we are recently seeing quite frequent timeouts in the fullstack job https://zuul.openstack.org/builds?job_name=neutron-fullstack-with-uwsgi&project=openstack%2Fneutron&result=TIMED_OUT&skip=0
15:43:34 even after the timeout increase to 3 hours as part of the isolated-db-per-test patch
15:44:17 we can start removing the LB tests, for example
15:44:34 for this ralonsoh opened the bug: https://bugs.launchpad.net/neutron/+bug/2033651
15:44:46 so we can track the patches under the LP
15:44:52 yeap +1
15:45:15 ok let's move to the next topic
15:45:18 #topic Tempest/Scenario
15:45:38 there was an issue, but it's already fixed with https://review.opendev.org/c/openstack/nova/+/893502
15:45:47 #topic Periodic
15:46:03 #link https://zuul.openstack.org/builds?job_name=devstack-tobiko-neutron&branch=master&skip=0
15:46:12 the job was running with ubuntu focal
15:46:30 there is already a patch to move it to jammy with https://review.opendev.org/c/x/devstack-plugin-tobiko/+/893662
15:46:48 #link https://zuul.openstack.org/builds?job_name=neutron-ovn-tempest-ipv6-only-ovs-master&job_name=neutron-ovn-tempest-ovs-master-centos-9-stream&project=openstack%2Fneutron&skip=0
15:47:33 ovs/ovn source deploy jobs with OVN_BRANCH=main are broken https://bugs.launchpad.net/neutron/+bug/2034096
15:47:46 #link https://review.opendev.org/c/openstack/neutron/+/893700
15:48:04 that's it for periodic, please review the above fixes
15:48:11 +2 (high priority one)
15:48:57 #topic On Demand
15:49:11 anything else to discuss?
15:51:49 nothing from me
15:51:57 nope
15:52:09 k thanks everyone, let's close then and give everyone a few minutes back
15:52:11 nothing
15:52:15 #endmeeting
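
Editor's note: a minimal sketch of the cleanup guard discussed at 15:36 for the TestRouter timeout, i.e. only stop the OVSDB connection during test cleanup if it is still active. This is not the actual Neutron/ovsdbapp code; the `is_running` attribute and the `stop(timeout=...)` signature are assumptions made for illustration only.

```python
import logging

LOG = logging.getLogger(__name__)


def stop_ovsdb_connection(connection, timeout=10):
    """Stop an OVSDB connection during test cleanup, but only if needed."""
    # Hypothetical check: if the connection is no longer active there is
    # nothing to stop, and we avoid blocking cleanup on a dead connection.
    if not getattr(connection, 'is_running', False):
        return
    # Assumption of this sketch: stop() takes a timeout and returns False
    # when the connection did not stop in time; log instead of raising so
    # the cleanup itself cannot fail the test.
    if not connection.stop(timeout=timeout):
        LOG.warning('OVSDB connection did not stop within %s seconds',
                    timeout)
```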
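Editor's note: similarly, a sketch of one way the flaky keepalived state-change check ("Text not found in file ...: 'Initial status of router'", discussed at 15:26-15:28) could be made more tolerant: poll the monitor's log file for the expected line instead of reading it once. The helper name and polling parameters are illustrative, not the real functional-test utility.

```python
import time


def wait_for_text_in_file(path, text, timeout=60, interval=1):
    """Poll a log file until the expected text appears or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with open(path) as log_file:
                if text in log_file.read():
                    return
        except FileNotFoundError:
            # The monitor may not have created its log file yet.
            pass
        time.sleep(interval)
    raise AssertionError('Text not found in file %s: "%s"' % (path, text))


# Usage matching the failure above (the path is a per-test temporary file):
# wait_for_text_in_file(log_file_path, "Initial status of router")
```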