15:00:12 <slaweq> #startmeeting neutron_ci
15:00:12 <opendevmeet> Meeting started Tue Jan 31 15:00:12 2023 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:12 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:16 <slaweq> ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva
15:00:35 <mlavalle> o/
15:00:42 <bcafarel> o/
15:00:55 <slaweq> o/
15:01:03 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:01:10 <mlavalle> thanks
15:01:10 <mtomaska> o/
15:01:26 * mlavalle opening grafana
15:01:29 <ralonsoh> hi
15:02:39 <slaweq> ok, let's start
15:02:41 <slaweq> #topic Actions from previous meetings
15:03:05 <slaweq> first one is on lajoskatona but he's not here today
15:03:12 <slaweq> so I will assign it to him for next week to not forget
15:03:18 <slaweq> #action lajoskatona  to check fullstack failure neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement(Open vSwitch agent)
15:03:30 <slaweq> next one
15:03:35 <slaweq> ralonsoh to check neutron.tests.functional.agent.test_ovs_flows.ARPSpoofTestCase.test_arp_spoof_doesnt_block_ipv6 https://404fa55dc27f44e0606d-9f131354b122204fb24a7b43973ed8e6.ssl.cf5.rackcdn.com/860639/22/check/neutron-functional-with-uwsgi/6a57e16/testr_results.html
15:03:44 <ralonsoh> yes
15:03:45 <ralonsoh> #link https://review.opendev.org/c/openstack/neutron/+/871101
15:03:56 <ralonsoh> this is not solving the issue but improving the logs
15:04:22 <ralonsoh> (well, adding a retry on the ping)
15:04:32 <slaweq> ok
15:04:42 <slaweq> I didn't see the same issue this week
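For reference, the kind of ping retry mentioned above could look roughly like the sketch below. This is only an illustration of the approach, not the content of https://review.opendev.org/c/openstack/neutron/+/871101; the helper name and its arguments are assumed.

    import subprocess
    import time


    def ping_with_retry(namespace, address, retries=3, interval=2):
        """Ping an address from a network namespace, retrying a few times.

        A single failed ping is not treated as fatal, because transient
        ARP/flow setup delays in functional tests can make the first
        attempt fail even though connectivity is fine.
        """
        for _ in range(retries):
            result = subprocess.run(
                ['ip', 'netns', 'exec', namespace,
                 'ping', '-c', '1', '-W', '1', address],
                capture_output=True)
            if result.returncode == 0:
                return
            time.sleep(interval)
        raise AssertionError('%s is unreachable from namespace %s after %d attempts'
                             % (address, namespace, retries))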
15:05:31 <slaweq> next one is also on lajoskatona so I will assign it for next week
15:05:36 <slaweq> #action lajoskatona to check in https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ae9/870024/4/check/neutron-functional-with-uwsgi/ae93cab/testr_results.html if there are additional logs available there
15:05:45 <slaweq> next one
15:05:51 <slaweq> mlavalle to check failed neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router - https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_dd2/867769/11/gate/neutron-fullstack-with-uwsgi/dd2aa74/testr_results.html
15:06:04 <mlavalle> I've been checking https://zuul.opendev.org/t/openstack/builds?job_name=neutron-fullstack-with-uwsgi&branch=master&skip=0
15:06:10 <mlavalle> for the past two weeks
15:06:39 <mlavalle> The only instance of that test case failing was the one that slaweq reported here
15:06:48 <mlavalle> so I think it was a one-off
15:06:54 <slaweq> ok, so let's forget about it for now
15:06:58 <mlavalle> yeap
15:07:05 <slaweq> also fullstack jobs seem to be pretty stable recently
15:07:13 <slaweq> thx mlavalle for taking care of it
15:07:21 <slaweq> next one
15:07:25 <slaweq> ralonsoh to check new occurrence of https://bugs.launchpad.net/neutron/+bug/1940425
15:07:51 <ralonsoh> I commented on https://bugs.launchpad.net/neutron/+bug/1940425/comments/22
15:08:23 <ralonsoh> in this case, Nova didn't update the port binding
15:09:01 <slaweq> ok, should we change the status of this bug in Nova?
15:09:03 <slaweq> now it's set as "Invalid"
15:09:08 <slaweq> maybe it should be reopened on Nova's side now?
15:09:42 <ralonsoh> if that happens again, yes
15:09:54 <slaweq> ok
15:10:04 <slaweq> and the last one
15:10:05 <slaweq> slaweq to check tempest-slow jobs failures
15:10:18 <slaweq> patches https://review.opendev.org/q/topic:bug%252F2003063 are all merged
15:10:23 <slaweq> and those jobs are fixed now
15:10:33 <slaweq> it was indeed due to update of Cirros image
15:10:42 <slaweq> so some patches to tempest were required
15:10:44 <opendevreview> Merged openstack/neutron stable/yoga: Use common wait_until_ha_router_has_state method everywhere  https://review.opendev.org/c/openstack/neutron/+/872008
15:11:02 <slaweq> with that I think we can move on to the next topic
15:11:04 <slaweq> #topic Stable branches
15:11:11 <slaweq> bcafarel any updates?
15:12:06 <bcafarel> overall it looks good from my checks today; yatin backported some functional fixes (and marked a test unstable), and wallaby looked fine
15:12:37 <bcafarel> ussuri is broken, per the comment in https://review.opendev.org/c/openstack/neutron/+/871989 - the grenade gate is not passing due to a paramiko issue on train
15:13:03 <slaweq> ussuri is EM already, right?
15:14:15 <bcafarel> yep this could be reduced (major changes to test from train to ussuri should be quite rare now!)
15:14:58 <bcafarel> I will check with Yatin (he is off today) if train is easily fixable for that - this issue may break a few jobs there
15:15:11 <slaweq> ++
15:15:12 <slaweq> thx
15:15:41 <slaweq> #action bcafarel to check if we should maybe drop broken grenade jobs from Ussuri branch
15:16:02 <slaweq> anything else or can we move on?
15:16:09 <bcafarel> all good for the rest :)
15:16:12 <slaweq> thx
15:16:15 <slaweq> so let's move on
15:16:27 <slaweq> I will skip stadium projects as lajoskatona is not here
15:16:37 <slaweq> but all periodic jobs are green this week
15:16:43 <slaweq> #topic Grafana
15:16:51 <slaweq> #link https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:17:17 <slaweq> all looks good to me in grafana
15:17:51 <slaweq> there is a rise in UT failures today, but I saw a couple of patches where UT were failing due to the proposed change
15:17:58 <slaweq> so nothing really critical for us here
15:18:24 <mlavalle> ++
15:18:30 <slaweq> any other comments related to grafana?
15:18:40 <mlavalle> none from me
15:19:20 <slaweq> ok, so let's move on
15:19:21 <slaweq> #topic Rechecks
15:19:34 <slaweq> with rechecks we are back below 1 on average :)
15:19:48 <mlavalle> ++
15:20:32 <slaweq> +---------+----------+... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/uvudTgUGYRiOhUyICBCziORU>)
15:20:35 <slaweq> so it's better \o/
15:20:46 <slaweq> actually, when I checked the patches merged in the last 7 days, all of them were merged without any failed build in the last PS:
15:21:09 <slaweq> list is in the agenda document https://etherpad.opendev.org/p/neutron-ci-meetings#L74
15:21:21 <slaweq> regarding bare rechecks it's also good:
15:21:27 <slaweq> +---------+---------------+--------------+-------------------+... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/lEkkpDYqRefnrIyvOpseycic>)
15:21:43 <slaweq> just 1 out of 23 rechecks was "bare" in the last 7 days
15:21:51 <bcafarel> good numbers :)
15:21:51 <slaweq> so thx a lot for that
15:21:58 <slaweq> bcafarel indeed :)
15:22:36 <slaweq> any questions/comments?
15:22:49 <slaweq> or are we moving on to the next topic(s)?
15:24:00 <slaweq> ok, so let's move on
15:24:06 <slaweq> #topic Unit tests
15:24:22 <slaweq> there is one issue for today
15:24:31 <slaweq> sqlite3.OperationalError: no such table: (seen once in tox-cover job and many occurrences in arm64 unit test jobs)
15:24:41 <slaweq> https://6203638836ab77538319-23561effe696d4d724b67bcb38a7b69d.ssl.cf5.rackcdn.com/871991/1/check/openstack-tox-cover/f9e8ceb/testr_results.html
15:25:23 <slaweq> I think that "no such table" is just an effect of some issue, not the root cause of the failure
15:26:08 <ralonsoh> is that always happening?
15:26:21 <slaweq> it wasn't added by me to the agenda
15:26:27 <bcafarel> and quite a few tables missing
15:26:35 <slaweq> but according to the note there, it happens often in arm64 jobs
15:28:09 <slaweq> there are not many logs in the UT jobs :/
15:28:22 <bcafarel> the tox-cover one was https://review.opendev.org/c/openstack/neutron/+/871991/ I see Yatin commenting on "recheck unit test sqlite failure"
15:29:31 <slaweq> ralonsoh can it be that sqlite3 crashed or was killed by OOM or something like that during the tests?
15:29:53 <ralonsoh> I don't think so in this case
15:29:58 <slaweq> maybe we should add journal log to the UT job logs?
15:29:59 <ralonsoh> the error is happening during the cleanup
15:30:21 <ralonsoh> so maybe (maybe), we have already deleted the table or the DB
15:30:32 <slaweq> cleanup?
15:30:51 <ralonsoh> check any error log in https://6203638836ab77538319-23561effe696d4d724b67bcb38a7b69d.ssl.cf5.rackcdn.com/871991/1/check/openstack-tox-cover/f9e8ceb/testr_results.html
15:30:55 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/.tox/shared/lib/python3.8/site-packages/fixtures/fixture.py", line 125, in cleanUp
15:31:01 <ralonsoh> this is the first line
15:31:45 <slaweq> but it's not like that in all tests
15:31:49 <ralonsoh> yeah
15:31:52 <slaweq> I see it in setUp e.g. in neutron.tests.unit.services.trunk.test_rules.TrunkPortValidatorTestCase.test_can_be_trunked_or_untrunked_unbound_port test
15:32:25 <ralonsoh> and this is during the initialization...
15:32:51 <slaweq> but there is "traceback-2" in that test result and this is in cleanUp
15:33:12 <slaweq> so it seems that it fails in setUp and then it also causes an issue in the cleanUp phase
15:33:20 <ralonsoh> yes
15:33:20 <slaweq> but that's just a result of the previous issue
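To illustrate the cascade described above (a failure in setUp that resurfaces as a secondary traceback during cleanUp), a minimal, hypothetical example using the fixtures library could look like this; it is not taken from the neutron test suite.

    import fixtures
    import testtools


    class FakeDatabaseFixture(fixtures.Fixture):
        """Toy fixture whose cleanup depends on resources created in _setUp."""

        def _setUp(self):
            # The cleanup is registered before the step that fails,
            # so it still runs when setUp blows up.
            self.addCleanup(self._drop_tables)
            raise RuntimeError('simulated setUp failure')

        def _drop_tables(self):
            # Nothing was created, so the cleanup fails too, producing a
            # secondary traceback next to the original setUp error, much
            # like the "traceback-2" mentioned above.
            raise RuntimeError('no such table')


    class ExampleTestCase(testtools.TestCase):
        def test_something(self):
            self.useFixture(FakeDatabaseFixture())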
15:34:51 <slaweq> ok, I think that we need to:
15:34:52 <slaweq> 1. report bug for this
15:35:08 <slaweq> 2. start collecting journal log in UT jobs
15:35:12 <slaweq> *2
15:35:29 <slaweq> and then maybe we will be able to learn more about what happened there
15:35:31 <slaweq> wdyt?
15:35:34 <ralonsoh> agree
15:35:51 <bcafarel> +1
15:36:00 <slaweq> does anyone want to do that?
15:36:43 <ralonsoh> I'll do it
15:36:57 <slaweq> thx ralonsoh
15:37:27 <slaweq> #action ralonsoh to report UT issue with missing tables in db and propose patch to collect journal log in UT jobs
15:37:51 <slaweq> next topic
15:37:56 <slaweq> #topic fullstack/functional
15:39:09 <slaweq> here I just have issues from the previous week, as this week I didn't find any new ones
15:39:30 <slaweq> so I think we can skip them, especially as this week functional and fullstack jobs are pretty stable
15:39:52 <slaweq> there is one issue mentioned by ykarel probably:
15:39:53 <slaweq> test_arp_correct_protection.neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase (seen once in victoria)
15:39:53 <slaweq> https://7b7e23438828cd69d6b5-4d3d00830dc84883c962194d8b2d6bed.ssl.cf2.rackcdn.com/871988/1/check/neutron-functional-with-uwsgi/0272bfd/testr_results.html
15:42:28 <slaweq> I wouldn't spend much time on it as it's related to Linuxbridge agent
15:42:51 <slaweq> and it was seen just once so far
15:43:03 <slaweq> thoughts?
15:43:20 <mlavalle> yes, let's not waste time on it
15:43:21 <ralonsoh> agree with not spending time on this one
15:44:01 <slaweq> ok, next topic then
15:44:03 <slaweq> #topic Tempest/Scenario
15:44:11 <slaweq> here I wanted to ask about one issue
15:44:20 <slaweq> which I saw a few times IIRC
15:44:34 <slaweq> devstack failed with an error that there was no tenant network for allocation: https://zuul.opendev.org/t/openstack/build/b75030f968944508a3c3bbe6b4851584
15:44:42 <slaweq> did You see such an issue already?
15:45:29 <mlavalle> The last time I built a devstack in my local system was 2 days ago
15:45:33 <mlavalle> it worked fine
15:46:06 <slaweq> ok, I think I know now
15:46:28 <slaweq> there are errors related to the DB connection in neutron log in that job https://b84531a976ef476331e1-baa8eceea3205baf832239057c78a658.ssl.cf1.rackcdn.com/869196/11/check/neutron-ovs-grenade-dvr-multinode/b75030f/controller/logs/screen-q-svc.txt
15:46:44 <slaweq> and it's during creation of the subnetpool
15:47:05 <bcafarel> you were faster than me :) yep there's oslo_db.exception.DBConnectionError just before that 503
15:47:06 <slaweq> so maybe, as the subnetpool wasn't created, the network couldn't be created later either, as there wasn't any pool to use
15:47:29 <slaweq> so it's not a neutron issue then
15:48:25 <slaweq> with that we got to the last topic for today
15:48:27 <slaweq> #topic Periodic
15:48:33 <slaweq> generally all looks good there
15:48:49 <slaweq> but I saw that the neutron-ovn-tripleo-ci-centos-9-containers-multinode job is failing pretty often
15:49:09 <slaweq> the last two failures that I checked both happened during the undercloud deploy
15:49:14 <slaweq> https://zuul.openstack.org/build/772a941b25bd4dbbb66ddbd6544a3b63
15:49:46 <slaweq> I don't think it's really a neutron issue, but maybe someone will have some cycles to dig deeper into this and try to find the root cause of it?
15:50:21 <ralonsoh> or wait for tripleo folks
15:50:50 <slaweq> ralonsoh yep, or ping them :)
15:51:51 <slaweq> ok, I will try to ask some tripleo folks for help with this
15:52:10 <slaweq> #action slaweq to check with tripleo experts failing neutron-ovn-tripleo-ci-centos-9-containers-multinode job
15:52:22 <slaweq> and with that we got to the end of the agenda for today
15:52:32 <slaweq> anything else You want to discuss today?
15:52:43 <mlavalle> nothing from me
15:52:48 <ralonsoh> nothing, thanks
15:52:51 <bcafarel> nothing either
15:53:02 <slaweq> if nothing, then let's get back a few minutes
15:53:10 <slaweq> thx for attending the meeting and have a great week
15:53:12 <slaweq> o/
15:53:15 <slaweq> #endmeeting