15:03:48 <slaweq> #startmeeting neutron_ci
15:03:48 <opendevmeet> Meeting started Tue Dec  6 15:03:48 2022 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:48 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:48 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:03:50 <mlavalle> o/
15:03:53 <slaweq> sorry for being late :)
15:03:58 <slaweq> hi everyone
15:04:03 <bcafarel> o/
15:04:21 <lajoskatona> o/
15:05:19 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:05:31 <slaweq> #topic Actions from previous meetings
15:05:38 <mtomaska> o/
15:05:39 <slaweq> first one:
15:05:41 <slaweq> ykarel to fix timeout of the ut jobs in stable/wallaby
15:06:42 <ykarel> I pushed https://review.opendev.org/q/topic:fix-tox-job-override
15:06:59 <ykarel> found some more missing overrides so sent fixes for all of them within ^
15:07:12 <bcafarel> ykarel++
15:08:40 <slaweq> thx ykarel I just approved those patches
15:09:03 <ykarel> thx
15:09:37 <slaweq> next one
15:09:43 <slaweq> ralonsoh to check issue with UT on py3.10 and neutron-lib master https://zuul.openstack.org/build/0820d9ef6a4448cea7f0937cac595ee2
15:10:08 <slaweq> ralonsoh is not here today but I'm pretty sure this is fixed now
15:10:20 <slaweq> and the last one:
15:10:22 <slaweq> mlavalle to check failing mariadb periodic job
15:10:29 <mlavalle> I did check that
15:10:51 <mlavalle> if you look at https://zuul.opendev.org/t/openstack/builds?job_name=neutron-ovn-tempest-mariadb-full&branch=master&skip=0, you will notice the failures started on Nov 19
15:11:10 <mlavalle> which is when this merged: https://review.opendev.org/c/openstack/devstack/+/860795
15:11:24 <mlavalle> so we started using Ubuntu 22.04 for that job
15:11:48 <mlavalle> in 20.04 we were using mariadb 10.3
15:11:59 <mlavalle> in 22.04 we use now 10.6
15:12:26 <mlavalle> and there was a big change in authentication in version 10.4: https://mariadb.org/authentication-in-mariadb-10-4/
15:12:34 <slaweq> so does it mean that devstack is not compatible with mariadb 10.6 which is in Ubuntu 22.04?
15:13:02 <mlavalle> yes, the way we handle the creation of the root user password
15:13:15 <mlavalle> but last night I figured out how to do it
15:13:33 <mlavalle> last night I was able to build a devstack in my development environment
15:13:43 <mlavalle> so today I will propose a fix to devstack
15:13:55 <lajoskatona> cool
15:14:03 <slaweq> thx mlavalle++
15:14:23 <mlavalle> so you can keep this action item under my name one more week
15:14:38 <slaweq> #action mlavalle to fix failing mariadb periodic job
15:14:45 <ykarel> mlavalle, is https://github.com/openstack/devstack/blob/master/lib/databases/mysql#L117-L120 related ?
15:14:50 <slaweq> I just changed "check/fix" :)
15:15:15 <mlavalle> ykarel: yes, that's exactly where the problem is
15:15:38 <mlavalle> I will tweak those lines
15:15:47 <ykarel> mlavalle, ohkk
15:15:57 * mlavalle already tweaked them in my development system
15:16:17 <ykarel> I meant whether those tasks need to be skipped like on bullseye, but seems not as per your comment
15:16:42 <mlavalle> no, as I said, I fixed it in my dev system
15:16:52 <ykarel> ack got it
15:17:28 <mlavalle> and btw, if there is a similar problem with bullseye, we might also fix it
15:17:41 <mlavalle> just hadn't thought of it
15:18:26 <slaweq> ok, thx mlavalle for working on this
15:18:31 <mlavalle> :-)
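[For reference, a minimal sketch of the kind of devstack tweak being discussed, assuming devstack's DATABASE_PASSWORD variable and the MariaDB 10.4+ behaviour described in the linked blog post (root@localhost authenticating via the unix_socket plugin by default). This is an illustration only, not the fix mlavalle will propose:

    # Hypothetical sketch: connect as root over the unix socket (still allowed
    # by unix_socket auth) and explicitly give the account a password-based
    # login, instead of relying on the pre-10.4 style root password setup.
    sudo mysql -e "ALTER USER 'root'@'localhost' \
        IDENTIFIED VIA mysql_native_password USING PASSWORD('$DATABASE_PASSWORD'); \
        FLUSH PRIVILEGES;"
]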
15:18:34 <slaweq> I think we can move on to the next topic now
15:18:37 <slaweq> #topic Stable branches
15:18:47 <slaweq> bcafarel any updates?
15:19:15 <bcafarel> not a lot coming from me; thanks Yatin and Ihar for the fixes backported over the last week
15:19:35 <bcafarel> apart from that no major issues :)
15:19:52 <slaweq> that's good
15:20:12 <slaweq> thx bcafarel :)
15:20:15 <slaweq> #topic Stadium projects
15:20:25 <slaweq> all jobs seem to be green, except networking-odl
15:20:28 <lajoskatona> seems ok, networking-odl has a timeout
15:20:37 <slaweq> :)
15:20:59 <lajoskatona> I don't know of any issues or things to keep an eye on for stadiums
15:21:12 <lajoskatona> slightly related
15:21:25 <slaweq> lajoskatona but that networking-odl job timed out after just 32 minutes
15:21:28 <lajoskatona> perhaps you saw the mail from zigo regarding py311 failures in some projects
15:21:32 <slaweq> shouldn't we maybe change it?
15:21:48 <lajoskatona> but we are in the clear except networking-l2gw, and that is not stadium :-)
15:22:02 <slaweq> yeah, I saw that email
15:22:03 <lajoskatona> yeah, we can increase that
15:22:17 <lajoskatona> I will check why it is so low
15:22:19 <slaweq> and I noticed that there's nothing related to neutron or neutron stadium
15:22:26 <slaweq> thx lajoskatona
15:23:53 <slaweq> I think we can move on
15:23:58 <slaweq> #topic Grafana
15:24:54 <slaweq> I think all is good in grafana
15:25:13 <slaweq> anything You want to discuss about it today?
15:25:23 <mlavalle> it looks good to me
15:26:02 <slaweq> next topic then
15:26:09 <slaweq> #topic Rechecks
15:26:23 <slaweq> recheck stats looks ok-ish
15:26:35 <slaweq> we had 1 recheck on average to get patches merged last week
15:26:52 <slaweq> but we had that issue with UT which ralonsoh fixed so I hope it will be better
15:27:03 <slaweq> regarding bare rechecks - it looks very good:
15:27:10 <slaweq> +---------+---------------+--------------+-------------------+... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/lPgDdvPJwQWpngbBVSAzKYST>)
15:27:26 <slaweq> all rechecks in last 7 days were made with a reason given
15:27:31 <slaweq> thx a lot for that :)
15:27:36 <lajoskatona> \o/
15:27:44 <mlavalle> +1
15:28:38 <ykarel> o/
15:28:41 <slaweq> ok, now let's talk about some failures in CI jobs
15:28:44 <slaweq> #topic fullstack/functional
15:28:52 <slaweq> neutron.tests.functional.agent.test_ovs_flows.ARPSpoofTestCase.test_arp_spoof_allowed_address_pairs_0cidr
15:28:58 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a0f/866328/4/gate/neutron-functional-with-uwsgi/a0f0599/testr_results.html
15:29:46 <slaweq> did You see such a failure before?
15:30:06 <lajoskatona> no
15:30:07 <ykarel> no
15:30:55 <slaweq> of course there's nothing obviously wrong in the log file :/
15:31:59 <slaweq> we can wait and see if similar issues will happen more often
15:32:09 <slaweq> next one
15:32:11 <slaweq> neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:32:15 <slaweq> https://ef5d43af22af7b1c1050-17fc8f83c20e6521d7d8a3ccd8bca531.ssl.cf2.rackcdn.com/861719/6/check/neutron-functional-with-uwsgi/a488990/testr_results.html
15:32:45 <slaweq> two tests failed in very similar way
15:33:32 <slaweq> and it seems that it couldn't connect to ovn:
15:33:34 <slaweq> 2022-12-02 17:47:15.967 40517 ERROR neutron.agent.linux.utils [None req-c47c1008-664c-47f0-b279-520d6e8e5ac6 - tenid - - - -] Exit code: 1; Cmd: ['ovs-appctl', '-t', '/tmp/tmp3_reqb01/ovnsb_db.ctl', 'exit']; Stdin: ; Stdout: ; Stderr: 2022-12-02T17:47:15Z|00001|unixctl|WARN|failed to connect to /tmp/tmp3_reqb01/ovnsb_db.ctl
15:33:34 <slaweq> ovs-appctl: cannot connect to "/tmp/tmp3_reqb01/ovnsb_db.ctl" (No such file or directory)
15:33:57 <slaweq> so most likely some intermittent issue
15:34:55 <slaweq> next one
15:35:04 <slaweq> it's again the DVR router lifecycle issue:
15:35:06 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_5c1/866635/1/check/neutron-functional-with-uwsgi/5c12d80/testr_results.html
15:35:21 <slaweq> lajoskatona I think You were checking something similar already
15:35:22 <slaweq> am I right?
15:35:50 <lajoskatona> yes the bug we discussed last week
15:36:47 <lajoskatona> this one https://bugs.launchpad.net/neutron/+bug/1995031 and newer one: https://bugs.launchpad.net/neutron/+bug/1998337
15:37:06 <slaweq> and did You find anything?
15:37:34 <lajoskatona> no, to tell the truth I had no time to check it since last week
15:37:54 <slaweq> will You be able to look into it this week maybe?
15:38:20 <lajoskatona> I'll try to allocate some time for it
15:38:26 <slaweq> thx a lot
15:38:47 <slaweq> #action lajoskatona to check dvr lifecycle functional tests failures
15:38:58 <slaweq> and last one
15:39:00 <slaweq> neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection - router object mismatch
15:39:05 <slaweq> https://b262e0138780f2e869e5-bb966b1b7243c04cadcba4d7bfebb8b0.ssl.cf5.rackcdn.com/865575/1/gate/neutron-functional-with-uwsgi/5fa0dbc/testr_results.html
15:39:09 <slaweq> found by ykarel
15:39:33 <ykarel> yes noticed once
15:39:46 <slaweq> it seems to me from a quick look that the failover of routers didn't happen there
15:39:48 <ykarel> from the etherpad it seems it was seen 6 months back too
15:40:46 <slaweq> but IIRC we did some improvements in those tests then
15:40:55 <slaweq> and it was fine for long time
15:41:07 <slaweq> so maybe this time it was some one-time issue or we have something new broken there
15:42:05 <slaweq> let's see if we will have it more often
15:42:17 <slaweq> now fullstack tests
15:42:24 <slaweq> test_multiple_agents_for_network(Open vSwitch agent)
15:42:32 <slaweq> https://7c4060ce9e9515b5ad0d-5c947c8d22eb7769ff9d2de46bec4cc9.ssl.cf2.rackcdn.com/865994/1/check/neutron-fullstack-with-uwsgi/ebcfef6/testr_results.html
15:46:27 <slaweq> it seems to me like connectivity to the DHCP namespace was not working properly:
15:46:28 <slaweq> 2022-11-29 16:08:29.328 28264 DEBUG neutron.tests.fullstack.resources.machine [-] Stopping async dhclient [ip netns exec test-532a73a0-62e4-4220-a890-9a94566cc831 dhclient -4 -lf /tmp/tmph9mjx6t9/tmpjqjphk39/69e46a58-d5c0-495b-825c-78f428c64fd5.lease -sf /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/bin/fullstack-dhclient-script --no-pid -d portf1ffe5]. stdout: [[]] - stderr: [['Internet Systems Consortium DHCP
15:46:28 <slaweq> Client 4.4.1', 'Copyright 2004-2018 Internet Systems Consortium.', 'All rights reserved.', 'For info, please visit https://www.isc.org/software/dhcp/', '', 'Listening on LPF/portf1ffe5/fa:16:3e:bb:71:6a', 'Sending on   LPF/portf1ffe5/fa:16:3e:bb:71:6a', 'Sending on   Socket/fallback', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 3 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 8
15:46:28 <slaweq> (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 15 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 13 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 8 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 9 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 10
15:46:28 <slaweq> (xid=0x38250956)']] _stop_async_dhclient /home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/machine.py:175
15:47:03 <slaweq> but it's hard to say why
15:47:50 <slaweq> if we see more often that ports aren't configured through DHCP properly in fullstack tests, we will need to have a closer look into this issue
15:48:01 <slaweq> next one
15:48:04 <slaweq> neutron.tests.fullstack.test_qos.TestPacketRateLimitQoSOvs
15:48:15 <slaweq> and this one happened at least twice last week:
15:48:20 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_681/865061/3/check/neutron-fullstack-with-uwsgi/6812ab5/testr_results.html
15:48:20 <slaweq> https://898e1c4e07fc90ea0741-aa41af48b127881681a990efb23ea8ce.ssl.cf1.rackcdn.com/865470/1/check/neutron-fullstack-with-uwsgi/f46a1f6/testr_results.html
15:49:07 <lajoskatona> I think this is the fix for it: https://review.opendev.org/c/openstack/neutron/+/866210
15:49:47 <slaweq> yeah, thx lajoskatona
15:50:14 <slaweq> so let's move on
15:50:15 <slaweq> #topic Tempest/Scenario
15:50:21 <slaweq> here I found one issue
15:50:36 <slaweq> where ping from one vm to another failed
15:50:37 <slaweq> https://cc4296039b95f28dde1a-22b43305544279849f41f0100b51a877.ssl.cf5.rackcdn.com/866328/4/check/neutron-tempest-plugin-linuxbridge/61e1c9a/testr_results.html
15:50:55 <slaweq> it's a linuxbridge job, so if it starts happening more often we can simply disable it
15:51:07 <slaweq> unless there is anybody who has some cycles and wants to check it
15:51:36 <lajoskatona> +1 for skipping/disabling :-(
15:51:57 <slaweq> and that's pretty much all I had for today
15:52:09 <slaweq> in periodic jobs we should be good once the mariadb job is fixed
15:52:33 <slaweq> anything else You want to discuss today?
15:53:09 <mlavalle> nothing from me
15:53:27 <ykarel> nothing from me too
15:53:38 <slaweq> if not, let's finish earlier today
15:53:43 <slaweq> thx for attending the meeting
15:53:49 <slaweq> and have a great week
15:53:53 <slaweq> #endmeeting