15:03:48 <slaweq> #startmeeting neutron_ci
15:03:48 <opendevmeet> Meeting started Tue Dec 6 15:03:48 2022 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:48 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:48 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:03:50 <mlavalle> o/
15:03:53 <slaweq> sorry for being late :)
15:03:58 <slaweq> hi everyone
15:04:03 <bcafarel> o/
15:04:21 <lajoskatona> o/
15:05:19 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:05:31 <slaweq> #topic Actions from previous meetings
15:05:38 <mtomaska> o/
15:05:39 <slaweq> first one:
15:05:41 <slaweq> ykarel to fix timeout of the UT jobs in stable/wallaby
15:06:42 <ykarel> I pushed https://review.opendev.org/q/topic:fix-tox-job-override
15:06:59 <ykarel> found some more missing ones, so sent all within ^
15:07:12 <bcafarel> ykarel++
15:08:40 <slaweq> thx ykarel, I just approved those patches
15:09:03 <ykarel> thx
15:09:37 <slaweq> next one
15:09:43 <slaweq> ralonsoh to check issue with UT on py3.10 and neutron-lib master https://zuul.openstack.org/build/0820d9ef6a4448cea7f0937cac595ee2
15:10:08 <slaweq> ralonsoh is not here today but I'm pretty sure this is fixed now
15:10:20 <slaweq> and the last one:
15:10:22 <slaweq> mlavalle to check failing mariadb periodic job
15:10:29 <mlavalle> I did check that
15:10:51 <mlavalle> if you look at https://zuul.opendev.org/t/openstack/builds?job_name=neutron-ovn-tempest-mariadb-full&branch=master&skip=0, you will notice the failures started on Nov 19
15:11:10 <mlavalle> which is when this merged: https://review.opendev.org/c/openstack/devstack/+/860795
15:11:24 <mlavalle> so we started using Ubuntu 22.04 for that job
15:11:48 <mlavalle> in 20.04 we were using MariaDB 10.3
15:11:59 <mlavalle> in 22.04 we now use 10.6
15:12:26 <mlavalle> and there was a big change in authentication in version 10.4: https://mariadb.org/authentication-in-mariadb-10-4/
15:12:34 <slaweq> so does it mean that devstack is not compatible with mariadb 10.6 which is in Ubuntu 22.04?
15:13:02 <mlavalle> yes, the way we handle the creation of the root user password
15:13:15 <mlavalle> but last night I figured out how to do it
15:13:33 <mlavalle> last night I was able to build a devstack in my development environment
15:13:43 <mlavalle> so today I will propose a fix to devstack
15:13:55 <lajoskatona> cool
15:14:03 <slaweq> thx mlavalle++
15:14:23 <mlavalle> so you can keep this action item under my name one more week
15:14:38 <slaweq> #action mlavalle to fix failing mariadb periodic job
15:14:45 <ykarel> mlavalle, is https://github.com/openstack/devstack/blob/master/lib/databases/mysql#L117-L120 related ?
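[Editor's note: the lines ykarel links above are where devstack handles the MariaDB root password, which mlavalle says is what broke after the MariaDB 10.4 authentication change. The sketch below is only an illustration of that change, not mlavalle's actual devstack patch: it assumes the issue is that root@localhost now defaults to unix_socket authentication, so the root password has to be set over the socket before password-based TCP logins work again. The DATABASE_PASSWORD value and the exact statements are hypothetical.]

    # Illustrative sketch only (not the real devstack fix): setting the MariaDB
    # root password on Ubuntu 22.04 / MariaDB 10.6, where root@localhost
    # authenticates via the unix_socket plugin by default.
    DATABASE_PASSWORD=${DATABASE_PASSWORD:-supersecret}   # hypothetical value

    # Connect through the unix socket as the OS root user (no password needed)
    # and allow root@localhost to also log in with a password, so later
    # "mysql -uroot -p... -h127.0.0.1" style connections keep working.
    sudo mysql -e "ALTER USER 'root'@'localhost' IDENTIFIED VIA unix_socket OR mysql_native_password USING PASSWORD('${DATABASE_PASSWORD}');"

    # Sanity check over TCP with the freshly set password.
    mysql -uroot -p"${DATABASE_PASSWORD}" -h127.0.0.1 -e "SELECT VERSION();"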
15:14:50 <slaweq> I just changed "check/fix" :)
15:15:15 <mlavalle> ykarel: yes, that's exactly where the problem is
15:15:38 <mlavalle> I will tweak those lines
15:15:47 <ykarel> mlavalle, ohkk
15:15:57 * mlavalle already tweaked them in my development system
15:16:17 <ykarel> I meant whether those tasks need to be skipped like on bullseye, but seems not, as per your comment
15:16:42 <mlavalle> no, as I said, I fixed it in my dev system
15:16:52 <ykarel> ack, got it
15:17:28 <mlavalle> and btw, if there is a similar problem with bullseye, we might also fix it
15:17:41 <mlavalle> just hadn't thought of it
15:18:26 <slaweq> ok, thx mlavalle for working on this
15:18:31 <mlavalle> :-)
15:18:34 <slaweq> I think we can move on to the next topic now
15:18:37 <slaweq> #topic Stable branches
15:18:47 <slaweq> bcafarel any updates?
15:19:15 <bcafarel> not a lot coming from me; thanks Yatin and Ihar for fixes backported over the last week
15:19:35 <bcafarel> apart from that no major issues :)
15:19:52 <slaweq> that's good
15:20:12 <slaweq> thx bcafarel :)
15:20:15 <slaweq> #topic Stadium projects
15:20:25 <slaweq> all jobs seem to be green, except networking-odl
15:20:28 <lajoskatona> seems ok, networking-odl has a timeout
15:20:37 <slaweq> :)
15:20:59 <lajoskatona> I don't know of any issues or things to keep an eye on for stadiums
15:21:12 <lajoskatona> slightly related
15:21:25 <slaweq> lajoskatona but that networking-odl job timed out after just 32 minutes
15:21:28 <lajoskatona> perhaps you saw the mail from zigo regarding py311 failures in some projects
15:21:32 <slaweq> shouldn't we maybe change it?
15:21:48 <lajoskatona> but we are fine except networking-l2gw, and that is not stadium :-)
15:22:02 <slaweq> yeah, I saw that email
15:22:03 <lajoskatona> yeah, we can increase that
15:22:17 <lajoskatona> I will check why it is so low
15:22:19 <slaweq> and I noticed that there's nothing related to neutron or neutron stadium
15:22:26 <slaweq> thx lajoskatona
15:23:53 <slaweq> I think we can move on
15:23:58 <slaweq> #topic Grafana
15:24:54 <slaweq> I think all is good in grafana
15:25:13 <slaweq> anything You want to discuss about it today?
15:25:23 <mlavalle> it looks good to me
15:26:02 <slaweq> next topic then
15:26:09 <slaweq> #topic Rechecks
15:26:23 <slaweq> recheck stats look ok-ish
15:26:35 <slaweq> we had on average 1 recheck to get patches merged last week
15:26:52 <slaweq> but we had that issue with UT which ralonsoh fixed, so I hope it will be better
15:27:03 <slaweq> regarding bare rechecks - it looks very good:
15:27:10 <slaweq> +---------+---------------+--------------+-------------------+... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/lPgDdvPJwQWpngbBVSAzKYST>)
15:27:26 <slaweq> all rechecks in the last 7 days were made with a reason given
15:27:31 <slaweq> thx a lot for that :)
15:27:36 <lajoskatona> \o/
15:27:44 <mlavalle> +1
15:28:38 <ykarel> o/
15:28:41 <slaweq> ok, now let's talk about some failures in ci jobs
15:28:44 <slaweq> #topic fullstack/functional
15:28:52 <slaweq> neutron.tests.functional.agent.test_ovs_flows.ARPSpoofTestCase.test_arp_spoof_allowed_address_pairs_0cidr
15:28:58 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a0f/866328/4/gate/neutron-functional-with-uwsgi/a0f0599/testr_results.html
15:29:46 <slaweq> did You see such a failure before?
15:30:06 <lajoskatona> no
15:30:07 <ykarel> no
15:30:55 <slaweq> of course there's nothing obviously wrong in the log file :/
15:31:59 <slaweq> we can wait and see if similar issues will happen more often
15:32:09 <slaweq> next one
15:32:11 <slaweq> neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:32:15 <slaweq> https://ef5d43af22af7b1c1050-17fc8f83c20e6521d7d8a3ccd8bca531.ssl.cf2.rackcdn.com/861719/6/check/neutron-functional-with-uwsgi/a488990/testr_results.html
15:32:45 <slaweq> two tests failed in a very similar way
15:33:32 <slaweq> and it seems that it couldn't connect to ovn:
15:33:34 <slaweq> 2022-12-02 17:47:15.967 40517 ERROR neutron.agent.linux.utils [None req-c47c1008-664c-47f0-b279-520d6e8e5ac6 - tenid - - - -] Exit code: 1; Cmd: ['ovs-appctl', '-t', '/tmp/tmp3_reqb01/ovnsb_db.ctl', 'exit']; Stdin: ; Stdout: ; Stderr: 2022-12-02T17:47:15Z|00001|unixctl|WARN|failed to connect to /tmp/tmp3_reqb01/ovnsb_db.ctl
15:33:34 <slaweq> ovs-appctl: cannot connect to "/tmp/tmp3_reqb01/ovnsb_db.ctl" (No such file or directory)
15:33:57 <slaweq> so most likely some intermittent issue
15:34:55 <slaweq> next one
15:35:04 <slaweq> it's again the dvr router lifecycle issue:
15:35:06 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_5c1/866635/1/check/neutron-functional-with-uwsgi/5c12d80/testr_results.html
15:35:21 <slaweq> lajoskatona I think You were checking something similar already
15:35:22 <slaweq> am I right?
15:35:50 <lajoskatona> yes, the bug we discussed last week
15:36:47 <lajoskatona> this one https://bugs.launchpad.net/neutron/+bug/1995031 and a newer one: https://bugs.launchpad.net/neutron/+bug/1998337
15:37:06 <slaweq> and did You find anything?
15:37:34 <lajoskatona> no, to tell the truth I had no time to check since last week
15:37:54 <slaweq> will You be able to look into it this week maybe?
15:38:20 <lajoskatona> I'll try to allocate some time for it
15:38:26 <slaweq> thx a lot
15:38:47 <slaweq> #action lajoskatona to check dvr lifecycle functional tests failures
15:38:58 <slaweq> and last one
15:39:00 <slaweq> neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection - router object mismatch
15:39:05 <slaweq> https://b262e0138780f2e869e5-bb966b1b7243c04cadcba4d7bfebb8b0.ssl.cf5.rackcdn.com/865575/1/gate/neutron-functional-with-uwsgi/5fa0dbc/testr_results.html
15:39:09 <slaweq> found by ykarel
15:39:33 <ykarel> yes, noticed once
15:39:46 <slaweq> it seems to me from a quick look that failover of routers didn't happen there
15:39:48 <ykarel> from the etherpad it seems it was seen 6 months back too
15:40:46 <slaweq> but IIRC we did some improvements in those tests then
15:40:55 <slaweq> and it was fine for a long time
15:41:07 <slaweq> so maybe this time it was some one-time issue or we have something new broken there
15:42:05 <slaweq> let's see if we will have it more often
15:42:17 <slaweq> now fullstack tests
15:42:24 <slaweq> test_multiple_agents_for_network(Open vSwitch agent)
15:42:32 <slaweq> https://7c4060ce9e9515b5ad0d-5c947c8d22eb7769ff9d2de46bec4cc9.ssl.cf2.rackcdn.com/865994/1/check/neutron-fullstack-with-uwsgi/ebcfef6/testr_results.html
15:46:27 <slaweq> it seems to me like connectivity to the dhcp namespace was not working fine:
15:46:28 <slaweq> 2022-11-29 16:08:29.328 28264 DEBUG neutron.tests.fullstack.resources.machine [-] Stopping async dhclient [ip netns exec test-532a73a0-62e4-4220-a890-9a94566cc831 dhclient -4 -lf /tmp/tmph9mjx6t9/tmpjqjphk39/69e46a58-d5c0-495b-825c-78f428c64fd5.lease -sf /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/bin/fullstack-dhclient-script --no-pid -d portf1ffe5]. stdout: [[]] - stderr: [['Internet Systems Consortium DHCP
15:46:28 <slaweq> Client 4.4.1', 'Copyright 2004-2018 Internet Systems Consortium.', 'All rights reserved.', 'For info, please visit https://www.isc.org/software/dhcp/', '', 'Listening on LPF/portf1ffe5/fa:16:3e:bb:71:6a', 'Sending on LPF/portf1ffe5/fa:16:3e:bb:71:6a', 'Sending on Socket/fallback', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 3 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 8 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 15 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 13 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 8 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 9 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 10 (xid=0x38250956)']] _stop_async_dhclient /home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/machine.py:175
15:47:03 <slaweq> but it's hard to say why
15:47:50 <slaweq> if we see more often that ports aren't configured through DHCP properly in fullstack tests, we will need to have a closer look into this issue
15:48:01 <slaweq> next one
15:48:04 <slaweq> neutron.tests.fullstack.test_qos.TestPacketRateLimitQoSOvs
15:48:15 <slaweq> and this one happened at least twice last week:
15:48:20 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_681/865061/3/check/neutron-fullstack-with-uwsgi/6812ab5/testr_results.html
15:48:20 <slaweq> https://898e1c4e07fc90ea0741-aa41af48b127881681a990efb23ea8ce.ssl.cf1.rackcdn.com/865470/1/check/neutron-fullstack-with-uwsgi/f46a1f6/testr_results.html
15:49:07 <lajoskatona> I think this is the fix for it: https://review.opendev.org/c/openstack/neutron/+/866210
15:49:47 <slaweq> yeah, thx lajoskatona
15:50:14 <slaweq> so let's move on
15:50:15 <slaweq> #topic Tempest/Scenario
15:50:21 <slaweq> here I found one issue
15:50:36 <slaweq> where ping from one vm to another failed
15:50:37 <slaweq> https://cc4296039b95f28dde1a-22b43305544279849f41f0100b51a877.ssl.cf5.rackcdn.com/866328/4/check/neutron-tempest-plugin-linuxbridge/61e1c9a/testr_results.html
15:50:55 <slaweq> it's a linuxbridge job, so if it starts happening more often we can simply disable it
15:51:07 <slaweq> unless there is anybody who has some cycles and wants to check it
15:51:36 <lajoskatona> +1 for skipping/disabling :-(
15:51:57 <slaweq> and that's pretty much all I had for today
15:52:09 <slaweq> in periodic jobs we should be good once the mariadb job is fixed
15:52:33 <slaweq> anything else You want to discuss today?
15:53:09 <mlavalle> nothing from me
15:53:27 <ykarel> nothing from me too
15:53:38 <slaweq> if not, let's finish earlier today
15:53:43 <slaweq> thx for attending the meeting
15:53:49 <slaweq> and have a great week
15:53:53 <slaweq> #endmeeting