15:03:48 #startmeeting neutron_ci
15:03:48 Meeting started Tue Dec 6 15:03:48 2022 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:48 The meeting name has been set to 'neutron_ci'
15:03:50 o/
15:03:53 sorry for being late :)
15:03:58 hi everyone
15:04:03 o/
15:04:21 o/
15:05:19 Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:05:31 #topic Actions from previous meetings
15:05:38 o/
15:05:39 first one:
15:05:41 ykarel to fix timeout of the ut jobs in stable/wallaby
15:06:42 I pushed https://review.opendev.org/q/topic:fix-tox-job-override
15:06:59 found some more missing ones so sent them all within ^
15:07:12 ykarel++
15:08:40 thx ykarel, I just approved those patches
15:09:03 thx
15:09:37 next one
15:09:43 ralonsoh to check issue with UT on py3.10 and neutron-lib master https://zuul.openstack.org/build/0820d9ef6a4448cea7f0937cac595ee2
15:10:08 ralonsoh is not here today but I'm pretty sure this is fixed now
15:10:20 and the last one:
15:10:22 mlavalle to check failing mariadb periodic job
15:10:29 I did check that
15:10:51 if you look at https://zuul.opendev.org/t/openstack/builds?job_name=neutron-ovn-tempest-mariadb-full&branch=master&skip=0, you will notice the failures started on Nov 19
15:11:10 which is when this merged: https://review.opendev.org/c/openstack/devstack/+/860795
15:11:24 so we started using Ubuntu 22.04 for that job
15:11:48 in 20.04 we were using MariaDB 10.3
15:11:59 in 22.04 we now use 10.6
15:12:26 and there was a big change in authentication in version 10.4: https://mariadb.org/authentication-in-mariadb-10-4/
15:12:34 so does it mean that devstack is not compatible with MariaDB 10.6, which is in Ubuntu 22.04?
15:13:02 yes, the way we handle the creation of the root user password
15:13:15 but last night I figured out how to do it
15:13:33 last night I was able to build a devstack in my development environment
15:13:43 so today I will propose a fix to devstack
15:13:55 cool
15:14:03 thx mlavalle++
15:14:23 so you can keep this action item under my name one more week
15:14:38 #action mlavalle to fix failing mariadb periodic job
15:14:45 mlavalle, is https://github.com/openstack/devstack/blob/master/lib/databases/mysql#L117-L120 related?
15:14:50 I just changed "check/fix" :)
15:15:15 ykarel: yes, that's exactly where the problem is
15:15:38 I will tweak those lines
15:15:47 mlavalle, ohkk
15:15:57 * mlavalle already tweaked them in my development system
15:16:17 I meant whether those tasks need to be skipped like on bullseye, but it seems not, as per your comment
15:16:42 no, as I said, I fixed it in my dev system
15:16:52 ack, got it
15:17:28 and btw, if there is a similar problem with bullseye, we might also fix it
15:17:41 just hadn't thought of it
15:18:26 ok, thx mlavalle for working on this
15:18:31 :-)
15:18:34 I think we can move on to the next topic now
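For context on the fix discussed above: the devstack lines ykarel linked are where the MariaDB root password gets set. A minimal sketch of the kind of tweak mlavalle describes (not the actual devstack patch, which had not been posted yet at this point) could look like the following, assuming MariaDB 10.4+, where root@localhost authenticates through the unix_socket plugin by default, so the password has to be set from a socket-authenticated session rather than via the old password-based login; $DATABASE_PASSWORD stands in for devstack's configured database password:

    # Sketch only, using the ALTER USER form recommended in the mariadb.org
    # article linked above; the real devstack change may differ.
    # Running through sudo lets the unix_socket plugin authenticate us as the
    # database root without any password, after which a conventional password
    # can be set for root@localhost.
    sudo mysql -e "ALTER USER 'root'@'localhost' IDENTIFIED VIA mysql_native_password USING PASSWORD('$DATABASE_PASSWORD');"

Whether devstack should set a password this way or instead skip that step on 22.04 (similar to what ykarel asked about for bullseye) was still open at this point in the discussion.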
15:18:37 #topic Stable branches
15:18:47 bcafarel any updates?
15:19:15 not a lot coming from me, thanks to Yatin and Ihar for the fixes backported over the last week
15:19:35 apart from that no major issues :)
15:19:52 that's good
15:20:12 thx bcafarel :)
15:20:15 #topic Stadium projects
15:20:25 all jobs seem to be green, except networking-odl
15:20:28 seems ok, networking-odl has a timeout
15:20:37 :)
15:20:59 I don't know of any issues or things to keep an eye on for the stadiums
15:21:12 slightly related
15:21:25 lajoskatona but that networking-odl job timed out after just 32 minutes
15:21:28 perhaps you saw the mail from zigo regarding py311 failures in some projects
15:21:32 shouldn't we maybe change it?
15:21:48 but we are free of those except networking-l2gw, but that is not stadium :-)
15:22:02 yeah, I saw that email
15:22:03 yeah, we can increase that
15:22:17 I will check why it is so low
15:22:19 and I noticed that there's nothing related to neutron or neutron stadium
15:22:26 thx lajoskatona
15:23:53 I think we can move on
15:23:58 #topic Grafana
15:24:54 I think all is good in grafana
15:25:13 anything you want to discuss about it today?
15:25:23 it looks good to me
15:26:02 next topic then
15:26:09 #topic Rechecks
15:26:23 recheck stats look ok-ish
15:26:35 we had 1 recheck on average to get patches merged last week
15:26:52 but we had that issue with UT which ralonsoh fixed so I hope it will be better
15:27:03 regarding bare rechecks - it looks very good:
15:27:10 +---------+---------------+--------------+-------------------+... (full message at )
15:27:26 all rechecks in the last 7 days were made with a reason given
15:27:31 thx a lot for that :)
15:27:36 \o/
15:27:44 +1
15:28:38 o/
15:28:41 ok, now let's talk about some failures in ci jobs
15:28:44 #topic fullstack/functional
15:28:52 neutron.tests.functional.agent.test_ovs_flows.ARPSpoofTestCase.test_arp_spoof_allowed_address_pairs_0cidr
15:28:58 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a0f/866328/4/gate/neutron-functional-with-uwsgi/a0f0599/testr_results.html
15:29:46 did you see such a failure before?
15:30:06 no
15:30:07 no
15:30:55 of course there's nothing obviously wrong in the log file :/
15:31:59 we can wait and see if similar issues happen more often
15:32:09 next one
15:32:11 neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:32:15 https://ef5d43af22af7b1c1050-17fc8f83c20e6521d7d8a3ccd8bca531.ssl.cf2.rackcdn.com/861719/6/check/neutron-functional-with-uwsgi/a488990/testr_results.html
15:32:45 two tests failed in a very similar way
15:33:32 and it seems that it couldn't connect to ovn:
15:33:34 2022-12-02 17:47:15.967 40517 ERROR neutron.agent.linux.utils [None req-c47c1008-664c-47f0-b279-520d6e8e5ac6 - tenid - - - -] Exit code: 1; Cmd: ['ovs-appctl', '-t', '/tmp/tmp3_reqb01/ovnsb_db.ctl', 'exit']; Stdin: ; Stdout: ; Stderr: 2022-12-02T17:47:15Z|00001|unixctl|WARN|failed to connect to /tmp/tmp3_reqb01/ovnsb_db.ctl
15:33:34 ovs-appctl: cannot connect to "/tmp/tmp3_reqb01/ovnsb_db.ctl" (No such file or directory)
15:33:57 so most likely some intermittent issue
15:34:55 next one
15:35:04 it's again a dvr router lifecycle issue:
15:35:06 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_5c1/866635/1/check/neutron-functional-with-uwsgi/5c12d80/testr_results.html
15:35:21 lajoskatona I think you were checking something similar already
15:35:22 am I right?
15:35:50 yes, the bug we discussed last week
15:36:47 this one https://bugs.launchpad.net/neutron/+bug/1995031 and a newer one: https://bugs.launchpad.net/neutron/+bug/1998337
15:37:06 and did you find anything?
15:37:34 no, to tell the truth I had no time to check it since last week
15:37:54 will you be able to look into it this week maybe?
15:38:20 I'll try to allocate some time for it
15:38:26 thx a lot
15:38:47 #action lajoskatona to check dvr lifecycle functional tests failures
15:38:58 and the last one
15:39:00 neutron.tests.functional.agent.l3.test_ha_router.L3HATestFailover.test_ha_router_lost_gw_connection - router object mismatch
15:39:05 https://b262e0138780f2e869e5-bb966b1b7243c04cadcba4d7bfebb8b0.ssl.cf5.rackcdn.com/865575/1/gate/neutron-functional-with-uwsgi/5fa0dbc/testr_results.html
15:39:09 found by ykarel
15:39:33 yes, noticed once
15:39:46 it seems to me from a quick look that the failover of routers didn't happen there
15:39:48 from the etherpad it seems it was seen 6 months back too
15:40:46 but IIRC we did some improvements in those tests then
15:40:55 and it was fine for a long time
15:41:07 so maybe this time it was some one-time issue, or we have something new broken there
15:42:05 let's see if we hit it more often
15:42:17 now fullstack tests
15:42:24 test_multiple_agents_for_network(Open vSwitch agent)
15:42:32 https://7c4060ce9e9515b5ad0d-5c947c8d22eb7769ff9d2de46bec4cc9.ssl.cf2.rackcdn.com/865994/1/check/neutron-fullstack-with-uwsgi/ebcfef6/testr_results.html
15:46:27 it seems to me like connectivity to the dhcp namespace was not working:
15:46:28 2022-11-29 16:08:29.328 28264 DEBUG neutron.tests.fullstack.resources.machine [-] Stopping async dhclient [ip netns exec test-532a73a0-62e4-4220-a890-9a94566cc831 dhclient -4 -lf /tmp/tmph9mjx6t9/tmpjqjphk39/69e46a58-d5c0-495b-825c-78f428c64fd5.lease -sf /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/bin/fullstack-dhclient-script --no-pid -d portf1ffe5]. stdout: [[]] - stderr: [['Internet Systems Consortium DHCP
15:46:28 Client 4.4.1', 'Copyright 2004-2018 Internet Systems Consortium.', 'All rights reserved.', 'For info, please visit https://www.isc.org/software/dhcp/', '', 'Listening on LPF/portf1ffe5/fa:16:3e:bb:71:6a', 'Sending on LPF/portf1ffe5/fa:16:3e:bb:71:6a', 'Sending on Socket/fallback', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 3 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 8
15:46:28 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 15 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 13 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 8 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 9 (xid=0x38250956)', 'DHCPDISCOVER on portf1ffe5 to 255.255.255.255 port 67 interval 10
15:46:28 (xid=0x38250956)']] _stop_async_dhclient /home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/machine.py:175
15:47:03 but it's hard to say why
15:47:50 if we see more often that ports aren't configured through DHCP properly in fullstack tests, we will need to have a closer look into this issue
15:48:01 next one
15:48:04 neutron.tests.fullstack.test_qos.TestPacketRateLimitQoSOvs
15:48:15 and this one happened at least twice last week:
15:48:20 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_681/865061/3/check/neutron-fullstack-with-uwsgi/6812ab5/testr_results.html
15:48:20 https://898e1c4e07fc90ea0741-aa41af48b127881681a990efb23ea8ce.ssl.cf1.rackcdn.com/865470/1/check/neutron-fullstack-with-uwsgi/f46a1f6/testr_results.html
15:49:07 I think this is the fix for it: https://review.opendev.org/c/openstack/neutron/+/866210
15:49:47 yeah, thx lajoskatona
15:50:14 so let's move on
15:50:15 #topic Tempest/Scenario
15:50:21 here I found one issue
15:50:36 where ping from one vm to another failed
15:50:37 https://cc4296039b95f28dde1a-22b43305544279849f41f0100b51a877.ssl.cf5.rackcdn.com/866328/4/check/neutron-tempest-plugin-linuxbridge/61e1c9a/testr_results.html
15:50:55 it's a linuxbridge job, so if it starts happening more often we can simply disable it
15:51:07 unless there is anybody who has some cycles and wants to check it
15:51:36 +1 for skipping/disabling :-(
15:51:57 and that's pretty much all I had for today
15:52:09 in periodic jobs we should be good once the mariadb job is fixed
15:52:33 anything else you want to discuss today?
15:53:09 nothing from me
15:53:27 nothing from me either
15:53:38 if not, let's finish earlier today
15:53:43 thx for attending the meeting
15:53:49 and have a great week
15:53:53 #endmeeting