15:00:34 #startmeeting neutron_ci
15:00:34 Meeting started Tue Nov 28 15:00:34 2023 UTC and is due to finish in 60 minutes. The chair is ykarel. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:34 The meeting name has been set to 'neutron_ci'
15:00:42 ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:00:48 Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:00:48 Please open it now :)
15:01:05 o/
15:01:15 o/
15:01:16 I barely had time to go make a coffee :)
15:01:18 o/
15:01:21 \o
15:01:33 o/
15:02:06 o/
15:02:10 k let's start with the topics
15:02:15 #topic Actions from previous meetings
15:02:22 bhaley to refresh grafana dashboard for neutron-lib/neutron-tempest-plugin in project-config
15:02:51 i have the neutron-lib one almost done, just working on the periodic section
15:03:00 short week last week
15:03:13 thx, not an urgent one so that should be fine
15:03:23 #topic Stable branches
15:03:38 bcafarel, anything to share for stable?
15:04:06 I just finished catching up on my reviews, all looks good on CI status on active branches
15:04:22 k that's good
15:04:34 #topic Stadium projects
15:04:58 the periodic lines are green here, and the UI issue related to the job links is also fixed
15:05:05 projects are green, so no big problem
15:05:41 k Thanks
15:05:50 #topic Rechecks
15:06:01 my usual request: if you have a few mins please check the stadium reviews also :-)
15:06:48 we are seeing some random failures across jobs and that led to rechecks on patches this week
15:07:09 there were 2 bare rechecks too, we should try to avoid that
15:07:39 Let's check the failures
15:07:40 #topic fullstack/functional
15:07:49 test_dvr_router_lifecycle_without_ha_without_snat_with_fips
15:07:57 AssertionError: Device name: fg-863e72ab-79, expected MAC: fa:16:3e:80:8d:89
15:08:10 #link https://ef6fea84da73ed48af61-f850a1c88b63080f1e34c13fe4924008.ssl.cf2.rackcdn.com/901476/2/gate/neutron-functional-with-uwsgi/782e426/testr_results.html
15:09:01 anyone recall this issue?
15:09:29 I don't
15:09:42 i have seen this before but it's not a frequent one
15:09:53 vaguely, but i don't remember the context
15:10:32 maybe https://bugs.launchpad.net/neutron/+bug/2000164
15:11:51 Co-Authored-By: Brian Haley
15:12:36 To confirm that, you can check the ovs logs to see if the port was created, deleted and created again quickly
15:12:44 ykarel: was this failure in all branches?
15:12:56 seen once as of now, in master
15:13:20 Merged openstack/ovn-bgp-agent stable/2023.2: Ensure withdrawn events are only processed in relevant nodes https://review.opendev.org/c/openstack/ovn-bgp-agent/+/902065
15:15:36 slaweq, ack, will check the logs and confirm if it's the same
15:15:57 #action ykarel to check if functional failure is same as bug 2000164
15:16:16 Next one
15:16:17 test_port_binding_chassis_create_event
15:16:24 ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Port_Binding with logical_port=1bbcd693-2b3b-4e72-8143-5a67359d56b2
15:16:30 https://3b2ee5c5f48b31d3cd88-77332c10a4b1568b93e2e5229e79a685.ssl.cf5.rackcdn.com/901507/2/gate/neutron-functional-with-uwsgi/ffae97d/testr_results.html
15:16:40 seen this one also once
15:17:27 for this too we need to dig into the logs, any volunteer to check this?
15:17:41 I can do it
15:17:46 thx mlavalle
15:17:55 #action mlavalle to check failure with test_port_binding_chassis_create_event
15:18:24 For fullstack we are still hitting TIMEOUTs sometimes, even after dropping the linuxbridge tests
15:18:43 so we need to check those, but for that we already have a known issue
15:19:31 #topic Tempest/Scenario
15:20:01 we have an infra-specific problem where jobs sometimes fail randomly with "Unable to establish SSL connection to cloud-images.ubuntu.com"
15:20:25 you may have noticed it already
15:21:01 yup
15:21:11 yesterday there was also some zuul issue where jobs finished without logs
15:21:47 yeap, the zuul bit is fixed now; for the cloud-images one, is it a known issue or not?
15:22:14 i'm not sure, let's see if we still see it frequently, we need to reach out to infra for this
15:22:44 also in some ovs jobs we are seeing failures like
15:22:44 https://ca6bf2baf6d89f5ed910-37ccdaa5e24fdd183c62eb6fdd1f8a71.ssl.cf5.rackcdn.com/901827/1/gate/neutron-tempest-plugin-openvswitch-enforce-scope-old-defaults/3e27967/testr_results.html
15:22:45 https://4cd02d83e42f664c3d38-4289c2655d9b20a523cb41ee99eef7c9.ssl.cf5.rackcdn.com/901474/3/check/neutron-tempest-plugin-openvswitch-enforce-scope-old-defaults/fb28885/testr_results.html
15:22:55 where the test fails to ssh to the guest vm
15:23:33 in these examples, the guest vms don't have an ip assigned by dhcp
15:24:40 the 2 examples are test_qos, does it happen for other tests also?
15:25:00 i recall rodolfo was checking something similar before his PTO
15:25:24 i think it's not specific to that test/job
15:25:39 i saw some more occurrences in opensearch
15:26:01 from a quick look I see many ERRORs logged in the q-dhcp log file: https://4cd02d83e42f664c3d38-4289c2655d9b20a523cb41ee99eef7c9.ssl.cf5.rackcdn.com/901474/3/check/neutron-tempest-plugin-openvswitch-enforce-scope-old-defaults/fb28885/controller/logs/screen-q-dhcp.txt
15:26:02 looking for: probing for an IPv4LL address
15:26:19 maybe one of those errors is related to the failed test
15:27:44 in one of those i noticed there were no dhcprequest/dhcpack messages, just continuous dhcpoffers
15:28:17 maybe the dhcpoffer is not reaching the vms for some reason?
15:30:40 and in one i saw the dhcp agent took time to configure the port (~1 minute) and in that time the client side configured an ipv4ll address
15:32:05 iirc in older cirros versions, where udhcpc was used, it used to retry three times every 60 seconds before giving up, but that doesn't seem to be the case with the recent cirros/dhcpcd version
15:32:17 anyway we need to look into these failures
15:32:24 any volunteer to log a bug for this?
15:33:04 ok, i will log a bug for this and then we will see
15:33:30 #action ykarel to log a bug for dhcp/metadata issue in ovs jobs
15:33:32 +1
15:33:46 #topic Periodic
15:33:55 the master line is all good
15:34:01 for stable we have some broken jobs
15:34:16 #link https://zuul.openstack.org/builds?job_name=neutron-tempest-mariadb-full&project=openstack%2Fneutron&branch=stable%2Fussuri&skip=0
15:34:25 This job runs on bionic and sets up mariadb 10.3 repos, which are no longer available
15:34:31 Merged openstack/ovn-bgp-agent stable/2023.2: Avoid race when deleting VM with FIP https://review.opendev.org/c/openstack/ovn-bgp-agent/+/902067
15:35:09 wdyt, should we just drop that job from ussuri as it's already in extended maintenance?
15:35:30 ++
15:35:37 +1
15:35:42 +1
15:35:43 +1, and ussuri is also headed for EOL, no?
15:35:55 so cleaning up the job earlier will not hurt
15:36:43 ok, any volunteer to push the patch to drop it?
15:37:15 I already have the zuul file open in vim :)
15:37:22 ++ thx
15:37:40 #action bcafarel to drop neutron-tempest-mariadb-full in ussuri
15:37:50 #link https://zuul.openstack.org/builds?job_name=neutron-linuxbridge-tempest-plugin-scenario-nftables&job_name=neutron-ovs-tempest-plugin-scenario-iptables_hybrid-nftables&project=openstack%2Fneutron&branch=stable%2Fxena&skip=0
15:38:25 these are failing with RETRY_LIMIT and there are no logs to check the reason, so they need to be dug into
15:38:42 and these are for xena
15:39:18 any volunteer to check these?
15:40:34 k, i will check these and report a bug
15:40:49 #action ykarel to check issue with nftables xena periodic job
15:40:58 #link https://zuul.openstack.org/builds?job_name=neutron-ovs-tempest-fips&job_name=neutron-ovn-tempest-ovs-release-fips&project=openstack%2Fneutron&branch=stable%2Fyoga&skip=0
15:41:18 this is specific to the 9-stream fips jobs
15:41:29 in stable/yoga
15:42:05 in yoga we have pyroute2 0.6.6 and because of that it's hitting https://bugs.launchpad.net/tripleo/+bug/2023764
15:43:17 so this means a pyroute2 req update on yoga?
15:43:45 yes, if we want to get that fixed, or else drop the jobs running in yoga
15:45:54 wdyt? should we just drop these two jobs
15:46:23 bumping pyroute2 in stable might be trickier
15:46:29 agree
15:46:57 drop the jobs IMO
15:47:07 Bernard Cafarelli proposed openstack/neutron stable/ussuri: [stable-only] Drop neutron-tempest-mariadb-full periodic job https://review.opendev.org/c/openstack/neutron/+/902090
15:47:08 perhaps the thing should be documented somewhere to help operators, but let's drop the jobs
15:47:31 Ok, on the RDO side it's already patched
15:47:44 any volunteer to drop these jobs in yoga?
15:49:02 k, i will send the patch to drop these
15:49:15 thanks
15:49:18 #action ykarel to send patch to drop ovs/ovn fips job in stable/yoga
15:49:33 Last one
15:49:37 #link https://zuul.openstack.org/builds?job_name=devstack-tobiko-neutron&project=openstack%2Fneutron&branch=stable%2Fxena&branch=stable%2Fyoga&branch=stable%2Fzed&skip=0
15:49:56 This job is running on focal and running tests which shouldn't run on focal, as per https://review.opendev.org/c/openstack/neutron/+/871982
15:50:48 and that is only included in antelope+, so for older branches we need to skip those tests in some other way
15:51:24 slaweq, eolivare maybe you know ^
15:52:34 k, i will report a bug for this and then we will see
15:52:45 do we have something like tempest_exclude_regex for tobiko?
15:52:50 #action ykarel to report bug for tobiko stable jobs
15:52:54 no idea from my side
15:53:08 yes, let's wait for the tobiko experts here :)
15:53:25 bcafarel no, there is nothing like that
15:53:35 k, last topic
15:53:39 #topic Grafana
15:53:41 https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:53:51 I will check that tobiko issue
15:53:57 Thanks slaweq
15:54:04 let's have a quick look at grafana too
15:55:19 it looks good overall to me, there are some failures in check but i think those are patch-specific
15:56:07 or related to the recent zuul issues
15:56:17 yeap
15:56:32 k, let's move to on demand
15:56:33 #topic On Demand
15:56:41 does anyone want to bring something up here?
15:57:32 nothing from me
15:58:31 nothing from me either
15:58:51 K Thanks everyone, it stretched a bit longer today :)
15:58:56 #endmeeting