15:00:28 #startmeeting neutron_ci
15:00:28 Meeting started Tue Jul 18 15:00:28 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:28 The meeting name has been set to 'neutron_ci'
15:00:31 o/
15:00:37 ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:00:41 hi
15:00:42 o/
15:00:46 Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:00:54 o/
15:01:06 o/
15:01:48 ok, let's start
15:01:54 #topic Actions from previous meetings
15:02:03 there was only one action item on the list
15:02:07 ralonsoh to try to reduce concurrency in functional job(s)
15:02:23 done and merged (I don't have the patch right now)
15:02:46 hold on, not this one, sorry
15:02:47 no no
15:02:52 thx
15:03:02 no need to search for the link to the patch
15:03:10 We trust You :P
15:03:12 https://review.opendev.org/c/openstack/neutron/+/887633
15:03:24 heheheheh (but I don't trust myself)
15:03:52 o/
15:04:00 LOL
15:04:10 ok, next topic then
15:04:11 #topic Stable branches
15:04:20 bcafarel any updates?
15:04:41 not a lot, I was off yesterday so I still need to check the last backports, but it looked good
15:04:55 wallaby/victoria are back in shape too :)
15:05:03 that's good, thx
15:05:06 +1!
15:05:37 anything else regarding CI of the stable branches, anyone?
15:06:33 if not, I think we can move on
15:06:48 I will skip the stadium projects topic today as Lajos is off
15:06:52 one sec
15:07:03 https://review.opendev.org/c/openstack/networking-bagpipe/+/862505
15:07:07 https://review.opendev.org/c/openstack/networking-bgpvpn/+/888719
15:07:20 both CIs are not working, in particular n-t-p
15:07:23 sure
15:07:35 these patches are in Victoria
15:07:53 if these jobs are not fixed, I'll propose the EOL of these branches
15:08:29 AttributeError: module 'neutron.common.config' has no attribute 'register_common_config_options'
15:08:32 yes
15:08:36 in both jobs
15:08:56 so I guess it's some mismatch in the versions used there
15:09:06 could be, right
15:09:11 maybe we can wait 1 more week for Lajos to check it?
15:09:14 wdyt?
15:09:18 because that was implemented in newer versions of Neutron
15:09:22 of course, that can wait
15:09:30 ++
15:09:50 ok, so next topic
15:09:55 #topic Grafana
15:10:22 I see that yesterday functional and fullstack jobs were broken completely
15:10:31 but it's getting better today
15:10:38 do You know what happened there maybe?
15:10:51 no idea, I'll check the logs
15:10:59 i didn't see anything special
15:11:25 ok, maybe just some coincidence
15:11:48 #topic Rechecks
15:12:06 rechecks look a bit better last week
15:12:19 and we have almost no "bare rechecks" at all recently
15:12:33 that's all from me regarding this topic
15:12:41 any questions/comments on that?
15:12:49 one note
15:13:05 don't hesitate to reply on a patch requesting someone not to do bare rechecks
15:13:23 we should teach other people to check the CI first
15:13:39 that's all
15:13:58 +1
15:14:02 slaweq, wrt the bare recheck report, does that also check stable/branch patches?
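The AttributeError quoted above is characteristic of a version mismatch: the stable/victoria networking-bagpipe/networking-bgpvpn jobs import a registration hook that only exists in newer Neutron releases. A minimal sketch of that failure mode, using stand-in module objects rather than real neutron imports (the guard is illustrative only; the actual remedy is installing compatible versions in the jobs):

```python
# Stand-ins for neutron.common.config in an old and a new release;
# these are NOT real neutron imports.
import types

old_neutron_config = types.ModuleType("neutron.common.config")  # pre-hook release

new_neutron_config = types.ModuleType("neutron.common.config")  # newer release
new_neutron_config.register_common_config_options = lambda: None

def call_registration_hook(config_module):
    """Return True if the hook exists and was called, False otherwise.

    Calling the attribute unconditionally on the old module would raise
    exactly the AttributeError seen in both n-t-p jobs.
    """
    hook = getattr(config_module, "register_common_config_options", None)
    if hook is None:
        return False
    hook()
    return True
```

On the broken victoria jobs the lookup fails at import time, which is why the options are either fixing the installed versions or, as suggested above, EOL'ing the branches.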
15:14:03 ++
15:14:26 ykarel no, I'm checking only the master branch currently
15:14:30 because i had noticed some bare rechecks there recently but you said there were none
15:14:38 ok thanks, makes sense then
15:15:28 ok, let's move on then
15:15:30 #topic fullstack/functional
15:16:18 I was checking some timeouts in the functional job today and it seems to me that it may not always be just noisy neighbors or something like that
15:16:28 please look at https://3e402c0e76741e83fc60-d00ff4f1a74cdbc5ea9d8044145b77c0.ssl.cf2.rackcdn.com/888574/3/check/neutron-functional-with-uwsgi/3bea3d7/job-output.txt
15:16:46 if You look at the logs there, there are a lot of tests which failed with timeouts
15:16:53 timeout in privsep
15:16:58 and those tests took 3 minutes or more
15:17:22 and most (or all) of them are failing due to timeouts while interacting with netlink
15:17:35 like e.g. checking if a device exists in a namespace
15:17:47 or setting an IP address on an interface
15:17:48 etc.
15:18:08 it also looks similar in the fullstack tests job in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c52/883246/16/check/neutron-fullstack-with-uwsgi/c524e84/job-output.txt
15:18:46 so maybe we should spend some more time and try to understand this issue on our side
15:18:49 did You see issues like that before?
15:19:41 yes, but randomly
15:21:37 ralonsoh and do You maybe know what is causing such a problem there?
15:21:45 no, sorry
15:22:02 do You think we should open an LP for that?
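The failure pattern described above — a netlink operation such as "does device X exist in namespace Y?" hanging until a deadline fires — can be illustrated with a generic bounded call. This is a simplified stand-in, not neutron's or oslo.privsep's actual machinery; `run_with_timeout` and `fake_device_exists` are hypothetical names:

```python
# Illustrative timeout wrapper, similar in spirit to how a privsep-mediated
# call can be bounded; not the real privsep implementation.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
import time

def run_with_timeout(fn, timeout, *args):
    """Run fn(*args) in a worker thread, raising TimeoutError on deadline."""
    ex = ThreadPoolExecutor(max_workers=1)
    try:
        return ex.submit(fn, *args).result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"call exceeded {timeout}s")
    finally:
        ex.shutdown(wait=False)

def fake_device_exists(delay):
    time.sleep(delay)  # stand-in for a slow netlink round trip
    return True
```

When many tests in one worker hit the same stall, each burns its full deadline, which matches the "3 minutes or more" per failed test seen in the linked job output.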
15:22:15 but when you have an issue in FT related to privsep, the other tests will be affected
15:22:22 I think so, at least we should track it
15:22:32 I will open an LP for it
15:22:44 and if anyone has any cycles, please maybe try to take a look at this
15:23:02 I'll look at this
15:23:14 thx mlavalle
15:23:18 please assign the LP to me
15:23:37 #action slaweq to create LP about timeouts in functional job and assign it to mlavalle
15:23:41 sure, thx a lot
15:23:46 will you include the pointers to the test failures in the LP?
15:24:00 are you using the generic or kvm kernel in the test images? The KVM kernel has no gre module: "controller | FAILED TO LOAD nf_conntrack_proto_gre"
15:24:06 mlavalle yes
15:24:14 Thanks!
15:24:49 racosta is this question related to the functional tests issue which we are talking about now?
15:25:01 yes
15:25:21 I don't know what kernel is used there
15:25:30 we just use what infra provides us
15:26:05 but mlavalle maybe it's a good first thing to check then :)
15:26:21 Linux np0034680805 5.15.0-76-generic
15:26:23 maybe it's a different kernel in different providers and maybe that is causing some issue
15:26:24 idk
15:26:51 I imagine it is kvm, in which case the load of the gre module will not work.
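The kernel question above can be settled directly on a (held) test node. A quick check along these lines distinguishes the cloud "-kvm" kernel flavour from "-generic" and probes whether the GRE conntrack module ships with the running kernel (assumes standard Ubuntu tooling on the node):

```shell
# Classify the running kernel flavour; "-kvm" cloud kernels are the ones
# reported to lack nf_conntrack_proto_gre.
kernel_flavour() {
    case "$1" in
        *-kvm)     echo kvm ;;
        *-generic) echo generic ;;
        *)         echo other ;;
    esac
}

kernel_flavour "$(uname -r)"

# modinfo exits non-zero when the module is not shipped for this kernel,
# which would explain the "FAILED TO LOAD nf_conntrack_proto_gre" message.
if modinfo nf_conntrack_proto_gre >/dev/null 2>&1; then
    echo "nf_conntrack_proto_gre available"
else
    echo "nf_conntrack_proto_gre missing"
fi
```

The `5.15.0-76-generic` string pasted above would classify as `generic` here, consistent with ykarel's later observation that the images do not use the kvm kernel.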
15:26:51 i think images are common across providers
15:27:17 thx ykarel
15:27:23 so it should work fine then
15:27:57 cloud images use the kvm kernel - at least the ubuntu ones I've tested
15:29:14 looking at the kernel it seems it's not the kvm one, i.e. Linux np0034680805 5.15.0-76-generic; iirc kvm kernels have -kvm as prefix
15:29:23 suffix
15:31:50 ok, let's move forward for now and mlavalle can check it if needed :)
15:32:14 regarding fullstack tests I have found one more issue
15:32:16 neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary
15:32:20 https://284d785cc67babb2d75b-a8caf4f4ad4b0a7f74dab21fc6a45bed.ssl.cf2.rackcdn.com/886992/5/check/neutron-fullstack-with-uwsgi/0bcd7be/testr_results.html
15:32:47 I have only found it once so far, but I think it's worth checking what happened there at least
15:32:54 any volunteer for that?
15:32:58 I'll check it
15:33:05 thx ralonsoh
15:33:26 #action ralonsoh to check failed neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary test
15:33:31 #topic Tempest/Scenario
15:33:38 slow jobs broken in releases before xena since 11th July
15:33:48 https://bugs.launchpad.net/neutron/+bug/2027817
15:33:52 I think this was added by ykarel
15:33:56 that's fixed with a patch in tempest
15:34:08 merged yesterday
15:34:12 I think we are good now
15:34:25 thx
15:34:56 from other issues, I have found one with some timeouts in nova-compute https://7ad29d1b700c1da60ae0-1bae5319fe4594ade335a46ad1c3bcc9.ssl.cf2.rackcdn.com/867513/24/check/neutron-tempest-plugin-openvswitch-iptables_hybrid/2bee760/controller/logs/screen-n-cpu.txt
15:35:15 so just FYI - if You see something similar, I think we may want to report a bug for nova for this
15:35:31 but related to port bindings?
15:35:32 and that's all regarding scenario jobs for today from me 15:35:47 ralonsoh no, I think there no even port created yet 15:36:05 nova-compute was failing earlier in the process there IIUC nova-compute log 15:36:06 ahh I see, RPC messages 15:36:20 Merged openstack/neutron stable/wallaby: [OVN] Hash Ring: Set nodes as offline upon exit https://review.opendev.org/c/openstack/neutron/+/887279 15:36:21 yeap 15:36:42 and last topic from me for today is 15:36:44 #topic Periodic 15:36:54 here I found one new issue 15:37:04 https://bugs.launchpad.net/neutron/+bug/2028037 15:37:24 pfffff 15:37:27 but ykarel found out that this is already reported in https://bugs.launchpad.net/neutron/+bug/2028003 15:37:31 thx ykarel 15:38:08 let me check that issue, it could be related to a specific postgree requirement 15:38:19 (as reported in ykarel bug) 15:38:24 must appear in the GROUP BY clause or be used in an aggregate function 15:38:27 yes, it is something specific to postgresql 15:38:29 ralonsoh, yes and seems to be triggered with https://review.opendev.org/q/Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3 15:38:40 as it works fine in other jobs which are using MySQL/Mariadb 15:38:44 yeah, I was expecting that... 15:39:02 and the same job is impacted in stable branches 15:39:19 this is again the ironic job 15:39:31 we had another issue with postgree and ironic 2 weeks ago 15:39:55 one is ironic job and other is our periodic scenario job 15:40:02 both found that issue 15:40:54 ralonsoh will You take a look at this? Or do You want me to check it? 15:41:00 I'll do 15:41:04 thx 15:41:16 #action ralonsoh to check failing postgresql job 15:41:27 and that's all from me for today 15:41:38 do You have any other topics to discuss? 15:41:49 or if not, I will give You back about 18 minutes today 15:42:14 fine for me 15:42:32 o/ 15:42:48 ok, so thx for attending the meeting 15:42:53 #endmeeting
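The PostgreSQL error discussed under #topic Periodic ("must appear in the GROUP BY clause or be used in an aggregate function") comes from a query shape that MySQL/MariaDB tolerate by default but PostgreSQL rejects: selecting a non-aggregated column that is not listed in GROUP BY. A small sketch with hypothetical table and column names, using SQLite in place of the lenient engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (network_id TEXT, name TEXT)")
conn.executemany("INSERT INTO ports VALUES (?, ?)",
                 [("net1", "p1"), ("net1", "p2"), ("net2", "p3")])

# Lenient engines (SQLite here; default-configured MySQL/MariaDB behave
# alike) accept the bare 'name' column and pick an arbitrary value per
# group; PostgreSQL rejects exactly this shape with the error quoted in
# the meeting.
lenient = conn.execute(
    "SELECT network_id, name, COUNT(*) FROM ports GROUP BY network_id"
).fetchall()

# Portable form: every selected non-aggregate column appears in GROUP BY,
# so the query runs identically on all three engines.
portable = conn.execute(
    "SELECT network_id, COUNT(*) FROM ports GROUP BY network_id"
).fetchall()
```

This is why the regression only surfaced in the postgresql-based ironic and periodic jobs while the MySQL/MariaDB jobs stayed green.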