15:00:28 <slaweq> #startmeeting neutron_ci
15:00:28 <opendevmeet> Meeting started Tue Jul 18 15:00:28 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:28 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:28 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:31 <mlavalle> o/
15:00:37 <slaweq> ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:00:41 <ralonsoh> hi
15:00:42 <bcafarel> o/
15:00:46 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:00:54 <mtomaska> o/
15:01:06 <ykarel> o/
15:01:48 <slaweq> ok, let's start
15:01:54 <slaweq> #topic Actions from previous meetings
15:02:03 <slaweq> there was only one action item on the list
15:02:07 <slaweq> ralonsoh to try to reduce concurrency in functional job(s)
15:02:23 <ralonsoh> done and merged (I don't have the patch right now)
15:02:46 <ralonsoh> hold on, not this one, sorry
15:02:47 <ralonsoh> no no
15:02:52 <slaweq> thx
15:03:02 <slaweq> no need to search for the link to the patch
15:03:10 <slaweq> We trust You :P
15:03:12 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/887633
15:03:24 <ralonsoh> heheheheh (but I don't trust myself)
15:03:52 <elvira> o/
15:04:00 <slaweq> LOL
15:04:10 <slaweq> ok, next topic then
15:04:11 <slaweq> #topic Stable branches
15:04:20 <slaweq> bcafarel any updates?
15:04:41 <bcafarel> not a lot, I was off yesterday so I still need to check the last backports, but it looked good
15:04:55 <bcafarel> wallaby/victoria is back in shape too :)
15:05:03 <slaweq> that's good, thx
15:05:06 <ralonsoh> +1!
15:05:37 <slaweq> anything else regarding CI of the stable branches anyone?
15:06:33 <slaweq> if not, I think we can move on
15:06:48 <slaweq> I will skip the stadium projects topic today as Lajos is off
15:06:52 <ralonsoh> one sec
15:07:03 <ralonsoh> https://review.opendev.org/c/openstack/networking-bagpipe/+/862505
15:07:07 <ralonsoh> https://review.opendev.org/c/openstack/networking-bgpvpn/+/888719
15:07:20 <ralonsoh> both CIs are not working, in particular n-t-p
15:07:23 <slaweq> sure
15:07:35 <ralonsoh> these patches are in Victoria
15:07:53 <ralonsoh> if these jobs are not fixed, I'll propose the EOL of these branches
15:08:29 <slaweq> AttributeError: module 'neutron.common.config' has no attribute 'register_common_config_options'
15:08:32 <ralonsoh> yes
15:08:36 <ralonsoh> in both jobs
15:08:56 <slaweq> so I guess it's some mismatch in the versions used there
15:09:06 <ralonsoh> could be, right
15:09:11 <slaweq> maybe we can wait 1 more week for Lajos to check it?
15:09:14 <slaweq> wdyt?
15:09:18 <ralonsoh> because that was implemented in newer versions of Neutron
15:09:22 <ralonsoh> of course, that can wait
15:09:30 <slaweq> ++
15:09:50 <slaweq> ok, so next topic
15:09:55 <slaweq> #topic Grafana
15:10:22 <slaweq> I see that yesterday functional and fullstack jobs were broken completely
15:10:31 <slaweq> but it's getting better today
15:10:38 <slaweq> do You know what happened there maybe?
15:10:51 <ralonsoh> no idea, I'll check the logs
15:10:59 <ralonsoh> I didn't see anything special
15:11:25 <slaweq> ok, maybe just some coincidence
15:11:48 <slaweq> #topic Rechecks
15:12:06 <slaweq> in rechecks it looks a bit better last week
15:12:19 <slaweq> and we have almost no "bare rechecks" at all recently
15:12:33 <slaweq> that's all from me regarding this topic
15:12:41 <slaweq> any questions/comments on that?
15:12:49 <ralonsoh> one note
15:13:05 <ralonsoh> don't hesitate to reply on a patch requesting someone not to do bare rechecks
15:13:23 <ralonsoh> we should teach other people to check the CI first
15:13:39 <ralonsoh> that's all
15:13:58 <mlavalle> +1
15:14:02 <ykarel> slaweq, wrt the bare recheck report, does that also check stable branch patches?
15:14:03 <slaweq> ++
15:14:26 <slaweq> ykarel no, I'm checking only the master branch currently
15:14:30 <ykarel> because I had noticed some bare rechecks there recently but you said there were none
15:14:38 <ykarel> ok thanks, makes sense then
15:15:28 <slaweq> ok, let's move on then
15:15:30 <slaweq> #topic fullstack/functional
15:16:18 <slaweq> I was checking some timeouts in the functional job today and it seems to me that it may not always be just noisy neighbors or something like that
15:16:28 <slaweq> please look at https://3e402c0e76741e83fc60-d00ff4f1a74cdbc5ea9d8044145b77c0.ssl.cf2.rackcdn.com/888574/3/check/neutron-functional-with-uwsgi/3bea3d7/job-output.txt
15:16:46 <slaweq> if You look at the logs there, there are a lot of tests which failed with timeouts
15:16:53 <ralonsoh> timeout in privsep
15:16:58 <slaweq> and those tests took 3 minutes or more
15:17:22 <slaweq> and most (or all) of them are failing due to timeouts while interacting with netlink
15:17:35 <slaweq> like e.g. checking if a device exists in a namespace
15:17:47 <slaweq> or setting an IP address on an interface
15:17:48 <slaweq> etc.
15:18:08 <slaweq> it also looks similar in the fullstack tests job in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c52/883246/16/check/neutron-fullstack-with-uwsgi/c524e84/job-output.txt
15:18:46 <slaweq> so maybe we should spend some more time and try to understand this issue on our side
15:18:49 <slaweq> did You see issues like that before?
15:19:41 <ralonsoh> yes, but randomly
15:21:37 <slaweq> ralonsoh and do You maybe know what is causing such a problem there?
15:21:45 <ralonsoh> no, sorry
15:22:02 <slaweq> do You think we should open an LP for that?
15:22:15 <ralonsoh> but when you have an issue in FT related to privsep, the other tests will be affected
15:22:22 <ralonsoh> I think so, at least we should track it
15:22:32 <slaweq> I will open an LP for it
15:22:44 <slaweq> and if anyone has any cycles, please maybe try to take a look at this
15:23:02 <mlavalle> I'll look at this
15:23:14 <slaweq> thx mlavalle
15:23:18 <mlavalle> please assign the LP to me
15:23:37 <slaweq> #action slaweq to create LP about timeouts in functional job and assign it to mlavalle
15:23:41 <slaweq> sure, thx a lot
15:23:46 <mlavalle> will you include the pointers to the test failures in the LP?
15:24:00 <racosta> are you using the generic or kvm kernel in the test images? The KVM kernel has no gre module: "controller | FAILED TO LOAD nf_conntrack_proto_gre"
15:24:06 <slaweq> mlavalle yes
15:24:14 <mlavalle> Thanks!
15:24:49 <slaweq> racosta is this question related to the functional tests issue which we are talking about now?
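(For context on the timeouts discussed above: the operations slaweq mentions, checking whether a device exists in a namespace and setting an IP address on an interface, correspond to helpers in neutron.agent.linux.ip_lib, which go through privsep into pyroute2/netlink. A minimal sketch of that kind of call, assuming the ip_lib public API as on current master; the namespace and device names are made up for illustration.)

```python
# Hedged sketch: roughly the kind of ip_lib calls the functional tests
# were timing out on. The namespace/device names are hypothetical.
from neutron.agent.linux import ip_lib

NAMESPACE = "test-ns"    # hypothetical namespace created by the test
DEVICE = "test-veth0"    # hypothetical interface name

# Both calls go through privsep to pyroute2/netlink; in the failing runs
# they hang until the test's own timeout (3+ minutes) fires.
if ip_lib.device_exists(DEVICE, namespace=NAMESPACE):
    device = ip_lib.IPDevice(DEVICE, namespace=NAMESPACE)
    device.addr.add("192.0.2.10/24")  # set an IP address on the interface
```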
15:25:01 <racosta> yes
15:25:21 <slaweq> I don't know what kernel is used there
15:25:30 <slaweq> we just use what infra provides us
15:26:05 <slaweq> but mlavalle maybe it's a good first thing to check then :)
15:26:21 <ykarel> Linux np0034680805 5.15.0-76-generic
15:26:23 <slaweq> maybe it's a different kernel in different providers and maybe that is causing some issue
15:26:24 <slaweq> idk
15:26:51 <racosta> I imagine it is kvm, in which case loading the gre module will not work.
15:26:51 <ykarel> i think images are common across providers
15:27:17 <slaweq> thx ykarel
15:27:23 <slaweq> so it should work fine then
15:27:57 <racosta> cloud images use the kvm kernel - at least the ubuntu ones I've tested
15:29:14 <ykarel> looking at the kernel it seems it's not the kvm one, i.e. Linux np0034680805 5.15.0-76-generic, iirc kvm kernels have -kvm as prefix
15:29:23 <ykarel> suffix
15:31:50 <slaweq> ok, let's move forward for now and mlavalle can check it if needed :)
15:32:14 <slaweq> regarding fullstack tests I have found one more issue
15:32:16 <slaweq> neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary
15:32:20 <slaweq> https://284d785cc67babb2d75b-a8caf4f4ad4b0a7f74dab21fc6a45bed.ssl.cf2.rackcdn.com/886992/5/check/neutron-fullstack-with-uwsgi/0bcd7be/testr_results.html
15:32:47 <slaweq> I just found it once so far but I think it's worth checking what happened there at least
15:32:54 <slaweq> any volunteer for that?
15:32:58 <ralonsoh> I'll check it
15:33:05 <slaweq> thx ralonsoh
15:33:26 <slaweq> #action ralonsoh to check failed neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary test
15:33:31 <slaweq> #topic Tempest/Scenario
15:33:38 <slaweq> slow jobs broken in releases before xena since 11th July
15:33:48 <slaweq> https://bugs.launchpad.net/neutron/+bug/2027817
15:33:52 <slaweq> I think this was added by ykarel
15:33:56 <ykarel> that's fixed with a patch in tempest
15:34:08 <ykarel> merged yesterday
15:34:12 <ralonsoh> I think we are good now
15:34:25 <slaweq> thx
15:34:56 <slaweq> from other issues, I have found one with some timeouts in nova-compute https://7ad29d1b700c1da60ae0-1bae5319fe4594ade335a46ad1c3bcc9.ssl.cf2.rackcdn.com/867513/24/check/neutron-tempest-plugin-openvswitch-iptables_hybrid/2bee760/controller/logs/screen-n-cpu.txt
15:35:15 <slaweq> so just FYI - if You see something similar, I think we may want to report a bug to nova for this
15:35:31 <ralonsoh> but related to port bindings?
15:35:32 <slaweq> and that's all regarding scenario jobs for today from me
15:35:47 <slaweq> ralonsoh no, I think there is not even a port created yet
15:36:05 <slaweq> nova-compute was failing earlier in the process there, IIUC the nova-compute log
15:36:06 <ralonsoh> ahh I see, RPC messages
15:36:20 <opendevreview> Merged openstack/neutron stable/wallaby: [OVN] Hash Ring: Set nodes as offline upon exit https://review.opendev.org/c/openstack/neutron/+/887279
15:36:21 <slaweq> yeap
15:36:42 <slaweq> and the last topic from me for today is
15:36:44 <slaweq> #topic Periodic
15:36:54 <slaweq> here I found one new issue
15:37:04 <slaweq> https://bugs.launchpad.net/neutron/+bug/2028037
15:37:24 <ralonsoh> pfffff
15:37:27 <slaweq> but ykarel found out that this is already reported in https://bugs.launchpad.net/neutron/+bug/2028003
15:37:31 <slaweq> thx ykarel
15:38:08 <ralonsoh> let me check that issue, it could be related to a specific postgres requirement
15:38:19 <ralonsoh> (as reported in ykarel's bug)
15:38:24 <ralonsoh> must appear in the GROUP BY clause or be used in an aggregate function
15:38:27 <slaweq> yes, it is something specific to postgresql
15:38:29 <ykarel> ralonsoh, yes and it seems to be triggered by https://review.opendev.org/q/Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3
15:38:40 <slaweq> as it works fine in other jobs which are using MySQL/MariaDB
15:38:44 <ralonsoh> yeah, I was expecting that...
15:39:02 <ykarel> and the same job is impacted in stable branches
15:39:19 <ralonsoh> this is again the ironic job
15:39:31 <ralonsoh> we had another issue with postgres and ironic 2 weeks ago
15:39:55 <slaweq> one is the ironic job and the other is our periodic scenario job
15:40:02 <slaweq> both found that issue
15:40:54 <slaweq> ralonsoh will You take a look at this? Or do You want me to check it?
15:41:00 <ralonsoh> I'll do
15:41:04 <slaweq> thx
15:41:16 <slaweq> #action ralonsoh to check failing postgresql job
15:41:27 <slaweq> and that's all from me for today
15:41:38 <slaweq> do You have any other topics to discuss?
15:41:49 <slaweq> or if not, I will give You back about 18 minutes today
15:42:14 <ralonsoh> fine for me
15:42:32 <ykarel> o/
15:42:48 <slaweq> ok, so thx for attending the meeting
15:42:53 <slaweq> #endmeeting
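(A brief illustration of the PostgreSQL behaviour ralonsoh quotes above, "must appear in the GROUP BY clause or be used in an aggregate function": PostgreSQL rejects any selected column that is neither grouped nor aggregated, while MariaDB with its default sql_mode accepts such a query, which is why the failure only shows up in the postgresql jobs. A minimal sketch, assuming SQLAlchemy 1.4+ select() style; the table and column names are made up for illustration and this is not the actual query from the linked change.)

```python
# Hedged sketch of the class of query PostgreSQL rejects with the
# "must appear in the GROUP BY clause" error. Names are hypothetical.
import sqlalchemy as sa

metadata = sa.MetaData()
ports = sa.Table(
    "ports", metadata,
    sa.Column("id", sa.String(36), primary_key=True),
    sa.Column("network_id", sa.String(36)),
    sa.Column("status", sa.String(16)),
)

# Rejected by PostgreSQL: "status" is neither grouped nor aggregated.
# MariaDB without ONLY_FULL_GROUP_BY in sql_mode silently picks a value.
bad = sa.select(ports.c.network_id, ports.c.status,
                sa.func.count(ports.c.id)).group_by(ports.c.network_id)

# Accepted by both: every selected column is grouped or aggregated.
good = sa.select(ports.c.network_id,
                 sa.func.count(ports.c.id)).group_by(ports.c.network_id)
```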