15:00:10 <slaweq> #startmeeting neutron_ci
15:00:10 <opendevmeet> Meeting started Tue May 23 15:00:10 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:10 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:20 <ralonsoh> hi
15:00:21 <slaweq> ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:00:29 <lajoskatona> o/
15:00:33 <ykarel> o/
15:00:35 <bcafarel> o/
15:00:46 <lajoskatona> isn't it video this week?
15:01:03 <slaweq> lajoskatona nope
15:01:06 <lajoskatona> ok
15:01:12 <ralonsoh> (I was wrong)
15:01:19 <slaweq> last week there wasn't a meeting
15:01:32 <slaweq> and two weeks ago it was on video
15:01:41 <slaweq> so this week it's on irc :)
15:02:08 <mlavalle> LOL, I've been waiting in the video meeting
15:02:44 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:03:06 <slaweq> #topic Actions from previous meetings
15:03:12 <slaweq> lajoskatona to check with dnm patch stadium projects with py39
15:04:21 <lajoskatona> huuu, I forgot about that, I will check
15:04:31 <slaweq> ok, thx
15:04:43 <slaweq> #action lajoskatona to check with dnm patch stadium projects with py39
15:04:43 <lajoskatona> this week I checked the issues with rbac :-)
15:05:20 <slaweq> in stadium?
15:05:29 <lajoskatona> yes, here is the list: https://review.opendev.org/q/topic:bug%252F2019097
15:05:52 <slaweq> thx for that
15:06:03 <slaweq> are those patches ready for review?
15:06:12 <ralonsoh> some of them are still failing
15:06:40 <lajoskatona> yes, I still have to check the bgpvpn tempest tests, for example
15:06:59 <slaweq> ok, so let us know when it's ready
15:07:04 <lajoskatona> sure
15:07:06 <slaweq> and thx for working on this
15:07:16 <slaweq> next one
15:07:18 <slaweq> ykarel to update nova timeouts bug https://bugs.launchpad.net/neutron/+bug/2015065
15:07:43 <ykarel> Yes, added findings in comments 7 and 8
15:07:55 <ykarel> this is the same issue ralonsoh mentioned in the previous meeting
15:08:18 <lajoskatona> gibi also mentioned something about this and eventlet
15:08:25 <ykarel> After this gibi too investigated it based on the traceback in comment 8 and filed an eventlet bug
15:08:30 <ykarel> https://github.com/eventlet/eventlet/issues/798
15:08:48 <lajoskatona> +1, I didn't find it
15:09:00 <ykarel> in our jobs for now we disabled dbcounter https://review.opendev.org/c/openstack/neutron/+/883648
15:09:29 <ykarel> don't know if it's directly related, but disabling it made the issue appear less frequently
15:09:47 <slaweq> so with https://review.opendev.org/c/openstack/neutron/+/883648 merged we should be good in our ci for now, right?
15:09:51 <lajoskatona> cool, that is better than nothing
15:09:54 <ykarel> basically a nova worker is getting stuck while contacting neutron apis
15:10:18 <ralonsoh> and we can't use another lib, right?
15:10:37 <ykarel> also there were some other issues where the process doesn't get stuck but api requests just time out after 60 seconds, as those take more time
15:11:12 <ykarel> i didn't dig into why those take more than 60 sec, but it may be just system load
15:11:33 <slaweq> but indeed I think that grenade jobs are much more stable this week
15:11:40 <slaweq> thx ykarel for working on this
15:11:40 <ykarel> ralonsoh, sorry, not clear, what other lib?
15:11:43 <slaweq> and gibi too :)
15:11:49 <ralonsoh> not urllib
15:12:10 <ykarel> also if anyone has inputs/workarounds for the stuck issue please comment on the bug
15:12:38 <ralonsoh> ok
15:13:01 <slaweq> ok, I think we can move on
15:13:03 <slaweq> #topic Stable branches
15:13:05 <ykarel> ralonsoh, hmm not sure, but yes, if there are better alternatives they could be tried out, but i think urllib is widely used
15:13:10 <slaweq> bcafarel any updates?
15:13:47 <bcafarel> all good in most branches, we had one job that started to fail in train https://bugs.launchpad.net/neutron/+bug/2020363
15:14:01 <bcafarel> already fixed by ykarel++ dropping the openstacksdk-functional job
15:14:20 <slaweq> ++
15:14:24 <ralonsoh> +1
15:14:25 <slaweq> thx ykarel
15:16:33 <slaweq> ok, next topic
15:16:38 <slaweq> #topic Stadium projects
15:16:50 <slaweq> I still see a lot of red jobs in periodic queues
15:17:05 <slaweq> but I guess it's related to the s-rbac issue which lajoskatona is working on
15:17:10 <lajoskatona> yes exactly
15:17:22 <lajoskatona> and ralonsoh also
15:18:22 <lajoskatona> that's it from me for these projects, please check if there are open patches for these :-)
15:18:38 <slaweq> ++ thx
15:18:57 <slaweq> anything else regarding stadium or can we move on?
15:19:20 <lajoskatona> we can move on
15:19:33 <slaweq> ok
15:19:35 <slaweq> #topic Grafana
15:19:42 <slaweq> #link https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:20:18 <slaweq> nothing critical there
15:20:26 <slaweq> at least from what I see there
15:21:15 <slaweq> so let's move on
15:21:19 <slaweq> #topic Rechecks
15:21:31 <slaweq> this week we are going back to lower numbers of rechecks
15:21:31 <slaweq> so that's good
15:21:51 <slaweq> regarding bare rechecks we are also good, as most of our rechecks come with some reason given
15:22:03 <slaweq> but the bad thing is that we are doing A LOT of rechecks in total
15:22:13 <slaweq> in the last 7 days there were 64 rechecks in total
15:23:06 <slaweq> I will need to update my script to be able to get data about recheck reasons
15:23:17 <slaweq> and then have some summary of why rechecks happen
15:23:34 <lajoskatona> the percentage is also high? i mean compared to the total number of patchsets?
15:24:28 <slaweq> lajoskatona I don't have stats about the number of patchsets
15:24:39 <mlavalle> yeah, it took me some rechecks to merge the metadata request rating stuff
15:24:42 <slaweq> I will look at it too
15:24:58 <lajoskatona> ok, thanks
15:26:25 <slaweq> and that's all about rechecks for me
15:26:29 <slaweq> I think we can move on
15:26:30 <slaweq> #topic fullstack/functional
15:26:41 <slaweq> I found a couple of failures in functional tests this week
15:26:46 <slaweq> most of them happened once
15:27:13 <slaweq> but qos related tests from neutron.tests.functional.agent.common.test_ovs_lib.BaseOVSTestCase failed twice in similar (or the same) way:
15:27:17 <slaweq> https://7814844573a763db7ab8-0ac84b2ac4d53823f5d0fa90b7a93a42.ssl.cf2.rackcdn.com/882865/2/gate/neutron-functional-with-uwsgi/3e74eb2/testr_results.html
15:27:17 <slaweq> https://f326700999a21a41aed9-1ea95ca857946beec6346fe0f4481db6.ssl.cf1.rackcdn.com/883269/2/check/neutron-functional-with-uwsgi/f3242b7/testr_results.html
15:27:33 <slaweq> did You maybe see it already?
15:28:01 <ralonsoh> this is during the port creation
15:28:05 <ralonsoh> not even the qos set
15:28:09 <slaweq> yes
15:28:16 <slaweq> but I saw it in two qos related tests
15:28:43 <slaweq> it can be that it was just a busy node
15:28:47 <slaweq> and some timeout happened
15:28:49 <slaweq> idk
15:29:39 <slaweq> anyone want to check it in the logs?
15:29:47 <ralonsoh> I'll try this week
15:29:54 <slaweq> thx ralonsoh
15:30:12 <slaweq> #action ralonsoh to check port creation timeout in functional tests
15:30:55 <slaweq> other failures I saw happened only once this week:
15:31:05 <slaweq> ovsdb command timeout https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_74b/883421/4/check/neutron-functional-with-uwsgi/74bb95e/testr_results.html
15:31:39 <slaweq> issue with connectivity: https://da71356301863c380a6d-648722ac87374da2f576895eac8df5a8.ssl.cf2.rackcdn.com/883687/1/check/neutron-functional-with-uwsgi/7e24a52/testr_results.html
15:31:59 <slaweq> or rather, no connectivity was expected there and it was working all the time
15:32:04 <slaweq> IIUC the stacktrace
15:32:42 <slaweq> and last but not least, yet another 2 timeouts with dvr_router_lifecycle:
15:32:45 <slaweq> https://9551a11e9f70ee5b8295-ea25b0076b50bda9415898f3289d868a.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/f4897e4/testr_results.html
15:32:49 <slaweq> https://fb13014d32b897a8a583-51ad4c26deb09abcf5b0e79e0d0bdf13.ssl.cf5.rackcdn.com/883681/2/check/neutron-functional-with-uwsgi/4b6f09c/testr_results.html
15:33:04 <slaweq> lajoskatona weren't You investigating the same issue in the past?
15:33:32 <lajoskatona> something similar at least, and not just me :-)
15:33:43 <slaweq> ahh true
15:34:11 <slaweq> any volunteer to check those new failures there?
15:34:21 <ralonsoh> I'll try too
15:34:22 <mlavalle> I'll take one
15:34:28 <slaweq> thx
15:34:39 <mlavalle> which one don't you want, ralonsoh?
15:34:52 <ralonsoh> I'll check
15:34:53 <ralonsoh> _dvr_router_lifecycle
15:35:11 <slaweq> #action ralonsoh to check dvr_lifecycle timeouts in functional job
15:35:14 <mlavalle> ok, I'll check the ovsdb one
15:35:15 <slaweq> thx ralonsoh
15:35:28 <slaweq> #action mlavalle to check ovsdb command timeout in functional job
15:35:31 <slaweq> thx mlavalle
15:35:55 <slaweq> ok, and that's all I have about functional jobs for today
15:36:05 <slaweq> can we move on to the next topic?
15:36:27 <mlavalle> let's move on
15:36:37 <slaweq> so I have just one last topic for today
15:36:42 <slaweq> #topic Periodic
15:36:48 <slaweq> in general things look good there
15:37:04 <slaweq> except the FIPS related jobs, which have been failing since 18.05.2023
15:37:13 <slaweq> first there were problems with openvswitch start:... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/TLUCwystodtTVqjAheZMlyGi>)
15:37:38 <slaweq> so it doesn't seem like a neutron issue but more likely something distro related to me
15:38:07 <slaweq> any volunteer to check it? if not I will try to find some time for it this week
15:38:34 <ralonsoh> I'll do it
15:38:39 <slaweq> thx
15:38:52 <slaweq> #action ralonsoh to check failing fips periodic jobs
15:39:02 <slaweq> and that was the last topic from me for today
15:39:07 <slaweq> #topic On Demand
15:39:14 <slaweq> anything else You want to discuss today here?
15:39:49 <ykarel> seems we missed tempest/grenade sections?
15:40:05 <slaweq> ykarel no, I didn't really miss it
15:40:34 <ykarel> ohkk then seems i missed it :)
15:40:40 <slaweq> I put some links there just for the record but it wasn't anything that is probably worth discussing
15:40:56 <slaweq> there were just 2 issues which happened once each
15:41:06 <slaweq> and it didn't really look like it's related to neutron
15:41:15 <slaweq> so it was just for the record in the etherpad :)
15:41:24 <ykarel> ok, the grenade one i saw from the etherpad, i've seen those before but thought it was related to my test patches, but it looks like a real issue
15:41:49 <slaweq> You mean the nova-api didn't stop issue?
15:41:52 <slaweq> or the other one?
15:41:52 <ykarel> yes
15:42:02 <ykarel> the nova-api one
15:42:22 <slaweq> do You think we should report it already?
15:42:29 <ykarel> yes
15:42:31 <lajoskatona> ahh, I just saw that one
15:42:45 <ykarel> i can report it
15:42:58 <lajoskatona> thanks for it
15:43:05 <ykarel> as it's been seen in multiple occurrences already
15:43:12 <ykarel> and is impacting our gates
15:43:29 <slaweq> will You open a bug for nova then?
15:43:30 <slaweq> thx ykarel
15:43:34 <slaweq> I saw it just this one time
15:43:36 <slaweq> that's why I thought it's not that serious an issue and not worth reporting yet
15:43:50 <slaweq> but in that case, yeah please open an LP for it
15:43:59 <ykarel> yes, against nova, or maybe better against devstack/grenade
15:44:20 <ykarel> as it looks more related to systemd/uwsgi config
15:44:27 <slaweq> ok
15:44:52 <slaweq> thx for that
15:45:06 <slaweq> ok, now I think we are done with topics for today :)
15:45:14 <slaweq> so I will give You 15 minutes back
15:45:20 <slaweq> thx for attending the meeting
15:45:23 <mlavalle> \o/
15:45:27 <slaweq> o/
15:45:28 <slaweq> #endmeeting
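
Editor's note (not part of the original log): in the Rechecks topic slaweq mentioned updating his script to collect recheck reasons for the weekly summary. The sketch below shows one possible way to do that against the public Gerrit REST API of review.opendev.org; the query scope, the regex for extracting the reason text, and the output format are assumptions for illustration, not the team's actual tooling.

#!/usr/bin/env python3
# Minimal sketch: list "recheck" comments and their stated reasons for
# recent openstack/neutron changes, using the standard Gerrit REST API.
import json
import re

import requests

GERRIT = "https://review.opendev.org"
# assumed scope: neutron changes touched within the last 7 days
QUERY = "project:openstack/neutron -age:7d"


def fetch_changes():
    # o=MESSAGES asks Gerrit to include change messages (review comments)
    resp = requests.get(f"{GERRIT}/changes/",
                        params={"q": QUERY, "o": "MESSAGES"},
                        timeout=60)
    resp.raise_for_status()
    # Gerrit prefixes JSON replies with ")]}'" to prevent XSSI; drop that line
    return json.loads(resp.text.split("\n", 1)[1])


def recheck_reasons(changes):
    reasons = []
    for change in changes:
        for msg in change.get("messages", []):
            # a recheck comment typically looks like "Patch Set N:\n\nrecheck <reason>"
            match = re.search(r"^recheck\b(.*)$", msg["message"],
                              re.IGNORECASE | re.MULTILINE)
            if match:
                reason = match.group(1).strip() or "<bare recheck>"
                reasons.append((change["_number"], reason))
    return reasons


if __name__ == "__main__":
    for number, reason in recheck_reasons(fetch_changes()):
        print(f"{GERRIT}/{number}: {reason}")

Grouping the printed reasons (for example with collections.Counter) would give the "summary of why rechecks happen" mentioned in the meeting.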