15:00:10 #startmeeting neutron_ci
15:00:10 Meeting started Tue May 23 15:00:10 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:10 The meeting name has been set to 'neutron_ci'
15:00:20 hi
15:00:21 ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:00:29 o/
15:00:33 o/
15:00:35 o/
15:00:46 isn't it video this week?
15:01:03 lajoskatona nope
15:01:06 ok
15:01:12 (I was wrong)
15:01:19 last week there wasn't a meeting
15:01:32 and two weeks ago it was on video
15:01:41 so this week it's on irc :)
15:02:08 LOL, I've been waiting in the video meeting
15:02:44 Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:03:06 #topic Actions from previous meetings
15:03:12 lajoskatona to check with dnm patch stadium projects with py39
15:04:21 huuu, I forgot about that, I will check
15:04:31 ok, thx
15:04:43 #action lajoskatona to check with dnm patch stadium projects with py39
15:04:43 this week I checked the issues with rbac :-)
15:05:20 in stadium?
15:05:29 yes, here is the list: https://review.opendev.org/q/topic:bug%252F2019097
15:05:52 thx for that
15:06:03 are those patches ready for review?
15:06:12 some of them are still failing
15:06:40 yes, I still have to check the bgpvpn tempest tests, for example
15:06:59 ok, so let us know when they are ready
15:07:04 sure
15:07:06 and thx for working on this
15:07:16 next one
15:07:18 ykarel to update nova timeouts bug https://bugs.launchpad.net/neutron/+bug/2015065
15:07:43 Yes, I added findings in comments 7 and 8
15:07:55 this is the same issue ralonsoh mentioned in the previous meeting
15:08:18 gibi also mentioned something about this and eventlet
15:08:25 After this gibi also investigated it based on the traceback in comment 8 and filed an eventlet bug
15:08:30 https://github.com/eventlet/eventlet/issues/798
15:08:48 +1, I didn't find it
15:09:00 in our jobs for now we disabled dbcounter https://review.opendev.org/c/openstack/neutron/+/883648
15:09:29 don't know if it's directly related, but disabling it made the issue appear less frequently
15:09:47 so with https://review.opendev.org/c/openstack/neutron/+/883648 merged we should be good in our CI for now, right?
15:09:51 cool, that is better than nothing
15:09:54 basically the nova worker is getting stuck while contacting neutron APIs
15:10:18 and we can't use another lib, right?
15:10:37 also there were some other issues where the process doesn't get stuck but API requests just time out after 60 seconds, as those take more time
15:11:12 I didn't dig into why those take more than 60 sec, but it may be just system load
15:11:33 but indeed I think that grenade jobs are much more stable this week
15:11:40 thx ykarel for working on this
15:11:40 ralonsoh, sorry, it's not clear, which other lib?
15:11:43 and gibi too :)
15:11:49 not urllib
15:12:10 also if anyone has inputs/workarounds for the stuck issue, please comment on the bug
15:12:38 ok
15:13:01 ok, I think we can move on
15:13:03 #topic Stable branches
15:13:05 ralonsoh, hmm, not sure, but yes, if there are better alternatives they could be tried out; I think urllib is widely used though
15:13:10 bcafarel any updates?
15:13:47 all good in most branches, we had one job that started to fail in train https://bugs.launchpad.net/neutron/+bug/2020363
15:14:01 already fixed by ykarel++ by dropping the openstacksdk-functional job
15:14:20 ++
15:14:24 +1
15:14:25 thx ykarel
15:16:33 ok, next topic
15:16:38 #topic Stadium projects
15:16:50 I still see a lot of red jobs in periodic queues
15:17:05 but I guess it's related to the s-rbac issue which lajoskatona is working on
15:17:10 yes exactly
15:17:22 and ralonsoh also
15:18:22 that's it from me for these projects, please check if there are open patches for these :-)
15:18:38 ++ thx
15:18:57 anything else regarding stadium or can we move on?
15:19:20 we can move on
15:19:33 ok
15:19:35 #topic Grafana
15:19:42 #link https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:20:18 nothing critical there
15:20:26 at least from what I see there
15:21:15 so let's move on
15:21:19 #topic Rechecks
15:21:31 this week we are going back to lower numbers of rechecks
15:21:31 so that's good
15:21:51 regarding bare rechecks we are also good, as most of our rechecks come with some reason
15:22:03 but the bad thing is that we are doing A LOT of rechecks in total
15:22:13 in the last 7 days there were 64 rechecks in total
15:23:06 I will need to update my script to be able to get data about recheck reasons
15:23:17 and then have some summary of why rechecks happen
15:23:34 is the percentage also high? I mean relative to the total number of patchsets?
15:24:28 lajoskatona I don't have stats about the number of patchsets
15:24:39 yeah, it took me some rechecks to merge the metadata request rating stuff
15:24:42 I will look at it too
15:24:58 ok, thanks
15:26:25 and that's all about rechecks for me
15:26:29 I think we can move on
15:26:30 #topic fullstack/functional
15:26:41 I found a couple of failures in functional tests this week
15:26:46 most of them happened once
15:27:13 but qos related tests from neutron.tests.functional.agent.common.test_ovs_lib.BaseOVSTestCase failed twice in a similar (or the same) way:
15:27:17 https://7814844573a763db7ab8-0ac84b2ac4d53823f5d0fa90b7a93a42.ssl.cf2.rackcdn.com/882865/2/gate/neutron-functional-with-uwsgi/3e74eb2/testr_results.html
15:27:17 https://f326700999a21a41aed9-1ea95ca857946beec6346fe0f4481db6.ssl.cf1.rackcdn.com/883269/2/check/neutron-functional-with-uwsgi/f3242b7/testr_results.html
15:27:33 did You maybe see it already too?
15:28:01 this is during the port creation
15:28:05 not even the qos set
15:28:09 yes
15:28:16 but I saw it in two qos related tests
15:28:43 it can be that it was just a busy node
15:28:47 and some timeout happened
15:28:49 idk
15:29:39 anyone want to check it in the logs?
15:29:47 I'll try this week
15:29:54 thx ralonsoh
15:30:12 #action ralonsoh to check port creation timeout in functional tests
15:30:55 other failures I saw happened only once this week:
15:31:05 ovsdb command timeout https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_74b/883421/4/check/neutron-functional-with-uwsgi/74bb95e/testr_results.html
15:31:39 issue with connectivity: https://da71356301863c380a6d-648722ac87374da2f576895eac8df5a8.ssl.cf2.rackcdn.com/883687/1/check/neutron-functional-with-uwsgi/7e24a52/testr_results.html
15:31:59 or rather, no connectivity was expected there but it was working all the time
15:32:04 IIUC the stacktrace
15:32:42 and last but not least, yet another 2 timeouts with dvr_router_lifecycle:
15:32:45 https://9551a11e9f70ee5b8295-ea25b0076b50bda9415898f3289d868a.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/f4897e4/testr_results.html
15:32:49 https://fb13014d32b897a8a583-51ad4c26deb09abcf5b0e79e0d0bdf13.ssl.cf5.rackcdn.com/883681/2/check/neutron-functional-with-uwsgi/4b6f09c/testr_results.html
15:33:04 lajoskatona weren't You investigating the same issue in the past?
15:33:32 something similar at least, and not just me :-)
15:33:43 ahh true
15:34:11 any volunteer to check those new failures there?
15:34:21 I'll try too
15:34:22 I'll take one
15:34:28 thx
15:34:39 which one do you not want, ralonsoh?
15:34:52 I'll check
15:34:53 _dvr_router_lifecycle
15:35:11 #action ralonsoh to check dvr_lifecycle timeouts in functional job
15:35:14 ok, I'll check the ovsdb one
15:35:15 thx ralonsoh
15:35:28 #action mlavalle to check ovsdb command timeout in functional job
15:35:31 thx mlavalle
15:35:55 ok, and that's all I have about functional jobs for today
15:36:05 can we move on to the next topic?
15:36:27 let's move on
15:36:37 so I have just one last topic for today
15:36:42 #topic Periodic
15:36:48 in general things look good there
15:37:04 except the FIPS related jobs, which have been failing since 18.05.2023
15:37:13 first there were problems with openvswitch start:... (full message at )
15:37:38 so it doesn't seem like a neutron issue to me, more likely something distro related
15:38:07 any volunteer to check it? if not I will try to find some time for it this week
15:38:34 I'll do it
15:38:39 thx
15:38:52 #action ralonsoh to check failing fips periodic jobs
15:39:02 and that was the last topic from me for today
15:39:07 #topic On Demand
15:39:14 anything else You want to discuss here today?
15:39:49 seems we missed the tempest/grenade sections?
15:40:05 ykarel no, I didn't really miss it
15:40:34 ohkk, then it seems I missed it :)
15:40:40 I put some links there just for the record, but it wasn't anything that's really worth discussing
15:40:56 there were just 2 issues which happened once each
15:41:06 and they didn't really look like they were related to neutron
15:41:15 so it was just for the record in the etherpad :)
15:41:24 ok, the grenade one I saw in the etherpad, I had seen those before but thought it was related to my test patches, but it looks like a real issue
15:41:49 You mean that nova-api didn't stop issue?
15:41:52 or the other one?
15:41:52 yes
15:42:02 the nova-api one
15:42:22 do You think we should report it already?
15:42:29 yes
15:42:31 ahh, I just saw that one
15:42:45 I can report it
15:42:58 thanks for it
15:43:05 as I've seen multiple occurrences already
15:43:12 and it's impacting our gates
15:43:29 will You open a bug for nova then?
15:43:30 thx ykarel
15:43:34 I saw it just this one time
15:43:36 that's why I thought it's not that serious an issue and not worth reporting yet
15:43:50 but in that case, yeah, please open an LP for it
15:43:59 yes, against nova, or maybe better against devstack/grenade
15:44:20 as it looks more related to systemd/uwsgi config
15:44:27 ok
15:44:52 thx for that
15:45:06 ok, now I think we are done with topics for today :)
15:45:14 so I will give You 15 minutes back
15:45:20 thx for attending the meeting
15:45:23 \o/
15:45:27 o/
15:45:28 #endmeeting