15:00:10 <slaweq> #startmeeting neutron_ci
15:00:10 <opendevmeet> Meeting started Tue May 23 15:00:10 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:10 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:20 <ralonsoh> hi
15:00:21 <slaweq> ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:00:29 <lajoskatona> o/
15:00:33 <ykarel> o/
15:00:35 <bcafarel> o/
15:00:46 <lajoskatona> isn't it video this week?
15:01:03 <slaweq> lajoskatona nope
15:01:06 <lajoskatona> ok
15:01:12 <ralonsoh> (I was wrong)
15:01:19 <slaweq> last week there wasn't a meeting
15:01:32 <slaweq> and two weeks ago it was on video
15:01:41 <slaweq> so this week it's on irc :)
15:02:08 <mlavalle> LOL, I've been waiting in the video meeting
15:02:44 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:03:06 <slaweq> #topic Actions from previous meetings
15:03:12 <slaweq> lajoskatona to check with dnm patch stadium projects with py39
15:04:21 <lajoskatona> huuu, I forgot about that, I will check
15:04:31 <slaweq> ok, thx
15:04:43 <slaweq> #action lajoskatona to check with dnm patch stadium projects with py39
15:04:43 <lajoskatona> this week I checked the issues with rbac :-)
15:05:20 <slaweq> in stadium?
15:05:29 <lajoskatona> yes, here is the list: https://review.opendev.org/q/topic:bug%252F2019097
15:05:52 <slaweq> thx for that
15:06:03 <slaweq> are those patches ready for review?
15:06:12 <ralonsoh> some of them are still failing
15:06:40 <lajoskatona> yes, I still have to check the bgpvpn tempest tests, for example
15:06:59 <slaweq> ok, so let us know when it's ready
15:07:04 <lajoskatona> sure
15:07:06 <slaweq> and thx for working on this
15:07:16 <slaweq> next one
15:07:18 <slaweq> ykarel to update nova timeouts bug https://bugs.launchpad.net/neutron/+bug/2015065
15:07:43 <ykarel> Yes, added findings in comments 7 and 8
15:07:55 <ykarel> this is the same issue ralonsoh mentioned in the previous meeting
15:08:18 <lajoskatona> gibi also mentioned something about this and eventlet
15:08:25 <ykarel> After this gibi too investigated it based on the traceback in comment 8 and filed an eventlet bug
15:08:30 <ykarel> https://github.com/eventlet/eventlet/issues/798
15:08:48 <lajoskatona> +1, I didn't find it
15:09:00 <ykarel> in our jobs for now we disabled dbcounter https://review.opendev.org/c/openstack/neutron/+/883648
15:09:29 <ykarel> don't know if it's directly related, but disabling it made the issue appear less frequently
15:09:47 <slaweq> so with https://review.opendev.org/c/openstack/neutron/+/883648 merged we should be good in our ci for now, right?
15:09:51 <lajoskatona> cool, that is better than nothing
15:09:54 <ykarel> basically a nova worker is getting stuck while contacting neutron apis
15:10:18 <ralonsoh> and we can't use another lib, right?
15:10:37 <ykarel> also there were some other issues where the process doesn't get stuck but api requests just time out after 60 seconds, as those take more time
15:11:12 <ykarel> i didn't dig into why those take more than 60 sec, but it may be just system load
15:11:33 <slaweq> but indeed I think that grenade jobs are much more stable this week
15:11:40 <slaweq> thx ykarel for working on this
15:11:40 <ykarel> ralonsoh, sorry, not clear, what other lib?
15:11:43 <slaweq> and gibi too :)
15:11:49 <ralonsoh> not urllib
15:12:10 <ykarel> also if anyone has inputs/workarounds for the stuck issue please comment on the bug
15:12:38 <ralonsoh> ok
15:13:01 <slaweq> ok, I think we can move on
15:13:03 <slaweq> #topic Stable branches
15:13:05 <ykarel> ralonsoh, hmm not sure, but yes, if there are better alternatives they could be tried out, but i think urllib is widely used
15:13:10 <slaweq> bcafarel any updates?
15:13:47 <bcafarel> all good in most branches, we had one job that started to fail in train https://bugs.launchpad.net/neutron/+bug/2020363
15:14:01 <bcafarel> already fixed by ykarel++ dropping the openstacksdk-functional job
15:14:20 <slaweq> ++
15:14:24 <ralonsoh> +1
15:14:25 <slaweq> thx ykarel
15:16:33 <slaweq> ok, next topic
15:16:38 <slaweq> #topic Stadium projects
15:16:50 <slaweq> I still see a lot of red jobs in periodic queues
15:17:05 <slaweq> but I guess it's related to the s-rbac issue which lajoskatona is working on
15:17:10 <lajoskatona> yes exactly
15:17:22 <lajoskatona> and ralonsoh also
15:18:22 <lajoskatona> that's it from me for these projects, please check if there are open patches for these :-)
15:18:38 <slaweq> ++ thx
15:18:57 <slaweq> anything else regarding stadium or can we move on?
15:19:20 <lajoskatona> we can move on
15:19:33 <slaweq> ok
15:19:35 <slaweq> #topic Grafana
15:19:42 <slaweq> #link https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:20:18 <slaweq> nothing critical there
15:20:26 <slaweq> at least from what I see there
15:21:15 <slaweq> so let's move on
15:21:19 <slaweq> #topic Rechecks
15:21:31 <slaweq> this week we are going back to lower numbers of rechecks
15:21:31 <slaweq> so that's good
15:21:51 <slaweq> regarding bare rechecks we are also good, as most of our rechecks come with some reason given
15:22:03 <slaweq> but the bad thing is that we are doing A LOT of rechecks in total
15:22:13 <slaweq> in the last 7 days there were 64 rechecks in total
15:23:06 <slaweq> I will need to update my script to be able to get data about recheck reasons
15:23:17 <slaweq> and then have some summary of why rechecks happen
15:23:34 <lajoskatona> the percentage is also high? i mean compared to the total number of patchsets?
15:24:28 <slaweq> lajoskatona I don't have stats about the number of patchsets
15:24:39 <mlavalle> yeah, it took me some rechecks to merge the metadata request rating stuff
15:24:42 <slaweq> I will look at it too
15:24:58 <lajoskatona> ok, thanks
15:26:25 <slaweq> and that's all about rechecks for me
15:26:29 <slaweq> I think we can move on
15:26:30 <slaweq> #topic fullstack/functional
15:26:41 <slaweq> I found a couple of failures in functional tests this week
15:26:46 <slaweq> most of them happened once
15:27:13 <slaweq> but qos related tests from neutron.tests.functional.agent.common.test_ovs_lib.BaseOVSTestCase failed twice in similar (or the same) way:
15:27:17 <slaweq> https://7814844573a763db7ab8-0ac84b2ac4d53823f5d0fa90b7a93a42.ssl.cf2.rackcdn.com/882865/2/gate/neutron-functional-with-uwsgi/3e74eb2/testr_results.html
15:27:17 <slaweq> https://f326700999a21a41aed9-1ea95ca857946beec6346fe0f4481db6.ssl.cf1.rackcdn.com/883269/2/check/neutron-functional-with-uwsgi/f3242b7/testr_results.html
15:27:33 <slaweq> did You maybe see it already?
15:28:01 <ralonsoh> this is during the port creation
15:28:05 <ralonsoh> not even the qos set
15:28:09 <slaweq> yes
15:28:16 <slaweq> but I saw it in two qos related tests
15:28:43 <slaweq> it can be that it was just a busy node
15:28:47 <slaweq> and some timeout happened
15:28:49 <slaweq> idk
15:29:39 <slaweq> anyone want to check it in the logs?
15:29:47 <ralonsoh> I'll try this week
15:29:54 <slaweq> thx ralonsoh
15:30:12 <slaweq> #action ralonsoh to check port creation timeout in functional tests
15:30:55 <slaweq> other failures I saw happened only once this week:
15:31:05 <slaweq> ovsdb command timeout https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_74b/883421/4/check/neutron-functional-with-uwsgi/74bb95e/testr_results.html
15:31:39 <slaweq> issue with connectivity: https://da71356301863c380a6d-648722ac87374da2f576895eac8df5a8.ssl.cf2.rackcdn.com/883687/1/check/neutron-functional-with-uwsgi/7e24a52/testr_results.html
15:31:59 <slaweq> or rather, no connectivity was expected there and it was working all the time
15:32:04 <slaweq> IIUC the stacktrace
15:32:42 <slaweq> and last but not least, yet another 2 timeouts with dvr_router_lifecycle:
15:32:45 <slaweq> https://9551a11e9f70ee5b8295-ea25b0076b50bda9415898f3289d868a.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/f4897e4/testr_results.html
15:32:49 <slaweq> https://fb13014d32b897a8a583-51ad4c26deb09abcf5b0e79e0d0bdf13.ssl.cf5.rackcdn.com/883681/2/check/neutron-functional-with-uwsgi/4b6f09c/testr_results.html
15:33:04 <slaweq> lajoskatona weren't You investigating the same issue in the past?
15:33:32 <lajoskatona> something similar at least, and not just me :-)
15:33:43 <slaweq> ahh true
15:34:11 <slaweq> any volunteer to check those new failures there?
15:34:21 <ralonsoh> I'll try too
15:34:22 <mlavalle> I'll take one
15:34:28 <slaweq> thx
15:34:39 <mlavalle> which one don't you want, ralonsoh?
15:34:52 <ralonsoh> I'll check
15:34:53 <ralonsoh> _dvr_router_lifecycle
15:35:11 <slaweq> #action ralonsoh to check dvr_lifecycle timeouts in functional job
15:35:14 <mlavalle> ok, I'll check the ovsdb one
15:35:15 <slaweq> thx ralonsoh
15:35:28 <slaweq> #action mlavalle to check ovsdb command timeout in functional job
15:35:31 <slaweq> thx mlavalle
15:35:55 <slaweq> ok, and that's all I have about functional jobs for today
15:36:05 <slaweq> can we move on to the next topic?
15:36:27 <mlavalle> let's move on
15:36:37 <slaweq> so I have just one last topic for today
15:36:42 <slaweq> #topic Periodic
15:36:48 <slaweq> in general things look good there
15:37:04 <slaweq> except the FIPS related jobs, which have been failing since 18.05.2023
15:37:13 <slaweq> first there were problems with openvswitch start:... (full message at <https://matrix.org/_matrix/media/v3/download/matrix.org/TLUCwystodtTVqjAheZMlyGi>)
15:37:38 <slaweq> so it doesn't seem like a neutron issue but more likely something distro related to me
15:38:07 <slaweq> any volunteer to check it? if not I will try to find some time for it this week
15:38:34 <ralonsoh> I'll do it
15:38:39 <slaweq> thx
15:38:52 <slaweq> #action ralonsoh to check failing fips periodic jobs
15:39:02 <slaweq> and that was the last topic from me for today
15:39:07 <slaweq> #topic On Demand
15:39:14 <slaweq> anything else You want to discuss today here?
15:39:49 <ykarel> seems we missed tempest/grenade sections?
15:40:05 <slaweq> ykarel no, I didn't really miss it
15:40:34 <ykarel> ohkk then seems i missed it :)
15:40:40 <slaweq> I put some links there just for the record but it wasn't anything that is probably worth discussing
15:40:56 <slaweq> there were just 2 issues which happened once each
15:41:06 <slaweq> and it didn't really look like it's related to neutron
15:41:15 <slaweq> so it was just for the record in the etherpad :)
15:41:24 <ykarel> ok, the grenade one i saw from the etherpad, i've seen those before but thought it was related to my test patches, but it looks like a real issue
15:41:49 <slaweq> You mean the nova-api didn't stop issue?
15:41:52 <slaweq> or the other one?
15:41:52 <ykarel> yes
15:42:02 <ykarel> the nova-api one
15:42:22 <slaweq> do You think we should report it already?
15:42:29 <ykarel> yes
15:42:31 <lajoskatona> ahh, I just saw that one
15:42:45 <ykarel> i can report it
15:42:58 <lajoskatona> thanks for it
15:43:05 <ykarel> as it's been seen in multiple occurrences already
15:43:12 <ykarel> and is impacting our gates
15:43:29 <slaweq> will You open a bug for nova then?
15:43:30 <slaweq> thx ykarel
15:43:34 <slaweq> I saw it just this one time
15:43:36 <slaweq> that's why I thought it's not that serious an issue and not worth reporting yet
15:43:50 <slaweq> but in that case, yeah please open an LP for it
15:43:59 <ykarel> yes, against nova, or maybe better against devstack/grenade
15:44:20 <ykarel> as it looks more related to systemd/uwsgi config
15:44:27 <slaweq> ok
15:44:52 <slaweq> thx for that
15:45:06 <slaweq> ok, now I think we are done with topics for today :)
15:45:14 <slaweq> so I will give You 15 minutes back
15:45:20 <slaweq> thx for attending the meeting
15:45:23 <mlavalle> \o/
15:45:27 <slaweq> o/
15:45:28 <slaweq> #endmeeting
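
Editor's note (not part of the original log): in the Rechecks topic slaweq mentioned updating his script to collect recheck reasons for the weekly summary. The sketch below shows one possible way to do that against the public Gerrit REST API of review.opendev.org; the query scope, the regex for extracting the reason text, and the output format are assumptions for illustration, not the team's actual tooling.

#!/usr/bin/env python3
# Minimal sketch: list "recheck" comments and their stated reasons for
# recent openstack/neutron changes, using the standard Gerrit REST API.
import json
import re

import requests

GERRIT = "https://review.opendev.org"
# assumed scope: neutron changes touched within the last 7 days
QUERY = "project:openstack/neutron -age:7d"


def fetch_changes():
    # o=MESSAGES asks Gerrit to include change messages (review comments)
    resp = requests.get(f"{GERRIT}/changes/",
                        params={"q": QUERY, "o": "MESSAGES"},
                        timeout=60)
    resp.raise_for_status()
    # Gerrit prefixes JSON replies with ")]}'" to prevent XSSI; drop that line
    return json.loads(resp.text.split("\n", 1)[1])


def recheck_reasons(changes):
    reasons = []
    for change in changes:
        for msg in change.get("messages", []):
            # a recheck comment typically looks like "Patch Set N:\n\nrecheck <reason>"
            match = re.search(r"^recheck\b(.*)$", msg["message"],
                              re.IGNORECASE | re.MULTILINE)
            if match:
                reason = match.group(1).strip() or "<bare recheck>"
                reasons.append((change["_number"], reason))
    return reasons


if __name__ == "__main__":
    for number, reason in recheck_reasons(fetch_changes()):
        print(f"{GERRIT}/{number}: {reason}")

Grouping the printed reasons (for example with collections.Counter) would give the "summary of why rechecks happen" mentioned in the meeting.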