15:00:39 <slaweq> #startmeeting neutron_ci
15:00:39 <opendevmeet> Meeting started Tue Aug 22 15:00:39 2023 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:39 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:39 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:50 <ralonsoh> hi
15:01:05 <mlavalle> o/
15:01:09 <slaweq> ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:01:11 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:01:23 <mlavalle> slaweq: irc or video?
15:01:26 <ykarel> o/
15:01:26 <bcafarel> o/
15:01:36 <slaweq> mlavalle irc today
15:01:54 <slaweq> I think we can start
15:01:56 <slaweq> #topic Actions from previous meetings
15:02:09 <slaweq> ralonsoh to check failed neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary test
15:02:25 <ralonsoh> no, I didn't start with this one
15:02:26 <ralonsoh> sorry
15:02:41 <slaweq> no worries
15:02:58 <slaweq> I didn't see this issue recently, so maybe we can just wait for new occurrences
15:03:01 <slaweq> wdyt?
15:03:05 <slaweq> and maybe then get back to this
15:03:09 <ralonsoh> ok for me
15:03:49 <slaweq> ok, so next one
15:03:50 <slaweq> mtomaska to check failing neutron-functional-with-sqlalchemy-master periodic job
15:03:55 <mtomaska> https://review.opendev.org/c/openstack/neutron/+/890939
15:04:08 <mtomaska> should be fixed when that patch merges
15:04:12 <lajoskatona> o/
15:04:15 <slaweq> thx for the fix mtomaska, I see it's in the gate now
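A minimal sketch of the SQLAlchemy 2.0-style insert pattern that such migrations target (table and column names here are made up, not taken from the neutron patch): Table.insert() only constructs an Insert statement, and execution happens explicitly on a connection.

    from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

    engine = create_engine("sqlite://")
    metadata = MetaData()
    ports = Table(
        "ports", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(64)),
    )
    metadata.create_all(engine)

    # 2.0 style: .insert() only builds an Insert statement object;
    # execution is explicit, on a connection, inside a transaction.
    stmt = ports.insert().values(name="port-1")
    with engine.begin() as conn:
        conn.execute(stmt)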
15:04:54 <slaweq> so, last one from previous meeting:
15:04:55 <slaweq> lajoskatona will send DNM patch for neutron-dynamic-routing to check jobs
15:05:18 <lajoskatona> It was sent and frickler actually found the issue in os-ken
15:05:38 <lajoskatona> so I sent the DNM and frickler did the rest of the work
15:06:02 <slaweq> thx lajoskatona and frickler
15:06:07 <slaweq> so are we good now with it?
15:06:08 <frickler> and quite a journey it was ;)
15:06:12 <slaweq> or is it in progress?
15:06:19 <frickler> we still need to test after os-ken release
15:06:29 <frickler> because it isn't self-testing, as mentioned earlier
15:06:29 <lajoskatona> if we have the os-ken release it should be fine
15:06:40 <frickler> I only tested on the held node I used for debugging
15:06:41 <slaweq> ok :)
15:06:56 <ykarel> or maybe fix the jobs so the os-ken from the patch gets used
15:06:56 <slaweq> thx a lot to both of You
15:07:40 <slaweq> ykarel do You mean to use os-ken from master in those jobs?
15:08:09 <ralonsoh> that was discussed in the previous meeting
15:08:25 <ykarel> slaweq, iiuc what frickler means is that the jobs running against os-ken patches are not using those patches but the released version
15:08:25 <ralonsoh> the n-d-r job in os-ken CI is not installing the tested patch
15:08:28 <slaweq> ahh, sorry. I probably missed it then
15:08:42 <ykarel> hmm i was also out last meeting so might be missing context
15:09:08 <ykarel> so i meant we should instead fix the jobs to work with os-ken patches and not wait for actual release to test :)
15:09:26 <frickler> well the release is due this week anyway
15:09:39 <frickler> but fixing the job would be a good task to do, too
15:09:41 <ykarel> i'm also not sure if this is a regression or if it never worked for os-ken
15:09:56 <slaweq> ok, so this needs to be fixed indeed
15:12:37 <slaweq> frickler can You maybe check it this week and open LP if we need to fix jobs in os-ken?
15:12:47 <slaweq> so we can track it at least and not forget about it
15:13:04 <frickler> well I'm pretty sure that it is broken
15:13:24 <frickler> ralonsoh was the one who wanted to take another look and open the bug
15:13:33 <slaweq> ok, thx
15:13:42 <slaweq> so ralonsoh will You open LP for it?
15:13:44 <ralonsoh> yes, I'll check that it is broken in the CI execution
15:13:50 <ralonsoh> yes, after checking the logs
15:14:09 <slaweq> thx
15:14:27 <slaweq> #action ralonsoh to check n-d-r os-ken jobs and open LP related to it
15:14:38 <slaweq> ok, I think we can move on
15:14:39 <slaweq> #topic Stable branches
15:14:49 <slaweq> bcafarel anything new/urgent?
15:15:07 <bcafarel> no, all good overall :)
15:15:23 <bcafarel> recent backports passed gates smoothly, up to ussuri
15:16:19 <slaweq> ok
15:16:24 <slaweq> so I think we can move on then
15:16:30 <slaweq> #topic Stadium projects
15:16:48 <slaweq> anything to discuss here? except n-d-r and os-ken which we already talked about
15:17:00 <lajoskatona> We discussed n-d-r, so that is one thing to keep an eye on
15:17:11 <lajoskatona> the other topic is bagpipe
15:17:27 <lajoskatona> it is failing with SQLAlchemy 2, I proposed a patch: https://review.opendev.org/c/openstack/networking-bagpipe/+/891325
15:17:59 <lajoskatona> but some test still fails randomly for the sfc driver, so I have to spend some more time with it
15:18:38 <lajoskatona> that's it for the stadiums
15:18:50 <slaweq> are those random failures also related to SQLAlchemy 2.0? or something different?
15:19:28 <lajoskatona> no I see them only with sqlalchemy2
15:19:57 <slaweq> ok, so maybe ralonsoh and/or stephenfin will be able to help with them somehow
15:20:08 <ralonsoh> I'll try to find the issue there
15:20:14 <slaweq> thx a lot
15:20:27 <lajoskatona> thanks
15:20:56 <slaweq> next topic then
15:20:56 <slaweq> #topic Grafana
15:22:00 <slaweq> https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:22:08 <slaweq> I see that rally jobs were broken last week but it's fixed on rally side already
15:22:17 <slaweq> other than that it's as usual
15:23:24 <mlavalle> +1
15:24:13 <slaweq> I think we can move on then
15:24:15 <slaweq> #topic Rechecks
15:24:51 <slaweq> it was a bit better last week already, but then there was this issue with rally and the issue with GLOBAL_VENV in devstack which made it a bit worse
15:25:06 <opendevreview> Merged openstack/neutron master: [sqlalchemy-20] TableClause.insert constructs Insert object  https://review.opendev.org/c/openstack/neutron/+/890939
15:25:09 <slaweq> but those problems are already fixed so I think it's pretty ok overall
15:25:38 <slaweq> so I think we can move on to talk about some specific failures
15:25:44 <slaweq> #topic fullstack/functional
15:26:01 <slaweq> here I found one new (for me) failure in the neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_maintenance.TestMaintenance.test_port_forwarding
15:26:08 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8a2/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-pyroute2-master/8a279fa/testr_results.html
15:26:36 <slaweq> it was in periodic job so it's not related to any patch in progress
15:27:07 <slaweq> does anyone want to check it deeper maybe? Or if not, we can wait and see if it happens more often
15:27:07 <ralonsoh> this is a callback, could be just a race condition
15:27:20 <ralonsoh> I can check it and maybe limit the check to the expected call
15:27:30 <slaweq> ralonsoh++ thx a lot
15:27:52 <slaweq> #action ralonsoh to check failure in the neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_maintenance.TestMaintenance.test_port_forwarding
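A generic sketch of what "limit the check to the expected call" could look like; the callback names are invented, not from the actual test. When unrelated callbacks can race in, asserting only the expected call with assert_any_call() is more tolerant than asserting the exact call list.

    from unittest import mock

    handler = mock.Mock()

    # Illustrative stand-in for the code under test: the callback we care
    # about plus an unrelated one that may or may not fire concurrently.
    handler('port_forwarding', 'after_create', payload='pf-1')
    handler('router', 'after_update', payload='r-1')

    # assert_called_once_with() would fail because of the extra call;
    # assert_any_call() only checks that the expected call happened.
    handler.assert_any_call('port_forwarding', 'after_create', payload='pf-1')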
15:28:23 <slaweq> and that's all regarding functional/fullstack tests
15:28:25 <slaweq> #topic Tempest/Scenario
15:28:38 <slaweq> here I noticed a kernel panic in a guest vm (again?):
15:28:45 <slaweq> https://cbf8616008e0e2c2dfec-9346de3bff5d83c6d90eefafd8632b44.ssl.cf1.rackcdn.com/884474/13/check/tempest-integrated-networking/0e81b62/testr_results.html
15:29:02 <slaweq> I'm not really even sure what Cirros version was used there
15:29:22 <slaweq> so maybe it's not an issue at all but just wanted to highlight here that I saw it again
15:29:30 <lajoskatona> cirros 6.2
15:29:46 <slaweq> so should be good, right?
15:29:54 <slaweq> maybe it's a new issue then, idk
15:30:00 <ykarel> no, it shouldn't be related to cirros 6.2
15:30:23 <ykarel> i recall it's an old issue and it's worked around in our job by using uec images
15:30:55 <slaweq> ykarel - possibly, as this issue was in the tempest-integrated-networking job which is in the tempest repo
15:31:01 <ykarel> yeap
15:31:13 <slaweq> so if that happens more often we may need to propose the same workaround in that job too
15:31:24 <slaweq> let's keep an eye on it for now
15:31:30 <slaweq> is that ok for You?
15:31:42 <ykarel> +1
15:31:50 <ralonsoh> +1
15:31:55 <lajoskatona> +1
15:32:07 <slaweq> thx
15:32:11 <slaweq> so next topic
15:32:13 <slaweq> #topic grenade
15:32:33 <slaweq> I saw (again just once but wanted to mention it) some issue related to keystone: https://53ec660a16b30e470118-779b81139f4f29276caf956abf2a020f.ssl.cf2.rackcdn.com/890939/3/gate/neutron-ovs-grenade-dvr-multinode/f868b9c/controller/logs/grenade.sh_log.txt
15:32:59 <slaweq> did You see something like that already? Is it something we should maybe report to the keystone team?
15:33:16 <ralonsoh> yes, could be useful for them to know this
15:33:31 <ralonsoh> doesn't seem to be related to Neutron
15:33:42 <slaweq> ok, I will let knikolla know about it
15:34:50 <slaweq> #topic Periodic
15:35:07 <slaweq> here I saw 2 issues which we need to handle somehow:
15:35:18 <slaweq> fullstack fips job broken: https://zuul.openstack.org/build/b87d8c3037a1417193c865bc576ac593
15:35:30 <slaweq> and Centos 9 Stream jobs broken:
15:35:30 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_cbf/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-centos-9-stream/cbf72a9/job-output.txt
15:35:30 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_533/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-release-fips/5331cd4/job-output.txt
15:35:45 <slaweq> anyone wants to check those?
15:36:07 <ykarel> sounds related to GLOBAL_VENV thing
15:36:20 <ralonsoh> right
15:36:35 * haleyb noticed the centos9 job failure too while trying to recreate a bug, but didn't dig into it
15:36:49 <ralonsoh> I'll check the centos9 error
15:37:07 <frickler> cf. https://review.opendev.org/c/openstack/tempest/+/891517
15:37:24 <slaweq> ok, I will check fullstack fips job then
15:37:39 <ralonsoh> ok so the centos9 issue seems to be solved there
15:37:40 <slaweq> #action ralonsoh to check Centos 9 stream jobs failures
15:37:59 <slaweq> thx frickler
15:38:13 <slaweq> #action slaweq to check fips fullstack job failures
15:38:38 <slaweq> that's all regarding periodic jobs from me
15:38:43 <slaweq> #topic On Demand
15:38:51 <slaweq> do You have anything else to discuss today?
15:39:22 <ykarel> just one thing as more people are here
15:39:31 <ykarel> i raised it over the patch https://review.opendev.org/c/openstack/neutron/+/892134
15:40:05 <ralonsoh> I think I addressed your comment
15:40:08 <ralonsoh> right?
15:40:21 <ralonsoh> I removed the experimental job
15:40:29 <ykarel> ralonsoh, yeap related to duplicating jobs in periodic/experimental and check
15:40:48 <ralonsoh> yeah, let's have it only in check queue
15:41:05 <ykarel> but i had a concern about such jobs blocking the CI if master commits from sqlalchemy and alembic break something
15:41:11 <slaweq> ralonsoh but I also agree with ykarel that this job should maybe be a non-voting one in the check queue
15:41:25 <ralonsoh> at this point, that should always work
15:41:35 <ralonsoh> we should not include anything not compatible with sqlalchemy 2.0
15:42:04 <ralonsoh> but if you agree on this, I'll mark it as non-voting
15:42:05 <slaweq> yeah, but the point is - will sqlalchemy not merge anything breaking for us? :)
15:42:31 <ralonsoh> ok, I'll push a new patch marking it as non-voting
15:43:01 <haleyb> but we should pay attention to the job during review :)
15:43:30 <ykarel> yeap, non-voting jobs might go unnoticed
15:43:51 <slaweq> so maybe keep it voting for now and we can always switch it to non-voting in case of any problems from sqlalchemy side
15:43:57 <ralonsoh> perfect
15:44:12 <ralonsoh> so as is now
15:44:19 <haleyb> +1
15:44:22 <slaweq> ok
15:44:23 <ykarel> ok, and hope it all goes well +1
15:44:36 <slaweq> I also have one additional topic/announcement for today
15:45:08 <slaweq> as You probably noticed, I've been the chair of this CI meeting for quite some time (6+ years already if I'm not mistaken)
15:45:20 <slaweq> and recently I thought it would be good to pass it to someone else
15:45:35 <slaweq> so starting next week ykarel will be our new chair of the CI meeting
15:45:47 <ralonsoh> slaweq, thanks for all these years!
15:45:51 <slaweq> thx ykarel for stepping up in this role :)
15:45:56 <ralonsoh> and thanks ykarel for stepping up!
15:46:06 <lajoskatona> thanks for the efforts to keep these topics in focus
15:46:15 <ykarel> Thanks slaweq for all your efforts in those years
15:46:44 <mlavalle> thanks for leading the meeting for so long slaweq
15:46:49 <mlavalle> and welcome ykarel
15:47:06 <ykarel> thx everyone
15:47:12 <slaweq> and that's all from me for today
15:47:19 <lajoskatona> welcome ykarel as chair of this meeting:-)
15:47:26 <slaweq> if there are no other topics, I will give You back a few minutes today
15:48:09 <slaweq> ok, thx for attending and have a great week everyone
15:48:13 <slaweq> #endmeeting