15:00:39 <slaweq> #startmeeting neutron_ci
15:00:39 <opendevmeet> Meeting started Tue Aug 22 15:00:39 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:39 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:39 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:50 <ralonsoh> hi
15:01:05 <mlavalle> o/
15:01:09 <slaweq> ping bcafarel, lajoskatona, mlavalle, mtomaska, ralonsoh, ykarel, jlibosva, elvira
15:01:11 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:01:23 <mlavalle> slaweq: irc or video?
15:01:26 <ykarel> o/
15:01:26 <bcafarel> o/
15:01:36 <slaweq> mlavalle irc today
15:01:54 <slaweq> I think we can start
15:01:56 <slaweq> #topic Actions from previous meetings
15:02:09 <slaweq> ralonsoh to check failed neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary test
15:02:25 <ralonsoh> no, I didn't start with this one
15:02:26 <ralonsoh> sorry
15:02:41 <slaweq> no worries
15:02:58 <slaweq> I didn't see this issue recently so maybe we can just wait for new occurrences
15:03:01 <slaweq> wdyt?
15:03:05 <slaweq> and maybe then get back to this
15:03:09 <ralonsoh> ok for me
15:03:49 <slaweq> ok, so next one
15:03:50 <slaweq> mtomaska to check failing neutron-functional-with-sqlalchemy-master periodic job
15:03:55 <mtomaska> https://review.opendev.org/c/openstack/neutron/+/890939
15:04:08 <mtomaska> should be fixed when that patch merges
15:04:12 <lajoskatona> o/
15:04:15 <slaweq> thx for the fix mtomaska, I see it's in the gate now
15:04:54 <slaweq> so, last one from previous meeting:
15:04:55 <slaweq> lajoskatona will send DNM patch for neutron-dynamic-routing to check jobs
15:05:18 <lajoskatona> It was sent and frickler actually found the issue in os-ken
15:05:38 <lajoskatona> so I sent the DNM and frickler did the rest of the work
15:06:02 <slaweq> thx lajoskatona and frickler
15:06:07 <slaweq> so are we good now with it?
15:06:08 <frickler> and quite a journey it was ;)
15:06:12 <slaweq> or is it in progress?
15:06:19 <frickler> we still need to test after the os-ken release
15:06:29 <frickler> because it isn't self-testing as mentioned earlier
15:06:29 <lajoskatona> if we have the os-ken release it should be fine
15:06:40 <frickler> I only tested on the held node I used for debugging
15:06:41 <slaweq> ok :)
15:06:56 <ykarel> or maybe fix the jobs so the os-ken from the patch gets used
15:06:56 <slaweq> thx a lot to both of you
15:07:40 <slaweq> ykarel do you mean to use os-ken from master in those jobs?
15:08:09 <ralonsoh> that was discussed in the previous meeting
15:08:25 <ykarel> slaweq, iiuc what frickler means is that the jobs running against os-ken patches are not using those patches but the released version
15:08:25 <ralonsoh> the n-d-r job in os-ken CI is not installing the tested patch
15:08:28 <slaweq> ahh, sorry. I probably missed it then
15:08:42 <ykarel> hmm I was also out last meeting so might be missing context
15:09:08 <ykarel> so I meant we should instead fix the jobs to work with os-ken patches and not wait for an actual release to test :)
15:09:26 <frickler> well the release is due this week anyway
15:09:39 <frickler> but fixing the job would be a good task to do, too
15:09:41 <ykarel> I'm also not sure if this is a regression or if it never worked for os-ken
15:09:56 <slaweq> ok, so this needs to be fixed indeed
15:12:37 <slaweq> frickler can you maybe check it this week and open an LP if we need to fix jobs in os-ken?
15:12:47 <slaweq> so we can track it at least and not forget about it
15:13:04 <frickler> well I'm pretty sure that it is broken
15:13:24 <frickler> ralonsoh was the one who wanted to take another look and open the bug
15:13:33 <slaweq> ok, thx
15:13:42 <slaweq> so ralonsoh, will you open an LP for it?
15:13:44 <ralonsoh> yes, I'll check that it is broken in the CI execution
15:13:50 <ralonsoh> yes, after checking the logs
15:14:09 <slaweq> thx
15:14:27 <slaweq> #action ralonsoh to check n-d-r os-ken jobs and open LP related to it
15:14:38 <slaweq> ok, I think we can move on
15:14:39 <slaweq> #topic Stable branches
15:14:49 <slaweq> bcafarel anything new/urgent?
15:15:07 <bcafarel> no, all good overall :)
15:15:23 <bcafarel> recent backports passed gates smoothly, up to ussuri
15:16:19 <slaweq> ok
15:16:24 <slaweq> so I think we can move on then
15:16:30 <slaweq> #topic Stadium projects
15:16:48 <slaweq> anything to discuss here? except n-d-r and os-ken which we already talked about
15:17:00 <lajoskatona> We discussed n-d-r, so that is one thing to keep an eye on
15:17:11 <lajoskatona> the other topic is bagpipe
15:17:27 <lajoskatona> it is failing with SQLAlchemy 2, I proposed a patch: https://review.opendev.org/c/openstack/networking-bagpipe/+/891325
15:17:59 <lajoskatona> but some tests still fail randomly for the sfc driver, so I have to spend some more time with it
15:18:38 <lajoskatona> that's it for the stadiums
15:18:50 <slaweq> are those random failures also related to SQLAlchemy 2.0? or something different?
15:19:28 <lajoskatona> no, I see them only with sqlalchemy 2
15:19:57 <slaweq> ok, so maybe ralonsoh and/or stephenfin will be able to help with them somehow
15:20:08 <ralonsoh> I'll try to find the issue there
15:20:14 <slaweq> thx a lot
15:20:27 <lajoskatona> thanks
15:20:56 <slaweq> next topic then
15:20:56 <slaweq> #topic Grafana
15:22:00 <slaweq> https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:22:08 <slaweq> I see that rally jobs were broken last week but it's fixed on the rally side already
15:22:17 <slaweq> other than that it's as usual
15:23:24 <mlavalle> +1
15:24:13 <slaweq> I think we can move on then
15:24:15 <slaweq> #topic Rechecks
15:24:51 <slaweq> it was a bit better last week already, but then there was this issue with rally and the issue with GLOBAL_VENV in devstack which made it a bit worse
15:25:06 <opendevreview> Merged openstack/neutron master: [sqlalchemy-20] TableClause.insert constructs Insert object  https://review.opendev.org/c/openstack/neutron/+/890939
15:25:09 <slaweq> but those problems are already fixed so I think it's pretty ok overall
15:25:38 <slaweq> so I think we can move on to talk about some specific failures
15:25:44 <slaweq> #topic fullstack/functional
15:26:01 <slaweq> here I found one new (for me) failure in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_maintenance.TestMaintenance.test_port_forwarding
15:26:08 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8a2/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-pyroute2-master/8a279fa/testr_results.html
15:26:36 <slaweq> it was in a periodic job so it's not related to any patch in progress
15:27:07 <slaweq> anyone want to check it deeper maybe? or if not we can wait and see if it happens more often
15:27:07 <ralonsoh> this is a callback, could be just a race condition
15:27:20 <ralonsoh> I can check it and maybe limit the check to the expected call
15:27:30 <slaweq> ralonsoh++ thx a lot
15:27:52 <slaweq> #action ralonsoh to check failure in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_maintenance.TestMaintenance.test_port_forwarding
15:28:23 <slaweq> and that's all regarding functional/fullstack tests
15:28:25 <slaweq> #topic Tempest/Scenario
15:28:38 <slaweq> here I noticed a kernel panic in a guest vm (again?):
15:28:45 <slaweq> https://cbf8616008e0e2c2dfec-9346de3bff5d83c6d90eefafd8632b44.ssl.cf1.rackcdn.com/884474/13/check/tempest-integrated-networking/0e81b62/testr_results.html
15:29:02 <slaweq> I'm not really even sure what Cirros version was used there
15:29:22 <slaweq> so maybe it's not an issue at all but just wanted to highlight here that I saw it again
15:29:30 <lajoskatona> cirros 6.2
15:29:46 <slaweq> so should be good, right?
15:29:54 <slaweq> maybe it's a new issue then, idk
15:30:00 <ykarel> no, shouldn't be related to cirros 6.2
15:30:23 <ykarel> I recall it's an old issue and it's worked around in our job by using uec images
15:30:55 <slaweq> ykarel - possibly, as this issue was in the tempest-integrated-networking job which is in the tempest repo
15:31:01 <ykarel> yeap
15:31:13 <slaweq> so if that happens more often we may need to propose the same workaround in that job too
15:31:24 <slaweq> let's keep an eye on it for now
15:31:30 <slaweq> is that ok for you?
15:31:42 <ykarel> +1
15:31:50 <ralonsoh> +1
15:31:55 <lajoskatona> +1
15:32:07 <slaweq> thx
15:32:11 <slaweq> so next topic
15:32:13 <slaweq> #topic grenade
15:32:33 <slaweq> I saw (again just once but wanted to mention it) some issue related to keystone: https://53ec660a16b30e470118-779b81139f4f29276caf956abf2a020f.ssl.cf2.rackcdn.com/890939/3/gate/neutron-ovs-grenade-dvr-multinode/f868b9c/controller/logs/grenade.sh_log.txt
15:32:59 <slaweq> did you see something like that already? is it something we should maybe report to the keystone team?
15:33:16 <ralonsoh> yes, could be useful for them to know this
15:33:31 <ralonsoh> doesn't seem to be related to Neutron
15:33:42 <slaweq> ok, I will let knikolla know about it
15:34:50 <slaweq> #topic Periodic
15:35:07 <slaweq> here I saw 2 issues which we need to handle somehow:
15:35:18 <slaweq> fullstack fips job broken: https://zuul.openstack.org/build/b87d8c3037a1417193c865bc576ac593
15:35:30 <slaweq> and Centos 9 Stream jobs broken:
15:35:30 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_cbf/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-master-centos-9-stream/cbf72a9/job-output.txt
15:35:30 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_533/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-ovs-release-fips/5331cd4/job-output.txt
15:35:45 <slaweq> anyone want to check those?
15:36:07 <ykarel> sounds related to the GLOBAL_VENV thing
15:36:20 <ralonsoh> right
15:36:35 * haleyb noticed the centos9 job too trying to recreate a bug, but didn't dig into it
15:36:49 <ralonsoh> I'll check the centos9 error
15:37:07 <frickler> cf. https://review.opendev.org/c/openstack/tempest/+/891517
15:37:24 <slaweq> ok, I will check the fullstack fips job then
15:37:39 <ralonsoh> ok so the centos9 issue seems to be solved there
15:37:40 <slaweq> #action ralonsoh to check Centos 9 stream jobs failures
15:37:59 <slaweq> thx frickler
15:38:13 <slaweq> #action slaweq to check fips fullstack job failures
15:38:38 <slaweq> that's all regarding periodic jobs from me
15:38:43 <slaweq> #topic On Demand
15:38:51 <slaweq> do you have anything else to discuss today?
15:39:22 <ykarel> just one thing as more people are here
15:39:31 <ykarel> I raised it over the patch https://review.opendev.org/c/openstack/neutron/+/892134
15:40:05 <ralonsoh> I think I addressed your comment
15:40:08 <ralonsoh> right?
15:40:21 <ralonsoh> I removed the experimental job
15:40:29 <ykarel> ralonsoh, yeap, related to duplicating jobs in periodic/experimental and check
15:40:48 <ralonsoh> yeah, let's have it only in the check queue
15:41:05 <ykarel> but I had a concern about blocking the CI with such jobs if master commits from sqlalchemy and alembic break something
15:41:11 <slaweq> ralonsoh but I also agree with ykarel that this job should maybe be a non-voting one in the check queue
15:41:25 <ralonsoh> at this point, that should always work
15:41:35 <ralonsoh> we should not include anything not compatible with sqlalchemy 2.0
15:42:04 <ralonsoh> but if you agree on this, I'll mark it as non-voting
15:42:05 <slaweq> yeah, but the point is - will sqlalchemy not merge anything breaking for us? :)
15:42:31 <ralonsoh> ok, I'll push a new patch marking it as non-voting
15:43:01 <haleyb> but we should pay attention to the job during review :)
15:43:30 <ykarel> yeap, non-voting jobs might go unnoticed
15:43:51 <slaweq> so maybe keep it voting for now and we can always switch it to non-voting in case of any problems from the sqlalchemy side
15:43:57 <ralonsoh> perfect
15:44:12 <ralonsoh> so, as it is now
15:44:19 <haleyb> +1
15:44:22 <slaweq> ok
15:44:23 <ykarel> ok, and hope it all goes well +1
15:44:36 <slaweq> I also have one additional topic/announcement for today
15:45:08 <slaweq> as you probably noticed, I have been chair of this CI meeting for quite some time (6+ years already if I'm not mistaken)
15:45:20 <slaweq> and recently I thought it would be good to pass it to someone else
15:45:35 <slaweq> so starting next week ykarel will be our new chair of the CI meeting
15:45:47 <ralonsoh> slaweq, thanks for all these years!
15:45:51 <slaweq> thx ykarel for stepping up to this role :)
15:45:56 <ralonsoh> and thanks ykarel for stepping up!
15:46:06 <lajoskatona> thanks for the efforts to keep these topics in focus
15:46:15 <ykarel> Thanks slaweq for all your efforts in those years
15:46:44 <mlavalle> thanks for leading the meeting for so long slaweq
15:46:49 <mlavalle> and welcome ykarel
15:47:06 <ykarel> thx everyone
15:47:12 <slaweq> and that's all from me for today
15:47:19 <lajoskatona> welcome ykarel as chair of this meeting :-)
15:47:26 <slaweq> if there are no other topics, I will give you back a few minutes today
15:48:09 <slaweq> ok, thx for attending and have a great week everyone
15:48:13 <slaweq> #endmeeting