15:01:05 <slaweq> #startmeeting neutron_ci
15:01:05 <opendevmeet> Meeting started Tue Nov 23 15:01:05 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:05 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:05 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:01:10 <ralonsoh> hi
15:01:12 <slaweq> and welcome again :D
15:01:52 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:00 <obondarev> hi
15:02:47 <lajoskatona> Hi
15:02:58 <slaweq> ok, let's start
15:03:04 <slaweq> #topic Actions from previous meetings
15:03:13 <slaweq> lajoskatona to check why make FIP down took more than 120 seconds in the L3 agent
15:03:26 <slaweq> it was from two weeks ago but lajoskatona wasn't here last week
15:03:27 <slaweq> :)
15:03:42 <lajoskatona> I didn't work on it last week, sorry....
15:03:51 <bcafarel> o/ again
15:04:05 <lajoskatona> The logs are open, but I haven't dug into them yet....
15:04:37 <ykarel> o/
15:04:46 <slaweq> hi ykarel :)
15:04:55 <obondarev> does it still happen in the gates?
15:05:02 <slaweq> lajoskatona: ok, so should I assign it to You this week?
15:05:29 <slaweq> obondarev: yes, I saw it once last week: https://08d14f4ddffb82b199e7-61a732188f1643f755755f84f6310584.ssl.cf1.rackcdn.com/817525/7/check/neutron-tempest-plugin-scenario-linuxbridge/e4b6cfb/testr_results.html
15:05:29 <lajoskatona> yes, I have to check it so I can finally close those tabs in the browser ;)
15:05:44 <slaweq> #action lajoskatona to check why make FIP down took more than 120 seconds in the L3 agent
15:05:45 <obondarev> slaweq: ack
15:05:57 <lajoskatona> slaweq: thanks
15:06:17 <slaweq> ok, next one
15:06:19 <slaweq> slaweq to remove ussuri jobs from neutron-tempest-plugin queues
15:06:25 <slaweq> Patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/818688
15:06:32 <slaweq> And it requires https://review.opendev.org/c/openstack/releases/+/818687
15:07:07 <slaweq> please review them when You have a few minutes
15:07:25 <slaweq> next one
15:07:27 <slaweq> slaweq to open LP for issue with neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovsdb_monitor.TestAgentMonitor.test_network_agent_present
15:07:33 <slaweq> Done: https://bugs.launchpad.net/neutron/+bug/1951225
15:07:48 <ralonsoh> BTW, we are currently working (again) on a refactor of the OVN agent code
15:07:54 <slaweq> jlibosva is already taking care of it
15:07:55 <ralonsoh> that could affect this test
15:08:07 <ralonsoh> so I'll ping him then
15:08:11 <slaweq> I saw it a couple of times again this week
15:08:25 <obondarev> +1
15:08:49 <slaweq> next one
15:08:51 <slaweq> slaweq to check failed neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_handle_initial_state_backup
15:08:56 <slaweq> I didn't have time (still)
15:09:02 <slaweq> I will try to check it this week
15:09:09 <slaweq> #action slaweq to check failed neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_handle_initial_state_backup
15:09:17 <slaweq> and last one
15:09:19 <slaweq> slaweq to send email about rechecks numbers to the ML
15:09:29 <slaweq> I did http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025813.html
15:09:35 <slaweq> and there is good feedback there
15:09:55 <slaweq> I will collect all of that for next week's meeting and we can discuss all the ideas then
15:10:06 <ralonsoh> slaweq++
15:10:42 <slaweq> #topic Stable branches
15:10:42 <slaweq> bcafarel: any updates?
15:10:48 <lajoskatona> the release team is busy with yoga-1 now, I suppose, so the release patch may be slow to land; I will ping them to be sure
15:11:03 <bcafarel> overall quite good this week
15:11:05 <lajoskatona> sorry, that was about the tempest-plugin stuff
15:11:05 <slaweq> lajoskatona: thx
15:11:22 <bcafarel> with https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/817933 merged (thanks slaweq) for older branches
15:11:36 <slaweq> good to hear that
15:11:48 <bcafarel> for rocky we may need to update the filtered-out tests a bit; https://review.opendev.org/c/openstack/neutron/+/808502 has a lot of rechecks
15:11:59 <slaweq> yeah, I saw it
15:12:09 <bcafarel> but >=stein are quite good, and queens too, so only one branch :)
15:12:42 <lajoskatona> by the way, I have this stein patch for odl: https://review.opendev.org/c/openstack/networking-odl/+/818384
15:13:07 <slaweq> it's for rocky, not stein :)
15:13:15 <slaweq> I think stein was merged recently, no?
15:13:19 <lajoskatona> oh, really, the stein one is merged....
15:14:12 <slaweq> I will review that one tomorrow
15:14:16 <lajoskatona> thanks
15:14:40 <slaweq> so if that's all for stable branches, we can move on to the next topic
15:14:52 <slaweq> #topic Stadium projects
15:15:25 <lajoskatona> no news from them
15:16:50 <slaweq> ok, let's move on
15:16:56 <slaweq> #topic Grafana
15:17:31 <slaweq> the good news: scenario jobs have been surprisingly good recently :)
15:17:42 <slaweq> but there is also bad news:
15:17:55 <slaweq> it seems that functional tests are failing often again
15:18:14 <slaweq> and UT have also had a high failure rate for the last few days
15:19:00 <ralonsoh> that could be due to failed patches
15:19:20 <obondarev> I also haven't seen UT failures unrelated to the patch recently
15:19:22 <slaweq> ralonsoh: yes, when I was checking specific patches from last week I noticed that
15:19:35 <slaweq> there were a lot of patches whose UT were really broken by the patch itself
15:19:53 <slaweq> so I think we just need to keep an eye on it for the next few days
15:20:16 <slaweq> but I also noticed timeouts in the lower-constraints job a few times, like:
15:20:21 <slaweq> https://zuul.opendev.org/t/openstack/build/93fa264c3b0e482590369e8b31fa7546/logs
15:20:21 <slaweq> https://zuul.opendev.org/t/openstack/build/b73ff16dc4ec4606a3244ee36f699b43/logs
15:20:21 <slaweq> https://zuul.opendev.org/t/openstack/build/bba203de96d54d95be30b7309a902a38
15:21:02 <lajoskatona> wasn't the l-c job timeout increased last time?
15:21:14 <ralonsoh> I don't think so
15:21:18 <lajoskatona> ok
15:21:45 <slaweq> so maybe we should increase it too?
15:21:50 <slaweq> as for other jobs?
15:22:19 <ralonsoh> yes
15:22:23 <lajoskatona> yes
15:22:35 <lajoskatona> I can push it for review
15:22:38 <obondarev> it should be tied to the UT timeout + some margin for the coverage stuff
15:22:49 <lajoskatona> obondarev: +1
15:22:58 <obondarev> as far as I understand
15:23:00 <ralonsoh> right
15:23:25 <opendevreview> Terry Wilson proposed openstack/neutron master: WIP Use neutron db for ovn agents  https://review.opendev.org/c/openstack/neutron/+/818850
15:24:25 <slaweq> thx
15:24:42 <slaweq> #action lajoskatona to increase timeout of the lower-constraints job in neutron
15:25:03 <slaweq> ok, let's move on
15:25:08 <slaweq> #topic fullstack/functional
15:25:31 <slaweq> in functional tests I mostly saw issues with those ovn agent tests
15:25:57 <slaweq> ralonsoh: maybe we should temporarily mark those tests as unstable until jlibosva fixes it?
15:25:59 <slaweq> wdyt?
15:26:09 <ralonsoh> ok
15:26:24 <slaweq> ok, can You propose a patch for that?
15:26:26 <ralonsoh> suer
15:26:29 <ralonsoh> sure
15:26:32 <slaweq> thx a lot
15:26:35 <jlibosva> is it reproducible well in the gate? I ran functional tests several times locally and didn't hit it even a single time
15:26:54 <slaweq> jlibosva: maybe not "well" but it happens pretty often I would say
15:27:03 <slaweq> like e.g.: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_db9/816856/6/check/neutron-functional-with-uwsgi/db9ba2c/testr_results.html
15:27:11 <slaweq> You can probably find other similar examples
15:27:32 <ralonsoh> I'll do it
15:27:40 <slaweq> thx ralonsoh
15:29:03 <obondarev> another: https://2636029923414340a2c8-da9bd67ce2281bb41bf3df3e007407f6.ssl.cf1.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/d20e295/testr_results.html
15:29:19 <ralonsoh> in any case, I'll wait for the OVN agent refactor
15:29:20 <opendevreview> yatin proposed openstack/neutron master: Fix tunnel_types in ml2 ovs sample config  https://review.opendev.org/c/openstack/neutron/+/818911
15:29:43 <slaweq> thx obondarev
15:30:15 <slaweq> and that's basically the only recurring issue I saw in the functional job last week
15:30:29 <slaweq> regarding fullstack, I proposed a small improvement today https://review.opendev.org/c/openstack/neutron/+/818877
15:30:35 <slaweq> please check it when You have time
15:30:36 <jlibosva> it looks like the problem there is that the OVN event gets processed before the mech driver initialization is completed; the agent refactor will fix that only for the agent, but it will not fix the root cause
15:31:01 <obondarev> not sure if it's the same: https://7bed286b1abc124aef60-716c2febf7f730d66787c22f3ed0da3e.ssl.cf5.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/48adb0f/testr_results.html
15:31:46 <obondarev> seems to be some other issue
15:32:41 <jlibosva> obondarev: that looks like max_tunid is missing in the NB_Global table when creating the network:
15:32:43 <jlibosva> https://7bed286b1abc124aef60-716c2febf7f730d66787c22f3ed0da3e.ssl.cf5.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/48adb0f/controller/logs/dsvm-functional-logs/neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestProvnetPorts.test_network_segments_localnet_ports/testrun.txt
15:33:39 <slaweq> obondarev: isn't that related to https://bugs.launchpad.net/neutron/+bug/1903008 ?
15:34:06 <slaweq> or maybe it's something new
15:34:41 <obondarev> slaweq: not sure, sorry
15:36:29 <slaweq> it seems that the stacktrace is different there
15:36:36 <slaweq> I will check it
15:36:58 <slaweq> #action slaweq to check https://7bed286b1abc124aef60-716c2febf7f730d66787c22f3ed0da3e.ssl.cf5.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/48adb0f/testr_results.html
15:38:21 <slaweq> ok, I think we can move on to the next topic now
15:38:26 <slaweq> #topic Tempest/Scenario
15:38:54 <slaweq> here, I found the same test, neutron_tempest_plugin.scenario.test_mac_learning.MacLearningTest, failing 2 or 3 times in the ovn scenario job
15:38:58 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b75/815962/5/check/neutron-tempest-plugin-scenario-ovn/b75474f/testr_results.html
15:38:58 <slaweq> https://b30211aa4f809fc4a91b-baf4f807d40559415da582760ebf9456.ssl.cf2.rackcdn.com/817525/7/check/neutron-tempest-plugin-scenario-ovn/c356679/testr_results.html
15:39:12 <slaweq> I think we should investigate why ssh sometimes doesn't work in that test
15:39:22 <slaweq> maybe it's a coincidence, idk really
15:39:53 <slaweq> does anyone want to investigate that?
15:41:06 <slaweq> ok, if I have time, I will try to check it
15:41:12 <slaweq> if not, let's wait for next week :)
15:41:17 <slaweq> I will also report an LP bug for that
15:41:33 <slaweq> #action slaweq to report bug about failing neutron_tempest_plugin.scenario.test_mac_learning.MacLearningTest
15:41:52 <slaweq> and that's all I had prepared for You today
15:42:02 <slaweq> we have one on-demand topic
15:42:03 <slaweq> #topic On Demand
15:42:11 <slaweq> lajoskatona: go ahead with what You wanted to discuss
15:42:53 <lajoskatona> it's about gate load
15:44:08 <lajoskatona> history: at the beginning of this year a list was sent with data showing that Neutron is one of the biggest users of gate resources, and I asked clarkb if we could repeat that run
15:44:24 <lajoskatona> https://paste.opendev.org/show/bYTlHXfbX84aESK6cLMM/
15:45:15 <ralonsoh> (I feel guilty looking at those figures)
15:45:16 <lajoskatona> the issue is that, due to some zuul changes, this covers only one executor, so we need some other way to get real data on each project's resource usage
15:45:50 <slaweq> but it's clear to me that neutron is still at the very top of that list :/
15:45:59 <lajoskatona> In theory everything is available on https://graphite.opendev.org/
15:46:29 <lajoskatona> yeah, clarkb and I checked graphite and the results there correlate with the paste
15:47:16 <obondarev> and that's why we started this 'rechecks' brainstorm, right?
15:47:20 <lajoskatona> the suggestion is to create a grafana dashboard, which I started: https://review.opendev.org/c/openstack/project-config/+/818230
15:47:51 <lajoskatona> the issue with it is mostly my lack of Grafana background :-)
15:48:55 <slaweq> I can't help with that :)
15:48:55 <slaweq> sorry
15:49:04 <lajoskatona> I suppose with fresh data from this we could act better on things like having a lot of rechecks and more jobs than other projects
15:49:45 <obondarev> +1
15:50:14 <obondarev> and to track out progress with improving those numbers
15:50:18 <obondarev> our*
15:50:29 <slaweq> +!
15:50:31 <slaweq> +1
15:50:32 <slaweq> :)
15:50:39 <ralonsoh> yes, good idea
15:50:41 <slaweq> thx for taking care of it lajoskatona
15:50:50 <ralonsoh> and waiting for slaweq's summary
15:50:55 <lajoskatona> ok, I will ask around to improve this dashboard and we can check it regularly
15:51:12 <slaweq> ++
15:51:25 <opendevreview> Terry Wilson proposed openstack/neutron master: WIP Use neutron db for ovn agents  https://review.opendev.org/c/openstack/neutron/+/818850
15:51:27 <lajoskatona> yeah, I will reply to that mail too :-)
15:52:28 <slaweq> ok, I think we are good with that topic for now
15:52:35 <slaweq> anything else You want to discuss today?
15:52:48 <lajoskatona> nothing from me
15:52:58 <ralonsoh> nothing, thanks
15:53:13 <bcafarel> all good too
15:53:14 <slaweq> if not, we can finish the meeting I guess :)
15:53:21 <slaweq> thx for attending the meeting today
15:53:26 <slaweq> and have a great week
15:53:26 <obondarev> bye everyone!
15:53:29 <slaweq> #endmeeting