15:01:05 <slaweq> #startmeeting neutron_ci
15:01:05 <opendevmeet> Meeting started Tue Nov 23 15:01:05 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:05 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:05 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:01:10 <ralonsoh> hi
15:01:12 <slaweq> and welcome again :D
15:01:52 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:00 <obondarev> hi
15:02:47 <lajoskatona> Hi
15:02:58 <slaweq> ok, let's start
15:03:04 <slaweq> #topic Actions from previous meetings
15:03:13 <slaweq> lajoskatona to check why make FIP down took more than 120 seconds in the L3 agent
15:03:26 <slaweq> it was from two weeks ago, but lajoskatona wasn't there last week
15:03:27 <slaweq> :)
15:03:42 <lajoskatona> I haven't worked on it last week, sorry....
15:03:51 <bcafarel> o/ again
15:04:05 <lajoskatona> The logs are open, but I haven't dug into them....
15:04:37 <ykarel> o/
15:04:46 <slaweq> hi ykarel :)
15:04:55 <obondarev> does it still happen in the gates?
15:05:02 <slaweq> lajoskatona: ok, so should I assign it to You this week?
15:05:29 <slaweq> obondarev: yes, I saw it last week once: https://08d14f4ddffb82b199e7-61a732188f1643f755755f84f6310584.ssl.cf1.rackcdn.com/817525/7/check/neutron-tempest-plugin-scenario-linuxbridge/e4b6cfb/testr_results.html
15:05:29 <lajoskatona> yes, I have to check it to close those tabs in the browser finally ;)
15:05:44 <slaweq> #action lajoskatona to check why make FIP down took more than 120 seconds in the L3 agent
15:05:45 <obondarev> slaweq: ack
15:05:57 <lajoskatona> slaweq: thanks
15:06:17 <slaweq> ok, next one
15:06:19 <slaweq> slaweq to remove ussuri jobs from neutron-tempest-plugin queues
15:06:25 <slaweq> Patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/818688
15:06:32 <slaweq> And it requires https://review.opendev.org/c/openstack/releases/+/818687
15:07:07 <slaweq> please review them when You have a few minutes
15:07:25 <slaweq> next one
15:07:27 <slaweq> slaweq to open LP for issue with neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovsdb_monitor.TestAgentMonitor.test_network_agent_present
15:07:33 <slaweq> Done: https://bugs.launchpad.net/neutron/+bug/1951225
15:07:48 <ralonsoh> BTW, we are currently working (again) on a refactor of the OVN agent code
15:07:54 <slaweq> jlibosva is already taking care of it
15:07:55 <ralonsoh> that could affect this test
15:08:07 <ralonsoh> so I'll ping him then
15:08:11 <slaweq> I saw it a couple of times again this week
15:08:25 <obondarev> +1
15:08:49 <slaweq> next one
15:08:51 <slaweq> slaweq to check failed neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_handle_initial_state_backup
15:08:56 <slaweq> I still didn't have time
15:09:02 <slaweq> I will try to check it this week
15:09:09 <slaweq> #action slaweq to check failed neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_handle_initial_state_backup
15:09:17 <slaweq> and last one
15:09:19 <slaweq> slaweq to send email about rechecks numbers to the ML
15:09:29 <slaweq> I did http://lists.openstack.org/pipermail/openstack-discuss/2021-November/025813.html
15:09:35 <slaweq> and there is good feedback there
15:09:55 <slaweq> I will collect all of that for next week's meeting and we can discuss all ideas during the meeting next week
15:10:06 <ralonsoh> slaweq++
15:10:42 <slaweq> #topic Stable branches
15:10:42 <slaweq> bcafarel: any updates?
15:10:48 <lajoskatona> the release team is now busy with yoga-1 I suppose so perhaps the release patch will be slow to land, I will ping them to be sure
15:11:03 <bcafarel> overall quite good this week
15:11:05 <lajoskatona> sorry this is for the tempest-plugin stuff
15:11:05 <slaweq> lajoskatona: thx
15:11:22 <bcafarel> with https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/817933 merged (thanks slaweq) for older branches
15:11:36 <slaweq> good to hear that
15:11:48 <bcafarel> for rocky we may need to update the filtered-out tests a bit, https://review.opendev.org/c/openstack/neutron/+/808502 has a lot of rechecks
15:11:59 <slaweq> yeah, I saw it
15:12:09 <bcafarel> but >=stein are quite good, and queens too so only one branch :)
15:12:42 <lajoskatona> by the way I have this stein patch for odl: https://review.opendev.org/c/openstack/networking-odl/+/818384
15:13:07 <slaweq> it's for rocky, not stein :)
15:13:15 <slaweq> I think stein was merged recently, no?
15:13:19 <lajoskatona> oh really the stein one is merged....
15:14:12 <slaweq> I will review that one tomorrow
15:14:16 <lajoskatona> thanks
15:14:40 <slaweq> so if that's all for stable branches, we can move on to the next topic
15:14:52 <slaweq> #topic Stadium projects
15:15:25 <lajoskatona> no news from them
15:16:50 <slaweq> ok, let's move on
15:16:56 <slaweq> #topic Grafana
15:17:31 <slaweq> as for good news - scenario jobs are surprisingly good recently :)
15:17:42 <slaweq> but there is bad news also:
15:17:55 <slaweq> it seems that functional tests are failing often again
15:18:14 <slaweq> and UT have also had a high failure rate for the last few days
15:19:00 <ralonsoh> that could be due to failed patches
15:19:20 <obondarev> I also didn't see failed UT unrelated to the patch recently
15:19:22 <slaweq> ralonsoh: yes, when I was checking specific patches from last week I noticed that
15:19:35 <slaweq> there were a lot of patches with UT broken because of the patch itself
15:19:53 <slaweq> so I think we just need to keep an eye on it for the next few days
15:20:16 <slaweq> but I also noticed a timeout in the lower-constraints job a few times, like:
15:20:21 <slaweq> https://zuul.opendev.org/t/openstack/build/93fa264c3b0e482590369e8b31fa7546/logs
15:20:21 <slaweq> https://zuul.opendev.org/t/openstack/build/b73ff16dc4ec4606a3244ee36f699b43/logs
15:20:21 <slaweq> https://zuul.opendev.org/t/openstack/build/bba203de96d54d95be30b7309a902a38
15:21:02 <lajoskatona> the l-c job timeout was not increased last time?
15:21:14 <ralonsoh> I don't think so
15:21:18 <lajoskatona> ok
15:21:45 <slaweq> so maybe we should increase it too?
15:21:50 <slaweq> as for other jobs?
15:22:19 <ralonsoh> yes
15:22:23 <lajoskatona> yes
15:22:35 <lajoskatona> I can push it for review
15:22:38 <obondarev> it should be connected to UT timeout + some gap for coverage stuff
15:22:49 <lajoskatona> obondarev: +1
15:22:58 <obondarev> as far as I understand
15:23:00 <ralonsoh> right
15:23:25 <opendevreview> Terry Wilson proposed openstack/neutron master: WIP Use neutron db for ovn agents  https://review.opendev.org/c/openstack/neutron/+/818850
15:24:25 <slaweq> thx
15:24:42 <slaweq> #action lajoskatona to increase timeout of the lower-constraints job in neutron
15:25:03 <slaweq> ok, let's move on
15:25:08 <slaweq> #topic fullstack/functional
15:25:31 <slaweq> in functional tests I saw mostly issues with those ovn agent tests
15:25:57 <slaweq> ralonsoh: maybe we should mark those tests as unstable temporarily until jlibosva fixes them?
15:25:59 <slaweq> wdyt?
15:26:09 <ralonsoh> ok
15:26:24 <slaweq> ok, can You propose a patch for that?
15:26:26 <ralonsoh> suer
15:26:29 <ralonsoh> sure
15:26:32 <slaweq> thx a lot
15:26:35 <jlibosva> is it reproducible well in the gate? I ran functional tests several times locally and didn't hit it even a single time
15:26:54 <slaweq> jlibosva: maybe not "well" but it happens pretty often I would say
15:27:03 <slaweq> like e.g.: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_db9/816856/6/check/neutron-functional-with-uwsgi/db9ba2c/testr_results.html
15:27:11 <slaweq> probably You can find other similar examples
15:27:32 <ralonsoh> I'll do
15:27:40 <slaweq> thx ralonsoh
15:29:03 <obondarev> another: https://2636029923414340a2c8-da9bd67ce2281bb41bf3df3e007407f6.ssl.cf1.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/d20e295/testr_results.html
15:29:19 <ralonsoh> in any case, I'll wait for the OVN agent refactor
15:29:20 <opendevreview> yatin proposed openstack/neutron master: Fix tunnel_types in ml2 ovs sample config  https://review.opendev.org/c/openstack/neutron/+/818911
15:29:43 <slaweq> thx obondarev
15:30:15 <slaweq> and that's basically the only issue I saw repeating last week in the functional job
15:30:29 <slaweq> regarding fullstack, I proposed a small improvement today https://review.opendev.org/c/openstack/neutron/+/818877
15:30:35 <slaweq> please check when You have time
15:30:36 <jlibosva> it looks like the problem there is that OVN event gets processed before mech driver initialization is completed - which the agent refactor will fix only for the agent but will not fix the root cause of it
15:31:01 <obondarev> not sure if it's the same: https://7bed286b1abc124aef60-716c2febf7f730d66787c22f3ed0da3e.ssl.cf5.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/48adb0f/testr_results.html
15:31:46 <obondarev> seems some other issue
15:32:41 <jlibosva> obondarev: that looks like max_tunid is missing in NB_Global table when creating network -
15:32:43 <jlibosva> https://7bed286b1abc124aef60-716c2febf7f730d66787c22f3ed0da3e.ssl.cf5.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/48adb0f/controller/logs/dsvm-functional-logs/neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.test_mech_driver.TestProvnetPorts.test_network_segments_localnet_ports/testrun.txt
15:33:39 <slaweq> obondarev: isn't that related to https://bugs.launchpad.net/neutron/+bug/1903008 ?
15:34:06 <slaweq> or maybe it's something new
15:34:41 <obondarev> slaweq: not sure, sorry
15:36:29 <slaweq> it seems that the stacktrace is different there
15:36:36 <slaweq> I will check it
15:36:58 <slaweq> #action slaweq to check https://7bed286b1abc124aef60-716c2febf7f730d66787c22f3ed0da3e.ssl.cf5.rackcdn.com/818067/2/check/neutron-functional-with-uwsgi/48adb0f/testr_results.html
15:38:21 <slaweq> ok, I think we can move on to the next topics now
15:38:26 <slaweq> #topic Tempest/Scenario
15:38:54 <slaweq> here, I found the same test, neutron_tempest_plugin.scenario.test_mac_learning.MacLearningTest, failing 2 or 3 times in the ovn scenario job
15:38:58 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b75/815962/5/check/neutron-tempest-plugin-scenario-ovn/b75474f/testr_results.html
15:38:58 <slaweq> https://b30211aa4f809fc4a91b-baf4f807d40559415da582760ebf9456.ssl.cf2.rackcdn.com/817525/7/check/neutron-tempest-plugin-scenario-ovn/c356679/testr_results.html
15:39:12 <slaweq> I think we should investigate why in that test ssh is not working from time to time
15:39:22 <slaweq> maybe it's coincidence, idk really
15:39:53 <slaweq> anyone wants to investigate that?
15:41:06 <slaweq> ok, if I have time, I will try to check it
15:41:12 <slaweq> if not, let's wait for next week :)
15:41:17 <slaweq> I will also report LP for that
15:41:33 <slaweq> #action slaweq to report bug about failing neutron_tempest_plugin.scenario.test_mac_learning.MacLearningTest
15:41:52 <slaweq> and that's all I had prepared for You today
15:42:02 <slaweq> we have one on demand topic
15:42:03 <slaweq> #topic On Demand
15:42:11 <slaweq> lajoskatona: move on with what You wanted to discuss
15:42:53 <lajoskatona> it's about gate load
15:44:08 <lajoskatona> history: at the beginning of this year a list was sent with data showing that Neutron is one of the biggest users of gate resources, and I asked clarkb if we can repeat the execution
15:44:24 <lajoskatona> https://paste.opendev.org/show/bYTlHXfbX84aESK6cLMM/
15:45:15 <ralonsoh> (I feel guilty looking at those figures)
15:45:16 <lajoskatona> the issue is that due to some zuul changes this is for only one executor, so we need some other way to get real data on each project's resource usage
15:45:50 <slaweq> but it's clear for me that neutron is still in the very top of that list :/
15:45:59 <lajoskatona> In theory everything is available on https://graphite.opendev.org/
15:46:29 <lajoskatona> yeah with clarkb we checked on graphite and the results from there correlate with the paste
15:47:16 <obondarev> and that's why we started this 'rechecks' brainstorm, right?
15:47:20 <lajoskatona> the suggestion is to create a grafana dashboard, which I started: https://review.opendev.org/c/openstack/project-config/+/818230
15:47:51 <lajoskatona> the issue with it is mostly my lack of grafana background :-)
15:48:55 <slaweq> I can't help with that :)
15:48:55 <slaweq> sorry
15:49:04 <lajoskatona> I suppose if we have fresh data from this we could better act on things like having a lot of rechecks and more jobs than other projects
15:49:45 <obondarev> +1
15:50:14 <obondarev> and to track out progress with improving those numbers
15:50:18 <obondarev> our*
15:50:29 <slaweq> +!
15:50:31 <slaweq> +1
15:50:32 <slaweq> :)
15:50:39 <ralonsoh> yes, good idea
15:50:41 <slaweq> thx for taking care of it lajoskatona
15:50:50 <ralonsoh> and waiting for slaweq's summary
15:50:55 <lajoskatona> ok, I will ask around to improve this dashboard and we can check that regularly
15:51:12 <slaweq> ++
15:51:25 <opendevreview> Terry Wilson proposed openstack/neutron master: WIP Use neutron db for ovn agents  https://review.opendev.org/c/openstack/neutron/+/818850
15:51:27 <lajoskatona> yeah I will answer that mail too :-)
15:52:28 <slaweq> ok, I think we are good with that topic for now
15:52:35 <slaweq> anything else You want to discuss today?
15:52:48 <lajoskatona> nothing from me
15:52:58 <ralonsoh> nothing, thanks
15:53:13 <bcafarel> all good too
15:53:14 <slaweq> if not, we can finish the meeting I guess :)
15:53:21 <slaweq> thx for attending the meeting today
15:53:26 <slaweq> and have a great week
15:53:26 <obondarev> bye everyone!
15:53:29 <slaweq> #endmeeting