14:00:07 <slaweq> #startmeeting neutron_drivers 14:00:08 <openstack> Meeting started Fri Nov 13 14:00:07 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot. 14:00:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 14:00:12 <openstack> The meeting name has been set to 'neutron_drivers' 14:00:15 <mlavalle> o/ 14:00:20 <lajoskatona> Hi 14:00:20 <slaweq> welcome everyone! 14:00:26 <obondarev> hi 14:01:15 <slaweq> lets wait few more minutes for other people to join 14:01:32 <ralonsoh> hi 14:02:44 <slaweq> ping haleyb: njohnston amotoki yamamoto 14:02:54 <slaweq> maybe they will join us soon 14:04:05 <mlavalle> don't pay them today 14:04:17 <obondarev> :D 14:05:13 <slaweq> LOL 14:05:35 <haleyb> hi, sorry i'm late 14:07:35 <slaweq> ok, lets start 14:07:42 <slaweq> we quorum already 14:07:57 <slaweq> and we have only one topic in the on demand agenda for today 14:08:01 <slaweq> no new RFEs to discuss 14:08:05 <slaweq> #topi On Demand 14:08:08 <slaweq> #topic On Demand 14:08:23 <slaweq> obondarev: topic was added by You so please go on :) 14:09:07 <obondarev> yep, it's about https://bugs.launchpad.net/neutron/+bug/1887523 14:09:08 <openstack> Launchpad bug 1887523 in neutron "Deadlock detection code can be stale" [High,In progress] 14:09:10 <obondarev> so let me briefly describe my point 14:09:42 <obondarev> I think the bug with it's current title, description and suggested approach is not quite correct 14:10:06 <obondarev> "Deadlock detection code can be stale" - don't think so 14:10:24 <obondarev> "neutron has it's own implementation of it which is missing a bunch of deadlocks" - not True 14:11:11 <obondarev> so indeed neutron has a few decorators on top of oslo.db one 14:11:13 <lajoskatona> yeah perhaps "neutron has its own extra decorators around oslo decorator" 14:11:30 <obondarev> but they were added for a reason 14:11:36 <obondarev> for many reasons in fact 14:12:05 <obondarev> neutron had a long history of fighting against DB errors in concurrent scenarios 14:12:29 <obondarev> git blame may shed some light on it: https://github.com/openstack/neutron/blame/stable/rocky/neutron/db/api.py 14:12:36 <obondarev> just a few examples 14:12:45 <obondarev> https://bugs.launchpad.net/neutron/+bug/1596075 14:12:47 <openstack> Launchpad bug 1596075 in neutron "Neutron confused about overlapping subnet creation" [High,Fix released] - Assigned to Kevin Benton (kevinbenton) 14:12:52 <obondarev> https://bugs.launchpad.net/neutron/+bug/1612798 14:12:53 <openstack> Launchpad bug 1612798 in neutron "Move db retry logic closer to where DB error occur" [Critical,Fix released] - Assigned to Kevin Benton (kevinbenton) 14:13:13 <obondarev> https://github.com/openstack/neutron/commit/948461c8b2fbeb30e4fa3a43cc523cff76327d4e 14:13:31 <obondarev> most were quite a tricky ones 14:14:00 <obondarev> so I don't think moving back to oslo.db decorator is the right thing to do 14:14:24 <obondarev> there were not much work done in oslo.db: https://github.com/openstack/oslo.db/commits/master/oslo_db/api.py 14:14:36 <mlavalle> you mean lately? 14:14:43 <obondarev> right 14:14:45 <obondarev> thanks 14:15:05 <obondarev> in fact oslo.db retries were started by neutron folks) 14:15:11 <mlavalle> LOL 14:15:11 <obondarev> IIRC 14:15:29 <obondarev> and in fact neutron retry logic now handles more cases 14:15:48 <slaweq> but from what is written in the bug description we are missing some of deadlocks and not handling them properly 14:15:50 <obondarev> some more info could be found here: https://github.com/openstack/neutron/blob/master/doc/source/contributor/internals/retries.rst 14:16:20 <obondarev> slaweq, correct, so I think the bug should be about quota deadlocks 14:16:29 <slaweq> maybe we should change this LP to something like "compare our implementation with oslo db and update our where it's needed" 14:16:29 <obondarev> not about bad neutron retry logic 14:17:05 <mlavalle> mhhhhh.... 14:17:06 <obondarev> at least the bug should clearly show where neutron retries are bad 14:17:33 <lajoskatona> yeah that 's perhaps missing 14:17:43 <obondarev> as if we just start replacing retry_if_session_inactive - we may got regressions 14:17:53 <mlavalle> in principle I might agree wioth you obondarev. However, I want to point out that Mohammed is the CEO of a big operator 14:17:54 <obondarev> wjich might be hard to spot 14:18:00 <lajoskatona> if I undrstand well from summit generally there are scaling issues with neutron and one thing was that db issues can be behind that 14:18:09 <obondarev> even with Loki service plugin 14:18:48 <mlavalle> so at the very least I would like to get more input from mnaser and hear more about his point of view 14:18:51 <obondarev> so in the bug comment and proposed patch I agree that quota retries have issues 14:19:25 <ralonsoh> let me remember you that we are still migrating to the new DB engine facade 14:19:35 <ralonsoh> that will remove subtransactions 14:19:37 <mlavalle> that is also true 14:19:42 <obondarev> but please let's not just replace retry_if_session_inactive for oslo-db.wrap_db_retries all over neutron 14:19:47 <ralonsoh> and other issues related to mixing both facades 14:20:18 <ralonsoh> so I recommend start investigating this after completing the migration 14:20:37 <obondarev> ralonsoh, so there is no evidence what's exactly wrong with quota deadlock issue, right? 14:20:53 <ralonsoh> I can't tell, sorry 14:20:59 <ralonsoh> but 14:21:26 <ralonsoh> with the new facade at least we know each thread has one single transaction without subtransactions 14:21:33 <ralonsoh> and the context is unique 14:21:36 <ralonsoh> (per thread) 14:21:45 <ralonsoh> e.g.: https://review.opendev.org/#/c/715315/20/neutron/tests/functional/services/portforwarding/test_port_forwarding.py 14:21:45 <patchbot> patch 715315 - neutron - Finish the new DB engine facade migration - 20 patch sets 14:22:24 <obondarev> great, so I'd suggest we put this bug on hold for now, and after new facade is there - try to reproduce and investigate 14:22:36 <ralonsoh> +1 to this 14:22:41 <mlavalle> I propose three steps: 14:23:09 <mlavalle> 1) finish the migration to the new db engine facade, as indicated by ralonsoh and obondarev 14:23:27 <mlavalle> 2) Limit this bug, for the time being, to the quota issue 14:23:58 <mlavalle> 3) Seek more input from mnaser. He may giove us some good insights, given his operational experience 14:24:06 <obondarev> +1 14:24:16 <slaweq> mlavalle++ 14:24:22 <lajoskatona> sounds good 14:24:24 <ralonsoh> +1 to this proposal 14:24:47 <slaweq> and also if there will be another similar issues to what we have now with quota, lets treat them separately as regular bugs 14:24:50 * mnaser is happy to chime in -- just ping me via email/ml (cc'd directly) or any other way :) 14:25:05 <ralonsoh> we can always increase the scope of 2) to other code places 14:25:14 <mlavalle> thanks mnaser ! 14:25:18 <njohnston> makes sense to me 14:26:04 <slaweq> ok, so I think we have agreement on that for now 14:26:15 <slaweq> I will sum it up in the LP comment after the meeting 14:26:29 <lajoskatona> thanks slaweq 14:26:30 <obondarev> thanks slaweq 14:27:50 <slaweq> so I think this topic is done for today, right? 14:27:57 <obondarev> + 14:27:57 <mlavalle> I think so 14:28:14 <slaweq> do You have anything else You want to discuss today? if not, I will give You 30 minutes back :) 14:28:34 <mlavalle> yaaay, weekened is now a bit closer! 14:28:41 <slaweq> LOL 14:28:50 <slaweq> for me it's almost there :) 14:28:53 <lajoskatona> it's time :-) 14:28:56 <obondarev> same here) 14:28:58 <slaweq> ok, thx for attending the meeting 14:29:01 <ralonsoh> bye! enjoy the weekend 14:29:06 <slaweq> and have a great weekend! 14:29:06 <njohnston> o/ 14:29:07 <obondarev> bye! 14:29:08 <lajoskatona> Bye 14:29:09 <mlavalle> o/ 14:29:09 <slaweq> #endmeeting