14:01:29 <lajoskatona> #startmeeting neutron_drivers
14:01:29 <opendevmeet> Meeting started Fri May 13 14:01:29 2022 UTC and is due to finish in 60 minutes.  The chair is lajoskatona. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:29 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:29 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:01:33 <mlavalle> o/
14:01:33 <lajoskatona> o/
14:01:37 <slaweq> o/
14:01:43 <mtomaska> o/
14:01:50 <haleyb> hi
14:01:54 <amotoki> hi
14:02:07 <ralonsoh> hi
14:03:03 <lajoskatona> I think we can start
14:03:29 <lajoskatona> We have one RFE for today and one more topic from ralonsoh
14:03:32 <lajoskatona> [RFE][fwaas][OVN]support l3 firewall for ovn driver (#link https://bugs.launchpad.net/neutron/+bug/1971958)
14:03:34 <damiandabrowski[m]> hi!
14:04:23 <ralonsoh> (nice and complex RFE)
14:04:23 <lajoskatona> this RFE is as I see is the official thing for why Inspur arrived to maintain neutron-fwaas
14:04:59 <slaweq> yeah, IMO it should be straightforward as that's the reason why we revived neutron-fwaas mostly
14:05:23 <lajoskatona> I am not sure if we have to ask a spec for this where details can be discussed or just let it flow ?
14:05:40 <mlavalle> yes, the way I see this, is that we already approved this RFE
14:06:11 <ralonsoh> I would ask, in the LP, for the goals in this RFE
14:06:20 <ralonsoh> what are they expecting from fwaas in OVN
14:06:43 <lajoskatona> ralonsoh: true that can be listed to have clear focus
14:06:58 <mlavalle> +1
14:07:02 <amotoki> +1
14:07:13 <slaweq> maybe we should open Blueprint to track all of that work?
14:07:43 <lajoskatona> yes we can I think
14:07:49 <amotoki> i think a blueprint would be useful
14:07:56 <mlavalle> +1
14:08:57 <lajoskatona> ok, so let's accept it with a blueprint to track progress, and ask details in LP bug for the exact goals
14:09:28 <slaweq> +1
14:09:38 <amotoki> +1
14:09:52 <mlavalle> +1
14:10:00 <ralonsoh> +1
14:10:06 <haleyb> +1
14:10:33 <lajoskatona> thanks, let's move on
14:10:43 <lajoskatona> #topic On Demand Agenda
14:10:51 <lajoskatona> (ralonsoh): (added on 14/5/22) https://bugs.launchpad.net/neutron/+bug/1973049: with the SQLAlchemy 2.0 compatibility patch in place (n-lib 2.21.0), there are some issues when an API call (e.g.: user command) updates the same resource as another RPC call (agent call). This is very frequent with port status and FIP status. Sometimes (I still need to investigate the reason), the retry command detects that there is another SQ
14:10:51 <lajoskatona> ansaction in progress (https://paste.opendev.org/show/beHzMCKjVLmE5nX0ZXZX/).
14:10:57 <ralonsoh> thanks
14:10:57 <lajoskatona> A long sentence
14:11:06 <ralonsoh> summary:
14:11:07 <ralonsoh> This is a recurrent problem in the Neutron server when updating a resource that has "standardattributes". If a concurrent update is done, the DB (SQLAlchemy) will return a "StaleDataError" exception. E.g.: [1]
14:11:07 <ralonsoh> """
14:11:07 <ralonsoh> UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched.
14:11:07 <ralonsoh> """
14:11:34 <ralonsoh> that usually happens when a port or FIP is updated from two different agents
14:11:48 <ralonsoh> or the RPC (agent) and API call (user)
14:11:54 <slaweq> because of revision numbers, right?
14:11:59 <ralonsoh> yes
14:12:01 <lajoskatona> but we have this only since sqlalchemy 2.0 work ws merged?
14:12:06 <ralonsoh> no no
14:12:16 <ralonsoh> this is something that has been happening always
14:12:22 <lajoskatona> ahh, ok ,thanks
14:12:27 <mlavalle> it's a long standing thing
14:12:28 <slaweq> I think it is also with older SQLAlchemy
14:12:40 <ralonsoh> I'm not sure
14:12:49 <ralonsoh> because this is related to the object mapper
14:12:50 <ralonsoh> one sec
14:13:07 <ralonsoh> __mapper_args__ = {
14:13:07 <ralonsoh> # see http://docs.sqlalchemy.org/en/latest/orm/versioning.html for
14:13:07 <ralonsoh> # details about how this works
14:13:07 <ralonsoh> "version_id_col": id,
14:13:15 <ralonsoh> (this is in neutron-lib)
14:13:52 <ralonsoh> sqlalchemy will use (not ID, this is my change) "revision_number" as version_id_col
14:14:10 <ralonsoh> my proposal: https://bugs.launchpad.net/neutron/+bug/1973049/comments/2
14:15:46 <slaweq> does your comment means that in case of such conflict, we will not have correct revision number for resource (it will be one less that it should be)?
14:15:58 <slaweq> or am I misunderstanding something there?
14:16:00 <ralonsoh> slaweq, we'll have the latest one updated
14:16:34 <ralonsoh> but yes, it will be one less
14:16:46 <mlavalle> why keep the revision number at all, then?
14:16:46 <ralonsoh> in case of two commands
14:17:02 <slaweq> so if e.g. 2 workers will want to update same port and each of them will try to bump revision number to let's say from 9 to 10, both will do updates of the port but revision number will be 10 instead of 11 finally, right?
14:17:21 <ralonsoh> yes
14:17:32 <slaweq> mlavalle: revision_numbers are pretty important e.g. for ovn driver
14:17:33 <ralonsoh> mlavalle, this is something I would need to check
14:17:51 <slaweq> but it's in api too so I'm not sure we can remove it really
14:17:56 <ralonsoh> exactly... so this could be intrusive
14:18:07 <slaweq> maybe other software, like e.g. heat relies on it, idk
14:18:12 <lajoskatona> yes it can hit all clients I suppose
14:18:14 <mlavalle> I wasn't proposing removing revision numbers
14:18:31 <mlavalle> it was a rethroical question. Let me be more straight:
14:19:05 <mlavalle> revision_numbers protect us from multiple updates taking place at once
14:19:16 <mlavalle> isn't that true?
14:19:52 <ralonsoh> not really
14:20:16 <ralonsoh> well, it will prevent the DB from accepting the transaction
14:20:20 <ralonsoh> it will rollback
14:21:43 <slaweq> I'm not sure if we can do things like that, to e.g. not update revision number in some cases as this will break that described behaviour https://docs.openstack.org/api-ref/network/v2/#revisions IMO
14:21:51 <lajoskatona> as I see it is like a poor man's transaction guard :-) you ask the resource check the rev number and update with it and if the server check is different for the number the operation fails
14:22:03 <mlavalle> yeap
14:22:04 <ralonsoh> this is only enabled with service revisions and version_check=True
14:22:47 <ralonsoh> ok, I'll review this proposal again
14:22:56 <ralonsoh> I won't spend more of your time
14:23:18 <slaweq> can
14:23:20 <slaweq> sorry
14:23:38 <mlavalle> I think this revision number stuff was inspired by this: http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/
14:23:42 <mlavalle> originally
14:23:49 <slaweq> can't we e.g use some distributed lock to lock resource which is going to be updated
14:24:02 <slaweq> so that no other workers will try to update same resource in same time
14:24:02 <ralonsoh> nonono, no distributed locks
14:24:07 <ralonsoh> we have the DB for this
14:24:08 <mlavalle> it's the compare and swap approach
14:24:21 <ralonsoh> for reservations, we have our own engine
14:24:29 <ralonsoh> with a new driver in place now
14:24:36 <ralonsoh> no need for revisions
14:24:50 <ralonsoh> but OVN does, so we need to be careful on this
14:25:20 <obondarev> AFAIR rev numbers are needed for push notifications mechanism, for agent to be in sync with latest server updates
14:25:23 <slaweq> also API uses it, so that client can use If-Match header to check revisions of resources
14:25:50 <obondarev> (sorry for being late)
14:25:58 <ralonsoh> obondarev, right, agents consume that number
14:26:35 <ralonsoh> ok, I'm starting to think that this was a bad proposal...
14:26:55 <ralonsoh> (thanks for the brain storming)
14:27:21 <slaweq> ralonsoh: this is valid issue but IMO we have to keep revision numbers working as they are now and to be sure they are updated always
14:27:40 <ralonsoh> exactly
14:29:08 <lajoskatona> it was really enlighting
14:29:13 <ralonsoh> for sure
14:29:16 <ralonsoh> thank you all
14:29:26 <mlavalle> Let's also look at previous commits, to remeber the history. For example: https://review.opendev.org/c/openstack/neutron/+/423581
14:30:19 <mlavalle> just read the commit messager
14:30:28 <mlavalle> message^^^
14:30:33 <damiandabrowski[m]> sorry for interrupting but I need to leave in 5min
14:30:36 <damiandabrowski[m]> I just want to say that today I removed WIP tag from https://review.opendev.org/c/openstack/neutron/+/839671 so please have a look if You can. I think it may be the best possible solution
14:31:00 <ralonsoh> ok
14:31:08 <slaweq> damiandabrowski: sure
14:31:14 <lajoskatona> damiandabrowski[m]: thanks
14:32:34 <damiandabrowski[m]> thank You!
14:33:02 <lajoskatona> For https://bugs.launchpad.net/neutron/+bug/1973049/ does anybody more comments?
14:33:56 <slaweq> ralonsoh: one more question
14:34:00 <ralonsoh> suer
14:34:02 <ralonsoh> sure
14:34:14 <slaweq> can't we maybe somehow do something like:
14:34:44 <slaweq> UPDATE standardattributes SET revision_number = revision_number + 1 WHERE id = 42
14:34:56 <slaweq> instead of "get and update" like we do now?
14:35:13 <ralonsoh> but this is the goal: update what you retrieve from the DB in this transaction
14:35:16 <slaweq> wouldn't that solve the issue with those errors?
14:35:25 <ralonsoh> nope, that will introduce more
14:35:45 <ralonsoh> you are deleting the transactionality
14:36:02 <slaweq> ok
14:36:08 <slaweq> I was just thinking loud :)
14:37:07 <slaweq> so there is no way to tell mysql "increase for me this number by 1"?
14:37:17 <slaweq> and we need to do it in python?
14:37:20 <mlavalle> ralonsoh: to be clear, I don't oppose any efforts to improve DB and it's performance. But before we mess with the revision number mechanism, let's go back to the history (blog, commits) and all remind ourselves what we are dealing with
14:37:38 <mlavalle> so whatever change we make, we do it with our eyes wide open
14:37:46 <mlavalle> that's all
14:39:08 <lajoskatona> mlavalle: thanks
14:40:08 <lajoskatona> if no more comments or questions...
14:40:41 <lajoskatona> #endmeeting