14:01:29 #startmeeting neutron_drivers 14:01:29 Meeting started Fri May 13 14:01:29 2022 UTC and is due to finish in 60 minutes. The chair is lajoskatona. Information about MeetBot at http://wiki.debian.org/MeetBot. 14:01:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 14:01:29 The meeting name has been set to 'neutron_drivers' 14:01:33 o/ 14:01:33 o/ 14:01:37 o/ 14:01:43 o/ 14:01:50 hi 14:01:54 hi 14:02:07 hi 14:03:03 I think we can start 14:03:29 We have one RFE for today and one more topic from ralonsoh 14:03:32 [RFE][fwaas][OVN]support l3 firewall for ovn driver (#link https://bugs.launchpad.net/neutron/+bug/1971958) 14:03:34 hi! 14:04:23 (nice and complex RFE) 14:04:23 this RFE is as I see is the official thing for why Inspur arrived to maintain neutron-fwaas 14:04:59 yeah, IMO it should be straightforward as that's the reason why we revived neutron-fwaas mostly 14:05:23 I am not sure if we have to ask a spec for this where details can be discussed or just let it flow ? 14:05:40 yes, the way I see this, is that we already approved this RFE 14:06:11 I would ask, in the LP, for the goals in this RFE 14:06:20 what are they expecting from fwaas in OVN 14:06:43 ralonsoh: true that can be listed to have clear focus 14:06:58 +1 14:07:02 +1 14:07:13 maybe we should open Blueprint to track all of that work? 14:07:43 yes we can I think 14:07:49 i think a blueprint would be useful 14:07:56 +1 14:08:57 ok, so let's accept it with a blueprint to track progress, and ask details in LP bug for the exact goals 14:09:28 +1 14:09:38 +1 14:09:52 +1 14:10:00 +1 14:10:06 +1 14:10:33 thanks, let's move on 14:10:43 #topic On Demand Agenda 14:10:51 (ralonsoh): (added on 14/5/22) https://bugs.launchpad.net/neutron/+bug/1973049: with the SQLAlchemy 2.0 compatibility patch in place (n-lib 2.21.0), there are some issues when an API call (e.g.: user command) updates the same resource as another RPC call (agent call). This is very frequent with port status and FIP status. Sometimes (I still need to investigate the reason), the retry command detects that there is another SQ 14:10:51 ansaction in progress (https://paste.opendev.org/show/beHzMCKjVLmE5nX0ZXZX/). 14:10:57 thanks 14:10:57 A long sentence 14:11:06 summary: 14:11:07 This is a recurrent problem in the Neutron server when updating a resource that has "standardattributes". If a concurrent update is done, the DB (SQLAlchemy) will return a "StaleDataError" exception. E.g.: [1] 14:11:07 """ 14:11:07 UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched. 14:11:07 """ 14:11:34 that usually happens when a port or FIP is updated from two different agents 14:11:48 or the RPC (agent) and API call (user) 14:11:54 because of revision numbers, right? 14:11:59 yes 14:12:01 but we have this only since sqlalchemy 2.0 work ws merged? 14:12:06 no no 14:12:16 this is something that has been happening always 14:12:22 ahh, ok ,thanks 14:12:27 it's a long standing thing 14:12:28 I think it is also with older SQLAlchemy 14:12:40 I'm not sure 14:12:49 because this is related to the object mapper 14:12:50 one sec 14:13:07 __mapper_args__ = { 14:13:07 # see http://docs.sqlalchemy.org/en/latest/orm/versioning.html for 14:13:07 # details about how this works 14:13:07 "version_id_col": id, 14:13:15 (this is in neutron-lib) 14:13:52 sqlalchemy will use (not ID, this is my change) "revision_number" as version_id_col 14:14:10 my proposal: https://bugs.launchpad.net/neutron/+bug/1973049/comments/2 14:15:46 does your comment means that in case of such conflict, we will not have correct revision number for resource (it will be one less that it should be)? 14:15:58 or am I misunderstanding something there? 14:16:00 slaweq, we'll have the latest one updated 14:16:34 but yes, it will be one less 14:16:46 why keep the revision number at all, then? 14:16:46 in case of two commands 14:17:02 so if e.g. 2 workers will want to update same port and each of them will try to bump revision number to let's say from 9 to 10, both will do updates of the port but revision number will be 10 instead of 11 finally, right? 14:17:21 yes 14:17:32 mlavalle: revision_numbers are pretty important e.g. for ovn driver 14:17:33 mlavalle, this is something I would need to check 14:17:51 but it's in api too so I'm not sure we can remove it really 14:17:56 exactly... so this could be intrusive 14:18:07 maybe other software, like e.g. heat relies on it, idk 14:18:12 yes it can hit all clients I suppose 14:18:14 I wasn't proposing removing revision numbers 14:18:31 it was a rethroical question. Let me be more straight: 14:19:05 revision_numbers protect us from multiple updates taking place at once 14:19:16 isn't that true? 14:19:52 not really 14:20:16 well, it will prevent the DB from accepting the transaction 14:20:20 it will rollback 14:21:43 I'm not sure if we can do things like that, to e.g. not update revision number in some cases as this will break that described behaviour https://docs.openstack.org/api-ref/network/v2/#revisions IMO 14:21:51 as I see it is like a poor man's transaction guard :-) you ask the resource check the rev number and update with it and if the server check is different for the number the operation fails 14:22:03 yeap 14:22:04 this is only enabled with service revisions and version_check=True 14:22:47 ok, I'll review this proposal again 14:22:56 I won't spend more of your time 14:23:18 can 14:23:20 sorry 14:23:38 I think this revision number stuff was inspired by this: http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/ 14:23:42 originally 14:23:49 can't we e.g use some distributed lock to lock resource which is going to be updated 14:24:02 so that no other workers will try to update same resource in same time 14:24:02 nonono, no distributed locks 14:24:07 we have the DB for this 14:24:08 it's the compare and swap approach 14:24:21 for reservations, we have our own engine 14:24:29 with a new driver in place now 14:24:36 no need for revisions 14:24:50 but OVN does, so we need to be careful on this 14:25:20 AFAIR rev numbers are needed for push notifications mechanism, for agent to be in sync with latest server updates 14:25:23 also API uses it, so that client can use If-Match header to check revisions of resources 14:25:50 (sorry for being late) 14:25:58 obondarev, right, agents consume that number 14:26:35 ok, I'm starting to think that this was a bad proposal... 14:26:55 (thanks for the brain storming) 14:27:21 ralonsoh: this is valid issue but IMO we have to keep revision numbers working as they are now and to be sure they are updated always 14:27:40 exactly 14:29:08 it was really enlighting 14:29:13 for sure 14:29:16 thank you all 14:29:26 Let's also look at previous commits, to remeber the history. For example: https://review.opendev.org/c/openstack/neutron/+/423581 14:30:19 just read the commit messager 14:30:28 message^^^ 14:30:33 sorry for interrupting but I need to leave in 5min 14:30:36 I just want to say that today I removed WIP tag from https://review.opendev.org/c/openstack/neutron/+/839671 so please have a look if You can. I think it may be the best possible solution 14:31:00 ok 14:31:08 damiandabrowski: sure 14:31:14 damiandabrowski[m]: thanks 14:32:34 thank You! 14:33:02 For https://bugs.launchpad.net/neutron/+bug/1973049/ does anybody more comments? 14:33:56 ralonsoh: one more question 14:34:00 suer 14:34:02 sure 14:34:14 can't we maybe somehow do something like: 14:34:44 UPDATE standardattributes SET revision_number = revision_number + 1 WHERE id = 42 14:34:56 instead of "get and update" like we do now? 14:35:13 but this is the goal: update what you retrieve from the DB in this transaction 14:35:16 wouldn't that solve the issue with those errors? 14:35:25 nope, that will introduce more 14:35:45 you are deleting the transactionality 14:36:02 ok 14:36:08 I was just thinking loud :) 14:37:07 so there is no way to tell mysql "increase for me this number by 1"? 14:37:17 and we need to do it in python? 14:37:20 ralonsoh: to be clear, I don't oppose any efforts to improve DB and it's performance. But before we mess with the revision number mechanism, let's go back to the history (blog, commits) and all remind ourselves what we are dealing with 14:37:38 so whatever change we make, we do it with our eyes wide open 14:37:46 that's all 14:39:08 mlavalle: thanks 14:40:08 if no more comments or questions... 14:40:41 #endmeeting