15:02:02 #startmeeting neutron_qos
15:02:03 Meeting started Tue Aug 25 15:02:02 2020 UTC and is due to finish in 60 minutes. The chair is ralonsoh. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:05 hi
15:02:06 The meeting name has been set to 'neutron_qos'
15:02:12 o/
15:02:17 Hi ralonsoh
15:02:30 today we have one topic
15:02:31 https://bugs.launchpad.net/neutron/+bug/1882804
15:02:35 please lajoskatona
15:03:36 Last time we discussed it you proposed to check the Port BEFORE_UPDATE, to avoid rollbacks and so on
15:03:51 I checked and it basically works fine :-)
15:03:56 ahhh yes
15:04:07 do you have the link to the patch?
15:04:23 https://review.opendev.org/#/q/topic:bug/1882804+(status:open+OR+status:merged)
15:04:47 today gibi went there and left really good comments on both the tempest and the neutron patch
15:05:45 I'll review the n-lib and the neutron patch tomorrow morning
15:05:53 we should have the n-lib patch ASAP
15:06:03 and I think the most relevant point (as I see it now, at least) is to roll back the placement allocation if the port update fails in the neutron db operation, or at any time after before_update
15:06:40 the neutron patch is under construction, so I hope this afternoon or tomorrow I can push a new version
15:06:46 exactly, if the placement call fails, we need to roll back the operation
15:07:33 that is clear, but the comment is to roll back the placement allocation change if neutron fails sometime later in the port update operation
15:07:45 like, imagine a db error or something like that
15:08:32 you mean https://review.opendev.org/#/q/topic:bug/1882804+(status:open+OR+status:merged)
15:08:36 sorry
15:08:43 https://review.opendev.org/#/c/747774/1/neutron/services/qos/qos_plugin.py@250
15:08:46 this comment
15:08:58 to tell the truth I can't see if that is possible for all possible failures, or is there some mechanism in neutron to notify in case of failure?
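The rollback requirement discussed above can be sketched as follows. This is a minimal illustration only, assuming the "placement first, DB second" ordering; the class and function names (`PlacementClient`, `update_port_qos`) are hypothetical and are not the actual Neutron or Placement client API:

```python
# Hypothetical sketch: update the Placement allocation first, then run
# the Neutron DB update, and revert the allocation if the DB step fails.
class PlacementClient:
    def __init__(self):
        self.allocated = 0

    def update_allocation(self, diff):
        self.allocated += diff


def update_port_qos(placement, db_update, bw_diff):
    placement.update_allocation(bw_diff)       # (1) placement allocation
    try:
        db_update()                            # (2) neutron DB operation
    except Exception:
        placement.update_allocation(-bw_diff)  # revert (1) on DB failure
        raise


placement = PlacementClient()
update_port_qos(placement, lambda: None, 10)
print(placement.allocated)  # 10

try:
    update_port_qos(placement, lambda: 1 / 0, 10)  # DB step "fails"
except ZeroDivisionError:
    pass
print(placement.allocated)  # still 10: the allocation was rolled back
```

The point of the sketch is the shape of the error path: the compensating `update_allocation(-bw_diff)` runs only when the DB step raises, which is exactly the case the review comment asks to handle.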
15:09:30 yes that's it
15:10:22 yeah, we need some kind of mechanism to track the command failure
15:10:29 and in that case, revert the placement update
15:10:52 because executing this the other way around is not an option, right?
15:10:54 for example
15:11:00 1) request placement allocation
15:11:03 2) db operation
15:11:06 3) placement update
15:11:32 (3) if (2) succeeds
15:11:52 not really, as the allocation can change at any time, so it's better to keep the GET allocation and the PUT close in time
15:13:01 lajoskatona, but when you update the placement, you get feedback
15:13:22 this is a REST command
15:13:43 you send an update command over the instant value
15:13:50 I know this value can change
15:14:04 but you can have two operations in parallel
15:14:24 I don't know if I'm explaining myself well...
15:14:34 you can read the allocation
15:14:40 what do you mean by "in parallel"?
15:14:53 for example, two update commands in parallel
15:14:59 parallel placement update and db operation?
15:15:06 no no
15:15:10 two port update commands
15:15:19 each one increasing the BW allocations
15:15:48 this is a risk you can accept
15:15:51 let me explain
15:16:05 you can oversubscribe the BW allocation
15:16:13 if you have a max BW of 100
15:16:23 and you have a current allocation of 90
15:16:38 and you increase two ports at the same time, adding 10 to each one
15:16:53 maybe you'll finish those commands with an allocation of 110
15:17:12 if you do those 3 steps
15:17:23 read, db operation, placement update
15:17:29 but this can be documented
15:17:48 yeah that's possible, I am not sure if placement has some way to avoid such things
15:18:41 by default, if you add a limit, placement won't let you overflow it
15:19:22 yes, that's true, as the max is added to the rp on the compute
15:20:01 IMO, it's easier in this case to manually roll back the DB operations
15:20:26 so if (3) fails, revert (2)
15:20:34 but if the placement update is after the db operation, then in case the placement update fails the db must be
rolled back
15:20:46 yes
15:20:55 but how many times will that happen?
15:21:20 if you read that you have 10mbps available
15:21:31 and you increase the port BW by this amount
15:21:51 and then you update the placement, how many times will you get an error there?
15:22:24 good question, but it can happen :-)
15:22:52 yes, in case you execute port qos update commands in parallel
15:23:01 and as it is a remote operation it can take time, with retries (e.g. due to generation conflicts)
15:23:13 not just qos update
15:23:46 the allocation is per instance, and that can be changed in several ways, like a memory/disk/whatever change
15:23:56 or the vm is moved to a new host
15:24:30 and those are at least retries, with a new get allocation and a new put allocation
15:24:54 https://docs.openstack.org/api-ref/placement/?expanded=update-allocations-detail#update-allocations
15:25:34 the terrible thing is that placement has this generation thing, which is really good, but for allocations there's more than enough :-)
15:27:00 it is not possible to send an "increase" command
15:27:10 you need to first read the value and then send the new one
15:27:15 so it can be time consuming, and the result of the update is not certain until the placement operation is finished
15:27:42 that is in the neutron-lib patch basically
15:28:41 https://review.opendev.org/#/c/741452/5/neutron_lib/placement/client.py
15:29:07 so placement should be a RESTful API but it is not
15:29:09 update_qos_minbw_allocation expects only the diff and adds that to the values fetched from placement
15:29:57 IMO, but this is out of scope, we should have an "atomic" operation to increase/decrease allocations
15:30:03 that's more efficient
15:30:04 those are always good discussions with them :-) Mostly they were the API SIG at a time
15:30:12 sure
15:30:42 ok so
15:30:48 in your approach
15:31:00 you need to handle the case of a DB failure
15:33:02 yes, and currently I don't see a way to do that
15:34:02 if, eventually, you run out of ideas or time, try my idea if
you want it
15:34:58 you mean fetch allocation - db change - change placement allocation?
15:35:12 yes
15:36:12 what do you think about this summary: https://bugs.launchpad.net/neutron/+bug/1882804/comments/4
15:36:13 Launchpad bug 1882804 in neutron "RFE: allow replacing the QoS policy of bound port" [Wishlist,Confirmed] - Assigned to Lajos Katona (lajos-katona)
15:36:45 it is not exactly that one, but as the allocation update can be time consuming it is nearly the same
15:38:01 now the neutron-lib patch does the GET allocation and PUT allocation in one method, to keep GET and PUT close
15:38:48 so you are using a new field in the port to set the update transaction status
15:40:02 that was part of the thinking, to show the user whether the update is still in progress
15:40:38 do you think that is really necessary?
15:40:47 just thinking out loud
15:41:17 I always prefer to use the DB to lock a transaction
15:41:33 this is adding an extra transaction parameter
15:41:49 different from any other parameter (not standard in Neutron)
15:42:59 that's true, but the placement REST operations must be outside the db
15:43:22 I mean not in the lock, to avoid stopping every other db operation
15:44:50 not a big fan of this idea
15:44:59 but I don't see a better one
15:45:24 you should document in the dev docs what this flag means
15:46:03 update_status or similar
15:46:36 yeah, valid, if needed, for other DB objects
15:47:11 you mean other than port?
15:47:43 yes, in another feature
15:47:55 just to implement something standard
15:50:40 that is an extension that "extends" not just ports but e.g. networks, subnets....
15:50:55 am I right?
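The "GET allocation and PUT allocation in one method" pattern discussed above, with a retry on a generation conflict, can be sketched like this. Everything here is a toy stand-in: `GenerationConflict`, `FakePlacement`, and `apply_bw_diff` are illustrative names, not the real Placement REST API or the neutron-lib client, though the shape mirrors how `update_qos_minbw_allocation` takes only a diff and applies it to freshly fetched values:

```python
# Read-modify-write against a Placement-like service: the caller passes
# only the diff; the current value is fetched just before the write and
# re-fetched if the consumer generation changed in between.
class GenerationConflict(Exception):
    pass


class FakePlacement:
    def __init__(self, value):
        self.value = value
        self.generation = 0

    def get_allocation(self):
        return self.value, self.generation

    def put_allocation(self, value, generation):
        if generation != self.generation:
            raise GenerationConflict()  # someone wrote since our read
        self.value = value
        self.generation += 1


def apply_bw_diff(client, diff, max_retries=3):
    for _ in range(max_retries):
        current, gen = client.get_allocation()  # GET kept close to PUT
        try:
            client.put_allocation(current + diff, gen)
            return
        except GenerationConflict:
            continue  # allocation changed under us; re-read and retry
    raise RuntimeError("allocation update kept conflicting")


client = FakePlacement(90)
apply_bw_diff(client, 10)
print(client.value)  # 100
```

Keeping the GET and PUT inside one loop iteration is what makes the retry safe: each attempt recomputes the new total from a fresh read instead of reusing a stale value.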
15:51:34 if needed, yes
15:51:42 but let's focus now on the port
15:52:00 yeah, that can be a first step
15:53:23 thanks ralonsoh, I'll go and check my previous attempt to have the allocation update after the db update, and push that when I make it work again :-)
15:53:34 sure
15:53:42 I'll review the patch then
15:55:30 thanks for your time, I think we can close the meeting if there's nothing else
15:55:58 yw
15:56:06 ok, let's finish for today
15:56:07 thanks!
15:56:16 Bye
15:56:17 #endmeeting