15:02:02 <ralonsoh> #startmeeting neutron_qos 15:02:03 <openstack> Meeting started Tue Aug 25 15:02:02 2020 UTC and is due to finish in 60 minutes. The chair is ralonsoh. Information about MeetBot at http://wiki.debian.org/MeetBot. 15:02:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 15:02:05 <ralonsoh> hi 15:02:06 <openstack> The meeting name has been set to 'neutron_qos' 15:02:12 <lajoskatona> o/ 15:02:17 <lajoskatona> Hi ralonsoh 15:02:30 <ralonsoh> today we have one topic 15:02:31 <ralonsoh> ttps://bugs.launchpad.net/neutron/+bug/1882804 15:02:35 <ralonsoh> please lajoskatona 15:03:36 <lajoskatona> Last time we discussed it you proposed to check the Port BEFORE_UPDATE, to avoid rollbacks and so on 15:03:51 <lajoskatona> I checked and basically works fine :-) 15:03:56 <ralonsoh> ahhh yes 15:04:07 <ralonsoh> do you have the link to the patch? 15:04:23 <lajoskatona> https://review.opendev.org/#/q/topic:bug/1882804+(status:open+OR+status:merged) 15:04:47 <lajoskatona> today gibi went there and left really good comments both in tempest and in neutron patch 15:05:45 <ralonsoh> I'll review the n-lib and the neutron patch tomorrow morning 15:05:53 <ralonsoh> we should have the n-lib patch ASAP 15:06:03 <lajoskatona> and I think the most relevant is (as I see now at least) to do rollback in placement allocation if port update fails in neutron db operation or anytime after before_update 15:06:40 <lajoskatona> the neutron patch is under construction, so I hope today afternoon or tomorrow I can push a new version 15:06:46 <ralonsoh> exactly, if the placement call fails, we need to rollback the operation 15:07:33 <lajoskatona> that is clear, but the comment is to rollback the placement allocation change if neutron fails sometime later in port update operation 15:07:45 <lajoskatona> like imagine db error or something like that 15:08:32 <ralonsoh> you mean https://review.opendev.org/#/q/topic:bug/1882804+(status:open+OR+status:merged) 15:08:36 <ralonsoh> sorry 15:08:43 <ralonsoh> https://review.opendev.org/#/c/747774/1/neutron/services/qos/qos_plugin.py@250 15:08:46 <ralonsoh> this comment 15:08:58 <lajoskatona> to tell the truth I can't see if that is possible for all possible failures, or is there some mechanism in neutron to notify in case of failure? 15:09:30 <lajoskatona> yes that's it 15:10:22 <ralonsoh> yeah, we need to kind of mechanism to track the command failure 15:10:29 <ralonsoh> and in this case, revert the placement update 15:10:52 <ralonsoh> because executing this in the other way is not an option, right? 15:10:54 <ralonsoh> for example 15:11:00 <ralonsoh> 1) request placement allocation 15:11:03 <ralonsoh> 2) db operation 15:11:06 <ralonsoh> 3) placement update 15:11:32 <ralonsoh> (3) if (2) succeeds 15:11:52 <lajoskatona> not really as allocation can change any time, so better to keep the GET allocation and PUt close in time 15:13:01 <ralonsoh> lajoskatona, but when you update the placement, you have a feedback 15:13:22 <ralonsoh> this is rest command 15:13:43 <ralonsoh> you send an update command over the instant value 15:13:50 <ralonsoh> I know this value can change 15:14:04 <ralonsoh> but you can have two operations in parallel 15:14:24 <ralonsoh> I don't know if I'm explaining myself well... 15:14:34 <ralonsoh> you can read the allocation 15:14:40 <lajoskatona> what do you mean in parallel? 15:14:53 <ralonsoh> for example, two update commands in parallel 15:14:59 <lajoskatona> parallel placement update and db operation? 15:15:06 <ralonsoh> no no 15:15:10 <ralonsoh> two port update commands 15:15:19 <ralonsoh> each one increasing the BW allocations 15:15:48 <ralonsoh> this is a risk you can assume 15:15:51 <ralonsoh> let me explain 15:16:05 <ralonsoh> you can oversubscribe the BW allocation 15:16:13 <ralonsoh> if you have a max BW of 100 15:16:23 <ralonsoh> and you have a current allocation of 90 15:16:38 <ralonsoh> and you increase two ports at the same time, adding 10 to each one 15:16:53 <ralonsoh> maybe you'll finish those commands with an allocation of 110 15:17:12 <ralonsoh> if you do those 3 steps 15:17:23 <ralonsoh> read, db operation, placement update 15:17:29 <ralonsoh> but this can be documented 15:17:48 <lajoskatona> yeah that's possible, I am not sure if placement has some way to avoid such things 15:18:41 <ralonsoh> by default, if you add a limit, placement won't let you overflow it 15:19:22 <lajoskatona> yes, that's true as the max is added to the rp on the compute 15:20:01 <ralonsoh> IMO, it's easier in this case to rollback manually the DB operations 15:20:26 <ralonsoh> so if (3) fails, revert (2) 15:20:34 <lajoskatona> but if the placement update is after the db operation in case placement update fails the db must be rolled back 15:20:46 <ralonsoh> yes 15:20:55 <ralonsoh> but how many times that will happen? 15:21:20 <ralonsoh> if you read that you have 10mbps available 15:21:31 <ralonsoh> and you increase the port BW this amount 15:21:51 <ralonsoh> and then you update the placement, how many times you'll have an error there? 15:22:24 <lajoskatona> good question, but it can happen :-) 15:22:52 <ralonsoh> yes, in case you execute port qos update commands in parallel 15:23:01 <lajoskatona> and as it is a remote operation it can take time with retries (due to generation conflicts i.e.) 15:23:13 <lajoskatona> not just qos update 15:23:46 <lajoskatona> allocation is for instance and that can be changed several ways like memory/disk whatever change 15:23:56 <lajoskatona> or the vm is moved to new host 15:24:30 <lajoskatona> and those are at least retries with new get allocation and new put allocation 15:24:54 <lajoskatona> https://docs.openstack.org/api-ref/placement/?expanded=update-allocations-detail#update-allocations 15:25:34 <lajoskatona> the terrible thing is that placement has this generation thing which is really good but for allocations theres more than enough :-) 15:27:00 <ralonsoh> it is not possible to send an "increase" command 15:27:10 <ralonsoh> you need first to read the value and then send the new one 15:27:15 <lajoskatona> so it can time consuming and the result of update is not sure till placement operation is finished 15:27:42 <lajoskatona> that is in the neutron-lib patch basically 15:28:41 <lajoskatona> https://review.opendev.org/#/c/741452/5/neutron_lib/placement/client.py 15:29:07 <ralonsoh> so placement should be a restfull API but it is not 15:29:09 <lajoskatona> update_qos_minbw_allocation awaits only the diff and adds that to the fetched values from placement 15:29:57 <ralonsoh> IMO, but this is out of scope, we should have an "atomic" operation to increase/decrease allocations 15:30:03 <ralonsoh> that's more efficient 15:30:04 <lajoskatona> those are always good discussions with them:-) Mostly they were the API sig at a time 15:30:12 <ralonsoh> sure 15:30:42 <ralonsoh> ok so 15:30:48 <ralonsoh> in your approach 15:31:00 <ralonsoh> you need to handle the case of a DB failure 15:33:02 <lajoskatona> yes, and currently I don't see way for that 15:34:02 <ralonsoh> if, eventually you run out of ideas or time, try my idea if you want it 15:34:58 <lajoskatona> you mean fetch allocation - db change - change placement allocation ? 15:35:12 <ralonsoh> yes 15:36:12 <lajoskatona> what do you think about this summary: https://bugs.launchpad.net/neutron/+bug/1882804/comments/4 15:36:13 <openstack> Launchpad bug 1882804 in neutron "RFE: allow replacing the QoS policy of bound port" [Wishlist,Confirmed] - Assigned to Lajos Katona (lajos-katona) 15:36:45 <lajoskatona> it is not exactly that one, but as allocation update can be time consuming nearly the same 15:38:01 <lajoskatona> now the neutron-lib patch do the GET allocation and PUT allocation in one method to keep GET and PUT close 15:38:48 <ralonsoh> so you are using a new field in the port to set the update transaction status 15:40:02 <lajoskatona> that was part of that thinking to show the user if update is still in progress 15:40:38 <lajoskatona> do you think that is really necessary? 15:40:47 <lajoskatona> JKust thinking loudly 15:40:55 <lajoskatona> Just-^ 15:41:17 <ralonsoh> I always prefer to use the DB to lock a transaction 15:41:33 <ralonsoh> this is adding an extra transaction parameter 15:41:49 <ralonsoh> different to any other parameter (not standard in Neutron) 15:42:59 <lajoskatona> that's true, but the placement REST operations must be out of db 15:43:22 <lajoskatona> I mean not in the lock to avoid stopping every other db operations 15:44:50 <ralonsoh> not big fan of this idea 15:44:59 <ralonsoh> but I don't see a better 15:45:03 <ralonsoh> better one* 15:45:24 <ralonsoh> you should document in the dev docs what this flag means 15:46:03 <lajoskatona> update_status or similar 15:46:36 <ralonsoh> yeah, valid, if needed, for other DB objects 15:47:11 <lajoskatona> you mean other than port? 15:47:43 <ralonsoh> yes, in other feature 15:47:55 <ralonsoh> just to implement something standard 15:50:40 <lajoskatona> that is an extension that "extends" not just ports but i.e. networks, subnets.... 15:50:55 <lajoskatona> am I right? 15:51:34 <ralonsoh> if needed, yes 15:51:42 <ralonsoh> but let's focus now in the port 15:52:00 <lajoskatona> yeah, that can be a first step 15:53:23 <lajoskatona> thanks ralonsoh, I go a check my previous attempt to have the allocation update after db update, and push that when I make it work again :-) 15:53:34 <ralonsoh> sure 15:53:42 <ralonsoh> I'll review the patch then 15:55:30 <lajoskatona> thanks for your time, I think we can close the meeting if there's no other thing 15:55:58 <ralonsoh> yw 15:56:06 <ralonsoh> ok, let's finish for today 15:56:07 <ralonsoh> thanks! 15:56:16 <lajoskatona> Bye 15:56:17 <ralonsoh> #endmeeting