15:02:02 <ralonsoh> #startmeeting neutron_qos
15:02:03 <openstack> Meeting started Tue Aug 25 15:02:02 2020 UTC and is due to finish in 60 minutes.  The chair is ralonsoh. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:05 <ralonsoh> hi
15:02:06 <openstack> The meeting name has been set to 'neutron_qos'
15:02:12 <lajoskatona> o/
15:02:17 <lajoskatona> Hi ralonsoh
15:02:30 <ralonsoh> today we have one topic
15:02:31 <ralonsoh> ttps://bugs.launchpad.net/neutron/+bug/1882804
15:02:35 <ralonsoh> please lajoskatona
15:03:36 <lajoskatona> Last time we discussed it you proposed to check the Port BEFORE_UPDATE, to avoid rollbacks and so on
15:03:51 <lajoskatona> I checked and basically works fine :-)
15:03:56 <ralonsoh> ahhh yes
15:04:07 <ralonsoh> do you have the link to the patch?
15:04:23 <lajoskatona> https://review.opendev.org/#/q/topic:bug/1882804+(status:open+OR+status:merged)
15:04:47 <lajoskatona> today gibi went there and left really good comments both in tempest and in neutron patch
15:05:45 <ralonsoh> I'll review the n-lib and the neutron patch tomorrow morning
15:05:53 <ralonsoh> we should have the n-lib patch ASAP
15:06:03 <lajoskatona> and I think the most relevant is (as I see now at least) to do rollback in placement allocation if port update fails in neutron db operation or anytime after before_update
15:06:40 <lajoskatona> the neutron patch is under construction, so I hope today afternoon or tomorrow I can push a new version
15:06:46 <ralonsoh> exactly, if the placement call fails, we need to rollback the operation
15:07:33 <lajoskatona> that is clear, but the comment is to rollback the placement allocation change if neutron fails sometime later in port update operation
15:07:45 <lajoskatona> like imagine db error or something like that
15:08:32 <ralonsoh> you mean https://review.opendev.org/#/q/topic:bug/1882804+(status:open+OR+status:merged)
15:08:36 <ralonsoh> sorry
15:08:43 <ralonsoh> https://review.opendev.org/#/c/747774/1/neutron/services/qos/qos_plugin.py@250
15:08:46 <ralonsoh> this comment
15:08:58 <lajoskatona> to tell the truth I can't see if that is possible for all possible failures, or is there some mechanism in neutron to notify in case of failure?
15:09:30 <lajoskatona> yes that's it
15:10:22 <ralonsoh> yeah, we need to kind of mechanism to track the command failure
15:10:29 <ralonsoh> and in this case, revert the placement update
15:10:52 <ralonsoh> because executing this in the other way is not an option, right?
15:10:54 <ralonsoh> for example
15:11:00 <ralonsoh> 1) request placement allocation
15:11:03 <ralonsoh> 2) db operation
15:11:06 <ralonsoh> 3) placement update
15:11:32 <ralonsoh> (3) if (2) succeeds
15:11:52 <lajoskatona> not really as allocation can change any time, so better to keep the GET allocation and PUt close in time
15:13:01 <ralonsoh> lajoskatona, but when you update the placement, you have a feedback
15:13:22 <ralonsoh> this is rest command
15:13:43 <ralonsoh> you send an update command over the instant value
15:13:50 <ralonsoh> I know this value can change
15:14:04 <ralonsoh> but you can have two operations in parallel
15:14:24 <ralonsoh> I don't know if I'm explaining myself well...
15:14:34 <ralonsoh> you can read the allocation
15:14:40 <lajoskatona> what do you mean in parallel?
15:14:53 <ralonsoh> for example, two update commands in parallel
15:14:59 <lajoskatona> parallel placement update and db operation?
15:15:06 <ralonsoh> no no
15:15:10 <ralonsoh> two port update commands
15:15:19 <ralonsoh> each one increasing the BW allocations
15:15:48 <ralonsoh> this is a risk you can assume
15:15:51 <ralonsoh> let me explain
15:16:05 <ralonsoh> you can oversubscribe the BW allocation
15:16:13 <ralonsoh> if you have a max BW of 100
15:16:23 <ralonsoh> and you have a current allocation of 90
15:16:38 <ralonsoh> and you increase two ports at the same time, adding 10 to each one
15:16:53 <ralonsoh> maybe you'll finish those commands with an allocation of 110
15:17:12 <ralonsoh> if you do those 3 steps
15:17:23 <ralonsoh> read, db operation, placement update
15:17:29 <ralonsoh> but this can be documented
15:17:48 <lajoskatona> yeah that's possible, I am not sure if placement has some way to avoid such things
15:18:41 <ralonsoh> by default, if you add a limit, placement won't let you overflow it
15:19:22 <lajoskatona> yes, that's true as the max is added to the rp on the compute
15:20:01 <ralonsoh> IMO, it's easier in this case to rollback manually the DB operations
15:20:26 <ralonsoh> so if (3) fails, revert (2)
15:20:34 <lajoskatona> but if the placement update is after the db operation in case placement update fails the db must be rolled back
15:20:46 <ralonsoh> yes
15:20:55 <ralonsoh> but how many times that will happen?
15:21:20 <ralonsoh> if you read that you have 10mbps available
15:21:31 <ralonsoh> and you increase the port BW this amount
15:21:51 <ralonsoh> and then you update the placement, how many times you'll have an error there?
15:22:24 <lajoskatona> good question, but it can happen :-)
15:22:52 <ralonsoh> yes, in case you execute port qos update commands in parallel
15:23:01 <lajoskatona> and as it is a remote operation it can take time with retries (due to generation conflicts i.e.)
15:23:13 <lajoskatona> not just qos update
15:23:46 <lajoskatona> allocation is for instance and that can be changed several ways like memory/disk whatever change
15:23:56 <lajoskatona> or the vm is moved to new host
15:24:30 <lajoskatona> and those are at least retries with new get allocation and new put allocation
15:24:54 <lajoskatona> https://docs.openstack.org/api-ref/placement/?expanded=update-allocations-detail#update-allocations
15:25:34 <lajoskatona> the terrible thing is that placement has this generation thing which is really good but for allocations theres more than enough :-)
15:27:00 <ralonsoh> it is not possible to send an "increase" command
15:27:10 <ralonsoh> you need first to read the value and then send the new one
15:27:15 <lajoskatona> so it can time consuming and the result of update is not sure till placement operation is finished
15:27:42 <lajoskatona> that is in the neutron-lib patch basically
15:28:41 <lajoskatona> https://review.opendev.org/#/c/741452/5/neutron_lib/placement/client.py
15:29:07 <ralonsoh> so placement should be a restfull API but it is not
15:29:09 <lajoskatona> update_qos_minbw_allocation awaits only the diff and adds that to the fetched values from placement
15:29:57 <ralonsoh> IMO, but this is out of scope, we should have an "atomic" operation to increase/decrease allocations
15:30:03 <ralonsoh> that's more efficient
15:30:04 <lajoskatona> those are always good discussions with them:-) Mostly they were the API sig at a time
15:30:12 <ralonsoh> sure
15:30:42 <ralonsoh> ok so
15:30:48 <ralonsoh> in your approach
15:31:00 <ralonsoh> you need to handle the case of a DB failure
15:33:02 <lajoskatona> yes, and currently I don't see way for that
15:34:02 <ralonsoh> if, eventually you run out of ideas or time, try my idea if you want it
15:34:58 <lajoskatona> you mean fetch allocation - db change - change placement allocation ?
15:35:12 <ralonsoh> yes
15:36:12 <lajoskatona> what do you think about this summary: https://bugs.launchpad.net/neutron/+bug/1882804/comments/4
15:36:13 <openstack> Launchpad bug 1882804 in neutron "RFE: allow replacing the QoS policy of bound port" [Wishlist,Confirmed] - Assigned to Lajos Katona (lajos-katona)
15:36:45 <lajoskatona> it is not exactly that one, but as allocation update can be  time consuming nearly the same
15:38:01 <lajoskatona> now the neutron-lib patch do the GET allocation and PUT allocation in one method to keep GET and PUT close
15:38:48 <ralonsoh> so you are using a new field in the port to set the update transaction status
15:40:02 <lajoskatona> that was part of that thinking to show the user if update is still in progress
15:40:38 <lajoskatona> do you think that is really necessary?
15:40:47 <lajoskatona> JKust thinking loudly
15:40:55 <lajoskatona> Just-^
15:41:17 <ralonsoh> I always prefer to use the DB to lock a transaction
15:41:33 <ralonsoh> this is adding an extra transaction parameter
15:41:49 <ralonsoh> different to any other parameter (not standard in Neutron)
15:42:59 <lajoskatona> that's true, but the placement REST operations must be out of db
15:43:22 <lajoskatona> I mean not in the lock to avoid stopping every other db operations
15:44:50 <ralonsoh> not big fan of this idea
15:44:59 <ralonsoh> but I don't see a better
15:45:03 <ralonsoh> better one*
15:45:24 <ralonsoh> you should document in the dev docs what this flag means
15:46:03 <lajoskatona> update_status or similar
15:46:36 <ralonsoh> yeah, valid, if needed, for other DB objects
15:47:11 <lajoskatona> you mean other than port?
15:47:43 <ralonsoh> yes, in other feature
15:47:55 <ralonsoh> just to implement something standard
15:50:40 <lajoskatona> that is an extension that "extends" not just ports but i.e. networks, subnets....
15:50:55 <lajoskatona> am I right?
15:51:34 <ralonsoh> if needed, yes
15:51:42 <ralonsoh> but let's focus now in the port
15:52:00 <lajoskatona> yeah, that can be a first step
15:53:23 <lajoskatona> thanks ralonsoh, I go a check my previous attempt to have the allocation update after db update, and push that when I make it work again :-)
15:53:34 <ralonsoh> sure
15:53:42 <ralonsoh> I'll review the patch then
15:55:30 <lajoskatona> thanks for your time, I think we can close the meeting if there's no other thing
15:55:58 <ralonsoh> yw
15:56:06 <ralonsoh> ok, let's finish for today
15:56:07 <ralonsoh> thanks!
15:56:16 <lajoskatona> Bye
15:56:17 <ralonsoh> #endmeeting