13:00:25 <Qiming> #startmeeting senlin
13:00:26 <openstack> Meeting started Tue Aug 1 13:00:25 2017 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:29 <openstack> The meeting name has been set to 'senlin'
13:00:41 <Qiming> evening
13:00:52 <chohoor> Hi
13:00:53 <ruijie> hi Qiming
13:01:05 <Qiming> hi, chohoor, ruijie
13:01:55 <Qiming> evening, elynn
13:02:02 <elynn> Hi Qiming
13:02:38 <Qiming> I don't have a specific agenda for this meeting
13:02:53 <elynn> How is your princess? Saw your pics on WeChat.
13:02:55 <Qiming> if you have topics, feel free to add them here: https://wiki.openstack.org/wiki/Meetings/SenlinAgenda
13:03:07 <Qiming> veeeeeeeery naughty
13:03:13 <Qiming> uncontrollable
13:03:21 <elynn> I don't, just want to ask about my patch.
13:03:52 <elynn> haha, your gene is in it
13:03:57 <Qiming> that big one
13:03:59 <ruijie> I've got questions about the zone placement policy & health manager ~
13:04:04 <elynn> about https://review.openstack.org/#/c/467108/
13:04:06 <Qiming> +677, -343
13:04:21 <elynn> Would any of you be able to review and test it....
13:04:41 <Qiming> have you tested it?
13:04:50 <elynn> I tested some basic functions: creating ports, creating floating IPs; at least it works fine
13:05:10 <Qiming> great
13:05:16 <elynn> basic update.
13:06:22 <elynn> I will add you all to review this patch later :)
13:06:34 <Qiming> that is fair
13:06:49 <elynn> When is the RC?
13:07:18 <chohoor> I want to discuss some details about fast scaling and standby nodes.
13:07:28 <elynn> Also I'm working on k8s this week; hopefully I can get a running k8s with kubeadm.
13:07:31 <Qiming> chohoor, later
13:07:44 <Qiming> RC1 is next week
13:08:05 <elynn> That's all from my side :)
13:08:12 <Qiming> okay, thanks
13:08:27 <Qiming> final RC is Aug 28 - Sep 01
13:08:50 <Qiming> oh, no, Aug 21 - Aug 25
13:08:58 <Qiming> Aug 28 is the final release
13:09:13 <elynn> okay, I see.
13:09:37 <Qiming> okay, let's switch to chohoor's proposal
13:09:43 <Qiming> fast scaling of clusters
13:09:46 <Qiming> it is a great idea
13:09:58 <Qiming> although there are some details to be discussed
13:10:25 <chohoor> I have written a spec in the doc.
13:10:40 <Qiming> I've read it
13:11:49 <chohoor> In the TODO list, the number of standby nodes is max_size - desired_capacity, but I think if max_size is too big....
13:11:52 <Qiming> say I have a cluster created: min=10, max=20
13:12:09 <Qiming> how many standby nodes should we create at the very beginning?
13:13:06 <chohoor> I think it should be specified by the user.
13:13:16 <Qiming> right, that was my point
13:13:28 <chohoor> but less than max_size
13:14:02 <Qiming> maybe we can put that constraint aside
13:14:22 <Qiming> let's focus on the user's experience
13:14:45 <Qiming> if I'm a user, am I supposed to be aware of how many standby nodes there are?
13:14:47 <chohoor> that's ok
13:15:34 <Qiming> can we share standby nodes among two or more clusters?
13:15:37 <chohoor> yes, I think the user should know that.
13:16:00 <chohoor> No, the profiles are different.
13:16:08 <Qiming> the profile can be the same
13:16:21 <ruijie> the properties are very different
13:16:24 <Qiming> say I have created two clusters from the same profile
13:16:28 <ruijie> like the disk type, network, etc.
13:16:49 <Qiming> right, they could be ...
13:17:18 <chohoor> maybe each cluster should have its own standby nodes.
13:17:52 <Qiming> okay ... if we agree that standby nodes are still members of a cluster
13:18:13 <Qiming> and they are visible to the cluster's owner
13:18:37 <Qiming> are you gonna charge the user for these standby nodes?
13:19:40 <chohoor> yes, we need to make sure standby nodes are in active state.
13:20:21 <Qiming> that would lead us to another question: if the user is consuming additional resources, why don't we charge them?
13:21:13 <Qiming> if we charge them, they would be confused at least ... because they are not "running" workloads on those standby nodes
13:22:56 <ruijie> and if they cost extra money, why not just create extra nodes ..
13:23:12 <Qiming> yes
13:23:52 <chohoor> I have thought about this question; the standby nodes are for standby, but if a node in the cluster goes into error, we could replace it immediately.
13:23:54 <Qiming> so ... one of the key questions to answer is about the definition of "standby"
13:25:35 <Qiming> if you allocate 10 nodes, all active, but only treat 8 of them as members of your cluster, the other 2 are only for "standby"
13:25:46 <Qiming> then why don't you run workloads on them as well?
13:27:38 <chohoor> you are right. I'll continue to think about it.
13:28:41 <Qiming> one baby step would be this case
13:28:48 <Qiming> suppose I have 10 nodes running now
13:29:09 <Qiming> my cluster has min=5, max=15
13:29:53 <Qiming> for whatever reason, I want to scale it in by 2 nodes now, that means I'm setting the desired_capacity to 8
13:30:06 <Qiming> removing 2 nodes from the cluster
13:30:30 <Qiming> we can lazily remove those nodes
13:30:57 <Qiming> if later on I want to scale the cluster back to 10 nodes, I can reuse them
13:31:21 <Qiming> sounds like the idea is very similar to yours
13:31:29 <chohoor> move the nodes to standby first, then really delete them?
13:31:56 <ruijie> Qiming, you mean keep them for a while, but they will still be removed after a user-defined period?
13:31:56 <Qiming> let the cluster owner decide how long we will keep those nodes
13:32:21 <chohoor> ok
13:32:30 <Qiming> because we have the node-delete API, the user can still delete them at will at any time
13:33:22 <Qiming> this use case is less risky because we are sure the nodes are homogeneous
13:33:42 <Qiming> we are sure the nodes are configured the same way
13:34:01 <Qiming> we are sure there won't be data leaked from one user to another
13:35:54 <Qiming> if there are such extra nodes, we may want to stop them instead of deleting them
13:36:56 <Qiming> that leads us back to the definition of "standby" nodes
13:38:13 <chohoor> I'll rewrite the spec; please give me more suggestions in IRC later. Thank you.
13:38:21 <Qiming> my pleasure
13:38:39 <Qiming> just want to be sure that we have thought it through before writing the code
13:39:04 <chohoor> sure
13:40:16 <ruijie> is it my turn now :)
13:40:17 <Qiming> anything else?
13:40:24 <Qiming> shoot
13:40:36 <chohoor> and another question, about protecting a special node from being deleted.
13:40:38 <ruijie> yes Qiming, the first one is this patch: https://review.openstack.org/#/c/481670/
13:41:07 <chohoor> your turn.
13:41:20 <ruijie> the change will rerun the actions whose action.status is READY
13:41:35 <ruijie> that is caused by failing to acquire the lock
13:42:00 <Qiming> yes
13:42:20 <ruijie> but it will try to process the action again and again ..
13:42:47 <ruijie> the cluster is still locked during this period .. won't that create a lot of events?
13:42:47 <Qiming> what do you mean by "again and again"
13:42:56 <Qiming> what is the interval?
13:43:02 <ruijie> 0 sec
13:43:16 <Qiming> so .. that is the problem
13:43:23 <ruijie> yes Qiming
13:43:50 <Qiming> or we should add a global option lock_max_retries
13:44:23 <ruijie> and maybe a lock_retry_interval ..
13:44:31 <Qiming> :)
13:44:34 <Qiming> maybe both
13:45:15 <Qiming> the logic is not incorrect, it's just ... too frequent, too noisy
13:46:20 <Qiming> chohoor, you want to "lock" a node?
13:46:20 <ruijie> improvement is needed :)
13:46:40 <chohoor> yes
13:47:00 <Qiming> use case?
13:47:00 <chohoor> A node that is not to be deleted.
13:47:39 <chohoor> maybe this is a special node from the user's point of view...
13:48:01 <Qiming> sounds like a use case for node roles
13:49:33 <chohoor> In Tencent Cloud, a user can add a normal instance to a cluster as a node.
13:49:53 <chohoor> and the code could be protected from deletion.
13:50:06 <chohoor> s/code/node/
13:50:32 <ruijie> like node adoption
13:50:32 <Qiming> who is deleting it then, if the user doesn't want to delete it?
13:51:19 <chohoor> auto-scaling, because the node selection could be random
13:52:03 <chohoor> or the oldest-first policy.
13:52:04 <Qiming> I see
13:52:54 <elynn> So by 'protect' here you just mean cluster scaling actions won't delete it, but this node can still be deleted by the 'node delete' command?
13:53:33 <Qiming> so it is gonna be some label or tag that will change some action's behavior
13:53:45 <chohoor> elynn: I think so
13:53:48 <elynn> yes, I guess.
13:54:30 <Qiming> alright, please think about whether we can use node.role for this purpose
13:54:36 <elynn> a 'protect' tag will exclude it from the scaling candidate list.
13:54:45 <Qiming> because ... that is a more general solution
13:55:08 <Qiming> em ... we don't support tags yet
13:55:08 <chohoor> ok
13:55:11 <Qiming> maybe we should
13:55:20 <chohoor> metadata?
13:55:22 <Qiming> no objection to implementing tags
13:55:38 <Qiming> metadata is also fine
13:56:05 <Qiming> say 'protected: true'
13:57:03 <chohoor> if a node is in the protected state, what happens if the cluster/node executes an update action?
13:57:13 <Qiming> feel free to propose a solution then
13:57:23 <elynn> Need to think about the health policy, can we rebuild it?
13:57:46 <chohoor> I suggest not.
13:58:46 <elynn> Then update should also be rejected, I guess?
13:59:13 <Qiming> ...
13:59:27 <chohoor> just don't update the protected node.
13:59:46 <Qiming> thanks guys
13:59:48 <XueFeng> hi, the Senlin RDO package has been finished
13:59:52 <XueFeng> https://bugzilla.redhat.com/show_bug.cgi?id=1426551
13:59:52 <openstack> bugzilla.redhat.com bug 1426551 in Package Review "Review Request: Senlin - is a clustering service for OpenStack" [Unspecified,Closed: errata] - Assigned to amoralej
13:59:52 <Qiming> we are running out of time
14:00:00 <Qiming> #endmeeting
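
The lazy scale-in idea Qiming outlined above (park nodes removed on scale-in for an owner-defined grace period, reuse them on the next scale-out, and only really delete them once the period expires) could look roughly like the sketch below. This is a minimal illustration of the idea only; the StandbyPool class, its method names, and the delete_node callback are hypothetical and not part of Senlin's actual code.

    import time

    class StandbyPool:
        """Park scaled-in nodes for a grace period instead of deleting them."""

        def __init__(self, grace_seconds):
            self.grace_seconds = grace_seconds
            self._parked = {}  # node_id -> expiry timestamp

        def park(self, node_id):
            # Called on scale-in instead of deleting the node right away.
            self._parked[node_id] = time.time() + self.grace_seconds

        def reuse(self):
            # Called on scale-out: hand back a parked node if one is available.
            if self._parked:
                node_id, _expiry = self._parked.popitem()
                return node_id
            return None

        def reap(self, delete_node):
            # Really delete nodes whose grace period has expired; the cluster
            # owner can still delete a parked node at any time via node-delete.
            now = time.time()
            for node_id, expiry in list(self._parked.items()):
                if expiry <= now:
                    delete_node(node_id)
                    del self._parked[node_id]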
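
For the lock retry discussion, a minimal sketch of the proposed global options, assuming oslo.config-style configuration; the option names lock_max_retries and lock_retry_interval are the ones floated in the meeting, the defaults are illustrative, and try_acquire stands in for whatever lock-acquisition call the engine uses:

    import time

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts([
        cfg.IntOpt('lock_max_retries', default=3,
                   help='How many times to retry acquiring a cluster lock.'),
        cfg.FloatOpt('lock_retry_interval', default=2.0,
                     help='Seconds to wait between lock acquisition attempts.'),
    ])

    def acquire_with_retry(try_acquire, cluster_id):
        # Retry with a pause instead of spinning with a 0-second interval,
        # which floods the engine with repeated attempts and noisy events.
        for _attempt in range(CONF.lock_max_retries + 1):
            if try_acquire(cluster_id):
                return True
            time.sleep(CONF.lock_retry_interval)
        return False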
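
For the node protection idea, a minimal sketch of how a 'protected: true' metadata flag could exclude a node from the scale-in candidate list; the node dictionaries and the oldest-first ordering are simplified stand-ins, not Senlin's real models:

    def deletion_candidates(nodes, count):
        """Pick nodes to remove on scale-in, skipping protected ones."""
        eligible = [n for n in nodes
                    if not n.get('metadata', {}).get('protected', False)]
        # Oldest-first selection among the unprotected nodes.
        eligible.sort(key=lambda n: n['created_at'])
        return eligible[:count]

    nodes = [
        {'id': 'n1', 'created_at': 1, 'metadata': {'protected': True}},
        {'id': 'n2', 'created_at': 2, 'metadata': {}},
        {'id': 'n3', 'created_at': 3, 'metadata': {}},
    ]
    print([n['id'] for n in deletion_candidates(nodes, 2)])  # ['n2', 'n3']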