13:00:25 <Qiming> #startmeeting senlin
13:00:26 <openstack> Meeting started Tue Aug 1 13:00:25 2017 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:29 <openstack> The meeting name has been set to 'senlin'
13:00:41 <Qiming> evening
13:00:52 <chohoor> Hi
13:00:53 <ruijie> hi Qiming
13:01:05 <Qiming> hi, chohoor, ruijie
13:01:55 <Qiming> evening, elynn
13:02:02 <elynn> Hi Qiming
13:02:38 <Qiming> I don't have a specific agenda for this meeting
13:02:53 <elynn> How is your princess? Saw your pics on WeChat.
13:02:55 <Qiming> if you have topics, feel free to add them here: https://wiki.openstack.org/wiki/Meetings/SenlinAgenda
13:03:07 <Qiming> veeeeeeeery naughty
13:03:13 <Qiming> uncontrollable
13:03:21 <elynn> I don't, just want to ask about my patch.
13:03:52 <elynn> haha, your gene is in it
13:03:57 <Qiming> that big one
13:03:59 <ruijie> I've got questions about the zone placement policy & health manager ~
13:04:04 <elynn> about https://review.openstack.org/#/c/467108/
13:04:06 <Qiming> +677, -343
13:04:21 <elynn> Would any of you be able to review and test it....
13:04:41 <Qiming> have you tested it?
13:04:50 <elynn> I tested some basic functions: creating ports, creating floating IPs; at least it works fine
13:05:10 <Qiming> great
13:05:16 <elynn> basic update.
13:06:22 <elynn> I will add you all to review this patch later :)
13:06:34 <Qiming> that is fair
13:06:49 <elynn> When is the RC?
13:07:18 <chohoor> I want to discuss some details about fast scaling and standby nodes.
13:07:28 <elynn> Also I'm working on k8s this week; hopefully I can get a running k8s with kubeadm.
13:07:31 <Qiming> chohoor, later
13:07:44 <Qiming> RC1 is next week
13:08:05 <elynn> That's all from my side :)
13:08:12 <Qiming> okay, thanks
13:08:27 <Qiming> final RC is Aug 28 - Sep 01
13:08:50 <Qiming> oh, no, Aug 21 - Aug 25
13:08:58 <Qiming> Aug 28 is the final release
13:09:13 <elynn> okay, I see.
13:09:37 <Qiming> okay, let's switch to chohoor's proposal
13:09:43 <Qiming> fast scaling of clusters
13:09:46 <Qiming> it is a great idea
13:09:58 <Qiming> although there are some details to be discussed
13:10:25 <chohoor> I have written a spec in the doc.
13:10:40 <Qiming> I've read it
13:11:49 <chohoor> In the TODO list, the number of standby nodes is max_size - desired_capacity, but I think if max_size is too big....
13:11:52 <Qiming> say I have a cluster created: min=10, max=20
13:12:09 <Qiming> how many standby nodes should we create at the very beginning?
13:13:06 <chohoor> I think it should be specified by the user.
13:13:16 <Qiming> right, that was my point
13:13:28 <chohoor> but less than max_size
13:14:02 <Qiming> maybe we can put that constraint aside
13:14:22 <Qiming> let's focus on the user's experience
13:14:45 <Qiming> if I'm a user, am I supposed to be aware of how many standby nodes there are?
13:14:47 <chohoor> that's ok
13:15:34 <Qiming> can we share standby nodes among two or more clusters?
13:15:37 <chohoor> yes, I think the user should know that.
13:16:00 <chohoor> No, the profiles are different.
13:16:08 <Qiming> the profile can be the same
13:16:21 <ruijie> the properties are very different
13:16:24 <Qiming> say I have created two clusters from the same profile
13:16:28 <ruijie> like the disk type, network, etc.
13:16:49 <Qiming> right, they could be ...
13:17:18 <chohoor> maybe each cluster should have its own standby nodes.
13:17:52 <Qiming> okay ... if we agree that standby nodes are still members of a cluster
13:18:13 <Qiming> and they are visible to the cluster's owner
13:18:37 <Qiming> are you gonna charge the user for these standby nodes?
13:19:40 <chohoor> yes, we need to make sure standby nodes are in active state.
13:20:21 <Qiming> that would lead us to another question: if the user is consuming additional resources, why don't we charge them?
13:21:13 <Qiming> if we charge them, they would be confused at least ... because they are not "running" workloads on those standby nodes
13:22:56 <ruijie> and if they cost extra money, why not just create extra nodes ..
13:23:12 <Qiming> yes
13:23:52 <chohoor> I have thought about this question; the standby nodes are for standby, but if a node in the cluster goes into error, we could replace it immediately.
13:23:54 <Qiming> so ... one of the key questions to answer is about the definition of "standby"
13:25:35 <Qiming> if you allocate 10 nodes, all active, but only treat 8 of them as members of your cluster, the other 2 are only for "standby"
13:25:46 <Qiming> then why don't you run workloads on them as well?
13:27:38 <chohoor> you are right. I'll continue to think about it.
13:28:41 <Qiming> one baby step would be this case
13:28:48 <Qiming> suppose I have 10 nodes running now
13:29:09 <Qiming> my cluster has min=5, max=15
13:29:53 <Qiming> for whatever reason, I want to scale it in by 2 nodes now, that means I'm setting the desired_capacity to 8
13:30:06 <Qiming> removing 2 nodes from the cluster
13:30:30 <Qiming> we can lazily remove those nodes
13:30:57 <Qiming> if later on I want to scale the cluster back to 10 nodes, I can reuse them
13:31:21 <Qiming> sounds like the idea is very similar to yours
13:31:29 <chohoor> move the nodes to standby first, then really delete them?
13:31:56 <ruijie> Qiming, you mean keep them for a while, but they will still be removed after a user-defined period?
13:31:56 <Qiming> let the cluster owner decide how long we will keep those nodes
13:32:21 <chohoor> ok
13:32:30 <Qiming> because we have the node-delete API, the user can still delete them at will at any time
13:33:22 <Qiming> this use case is less risky because we are sure the nodes are homogeneous
13:33:42 <Qiming> we are sure the nodes are configured the same way
13:34:01 <Qiming> we are sure there won't be data leaked from one user to another
13:35:54 <Qiming> if there are such extra nodes, we may want to stop them instead of deleting them
13:36:56 <Qiming> that leads us back to the definition of "standby" nodes
13:38:13 <chohoor> I'll rewrite the spec; please give me more suggestions in IRC later. Thank you.
13:38:21 <Qiming> my pleasure
13:38:39 <Qiming> just want to be sure that we have thought it through before writing the code
13:39:04 <chohoor> sure
13:40:16 <ruijie> is it my turn now :)
13:40:17 <Qiming> anything else?
13:40:24 <Qiming> shoot
13:40:36 <chohoor> and another question, about protecting a special node from being deleted.
13:40:38 <ruijie> yes Qiming, the first one is this patch: https://review.openstack.org/#/c/481670/
13:41:07 <chohoor> your turn.
13:41:20 <ruijie> the change will rerun the actions whose action.status is READY
13:41:35 <ruijie> that is caused by failing to acquire the lock
13:42:00 <Qiming> yes
13:42:20 <ruijie> but it will try to process the action again and again ..
13:42:47 <ruijie> the cluster is still locked during this period .. won't that create a lot of events?
13:42:47 <Qiming> what do you mean by "again and again"
13:42:56 <Qiming> what is the interval?
13:43:02 <ruijie> 0 sec
13:43:16 <Qiming> so .. that is the problem
13:43:23 <ruijie> yes Qiming
13:43:50 <Qiming> or we should add a global option lock_max_retries
13:44:23 <ruijie> and maybe a lock_retry_interval ..
13:44:31 <Qiming> :)
13:44:34 <Qiming> maybe both
13:45:15 <Qiming> the logic is not incorrect, it's just ... too frequent, too noisy
13:46:20 <Qiming> chohoor, you want to "lock" a node?
13:46:20 <ruijie> improvement is needed :)
13:46:40 <chohoor> yes
13:47:00 <Qiming> use case?
13:47:00 <chohoor> A node that is not to be deleted.
13:47:39 <chohoor> maybe this is a special node from the user's point of view...
13:48:01 <Qiming> sounds like a use case for node roles
13:49:33 <chohoor> In Tencent Cloud, a user can add a normal instance to a cluster as a node.
13:49:53 <chohoor> and the code could be protected from deletion.
13:50:06 <chohoor> s/code/node/
13:50:32 <ruijie> like node adoption
13:50:32 <Qiming> who is deleting it then, if the user doesn't want to delete it?
13:51:19 <chohoor> auto-scaling, because the node selection could be random
13:52:03 <chohoor> or the oldest-first policy.
13:52:04 <Qiming> I see
13:52:54 <elynn> So by 'protect' here you just mean cluster scaling actions won't delete it, but this node can still be deleted by the 'node delete' command?
13:53:33 <Qiming> so it is gonna be some label or tag that will change some action's behavior
13:53:45 <chohoor> elynn: I think so
13:53:48 <elynn> yes, I guess.
13:54:30 <Qiming> alright, please think about whether we can use node.role for this purpose
13:54:36 <elynn> a 'protect' tag will exclude it from the scaling candidate list.
13:54:45 <Qiming> because ... that is a more general solution
13:55:08 <Qiming> em ... we don't support tags yet
13:55:08 <chohoor> ok
13:55:11 <Qiming> maybe we should
13:55:20 <chohoor> metadata?
13:55:22 <Qiming> no objection to implementing tags
13:55:38 <Qiming> metadata is also fine
13:56:05 <Qiming> say 'protected: true'
13:57:03 <chohoor> if a node is in the protected state, what happens if the cluster/node executes an update action?
13:57:13 <Qiming> feel free to propose a solution then
13:57:23 <elynn> Need to think about the health policy, can we rebuild it?
13:57:46 <chohoor> I suggest not.
13:58:46 <elynn> Then update should also be rejected, I guess?
13:59:13 <Qiming> ...
13:59:27 <chohoor> just don't update the protected node.
13:59:46 <Qiming> thanks guys
13:59:48 <XueFeng> hi, the Senlin RDO package has been finished
13:59:52 <XueFeng> https://bugzilla.redhat.com/show_bug.cgi?id=1426551
13:59:52 <openstack> bugzilla.redhat.com bug 1426551 in Package Review "Review Request: Senlin - is a clustering service for OpenStack" [Unspecified,Closed: errata] - Assigned to amoralej
13:59:52 <Qiming> we are running out of time
14:00:00 <Qiming> #endmeeting
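
The lazy scale-in idea Qiming outlined above (park nodes removed on scale-in for an owner-defined grace period, reuse them on the next scale-out, and only really delete them once the period expires) could look roughly like the sketch below. This is a minimal illustration of the idea only; the StandbyPool class, its method names, and the delete_node callback are hypothetical and not part of Senlin's actual code.

    import time

    class StandbyPool:
        """Park scaled-in nodes for a grace period instead of deleting them."""

        def __init__(self, grace_seconds):
            self.grace_seconds = grace_seconds
            self._parked = {}  # node_id -> expiry timestamp

        def park(self, node_id):
            # Called on scale-in instead of deleting the node right away.
            self._parked[node_id] = time.time() + self.grace_seconds

        def reuse(self):
            # Called on scale-out: hand back a parked node if one is available.
            if self._parked:
                node_id, _expiry = self._parked.popitem()
                return node_id
            return None

        def reap(self, delete_node):
            # Really delete nodes whose grace period has expired; the cluster
            # owner can still delete a parked node at any time via node-delete.
            now = time.time()
            for node_id, expiry in list(self._parked.items()):
                if expiry <= now:
                    delete_node(node_id)
                    del self._parked[node_id]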
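
For the lock retry discussion, a minimal sketch of the proposed global options, assuming oslo.config-style configuration; the option names lock_max_retries and lock_retry_interval are the ones floated in the meeting, the defaults are illustrative, and try_acquire stands in for whatever lock-acquisition call the engine uses:

    import time

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts([
        cfg.IntOpt('lock_max_retries', default=3,
                   help='How many times to retry acquiring a cluster lock.'),
        cfg.FloatOpt('lock_retry_interval', default=2.0,
                     help='Seconds to wait between lock acquisition attempts.'),
    ])

    def acquire_with_retry(try_acquire, cluster_id):
        # Retry with a pause instead of spinning with a 0-second interval,
        # which floods the engine with repeated attempts and noisy events.
        for _attempt in range(CONF.lock_max_retries + 1):
            if try_acquire(cluster_id):
                return True
            time.sleep(CONF.lock_retry_interval)
        return False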
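
For the node protection idea, a minimal sketch of how a 'protected: true' metadata flag could exclude a node from the scale-in candidate list; the node dictionaries and the oldest-first ordering are simplified stand-ins, not Senlin's real models:

    def deletion_candidates(nodes, count):
        """Pick nodes to remove on scale-in, skipping protected ones."""
        eligible = [n for n in nodes
                    if not n.get('metadata', {}).get('protected', False)]
        # Oldest-first selection among the unprotected nodes.
        eligible.sort(key=lambda n: n['created_at'])
        return eligible[:count]

    nodes = [
        {'id': 'n1', 'created_at': 1, 'metadata': {'protected': True}},
        {'id': 'n2', 'created_at': 2, 'metadata': {}},
        {'id': 'n3', 'created_at': 3, 'metadata': {}},
    ]
    print([n['id'] for n in deletion_candidates(nodes, 2)])  # ['n2', 'n3']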