13:02:55 #startmeeting senlin
13:02:56 Meeting started Tue Sep 13 13:02:55 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:02:58 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:03:00 The meeting name has been set to 'senlin'
13:03:05 hi, sorry for being late
13:03:15 hi
13:03:19 Hi
13:03:24 hi, everyone
13:03:32 hi, everyone
13:04:01 please let me know if you have topics to discuss
13:04:11 Evening, everyone
13:04:22 welcome, guoshan and ruijie_
13:04:57 meeting agenda here
13:04:59 #link https://wiki.openstack.org/wiki/Meetings/SenlinAgenda#Weekly_Senlin_.28Clustering.29_meeting
13:05:18 let's start with the newton work items etherpad
13:05:28 #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:05:29 Will we talk about the desired_capacity today?
13:05:39 yes, we can
13:06:20 added to meeting agenda
13:06:44 I'm not aware of any progress in performance test during last week
13:06:50 yanyan is on vacation
13:07:21 he has been pushing the rally side on this
13:07:56 health management, no new patches related to this topic either
13:08:21 lixinhui_, the lb bug closed or not?
13:08:35 closed for Octavia
13:08:45 I will change the status for that bug
13:08:53 but for others not
13:08:57 depends on driver
13:09:10 So if we use haproxy then we will encounter this bug?
13:09:12 okay, so we still cannot get node status correct?
13:09:15 and neutron team is pushing change towards Octavia from lbaas
13:09:33 Nowadays
13:09:47 okay, fine, cannot rely on non-stable features there
13:10:07 we may have to postpone this feature to Ocata cycle then
13:10:08 once the node status change, octavia will send RPC call to lbaas for it to change the da status
13:10:38 but not implement the notify
13:10:54 it is a very useful feature, really hope that can be landed and get stabilized soon
13:10:58 just PPC call
13:11:05 RPC call
13:11:17 but can we get pollers work?
13:11:51 oh, I see. will try this in vacation
13:11:58 middle-autumn
13:12:03 many thanks
13:12:08 np
13:12:31 documentation side, we have merged quite some doc fixes recently, about syntax and grammar
13:12:40 other than that, no new docs added
13:12:58 one thing I just realized is about testing
13:13:09 we do have testing section in the developer's guide
13:13:29 but we failed to let people know we have a cloud_backend = openstack_test option
13:13:34 that should be added
13:14:10 added this item to etherpad for tracking
13:14:21 container profile
13:14:40 haiwei has committed some patches about node/cluster dependencies
13:14:57 our discussion concluded that these dependencies can be generalized
13:15:09 so there have been db level and engine level patches
13:15:26 there are still a few patches about this to be reviewed
13:15:50 pls spend some time on this when you are not so busy
13:16:00 zaqar receiver
13:16:15 the whole invocation flow is working
13:16:27 the most tricky part is about trust building
13:16:44 an end user trusts 'senlin' account to perform cluster operations
13:17:12 and he/she would trust 'zaqar' account to trigger such operations by sending in some messages
13:17:46 the trust between the requesting user and the 'senlin' account is much easier, thus solved a long time ago
13:18:11 the other one means we have to know zaqar user id to build such a trust
13:18:29 yanyan has worked out a solution there, so no worries
13:18:59 the current vision is that zaqar will enable more flexible action triggering if used properly
13:19:17 next topic is event/notification
13:19:22 no progress I'm aware of
13:19:41 I myself have been tweaking nova server profile update recently
13:19:53 building a long chain of patches
13:20:11 the goal is to make profile update more reliable and maintainable
13:20:40 also applied some optimizations related to name update or password update etc.
13:20:58 that's all about newton work items etherpad
13:21:08 questions/comments?
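The testing note above mentions a `cloud_backend = openstack_test` option that the developer's guide does not yet document. The sketch below shows what that setting might look like in `senlin.conf` and reads it back using Python's standard `configparser`. It assumes the option lives in the `[DEFAULT]` section and falls back to `openstack` when unset. `load_cloud_backend` is a hypothetical helper, not part of Senlin.

```python
import configparser

# Hypothetical senlin.conf fragment; assumes cloud_backend lives in [DEFAULT].
SAMPLE_CONF = """\
[DEFAULT]
cloud_backend = openstack_test
"""


def load_cloud_backend(conf_text: str) -> str:
    """Return the configured cloud backend name from senlin.conf-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Assumed default: the regular 'openstack' backend when the option is unset.
    return parser.get("DEFAULT", "cloud_backend", fallback="openstack")


if __name__ == "__main__":
    print(load_cloud_backend(SAMPLE_CONF))  # openstack_test
```

Swapping the real backend for the test one this way lets functional tests run without a full OpenStack deployment behind the engine.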
13:21:36 I'm a little concerned about zaqar part
13:21:43 yes
13:21:54 since not all users will enable zaqar in their openstack
13:22:07 right
13:22:29 Is it better to provide an option to enable/disable it in senlin?
13:22:43 good question
13:23:01 but I hate options, which may get deprecated later
13:23:24 how about we do a check at api layer
13:23:40 That sounds great
13:24:20 if you are creating a message type of receiver, and we know zaqar is not installed, we throw some exception?
13:24:59 we cannot do this by simply checking if zaqar is installed on the local machine
13:25:02 service unavailable or bad request?
13:25:15 we should check keystone service catalog
13:25:32 should be a bad request or something 4xx
13:25:33 Yes
13:25:40 it is definitely not a 5xx error
13:26:38 added an item
13:26:44 won't be a huge task
13:26:46 en, bad request is better
13:27:15 anything else?
13:27:48 nope from me.
13:28:03 For the desired_capacity...
13:28:13 If we do not specify the max and min size
13:28:22 ruijie_, we leave that to the next topic
13:28:33 Sorry about that.
13:28:38 #topic planning for RC release
13:28:54 any high priority bugs seen recently?
13:29:36 https://bugs.launchpad.net/senlin/+bug/1619842
13:29:37 Launchpad bug 1619842 in senlin "after cluster-check, the status of cluster is warning" [Critical,Triaged] - Assigned to miaohb (miao-hongbao)
13:29:45 this one should be already closed ...
13:31:16 anyone looking at this #1546960
13:31:20 bug #1546960
13:31:20 bug 1546960 in senlin "node-create's index will be -1 if create more than 1 node,then cluster-node-add will fail as well" [Undecided,New] https://launchpad.net/bugs/1546960
13:31:58 lixinhui_, is the 'idontknow' you?
13:32:32 No, Qiming ...
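The API-layer check agreed on in the zaqar discussion works like this: look in the Keystone service catalog rather than on the local machine, and reject a message-type receiver with a 4xx error, not a 5xx, when Zaqar is absent. It might look roughly like the sketch below. Most of it is hypothetical: the flattened catalog format (a list of dicts with a `type` key), the `BadRequest` class, and the `service_available` and `validate_receiver` helpers. The only outside fact relied on is that Zaqar normally registers in the catalog under the service type `messaging`.

```python
from typing import Dict, List


class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP 400 response from the API layer."""
    code = 400


def service_available(catalog: List[Dict[str, str]], service_type: str) -> bool:
    """Return True if any service catalog entry advertises service_type."""
    return any(entry.get("type") == service_type for entry in catalog)


def validate_receiver(receiver_type: str, catalog: List[Dict[str, str]]) -> None:
    """Reject message receivers when Zaqar is not in the service catalog."""
    # Only message-type receivers depend on Zaqar; webhook receivers do not.
    if receiver_type == "message" and not service_available(catalog, "messaging"):
        # A client-side (4xx) error: the request asks for something this
        # cloud does not provide, which is not a server fault (5xx).
        raise BadRequest("Zaqar (service type 'messaging') is not in the "
                         "service catalog; cannot create a message receiver.")
```

With a catalog containing only `identity` and `clustering` entries, `validate_receiver("webhook", catalog)` passes, while `validate_receiver("message", catalog)` raises `BadRequest`. This approach avoids a new configuration option that might later need deprecating.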
13:32:40 en, youdontknow
13:32:44 I did not reproduce the situation the bug described
13:33:00 thanks ruijie_, for confirmation
13:33:13 marking as incomplete for now
13:33:35 bug #1609244
13:33:36 bug 1609244 in senlin "Getting image authentication failed when use fernet in keystone" [Undecided,New] https://launchpad.net/bugs/1609244
13:33:53 this one seems fixed, it was a keystone configuration problem
13:34:56 I'm gonna cut RC1 this week, probably on Thursday
13:35:23 if you have got any patches you want a review please speak up on #senlin channel
13:35:38 #topic dealing with desired_capacity
13:35:54 ruijie_, room is yours
13:37:08 ruijie_, still awake?
13:37:11 yea, the eval_status() will check the desired_capacity
13:37:31 yes
13:37:34 if we do not specify the max and min size of the cluster
13:37:49 you get min_size=0, max_size = -1
13:38:08 this method will change the reason to 'number of active nodes is above desired_capacity'
13:38:10 is that ok?
13:38:37 yes, it means the cluster is operational, just may be wasting some additional resources
13:39:53 having the number of active nodes equal to the desired_capacity would be great
13:40:08 That's true
13:40:29 The number of active nodes is equal to desired_capacity
13:40:37 but the max_size is still -1
13:41:03 but if we only set cluster status to 'ACTIVE' when that number is exactly desired_capacity, we may be a little too restrictive
13:41:19 max_size means no upper limit
13:42:07 http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/cluster.py#n551
13:42:11 I understand that, but I think the reason should be more friendly
13:43:44 that whole condition is about desired_capacity <= current capacity <= max_size
13:44:27 or if max_size is set to -1, there is no limit
13:45:11 Ok. I get it
13:45:38 feel free to propose a better description when you get one
13:45:41 :)
13:45:55 when speaking of desired_capacity
13:46:06 Sure :)
13:46:13 I'm gonna work on that during the coming week
13:46:25 the change would be about all cluster operations
13:46:47 especially those that change the size of a cluster
13:47:16 the goal is to make sure all operations are based on currently observed number of nodes, not desired capacity
13:47:24 Yes, that will be good for the Health Manager
13:47:35 say if you have a cluster of 3 nodes, and your desired_capacity is 5
13:47:47 fine, cluster is in WARNING status
13:47:55 when I do scale out
13:48:17 I'll use the observed 3 nodes as the basis
13:48:48 we are not basing the resize operation on the desired_capacity
13:49:03 'desired_capacity' of a cluster is always a "desired", not the reality
13:49:25 and recalculate the desired_capacity or not?
13:49:34 if not for senlinclient being frozen, I'm even thinking of adding an 'active_nodes' property to a cluster
13:49:50 ruijie_, will recalculate
13:50:00 using the above example
13:50:15 you have 3 nodes, your previous desire was 5 (not realized)
13:50:28 now you say 'cluster-scale-out -c 3'
13:50:48 that means you will want the cluster to have 6 nodes, your new desire
13:50:59 senlin will try its best to achieve that
13:51:26 but ... in real world, you may still have 3 nodes, or maybe just 4, because you have run out of quota or resources ...
13:51:48 so, yes, desired_capacity will always describe your new desire
13:52:06 it will be changed
13:52:16 how does senlin treat the ERROR nodes when we do a 'SCALE_OUT' action
13:52:24 no matter the operation/action succeeded or not
13:52:32 we leave them there
13:52:34 I know senlin will delete ERROR nodes first when we do a 'SCALE_IN'
13:53:12 that's a good question, I haven't thought it thru yet
13:53:42 maybe add a cluster-reset operation, delete all inactive nodes?
13:53:44 not sure
13:54:07 cluster-delete-all-inactive-or-error-or-warning-nodes?
13:54:12 Maybe just let the user decide how to treat them
13:54:48 yup, will keep working on that when I finish the desired_capacity one
13:54:53 #topic open discussion
13:56:09 anything for a quick discussion?
13:56:32 nope from me
13:57:57 okay, thanks everyone
13:58:07 best wishes to you and your family, ...
13:58:18 u2
13:58:21 it's mid-autumn season
13:58:29 thanks for joining, see you
13:58:36 cu
13:58:45 good night, and have fun in festival days
13:58:47 #endmeeting
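The desired_capacity semantics from the "dealing with desired_capacity" topic can be summarized in a small model. A cluster is ACTIVE when desired_capacity <= active nodes <= max_size, with max_size = -1 meaning no upper limit, and WARNING when it is short of its desire. A scale-out computes the new desire from the observed node count, not from the old desire. This is a sketch under stated assumptions, not Senlin's `eval_status()` (the cluster.py link in the log points at the real code). `ClusterSize`, `eval_status` and `scale_out` are hypothetical names. Mapping a count below min_size to ERROR and a count above max_size to WARNING are both assumptions, and the reason strings are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ClusterSize:
    """Hypothetical container for a cluster's size properties."""
    min_size: int = 0           # default when unspecified, per the discussion
    max_size: int = -1          # -1 means "no upper limit"
    desired_capacity: int = 0   # always a desire, not the observed reality


def eval_status(active: int, size: ClusterSize) -> Tuple[str, str]:
    """Map the observed number of ACTIVE nodes to a (status, reason) pair."""
    if active < size.min_size:
        # Assumption: falling below min_size is treated as an error here.
        return "ERROR", "number of active nodes is below min_size"
    if active < size.desired_capacity:
        # e.g. 3 active nodes with desired_capacity 5: operational but short.
        return "WARNING", "number of active nodes is below desired_capacity"
    if size.max_size < 0 or active <= size.max_size:
        # desired_capacity <= active <= max_size, or no upper limit at all.
        return "ACTIVE", "number of active nodes is at or above desired_capacity"
    # Assumption: exceeding max_size is reported as a warning in this sketch.
    return "WARNING", "number of active nodes is above max_size"


def scale_out(active: int, count: int, size: ClusterSize) -> int:
    """Record a new desire based on the observed node count, not the old desire."""
    size.desired_capacity = active + count
    return size.desired_capacity


if __name__ == "__main__":
    size = ClusterSize(desired_capacity=5)
    print(eval_status(3, size)[0])   # WARNING: 3 observed, 5 desired
    print(scale_out(3, 3, size))     # 6: 3 observed + 3 requested
    print(eval_status(4, size)[0])   # WARNING: quota allowed only 4 of 6
```

The last two lines mirror the example from the meeting. After `cluster-scale-out -c 3` on a 3-node cluster, the new desire is 6 whether or not the operation succeeds. The status then reflects however many nodes actually came up.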