13:02:03 <Qiming> #startmeeting senlin
13:02:04 <openstack> Meeting started Tue Jun 14 13:02:03 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:02:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:02:07 <openstack> The meeting name has been set to 'senlin'
13:02:25 <yanyanhu> hi
13:02:25 <Qiming> seems working
13:02:27 <Qiming> cool
13:02:38 <haiwei_> hi
13:02:39 <Qiming> morning/evening
13:02:42 <elynn> good
13:02:44 <cschulz_> Hi
13:02:45 <lixinhui_> hi
13:02:57 <Qiming> pls feel free to add items to the agenda if you have topics
13:03:06 <Qiming> #link https://wiki.openstack.org/wiki/Meetings/SenlinAgenda#Weekly_Senlin_.28Clustering.29_meeting
13:03:29 <Qiming> hi, everyone
13:03:44 <Qiming> let's start with the etherpad
13:03:53 <Qiming> #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:04:34 <Qiming> tempest testing side, we have made some really good progress on api tests, especially those negative ones
13:04:58 <elynn> yes
13:05:11 <yanyanhu> yes, almost done I think. just need negative test case for cluster actions
13:05:16 <elynn> How many negative tests left?
13:05:26 <yanyanhu> about 10 I think
13:05:36 <Qiming> great!
13:05:37 <yanyanhu> one for each action
13:05:38 <elynn> Great!
13:06:11 <Qiming> then we may need to migrate functional tests to use tempest?
13:06:32 <yanyanhu> Qiming, yes, I think so
13:06:45 <Qiming> or, do we want to do the migration at all?
13:06:46 <yanyanhu> maybe we can put them into scenario dir?
13:06:47 <Qiming> :)
13:07:03 <yanyanhu> Qiming, if possible, we should :)
13:07:10 <Qiming> okay
13:07:17 <Qiming> then we do it
13:07:45 <yanyanhu> then we don't need to maintain functional gate job
13:07:45 <elynn> Or we can create a functional dir?
13:07:46 <yanyanhu> all these tests will be done using tempest
13:07:59 <yanyanhu> although we may need tempest scenario test job?
13:08:17 <yanyanhu> yes
13:08:43 <yanyanhu> functional is also ok I think
13:09:04 <Qiming> functional tests will still use fake drivers
13:09:13 <yanyanhu> right
13:09:16 <Qiming> how about scenario tests?
13:09:27 <yanyanhu> that is the same as API test
13:09:51 <yanyanhu> actually in my mind, our current functional test is more like 'scenario' test :)
13:10:10 <yanyanhu> just the backend driver is fake
13:10:26 <elynn> scenario will use real driver, right?
13:10:38 <yanyanhu> um, that can be integration :)
13:10:43 <yanyanhu> integration test
13:10:54 <Qiming> right
13:11:04 <yanyanhu> seems there is no strict definition for these test types...
13:11:07 <elynn> oh, I thought they are the same...
13:11:22 <yanyanhu> elynn, that's confusing :)
13:11:23 <Qiming> api test focuses on the api surface
13:11:49 <Qiming> functional tests is more about exercising the senlin-engine
13:12:29 <Qiming> then I'm doubting if we should treat scenario test and integration test as the same thing
13:13:49 <yanyanhu> I'm ok with both functional and scenario. We just need to differentiate those two cases using real driver or not.
13:14:07 <elynn> Speaking of that, seems defcore tends to use the tests in tempest tree.
13:14:21 <elynn> Do we need to put some tests there?
13:14:33 <Qiming> if needed, we can copy the code there
13:14:40 <yanyanhu> I think we don't need to test complicated engine logic in integration test
13:14:53 <Qiming> defcore only cares about api surface, right?
13:15:03 <yanyanhu> just need to ensure senlin works well with other backend services I guess
13:15:16 <elynn> Qiming: yes
13:15:23 <Qiming> yanyanhu, agree, but we need to test some tricky, corner cases as well
13:15:37 <yanyanhu> yes, like lb policy
13:15:42 <elynn> Agree
13:15:50 <elynn> like lb policy and health management
13:15:53 <Qiming> integration test is more of exercises for profiles and policies
13:15:56 <yanyanhu> right
13:16:04 <yanyanhu> yep
13:16:26 <Qiming> so we can skip scenario tests?
13:16:45 <yanyanhu> ok
13:16:52 <Qiming> have functional focusing on engine testing, integration tests focusing on profiles/policies
13:17:05 <yanyanhu> move existing functional tests to tempest dir
13:17:10 <Qiming> the former still use the fake driver, the latter use real drivers
13:17:11 <elynn> Anyway, these tests are only tools to help us to make sure our services work as expected. Naming is not important :)
13:17:14 <yanyanhu> s/move/re-implement
13:17:27 <elynn> We can add any tests we want to tests.
13:17:31 <yanyanhu> elynn, +1
13:17:36 <Qiming> elynn, yep, but we have to speak the same language
13:18:05 <Qiming> we agree we won't talk about scenario tests
13:18:10 <Qiming> true?
13:18:50 <elynn> agree
13:19:04 <yanyanhu> ok
13:19:22 <Qiming> okay, let's keep things simple
13:19:28 <Qiming> stress testing
13:19:44 <Qiming> noticed your patch about rally testing, yanyan
13:19:57 <Qiming> quite some nits found when reviewing it
13:20:01 <Qiming> pls check
13:20:08 <yanyanhu> yes, have read your comments
13:20:12 <yanyanhu> will fix it tomorrow
13:20:30 <Qiming> also, we have got some comments from rally team about that plugin testing
13:20:56 <yanyanhu> Qiming, yes, definitely
13:21:00 <Qiming> pls help keep the balls rolling
13:21:15 <yanyanhu> I noticed roman just left some comments on my patch for adding cluster plugin
13:21:22 <yanyanhu> will reply and update patch
13:21:31 <Qiming> we may want to check if cmcc guys want to help on rally test cases
13:21:55 <yanyanhu> finally, all those plugins will stay in rally repo and we can remove our local copy
13:22:11 <yanyanhu> Qiming, yes, just didn't get msg from eldon zhao
13:22:17 <Qiming> you mean the rally_jobs subdir?
13:22:24 <yanyanhu> will contact with him to see whether there is anything we can help them
13:22:37 <yanyanhu> Qiming, yes, we just need to keep job description files
13:22:52 <yanyanhu> no need to keep local plugins if they have been merged to rally
13:23:00 <Qiming> okay
13:23:11 <Qiming> moving on
13:23:34 <Qiming> last week we talked about the health threshold problem
13:23:50 <Qiming> have you guys got some new ideas? thoughts?
13:24:30 <yanyanhu> hmm, still not very sure about it
13:25:08 <Qiming> ...
13:25:29 <yanyanhu> in my mind, there could be a property like percentage to describe the threshold of health status
13:25:36 <yanyanhu> which is based on desired_capacity
13:25:37 <yanyanhu> :)
13:26:03 <Qiming> yes, I know what you mean
13:26:19 <yanyanhu> which means if the current size of cluster is less than %* of user desired size, the cluster will be marked as unhealthy
13:26:23 <yanyanhu> yes
13:26:37 <Qiming> then we have four numbers, min_size, health_threshold, desired_capacity, max_size
13:26:49 <yanyanhu> yes, that's my original idea
13:27:28 <Qiming> how would you map the different node count to cluster status?
13:28:06 <yanyanhu> node count? you mean count of node in health status?
13:28:22 <Qiming> each time the cluster is resized, a user is supposed to adjust the health_threshold?
13:28:56 <yanyanhu> no, I think user should expect a stable/fix health_threshold no matter how large the cluster is
13:29:01 <Qiming> suppose we have the 4 numbers as these: 1, 3, 5, 7
13:29:18 <Qiming> the cluster now has 2 nodes active
13:29:28 <Qiming> what the cluster status would be?
13:29:38 <yanyanhu> that means they always want at least 80% of nodes in their cluster is health. e.g.
13:29:38 <yanyanhu> even when they have a very large cluster
13:29:59 <Qiming> then the health_threshold can only be a percentage?
13:30:05 <yanyanhu> 3 is health threshold?
13:30:08 <yanyanhu> yes
13:30:16 <yanyanhu> 60% in this case
13:30:45 <Qiming> the cluster has 2 nodes active, what should the cluster's status be?
13:30:58 <yanyanhu> but this is just my thought. I think we really need to find user's consideration about this issue
13:31:24 <yanyanhu> if the threshold is 60%, desired_capacity is 5, then I think the status should be warning/unhealthy
13:31:58 <yanyanhu> if there are 3 healthy nodes, we can mark cluster as healthy
13:32:00 <Qiming> the cluster's desired_capacity is 5
13:32:09 <Qiming> 2 nodes is less than that
13:32:14 <Qiming> it is no good
13:32:38 <yanyanhu> so in my idea, desired_capacity actually become the 'IDEAL' size
13:32:45 <Qiming> then what is 'desired_capacity' used for?
13:32:52 <yanyanhu> actually it's not 'desired_capacity'
13:33:11 <yanyanhu> you can't always get what you want, haha
13:33:13 <Qiming> health_threshold becomes the new 'desired'?
13:33:23 <yanyanhu> that why I have that idea :)
13:33:31 <yanyanhu> Qiming, seems so
13:33:48 <yanyanhu> so I said I'm not so sure about it after all these thinking
13:33:49 <Qiming> then we can completely abandon 'desired_capacity'
13:34:05 <yanyanhu> that depends on how user understand those conceptions, like desired_capacity
13:34:35 <yanyanhu> Qiming, maybe, or we use it as real 'desired' capacity
13:34:36 <Qiming> I was having problems explaining these four numbers to a user
13:35:00 <Qiming> when you do recovery
13:35:18 <Qiming> you recover to the health_threshold (3) nodes or desired_capacity (5) nodes?
13:35:31 <Qiming> if the cluster has 4 nodes active now
13:35:38 <Qiming> do you need to recover it?
13:35:54 <Qiming> it is not ideal (one less than the desired)
13:36:08 <yanyanhu> for recovery, I think we should try to recover to desired_capacity
13:36:36 <Qiming> recover to health_threshold would already make the cluster healthy
13:36:37 <yanyanhu> hmm, I think you're right, we should only maintain three conceptions here
13:36:58 <Qiming> I've been struggling on this for a long time
13:37:37 <Qiming> was trying to get some more brains on this
13:37:53 <yanyanhu> then we need to explain the difference between 'ideal' and 'what you want'
13:38:08 <yanyanhu> that's too confusing
13:38:19 <Qiming> yes, that was my thought a long time ago
13:38:32 <Qiming> what you want == ideal
13:38:41 <Qiming> because the reality always changes
13:38:45 <yanyanhu> so if we only have desired_capacity, that is the threshold of healthy and also 'what user wants'
13:38:58 <yanyanhu> we always try to make the cluster size match it
13:39:07 <Qiming> that is the *desired* capacity
13:39:23 <yanyanhu> yes
13:39:30 <Qiming> it could be an over simplification
13:39:50 <yanyanhu> desired means what user wants
13:40:08 <yanyanhu> and we try to meet their requirement when creating/recovering cluster
13:40:09 <Qiming> yep, what else could it be ...
13:40:35 <yanyanhu> if we fail to achieve that goal, the cluster is unhealthy since it doesn't meet user's expectation
13:40:40 <Qiming> yes, the engine will always try its best to "kind of converge" the cluster to that size
13:41:15 <yanyanhu> ok, I'm much clearer now
13:41:20 <yanyanhu> about this
13:42:15 <Qiming> okay, if we all agree on this "over" simplification
13:42:22 <Qiming> we can start closing the loop
13:42:30 <yanyanhu> yes
13:42:37 <elynn> sounds good!
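[Editor's sketch] The two health models weighed in the discussion above can be compared with a few lines of Python. This is only an illustration of the reasoning in the log, not Senlin code: the function names and the "healthy"/"unhealthy" strings are hypothetical, and the percentage threshold is the idea that was discussed and then set aside in favor of using desired_capacity alone.

```python
def status_with_threshold(active, desired_capacity, threshold_pct):
    """Earlier idea: healthy if active nodes reach threshold_pct% of desired.

    All names here are hypothetical; this model was dropped in the meeting.
    """
    # Integer ceiling of desired_capacity * threshold_pct / 100.
    required = (desired_capacity * threshold_pct + 99) // 100
    return "healthy" if active >= required else "unhealthy"


def status_simplified(active, desired_capacity):
    """Agreed simplification: desired_capacity is itself the health bar."""
    return "healthy" if active >= desired_capacity else "unhealthy"


def nodes_to_recover(active, desired_capacity):
    """How many nodes recovery would add to converge to desired_capacity."""
    return max(desired_capacity - active, 0)


# The example from the log: min_size=1, health_threshold=3 (60%),
# desired_capacity=5, max_size=7.
print(status_with_threshold(2, 5, 60))  # 2 active nodes
print(status_with_threshold(3, 5, 60))  # 3 active nodes
print(status_simplified(4, 5))          # 4 active nodes, one short of desired
print(nodes_to_recover(4, 5))
```

Under the simplified model the "4 nodes active" case is no longer ambiguous: the cluster is below desired_capacity, so it is treated as unhealthy and recovery aims for 5 nodes.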
13:42:46 <Qiming> we can hear node (VM) failure events now
13:43:12 <Qiming> the basic health management will do some recover with and without guidance from a policy
13:43:37 <Qiming> kind of a convergence to the desired_capacity
13:44:02 <yanyanhu> yes, so policy just make it "automatic" and "smarter"
13:44:19 <Qiming> in future, when needed, we can add an option: the engine can optionally converge the cluster size to a number you want, not necessarily the 'desired_capacity'
13:44:31 <Qiming> that is sort of equivalent to health_threshold
13:44:54 <Qiming> maybe a policy option or something
13:45:16 <yanyanhu> yes, define that property in policy makes more sense IMHO
13:45:21 <Qiming> moving on
13:45:38 <Qiming> no update on documentation, though I did fix some links on senlin wiki
13:46:11 <Qiming> reinstalled ceilometer along with aodh, will try to work out some tutorial docs on manual/auto scaling
13:46:23 <yanyanhu> nice
13:46:25 <Qiming> haiwei_, progress on container support?
13:46:45 <haiwei_> no progress this week
13:47:09 <haiwei_> waiting for your review on the initial patch
13:47:15 <Qiming> okay, let us know if you need someone for a discussion
13:47:44 <haiwei_> ok
13:47:49 <Qiming> I haven't reviewed that? ...
13:47:54 <Qiming> my fault
13:47:55 <Qiming> sorry
13:48:10 <haiwei_> its ok
13:48:18 <Qiming> no progress on event/notification yet
13:48:38 <Qiming> though I have a half-baked patch on generalizing the backend
13:48:52 <Qiming> oops, we only have 12 mins left
13:49:11 <Qiming> #topic cluster-collect call
13:49:33 <Qiming> let me walk you quickly thru the cluster-collect call I'm adding
13:49:50 <Qiming> the basic requirement is that when you created a cluster of nova servers, for example
13:50:02 <Qiming> you want to get a list of the IP addresses of all nodes
13:50:28 <Qiming> this is a command being added, which necessitates a new engine version and a new api version
13:50:36 <Qiming> you will be able to do things like this:
13:50:55 <Qiming> senlin cluster-collect -p details.addresses.private[0] <cluster_name>
13:51:05 <haiwei_> this function is useful in container cluster, I think
13:51:23 <Qiming> the "details.addresses.private[0]" is modeled as a json path
13:51:46 <Qiming> I have some local patches to be committed to senlin/sdk/senlinclient
13:52:18 <Qiming> still trying to iron out some issues
13:52:32 <Qiming> but basically, it works pretty good
13:52:59 <Qiming> in the simple case you can do: senlin cluster-collect -p name <cluster_name>
13:53:13 <Qiming> that will give you a list of names on command line
13:53:27 <Qiming> you can also do: senlin cluster-collect -p details.addresses.private[0] -L <cluster_name>
13:53:49 <Qiming> the "-L" switch will print the output into a two-columned table
13:54:03 <Qiming> the first column is the node id, the second is the attribute value
13:54:32 <yanyanhu> nice. Have you decided which type of data collect operation will return? list or dict
13:54:33 <Qiming> engine patch and rpc patches are ready for review, I'm working on the api layer
13:54:49 <Qiming> if you review the code
13:55:44 <Qiming> you will see it is returning something like this: {'cluster_attributes': [{'id': 'NODE1', 'value': 'V1'}, {'id': 'NODE2', 'value': 'V2'}]}
13:56:13 <yanyanhu> I see
13:56:24 <Qiming> that's a quick update on cluster-collect call
13:56:31 <Qiming> #topic open discussions
13:56:54 <Qiming> open for questions/comments/suggestions
13:57:33 <yanyanhu> nothing from my side
13:57:36 <Qiming> btw, I have got a company internal call for presentation for Barcelona summit
13:57:46 <Qiming> time flies
13:57:57 <yanyanhu> yes...
13:58:07 <yanyanhu> the end of oct.
13:58:07 <elynn> yes...
13:58:35 <yanyanhu> will think about it
13:58:56 <elynn> lao si ji, dai dai wo.
13:59:05 <yanyanhu> :P
13:59:10 <yanyanhu> I can read it
13:59:27 <Qiming> :)
13:59:28 <yanyanhu> maybe we can have a brainstorming for it
13:59:34 <Qiming> time's up
13:59:36 <yanyanhu> in coffee time
13:59:38 <elynn> yes
13:59:42 <Qiming> thanks for joining guys
13:59:52 <Qiming> have a good night/day
13:59:56 <yanyanhu> thanks
14:00:00 <Qiming> #endmeeting
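[Editor's sketch] The cluster-collect behavior described in the log (a path such as details.addresses.private[0] evaluated against every node, with results returned under 'cluster_attributes') can be approximated as below. This is a rough illustration under stated assumptions, not the Senlin engine code: the resolve and collect helpers, the tiny path grammar (dotted keys plus [N] indexes), the choice to skip nodes where the path does not resolve, and the sample node data are all hypothetical. Only the path strings and the shape of the returned dict come from the discussion.

```python
import re

# Matches either a key (run of characters without '.', '[' or ']')
# or a list index written as [N]. Hypothetical mini-grammar, not real JSONPath.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(obj, path):
    """Walk obj following a path like 'details.addresses.private[0]'."""
    for key, index in _TOKEN.findall(path):
        obj = obj[key] if key else obj[int(index)]
    return obj


def collect(nodes, path):
    """Build a result shaped like the one quoted in the log."""
    attrs = []
    for node in nodes:
        try:
            value = resolve(node, path)
        except (KeyError, IndexError, TypeError):
            # Assumption: nodes lacking the attribute are simply skipped.
            continue
        attrs.append({"id": node["id"], "value": value})
    return {"cluster_attributes": attrs}


# Made-up node records for illustration only.
nodes = [
    {"id": "NODE1", "name": "n1",
     "details": {"addresses": {"private": ["10.0.0.3"]}}},
    {"id": "NODE2", "name": "n2",
     "details": {"addresses": {"private": ["10.0.0.4"]}}},
]

print(collect(nodes, "name"))
print(collect(nodes, "details.addresses.private[0]"))
```

A two-column listing like the one the "-L" switch is said to produce would just iterate over result['cluster_attributes'] and print each id next to its value.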