#openstack-meeting log

13:02:03 <Qiming> #startmeeting senlin
13:02:04 <openstack> Meeting started Tue Jun 14 13:02:03 2016 UTC and is due to finish in 60 minutes.  The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:02:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:02:07 <openstack> The meeting name has been set to 'senlin'
13:02:25 <yanyanhu> hi
13:02:25 <Qiming> seems working
13:02:27 <Qiming> cool
13:02:38 <haiwei_> hi
13:02:39 <Qiming> morning/evening
13:02:42 <elynn> good
13:02:44 <cschulz_> Hi
13:02:45 <lixinhui_> hi
13:02:57 <Qiming> pls feel free to add items to the agenda if you have topics
13:03:06 <Qiming> #link https://wiki.openstack.org/wiki/Meetings/SenlinAgenda#Weekly_Senlin_.28Clustering.29_meeting
13:03:29 <Qiming> hi, everyone
13:03:44 <Qiming> let's start with the etherpad
13:03:53 <Qiming> #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:04:34 <Qiming> tempest testing side, we have made some really good progress on api tests, especially those negative ones
13:04:58 <elynn> yes
13:05:11 <yanyanhu> yes, almost done I think. just need negative test case for cluster actions
13:05:16 <elynn> How many negative tests left?
13:05:26 <yanyanhu> about 10 I think
13:05:36 <Qiming> great!
13:05:37 <yanyanhu> one for each action
13:05:38 <elynn> Great!
13:06:11 <Qiming> then we may need to migrate functional tests to use tempest?
13:06:32 <yanyanhu> Qiming, yes, I think so
13:06:45 <Qiming> or, do we want to do the migration at all?
13:06:46 <yanyanhu> maybe we can put them into scenario dir?
13:06:47 <Qiming> :)
13:07:03 <yanyanhu> Qiming, if possible, we should :)
13:07:10 <Qiming> okay
13:07:17 <Qiming> then we do it
13:07:45 <yanyanhu> then we don't need to maintain functional gate job
13:07:45 <elynn> Or we can create a functional dir?
13:07:46 <yanyanhu> all these tests will be done using tempest
13:07:59 <yanyanhu> although we may need tempest scenario test job?
13:08:17 <yanyanhu> yes
13:08:43 <yanyanhu> functional is also ok I think
13:09:04 <Qiming> functional tests will still use fake drivers
13:09:13 <yanyanhu> right
13:09:16 <Qiming> how about scenario tests?
13:09:27 <yanyanhu> that is the same as API test
13:09:51 <yanyanhu> actually in my mind, our current functional test is more like 'scenario' test :)
13:10:10 <yanyanhu> just the backend driver is fake
13:10:26 <elynn> scenario will use real driver, right?
13:10:38 <yanyanhu> um, that can be integration :)
13:10:43 <yanyanhu> integration test
13:10:54 <Qiming> right
13:11:04 <yanyanhu> seems there is no strict definition for these test types...
13:11:07 <elynn> oh, I thought they are the same...
13:11:22 <yanyanhu> elynn, that's confusing :)
13:11:23 <Qiming> api test focuses on the api surface
13:11:49 <Qiming> functional tests are more about exercising the senlin-engine
13:12:29 <Qiming> then I'm doubting if we should treat scenario test and integration test as the same thing
13:13:49 <yanyanhu> I'm ok with both functional and scenario. We just need to differentiate those two cases using real driver or not.
13:14:07 <elynn> Speaking of that, seems defcore tends to use the tests in tempest tree.
13:14:21 <elynn> Do we need to put some tests there?
13:14:33 <Qiming> if needed, we can copy the code there
13:14:40 <yanyanhu> I think we don't need to test complicated engine logic in integration test
13:14:53 <Qiming> defcore only cares about api surface, right?
13:15:03 <yanyanhu> just need to ensure senlin works well with other backend services I guess
13:15:16 <elynn> Qiming: yes
13:15:23 <Qiming> yanyanhu, agree, but we need to test some tricky, corner cases as well
13:15:37 <yanyanhu> yes, like lb policy
13:15:42 <elynn> Agree
13:15:50 <elynn> like lb policy and health management
13:15:53 <Qiming> integration test is more of exercises for profiles and policies
13:15:56 <yanyanhu> right
13:16:04 <yanyanhu> yep
13:16:26 <Qiming> so we can skip scenario tests?
13:16:45 <yanyanhu> ok
13:16:52 <Qiming> have functional focusing on engine testing, integration tests focusing on profiles/policies
13:17:05 <yanyanhu> move existing functional tests to tempest dir
13:17:10 <Qiming> the former still use the fake driver, the latter use real drivers
13:17:11 <elynn> Anyway, these tests are only tools to help us make sure our service works as expected. Naming is not important :)
13:17:14 <yanyanhu> s/move/re-implement
13:17:27 <elynn> We can add any tests we want to tests.
13:17:31 <yanyanhu> elynn, +1
13:17:36 <Qiming> elynn, yep, but we have to speak the same language
13:18:05 <Qiming> we agree we won't talk about scenario tests
13:18:10 <Qiming> true?
13:18:50 <elynn> agree
13:19:04 <yanyanhu> ok
13:19:22 <Qiming> okay, let's keep things simple
13:19:28 <Qiming> stress testing
13:19:44 <Qiming> noticed your patch about rally testing, yanyan
13:19:57 <Qiming> quite some nits found when reviewing it
13:20:01 <Qiming> pls check
13:20:08 <yanyanhu> yes, have read your comments
13:20:12 <yanyanhu> will fix it tomorrow
13:20:30 <Qiming> also, we have got some comments from rally team about that plugin testing
13:20:56 <yanyanhu> Qiming, yes, definitely
13:21:00 <Qiming> pls help keep the balls rolling
13:21:15 <yanyanhu> I noticed roman just left some comments on my patch for adding cluster plugin
13:21:22 <yanyanhu> will reply and update patch
13:21:31 <Qiming> we may want to check if cmcc guys want to help on rally test cases
13:21:55 <yanyanhu> finally, all those plugins will stay in rally repo and we can remove our local copy
13:22:11 <yanyanhu> Qiming, yes, just didn't get msg from eldon zhao
13:22:17 <Qiming> you mean the rally_jobs subdir?
13:22:24 <yanyanhu> will contact him to see whether there is anything we can help them with
13:22:37 <yanyanhu> Qiming, yes, we just need to keep job description files
13:22:52 <yanyanhu> no need to keep local plugins if they have been merged to rally
13:23:00 <Qiming> okay
13:23:11 <Qiming> moving on
13:23:34 <Qiming> last week we talked about the health threshold problem
13:23:50 <Qiming> have you guys got some new ideas? thoughts?
13:24:30 <yanyanhu> hmm, still not very sure about it
13:25:08 <Qiming> ...
13:25:29 <yanyanhu> in my mind, there could be a property like percentage to describe the threshold of health status
13:25:36 <yanyanhu> which is based on desired_capacity
13:25:37 <yanyanhu> :)
13:26:03 <Qiming> yes, I know what you mean
13:26:19 <yanyanhu> which means if the current size of the cluster is less than x% of the user's desired size, the cluster will be marked as unhealthy
13:26:23 <yanyanhu> yes
13:26:37 <Qiming> then we have four numbers,   min_size, health_threshold, desired_capacity, max_size
13:26:49 <yanyanhu> yes, that's my original idea
13:27:28 <Qiming> how would you map the different node count to cluster status?
13:28:06 <yanyanhu> node count? you mean the count of nodes in healthy status?
13:28:22 <Qiming> each time the cluster is resized, a user is supposed to adjust the health_threshold?
13:28:56 <yanyanhu> no, I think users should expect a stable/fixed health_threshold no matter how large the cluster is
13:29:01 <Qiming> suppose we have the 4 numbers as these:   1, 3, 5, 7
13:29:18 <Qiming> the cluster now has 2 nodes active
13:29:28 <Qiming> what would the cluster status be?
13:29:38 <yanyanhu> that means they always want, e.g., at least 80% of the nodes in their cluster to be healthy
13:29:38 <yanyanhu> even when they have a very large cluster
13:29:59 <Qiming> then the health_threshold can only be a percentage?
13:30:05 <yanyanhu> 3 is health threshold?
13:30:08 <yanyanhu> yes
13:30:16 <yanyanhu> 60% in this case
13:30:45 <Qiming> the cluster has 2 nodes active, what should the cluster's status be?
13:30:58 <yanyanhu> but this is just my thought. I think we really need to find user's consideration about this issue
13:31:24 <yanyanhu> if the threshold is 60%, desired_capacity is 5, then I think the status should be warning/unhealthy
13:31:58 <yanyanhu> if there are 3 healthy nodes, we can mark cluster as healthy
13:32:00 <Qiming> the cluster's desired_capacity is 5
13:32:09 <Qiming> 2 nodes is less than that
13:32:14 <Qiming> it is no good
13:32:38 <yanyanhu> so in my idea, desired_capacity actually becomes the 'IDEAL' size
13:32:45 <Qiming> then what is 'desired_capacity' used for?
13:32:52 <yanyanhu> actually it's not 'desired_capacity'
13:33:11 <yanyanhu> you can't always get what you want, haha
13:33:13 <Qiming> health_threshold becomes the new 'desired'?
13:33:23 <yanyanhu> that's why I have that idea :)
13:33:31 <yanyanhu> Qiming, seems so
13:33:48 <yanyanhu> so I said I'm not so sure about it after all this thinking
13:33:49 <Qiming> then we can completely abandon 'desired_capacity'
13:34:05 <yanyanhu> that depends on how users understand those concepts, like desired_capacity
13:34:35 <yanyanhu> Qiming, maybe, or we use it as real 'desired' capacity
13:34:36 <Qiming> I was having problems explaining these four numbers to a user
13:35:00 <Qiming> when you do recovery
13:35:18 <Qiming> you recover to the health_threshold (3) nodes or desired_capacity (5) nodes?
13:35:31 <Qiming> if the cluster has 4 nodes active now
13:35:38 <Qiming> do you need to recover it?
13:35:54 <Qiming> it is not ideal (one less than the desired)
13:36:08 <yanyanhu> for recovery, I think we should try to recover to desired_capacity
13:36:36 <Qiming> recover to health_threshold would already make the cluster healthy
13:36:37 <yanyanhu> hmm, I think you're right, we should only maintain three concepts here
13:36:58 <Qiming> I've been struggling on this for a long time
13:37:37 <Qiming> was trying to get some more brains on this
13:37:53 <yanyanhu> then we need to explain the difference between 'ideal' and 'what you want'
13:38:08 <yanyanhu> that's too confusing
13:38:19 <Qiming> yes, that was my thought a long time ago
13:38:32 <Qiming> what you want == ideal
13:38:41 <Qiming> because the reality always changes
13:38:45 <yanyanhu> so if we only have desired_capacity, that is the threshold of healthy and also 'what user wants'
13:38:58 <yanyanhu> we always try to make the cluster size match it
13:39:07 <Qiming> that is the *desired* capacity
13:39:23 <yanyanhu> yes
13:39:30 <Qiming> it could be an over simplification
13:39:50 <yanyanhu> desired means what user wants
13:40:08 <yanyanhu> and we try to meet their requirement when creating/recovering cluster
13:40:09 <Qiming> yep, what else could it be ...
13:40:35 <yanyanhu> if we fail to achieve that goal, the cluster is unhealthy since it doesn't meet user's expectation
13:40:40 <Qiming> yes, the engine will always try its best to "kind of converge" the cluster to that size
13:41:15 <yanyanhu> ok, I'm much clearer now
13:41:20 <yanyanhu> about this
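
The two health rules discussed above can be compared with a minimal sketch. The function names are hypothetical and not part of Senlin; the sketch only assumes that "healthy" means the count of active nodes meets the chosen bar, using the 1/3/5/7 example from the discussion (desired_capacity 5, threshold 60%).

```python
def healthy_by_threshold(active, desired_capacity, threshold_pct):
    """Percentage rule (hypothetical): healthy when the active node count
    is at least threshold_pct percent of desired_capacity.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    return active * 100 >= desired_capacity * threshold_pct


def healthy_by_desired(active, desired_capacity):
    """Simplified rule agreed in the meeting: desired_capacity itself is
    the bar, so anything below it counts as unhealthy."""
    return active >= desired_capacity


if __name__ == "__main__":
    desired = 5
    for active in (2, 3, 4, 5):
        print(active,
              healthy_by_threshold(active, desired, 60),
              healthy_by_desired(active, desired))
```

With 3 or 4 active nodes the two rules disagree, which is the ambiguity that made the four-number model hard to explain: the percentage rule already reports healthy while the cluster is still below what the user asked for.
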
13:42:15 <Qiming> okay, if we all agree on this "over" simplification
13:42:22 <Qiming> we can start closing the loop
13:42:30 <yanyanhu> yes
13:42:37 <elynn> sounds good!
13:42:46 <Qiming> we can hear node (VM) failure events now
13:43:12 <Qiming> the basic health management will do some recovery with and without guidance from a policy
13:43:37 <Qiming> kind of a convergence to the desired_capacity
13:44:02 <yanyanhu> yes, so policy just makes it "automatic" and "smarter"
13:44:19 <Qiming> in future, when needed, we can add an option: the engine can optionally converge the cluster size to a number you want, not necessarily the 'desired_capacity'
13:44:31 <Qiming> that is sort of equivalent to health_threshold
13:44:54 <Qiming> maybe a policy option or something
13:45:16 <yanyanhu> yes, define that property in policy makes more sense IMHO
13:45:21 <Qiming> moving on
13:45:38 <Qiming> no update on documentation, though I did fix some links on senlin wiki
13:46:11 <Qiming> reinstalled ceilometer along with aodh, will try to work out some tutorial docs on manual/auto scaling
13:46:23 <yanyanhu> nice
13:46:25 <Qiming> haiwei_, progress on container support?
13:46:45 <haiwei_> no progress this week
13:47:09 <haiwei_> waiting for your review on the initial patch
13:47:15 <Qiming> okay, let us know if you need someone for a discussion
13:47:44 <haiwei_> ok
13:47:49 <Qiming> I haven't reviewed that? ...
13:47:54 <Qiming> my fault
13:47:55 <Qiming> sorry
13:48:10 <haiwei_> it's ok
13:48:18 <Qiming> no progress on event/notification yet
13:48:38 <Qiming> though I have a half-baked patch on generalizing the backend
13:48:52 <Qiming> oops, we only have 12 mins left
13:49:11 <Qiming> #topic cluster-collect call
13:49:33 <Qiming> let me walk you quickly thru the cluster-collect call I'm adding
13:49:50 <Qiming> the basic requirement is that when you created a cluster of nova servers, for example
13:50:02 <Qiming> you want to get a list of the IP addresses of all nodes
13:50:28 <Qiming> this is a command being added, which necessitates a new engine version and a new api version
13:50:36 <Qiming> you will be able to do things like this:
13:50:55 <Qiming> senlin cluster-collect -p details.addresses.private[0] <cluster_name>
13:51:05 <haiwei_> this function is useful in container cluster, I think
13:51:23 <Qiming> the "details.addresses.private[0]" is modeled as a json path
13:51:46 <Qiming> I have some local patches to be committed to senlin/sdk/senlinclient
13:52:18 <Qiming> still trying to iron out some issues
13:52:32 <Qiming> but basically, it works pretty well
13:52:59 <Qiming> in the simple case you can do: senlin cluster-collect -p name <cluster_name>
13:53:13 <Qiming> that will give you a list of names on command line
13:53:27 <Qiming> you can also do: senlin cluster-collect -p details.addresses.private[0] -L <cluster_name>
13:53:49 <Qiming> the "-L" switch will print the output into a two-columned table
13:54:03 <Qiming> the first column is the node id, the second is the attribute value
13:54:32 <yanyanhu> nice. Have you decided which type of data the collect operation will return? list or dict
13:54:33 <Qiming> engine patch and rpc patches are ready for review, I'm working on the api layer
13:54:49 <Qiming> if you review the code
13:55:44 <Qiming> you will see it is returning something like this: {'cluster_attributes': [{'id': 'NODE1', 'value': 'V1'}, {'id': 'NODE2', 'value': 'V2'}]}
13:56:13 <yanyanhu> I see
13:56:24 <Qiming> that's a quick update on cluster-collect call
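
The collect behaviour described above can be illustrated with a minimal sketch. It is not the Senlin engine code: the `resolve` and `collect` helpers, the sample node data, and the choice to skip nodes where the path does not resolve are all assumptions for illustration, and the path parser handles only dotted keys and `[N]` indexes rather than full JSONPath. Only the shape of the returned dict follows the example given in the meeting.

```python
import re

# One token is either a plain key ("details") or a list index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve(data, path):
    """Walk a nested dict/list along a simplified path such as
    'details.addresses.private[0]' (hypothetical helper)."""
    value = data
    for key, index in _TOKEN.findall(path):
        value = value[key] if key else value[int(index)]
    return value


def collect(nodes, path):
    """Return the attribute at `path` for every node, in the
    {'cluster_attributes': [{'id': ..., 'value': ...}]} shape.
    Assumption: nodes where the path does not resolve are skipped."""
    attrs = []
    for node in nodes:
        try:
            value = resolve(node, path)
        except (KeyError, IndexError, TypeError):
            continue
        attrs.append({"id": node["id"], "value": value})
    return {"cluster_attributes": attrs}


if __name__ == "__main__":
    # Made-up node records, not real Senlin output.
    nodes = [
        {"id": "NODE1", "name": "n1",
         "details": {"addresses": {"private": [{"addr": "10.0.0.3"}]}}},
        {"id": "NODE2", "name": "n2", "details": {}},
    ]
    print(collect(nodes, "name"))
    print(collect(nodes, "details.addresses.private[0].addr"))
```

Returning a list of id/value pairs keeps each value tied to its node, which is what a two-column listing like the "-L" output needs.
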
13:56:31 <Qiming> #topic open discussions
13:56:54 <Qiming> open for questions/comments/suggestions
13:57:33 <yanyanhu> nothing from my side
13:57:36 <Qiming> btw, I have got a company internal call for presentation for Barcelona summit
13:57:46 <Qiming> time flies
13:57:57 <yanyanhu> yes...
13:58:07 <yanyanhu> the end of oct.
13:58:07 <elynn> yes...
13:58:35 <yanyanhu> will think about it
13:58:56 <elynn> old hand, take me along.
13:59:05 <yanyanhu> :P
13:59:10 <yanyanhu> I can read it
13:59:27 <Qiming> :)
13:59:28 <yanyanhu> maybe we can have a brainstorming for it
13:59:34 <Qiming> time's up
13:59:36 <yanyanhu> in coffee time
13:59:38 <elynn> yes
13:59:42 <Qiming> thanks for joining guys
13:59:52 <Qiming> have a good night/day
13:59:56 <yanyanhu> thanks
14:00:00 <Qiming> #endmeeting