13:00:10 <Qiming> #startmeeting senlin
13:00:10 <openstack> Meeting started Tue Jun 7 13:00:10 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:13 <openstack> The meeting name has been set to 'senlin'
13:00:26 <Qiming> #topic roll call
13:00:37 <yanyanhu> hello
13:00:58 <Qiming> hi
13:01:02 <xuhaiwei__> hi
13:01:08 <Qiming> hi, haiwei
13:01:09 <lixinhui_> hi
13:01:16 <yanyanhu> network connection at home is really unstable
13:01:17 <elynn> o/
13:01:33 <Qiming> #topic newton work items
13:01:45 <Qiming> testing, where are we?
13:02:03 <elynn> enable tempest api on gate
13:02:08 <yanyanhu> 50% I think
13:02:15 <yanyanhu> about negative test cases
13:02:30 <elynn> Saw some patches submitted by yanyanhu, many thanks!
13:02:40 <Qiming> okay, so we did find some inconsistencies in apis
13:02:43 <yanyanhu> elynn, my pleasure :)
13:02:57 <yanyanhu> also found some issues about our API implementation when writing the test
13:03:00 <Qiming> elynn, posted some comments to your latest patches
13:03:02 <yanyanhu> Qiming, yes
13:03:06 <yanyanhu> that is valuable
13:03:36 <elynn> Qiming: will check :)
13:03:50 <Qiming> tempest dsvm gate is not very slow, right?
13:03:56 <yanyanhu> yes
13:04:17 <Qiming> great
13:04:35 <Qiming> rally side
13:04:47 <Qiming> patch 318453 was in
13:04:49 <yanyanhu> gate job is finally ready
13:05:15 <Qiming> you mean gate job at senlin side?
13:05:18 <yanyanhu> and the 301522 works well now
13:05:21 <yanyanhu> both sides
13:05:25 <yanyanhu> in rally and senlin
13:05:41 <yanyanhu> just need to address rally team's question about that patch
13:05:58 <yanyanhu> but no critical obstacle I think
13:05:58 <Qiming> we don't rely on 301522 to run gate at senlin side, right?
13:06:07 <yanyanhu> yes
13:06:15 <yanyanhu> that is for rally repo
13:06:24 <Qiming> my question is about the gate failures we saw when doing 'check experimental' at senlin side
13:06:56 <yanyanhu> actually 307170 works as well
13:07:03 <yanyanhu> you mean this one? https://review.openstack.org/307170
13:07:23 <Qiming> oh?
13:07:43 <Qiming> that is the first time we have gate-rally-dsvm-senlin-senlin working!!
13:08:01 <yanyanhu> sorry, I was dropped
13:08:06 <yanyanhu> Qiming, yes :)
13:08:08 <elynn> This name is a little weird...
13:08:26 <yanyanhu> elynn, yes if you mean senlin-senlin.yaml :)
13:08:40 <Qiming> yes, let's get it in and fix it later?
13:08:58 <yanyanhu> that is because we try to match gate-dsvm-rally-senlin-{name} job template
13:09:29 <yanyanhu> Qiming, ok, I think I may need to do some cleanup before that patch becomes ready
13:09:42 <Qiming> okay
13:09:48 <yanyanhu> will remove [WIP] when it's ok
13:09:53 <elynn> double senlin, now we have amazon :P
13:10:04 <yanyanhu> :P
13:10:36 <yanyanhu> and will discuss with eldon about their test based on rally
13:10:36 <Qiming> better rename that... it is strange, indeed
13:10:42 <yanyanhu> hope can provide some help for them
13:11:08 <yanyanhu> Qiming, yes, I think maybe we can propose another job template in future
13:12:03 <Qiming> sure, we can work together on stress tests
13:12:18 <elynn> Anyway, now we have many eyes on gates :)
13:12:34 <yanyanhu> yes
13:12:48 <Qiming> moving on
13:12:55 <Qiming> health management
13:13:15 <Qiming> I saw xinhui has taken over that lbaas bug
13:13:25 <Qiming> patch proposed now
13:13:35 <yanyanhu> cool
13:13:47 <lixinhui_> yes
13:13:52 <lixinhui_> I am doing that
13:13:59 <Qiming> many thanks
13:14:07 <lixinhui_> my pleasure
13:14:19 <lixinhui_> will contribute more in the next weeks
13:14:29 <Qiming> there are still some gate failures
13:14:52 <lixinhui_> en
13:15:08 <Qiming> I have added health monitoring by listening to vm lifecycle events
13:15:23 <Qiming> it took me quite some time to understand the filtering logic
13:15:45 <lixinhui_> congrats!
13:15:49 <lixinhui_> you got it
13:15:56 <Qiming> there are still things unstable inside oslo.messaging, complaining about some regex matching failures
13:15:56 <lixinhui_> will learn from you
13:16:23 <Qiming> anyway, we can get notified when vm status was modified (reboot, start, stop, ...)
13:16:32 <Qiming> next thing is to trigger some actions
13:16:41 <lixinhui_> is that reliable?
13:16:42 <Qiming> will dive into that
13:16:53 <lixinhui_> I mean the listening
13:17:18 <Qiming> listening is reliable, just some initialization wasn't complete I guess
13:17:28 <lixinhui_> ...
13:17:37 <Qiming> when I restart the engine, the listeners are created, but not receiving events
13:17:54 <Qiming> maybe need to do some fuzzy delay
13:18:02 <lixinhui_> ok
13:19:09 <Qiming> as for health threshold
13:19:21 <Qiming> I'm thinking of using desired_capacity
13:19:44 <lixinhui_> could you explain more?
13:20:06 <lixinhui_> if time permits
13:20:37 <Qiming> make desired_capacity the health threshold
13:21:04 <Qiming> a cluster is treated healthy if the number of active nodes >= desired_capacity
13:22:15 <Qiming> make sense?
13:22:16 <lixinhui_> okay
13:22:21 <lixinhui_> too harsh?
13:22:37 <yanyanhu> Qiming, if so, the node number could go beyond desired_capacity?
13:22:52 <Qiming> yes, between max_size and desired_capacity
13:23:26 <xuhaiwei__> in what case is the node number bigger than desired_capacity?
13:23:43 <yanyanhu> hmm, sounds a little different from our discussion in summit
13:24:10 <Qiming> when you do cluster check, there are some nodes not responding
13:24:25 <Qiming> when you do some operations later, those nodes come back to life
13:25:17 <Qiming> did we have any discussion about the health threshold during summit?
13:25:28 <xuhaiwei__> so the desired_capacity only contains the alive nodes?
13:25:34 <yanyanhu> yes, so will the total number of healthy nodes finally match the desired_capacity
13:25:44 <Qiming> desired is always the 'desired'
13:25:47 <yanyanhu> Qiming, no, it's not about health threshold
13:25:55 <yanyanhu> about the scaling basement
13:26:16 <Qiming> it is not the number of actually active nodes, we can never assume so
13:26:50 <yanyanhu> Qiming, yes, that's what I mean. I think the case that the total number of nodes is beyond desired_capacity is kind of a transient status?
13:27:11 <Qiming> if you don't do something, those nodes will be there
13:27:23 <yanyanhu> finally, the number of healthy nodes will be desired_capacity
13:27:30 <Qiming> there are transient statuses where some nodes are not active when you are checking them
13:28:05 <Qiming> the question is why are we maintaining the number of healthy nodes?
13:28:45 <Qiming> we are already not so sure about the number of active nodes, considering that there are transient problems
13:29:14 <Qiming> what we do care about is "whether the cluster is healthy"
13:29:30 <Qiming> which means there are enough nodes to share workloads
13:29:51 <Qiming> by "enough" here, we mean that number of active nodes >= desired capacity
13:30:07 <Qiming> yes, it is a bit harsh
13:30:21 <yanyanhu> yes. I think user specifies the desired_capacity which is the number of healthy nodes they want to have, so we should try to match it and the actual active nodes number?
13:30:31 <Qiming> but is there any way to maintain another statistic?
13:31:08 <Qiming> assume you are the user, when you are specifying desired_capacity, what are you thinking?
13:31:27 <yanyanhu> hmm, I want a cluster with this number of nodes being active
13:31:32 <yanyanhu> healthy
13:32:25 <Qiming> there could be a case where a user wants to create a cluster of 10 nodes, but 5 nodes is okay for him/her
13:32:44 <yanyanhu> yes, that is possible
13:32:54 <Qiming> why is he doing that?
13:33:19 <Qiming> if 5 is okay, then 5 is the min_size, right?
13:34:05 <yanyanhu> hmm, I think that means in any cases, the cluster size should not be less than 5
13:34:22 <Qiming> yes, then 5 is actually the min_size
13:34:24 <yanyanhu> that is not directly related to health management
13:34:31 <yanyanhu> yes, 5 is the min_size
13:34:44 <Qiming> if the cluster is dropping below that level, the cluster is in error status
13:35:01 <yanyanhu> I think that's two cases
13:35:03 <Qiming> if cluster size is between min_size and desired_capacity, we can treat it as warning
13:35:25 <yanyanhu> if cluster size is less than 5, that means internal error happened in senlin side
13:35:25 <Qiming> it is all about how we define the status of a cluster
13:35:41 <Qiming> users don't care what happened
13:35:56 <yanyanhu> yes, understand what you mean, just feel we shouldn't mix health management case and scaling case
13:36:01 <Qiming> maybe some nova nodes crashed
13:36:11 <Qiming> you cannot say it is senlin's fault
13:36:16 <yanyanhu> IMHO, min_size/max_size/desired is about scaling cases
13:36:41 <Qiming> they are properties you specify when you create a cluster
13:36:52 <Qiming> no matter you will scale that cluster or not
13:37:07 <yanyanhu> yes, that's the hard limit of cluster size
13:37:19 <yanyanhu> no matter the cluster has HA management support or not
13:37:28 <Qiming> exactly
13:38:07 <Qiming> so ... I'm wondering if we do want to introduce another threshold into senlin at the moment
13:38:14 <yanyanhu> so I think desired_capacity is something related to HA since it's user desired
13:38:39 <Qiming> and we can never make sure it matches the reality
13:38:43 <yanyanhu> Qiming, I agree we consider desired_capacity a health related property
13:38:51 <yanyanhu> but min_size, max_size could not be
13:39:21 <Qiming> okay, agree to disagree
13:39:30 <Qiming> let's think about it offline
13:39:38 <yanyanhu> ok
13:39:51 <yanyanhu> we can have a further discussion tomorrow :)
13:39:56 <Qiming> I'd suggest we forget all the actions/policies we have in senlin
13:39:59 <yanyanhu> really need more thinking about it
13:40:16 <Qiming> just think from a user's perspective, what makes a better sense for them
13:40:26 <yanyanhu> agree with this
13:40:40 <lixinhui_> jealous
13:40:46 <yanyanhu> a clear definition from user perspective is the most important
13:40:54 <lixinhui_> you can discuss face to face
13:41:07 <Qiming> we will discuss on irc
13:41:11 <lixinhui_> :)
13:41:15 <yanyanhu> lixinhui_, you can come here, some one will buy you coffee :P
13:41:24 <lixinhui_> cool!
13:41:31 <lixinhui_> or you can come to VMware
13:41:37 <lixinhui_> tomorrow we have happy hour
13:41:41 <yanyanhu> for free coffee :)
13:41:59 <lixinhui_> :P
13:42:01 <Qiming> we can define min_size, health_watermark, desired_capacity and max_size
13:42:13 <Qiming> try if you can explain all these four numbers to users
13:42:30 <yanyanhu> hmm, need more thinking on it
13:42:43 <Qiming> okay, let's move on
13:43:06 <Qiming> any news from you lixinhui_ on health management?
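[Editor's note] The cluster status rule Qiming proposes in the exchange above (error below min_size, warning between min_size and desired_capacity, healthy at or above desired_capacity) can be sketched as below. This only illustrates the proposal, which the meeting did not agree on. The function name and the status strings are hypothetical and are not taken from Senlin code.

```python
def classify_cluster_health(active_nodes, min_size, desired_capacity):
    """Classify a cluster by its number of active nodes.

    Hypothetical helper: the thresholds follow the proposal in the log,
    the status strings are placeholders chosen for this sketch.
    """
    if active_nodes < min_size:
        # Fewer nodes than the user said is acceptable.
        return "ERROR"
    if active_nodes < desired_capacity:
        # Still usable, but below what the user asked for.
        return "WARNING"
    # Enough active nodes to share the workload.
    return "HEALTHY"


# The example from the log: a 10-node cluster where 5 nodes is still okay.
for active in (4, 7, 10, 12):
    print(active, classify_cluster_health(active, min_size=5, desired_capacity=10))
```

Under this rule a cluster with more active nodes than desired_capacity, for instance after unresponsive nodes come back to life, still counts as healthy.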
13:43:18 <lixinhui_> Sorry, Qiming
13:43:30 <lixinhui_> I will try to contribute more in the following weeks
13:43:32 <yanyanhu> last sentence from me about this issue: maybe we should reconsider why user define min_size/max_size and whether and when they really need it :)
13:43:39 <lixinhui_> too distracted
13:43:49 <Qiming> no worry, just ask questions, in case you have moving too fast
13:44:21 <lixinhui_> :)
13:44:28 <Qiming> no update on documentation from me
13:44:38 <Qiming> container support
13:44:54 <xuhaiwei__> I submitted a patch
13:45:06 <xuhaiwei__> initialize docker driver
13:45:38 <Qiming> saw some patches from haiwei, I think we have been mixing things in a strange way
13:46:05 <Qiming> will have a closer look at the patch
13:46:31 <xuhaiwei__> yes, please comment on it
13:46:44 <Qiming> notification/event side, some basics are there
13:47:05 <Qiming> need some serializers and an example to encapsulate a notification into an object
13:47:20 <Qiming> then dispatch that object to oslo.messaging or db
13:47:29 <Qiming> will continue work on that
13:47:41 <Qiming> zaqar work is stalled
13:48:06 <Qiming> that's all from the etherpad
13:48:15 <Qiming> things to add?
13:48:39 <yanyanhu> nope, really lots of work items
13:49:03 <Qiming> #topic senlin cluster-do operation
13:49:27 <Qiming> here is the patch: https://review.openstack.org/#/c/326208/
13:49:51 <Qiming> we are adding OPERATIONS to a profile definition
13:50:00 <Qiming> it is not exposed to users for customization
13:50:30 <Qiming> but implementation wise, we are modeling operation parameters using schemas
13:50:46 <yanyanhu> oh, it's for this purpose
13:50:54 <yanyanhu> I didn't get it when I saw it the first time
13:51:04 <Qiming> so an operation can be easily verified when we get a JSON containing the operation requested
13:51:10 <yanyanhu> will check it
13:51:15 <yanyanhu> yep
13:51:27 <yanyanhu> that's a nice wrap
13:51:31 <Qiming> an operation request could be {"reboot": {"type": "HARD"}}
13:51:55 <Qiming> when users input senlin cluster-do help cluster1
13:52:26 <Qiming> we can iterate through the OPERATIONS dict and return a help text --- here are operations you can try
13:52:49 <Qiming> just like when you do senlin profile-type-show <a-profile-type>
13:52:51 <yanyanhu> nice
13:53:23 <Qiming> in the case of a nova server cluster, you can do 'senlin cluster-do reboot --type=HARD cluster1'
13:53:57 <Qiming> command wise, we can add more parameters so that users can reboot nodes with specific roles, but those can be added later
13:54:26 <Qiming> parameters are checked just like profile/policy properties, they have data types
13:54:34 <lixinhui_> :)
13:54:36 <Qiming> the only difference is that they are not 'updatable'
13:54:55 <Qiming> that is why I revised the common schema module
13:55:11 <yanyanhu> yes, saw that patch
13:55:17 <Qiming> in future there could be some extensions to Operation schema, today it is only just a Map
13:55:32 <Qiming> that's some background about that patch
13:55:39 <Qiming> #topic open discussions
13:55:56 <Qiming> I think we have covered the health management part
13:56:16 <lixinhui_> yes
13:56:18 <Qiming> and ... 1 hour is definitely not enough for discussion
13:56:29 <lixinhui_> nod
13:56:30 <yanyanhu> yes :)
13:56:38 <Qiming> need some homework before we discuss it again
13:56:57 <Qiming> pls think from user's perspective
13:56:58 <Qiming> :)
13:57:00 <yanyanhu> will think about it as well
13:57:06 <lixinhui_> :)
13:57:06 <yanyanhu> yes
13:57:15 <xuhaiwei__> ok
13:58:29 <Qiming> oh, don't know if you have noticed it
13:58:43 <Qiming> we have had senlin 2.0.0.0b1 released last Friday
13:58:53 <Qiming> senlinclient 0.5.0 released today
13:59:08 <xuhaiwei__> saw it
13:59:11 <Qiming> senlinclient version jump was based on release team's suggestion
13:59:22 <Qiming> leave some version numbers for back-port
13:59:30 <Qiming> em.
13:59:35 <yanyanhu> in global requirement?
13:59:45 <Qiming> not yet proposed to global requirements
13:59:50 <yanyanhu> I see
13:59:59 <Qiming> feel free to do so
14:00:03 <Qiming> #endmeeting
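
[Editor's note] The cluster-do idea discussed under "senlin cluster-do operation" (an OPERATIONS dict whose entries describe typed parameters, used both to verify a JSON request and to generate help text) can be sketched as below. This is a plain-Python illustration and not the code in the patch under review: the layout of the OPERATIONS dict, the function names, the "SOFT" value and its use as a default are all assumptions made for this sketch. Only the request shape {"reboot": {"type": "HARD"}} comes from the log.

```python
import json

# Hypothetical operation table; Senlin models this with its own schema classes.
OPERATIONS = {
    "reboot": {
        "description": "Reboot the nodes in the cluster.",
        "params": {
            "type": {
                "allowed": ("SOFT", "HARD"),  # "SOFT" is an assumption
                "default": "SOFT",            # assumed default
                "description": "Kind of reboot to perform.",
            },
        },
    },
}


def validate_operation(request_json):
    """Check a JSON operation request and return (name, resolved params)."""
    request = json.loads(request_json)
    if not isinstance(request, dict) or len(request) != 1:
        raise ValueError("request must name exactly one operation")
    name, params = next(iter(request.items()))
    if name not in OPERATIONS:
        raise ValueError("unknown operation: %s" % name)
    params = params or {}
    if not isinstance(params, dict):
        raise ValueError("operation parameters must be a JSON object")
    schema = OPERATIONS[name]["params"]
    unknown = sorted(set(params) - set(schema))
    if unknown:
        raise ValueError("unknown parameters: %s" % ", ".join(unknown))
    resolved = {}
    for key, spec in schema.items():
        # Fall back to the default when the parameter is omitted.
        value = params.get(key, spec["default"])
        if value not in spec["allowed"]:
            raise ValueError("invalid value for %s: %r" % (key, value))
        resolved[key] = value
    return name, resolved


def describe_operations():
    """Build help text by iterating over the OPERATIONS dict."""
    lines = []
    for name in sorted(OPERATIONS):
        op = OPERATIONS[name]
        lines.append("%s: %s" % (name, op["description"]))
        for key in sorted(op["params"]):
            spec = op["params"][key]
            lines.append("  --%s (one of %s): %s"
                         % (key, ", ".join(spec["allowed"]), spec["description"]))
    return "\n".join(lines)


print(validate_operation('{"reboot": {"type": "HARD"}}'))
print(describe_operations())
```

Keeping the parameter description in one table means the validation path and the help output cannot drift apart, which seems to be the motivation given in the log for modeling operations with schemas.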