12:59:57 <Qiming> #startmeeting senlin
12:59:57 <openstack> Meeting started Tue Aug 30 12:59:57 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:59:59 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:02 <openstack> The meeting name has been set to 'senlin'
13:00:28 <Qiming> evening
13:01:19 <yanyanhu> hi
13:01:54 <Qiming> hi, wait a few minutes and see if anyone else is joining
13:01:59 <yanyanhu> ok
13:03:02 <elynn> o/
13:03:10 <yanyanhu> hi, elynn
13:03:13 <Qiming> hi, elynn and guoshan
13:03:18 <Qiming> not sure if others are joining
13:03:29 <Qiming> let's get started
13:03:31 <Qiming> #topic newton work items
13:03:32 <qwebirc78218> hello
13:03:56 <Qiming> hi, qwebirc78218
13:03:56 <yanyanhu> hi, qwebirc78218
13:03:58 <Qiming> #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:04:05 <Qiming> performance testing, any progress?
13:04:10 <yanyanhu> yes
13:04:27 <yanyanhu> roman has put +2 on the profile context patch
13:04:33 <yanyanhu> need another +2 and workflow
13:04:48 <Qiming> need to ping rally core?
13:04:49 <yanyanhu> once this patch is merged, will add context for cluster as well
13:05:11 <yanyanhu> Qiming, yes, maybe wait for another one or two days
13:05:15 <Qiming> okay
13:05:22 <Qiming> integration test side, https://review.openstack.org/#/c/354566/
13:05:31 <yanyanhu> good news is it works now
13:05:32 <Qiming> still waiting for another core to approve
13:05:42 <yanyanhu> Qiming, yes, for adding zaqar support
13:05:54 <yanyanhu> but at least we can rely on it to make some basic verifications
13:05:57 <Qiming> okay, that is not urgent
13:06:02 <yanyanhu> yes
13:06:12 <Qiming> basic verification passed, that is great
13:06:18 <yanyanhu> yep
13:06:27 <Qiming> health policy side
13:06:40 <Qiming> LB based health detection is still not there
13:06:46 <Qiming> not sure if xinhui is still pushing it
13:07:03 <Qiming> she has been working on fencing nova compute hosts
13:07:18 <Qiming> experimenting with IPMI drivers
13:07:58 <Qiming> the only problem in that direction is that nova does not emit a notification if nova-compute is down
13:08:32 <Qiming> there are notifications if the compute service is shut down by operators, but if the compute host is down, there is no notification
13:08:36 <Qiming> that is too bad
13:08:46 <Qiming> so the only workaround, as of today, would be a poller
13:09:15 <lixinhui_> you have confirmed that
13:09:21 <yanyanhu> poller sounds reasonable for this scenario
13:09:25 <Qiming> so ... I'm not sure if we should (in Ocata release) make health manager a separate service
13:09:31 <Qiming> yes, lixinhui_, confirmed
13:09:35 <Qiming> thanks for joining
13:09:51 <lixinhui_> sorry for being late
13:09:55 <Qiming> that is a stupid design, hopefully we can help improve it if we get cycles
13:10:20 <Qiming> other improvements to health policy are about the recover/check workflow revision
13:10:26 <Qiming> mostly done now
13:11:02 <Qiming> the policy can now suspend itself if node deletion was initiated from an RPC request instead of a detected failure
13:11:13 <Qiming> that part is also done
13:11:25 <Qiming> I was thinking of making the policy a little bit smarter
13:11:26 <yanyanhu> great
13:12:03 <Qiming> if you look at this: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/health_manager.py#n61
13:12:20 <Qiming> when a node is down and gets detected
13:12:47 <Qiming> we actually are sending this info as params when invoking the node_recover API
13:13:12 <Qiming> the policy can be improved to handle different 'event' and/or 'state' values a little bit smarter
13:13:27 <lixinhui_> good point
13:13:27 <Qiming> say if a node is in SHUTDOWN state, the policy can just try to 'reboot' it
13:13:36 <Qiming> or 'start' it
13:13:55 <Qiming> this is still just an idea, have to wait for the nova server operations patch to be merged into sdk
13:14:19 <Qiming> profile/policy version
13:14:35 <Qiming> yanyan has been working on a 'workaround'
13:14:47 <yanyanhu> yes, basic versioning support for schema and spec has been there
13:14:58 <Qiming> I'm calling it a 'workaround' because ... versioning is a pretty big problem to solve
13:15:07 <Qiming> we'll get back to that later
13:15:18 <yanyanhu> but I think we have a lot more details to figure out before deciding how to support policy/profile version control
13:15:21 <Qiming> container support
13:15:23 <yanyanhu> yes
13:15:29 <Qiming> correct
13:15:38 <Qiming> haiwei's patch is finally in
13:15:49 <yanyanhu> yes, long run...
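The idea of letting the health policy choose a recovery operation from the reported node state (for example, 'start' or 'reboot' a SHUTDOWN node instead of rebuilding it) could look roughly like the Python sketch below. It is a hypothetical illustration, not Senlin code: `RECOVERY_PLAN`, `plan_for` and `recover` are invented names, and the state-to-operation mapping and fallback order are assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from a detected node state to an ordered list of
# recovery operations to try. This is NOT Senlin's real policy schema.
RECOVERY_PLAN: Dict[str, List[str]] = {
    "SHUTDOWN": ["START", "REBOOT", "REBUILD"],
    "ERROR": ["REBUILD", "RECREATE"],
}
DEFAULT_PLAN: List[str] = ["RECREATE"]


def plan_for(state: str) -> List[str]:
    """Return the ordered recovery operations for a reported node state."""
    return RECOVERY_PLAN.get(state, DEFAULT_PLAN)


def recover(state: str, supported: Dict[str, Callable[[], bool]]) -> str:
    """Try each planned operation the profile supports until one succeeds.

    `supported` maps operation names to callables returning True on success;
    planned operations the profile does not support are skipped.
    """
    for op in plan_for(state):
        handler = supported.get(op)
        if handler is not None and handler():
            return op
    raise RuntimeError(f"no recovery operation succeeded for state {state}")


# A profile that only understands REBUILD falls through to it.
print(recover("SHUTDOWN", {"REBUILD": lambda: True}))  # REBUILD
```

Keeping the plan as data makes it easy to try cheap operations such as a start or reboot first and fall back to a full rebuild or recreate only when those are unsupported or fail.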
13:16:00 <Qiming> he is now experimenting with specifying a host_cluster when creating container clusters
13:16:05 <Qiming> good luck ...
13:16:32 <Qiming> with that work as a starting point, we may want to discuss how to proceed as the next step
13:16:52 <Qiming> haven't got time to review his new spec proposal though
13:17:09 <yanyanhu> better have a session in summit to discuss this topic
13:17:11 <Qiming> but I'd like to call a cross project discussion with magnum/zun on this
13:17:16 <Qiming> right
13:17:27 <yanyanhu> Qiming, sure, that will be the best
13:17:56 <Qiming> receiver side, yanyan has been working on zaqar support
13:18:15 <Qiming> please delete the items that are done
13:18:15 <yanyanhu> Qiming, yes
13:18:25 <yanyanhu> sure
13:18:43 <yanyanhu> the initial part has been merged today
13:18:46 <Qiming> hopefully, zaqar can bring in a more secure, more flexible channel for users/services to send signals to senlin
13:18:54 <yanyanhu> yes
13:19:03 <Qiming> that was another marathon
13:19:26 <Qiming> okay, anything else on the etherpad page?
13:19:32 <yanyanhu> looks so. hopefully we can have a basic version that works before we cut our release
13:19:39 <Qiming> this week is the week to cut the newton-3 release
13:19:58 <Qiming> I don't want to do it on Friday, too risky, when the gate is so jammed
13:20:11 <yanyanhu> ah, hope to catch rc1
13:20:29 <Qiming> we have the flexibility to merge more stable features in the next few weeks
13:20:42 <Qiming> because we don't have a huge pipeline for review/debate
13:21:00 <yanyanhu> good news
13:21:08 <Qiming> okay, moving on to the next topic
13:21:20 <Qiming> #topic health checking update
13:21:29 <Qiming> em ... I have basically covered that
13:21:42 <yanyanhu> yep
13:21:49 <Qiming> mostly about the check/recover workflow and the handling of different actions in the policy
13:21:57 <Qiming> there is still a feature not implemented
13:22:15 <Qiming> we were hoping that the recover action can be a list of operations for the profile to try
13:22:44 <Qiming> currently, the profile (nova in particular) only understands REBUILD, and the generic profile only handles RECREATE
13:23:01 <Qiming> that would be interesting work for the future
13:23:12 <Qiming> evening, xuhaiwei_
13:23:19 <Qiming> #topic cluster status update
13:23:21 <xuhaiwei_> hi, Qiming
13:23:56 <Qiming> if you are watching the gerrit notifications, you will notice that I have been working on the cluster status update fix these past two days
13:24:02 <xuhaiwei_> kept silent to not disturb you:)
13:24:19 <Qiming> the basic idea is this: we will update cluster status based on the status of the member nodes
13:24:30 <Qiming> NOT based on the last operation performed on it
13:24:55 <Qiming> e.g. a CLUSTER_UPDATE operation may fail, but the cluster may still remain ACTIVE
13:25:07 <Qiming> we have to differentiate these two things
13:25:49 <Qiming> A CLUSTER_SCALE_OUT may fail, but that failure is an action failure, it doesn't mean the cluster is not operable
13:26:08 <Qiming> I think this series of patches is near an end
13:26:38 <Qiming> when making these changes, I also changed the modification of 'desired_capacity'
13:27:01 <Qiming> we were changing the 'desired_capacity' after an action is completed, but that is WRONG
13:27:08 <Qiming> it has been reported several times
13:27:21 <yanyanhu> yes, saw that patch, that is reasonable
13:27:29 <yanyanhu> especially from ha perspective
13:27:29 <Qiming> so I was also making that happen before the action is executed
13:27:54 <Qiming> when a request arrives, the user's expectation is the desired_capacity
13:28:13 <Qiming> if the engine fails to perform the action, it should not change the user's expectation
13:28:28 <Qiming> that is simple logic, but we unfortunately learned it the hard way
13:28:42 <Qiming> questions/comments on this?
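The two rules described above (derive the cluster status from its member nodes rather than from the last action, and record `desired_capacity` before the action runs) can be sketched in Python as follows. This is a hypothetical model, not Senlin's engine code: the `Cluster` and `Node` classes, `scale_out`, and the ACTIVE/WARNING/ERROR thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    status: str  # e.g. "ACTIVE" or "ERROR" (hypothetical status values)


@dataclass
class Cluster:
    desired_capacity: int
    min_size: int = 0
    nodes: List[Node] = field(default_factory=list)
    status: str = "ACTIVE"

    def eval_status(self) -> str:
        """Derive the cluster status from its member nodes, not from the last action."""
        active = sum(1 for n in self.nodes if n.status == "ACTIVE")
        if active >= self.desired_capacity:
            self.status = "ACTIVE"   # enough healthy nodes
        elif active >= self.min_size:
            self.status = "WARNING"  # degraded but still operable
        else:
            self.status = "ERROR"
        return self.status


def scale_out(cluster: Cluster, count: int, create_node: Callable[[], Node]) -> bool:
    """Record the user's expectation first, then try to fulfil it.

    Returns the action result, which is kept separate from cluster.status.
    """
    cluster.desired_capacity += count  # updated BEFORE the action executes
    succeeded = True
    for _ in range(count):
        node = create_node()
        cluster.nodes.append(node)
        succeeded = succeeded and node.status == "ACTIVE"
    cluster.eval_status()
    return succeeded


# A scale-out whose new node fails: the action fails, but the cluster is
# still operable and desired_capacity keeps the user's request.
c = Cluster(desired_capacity=1, min_size=1, nodes=[Node("ACTIVE")])
print(scale_out(c, 1, lambda: Node("ERROR")), c.desired_capacity, c.status)  # False 2 WARNING
```

The action result and the cluster status are reported separately, so a failed node creation still leaves `desired_capacity` reflecting what the user asked for.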
13:29:09 <Qiming> seems a no
13:29:15 <qwebirc78218> sorry to break in
13:29:20 <qwebirc78218> can i ask a question
13:29:23 <Qiming> sure
13:29:34 <qwebirc78218> last time, i created a node but it failed
13:29:42 <qwebirc78218> so i recovered it
13:30:14 <qwebirc78218> but the desired capacity is still 0
13:30:36 <Qiming> yep, that is exactly one of the problems we are fixing
13:30:37 <qwebirc78218> shouldn't that be 1
13:31:00 <Qiming> when you are creating a node, the desired capacity should be incremented by 1
13:31:03 <qwebirc78218> okay, thanks for answering
13:31:06 <Qiming> even if the node creation was a failure
13:31:41 <Qiming> 'increment the cluster size by one', that is the user's (your) desire
13:31:48 <Qiming> we should handle it differently
13:31:53 <Qiming> thanks for bringing this up
13:32:06 <Qiming> moving on
13:32:18 <Qiming> #topic ocata design summit sessions
13:32:29 <Qiming> #link https://etherpad.openstack.org/p/ocata-senlin-sessions
13:33:01 <yanyanhu> have put my name on profile/policy versioning
13:33:01 <Qiming> I was just dumping some topics off the top of my head
13:33:29 <Qiming> policy/profile versioning definitely needs some discussion
13:33:37 <Qiming> even before/after that session
13:33:39 <yanyanhu> yes
13:33:56 <Qiming> maybe combined with Topic 4
13:34:05 <Qiming> "versioned everything"
13:34:11 <yanyanhu> Qiming, yes, topic 4 can be an extended discussion
13:34:24 <Qiming> yep, we cannot finish that in one session
13:34:31 <Qiming> maybe we need two slots
13:34:50 <yanyanhu> yes, if we have enough time slots
13:34:57 <Qiming> topic 2 is about health
13:35:25 <Qiming> we have some preliminary support now, next step is to make it work in production environments
13:35:37 <Qiming> it is a huge problem space
13:35:51 <Qiming> we have to brainstorm the work items and prioritize them
13:36:17 <Qiming> maybe involve a congress extension or mistral workflow
13:36:21 <Qiming> i just don't now
13:36:26 <Qiming> s/now/know
13:36:42 <Qiming> the 3rd topic I can think of is about container clustering
13:37:10 <Qiming> haiwei has set a stage for us, where are we heading next?
13:37:13 <xuhaiwei_> Maybe I can be the driver
13:37:27 <Qiming> that would be excellent
13:37:52 <xuhaiwei_> I haven't spent enough time on it up to now, will try to do more before the summit
13:38:30 <Qiming> so ...
13:38:36 <Qiming> any more ideas you can think of?
13:38:37 <xuhaiwei_> first we should get the container going
13:39:10 <Qiming> or we can just let ttx know that we need 4 working sessions?
13:39:24 <yanyanhu> I guess another topic that may be worth discussing is the cluster do operation?
13:39:41 <Qiming> okay
13:39:43 <yanyanhu> although we already have some basic ideas for it, we may need to figure out the details
13:39:48 <yanyanhu> and also the use cases
13:40:15 <Qiming> openstack cluster do reboot
13:40:32 <xuhaiwei_> I updated the spec a few days ago, hope you can review it https://review.openstack.org/#/c/281102/
13:41:05 <Qiming> we already support 'openstack cluster run --script <script> --network private --address-type private --identity-file <file> --user fedora <cluster_name>'
13:41:37 <Qiming> terribly sorry, haiwei, will jump onto it tomorrow
13:41:51 <xuhaiwei_> ok
13:41:52 <Qiming> cluster do is more about actions supported by a profile type
13:42:07 <yanyanhu> yes, maybe we can support cluster run with a template as input :)
13:42:37 <yanyanhu> to improve the convenience
13:43:00 <Qiming> you will create several apis to manage the scripts
13:43:16 <yanyanhu> Qiming, or maybe just client side support
13:43:47 <yanyanhu> to avoid forcing users to define too many parameters on the command line
13:43:57 <Qiming> it is the same
13:44:09 <Qiming> just ... where you are putting your parameters
13:44:23 <yanyanhu> yes
13:44:26 <yanyanhu> seems so
13:44:29 <Qiming> if you tried 'glance image-create', you know what I mean
13:44:38 <yanyanhu> yea
13:45:04 <Qiming> so, please feel free to add items to the agenda
13:45:17 <yanyanhu> sure
13:45:25 <Qiming> I'll review the etherpad tomorrow and conclude with a number to feed back to ttx
13:45:29 <yanyanhu> will think about it
13:45:33 <Qiming> thanks
13:45:38 <Qiming> #topic open discussion
13:46:52 <Qiming> we are freezing senlinclient this week
13:47:15 <Qiming> any topics/patches you want to merge before we cut a release?
13:47:42 <yanyanhu> Qiming, the message receiver support has been there, no more items from my side
13:47:47 <Qiming> one thing I can think of is about dumping out the action ID for all requests that return a pointer to the action
13:48:08 <Qiming> we were not so consistent on this before, there have been some complaints about it
13:48:46 <Qiming> sometimes we say "request accepted", sometimes we say "request accepted by action <action id>"
13:49:02 <Qiming> that is something we can improve
13:49:15 <Qiming> also the 'deprecation warning' is a little bit confusing
13:49:37 <Qiming> we can explicitly say WHEN it will deprecate
13:49:48 <Qiming> it should be April 2017
13:49:53 <Qiming> two cycles
13:50:08 <Qiming> and we will get back the '--profile' option from openstackclient
13:50:12 <Qiming> by then
13:50:47 <Qiming> anything else?
13:51:20 <Qiming> seems we are done?
13:51:21 <yanyanhu> nope
13:51:35 <xuhaiwei_> no
13:51:35 <Qiming> thanks for joining, everyone
13:51:49 <yanyanhu> thanks, have a good night
13:51:54 <Qiming> wish you all a sweet dream and a wet bed
13:51:55 <xuhaiwei_> thanks
13:52:03 <Qiming> bye
13:52:03 <yanyanhu> :)
13:52:08 <Qiming> #endmeeting
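For the senlinclient consistency items raised in the open discussion (always print the action ID, and state explicitly when a deprecated option goes away), a client could build its messages along the lines of the Python sketch below. It assumes the API returns the action's URL, for example in a `Location` header, with the action ID as the last path segment. The helper names, message wording, host and option names are all hypothetical, not senlinclient's actual output.

```python
def action_id_from_location(location: str) -> str:
    """Return the last path segment of an action URL such as '.../actions/<id>'."""
    return location.rstrip("/").rsplit("/", 1)[-1]


def accepted_message(location: str) -> str:
    """One consistent reply for every request that returns a pointer to an action."""
    return f"Request accepted by action: {action_id_from_location(location)}"


def deprecation_message(option: str, replacement: str, removal: str = "April 2017") -> str:
    """Deprecation warning that states explicitly WHEN the option goes away."""
    return (f"'{option}' is deprecated and will be removed in {removal}; "
            f"use '{replacement}' instead.")


# Hypothetical URL and option names, for illustration only.
print(accepted_message("http://senlin.example.com:8778/v1/actions/4c1f-example"))
print(deprecation_message("--old-option", "--new-option"))
```

Routing every action-creating command through one helper is a simple way to keep the "request accepted" wording identical across commands.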