#openstack-meeting log

12:59:57 <Qiming> #startmeeting senlin
12:59:57 <openstack> Meeting started Tue Aug 30 12:59:57 2016 UTC and is due to finish in 60 minutes.  The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:59:59 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:02 <openstack> The meeting name has been set to 'senlin'
13:00:28 <Qiming> evening
13:01:19 <yanyanhu> hi
13:01:54 <Qiming> hi, wait a few minutes and see if anyone else is joining
13:01:59 <yanyanhu> ok
13:03:02 <elynn> o/
13:03:10 <yanyanhu> hi, elynn
13:03:13 <Qiming> hi, elynn and guoshan
13:03:18 <Qiming> not sure if others are joining
13:03:29 <Qiming> let's get started
13:03:31 <Qiming> #topic newton work items
13:03:32 <qwebirc78218> hello
13:03:56 <Qiming> hi, qwebirc78218
13:03:56 <yanyanhu> hi, qwebirc78218
13:03:58 <Qiming> #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:04:05 <Qiming> performance testing, any progress?
13:04:10 <yanyanhu> yes
13:04:27 <yanyanhu> roman has put +2 on the profile context patch
13:04:33 <yanyanhu> need another +2 and workflow
13:04:48 <Qiming> need to ping rally core?
13:04:49 <yanyanhu> once this patch is merged, will add context for cluster as well
13:05:11 <yanyanhu> Qiming, yes, maybe wait for another one or two days
13:05:15 <Qiming> okay
13:05:22 <Qiming> integration test side, https://review.openstack.org/#/c/354566/
13:05:31 <yanyanhu> good news is it works now
13:05:32 <Qiming> still waiting for another core to approve
13:05:42 <yanyanhu> Qiming, yes, for adding zaqar support
13:05:54 <yanyanhu> but at least we can rely on it to make some basic verifications
13:05:57 <Qiming> okay, that is not urgent
13:06:02 <yanyanhu> yes
13:06:12 <Qiming> basic verification passed, that is great
13:06:18 <yanyanhu> yep
13:06:27 <Qiming> health policy side
13:06:40 <Qiming> LB based health detection is still not there
13:06:46 <Qiming> not sure if xinhui is still pushing it
13:07:03 <Qiming> she has been working on fencing nova compute host
13:07:18 <Qiming> experimenting with IPMI drivers
13:07:58 <Qiming> the only problem in that direction is nova is not emitting a notification if nova-compute is down
13:08:32 <Qiming> there are notifications if the compute service is shut down by operators, but if the compute host is down, there is no notification
13:08:36 <Qiming> that is too bad
13:08:46 <Qiming> so the only workaround, as of today, would be a poller
13:09:15 <lixinhui_> you have confirmed that
13:09:21 <yanyanhu> poller sounds reasonable for this scenario
13:09:25 <Qiming> so ... I'm not sure if we should (in Ocata release) make health manager a separate service
13:09:31 <Qiming> yes, lixinhui_, confirmed
13:09:35 <Qiming> thanks for joining
13:09:51 <lixinhui_> sorry for late
13:09:55 <Qiming> that is a stupid design, hopefully we can help improve it if we get cycles
13:10:20 <Qiming> other improvements to health policy are about the recover/check workflow revision
13:10:26 <Qiming> mostly are done now
13:11:02 <Qiming> the policy can now suspend itself if node deletion was initiated from a RPC request instead of a failure detected
13:11:13 <Qiming> that part is also done
13:11:25 <Qiming> I was thinking of making the policy a little bit smarter
13:11:26 <yanyanhu> great
13:12:03 <Qiming> if you look at this: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/health_manager.py#n61
13:12:20 <Qiming> when a node is down and gets detected
13:12:47 <Qiming> we actually are sending this info as params when invoking the node_recover API
13:13:12 <Qiming> the policy can be improved to handle different 'event' and/or 'state' a little bit smarter
13:13:27 <lixinhui_> good point
13:13:27 <Qiming> say if a node is in SHUTDOWN state, the policy can try just 'reboot' it
13:13:36 <Qiming> or 'start' it
13:13:55 <Qiming> this is still an imagination, have to wait for the nova server operations patch merged into sdk
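The state-aware recovery idea discussed above could look roughly like the following. This is only a sketch under stated assumptions: the mapping, the function name choose_operations and the operation names START and REBOOT are hypothetical and are not taken from Senlin. Only the SHUTDOWN state and the RECREATE fallback are mentioned in the meeting.

```python
# Hypothetical sketch, not Senlin code: map a detected node state to an
# ordered list of recovery operations for a profile to try.

# Fallback when no state-specific handling is known (assumption).
DEFAULT_OPERATIONS = ["RECREATE"]

# State-specific operations; only SHUTDOWN is mentioned in the discussion.
STATE_OPERATIONS = {
    "SHUTDOWN": ["START", "REBOOT"],
}


def choose_operations(state):
    """Return the recovery operations to try, in order, for a node state."""
    return list(STATE_OPERATIONS.get(state, DEFAULT_OPERATIONS))
```

A caller would try each returned operation in turn and stop at the first one that succeeds.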
13:14:19 <Qiming> profile/policy version
13:14:35 <Qiming> yanyan has been working on a 'workaround'
13:14:47 <yanyanhu> yes, basic versioning support for schema and spec has been there
13:14:58 <Qiming> I'm calling it a 'workaround' because ... versioning is pretty big a problem to solve
13:15:07 <Qiming> we'll get back to that later
13:15:18 <yanyanhu> but I think we have a lot more detail to figure out before deciding how to support policy/profile version control
13:15:21 <Qiming> container support
13:15:23 <yanyanhu> yes
13:15:29 <Qiming> correct
13:15:38 <Qiming> haiwei's patch is finally in
13:15:49 <yanyanhu> yes, long run...
13:16:00 <Qiming> he is now experimenting specifying a host_cluster when creating container clusters
13:16:05 <Qiming> good luck ...
13:16:32 <Qiming> with that work as a starting point, we may want to discuss how to proceed as next step
13:16:52 <Qiming> haven't got time to review his new spec proposal though
13:17:09 <yanyanhu> better have a session in summit to discuss this topic
13:17:11 <Qiming> but I'd like to call a cross project discussion with magnum/zun on this
13:17:16 <Qiming> right
13:17:27 <yanyanhu> Qiming, sure, that will be the best
13:17:56 <Qiming> receiver side, yanyan has been working on zaqar support
13:18:15 <Qiming> please delete the items that are done
13:18:15 <yanyanhu> Qiming, yes
13:18:25 <yanyanhu> sure
13:18:43 <yanyanhu> the initial part has been merged today
13:18:46 <Qiming> hopefully, zaqar can bring in a more secure, more flexible channel for users/services to send signals to senlin
13:18:54 <yanyanhu> yes
13:19:03 <Qiming> that was another marathon
13:19:26 <Qiming> okay, anything else on the etherpad page?
13:19:32 <yanyanhu> looks so. hopefully we can have a basic version that works before we cut our release
13:19:39 <Qiming> this week is the week to cut newton-3 release
13:19:58 <Qiming> I don't want to do it on Friday, too risky, when the gate is so jammed
13:20:11 <yanyanhu> ah, hope to catch rc1
13:20:29 <Qiming> we have the flexibility to merge more stable features in next few weeks
13:20:42 <Qiming> because we don't have a huge pipeline for review/debate
13:21:00 <yanyanhu> good news
13:21:08 <Qiming> okay, moving on to next topic
13:21:20 <Qiming> #topic health checking update
13:21:29 <Qiming> em ... I have basically covered that
13:21:42 <yanyanhu> yep
13:21:49 <Qiming> mostly about the check/recover workflow and the handling of different actions in the policy
13:21:57 <Qiming> there is still a feature not implemented
13:22:15 <Qiming> we were hoping that the recover action can be a list of operations for the profile to try
13:22:44 <Qiming> currently, the profile (nova in particular) only understands REBUILD, and the generic profile only handles RECREATE
13:23:01 <Qiming> that would be an interesting work for future
13:23:12 <Qiming> evening, xuhaiwei_
13:23:19 <Qiming> #topic cluster status update
13:23:21 <xuhaiwei_> hi, Qiming
13:23:56 <Qiming> if you are watching the gerrit notifications, you will notice that I have been working on cluster status update fix these two days
13:24:02 <xuhaiwei_> kept silent to not disturb you:)
13:24:19 <Qiming> the basic idea is this: we will update cluster status, based on the status of the member nodes
13:24:30 <Qiming> NOT based on the last operation performed on it
13:24:55 <Qiming> e.g. a CLUSTER_UPDATE operation may fail, but the cluster may still remain ACTIVE
13:25:07 <Qiming> we have to differentiate these two things
13:25:49 <Qiming> A CLUSTER_SCALE_OUT may fail, but that failure is an action failure, it doesn't mean the cluster is not operable
13:26:08 <Qiming> I think this series of patches is near an end
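Deriving the cluster status from its member nodes rather than from the last action might be sketched as below. The thresholds, the WARNING status and the function name evaluate_cluster_status are assumptions made for illustration. The meeting only states that status follows the member nodes and that a failed action can leave a cluster ACTIVE.

```python
# Hypothetical sketch, not Senlin code: compute a cluster's status from the
# statuses of its member nodes, independent of the last action's outcome.

def evaluate_cluster_status(node_statuses, desired_capacity, min_size=0):
    """Return a status string based on how many nodes are ACTIVE."""
    active = sum(1 for status in node_statuses if status == "ACTIVE")
    if active < min_size:
        # Fewer healthy nodes than the cluster minimum (assumed rule).
        return "ERROR"
    if active < desired_capacity:
        # Operable, but below what the user asked for (assumed rule).
        return "WARNING"
    return "ACTIVE"
```

Under these assumptions, a failed CLUSTER_UPDATE that leaves every node ACTIVE still evaluates to ACTIVE.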
13:26:38 <Qiming> when making these changes, I also changed the modification of 'desired_capacity'
13:27:01 <Qiming> we were changing the 'desired_capacity' after an action is completed, but that is WRONG
13:27:08 <Qiming> it has been reported several times
13:27:21 <yanyanhu> yes, saw that patch, that is reasonable
13:27:29 <yanyanhu> especially from ha perspective
13:27:29 <Qiming> so I was also making that happen before the action is executed
13:27:54 <Qiming> when a request arrives, the user's expectation is the desired_capacity
13:28:13 <Qiming> if the engine fails to perform the action, it should not change user's expectation
13:28:28 <Qiming> that was a simple logic, but we unfortunately learned it in a hard way
13:28:42 <Qiming> questions/comments on this?
13:29:09 <Qiming> seems a no
13:29:15 <qwebirc78218> sorry to break
13:29:20 <qwebirc78218> can i ask a question
13:29:23 <Qiming> sure
13:29:34 <qwebirc78218> last time, i create a node but failed
13:29:42 <qwebirc78218> so i recovered it
13:30:14 <qwebirc78218> but the desired capacity is still 0
13:30:36 <Qiming> yep, that is exactly one of the problems we are fixing
13:30:37 <qwebirc78218> should that be 1?
13:31:00 <Qiming> when you are creating a node, the desired capacity should be incremented by 1
13:31:03 <qwebirc78218> okay, thanks for answering
13:31:06 <Qiming> even if the node creation was a failure
13:31:41 <Qiming> 'increment the cluster size by one', that is the user's (your) desire
13:31:48 <Qiming> we should handle it differently
13:31:53 <Qiming> thanks for bringing this up
13:32:06 <Qiming> moving on
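The desired_capacity change described in this topic can be illustrated with a small sketch: the user's expectation is recorded before the action runs, so a failed node creation does not roll it back. The Cluster class, the scale_out function and the create_node callback are hypothetical names for illustration, not Senlin's API.

```python
# Hypothetical sketch, not Senlin code: record the desired capacity when the
# request arrives, before the action that may fail is executed.

class Cluster:
    def __init__(self, desired_capacity=0):
        self.desired_capacity = desired_capacity
        self.nodes = []


def scale_out(cluster, count, create_node):
    """Grow the cluster by `count` nodes; return True if all were created."""
    # The request itself expresses the user's expectation, so update it first.
    cluster.desired_capacity += count
    for _ in range(count):
        try:
            cluster.nodes.append(create_node())
        except RuntimeError:
            # The action failed, but the expectation stays unchanged.
            return False
    return True
```

With this ordering, the case raised in the meeting (node creation fails and the node is later recovered) ends with a desired capacity of 1 rather than 0.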
13:32:18 <Qiming> #topic ocata design summit sessions
13:32:29 <Qiming> #link https://etherpad.openstack.org/p/ocata-senlin-sessions
13:33:01 <yanyanhu> have put my name on profile/policy versioning
13:33:01 <Qiming> I was just dumping some topics above my head
13:33:29 <Qiming> policy/profile versioning definitely needs some discussion
13:33:37 <Qiming> even before/after that session
13:33:39 <yanyanhu> yes
13:33:56 <Qiming> maybe combined with Topic 4
13:34:05 <Qiming> "versioned everything"
13:34:11 <yanyanhu> Qiming, yes, topic 4 can be an extended discussion
13:34:24 <Qiming> yep, we cannot finish that in one session
13:34:31 <Qiming> maybe we need two slots
13:34:50 <yanyanhu> yes, if we have enough time slot
13:34:57 <Qiming> topic 2 is about health
13:35:25 <Qiming> we have some preliminary support now, next step is to make it work in production environments
13:35:37 <Qiming> it is a huge problem space
13:35:51 <Qiming> we have to brainstorm the working items and prioritize them
13:36:17 <Qiming> maybe involve a congress extension or mistral workflow
13:36:21 <Qiming> i just don't now
13:36:26 <Qiming> s/now/know
13:36:42 <Qiming> the 3rd topic I can think of is about container clustering
13:37:10 <Qiming> haiwei has set a stage for us, where are we heading next?
13:37:13 <xuhaiwei_> Maybe I can be the driver
13:37:27 <Qiming> that would be excellent
13:37:52 <xuhaiwei_> I didn't spend enough time on it up to now, will try  to do more things before the summit
13:38:30 <Qiming> so ...
13:38:36 <Qiming> any more ideas you can think of?
13:38:37 <xuhaiwei_> first should let the container going
13:39:10 <Qiming> or we can just let ttx know that we need 4 working sessions?
13:39:24 <yanyanhu> I guess another topic that may be worth discussing is the cluster do operation?
13:39:41 <Qiming> okay
13:39:43 <yanyanhu> although we already have some basic idea for it. but may need to figure out the detail
13:39:48 <yanyanhu> and also use case
13:40:15 <Qiming> openstack cluster do reboot
13:40:32 <xuhaiwei_> I updated the spec a few days ago, hope you can review it https://review.openstack.org/#/c/281102/
13:41:05 <Qiming> we already support 'openstack cluster run --script <script> --network private --address-type private --identity-file <file> --user fedora <cluster_name>'
13:41:37 <Qiming> terribly sorry, haiwei, will jump onto it tomorrow
13:41:51 <xuhaiwei_> ok
13:41:52 <Qiming> cluster do is more about actions supported by a profile type
13:42:07 <yanyanhu> yes, maybe we can support cluster run with a template as input :)
13:42:37 <yanyanhu> to improve the convenience
13:43:00 <Qiming> you will create several apis to manage the scripts
13:43:16 <yanyanhu> Qiming, or maybe just a client side support
13:43:47 <yanyanhu> to avoid forcing users to define too many parameters in command line
13:43:57 <Qiming> it is the same
13:44:09 <Qiming> just ... where you are putting your parameters
13:44:23 <yanyanhu> yes
13:44:26 <yanyanhu> seems so
13:44:29 <Qiming> if you tried 'glance image-create', you know what I mean
13:44:38 <yanyanhu> yea
13:45:04 <Qiming> so, please feel free to add items to the agenda
13:45:17 <yanyanhu> sure
13:45:25 <Qiming> I'll review the etherpad tomorrow and conclude with a number to feed back to ttx
13:45:29 <yanyanhu> will think about it
13:45:33 <Qiming> thanks
13:45:38 <Qiming> #topic open discussion
13:46:52 <Qiming> we are freezing senlinclient this week
13:47:15 <Qiming> any topics/patches you want to merge before we cut a release?
13:47:42 <yanyanhu> Qiming, the message receiver support has been there, no more item from my side
13:47:47 <Qiming> one thing I can think of is about dumping out the action ID for all requests that return a pointer to the action
13:48:08 <Qiming> we were not so consistent on this before, there have been some complaints on this
13:48:46 <Qiming> sometimes we say "request accepted", sometimes we say "request accepted by action <action id>"
13:49:02 <Qiming> that is something we can improve
13:49:15 <Qiming> also the 'deprecation warning' is a little bit confusing
13:49:37 <Qiming> we can explicitly say WHEN it will deprecate
13:49:48 <Qiming> it should be April 2017
13:49:53 <Qiming> two cycles
13:50:08 <Qiming> and we will get back the '--profile' option from openstackclient
13:50:12 <Qiming> by then
13:50:47 <Qiming> anything else?
13:51:20 <Qiming> seems we are done?
13:51:21 <yanyanhu> nope
13:51:35 <xuhaiwei_> no
13:51:35 <Qiming> thanks for joining, everyone
13:51:49 <yanyanhu> thanks, have a good night
13:51:54 <Qiming> wish you all a sweet dream and a wet bed
13:51:55 <xuhaiwei_> thanks
13:52:03 <Qiming> bye
13:52:03 <yanyanhu> :)
13:52:08 <Qiming> #endmeeting