13:01:47 <Qiming> #startmeeting senlin
13:01:48 <openstack> Meeting started Tue Mar 8 13:01:47 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:51 <openstack> The meeting name has been set to 'senlin'
13:01:55 <Qiming> halo
13:02:00 <haiwei> hi
13:02:02 <zzxwill> Hello.
13:02:06 <yanyanhu> hello
13:02:09 <elynn> o/
13:02:13 <Qiming> got some problems updating the meeting agenda page
13:02:15 <Qiming> sorry
13:02:47 <Qiming> also got a note from xinhui, she won't be able to join us today, because of some company activities ... parties?
13:02:56 <Qiming> anyway, let's get started
13:03:03 <yanyanhu> also can't access it even with proxy
13:03:03 <Qiming> #topic mitaka work items
13:03:08 <yanyanhu> :)
13:03:21 <Qiming> #link https://etherpad.openstack.org/p/senlin-mitaka-workitems
13:03:22 <haiwei> me too
13:03:46 <Qiming> great, site is down, :), not my fault
13:04:12 <Qiming> I was trying to reach xujun about scalability testing, but haven't got a response yet
13:04:35 <Qiming> need an update from bran on stress testing
13:04:54 <yanyanhu> yes, he is not here I think
13:04:58 <Qiming> I myself spent some time studying tempest
13:05:25 <Qiming> seems tempest is still the way to do api tests, but the code is now suggested to live closer to individual projects
13:05:56 <Qiming> so even if we start work on that, we are not supposed to commit to tempest directly; will need to confirm this
13:06:09 <yanyanhu> ok
13:06:10 <haiwei> only some core projects' functional tests remain in tempest
13:06:31 <Qiming> if the code is to be committed to senlin, then we are still expected to use tempest lib
13:06:49 <Qiming> tempest lib was a separate project, but recently it has been merged back into tempest
13:06:58 <Qiming> ... a lot of things happen every day
13:07:28 <Qiming> haiwei, it would be good if someone from NEC can help explain to us the right direction to go
13:07:35 <haiwei> you want to add API tests?
13:07:41 <Qiming> I know you have some guys active on that
13:07:46 <haiwei> yes
13:08:01 <Qiming> we also need someone to teach us the right way to do stress tests
13:08:20 <Qiming> api tests, stress tests and scenario tests were all in the scope of tempest before
13:08:27 <haiwei> like I said in the mid-cycle, there is something called a tempest external plugin
13:08:27 <Qiming> but I don't know the current situation
13:08:52 <Qiming> okay, then that is the direction to go for api tests
13:09:00 <Qiming> any idea about stress tests?
13:09:04 <haiwei> the tempest external plugin is something for scenario tests
13:09:09 <Qiming> are we supposed to reinvent the wheel?
13:09:24 <Qiming> scenario tests are different from api tests
13:09:33 <haiwei> for stress tests, I haven't heard of that before
13:09:43 <haiwei> I am not sure if tempest supports it
13:09:54 <Qiming> http://git.openstack.org/cgit/openstack/tempest/tree/tempest/README.rst
13:10:02 <Qiming> line 24 is about api tests
13:10:08 <Qiming> line 39 is about scenario tests
13:10:13 <Qiming> line 50 is about stress tests
13:10:22 <haiwei> oh, saw it
13:10:41 <haiwei> ok, I will ask the tempest guys tomorrow
13:10:51 <Qiming> not sure that is still the consensus
13:11:10 <haiwei> or ask them to join the Senlin IRC channel
13:11:13 <Qiming> if we don't get an answer from them, we have to ask on the -dev list
13:11:23 <haiwei> yes
13:11:23 <Qiming> okay, either way works
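For reference, an in-tree API test client built on tempest.lib (the approach discussed above) might start out like the sketch below. This is a minimal sketch only: the ClusteringClient class and the 'clusters' endpoint shape are illustrative assumptions, not actual senlin test code.

```python
import json

from tempest.lib.common import rest_client


class ClusteringClient(rest_client.RestClient):
    """Hypothetical senlin API client built on tempest.lib.

    The class name and the 'clusters' endpoint are assumptions for
    illustration; a real client would follow the senlin API reference.
    """

    def list_clusters(self):
        # GET /clusters -- an API surface test asserts both the status
        # code and the body shape, for failure paths as well as success
        resp, body = self.get('clusters')
        self.expected_success(200, resp.status)
        return rest_client.ResponseBody(resp, json.loads(body))
```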
13:11:49 <Qiming> yanyanhu is still on functional tests?
13:12:04 <Qiming> saw some patches about profile-level support for updates
13:12:06 <yanyanhu> yes. but basic support is almost done
13:12:14 <Qiming> great
13:12:33 <Qiming> api surface tests are important
13:12:40 <yanyanhu> yea, that's what I hoped to do, but I met some issues
13:13:00 <Qiming> those are supposed to test how the service fails in addition to how it succeeds
13:13:01 <yanyanhu> that I don't know how to address
13:13:18 <Qiming> okay, maybe we need some experts in this field
13:13:28 <yanyanhu> yes, we need to cover failure cases as well
13:13:39 <Qiming> testing is supposed to be our focus in the coming weeks
13:14:00 <Qiming> next, health management
13:14:18 <Qiming> we have a basic version of the health manager and a basic health policy
13:14:42 <Qiming> need some tests on them to make sure they work, so that at least we can remove line 16
13:15:31 <Qiming> lb based health detection, yanyanhu do you know the progress?
13:15:40 <yanyanhu> it's done I think
13:15:46 <yanyanhu> oh, sorry
13:15:52 <yanyanhu> you mean health detection
13:15:57 <Qiming> the health monitor support part is done
13:16:11 <yanyanhu> not sure about it. Just ensured the HM support in the lb policy works now
13:16:12 <Qiming> we need a poller to check it, right?
13:16:13 <yanyanhu> yep
13:16:25 <yanyanhu> yes, I think so
13:16:45 <yanyanhu> the poller needs to check the health status of pool members
13:17:04 <yanyanhu> to decide whether a member is active or not
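A rough sketch of the poller yanyanhu describes here: periodically list the members of the LB pool and flag the ones the health monitor has marked as unreachable. The `lb_client.list_members(pool_id)` call and the 'operating_status' field are assumptions modelled loosely on the LBaaS v2 API, not confirmed senlin code.

```python
import time


def poll_pool_members(lb_client, pool_id, interval=60, on_unhealthy=None):
    """Flag LB pool members whose operating status is not ONLINE."""
    while True:
        for member in lb_client.list_members(pool_id):
            # The LB health monitor marks members it cannot reach;
            # anything other than ONLINE is treated as a failed node,
            # so there is no need to probe individual nodes directly.
            if member.get('operating_status') != 'ONLINE':
                if on_unhealthy is not None:
                    on_unhealthy(member)
        time.sleep(interval)
```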
13:17:13 <Qiming> yesterday, I noticed there had been some talk about HA in this channel; it turned out to be an HA team meeting
13:17:47 <Qiming> yanyanhu, that is already good because we don't have to check individual nodes
13:18:13 <yanyanhu> yes
13:18:22 <Qiming> in that meeting, I gave people a quick update on senlin, what it is doing on HA
13:18:49 <Qiming> hopefully, we can work with more people on this to solve some real concerns from enterprise users
13:19:12 <Qiming> this will also be one of the subtopics in lixinhui's summit presentation
13:19:42 <Qiming> she is working very hard on that; yesterday, she was still working at 11pm ...
13:19:54 <yanyanhu> hard worker :)
13:20:01 <Qiming> hope we can have something to share soon
13:20:08 <yanyanhu> she is back from the party I think
13:20:19 <lixinhui_> yes...
13:20:31 <Qiming> the basic problem is about neutron, lbaas and octavia integration
13:20:32 <cschulz> Spoke with Marco on HA a while back. He said that a simple restart of the failed node was probably the place to start.
13:20:34 <lixinhui_> sorry for being late...
13:20:51 <Qiming> thanks, cschulz, for the input
13:21:10 <Qiming> we are wondering if recovery should really be automated
13:21:33 <lixinhui_> nice to catch this
13:21:40 <Qiming> there were some proponents of auto-recovery, no matter what the "recover" operation is
13:22:25 <Qiming> we can dig into that when we have some basic working code
13:22:29 <cschulz> I agree that recovery is a process that could be different for different customers.
13:23:17 <Qiming> customizability is always desired, however, we have to control the degree to which we want it to be customizable
13:23:36 <yanyanhu> it really depends on the use case I think. Especially whether the app/service deployed in the VM can recover automatically after restarting
13:24:14 <Qiming> I was thinking of the standby cluster support
13:24:23 <cschulz> Yes, maybe we should have a list of common recovery procedures initially
13:24:41 <Qiming> having a standby cluster will speed up 'auto-scaling' and 'recovery'
13:24:59 <Qiming> but it will waste some resources for sure, :)
13:25:31 <cschulz> Depending on the accounting process, it may waste a lot of $
13:25:38 <yanyanhu> some nodes with the role of 'backup'
13:25:44 <Qiming> for nova servers, we can do reboot, rebuild, evacuate, recreate
13:26:01 <Qiming> in some cases, you will need fencing
13:26:30 <Qiming> there is no fencing API in openstack, which means fencing has to be a per-device-model thing
13:26:34 <cschulz> Yes, and in some cases there will be a cluster manager to be informed.
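The recovery operations Qiming lists (reboot, rebuild, evacuate, recreate) could be modelled as an ordered chain tried from cheapest to most drastic, which is one way to give customers the per-deployment flexibility cschulz asks about. A hedged sketch follows; the escalation order and helper are assumptions, not the health policy's actual behaviour, though the nova calls mirror python-novaclient's servers API.

```python
def recover_server(nova, server_id,
                   operations=('REBOOT', 'REBUILD', 'RECREATE')):
    """Try recovery operations in order of increasing cost."""
    for op in operations:
        try:
            if op == 'REBOOT':
                nova.servers.reboot(server_id, reboot_type='HARD')
            elif op == 'REBUILD':
                # rebuild against the server's current image
                image_id = nova.servers.get(server_id).image['id']
                nova.servers.rebuild(server_id, image_id)
            elif op == 'RECREATE':
                # delete the node and let the cluster create a replacement
                nova.servers.delete(server_id)
            return op  # request accepted; the caller verifies health later
        except Exception:
            continue  # escalate to the next, more drastic operation
    return None  # every operation failed; fencing/manual action needed
```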
13:27:33 <Qiming> exactly, we need a division of the problem domain, and to work out solutions step by step
13:27:55 <Qiming> starting from the basics
13:28:05 <Qiming> how about an etherpad for this?
13:28:17 <lixinhui_> okay
13:28:26 <cschulz> OK
13:28:33 <lixinhui_> will get one up for this discussion
13:28:39 <Qiming> that way we can collect many inputs
13:28:46 <Qiming> thanks, lixinhui_
13:28:53 <Qiming> let's move on
13:28:54 <lixinhui_> :)
13:29:11 <Qiming> documentation will be my main focus in the coming weeks
13:29:26 <Qiming> documenting use cases and the design of policies
13:29:57 <Qiming> end-to-end autoscaling story
13:30:05 <Qiming> I think xinhui is working on one
13:30:11 <lixinhui_> I am prototyping this
13:30:19 <Qiming> https://www.openstack.org/summit/austin-2016/summit-schedule/events/7469
13:30:28 <lixinhui_> still some problems with neutron lbaas v2
13:30:29 <Qiming> congrats on your talk being accepted
13:30:47 <lixinhui_> thanks for the wisdom from all of you
13:31:17 <Qiming> lixinhui_, any specifics we can help with?
13:31:49 <lixinhui_> Thanks for the suggestions on the alarm side from you and yanyanhu
13:31:52 <yanyanhu> lixinhui_, can ceilometer generate samples for a lbaas v2 pool correctly?
13:32:21 <lixinhui_> according to my trial about half a month ago
13:32:24 <lixinhui_> it works
13:32:28 <lixinhui_> but recently
13:32:29 <yanyanhu> nice!
13:32:33 <lixinhui_> neutron cannot
13:32:49 <lixinhui_> create a lbaas v2 loadbalancer successfully
13:32:58 <lixinhui_> always "pending to create"
13:33:10 <yanyanhu> a new bug?
13:33:14 <lixinhui_> I am blocked by this problem
13:33:20 <lixinhui_> need to search more
13:33:35 <Qiming> ah, I see, let's spend some time together tomorrow on this
13:33:39 <yanyanhu> I recall I met a similar issue before, when using lbaas v1
13:33:45 <Qiming> seems to me like a VM creation problem
13:33:52 <yanyanhu> it was caused by an incorrect haproxy configuration
13:33:57 <lixinhui_> Thanks in advance
13:34:12 <Qiming> okay, will be online for this tomorrow
13:34:14 <yanyanhu> basically, the driver of lbaas didn't work correctly
13:34:25 <Qiming> next, profile for container support
13:34:26 <yanyanhu> not sure whether this is the problem you met
13:34:43 <Qiming> the profile part is easy ...
13:34:50 <Qiming> the difficult part is about scheduling
13:35:03 <Qiming> when we have new containers to create, we have to specify a 'host' for each
13:35:24 <Qiming> (let's pretend we don't have network/storage problems at the moment)
13:35:28 <haiwei> you will start containers on vms?
13:35:38 <Qiming> the 'host' could be a VM, it could be a physical machine
13:35:45 <haiwei> ok
13:36:10 <Qiming> let's start with containers inside VMs, which seems a common scenario today
13:36:27 <Qiming> the VMs are created by senlin or other services
13:36:36 <Qiming> we need to do some kind of scheduling
13:37:05 <Qiming> one option is to introduce a mesos-like framework, including mesos agents in the guest images
13:37:19 <haiwei> If the containers run on vms, there will be two kinds of clusters at the same time
13:37:26 <Qiming> then we would still need a scheduler, e.g. marathon
13:37:44 <Qiming> users don't need to care
13:37:56 <Qiming> we can manage the hosts for the containers
13:38:12 <Qiming> in some use cases I have heard of
13:38:45 <Qiming> people build a huge resource pool to run containers, but that huge resource (vm or physical) pool is transparent to users
13:39:20 <Qiming> one option, discussed with yanyanhu today, is to do some resource based 'host' selection
13:39:34 <Qiming> that will give us a very simple starting point to go forward
13:40:01 <haiwei> does the 'user' not include the cloud operator?
13:40:18 <Qiming> we can leverage ceilometer (or others) to monitor per-vm resource availability and find a candidate node to launch the container
13:40:41 <Qiming> the cloud operator will know the underlying cluster of VMs
13:41:00 <Qiming> they may even need to autoscale this VM cluster to accommodate more containers
13:41:59 <Qiming> we can start getting our hands dirty and see if this is just another 'placement' policy
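The resource-based 'host' selection Qiming and yanyanhu describe might start as simply as scoring candidate VMs by recent memory usage from ceilometer and picking the least loaded one. In this sketch the meter name, query shape, and python-ceilometerclient call are assumptions; a real placement policy would weigh CPU, memory, and container count together.

```python
def pick_host(ceilometer, candidate_vm_ids):
    """Return the candidate VM with the lowest recent memory usage."""
    best_vm, best_usage = None, float('inf')
    for vm_id in candidate_vm_ids:
        # fetch the most recent memory.usage sample for this VM
        samples = ceilometer.samples.list(
            meter_name='memory.usage',
            q=[{'field': 'resource_id', 'op': 'eq', 'value': vm_id}],
            limit=1)
        if not samples:
            continue  # no telemetry for this VM; skip it
        usage = samples[0].counter_volume
        if usage < best_usage:
            best_vm, best_usage = vm_id, usage
    return best_vm  # None means no candidate had usable samples
```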
13:42:40 <Qiming> btw, congrats on haiwei's talk proposal being accepted
13:43:10 <Qiming> we are gonna help make that presentation a successful one
13:43:14 <haiwei> currently I can't think of a way to auto-scale containers onto vms which are not controlled by Senlin
13:43:32 <haiwei> Thank you for the help with the session proposal
13:43:46 <Qiming> there are two levels of entities to manage
13:43:48 <Qiming> the VM pool
13:43:53 <Qiming> the container pool
13:43:54 <haiwei> I mean that the vms seem not to be under the control of Senlin
13:44:01 <yanyanhu> actually, for that scenario, the VM cluster has to be managed by Senlin I think
13:44:18 <Qiming> well, they may and may not be
13:44:30 <Qiming> you just need to know their IPs
13:44:52 <Qiming> and possibly some secrets to log into them
13:45:25 <Qiming> having senlin manage the two clusters makes total sense to me
13:45:37 <Qiming> was just trying to be open minded on this
13:45:47 <haiwei> ok
13:45:55 <Qiming> haiwei, can you help revise the specs
13:46:05 <Qiming> so we can start drilling down into the details?
13:46:09 <haiwei> ok, I will do it
13:46:16 <yanyanhu> yes, a resource (host) finding process becomes necessary if the VM cluster is not created by Senlin
13:46:24 <Qiming> or if a spec is not the right tool, we can use etherpads as well
13:46:39 <haiwei> I can't spend much time on that recently, but I will try my best
13:47:27 <Qiming> we need careful preparation for all the presentations
13:47:49 <haiwei> yes, indeed
13:48:02 <Qiming> okay
13:48:14 <Qiming> last item, rework of NODE_CREATE/DELETE
13:48:22 <Qiming> it has been there for a long time
13:48:39 <Qiming> let's keep them there as is, :P
13:49:01 <Qiming> driver work
13:49:15 <yanyanhu> it has been done
13:49:24 <Qiming> it was much smoother than we had thought
13:49:27 <yanyanhu> just needed a little change in the neutron driver
13:49:33 <Qiming> that was great
13:49:34 <yanyanhu> yep :P
13:49:39 <Qiming> btw
13:49:45 <lixinhui_> cool!
13:49:53 <Qiming> we just released the mitaka-3 milestone for senlin and senlinclient last week
13:50:13 <Qiming> that is a milestone for everyone
13:50:25 <Qiming> thank you all for your contributions during the past months
13:50:28 <Qiming> we made it
13:50:45 <Qiming> #topic open discussion
13:50:55 <lixinhui_> Qiming, I remember there is still a patch on senlinclient about check/recover
13:51:13 <lixinhui_> once you -2'ed it to avoid it being merged by accident
13:51:26 <lixinhui_> not sure whether we will merge it or not
13:51:58 <Qiming> oh, yes
13:51:59 <lixinhui_> at that time, the patch was blocked by the sdk version
13:52:31 <Qiming> that one can be unblocked now
13:52:43 <lixinhui_> ok
13:52:53 <Qiming> should have merged it into m-3
13:53:03 <haiwei> what about the SDK's issue
13:53:27 <Qiming> the sdk version has been bumped to 0.8.1
13:54:23 <yanyanhu> the global requirement has been updated
13:54:26 <Qiming> https://review.openstack.org/#/c/285599/
13:55:22 <Qiming> anything else?
13:55:33 <yanyanhu> nope from me
13:55:36 <lixinhui_> no
13:55:39 <elynn> no
13:55:43 <cschulz> no
13:55:52 <Qiming> thanks everyone for joining
13:55:52 <haiwei> no
13:56:01 <Qiming> good night/day
13:56:06 <Qiming> #endmeeting