13:01:47 #startmeeting senlin
13:01:48 Meeting started Tue Mar 8 13:01:47 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:49 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:51 The meeting name has been set to 'senlin'
13:01:55 hello
13:02:00 hi
13:02:02 Hello.
13:02:06 hello
13:02:09 o/
13:02:13 got some problems updating the meeting agenda page
13:02:15 sorry
13:02:47 also got a note from xinhui, she won't be able to join us today because of some company activities ... parties?
13:02:56 anyway, let's get started
13:03:03 also can't access it even with a proxy
13:03:03 #topic mitaka work items
13:03:08 :)
13:03:21 #link https://etherpad.openstack.org/p/senlin-mitaka-workitems
13:03:22 me too
13:03:46 great, the site is down, :), not my fault
13:04:12 I was trying to reach xujun about scalability testing, but haven't got a response yet
13:04:35 need an update from bran on stress testing
13:04:54 yes, he is not here I think
13:04:58 I myself spent some time studying tempest
13:05:25 seems tempest is still the way to do API tests, but the code is now suggested to live closer to the individual projects
13:05:56 so even if we start work on that, we are not supposed to commit to tempest directly; will need to confirm this
13:06:09 ok
13:06:10 only some core projects' functional tests remain in tempest
13:06:31 if the code is to be committed to senlin, then we are still expected to use tempest-lib
13:06:49 tempest-lib was a separate project, but recently it has been merged back into tempest
13:06:58 ... a lot of things happen every day
13:07:28 haiwei, it would be good if someone from NEC could help explain to us the right direction to go
13:07:35 you want to add API tests?
13:07:41 I know you have some guys active on that
13:07:46 yes
13:08:01 we are also teaching ourselves the right way to do stress tests
13:08:20 api tests, stress tests and scenario tests were all in the scope of tempest before
13:08:27 like I said at the mid-cycle, there is something called a tempest external plugin
13:08:27 but I don't know the current situation
13:08:52 okay, then that is the direction to go for api tests
13:09:00 any idea about stress tests?
13:09:04 the tempest external plugin is something for scenario tests
13:09:09 are we supposed to reinvent the wheel?
13:09:24 scenario tests are different from api tests
13:09:33 as for stress tests, I haven't heard of that before
13:09:43 I am not sure if tempest supports it
13:09:54 http://git.openstack.org/cgit/openstack/tempest/tree/tempest/README.rst
13:10:02 line 24 is about api tests
13:10:08 line 39 is about scenario tests
13:10:13 line 50 is about stress tests
13:10:22 oh, saw it
13:10:41 ok, I will ask the tempest guys tomorrow
13:10:51 not sure that is still the consensus
13:11:10 or ask him to join the Senlin IRC channel
13:11:13 if we don't get an answer from them, we will have to ask on the -dev list
13:11:23 yes
13:11:23 okay, either way works
13:11:49 yanyanhu is still on functional tests?
13:12:04 saw some patches about profile-level support for updates
13:12:06 yes, but basic support is almost done
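[A minimal sketch, for reference, of the tempest external plugin interface discussed above, following tempest's documented plugin mechanism. The package name senlin_tempest_plugin, module path, and class name are assumptions for illustration, not actual Senlin code.]

    # Hypothetical module: senlin_tempest_plugin/plugin.py
    import os

    from tempest.test_discover import plugins


    class SenlinTempestPlugin(plugins.TempestPlugin):
        """Entry point that lets tempest discover Senlin's API tests."""

        def load_tests(self):
            # Point tempest at the directory holding our test modules.
            base_path = os.path.split(os.path.dirname(
                os.path.abspath(__file__)))[0]
            test_dir = "senlin_tempest_plugin/tests"
            full_test_dir = os.path.join(base_path, test_dir)
            return full_test_dir, base_path

        def register_opts(self, conf):
            # No extra config options in this sketch.
            pass

        def get_opt_lists(self):
            return []

    # setup.cfg would then advertise the plugin so tempest can find it:
    # [entry_points]
    # tempest.test_plugins =
    #     senlin_tests = senlin_tempest_plugin.plugin:SenlinTempestPlugin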
13:12:14 great
13:12:33 api surface tests are important
13:12:40 yeah, that's what I hoped to do, but I met some issues
13:13:00 those are supposed to test how the service fails, in addition to how it succeeds
13:13:01 that I don't know how to address
13:13:18 okay, maybe we need some experts in this field
13:13:28 yes, we need to cover failure cases as well
13:13:39 testing is supposed to be our focus in the coming weeks
13:14:00 next, health management
13:14:18 we have a basic version of the health manager and a basic health policy
13:14:42 need some tests on them to make sure they work, so that at least we can remove line 16
13:15:31 lb-based health detection, yanyanhu do you know the progress?
13:15:40 it's done I think
13:15:46 oh, sorry
13:15:52 you mean health detection
13:15:57 the health monitor support part is done
13:16:11 not sure about it. Just ensured that the HM support in the lb policy works now
13:16:12 we need a poller to check it, right?
13:16:13 yep
13:16:25 yes, I think so
13:16:45 the poller needs to check the health status of pool members
13:17:04 to decide whether they are active or not
13:17:13 yesterday, I noticed there were some talks about HA in this channel; it turned out to be a HA team meeting
13:17:47 yanyanhu, that is already good because we don't have to check individual nodes
13:18:13 yes
13:18:22 in that meeting, I gave people a quick update on senlin and what it is doing on HA
13:18:49 hopefully, we can work with more people on this to solve some real concerns from enterprise users
13:19:12 this will also be one of the subtopics in lixinhui's summit presentation
13:19:42 she is working very hard on that; yesterday, she was still working at 11pm ...
13:19:54 hard worker :)
13:20:01 hope we can have something to share soon
13:20:08 she is back from the party I think
13:20:19 yes...
13:20:31 the basic problem is about neutron, lbaas and octavia integration
13:20:32 Spoke with Marco on HA a while back. He said that a simple restart of the failed node was probably the place to start.
13:20:34 sorry for being late...
13:20:51 thanks, cschulz, for the input
13:21:10 we are wondering if recovery should really be automated
13:21:33 nice to catch this
13:21:40 there were some proponents of auto-recovery, no matter what the "recover" operation is
13:22:25 we can dig into that when we have some basic working code
13:22:29 I agree that recovery is a process that could be different for different customers.
13:23:17 customizability is always desired; however, we have to control the degree to which we want it to be customizable
13:23:36 it really depends on the use case I think. Especially whether the app/service deployed in the VM can restore automatically after restarting
13:24:14 I was thinking of the standby cluster support
13:24:23 Yes, maybe we should have a list of common recovery procedures initially
13:24:41 having a standby cluster will speed up 'auto-scaling' and 'recovery'
13:24:59 but it will waste some resources for sure, :)
13:25:31 Depending on the accounting process, it may waste a lot of $
13:25:38 some nodes with the role of 'backup'
13:25:44 for nova servers, we can do reboot, rebuild, evacuate, recreate
13:26:01 in some cases, you will need fencing
13:26:30 there is no fencing API in openstack, which means fencing has to be a per-device-model thing
13:26:34 Yes, and in some cases there will be a cluster manager to be informed.
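[A rough sketch, not Senlin's actual health policy, of the recovery chain idea above: try operations on a nova server in order, from least to most disruptive. The helper name, the escalation order, and the per-cluster configurability are assumptions drawn from the discussion; 'evacuate' is left out because it needs admin credentials, and fencing would have to happen before 'recreate' where it matters.]

    from novaclient import client as nova_client

    # nova = nova_client.Client('2', session=keystone_session)


    def recover_server(nova, server_id, image_id,
                       operations=('reboot', 'rebuild', 'recreate')):
        """Try each recovery operation in turn until one succeeds."""
        server = nova.servers.get(server_id)
        for op in operations:
            try:
                if op == 'reboot':
                    server.reboot(reboot_type='HARD')
                elif op == 'rebuild':
                    server.rebuild(image_id)
                elif op == 'recreate':
                    # Most disruptive: delete the node, build a replacement.
                    nova.servers.delete(server_id)
                    server = nova.servers.create(name=server.name,
                                                 image=image_id,
                                                 flavor=server.flavor['id'])
                # A real implementation would poll until the server is
                # ACTIVE again before declaring the operation a success.
                return server
            except Exception:
                continue  # escalate to the next operation in the chain
        raise RuntimeError('all recovery operations failed for %s' % server_id)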
13:27:33 exactly, we need a division of the problem domain, and to work out solutions step by step
13:27:55 starting from the basics
13:28:05 how about an etherpad for this?
13:28:17 okay
13:28:26 OK
13:28:33 will get one up for this discussion
13:28:39 that way we can collect many inputs
13:28:46 thanks, lixinhui_
13:28:53 let's move on
13:28:54 :)
13:29:11 documentation will be my main focus in the coming weeks
13:29:26 documenting use cases and the design of policies
13:29:57 an end-to-end autoscaling story
13:30:05 I think xinhui is working on one
13:30:11 I am prototyping this
13:30:19 https://www.openstack.org/summit/austin-2016/summit-schedule/events/7469
13:30:28 still some problems with neutron lbaas v2
13:30:29 congrats on your talk being accepted
13:30:47 thanks for the wisdom from all of you
13:31:17 lixinhui_, any specifics we can help with?
13:31:49 Thanks for the suggestions on the alarm side from you and yanyanhu
13:31:52 lixinhui_, can ceilometer generate samples of a lbaas v2 pool correctly?
13:32:21 according to my trial about half a month ago
13:32:24 it works
13:32:28 but recently
13:32:29 nice!
13:32:33 neutron cannot
13:32:49 create a lbaas v2 loadbalancer successfully
13:32:58 always "pending to create"
13:33:10 a new bug?
13:33:14 I am blocked by this problem
13:33:20 need to search more
13:33:35 ah, I see, let's spend some time together tomorrow on this
13:33:39 I recall I met a similar issue before, when using lbaas v1
13:33:45 seems to me like a VM creation problem
13:33:52 it was caused by an incorrect haproxy configuration
13:33:57 Thanks in advance
13:34:12 okay, will be online for this tomorrow
13:34:14 basically, the driver of lbaas didn't work correctly
13:34:25 next, profile for container support
13:34:26 not sure whether this is the problem you met
13:34:43 the profile part is easy ...
13:34:50 the difficult part is scheduling
13:35:03 when we have new containers to create, we have to specify a 'host' for them
13:35:24 (let's pretend we don't have network/storage problems at the moment)
13:35:28 you will start containers on vms?
13:35:38 the 'host' could be a VM, or it could be a physical machine
13:35:45 ok
13:36:10 let's start with containers inside VMs, which seems a common scenario today
13:36:27 the VMs are created by senlin or other services
13:36:36 we need to do some kind of scheduling
13:37:05 one option is to introduce a mesos-like framework, including mesos agents in the guest images
13:37:19 If the containers run on vms, there will be two kinds of clusters at the same time
13:37:26 then we will still need a scheduler, e.g. marathon
13:37:44 users don't need to care
13:37:56 we can manage the hosts for the containers
13:38:12 in some use cases I have heard of
13:38:45 people build a huge resource pool to run containers, but that huge resource (vm or physical) pool is transparent to users
13:39:20 one option, discussed with yanyanhu today, is to do some resource-based 'host' selection
13:39:34 that will give us a very simple starting point to go forward
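[A minimal sketch of the resource-based 'host' selection idea above: pick the cluster node (VM) with enough free memory to hold the new container. The dict keys and the memory-only criterion are assumptions for illustration; in practice the utilization numbers could come from ceilometer samples.]

    def select_host(candidates, mem_needed_mb):
        """Return the candidate with the most free memory, or None.

        candidates: list of dicts like
            {'node_id': ..., 'mem_total_mb': ..., 'mem_used_mb': ...}
        """
        def free_mem(node):
            return node['mem_total_mb'] - node['mem_used_mb']

        fitting = [n for n in candidates if free_mem(n) >= mem_needed_mb]
        if not fitting:
            # No capacity left: a signal that the underlying VM cluster
            # may need to scale out to accommodate more containers.
            return None
        return max(fitting, key=free_mem)

    # e.g. select_host([{'node_id': 'vm-1', 'mem_total_mb': 4096,
    #                    'mem_used_mb': 3500},
    #                   {'node_id': 'vm-2', 'mem_total_mb': 4096,
    #                    'mem_used_mb': 1024}], 512)  -> picks 'vm-2'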
13:40:01 the 'user' does not include the cloud operator?
13:40:18 we can leverage ceilometer (or others) to monitor per-vm resource availability and find a candidate node to launch the container
13:40:41 the cloud operator will know the underlying cluster of VMs
13:41:00 they may even need to autoscale this VM cluster to accommodate more containers
13:41:59 we can start getting our hands dirty and see if this is just another 'placement' policy
13:42:40 btw, congrats on haiwei's talk proposal being accepted
13:43:10 we are gonna help make that presentation a successful one
13:43:14 currently I can't think of a way to auto-scale containers onto VMs which are not controlled by Senlin
13:43:32 Thank you for the help with the session proposal
13:43:46 there are two levels of entities to manage
13:43:48 the VM pool
13:43:53 the container pool
13:43:54 I mean that the vms seem not to be under the control of Senlin
13:44:01 actually, for that scenario, the VM cluster has to be managed by Senlin I think
13:44:18 well, they may or may not be
13:44:30 you just need to know their IPs
13:44:52 and possibly some secrets to log into them
13:45:25 having senlin manage the two clusters makes total sense to me
13:45:37 was just trying to be open-minded on this
13:45:47 ok
13:45:55 haiwei, can you help revise the spec?
13:46:05 then we can start drilling down into the details
13:46:09 ok, I will do it
13:46:16 yes, a resource (host) finding process becomes necessary if the VM cluster is not created by Senlin
13:46:24 or if a spec is not the right tool, we can use etherpads as well
13:46:39 I can't spend much time on that recently, but I will try my best to do it
13:47:27 we need careful preparation for all the presentations
13:47:49 yes, indeed
13:48:02 okay
13:48:14 last item, rework NODE_CREATE/DELETE
13:48:22 it has been there for a long time
13:48:39 let's keep them there as is, :P
13:49:01 driver work
13:49:15 has been done
13:49:24 it was much smoother than we had thought
13:49:27 just needed a little change in the neutron driver
13:49:33 that was great
13:49:34 yep :P
13:49:39 btw
13:49:45 cool!
13:49:53 we just released the mitaka-3 milestone for senlin and senlinclient last week
13:50:13 that is a milestone for everyone
13:50:25 thank you all for your contributions during the past months
13:50:28 we made it
13:50:45 #topic open discussion
13:50:55 Qiming, I remember there is still a patch on senlinclient about check/recover
13:51:13 you once gave it a -2 to avoid it being merged by accident
13:51:26 not sure whether we will merge it or not
13:51:58 oh, yes
13:51:59 at that time, the patch was blocked by the sdk version
13:52:31 that one can be unblocked now
13:52:43 ok
13:52:53 we should have merged it into m-3
13:53:03 what about the SDK's issue?
13:53:27 the sdk version has been bumped to 0.8.1
13:54:23 the global requirement has been updated
13:54:26 https://review.openstack.org/#/c/285599/
13:55:22 anything else?
13:55:33 nope from me
13:55:36 no
13:55:39 no
13:55:43 no
13:55:52 thanks everyone for joining
13:55:52 no
13:56:01 good night/day
13:56:06 #endmeeting