13:00:45 <Qiming> #startmeeting senlin
13:00:46 <openstack> Meeting started Tue May 10 13:00:45 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:49 <openstack> The meeting name has been set to 'senlin'
13:01:35 <Qiming> hello
13:01:39 <zzxwill> Good evening.
13:01:42 <Qiming> welcome back
13:01:49 <Qiming> o/, zzxwill
13:01:53 <haiwei> hi
13:02:08 <zzxwill> Thanks. Crazy with my work recently:(
13:02:10 <yanyanhu> hi
13:02:13 <Qiming> hi, haiwei, in taipei?
13:02:19 <elynn> Evening!
13:02:25 <haiwei> back now
13:02:44 <Qiming> actually, I only manage to restore my work env 1 hour ago
13:02:57 <Qiming> lost my laptop at austin airport
13:03:05 <Qiming> anyway
13:03:06 <lixinhui_> hi
13:03:07 <haiwei> :(
13:03:10 <zzxwill> Oh, it was a pity.
13:03:18 <elynn> Got a new one?
13:03:23 <Qiming> the only item I have in mind is about newton work items
13:03:53 <Qiming> feel free to add topics to meeting agenda: https://wiki.openstack.org/wiki/Meetings/SenlinAgenda
13:04:28 <yanyanhu> Just add a topic about adding new workitems based on our discussion in summit
13:04:55 <Qiming> yes, that is also about newton work items
13:05:23 <yanyanhu> ok
13:06:06 <cschulz_> Hi
13:06:08 <Qiming_> let's quickly go thru the current list
13:06:10 <yanyanhu> hi, cschulz_
13:06:22 <yanyanhu> ok
13:06:22 <Qiming_> #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:06:43 <Qiming_> scalability improvement, need to sync with xujun/junwei
13:06:49 <yanyanhu> some of them have been done and some should be obsoleted
13:07:06 <Qiming_> tempest test
13:07:12 <Qiming_> we are on track
13:07:16 <yanyanhu> me and ethan are working on it
13:07:18 <yanyanhu> yep
13:07:18 <elynn> yes
13:07:20 <Qiming_> basic support is there
13:07:34 <elynn> We are working on API tests
13:07:50 <elynn> still need policy type list and profile type list and negative tests
13:07:59 <elynn> Do we need to add a gate job for it?
13:08:07 <elynn> in experimental
13:08:16 <Qiming_> sure, that would be nice
13:08:40 <elynn> ok, I will work on it then.
13:08:57 <yanyanhu> may also need to rework the client to enable negative test. Or we can use exception not resp status to verify the result
13:08:59 <Qiming_> recorded
13:09:33 <elynn> About the negative tests for API
13:09:36 <yanyanhu> I mean the clusteringclient of tempest test
13:09:42 <Qiming_> need to test the status code at least, imo
13:09:46 <elynn> do we only check the respond code?
13:09:52 <yanyanhu> Qiming_, yes
13:10:21 <yanyanhu> agree with this. So we may need to invoke raw_request directly
13:10:33 <Qiming_> yes
13:10:59 <elynn> I think only check status code and respond body is enough for API negative tests.
13:11:06 <yanyanhu> elynn, yes
13:11:06 <Qiming_> if we are bringing in senlinclient into this, it looks then more like a functional test of senlinclient, instead of an API test of the server
13:11:30 <Qiming_> so.. benchmarking
13:11:31 <elynn> Existing client can not return a bad status code?
13:11:42 <yanyanhu> basic support has been done
13:11:50 <yanyanhu> in rally side
13:12:03 <Qiming_> lixinhui_, any update?
13:12:18 <yanyanhu> will work on some simplest test case based on it
13:12:24 <yanyanhu> but maybe not now
13:12:49 <lixinhui_> Qiming
13:13:07 <lixinhui_> is it about benchmarking?
13:13:21 <yanyanhu> elynn, nope, the failure will be caught by rest client of tempest lib and converted to exception
13:13:22 <Qiming_> I'm wondering if bran and xinhui has done some experiments on engine/api stress test
13:13:31 <lixinhui_> we are
13:13:39 <elynn> yanyanhu, okay, I got your point...
13:13:44 <yanyanhu> :)
13:13:46 <lixinhui_> but
13:14:03 <lixinhui_> we are bottlenecked by nova
13:14:13 <Qiming_> still need to overcome the scalability issue of oslo.messaging?
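[The negative-test approach discussed above — bypass the tempest client (which converts failures into exceptions) and assert on the raw status code and body directly — can be sketched as follows. This is an illustrative stand-in only: `fake_raw_request` takes the place of tempest's raw request call, and the URL and error shape are invented, not Senlin's actual API.]

```python
# Negative API test sketch: issue the raw request and assert on the
# status code/body, instead of letting a client library turn the
# failure into an exception first.

def fake_raw_request(method, url):
    """Stand-in for a raw HTTP call; returns (status, body)."""
    # Pretend the server rejects a GET on a non-existent cluster.
    if url.endswith('/clusters/no-such-cluster'):
        return 404, {'error': {'message': 'cluster not found'}}
    return 200, {}

def assert_negative(method, url, expected_status):
    """Verify an API call fails with the expected status code."""
    status, body = fake_raw_request(method, url)
    if status != expected_status:
        raise AssertionError('expected %d, got %d' % (expected_status, status))
    return body

body = assert_negative('GET', '/v1/clusters/no-such-cluster', 404)
```

[As Qiming_ notes above, going through senlinclient instead would test the client, not the server's API contract.]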
13:14:28 <lixinhui_> not really about oslo
13:14:32 <lixinhui_> but nova
13:14:36 <Qiming_> nova api rate-limit?
13:14:41 <yanyanhu> oh, about this topics, I think there should be some performance improvement benefit from lastet scheduler rework
13:15:01 <lixinhui_> something like that
13:15:03 <yanyanhu> I mean the performance of senlin engine
13:15:14 <lixinhui_> we may try to resolve it at driver layer
13:15:27 <lixinhui_> from product env
13:15:38 <lixinhui_> we have rally and heat based
13:15:44 <Qiming_> okay, we need some rough numbers using both the fake driver and the real one
13:15:45 <lixinhui_> stress tests
13:15:58 <yanyanhu> Qiming_, agree
13:16:29 <lixinhui_> but that will depends on if we need bring in senlin into this test env
13:17:04 <Qiming_> yes, it would be nice to know senlin has scalability issue or not
13:17:13 <Qiming_> the earlier the better
13:17:24 <lixinhui_> Bran has tried with simulated one
13:17:54 <Qiming_> maybe we can paste the numbers on senlin wiki?
13:17:58 <lixinhui_> and found that no up limit on the
13:18:04 <lixinhui_> one engine
13:18:07 <lixinhui_> test
13:18:19 <lixinhui_> but parallel tests will need more time
13:18:26 <yanyanhu> maybe I should implement a basic rally plugin for senlin cluster and node operation to support this test
13:18:32 <Qiming_> okay
13:18:40 <yanyanhu> lixinhui_, if you guys need it, please just tell me
13:18:50 <lixinhui_> not really now
13:18:53 <Qiming_> I see, so there is a dependency
13:18:54 <lixinhui_> thanks yanyanhu
13:19:00 <yanyanhu> no problem
13:19:20 <lixinhui_> we will keep working on multiple engine simulted driver test
13:19:25 <Qiming_> or these two threads can go in parallel
13:19:37 <Qiming_> cool
13:20:09 <Qiming_> please check if we can record these "baseline" numbers into senlin wiki: https://wiki.openstack.org/wiki/Senlin
13:20:16 <lixinhui_> sure
13:20:23 <Qiming_> Rally side
13:20:49 <yanyanhu> basic support for senlin in rally has been done. Will start to work on plugin
13:21:00 <Qiming_> we are still about to commit rally test cases to rally project?
13:21:06 <yanyanhu> will start from basic cluster operations
13:21:39 <Qiming_> by plug-in, you mean we will be hosting the rally test cases?
13:21:56 <yanyanhu> Qiming_, we can if we want to I think
13:22:10 <yanyanhu> to hold the test jobs
13:22:11 <Qiming_> what's the suggestion from rally team?
13:22:23 <yanyanhu> they sugguest us to contribute the plugin to rally repo
13:22:28 <yanyanhu> which I think makes sense
13:22:47 <yanyanhu> for those jobs, we can hold it in senlin repo I think
13:23:28 <Qiming_> ... jobs are not modelled as plugins?
13:23:48 <yanyanhu> no, jobs means those job description file :)
13:23:49 <Qiming_> what is this then? https://review.openstack.org/#/c/301522/
13:23:54 <yanyanhu> those yaml or json file
13:24:23 <yanyanhu> Qiming_, those jobs are used as example to verify the plugin :)
13:24:34 <Qiming_> okay, makes sense
13:24:35 <yanyanhu> more jobs should be defined per our test requirement
13:24:44 <yanyanhu> which I guess should be hold by ourselves
13:24:57 <Qiming_> that is fine
13:25:59 <Qiming_> pls help make that plugin work so others may help contribute job definitions etc.
13:26:10 <yanyanhu> sure
13:26:16 <yanyanhu> will work on it
13:26:26 <Qiming_> health management
13:26:42 <Qiming_> em, a huge topic indeed
13:26:44 <lixinhui_> is trying the linux HA
13:26:56 <Qiming_> for health detection?
13:27:06 <Qiming_> or recovery, or both?
13:27:09 <lixinhui_> wanna Qiming_ to share more picture in your mind
13:27:27 <Qiming_> you mean photo from San Antonio?
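[The "job description files" yanyanhu mentions above are plain YAML (or JSON) task files that rally runs against a deployed cloud, while the scenario plugin itself lives in the rally repo. A minimal sketch of what such a task file might look like — the scenario name `SenlinClusters.create_and_delete_cluster` and its arguments are illustrative only, not what was actually merged in the review linked above:]

```yaml
---
  SenlinClusters.create_and_delete_cluster:
    -
      args:
        desired_capacity: 3
      runner:
        type: "constant"
        times: 10
        concurrency: 2
      context:
        users:
          tenants: 2
          users_per_tenant: 1
```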
13:27:29 <lixinhui_> based on dicussion with adam and DD
13:27:36 <lixinhui_> fencing
13:27:39 <lixinhui_> nowdays
13:27:46 <lixinhui_> with CentOS
13:27:54 <lixinhui_> VM
13:28:05 <Qiming_> got it
13:28:05 <lixinhui_> but you know
13:28:20 <lixinhui_> just wanna to know more picture
13:28:31 <Qiming_> need to spend sometime on the specs and the etherpad
13:28:32 <lixinhui_> about the HA story
13:28:49 <lixinhui_> yes
13:28:58 <Qiming_> we cannot cover all HA requirement in our very first step
13:29:12 <Qiming_> we may not be able to cover them all in future
13:29:14 <lixinhui_> from presentation of Adam and DD
13:29:22 <Qiming_> need to focus on some typical usage scenarios
13:29:38 <lixinhui_> They hope to leverage Senlin on Recover and
13:29:42 <lixinhui_> fecing
13:29:47 <lixinhui_> fencing
13:29:59 <Qiming_> right
13:30:15 <lixinhui_> but is that assumed design by ourselves?
13:30:27 <Qiming_> so ... let's focus on the user story then
13:30:29 <cschulz_> What is your thought, that HM will create events that may trigger cluster actions based on cluster policies?
13:30:55 <lixinhui_> yes
13:31:02 <Qiming_> we will build the story step by step
13:31:07 <lixinhui_> that is the recover part
13:31:23 <Qiming_> first step is check/recover mechanism, the very basic ones
13:31:39 <Qiming_> and fencing may become part of the recover process
13:32:01 <cschulz_> So there probably also needs to be policy like things in HM that defines how the health of a cluster is assessed?
13:32:12 <Qiming_> second step is to try introduce some intelligence on failure detection
13:32:38 <yanyanhu> health check and failure recovery can be two workitems in parallel I guess?
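[Qiming_'s "first step" above — a basic check/recover mechanism with fencing folded into the recover process — can be sketched roughly as below. All three operations (`check`, `fence`, `recover`) are stand-ins invented for illustration, not Senlin's real driver calls; fencing runs before recovery so a half-dead node cannot interfere mid-recovery.]

```python
# Minimal check -> recover loop, with fencing as an optional part of
# the recover step (mirroring cluster-check / cluster-recover
# [--with-fence] discussed above).

def check(node):
    """Stand-in health check: report whether the node looks alive."""
    return node.get('status') == 'ACTIVE'

def fence(node):
    """Stand-in fencing: make sure the failed node is really down."""
    node['fenced'] = True

def recover(node, with_fence=False):
    """Recover one node, optionally fencing it first."""
    if with_fence:
        fence(node)
    node['status'] = 'ACTIVE'

def check_and_recover(cluster, with_fence=False):
    """One pass of the loop: recover every node that fails its check."""
    recovered = []
    for node in cluster:
        if not check(node):
            recover(node, with_fence=with_fence)
            recovered.append(node['name'])
    return recovered

cluster = [{'name': 'node-1', 'status': 'ACTIVE'},
           {'name': 'node-2', 'status': 'ERROR'}]
```

[Because checking and recovery only meet at the `check` call, they can indeed be developed as two parallel work items, as yanyanhu suggests.]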
13:32:48 <Qiming_> third step is to link the pieces together using some sample health policies
13:33:01 <lixinhui_> actually I do not think we should do many check things
13:33:03 <Qiming_> yes, guess so
13:33:21 <cschulz_> agreed, health checking is independent of what actions you take when you've made an assessment
13:33:42 <Qiming_> if users don't like the health policy, we still provide some basic APIs for them to do cluster-check, cluster-recover [--with-fence], etc.
13:34:12 <cschulz_> Actually that is where I'd start
13:34:30 <Qiming_> user may don't like the way we do health checking, still, they can do cluster-recover by triggering that operation from their software/service
13:34:31 <cschulz_> Then add some basic mechanisms for those who just want simple
13:34:40 <Qiming_> right
13:35:14 <Qiming_> I cannot assume we understand all usage scenarios
13:36:06 <Qiming_> :) I was challenged by linux-ha author during my presentation --- how do you detect application failure?
13:36:27 <cschulz_> And your answer was?
13:36:38 <Qiming_> it is a huge space, we cannot assume we know all the answers
13:36:52 <Qiming_> application failure detection is currenly out of senlin's scope
13:37:03 <cschulz_> Agreed!
13:37:03 <lixinhui_> yes
13:37:05 <haiwei> I think so
13:37:08 <lixinhui_> that is his anwser
13:37:15 <lixinhui_> from this
13:37:23 <Qiming_> there are plenty of software doing application monitoring, use them
13:37:27 <lixinhui_> I do not think we can understand the use case today
13:37:42 <Qiming_> but we can start from the basics
13:37:57 <lixinhui_> or the design on the loop of check and recover
13:38:15 <yanyanhu> so the key is how to leverge those monitoring tools/services
13:38:32 <lixinhui_> but trying to provide some basic investment
13:38:35 <yanyanhu> to detect failure of node/app happened in senlin cluster
13:38:38 <lixinhui_> on the choice of failure proceing
13:38:40 <Qiming_> we leave choices to users, though we do provide some basic support to simple cases
13:38:45 <lixinhui_> processing
13:39:00 <lixinhui_> even today
13:39:03 <lixinhui_> masakari
13:39:08 <lixinhui_> 's evacuate
13:39:15 <Qiming_> recover a heat stack is completely different from recovering a nova server
13:39:22 <lixinhui_> can not work well with all guest OS and hypervisor
13:39:33 <Qiming_> you are already onto masakari?
13:39:53 <lixinhui_> tries some that function of masakari
13:40:07 <lixinhui_> need to investigate more
13:40:17 <Qiming_> ... big thanks!
13:40:27 <lixinhui_> :)
13:40:31 <cschulz_> masakari is new to me. Will investigate
13:41:04 <lixinhui_> it has a vagrant and chef deployer, cschulz
13:41:05 <Qiming_> for HA support, let's focus on planning
13:41:20 <lixinhui_> yes
13:41:24 <yanyanhu> https://github.com/ntt-sic/masakari
13:41:30 <yanyanhu> this one?
13:41:31 <Qiming_> build stories on the etherpad: https://etherpad.openstack.org/p/senlin-ha-recover
13:41:45 <Qiming_> yanyanhu, yes
13:41:50 <Qiming_> moving on
13:42:17 <Qiming_> documentation side
13:42:31 <Qiming_> I'm working on API documentation in RST
13:43:00 <Qiming_> hopefully, it can be done soon, then I can switch to tutorial/wiki docs
13:43:06 <yanyanhu> will provide some help on it
13:43:14 <Qiming_> great, yanyanhu
13:43:24 <Qiming_> container support
13:43:45 <Qiming_> haiwei, maybe we can check in the container profile as an experimental one
13:44:22 <haiwei> you mean just create one first?
13:44:33 <Qiming_> yes, very simple one is okay
13:44:46 <Qiming_> it has to work, it has to be clean
13:44:59 <Qiming_> we can improve it gradually
13:45:32 <haiwei> ok, I will submit some patches for it
13:45:56 <Qiming_> then we can start looking into the specific issues when CLUSTERING containers together
13:46:24 <Qiming_> at the same time, we will watch the progress of the Higgins project: https://review.openstack.org/#/c/313935/
13:46:47 <haiwei> yes, I noticed it recently
13:47:00 <Qiming_> if that one grows fast, we can spend less and less energy at this layer
13:47:14 <Qiming_> just focusing on the clustering aspect of the problem
13:47:25 <yanyanhu> agree :)
13:47:41 <Qiming_> that's why I think a simple profile suffices
13:47:50 <haiwei> ok
13:47:57 <Qiming_> for us to think about the next layer
13:48:30 <Qiming_> "tickless" scheduler is out
13:48:33 <Qiming_> that is great!!!
13:49:09 <yanyanhu> :)
13:49:22 <yanyanhu> it do improve the efficiency of our scheduler
13:49:27 <Qiming_> any news from zaqar investigation?
13:49:38 <yanyanhu> very appreciated your suggestion in summit :P
13:49:54 <lixinhui_> about event and notice mechanism
13:50:14 <cschulz_> I've been very distracted since week of Austin summit, so not much progress.
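[The "very simple" experimental container profile Qiming_ and haiwei agree on above might look roughly like the sketch below. Everything here is a self-contained stand-in: the stub `Profile` base class and the fabricated container ids are invented for illustration — Senlin's real profile base and schema machinery live in `senlin.profiles.base` and differ in detail.]

```python
# Rough shape of a minimal experimental container profile: one cluster
# node maps to one container; do_create/do_delete are the only
# operations implemented at first.

class Profile(object):
    """Stub standing in for Senlin's profile base class."""
    def __init__(self, name, spec):
        self.name = name
        self.properties = spec.get('properties', {})

class DockerProfile(Profile):
    """Experimental profile: one node maps to one container."""

    def do_create(self, obj):
        # A real implementation would call a docker client here; this
        # sketch just fabricates a container record from the node name.
        image = self.properties.get('image', 'busybox')
        return {'id': 'container-%s' % obj['name'], 'image': image}

    def do_delete(self, obj):
        # A real implementation would stop and remove the container.
        return True

profile = DockerProfile('web', {'properties': {'image': 'nginx'}})
container = profile.do_create({'name': 'node-1'})
```

[Keeping the profile this thin matches the plan above: the interesting work is in clustering containers, and the profile layer can shrink further if Higgins matures.]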
13:50:19 <lixinhui_> I do not know if that is related to the scenario discussion on summit
13:50:22 <lixinhui_> but vmware PM
13:50:37 <lixinhui_> on customisable reaction
13:50:45 <Qiming_> okay
13:51:12 <Qiming_> lixinhui_, I was thinking of this scenario
13:51:12 <lixinhui_> or just related to the processing of action
13:51:14 <cschulz_> Can someone give me a brief on the scenario discussion?
13:51:20 <Qiming_> for vmware vm monitoring
13:51:53 <Qiming_> senlin can emit events for vmware to listen
13:52:06 <Qiming_> so that it will know which node belongs to which cluster
13:52:09 <lixinhui_> that will be great
13:52:29 <Qiming_> it will have some knowledge to filter out irrelevant vms when doing maths on metrics
13:52:43 <lixinhui_> yes
13:53:01 <lixinhui_> that is desired by mix deployment env
13:53:16 <Qiming_> okay, we can work on a design first
13:53:43 <Qiming_> a multi-string configuration option for event backend
13:53:52 <Qiming_> we only have database backend implemented
13:54:19 <Qiming_> we can add http, message queue as backends
13:55:01 <Qiming_> detailed design is still needed
13:55:12 <Qiming_> em... only 5 mins left
13:55:17 <lixinhui_> Okay
13:55:24 <cschulz_> Are events predefined? Or can a stack/cluster define events it wants?
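[The "multi-string configuration option for event backend" design Qiming_ outlines above — today only a database backend exists, with http and message-queue backends to be added — could fan each event out as sketched below. The backend classes, registry, and option name are all illustrative assumptions, not an existing Senlin API.]

```python
# Fan one event out to every backend named in a multi-string config
# option, e.g. event_backends = database,http in a hypothetical
# senlin.conf.

class DatabaseBackend(object):
    """Today's only real backend: persist events to the database."""
    def __init__(self):
        self.rows = []
    def dump(self, event):
        self.rows.append(event)

class HttpBackend(object):
    """Future backend: push events to an external listener (e.g. the
    vmware monitoring scenario discussed above)."""
    def __init__(self):
        self.posted = []
    def dump(self, event):
        # A real backend would POST the event to a configured endpoint.
        self.posted.append(event)

BACKENDS = {'database': DatabaseBackend, 'http': HttpBackend}

class EventDispatcher(object):
    """Instantiate every configured backend; send each event to all."""
    def __init__(self, backend_names):
        self.backends = [BACKENDS[name]() for name in backend_names]
    def dispatch(self, event):
        for backend in self.backends:
            backend.dump(event)

dispatcher = EventDispatcher(['database', 'http'])
dispatcher.dispatch({'cluster': 'c1', 'node': 'n1', 'action': 'NODE_RECOVER'})
```

[With cluster and node names carried in each event, an external listener can filter out VMs that are irrelevant to the cluster it cares about before doing math on metrics.]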
13:55:33 <Qiming_> yes, cschulz
13:55:39 <yanyanhu> Qiming_, maybe we postpone the second topic to next meeting
13:55:54 <yanyanhu> about adding new workitems based on discussion in summit
13:55:58 <Qiming_> ok
13:56:21 <Qiming_> there are followups wrt the design summit sessions
13:56:34 <Qiming_> need to dump them into TODO items
13:56:49 <Qiming_> and those items will be migrated to this etherpad for progress checking
13:57:01 <yanyanhu> yes
13:57:03 <Qiming_> for example, profile/policy validation
13:57:18 <Qiming_> that means one or two apis to be added
13:57:34 <Qiming_> when someone has cycles to work on it, we can add it to the etherpad
13:57:51 <Qiming_> the same applies to all other topics we have discussed during the summit
13:58:33 <lixinhui_> cool
13:58:40 <Qiming_> that's all from my side
13:58:51 <Qiming_> two mins left for free discussions
13:58:59 <Qiming_> #topic open topics
13:58:59 <lixinhui_> that was good discussion there in Austin
13:59:15 <yanyanhu> yep :)
13:59:33 <cschulz_> Anyone can send me anything they would like proofread for English.
13:59:41 <Qiming_> okay, we successfully used up the 1 hour slot, :)
13:59:50 <cschulz_> bye
13:59:53 <Qiming_> thanks, cschulz_
14:00:08 <Qiming_> thanks everyone for joining
14:00:12 <Qiming_> #endmeeting
14:00:15 <haiwei> thanks
14:00:18 <haiwei> bye
14:00:35 <yanyanhu> bye
14:00:35 * regXboi finds a corner in the room and quietly snores
14:00:39 <Qiming_> cannot end meeting
14:00:49 <lixinhui_> ..
14:01:16 <Qiming_> nickname occupied I think
14:01:51 <Sam-I-Am> hello networking nerds
14:02:02 <pcm_> o/
14:02:02 <jlibosva> o/
14:02:04 <Qiming_> #endmeeting