13:00:28 #startmeeting senlin
13:00:29 Meeting started Tue Aug 23 13:00:28 2016 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:33 The meeting name has been set to 'senlin'
13:00:46 hi, guoshan
13:00:56 hi
13:01:05 evening, yanyan
13:01:15 o/
13:01:22 evening
13:01:28 xinhui just texted me that she cannot join because of stomachache
13:01:36 hi, elynn
13:01:46 hope she will be fine
13:01:48 let's get started, quite a few things to go thru
13:02:01 #topic newton work items
13:02:13 #link https://etherpad.openstack.org/p/senlin-newton-workitems
13:02:41 about rally plugin, just kept working on profile context support last week
13:02:53 still some complaints about rally patch?
13:02:56 may need more time to finish it
13:02:59 yes
13:03:11 but I think there is no critical issue
13:03:24 fine. let's go their way
13:03:41 integration test
13:03:43 just need more discussion on some detail
13:04:10 another fix has been proposed and needs another +2 and workflow
13:04:26 hope this is the last fix needed...
13:04:35 okay
13:04:54 if needed, we may ping the reviewers for a push
13:05:00 sure, Qiming
13:05:08 health management
13:05:17 POC almost done
13:05:23 great news
13:05:27 fixed a lot of corner issues
13:05:42 we can listen to nova events and take recover actions automatically
13:05:56 Great!
13:06:06 though there are follow ups about fencing, fixed in some other patches
13:06:23 the reason is that we were doing fencing regardless of the failure's nature
13:06:54 if we already got an event notification from nova, that means the vm is already stopped, shutdown, deleted etc.
13:07:08 we don't need to do a 'fencing' operation to ensure that the vm is dead
13:07:27 Qiming, so in that case, fencing just means "confirmation" maybe
13:07:33 fencing is still needed, if we are detecting vm failure from external monitoring software/service
13:07:49 current fencing implementation is a forced delete
13:07:59 I see
13:08:05 however, if we unconditionally enable fencing, that means:
13:08:12 the most completed one
13:08:54 server deleted -> nova sends notification -> senlin heard it -> senlin tries to 'fence' it -> senlin force delete the server -> nova sends another notification -> senlin heard it again ...
13:09:15 it is an endless loop
13:09:31 another fix is we temporarily removed network/storage fencing
13:09:33 the second notification was caused by the delete request from senlin?
13:09:38 because they are not there yet
13:09:46 yes, yanyan
13:09:58 so I have my node recovered twice ...
13:10:09 yes... that is unexpected
13:10:16 if I am stopping a node
13:10:38 yet another patch was submitted to oslo.messaging, because there are filter bugs there
13:11:03 https://review.openstack.org/#/c/329754/
13:11:22 nice
13:11:31 we sometimes got a ValueError from the listener because some notifications have 'project_id' set to None
13:11:34 sigh
13:11:51 I'm working on a new method eval_status for cluster
13:12:10 that can be invoked at any time to reevaluate a cluster's "health status"
13:12:23 https://review.openstack.org/359177
13:12:26 yes, that's a very useful interface
13:12:41 user can manually trigger it
13:12:41 this method can be invoked after a cluster_check operation ... when all node checks are done
13:12:53 we need to reassess the cluster's status
13:12:56 and build their own health check logic based on this interface
13:13:08 yes
13:13:18 yep, that is another possibility, I mean, expose the interface to users
13:13:38 it can be invoked after a scaling operation, be it a success or a failure
13:13:50 that will be semi-auto healing :)
13:13:58 anyway, still working on it
13:14:02 cool
13:14:07 as a user, I need to know the cluster status for sure
13:14:18 yea
13:14:37 documentation side, fixed some trivial bugs in api-ref and user tutorial
13:14:43 small patches for review
13:15:05 profile/policy version control, yanyan has a new patchset?
13:15:14 yes
13:15:21 a new patchset based on your comments
13:15:21 will review it tomorrow
13:15:26 thanks a lot :)
13:15:42 haiwei is not in, right?
13:15:49 seems so
13:15:49 haven't heard from him for a while
13:16:00 then we can skip container support today
13:16:14 receiver with zaqar
13:16:23 saw your patches, almost there I think
13:16:37 support in sdk side has almost been done
13:16:44 yes, will fix those issues you mentioned
13:16:44 I have added an item to the meeting agenda for you to talk about the receiver design
13:16:57 thanks
13:17:03 versioned notification ... emm ... no cycles on that
13:17:05 will quickly go through the basic idea
13:17:25 have been contacting brian about sdk version cut
13:17:38 great
13:18:00 I think we are in good shape for a new sdk version and we actually need it badly, it is breaking senlin gates
13:18:00 hope it will include all those features we are expecting
13:18:10 yes
13:18:17 the gate is broken now...
13:18:27 any other topics besides policy/profile validate and zaqar support?
13:18:37 nope from me
13:18:45 no
13:18:49 I was thinking about adding start_server and stop_server today
13:18:57 as part of the HA story
13:19:15 if we know a server was just accidentally stopped, we can just start it
13:19:30 it can even get used in an autoscaling scenario
13:19:35 yes, that will be useful
13:19:48 if you want to scale out, just "start" a server that was previously stopped
13:19:59 that will make things very quick
13:20:00 yes, like when nova-compute is shut down and recovers
13:20:09 Then we might need to start our vm nodes.
13:20:41 it is a little bit complicated when we are talking about nova-compute ...
13:20:42 maybe this feature can be used to support quick scaling :)
13:21:04 I hope there are notifications about nova-compute going down, but got no confirmation so far
13:21:05 as well as scaling to standby cluster
13:21:10 yes
13:21:20 oh, you remind me of a topic about standby cluster
13:21:26 elynn, :)
13:21:28 it could be very easy, 90% documentation + 10% code
13:21:35 yes
13:21:51 okay, anything else about newton work items?
13:21:52 it is a useful scenario.
13:22:09 we'll try to get a new sdk version this week and have the gate fixes
13:22:11 fixed
13:22:17 then standby cluster will be real "standby"
13:22:28 then we freeze senlinclient, then senlin rc1
13:22:35 @Qiming, based on OSC 3.0?
13:22:39 Qiming, ok, will finish the patches for zaqar v2 api
13:22:50 zzxwill, we already support OSC
13:23:00 Thanks.
13:23:03 just there are some defects about the --profile parameter
13:23:19 it is ... a conflict with the existing osprofiler parameter
13:23:25 we don't want to change the name
13:23:50 so OSC will deprecate '--profile' parameter in April 2017, according to the plan
13:24:04 then we can completely throw away our own CLI
13:24:07 Since people seldom enable osprofiler in their env, that would be ok...
13:24:07 Got it. I saw your comments to a bug.
13:24:49 that is the best result we can get, considering the concern about the existing users
13:25:14 #topic health checking update
13:25:28 I think I have gone thru most of them just now
13:25:54 there are still a lot of gaps, will do my best to close them one by one
13:26:15 any questions/comments about health?
13:26:19 great. hope there will be a fantastic demo for summit
13:26:29 if our topic is accepted
13:26:33 I can already give you one, :)
13:26:40 cool :)
13:26:56 if you cherrypick the patches, you can try it yourself, ;)
13:27:06 ok, will give it a try
13:27:11 let's move on
13:27:15 hope my devstack is not too old
13:27:20 #topic zaqar receiver's design
13:27:33 #link https://etherpad.openstack.org/p/senlin-message-type-receiver
13:27:49 yes, this is the etherpad to record the idea
13:27:54 if you haven't checked this etherpad, you may need to go thru it quickly
13:28:05 spent some time to think about the design of message type of receiver
13:28:29 based on my current understanding of zaqar message
13:28:55 okay, please do your best to avoid a new middleware
13:29:18 Qiming, yes, I considered reusing webhook
13:29:25 just found some gaps there
13:29:36 need more thinking here
13:29:42 okay
13:29:50 webhook is already dirty
13:29:57 yes
13:30:01 I'm not sure if there are security breaches there
13:30:14 it could be
13:30:34 actually the same situation for message notification
13:30:55 It might be...
13:31:56 so we need to create an endpoint for zaqar to invoke anyway, right?
13:32:21 yes, Qiming
13:32:25 that endpoint, for zaqar, is a webhook?
13:32:34 it's for subscriber
13:32:55 Qiming, yes, currently, zaqar supports two types of subscriber, webhook and mail
13:33:06 http/https
13:33:15 it is not like we create a listener, hook it to some target, topic then get notified?
13:33:22 no
13:33:37 it doesn't work as message broker
13:33:37 sigh, that is not a message queue
13:33:42 are we going to port webhook based on zaqar?
13:33:49 another kind of message service
13:34:10 elynn, that could be possible, but the use case will be different
13:34:12 sounds like it could be
13:34:25 since for message type of receiver, user will send message to zaqar queue to trigger action
13:34:52 but for webhook type of receiver, user directly send http request to trigger action
13:35:10 that step we don't care in our code, though we can document it in senlin tutorial
13:35:25 for the former one, zaqar will stay between enduser and senlin to transmit message
13:35:47 for senlin, the only difference is that the zaqar is playing the user's role, invoking our webhook
13:35:48 Qiming, you mean?
13:35:56 Qiming, yes
13:36:02 just zaqar actually can do more
13:36:10 including some security help
13:36:44 that's great
13:36:44 okay, so we can somehow extend webhook to accommodate it?
13:36:56 I can't find the link, but zaqar team are now working on a new feature called authenticated subscription notification
13:36:59 something like this
13:37:09 can't recall the exact name
13:37:17 em, sounds like an interesting feature
13:37:52 just like the semi-autoscaling scenario previously mentioned by chuck
13:38:00 yes, with it, the subscriber, e.g. senlin can choose to reject notification from zaqar which is triggered by message posting
13:38:04 there are things out of senlin's domain
13:38:10 Qiming, yes, I think so
13:38:35 okay
13:38:53 POST webhooks/{webhook_id}/trigger
13:39:02 not sure this feature will be completely supported after newton cycle
13:39:11 with a body and/or params
13:39:14 but it is very useful I think
13:39:34 is zaqar providing some additional info in the request body or header?
13:39:45 no, sadly...
13:40:07 zaqar simply sends post request to subscriber url with fixed body
13:40:09 so that we know that is a notification different from others?
13:40:43 then maybe we can just extend webhook middleware/api?
13:40:46 I think there should be information embedded in the request body
13:41:02 do hope so
13:41:05 but I don't think user can customize the notification
13:41:16 will do more investigation on it
13:41:24 okay, thanks
13:41:35 actually, the control bar provided by zaqar is mainly supported using "claim"
13:42:08 em, needs some experimentation on it
13:42:14 if their docs are not so great
13:42:15 which is a kind of "initiative" message grasping from queue
13:42:30 yes, need more tests
13:42:42 okay, that sounds like something with value
13:42:47 anyway, will focus on this in coming week
13:43:03 thanks
13:43:06 and will also ask feilong for his suggestion
13:43:14 I think it is too late for him now :)
13:43:21 will contact him tomorrow
13:43:32 yep, very too late, :)
13:43:37 yes...
13:43:39 midnight
13:43:57 okay, move on?
13:44:03 sure
13:44:09 #topic open discussions
13:44:44 so ttx has sent out notes about room requirements during barcelona summit
13:44:55 we need to figure out how many sessions we need this time
13:45:01 yes, we need to figure out our work sessions
13:45:19 I'll create an etherpad for soliciting ideas
13:45:25 great
13:45:33 will propose the idea
13:45:34 will let the team know
13:45:47 will fill things in after you created it.
13:45:48 do have something I want to discuss f2f
13:46:02 the space is not so plentiful when compared to austin
13:46:13 yes... seems so
13:46:15 cool
13:46:30 hope can get chance to travel, haha
13:46:39 Senlin: 1fb, 5wr, cm:half
13:47:13 that is the data from austin, we had 1 fishbowl, 5 working rooms and half day committer meetup
13:47:41 the last session is almost useless, would rather spend that half day talking to other teams
13:47:51 yes
13:47:55 absolutely
13:48:02 yes...
13:48:18 cross team communication is important
13:48:33 #action Qiming to create an etherpad for design summit session proposals
13:48:37 very helpful to talk through some important issues and reach consensus
13:48:49 thanks, Qiming
13:49:11 even if there is no conclusion, communication is still .... as usual ... sometimes .... useless
13:49:16 hahaha
13:49:20 :)
13:49:38 did I say useless?
13:49:41 I meant useful
13:49:44 :)
13:49:46 that's sad, but true sometimes...
13:49:50 :)
13:49:59 Talking about cross team communication, I remember haiwei mentioned that tacker will try to use senlin in their project
13:50:07 yes
13:50:11 Not sure how that is going.
13:50:20 oh, right, hope can get a chance to talk with tacker team
13:50:21 https://review.openstack.org/352943
13:50:57 I just didn't find the bp which was approved
13:52:18 it will save them a lot of energy doing things not aligned to the core value of tacker
13:52:32 have the same feeling
13:53:12 but, it is still an open community, people decide what they want to do
13:53:53 good news from brian just now: "I'm back to work today so I will get caught up on reviews and do a release asap. I'll let you know when it happens."
13:54:09 great
13:54:33 anything else?
13:54:33 will propose new patchset for zaqar claim support asap tomorrow
13:54:40 good
13:54:42 nope from me
13:54:45 Hope we can get our patches in before cutting a new release.
13:55:03 yes, let's make it happen
13:55:09 no more from me :)
13:55:25 alright, thanks for joining
13:55:31 thanks, have good night
13:55:32 it's late, good night
13:55:38 Thanks, good night.
13:55:43 good night all:)
13:55:52 #endmeeting
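
Note on the fencing discussion (13:06 to 13:09): the point made there is that fencing is only needed when a failure is reported by something other than nova, because a nova event already means the server is stopped or gone, and fencing on it triggers a second notification. A minimal Python sketch of that rule follows. All names here (FailureReport, needs_fencing, recover, the source labels) are hypothetical illustrations and are not taken from Senlin's code.

```python
from dataclasses import dataclass

# Hypothetical labels for where a failure report came from; not Senlin identifiers.
NOVA_EVENT = "nova_event"
EXTERNAL_MONITOR = "external_monitor"


@dataclass
class FailureReport:
    server_id: str
    source: str


def needs_fencing(report):
    """Fence only when the failure was not reported by nova itself.

    A nova event already means the server is stopped or deleted, so a
    forced delete would only produce another notification.
    """
    return report.source != NOVA_EVENT


def recover(report, force_delete):
    """Run one recovery pass and return the list of steps taken."""
    steps = []
    if needs_fencing(report):
        force_delete(report.server_id)  # fencing as a forced delete
        steps.append("fence")
    steps.append("recover")
    return steps
```

With this rule, a report that originates from a nova notification goes straight to recovery, so the delete -> notify -> fence -> delete loop described in the meeting is not entered, while reports from an external monitor are still fenced first.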