13:00:39 #startmeeting senlin
13:00:40 Meeting started Tue Sep 1 13:00:39 2015 UTC and is due to finish in 60 minutes. The chair is Qiming. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:43 The meeting name has been set to 'senlin'
13:00:56 hi
13:01:15 hi
13:01:25 hi
13:02:05 ?
13:02:45 okay, it is working
13:02:52 maybe my network connection is too bad
13:03:01 guess so ;)
13:03:16 anyway
13:03:42 please feel free to add agenda items: https://wiki.openstack.org/wiki/Meetings/SenlinAgenda
13:04:13 #topic l-3 milestone items
13:04:20 #link https://etherpad.openstack.org/p/senlin-liberty-workitems
13:04:52 just did a cleanup on the etherpad page
13:05:22 in backlog, we still have some test cases
13:05:42 keystone and sdk test cases still not there?
13:06:01 yes
13:06:01 yes some of those are mine. i am working on them today
13:06:25 I'm signing on the keystone and sdk unit tests
13:06:35 L3 goals
13:06:49 container clusters ...
13:07:25 the progress of last week was good, haven't heard a thing since then from the team
13:07:30 need to catch up
13:07:42 #action Qiming to catch up with the SUR team on progress
13:08:09 placement policy, we have a simple POC there, 219212
13:08:16 need to make it work before release
13:08:47 I'm not worrying about cross-region support, the key is about algorithm, it has to be flexible
13:09:21 patch 219212 was just checked in by Xinhui, Xinhui cannot join us today due to biz trip
13:09:21 Qiming: https://review.openstack.org/#/c/219212/
13:09:38 hello patchbot
13:09:52 exception handling ..
13:09:58 haiwei, anything new?
13:10:18 no, it's already finished i think
13:10:28 I'm seeing all items crossed over
13:10:30 I made some tests these days it works fine
13:10:35 great.
13:10:56 next is functional tests
13:11:15 I believe we had an issue here
13:11:17 just a little worried that someone may complain it is not suitable for cloud operators
13:11:22 just finished the cluster scaling test case
13:11:45 the test case passed?
13:12:04 nope, it was blocked by the problem in Action progress
13:12:11 without this issue, it passed
13:12:27 okay, we had two issues here actually
13:12:38 one is about the decorator for connection creation
13:12:46 it has been solved
13:12:52 yep
13:13:00 the second one is a big one :)
13:13:07 another one is related to context usage in action hierarchy
13:13:19 yanyanhu, I'll work with you on this tomorrow
13:13:30 thanks, that will be very helpful :)
13:13:35 the problem is?
13:13:42 took almost two days on this problem
13:14:00 it is about concurrent operations on sqlalchemy DB
13:14:04 saw your conversation this afternoon, not very clear about it
13:14:30 the data written from one session cannot be seen from another session immediately
13:15:01 oh
13:15:02 we are having some context/session management problems that just surfaced after some "bug fixes"
13:15:47 we may need to rethink whether our usage of oslo_context.get_current() is "green-thread-safe"
13:16:19 oh wow. that does seem like a big problem to debug
13:16:20 I believe some of you have seen it this way or that
13:16:23 looks like there are still some issues about DB session we need to figure out
13:16:48 jruano, yes :)
13:17:09 most projects are not using oslo_context.get_current(), we may be the first to do async executions in engine as well
13:17:09 in what kind of use case can this problem be reproduced?
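The symptom described at 13:14:30 (data written in one session not immediately visible from another) can be reproduced in miniature. The sketch below is only an illustration: it uses the standard-library `sqlite3` module rather than SQLAlchemy, the `action` table and function names are made up for the example, and the log does not say that an uncommitted transaction is the actual root cause in Senlin.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    # fetchall() exhausts the cursor so the reader's lock is released right away.
    return conn.execute("SELECT COUNT(*) FROM action").fetchall()[0][0]


def demo_visibility():
    """Return the row counts a second connection sees before and after commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            # Hypothetical table, used only for this illustration.
            writer.execute(
                "CREATE TABLE action (id INTEGER PRIMARY KEY, status TEXT)")
            writer.commit()

            # The INSERT opens an implicit transaction on the writer connection.
            writer.execute("INSERT INTO action (status) VALUES ('RUNNING')")
            before = count_rows(reader)  # writer has not committed yet

            writer.commit()
            after = count_rows(reader)   # committed data is now visible
        finally:
            reader.close()
            writer.close()
        return before, after


if __name__ == "__main__":
    print(demo_visibility())
```

If the real problem has this shape, the usual remedies are committing (or closing) the writing session before another worker reads, or having both steps share one session; which applies to the engine is left open in the discussion.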
13:17:40 some locks are not released when an action completes
13:18:01 I think I met it before
13:18:02 hi, haiwei, I think some operations like cluster-create/delete/update
13:18:07 especially when the action involves both cluster-action and node-action
13:18:11 and also resize/scalein/scaleout
13:18:14 and there is a bug report for it
13:18:51 this is a critical issue, we need to solve it as early as possible before it gets too complicated
13:20:10 next item along the list
13:20:30 senlinclient test cases, I have just started working on it
13:20:46 need more hands on it
13:21:07 I assigned one, but not moving on
13:21:20 actually, there is something I don't think we need to test
13:21:38 I mean the "models" module
13:21:45 ok
13:21:49 i can get you an extra hand qi ming. colleague reached out to me other day wanting to see where in openstack he can help
13:21:54 they will eventually get contributed to sdk
13:22:00 shell.py and client.py are necessary i think
13:22:16 jruano, that would be great
13:22:40 it is a pretty labor-intensive job
13:23:07 today is the l-3 milestone ...
13:23:49 I'm thinking maybe we need to create a branch in the coming days and practice feature freeze
13:23:58 agree
13:24:30 we will create a release in this branch, and continue development on master
13:24:30 yes, that will be the most efficient way to get a release
13:25:01 adding tests should be allowed I think
13:25:08 not only bug fixes
13:25:10 in the 'release' (0.2?) branch, we can delete all half-baked things
13:25:22 yes, 0.2 sounds good :)
13:25:31 yes, we just make sure it is a usable package
13:26:13 so we may need some manual tests on all important features
13:26:21 yes
13:26:44 after fixing the existing bugs, we can start the test
13:27:00 hopefully we can finish the test and debug in a week I think
13:27:01 there will be a branch for senlinclient as well
13:27:05 if we focus on this
13:27:20 the senlin-dashboard project needs a senlinclient package on PyPI
13:27:26 the next is dc-1?
13:27:37 rc-1
13:27:47 guess so
13:27:58 not quite familiar with the process
13:28:19 we learn by doing it, as always
13:28:33 yea
13:28:53 so ... haiwei, you just mentioned something about exception handling, not appropriate for cloud operators
13:29:12 can you elaborate on that? something we can improve/fix?
13:29:53 yes, one of my colleagues complained about it
13:30:19 specifics?
13:30:48 because we changed all the sdk exceptions to internal error, we can't get the original information from drivers
13:31:21 but I think the original msg from driver is recorded in log
13:31:37 Qiming is disconnected?
13:31:53 I missed the previous sentence ...
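The complaint raised at 13:30:48 (SDK exceptions converted to an internal error, with the original driver information lost) is commonly handled by logging the full trace for operators while raising a short, chained error for users. The sketch below shows that pattern only; `InternalError`, `DriverError`, and `call_driver` are hypothetical names for illustration and are not taken from the Senlin code base.

```python
import logging

LOG = logging.getLogger("engine")


class InternalError(Exception):
    """Hypothetical engine-level error returned to API users."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


class DriverError(Exception):
    """Hypothetical stand-in for an exception raised by an SDK driver."""


def call_driver(func, *args, **kwargs):
    """Run a driver call, logging failures and re-raising a sanitized error."""
    try:
        return func(*args, **kwargs)
    except DriverError as ex:
        # The full stack trace goes to the engine log for operators.
        LOG.exception("driver call %s failed",
                      getattr(func, "__name__", func))
        # Users get a generic message; `from ex` keeps the original
        # exception reachable as __cause__ for debugging.
        raise InternalError(500, "Internal error in driver call") from ex
```

With this shape, the stack dump stays out of user-facing output while remaining in the log, and code that needs the driver's own message can still read it from `__cause__`.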
13:31:56 oh, just connection reset
13:32:01 and also we catch the exceptions and don't raise them again; there is no error trace in the engine logs, so it is difficult for the operator to debug the exception
13:32:13 because we changed all the sdk exceptions to internal error, we can't get the original information from drivers
13:32:16 because we changed all the sdk exceptions to internal error, we can't get the original information from drivers
13:32:31 okay, that is something we can improve
13:32:53 we can still write logs
13:32:55 yes, agree that the exception stack dump is important for debugging
13:33:00 it seems we can do a middleware to handle the exception
13:33:13 stack dump is annoying to users
13:33:33 haiwei, that is important :)
13:33:43 magnum seems to do it that way
13:33:59 exception handling is always a cross-cutting concern in software engineering
13:34:33 we have been trying to consolidate it into a more manageable framework
13:35:07 please feel free to improve the driver end exception handling
13:35:18 ok
13:35:31 we need to make sure the operators (at least) know what has been going wrong
13:35:41 yes
13:36:22 at the same time, we filter out messages that are not supposed to be seen by end users
13:36:53 there is always a gray area in-between
13:37:27 #topic revisions to profile/policy schema
13:38:10 during the past week (weekend actually), the biggest modification to the code is about profile and policy definitions
13:38:40 we were using 'senlin profile-create -t os.heat.stack -s specfile name' command to create profiles
13:38:49 and a similar command to create policies
13:39:04 actually, the '-t os.heat.stack' should be part of the specfile
13:39:39 so, we have changed the format of the profile spec and policy spec
13:39:48 now a profile looks like this:
13:39:55 type: os.heat.stack
13:39:58 version: 1.0
13:40:00 properties:
13:40:08   template: blah blah
13:40:15   parameters: blah blah
13:40:29
13:40:42 a policy will look like this:
13:40:47 type: senlin.policy.deletion
13:40:49 version: 1.0
13:40:53 properties:
13:41:03   destroy_after_deletion: True
13:41:16   criteria: OLDEST_FIRST
13:41:30
13:42:05 this was a disruptive change we had to make, and hopefully it is done once and for all
13:42:24 in future, we can just change the version number to accommodate new properties
13:42:56 this was also an effort to get senlin policy definition better aligned with TOSCA
13:43:15 all relevant changes have been merged
13:43:45 if you are using the master code, you will need to delete existing profiles/policies and create new ones
13:44:37 there are still some open issues
13:45:09 for example, whether we use os.heat.stack as the 'type' or 'os.heat.stack-1.0' as the type name
13:45:16 Qiming, does that mean for specific type of profile/policy, we will support different versions in the same module?
13:45:32 good question
13:46:11 maybe we need to revise the setup.cfg file to spell out version numbers
13:46:52 yes
13:47:09 still thinking what is the best way to express version difference
13:48:01 when listing profile types or policy types, we need version numbers there too
13:49:08 #topic open discussions
13:49:47 anything?
13:50:04 nope from me
13:50:22 when are we targeting code freeze?
13:50:26 i am ok
13:50:50 jruano, i was calling it a feature freeze
13:50:58 ah, gotcha
13:51:09 sounds good to me
13:51:12 the code won't be frozen, we will "backport" bug fixes when necessary
13:51:33 as for feature freeze, it's today
13:52:10 there can be FFE (feature freeze exceptions), though, :)
13:52:34 any new feature we want to add before doing a release
13:53:31 if there is nothing else, we can call an end to the meeting
13:53:34 yeah
13:53:36 sounds good
13:53:46 3
13:53:54 2
13:54:00 1
13:54:06 0.5
13:54:09 #endmeeting
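The spec layout discussed under the profile/policy schema topic (top-level `type`, `version`, and `properties`) lends itself to a small structural check. The sketch below assumes the spec file has already been parsed into a dict; `validate_spec` and `type_key` are hypothetical helpers, and the `type-version` key is just one of the two naming options left open at 13:45:09, not a decision recorded in the meeting.

```python
REQUIRED_KEYS = ("type", "version", "properties")


def validate_spec(spec):
    """Check the top-level layout of a parsed spec; return (type, version)."""
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        raise ValueError("spec is missing key(s): %s" % ", ".join(missing))
    if not isinstance(spec["properties"], dict):
        raise ValueError("'properties' must be a mapping")
    # YAML loaders typically read `version: 1.0` as a float, so normalize
    # the version to text before using it as part of a lookup key.
    return spec["type"], str(spec["version"])


def type_key(spec):
    """Build a 'type-version' name, e.g. for a registry lookup (assumption)."""
    type_name, version = validate_spec(spec)
    return "%s-%s" % (type_name, version)


if __name__ == "__main__":
    policy = {
        "type": "senlin.policy.deletion",
        "version": 1.0,
        "properties": {
            "destroy_after_deletion": True,
            "criteria": "OLDEST_FIRST",
        },
    }
    print(type_key(policy))
```

Keeping the version as a separate field in the spec, and only joining it into a key internally, leaves room for either listing format when profile or policy types are displayed.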