20:05:29 #startmeeting Orchestration
20:05:30 Meeting started Thu Nov 17 20:05:29 2011 UTC. The chair is mikeyp. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:05:32 Useful Commands: #action #agreed #help #info #idea #link #topic.
20:05:54 #topic workflow engines
20:06:17 #topic workflow engines
20:06:24 :)
20:07:00 i was wondering about the error handling in the mailing list
20:07:03 we don't have a full agenda - I think it's workflow engines, eventlet/zookeeper, and anything else
20:07:20 ok
20:07:35 i'm interested more in how they handle runtime errors
20:08:14 and also if it deals with failure issues, such as a crashed node
20:08:26 main thing I noticed was that exceptions are just raised; there didn't appear to be any concept of exception handling specific to the workflow.
20:08:55 is the exception raised in another node (or another Python interpreter)?
20:09:27 it's single threaded, no concept of concurrency or parallelism.
20:09:37 got it
20:09:51 but we need something that can handle those
20:11:19 definitely, but I didn't find any cloud-grade (tm) workflow libraries
20:12:04 it raises the larger point of how this will all work together - think we need Sandy for that.
20:12:26 ok. i'll put some thoughts on that too.
20:12:30 i'll try to convert my powerpoint proposal to a wiki page before next meeting.
20:12:57 the strawman I have in my head is 'orchestration' is a reliable service, that calls into other openstack services.
20:13:01 now that i've read through the nova code i have a better idea how to fit in the code..
20:13:14 yes
20:13:43 I'm not sure what the granularity would be, either in initial or later releases.
20:14:24 i think combining that, with more orchestration cooperation logic inside the compute/network nodes, we have something there.
20:14:28 it seems like TROPIC could support fine grained control.
20:14:55 the "orchestrator" might actually nicely fit with the scheduler
20:15:47 agreed - I see changes there.
20:16:20 mike, can you elaborate on "fine grained control"?
20:16:47 just the general level of steps.
20:16:54 ok
20:17:33 so today, the operations are pretty high level. Schedule calls create, and a large number of things happen.
20:18:15 should those individual operations be coordinated by orchestration ?
20:18:43 i think if they are non trivial, e.g. take a while to finish
20:18:53 they should report their status
20:19:10 so that the orchestrator could 1) know what's going on
20:19:17 2) if it's stuck/dead/crashed
20:19:29 3) abort, or restart if necessary
20:20:37 #action get sandys input on granularity of orchestration
20:21:00 the state of the workflow progress should be available
20:21:14 it could be either in database, or in zookeeper
20:21:48 right now, the task_state column is kind of like that
20:22:12 but can definitely be improved
20:22:14 yes - when I've done this in the past, workflow runs independently of other operations, and can be interrogated
20:24:05 in your TROPIC work, were there multiple workflow servers ?
20:24:12 i also like to use the analogy of the OS process
20:24:45 we essentially need to build mechanisms to track the distributed processes as a coherent workflow
20:24:54 restart, or abort it if necessary
20:26:07 if you look at those business workflow management software, they are solving a different problem
20:26:53 yes. in TROPIC we call them controllers
20:27:01 there are multiple of them
20:27:04 business workflow tends to focus on process control, rather than process execution.
20:27:22 but only one is elected as leader to make decisions
20:27:49 so one is active, the others are 'standby' or failover ?
20:28:08 yes
20:28:36 got it - that's what I thought the paper said.
20:28:41 it's hard to make distributed decisions. :)
20:29:07 although possible, we ran the numbers and it seems one active is fast enough
20:29:40 i also looked at the other proposal mentioned in last meeting
20:29:44 from dragon
20:30:00 i felt it's very similar to the ppt file I sent
20:30:03 I#topic pacemaker
20:30:12 #topic pacemaker
20:30:47 I haven't really reviewed it, was mostly looking at libraries.
20:31:03 what are the main differences ?
20:31:27 between dragon's and mine?
20:31:52 yes
20:32:42 mine also proposes to keep logs so that we can automatically roll back
20:33:42 hold on
20:33:53 i need to refresh my memory. :)
20:34:31 #action maoy gives dragon proposal feedback
20:34:42 i'll do this in an email after the meeting
20:34:53 #action maoy gives dragon proposal feedback
20:35:15 ok, I will also review it.
20:35:32 i don't know much about pacemaker
20:35:42 #action mikeyp to review dragon's proposal
20:36:00 the picture of pacemaker seems to suggest that corosync is a dependency which i also know nothing about
20:37:02 i got zookeeper working with eventlet
20:37:11 so that's not a concern.
20:37:29 #link https://lists.launchpad.net/openstack/msg03767.html dragondm's proposal
20:37:44 #topic zookeeper / eventlet
20:38:00 yes, I saw that, good progress.
20:38:37 #topic vm-stat transitions
20:38:53 The proposed vm state transitions are in review
20:39:06 #link https://review.openstack.org/#change,1695
20:39:34 They seem to be held up, but I'm reviewing the changes anyways.
20:40:07 I should have said state transition management
20:41:06 somehow i felt that the solution they proposed is a little too complicated
20:41:41 i remember i saw a big state transition table at the summit
20:42:01 hopefully it can be simplified, otherwise it's hard to debug
20:42:53 hopefully, orchestration can remove some of the complications.
20:43:00 exactly
20:43:20 so, what else do we have ?
20:43:37 #topic wrap up
20:43:56 not much
20:44:34 next week is thanksgiving
20:44:36 OK, then let's wrap up till Sandy can review - I know he was out of pocket travelling today.
20:45:20 cool
20:45:28 #action mikeyp to send email re: next week schedule
20:45:34 ok, ttyl.
20:45:48 #endmeeting
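
Note on the status-reporting idea raised between 20:18:43 and 20:19:29: long-running operations report their status so the orchestrator can 1) know what's going on, 2) notice if a task is stuck/dead/crashed, and 3) abort or restart it. A minimal sketch of that loop might look like the following. Everything here is an assumption made for illustration: the `Orchestrator` class, the `TaskState` values, the timeout, and the restart limit are hypothetical and are not taken from nova, TROPIC, or either proposal. State is kept in a plain dict, where the meeting suggested a database or zookeeper.

```python
import time
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    # Hypothetical states; not nova's task_state values.
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class TaskRecord:
    state: TaskState
    last_report: float  # clock reading at the most recent status report
    restarts: int = 0


class Orchestrator:
    """Hypothetical in-memory tracker for reported task status."""

    def __init__(self, timeout=30.0, max_restarts=1, clock=time.monotonic):
        self.timeout = timeout            # seconds of silence before a task counts as stuck
        self.max_restarts = max_restarts  # restarts allowed before aborting
        self.clock = clock                # injectable so the logic can be tested
        self.tasks = {}

    def report(self, task_id, state):
        """Called by a worker to report its status (point 1)."""
        rec = self.tasks.get(task_id)
        if rec is None:
            self.tasks[task_id] = TaskRecord(state, self.clock())
        else:
            rec.state = state
            rec.last_report = self.clock()

    def is_stuck(self, task_id):
        """A running task that has been silent too long is presumed stuck (point 2)."""
        rec = self.tasks[task_id]
        silent_for = self.clock() - rec.last_report
        return rec.state is TaskState.RUNNING and silent_for > self.timeout

    def decide(self, task_id):
        """Return 'none', 'wait', 'restart', or 'abort' (point 3)."""
        rec = self.tasks[task_id]
        if rec.state is TaskState.DONE:
            return "none"
        if rec.state is TaskState.FAILED or self.is_stuck(task_id):
            if rec.restarts < self.max_restarts:
                rec.restarts += 1
                rec.state = TaskState.RUNNING
                rec.last_report = self.clock()
                return "restart"
            return "abort"
        return "wait"
```

A worker would call `report("boot-vm", TaskState.RUNNING)` periodically, and the orchestrator would poll `decide("boot-vm")`. A silence timeout cannot tell a crashed node from a slow one, so the timeout value is a policy choice rather than something this sketch settles.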