17:00:35 <sandywalsh> #startmeeting
17:00:36 <openstack> Meeting started Tue Nov 8 17:00:35 2011 UTC. The chair is sandywalsh. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:00:51 <sandywalsh> Who is here for the orchestration meeting?
17:01:02 <sandywalsh> o/
17:01:05 <maoy> me
17:01:19 <maoy> hi sandy
17:01:27 <sandywalsh> hey! ... may be a short meeting
17:01:52 <sandywalsh> #topic pacemaker and zookeeper
17:02:14 <maoy> is Andrew Beekhof here?
17:02:22 <sandywalsh> I don't believe so
17:02:49 <maoy> I'm not very familiar with pacemaker.
17:02:50 <sandywalsh> he's in charge of Pacemaker, which is a core part of Red Hat's clustering strategy
17:03:05 <sandywalsh> nor am I; it seems very capable
17:03:24 <sandywalsh> the biggest issue I see is the conflict with the nova architecture
17:03:27 <mikeyp> I haven't had a chance to review pacemaker at all; got through maoy's ppt, but not the full Tropic paper
17:03:32 <sandywalsh> (workers vs. master-slave)
17:04:01 <sandywalsh> mikeyp, the Tropic paper is in the log from last meeting
17:04:22 <mikeyp> just need to read it :-)
17:04:27 <sandywalsh> maoy, thanks again for the discussion on the row-locking issues
17:04:57 <sandywalsh> for me the next step is to mess with zookeeper (and the python bindings) to see what we can make it do
17:04:59 <maoy> the Tropic paper requires a fairly big change to the nova architecture, so I made some changes in the ppt to simplify things
17:05:15 <maoy> you are welcome
17:05:33 <sandywalsh> I think we're in general agreement on the approach. I think your strategy fits in well with my proposal
17:05:43 <maoy> cool.
17:05:45 <sandywalsh> mikeyp, the workflow summary was great
17:06:15 <sandywalsh> I hadn't heard of pyutilib.workflow ... what's the recent status of it?
17:06:22 <sandywalsh> is it actively maintained?
17:06:38 <maoy> the zk python binding works fine for me, although I never tried it with eventlet
17:06:41 <sandywalsh> (spiff workflow isn't actively maintained, and the author suggested we fork)
17:06:50 <mikeyp> It looks like it's actively maintained - last checkin was a couple of weeks ago.
17:06:55 <sandywalsh> nice
17:07:24 <sandywalsh> does it make any assumptions about the persistence layer or require a web interface, etc.? Or is it just an engine?
17:07:58 <dragondm> Yah, I do wonder abt the ZK interface + eventlet
17:08:07 <mikeyp> I don't know yet - I'm going to kick the tires today
17:08:10 <dragondm> It uses threading, + a C module
17:08:24 <mikeyp> It does seem to be primarily an engine, though
17:08:25 <sandywalsh> dragondm, good point
17:08:47 <sandywalsh> mikeyp, can we put you down to give us a report on it?
17:09:03 <mikeyp> sure, no problem.
17:09:17 <sandywalsh> #action mikeyp to give us a report on pyutilib.workflow (dependencies ideally)
17:09:33 <maoy> there is a non-threading version, at least for the C API. Not sure if there is a python binding as well.
17:10:05 <sandywalsh> maoy, which did you use previously?
17:10:11 <sandywalsh> dragondm, did you look at zookeeper before?
17:10:25 <maoy> I used the multithreaded python binding
17:10:39 <dragondm> I've looked at it briefly; I haven't played w/ it much
17:10:47 <sandywalsh> maoy, but you weren't doing your project against nova, correct?
17:10:56 <maoy> correct
17:11:06 <maoy> that's for Tropic
17:11:10 <maoy> which doesn't use eventlet
17:11:50 <sandywalsh> andrew mentioned the licensing of the python binding for pacemaker wouldn't be an issue; I do have a question for him on the engine portion.
17:12:02 <sandywalsh> or if he makes his money from professional services around the product
17:12:10 <maoy> i c.
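[Editor's note: dragondm's concern above is that the multithreaded ZK binding makes blocking calls inside a C extension, which would stall every green thread sharing eventlet's single OS thread. The usual escape hatch is to offload the blocking call to a real thread pool (eventlet ships `eventlet.tpool.execute` for exactly this). The sketch below illustrates the offload pattern with the stdlib so it is self-contained; `blocking_zk_get` is a hypothetical stand-in for a blocking zkpython call, not a real API.]

```python
# Illustrative sketch of offloading a blocking C-extension call so an
# event loop stays responsive. All names here are assumptions for the
# example; in nova the pool would be eventlet.tpool, not ThreadPoolExecutor.
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_zk_get(path):
    # Pretend this is a zookeeper get() that blocks inside a C module
    # and would otherwise freeze the whole green-thread hub.
    time.sleep(0.05)
    return ("state-data-for %s" % path, {"version": 1})

pool = ThreadPoolExecutor(max_workers=4)

def zk_get(path):
    # The calling thread only waits on the future; the blocking work
    # runs in a worker thread, so other tasks could keep switching.
    return pool.submit(blocking_zk_get, path).result()

data, meta = zk_get("/nova/instances/42")
print(data)
```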
17:12:16 <dragondm> the main concern would be something blocking in a C module that would prevent eventlet from task-switching
17:12:23 <sandywalsh> correct
17:13:03 <sandywalsh> so, as an action item, maoy, can we put you down to investigate zookeeper/eventlet integration?
17:13:04 <dragondm> BTW: sandywalsh: my workorder concept was posted here: https://lists.launchpad.net/openstack/msg03767.html
17:13:08 <maoy> i'll take a look at it
17:13:37 <sandywalsh> #link https://lists.launchpad.net/openstack/msg03767.html dragondm's proposal
17:13:44 <sandywalsh> thanks
17:14:01 <dragondm> I can expand that out if needed.
17:14:19 <sandywalsh> #action maoy to investigate zookeeper/eventlet integration. Is the threading model with the C library going to be an issue?
17:14:35 <maoy> so the concept of tasks and the analogy to processes in an OS makes sense?
17:14:47 <garyk_> Is it possible to run redundant Zookeepers?
17:14:54 <sandywalsh> dragondm, we'll give it a re-read and give you some feedback
17:15:39 <maoy> for zookeeper, you can run 2f+1 nodes to tolerate f node failures
17:15:40 <sandywalsh> #action give dragondm feedback on his proposal
17:15:47 <mikeyp> The workorder proposal seems really compatible with orchestration.
17:16:09 <dragondm> sandywalsh: and s/scheduler/orchestrator/ in that :>
17:16:49 <mikeyp> A question re: zookeeper - are there any concerns about adding a dependency on ZooKeeper?
17:16:56 <sandywalsh> dragondm, right ... I still think the two are synonymous
17:17:09 <dragondm> ya, pretty much
17:17:19 <sandywalsh> mikeyp, I thought about that ... I sort of view it the same as rabbit, mysql or apache
17:17:31 <sandywalsh> so long as the license works.
17:17:40 <maoy> have you guys thought about the retry logic?
17:17:50 <sandywalsh> however, there are replacements for apache, rabbit and mysql ... not so with zookeeper
17:18:11 <sandywalsh> maoy, not in depth yet ... until the workflow engine is in place
17:19:04 <sandywalsh> maoy, did you think of it being handled in a different manner than the workflow?
17:19:24 <mikeyp> retry and rollback might become a next-release item - I've been thinking a little about it.
17:19:39 <maoy> it's just my opinion, but workflow is mostly studied in computer programming to capture and develop human-to-machine interaction.
17:20:04 <maoy> there is not much human interaction in nova. everything is a computer program..
17:20:15 <sandywalsh> when I say workflow I mean petri-net
17:20:24 <sandywalsh> (state machine)
17:20:27 <sandywalsh> sorry
17:20:53 <maoy> i like petri-nets at the design phase
17:21:13 <sandywalsh> I'd like to see what the python code is going to look like to model these petri-nets
17:21:25 <dragondm> ya'
17:21:30 <sandywalsh> maoy, do you see something more formal for a later stage?
17:21:32 <mikeyp> maoy, that's true for a lot of cases. There's also a whole world of production scheduling, ETL, and app integration with little or no human interaction.
17:21:33 <maoy> when implemented, it's still going to be python programs with greenthreads and rpc calls, right?
17:21:48 <sandywalsh> maoy, yes
17:22:09 <mikeyp> I was planning on trying to implement a couple of workflows in pyutilib.workflow, to see what they look like.
17:22:24 <sandywalsh> mikeyp, that would be a big help
17:22:51 <garyk_> are the calls blocking? that is, can a number of events take place at once?
17:22:55 <maoy> sandy, nothing more formal..
17:22:56 <sandywalsh> #task examples of what the petri-net models would look like in python
17:23:12 <maoy> i'm trying to figure out the exact benefit after we have the petri-net
17:23:26 <sandywalsh> garyk_, good question.
17:23:52 <sandywalsh> maoy, of petri-net over single-state state machine?
17:24:23 <maoy> no.
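[Editor's note: to give a concrete feel for sandywalsh's question about what petri-net modeling code might look like in python, here is a minimal transition-firing sketch. Every class, place, and transition name is illustrative, not nova code or any proposed design.]

```python
# Minimal petri-net sketch: places hold token counts, and a transition
# fires only when every one of its input places holds a token.

class PetriNet:
    def __init__(self):
        self.tokens = {}          # place name -> token count
        self.transitions = {}     # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.tokens.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError("transition %s not enabled" % name)
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.tokens[p] -= 1
        for p in outputs:
            self.tokens[p] = self.tokens.get(p, 0) + 1

# A toy "provision an instance" flow: boot cannot fire until both the
# image and the network branches have produced their tokens - the kind
# of concurrency a petri-net captures that a single-state machine can't.
net = PetriNet()
net.tokens = {"request": 1}
net.add_transition("load_image", ["request"], ["image_ready", "net_pending"])
net.add_transition("setup_net", ["net_pending"], ["net_ready"])
net.add_transition("boot", ["image_ready", "net_ready"], ["running"])

net.fire("load_image")
net.fire("setup_net")
net.fire("boot")
print(net.tokens.get("running"))  # -> 1
```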
17:24:36 <mikeyp> I think the benefit of a workflow / petri-net is that there can be many pre-defined workflows, so the service could expand to uses we haven't yet considered.
17:24:37 <sandywalsh> garyk_, there will likely be some blocking in the orchestration layer, but it should be on a per-job basis ... not per-service
17:24:45 <maoy> petri-nets can model concurrent stuff. that i buy.
17:24:52 <maoy> but i'm wondering, after we have the models, how to take advantage of them
17:25:01 <garyk_> ok
17:25:29 <sandywalsh> maoy, well, as you mentioned before, I think something like defining a single "retry" operation would be useful
17:25:40 <sandywalsh> and reusing that model in various places
17:25:52 <sandywalsh> (for example)
17:26:23 <mikeyp> workflows can also be nested - a pretty powerful way of combining primitives for reuse.
17:26:29 <sandywalsh> yes
17:26:57 <maoy> how does that compare to a decorator @retry(max=3) that automatically catches exceptions and retries?
17:27:06 <sandywalsh> Perhaps even the whole "provision an instance" process would be comprised of sub-workflows (such as load-image, move-image, change-networking, etc.)
17:27:16 <garyk_> silly question - in the event that the host running the orchestration reboots, is there a way in which the orchestration can be resumed from the same point?
17:27:18 <sandywalsh> maoy, what happens if that service dies?
17:27:28 <sandywalsh> maoy, the decorator has no persistence
17:27:55 <sandywalsh> but I see your point ... there may be places where code-level retries are better than workflow-modeled retries
17:28:02 <sandywalsh> hey andrew!
17:28:06 <beekhof> hey!
17:28:14 <beekhof> jetlagged, so i happened to be awake :)
17:28:24 <sandywalsh> beekhof, you may want to read the scrollback
17:28:51 <beekhof> could someone paste it somewhere? i only just got my internet connection back
17:28:57 <maoy> I'm thinking that if a compute node dies, then the scheduler should receive a TimeoutException if engineered correctly, and retry there.
17:29:03 <sandywalsh> beekhof, there'll be a log when we stop the meeting
17:29:09 <beekhof> k
17:29:13 <maoy> hi beekhof!
17:29:15 <beekhof> i read last week's too
17:29:23 <beekhof> hi maoy :)
17:29:31 <mikeyp> garyk, I think that's one reason to consider ZooKeeper - a way of storing state reliably.
17:29:45 * heckj wonders which meeting he walked into
17:29:53 <sandywalsh> heckj, orchestration
17:30:10 <dragondm> garyk_: yah, that was the reason I thought of the workorder idea. That way the orchestration service is basically stateless. Doesn't matter if one falls over.
17:30:12 <sandywalsh> #topic Orchestration - pacemaker & zookeeper
17:30:16 <heckj> sandywalsh: cool, thank you
17:30:53 <garyk_> ok - sounds good. does zookeeper keep some kind of configuration id to track the states?
17:30:54 <sandywalsh> dragondm, does your proposal use zookeeper?
17:31:09 <dragondm> I didn't specify storage.
17:31:16 <sandywalsh> k
17:31:31 <sandywalsh> well, I think we have a good list of to-dos for this week
17:31:36 <dragondm> The design I had could be persisted with a db
17:31:39 <beekhof> for those that came in late, what type of state are we specifically talking about?
17:31:43 <maoy> dragondm, i'll read your link after the meeting
17:31:59 <sandywalsh> beekhof, state machine
17:32:15 <heckj> if you have many "clients" wanting to all agree on state, zookeeper is an excellent way of doing it.
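[Editor's note: a minimal sketch of the code-level retry maoy describes, assuming a decorator shaped like his `@retry(max=3)` example. It also shows sandywalsh's objection: the attempt counter lives only in process memory, so it is lost if the service dies, unlike a workflow-modeled retry whose state is persisted. `flaky_provision` and the parameter names are hypothetical.]

```python
# Sketch of an in-code retry decorator: catch exceptions and re-invoke
# the wrapped call up to max_tries times, re-raising the last failure.
import functools

def retry(max_tries=3, exceptions=(Exception,)):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last = None
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    last = exc          # state kept only in memory -
            raise last                  # a crash here forgets everything
        return wrapper
    return decorator

calls = {"n": 0}

@retry(max_tries=3)
def flaky_provision():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("compute node timed out")
    return "provisioned"

print(flaky_provision())  # succeeds on the third attempt
```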
17:32:25 <maoy> zookeeper is used for 3 reasons: persistent storage, lock management, and leader election
17:32:27 <sandywalsh> beekhof, and that could contain things like "VM state"
17:32:50 <heckj> a bit of extra complexity, but it does a lot of the hard work of distributed locks to enable that sort of thing
17:32:50 <maoy> it could be used for the queue as well, but since we're using rabbit, no need for zk at the moment
17:32:56 <sandywalsh> beekhof, or rollback status, etc.
17:33:27 <garyk_> will it require support for authentication?
17:33:35 <sandywalsh> yes, the trickiest thing about zk is what its core competency is
17:33:41 <beekhof> so would this be analogous to writing "guest X is starting" to a db?
17:34:00 <sandywalsh> beekhof, I think so, yes
17:34:26 <maoy> beekhof, yes for storage purposes.
17:34:27 <sandywalsh> beekhof, the concern I brought up on the ML was zk vs. row-level locking
17:34:54 <sandywalsh> and it sort of sounds like zk is an abstraction over those differences
17:35:06 <sandywalsh> likely doing its own row-level locking under the hood
17:35:48 <beekhof> so a random scheduler would grab guestX off the queue, say "i got this", and then go about the steps involved in starting it up, updating the state as it went?
17:35:48 <sandywalsh> ok ... I'd like to push the topic ahead for now
17:36:02 <sandywalsh> sorry, go ahead beekhof ...
17:36:02 <maoy> zookeeper uses a quorum protocol to reach consensus.
17:36:06 <maoy> :)
17:36:45 <sandywalsh> beekhof, it would do one step in the process; when the event came in that the step finished, another worker could handle the next step.
17:36:58 <beekhof> ok, i can see the advantage there
17:37:32 <sandywalsh> beekhof, I think that's the fundamental difference between PM and ZK ... master/slave vs. workers
17:37:39 <beekhof> yep
17:37:46 <sandywalsh> (well, and your resource manager)
17:37:46 <maoy> it can also grab a lock on the instance so that no one else is touching the VM
17:38:02 <sandywalsh> (which zk doesn't do)
17:38:24 <sandywalsh> let's continue this one on the ML
17:38:28 <beekhof> sure
17:38:34 <sandywalsh> #topic Orchestration - Meeting time
17:38:48 <sandywalsh> what UTC offset are most of you in?
17:38:57 <sandywalsh> -4
17:38:58 <beekhof> ah, this one's my fault :)
17:39:05 <maoy> -5
17:39:20 <sandywalsh> well, also, is Tuesday best?
17:39:37 <beekhof> right now, I'm +10
17:39:40 <beekhof> http://www.worldtimebuddy.com/
17:39:53 <beekhof> really handy for this sort of thing
17:39:54 <mikeyp> UTC -8 / Pacific
17:40:10 <sandywalsh> beekhof, heh, both sides
17:40:23 <sandywalsh> I mean beekhof & mikeyp
17:40:26 <heckj> same as mikeyp
17:40:42 <garyk_> i am sorry, i need to go and feed the animal in my zoo. thanks for the great ideas.
17:40:53 <mikeyp> round the clock, follow-the-sun development :-)
17:40:56 <maoy> bye garyk
17:40:58 <sandywalsh> garyk_, thanks for the input
17:41:21 <sandywalsh> k, so unless there are any objections ... keep the meeting time the same?
17:41:33 <mikeyp> works for me
17:41:34 <beekhof> actually
17:42:19 <beekhof> did we get any europeans?
17:42:35 <sandywalsh> not active ... perhaps lurkers
17:43:14 <beekhof> what about 2:15 from now?
17:43:19 <beekhof> is that too late for anyone?
17:43:26 <beekhof> 2:15:00
17:43:35 <sandywalsh> on tues it will be a conflict with other openstack teams
17:43:44 <sandywalsh> we'd have to move days
17:43:53 <beekhof> because that's 7am, which is easily doable
17:44:00 <beekhof> 7am here, i mean
17:44:29 * sandywalsh tries to figure out what that would be for him
17:44:30 <beekhof> 4am is harder and i'm less coherent
17:44:34 <sandywalsh> :)
17:44:54 <beekhof> for PDT it should be about lunch time
17:44:57 <maoy> 7pm Eastern?
17:45:43 <beekhof> is boston eastern?
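[Editor's note: maoy's earlier sizing rule - run 2f+1 ZooKeeper nodes to tolerate f node failures - follows from the quorum protocol he mentions here: writes need a strict majority (f+1) of the ensemble to survive. A short worked sketch, with illustrative helper names:]

```python
# ZooKeeper ensemble sizing arithmetic: 2f+1 nodes tolerate f failures.

def ensemble_size(f):
    """Nodes needed to tolerate f failures."""
    return 2 * f + 1

def quorum(n):
    """Strict majority of an n-node ensemble."""
    return n // 2 + 1

def tolerated_failures(n):
    """Failures an n-node ensemble survives while keeping a quorum."""
    return n - quorum(n)

for f in (1, 2):
    n = ensemble_size(f)
    print(n, quorum(n), tolerated_failures(n))
# 3-node ensemble: quorum of 2, tolerates 1 failure
# 5-node ensemble: quorum of 3, tolerates 2 failures
```

Note the arithmetic also shows why even-sized ensembles don't help: a 4th node raises the quorum to 3 but still tolerates only 1 failure, same as 3 nodes.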
17:45:47 <sandywalsh> yup
17:45:50 <maoy> yes
17:46:17 <beekhof> that website is claiming my 7am is your 3pm
17:46:56 <sandywalsh> this room is booked until 2300 UTC on tues
17:47:40 <beekhof> different day? i'd really like to join on a regular basis
17:47:57 <sandywalsh> See an opening? http://wiki.openstack.org/Meetings
17:48:28 <sandywalsh> it would have to be Thurs for me
17:48:55 <beekhof> thurs is fine by me
17:49:20 <beekhof> and there appears to be only one other meeting on that day
17:49:23 <mikeyp> thurs works for me.
17:49:25 <dragondm> thursday, when?
17:49:33 <maoy> Thursday 3pm EST, 20:00 UTC?
17:49:58 <beekhof> that would be ideal for me
17:49:59 <maoy> 20:00 UTC
17:50:02 <sandywalsh> done
17:50:09 <beekhof> sweet :)
17:50:11 <maoy> cool
17:50:20 <sandywalsh> #action meeting moved to Thursdays 3pm EST, 2000 UTC
17:50:31 <maoy> beekhof, what's the best intro reading for pacemaker?
17:50:31 <sandywalsh> thanks guys ... keep active on the ML!
17:50:52 <heckj> there's a detailed PDF called "Pacemaker Explained" which does a good job.
17:50:57 <heckj> warning: very complex critter...
17:51:08 <beekhof> maoy: that one and "Clusters from Scratch"
17:51:20 <beekhof> heckj: yeah, pretty dry
17:51:37 <heckj> beekhof: yeah, but the best detail short of "playing with it" incessantly
17:51:38 <beekhof> maoy: http://www.clusterlabs.org/doc <-- look for the 1.1 version
17:52:21 <maoy> ok
17:52:26 <sandywalsh> anything quick before we end?
17:52:31 <beekhof> nod. its job is to detail all the options and possibilities, but doesn't give the first clue how to put it together sanely :)
17:52:43 <beekhof> i'll read the notes
17:52:43 <sandywalsh> #endmeeting