17:00:35 <sandywalsh> #startmeeting
17:00:36 <openstack> Meeting started Tue Nov 8 17:00:35 2011 UTC. The chair is sandywalsh. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:00:51 <sandywalsh> Who is here for the orchestration meeting?
17:01:02 <sandywalsh> o/
17:01:05 <maoy> me
17:01:19 <maoy> hi sandy
17:01:27 <sandywalsh> hey! ... may be a short meeting
17:01:52 <sandywalsh> #topic pacemaker and zookeeper
17:02:14 <maoy> is Andrew Beekhof here?
17:02:22 <sandywalsh> I don't believe so
17:02:49 <maoy> I'm not very familiar with pacemaker.
17:02:50 <sandywalsh> he's in charge of Pacemaker, which is a core part of Red Hat's clustering strategy
17:03:05 <sandywalsh> nor am I; it seems very capable
17:03:24 <sandywalsh> the biggest issue I see is the conflict with the nova architecture
17:03:27 <mikeyp> I haven't had a chance to review pacemaker at all; got through maoy's ppt, but not the full Tropic paper
17:03:32 <sandywalsh> (workers vs. master-slave)
17:04:01 <sandywalsh> mikeyp, the Tropic paper is in the log from last meeting
17:04:22 <mikeyp> just need to read it :-)
17:04:27 <sandywalsh> maoy, thanks again for the discussion on the row-locking issues
17:04:57 <sandywalsh> for me the next step is to mess with zookeeper (and the python bindings) to see what we can make it do
17:04:59 <maoy> the Tropic paper requires a fairly big change to the nova architecture, so I made some changes in the ppt to simplify things
17:05:15 <maoy> you are welcome
17:05:33 <sandywalsh> I think we're in general agreement on the approach. I think your strategy fits in well with my proposal
17:05:43 <maoy> cool.
17:05:45 <sandywalsh> mikeyp, the workflow summary was great
17:06:15 <sandywalsh> I hadn't heard of pyutilib.workflow ... what's the recent status of it?
17:06:22 <sandywalsh> is it actively maintained?
17:06:38 <maoy> the zk python binding works fine for me, although I never tried it with eventlet
17:06:41 <sandywalsh> (spiff workflow isn't actively maintained, and the author suggested we fork)
17:06:50 <mikeyp> It looks like it's actively maintained - last checkin was a couple of weeks ago.
17:06:55 <sandywalsh> nice
17:07:24 <sandywalsh> does it make any assumptions about the persistence layer or require a web interface, etc.? Or is it just an engine?
17:07:58 <dragondm> Yah, I do wonder abt the ZK interface + eventlet
17:08:07 <mikeyp> I don't know yet - I'm going to kick the tires today
17:08:10 <dragondm> It uses threading, + a C module
17:08:24 <mikeyp> It does seem to be primarily an engine, though
17:08:25 <sandywalsh> dragondm, good point
17:08:47 <sandywalsh> mikeyp, can we put you down to give us a report on it?
17:09:03 <mikeyp> sure, no problem.
17:09:17 <sandywalsh> #action mikeyp to give us a report on pyutilib.workflow (dependencies ideally)
17:09:33 <maoy> there is a non-threading version, at least for the C API. Not sure if there is a python binding as well.
17:10:05 <sandywalsh> maoy, which did you use previously?
17:10:11 <sandywalsh> dragondm, did you look at zookeeper before?
17:10:25 <maoy> I used the multithreaded python binding
17:10:39 <dragondm> I've looked at it briefly; I haven't played w/ it much
17:10:47 <sandywalsh> maoy, but you weren't doing your project against nova, correct?
17:10:56 <maoy> correct
17:11:06 <maoy> that's for Tropic
17:11:10 <maoy> which doesn't use eventlet
17:11:50 <sandywalsh> andrew mentioned the licensing of the python binding for pacemaker wouldn't be an issue; I do have a question for him on the engine portion.
17:12:02 <sandywalsh> or if he makes his money from professional services around the product
17:12:10 <maoy> i c.
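[Editor's note: dragondm's concern above is that the multithreaded ZK binding makes blocking calls inside a C extension, which would stall every green thread sharing eventlet's single OS thread. The usual escape hatch is to offload the blocking call to a real thread pool (eventlet ships `eventlet.tpool.execute` for exactly this). The sketch below illustrates the offload pattern with the stdlib so it is self-contained; `blocking_zk_get` is a hypothetical stand-in for a blocking zkpython call, not a real API.]

```python
# Illustrative sketch of offloading a blocking C-extension call so an
# event loop stays responsive. All names here are assumptions for the
# example; in nova the pool would be eventlet.tpool, not ThreadPoolExecutor.
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_zk_get(path):
    # Pretend this is a zookeeper get() that blocks inside a C module
    # and would otherwise freeze the whole green-thread hub.
    time.sleep(0.05)
    return ("state-data-for %s" % path, {"version": 1})

pool = ThreadPoolExecutor(max_workers=4)

def zk_get(path):
    # The calling thread only waits on the future; the blocking work
    # runs in a worker thread, so other tasks could keep switching.
    return pool.submit(blocking_zk_get, path).result()

data, meta = zk_get("/nova/instances/42")
print(data)
```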
17:12:16 <dragondm> the main concern would be something blocking in a C module that would prevent eventlet from task-switching
17:12:23 <sandywalsh> correct
17:13:03 <sandywalsh> so, as an action item, maoy, can we put you down to investigate zookeeper/eventlet integration?
17:13:04 <dragondm> BTW: sandywalsh: my workorder concept was posted here: https://lists.launchpad.net/openstack/msg03767.html
17:13:08 <maoy> i'll take a look at it
17:13:37 <sandywalsh> #link https://lists.launchpad.net/openstack/msg03767.html dragondm's proposal
17:13:44 <sandywalsh> thanks
17:14:01 <dragondm> I can expand that out if needed.
17:14:19 <sandywalsh> #action maoy to investigate zookeeper/eventlet integration. Is the threading model with the C library going to be an issue?
17:14:35 <maoy> so the concept of tasks and the analogy to processes in an OS makes sense?
17:14:47 <garyk_> Is it possible to run redundant Zookeepers?
17:14:54 <sandywalsh> dragondm, we'll give it a re-read and give you some feedback
17:15:39 <maoy> for zookeeper, you can run 2f+1 nodes to tolerate f node failures
17:15:40 <sandywalsh> #action give dragondm feedback on his proposal
17:15:47 <mikeyp> The workorder proposal seems really compatible with orchestration.
17:16:09 <dragondm> sandywalsh: and s/scheduler/orchestrator/ in that :>
17:16:49 <mikeyp> A question re: zookeeper - are there any concerns about adding a dependency on ZooKeeper?
17:16:56 <sandywalsh> dragondm, right ... I still think the two are synonymous
17:17:09 <dragondm> ya, pretty much
17:17:19 <sandywalsh> mikeyp, I thought about that ... I sort of view it the same as rabbit, mysql or apache
17:17:31 <sandywalsh> so long as the license works.
17:17:40 <maoy> have you guys thought about the retry logic?
17:17:50 <sandywalsh> however, there are replacements for apache, rabbit and mysql ... not so with zookeeper
17:18:11 <sandywalsh> maoy, not in depth yet ... until the workflow engine is in place
17:19:04 <sandywalsh> maoy, did you think of it being handled in a different manner than the workflow?
17:19:24 <mikeyp> retry and rollback might become a next-release item - I've been thinking a little about it.
17:19:39 <maoy> it's just my opinion, but workflow is mostly studied in computer programming to capture and develop human-to-machine interaction.
17:20:04 <maoy> there is not much human interaction in nova. everything is a computer program..
17:20:15 <sandywalsh> when I say workflow I mean petri-net
17:20:24 <sandywalsh> (state machine)
17:20:27 <sandywalsh> sorry
17:20:53 <maoy> i like petri-nets at the design phase
17:21:13 <sandywalsh> I'd like to see what the python code is going to look like to model these petri-nets
17:21:25 <dragondm> ya'
17:21:30 <sandywalsh> maoy, do you see something more formal for a later stage?
17:21:32 <mikeyp> maoy, that's true for a lot of cases. There's also a whole world of production scheduling, ETL, and app integration with little or no human interaction.
17:21:33 <maoy> when implemented, it's still going to be python programs with greenthreads and rpc calls, right?
17:21:48 <sandywalsh> maoy, yes
17:22:09 <mikeyp> I was planning on trying to implement a couple of workflows in pyutilib.workflow, to see what they look like.
17:22:24 <sandywalsh> mikeyp, that would be a big help
17:22:51 <garyk_> are the calls blocking? that is, can a number of events take place at once?
17:22:55 <maoy> sandy, nothing more formal..
17:22:56 <sandywalsh> #task examples of what the petri-net models would look like in python
17:23:12 <maoy> i'm trying to figure out the exact benefit after we have the petri-net
17:23:26 <sandywalsh> garyk_, good question.
17:23:52 <sandywalsh> maoy, of petri-net over single-state state machine?
17:24:23 <maoy> no.
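[Editor's note: to give a concrete feel for sandywalsh's question about what petri-net modeling code might look like in python, here is a minimal transition-firing sketch. Every class, place, and transition name is illustrative, not nova code or any proposed design.]

```python
# Minimal petri-net sketch: places hold token counts, and a transition
# fires only when every one of its input places holds a token.

class PetriNet:
    def __init__(self):
        self.tokens = {}          # place name -> token count
        self.transitions = {}     # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.tokens.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError("transition %s not enabled" % name)
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.tokens[p] -= 1
        for p in outputs:
            self.tokens[p] = self.tokens.get(p, 0) + 1

# A toy "provision an instance" flow: boot cannot fire until both the
# image and the network branches have produced their tokens - the kind
# of concurrency a petri-net captures that a single-state machine can't.
net = PetriNet()
net.tokens = {"request": 1}
net.add_transition("load_image", ["request"], ["image_ready", "net_pending"])
net.add_transition("setup_net", ["net_pending"], ["net_ready"])
net.add_transition("boot", ["image_ready", "net_ready"], ["running"])

net.fire("load_image")
net.fire("setup_net")
net.fire("boot")
print(net.tokens.get("running"))  # -> 1
```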
17:24:36 <mikeyp> I think the benefit of a workflow / petri-net is that there can be many pre-defined workflows, so the service could expand to uses we haven't yet considered.
17:24:37 <sandywalsh> garyk_, there will likely be some blocking in the orchestration layer, but it should be on a per-job basis ... not per-service
17:24:45 <maoy> petri-nets can model concurrent stuff. that i buy.
17:24:52 <maoy> but i'm wondering, after we have the models, how to take advantage of them
17:25:01 <garyk_> ok
17:25:29 <sandywalsh> maoy, well, as you mentioned before, I think something like defining a single "retry" operation would be useful
17:25:40 <sandywalsh> and reusing that model in various places
17:25:52 <sandywalsh> (for example)
17:26:23 <mikeyp> workflows can also be nested - a pretty powerful way of combining primitives for reuse.
17:26:29 <sandywalsh> yes
17:26:57 <maoy> how does that compare to a decorator @retry(max=3) that automatically catches exceptions and retries?
17:27:06 <sandywalsh> Perhaps even the whole "provision an instance" process would be comprised of sub-workflows (such as load-image, move-image, change-networking, etc.)
17:27:16 <garyk_> silly question - in the event that the host running the orchestration reboots, is there a way in which the orchestration can be resumed from the same point?
17:27:18 <sandywalsh> maoy, what happens if that service dies?
17:27:28 <sandywalsh> maoy, the decorator has no persistence
17:27:55 <sandywalsh> but I see your point ... there may be places where code-level retries are better than workflow-modeled retries
17:28:02 <sandywalsh> hey andrew!
17:28:06 <beekhof> hey!
17:28:14 <beekhof> jetlagged, so i happened to be awake :)
17:28:24 <sandywalsh> beekhof, you may want to read the scrollback
17:28:51 <beekhof> could someone paste it somewhere? i only just got my internet connection back
17:28:57 <maoy> I'm thinking that if a compute node dies, then the scheduler should receive a TimeoutException if engineered correctly, and retry there.
17:29:03 <sandywalsh> beekhof, there'll be a log when we stop the meeting
17:29:09 <beekhof> k
17:29:13 <maoy> hi beekhof!
17:29:15 <beekhof> i read last week's too
17:29:23 <beekhof> hi maoy :)
17:29:31 <mikeyp> garyk, I think that's one reason to consider ZooKeeper - a way of storing state reliably.
17:29:45 * heckj wonders which meeting he walked into
17:29:53 <sandywalsh> heckj, orchestration
17:30:10 <dragondm> garyk_: yah, that was the reason I thought of the workorder idea. That way the orchestration service is basically stateless. Doesn't matter if one falls over.
17:30:12 <sandywalsh> #topic Orchestration - pacemaker & zookeeper
17:30:16 <heckj> sandywalsh: cool, thank you
17:30:53 <garyk_> ok - sounds good. does zookeeper keep some kind of configuration id to track the states?
17:30:54 <sandywalsh> dragondm, does your proposal use zookeeper?
17:31:09 <dragondm> I didn't specify storage.
17:31:16 <sandywalsh> k
17:31:31 <sandywalsh> well, I think we have a good list of to-dos for this week
17:31:36 <dragondm> The design I had could be persisted with a db
17:31:39 <beekhof> for those that came in late, what type of state are we specifically talking about?
17:31:43 <maoy> dragondm, i'll read your link after the meeting
17:31:59 <sandywalsh> beekhof, state machine
17:32:15 <heckj> if you have many "clients" wanting to all agree on state, zookeeper is an excellent way of doing it.
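[Editor's note: a minimal sketch of the code-level retry maoy describes, assuming a decorator shaped like his `@retry(max=3)` example. It also shows sandywalsh's objection: the attempt counter lives only in process memory, so it is lost if the service dies, unlike a workflow-modeled retry whose state is persisted. `flaky_provision` and the parameter names are hypothetical.]

```python
# Sketch of an in-code retry decorator: catch exceptions and re-invoke
# the wrapped call up to max_tries times, re-raising the last failure.
import functools

def retry(max_tries=3, exceptions=(Exception,)):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last = None
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    last = exc          # state kept only in memory -
            raise last                  # a crash here forgets everything
        return wrapper
    return decorator

calls = {"n": 0}

@retry(max_tries=3)
def flaky_provision():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("compute node timed out")
    return "provisioned"

print(flaky_provision())  # succeeds on the third attempt
```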
17:32:25 <maoy> zookeeper is used for 3 reasons: persistent storage, lock management, and leader election
17:32:27 <sandywalsh> beekhof, and that could contain things like "VM state"
17:32:50 <heckj> a bit of extra complexity, but it does a lot of the hard work of distributed locks to enable that sort of thing
17:32:50 <maoy> it could be used for the queue as well, but since we're using rabbit, no need for zk at the moment
17:32:56 <sandywalsh> beekhof, or rollback status, etc.
17:33:27 <garyk_> will it require support for authentication?
17:33:35 <sandywalsh> yes, the trickiest thing about zk is what its core competency is
17:33:41 <beekhof> so would this be analogous to writing "guest X is starting" to a db?
17:34:00 <sandywalsh> beekhof, I think so, yes
17:34:26 <maoy> beekhof, yes for storage purposes.
17:34:27 <sandywalsh> beekhof, the concern I brought up on the ML was zk vs. row-level locking
17:34:54 <sandywalsh> and it sort of sounds like zk is an abstraction over those differences
17:35:06 <sandywalsh> likely doing its own row-level locking under the hood
17:35:48 <beekhof> so a random scheduler would grab guestX off the queue, say "i got this", and then go about the steps involved in starting it up, updating the state as it went?
17:35:48 <sandywalsh> ok ... I'd like to push the topic ahead for now
17:36:02 <sandywalsh> sorry, go ahead beekhof ...
17:36:02 <maoy> zookeeper uses a quorum protocol to reach consensus.
17:36:06 <maoy> :)
17:36:45 <sandywalsh> beekhof, it would do one step in the process; when the event came in that the step finished, another worker could handle the next step.
17:36:58 <beekhof> ok, i can see the advantage there
17:37:32 <sandywalsh> beekhof, I think that's the fundamental difference between PM and ZK ... master/slave vs. workers
17:37:39 <beekhof> yep
17:37:46 <sandywalsh> (well, and your resource manager)
17:37:46 <maoy> it can also grab a lock on the instance so that no one else is touching the VM
17:38:02 <sandywalsh> (which zk doesn't do)
17:38:24 <sandywalsh> let's continue this one on the ML
17:38:28 <beekhof> sure
17:38:34 <sandywalsh> #topic Orchestration - Meeting time
17:38:48 <sandywalsh> what UTC offset are most of you in?
17:38:57 <sandywalsh> -4
17:38:58 <beekhof> ah, this one's my fault :)
17:39:05 <maoy> -5
17:39:20 <sandywalsh> well, also, is Tuesday best?
17:39:37 <beekhof> right now, I'm +10
17:39:40 <beekhof> http://www.worldtimebuddy.com/
17:39:53 <beekhof> really handy for this sort of thing
17:39:54 <mikeyp> UTC -8 / Pacific
17:40:10 <sandywalsh> beekhof, heh, both sides
17:40:23 <sandywalsh> I mean beekhof & mikeyp
17:40:26 <heckj> same as mikeyp
17:40:42 <garyk_> i am sorry, i need to go and feed the animal in my zoo. thanks for the great ideas.
17:40:53 <mikeyp> round the clock, follow-the-sun development :-)
17:40:56 <maoy> bye garyk
17:40:58 <sandywalsh> garyk_, thanks for the input
17:41:21 <sandywalsh> k, so unless there are any objections ... keep the meeting time the same?
17:41:33 <mikeyp> works for me
17:41:34 <beekhof> actually
17:42:19 <beekhof> did we get any europeans?
17:42:35 <sandywalsh> not active ... perhaps lurkers
17:43:14 <beekhof> what about 2:15 from now?
17:43:19 <beekhof> is that too late for anyone?
17:43:26 <beekhof> 2:15:00
17:43:35 <sandywalsh> on tues it will be a conflict with other openstack teams
17:43:44 <sandywalsh> we'd have to move days
17:43:53 <beekhof> because that's 7am, which is easily doable
17:44:00 <beekhof> 7am here, i mean
17:44:29 * sandywalsh tries to figure out what that would be for him
17:44:30 <beekhof> 4am is harder and i'm less coherent
17:44:34 <sandywalsh> :)
17:44:54 <beekhof> for PDT it should be about lunch time
17:44:57 <maoy> 7pm Eastern?
17:45:43 <beekhof> is boston eastern?
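[Editor's note: maoy's earlier sizing rule - run 2f+1 ZooKeeper nodes to tolerate f node failures - follows from the quorum protocol he mentions here: writes need a strict majority (f+1) of the ensemble to survive. A short worked sketch, with illustrative helper names:]

```python
# ZooKeeper ensemble sizing arithmetic: 2f+1 nodes tolerate f failures.

def ensemble_size(f):
    """Nodes needed to tolerate f failures."""
    return 2 * f + 1

def quorum(n):
    """Strict majority of an n-node ensemble."""
    return n // 2 + 1

def tolerated_failures(n):
    """Failures an n-node ensemble survives while keeping a quorum."""
    return n - quorum(n)

for f in (1, 2):
    n = ensemble_size(f)
    print(n, quorum(n), tolerated_failures(n))
# 3-node ensemble: quorum of 2, tolerates 1 failure
# 5-node ensemble: quorum of 3, tolerates 2 failures
```

Note the arithmetic also shows why even-sized ensembles don't help: a 4th node raises the quorum to 3 but still tolerates only 1 failure, same as 3 nodes.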
17:45:47 <sandywalsh> yup
17:45:50 <maoy> yes
17:46:17 <beekhof> that website is claiming my 7am is your 3pm
17:46:56 <sandywalsh> this room is booked until 2300 UTC on tues
17:47:40 <beekhof> different day? i'd really like to join on a regular basis
17:47:57 <sandywalsh> See an opening? http://wiki.openstack.org/Meetings
17:48:28 <sandywalsh> it would have to be Thurs for me
17:48:55 <beekhof> thurs is fine by me
17:49:20 <beekhof> and there appears to be only one other meeting on that day
17:49:23 <mikeyp> thurs works for me.
17:49:25 <dragondm> thursday, when?
17:49:33 <maoy> Thursday 3pm EST, 20:00 UTC?
17:49:58 <beekhof> that would be ideal for me
17:49:59 <maoy> 20:00 UTC
17:50:02 <sandywalsh> done
17:50:09 <beekhof> sweet :)
17:50:11 <maoy> cool
17:50:20 <sandywalsh> #action meeting moved to Thursdays 3pm EST, 2000 UTC
17:50:31 <maoy> beekhof, what's the best intro reading for pacemaker?
17:50:31 <sandywalsh> thanks guys ... keep active on the ML!
17:50:52 <heckj> there's a detailed PDF called "Pacemaker Explained" which does a good job.
17:50:57 <heckj> warning: very complex critter...
17:51:08 <beekhof> maoy: that one and "Clusters from Scratch"
17:51:20 <beekhof> heckj: yeah, pretty dry
17:51:37 <heckj> beekhof: yeah, but the best detail short of "playing with it" incessantly
17:51:38 <beekhof> maoy: http://www.clusterlabs.org/doc <-- look for the 1.1 version
17:52:21 <maoy> ok
17:52:26 <sandywalsh> anything quick before we end?
17:52:31 <beekhof> nod. its job is to detail all the options and possibilities, but doesn't give the first clue how to put it together sanely :)
17:52:43 <beekhof> i'll read the notes
17:52:43 <sandywalsh> #endmeeting