20:00:10 #startmeeting state-management
20:00:10 Meeting started Thu May 30 20:00:10 2013 UTC. The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:14 The meeting name has been set to 'state_management'
20:00:29 Hi all, roll call if anyone is interested :)
20:00:50 harlowja, hello there
20:00:52 howdy!
20:01:17 Ho
20:02:03 * harlowja gives others a few minutes
20:02:14 #link https://wiki.openstack.org/wiki/Meetings/StateManagement#Agenda_for_next_meeting
20:02:45 hi
20:03:05 hey
20:03:31 so it seems like we have enough people
20:03:48 can start off with a little status of the library (from what i know)
20:03:56 #topic status
20:04:12 anyone that has done anything with the library, feel free to let others know :)
20:04:43 Looks like you guys have made some good progress, I am still trying to catch up with you folks
20:04:49 i've been adding more decorators in, helping rackspace folks get their changes in for the db 'backend'
20:04:53 changbl thx!
20:04:58 harlowja, the code is mainly in TaskFlow right?
20:05:04 not NovaOrc?
20:05:10 correct
20:05:28 #link https://github.com/Yahoo/TaskFlow
20:05:34 ok, I will check taskflow
20:05:35 that's the current location until the stackforge move finishes
20:05:56 so we are moving taskflow to stackforge?
20:06:08 ya, so that it's not just a yahoo (aka me) thing :)
20:06:37 and we can use the review system that everyone involved in openstack knows
20:06:39 and all that
20:06:40 hello.
20:06:43 sure
20:06:44 hi kebray
20:07:06 kebray and updates from your side, just going through a little status of 'the world'
20:07:15 ^any
20:07:16 harlowja: Error: "any" is not a valid command.
20:07:28 harlowja, do I need to check anything on NovaOrc?
20:07:59 changbl if u just want to see how some of the nttdata folks and i started shifting code around in nova, then it's useful
20:07:59 No updates from me other than I know Jessica was reworking her code…
20:08:08 kebray thx
20:08:19 harlowja, ok
20:08:26 she has an idea for distributed task flow management, but the first path she went down didn't pan out.. but, she's reworking it.
20:08:32 all good :)
20:08:43 She's out of office today.
20:09:06 np
20:09:14 kebray, is Jessica working on the ZK backend now?
20:09:39 changbl, not at the moment… last I heard from her I think she said someone else is working on that. maybe you?
20:10:09 She is working on distributed state management, using celery I believe.
20:10:21 #topic who's-working-on-what
20:10:23 as opposed to linear/sequential state management.
20:10:26 kebray, I had a very busy week... but I plan to work on the ZK backend
20:10:34 changbl i think that's still an open item
20:10:45 changbl: excellent. We have an interest, so happy to hear that.
20:11:07 kebray, sounds good
20:11:10 btw i'm working on plugging cinder in (and just the general library goodness)
20:11:37 i think kevin and jessica (from rackspace) are doing the distributed states (with celery) + db backend
20:11:37 harlowja: where is the code living now? Is it on stackforge?
20:11:54 harlowja: correct about kevin and Jessica.
20:12:08 kebray so it's almost to stackforge, i think jessica has to do one more review commit :)
20:12:18 #link https://review.openstack.org/#/c/30789/
20:12:26 someone complained about whitespace :(
20:12:40 lol
20:12:56 quite some effort to put a project on stackforge
20:12:59 def
20:13:40 howdy -- sorry i'm late
20:13:45 one question, what is the diff between the zk backend and celery?
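[Editorial sketch: to make the task/flow model being discussed concrete, here is a minimal hand-rolled version in plain Python. This is not TaskFlow's actual API (which was still in flux at the time of this meeting); the task classes and the run_flow helper are purely illustrative.]

```python
# Illustrative only: the task/flow idea in miniature, not TaskFlow's real API.

class CreateVolume:
    def execute(self, ctx):
        ctx["volume_id"] = "vol-123"  # pretend we called cinder here
        print("created", ctx["volume_id"])

    def revert(self, ctx):
        print("deleted", ctx.pop("volume_id", None))

class AttachVolume:
    def execute(self, ctx):
        print("attached", ctx["volume_id"])

    def revert(self, ctx):
        print("detached", ctx["volume_id"])

def run_flow(tasks, ctx):
    """Run tasks in order; on failure, revert the completed ones in reverse."""
    done = []
    try:
        for task in tasks:
            task.execute(ctx)
            done.append(task)
    except Exception:
        for task in reversed(done):
            task.revert(ctx)
        raise

run_flow([CreateVolume(), AttachVolume()], {})
```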
20:13:45 i believe another guy who is involved might also be looking into the ZK stuff as well changbl, haven't heard much though, can connect u 2
20:14:02 harlowja, sure
20:14:05 let me know
20:14:16 cool
20:14:20 hi alexheneveld
20:14:28 *we are just going over who is doing what as of right now :)
20:14:41 alexheneveld: hi
20:14:43 which does bring up a good point about how putting stuff on say launchpad might help it become more clear
20:14:52 but maybe after the stackforge move we can do that?
20:15:00 like launchpad/~taskflow or something
20:15:13 harlowja, one question, what is the diff between the ZK backend and celery? I see a celery folder in backends/ in Taskflow
20:15:38 it's a good question, and comes down to what we want the ZK backend to do :)
20:15:57 celery is more of a way to run tasks, ZK wouldn't necessarily be a 'running mechanism'
20:16:14 does that make sense?
20:16:30 #link http://www.celeryproject.org/
20:16:30 I once used ZK to implement a distributed queue
20:16:31 thx harlowja hi adrian_otto - we've been spiking it (samc)
20:16:36 yes, checking their website
20:16:38 celery supports ZK as a transport
20:16:42 not used it tho
20:16:52 TBH i think since we have a DB that is a logical choice
20:17:01 cool as ZK is of course :)
20:17:12 agreed, so ZK can be a 'storage' backend
20:17:25 but it also provides the other more interesting part of 'job transfer'
20:17:32 example
20:17:34 not sure if celery supports all the mutexes/alarms we need but that should be an abstract service
20:17:56 harlowja, what do we use ZK to store?
20:17:58 we'll likely need a combination of both
20:18:21 conductor A gets job to do Y, conductor A claims job Y via ZK, conductor A does 1/2 of the tasks required for job Y, conductor A dies
20:18:42 now with ZK u can transfer job Y from A -> conductor B when u notice it dies
20:19:04 assuming it rolls back and restarts from the beginning?
20:19:21 #topic details
20:19:26 harlowja, you mean leader election via ZK?
20:20:17 adrian_otto well rolling back may be appropriate in some of the workflows, but not all, in some u may just be able to resume where conductor A left off
20:20:46 ok
20:20:51 changbl so if u think of each job as having an owner, it maps to the concept of ZK locks and ZK watches
20:21:09 harlowja, yes, I used that before
20:21:34 when the owner (conductor A) dies, ZK releases the lock, then conductor B gets a notification via its watch on said lock, and then conductor B can attempt to acquire it
20:22:04 so then that brings up the question of what is stored to be able to allow conductor B to resume (or rollback) job Y
20:22:51 harlowja, makes sense, that means ZK stores that info?
20:23:03 i mean from where to resume
20:23:17 right, each task/flow that composes job Y creates some result, that result is stored 'somewhere' and that result set can be referenced later for rollback or resuming
20:23:25 so ZK can be one such place to store, or a DB can
20:23:44 kevin from rackspace is working on the DB place to store that
20:24:01 which started to show up this week (in the db backend)
20:24:32 an example of how said storage might look (when printed); was showing this to john griffith (the cinder PTL)
20:24:35 #link http://paste.openstack.org/show/37929/
20:24:40 *just an example*
20:24:55 o/
20:25:10 said information could be exposed as a 'what did my job do' API
20:25:22 *if desired*
20:25:48 changbl does that all make sense? :)
20:26:06 harlowja, yes. Just wondering, do you have any code on this that I can check out?
20:26:22 for the ZK backend?
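[Editorial sketch: the claim/hand-off pattern harlowja describes above (ephemeral claim + watch), sketched with kazoo, the ZK client that comes up later in the meeting. The znode path, job id, and conductor names are invented for illustration; this is not TaskFlow code.]

```python
from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError

client = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
client.start()

CLAIM = "/taskflow/jobs/job-Y/claim"  # illustrative path

def try_claim(conductor_name):
    """Claim job Y with an ephemeral znode: if this process dies,
    ZK deletes the node and the claim is released automatically."""
    try:
        client.create(CLAIM, conductor_name.encode("utf-8"),
                      ephemeral=True, makepath=True)
        return True
    except NodeExistsError:
        return False  # someone else owns the job

@client.DataWatch(CLAIM)
def on_claim_change(data, stat):
    """Fires when the claim node changes; stat is None once the
    owner's session dies, so a standby conductor can take over."""
    if stat is None and try_claim("conductor-B"):
        print("conductor-B resuming job-Y from stored results")
```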
20:26:57 or the code that produced the above paste?
20:27:22 The code which defines what to store, and how they call APIs
20:27:43 so right now that is
20:27:49 #link https://github.com/yahoo/TaskFlow/blob/master/taskflow/logbook.py
20:27:55 so besides locking, you may also want ZK to keep the workflow state? why both db and ZK?
20:28:40 some folks are opposed to zookeeper… also, it'll be good to have a lightweight db implementation for devstack me thinks.
20:29:14 yes, but if someone wants to work on a ZK 'storage' backend, then that seems fine no?
20:29:22 Yes!
20:29:25 agreed
20:29:50 I'm not opposed to Zookeeper. I'm pro modularity and pluggable backend implementations.
20:29:55 which does bring up the good question of ZK ;)
20:30:01 #topic locks
20:30:05 #link https://wiki.openstack.org/wiki/StructuredWorkflowLocks
20:30:24 so if u guys want to check that out, it's my idea of something that taskflow could provide
20:30:37 although maybe not at stage 1.0
20:31:37 when i was reading the google chubby paper, they say that not even google's own developers do locking right, so that's why they made chubby, so it'd be nice to offer an api that can help get it *somewhat* right
20:31:45 #link http://research.google.com/archive/chubby-osdi06.pdf
20:32:33 i just think that resource level locking will come up pretty quick, especially after talking to devananda about ironic
20:32:39 #link https://wiki.openstack.org/wiki/StructuredWorkflowLocks#Ironic_.28WIP.29
20:33:11 o/
20:33:25 ha, just talking about locking :-P
20:33:43 i think it's something we should try to offer, but offer it 'very carefully'
20:33:46 thoughts?
20:33:47 #link https://github.com/openstack/ironic/blob/master/ironic/manager/task_manager.py#L20
20:33:55 is what i put together for ironic
20:34:23 thx devananda
20:34:40 nova already does this in a fairly poor way, IMHO
20:34:43 with 'task_state'
20:34:53 very easy to get out of sync // lost
20:35:16 yup, nova confuses a task_state with a lock
20:35:25 when task_state should just be that, a task state :-P
20:35:34 just look at "nova reset-state"
20:35:43 that that even exists says bad things
20:35:45 ya :-/
20:36:33 do others think it'd be useful for taskflow to provide something like that locking API in the above wiki, or at least something like it
20:36:53 *backed by 1 or more implementations
20:37:29 harlowja will have to wade through some of those links before I have feedback.
20:37:32 np
20:37:34 regarding the lock wiki referenced above, in section "Filesystem" under Drawbacks, it reads "Does not release automatically on lock holder failure (but timeouts on locks are possible)."
20:37:43 yes, a generic locking service would be good
20:37:54 adrian_otto does that make sense, could i word it better?
20:38:00 that's not true, depending on what you define as failure
20:38:19 if the caller holding an advisory lock goes away, then the lock is automatically released by the kernel.
20:38:33 adrian_otto true, good point
20:38:52 go away = process terminates for any reason
20:39:01 alexheneveld it's just when we give a locking api thingy, then people have to be pretty aware of how to acquire those locks 'sanely'
20:39:13 *ordering issues become a big problem, lol
20:39:14 is there a good one we could reuse however?
20:39:22 alexheneveld none afaik
20:39:46 filesystem locks should, IMO, call out a drawback that they're not distributed
20:39:53 devananda sure
20:39:54 shame. it's hard to get right! people will just keep asking for new use cases otherwise.
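[Editorial sketch: adrian_otto's point about advisory locks is easy to demonstrate. A small filesystem lock provider (path and helper names are made up): the kernel drops the lock if the holding process terminates for any reason, though as devananda notes it is host-local, not distributed.]

```python
import fcntl
import os

def acquire_job_lock(path="/var/lock/taskflow/job-Y.lock"):
    """Take a non-blocking exclusive advisory lock on a file.
    If the holder dies for any reason, the kernel releases the
    lock automatically -- but it only works on a single host."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):  # already locked by another process
        os.close(fd)
        return None
    return fd  # keep this fd open for as long as the lock is held

def release_job_lock(fd):
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```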
20:40:19 i was reading about stuff like http://linux.die.net/man/3/dlm_lock, ibm's vms and such
20:40:45 it might be an idea to aim for a higher-level model
20:40:46 devananda: you still could use a multiplicity of filesystems as a backing store for locks if you were inclined to make a networked lock service built that way.
20:40:50 dlm_lock and such stuff i didn't find much info on though, except that glusterfs i think uses it :-p
20:41:20 as in you are locking the lifecycle of a server
20:41:43 alexheneveld agreed, i think this is why chubby (the google zookeeper) provides only coarse grained locking
20:42:04 and i think if we can be pretty coarse, we will also be fine, it's just defining 'coarse' isn't so easy, ha
20:42:08 people should be discouraged from low-level locking operations tho i accept they may need to be available
20:42:09 adrian_otto: true, but with a SPoF on the network FS. or a consistency problem if using a distributed FS
20:42:41 harlowja: makes a lot of sense (re chubby)
20:42:50 harlowja, one question, why do we have so many (6) providers for locking? plan to implement all of them?
20:43:08 changbl it was more of just a survey of ones i could think of
20:43:18 oh, ok
20:43:26 i think redis, filesystem, ZK might be a good 3
20:44:04 or just redis + filesystem to start
20:44:37 *so as not to scare people with ZK
20:44:46 I can take ZK
20:44:59 for both storage and locking, i guess here?
20:45:13 that'd be cool
20:45:27 #link http://openreplica.org/faq/ was another interesting one, that has something like ZK in python
20:45:31 didn't investigate much more though
20:45:56 though i don't usually recommend it, i suspect innodb+galera would actually be a good fit here
20:46:06 NDB would be another, but very complex to set up
20:46:08 never used OpenReplica before
20:46:22 ZK + Kazoo seems to work nicely
20:47:02 changbl ya, it shouldn't be that hard to connect it to the taskflow api
20:47:21 AIUI, galera, with the proper settings, will do a distributed lock before committing a write. min 3 servers, so partitioning is less of an issue
20:47:34 devananda intereting, didn't know that :)
20:47:47 *interesting
20:47:59 OpenReplica looks interesting.
20:48:19 adrian_otto ya, i'm not sure how mature it is though, didn't mess around with it that much
20:48:44 from the little reading on openreplica i did, it's aimed at geographic distribution
20:49:14 with some underlying paxos concoord thing
20:49:24 devananda: do you have a documentation pointer to galera re: locking? Where did you learn about that behavior?
20:49:46 lemme see if i can find it
20:49:56 adrian_otto: learned about it while working at percona ... :)
20:50:45 changbl if u want to work on the ZK stuff that'd be really neat, i think we can keep in the back of our mind how 'job/workflow' transfer can occur with ZK
20:51:02 adrian_otto: http://www.codership.com/wiki/doku.php?id=mysql_options_0.8 -- wsrep_causal_reads, IIRC
20:51:06 most of what I found when I looked for that is actually about table locking limitations in Master:Master setups
20:51:46 aha
20:52:06 harlowja, sure
20:52:12 read committed isolation level might actually work well enough for what we need.
20:52:46 and assuming that works equally well in a 3 node arrangement, that might actually be one of the best ZK alternatives.
20:53:12 better link / explanation
20:53:12 we could deploy/configure it using a HOT in a well automated way too, I expect.
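[Editorial sketch: for the redis provider floated above, the common approach is a SET ... NX PX lock with a ttl so a dead holder cannot wedge everyone forever. A sketch with the redis-py client; the key naming and ttl are arbitrary, and a real provider would do the release atomically with a Lua script.]

```python
import uuid
import redis  # redis-py client

r = redis.StrictRedis(host="localhost", port=6379)

def acquire(name, ttl_ms=30000):
    """Set the key only if it is free (NX) and give it a ttl (PX),
    so the lock self-releases if the holder crashes."""
    token = uuid.uuid4().hex
    if r.set("lock:" + name, token, nx=True, px=ttl_ms):
        return token
    return None  # held by someone else

def release(name, token):
    """Delete the lock only if our token still owns it. Note the
    get+delete pair is racy; fine for a sketch, not for production."""
    if r.get("lock:" + name) == token.encode("utf-8"):
        r.delete("lock:" + name)
```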
20:53:17 #link http://www.percona.com/doc/percona-xtradb-cluster/wsrep-system-index.html#wsrep_causal_reads
20:53:36 adrian_otto: and yes, galera is designed to require min 3 nodes. it won't start with 2
20:53:50 but will continue to run if a 3-node cluster degrades to 2
20:53:56 devananda for ironic, do u have any thoughts on what u'd like for say when the thing using your context manager crashes (without releasing)?
20:54:15 manual recovery at that point to release? time based release?
20:54:28 harlowja: eventually, time-based release-wipe-set-to-error or something
20:54:42 k
20:55:13 using a Galera cluster also solves a multi-tenancy concern, which may of the other options don't address.
20:55:18 (if any)
20:55:41 s/may/many/
20:55:45 ZK has namespaces afaik ;)
20:55:57 with related auth?
20:56:05 *unsure*
20:56:29 alright, 4 minutes
20:57:32 adrian_otto it might be interesting to see if the hortonworks people (that i think are doing hadoop) have thought about how ZK and openstack fit together
20:57:51 and especially the tenant issue
20:58:14 Savanna is actually Mirantis, not Hortonworks
20:58:18 ah
20:58:27 we know them, so I could ask
20:58:32 ah, either way
20:58:43 i thought hortonworks, guess i was mistaken
20:58:50 *which we know :-P
20:58:59 anyways, it'd be neat to see what they think
20:59:28 cool, well time's up folks!
20:59:38 email, or irc, or more email for anything else :)
20:59:45 #endmeeting
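[Editorial sketch, on the namespace/auth question left open near the end: ZK does support chroot-style namespacing per connection plus digest ACLs, which is one way to approach the tenant concern. A kazoo sketch; the tenant path and credentials are invented.]

```python
from kazoo.client import KazooClient
from kazoo.security import make_digest_acl

# Chroot the session under a per-tenant path: everything this client
# does is confined to znodes below /tenants/acme.
client = KazooClient(hosts="zk1:2181/tenants/acme")
client.start()

# Authenticate with a digest credential and create a node whose ACL
# grants access only to that credential.
client.add_auth("digest", "acme:secret")
acl = make_digest_acl("acme", "secret", all=True)
client.create("/locks", acl=[acl], makepath=True)
```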