20:00:29 #startmeeting state-management
20:00:30 Meeting started Thu May 16 20:00:29 2013 UTC. The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:34 The meeting name has been set to 'state_management'
20:00:36 howdy all!
20:00:39 hi
20:00:41 hi
20:00:50 hi hi
20:01:04 hello~
20:01:11 howdy
20:01:21 #link https://wiki.openstack.org/wiki/Meetings/StateManagement#Agenda_for_next_meeting
20:01:33 jgriffith yt
20:02:24 hi
20:02:39 hey, so i guess we can start, anyone else can wander in :)
20:02:39 hi
20:02:59 so lots of good stuff happening, where to start! :)
20:03:14 :)
20:03:37 i think there is enough activity that we decided to go directly to stackforge, and then as we flesh out cinder (and elsewhere) we can start hooking it in there
20:03:50 jlucci from rackspace is helping get the stackforge stuff going
20:04:27 #link https://bugs.launchpad.net/openstack-ci/+bug/1179754
20:04:29 Launchpad bug 1179754 in openstack-ci "Create taskflow-core gerrit group" [Wishlist,Incomplete]
20:04:43 i started that, but it seems like its more of the gerrit config files that matters, so thats good news, pretty self-service
20:05:19 so as far as cinder integration, jgriffith has written up some useful initial diagrams of what cinder is doing and the issues it has
20:05:42 #link https://wiki.openstack.org/wiki/StructuredWorkflows#Cinder
20:06:04 create volume and create snapshot currently suffer from the lack of ability to revert correctly it seems
20:06:48 the ntt folks have also updated evacuate/migrate in https://wiki.openstack.org/wiki/StructuredWorkflows#Nova as well for those that are interested
20:06:58 #link https://wiki.openstack.org/wiki/StructuredWorkflows#Nova
20:07:29 so thats all good stuff, and shows that we are making progress in step #1, understanding the problems to fix :)
20:08:05 if anyone is interested they can add their potential use case there also :)
20:08:32 wow
20:08:40 RunInstance diagram looks impressive
20:08:50 scary or impressive ;)
20:08:57 scary impressive, lol
20:09:08 )
20:09:11 i also updated the primitives wiki with some minor changes
20:09:16 #link https://wiki.openstack.org/wiki/StructuredWorkflowPrimitives
20:09:40 added the concept of a 'job claim' which is how a job gets claimed by something about to work on it
20:09:58 *could be ZK based, DB based...
20:10:32 also started adding some of the underlying patterns that taskflow might be able to help organize code 'states' with
20:10:34 #link https://wiki.openstack.org/wiki/StructuredWorkflowPrimitives#Patterns
20:10:45 hi -- sorry late. but just in time -- that was the question i was going to ask
20:11:07 can some of this be tied in to the existing DB entities somehow…
20:11:19 thats the plan
20:11:36 excellent!
20:12:06 we'll see what kind of schema changes might have to happen, but hopefully not so many :)
20:12:18 guys, that is not going to work for lock management
20:12:24 agreed
20:12:33 it works fine for persisting state, but not for managing concurrency.
20:13:22 right, but i can see certain uses (say in nova) where they don't have much of a concurrency to begin with, so they can still take advantage of the other parts of the library (just not the concurrent job claiming stuff)
20:13:31 so if the only expectation is that a single thread in a single sequence is going to use the DB to track state transitions, then fine
20:13:41 right
20:13:44 but as soon as you bring in multiple workers, that breaks down
20:13:55 yup
20:14:10 i think we need to put a big warning on the DB implementations that say that
20:14:16 adrian_otto: some db's let you … but is the concern portability?
20:14:23 yes
20:14:31 but zk should be fine for concurrency
20:14:41 there is a feature in MySQL to do application locks, but you can't expect to put an ORM in front of that
20:14:42 We don't want to commit to zk
20:14:50 I'm not suggesting zk
20:15:09 but that you can't rely on SQLite or MySQL to act as a backing store for a locking implementation
20:15:10 some generic mutex / notify service
20:15:15 ?
20:15:25 because of MVCC
20:16:04 yes, something as simple as something with a backend on a single node with flock() or fcntl() would even work
20:16:33 but we need to thing of solving concurrency control and data persistence as separate implementations
20:16:38 s/thing/think/
20:16:49 that are coordinated.
20:16:57 so i think there are 2 pieces of this library that it will provide, 1 is the job posting, ownership, concurrency stuff, which likely a DB is not gonna work, but then there is the task/workflow organization, which can be used almost separately from the distributed stuff in a way
20:16:58 i must be missing something. orchestration tasks are long running, it's probably a bad idea to hold db lock for that long. need to use some entries to "safeguard" instead
20:17:30 harlowja: exactly.
20:18:00 cool, i think seeing an example might help :)
20:18:09 maoy: we need a lock primitive. Regardless if they are coarse locks or fine grained locks.
20:18:18 harlowja: could job posting + ownership still be done in the DB ? just have a very lightweight concurrency control service.
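The split suggested at 20:16:04 through 20:16:49, a single-node flock() lock for concurrency control kept apart from wherever state is persisted, could look roughly like the sketch below. It is only an illustration under stated assumptions: node_lock, try_claim, LockBusy, and the dict standing in for the DB are hypothetical names, not TaskFlow code, and flock() here only coordinates workers on one host.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


class LockBusy(Exception):
    """Raised when another holder already owns the lock (hypothetical name)."""


@contextmanager
def node_lock(path):
    # Concurrency control only: an exclusive, non-blocking flock() on a
    # lock file. Assumption: all workers run on the same node.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise LockBusy(path)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def try_claim(path, state, job_id, owner):
    # Persistence is separate: `state` is a plain dict standing in for
    # the DB row that records job ownership.
    try:
        with node_lock(path):
            if state.get(job_id) is None:
                state[job_id] = owner
                return True
            return False
    except LockBusy:
        return False
```

The lock is held only for the short check-and-set, not for the lifetime of the job, which is in line with the 20:16:58 concern about long-running orchestration tasks.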
20:18:29 yes
20:19:28 alexheneveld i believe so, it really depends on the tasks that compose the workflow, simple stuff like in most of openstack i think can be made pretty lightweight, but more complicated things like user-defined workflows shouldn't be restricted by the same code, we should try to enable both :)
20:20:10 say like for most of openstack, u could get away with posting (which is really a mq message) and ownership can be setting a field in a db
20:20:20 that works ok, but then said owner isn't resilient to failure
20:20:31 *which might be ok for openstack binaries which can be restarted by init.d and such
20:21:19 harlowja: but you can have a health check process which determines whether owner is gone away and/or timeout on the job...
20:21:40 for the first drop of this, I'd like to see something simple
20:21:41 yes, which might be fine for openstack, but not acceptable for users i think
20:21:42 which touches on scheduler… the two are pretty closely related i think
20:21:42 and not over engineered
20:21:52 hemna agreed
20:22:02 but we should discuss the complex cases, since it will come up
20:22:03 lets not solve world hunger here
20:22:12 imho the complex cases are just more advanced primitives
20:22:25 the simple cases just use dumbed-down primitives :)
20:22:34 agreed
20:22:44 agreed
20:22:49 so thats why having a solid primitive foundation is pretty important
20:22:55 I'm fine with using a db backing for starters.
20:23:05 agreed
20:23:25 so the taskflow library right now has a memory backed part to start, new this week, mostly works :-P
20:23:34 :)
20:23:43 example usage
20:23:45 #link http://paste.openstack.org/show/37363/
20:24:02 that would be something like the cinder workflow that they have
20:24:21 * harlowja let people digest that for a sec
20:25:09 * hemna chews
20:25:09 using the primitives, even with simple memory backends we can experiment with reverting, resumption and all that
20:26:31 it gets interesting when u start using a database backed logbook for example
20:26:33 looks good to me. I like that as a simple starting point.
20:26:55 *note that there is no concurrency there, no distributed jobs and such
20:27:08 but imho u get a lot just with this type of usage
20:27:08 yah I think that's good
20:27:48 yep, we can start with this, and revisit the concurrency stuff in a subsequent effort
20:28:03 well jlucci is helping make the distributed stuff work :) so its all happening ;)
20:28:20 but i think the distributed stuff is a separate 'pattern', not the only pattern, but one of them
20:28:29 yep
20:28:56 some peoples usage will just be the simple case, like cinder for example, don't care about distributing your job, just want to have a workflow revert, then this might work fine (or almost be fine +- some other changes)
20:29:06 nova i think is similar with conductor
20:29:19 since they have their own 'job posting/disribution' mechanisms
20:29:26 *distribution
20:29:44 its not perfect yet, but i think the example there shows the potential :)
20:29:50 harlowja: code looks clean
20:29:58 thx!
20:31:19 so i am hoping to add some more tests and such, there are a few, and then getting a db based logbook, which will allow for resumption across a process (but not concurrently acting on it)
20:31:30 harlowja: i like it too. is there some parts i can help implement? backends maybe?
20:31:45 sure ipersky if u want to start on the db backend stuff, that might be neat :)
20:32:17 i think we need that one before we can hook-in to cinder, although i think we can start working with jgriffith and getting some code in even at this stage
20:32:19 +1 for backend work. :D
20:32:39 harlowja: could this be backed by celery? should it?
20:32:53 after it moves to stackforge, and we all think its ok, we can pypi it and start using it in nova, cinder, ...
20:33:01 alexheneveld so excellent question
20:33:05 jlucci :)
20:33:07 I think jlucci may be working on some celery goodness, no?
20:33:08 ha
20:33:11 Yeah
20:33:22 So the distributed pattern right now is using celery
20:33:31 cool
20:34:02 jlucci is helping a lot in making sure the primitives work there, or if not, how can we adjust them so that it does
20:34:09 I'll start looking at the usages for it and see if it makes sense to dink with cinder yet
20:34:26 since i think celery (the distributed pattern) should also be possible if people want to use it that way
20:34:30 And so far the primitives seem to translate pretty well
20:34:46 hemna thanks, it might be a few weeks off, but the foundation i think is getting less slushy
20:34:58 jlucci is having to put up with me changing it, sorry jlucci :)
20:35:01 ok
20:35:08 S'all good. :P
20:35:21 it might be a good exercise for me just to start learning the api and where stuff might go in cinder
20:35:36 sure, feel free to mess around with it
20:35:57 #link https://github.com/Yahoo/TaskFlow/
20:36:08 it will probably end up at stackforge soon
20:36:36 if anyone wants to try a ZK backend, it might be neat also
20:36:59 or think about how conductor could start to use this, i have some ideas, i think john barriet (sp?) might have some too
20:37:20 I'd like to look into it
20:37:26 i mean ZK )
20:37:33 ipersky cool
20:37:35 any new use cases would be really cool, the more the better :)
20:37:41 more betterness for all ;)
20:38:04 ipersky: did you see the little zk code stub that's already in there?
20:38:14 well, still first have to read the code/docs and ast a lot of dumb questions
20:38:26 s/ast/ask/
20:38:36 there is a place to hang that once you look at the source
20:38:39 ipersky i can connect u with the nttdata folks also, they were working with the prototype nova code and zk
20:39:24 it might be a 'easy' move into this library (not sure)
20:39:39 adrian_otto are u talking about that code, or the nova zk code?
20:39:47 *which is the service group stuff
20:40:25 harlowja, ipersky , i can also help with zk backend
20:40:36 changbl great :)
20:41:03 awesome stuff :)
20:41:06 harlowja: I was referring to what I saw in gerrit for Nova
20:41:10 kk
20:41:53 afaik there is some ongoing work to move to 1 zk library in nova, so that if taskflow gets used it won't pull in a secondary zk library
20:42:05 kazoo seems to be the library thats the best supported nowadays
20:42:13 yes
20:42:17 #link https://github.com/python-zk/kazoo/
20:42:47 cool, so lets open for any other discussion, i think we covered all i want to talk about, unless others have topics :)
20:43:04 great progress is happening imho :)
20:43:41 #topic open-discussion
20:43:42 agreed - I'll try to have an example of a celery/distributed pattern soon for people to see
20:44:04 btw, some example that i was working on last night that people might like
20:44:12 jlucci: thanks for your efforts on this. I know you've really been cranking on it.
20:44:28 yes thanks jlucci :)
20:44:36 jlucci: looking forward to it
20:44:37 haha np Glad to be helping. It's an exciting project
20:44:40 changbl great i'll add you to discussion when start asking dumb questions on backend design
20:44:45 )
20:44:47 Do we have a target timeline/goal in mind for delivering a solid use case (e.g. for Cinder, or Heat, etc.) that we should be aware that we are working toward?
20:45:01 hmmm, hemna any thoughts?
20:45:21 #link https://github.com/harlowja/TaskFlow/blob/new-hotness/taskflow/tests/unit/test_memory.py#L161 for the neat local threaded jobboard workflow stuff if anyone is interested
20:45:50 good progress, I'd just like to start playing with it and see how it fits in
20:46:01 i think havana-1 is soon right, so my guess not that one
20:46:04 havana-2 maybe?
20:46:15 when is the H1 date?
20:46:27 very close
20:46:40 #link https://wiki.openstack.org/wiki/Havana_Release_Schedule
20:46:43 next week
20:46:49 H2: July 19
20:46:57 ok H2 sounds more reasonable
20:46:57 i think H2 is more realistic :)
20:47:22 H1 could happen, but i think that would be pushing too hard
20:47:24 next week is kinda a wash for most folks anyway with the US holiday
20:47:25 ok, cool. no pressure, just good to have goals :-)
20:47:29 def
20:47:32 ipersky, np~
20:47:39 shoot for getting basic primitives in H1
20:47:47 and then code that implements them in H2
20:47:56 sure, so that brings up a good question that i'm not sure about
20:48:04 would it make any sense to pull this into cinder as is for H1 ?
20:48:07 do stackforge projects follow h1,h2...
20:48:16 we can of course make sure we follow that
20:48:21 hemna i don't think so
20:48:34 ok
20:48:48 needs a little more time to mature i think
20:49:08 but i think its 'playground ready'
20:49:08 +1 for h2
20:49:21 +1
20:49:21 +1 for h2
20:49:26 I just thought it'd be nice to get in the gating and all. But didn't we already say that stackforge can do the same?
20:49:33 hemna ya
20:49:39 ok, then +1 for H2 then
20:49:47 stackforge does all the gating, and ci tests and such
20:49:57 I'd like to see if I can get a few simple use cases plugged in for cinder at H2 as well
20:50:06 volume creation, snapshot creation
20:50:21 def
20:50:38 hemna https://wiki.openstack.org/wiki/StructuredWorkflows#Cinder
20:50:39 I don't want to wait for H3 to get this used :)
20:50:48 john put some of those up, so we can have a place to track them
20:50:53 ok cool
20:51:17 H2 sounds reasonable to me
20:51:22 and lets it mature to a decent level
20:51:34 ok lets plan on getting it in H2 along with create volume and create snapshot
20:51:44 +1
20:51:59 cool
20:52:38 also for ongoing discussions we can use the #openstack-state-management room
20:52:44 probably easier than email and such
20:52:47 we'll have a better idea on the api as well as the work required to cover the rest of cinder for H3 then
20:52:53 ok
20:53:05 hemna sounds great :)
20:53:13 if all of cinder would use this, that would be incredible imho
20:53:19 * harlowja would be super happy
20:53:22 that's the plan :)
20:53:38 very cool
20:54:31 #agreed aim for a few use-cases for cinder in H2
20:54:41 * adrian_otto departing to catch a flight
20:55:02 if anyone wants to try it in nova and such, that'd be cool too, i might be able to try that out, we'll see
20:55:18 or reddwarf, or anywhere people feel :)
20:55:39 ok dokie, any other stuff to talk about?
20:55:49 nice job man
20:55:59 thx!
20:56:05 #endmeeting
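
The revert and resumption behaviour discussed from 20:25:09 onward (run steps in order, undo the completed ones if a later step fails, skip steps a logbook already records as done) can be illustrated with a minimal sketch. Nothing here is TaskFlow's actual API or the content of the linked paste: Task, run_flow, and the dict used as an in-memory logbook are assumptions made up for illustration, and the volume-creation steps are placeholders. Like the memory-backed example in the meeting, it has no concurrency and no distributed jobs.

```python
class Task:
    """Hypothetical unit of work with an apply step and a compensating revert step."""

    def __init__(self, name, apply, revert):
        self.name = name
        self.apply = apply
        self.revert = revert


def run_flow(tasks, context, logbook):
    """Run tasks in order; on failure, revert the completed ones in reverse order.

    `logbook` is a plain dict (task name -> state) standing in for an
    in-memory logbook. Tasks already marked "done" are skipped, which is
    what makes resuming a partially finished flow possible.
    """
    completed = []
    for task in tasks:
        if logbook.get(task.name) == "done":
            completed.append(task)
            continue
        try:
            task.apply(context)
        except Exception:
            logbook[task.name] = "failed"
            for finished in reversed(completed):
                finished.revert(context)
                logbook[finished.name] = "reverted"
            raise
        logbook[task.name] = "done"
        completed.append(task)


# Placeholder steps loosely modelled on "create volume"; all names are made up.
def reserve_quota(ctx):
    ctx["quota_reserved"] = True


def release_quota(ctx):
    ctx["quota_reserved"] = False


def create_record(ctx):
    ctx["record"] = "creating"


def delete_record(ctx):
    ctx.pop("record", None)


def provision_fails(ctx):
    raise RuntimeError("backend unavailable")


def nothing_to_undo(ctx):
    pass
```

Running the three placeholder steps in order re-raises the RuntimeError from the last one after the record has been deleted and the quota released. Swapping the dict for a database-backed logbook is what would allow resumption across processes, as noted at 20:31:19.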