20:00:29 <harlowja> #startmeeting state-management
20:00:30 <openstack> Meeting started Thu May 16 20:00:29 2013 UTC. The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:34 <openstack> The meeting name has been set to 'state_management'
20:00:36 <harlowja> howdy all!
20:00:39 <adrian_otto> hi
20:00:41 <kebray> hi
20:00:50 <harlowja> hi hi
20:01:04 <changbl> hello~
20:01:11 <jlucci> howdy
20:01:21 <harlowja> #link https://wiki.openstack.org/wiki/Meetings/StateManagement#Agenda_for_next_meeting
20:01:33 <harlowja> jgriffith yt
20:02:24 <nsavin> hi
20:02:39 <harlowja> hey, so i guess we can start, anyone else can wander in :)
20:02:39 <ipersky> hi
20:02:59 <harlowja> so lots of good stuff happening, where to start! :)
20:03:14 <ipersky> :)
20:03:37 <harlowja> i think there is enough activity that we decided to go directly to stackforge, and then as we flesh out cinder (and elsewhere) we can start hooking it in there
20:03:50 <harlowja> jlucci from rackspace is helping get the stackforge stuff going
20:04:27 <harlowja> #link https://bugs.launchpad.net/openstack-ci/+bug/1179754
20:04:29 <uvirtbot> Launchpad bug 1179754 in openstack-ci "Create taskflow-core gerrit group" [Wishlist,Incomplete]
20:04:43 <harlowja> i started that, but it seems like its more of the gerrit config files that matters, so thats good news, pretty self-service
20:05:19 <harlowja> so as far as cinder integration, jgriffith has written up some useful initial diagrams of what cinder is doing and the issues it has
20:05:42 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflows#Cinder
20:06:04 <harlowja> create volume and create snapshot currently suffer from the lack of ability to revert correctly it seems
20:06:48 <harlowja> the ntt folks have also updated evacuate/migrate in https://wiki.openstack.org/wiki/StructuredWorkflows#Nova as well for those that are interested
20:06:58 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflows#Nova
20:07:29 <harlowja> so thats all good stuff, and shows that we are making progress in step #1, understanding the problems to fix :)
20:08:05 <harlowja> if anyone is interested they can add their potential use case there also :)
20:08:32 <ipersky> wow
20:08:40 <ipersky> RunInstance diagram looks impressive
20:08:50 <harlowja> scary or impressive ;)
20:08:57 <harlowja> scary impressive, lol
20:09:08 <ipersky> )
20:09:11 <harlowja> i also updated the primitives wiki with some minor changes
20:09:16 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflowPrimitives
20:09:40 <harlowja> added the concept of a 'job claim' which is how a job gets claimed by something about to work on it
20:09:58 <harlowja> *could be ZK based, DB based...
20:10:32 <harlowja> also started adding some of the underlying patterns that taskflow might be able to help organize code 'states' with
20:10:34 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflowPrimitives#Patterns
20:10:45 <alexheneveld> hi -- sorry late. but just in time -- that was the question i was going to ask
20:11:07 <alexheneveld> can some of this be tied in to the existing DB entities somehow…
20:11:19 <harlowja> thats the plan
20:11:36 <alexheneveld> excellent!
20:12:06 <harlowja> we'll see what kind of schema changes might have to happen, but hopefully not so many :)
20:12:18 <adrian_otto> guys, that is not going to work for lock management
20:12:24 <harlowja> agreed
20:12:33 <adrian_otto> it works fine for persisting state, but not for managing concurrency.
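
The 'job claim' idea from 20:09:40 can be illustrated with a small in-process sketch. This is not TaskFlow's API: the class name `InMemoryJobBoard`, its methods, and the job and worker names are hypothetical. It keeps the claim record (state) and the lock guarding it (concurrency control) as separate pieces, which is the distinction adrian_otto draws at 20:12:33. The `threading.Lock` only protects threads within one process, so this says nothing about multi-host claiming.

```python
import threading
import time


class InMemoryJobBoard:
    """Hypothetical stand-in for a job board; not TaskFlow's API."""

    def __init__(self):
        # Concurrency control: only valid inside a single process.
        self._lock = threading.Lock()
        # State: job id -> (owner, time the claim was made).
        self._claims = {}

    def claim(self, job_id, owner):
        """Record `owner` as the claimant of `job_id` unless already claimed."""
        with self._lock:
            if job_id in self._claims:
                return False
            self._claims[job_id] = (owner, time.time())
            return True

    def owner_of(self, job_id):
        """Return the current owner of `job_id`, or None if unclaimed."""
        with self._lock:
            claim = self._claims.get(job_id)
            return claim[0] if claim else None


board = InMemoryJobBoard()
print(board.claim("create-volume-1", "worker-a"))  # first claim on the job
print(board.claim("create-volume-1", "worker-b"))  # job is already claimed
print(board.owner_of("create-volume-1"))
```

Swapping the dictionary for a database table would persist the claim, but the check-then-set in `claim` would still need a real mutual-exclusion mechanism around it.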
20:13:22 <harlowja> right, but i can see certain uses (say in nova) where they don't have much of a concurrency to begin with, so they can still take advantage of the other parts of the library (just not the concurrent job claiming stuff)
20:13:31 <adrian_otto> so if the only expectation is that a single thread in a single sequence is going to use the DB to track state transitions, then fine
20:13:41 <harlowja> right
20:13:44 <adrian_otto> but as soon as you bring in multiple workers, that breaks down
20:13:55 <harlowja> yup
20:14:10 <harlowja> i think we need to put a big warning on the DB implementations that say that
20:14:16 <alexheneveld> adrian_otto: some db's let you … but is the concern portability?
20:14:23 <adrian_otto> yes
20:14:31 <nsavin> but zk should be fine for concurrency
20:14:41 <adrian_otto> there is a feature in MySQL to do application locks, but you can't expect to put an ORM in front of that
20:14:42 <jlucci> We don't want to commit to zk
20:14:50 <adrian_otto> I'm not suggesting zk
20:15:09 <adrian_otto> but that you can't rely on SQLite or MySQL to act as a backing store for a locking implementation
20:15:10 <alexheneveld> some generic mutex / notify service
20:15:15 <alexheneveld> ?
20:15:25 <adrian_otto> because of MVCC
20:16:04 <adrian_otto> yes, something as simple as something with a backend on a single node with flock() or fcntl() would even work
20:16:33 <adrian_otto> but we need to thing of solving concurrency control and data persistence as separate implementations
20:16:38 <adrian_otto> s/thing/think/
20:16:49 <adrian_otto> that are coordinated.
20:16:57 <harlowja> so i think there are 2 pieces of this library that it will provide, 1 is the job posting, ownership, concurrency stuff, which likely a DB is not gonna work, but then there is the task/workflow organization, which can be used almost separately from the distributed stuff in a way
20:16:58 <maoy> i must be missing something. orchestration tasks are long running, it's probably a bad idea to hold db lock for that long. need to use some entries to "safeguard" instead
20:17:30 <adrian_otto> harlowja: exactly.
20:18:00 <harlowja> cool, i think seeing an example might help :)
20:18:09 <adrian_otto> maoy: we need a lock primitive. Regardless if they are coarse locks or fine grained locks.
20:18:18 <alexheneveld> harlowja: could job posting + ownership still be done in the DB ? just have a very lightweight concurrency control service.
20:18:29 <adrian_otto> yes
20:19:28 <harlowja> alexheneveld i believe so, it really depends on the tasks that compose the workflow, simple stuff like in most of openstack i think can be made pretty lightweight, but more complicated things like user-defined workflows shouldn't be restricted by the same code, we should try to enable both :)
20:20:10 <harlowja> say like for most of openstack, u could get away with posting (which is really a mq message) and ownership can be setting a field in a db
20:20:20 <harlowja> that works ok, but then said owner isn't resilient to failure
20:20:31 <harlowja> *which might be ok for openstack binaries which can be restarted by init.d and such
20:21:19 <alexheneveld> harlowja: but you can have a health check process which determines whether owner is gone away and/or timeout on the job...
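
adrian_otto's suggestion at 20:16:04 of a single-node backend built on flock() could look roughly like the sketch below. It assumes a Unix host where Python's `fcntl.flock` maps to the flock(2) system call (behaviour can differ on platforms or filesystems that emulate it), and the `FileLock` class and lock-file name are hypothetical. It covers only the concurrency-control half; job state would still live in whatever store persists it.

```python
import fcntl
import os
import tempfile


class FileLock:
    """Hypothetical single-host exclusive lock using flock(2). Unix-only."""

    def __init__(self, path):
        self._path = path
        self._fd = None

    def acquire(self):
        """Try to take the lock without blocking; return True on success."""
        fd = os.open(self._path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Another open file description already holds the lock.
            os.close(fd)
            return False
        self._fd = fd
        return True

    def release(self):
        """Drop the lock if this object holds it."""
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


lock_path = os.path.join(tempfile.mkdtemp(), "create-volume-1.lock")
first = FileLock(lock_path)
second = FileLock(lock_path)
print(first.acquire())   # lock is free at this point
print(second.acquire())  # attempted while `first` holds the lock
first.release()
print(second.acquire())  # attempted after `first` released
second.release()
```

Because the kernel releases a flock when the holding process exits, a crashed owner does not leave the lock stuck, which touches on the owner-failure concern raised at 20:20:20.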
20:21:40 <hemna> for the first drop of this, I'd like to see something simple
20:21:41 <harlowja> yes, which might be fine for openstack, but not acceptable for users i think
20:21:42 <alexheneveld> which touches on scheduler… the two are pretty closely related i think
20:21:42 <hemna> and not over engineered
20:21:52 <harlowja> hemna agreed
20:22:02 <harlowja> but we should discuss the complex cases, since it will come up
20:22:03 <hemna> lets not solve world hunger here
20:22:12 <harlowja> imho the complex cases are just more advanced primitives
20:22:25 <harlowja> the simple cases just use dumbed-down primitives :)
20:22:34 <adrian_otto> agreed
20:22:44 <nsavin> agreed
20:22:49 <harlowja> so thats why having a solid primitive foundation is pretty important
20:22:55 <hemna> I'm fine with using a db backing for starters.
20:23:05 <harlowja> agreed
20:23:25 <harlowja> so the taskflow library right now has a memory backed part to start, new this week, mostly works :-P
20:23:34 <hemna> :)
20:23:43 <harlowja> example usage
20:23:45 <harlowja> #link http://paste.openstack.org/show/37363/
20:24:02 <harlowja> that would be something like the cinder workflow that they have
20:24:21 * harlowja let people digest that for a sec
20:25:09 * hemna chews
20:25:09 <harlowja> using the primitives, even with simple memory backends we can experiment with reverting, resumption and all that
20:26:31 <harlowja> it gets interesting when u start using a database backed logbook for example
20:26:33 <adrian_otto> looks good to me. I like that as a simple starting point.
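
The paste linked at 20:23:45 is not reproduced in this log. As a rough illustration of the reverting behaviour harlowja mentions at 20:25:09, the sketch below runs tasks in order and, on failure, undoes the completed ones in reverse. It is plain Python, not the TaskFlow API: `Task`, `run_linear`, and the task names are hypothetical, and the `log` list merely stands in for the idea of a logbook.

```python
class Task:
    """Hypothetical unit of work with an undo step; not TaskFlow's API."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail  # when True, apply() raises to simulate a failure

    def apply(self, log):
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        log.append(f"apply {self.name}")

    def revert(self, log):
        log.append(f"revert {self.name}")


def run_linear(tasks, log):
    """Run tasks in order; on failure, revert completed tasks in reverse."""
    done = []
    try:
        for task in tasks:
            task.apply(log)
            done.append(task)
    except Exception:
        for task in reversed(done):
            task.revert(log)
        return False
    return True


log = []
ok = run_linear(
    [
        Task("reserve_quota"),
        Task("create_db_entry"),
        Task("allocate_on_backend", fail=True),
    ],
    log,
)
print(ok)
for entry in log:
    print(entry)
```

Persisting `log` somewhere durable is what would allow a later process to see which tasks had completed, which is the resumption case discussed for a database backed logbook.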
20:26:55 <harlowja> *note that there is no concurrency there, no distributed jobs and such
20:27:08 <harlowja> but imho u get a lot just with this type of usage
20:27:08 <hemna> yah I think that's good
20:27:48 <adrian_otto> yep, we can start with this, and revisit the concurrency stuff in a subsequent effort
20:28:03 <harlowja> well jlucci is helping make the distributed stuff work :) so its all happening ;)
20:28:20 <harlowja> but i think the distributed stuff is a separate 'pattern', not the only pattern, but one of them
20:28:29 <adrian_otto> yep
20:28:56 <harlowja> some peoples usage will just be the simple case, like cinder for example, don't care about distributing your job, just want to have a workflow revert, then this might work fine (or almost be fine +- some other changes)
20:29:06 <harlowja> nova i think is similar with conductor
20:29:19 <harlowja> since they have their own 'job posting/disribution' mechanisms
20:29:26 <harlowja> *distribution
20:29:44 <harlowja> its not perfect yet, but i think the example there shows the potential :)
20:29:50 <alexheneveld> harlowja: code looks clean
20:29:58 <harlowja> thx!
20:31:19 <harlowja> so i am hoping to add some more tests and such, there are a few, and then getting a db based logbook, which will allow for resumption across a process (but not concurrently acting on it)
20:31:30 <ipersky> harlowja: i like it too. is there some parts i can help implement? backends maybe?
20:31:45 <harlowja> sure ipersky if u want to start on the db backend stuff, that might be neat :)
20:32:17 <harlowja> i think we need that one before we can hook-in to cinder, although i think we can start working with jgriffith and getting some code in even at this stage
20:32:19 <jlucci> +1 for backend work. :D
20:32:39 <alexheneveld> harlowja: could this be backed by celery? should it?
20:32:53 <harlowja> after it moves to stackforge, and we all think its ok, we can pypi it and start using it in nova, cinder, ...
20:33:01 <harlowja> alexheneveld so excellent question
20:33:05 <harlowja> jlucci :)
20:33:07 <kebray> I think jlucci may be working on some celery goodness, no?
20:33:08 <jlucci> ha
20:33:11 <jlucci> Yeah
20:33:22 <jlucci> So the distributed pattern right now is using celery
20:33:31 <alexheneveld> cool
20:34:02 <harlowja> jlucci is helping a lot in making sure the primitives work there, or if not, how can we adjust them so that it does
20:34:09 <hemna> I'll start looking at the usages for it and see if it makes sense to dink with cinder yet
20:34:26 <harlowja> since i think celery (the distributed pattern) should also be possible if people want to use it that way
20:34:30 <jlucci> And so far the primitives seem to translate pretty well
20:34:46 <harlowja> hemna thanks, it might be a few weeks off, but the foundation i think is getting less slushy
20:34:58 <harlowja> jlucci is having to put up with me changing it, sorry jlucci :)
20:35:01 <hemna> ok
20:35:08 <jlucci> S'all good. :P
20:35:21 <hemna> it might be a good exercise for me just to start learning the api and where stuff might go in cinder
20:35:36 <harlowja> sure, feel free to mess around with it
20:35:57 <harlowja> #link https://github.com/Yahoo/TaskFlow/
20:36:08 <harlowja> it will probably end up at stackforge soon
20:36:36 <harlowja> if anyone wants to try a ZK backend, it might be neat also
20:36:59 <harlowja> or think about how conductor could start to use this, i have some ideas, i think john barriet (sp?) might have some too
20:37:20 <ipersky> I'd like to look into it
20:37:26 <ipersky> i mean ZK )
20:37:33 <harlowja> ipersky cool
20:37:35 <harlowja> any new use cases would be really cool, the more the better :)
20:37:41 <harlowja> more betterness for all ;)
20:38:04 <adrian_otto> ipersky: did you see the little zk code stub that's already in there?
20:38:14 <ipersky> well, still first have to read the code/docs and ast a lot of dumb questions
20:38:26 <ipersky> s/ast/ask/
20:38:36 <adrian_otto> there is a place to hang that once you look at the source
20:38:39 <harlowja> ipersky i can connect u with the nttdata folks also, they were working with the prototype nova code and zk
20:39:24 <harlowja> it might be a 'easy' move into this library (not sure)
20:39:39 <harlowja> adrian_otto are u talking about that code, or the nova zk code?
20:39:47 <harlowja> *which is the service group stuff
20:40:25 <changbl> harlowja, ipersky , i can also help with zk backend
20:40:36 <harlowja> changbl great :)
20:41:03 <harlowja> awesome stuff :)
20:41:06 <adrian_otto> harlowja: I was referring to what I saw in gerrit for Nova
20:41:10 <harlowja> kk
20:41:53 <harlowja> afaik there is some ongoing work to move to 1 zk library in nova, so that if taskflow gets used it won't pull in a secondary zk library
20:42:05 <harlowja> kazoo seems to be the library thats the best supported nowadays
20:42:13 <adrian_otto> yes
20:42:17 <harlowja> #link https://github.com/python-zk/kazoo/
20:42:47 <harlowja> cool, so lets open for any other discussion, i think we covered all i want to talk about, unless others have topics :)
20:43:04 <harlowja> great progress is happening imho :)
20:43:41 <harlowja> #topic open-discussion
20:43:42 <jlucci> agreed - I'll try to have an example of a celery/distributed pattern soon for people to see
20:44:04 <harlowja> btw, some example that i was working on last night that people might like
20:44:12 <adrian_otto> jlucci: thanks for your efforts on this. I know you've really been cranking on it.
20:44:28 <harlowja> yes thanks jlucci :)
20:44:36 <alexheneveld> jlucci: looking forward to it
20:44:37 <jlucci> haha np Glad to be helping. It's an exciting project
20:44:40 <ipersky> changbl great i'll add you to discussion when start asking dumb questions on backend design
20:44:45 <ipersky> )
20:44:47 <kebray> Do we have a target timeline/goal in mind for delivering a solid use case (e.g. for Cinder, or Heat, etc.) that we should be aware that we are working toward?
20:45:01 <harlowja> hmmm, hemna any thoughts?
20:45:21 <harlowja> #link https://github.com/harlowja/TaskFlow/blob/new-hotness/taskflow/tests/unit/test_memory.py#L161 for the neat local threaded jobboard workflow stuff if anyone is interested
20:45:50 <hemna> good progress, I'd just like to start playing with it and see how it fits in
20:46:01 <harlowja> i think havana-1 is soon right, so my guess not that one
20:46:04 <harlowja> havana-2 maybe?
20:46:15 <hemna> when is the H1 date?
20:46:27 <adrian_otto> very close
20:46:40 <harlowja> #link https://wiki.openstack.org/wiki/Havana_Release_Schedule
20:46:43 <adrian_otto> next week
20:46:49 <adrian_otto> H2: July 19
20:46:57 <hemna> ok H2 sounds more reasonable
20:46:57 <harlowja> i think H2 is more realistic :)
20:47:22 <harlowja> H1 could happen, but i think that would be pushing too hard
20:47:24 <hemna> next week is kinda a wash for most folks anyway with the US holiday
20:47:25 <kebray> ok, cool. no pressure, just good to have goals :-)
20:47:29 <harlowja> def
20:47:32 <changbl> ipersky, np~
20:47:39 <adrian_otto> shoot for getting basic primitives in H1
20:47:47 <adrian_otto> and then code that implements them in H2
20:47:56 <harlowja> sure, so that brings up a good question that i'm not sure about
20:48:04 <hemna> would it make any sense to pull this into cinder as is for H1 ?
20:48:07 <harlowja> do stackforge projects follow h1,h2...
20:48:16 <harlowja> we can of course make sure we follow that
20:48:21 <harlowja> hemna i don't think so
20:48:34 <hemna> ok
20:48:48 <harlowja> needs a little more time to mature i think
20:49:08 <harlowja> but i think its 'playground ready'
20:49:08 <jlucci> +1 for h2
20:49:21 <harlowja> +1
20:49:21 <nsavin> +1 for h2
20:49:26 <hemna> I just thought it'd be nice to get in the gating and all. But didn't we already say that stackforge can do the same?
20:49:33 <harlowja> hemna ya
20:49:39 <hemna> ok, then +1 for H2 then
20:49:47 <harlowja> stackforge does all the gating, and ci tests and such
20:49:57 <hemna> I'd like to see if I can get a few simple use cases plugged in for cinder at H2 as well
20:50:06 <hemna> volume creation, snapshot creation
20:50:21 <harlowja> def
20:50:38 <harlowja> hemna https://wiki.openstack.org/wiki/StructuredWorkflows#Cinder
20:50:39 <hemna> I don't want to wait for H3 to get this used :)
20:50:48 <harlowja> john put some of those up, so we can have a place to track them
20:50:53 <hemna> ok cool
20:51:17 <harlowja> H2 sounds reasonable to me
20:51:22 <harlowja> and lets it mature to a decent level
20:51:34 <hemna> ok lets plan on getting it in H2 along with create volume and create snapshot
20:51:44 <adrian_otto> +1
20:51:59 <harlowja> cool
20:52:38 <harlowja> also for ongoing discussions we can use the #openstack-state-management room
20:52:44 <harlowja> probably easier than email and such
20:52:47 <hemna> we'll have a better idea on the api as well as the work required to cover the rest of cinder for H3 then
20:52:53 <hemna> ok
20:53:05 <harlowja> hemna sounds great :)
20:53:13 <harlowja> if all of cinder would use this, that would be incredible imho
20:53:19 * harlowja would be super happy
20:53:22 <hemna> that's the plan :)
20:53:38 <harlowja> very cool
20:54:31 <harlowja> #agreed aim for a few use-cases for cinder in H2
20:54:41 * adrian_otto departing to catch a flight
20:55:02 <harlowja> if anyone wants to try it in nova and such, that'd be cool too, i might be able to try that out, we'll see
20:55:18 <harlowja> or reddwarf, or anywhere people feel :)
20:55:39 <harlowja> ok dokie, any other stuff to talk about?
20:55:49 <hemna> nice job man
20:55:59 <harlowja> thx!
20:56:05 <harlowja> #endmeeting