#openstack-meeting log

20:00:29 <harlowja> #startmeeting state-management
20:00:30 <openstack> Meeting started Thu May 16 20:00:29 2013 UTC.  The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:34 <openstack> The meeting name has been set to 'state_management'
20:00:36 <harlowja> howdy all!
20:00:39 <adrian_otto> hi
20:00:41 <kebray> hi
20:00:50 <harlowja> hi hi
20:01:04 <changbl> hello~
20:01:11 <jlucci> howdy
20:01:21 <harlowja> #link https://wiki.openstack.org/wiki/Meetings/StateManagement#Agenda_for_next_meeting
20:01:33 <harlowja> jgriffith yt
20:02:24 <nsavin> hi
20:02:39 <harlowja> hey, so i guess we can start, anyone else can wander in :)
20:02:39 <ipersky> hi
20:02:59 <harlowja> so lots of good stuff happening, where to start! :)
20:03:14 <ipersky> :)
20:03:37 <harlowja> i think there is enough activity that we decided to go directly to stackforge, and then as we flesh out cinder (and elsewhere) we can start hooking it in there
20:03:50 <harlowja> jlucci from rackspace is helping get the stackforge stuff going
20:04:27 <harlowja> #link https://bugs.launchpad.net/openstack-ci/+bug/1179754
20:04:29 <uvirtbot> Launchpad bug 1179754 in openstack-ci "Create taskflow-core gerrit group" [Wishlist,Incomplete]
20:04:43 <harlowja> i started that, but it seems like it's more the gerrit config files that matter, so that's good news, pretty self-service
20:05:19 <harlowja> so as far as cinder integration, jgriffith  has written up some useful initial diagrams of what cinder is doing and the issues it has
20:05:42 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflows#Cinder
20:06:04 <harlowja> create volume and create snapshot currently suffer from the lack of ability to revert correctly it seems
20:06:48 <harlowja> the ntt folks have also updated evacuate/migrate  in https://wiki.openstack.org/wiki/StructuredWorkflows#Nova as well for those that are interested
20:06:58 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflows#Nova
20:07:29 <harlowja> so thats all good stuff, and shows that we are making progress in step #1, understanding the problems to fix :)
20:08:05 <harlowja> if anyone is interested they can add their potential use case there also :)
20:08:32 <ipersky> wow
20:08:40 <ipersky> RunInstance diagram looks impressive
20:08:50 <harlowja> scary or impressive ;)
20:08:57 <harlowja> scary impressive, lol
20:09:08 <ipersky> )
20:09:11 <harlowja> i also updated the primitives wiki with some minor changes
20:09:16 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflowPrimitives
20:09:40 <harlowja> added the concept of a 'job claim' which is how a job gets claimed by something about to work on it
20:09:58 <harlowja> *could be ZK based, DB based...
20:10:32 <harlowja> also started adding some of the underlying patterns that taskflow might be able to help organize code 'states' with
20:10:34 <harlowja> #link https://wiki.openstack.org/wiki/StructuredWorkflowPrimitives#Patterns
20:10:45 <alexheneveld> hi -- sorry late.  but just in time -- that was the question i was going to ask
20:11:07 <alexheneveld> can some of this be tied in to the existing DB entities somehow…
20:11:19 <harlowja> thats the plan
20:11:36 <alexheneveld> excellent!
20:12:06 <harlowja> we'll see what kind of schema changes might have to happen, but hopefully not so many :)
20:12:18 <adrian_otto> guys, that is not going to work for lock management
20:12:24 <harlowja> agreed
20:12:33 <adrian_otto> it works fine for persisting state, but not for managing concurrency.
20:13:22 <harlowja> right, but i can see certain uses (say in nova) where they don't have much concurrency to begin with, so they can still take advantage of the other parts of the library (just not the concurrent job claiming stuff)
20:13:31 <adrian_otto> so if the only expectation is that a single thread in a single sequence is going to use the DB to track state transitions, then fine
20:13:41 <harlowja> right
20:13:44 <adrian_otto> but as soon as you bring in multiple workers, that breaks down
20:13:55 <harlowja> yup
20:14:10 <harlowja> i think we need to put a big warning on the DB implementations that say that
20:14:16 <alexheneveld> adrian_otto: some db's let you … but is the concern portability?
20:14:23 <adrian_otto> yes
20:14:31 <nsavin> but zk should be fine for concurrency
20:14:41 <adrian_otto> there is a feature in MySQL to do application locks, but you can't expect to put an ORM in front of that
20:14:42 <jlucci> We don't want to commit to zk
20:14:50 <adrian_otto> I'm not suggesting zk
20:15:09 <adrian_otto> but that you can't rely on SQLite or MySQL to act as a backing store for a locking implementation
20:15:10 <alexheneveld> some generic mutex / notify service
20:15:15 <alexheneveld> ?
20:15:25 <adrian_otto> because of MVCC
20:16:04 <adrian_otto> yes, something as simple as something with a backend on a single node with flock() or fcntl() would even work
20:16:33 <adrian_otto> but we need to thing of solving concurrency control and data persistence as separate implementations
20:16:38 <adrian_otto> s/thing/think/
20:16:49 <adrian_otto> that are coordinated.
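
The single-node flock() idea mentioned above can be sketched as follows. This is only an illustration of an advisory file lock kept separate from any state persistence; the helper names `file_lock` and `try_lock` are hypothetical and are not part of taskflow. It assumes a Unix-like system, since Python's `fcntl` module is not available on Windows.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    Hypothetical helper for illustration; blocks until the lock is free.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        # Closing the descriptor releases any flock() lock held through it.
        os.close(fd)


def try_lock(path):
    """Non-blocking attempt: return an open fd on success, None if already held."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A worker would wrap its state transition in `with file_lock(...)`, while the state itself is still written to whatever persistence backend is in use. This only coordinates processes on one machine.
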
20:16:57 <harlowja> so i think there are 2 pieces of this library that it will provide, 1 is the job posting, ownership, concurrency stuff, which likely a DB is not gonna work, but then there is the task/workflow organization, which can be used almost separately from the distributed stuff in a way
20:16:58 <maoy> i must be missing something. orchestration tasks are long running, it's probably a bad idea to hold db lock for that long. need to use some entries to "safeguard" instead
20:17:30 <adrian_otto> harlowja: exactly.
20:18:00 <harlowja> cool, i think seeing an example might help :)
20:18:09 <adrian_otto> maoy: we need a lock primitive. Regardless if they are coarse locks or fine grained locks.
20:18:18 <alexheneveld> harlowja: could job posting + ownership still be done in the DB ?  just have a very lightweight concurrency control service.
20:18:29 <adrian_otto> yes
20:19:28 <harlowja> alexheneveld i believe so, it really depends on the tasks that compose the workflow, simple stuff like in most of openstack i think can be made pretty lightweight, but more complicated things like user-defined workflows shouldn't be restricted by the same code, we should try to enable both :)
20:20:10 <harlowja> say like for most of openstack, u could get away with posting (which is really a mq message) and ownership can be setting a field in a db
20:20:20 <harlowja> that works ok, but then said owner isn't resilient to failure
20:20:31 <harlowja> *which might be ok for openstack binaries which can be restarted by init.d and such
20:21:19 <alexheneveld> harlowja: but you can have a health check process which determines whether owner is gone away and/or timeout on the job...
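
The ownership-plus-timeout idea discussed here can be sketched as a claim that carries a lease. This is a minimal in-memory illustration under stated assumptions: the class `JobBoard` and its methods are hypothetical names, not the taskflow API, and a real version would keep the claim in a DB field or a ZK node as discussed in the meeting.

```python
import threading
import time


class JobBoard:
    """Hypothetical in-memory job claims with a lease that expires."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._claims = {}            # job_id -> (owner, expires_at)
        self._guard = threading.Lock()

    def claim(self, job_id, owner):
        """Claim or renew a job; fail if another owner holds a live lease."""
        now = self._clock()
        with self._guard:
            current = self._claims.get(job_id)
            if current is not None:
                holder, expires_at = current
                if holder != owner and expires_at > now:
                    return False
            self._claims[job_id] = (owner, now + self._lease)
            return True

    def owner_of(self, job_id):
        """Return the current owner, or None if unclaimed or the lease lapsed."""
        now = self._clock()
        with self._guard:
            current = self._claims.get(job_id)
            if current is None or current[1] <= now:
                return None
            return current[0]
```

An owner that keeps calling `claim` renews its lease; one that dies stops renewing, so the job becomes claimable again after `lease_seconds` without a separate health-check process.
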
20:21:40 <hemna> for the first drop of this, I'd like to see something simple
20:21:41 <harlowja> yes, which might be fine for openstack, but not acceptable for users i think
20:21:42 <alexheneveld> which touches on scheduler… the two are pretty closely related i think
20:21:42 <hemna> and not over engineered
20:21:52 <harlowja> hemna agreed
20:22:02 <harlowja> but we should discuss the complex cases, since it will come up
20:22:03 <hemna> lets not solve world hunger here
20:22:12 <harlowja> imho the complex cases are just more advanced primitives
20:22:25 <harlowja> the simple cases just use dumbed-down primitives :)
20:22:34 <adrian_otto> agreed
20:22:44 <nsavin> agreed
20:22:49 <harlowja> so thats why having a solid primitive foundation is pretty important
20:22:55 <hemna> I'm fine with using a db backing for starters.
20:23:05 <harlowja> agreed
20:23:25 <harlowja> so the taskflow library right now has a memory backed part to start, new this week, mostly works :-P
20:23:34 <hemna> :)
20:23:43 <harlowja> example usage
20:23:45 <harlowja> #link http://paste.openstack.org/show/37363/
20:24:02 <harlowja> that would be something like the cinder workflow that they have
20:24:21 * harlowja lets people digest that for a sec
20:25:09 * hemna chews
20:25:09 <harlowja> using the primitives, even with simple memory backends we can experiment with reverting, resumption and all that
20:26:31 <harlowja> it gets interesting when u start using a database backed logbook for example
20:26:33 <adrian_otto> looks good to me. I like that as a simple starting point.
20:26:55 <harlowja> *note that there is no concurrency there, no distributed jobs and such
20:27:08 <harlowja> but imho u get a lot just with this type of usage
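
The simple, non-concurrent usage described here (run steps in order, revert the completed ones on failure) can be sketched as below. This is not the code from the paste linked above and not the taskflow API; `Task`, `run_flow`, and the create-volume step names are hypothetical stand-ins for something like the cinder workflow.

```python
class Task:
    """One reversible step: `apply` does the work, `revert` undoes it."""

    def __init__(self, name, apply, revert):
        self.name = name
        self.apply = apply
        self.revert = revert


def run_flow(tasks, context):
    """Run tasks in order; on failure, revert completed tasks in reverse order."""
    completed = []
    try:
        for task in tasks:
            task.apply(context)
            completed.append(task)
    except Exception:
        for task in reversed(completed):
            task.revert(context)
        raise


def reserve_quota(ctx):
    ctx["quota_used"] += 1


def release_quota(ctx):
    ctx["quota_used"] -= 1


def create_db_entry(ctx):
    ctx["volumes"].append("vol-1")


def delete_db_entry(ctx):
    ctx["volumes"].remove("vol-1")


def export_volume(ctx):
    raise RuntimeError("backend unavailable")  # simulated failure


# Hypothetical create-volume flow; the last step always fails in this sketch.
CREATE_VOLUME = [
    Task("reserve_quota", reserve_quota, release_quota),
    Task("create_db_entry", create_db_entry, delete_db_entry),
    Task("export_volume", export_volume, lambda ctx: None),
]
```

Running `run_flow(CREATE_VOLUME, ctx)` raises the simulated error after releasing the quota and deleting the DB entry, so the context ends up as it started.
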
20:27:08 <hemna> yah I think that's good
20:27:48 <adrian_otto> yep, we can start with this, and revisit the concurrency stuff in a subsequent effort
20:28:03 <harlowja> well jlucci is helping make the distributed stuff work :) so its all happening ;)
20:28:20 <harlowja> but i think the distributed stuff is a separate 'pattern', not the only pattern, but one of them
20:28:29 <adrian_otto> yep
20:28:56 <harlowja> some peoples usage will just be the simple case, like cinder for example, don't care about distributing your job, just want to have a workflow revert, then this might work fine (or almost be fine +- some other changes)
20:29:06 <harlowja> nova i think is similar with conductor
20:29:19 <harlowja> since they have their own 'job posting/disribution' mechanisms
20:29:26 <harlowja> *distribution
20:29:44 <harlowja> its not perfect yet, but i think the example there shows the potential :)
20:29:50 <alexheneveld> harlowja: code looks clean
20:29:58 <harlowja> thx!
20:31:19 <harlowja> so i am hoping to add some more tests and such, there are a few, and then getting a db based logbook, which will allow for resumption across a process (but not concurrently acting on it)
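
A db-based logbook that allows resumption (but not concurrent workers) might look roughly like the sketch below. It uses SQLite from the Python standard library purely for illustration; the `LogBook` class, the `task_log` table, and `resume_flow` are hypothetical names, not the taskflow schema or API.

```python
import sqlite3


class LogBook:
    """Hypothetical persistent record of which tasks of a flow have finished."""

    def __init__(self, conn):
        self._conn = conn
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS task_log ("
                "flow_id TEXT, task_name TEXT, "
                "PRIMARY KEY (flow_id, task_name))"
            )

    def mark_done(self, flow_id, task_name):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR IGNORE INTO task_log VALUES (?, ?)",
                (flow_id, task_name),
            )

    def is_done(self, flow_id, task_name):
        row = self._conn.execute(
            "SELECT 1 FROM task_log WHERE flow_id = ? AND task_name = ?",
            (flow_id, task_name),
        ).fetchone()
        return row is not None


def resume_flow(flow_id, tasks, logbook):
    """Run (name, callable) pairs, skipping those already logged as done.

    Returns the names executed in this run. Assumes a single worker per flow.
    """
    ran = []
    for name, func in tasks:
        if logbook.is_done(flow_id, name):
            continue
        func()
        logbook.mark_done(flow_id, name)
        ran.append(name)
    return ran
```

If a process dies partway through, calling `resume_flow` again with the same `flow_id` against the same database picks up at the first task not yet logged. Nothing here stops two processes from resuming the same flow at once, which is the concurrency gap raised earlier in the meeting.
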
20:31:30 <ipersky> harlowja: i like it too. is there some parts i can help implement? backends maybe?
20:31:45 <harlowja> sure ipersky if u want to start on the db backend stuff, that might be neat :)
20:32:17 <harlowja> i think we need that one before we can hook-in to cinder, although i think we can start working with jgriffith  and getting some code in even at this stage
20:32:19 <jlucci> +1 for backend work. :D
20:32:39 <alexheneveld> harlowja: could this be backed by celery?  should it?
20:32:53 <harlowja> after it moves to stackforge, and we all think its ok, we can pypi it and start using it in nova, cinder, ...
20:33:01 <harlowja> alexheneveld so excellent question
20:33:05 <harlowja> jlucci :)
20:33:07 <kebray> I think jlucci may be working on some celery goodness, no?
20:33:08 <jlucci> ha
20:33:11 <jlucci> Yeah
20:33:22 <jlucci> So the distributed pattern right now is using celery
20:33:31 <alexheneveld> cool
20:34:02 <harlowja> jlucci is helping a lot in making sure the primitives work there, or if not, how can we adjust them so that it does
20:34:09 <hemna> I'll start looking at the usages for it and see if it makes sense to dink with cinder yet
20:34:26 <harlowja> since i think celery (the distributed pattern) should also be possible if people want to use it that way
20:34:30 <jlucci> And so far the primitives seem to translate pretty well
20:34:46 <harlowja> hemna thanks, it might be a few weeks off, but the foundation i think is getting less slushy
20:34:58 <harlowja> jlucci is having to put up with me changing it, sorry jlucci  :)
20:35:01 <hemna> ok
20:35:08 <jlucci> S'all good. :P
20:35:21 <hemna> it might be a good exercise for me just to start learning the api and where stuff might go in cinder
20:35:36 <harlowja> sure, feel free to mess around with it
20:35:57 <harlowja> #link https://github.com/Yahoo/TaskFlow/
20:36:08 <harlowja> it will probably end up at stackforge soon
20:36:36 <harlowja> if anyone wants to try a ZK backend, it might be neat also
20:36:59 <harlowja> or think about how conductor could start to use this, i have some ideas, i think john barriet (sp?) might have some too
20:37:20 <ipersky> I'd like to look into it
20:37:26 <ipersky> i mean ZK )
20:37:33 <harlowja> ipersky cool
20:37:35 <harlowja> any new use cases would be really cool, the more the better :)
20:37:41 <harlowja> more betterness for all ;)
20:38:04 <adrian_otto> ipersky: did you see the little zk code stub that's already in there?
20:38:14 <ipersky> well, still first have to read the code/docs and ast a lot of dumb questions
20:38:26 <ipersky> s/ast/ask/
20:38:36 <adrian_otto> there is a place to hang that once you look at the source
20:38:39 <harlowja> ipersky i can connect u with the nttdata folks also, they were working with the prototype nova code and zk
20:39:24 <harlowja> it might be an 'easy' move into this library (not sure)
20:39:39 <harlowja> adrian_otto are u talking about that code, or the nova zk code?
20:39:47 <harlowja> *which is the service group stuff
20:40:25 <changbl> harlowja, ipersky , i can also help with zk backend
20:40:36 <harlowja> changbl great :)
20:41:03 <harlowja> awesome stuff :)
20:41:06 <adrian_otto> harlowja: I was referring to what I saw in gerrit for Nova
20:41:10 <harlowja> kk
20:41:53 <harlowja> afaik there is some ongoing work to move to 1 zk library  in nova, so that if taskflow gets used it won't pull in a secondary zk library
20:42:05 <harlowja> kazoo seems to be the library thats the best supported nowadays
20:42:13 <adrian_otto> yes
20:42:17 <harlowja> #link https://github.com/python-zk/kazoo/
20:42:47 <harlowja> cool, so lets open for any other discussion, i think we covered all i want to talk about, unless others have topics :)
20:43:04 <harlowja> great progress is happening imho :)
20:43:41 <harlowja> #topic open-discussion
20:43:42 <jlucci> agreed - I'll try to have an example of a celery/distributed pattern soon for people to see
20:44:04 <harlowja> btw, some example that i was working on last night that people might like
20:44:12 <adrian_otto> jlucci: thanks for your efforts on this. I know you've really been cranking on it.
20:44:28 <harlowja> yes thanks jlucci  :)
20:44:36 <alexheneveld> jlucci: looking forward to it
20:44:37 <jlucci> haha np Glad to be helping. It's an exciting project
20:44:40 <ipersky> changbl great i'll add you to discussion when start asking dumb questions on backend design
20:44:45 <ipersky> )
20:44:47 <kebray> Do we have a target timeline/goal in mind for delivering a solid use case (e.g. for Cinder, or Heat, etc.) that we should be aware that we are working toward?
20:45:01 <harlowja> hmmm, hemna any thoughts?
20:45:21 <harlowja> #link https://github.com/harlowja/TaskFlow/blob/new-hotness/taskflow/tests/unit/test_memory.py#L161 for the neat local threaded jobboard workflow stuff if anyone is interested
20:45:50 <hemna> good progress, I'd just like to start playing with it and see how it fits in
20:46:01 <harlowja> i think havana-1 is soon right, so my guess not that one
20:46:04 <harlowja> havana-2 maybe?
20:46:15 <hemna> when is the H1 date?
20:46:27 <adrian_otto> very close
20:46:40 <harlowja> #link https://wiki.openstack.org/wiki/Havana_Release_Schedule
20:46:43 <adrian_otto> next week
20:46:49 <adrian_otto> H2: July 19
20:46:57 <hemna> ok H2 sounds more reasonable
20:46:57 <harlowja> i think H2 is more realistic :)
20:47:22 <harlowja> H1 could happen, but i think that would be pushing too hard
20:47:24 <hemna> next week is kinda a wash for most folks anyway with the US holiday
20:47:25 <kebray> ok, cool.  no pressure, just good to have goals :-)
20:47:29 <harlowja> def
20:47:32 <changbl> ipersky, np~
20:47:39 <adrian_otto> shoot for getting basic primitives in H1
20:47:47 <adrian_otto> and then code that implements them in H2
20:47:56 <harlowja> sure, so that brings up a good question that i'm not sure about
20:48:04 <hemna> would it make any sense to pull this into cinder as is for H1 ?
20:48:07 <harlowja> do stackforge projects follow h1,h2...
20:48:16 <harlowja> we can of course make sure we follow that
20:48:21 <harlowja> hemna i don't think so
20:48:34 <hemna> ok
20:48:48 <harlowja> needs a little more time to mature i think
20:49:08 <harlowja> but i think its 'playground ready'
20:49:08 <jlucci> +1 for h2
20:49:21 <harlowja> +1
20:49:21 <nsavin> +1 for h2
20:49:26 <hemna> I just thought it'd be nice to get in the gating and all.  But didn't we already say that stackforge can do the same?
20:49:33 <harlowja> hemna ya
20:49:39 <hemna> ok, then +1 for H2 then
20:49:47 <harlowja> stackforge does all the gating, and ci tests and such
20:49:57 <hemna> I'd like to see if I can get a few simple use cases plugged in for cinder at H2 as well
20:50:06 <hemna> volume creation, snapshot creation
20:50:21 <harlowja> def
20:50:38 <harlowja> hemna https://wiki.openstack.org/wiki/StructuredWorkflows#Cinder
20:50:39 <hemna> I don't want to wait for H3 to get this used :)
20:50:48 <harlowja> john put some of those up, so we can have a place to track them
20:50:53 <hemna> ok cool
20:51:17 <harlowja> H2 sounds reasonable to me
20:51:22 <harlowja> and lets it mature to a decent level
20:51:34 <hemna> ok lets plan on getting it in H2 along with create volume and create snapshot
20:51:44 <adrian_otto> +1
20:51:59 <harlowja> cool
20:52:38 <harlowja> also for ongoing discussions we can use the #openstack-state-management room
20:52:44 <harlowja> probably easier than email and such
20:52:47 <hemna> we'll have a better idea on the api as well as the work required to cover the rest of cinder for H3 then
20:52:53 <hemna> ok
20:53:05 <harlowja> hemna sounds great :)
20:53:13 <harlowja> if all of cinder would use this, that would be incredible imho
20:53:19 * harlowja would be super happy
20:53:22 <hemna> that's the plan :)
20:53:38 <harlowja> very cool
20:54:31 <harlowja> #agreed aim for a few use-cases for cinder in H2
20:54:41 * adrian_otto departing to catch a flight
20:55:02 <harlowja> if anyone wants to try it in nova and such, that'd be cool too, i might be able to try that out, we'll see
20:55:18 <harlowja> or reddwarf, or anywhere people feel :)
20:55:39 <harlowja> ok dokie, any other stuff to talk about?
20:55:49 <hemna> nice job man
20:55:59 <harlowja> thx!
20:56:05 <harlowja> #endmeeting