20:01:12 <harlowja> #startmeeting state-management
20:01:13 <openstack> Meeting started Thu May 23 20:01:12 2013 UTC. The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:01:16 <openstack> The meeting name has been set to 'state_management'
20:01:19 <harlowja> hi everyone! :)
20:01:20 <adrian_otto> hi
20:01:33 <harlowja> howdy
20:01:43 <jlucci> hola
20:02:05 <harlowja> oops, forgot to send out an agenda, well we can make one up
20:02:26 <harlowja> lets wait a few for others
20:02:41 <harlowja> jlucci i think i can mesh the rollbackaccumulator into our stuff
20:02:54 <harlowja> that way everyone will be happy
20:02:55 <jlucci> Sounds good (:
20:03:09 <maoy> have a conflict. will check the log later. have fun guys.
20:03:15 <harlowja> sounds good, thx maoy
20:03:39 <harlowja> #topic status
20:04:08 <harlowja> so if we have people working on taskflow (or integrating it) we can use this little topic for any kind of status on what u are doing, i can go first
20:04:48 <harlowja> i've been just working on the library structure, and impls, and have been experimenting with how cinder might change to use said library
20:05:04 <harlowja> #link https://review.openstack.org/#/c/29862/
20:05:27 <harlowja> also been helping jlucci get her db/celery stuff in
20:05:58 <harlowja> and trying to see how we can get more nova usage and adjusting to see how we can make that happen in a simple (not major restructure) way
20:06:07 <harlowja> thats it for me :)
20:06:55 <jlucci> Shweet. Guess I'll go next
20:06:58 <harlowja> sureeee
20:07:14 <jlucci> So, spent a lot of time getting a database backend setup
20:07:27 <jlucci> All of that seems to be working appropriately (currently the only implementation is sql)
20:07:43 <harlowja> sweet!
20:07:48 <jlucci> and kchenweijie is working on some unit tests for all of that
20:07:59 <harlowja> *oh ya, i've been doing unit tests this week also
20:08:06 <jlucci> So, that along with some basic config stuff got pulled into the code
20:08:11 <jlucci> :P yay unit tests
20:08:15 <harlowja> how's the stackforge move going?
20:08:22 <jlucci> I'm obviously a gerrit-noob
20:08:23 <jlucci> ha
20:08:32 <harlowja> np :)
20:08:38 <harlowja> #link https://review.openstack.org/#/c/29981/
20:08:47 <jlucci> Accidentally put in two requests, went back, squashed my previous commit into the first one, and pushed that back up for review
20:08:49 <harlowja> i put up a small comment, the infra people probably want it squashed
20:09:02 <harlowja> *so that it doesn't have 2 change-ids
20:09:18 <harlowja> change-ids are how gerrit associates commits so 2 might confuse it
20:10:02 <jlucci> Oh, well I abandoned the first commit/review
20:10:18 <jlucci> https://review.openstack.org/#/c/29981/ has all the commits that need to be merged into the stackforge stuff
20:10:51 <harlowja> ya, that one looks ok, just might want to remove one of the 'Change-Id: ' lines
20:11:07 <jlucci> Oh, snap. Didn't see the second one
20:11:11 <jlucci> Oh, gerrit
20:11:14 <harlowja> :)
20:11:19 <jlucci> So, will fix that shortly. haha
20:11:25 <harlowja> cool
20:11:33 <harlowja> sounds good
20:11:55 <harlowja> anyone else want to report any kind of useful status info :)
20:12:00 <jlucci> As for the celery stuff, after a talk I had today, I actually think I'm going to go back and re-implement it in a different way. Something more distributed, less workflow-oriented
20:12:07 <harlowja> ok
20:12:13 <jlucci> Yeah, and that covers my stuffs
20:12:20 <harlowja> sweet
20:12:39 <harlowja> #topic use-cases
20:13:12 <harlowja> if devananda is around, his baremetal stuff might have a new use-case we can get involved in
20:13:41 <harlowja> not sure if he is, but anyway something to think about
20:13:43 <harlowja> #link https://review.openstack.org/#/c/29804/
20:14:02 <harlowja> he's the first that i think is trying to do locking
20:14:02 <devananda> \o
20:14:10 <harlowja> hi devananda !
20:14:38 <harlowja> just was mentioning your review, and how taskflow pep's can think about how to provide that use-case
20:14:54 <devananda> cool :)
20:15:09 <devananda> want me to say anything about what we're doing?
20:15:18 <harlowja> sure
20:15:26 <jlucci> please (:
20:15:32 <devananda> k
20:15:34 <harlowja> acquiring locks on stuff, haha
20:15:50 <devananda> to support having multiple manager services in one ironic deployment
20:16:02 <devananda> need to coordinate which one is acting on what physical resource
20:16:14 <devananda> eg, who's talking to the BMC
20:16:22 <devananda> so there are 2 levels of locks
20:16:34 <harlowja> BMC == bare metal controller?
20:16:38 <devananda> ya
20:16:42 <devananda> IPMI card or whatever
20:16:50 <harlowja> k, thx
20:17:04 <devananda> one lock in the db, to prevent another manager process from doing _anything_ with that BMC
20:17:15 <devananda> and then a semaphore inside the manager process
20:17:25 <devananda> so only one thread can do things that require exclusive access
20:17:26 <devananda> like writes
20:17:32 <devananda> but other threads can still do reads to the BMC
20:17:37 <devananda> [eol]
20:17:51 <harlowja> interesting
20:18:32 <jlucci> We could definitely carry over the blocking manager processes
20:18:41 <adrian_otto> I'd like to suggest that there is no such thing as a lock in the DB, unless all access to the db is limited to a single thread in a single process.
20:18:44 <harlowja> whats an example of something that would happen simultaneously (by different threads)
20:19:09 <harlowja> adrian_otto yes, its a very good point
20:19:21 <adrian_otto> otherwise you get race conditions with the MVCC implementations of all popular databases
20:19:58 <devananda> harlowja: 1 thd doing a deployment. 1 thd polling power state
20:20:12 <harlowja> devananda thanks
20:20:25 <adrian_otto> state machines and MVCC systems are fundamentally incompatible, such that MVCC must not be a component of a state machine.
20:20:33 <devananda> adrian_otto: update .. set col=X where col=Null and id=123;
20:21:13 <devananda> adrian_otto: at least with innodb's mvcc, i believe that will work
20:21:22 <devananda> but, in principle, i agree :)
20:21:47 <adrian_otto> that will work if there is no concurrency at the time of the update.
20:22:00 <devananda> even if there is. only one writer will succeed
20:22:10 <devananda> others will timeout or fail
20:22:13 <adrian_otto> yes
20:22:51 <adrian_otto> you will get different transaction commit results from SQLite and InnoDB for example
20:23:01 <devananda> yep
20:23:08 <devananda> and fwiw, i have no idea how postgres will behave :(
20:23:08 <harlowja> ya, which is where using sqlalchemy will bite us
20:23:22 <devananda> so, if there is another / better solution, i'm all ears :)
20:23:26 <adrian_otto> so if the idea is to make a db implementation that lets you put arbitrary databases behind it… then this is going to flop.
20:23:43 <adrian_otto> I'm saying go ahead and use the DB for persistence of state transitions
20:23:52 <harlowja> right, so far jlucci is working on that
20:24:24 <adrian_otto> but you need an abstraction on top of the persistence layer that manages locks and eliminates the concurrency edge cases.
20:24:28 <harlowja> the locking part we have somewhat (basically job ownership should be atomic), but we do not have this type of locking yet
20:24:44 <adrian_otto> right, that's the root of my concern.
20:25:02 <harlowja> sure
20:25:06 <harlowja> understandable
20:25:51 <adrian_otto> so if the goal is to start with something simple, and iterate, then funnel all state transitions through an intentional bottleneck where you manage the concurrency.
20:26:25 <adrian_otto> one such approach is to expose an API that serializes access to the database without relying on the database for the locking
20:26:40 <adrian_otto> and all concurrent clients use that API
20:26:54 <harlowja> like a DB proxy :(
20:26:56 <adrian_otto> there are other solutions as well, but that one is not complicated
20:27:03 <adrian_otto> yes, you can think of it that way.
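[Editor's note] devananda's `update .. set col=X where col=Null and id=123` is a compare-and-swap: the UPDATE's WHERE clause only matches an unclaimed row, so exactly one caller sees an affected-row count of 1 and wins the lock. A minimal sketch using stdlib sqlite3; the `node_locks` table and `owner` column are illustrative, not Ironic's actual schema:

```python
import sqlite3

def acquire(conn, node_id, owner):
    # Only an unclaimed row (owner IS NULL) matches, so at most one
    # concurrent caller gets rowcount == 1 and thereby holds the lock.
    cur = conn.execute(
        "UPDATE node_locks SET owner = ? WHERE id = ? AND owner IS NULL",
        (owner, node_id))
    conn.commit()
    return cur.rowcount == 1

def release(conn, node_id, owner):
    # Symmetric compare-and-swap: only the current owner can clear it.
    cur = conn.execute(
        "UPDATE node_locks SET owner = NULL WHERE id = ? AND owner = ?",
        (node_id, owner))
    conn.commit()
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_locks (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO node_locks (id, owner) VALUES (123, NULL)")
```

As adrian_otto and devananda note just below, the behavior of the losing writer under real concurrency (block, timeout, or fail) differs between SQLite, InnoDB, and postgres, which is exactly why hiding this behind sqlalchemy is risky.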
20:27:28 <devananda> a separate special db proxy
20:27:32 <devananda> just used for lock mgmt
20:27:38 <harlowja> could be
20:27:45 <devananda> since it would probably make everything else really slow :)
20:27:45 <adrian_otto> yes
20:28:10 <adrian_otto> well, if it has a reader/writer lock implementation it would not necessarily be slow
20:28:10 <harlowja> like something like zookeeper, haha
20:28:38 <adrian_otto> but if there are lots of concurrent writers, then by definition it would slow it down.
20:28:56 <adrian_otto> and I'd argue that's the desired outcome
20:29:30 <harlowja> ya, i wonder if time should be spent inventing said service (which is like a mini-serializing-ZK), or just recommend people use ZK, idk
20:29:57 <devananda> my point is, some db traffic doesn't need write locks around it. it's really only the _establishing_ of a lock that requires it
20:30:05 <devananda> eg, in my use case
20:30:23 <devananda> once a given process has that lock, it should be free to write until it releases the lock
20:30:31 <devananda> since no one else will touch that resource
20:30:43 <devananda> the same model probably works in nova and elsewhere
20:31:36 <devananda> "lock instance" should be non-concurrent. "write stuff" could be parallel after that.
20:31:48 <adrian_otto> sure, and we could implement that simply by having a single manager process that handles issuing you the lock.
20:31:58 <devananda> right
20:32:28 <adrian_otto> but at no time shall any two manager processes try to use the same lock table in the db
20:33:06 <adrian_otto> you also need to require that any readers also get a lock from the same authority that the writer's lock came from
20:33:29 <adrian_otto> they can't just expect to look in the db, and if the lock is in the table, then enter a polling loop
20:33:32 <adrian_otto> see what I mean?
20:33:35 <harlowja> yup, seems like a weird scaling bottleneck :(
20:33:42 <adrian_otto> definitely, it is.
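[Editor's note] The reader/writer lock adrian_otto mentions, applied to devananda's in-process case (many threads polling power state, one thread deploying), could look roughly like this minimal single-process sketch; it is not taskflow or ironic code:

```python
import threading

class RWLock:
    """Many readers OR one writer. Readers (e.g. power-state polls) run in
    parallel; a writer (e.g. a deployment) waits until all readers drain
    and then excludes everyone else."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # wait out any active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:       # last reader wakes waiting writers
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

This only covers the in-process semaphore level of devananda's scheme; the cross-process level still needs the db (or ZK) lock discussed above.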
20:34:08 <devananda> yep
20:34:23 <adrian_otto> but properly implemented you should be able to handle thousands of locks per second with that design
20:34:48 <harlowja> *and with a correct backing database*
20:34:53 <adrian_otto> which should work fine for a control plane like this one, even if there were a very large number of cloud resources under management.
20:34:56 <devananda> CAP theorem at work
20:35:38 <harlowja> devananda def, lock reclamation, who is the right manager and so on worries me, haha
20:35:38 <adrian_otto> and yes, it does make the system more brittle. It's the Consistency vs. Availability tradeoff in CAP.
20:35:47 <devananda> i feel like this discussion has turned from C to P
20:36:13 <devananda> heh
20:36:33 <devananda> at least we all share the same concerns :)
20:36:39 <harlowja> agreed
20:37:28 <adrian_otto> what's the objection to just taking a hard dependency on ZK? I know there is a reluctance around that, but I missed whoever expressed it.
20:37:42 <harlowja> i haven't quite figured that out yet either
20:37:58 <harlowja> i'd almost rather recommend ZK instead of trying to build mini-ZK+db wrongly ;)
20:38:24 <harlowja> i think the main objections were that its a new thing to manage
20:38:25 <adrian_otto> is it the fact that people want a library to use within a single thread of an app, and don't want the overhead of a ZK unless they are dealing with distributed state?
20:38:47 <harlowja> i think we can handle that problem with filelocks
20:38:59 <harlowja> i think ZK is just a new service that people don't have operational experience with
20:39:12 <adrian_otto> yeah, I raised that suggestion before.
20:39:22 <devananda> i'm concerned also with cross-host locks
20:39:28 <devananda> eg, HA for the ironic manager service
20:39:34 <harlowja> devananda me too :(
20:39:41 <devananda> so filelocks are no use
20:39:56 <adrian_otto> …unless…
20:40:09 <adrian_otto> you basically re-implement what ZK does...
20:40:16 <harlowja> and publish papers!
20:40:18 <devananda> also, i need to go read up on ZK :)
20:40:24 <adrian_otto> with what amounts to a 2PC of data between a quorum of nodes.
20:40:49 <devananda> hmm. 2PC again depends on the db backend
20:40:59 <adrian_otto> don't think DB
20:40:59 <harlowja> and 1+ years of work ;)
20:41:04 <adrian_otto> just think how you commit state
20:41:15 <adrian_otto> regardless of what the persistence layer is.
20:41:24 <devananda> yeh. that's really tricky :)
20:41:31 <adrian_otto> right.
20:41:57 <adrian_otto> so maybe we could think about ways to make ZK brain dead simple to use for this.
20:42:10 <adrian_otto> and overcome the management objection
20:42:13 <harlowja> thats easy i think, kazoo makes it pretty braindead
20:42:23 <harlowja> #link https://github.com/python-zk/kazoo/tree/master/kazoo/recipe
20:43:04 <harlowja> adrian_otto that could work, i think that most companies are running ZK anyway, i just don't know if we can convince other devs that its required so easily
20:43:21 <harlowja> *thats the harder part* since it requires u to bend your mind (in a way)
20:44:02 <adrian_otto> the current plan is to have pluggable backends
20:44:34 <adrian_otto> if you plug in "db" then it should use a single (centralized) lock service, and the bottleneck and HA characteristics that come with it
20:44:46 <adrian_otto> if you plug in "zk" then you get HA
20:44:59 <harlowja> sure, but said 'single (centralized) lock service' doesn't seem like it should be provided by this library
20:45:10 <harlowja> and i don't think it exists anywhere right?
20:45:26 <harlowja> so then there would be a ZK backend, and a phantom backend?
20:45:46 <adrian_otto> right, it does not yet exist
20:46:28 <harlowja> sure, i wonder who would desire to make it then, since ZK does it without developing a new backend
20:46:38 <adrian_otto> but I'm suggesting that it's not hard to offer one; those that want to use the db backend simply need to run the lock service somewhere, and specify a configuration attribute to the taskflow library that indicates the host:port where it is running
20:47:43 <adrian_otto> in all honesty I think this could be done in about 100 lines of C++
20:48:06 <adrian_otto> or maybe less in python code
20:48:31 <harlowja> sure, the part that worries me is that providing that means that we have to support it and then can't use more advanced features of ZK later due to this db-backend
20:48:50 <harlowja> but maybe i'm thinking too much, ha
20:48:56 <adrian_otto> is there such a thing as a single node ZK?
20:49:03 <harlowja> run it on 1 computer :-P
20:49:11 <harlowja> its just a java program
20:49:27 <adrian_otto> oh, a bell just went off in my head.
20:49:30 <harlowja> ?
20:49:38 <adrian_otto> I think that's the reluctance to work with ZK
20:49:51 <adrian_otto> Java.
20:50:09 <harlowja> the underlying linux is written in c, we should not use it either ;)
20:50:15 <harlowja> and that libvirt thing, ha
20:50:16 <devananda> hah
20:50:20 <adrian_otto> LOL
20:50:29 <devananda> adrian_otto: flashing red lights.
20:51:05 <devananda> harlowja: so that's probably the source of the reluctance. Java :)
20:51:18 <harlowja> ya, that mindset's messed up :-P
20:51:25 <devananda> it certainly turns me off of it ...
20:51:30 <adrian_otto> that's a theme that keeps cropping up
20:52:03 <harlowja> its a service that provides apis, so u don't have to know its running java, lol
20:52:11 <harlowja> just somewhere it will be
20:52:11 <devananda> except we do to deploy it
20:52:20 <harlowja> have someone else deploy it, lol
20:52:22 <devananda> and make openstack depend on java? hrm...
20:52:36 <devananda> if there were a non-java alternative, that'd probably fly
20:52:41 <harlowja> i saw one in go, lol
20:52:46 <devananda> :)
20:52:59 <adrian_otto> let's table this for now
20:53:02 <harlowja> google likely has one internally in c++/c
20:53:06 <harlowja> but good luck getting it out of google
20:53:51 <adrian_otto> every distributed filesystem has solved this issue.
20:53:51 <harlowja> well a lot of opensource projects use zookeeper, so i don't think its anything new
20:54:28 <jlucci> So, i'm coming in halfway through this, but I don't think using zk will necessarily be an issue
20:54:48 <harlowja> it depends on what lock features we want
20:54:54 <jlucci> All we need is some sort of abstraction that provides the same functionality as zookeeper, right?
20:54:59 <adrian_otto> jlucci: there you are, thinking rationally.
20:55:14 <jlucci> Then tell the user to throw whatever they want behind that abstraction/api/whatever
20:55:15 <adrian_otto> jlucci: yes
20:55:43 <adrian_otto> a get_lock() call is really not that hard to back-end with Py
20:56:02 <harlowja> well release is though, especially if the backend goes away :-P
20:56:31 <harlowja> but maybe jlucci is right and we just make some simple backends, idk
20:56:50 <adrian_otto> that would need to be something you accept when you decide not to use the HA option backed by ZK
20:57:25 <harlowja> agreed
20:57:55 <adrian_otto> it would be reliable as long as the backend remained running
20:58:12 <adrian_otto> which moves distributed risk to centralized risk
20:58:29 <harlowja> sure, so another idea is that redis/memcache provide these semantics
20:58:38 <adrian_otto> which is a design pattern that IT managers are very familiar with handling
20:59:15 <adrian_otto> harlowja: you can actually use memcache as a backing store for locks
20:59:15 <harlowja> ok, so lets see what we can develop for this
20:59:26 <harlowja> ya, i think it has basic semantics for this
20:59:27 <harlowja> nothing special
20:59:34 <harlowja> and might be more 'acceptable' than ZK
20:59:40 <harlowja> since its in C ;)
20:59:42 <adrian_otto> just put a thin api on the front of it to make it more usable
20:59:47 <harlowja> ya
20:59:55 <harlowja> that could be the default impl
21:00:08 <jlucci> Whelp - we're almost at time
21:00:11 <harlowja> ok, we can chat more on the mailing list, sound good?
21:00:15 <adrian_otto> yep
21:00:18 <harlowja> good discussion :)
21:00:19 <jlucci> Yup
21:00:24 <harlowja> #endmeeting
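[Editor's note] The pluggable lock abstraction discussed at the end (jlucci's "same functionality as zookeeper" interface, adrian_otto's thin get_lock() API) might be sketched as below. The interface and class names are hypothetical, not taskflow's actual API; a ZK backend would wrap kazoo's lock recipe behind the same interface, and a memcache backend could use atomic add-if-absent semantics. Shown here is only a local, single-process backend, i.e. the non-HA option where the centralized risk is accepted:

```python
import abc
import threading

class LockBackend(abc.ABC):
    """Hypothetical pluggable interface: callers never see what is behind
    it (ZK via kazoo, memcache, a db lock service, or local threads)."""

    @abc.abstractmethod
    def acquire(self, name, timeout=None):
        """Try to take the named lock; return True on success."""

    @abc.abstractmethod
    def release(self, name):
        """Release the named lock."""

class LocalLockBackend(LockBackend):
    """Single-process backend using threading locks; useful for the
    library-in-one-app case where ZK would be overhead."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock registry
        self._locks = {}

    def acquire(self, name, timeout=None):
        with self._guard:
            lock = self._locks.setdefault(name, threading.Lock())
        return lock.acquire(timeout=timeout if timeout is not None else -1)

    def release(self, name):
        self._locks[name].release()

backend = LocalLockBackend()
```

As harlowja points out in the log, release is the hard part for any distributed backend (the backend can go away while locks are held); ZK handles that with ephemeral nodes, which a simple backend like this cannot replicate.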