20:01:12 <harlowja> #startmeeting state-management
20:01:13 <openstack> Meeting started Thu May 23 20:01:12 2013 UTC. The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:01:16 <openstack> The meeting name has been set to 'state_management'
20:01:19 <harlowja> hi everyone! :)
20:01:20 <adrian_otto> hi
20:01:33 <harlowja> howdy
20:01:43 <jlucci> hola
20:02:05 <harlowja> oops, forgot to send out an agenda, well we can make one up
20:02:26 <harlowja> lets wait a few for others
20:02:41 <harlowja> jlucci i think i can mesh the rollbackaccumulator into our stuff
20:02:54 <harlowja> that way everyone will be happy
20:02:55 <jlucci> Sounds good (:
20:03:09 <maoy> have a conflict. will check the log later. have fun guys.
20:03:15 <harlowja> sounds good, thx maoy
20:03:39 <harlowja> #topic status
20:04:08 <harlowja> so if we have people working on taskflow (or integrating it) we can use this little topic for any kind of status on what u are doing, i can go first
20:04:48 <harlowja> i've been just working on the library structure, and impls, and have been experimenting with how cinder might change to use said library
20:05:04 <harlowja> #link https://review.openstack.org/#/c/29862/
20:05:27 <harlowja> also been helping jlucci get her db/celery stuff in
20:05:58 <harlowja> and trying to see how we can get more nova usage and adjusting to see how we can make that happen in a simple (not major restructure) way
20:06:07 <harlowja> thats it for me :)
20:06:55 <jlucci> Shweet. Guess I'll go next
20:06:58 <harlowja> sureeee
20:07:14 <jlucci> So, spent a lot of time getting a database backend setup
20:07:27 <jlucci> All of that seems to be working appropriately (currently the only implementation is sql)
20:07:43 <harlowja> sweet!
20:07:48 <jlucci> and kchenweijie is working on some unit tests for all of that
20:07:59 <harlowja> *oh ya, i've been doing unit tests this week also
20:08:06 <jlucci> So, that along with some basic config stuff got pulled into the code
20:08:11 <jlucci> :P yay unit tests
20:08:15 <harlowja> how's the stackforge move going?
20:08:22 <jlucci> I'm obviously a gerrit-noob
20:08:23 <jlucci> ha
20:08:32 <harlowja> np :)
20:08:38 <harlowja> #link https://review.openstack.org/#/c/29981/
20:08:47 <jlucci> Accidentally put in two requests, went back, squashed my previous commit into the first one, and pushed that back up for review
20:08:49 <harlowja> i put up a small comment, the infra people probably want it squashed
20:09:02 <harlowja> *so that it doesn't have 2 change-ids
20:09:18 <harlowja> change-ids are how gerrit associates commits so 2 might confuse it
20:10:02 <jlucci> Oh, well I abandoned the first commit/review
20:10:18 <jlucci> https://review.openstack.org/#/c/29981/ has all the commits that need to be merged into the stackforge stuff
20:10:51 <harlowja> ya, that one looks ok, just might want to remove one of the 'Change-Id: ' lines
20:11:07 <jlucci> Oh, snap. Didn't see the second one
20:11:11 <jlucci> Oh, gerrit
20:11:14 <harlowja> :)
20:11:19 <jlucci> So, will fix that shortly. haha
20:11:25 <harlowja> cool
20:11:33 <harlowja> sounds good
20:11:55 <harlowja> anyone else want to report any kind of useful status info :)
20:12:00 <jlucci> As for the celery stuff, after a talk I had today, I actually think I'm going to go back and re-implement it in a different way. Something more distributed, less workflow-oriented
20:12:07 <harlowja> ok
20:12:13 <jlucci> Yeah, and that covers my stuffs
20:12:20 <harlowja> sweet
20:12:39 <harlowja> #topic use-cases
20:13:12 <harlowja> if devananda is around, his baremetal stuff might have a new use-case we can get involved in
20:13:41 <harlowja> not sure if he is, but anyway something to think about
20:13:43 <harlowja> #link https://review.openstack.org/#/c/29804/
20:14:02 <harlowja> he's the first that i think is trying to do locking
20:14:02 <devananda> \o
20:14:10 <harlowja> hi devananda !
20:14:38 <harlowja> just was mentioning your review, and how taskflow pep's can think about how to provide that use-case
20:14:54 <devananda> cool :)
20:15:09 <devananda> want me to say anything about what we're doing?
20:15:18 <harlowja> sure
20:15:26 <jlucci> please (:
20:15:32 <devananda> k
20:15:34 <harlowja> acquiring locks on stuff, haha
20:15:50 <devananda> to support having multiple manager services in one ironic deployment
20:16:02 <devananda> need to coordinate which one is acting on what physical resource
20:16:14 <devananda> eg, who's talking to the BMC
20:16:22 <devananda> so there are 2 levels of locks
20:16:34 <harlowja> BMC == bare metal controller?
20:16:38 <devananda> ya
20:16:42 <devananda> IPMI card or whatever
20:16:50 <harlowja> k, thx
20:17:04 <devananda> one lock in the db, to prevent another manager process from doing _anything_ with that BMC
20:17:15 <devananda> and then a semaphore inside the manager process
20:17:25 <devananda> so only one thread can do things that require exclusive access
20:17:26 <devananda> like writes
20:17:32 <devananda> but other threads can still do reads to the BMC
20:17:37 <devananda> [eol]
20:17:51 <harlowja> interesting
20:18:32 <jlucci> We could definitely carry over the blocking manager processes
20:18:41 <adrian_otto> I'd like to suggest that there is no such thing as a lock in the DB, unless all access to the db is limited to a single thread in a single process.
20:18:44 <harlowja> whats an example of something that would happen simultaneously (by different threads)
20:19:09 <harlowja> adrian_otto yes, its a very good point
20:19:21 <adrian_otto> otherwise you get race conditions with the MVCC implementations of all popular databases
20:19:58 <devananda> harlowja: 1 thd doing a deployment. 1 thd polling power state
20:20:12 <harlowja> devananda thanks
20:20:25 <adrian_otto> state machines and MVCC systems are fundamentally incompatible, such that MVCC must not be a component of a state machine.
20:20:33 <devananda> adrian_otto: update .. set col=X where col=Null and id=123;
20:21:13 <devananda> adrian_otto: at least with innodb's mvcc, i believe that will work
20:21:22 <devananda> but, in principle, i agree :)
20:21:47 <adrian_otto> that will work if there is no concurrency at the time of the update.
20:22:00 <devananda> even if there is. only one writer will succeed
20:22:10 <devananda> others will timeout or fail
20:22:13 <adrian_otto> yes
20:22:51 <adrian_otto> you will get different transaction commit results from SQLite and InnoDB for example
20:23:01 <devananda> yep
20:23:08 <devananda> and fwiw, i have no idea how postgres will behave :(
20:23:08 <harlowja> ya, which is where using sqlalchemy will bite us
20:23:22 <devananda> so, if there is another / better solution, i'm all ears :)
20:23:26 <adrian_otto> so if the idea is to make a db implementation that lets you put arbitrary databases behind it… then this is going to flop.
20:23:43 <adrian_otto> I'm saying go ahead and use the DB for persistence of state transitions
20:23:52 <harlowja> right, so far jlucci is working on that
20:24:24 <adrian_otto> but you need an abstraction on top of the persistence layer that manages locks and eliminates the concurrency edge cases.
20:24:28 <harlowja> the locking part we have somewhat (basically job ownership should be atomic), but we do not have this type of locking yet
20:24:44 <adrian_otto> right, that's the root of my concern.
20:25:02 <harlowja> sure
20:25:06 <harlowja> understandable
20:25:51 <adrian_otto> so if the goal is to start with something simple, and iterate, then funnel all state transitions through an intentional bottleneck where you manage the concurrency.
20:26:25 <adrian_otto> one such approach is to expose an API that serializes access to the database without relying on the database for the locking
20:26:40 <adrian_otto> and all concurrent clients use that API
20:26:54 <harlowja> like a DB proxy :(
20:26:56 <adrian_otto> there are other solutions as well, but that one is not complicated
20:27:03 <adrian_otto> yes, you can think of it that way.
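[Editor's note] devananda's `update .. set col=X where col=Null and id=123` is a compare-and-swap: the UPDATE's WHERE clause only matches an unclaimed row, so exactly one caller sees an affected-row count of 1 and wins the lock. A minimal sketch using stdlib sqlite3; the `node_locks` table and `owner` column are illustrative, not Ironic's actual schema:

```python
import sqlite3

def acquire(conn, node_id, owner):
    # Only an unclaimed row (owner IS NULL) matches, so at most one
    # concurrent caller gets rowcount == 1 and thereby holds the lock.
    cur = conn.execute(
        "UPDATE node_locks SET owner = ? WHERE id = ? AND owner IS NULL",
        (owner, node_id))
    conn.commit()
    return cur.rowcount == 1

def release(conn, node_id, owner):
    # Symmetric compare-and-swap: only the current owner can clear it.
    cur = conn.execute(
        "UPDATE node_locks SET owner = NULL WHERE id = ? AND owner = ?",
        (node_id, owner))
    conn.commit()
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_locks (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO node_locks (id, owner) VALUES (123, NULL)")
```

As adrian_otto and devananda note just below, the behavior of the losing writer under real concurrency (block, timeout, or fail) differs between SQLite, InnoDB, and postgres, which is exactly why hiding this behind sqlalchemy is risky.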
20:27:28 <devananda> a separate special db proxy
20:27:32 <devananda> just used for lock mgmt
20:27:38 <harlowja> could be
20:27:45 <devananda> since it would probably make everything else really slow :)
20:27:45 <adrian_otto> yes
20:28:10 <adrian_otto> well, if it has a reader/writer lock implementation it would not necessarily be slow
20:28:10 <harlowja> like something like zookeeper, haha
20:28:38 <adrian_otto> but if there are lots of concurrent writers, then by definition it would slow it down.
20:28:56 <adrian_otto> and I'd argue that's the desired outcome
20:29:30 <harlowja> ya, i wonder if time should be spent inventing said service (which is like a mini-serializing-ZK), or just recommend people use ZK, idk
20:29:57 <devananda> my point is, some db traffic doesn't need write locks around it. it's really only the _establishing_ of a lock that requires it
20:30:05 <devananda> eg, in my use case
20:30:23 <devananda> once a given process has that lock, it should be free to write until it releases the lock
20:30:31 <devananda> since no one else will touch that resource
20:30:43 <devananda> the same model probably works in nova and elsewhere
20:31:36 <devananda> "lock instance" should be non-concurrent. "write stuff" could be parallel after that.
20:31:48 <adrian_otto> sure, and we could implement that simply by having a single manager process that handles issuing you the lock.
20:31:58 <devananda> right
20:32:28 <adrian_otto> but at no time shall any two manager processes try to use the same lock table in the db
20:33:06 <adrian_otto> you also need to require that any readers also get a lock from the same authority that the writer's lock came from
20:33:29 <adrian_otto> they can't just expect to look in the db, and if the lock is in the table, then enter a polling loop
20:33:32 <adrian_otto> see what I mean?
20:33:35 <harlowja> yup, seems like a weird scaling bottleneck :(
20:33:42 <adrian_otto> definitely, it is.
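[Editor's note] The reader/writer lock adrian_otto mentions, applied to devananda's in-process case (many threads polling power state, one thread deploying), could look roughly like this minimal single-process sketch; it is not taskflow or ironic code:

```python
import threading

class RWLock:
    """Many readers OR one writer. Readers (e.g. power-state polls) run in
    parallel; a writer (e.g. a deployment) waits until all readers drain
    and then excludes everyone else."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # wait out any active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:       # last reader wakes waiting writers
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

This only covers the in-process semaphore level of devananda's scheme; the cross-process level still needs the db (or ZK) lock discussed above.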
20:34:08 <devananda> yep
20:34:23 <adrian_otto> but properly implemented you should be able to handle thousands of locks per second with that design
20:34:48 <harlowja> *and with a correct backing database*
20:34:53 <adrian_otto> which should work fine for a control plane like this one, even if there were a very large number of cloud resources under management.
20:34:56 <devananda> CAP theorem at work
20:35:38 <harlowja> devananda def, lock reclamation, who is the right manager and so on worries me, haha
20:35:38 <adrian_otto> and yes, it does make the system more brittle. It's the Consistency vs. Availability tradeoff in CAP.
20:35:47 <devananda> i feel like this discussion has turned from C to P
20:36:13 <devananda> heh
20:36:33 <devananda> at least we all share the same concerns :)
20:36:39 <harlowja> agreed
20:37:28 <adrian_otto> what's the objection to just taking a hard dependency on ZK? I know there is a reluctance around that, but I missed whoever expressed it.
20:37:42 <harlowja> i haven't quite figured that out yet either
20:37:58 <harlowja> i'd almost rather recommend ZK instead of trying to build mini-ZK+db wrongly ;)
20:38:24 <harlowja> i think the main objections were that its a new thing to manage
20:38:25 <adrian_otto> is it the fact that people want a library to use within a single thread of an app, and don't want the overhead of a ZK unless they are dealing with distributed state?
20:38:47 <harlowja> i think we can handle that problem with filelocks
20:38:59 <harlowja> i think ZK is just a new service that people don't have operational experience with
20:39:12 <adrian_otto> yeah, I raised that suggestion before.
20:39:22 <devananda> i'm concerned also with cross-host locks
20:39:28 <devananda> eg, HA for the ironic manager service
20:39:34 <harlowja> devananda me too :(
20:39:41 <devananda> so filelocks are no use
20:39:56 <adrian_otto> …unless…
20:40:09 <adrian_otto> you basically re-implement what ZK does...
20:40:16 <harlowja> and publish papers!
20:40:18 <devananda> also, i need to go read up on ZK :)
20:40:24 <adrian_otto> with what amounts to a 2PC of data between a quorum of nodes.
20:40:49 <devananda> hmm. 2PC again depends on the db backend
20:40:59 <adrian_otto> don't think DB
20:40:59 <harlowja> and 1+ years of work ;)
20:41:04 <adrian_otto> just think how you commit state
20:41:15 <adrian_otto> regardless of what the persistence layer is.
20:41:24 <devananda> yeh. that's really tricky :)
20:41:31 <adrian_otto> right.
20:41:57 <adrian_otto> so maybe we could think about ways to make ZK brain dead simple to use for this.
20:42:10 <adrian_otto> and overcome the management objection
20:42:13 <harlowja> thats easy i think, kazoo makes it pretty braindead
20:42:23 <harlowja> #link https://github.com/python-zk/kazoo/tree/master/kazoo/recipe
20:43:04 <harlowja> adrian_otto that could work, i think that most companies are running ZK anyway, i just don't know if we can convince other devs that its required so easily
20:43:21 <harlowja> *thats the harder part* since it requires u to bend your mind (in a way)
20:44:02 <adrian_otto> the current plan is to have pluggable backends
20:44:34 <adrian_otto> if you plug in "db" then it should use a single (centralized) lock service, and the bottleneck and HA characteristics that come with it
20:44:46 <adrian_otto> if you plug in "zk" then you get HA
20:44:59 <harlowja> sure, but said 'single (centralized) lock service' doesn't seem like it should be provided by this library
20:45:10 <harlowja> and i don't think it exists anywhere right?
20:45:26 <harlowja> so then there would be a ZK backend, and a phantom backend?
20:45:46 <adrian_otto> right, it does not yet exist
20:46:28 <harlowja> sure, i wonder who would desire to make it then, since ZK does it without developing a new backend
20:46:38 <adrian_otto> but I'm suggesting that it's not hard to offer one; those that want to use the db backend simply need to run the lock service somewhere, and specify a configuration attribute to the taskflow library that indicates the host:port where it is running
20:47:43 <adrian_otto> in all honesty I think this could be done in about 100 lines of C++
20:48:06 <adrian_otto> or maybe less in python code
20:48:31 <harlowja> sure, the part that worries me is that providing that means that we have to support it and then can't use more advanced features of ZK later due to this db-backend
20:48:50 <harlowja> but maybe i'm thinking too much, ha
20:48:56 <adrian_otto> is there such a thing as a single node ZK?
20:49:03 <harlowja> run it on 1 computer :-P
20:49:11 <harlowja> its just a java program
20:49:27 <adrian_otto> oh, a bell just went off in my head.
20:49:30 <harlowja> ?
20:49:38 <adrian_otto> I think that's the reluctance to work with ZK
20:49:51 <adrian_otto> Java.
20:50:09 <harlowja> the underlying linux is written in c, we should not use it either ;)
20:50:15 <harlowja> and that libvirt thing, ha
20:50:16 <devananda> hah
20:50:20 <adrian_otto> LOL
20:50:29 <devananda> adrian_otto: flashing red lights.
20:51:05 <devananda> harlowja: so that's probably the source of the reluctance. Java :)
20:51:18 <harlowja> ya, that mindset's messed up :-P
20:51:25 <devananda> it certainly turns me off of it ...
20:51:30 <adrian_otto> that's a theme that keeps cropping up
20:52:03 <harlowja> its a service that provides apis, so u don't have to know its running java, lol
20:52:11 <harlowja> just somewhere it will be
20:52:11 <devananda> except we do to deploy it
20:52:20 <harlowja> have someone else deploy it, lol
20:52:22 <devananda> and make openstack depend on java? hrm...
20:52:36 <devananda> if there were a non-java alternative, that'd probably fly
20:52:41 <harlowja> i saw one in go, lol
20:52:46 <devananda> :)
20:52:59 <adrian_otto> let's table this for now
20:53:02 <harlowja> google likely has one internally in c++/c
20:53:06 <harlowja> but good luck getting it out of google
20:53:51 <adrian_otto> every distributed filesystem has solved this issue.
20:53:51 <harlowja> well a lot of opensource projects use zookeeper, so i don't think its anything new
20:54:28 <jlucci> So, i'm coming in halfway through this, but I don't think using zk will necessarily be an issue
20:54:48 <harlowja> it depends on what lock features we want
20:54:54 <jlucci> All we need is some sort of abstraction that provides the same functionality as zookeeper, right?
20:54:59 <adrian_otto> jlucci: there you are, thinking rationally.
20:55:14 <jlucci> Then tell the user to throw whatever they want behind that abstraction/api/whatever
20:55:15 <adrian_otto> jlucci: yes
20:55:43 <adrian_otto> a get_lock() call is really not that hard to back-end with Py
20:56:02 <harlowja> well release is though, especially if the backend goes away :-P
20:56:31 <harlowja> but maybe jlucci is right and we just make some simple backends, idk
20:56:50 <adrian_otto> that would need to be something you accept when you decide not to use the HA option backed by ZK
20:57:25 <harlowja> agreed
20:57:55 <adrian_otto> it would be reliable as long as the backend remained running
20:58:12 <adrian_otto> which moves distributed risk to centralized risk
20:58:29 <harlowja> sure, so another idea is that redis/memcache provide these semantics
20:58:38 <adrian_otto> which is a design pattern that IT managers are very familiar with handling
20:59:15 <adrian_otto> harlowja: you can actually use memcache as a backing store for locks
20:59:15 <harlowja> ok, so lets see what we can develop for this
20:59:26 <harlowja> ya, i think it has basic semantics for this
20:59:27 <harlowja> nothing special
20:59:34 <harlowja> and might be more 'acceptable' than ZK
20:59:40 <harlowja> since its in C ;)
20:59:42 <adrian_otto> just put a thin api on the front of it to make it more usable
20:59:47 <harlowja> ya
20:59:55 <harlowja> that could be the default impl
21:00:08 <jlucci> Whelp - we're almost at time
21:00:11 <harlowja> ok, we can chat more on the mailing list, sound good?
21:00:15 <adrian_otto> yep
21:00:18 <harlowja> good discussion :)
21:00:19 <jlucci> Yup
21:00:24 <harlowja> #endmeeting
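[Editor's note] The pluggable lock abstraction discussed at the end (jlucci's "same functionality as zookeeper" interface, adrian_otto's thin get_lock() API) might be sketched as below. The interface and class names are hypothetical, not taskflow's actual API; a ZK backend would wrap kazoo's lock recipe behind the same interface, and a memcache backend could use atomic add-if-absent semantics. Shown here is only a local, single-process backend, i.e. the non-HA option where the centralized risk is accepted:

```python
import abc
import threading

class LockBackend(abc.ABC):
    """Hypothetical pluggable interface: callers never see what is behind
    it (ZK via kazoo, memcache, a db lock service, or local threads)."""

    @abc.abstractmethod
    def acquire(self, name, timeout=None):
        """Try to take the named lock; return True on success."""

    @abc.abstractmethod
    def release(self, name):
        """Release the named lock."""

class LocalLockBackend(LockBackend):
    """Single-process backend using threading locks; useful for the
    library-in-one-app case where ZK would be overhead."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock registry
        self._locks = {}

    def acquire(self, name, timeout=None):
        with self._guard:
            lock = self._locks.setdefault(name, threading.Lock())
        return lock.acquire(timeout=timeout if timeout is not None else -1)

    def release(self, name):
        self._locks[name].release()

backend = LocalLockBackend()
```

As harlowja points out in the log, release is the hard part for any distributed backend (the backend can go away while locks are held); ZK handles that with ephemeral nodes, which a simple backend like this cannot replicate.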