20:01:12 #startmeeting state-management
20:01:13 Meeting started Thu May 23 20:01:12 2013 UTC. The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:01:16 The meeting name has been set to 'state_management'
20:01:19 hi everyone! :)
20:01:20 hi
20:01:33 howdy
20:01:43 hola
20:02:05 oops, forgot to send out an agenda, well we can make one up
20:02:26 lets wait a few for others
20:02:41 jlucci i think i can mesh the rollbackaccumulator into our stuff
20:02:54 that way everyone will be happy
20:02:55 Sounds good (:
20:03:09 have a conflict. will check the log later. have fun guys.
20:03:15 sounds good, thx maoy
20:03:39 #topic status
20:04:08 so if we have people working on taskflow (or integrating it) we can use this little topic for any kind of status on what u are doing, i can go first
20:04:48 i've been just working on the library structure, and impls, and have been experimenting with how cinder might change to use said library
20:05:04 #link https://review.openstack.org/#/c/29862/
20:05:27 also been helping jlucci get her db/celery stuff in
20:05:58 and trying to see how we can get more nova usage and adjusting to see how we can make that happen in a simple (not major restructure) way
20:06:07 thats it for me :)
20:06:55 Shweet. Guess I'll go next
20:06:58 sureeee
20:07:14 So, spent a lot of time getting a database backend setup
20:07:27 All of that seems to be working appropriately (currently the only implementation is sql)
20:07:43 sweet!
20:07:48 and kchenweijie is working on some unit tests for all of that
20:07:59 *oh ya, i've been doing unit tests this week also
20:08:06 So, that along with some basic config stuff got pulled into the code
20:08:11 :P yay unit tests
20:08:15 how's the stackforge move going?
20:08:22 I'm obviously a gerrit-noob
20:08:23 ha
20:08:32 np :)
20:08:38 #link https://review.openstack.org/#/c/29981/
20:08:47 Accidentally put in two requests, went back, squashed my previous commit into the first one, and pushed that back up for review
20:08:49 i put up a small comment, the infra people probably want it squashed
20:09:02 *so that it doesn't have 2 change-ids
20:09:18 change-ids are how gerrit associates commits, so 2 might confuse it
20:10:02 Oh, well I abandoned the first commit/review
20:10:18 https://review.openstack.org/#/c/29981/ has all the commits that need to be merged into the stackforge stuff
20:10:51 ya, that one looks ok, just might want to remove one of the 'Change-Id: ' lines
20:11:07 Oh, snap. Didn't see the second one
20:11:11 Oh, gerrit
20:11:14 :)
20:11:19 So, will fix that shortly. haha
20:11:25 cool
20:11:33 sounds good
20:11:55 anyone else want to report any kind of useful status info :)
20:12:00 As for the celery stuff, after a talk I had today, I actually think I'm going to go back and re-implement it in a different way. Something more distributed, less workflow-oriented
20:12:07 ok
20:12:13 Yeah, and that covers my stuffs
20:12:20 sweet
20:12:39 #topic use-cases
20:13:12 if devananda is around, his baremetal stuff might have a new use-case we can get involved in
20:13:41 not sure if he is, but it's something for anyone to think about
20:13:43 #link https://review.openstack.org/#/c/29804/
20:14:02 he's the first that i think is trying to do locking
20:14:02 \o
20:14:10 hi devananda !
20:14:38 just was mentioning your review, and how taskflow peeps can think about how to provide that use-case
20:14:54 cool :)
20:15:09 want me to say anything about what we're doing?
20:15:18 sure
20:15:26 please (:
20:15:32 k
20:15:34 acquiring locks on stuff, haha
20:15:50 to support having multiple manager services in one ironic deployment
20:16:02 need to coordinate which one is acting on what physical resource
20:16:14 eg, who's talking to the BMC
20:16:22 so there are 2 levels of locks
20:16:34 BMC == bare metal controller?
20:16:38 ya
20:16:42 IPMI card or whatever
20:16:50 k, thx
20:17:04 one lock in the db, to prevent another manager process from doing _anything_ with that BMC
20:17:15 and then a semaphore inside the manager process
20:17:25 so only one thread can do things that require exclusive access
20:17:26 like writes
20:17:32 but other threads can still do reads to the BMC
20:17:37 [eol]
20:17:51 interesting
20:18:32 We could definitely carry over the blocking manager processes
20:18:41 I'd like to suggest that there is no such thing as a lock in the DB, unless all access to the db is limited to a single thread in a single process.
20:18:44 what's an example of something that would happen simultaneously (by different threads)
20:19:09 adrian_otto yes, it's a very good point
20:19:21 otherwise you get race conditions with the MVCC implementations of all popular databases
20:19:58 harlowja: 1 thd doing a deployment. 1 thd polling power state
20:20:12 devananda thanks
20:20:25 state machines and MVCC systems are fundamentally incompatible, such that MVCC must not be a component of a state machine.
20:20:33 adrian_otto: update .. set col=X where col=Null and id=123;
20:21:13 adrian_otto: at least with innodb's mvcc, i believe that will work
20:21:22 but, in principle, i agree :)
20:21:47 that will work if there is no concurrency at the time of the update.
20:22:00 even if there is. only one writer will succeed
20:22:10 others will timeout or fail
20:22:13 yes
20:22:51 you will get different transaction commit results from SQLite and InnoDB for example
20:23:01 yep
20:23:08 and fwiw, i have no idea how postgres will behave :(
20:23:08 ya, which is where using sqlalchemy will bite us
20:23:22 so, if there is another / better solution, i'm all ears :)
20:23:26 so if the idea is to make a db implementation that lets you put arbitrary databases behind it… then this is going to flop.
20:23:43 I'm saying go ahead and use the DB for persistence of state transitions
20:23:52 right, so far jlucci is working on that
20:24:24 but you need an abstraction on top of the persistence layer that manages locks and eliminates the concurrency edge cases.
20:24:28 the locking part we have somewhat (basically job ownership should be atomic), but we do not have this type of locking yet
20:24:44 right, that's the root of my concern.
20:25:02 sure
20:25:06 understandable
20:25:51 so if the goal is to start with something simple, and iterate, then funnel all state transitions through an intentional bottleneck where you manage the concurrency.
20:26:25 one such approach is to expose an API that serializes access to the database without relying on the database for the locking
20:26:40 and all concurrent clients use that API
20:26:54 like a DB proxy :(
20:26:56 there are other solutions as well, but that one is not complicated
20:27:03 yes, you can think of it that way.
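
The conditional UPDATE devananda gives above (update .. set col=X where col=Null and id=123) is a compare-and-swap: the WHERE clause only matches while the lock column is unset, so checking the affected row count tells a caller whether it won the race. A minimal sketch of that pattern in SQLAlchemy Core follows; the nodes table, reservation column, and connection string are invented for illustration, and, per adrian_otto's caveat, rowcount semantics differ between backends such as SQLite and InnoDB.

    # Sketch (not taskflow/ironic code): taking a row-level "lock" via a
    # conditional UPDATE. Table/column names and the DSN are hypothetical.
    from sqlalchemy import (MetaData, Table, Column, Integer, String,
                            create_engine, and_)

    metadata = MetaData()
    nodes = Table('nodes', metadata,
                  Column('id', Integer, primary_key=True),
                  Column('reservation', String(64), nullable=True))

    engine = create_engine('mysql://user:pass@localhost/example')

    def acquire_node_lock(conn, node_id, owner):
        # The UPDATE matches only while no reservation is held; under
        # InnoDB exactly one concurrent writer will succeed. rowcount
        # behavior is backend-dependent (the caveat raised above).
        result = conn.execute(
            nodes.update()
                 .where(and_(nodes.c.id == node_id,
                             nodes.c.reservation.is_(None)))
                 .values(reservation=owner))
        return result.rowcount == 1
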
20:27:28 a separate special db proxy
20:27:32 just used for lock mgmt
20:27:38 could be
20:27:45 since it would probably make everything else really slow :)
20:27:45 yes
20:28:10 well, if it has a reader/writer lock implementation it would not necessarily be slow
20:28:10 like something like zookeeper, haha
20:28:38 but if there are lots of concurrent writers, then by definition it would slow it down.
20:28:56 and I'd argue that's the desired outcome
20:29:30 ya, i wonder if time should be spent inventing said service (which is like a mini-serializing-ZK), or just recommend people use ZK, idk
20:29:57 my point is, some db traffic doesn't need write locks around it. it's really only the _establishing_ of a lock that requires it
20:30:05 eg, in my use case
20:30:23 once a given process has that lock, it should be free to write until it releases the lock
20:30:31 since no one else will touch that resource
20:30:43 the same model probably works in nova and elsewhere
20:31:36 "lock instance" should be non-concurrent. "write stuff" could be parallel after that.
20:31:48 sure, and we could implement that simply by having a single manager process that handles issuing you the lock.
20:31:58 right
20:32:28 but at no time shall any two manager processes try to use the same lock table in the db
20:33:06 you also need to require that any readers also get a lock from the same authority that the writer's lock came from
20:33:29 they can't just expect to look in the db, and if the lock is in the table, then enter a polling loop
20:33:32 see what I mean?
20:33:35 yup, seems like a weird scaling bottleneck :(
20:33:42 definitely, it is.
20:34:08 yep
20:34:23 but properly implemented you should be able to handle thousands of locks per second with that design
20:34:48 *and with a correct backing database*
20:34:53 which should work fine for a control plane like this one, even if there were a very large number of cloud resources under management.
20:34:56 CAP theorem at work
20:35:38 devananda def, lock reclamation, who is the right manager and so on worries me, haha
20:35:38 and yes, it does make the system more brittle. It's the Consistency vs. Availability tradeoff in CAP.
20:35:47 i feel like this discussion has turned from C to P
20:36:13 heh
20:36:33 at least we all share the same concerns :)
20:36:39 agreed
20:37:28 what's the objection to just taking a hard dependency on ZK? I know there is a reluctance around that, but I missed whoever expressed it.
20:37:42 i haven't quite figured that out yet either
20:37:58 i'd almost rather recommend ZK instead of trying to build mini-ZK+db wrongly ;)
20:38:24 i think the main objections were that its a new thing to manage
20:38:25 is it the fact that people want a library to use within a single thread of an app, and don't want the overhead of a ZK unless they are dealing with distributed state?
20:38:47 i think we can handle that problem with filelocks
20:38:59 i think ZK is just a new service that people don't have operational experience with
20:39:12 yeah, I raised that suggestion before.
20:39:22 i'm concerned also with cross-host locks
20:39:28 eg, HA for the ironic manager service
20:39:34 devananda me too :(
20:39:41 so filelocks are no use
20:39:56 …unless…
20:40:09 you basically re-implement what ZK does...
20:40:16 and publish papers!
20:40:18 also, i need to go read up on ZK :)
20:40:24 with what amounts to a 2PC of data between a quorum of nodes.
20:40:49 hmm.
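
For the second, in-process locking level devananda described earlier (many threads reading a BMC concurrently, writes exclusive), the Python stdlib has no reader/writer lock, but one can be built from a Condition. A minimal, unoptimized sketch, assuming one such lock per managed resource; this simple read-preferring form can starve writers under heavy read load.

    # Sketch: in-process reader/writer lock -- concurrent reads, exclusive
    # writes -- built on threading.Condition. Not production-hardened.
    import threading
    from contextlib import contextmanager

    class ReadWriteLock(object):
        def __init__(self):
            self._cond = threading.Condition()
            self._readers = 0
            self._writing = False

        @contextmanager
        def read(self):
            with self._cond:
                while self._writing:      # wait out any active writer
                    self._cond.wait()
                self._readers += 1
            try:
                yield
            finally:
                with self._cond:
                    self._readers -= 1
                    self._cond.notify_all()

        @contextmanager
        def write(self):
            with self._cond:
                while self._writing or self._readers:  # need exclusivity
                    self._cond.wait()
                self._writing = True
            try:
                yield
            finally:
                with self._cond:
                    self._writing = False
                    self._cond.notify_all()
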
20:40:49 2PC again depends on the db backend
20:40:59 don't think DB
20:40:59 and 1+ years of work ;)
20:41:04 just think how you commit state
20:41:15 regardless of what the persistence layer is.
20:41:24 yeh. that's really tricky :)
20:41:31 right.
20:41:57 so maybe we could think about ways to make ZK brain dead simple to use for this.
20:42:10 and overcome the management objection
20:42:13 thats easy i think, kazoo makes it pretty braindead
20:42:23 #link https://github.com/python-zk/kazoo/tree/master/kazoo/recipe
20:43:04 adrian_otto that could work, i think that most companies are running ZK anyway, i just don't know if we can convince other devs that its required so easily
20:43:21 *thats the harder part* since it requires u to bend your mind (in a way)
20:44:02 the current plan is to have pluggable backends
20:44:34 if you plug in "db" then it should use a single (centralized) lock service, and the bottleneck and HA characteristics that come with it
20:44:46 if you plug in "zk" then you get HA
20:44:59 sure, but said 'single (centralized) lock service' doesn't seem like it should be provided by this library
20:45:10 and i don't think it exists anywhere right?
20:45:26 so then there would be a ZK backend, and a phantom backend?
20:45:46 right, it does not yet exist
20:46:28 sure, i wonder who would desire to make it then, since ZK does it without developing a new backend
20:46:38 but I'm suggesting that it's not hard to offer one; those that want to use the db backend simply need to run the lock service somewhere, and specify a configuration attribute to the taskflow library that indicates the host:port where it is running
20:47:43 in all honesty I think this could be done in about 100 lines of C++
20:48:06 or maybe less of python code
20:48:31 sure, the part that worries me is that providing that means that we have to support it and then can't use more advanced features of ZK later due to this db-backend
20:48:50 but maybe i'm thinking too much, ha
20:48:56 is there such a thing as a single node ZK?
20:49:03 run it on 1 computer :-P
20:49:11 its just a java program
20:49:27 oh, a bell just went off in my head.
20:49:30 ?
20:49:38 I think that's the reluctance to work with ZK
20:49:51 Java.
20:50:09 the underlying linux is written in c, we should not use it either ;)
20:50:15 and that libvirt thing, ha
20:50:16 hah
20:50:20 LOL
20:50:29 adrian_otto: flashing red lights.
20:51:05 harlowja: so that's probably the source of the reluctance. Java :)
20:51:18 ya, that mindset's messed up :-P
20:51:25 it certainly turns me off of it ...
20:51:30 that's a theme that keeps cropping up
20:52:03 its a service that provides apis, so u don't have to know its running java, lol
20:52:11 just somewhere it will be
20:52:11 except we do, to deploy it
20:52:20 have someone else deploy it, lol
20:52:22 and make openstack depend on java? hrm...
20:52:36 if there were a non-java alternative, that'd probably fly
20:52:41 i saw one in go, lol
20:52:46 :)
20:52:59 let's table this for now
20:53:02 google likely has one internally in c++/c
20:53:06 but good luck getting it out of google
20:53:51 every distributed filesystem has solved this issue.
20:53:51 well a lot of opensource projects use zookeeper, so i don't think its anything new
20:54:28 So, i'm coming in halfway through this, but I don't think using zk will necessarily be an issue
20:54:48 it depends on what lock features we want
20:54:54 All we need is some sort of abstraction that provides the same functionality as zookeeper, right?
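
To make the "kazoo makes it pretty braindead" claim concrete, here is roughly what the lock recipe linked above looks like in use. The host, lock path, identifier, and deploy_node() are invented for the example.

    # Sketch of kazoo's lock recipe (see the #link above). Host, path,
    # and identifier are made-up values; deploy_node() is a stand-in for
    # whatever work runs under the lock.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    lock = zk.Lock('/locks/node-123', identifier='manager-1')
    with lock:  # blocks until acquired; released on exit, or if this client dies
        deploy_node()

    zk.stop()
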
20:54:59 jlucci: there you are, thinking rationally.
20:55:14 Then tell the user to throw whatever they want behind that abstraction/api/whatever
20:55:15 jlucci: yes
20:55:43 a get_lock() call is really not that hard to back-end with Py
20:56:02 well release is though, especially if the backend goes away :-P
20:56:31 but maybe jlucci is right and we just make some simple backends, idk
20:56:50 that would need to be something you accept when you decide not to use the HA option backed by ZK
20:57:25 agreed
20:57:55 it would be reliable as long as the backend remained running
20:58:12 which moves distributed risk to centralized risk
20:58:29 sure, so another idea is that redis/memcache provide these semantics
20:58:38 which is a design pattern that IT managers are very familiar with handling
20:59:15 harlowja: you can actually use memcache as a backing store for locks
20:59:15 ok, so lets see what we can develop for this
20:59:26 ya, i think it has basic semantics for this
20:59:27 nothing special
20:59:34 and might be more 'acceptable' than ZK
20:59:40 since its in C ;)
20:59:42 just put a thin api on the front of it to make it more usable
20:59:47 ya
20:59:55 that could be the default impl
21:00:08 Whelp - we're almost at time
21:00:11 ok, we can chat more on the mailing list, sound good?
21:00:15 yep
21:00:18 good discussion :)
21:00:19 Yup
21:00:24 #endmeeting
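
On the memcache idea floated just before the close: memcached's add operation is atomic (it fails if the key already exists), so the "thin api on the front" could be a best-effort lock with key expiry as the safety net for the crashed-holder case harlowja raised. A sketch using python-memcached; the server address, key prefix, and timeouts are arbitrary.

    # Sketch: thin lock API over memcached's atomic add(). Best-effort
    # only -- expiry covers a crashed holder, and release cannot verify
    # ownership atomically (one reason ZK keeps coming up).
    import time
    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def acquire(name, owner, ttl=30, wait=10):
        deadline = time.time() + wait
        while time.time() < deadline:
            # add() only succeeds if the key does not already exist.
            if mc.add('lock:' + name, owner, time=ttl):
                return True
            time.sleep(0.1)
        return False

    def release(name):
        # Naively deletes even if the lock was lost to expiry; a real
        # implementation would need an ownership check first.
        mc.delete('lock:' + name)
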