20:00:11 #startmeeting state-management
20:00:12 Meeting started Thu Sep 19 20:00:11 2013 UTC and is due to finish in 60 minutes. The chair is harlowja. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:15 The meeting name has been set to 'state_management'
20:00:16 howdy folks!
20:00:23 #link https://wiki.openstack.org/wiki/Meetings/StateManagement#Agenda_for_next_meeting
20:00:29 hi there
20:00:54 hello
20:00:58 hello
20:01:02 I didn't do my action item.
20:01:06 :-(
20:01:06 hey guys
20:01:10 hi hi
20:01:11 ha
20:01:19 kebray u have been a bad boy
20:01:29 #link http://eavesdrop.openstack.org/meetings/state_management/2013/state_management.2013-09-12-20.00.html
20:01:43 hi caitlin56 melnikov changbl
20:01:56 so lets see
20:02:03 #topic action-items-from-last-time
20:02:21 hi
20:02:25 so i did some documentation updates for mine, reworking existing docs a little also
20:02:27 hi adrian_otto
20:02:59 #link https://wiki.openstack.org/wiki/TaskFlow/Engines was one of those
20:03:21 #link https://wiki.openstack.org/wiki/TaskFlow/Persistence was the other
20:03:48 i should probably continue improving those (of course)
20:04:22 since without docs, nobody knows what we are doing, ha
20:04:34 +1 harlowja
20:04:54 i did my action item, almost -- new convenient function is called taskflow.engines.run(), and it is under review: https://review.openstack.org/46458
20:05:03 thx melnikov !
20:05:33 +1 melnikov , I put some comments there
20:05:40 looks like thats progressing well, will be a very nice function to make it really simple to use taskflow
20:05:43 thx changbl
20:05:58 yes, as simple as possible, please guys
20:06:12 :)
20:06:20 yes sir!
20:06:24 lol
20:06:49 kebray next week maybe u can do your action item ;)
20:06:58 hopefully.
20:07:04 Will certainly do my best.
20:07:17 :)
20:07:32 #action kebray mess around with taskflow
20:08:04 so cool, lets see about any needed coordination stuffs
20:08:14 #topic overall-effort-and-coordination
20:08:46 so this week i think the big things that are (hopefully) going in are melnikov run(), flow flattening, and the rework of the multi-threaded engine
20:09:08 #link https://review.openstack.org/#/c/46458/
20:09:15 #link https://review.openstack.org/#/c/46692/
20:09:22 #link https://review.openstack.org/#/c/47238/
20:09:27 (in order of mentioning)
20:09:44 i think jessica has been chugging away on the distributed engine also
20:10:14 and lets see, what else, a shout out to sandywalsh_ :-P
20:10:19 we got mentioned on his blog, ha
20:10:25 #link http://www.sandywalsh.com/2013/09/notification-usage-in-openstack-report.html
20:10:46 "We have big hopes that the TaskFlow project will mature and that existing projects will start to use it. Having a common state management library will mean a central location for notification generation." :)
20:10:54 you're welcome :) ... I just watched Jessica's taskflow video ... it's very good
20:10:55 so thanks sandywalsh_
20:11:16 harlowja has TaskFlow been accepted into mainline yet?
20:11:27 if not, do you know what coordination stuff needs to happen?
20:11:28 mainline meaning?
20:11:31 to make that happen?
20:11:32 master
20:11:38 master of what?
20:11:42 or is it still wip
20:11:50 master of oslo? master of ?
20:11:50 gerrit review acceptance
20:11:56 master of TaskFlow
20:12:06 confused
20:12:12 ok.. we can take that offline
20:12:13 Taskflow has been accepted into master of Taskflow
20:12:15 :-P
20:12:24 doh.. sorry, distributed engine.
20:12:25 err.
20:12:27 typo.
20:12:33 has distributed engine been accepted into TaskFlow?
20:12:43 not yet, still being worked on afaik
20:12:57 #link https://review.openstack.org/#/c/45585/
20:13:15 Last I heard, it hadn't landed because so many TaskFlow changes were happening... it was like a moving target, and kept breaking distributed WIP.
20:13:40 Was wondering if there is any coordination that needs to happen... I want to make sure distributed gets in before the summit.
20:13:51 i think it will, it should calm down i think
20:13:52 Was wondering if others are in agreement on that, or against that.
20:14:18 if jessica feels like its ready, and others do as well, then i see no problem in getting it in
20:14:25 k.
20:15:15 I know there was some frustration that she couldn't get it in because everything kept changing under the hood... has caused a lot of reworking.. hoping y'all can help with that, get distributed in, then do rework that is inclusive of ensuring distributed continues to work along the way.
20:15:39 otherwise she'll never be able to keep up as the lone distributed developer right now.
20:15:49 and, if things continue to be in flux.
20:16:29 sure, but this is the way of how software goes, things may always be in flux
20:16:36 *to some level*
20:16:59 although i do know that being a lone distributed developer is not sustainable, so hopefully that can change sometime
20:17:22 https://twitter.com/TheSandyWalsh/status/380776942196121600
20:17:32 thx sandywalsh_
20:18:13 kebray so i understand the concern, and i agree with u that it needs to get in, although the issue of 'long distributed developer' doesn't go away if its in :)
20:18:19 *lone
20:18:34 especially since i think distributed is actually the most complicated one :)
20:18:49 and has to be done really really carefully
20:18:56 *even if celery is behind it*
20:19:25 but we can chat afterwards maybe about this more?
20:19:40 sound ok kebray ?
20:20:00 sounds good.
20:20:40 cool
20:20:55 #topic new-use-cases
20:21:16 so there was a new use-case and maybe some new ideas for cinder backend activities that caitlin56 had recently
20:21:18 #link https://wiki.openstack.org/wiki/Cinder_Backend_Activities
20:21:36 caitlin56 do u mind giving a little summary of your idea
20:21:49 *and maybe how u think it relates to taskflow
20:23:01 hmmm, ok, maybe she went away
20:23:25 so the part of that wiki that i was wondering about was
20:23:26 'To properly utilize Volume Driver abilities, the taskflow code will need to be able to read a list of Volume Driver attributes that document these capabilities. '
20:23:51 where attributes could be
20:23:54 Stateless Activities:
20:23:58 Snapshot Replication:
20:23:59 ...
20:24:12 so i'm working on wrapping my head around how that might affect taskflow
20:24:26 * caitlin56 apologizes - phone call from her boss.
20:24:29 np
20:24:34 I can discuss the cinder backend now.
20:24:36 sweet
20:24:38 thx caitlin56
20:25:29 As to the capabilities: these need to cover things like whether the Cinder volume is on the same machine as Cinder or on an external machine.
20:25:54 The current code assumes the former, and fetches the payload, compresses it and then puts it in object storage.
20:26:28 This is not the optimal algorithm if the volume is actually on a third machine -- you want to tell *that* machine to put it to the Object Server.
20:26:44 You can't write a taskflow that is smart without being able to query attributes like that.
20:27:33 so this might affect the construction/runtime of the flow based on those attributes?
20:27:47 Yes.
20:27:53 k
20:28:00 So we want to keep them minimal and big.
20:28:39 do u think that the attributes need to be run at runtime (while executing the flow) or before (during construction)
20:28:55 something like if 'xattribute' use this flow; otherwise use that one
20:29:02 I'm assuming they would be static.
20:29:16 You aren't going to change your method of doing snapshots on the fly.
20:29:20 sure
20:29:36 That's probably an important thing to state, however.
20:29:47 ya, runtime makes it a lot harder :-P
20:29:56 ahead of time, not so bad
20:30:12 Incredibly hard if the backend thinks it can change dynamically and the taskflow assumes it is static.
20:30:32 ya
20:31:19 There are several storage scenarios, but what I think is best is to concentrate on backing a volume up to an object as the first, get it right, and then generalize.
20:31:28 agreed
20:31:31 so how would these attributes affect taskflow i guess is my question, if the attributes are ahead-of-time determined, then the thing that uses taskflow can look at those attributes and form a 'set' of tasks using those attributes?
20:31:52 or do u imagine something else there?
20:31:55 yes, a big outer if statement.
20:32:20 the way i could see it affecting taskflow is if u have something like the following
20:32:22 I think we can have 2 or 3 strategies, and deal with hundreds of backends.
20:32:36 def volume_get_flow_for_snapshot()
20:32:40 this would then return a flow
20:32:51 and then the outer thing would analyze that returned flow for 'attributes'
20:33:02 but it seems like the outer if statement wouldn't need that
20:33:35 so would taskflow i guess be the thing responsible for just holding the attribute -> flow mapping
20:33:51 similar to #link https://review.openstack.org/#/c/43051/
20:33:52 Well, to cite one specific example, you have storage backends with relatively low cost snapshots and those with heavy cost snapshots.
20:34:10 sure
20:34:26 With cheap snapshots, you do a backup by taking an anonymous snapshot, backing the snapshot up and then deleting it.
20:34:40 The state of the volume is not impacted by your backing it up.
20:35:03 With expensive snapshots, you want to quiesce the volume (putting it in a 'backing-up' state) and copy from the volume itself.
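Sketch of the "big outer if statement" idea from the exchange above: a static driver attribute is read once, before anything runs, and selects one of two backup strategies. This is only an illustration under stated assumptions. `DriverAttributes`, `cheap_snapshots`, and the step strings are hypothetical names, the steps are plain strings rather than real TaskFlow tasks, and the final "restore state" step is an assumption the log does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverAttributes:
    """Hypothetical static capabilities reported by a volume driver."""
    cheap_snapshots: bool


def backup_via_snapshot(volume):
    # Cheap snapshots: back up an anonymous snapshot, then delete it.
    # The volume's own state is not affected.
    return [
        f"snapshot {volume}",
        f"back up snapshot of {volume}",
        f"delete snapshot of {volume}",
    ]


def backup_via_quiesce(volume):
    # Expensive snapshots: quiesce the volume and copy from it directly.
    # Restoring the previous state afterwards is an assumption of this sketch.
    return [
        f"set {volume} state to backing-up",
        f"back up {volume}",
        f"restore {volume} state",
    ]


def build_backup_steps(volume, attrs):
    """Pick a strategy at construction time; attributes are assumed static."""
    if attrs.cheap_snapshots:
        return backup_via_snapshot(volume)
    return backup_via_quiesce(volume)
```

With this shape, two or three strategy functions could serve many backends, since each backend only has to report its attributes.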
20:35:22 But that's two strategies, not one strategy for each vendor.
20:35:26 right
20:35:27 gotcha
20:35:43 so u thinking that taskflow could provide that attribute -> strategy layer (?)
20:35:48 or should that reside in cinder
20:35:49 The trick is to provide progress in either case.
20:36:19 sure
20:36:32 which taskflow can do since its hookin in at that 'level'
20:36:37 It will vary. I think that one can be done with clever coding of the backup_volume method. Others will require a more outer switch.
20:37:01 sure
20:37:21 BTW, everything we're discussing with Cinder would apply to Manila (NFS shares for openstack) if that project is approved.
20:37:34 of course, i think it applies elsewhere also :)
20:38:12 Anywhere you are dealing with long running operations and differing implementation strategies.
20:38:12 just trying to distill i down to how it might look in taskflow :)
20:38:21 *it down
20:39:05 sure, so maybe it starts off with that attribute -> strategy object, this object can be queried given a bunch of attributes (and gives back the strategy)
20:39:28 then cinder can register its attributes/strategies
20:39:43 and tell taskflow to go execute the strategy with XYZ attributes
20:40:00 I'm not an expert on python style, so I'm very flexible about exact coding representation.
20:40:05 :)
20:40:28 does the general idea though seem to be what u are aiming for?
20:41:15 Yes, plus creating some sort of abstraction that is inclusive of a normal task and something running on a server not under openstack control.
20:41:40 Both would report progress, but you wouldn't be able to control priorities/etc for an external "activity".
20:41:57 So I suppose it's a subset interface.
20:42:10 sure, i think #link https://review.openstack.org/#/c/46331/ is part of that second one
20:42:36 so
20:42:40 although priority control taskflow currently doesn't do (although its an interesting idea), ha
20:42:40 what's goin' on ?
20:42:51 hi ekarlso
20:43:02 harlowja: yes. That combines the progress reporting too.
20:43:14 ekarlso just chatting about https://wiki.openstack.org/wiki/Cinder_Backend_Activities
20:44:03 But none of us (Nexenta) are really expert at core Nova compute stuff. If we're doing things in a way that is awkward do not be afraid to tell us what would be a more normal interface.
20:44:14 i think it seems like an ok interface
20:44:27 its a generally useful set of concepts i think :)
20:45:19 caitlin56 let me write up a blueprint for this in taskflow, and see if i captured it correctly
20:45:21 sound ok?
20:45:21 And I think it would even take as far as having taskflow that managed failovers between hot standbys -- without requiring each vendor to code that.
20:45:35 sounds good.
20:46:03 #action harlowja writeup blueprint with distilled taskflow (work idea) for https://wiki.openstack.org/wiki/Cinder_Backend_Activities
20:46:21 managed failovers with hot standbys, hmmmm
20:46:44 that requires the persistence stuff though, to know where to 'pick up last'
20:46:58 correct?
20:47:02 correct.
20:47:18 k, np, thats being polished as we speak
20:47:33 The strategy I favor is incremental snapshots, and cloning the volume from the most recent snapshot.
20:47:52 But you can also do a continuous transaction feed.
20:48:03 One policy - multiple implementations.
20:48:09 sure
20:48:26 thats the openstack way :-P
20:49:21 cool, thx caitlin56 for discussing, i'll try to see if i can write what might need to be done up :)
20:49:30 and pass it by u
20:49:34 *and others*
20:50:01 #topic open-discuss
20:50:21 anything anyone wants to talk about that i missed?? :)
20:50:41 thx changbl for helping out with some reviews
20:50:46 np harlowja
20:50:48 what about some crazy ideas? #link https://blueprints.launchpad.net/taskflow/+spec/eliminate-patterns
20:50:48 maybe out of scope, I have one question:
20:50:58 oh crazy ideas, haha!
20:51:51 i like the crazy ideas, although i want to make sure jessica doesn't hate us too much by changing that while she is still working on the big distributed engine piece
20:51:55 generalize linear_flow to be graph_flow?
20:52:02 lol
20:52:15 changbl i think u have the same crazy idea as melnikov :-P
20:52:28 i have one question on concurrency though
20:52:54 sure
20:52:59 so say I have flow1 which touches some resources, and so does flow2, and I execute them simultaneously
20:53:13 k
20:53:21 any way to guarantee flow1 and flow2 do not touch the same resources?
20:53:24 by openstack?
20:53:32 not without https://wiki.openstack.org/wiki/StructuredWorkflowLocks
20:53:51 *which doesn't exist yet
20:54:17 so basically there is no concurrency control yet?
20:54:23 file level concurrency control
20:54:35 some attempts at database level locking used for concurrency control
20:54:59 nova and cinder use a 'state' field in the DB to try to not squash themselves
20:55:05 and file level locks
20:55:14 caitlin56 i think has experience with the cinder one
20:55:24 *since its problematic for her i think
20:55:37 ok, got the DB part. still confused with the file-level one...
20:55:56 can you illustrate more file-level one?
20:55:59 nova locks a file to make sure its simultaneously mutating a hypervisor
20:56:03 *a vm i mean
20:56:16 *to make sure its not simultaneously mutating a vm
20:56:18 ok, got it
20:56:24 thanks
20:56:32 sure, its not ideal imho
20:56:57 i was hoping for something like that above wiki, and a way for tasks to 'define' what resources they are using, and taskflow would lock them
20:57:05 and unlock them
20:57:28 and u could use different locking implementations (up to the deployer)
20:57:33 yes, in our TROPIC paper, we built some kind of lock manager
20:57:36 ya
20:57:54 although u guys did a much more exhaustive lock analysis :-P
20:58:01 :)
20:58:16 in openstack, if taskflow can just attempt to do resource locks, that will be a good start, ha
20:58:24 and tasks can declare what resources they will touch
20:58:24 borrowed some ideas from DB locking
20:58:31 ya
20:58:44 Avoiding locks is even better, but with current DBs it might be the only way
20:58:57 so full on tropic stuff is hard i think with openstack, but something in the middle might be doable
20:59:23 so thats my thinking anyway changbl
20:59:29 * harlowja i did read your guys paper :)
20:59:34 thanks harlowja
20:59:38 :)
20:59:42 ok, times up
20:59:42 eck
20:59:44 i think it is time
20:59:54 jump into #openstack-state-management if u want to talk more
20:59:58 #endmeeting
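
Sketch of the resource-locking idea raised in the open discussion: each task declares the resources it will touch, and a lock manager acquires them before the task runs and releases them afterwards. This is a minimal single-process illustration, not TaskFlow's API and not the StructuredWorkflowLocks design. `ResourceLocks`, `Task`, and `run_task` are hypothetical names, and a real deployment would presumably swap in a file, database, or distributed lock backend chosen by the deployer.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ResourceLocks:
    """Hypothetical in-process lock manager keyed by resource name."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    @contextmanager
    def holding(self, resources):
        # Acquire in sorted order so all callers use the same order,
        # which avoids deadlock between tasks sharing resources.
        names = sorted(set(resources))
        with self._guard:
            locks = [self._locks[name] for name in names]
        acquired = []
        try:
            for lock in locks:
                lock.acquire()
                acquired.append(lock)
            yield
        finally:
            for lock in reversed(acquired):
                lock.release()


class Task:
    """Hypothetical task that declares the resources it will touch."""

    def __init__(self, name, resources, action):
        self.name = name
        self.resources = resources
        self.action = action


def run_task(task, locks):
    # Lock everything the task declared, run it, then unlock.
    with locks.holding(task.resources):
        return task.action()
```

Two flows that declare the same resource name (for example the same VM) would then run their conflicting tasks one after the other instead of simultaneously.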