18:01:44 #startmeeting trove
18:01:45 Meeting started Wed Jan 15 18:01:44 2014 UTC and is due to finish in 60 minutes. The chair is hub_cap. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:46 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:48 o/
18:01:49 The meeting name has been set to 'trove'
18:01:51 o/
18:01:53 so um
18:01:53 o/
18:02:01 o/
18:02:03 we had one item on the agenda but
18:02:03 o/
18:02:04 #help
18:02:10 awe...
18:02:13 our meeting agenda is nil
18:02:15 o/
18:02:29 since grapex isn't here to talk to it it's not fair to debate and decide without him
18:02:31 o/
18:02:38 o/
18:03:15 o/
18:03:33 hello
18:03:49 so what shall we talk about!
18:04:00 o/
18:04:08 I had one thing I was about to bring up on the email list but maybe we can discuss it here first?
18:04:08 Incremental backups?
18:04:14 did you see that ludicrous display last night?
18:04:19 o/
18:04:31 jimbobhickville: lets do
18:04:41 https://wiki.openstack.org/wiki/Trove/scheduled-tasks
18:04:53 go for it jimbobhickville
18:05:03 there's part of the spec about sending success/failure notifications to customers in relation to scheduled tasks like backups
18:05:24 #topic task notifications
18:05:33 I think really trove should just emit a message indicating the information and let an external program like ceilometer consume the message and handle the notification part
18:05:37 datsun180b: the thing about arsenal is they always try to walk it in
18:06:00 jimbobhickville: +1
18:06:26 jimbobhickville: notifications sound great
18:06:52 mqaas has BP for notifications
18:06:56 yes let's do
18:07:03 #link https://github.com/openstack/trove/blob/f18e0936264b6dd96ddacc7594b661b50de1729f/trove/taskmanager/models.py#L66
18:07:14 we have a mixin for sending events
18:07:34 that's basically it, just wanted to run it by the group before I ripped out all the logic for storing the notification data in the database in my branch
18:07:43 jimbobhickville: ++ to just emitting a msg
18:07:44 kevinconway: for usage correct
18:09:04 jimbobhickville: heh
18:09:08 rip that shiz out
18:09:11 alrighty, I'll update the blueprint later today to reflect the desired functionality
18:09:14 yep agreed, notifications should just be emitted, no need to store them
18:09:15 trove dont care about storing it
18:09:15 i think adding a new message makes sense
18:09:35 cp16net: too late for input its been decided (i kid)
18:09:35 guys, gate is broken ...
18:09:38 +1 to not storing notifications.
18:09:45 i just found out about it
18:09:47 ahhh
18:09:47 denis_makogon: is that worth a topic?
18:09:50 denis_makogon: What do you mean, do you have a link?
18:10:02 denis_makogon: maybe put that in the #openstack-trove channel guys
18:10:07 so we can do the meeting here
18:10:31 sorry for off-topic
18:10:35 np
18:10:50 so jimbobhickville you good to go?
18:11:13 yep, I'm good
18:11:22 ok awesome.
18:11:41 #topic open discussion
18:11:44 Will look into the gate as soon as the meeting is done.
18:12:36 I didn't get a chance to schedule the tempest meeting as yet.
18:12:39 Will do it this week.
18:12:43 wrt replication / cluster api.. we seemed to have agreement that replication would be two separate calls, first to create the master, another to create a slave that points to the master
18:12:49 imsplitbit: is that the approach you were taking
18:12:55 So please keep an eye out on the mailing list for that.
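(A minimal sketch of the "just emit a message" proposal from the task-notifications topic above, assuming an oslo.messaging transport. It is not the NotifyMixin linked from taskmanager/models.py; the event type, payload keys, and publisher_id are illustrative only.)

    # Editor's sketch of the "emit, don't store" proposal, assuming an
    # oslo.messaging transport. NOT the NotifyMixin referenced in the linked
    # taskmanager/models.py; event type, payload keys, and publisher_id are
    # made-up illustrations.
    from oslo.config import cfg
    import oslo.messaging as messaging

    transport = messaging.get_transport(cfg.CONF)
    notifier = messaging.Notifier(transport,
                                  driver='messaging',
                                  publisher_id='trove.taskmanager')

    def notify_scheduled_task(context, task_name, instance_id, success):
        """Emit a success/failure event for a scheduled task.

        An external consumer (e.g. ceilometer) picks the message up and
        handles customer-facing notification; Trove itself stores nothing.
        `context` is assumed to be a plain dict of request-context values.
        """
        event_type = 'trove.scheduled_task.%s' % (
            'succeeded' if success else 'failed')
        notifier.info(context, event_type,
                      {'task_name': task_name, 'instance_id': instance_id})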
18:13:40 and is Heat a requirement for implementing master/slave
18:14:00 vipul: yes
18:14:00 or were you going to just add that logic to Taskmanager
18:14:06 this is what we discussed
18:14:17 and is the assumption I am operating on
18:14:39 vipul: imsplitbit i hate to say it
18:14:52 but im on the fence still in some sort (if we can consolidate it)
18:14:59 i think itll be easier once one of the apis is created
18:15:21 but imsplitbits assumption is valid for now
18:15:25 :)
18:15:49 hmm ok.. we might pick up the implementation soon so maybe we'll need to adjust it during review
18:15:54 vipul imsplitbit can we get a rundown of the assumptions and talking points made this past week wrt clusters/topologies?
18:15:54 what about Heat?
18:16:07 (over mailing list preferably)
18:16:38 Sure we can do that
18:16:44 one quick note here is I am *not* writing a datastore impl
18:16:56 I am hooking in metadata into the instance object
18:17:29 Ok, so purely an API change -- no code to actually look into the metadata and act on it
18:18:23 correct
18:18:40 then I'll hook in the validation for the replication contract descriptive language
18:18:53 at that point someone can write a datastore impl
18:19:40 imsplitbit, of course
18:19:58 in the IH release we could have mongo and mysql replicas
18:20:26 so is there a stance on Heat for provisioning? should all new provisioning code go through Heat, and only support a single code path
18:21:04 denis_makogon: that might be a stretch :)
18:21:07 I would say that for better or worse heat is intended to do all that stuff
18:21:22 I've had a few convos with heat devs
18:21:31 vipul to be clear, when you say provisioning you're only referring to creation/launching, not the modification of, correct?
18:21:31 heat should be doing the provisioning work
18:21:35 all so far have asked "now wait... *why* are you using heat for that?"
18:21:37 vipul, i assume that we would use only heat
18:21:52 amcrn: I would assume modification of the stack as well, not necessarily of an instance
18:22:00 imsplitbit: we dont have a choice
18:22:06 but that's not within the realm of things I care enough to even investigate to formulate an informed decision
18:22:10 vipul: amcrn yes to grow/shrink a stack
18:22:11 amcrn: so stack-create may create 1 master node.. if we go with the 2 api call approach
18:22:14 i'm working on heat and searching for missing funcs for trove
18:22:15 hence my previous statement hub_cap :)
18:22:19 then we'd do stack update to add slave
18:22:26 but first we need to solve resize + heat + trove i think
18:22:35 hub_cap: yep
18:22:37 how will slave promotion work?
18:22:41 stack create could handle instance group provisioning
18:22:43 vipul thanks for the clarification
18:23:03 both manually, and automated, in the event of a failed master
18:23:12 demorris: I believe there should be an API for that
18:23:34 removal of the metadata "replicates_from" on a slave instance would promote it
18:23:45 automated...
18:23:46 vipul: cool
18:23:52 but that is a bit obscure imho
18:24:11 vipul: correct
18:24:24 vs a call that says promote or promoteToMaster and removes the replicates_from
18:24:27 we would likely need another component 'observer' or something that watches the states reported (or not reported) by guest agents
18:25:14 there are better ways to do this.. by introducing 3rd party things that would watch a master/slave pair
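(Purely illustrative: the two promotion styles just debated, written as hypothetical request payloads. Neither the endpoints nor the payloads are an agreed API; "replicates_from" and the promote verb come straight from the conversation.)

    # Hypothetical payloads only -- not a settled Trove API.

    # (a) Metadata-only promotion: update the slave so "replicates_from" is
    #     removed; the taskmanager would treat the removal as a promotion.
    metadata_update = {
        "instance": {
            "metadata": {}          # "replicates_from" key removed
        }
    }
    # e.g. PUT /v1.0/{tenant_id}/instances/{slave_id}

    # (b) Explicit convenience call (demorris' preference): an action endpoint
    #     that performs the same metadata change on the caller's behalf.
    promote_action = {
        "promote": {}
    }
    # e.g. POST /v1.0/{tenant_id}/instances/{slave_id}/action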
18:25:32 but in the context of Trove, we probably would have to rely on the status of the guests themselves
18:25:43 vipul: yes
18:25:43 it would be more polling, but don't know of a better way
18:26:09 the health of replication, wrt mysql, is something that *must* be queried actively from the replicants
18:26:21 vipul: is this so that we can watch the status to ensure that replication is healthy?
18:26:29 right
18:26:29 therefore it is data that would have to bubble up from the guest agent
18:26:34 or a client connection
18:26:45 I agree with demorris that we should have a convenience API to make the process easier to grok
18:26:48 yea lets get a clustering api working before we solve this guys :)
18:26:49 could you not add replication/cluster health to the heartbeat?
18:26:58 I don't disagree with making it easier
18:27:03 so it's likely that most of the time when there is a failure, there won't be a status from the guest heartbeat.. absence of a heartbeat
18:27:07 no, we must solve everything and do nothing
18:27:10 :P
18:27:13 but before we build the lego castle we first need the lego no?
18:27:34 which will be difficult to determine why a heartbeat was missing, was it cuz guestagent died... was it cuz instance died..
18:27:41 i am fine with getting manual promotion working first
18:27:51 vipul: wouldn't both indicate a problem with the node though?
18:28:07 kevinconway: You could, but I think the observer needs to be actively observing it to automate any slave-promotion scenarios (on an unhealthy master, for instance).
18:28:08 kevinconway: sure, but guestagent being down doesn't mean mysql is down
18:28:22 kevinconway: yes but not necessarily a problem with replication
18:28:28 but if the guest is down then we have no control over the db or the instance anymore
18:28:30 kevinconway: it could lead to premature slave promotions
18:28:36 shouldn't it be pruned from the set for that reason?
18:28:44 vipul: +100000000
18:29:23 kevinconway: yea that's one way to look at it.. we can't actively manage it.. so maybe a promotion is justified.. but not a good experience for the user
18:29:29 can we call it a rogue master. is that an oxymoron?
18:29:52 so a shutdown message that came to conductor 2s ago, which has been shutdown for the last few msgs, indicates mysql is failed, act upon it, via the guest
18:30:02 no communication to conductor means the guest is down, there is nothing we can do
18:30:13 so i dont think we need to program to #2 yet
18:30:20 agreed
18:30:20 lets focus on #1 when we go down this route
18:30:26 agreed
18:30:33 I can't wait until we're circularly promoting slaves infinitely
18:30:35 :)
18:30:38 +1 to focusing on the API first.
18:30:47 imsplitbit: hehehe yes, yes it will
18:30:48 since we have heartbeat timeout we could totally say that guest is down
18:31:32 i was hoping heat would be able to act on this kind of stuff... but i don't think it's possible to feed in our states
18:31:39 imsplitbit: if we promote them at different intervals we can eventually detect a cycle, right ;)
18:31:44 w/o sending those through ceilometer or something
18:32:38 anyway, since heat is something we'll need, there is a bunch of stuff missing in the current heat integration
18:32:42 like keeping track of the 'stack id'
18:32:46 there are definitely valid rules that need to be wrapped around this yet. I just want to make sure that we first have the ability to accurately describe that one instance's data is being replicated to another before we try to solve how you stop a rogue master and circular replication slave promotion :)
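(A minimal sketch of the heartbeat-timeout idea raised above. The heartbeat source and the timeout name are assumptions, not Trove code; as the discussion notes, a missed heartbeat alone cannot distinguish a dead guest agent from a dead mysqld, so this only flags instances for manual follow-up rather than auto-promoting a slave.)

    # Assumed names throughout: AGENT_HEARTBEAT_TIMEOUT and the heartbeat
    # mapping are illustrative, not actual Trove config or models.
    import time

    AGENT_HEARTBEAT_TIMEOUT = 60  # seconds

    def find_stale_guests(heartbeats, now=None):
        """Return instance ids whose last heartbeat is older than the timeout.

        `heartbeats` maps instance_id -> last heartbeat timestamp (seconds
        since epoch), e.g. as reported through conductor.
        """
        now = now or time.time()
        return [instance_id
                for instance_id, last_seen in heartbeats.items()
                if now - last_seen > AGENT_HEARTBEAT_TIMEOUT]

    # An observer would run this periodically and surface the result for
    # manual promotion first (case #1 above), leaving automation for later.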
18:33:12 vipul, i'm working on that (tracking stack id)
18:33:34 imsplitbit: agree, good to know we're all on the same track
18:33:40 definitely
18:33:42 denis_makogon: cool
18:33:47 there is a plan to keep trove simple and make heat quite smart
18:34:02 there are already a bunch of BPs for that
18:34:20 denis_makogon: sure, i think there are integrations with a lot of different things required to get to that point (i.e. Ceilometer)
18:34:51 so what of the API discussion with replication and slave promotion..are Greg and I the only ones who think that an explicit API call is needed for these, rather than just setting or modifying metadata?
18:34:54 #link https://blueprints.launchpad.net/heat/+spec/search-stack-by-resource-id
18:35:01 #link https://blueprints.launchpad.net/heat/+spec/search-resource-by-physical-resource-id
18:35:59 demorris: i don't think a separate call should be introduced.. it really is an update to an existing resource
18:36:25 it would just be a convenience method to make it simpler for users
18:36:33 we may want to add to the metadata.. and explicitly introduce a 'type' = slave or something
18:36:44 to be honest, without a specification or quick walkthrough via bp/wiki, it's hard to objectively say one approach is better than the other; the devil is in the details.
18:36:46 I get skittish about "convenience" methods
18:37:08 in the current design, how does it work if I have a single master and I want to add a slave?
18:37:12 amcrn: https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API
18:37:13 ok, I don't. I think making the API easy to use is the primary concern
18:37:14 but we should first examine how the existing proposal is hard and see if we can simplify that
18:37:17 amcrn: at the very bottom
18:37:28 jimbobhickville: i think "convenience method" could also be called "application"
18:37:31 vipul: isn't this outdated?
18:37:45 I have a single instance and its my master and I want to create a slave of it, but no other instances exist
18:37:46 demorris: the current proposal is to change the metadata of the resource
18:37:56 that triggers the needful
18:38:03 imsplitbit, amcrn: this is the only one i know of https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API -- is this what you're working off of?
18:38:11 vipul: yep
18:38:23 can I create a new instance and have it be a slave of the master in a single operation?
18:38:23 if there is anything that is missing then lets iron it out
18:38:29 or do I have to do multiple operations
18:38:43 cause if this changes again I may commit seppuku
18:38:57 demorris: it's a single operation to create a slave instance and have it start replicating
18:38:58 i think it should be an atomic operation for the user
18:39:00 lol imsplitbit
18:39:09 vipul: okay cool
18:39:12 demorris: it's 2 operations to create a master + slave
18:39:52 vipul: maybe i've just had these conversations in my head (quite possible), but i swear i remember talking about things like replicates_from being nonsensical from a NoSQL perspective, and a couple of other things
18:40:21 i mean the examples still have service_type vs. datastore, it's very mysql-centric
18:40:25 amcrn: it may for some datastores yes?
18:40:42 how does that not apply to say redis?
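(Illustrative sketch of the two-call flow being described: one ordinary create for the master, then a second create whose metadata points back at it. Field names, endpoints, and the "replicates_from" key reflect the meeting discussion and the wiki proposal, not a finalized API.)

    # Hypothetical request bodies for the two-call replication flow.

    # Call 1: create the master as an ordinary instance.
    create_master = {
        "instance": {
            "name": "db-master",
            "flavorRef": "7",
            "volume": {"size": 2},
            "datastore": {"type": "mysql", "version": "5.6"},
        }
    }

    # Call 2: create the slave in a single operation, pointing at the master
    # through instance metadata so it starts replicating once provisioned.
    create_slave = {
        "instance": {
            "name": "db-slave-1",
            "flavorRef": "7",
            "volume": {"size": 2},
            "datastore": {"type": "mysql", "version": "5.6"},
            "metadata": {"replicates_from": "<master-instance-uuid>"},
        }
    }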
18:40:45 sorry, I thought we were talking about promoting a slave, and it sounded like it was two separate operations to promote the slave and demote the master, and I thought a single atomic convenience method made sense for that operation
18:40:46 or postgres?
18:40:50 does mongo only do master-master replication?
18:40:51 perhaps redundant
18:41:11 right, but the api schema should not wildly change depending upon the datastore; it's possible to construct terms/structures that are fairly agnostic
18:41:21 kevinconway: mongodb does replica sets
18:41:30 I don't think the verbiage was ever accepted
18:41:47 I was gonna hook in metadata and then ML the replication contract verbiage
18:41:51 what was the suggested alternative? i don't remember the details of that
18:42:00 I don't know that there were vipul
18:42:10 I remember hearing that replicates_from didn't make sense
18:42:49 but there needs to be a way to describe a bidirectional replication contract imo
18:42:52 imsplitbit: just to make sure: so i can take that wiki as the source of truth for the current state of the proposal, and iterate on it (by sending thoughts to ML, etc.)?
18:43:12 that is the basis on which I am operating amcrn
18:43:17 amcrn: the bigger the changes the better
18:43:18 IIRC, the master-master replication scenarios for other datastore types were going to be handled by the clustering API (using a master-master cluster type)
18:44:18 SlickNik: I don't know that that is a great idea but I'm not prepared to argue it right now :D
18:44:22 also it is metadata, we need to figure out how much flexibility we want to add to that
18:44:31 vipul: +1
18:44:56 because i don't think there will be a verb that makes sense for all use cases
18:44:57 imsplitbit: I'm not arguing for it either. It was just something that was brought up in a different conversation earlier.
18:44:58 so *no* work has been put into the replication contract language. the proposal I believe includes some jumping off points
18:45:11 vipul imsplitbit i remember bringing up handling metadata like glance (w/ its pros and cons)
18:45:13 SlickNik: possibly I don't really recall
18:45:37 amcrn: imsplitbit any changes to the wiki should be ok'd by the group at this point
18:45:49 yeah for sure
18:45:52 amrith: u following this? :)
18:46:00 imsplitbit: is that a DSL for replication? i feel like i missed that bit.
18:46:02 alright, cool, glad to see this is gaining some mindshare/traction again, let me reacquaint mysql with the wiki and start collecting my thoughts.
18:46:12 lol, mysql = myself*
18:46:17 amcrn: i will dig out the email thread you had started, must have missed some of this stuff
18:46:17 we are following along
18:46:20 kevinconway: we have said we lack sufficient discussion on the dsl
18:46:24 so that is forthcoming
18:46:25 imsplitbit: I'm perfectly fine with not handling that until we have a working replication API for the mysql datastore.
18:46:28 amcrn: lol at that slip
18:46:42 lol amcrn
18:46:46 imsplitbit: likes to say dsl cuz it sounds like dsal
18:46:54 imsplitbit: I hear HEAT has a great DSL for describing configurations of things.
18:47:07 hahaha kevinconway
18:47:13 I don't know why we're doing anything, just tell heat to do it!
18:47:15 :)
18:47:20 kevinconway, somehow yes
18:47:39 ok so.. do we all have an understanding of whats going on w/ clustering currently?
18:47:46 replication yes
18:47:50 hub_cap: yes ...
18:47:51 until heat guys will say DIY
18:47:53 clustering we haven't even talked about
18:47:53 it sounds like we need to totally redo the wiki
18:47:57 imsplitbit: :O
18:48:09 semantics ;)
18:48:14 but I believe we've sync'd back up now on metadata and replication
18:48:49 we should try to nail this stuff down soon.. so folks can get some of the cooler things added to trove
18:49:20 vipul: +1
18:49:26 I should be done with metadata soon
18:49:32 hooking it into SimpleInstance now
18:49:42 s/hook/hack
18:49:47 for sure
18:49:49 :)
18:49:51 :P
18:50:00 then we can tear that apart and make it right
18:50:07 and then work on replication
18:50:12 ok so anything not related to clustering/replication?
18:50:23 SlickNik: did u mention a testing update?
18:50:28 man and I wasn't even *on* the agenda
18:50:32 vipul: looking at the document (https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API) referenced above, I think there's a serious set of issues around the fact that fundamentally Trove wishes to support both RDBMS and NoSQL DBMS. And each one has a different interpretation of "which way is up"
18:51:12 amrith: Yea we do want to create an API that supports both types of datastores, which is why this is much harder to do than it seemed
18:51:19 we should probably drop mongo support… denis_makogon
18:51:20 hub_cap: I'm still waiting for a couple of patches to merge into infra.
18:51:21 amcrn: heh
18:51:34 amrith, it's funny but NoSQL Cassandra perfectly fits the replication design))))
18:51:40 kevinconway: :)
18:52:13 hub_cap: I've gotten started on writing basic services for trove in tempest.
18:52:21 kevinconway, yes, sounds like, oh wait, it sounds like a bad idea, lol
18:52:40 there are lots of similarities in describing clusters of NoSQL and SQL data stores, I don't see that as the big issue on the surface from an API perspective
18:52:58 #action: SlickNik to schedule some time to talk about tempest testing in openstack-trove and send an update out on the ML
18:53:13 good idea
18:53:14 That's all I have for now.
18:54:13 i've got a few things on making devstack/redstack easier for new contributors, plus getting a management-cli out in the public and consumable; but that's for next week.
18:54:32 amrith: i think we have to address your concerns outside this meeting
18:54:39 denis_makogon: I don't understand the proposal for repl fully yet so will get back to you later.
18:54:51 hub_cap: I have a lot to understand of the proposal
18:55:05 will do as you recommend; outside mth
18:55:15 s/mth/mtg/
18:55:16 amrith: i think we all have a lot to discover about the proposal
18:55:31 and at best its a proposal currently hehe
18:56:48 are we done for the meeting ?
18:57:45 maybe we should talk about the specifics of clustering api with our remaining 2.5 minutes
18:58:05 kevinconway: I will find you
18:58:09 :)
18:58:09 LOL kevinconway
18:58:36 #topic make kevinconway our official master of ceremonies
18:58:53 concur!
18:58:54 lol
18:58:59 whoops that was supposed to be an action
18:59:02 not a topic change
18:59:03 HAHAHAAH
18:59:14 ok we're done
18:59:22 #endmeeting