20:00:15 <hub_cap> #startmeeting trove
20:00:16 <openstack> Meeting started Wed Jul 17 20:00:15 2013 UTC. The chair is hub_cap. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:19 <openstack> The meeting name has been set to 'trove'
20:00:21 <vipul> o/
20:00:22 <djohnstone> o/
20:00:23 <hub_cap> #link https://wiki.openstack.org/wiki/Meetings/TroveMeeting
20:00:28 <datsun180b> o7
20:00:35 <hub_cap> &o
20:00:42 <juice> o/
20:00:58 <kevinconway> \o/
20:01:12 <hub_cap> crap i put a bad link on the wiki :p
20:01:22 <hub_cap> #link http://eavesdrop.openstack.org/meetings/trove/2013/trove.2013-07-10-20.00.html
20:01:30 <hub_cap> #topic action items
20:01:49 <hub_cap> not many AI's. SlickNik is not around?
20:01:58 <hub_cap> vipul: get a chance to do any more wikifying?
20:02:11 <vipul> hub_cap: No, didnt spend any time on this one
20:02:17 <vipul> likely an ongoing thing
20:02:28 <SlickNik> o/
20:02:31 <grapex> o/
20:02:31 <hub_cap> kk, lets action item it again
20:02:32 <imsplitbit> o/
20:02:40 <vipul> #action Vipul to continue to update reddwarf -> trove
20:02:40 <hub_cap> SlickNik: hey, yer up. initial stab @ dev docs
20:02:40 <pdmars> o/
20:02:42 <hub_cap> i saw something
20:02:47 <hub_cap> can u link the review?
20:02:57 <SlickNik> yeah, one sec.
20:03:20 <SlickNik> #link https://review.openstack.org/#/c/37379/
20:03:31 <hub_cap> SWEET
20:03:40 <hub_cap> good work.
20:03:46 <hub_cap> anything else to add wrt the action items?
20:03:57 <SlickNik> I've taken the initial info from the wiki and the trove-integration README.
20:04:14 <grapex> SlickNik: Nice!
20:04:36 <SlickNik> Once that's approved, I can turn on the CI-doc job that builds it.
20:04:40 <hub_cap> :)
20:04:40 <vipul> thanks SlickNik
20:04:49 <hub_cap> lets get that done then!!! ;)
20:04:55 <SlickNik> And then I need to contact annegentle to add the link to the openstack site.
20:05:13 <hub_cap> okey moving on then?
20:05:16 <SlickNik> yup.
20:05:23 <hub_cap> #topic h2 milestone released
20:05:27 <hub_cap> #link https://github.com/openstack/trove/tree/milestone-proposed
20:05:28 <hub_cap> WOO
20:05:33 <datsun180b> WOO
20:05:34 <hub_cap> they will cut it i think, thursday?
20:05:39 <SlickNik> w00t!
20:05:44 <konetzed> \o/
20:05:51 <hub_cap> #lnk http://tarballs.openstack.org/trove/
20:05:53 <hub_cap> doh
20:05:55 <hub_cap> #link http://tarballs.openstack.org/trove/
20:05:58 <hub_cap> there we are
20:06:04 <vipul> woah look at that
20:06:18 <datsun180b> Did you see all those issues marked as Released by Thierry C?
20:06:25 <hub_cap> yes i did
20:06:26 <SlickNik> yup :)
20:06:28 <hub_cap> cuz i get ALL of them ;)
20:06:38 <hub_cap> we can move critical bugs back to h2 if we need to
20:06:41 <hub_cap> i suspect we wont
20:06:49 <hub_cap> since no one is really gonna deploy it
20:07:01 <hub_cap> its more just to get us understanding how things work around here
20:07:06 <SlickNik> I don't know of any critical bugs, atm.
20:07:10 <hub_cap> Aight enough w/ the glass clinking, time to move on
20:07:20 <hub_cap> feel free to view the links
20:07:26 <hub_cap> #link https://wiki.openstack.org/wiki/GerritJenkinsGithub#Authoring_Changes_for_milestone-proposed
20:07:31 <hub_cap> #link https://wiki.openstack.org/wiki/PTLguide#Backporting_fixes_to_milestone-proposed_.28Wednesday.2FThursday.29
20:07:35 <hub_cap> if u want to know more about the process
20:07:46 <hub_cap> #topic Restart mysql
20:07:55 <hub_cap> doh forgot the word test
20:07:57 <hub_cap> #link https://github.com/openstack/trove/blob/master/trove/tests/api/instances_actions.py#L256-262
20:08:04 <hub_cap> lets spend a bit of time discussing the validity of this
20:08:10 <hub_cap> and then spend the rest of the time on replication
20:08:30 <hub_cap> SlickNik: all u
20:08:48 <SlickNik> So, I agree with grapex that we need a test to validate what the guest agent behavior is when mysql is down.
20:09:14 <SlickNik> But I think that that's exactly what the mysql stop tests are doing.
20:09:37 <hub_cap> link?
20:09:41 <vipul> #link https://github.com/openstack/trove/blob/master/trove/tests/api/instances_actions.py#L320-L324
20:10:11 <grapex> SlickNik: The only major difference is that one explicitly tells the guest to stop MySQL, versus letting the status thread do its thing
20:10:40 <hub_cap> as in, we are testing the periodic task does its job?
20:10:44 <vipul> right but isn't the status thread still the thing that's updating status
20:10:51 <vipul> it's just a different way of stopping mysql
20:11:02 <vipul> one is explicitly, other is by messing up logfiles
20:11:20 <grapex> vipul: True, but the stop rpc call also updates the DB when its finished
20:11:32 <datsun180b> and that ib_logfile behavior is very deliberately for mysql, right?
20:12:05 <grapex> SlickNik: Can you give another summary of the issue the test itself is having?
20:12:23 <grapex> Isn't it that MySQL actually can't start up again when the test tries to restart it?
20:12:36 <SlickNik> grapex: when we corrupt the logfiles, mysql doesn't come up.
20:12:48 <SlickNik> the upstart scripts keep trying to respawn mysql since it can't come up.
20:13:01 <grapex> SlickNik: Does the reference guest not delete those iblogfiles?
20:13:17 <vipul> i think the tests do
20:13:30 <datsun180b> that sounds right
20:13:33 <hub_cap> correct
20:13:45 <SlickNik> grapex: not delete; but mess up so that they are zeroed out.
20:13:56 <hub_cap> so the difference is
20:14:06 <hub_cap> 1 test stops mysql, the other kills it behind the scenes
20:14:07 <SlickNik> Now since upstart is trying to bring mysql up, it has a lock on the logfiles.
20:14:16 <hub_cap> the latter test waits for the periodic task to signal its broken
20:14:23 <hub_cap> the former test updates the db as part of the stop
20:14:24 <hub_cap> ya?
20:14:24 <grapex> So Sneaky Pete actually wipes the ib logfiles. Maybe that's something the reference guest should do?
20:14:41 <grapex> It does it as part of the restart command
20:14:57 <hub_cap> lets first try to figure out if the tests are truly different
20:15:10 <hub_cap> and then once we agree it needs to stay (if it does) we can think about solutions
20:15:11 <grapex> Well that one also makes sure the iblogfiles are wiped
20:15:30 <vipul> grapex: won't that mean mysql can start again?
20:15:40 <grapex> vipul: Yes.
20:15:43 <SlickNik> So there's also the point that this test takes about ~4-5 mins.
20:15:55 <vipul> then this test will fail, because the test expects that it cannot start
20:16:50 <SlickNik> So one question is that do we think that this 1 scenario (which isn't all that different from the stop tests) warrants an extra addition of ~4-5 minutes on every test run?
20:17:06 <hub_cap> if it tests something different i think its warranted
20:17:09 <SlickNik> (in parens) = my opinion
20:17:26 <hub_cap> is exactly the same != isn't all that different
20:17:29 <hub_cap> are they testing different things?
20:17:35 <grapex> I'm sorry, I misspoke about wiping the iblogfiles - that happens on resizes and other cases, not for restart
20:17:36 <hub_cap> thats what i want us to agree upon
20:17:54 <SlickNik> well, in either case we are testing for a broken connection.
20:18:00 <hub_cap> are we?
20:18:06 <SlickNik> And mysql not running is causing the broken connection.
20:18:07 <grapex> SlickNik: I disagree
20:18:14 <hub_cap> i thought the screw_up_mysql tests that the periodic task updates the db properly
20:18:24 <hub_cap> and the explicit stop tests that the stop updates the db synchronously
20:18:25 <grapex> I think also whether a test has value is a different question from whether we want to run it every single time as part of CI if it's impeding people
20:18:28 <hub_cap> is that not the case?
20:18:41 <SlickNik> grapex: what code path does the restart tests hit that the resize tests don't also hit?
20:18:53 <SlickNik> do*
20:19:03 <grapex> restart truly makes sure the status thread sees MySQL die and updates appropriately
20:19:16 <vipul> so the stop_db code seems to set that state = None
20:19:19 <vipul> self.instance.update_db(task_status=inst_models.InstanceTasks.NONE)
20:19:22 <hub_cap> correct
20:19:24 <grapex> stop is actually stopping it, so it updates the database as part of that RPC code path, not the thread
20:19:27 <vipul> Which means the status thread will set it to shutdown
20:19:55 <hub_cap> sure but it does that based on different circumstances vipul
20:20:10 <hub_cap> 1) it checks the task is NONE vs 2) it cant talk to mysql, right?
20:20:34 <vipul> it checks the status is shutdown and can't talk to mysql
20:20:52 <hub_cap> ok
20:20:59 <hub_cap> does the other tests update the task to none?
20:21:03 <hub_cap> *test
20:21:13 <vipul> restart also sets it to None
20:21:15 <grapex> Also keep in mind, the Sneaky Pete tests actually sets the status to stop as part of that RPC call. If you're saying the reference guest doesn't, I'm not sure why it wouldn't
20:21:44 <hub_cap> ok weve got 4 more min on this and im gonna call it for now
20:21:46 <hub_cap> as undecided
20:21:51 <hub_cap> id like to discuss replication
20:22:00 <imsplitbit> +1 on that
20:22:07 <hub_cap> lol imsplitbit
20:22:12 <grapex> Well, I want to suggest something
20:22:18 <hub_cap> sure
20:22:21 <hub_cap> youve got a few min
20:22:23 <KennethWilke> hub_cap: i will gladly accept a gist link of the chat
20:22:25 <hub_cap> go!
20:22:25 <grapex> If this test is really being a bother, lets just take it out of the "blackbox" group but keep it in the code.
20:22:46 <hub_cap> KennethWilke: its logged, you can see it on http://eavesdrop.openstack.org/meetings/trove/
20:22:48 <grapex> We run it at Rackspace all the time and I find it useful. It could still be run nightly or something.
20:22:52 <KennethWilke> hub_cap: ty
20:23:08 <SlickNik> grapex: I'd totally be fine with that.
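
(For reference, a minimal sketch of the two code paths being compared above -- not actual Trove code; every name below is a hypothetical stand-in. The stop test updates the DB synchronously from the stop RPC, while the unsuccessful-restart test zeroes the ib_logfiles and relies on the periodic status thread to notice that mysqld is dead.)

    # Hypothetical illustration of the two paths under discussion; not Trove code.
    class InstanceTasks:
        NONE = "NONE"

    class FakeInstance:
        def __init__(self):
            self.task_status = None
            self.service_status = "RUNNING"

        def update_db(self, task_status):
            # stand-in for persisting the task status row
            self.task_status = task_status

    def stop_db(instance):
        # Path exercised by the "stop MySQL" test: the stop RPC shuts
        # mysqld down and then updates the database as part of that call.
        instance.update_db(task_status=InstanceTasks.NONE)
        instance.service_status = "SHUTDOWN"

    def status_thread_tick(instance, mysql_reachable):
        # Path exercised by the "unsuccessful restart" test: the test zeroes
        # the ib_logfiles so mysqld cannot come back up, nothing writes the
        # DB directly, and only this periodic check records SHUTDOWN.
        if not mysql_reachable and instance.task_status == InstanceTasks.NONE:
            instance.service_status = "SHUTDOWN"
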
20:23:09 <grapex> (nightly for the completely public Ubuntu / KVM / Reference guest code)
20:23:27 <hub_cap> ya but id argue we shouldnt remove it till we do the nightly different tests
20:23:32 <grapex> SlickNik: Maybe the solution is to create a second group called "nightly" which just has more test groups added to it
20:23:35 <grapex> hub_cap: Seconded.
20:23:36 <hub_cap> if it in fact does test something different
20:23:45 <juice> +2
20:24:16 <grapex> hub_cap: +1
20:24:17 <cp16net> +1
20:24:18 <datsun180b> i vote keep it, even if it means moving it
20:24:26 <hub_cap> which im still not sure it _does_ test something different at this point
20:24:32 <hub_cap> but lets move on
20:24:39 <hub_cap> i think we have a reasonable consensus to keep it but move it
20:24:54 <hub_cap> i don't want your goddamn lettuce
20:25:08 <hub_cap> moving on?
20:25:09 <grapex> hub_cap: Are you talking to a rabbit?
20:25:09 <vipul> need some more research to verify that it is indeed different
20:25:12 <vipul> can we just action it?
20:25:13 <hub_cap> yes vipul
20:25:16 <hub_cap> go ahead
20:25:40 <vipul> #action SlickNik, vipul to compare Stop test and Unsuccessful Restart tests to identify differences
20:25:44 <hub_cap> grapex: no. google it
20:25:53 <hub_cap> ok. repl time
20:25:59 <hub_cap> #replication :o
20:26:01 <SlickNik> hub_cap / grapex: I'll move it for now so that we don't keep hitting it on rdjenkins. I'll also look to see if we can fix the test so we don't run into the upstart issue (also the research that vipul actioned).
20:26:02 <hub_cap> lol
20:26:12 <hub_cap> #topic replication :o
20:26:21 <imsplitbit> let me relink
20:26:21 <hub_cap> +1 SlickNik cuz we will have to deal w/ fedora soon too ;)
20:26:25 <hub_cap> plz do
20:26:32 <imsplitbit> #link https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API
20:26:41 <imsplitbit> #link https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API-Using-Instances
20:26:49 <imsplitbit> hub_cap: go!
20:26:56 <SlickNik> thanks guys, on to replication!
20:27:08 <hub_cap> #all i wanted was a cheeseburger
20:27:15 <hub_cap> ok SO
20:27:22 <hub_cap> weve gone back and forth on this topic for a while now
20:27:29 <SlickNik> #all I got was a lousy T-shirt?
20:27:35 <hub_cap> lol
20:27:47 <hub_cap> 2 schools, /instances has ALL instances, some error conditions on things like resize
20:28:06 <hub_cap> or /instances and /clusters, and things move from /instances to /clusters when they get promoted
20:28:16 <hub_cap> and cluster nodes are never a part of /instances
20:28:18 <vipul> no demorris :) we can decide w/o him
20:28:23 <hub_cap> HA nice
20:28:29 <imsplitbit> he should be here
20:28:31 <demorris> vipul: u wish
20:28:34 <imsplitbit> just to stir the pot
20:28:36 <imsplitbit> :)
20:28:36 <vipul> boo
20:28:38 <hub_cap> NICE
20:28:40 <SlickNik> lol, he's lurking.
20:28:45 <demorris> always
20:29:07 <hub_cap> ok i was of the opinion that when we promote to /clusters we move the instance there
20:29:12 <hub_cap> so u can do things u shouldnt do on it
20:29:17 <hub_cap> as in, if its a slave, u shouldnt add a user
20:29:23 <hub_cap> or u shouldnt delete a master
20:29:42 <hub_cap> but after thinking about it for a while, there arent a LOT of failure cases for modifying /instances
20:29:50 <hub_cap> the only one i can think of is deleting a master
20:30:18 <hub_cap> u should be able to add a RO user to a slave, u should be able to resize a slave to something that might not be ok for the cluster
20:30:26 <hub_cap> the permutations for what u shouldnt be able to do are NOT small
20:30:35 <hub_cap> and are different for different cases of a cluster
20:30:41 <hub_cap> and different types of clusters
20:30:55 <hub_cap> hell they are probably close to infinite given different circumstances
20:31:13 <hub_cap> so id rather keep things in /instances, and just limit very few cases for modifying an instance in a cluster
20:31:33 <hub_cap> if we find something that should _never_ be done, then so be it, we add it as a failure case in /instances
20:31:42 <vipul> so.. would /cluster have the same set of operations that /instances has (create user, add db, etc)
20:31:47 <hub_cap> no
20:31:55 <hub_cap> it would be helper for doing things to an entire cluster
20:31:58 <hub_cap> and thats it
20:32:11 <imsplitbit> create/resize/delete
20:32:12 <hub_cap> add db/create user
20:32:18 <hub_cap> we cant really define how a user will use a slave
20:32:29 <vipul> but they may not always be slaves right
20:32:31 <hub_cap> i had an extra db on slaves on some of my setups w/ urchin a while ago
20:32:38 <hub_cap> and different users
20:32:56 <vipul> you may have a galera cluster.. where the users / schemas will all be replicated across
20:33:00 <vipul> no matter which one you write to
20:33:02 <hub_cap> yes
20:33:09 <hub_cap> so given that case it doesnt matter where u write it
20:33:15 <hub_cap> so then we cant restrict it
20:33:20 <vipul> so why not write it to /cluster.. why do they have to pick one
20:33:22 <imsplitbit> or shouldn't
20:33:23 <hub_cap> there is no "master master" there
20:33:34 <hub_cap> because i want to add a RO user to slave 1
20:33:36 <hub_cap> how do i do that
20:33:40 <imsplitbit> vipul: I think there is a good case to add some helper things to /cluster
20:33:55 <imsplitbit> but it isn't needed to implement a cluster and support it
20:33:58 <SlickNik> So what's the ultimate reason for not doing this on the cluster but doing it on the individual instances?
20:34:08 <SlickNik> Duplication of code?
20:34:14 <hub_cap> duplication of schema
20:34:20 <hub_cap> and complication to the end user
20:34:35 <hub_cap> 1/2 my instances in one /path and 1/2 in /another seems very unintuitive
20:34:43 <imsplitbit> agreed
20:34:48 <hub_cap> i have a bunch of /instances, period
20:34:54 <hub_cap> at the end of the day thats what they are anyway
20:35:09 <hub_cap> and vipul im not trying to define what we can and cant do on /clusters
20:35:16 <hub_cap> im tryin to get consensus on where /instances live
20:35:19 <vipul> It seems like as we do auto-failover, etc.. we'd want to abstract the actual 'type' of instance away from the user.. so the user only sees a db as a single endpoint
20:35:47 <vipul> in a cluster.. you could see a single endpoint that's load balanced also
20:35:48 <imsplitbit> vipul: if we do that then you have to separate replication from clustering
20:35:54 <imsplitbit> because they aren't the same
20:36:00 <hub_cap> :o
20:36:03 <imsplitbit> yet they share a lot of functionality
20:36:29 <vipul> but is it that different? if we promote a slave to a master on behalf of the user.. and spin up a new slave for them
20:36:35 <hub_cap> we will still have list /clusters
20:36:40 <hub_cap> and u can show a single endpoint
20:36:47 <vipul> from the user's perspective it doesn't matter if it's a multi-master or single master/slave
20:36:57 <hub_cap> fwiw tho all clustering apis dont use single endpoint
20:37:08 <vipul> agreed, we can't yet
20:37:09 <hub_cap> i believe tungsten uses its own internal code to determine where to write to
20:37:14 <grapex> I've got a question as the infamous No-NoSQL guy.
20:37:15 <cp16net> imsplitbit: as i am catching up on this feature those links of the API and API with instances are the 2 proposed plans we are debating?
20:37:16 <hub_cap> in its connector api
20:37:38 <imsplitbit> vipul: but you're assuming use, what if I have a db on one instance and I want to keep a spare copy of it warm on another host but also want to use that host as a db server for a completely different dataset?
20:37:40 <hub_cap> but again, if you list /clusters we can provide a single endpoint
20:37:47 <imsplitbit> cp16net: yes
20:37:54 <imsplitbit> hub_cap: exactly
20:37:56 <hub_cap> but if you _want_ you can enact on a slave / "other master" in /instances
20:38:06 <hub_cap> im saying dont remove them from /instances
20:38:11 <imsplitbit> if the cluster type supports a single endpoint then /clusters should return that information
20:38:12 <hub_cap> we can still totally do what vipul wants in /clusters
20:38:18 <hub_cap> you are essentially paying for every /instance
20:38:21 <hub_cap> so we should show them
20:38:28 <hub_cap> even if u have auto failover
20:38:33 <hub_cap> u buy 2 or 3 or X instances
20:38:39 <hub_cap> and use one ip
20:38:52 <vipul> yea i think the instance info should be visible.. but at some point in the future.. we may have a single dns entry returned or something
20:38:55 <demorris> i would separate out billing from it though
20:38:56 <hub_cap> if i was paying for 9 instances in a auto failover cluster, id like to see them all in /instances
20:39:12 <imsplitbit> vipul: and that will be returned with the cluster ref
20:39:15 <hub_cap> demorris: there is no billing in it, just providing a point from a customer point of view
20:39:18 <imsplitbit> if applicable
20:39:24 <hub_cap> vipul: we can do that, now even if applic... grr dsal
20:39:31 <demorris> hub_cap: k
20:39:33 <hub_cap> just say what i was gonna say why dont ya
20:39:43 <hub_cap> i got a can of these baked beans too
20:39:49 <konetzed> vipul: why couldnt you create a single dns entry returned for the cluster but still have dns for each instance like it is now?
20:40:01 <hub_cap> id want that
20:40:09 <hub_cap> cuz if i had to connect to instance X to clean it up manually
20:40:12 <hub_cap> id want to be able to
20:40:19 <imsplitbit> konetzed: I would think most people would
20:40:21 <vipul> konetzed: I guess you could.. but then the customer would end up breaking if they happened to use one of the instance entries
20:40:26 <hub_cap> like auto-failover is not working, let me get on node X to promote it
20:40:38 <konetzed> vipul: you can only protect stupid so much
20:40:43 <hub_cap> HAH
20:40:43 <konetzed> :D
20:40:47 <cp16net> hah
20:40:51 <hub_cap> ya none yall protected from me
20:40:54 <vipul> this is really a question of how much do we hide from the user, so even if they are stupid they can use it
20:40:59 <hub_cap> why u think they moved me to cali
20:41:13 <hub_cap> sure vipul and i think we could concede on some of that
20:41:20 <hub_cap> thats not set in stone
20:41:27 <konetzed> +1
20:41:29 <hub_cap> we could even rev the api a bit when we have > 1 cluster
20:41:30 <hub_cap> SHIT
20:41:32 <hub_cap> im out of power
20:41:32 <SlickNik> Okay, so I guess it depends on what we're shooting for here.
20:41:40 <imsplitbit> dude
20:41:44 <imsplitbit> hub_cap: FAIL
20:41:44 <hub_cap> sweet found a plug
20:41:45 <konetzed> vipul: i think you will find enough arguments for each way
20:41:53 <vipul> agreed
20:42:05 <imsplitbit> well there's 2 types of users right? power users and button pushers
20:42:13 <imsplitbit> you need to find enough to facilitate both
20:42:16 <hub_cap> yes
20:42:24 <hub_cap> or provide a RBAC solution
20:42:31 <hub_cap> that allows the installer to decide
20:42:39 <SlickNik> If we're looking for a managed DB solution here that exposes a simple clustering API to the user, then I think that is probably better served by having a single endpoint for it.
20:43:04 <hub_cap> i think we are looking to provide a service that is extensible enough to do that
20:43:09 <hub_cap> _or_ allow the user access to all
20:43:10 <hub_cap> frankly
20:43:20 <hub_cap> we WILL NEVER be able to provide a fully turnkey solution
20:43:29 <hub_cap> otherwise someone else would've
20:43:32 <hub_cap> mysql is a tricky beast
20:43:34 <imsplitbit> SlickNik: no one is arguing against providing a single endpoint for users who want one
20:43:52 <hub_cap> we will always need to provide a way for a user or operator to get to any instance
20:43:55 <vipul> one thing to keep in mind is the more that we hide, the less the user can faak us up.. like break our ability to auto-failover
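
(A rough sketch of the "RBAC solution / let the installer decide" idea raised above -- purely hypothetical, not an existing Trove mechanism: the deployer tunes which per-instance operations stay open once an instance belongs to a cluster.)

    # Hypothetical illustration only; names and rules are made up.
    INSTANCE_OPS_ALLOWED = {
        # cluster type -> operations still permitted directly on /instances/<id>
        "master_slave": {"resize", "create_database", "create_user"},
        "galera": {"resize", "create_database", "create_user"},
        None: {"resize", "create_database", "create_user", "delete"},
    }

    def instance_op_permitted(cluster_type, op):
        """True if `op` may be issued directly against an individual instance."""
        return op in INSTANCE_OPS_ALLOWED.get(cluster_type, set())

    # e.g. deleting the master of a replication pair would be rejected,
    # while adding a read-only user to a slave stays allowed:
    assert not instance_op_permitted("master_slave", "delete")
    assert instance_op_permitted("master_slave", "create_user")
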
20:43:55 <SlickNik> But if we're talking about letting users do things like have an instance as part of a cluster, as well as able to connect to the db directly, there's no way of getting away from a complex clustering API with actions spread across /instances and /clusters
20:44:07 <hub_cap> actions yes SlickNik
20:44:13 <hub_cap> but entities, no
20:44:18 <hub_cap> thats the first line of agreement
20:44:23 <hub_cap> as long as we are all on the same page there
20:44:30 <hub_cap> it makes the api closer to concrete
20:44:43 <hub_cap> im sure we can, eventually, hide instances if we want to
20:44:50 <hub_cap> shown_to_user=False
20:44:53 <hub_cap> easy as pie
20:44:59 <vipul> or at least not allow them to operate on them
20:45:02 <hub_cap> lets solve the easy solution first
20:45:07 <hub_cap> sure vipul
20:45:09 <demorris> I always go back to some of this being up to the provider / operator of Trove and separating that out from what the API supports
20:45:10 <hub_cap> managed vms
20:45:17 <konetzed> i was just going to say sounds like were going down a rabbit hole
20:45:17 <hub_cap> we need that anyway for nova
20:45:29 <demorris> why can't each cluster type have a policy that dictates what can and cannot be done to the cluster or instances themselves
20:45:34 <hub_cap> cuz they can just muck w/ them in nova if you are using their user to prov instances ;)
20:45:39 <hub_cap> yes demorris RBAC
20:45:40 <vipul> demorris: +1
20:45:47 <demorris> if my policy says, individual operations are not supported on /instances, then you don't allow it
20:45:47 <hub_cap> i said that like ~5 min ago
20:45:58 <vipul> it really is a deployment type of decision it seems
20:46:00 <esp> SlickNik: having a single endpoint might restrict users from building a system that reads from all nodes and only writes to one.
20:46:01 <hub_cap> lets just solve the easy solution first tho
20:46:05 <hub_cap> we are getting out of hand
20:46:10 <demorris> hub_cap: you know I can't follow every message in here…brain won't allow it :)
20:46:10 <hub_cap> we need to solve master/slave
20:46:15 <hub_cap> before we get to magical clustering
20:46:22 <hub_cap> demorris: transplant ;)
20:46:36 <vipul> hub_cap: is master/slave /cluster then?
20:46:39 <hub_cap> we understand the set of actions in /clusters can grow
20:46:42 <hub_cap> thats fine
20:46:44 <hub_cap> yes
20:46:50 <vipul> ok
20:46:53 <hub_cap> but both instances are avail via /instances
20:47:02 <imsplitbit> I don't like the use of the word clusters for replication because it implies too much
20:47:03 <hub_cap> and u can resize the slave down via /instances/id/resize
20:47:08 <imsplitbit> but we can't think of a better term for it
20:47:16 * hub_cap shreds imsplitbit with a suspicious knife
20:47:21 * hub_cap boxes imsplitbit with an authentic cup
20:47:21 * hub_cap slaps imsplitbit around with a tiny and bloodstained penguin
20:47:23 * hub_cap belts imsplitbit with a medium sized donkey
20:47:26 * hub_cap tortures imsplitbit with a real shelf
20:47:32 <imsplitbit> :)
20:47:40 <imsplitbit> I won't give up that fight
20:48:01 <imsplitbit> but I acknowledge that it doesn't need to be fought right now
20:48:12 <vipul> even though cluster is overloaded, it does fit even if it's master/slave
20:48:14 <vipul> imo
20:48:18 <hub_cap> does what i say make sense vipul?
20:48:22 <hub_cap> create master slave via /cluster
20:48:31 <hub_cap> resize both nodes cuz youre on oprah, /cluster/id/resize
20:48:33 <vipul> yep, makes sense
20:48:42 <hub_cap> resize indiv node cuz youre cheap /instance/id/resize
20:49:15 <hub_cap> create db on slave cuz u need a local store for some operation on an application /instance/id/db
20:50:01 <SlickNik> what about create db/user on master? does that go through /instance/id or /cluster/id?
20:50:02 <hub_cap> if u want to create it on all of them, create it on the master ;)
20:50:18 <hub_cap> u _know_ u have a master, why not let the user just do that
20:50:27 <hub_cap> this only applies for master/slave
20:50:44 <imsplitbit> hub_cap: I think that is the least prescriptive approach
20:50:47 <hub_cap> for what its worth
20:50:58 <vipul> right, but we should allow it to be created on the /cluster as well
20:51:00 <hub_cap> /clusters/id/resize is NOT going to be easy
20:51:07 <hub_cap> i have 9 instances
20:51:09 <hub_cap> 3 failed
20:51:11 <hub_cap> 1 is now broken
20:51:17 <hub_cap> the master just went down
20:51:17 <SlickNik> So is there a difference between create db on master vs create db on cluster?
20:51:19 <konetzed> fix it so it never fails
20:51:19 <hub_cap> what do i do
20:51:32 <hub_cap> konetzed: youre out yo mind
20:51:41 <SlickNik> i.e. if I do /instance/id/db CREATE, it is a local instance that will not get replicated?
20:51:45 <konetzed> hub_cap: the hp ppl didnt know that already
20:51:47 <SlickNik> on the master
20:51:56 <vipul> hub_cap: but that same scenario would exist if you did a single instance resize... where that one failed
20:52:02 <vipul> now the user is stuck..
20:52:06 <vipul> cuz they have to fix it
20:52:14 <vipul> whereas in /cluster/resize we'd fix it
20:52:18 <hub_cap> right but thats up to you to control vipul
20:52:27 <hub_cap> think about the permutations there vipul
20:52:28 <konetzed> SlickNik: i think user adds on the master would be replicated
20:52:34 <hub_cap> lets at least defer it
20:52:41 <hub_cap> till we see some real world scenarios
20:52:51 <hub_cap> id prefer "acting" on clusters to come later
20:52:56 <SlickNik> konetzed: what about db adds?
20:52:56 <hub_cap> because its /hard/
20:53:16 <konetzed> imsplitbit: arnt all crud operations done on the master sent to slaves?
20:53:20 <esp> resizing a cluster sounds like it might be easier to migrate the data to a new cluster..
20:53:41 <hub_cap> :P esp
20:53:41 <imsplitbit> konetzed: yes
20:53:43 <esp> rather than trying to resize each individual node if that's what we are talking about :)
20:53:49 <vipul> esp: that could be one way to do it..
20:53:56 <hub_cap> create db will go to a slave if issued on a master
20:54:00 <imsplitbit> esp: maybe so but if the dataset is 500GB that may not be true
20:54:05 <SlickNik> imsplitbit: you can choose to replicate only certain dbs if you so desire
20:54:05 <konetzed> imsplitbit: so to answer SlickNik's question user and db adds all get replicated
20:54:13 <esp> if you asked me to individually resize a 9 node cluster I would scream at you.
20:54:33 <hub_cap> esp: even if 90% of the time it failed for you if u did /cluster/id/resize
20:54:39 <imsplitbit> esp: agreed which is why we would want to support doing a cluster resize
20:54:42 <hub_cap> that means you would have to issue it 9 times anyway
20:54:47 <hub_cap> and if one failed to upgrade
20:54:50 <imsplitbit> but hub_cap's point is it's not gonna be easy
20:54:51 <hub_cap> then u gotta downgrade the others
20:54:52 <esp> imsplitbit: I gotcha, doesn't cover all cases.
20:54:52 <hub_cap> double downtime
20:54:57 <hub_cap> right
20:54:59 <SlickNik> imsplitbit: so why should we allow extraneous dbs (outside the cluster) to be created on slaves but not on master?
20:55:03 <hub_cap> lets defer "Actions" to /clusters
20:55:06 <hub_cap> to get _something_ done
20:55:13 <hub_cap> to summarize
20:55:15 <hub_cap> we have 5 min
20:55:22 <hub_cap> instances are all in /instances
20:55:23 <imsplitbit> SlickNik: because it's a mistake to assume what a user will want to do
20:55:27 <konetzed> i think we need to get past resizes failing, because that has nothing to do with clusters
20:55:27 <hub_cap> u can enact on them indiv
20:55:30 <vipul> SlickNik: good point.. is this a valid use case even? i'm no DBA.. but why would you do that
20:55:37 <hub_cap> ok maybe no summary............
20:55:48 * hub_cap waits for the fire to calm down b4 going on
20:55:51 <vipul> do DBAs create dbs on slaves...
20:55:56 <hub_cap> why not vipul
20:56:02 <imsplitbit> vipul: I have configured db setups for very large corporations in our intensive department and I can say it happens often
20:56:07 <hub_cap> yes
20:56:08 <vipul> because at any time, you'd promote that
20:56:09 <hub_cap> i have done it
20:56:16 <hub_cap> not necessarily vipul
20:56:19 <vipul> and you'd need to do it again on the new slave
20:56:25 <hub_cap> read slaves are not 100% promotion material
20:56:29 <hub_cap> they are sometimes to _just_ read
20:56:45 <demorris> you may just have a slave to run backups on
20:56:45 <hub_cap> we cant guarantee everyone will use it the same way
20:56:48 <vipul> yea I get that.. but they are reading master data
20:56:52 <hub_cap> hence the need to _not_ be prescriptive
20:56:59 <hub_cap> ya and could be 10 minutes behind vipul
20:57:09 <hub_cap> ok lets chill it out
20:57:12 <hub_cap> let me summarize
20:57:12 <vipul> demorris: then the additional dbs you created are also backed up..
20:57:14 <hub_cap> we have 3 min
20:57:19 <vipul> lol hub_cap
20:57:24 <hub_cap> or ill just decide w/o anyone elses input
20:57:29 <hub_cap> ill be the DTL
20:57:29 <SlickNik> hub_cap: you need a timer bot :)
20:57:33 <hub_cap> u can decide what the D means
20:57:41 <SlickNik> Guido van hub_cap
20:57:44 <hub_cap> summary
20:57:44 <vipul> if we have backup slaves.. should those additional DBs/Users be backed up?
20:57:54 <hub_cap> lets take indiv questions offline vipul plz
20:58:02 <vipul> sorry :)
20:58:03 <hub_cap> here is the first cut of the api
20:58:19 <hub_cap> instances are in /instances, all of them, all visible, all actions can happen to them
20:58:35 <hub_cap> /clusters is used for create/delete only as a helper api
20:58:42 <hub_cap> that will be V1 of clusters
20:58:45 <demorris> hub_cap: and also some atomic actions
20:58:48 <hub_cap> as we decide we need more stuff, we will add it
20:58:50 <kevinconway> hub_cap: I'm bought on the idea of instance stuff going in /instances. But does the instance still contain cluster data now?
20:59:01 <kevinconway> this magic "attributes" addition?
20:59:06 <hub_cap> yes kevinconway it will have to, we can decide that stuff later
20:59:14 <hub_cap> there will be some indication
20:59:41 <hub_cap> once we have a need for more operations, be they atomic or acting upon many instances, we will add to /clusters
20:59:43 <demorris> hub_cap: when did we drop having actions on clusters?
20:59:49 <hub_cap> otherwise we will be coding this forever
20:59:54 <SlickNik> kevinconway: It's necessary if you want to have any sort of ruleset dictating what is possible on the instance vs on the cluster.
20:59:57 <hub_cap> demorris: i made an executive decision for R1
20:59:58 <hub_cap> V1
21:00:01 <hub_cap> we can always add them
21:00:05 <hub_cap> but if they suck we cant remove them
21:00:15 <hub_cap> lets just get something up
21:00:17 <hub_cap> and working
21:00:19 <imsplitbit> no actions!!!!
21:00:21 <demorris> hub_cap: I would vote for V1 to have at least atomic actions - add nodes, resize flavors, resize storage…in that they happen to the whole cluster
21:00:24 <kevinconway> SlickNik: you can mark an instance as part of a cluster without modifying the instance resource though
21:00:26 <vipul> it seems like it's easier to have a /clusters API that's completely isolated from /instances.. if we remove the 'promote' existing instance requirement
21:00:38 <hub_cap> demorris: we had a whole conversation about problem permutations
21:00:46 <hub_cap> goto #openstack-trove to continue
21:00:48 <hub_cap> #endmeeting
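
(Below, a rough sketch of what the V1 split summarized at the end of the meeting might look like from a client: the base URL, endpoint paths, and payload fields are assumptions for illustration drawn from the proposal wikis linked above, not a settled Trove API.)

    # Rough illustration of the proposed V1 split; paths and payloads are assumptions.
    import requests

    BASE = "https://trove.example.com/v1.0/TENANT_ID"   # hypothetical endpoint
    HEADERS = {"X-Auth-Token": "TOKEN"}

    # /clusters as a helper API: create (and later delete) a master/slave pair.
    requests.post(BASE + "/clusters", headers=HEADERS, json={
        "cluster": {"name": "prod", "type": "master_slave",
                    "flavorRef": "7", "volume": {"size": 2}, "slaves": 1}})

    # Every node stays visible in /instances, so the usual per-instance
    # actions still apply -- e.g. resize a single slave:
    requests.post(BASE + "/instances/SLAVE_ID/action", headers=HEADERS,
                  json={"resize": {"flavorRef": "6"}})

    # ...or create a local scratch database on that slave:
    requests.post(BASE + "/instances/SLAVE_ID/databases", headers=HEADERS,
                  json={"databases": [{"name": "scratch"}]})

    # Whole-cluster actions like a fleet-wide resize were explicitly deferred
    # past V1; if added later they would hang off the cluster resource:
    # requests.post(BASE + "/clusters/CLUSTER_ID/action", headers=HEADERS,
    #               json={"resize": {"flavorRef": "8"}})
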