17:02:04 #startmeeting Designate
17:02:04 Meeting started Wed Sep 10 17:02:04 2014 UTC and is due to finish in 60 minutes. The chair is vinod1. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:02:08 The meeting name has been set to 'designate'
17:02:14 Who's around?
17:02:19 o/
17:02:20 o/
17:02:33 mugsie is AFK today
17:02:54 Emmanuel and others are busy with some other issue here
17:02:56 Nobody else here today?
17:03:00 So thin attendance today
17:03:09 Yep.. Oh well :)
17:03:13 #topic Action Items from last week
17:03:44 kiall to do client release today
17:03:53 Done - 1.1.0 is released.. https://pypi.python.org/pypi/python-designateclient/
17:04:12 cool
17:04:17 Kiall to discuss FF exceptions with Thierry during 1:1 tomorrow.
17:04:45 Also done - as an incubated rather than integrated project, it's up to the core team to make FFE decisions.
17:04:59 The process for "documenting" that is to mark the BP as being for juno-rc1
17:06:28 #topic Release Status (kiall - recurring)
17:06:41 Okay, so j3 is out the door - woo - https://launchpad.net/designate/+milestone/juno-3
17:06:51 congratulations!
17:06:59 and juno-rc1 bugs/BPs are being tracked here https://launchpad.net/designate/+milestone/juno-rc1
17:07:21 bug 1366821 is a pretty big one, hopefully the current review solves it
17:07:22 Launchpad bug 1366821 in designate "Backends don't implement create/update/delete_recordset" [Critical,In progress] https://launchpad.net/bugs/1366821
17:07:45 Beyond that - I don't think we have much else to discuss on rc1
17:07:58 Other than - rc1 is Sept 25th
17:08:02 How about TSIG?
17:08:04 2 weeks
17:08:23 vinod1: well, dnspython released the fix, so in theory we can try to implement it as an FFE
17:08:45 just wanted to check if we want that as an FFE or move it to kilo?
17:09:01 But - getting the openstack/requirements change in to get the version we need will be harder - since dependencies are frozen too ;)
17:09:17 so move it to kilo then
17:09:33 I'll see about getting the o/r change in - if it does, we'll move it from kilo->rc1?
17:09:50 how about transfer zones? are we still targeting it as an FFE?
17:10:31 Yes, mugsie was about 80% through the rebase yesterday, I suspect he'll have it done in the next few days
17:10:45 ok
17:10:49 moving on
17:10:52 #action kiall to attempt an o/r change for dnspython
17:11:21 i will switch the order a bit to utilize Kiall's time here
17:11:28 #topic Server Pools Implementation Order
17:11:42 #link https://wiki.openstack.org/wiki/Designate/SubTeams/Pools#Server_Pools_Implementation_Order
17:11:59 I wrote up an initial implementation order of the work items for the first pass of server pools
17:12:01 I was going to suggest leaving that till next week when mugsie is about, I've not personally put much thought into it
17:12:16 +1
17:12:22 Okay
17:12:31 The remaining 2 items too are about server pools
17:12:41 #topic Server Pools - some questions/clarifications
17:12:50 currently we have status values of pending, active, deleted - should we have a value for error? How long can a change be in pending? Do we need to track pending_since?
17:13:20 my thinking is we don't need error or any other status.
17:13:42 rjrjr_: so, you think status should go entirely from the API?
17:13:55 sorry, no new status.
17:14:11 This is the status in the database tables and communicating the status to the user
17:14:24 pending, active, deleted cover everything IMHO.
17:14:33 So.. I think we should - there are too many ways for things to fail; with an async request, how do we report failure to the user without "error"?
17:15:48 No other thoughts?
17:15:49 I agree
17:16:15 I'd agree with rjrjr_ partly though - I wouldn't want to see 1000's of statuses
17:16:19 Pending, Active, Deleted, Error seem fine to me.
17:16:33 what does error report exactly?
17:16:48 That something has gone wrong?
17:16:53 Backend failure or something like that.
17:16:58 one server failed to get updated?
17:17:08 Something that the user has no direct control over
17:17:09 the threshold failed to get updated?
17:17:11 If that's your threshold after a certain time.
17:17:38 rjrjr_: It reports that "something" exploded after your initial API call responded - that something is anything that we can't recover from automatically..
17:18:03 Usually that means errors we didn't think would happen, so didn't code around
17:18:31 Does the status move from ERROR to ACTIVE?
17:19:28 i hate being the odd man out here, but we have the server pool actively retrying things. my problem is i would like the communication between mini-dns and pool manager to be less, not more.
17:19:29 I think ERROR should only ever show when we have no way to auto-correct, which means it wouldn't automatically go from ERROR -> ACTIVE, but an admin might "fix" whatever the issue was and reset the state.. cinder/nova/etc have similar concepts
17:20:05 rjrjr_: If it has to communicate active, what is communicating error adding?
17:20:20 rjrjr_: I actually think this isn't adding any more communication - it's what we do when, for example, an unhandled exception occurs
17:20:43 more calls. i wanted to propose that when mini-dns cannot do something (errors), it just not report that to the pool manager.
17:21:07 we can look at the timestamps of the last successful polls to determine if things are okay or not.
17:21:32 this gets into the whole pool manager design in the spec.
17:21:43 failing silently seems risky to me
17:22:06 it's not silent. pool manager is keeping track of the date of successful polls.
17:22:34 rjrjr_: so, I agree that mdns/poolmgr needs to decide how it handles errors - but this status isn't really mDNS or even poolmgr specific :)
17:22:45 also, i'm hoping you are showing problems in the mini-dns logs.
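The ERROR semantics agreed in this exchange - reachable only when nothing can auto-correct, with no automatic ERROR -> ACTIVE transition, but resettable by an admin as in cinder/nova - can be sketched as a small transition table. This is an illustration of the proposal, not Designate's actual code:

```python
# Status transition sketch for the PENDING/ACTIVE/DELETED/ERROR debate.
# ERROR is terminal for automation; only an operator action leaves it.
ALLOWED_TRANSITIONS = {
    'PENDING': {'ACTIVE', 'DELETED', 'ERROR'},
    'ACTIVE': {'PENDING', 'DELETED', 'ERROR'},
    'ERROR': set(),    # no automatic recovery
    'DELETED': set(),  # terminal
}


def transition(current, new, admin=False):
    """Return the new status, or raise if the move is not allowed."""
    if current == 'ERROR' and admin:
        # An operator fixed the underlying problem and reset the state,
        # analogous to nova/cinder reset-state.
        return new
    if new in ALLOWED_TRANSITIONS[current]:
        return new
    raise ValueError('cannot move %s -> %s automatically' % (current, new))
```

With this shape, an unhandled exception anywhere in the async path can mark the resource ERROR without adding any new mdns-to-poolmgr calls, which is the point kiall makes above.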
17:23:30 kiall, i understand.
17:24:15 For example, creating a new domain once multiple pools exist: the domain goes to PENDING and we return to the user.. In the background, the future scheduler starts trying to find a suitable pool - if no suitable pool is found, we should have a way to return "That Failed" to the user.. similar to what happens when you boot a VM and have no capacity remaining ;)
17:24:53 okay. but we do not have a scheduler right now.
17:25:08 Sure - there's lots of possible things that might be a trigger for moving a resource to ERROR; mdns may or may not be that thing
17:26:13 if mDNS doesn't feed back out to the status field on error, that might be OK, but I think we have plenty of other places for stuff to explode and need a reporting mechanism the moment we switch to async
17:26:50 here's the problem: unless we are tracking each and every update with a request ID of some sort, it will be hard to report an error to the user.
17:27:17 and if we are tracking each update with a request ID, our pool manager database grows exponentially.
17:27:49 i have a design that keeps the pool manager database (table) as small as possible for performance reasons.
17:27:51 Yea, I see the concern :) It's easy-ish to reason about what an ERROR state is when something like domain creation fails, but harder to come up with good examples for things like RRSet modifications etc
17:28:20 i have no problem with Error for domain create/failure.
17:29:01 (I've gotta run in 5 mins)
17:29:36 So - anyway, I think we should come back to this one next week with everyone around - and some concrete examples of what might trigger an ERROR status etc
17:30:01 i want to get rid of the status when we are polling for serial numbers and just have successful serial numbers reported back to pool manager.
17:30:42 the status for updates can't be used anyway.
17:30:51 agree about waiting until next week.
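rjrjr_'s alternative above - report only successful serial numbers back to the pool manager and derive health from how stale each nameserver's last report is, rather than pushing explicit error statuses - can be sketched as follows. The class, method names, and staleness threshold are illustrative, not taken from the spec:

```python
import time

# Sketch: the pool manager keeps only (last_reported_serial, timestamp)
# per nameserver, so the table stays small and mdns never has to send
# an explicit "error" message.

STALE_AFTER = 300  # seconds a server may lag before we call it unhealthy


class PoolManagerView(object):
    def __init__(self):
        # server name -> (last serial successfully reported, report time)
        self.last_poll = {}

    def record_success(self, server, serial, now=None):
        self.last_poll[server] = (serial, now if now is not None else time.time())

    def is_healthy(self, server, expected_serial, now=None):
        """Healthy if the server has the expected serial, or is merely lagging
        within the threshold; unhealthy once the last success is too stale."""
        now = now if now is not None else time.time()
        if server not in self.last_poll:
            return False
        serial, ts = self.last_poll[server]
        return serial >= expected_serial or (now - ts) < STALE_AFTER
```

The trade-off debated above is visible here: this keeps the table small and the mdns/poolmgr chatter low, but there is no per-request record to hang a user-visible ERROR on - failures surface only as staleness.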
17:30:58 :)
17:31:06 okay, will come back to it next week
17:31:12 Moving very quickly on :D When a recordset is deleted - do we show it to the user in the API? We track when it moves from pending to deleted. Do we give this information back to the user?
17:31:15 maybe we can brainstorm on error reasons and have a point to start the discussion next week.
17:31:44 kiall, only if the user queries designate again.
17:31:47 No, I don't believe we should ever show normal users deleted resources... But, an admin might want to see them
17:32:00 i'm thinking very similar to what nova does.
17:32:07 when a VM is created/deleted.
17:32:16 rjrjr_: yep - agreed
17:32:30 My point here is: when a domain is deleted, do we track whether it was removed from the nameservers?
17:32:39 yes.
17:32:50 We certainly should anyway :)
17:33:18 how about i write up something and the team can add to it.
17:33:26 rjrjr_: sure, sounds good
17:33:28 for where errors can occur.
17:33:30 If we track the information, why not show it to the user?
17:33:45 vinod, we will, if they query again.
17:34:35 #topic Open Discussion
17:34:37 so, the user deletes a record. async call. they are done. (think 'nova boot'). then they run a query to see if the record has been deleted yet or not.
17:35:01 Okay - really gotta go :) Sorry for needing to bail early!
17:35:08 sure. l8r
17:35:36 #action rjrjr_: Write up on handling error status in server pools
17:36:12 okay, if there is nothing else then we can end the meeting
17:36:37 #endmeeting
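The "think 'nova boot'" pattern rjrjr_ describes - an async delete returns immediately, and the caller polls until the record is gone - looks roughly like this from the client side. `client` and `get_record` are stand-ins for illustration, not python-designateclient's actual API:

```python
import time

# Sketch of client-side polling after an async delete: the delete call
# returns at once, and the record disappears (or shows DELETED, for an
# admin) once the nameservers have been updated.


def wait_for_deletion(client, record_id, timeout=60, interval=2.0):
    """Poll until the record is gone or marked DELETED; False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        record = client.get_record(record_id)  # hypothetical call
        if record is None or record.get('status') == 'DELETED':
            return True
        time.sleep(interval)
    return False
```

This is also where the open question above bites: whether normal users ever see the DELETED state, or simply get a 404 once the delete completes, changes which branch of the check a real client would rely on.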