17:01:26 #startmeeting Designate
17:01:26 Meeting started Wed Mar 11 17:01:26 2015 UTC and is due to finish in 60 minutes. The chair is Kiall. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:30 The meeting name has been set to 'designate'
17:01:38 o/
17:01:39 Weird - It missed the end of the ML2 meeting
17:01:42 o/
17:01:44 Anyway.. Who's about?
17:01:47 Kiall: noticed that
17:02:06 o/
17:02:19 Kiall: I noticed the ~openstack bot quit and rejoined in the middle of the last meeting
17:02:21 o?
17:02:24 #topic Action Items from last week
17:02:42 So - We skipped last week, and I had 2 actions from the week before. Both of which I forgot about :/
17:03:09 Both were to do with old-style backends, I'll block off a few hours tomorrow to make that happen.
17:03:21 Apologies :(
17:03:32 #topic Kilo Release Status (kiall - recurring)
17:03:34 #link https://launchpad.net/designate/+milestone/kilo-3
17:03:51 k3 (aka feature freeze) is Mar 19th.. 8 days
17:04:19 We have a bunch of stuff in progress - TSIG, Validation, Secondaries and Pools API stuff
17:04:30 s/stuff/features/
17:04:53 We want all that merged by then, yeah?
17:04:55 We'll need to try get all these in before 19th, and switch to find/fix after..
17:05:04 yeah
17:05:36 you may have noticed a ton of validation stuff going up, that needs review soon - there is a ton of work left on it
17:06:26 Cool. Review all the things then.
17:06:26 TSIG stuff has some -1's, I'll address those today.
17:06:27 Validation Stuff - Parts are ready I think, but Graham is still working on them.
17:06:27 Secondaries - Mostly there, some known things we'll want to fix (using attributes like we decided was a bad idea for pools - but that's a fix we can do later)
17:06:27 Pools - I've been on/off this over the last week - nearly there, will be up and working before 19th.
17:06:51 Is secondary zones targeted for k3?
17:06:55 I think all the other reviews are minor features (e.g. the "Guru Meditation Reports") or bugfixes..
17:07:20 vinod1: it seems to be working well enough - at least with a few more fixes - I think we can try merge in k3
17:07:46 ekarlso might riot if we don't ;)
17:07:52 :)
17:08:09 At HP - mugsie / ekarlso are working 100% on upstream stuff right now, while I'm split between an internal project and upstream.. So we should have time to get things solid enough to merge.
17:09:13 Anyway - re bugs - are we aware of any show stoppers?
17:09:54 is bug 1424621 still open?
17:09:55 bug 1424621 in Designate "eventlet 0.17.0 has broke dns.reversename.from_address() " [Critical,In progress] https://launchpad.net/bugs/1424621 - Assigned to Kiall Mac Innes (kiall)
17:10:38 and I would like 1338256 to be before k3, but it may not make iut
17:10:41 it*
17:10:46 bug 1338256 *
17:10:46 bug 1338256 in Designate "There's no record validation in v2" [Critical,Triaged] https://launchpad.net/bugs/1338256 - Assigned to Slawek Kaplonski (slawek-t)
17:10:52 Ehh - I think that can be closed. It's still broke with 0.17.0, but 0.17.1 or 0.17.2 fix it
17:11:17 that's what you're working on ;) I guess I call that a feature rather than bug ;)
17:11:25 That bug fix in eventlet broke things for us :P I've got https://github.com/eventlet/eventlet/pull/212 going
17:12:31 timsim: glad you found the source of that!
17:12:33 Any others?
17:12:58 https://bugs.launchpad.net/designate/+bug/1427411 maybe?
17:12:59 Launchpad bug 1427411 in Designate "Recordset updates remain PENDING until periodic sync" [Undecided,New]
17:13:24 humm
17:13:26 yes
17:13:30 I think vinod1 might be working on this
17:13:36 Humm, I hadn't seen that one. vinod1's making changes around that code?
17:14:02 I am investigating that one currently
17:14:02 ello :p
17:14:33 vinod1: cool, I'll mark as k3 .. If it doesn't land, it's a bugfix and can land in rc phase...
17:15:35 Okay - So, before we move on, if we can all (myself included ;)) make some time to review the features that need to land by thu of next week :)
17:15:50 Sounds good.
17:15:55 will do
17:16:22 If there's non-critical bugs in them, and it's nearing mid-next week, +2 and file a critical/blocker for rc1 IMO.
17:16:53 Fair enough.
17:17:35 Yea, we have a month of feature freeze after k3, but before kilo releases, so I think that makes sense :)
17:17:52 Okay, let's move on..
17:17:56 #topic Remove the APIv2 wrapping object (mugsie)
17:18:04 #link https://etherpad.openstack.org/p/designate-apiv2-wrapping-object
17:18:24 so - this has been annoying the hell out of me recently
17:18:27 mugsie - all yours..
17:18:51 I don't think we need it, it adds complexity all over the place, and should just go away
17:19:05 or, does anyone disagree?
17:19:12 or even care ;) ?
17:19:20 I agree.
17:19:38 i was just thinking of this last week - so I agree
17:19:39 Back when we did the V2 - It was for consistency with other OS APIs.. but they all went different directions in the end anyway.. So.. Yea.
17:19:59 Hah.
17:19:59 cool. I will call this consensus
17:20:09 would the change apply to all of v2?
17:20:12 I can remove it as part of the views changes
17:20:15 yeah
17:20:19 * Kiall feels like saying no, just to wind mugsie up
17:20:27 Oh please do.
17:20:33 any request that returns a single resource
17:20:46 would no longer have the resource name
17:20:54 Well - listings still have it, resources do not...
17:21:02 how about posts?
17:21:06 same - gone
17:21:44 I've updated the etherpad with a little more..
17:21:47 and patches ;)
17:22:34 and I fixed them ;)
17:22:41 I think I've confused myself with the etherpad changes.. lol
17:23:01 mugsie: all correct now? So still a "yep" for evertyone?
17:23:03 everyone*
17:23:11 looks right
17:23:18 Yup, sounds good
17:23:48 sorry ekarlso - v2 bindings will have to change again ;)
17:23:55 what does that mean ?
17:24:09 looks good to me
17:24:13 +
17:24:17 the output of the API is going to be different
17:24:22 +1 even
17:24:43 Okay.. Calling it settled so. Anything else before we move on?
17:24:56 * mugsie is now happy
17:25:32 I'm good
17:26:14 good? that's pushing it a bit :D
17:26:24 I'm ok
17:26:26 Next topic will be.. short.. Since I failed to do my actions :(
17:26:26 #topic "Old Style" Backend - Status/What's TODO? (kiall)
17:26:26 I had said I'd get a POC of this done, but totally forgot. I'll block a few hours tomorrow to Just Do It? (Wonder if I need a little "TM of Nike" after that? ;))
17:26:36 that's better timsim
17:26:51 So - I'm not sure we have anything new here to discuss till I get that done.. Thoughts?
17:27:03 nope - we need to have a POC i think
17:27:10 Don't think so.
17:27:31 Okay.. Let's move onto timsim so ;)
17:27:36 #topic Bug triage (timsim-recurring)
17:27:38 #link https://bugs.launchpad.net/designate/+bugs?search=Search&field.status=New
17:28:12 Three untriaged bugs
17:28:13 https://bugs.launchpad.net/designate/+bug/1427411
17:28:14 Launchpad bug 1427411 in Designate "Recordset updates remain PENDING until periodic sync" [High,New]
17:28:26 That's already k3 high, so I'm just going to triage that one?
17:28:51 yup
17:28:54 timsim: please do :) I thought I did that earlier along with the for k3 change
17:28:59 k
17:28:59 https://bugs.launchpad.net/designate/+bug/1425668
17:29:00 Launchpad bug 1425668 in Designate "Poor error message when using same database for designate and the pool manager cache" [Undecided,New]
17:29:23 Probably a nice-to-have type thing?
17:29:27 yeah
17:29:36 I think that will likely land post k3, but the LP milestone doesn't exist yet
17:29:42 not a show stopper, and something that could be rc1
17:29:44 I can ask Thierry to create that tomorrow..
17:30:08 Alright, I'll leave it the way it is so that we remember to put it there next week
17:30:17 +1
17:30:18 perfect
17:30:21 https://bugs.launchpad.net/designate/+bug/1425117
17:30:21 Launchpad bug 1425117 in Designate "Designate does not work with postgres" [High,New]
17:30:29 yeah
17:30:33 this is .... interesting
17:30:45 how do we want to mark this
17:31:47 Would we have to like...go back and edit db migrations to fix that?
17:31:49 We (technically) support Postgres.. So it's certainly a bug.. I would mark for rc1, it's just not going to happen before then I think
17:32:05 timsim: yes
17:32:34 timsim: yea, we've done it before for postgres! The trick is to make sure you don't change anything which would affect mysql/sqlite :/
17:32:49 Which is sadly harder than it should be for some changes..
17:33:20 I wonder if, once this is fixed, we move the bind9 gate to Postgres.. (PowerDNS prefers MySQL!)
17:33:41 (I don't think it even actually supports psql..)
17:33:49 Sounds like a good idea.
17:34:27 So we don't have a rc1 milestone, so same story as the last one?
17:34:29 yeah, testing postgres would be a good first step ;)
17:34:30 Oh .. It does support it.. Either way, moving one of the gates over (or adding another) would help
17:34:58 timsim: yea, I've got a sync up with Thierry tomorrow - he creates all those for the projects :)
17:35:08 Sounds good. :) We're done then
17:35:11 I see a couple of bugs that are not triaged and do not show up on the list
17:35:14 https://bugs.launchpad.net/designate/+bug/1427425
17:35:16 Launchpad bug 1427425 in Designate "Zones remain in ERROR indefinitely with the noop pool manager cache" [Undecided,In progress] - Assigned to Vinod Mangalpally (vinod-mang)
17:35:16 I'll ask him to create rc1
17:35:38 https://bugs.launchpad.net/designate/+bug/1427433
17:35:39 Launchpad bug 1427433 in Designate "Pool Manager Recovery Code Needs to Update Status in Central" [Undecided,In progress] - Assigned to Vinod Mangalpally (vinod-mang)
17:36:04 Yeah I think those just need importance?
17:36:23 Fix is in code review for both of those - https://review.openstack.org/#/c/162754/
17:36:24 http://paste.openstack.org/show/191653/ is the "untriaged-bot" that spams our room twice a day...
17:36:47 Maybe it's wrong? or the agenda link doesn't get everything? Not 100% sure :)
17:37:02 agenda link is wrong i think
17:37:07 Yeah, L27 looks for importance
17:37:42 Good catch vinod1, high for both of those?
17:37:54 I robbed it from tripleo - I'm betting they did 2 searches for a reason..
17:38:28 timsim: I am ok with that
17:38:37 timsim: I think so, both seem like it
17:38:59 Cool.
17:39:05 There is one other bug too - without the importance field marked - https://review.openstack.org/#/c/162754/
17:39:15 wrong link
17:39:16 https://bugs.launchpad.net/designate/+bug/1416337
17:39:17 wrong link?
17:39:18 Launchpad bug 1416337 in Designate "Designate server create with concurrent request is not listing all servers even after successful creation." [Undecided,Incomplete]
17:39:45 Yeah, that one is waiting for more info from the reporter.
17:39:53 we are still waiting for that reporter to come back
17:39:58 That probably shouldn't be showing up - Incomplete since someone here (can't remember who) tried + failed to reproduce
17:40:38 It's just that importance field that's catching it. Maybe just put it at a Low, and if they come back, change it?
17:40:50 Probably close it out at the end of Kilo otherwise
17:41:10 Well the bot code attempts to filter those out, clearly it's failing though :)
17:42:00 Anyway, I'd agree with timsim - Low - It's not an end user or commonly called API
17:42:32 Alright, actually done ;)
17:42:49 Cool :) Moving on so..
17:42:53 #topic Open Discussion
17:42:59 Anything off-agenda to discuss?
17:43:39 Not from me :)
17:43:54 Nothing from me
17:44:01 I've noticed situations in Bind9 where we send a create/delete, and for some reason the thing actually happens, but the Pool Manager thinks it failed, and it gets in this nasty loop where it's trying to rndc addzone/delzone things that have already been done, which fail and get retried forever.
17:44:52 I've managed to spot that once with PowerDNS too - but didn't find the root cause
17:44:53 I could see this happening in other backends as well, I'm wondering if maybe we should take some stance on where to fix this, in the Pool Manager itself, or maybe in the backend (ie, Bind9 Backend checks before it deletes if something has been deleted, and if so, returns success without calling delzone)
17:45:22 I had a particularly nasty incident in Bind9 yesterday w/ the agent where it kept retrying like 100 delzones :P
17:45:24 So
17:45:30 V2 bindings, what's bad there again ?
17:45:38 timsim: I think it's probably something we need to do driver by driver
17:45:56 Each will have a different way of really knowing if the task succeeded
17:46:04 Kiall: That was my opinion. That way it just works on a periodic recovery/sync
17:47:01 not sure if it could work, but if there is a validation function assigned to each operation based on the driver it could be called before retrying the action.
17:47:02 timsim: how often have you managed to see it? I've only seen it once naturally and took 100s of requests to reproduce
17:47:25 ekarlso: 2 conversations at once in IRC - will come back after this one is done ;)
17:47:33 ekarlso: 2 conversations at once in IRC is hard - will come back after this one is done ;)*
17:47:55 * elarson imagines something being added to the message on the queue
17:48:00 I've seen it fairly often when working with situations where network connectivity was spotty or firewalls were going up and down, or (yesterday) when delzones were taking longer than a TCP timeout for some calls.
17:48:34 validation might be a good one ...
17:48:49 but that should be done in the backend I think
17:48:59 timsim: That might explain it for bind9 for powerdns, it's a SQL call, and when it's happened to me, everything was local on 1 VM
17:49:05 but for powerdns*
17:49:19 Maybe there's a more basic root cause somewhere?
17:49:22 The other way to do it would basically be to call MiniDNS for that type of change (create delete) and see if it happened before retrying in periodic x
17:49:57 Kiall: I think it can happen in a variety of different situations. Basically anywhere the message coming back to Designate after reaching out to the backend doesn't get there.
17:50:44 I'm guessing the update_status call -> central? or another call?
17:50:47 What I said two msgs up there is similar to what elarson is saying
17:51:08 Well agent for example goes Mdns/PM->Agent -/> Mdns/PM
17:51:20 If that connection is severed for some reason, then mdns/pm assumes failure
17:51:38 Okay - You have a better handle on this than anyone else it seems ;) Pretty sure we can trust your judgement on a fix!
17:51:46 I would think if Someone---SQL Query---> PDNS and it timed out, but the thing actually worked, same issue
17:52:05 :P I've been thinking about it a lot in the last 24 hours
17:52:24 :)
17:52:41 I didn't really mean to hijack the meeting, just thought I'd mention it :x
17:52:46 lol :P
17:53:20 I'll file a bug and put my thoughts down there, how about that?
17:53:21 Okay .. ekarlso.. re v2 bindings, mugsie's proposed change https://etherpad.openstack.org/p/designate-apiv2-wrapping-object changes the V2 API.. I think that's what he meant
17:53:39 timsim: +1
17:54:06 ekarlso: what Kiall said ^
17:54:30 ekarlso seems to be AFK - I'm betting a certain new baby called him over :) Oh well! Anything else before we call it a day?
17:55:00 * mugsie has nothing
17:55:22 I'm ok ;)
17:55:52 Guess that's it so! thanks all :) Will be a busy week getting stuff in before 19th.. See you in #openstack-dns :)
17:55:59 0/
17:56:03 #endmeeting
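
Note on the Open Discussion item: the backend-side fix floated at 17:44:53 (check whether the zone is already gone before calling delzone, and report success if so) could look roughly like the sketch below. This is only an illustration of the idea, not Designate code. FakeDNSServer, IdempotentBackend and their methods are hypothetical names, and a real driver would need its own way of asking the DNS server whether a zone exists.

```python
class ZoneNotFound(Exception):
    """Raised by the hypothetical server when asked to delete a missing zone."""


class FakeDNSServer:
    """In-memory stand-in for a DNS server (hypothetical, for illustration only)."""

    def __init__(self):
        self.zones = set()

    def has_zone(self, name):
        return name in self.zones

    def addzone(self, name):
        # Like a strict backend, refuse to add a zone twice.
        if name in self.zones:
            raise ValueError("zone already exists: %s" % name)
        self.zones.add(name)

    def delzone(self, name):
        # Like a strict backend, refuse to delete a zone that is not there.
        if name not in self.zones:
            raise ZoneNotFound(name)
        self.zones.remove(name)


class IdempotentBackend:
    """Hypothetical driver wrapper: a retried create/delete succeeds quietly."""

    def __init__(self, server):
        self.server = server

    def create_zone(self, name):
        # An earlier attempt may have worked even though its reply was lost.
        if self.server.has_zone(name):
            return
        self.server.addzone(name)

    def delete_zone(self, name):
        # Already deleted: report success instead of failing and retrying forever.
        if not self.server.has_zone(name):
            return
        self.server.delzone(name)
```

With this shape, calling create_zone or delete_zone twice for the same zone leaves the server in the same state as calling it once, so a periodic recovery/sync can retry safely. The check and the action are not atomic here, so a real driver would still have to tolerate the server's own "already exists" / "not found" errors.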