Friday, 2016-01-29

*** bpokorny_ has joined #openstack-dns00:02
*** bpokorny has quit IRC00:05
*** rudrajit has joined #openstack-dns00:09
*** james_li has quit IRC00:11
*** rudrajit_ has quit IRC00:12
*** rcarrillocruz has left #openstack-dns00:12
*** ducttape_ has joined #openstack-dns00:18
*** pglass has quit IRC00:24
*** kei_yama has quit IRC00:24
*** kei_yama has joined #openstack-dns00:25
*** rudrajit_ has joined #openstack-dns00:39
*** rudrajit has quit IRC00:43
*** chlong has joined #openstack-dns00:50
*** rudrajit has joined #openstack-dns00:51
*** rudrajit_ has quit IRC00:54
*** rudrajit has quit IRC00:55
*** ducttape_ has quit IRC01:06
*** ccneill has quit IRC01:25
*** penick has quit IRC01:28
*** mlavalle has quit IRC01:38
*** fawadkhaliq has joined #openstack-dns01:38
*** fawadkhaliq has quit IRC01:38
*** RuiChen has quit IRC01:42
*** RuiChen has joined #openstack-dns01:42
*** ducttape_ has joined #openstack-dns01:43
*** stanzgy has joined #openstack-dns01:45
*** jasonsb has joined #openstack-dns01:58
*** ducttape_ has quit IRC02:01
*** bpokorny_ has quit IRC02:16
*** ducttape_ has joined #openstack-dns02:31
*** rudrajit has joined #openstack-dns02:50
*** jasonsb has quit IRC02:55
*** ducttape_ has quit IRC03:00
*** ducttape_ has joined #openstack-dns03:01
*** kei_yama has quit IRC03:11
*** ducttape_ has quit IRC03:11
*** ducttape_ has joined #openstack-dns03:18
<openstackgerrit> James Li proposed openstack/designate: Make supported record type configurable  https://review.openstack.org/272902  03:31
*** richm has quit IRC03:33
*** jasonsb has joined #openstack-dns03:49
*** ducttape_ has quit IRC04:02
*** kei_yama has joined #openstack-dns04:02
*** ducttape_ has joined #openstack-dns04:22
*** ducttape_ has quit IRC04:36
*** ducttape_ has joined #openstack-dns04:36
*** ducttape_ has quit IRC04:56
*** chlong has quit IRC05:12
*** chlong has joined #openstack-dns05:33
*** kei_yama has quit IRC05:36
*** kei_yama has joined #openstack-dns05:37
*** km has quit IRC05:59
*** km has joined #openstack-dns06:00
*** jet-rongl has joined #openstack-dns06:02
*** RuiChen has quit IRC06:22
*** RuiChen has joined #openstack-dns06:23
*** chlong has quit IRC06:28
*** chlong has joined #openstack-dns06:39
*** cgross has joined #openstack-dns07:17
*** chlong has quit IRC07:32
*** rudrajit has quit IRC07:36
*** venkat_ has joined #openstack-dns07:43
*** boris-42 has joined #openstack-dns07:58
*** RuiChen has quit IRC08:04
*** RuiChen has joined #openstack-dns08:05
*** nyechiel has joined #openstack-dns08:23
*** venkat_ has quit IRC08:37
*** nyechiel has quit IRC08:59
*** RA_ has quit IRC09:04
*** km has quit IRC09:32
*** en_austin has joined #openstack-dns10:05
<en_austin> federico3: ping?  10:05
*** jordanP has joined #openstack-dns10:11
*** nyechiel has joined #openstack-dns10:36
*** RA_ has joined #openstack-dns10:44
<federico3> /query en_austin  10:50
<en_austin> yep :) I'm here  10:51
<en_austin> so, I'm still experiencing Designate failures with the pool manager hanging... do you have any news or maybe new ideas about this? :(  10:52
<en_austin> the last one was just a couple of hours ago, but with different symptoms  10:52
<en_austin> the zone failed into ERROR state and was unable to sync serials with the real backend data, but records were still propagating to the backends  10:53
<en_austin> until a couple of restarts of mdns, zone & pool manager  10:53
<en_austin> then it returned to ACTIVE state  10:53
<en_austin> what do you think, could this problem be solved by reducing the number of backends from 2 to 1? :)  10:53
<federico3> I haven't replicated the exact behavior yet, and it's rather puzzling that nobody else has been reporting the same issue. This happened only while caching was enabled, is that correct? What is your caching configuration: do you have a memcached process running?  10:59
<federico3> en_austin: btw, I assume that the latest incident started with the usual exception, right?  11:04
<en_austin> - while caching was enabled? -- I don't know; I tried to disable caching, but no luck with it.  11:13
<en_austin> - what is your caching config? -- yep, a memcached process.  11:13
<en_austin> -- started with the usual exception? no, no MessagingTimeouts detected this time.  11:14
<federico3> "no luck with it" - you mean you were unable to disable caching, or you had the same issue while caching was disabled?  11:15
<federico3> oh, that could be a different problem then - please upload the log from this last incident where the MessagingTimeout did not happen  11:16
<en_austin> "no luck" means that zones were still unable to sync without the cache, so I re-enabled it :)  11:20
<en_austin> btw, maybe I should try to use the mysql cache instead of memcached?...  11:20
<federico3> did you disable the cache during the incident or before?  11:28
<en_austin> that was a couple of weeks ago, now I'm running with memcached  11:29
*** RA_ has quit IRC11:31
*** sonuk has quit IRC11:33
*** sonuk has joined #openstack-dns11:34
*** stanzgy has quit IRC11:36
<en_austin> Well, I've found another exception, but it seems to be a network infrastructure issue on our side.  11:37
<en_austin> mdns.log.5:2016-01-29 12:29:22.499 10028 ERROR designate.service [req-4e43d316-3a28-46c7-b0be-bfb42f0bae58 - - - - -] Unhandled exception while processing request from x.x.x.x:33594  11:37
<en_austin> but there were other error reports before this one (mostly 'Calling pool manager for %domain%: UPDATE: ERROR: (serial) on nameserver x.x.x.x')  11:38
*** kei_yama has quit IRC11:44
<federico3> en_austin: please update the bug report with today's issue - and can you please specify whether caching was enabled/disabled and what memcached configuration you had in each incident? Be aware that if you configure memcached but do not specify any server, an in-memory local cache will be used instead  11:48
<en_austin> I've attached a config file to the issue - can you verify my memcached setup?  11:50
<federico3> uhm, I'm not seeing it  11:58
<en_austin> http://paste.openstack.org/show/dxFyS4Ba8Z9E9HR30uWR/  11:58
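(For context, the memcached-related part of a pool manager config is along the lines below; the section and option names here are recalled assumptions rather than taken from the paste - check the designate.conf.sample shipped with your release.)

    [service:pool_manager]
    cache_driver = memcache

    [pool_manager_cache:memcache]
    # per federico3's note above: with no server listed, an in-process local cache is used instead
    memcached_servers = 127.0.0.1:11211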
<en_austin> btw I've attached a new log  11:59
<en_austin> > the cache serial is 1454056936 and the actual serial is 1454056936.  12:00
<en_austin> looks weird...  12:00
<en_austin> federico3: well, it failed again just now.  12:04
<en_austin> 2 zones, one failed into ERROR state, the second one into PENDING.  12:04
<en_austin> What data can I collect for you? :)  12:04
<federico3> was there the timeout exception?  12:04
<federico3> in general, this really looks like you are having network issues  12:05
<en_austin> nope. Will verify record propagation now...  12:05
<en_austin> for the failed zone: serial is 1454068962, actual is 1454068734  12:06
<en_austin> will try to create a new record in the "error"-state zone  12:06
<federico3> keep in mind that it's normal for zones to go into ERROR state if there's a transient loss of connectivity, and they should recover by themselves later on  12:07
<en_austin> well, the record did _not_ propagate. Designate serial 1454069199, real serial still 1454068734  12:08
<en_austin> will now dump the query log...  12:08
*** nyechiel has quit IRC12:09
<en_austin> mdns.log:2016-01-29 15:09:31.108 12371 WARNING designate.mdns.notify [req-b63274bc-6e81-414d-8d59-23c17736c728 noauth-user noauth-project - - -] Got lower serial for 'xxxxxxxx.' to 'yyyyyyyy:53'. Expected:'1454069365'. Got:'1454068734'. Retries left='9'  12:10
<en_austin> Too many 'Sending SOA to YYYYY:53' and 'Got lower serial' messages in the logs here...  12:10
<en_austin> Well, the zone went to ERROR again (it was PENDING before).  12:11
<Kiall> the wording of those logs also sucks.. That's sending BIND a notify packet to say "Hey, you have changes to pull".  12:11
<federico3> when you restart the components, please make sure you restart only one component at a time and wait before restarting another one  12:12
<en_austin> federico3: I haven't restarted any components right now, since I'm trying to capture logs in the "failed" state  12:12
<federico3> the fact that sometimes you did not have that timeout exception but are still seeing a lot of zones in ERROR state makes me think that you are having network issues, or your nameservers are not updating for other reasons  12:13
<en_austin> ...the zone returned to PENDING state, it seems to be trying to recover itself  12:14
<en_austin> "a lot" = 1. I have two zones in total, and ALWAYS one of them is in ERROR and the second one in PENDING.  12:15
<Kiall> en_austin: can you add "backdoor_port = 5432" to your config before you restart again, in the [DEFAULT] section  12:15
<federico3> en_austin: as I suspected, we are mixing together different issues: 1) sometimes a zone can go into ERROR and then recover due to transient issues, and that's ok  12:15
<en_austin> After a restart they both return to ACTIVE state  12:15
<en_austin> Kiall: yep, I can - what's this for? :)  12:15
<Kiall> I'm interested to know if the whole process is locked up at the time, or if it just stops doing things  12:15
<en_austin> I'm interested to know this too :) sure, I can do it.  12:15
<Kiall> it lets you telnet into the running process on that port, and get a Python shell  12:15
*** RA_ has joined #openstack-dns12:15
<en_austin> awesome  12:15
<en_austin> will you give me a manual for how to gather the info you need?  12:16
<Kiall> Working on that now ;)  12:16
<en_austin> great  12:16
<Kiall> (Trying to work out how to grab a reference to the running service from the shell ;))  12:16
<en_austin> Well, the zone is still in PENDING state and new records are not propagating to the backend.  12:17
<federico3> en_austin: 2) when we had that TimeoutError and the PM was unable to restore the zones for hours -> this is a different issue  12:17
<en_austin> The backend SOA is not increasing now (it is still ...8734, while Designate thinks it is 1454069787 as of now)  12:17
<en_austin> federico3: seems to be, but the zone does not recover from ERROR state - that's the problem.  12:18
<en_austin> As of now - it is in PENDING/ERROR states, new records are not propagating, and the backend SOA is not increasing.  12:18
<en_austin> Well, I'll try to create a new record again, dump its request logs, and try to restart the service again (since it's a production one) :-(  12:18
*** EricGonczer_ has joined #openstack-dns12:21
<Kiall> en_austin: also, can you `kill -USR1` your pool manager process when it's locked up, and grab all the stderr it prints out  12:23
<Kiall> It'll print LOTS of info about the running process  12:23
<en_austin> just restarted it via Supervisord a couple of seconds ago :(  12:24
<en_austin> next time I'll do it, for sure  12:24
<Kiall> You'll get something like this: http://paste.openstack.org/show/485398/  12:24
<Kiall> What I'm mostly interested in seeing is the # of green threads running once it's locked up  12:25
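In other words, something along these lines on the box running the pool manager (the pgrep pattern is an assumption - match it to however supervisord actually launches the process):

    # dump greenthread tracebacks and other process state to the service's stderr
    kill -USR1 "$(pgrep -f designate-pool-manager)"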
<en_austin> no output from 'kill -USR1' at all...  12:29
*** EricGonc_ has joined #openstack-dns12:30
<Kiall> It will be printed to stderr, which would be put somewhere by supervisord  12:32
<Kiall> maybe the logfile?  12:32
*** EricGonczer_ has quit IRC12:33
<en_austin> yea, you're right, it is indeed in the logfile.  12:33
<Kiall> oh, I .. might have just reproduced it - and then killed the process.  12:33
<Kiall> (killed it before I realized what I was doing.. let me see what I did)  12:34
<en_austin> awh, one of my two zones just became 'active' and synced its SOA to the backend... *happy*  12:34
<en_austin> the second one synced serials too, but is still in PENDING state.  12:35
<Kiall> Q - How much Python do you know? Working through how we would inspect the running process through the "backdoor"  12:37
<en_austin> it's been my primary language since 2014 (apart from native Russian and upper-intermediate English, hehe) :)  12:38
<en_austin> so, I'm listening :)  12:39
<Kiall> en_austin: Cool - how much sense is this making then? http://paste.openstack.org/show/485404/  12:40
<Kiall> fo(<class>) == find all created objects of type <class>  12:41
<Kiall> (that's probably the only backdoor-specific thing there)  12:41
<en_austin> > service = fo(Service)[0]  12:43
<en_austin> where does "fo" come from?  12:43
<Kiall> It's magicked into place by the backdoor code :)  12:43
<Kiall> https://github.com/openstack/oslo.service/blob/master/oslo_service/eventlet_backdoor.py#L101  12:43
<Kiall> Oh, if you use one config file for all of the designate services.. comment the backdoor port out now, or set it to a range - e.g. 2000:3000  12:44
<Kiall> (otherwise a restart of any other service will try to grab the same port and fail)  12:45
<en_austin> I'm using only /opt/designate/etc/designate/designate.conf  12:45
<en_austin> > (otherwise a restart of any other service will try to grab the same port and fail)  12:45
<en_austin> damn! that's why I added it and then failed to restart Designate :)  12:45
<Kiall> lol  12:45
<Kiall> try a range - port = 2000:3000  12:45
<Kiall> it'll pick the lowest free one, and log which one it chose  12:45
<en_austin> > and log  12:46
<en_austin> :(  12:46
<en_austin> whatever, netstat will help me, I think  12:46
<Kiall> grep the logs for "Eventlet backdoor listening on"  12:47
<Kiall> e.g. Eventlet backdoor listening on 5432 for process 18668  12:47
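Pulled together, the backdoor setup described above looks roughly like this (assuming a single designate.conf shared by all services, as in en_austin's setup):

    # designate.conf, [DEFAULT] section - a range lets each service pick its own free port
    backdoor_port = 2000:3000

    # find the port a given service actually bound, then telnet in for a Python shell:
    #   grep "Eventlet backdoor listening on" <that service's log>
    #   telnet 127.0.0.1 <port>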
<en_austin> Yea, I've got it.  12:49
<Kiall> Anyway, if you can try those out.. USR1 / those backdoor commands the next time it happens, it might give us some insight into what's actually failing.  12:50
<en_austin> >>> service._get_failed_domains(context, 'UPDATE')  12:50
<en_austin> <designate.objects.domain.DomainList object at 0x37fb090>  12:50
<en_austin> But both zones are in ACTIVE state now.  12:50
<Kiall> And it still shows in get_failed_domains?  12:51
<en_austin> Yep.  12:51
<en_austin> Is that normal?  12:51
<Kiall> what does `service._get_failed_domains(context, 'UPDATE')[0].to_dict()` show?  12:52
<Kiall> ehh  12:52
<Kiall> yea actually  12:52
<Kiall> service._get_failed_domains(context, 'UPDATE').to_primitive() might also work and show all of them, rather than just 1  12:53
<en_austin> IndexError: list index out of range  12:53
<en_austin> :)  12:53
<Kiall> It's gone now :)  12:53
<en_austin> but there is still a "DomainList" object present as the output of "_get_failed_domains". But if that's ok..)  12:53
<Kiall> Yea, it's an empty list - I misread :)  12:54
<Kiall> try service._get_failed_domains(context, 'UPDATE').to_primitive()  12:54
<en_austin> {'designate_object.changes': [u'total_count'],  12:55
<en_austin>  'designate_object.data': {'objects': [], 'total_count': 0},  12:55
<en_austin>  'designate_object.name': 'DomainList',  12:55
<en_austin>  'designate_object.original_values': {}}  12:55
<en_austin> Seems that it's all ok now (and that's true, both zones are in ACTIVE state)  12:55
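Putting the backdoor commands from this exchange together, a session looks roughly like the sketch below. How `context` is built here is an assumption (an admin context via designate.context), the Service import matches the stable/liberty pool manager layout, and fo() is the "find objects by type" helper injected by the eventlet backdoor itself.

    # inside `telnet 127.0.0.1 <backdoor port>` on the pool-manager host
    from designate.pool_manager.service import Service
    from designate.context import DesignateContext   # assumption: how the admin context was created

    context = DesignateContext.get_admin_context(all_tenants=True)
    service = fo(Service)[0]                          # grab the running pool manager service object

    # zones the pool manager currently considers failed for UPDATE actions
    service._get_failed_domains(context, 'UPDATE').to_primitive()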
<Kiall> Okay, so.. Anyway, next time it fails, try to gather all ^ info before restarting!  12:57
<Kiall> (Also, I rarely get an excuse to use the backdoor, always a bonus ;))  12:58
<en_austin> undocumented feature :D  13:01
<en_austin> Well, I've got the idea - next time I experience an issue with zones, I will capture the logs (as usual), dump the kill -USR1 data and provide the data from telnet :)  13:01
<Kiall> Yea, it exists in most every OpenStack process :)  13:03
<Kiall> (Also - make sure that isn't public, lol)  13:03
<Kiall> it's a backdoor with total control over your server (well, as much as the user running designate has!)  13:04
<Kiall> Also, watch out for the USR1 dump - it's got a copy of your config, so black out passwords  13:05
<Kiall> etc  13:05
*** ducttape_ has joined #openstack-dns13:10
<en_austin> yea, I've got it. Thank you Kiall for the support :)  13:12
<en_austin> btw, do you think this issue could be related to having >1 backend installed on different machines (e.g. some network issues)?  13:12
<en_austin> I'm going to merge Designate and BIND onto one box tomorrow (and reduce the backend count to 1) - could that help, or is it overkill for this?  13:13
<Kiall> I would be a little surprised if this was it, TBH  13:21
*** ducttape_ has quit IRC13:31
*** rsyed_away is now known as rsyed13:49
*** RA_ has quit IRC14:03
*** ducttape_ has joined #openstack-dns14:06
*** richm has joined #openstack-dns14:18
*** pglass has joined #openstack-dns14:40
*** johnbelamaric has joined #openstack-dns14:47
*** ducttape_ has quit IRC15:01
<en_austin> Kiall: ping?  15:13
<Kiall> Kinda here - happening again?  15:13
<en_austin> Yea.  15:14
<en_austin> 1454080412 on Designate and 1454078691 on the backend.  15:14
<Kiall> Did any of those things turn up something useful?  15:14
<en_austin> Will check now :)  15:15
<en_austin> >>> print b._execute_rndc(['status'])  15:17
<en_austin> None  15:17
*** pglass has quit IRC15:20
*** james_li has joined #openstack-dns15:21
<en_austin> I've attached a "kill -USR1" output to the Launchpad issue.  15:22
<en_austin> https://bugs.launchpad.net/designate/+bug/1534490  15:22
<openstack> en_austin: Error: malone bug 1534490 not found  15:22
<Kiall> en_austin: re execute rndc, I think that might be expected.. that it didn't explode is the key thing  15:22
<en_austin> same, returns None  15:23
<en_austin> _get_failed_domains(context, 'UPDATE').to_primitive() returns the same as in the normal state (but the zones are now not in ERROR, they are in PENDING state)  15:24
<james_li> Kiall: timsim: could you please review https://review.openstack.org/#/c/268362/ and https://review.openstack.org/#/c/272669 ? Joe would like to see them land soon.  15:24
<en_austin> It seems that records are being propagated, but veeery slowly...  15:25
<en_austin> (since one of the "failed" records is resolvable now)  15:25
*** james_li has quit IRC15:26
<en_austin> it's approx an 8 min delay right now (Designate's zone serial is 1454081213, the backend's is 1454080681)  15:27
*** james_li has joined #openstack-dns15:27
*** ducttape_ has joined #openstack-dns15:27
<en_austin> but, both of the backends return None for _execute_rndc(['status']).  15:27
<en_austin> Is that normal?  15:27
<Kiall> james_li: I thought I commented on the wildcard NS one.. oops.  15:28
<james_li> Kiall: oh let me check  15:28
<Kiall> I didn't  15:28
<james_li> heh  15:29
<Kiall> I believe RE_NS_HOSTNAME and RE_ZONENAME are functionally identical  15:29
<Kiall> https://www.debuggex.com/r/t8rXZR3g-jHEhNc9 and https://www.debuggex.com/r/yCPxlCYFDi32eem7  15:29
<Kiall> So, it may be best to just use the existing one  15:30
<Kiall> The other comment was.. is NS the only one it doesn't like?  15:30
<Kiall> was going to be*  15:30
<Kiall> en_austin: I think it is normal  15:31
<Kiall> yea, it is.. we don't return anything from that method, it's nothing or an exception  15:32
<en_austin> got it  15:32
<Kiall> you didn't get an exception, so... it's communicating just fine with BIND  15:32
<Kiall> (at least via RNDC.. notifications to update are different)  15:32
<Kiall> What about service._get_failed_domains(context, 'UPDATE').to_primitive() ?  15:33
*** sonuk has quit IRC15:33
<Kiall> Is it working? slow? raising?  15:33
<Kiall> It doesn't matter if there are results, just that it works..  15:33
<en_austin> It works.  15:33
<en_austin> But, importantly - there are no 'ERROR' state zones.  15:33
<Kiall> Okay, any MessagingTimeout errors?  15:33
<Kiall> I see 78 greenthreads running from the paste, which isn't huge  15:34
<en_austin> Yea, some MessagingTimeouts here.  15:34
<Kiall> can you USR1 mdns / central too?  15:35
<en_austin> I've restarted them to get DNS propagation back to a live state :((  15:35
<en_austin> next time I will, sure  15:35
<Kiall> oo - 62 of the 78 greenthreads are sitting in  15:35
<Kiall> `with lockutils.lock('update-status-%s' % domain.id):`  15:35
<en_austin> what does that mean?  15:37
<en_austin> btw I didn't restart Central - I can dump a USR1 output from it.  15:37
<federico3> are they all trying to acquire the same lock?  15:38
<en_austin> but the SOAs are close to being synced (just so slow...)  15:38
<Kiall> federico3: It's hard to tell from the traces, but possible.  15:39
<james_li> Kiall: if RE_NS_HOSTNAME and RE_ZONENAME function the same then yeah, we can just use the existing one, let me double check; and yes to your second question, the wildcard *NS* record is the only one BIND does not like. I recall silkey checked that AWS supports all wildcard records except NS.  15:39
<Kiall> james_li: OK, only applying it to NS is perfect then :)  15:40
<james_li> Kiall: thanks, I'll do the needful.  15:41
<Kiall> en_austin: can you add the latest pool manager logs, ideally all the way from one restart to the next  15:43
<Kiall> I'd like to check whether we're stalling on locks for HOURS  15:44
<en_austin> I can archive all the logs I have from the previous restart (when I enabled the telnet port) and give them to you privately :)  15:52
<en_austin> btw, new records are propagating now, but with a delay of ±5 min  15:52
<en_austin> SOAs are still out of sync  15:52
*** mlavalle has joined #openstack-dns15:54
<Kiall> en_austin: perfect  15:57
*** jordanP has quit IRC15:57
<en_austin> well, after a full restart they are synced in ±30 sec  15:58
<en_austin> something is extremely slowing down the sync process...  15:58
*** james_li has quit IRC16:00
*** james_li has joined #openstack-dns16:01
<en_austin> Kiall: could that be influenced by using memcached instead of mysql for caching?  16:02
<Kiall> don't use mysql for caching ;) I'm still not entirely sure why we have that!  16:02
<en_austin> :D  16:02
<Kiall> ping me when you've managed to grab those logs, hopefully I can see something between them and the USR1 dump  16:03
<en_austin> well, strange  16:04
<en_austin> 2016-01-29 19:03:59.890 12511 INFO designate.mdns.notify [req-65c4bc3d-5af1-4d52-ba56-b600f7b8642f noauth-user noauth-project - - -] Sending 'SOA' for 'aqa.int.zone.' to '10.28.0.18:53'.  16:04
<en_austin> ...and nothing changes on the backend side  16:04
<Kiall> also, BIND logs for the period would be useful too, as each one of those ^ should line up  16:04
<en_austin> (awh, false alarm - it updated after a while)  16:04
<en_austin> sorry, I'm scared of every lag now :D  16:05
<en_austin> Kiall: I can dump it right now, but how can I share it with you?  16:05
*** james_li has quit IRC16:05
<Kiall> Yea, I'm nearly certain we're seeing something related to the 62 locks we're trying to get! That would slow things down  16:05
<Kiall> kiall - at - hpe.com ?  16:05
<en_austin> copy, will send it now  16:05
<Kiall> cool  16:05
<en_austin> you're working at Hewlett Packard?)  16:07
<Kiall> Yea  16:07
<en_austin> :)  16:07
<en_austin> Kiall: check your email :)  16:19
<Kiall> ah, lol - could have just PM'd me that URL :)  16:20
<en_austin> actually, yes :))  16:20
<en_austin> meanwhile - SOAs are again out of sync for 500+ secs  16:21
<Kiall> Bah, how busy is this install?  16:21
<en_austin> not much, approx 100-200 read/delete records per hour  16:22
<en_austin> serving approx 10k records in total  16:22
<Kiall> All in 1 zone, or?  16:22
<en_austin> 2 zones  16:22
<en_austin> 7.6k and 3.2k records  16:23
<en_austin> (too many logs, yea?) :)  16:23
<Kiall> So, what time did you do the 2x restart?  16:24
<en_austin> Kiall: maybe I can dump some useful info now? Designate stopped notifying the BINDs again.  16:24
<en_austin> The last one was 29-Jan-2016 19:11:44.612 (it's now 19:24)  16:24
<en_austin> Kiall: awh... I don't remember (but you can grep for "Eventlet" - that restart was when I enabled the PM's backdoor port)  16:25
<en_austin> the mdns logs are now full of "Sending SOA to x.x.x.x" (and that's all)  16:26
<Kiall> 2016-01-29 15:48:07.648 - 2016-01-29 18:31:05.863 from the looks of it  16:26
<en_austin> awh, now again one zone became ACTIVE and the second is very close to it  16:27
<en_austin> but, approx a 15m delay between the real sync and tons of "Sending SOA" messages in the mdns logs...  16:27
<Kiall> So, as a first thing, I think we're probably being WAY too aggressive with the default timers etc. for the periodic recovery tasks, and the timeouts for mdns to respond, given we have a 7.6k record zone  16:27
*** james_li has joined #openstack-dns16:30
*** ccneill has joined #openstack-dns16:31
<Kiall> my suspicion is that mDNS is getting locked up for an extended period of time while serving out that 7.6k record zone (likely many minutes, it's Python!), which will cause pool manager's attempts to query the zone to sit in the queue and likely time out etc. while the heavy lifting is happening  16:32
<en_austin> can we confirm or refute this assertion?  16:35
<en_austin> (384s delay between the last backend sync and Designate's data... and it's growing)  16:36
<en_austin> and after ~200 more seconds there will be MANY MANY logs with "Transfer started..." "Transferred serial XXXX" from BIND's side  16:36
*** EricGonc_ has quit IRC16:37
<en_austin> Kiall: can I verify something in Designate's state while it's not syncing data?  16:38
<en_austin> now  16:38
<en_austin> since we're already out of business hours - I can stay in the "failed" state a little bit longer  16:39
*** EricGonczer_ has joined #openstack-dns16:41
<Kiall> en_austin: ok, start by doing an AXFR of the zone from mDNS with a large timeout  16:44
<en_austin> How?  16:45
<Kiall> dig +timeout=3600 AXFR zone.com. @mDNS:5354  16:45
<Kiall> time dig +timeout=3600 AXFR zone.com. @mDNS:5354  16:45
<Kiall> It's going to be slow  16:45
*** EricGonczer_ has quit IRC16:46
<timsim> Might also check that it's responding on TCP at all.  16:47
<en_austin> Not so slow - real 0m3.631s  16:47
<en_austin> ;; XFR size: 7146 records (messages 4, bytes 230969)  16:47
<Kiall> Oh. That's.. What's this running on? Beefy bare metal?  16:48
<en_austin> a Virtuozzo CentOS 7 container  16:48
*** EricGonczer_ has joined #openstack-dns16:48
<en_austin> 3.6s is slow?  16:49
<en_austin> I thought it was fast :)  16:49
<Kiall> It's slow compared to BIND serving that out, but not slow considering it's Python.  16:49
<Kiall> BIND would be a fraction of a second.  16:49
<openstackgerrit> Merged openstack/designate: Add validation for MX, TXT, and SSHFP records  https://review.openstack.org/272669  16:49
<en_austin> real 0m0.182s  16:50
<Kiall> yep :)  16:50
<en_austin> that's BIND's response time for the same AXFR  16:50
<en_austin> JFYI, dig requires you to specify the port via the "-p" param, not via ":5354" :)  16:50
<en_austin> well, again - Designate's serial 1454086243, real 1454085967  16:51
<en_austin> 276s and counting without a sync  16:51
<Kiall> dooh, in the last 2 days I've written {{ ip_address }}:{{ port }} about 456 times. ;)  16:51
*** EricGonczer_ has quit IRC16:51
<en_austin> Kiall: seems strange :(  16:51
<en_austin> [root@ns2 ~]# dig +timeout=3600 AXFR aqa.int.zone. @10.28.0.16:5354  16:51
<en_austin> dig: couldn't get address for '10.28.0.16:5354': not found  16:51
*** EricGonczer_ has joined #openstack-dns16:51
<Kiall> Yea, my bad ;)  16:52
<en_austin> :)  16:52
<Kiall> I've been writing non-dig stuff 100's of times over the last few days ;)  16:52
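For reference, the working form of the timing comparison from this exchange (addresses and zone name as they appear in the pasted logs):

    # AXFR the zone from designate-mdns (port 5354) ...
    time dig -p 5354 +timeout=3600 AXFR aqa.int.zone. @10.28.0.16
    # ... and the same zone from BIND itself (port 53) for comparison
    time dig AXFR aqa.int.zone. @10.28.0.17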
<en_austin> so, any thoughts about the slow SOA sync?  16:52
*** bpokorny has joined #openstack-dns16:56
<en_austin> 2016-01-29 20:05:07.525 12511 INFO designate.mdns.notify [req-573ce645-e5f1-4c3a-a7fd-3c685bec02cc noauth-user noauth-project - - -] Sending 'SOA' for 'aqa.int.zone.' to '10.28.0.17:53'.  17:05
<en_austin> and nothing in response  17:05
<en_austin> after some unknown delay:  17:05
<en_austin> 2016-01-29 20:05:27.617 12511 INFO designate.mdns.notify [req-c9964d60-8f73-48a2-b26c-b2bacc2cb44c noauth-user noauth-project - - -] Sending 'NOTIFY' for 'aqa.int.zone.' to '10.28.0.17:53'.  17:05
<en_austin> 2016-01-29 20:05:27.621 12511 DEBUG designate.service [-] Handling UDP Request from: 10.28.0.17:8437 _dns_handle_udp /opt/designate/designate/service.py:304  17:05
<en_austin> and the zone became synced (ACTIVE state)  17:06
<en_austin> 62/75 greenlets were in "with lockutils" again while the serial was out of sync...  17:09
<Kiall> en_austin: sorry, was AFK  17:18
<Kiall> How often do both zones change?  17:18
*** ducttape_ has quit IRC17:18
<en_austin> from a couple of changes per 10 seconds to one or two changes per hour  17:18
<Kiall> If it's pretty often, I'd be very tempted to remove the lock and see if things still wedge? I don't believe the lock is strictly necessary  17:19
<en_austin> the main scenario is - "we are deploying some stack with Heat, this stack contains X containers and each of them requests a unique hostname"  17:19
<Kiall> timsim / mugsie etc - can I get some eyes on https://github.com/openstack/designate/blob/master/designate/pool_manager/service.py#L514  17:20
<Kiall> Removing that lock - harmless or suicidal? I think harmless, other than doing some excess work  17:21
<timsim> Since it's just in one process, probably not that harmful.  17:21
<timsim> One of the things that happens in there is that (I believe in the _get_consensus_serial path) it will call out to mdns.get_serial_number for all nameservers that don't have the right value in the cache.  17:22
<Kiall> https://github.com/openstack/designate/commit/565e5026eb39a06b86636661176e21cd9013b90a is where it was added  17:22
<en_austin> Yea, I'm running a single PM process.  17:22
<Kiall> https://bugs.launchpad.net/designate/+bug/1408202  17:23
<openstack> Launchpad bug 1408202 in Designate "Statuses in Pool Manager Cache Are Persistent" [Low,Fix released] - Assigned to Ron Rickard (rjrjr)  17:23
<en_austin> I can test it on my instance if you'll help me with the patch. Just remove "with lockutils.lock..." or something else?  17:23
<Kiall> That feels like a mysql-as-a-cache issue to me  17:23
<timsim> Ohhh  17:24
<Kiall> en_austin: remove that line, and then unindent everything after it - can you pastebin your version of that file? (I'm unsure exactly what version you're on, whether $distro made changes, etc.)  17:24
<Kiall> timsim: good reason for it yet?  17:24
<timsim> So he's got a ton of update_statuses coming in for that zone, and it's locking up?  17:24
<Kiall> 62 of the 75 active greenthreads are in / waiting on that lock  17:24
<timsim> lol  17:24
<mugsie> wow  17:24
<en_austin> Kiall: I'm on 94d0f00b09044fcef36604a9c81accb0907c9705 (stable/liberty)  17:25
<timsim> Yeah, if you've got multiple pool manager processes, that's pretty much useless.  17:25
<Kiall> <Kiall> oo - 62 of the 78 greenthreads are sitting in <Kiall> `with lockutils.lock('update-status-%s' % domain.id):`  17:25
<Kiall> ^ from the last lockup  17:25
<Kiall> timsim: I think it was just a "let's at least TRY not to do duplicate work"  17:25
<timsim> lol  17:25
<en_austin> timsim: I'm running a single process for each of the services  17:25
<mugsie> I can't see anything really bad.  17:25
<mugsie> wank it out and see what 'splodes  17:26
<timsim> That's a fart in a hurricane of trying not to do duplicate work in that code.  17:26
<mugsie> yank*  17:26
<mugsie> -_-  17:26
<timsim> LOL mugsie  17:26
<mugsie> /clear everyone  17:26
<en_austin> OK, so I will remove that line and unindent everything under it  17:26
<timsim> freudian slip  17:26
<Kiall> en_austin: K - removed here: http://paste.openstack.org/show/485460/  17:27
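In outline, the patch being tested just drops the per-zone lock around the status-update path in designate/pool_manager/service.py. A heavily abbreviated sketch against stable/liberty follows - the real body is written inline in update_status(); it is collapsed into a hypothetical _do_update_status() helper here only to keep the sketch short:

    # before: every status update for a zone serializes on this per-zone lock
    def update_status(self, context, domain, nameserver, status, actual_serial):
        with lockutils.lock('update-status-%s' % domain.id):
            self._do_update_status(context, domain, nameserver, status, actual_serial)

    # the experiment: remove the `with` line and unindent what was inside it, so
    # concurrent status updates for the same zone no longer queue behind each other
    def update_status(self, context, domain, nameserver, status, actual_serial):
        self._do_update_status(context, domain, nameserver, status, actual_serial)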
<mugsie> timsim: https://www.youtube.com/watch?v=rwaaFwTluLU  17:27
<en_austin> Kiall: installed your file, restarting now. :)  17:29
<Kiall> *fingers crossed*  17:30
<en_austin> well... both zones switched to ACTIVE in 2 minutes  17:32
<en_austin> a new record was created in ±20 seconds, zone ACTIVE again  17:33
<en_austin> I'm still not sure, but it seems to be running much better :)  17:33
*** rudrajit has joined #openstack-dns17:35
<en_austin> 20:34:52 record created, 20:35:06 it became resolvable, 20:35:44 zone switched to ACTIVE state  17:35
<en_austin> great!  17:35
*** jasonsb has quit IRC17:36
<Kiall> en_austin: cool, give it a few hours before calling it a success tho ;)  17:36
<en_austin> I'm not calling it "a success" yet, but 30 sec is a much more appropriate delay than 30 minutes :)  17:36
<Kiall> timsim: you guys mentioned seeing some issues with PM lockups, right? Any clue whether you have 100's or 1000's of greenthreads hanging on that lock?  17:36
<timsim> It's all mDNS lockups for us :(  17:37
<en_austin> Kiall: PM lockups are the issue I reported a couple of weeks ago to federico3  17:37
<Kiall> Oh, well - `kill -USR1 mdns-pid` and see what it's doing ;)  17:37
<Kiall> en_austin: yea, I thought you and tim might be having a similar / the same issue  17:37
*** jasonsb has joined #openstack-dns17:37
<Kiall> timsim: also, check the backlog for "backdoor" details - there'll be enough in the scrollback for you to figure it out, if you've never used it before.  17:38
<en_austin> Well, I'll report that temporary bugfix to Launchpad so it isn't lost, if it finally works out for me :)  17:38
<Kiall> en_austin: yea, if you can report back after seeing it work for longer than usual, that would be great.  17:38
<en_austin> sure I will  17:39
<en_austin> btw, I've noticed that the cache is cleared after each request (or something similar)  17:39
<en_austin> too many "cache miss" messages  17:39
<en_austin> 2016-01-29 20:39:59.108 17900 DEBUG designate.pool_manager.service [req-188291ac-235e-4a6a-86fe-01014d90a4de noauth-user noauth-project - - -] Clearing cache for domain aqa.int.zone. with action UPDATE. _clear_cache /opt/designate/designate/pool_manager/service.py:618  17:40
<en_austin> aha.  17:40
<Kiall> let me eyeball the code again to see if that's harmful (other than more work..)  17:41
*** ducttape_ has joined #openstack-dns17:41
*** ducttape_ has quit IRC17:42
<en_austin> (yea, I was right - no "cache hit" messages at all in the PM logs since this patch)  17:42
*** rsyed is now known as rsyed_away17:43
<en_austin> but, I think that fast record propagation is more important than this cache...  17:43
*** ducttape_ has joined #openstack-dns17:43
<en_austin> (I hope so)  17:43
<Kiall> a cache miss will be harmless, other than you're doing more work  17:44
<Kiall> but.. there should probably be at least some hits  17:44
<Kiall> Without the lock, you're going to see more of those, as that codepath can be reached concurrently - previously it could only be running once per zone at any given moment  17:44
*** james_li has quit IRC17:45
*** rudrajit has quit IRC17:45
<Kiall> also - noauth? lol  17:45
<Kiall> :)  17:45
<en_austin> yea, noauth - I'm running it in a secure env with firewalled access to the API  17:46
<en_austin> :)  17:46
*** jasonsb has quit IRC17:46
<en_austin> since we're running both Heat and Designate in our own DevCloud  17:46
<en_austin> I don't think that concurrent zone status requests are something I should be afraid of :)  17:48
<en_austin> it's rather more important to propagate records asap  17:48
<en_austin> well... 20 min, seems to be running well  17:50
<Kiall> *fingers crossed*  17:50
<en_austin> unfortunately it's Friday evening, there won't be much domain-provisioning activity during the weekend  17:52
<en_austin> we can collect more relevant data from Monday, but if I notice something strange - I'll report it here and/or to Launchpad, for sure.  17:52
<en_austin> ....maybe the MessagingTimeouts were caused by that lock too?  17:53
<Kiall> Possibly - if they were stuck waiting, they could have exceeded the timeouts  18:02
<en_austin> Kiall: I just saw the following in my logs: PM called mdns for serial 1454090601, and a couple of seconds later it called for serial 1454090600.  18:04
<en_austin> That's not a dangerous thing?  18:04
<en_austin> e.g. X+1 called earlier than X (maybe that's what the lock was for?)  18:05
<Kiall> No, the lock wouldn't have helped there.. I suspect that's just fine, and is just stuff working its way through. federico3 has a patch up that buffers notifies and prevents us ever doing the 00 one  18:06
<en_austin> so, I shouldn't worry about the X and X+1 serials. Okay, got it :)  18:07
<Kiall> Okay, outta here for the day  18:09
<en_austin> oook  18:10
<en_austin> it's time to drive home, I think  18:10
<en_austin> Friday evening... :)  18:10
<en_austin> well, thanks a lot Kiall and everyone else who helped me  18:10
<en_austin> have a nice day :)  18:11
*** en_austin has quit IRC18:13
*** james_li has joined #openstack-dns18:15
*** EricGonczer_ has quit IRC18:17
*** EricGonczer_ has joined #openstack-dns18:19
*** rudrajit has joined #openstack-dns18:22
*** james_li has quit IRC18:25
*** penick has joined #openstack-dns18:27
*** ccneill has quit IRC18:27
*** sonuk has joined #openstack-dns18:29
*** sonuk has quit IRC18:36
*** pcaruana has joined #openstack-dns18:57
*** markd_ has joined #openstack-dns18:59
*** bpokorny_ has joined #openstack-dns19:01
*** markd_ has quit IRC19:04
*** bpokorny has quit IRC19:04
*** ccneill has joined #openstack-dns19:10
*** pglass has joined #openstack-dns19:10
*** rudrajit_ has joined #openstack-dns19:15
*** rudrajit has quit IRC19:18
*** EricGonc_ has joined #openstack-dns19:23
*** EricGonc_ has quit IRC19:23
*** EricGonc_ has joined #openstack-dns19:24
*** pcaruana has quit IRC19:24
*** james_li has joined #openstack-dns19:24
*** EricGonczer_ has quit IRC19:25
*** EricGonczer_ has joined #openstack-dns19:25
*** EricGonc_ has quit IRC19:25
*** ducttape_ has quit IRC19:31
*** ducttape_ has joined #openstack-dns19:34
*** pcaruana has joined #openstack-dns19:37
*** rudrajit_ has quit IRC19:38
*** GonZoPT has quit IRC19:50
*** EricGonczer_ has quit IRC19:52
*** EricGonczer_ has joined #openstack-dns19:52
<openstackgerrit> Merged openstack/designate: Ensure mocks are stopped after use  https://review.openstack.org/273723  19:54
*** jasonsb has joined #openstack-dns19:59
*** rsyed_away is now known as rsyed19:59
*** bpokorny_ has quit IRC20:04
*** bpokorny has joined #openstack-dns20:04
*** pcaruana has quit IRC20:06
*** james_li_ has joined #openstack-dns20:07
*** pcaruana has joined #openstack-dns20:09
*** james_li has quit IRC20:10
*** pcaruana has quit IRC20:19
*** penick has quit IRC20:24
*** richm has quit IRC20:30
*** ducttape_ has quit IRC20:31
<openstackgerrit> Paul Glass proposed openstack/designate: Update functional test TLDCLient  https://review.openstack.org/274223  20:35
<openstackgerrit> OpenStack Proposal Bot proposed openstack/designate: Updated from global requirements  https://review.openstack.org/270938  20:44
*** richm has joined #openstack-dns20:46
*** james_li has joined #openstack-dns20:48
<openstackgerrit> OpenStack Proposal Bot proposed openstack/python-designateclient: Updated from global requirements  https://review.openstack.org/271689  20:49
*** ducttape_ has joined #openstack-dns20:49
*** james_li_ has quit IRC20:51
*** bpokorny_ has joined #openstack-dns20:58
*** jasonsb has quit IRC20:59
*** bpokorny has quit IRC21:02
*** rudrajit_ has joined #openstack-dns21:24
*** sonuk has joined #openstack-dns21:29
<openstackgerrit> James Li proposed openstack/designate: Make supported record type configurable  https://review.openstack.org/272902  21:56
*** ducttape_ has quit IRC22:01
<openstackgerrit> James Li proposed openstack/designate: Fix wildcard NS record  https://review.openstack.org/268362  22:24
*** RA_ has joined #openstack-dns22:28
*** ducttape_ has joined #openstack-dns22:34
*** sonuk has quit IRC22:36
*** RA_ has quit IRC22:36
*** rsyed is now known as rsyed_away22:56
*** johnbelamaric has quit IRC22:57
*** bpokorny_ has quit IRC23:00
*** bpokorny has joined #openstack-dns23:01
*** penick has joined #openstack-dns23:03
*** EricGonczer_ has quit IRC23:07
*** EricGonczer_ has joined #openstack-dns23:09
*** pglass has quit IRC23:13
*** EricGonczer_ has quit IRC23:23
*** EricGonczer_ has joined #openstack-dns23:23
<openstackgerrit> Tim Simmons proposed openstack/designate: Actually poll for zone deletes  https://review.openstack.org/274291  23:28
*** EricGonczer_ has quit IRC23:34
*** EricGonczer_ has joined #openstack-dns23:37
*** EricGonczer_ has quit IRC23:37
<openstackgerrit> Tim Simmons proposed openstack/designate: Add an on-demand single-target sync method  https://review.openstack.org/274292  23:42
*** james_li has quit IRC23:52
*** ducttape_ has quit IRC23:52

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!