*** nicolasbock has quit IRC | 02:52 | |
*** aniketh has joined #openstack-dns | 04:53 | |
*** trident has quit IRC | 07:10 | |
*** trident has joined #openstack-dns | 07:18 | |
*** trident has quit IRC | 07:22 | |
*** trident has joined #openstack-dns | 07:30 | |
*** ivve has joined #openstack-dns | 07:50 | |
*** salmankhan has joined #openstack-dns | 09:47 | |
*** salmankhan has quit IRC | 09:50 | |
*** salmankhan has joined #openstack-dns | 09:51 | |
*** salmankhan has quit IRC | 09:51 | |
*** salmankhan has joined #openstack-dns | 09:52 | |
*** trident has quit IRC | 10:23 | |
*** bnemec has quit IRC | 10:23 | |
*** frickler has quit IRC | 10:23 | |
*** stingrayza has quit IRC | 10:23 | |
*** salmankhan has quit IRC | 10:24 | |
*** eandersson has quit IRC | 10:24 | |
*** trident has joined #openstack-dns | 10:24 | |
*** frickler has joined #openstack-dns | 10:24 | |
*** irclogbot_0 has quit IRC | 10:25 | |
*** salmankhan has joined #openstack-dns | 10:27 | |
*** stingrayza has joined #openstack-dns | 10:27 | |
*** irclogbot_2 has joined #openstack-dns | 10:27 | |
*** bnemec has joined #openstack-dns | 10:28 | |
*** brensen has joined #openstack-dns | 10:48 | |
brensen | Hi guys, we are having an issue for a day now running python-designate-7.0.0-1.el7.noarch where mdns is logging 2019-08-30 10:45:12.056 63 WARNING designate.mdns.xfr [req-ceaa5078-aced-4e8e-910e-e4a8b59d9dc2 c1f975ffe36b4de5855540b1cd7f1c0a f32a603f32624870afb39311f9b89e3f - - -] XFR failed for XXX.XXX.XXX.. No servers in [] was reached.: XFRFailure: XFR failed for XXX.XXX.XXX.. No servers in [] was reached. | 10:51 |
brensen | and no notifications are sent out | 10:52 |
brensen | we are trying to find out where this list should be coming from but so far without success | 10:52 |
brensen | any clues? | 10:53 |
brensen | pool show_config shows correct information | 10:53 |
brensen | database inspection also did not reveal anything unexpected | 10:54 |
*** nicolasbock has joined #openstack-dns | 11:05 | |
mugsie | brensen: and if you ssh to the mdns node, and try a `nc XXX.XXX.XXX.XXX 53` does it work? | 11:08 |
brensen | yes it seems to be running fine | 11:08 |
brensen | can also dig it, and get the latest expected serial etc | 11:09 |
brensen | it seems it's getting to this point with an empty list | 11:09 |
brensen | for srv in servers: to = eventlet.Timeout(timeout) log_info = {'name': zone_name, 'host': srv} try: LOG.info("Doing AXFR for %(name)s from %(host)s", log_info) xfr = dns.query.xfr(srv['host'], zone_name, relativize=False, timeout=1, port=srv['port'], source=source) | 11:09 |
brensen | oh | 11:09 |
brensen | formatting.... | 11:09 |
brensen | def do_axfr(zone_name, servers, timeout=None, source=None): | 11:10 |
brensen | there servers is empty | 11:10 |
brensen | dnsutils.py | 11:11 |
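A minimal sketch of why an empty `servers` list produces the warning quoted above. It is not Designate's code: `do_axfr_sketch`, the `fetch` callable, and the local `XFRFailure` class are hypothetical stand-ins, and the real function performs the transfer with dnspython. The point is only that a loop over an empty list never runs, so control falls straight through to the failure path.

```python
class XFRFailure(Exception):
    """Hypothetical stand-in for the exception named in the mdns log."""


def do_axfr_sketch(zone_name, servers, fetch):
    """Try each server in turn and fail if none yields a transfer.

    `fetch` is a hypothetical callable replacing the real network call;
    it takes (server, zone_name) and returns data or raises OSError.
    """
    for srv in servers:
        try:
            return fetch(srv, zone_name)
        except OSError:
            continue  # this server failed, try the next one
    # Reached when every server failed, or when `servers` was empty.
    raise XFRFailure(
        "XFR failed for %s. No servers in %s was reached." % (zone_name, servers)
    )
```

With `servers=[]` the message ends in "No servers in [] was reached.", which matches the log line, so the question becomes why the caller passed an empty list at all.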
mugsie | can you do a `openstack zone list --all-projects` and see if there is any secondary zones created? | 11:11 |
brensen | let me check | 11:11 |
mugsie | that comes from secondary zones, not normal operations | 11:11 |
brensen | only type 'PRIMARY' in the list | 11:11 |
brensen | now I got a couple in PENDING state when I ran a 'zone set' on it | 11:12 |
mugsie | the 'do_axfr' method should only be used for secondary (designate pulling from other DNS servers) | 11:12 |
brensen | ah cool, thanks | 11:13 |
brensen | so this should not happen at all | 11:13 |
brensen | hmmm | 11:13 |
mugsie | try `openstack zone list --type SECONDARY --all-projects` ? | 11:14 |
brensen | returns nothing | 11:15 |
mugsie | can you see if there is one in the DB? someone might have created and deleted it | 11:15 |
mugsie | and the old "pull" zone task could still be active | 11:16 |
brensen | ah good idea | 11:16 |
brensen | let me check | 11:16 |
brensen | how do I post snippets via webchat here? | 11:16 |
brensen | only PRIMARY in DB | 11:17 |
mugsie | paste.openstack.org | 11:17 |
mugsie | is XXX.XXX.XXX a domain, or an IP | 11:17 |
brensen | http://paste.openstack.org/raw/SweuGPPL1LgY0FWM0JaS/ | 11:17 |
brensen | in the log it shows domain names | 11:18 |
mugsie | the config looks good | 11:19 |
mugsie | did you try restarting the mdns service? | 11:19 |
brensen | yeah it did not change and worked fine | 11:19 |
brensen | yes we've redeployed all the components multiple times already | 11:20 |
brensen | we are running them under nomad in containers | 11:20 |
mugsie | OK . | 11:20 |
mugsie | let me dig a little - brb | 11:20 |
brensen | there used to be a zookeeper issue reported from central, but that was fixed | 11:20 |
brensen | thanks man! | 11:20 |
mugsie | OK - something is sending a notify to mdns for that zone I think | 11:25 |
mugsie | is there a line "Scheduling AXFR for %(zone_id)s" in the mdns logs? | 11:26 |
brensen | it happens whenever we do a `openstack zone set blah.cloud.` | 11:26 |
brensen | nope | 11:26 |
mugsie | is there a "Triggered XFR for" log in the API logs? | 11:27 |
brensen | 2019-08-30 10:45:12.051 72 INFO designate.utils [-] Opening UDP Listening Socket on 172.17.49.14:25453 2019-08-30 10:45:12.051 70 INFO designate.service [-] _handle_udp thread started | 11:27 |
brensen | lots of these | 11:27 |
mugsie | lots? that should be at boot | 11:27 |
brensen | 2019-08-30 10:45:11.981 52 INFO designate.mdns.base [-] Initialized mDNS notify endpoint 2019-08-30 10:45:11.982 52 INFO designate.mdns.base [-] Initialized mDNS xfr endpoint | 11:27 |
brensen | hmmm | 11:28 |
brensen | http://paste.openstack.org/raw/II8gwc84YTkRdlUgHB88/ | 11:29 |
brensen | it does seem to align with workers=10 | 11:30 |
mugsie | openstack dns service list ? | 11:31 |
mugsie | I am assuming 25453 is mapped to 5453 by nomad? | 11:32 |
brensen | that service list seems to be outdated, it lists a lot of old hostnames | 11:33 |
brensen | let me double check that port mapping | 11:33 |
brensen | why you ask about 25453 <-> 5453? | 11:35 |
brensen | am I missing something? | 11:35 |
mugsie | ah, no I misread your pools config | 11:36 |
mugsie | and the zone doesn't work? | 11:36 |
mugsie | is there any way to get some more of the logs around the error? | 11:37 |
brensen | well it's on the pdns master already, but it does not update anymore, until the master determines the zone is stale and initiates a transfer itself | 11:37 |
mugsie | if you search across all the logs from the services for the request id does it show anything? | 11:37 |
brensen | mdns does not seem to send out a notification anymore | 11:37 |
brensen | let me check, our centralised logging is not worky so we have to check all containers separately, 1 sec | 11:38 |
brensen | does it matter if the service list is not correct? | 11:40 |
brensen | it still lists old worker nodes with status UP | 11:41 |
mugsie | no, that's kind of expected | 11:41 |
mugsie | (that feature is problematic) | 11:41 |
brensen | it all worked fine after some re-deployments, never actually looked at the service list tbh | 11:41 |
brensen | ok | 11:41 |
brensen | 2019-08-30 11:45:49.834 18 WARNING designate.central.service [-] Managed Resource Tenant ID is not properly configured | 11:47 |
brensen | is that important? | 11:47 |
*** jawad_axd has joined #openstack-dns | 11:48 | |
brensen | could it be related to some zookeeper issues? | 11:50 |
brensen | what does it use zookeeper for? | 11:50 |
brensen | 2019-08-30 11:52:36.984 23 INFO designate.mdns.rpcapi [req-328ad6f4-a8a1-44f5-8a2b-b52efc4d0c2e c1f975ffe36b4de5855540b1cd7f1c0a f32a603f32624870afb39311f9b89e3f - - -] perform_zone_xfr: Calling mdns for zone blah.cloud. | 11:53 |
brensen | from central log | 11:54 |
brensen | I really cannot find anything else related in the logs | 12:00 |
brensen | http://paste.openstack.org/raw/uYt99EuvWJt4ECHWku41/ | 12:02 |
brensen | provisioner UNMANAGED <- is that correct? | 12:02 |
mugsie | that is | 12:13 |
mugsie | it is set up for private (per project ) pools | 12:14 |
brensen | ok thanks | 12:14 |
mugsie | but they don't exist yet, so they are all unmanaged | 12:14 |
mugsie | what is before the line for calling perform_zone_xfr | 12:14 |
mugsie | ? | 12:14 |
mugsie | there is no way the "perform_zone_xfr" should be called if the zone is not SECONDARY | 12:16 |
brensen | http://paste.openstack.org/raw/NQkLsAmlw1gji6yEJQht/ | 12:16 |
brensen | it's happening for all zones right now, which are all PRIMARY | 12:19 |
mugsie | does designate-producer have anything in its logs? | 12:24 |
brensen | http://paste.openstack.org/raw/l0m3sLxbFrZbA1C0dSub/ | 12:26 |
mugsie | can you run with DEBUG on? | 12:27 |
brensen | I think we have it on, or we misconfigured something | 12:30 |
brensen | ah that might not be the case, let me fix that | 12:32 |
brensen | restarting with debug=true | 12:35 |
* mugsie is grabbing some lunch, will be back in a bit | 12:41 | |
brensen | enjoy | 12:49 |
brensen | http://paste.openstack.org/raw/CX8OK4JnSKDqH5MSHKmt/ | 12:49 |
*** jawad_axd has quit IRC | 13:02 | |
*** ygk_12345 has joined #openstack-dns | 13:09 | |
ygk_12345 | hi all | 13:09 |
ygk_12345 | i am having issues with designate. the zones I create are not transferring properly. It is intermittently working | 13:09 |
ygk_12345 | can someone help me please | 13:10 |
ygk_12345 | I see this error often | 13:10 |
ygk_12345 | Stderr: u'rndc: connection to remote host closed\nThis may indicate that\n* the remote server is using an older version of the command protocol,\n* this host is not authorized to connect,\n* the clocks are not synchronized,\n* the key signing algorithm is incorrect, or\n* the key is invalid.\n'. | 13:11 |
*** KeithMnemonic has joined #openstack-dns | 13:21 | |
mugsie | ygk_12345: can you run rndc manually from the command line? | 13:36 |
ygk_12345 | from the backend bind server u mean ? | 13:36 |
mugsie | from the designate server | 13:36 |
ygk_12345 | or in the designate containers ? | 13:36 |
ygk_12345 | what is the full command ? | 13:36 |
mugsie | in the containers | 13:36 |
mugsie | it should be logged just before that line afaik | 13:36 |
ygk_12345 | mugsie what command do I need to run ? | 13:39 |
mugsie | ygk_12345: it should be in the logs just before the error you pasted | 13:39 |
ygk_12345 | oh ok | 13:40 |
ygk_12345 | let me try once | 13:40 |
ygk_12345 | mugsie i see this error | 13:42 |
ygk_12345 | mugsie http://paste.openstack.org/show/767640/ | 13:42 |
mugsie | rndc -s 172.29.236.18 -p 953 addzone trsuted-zone.com '{ type slave; masters { 172.29.236.103 port 5354;}; file "slave.trsuted-zone.com.9b6d5299-3c18-4324-833a-b8a48f20eece"; }; | 13:43 |
mugsie | bash: syntax error near unexpected token `}' | 13:43 |
mugsie | you are missing the first "' ' " | 13:43 |
ygk_12345 | oh ok | 13:44 |
ygk_12345 | mugsie can you paste the exact command please. I am unable to figure out the syntax | 13:46 |
mugsie | /openstack/venvs/designate-18.1.9/bin/designate-rootwrap /etc/designate/rootwrap.conf rndc -s 172.29.236.18 -p 953 addzone trsuted-zone.com '{ type slave; masters { 172.29.236.103 port 5354;}; file "slave.trsuted-zone.com.9b6d5299-3c18-4324-833a-b8a48f20eece"; }; | 13:47 |
ygk_12345 | mugsie it is showing > in the next line | 13:48 |
ygk_12345 | mugsie got it now | 13:49 |
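The `>` continuation prompt is what a shell shows when a quote is left open: the command pasted in the channel opens a single quote before `{` and never closes it. A small sketch of checking a command string for this before running it, using Python's standard `shlex` module. The zone body is abbreviated to `{ type slave; };` for illustration, and `quotes_balanced` is a hypothetical helper name, not part of Designate or BIND.

```python
import shlex

# Abbreviated form of the pasted command; the opening single quote
# before "{" is never closed.
unbalanced = "rndc -s 172.29.236.18 -p 953 addzone trsuted-zone.com '{ type slave; };"
# Same command with the closing single quote appended.
balanced = unbalanced + "'"


def quotes_balanced(command):
    """Return True if a POSIX-style lexer can tokenize the command."""
    try:
        shlex.split(command)
        return True
    except ValueError:  # shlex raises ValueError on an unclosed quote
        return False
```

Once the quote is closed, the whole zone definition reaches `rndc` as a single argument, which is what `addzone` expects.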
ygk_12345 | it is working one second and not working the other second | 13:49 |
mugsie | is there anything in the bind logs? | 13:51 |
ygk_12345 | mugsie when it works the zone transfer is very slow and sometimes errors out.. but when I restart the workers in the two containers, it works then and again after some time gets back to that same state | 13:54 |
mugsie | workers, or the mdns services? | 13:55 |
ygk_12345 | workers | 13:57 |
ygk_12345 | now I am not able to delete the zones | 13:57 |
ygk_12345 | Stderr: u"rndc: 'delzone' failed: not found\nno matching zone 'fun-zone.com' in any view\n" | 13:57 |
mugsie | what do the bind logs say? | 13:57 |
* mugsie has a meeting, will be back | 13:58 | |
ygk_12345 | showing this http://paste.openstack.org/show/767642/ | 14:09 |
ygk_12345 | Aug 30 14:10:19 dns named[2544]: invalid command from 172.29.236.103#59741: expired (from the bind logs) | 14:10 |
*** ygk_12345 has quit IRC | 14:24 | |
frickler | sounds like there may be a clock sync issue | 14:48 |
brensen | we have 2 older instances of designate running, and I'm comparing the DB, seems like in the newer nothing is set in pool_attributes table while in the older it contains a key "internal" | 14:51 |
brensen | could that be related? | 14:51 |
*** ivve has quit IRC | 14:51 | |
brensen | inspecting the object just before the XFR stuff shows: {u'transferred_at': None, u'attributes': OVO Objects, u'masters': OVO Objects} | 14:53 |
brensen | what does that mean? | 14:53 |
mugsie | it means that the masters have been set on the zone :/ | 14:59 |
mugsie | (which should only happen when it is a secondard) | 14:59 |
mugsie | secondary* | 14:59 |
brensen | that sounds like it should not happen in our case | 15:00 |
mugsie | the pool attributes are just key value pairs, for when you have multiple pools, and want to schedule zones between then | 15:00 |
mugsie | them* | 15:00 |
brensen | ok | 15:00 |
brensen | so somehow it thinks the zones are SECONDARY | 15:01 |
mugsie | it seems son | 15:01 |
brensen | in the logs it shows as PRIMARY | 15:01 |
mugsie | so* | 15:01 |
brensen | printing the object reveals: {u'transferred_at': None, u'attributes': OVO Objects, u'masters': OVO Objects} <Zone id:'803841a6-30dd-4103-880e-7c721cc38387' type:'PRIMARY' name:'blah.cloud.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1567177080' action:'UPDATE' status:'PENDING'> | 15:02 |
mugsie | can you print zone.masters? | 15:02 |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/designate master: New service layer https://review.opendev.org/678432 | 15:03 |
brensen | we are checking... | 15:03 |
brensen | <ZoneMaster count:‘0’ object:‘ZoneMasterList’> | 15:04 |
mugsie | ok. so not trying it as a secondary then | 15:07 |
mugsie | brensen: does https://opendev.org/openstack/designate/src/branch/master/designate/producer/tasks.py#L221 get logged at all? | 15:07 |
brensen | that would be on producer? | 15:08 |
mugsie | yeah | 15:08 |
brensen | checking | 15:10 |
brensen | I don't think we saw something like this before, and we don't have much history (anymore) | 15:11 |
brensen | how often does it do this? | 15:11 |
brensen | ah it's configurable on the zones, let me check | 15:12 |
brensen | it does not much more than: 2019-08-30 15:11:27.064 7 INFO designate.producer.tasks [req-0b130db0-a3ba-46a5-a81a-63c5dbeb3adf - - - - -] Recovering zones for shards 0 to 4094 | 15:12 |
mugsie | is that in debug mode? | 15:20 |
mugsie | wth is that call coming from :/ | 15:20 |
brensen | ? | 15:23 |
mugsie | talking to myself :) | 15:24 |
mugsie | I honestly have no idea why it is calling do_axfr() | 15:25 |
brensen | haha, I need to attend to the kids, we're still really stuck, but I need to leave it for now | 15:25 |
brensen | super thanks for your time and effort, we'll keep digging | 15:26 |
*** eandersson has joined #openstack-dns | 15:30 | |
*** ygk_12345 has joined #openstack-dns | 15:51 | |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/designate master: Refactored service layer https://review.opendev.org/678432 | 15:52 |
mugsie | ygk_12345: are the clocks in sync? as frickler says ^ it sounds like a clock issue | 15:54 |
ygk_12345 | mugsie before that I have some confusion. can you clear that for me please | 15:55 |
mugsie | sure - what is the confusion? | 15:56 |
ygk_12345 | mugsie i have an openstack ansible rocky setup with two controllers. so I have two designate controllers in all. I have set up a backend bind server for the designate. Apart from that, should I also install bind9 servers in both the designate containers as well ? | 15:56 |
mugsie | no, just the server that you are controlling from designate | 15:57 |
ygk_12345 | so no need of the bind9 servers in the designate containers ? | 15:58 |
ygk_12345 | shall I delete them now ? | 15:58 |
mugsie | are they in your pools.yaml file/ | 15:58 |
mugsie | ?* | 15:58 |
ygk_12345 | nope | 15:58 |
mugsie | yeah, you can delete them then | 15:59 |
ygk_12345 | ok i will delete them now and I will check the clock issue after that and let you know | 15:59 |
ygk_12345 | mugsie one container is lagging 1 minute and a half or so behind other container and the dns server | 16:02 |
*** ivve has joined #openstack-dns | 16:03 | |
*** ginopc has quit IRC | 16:05 | |
mugsie | ygk_12345: ok that needs to be fixed | 16:09 |
ygk_12345 | mugsie now when I deleted those bind9 servers on the containers, I am seeing this | 16:09 |
ygk_12345 | mugsie Stderr: u'rndc: neither /etc/bind/rndc.conf nor /etc/bind/rndc.key was found\ | 16:09 |
ygk_12345 | on the containers | 16:09 |
ygk_12345 | in the worker logs | 16:09 |
mugsie | OK, that looks weird - you were using the bind9 servers to store the RNDC keys? | 16:10 |
ygk_12345 | the only bind9 server I now have is the backend dns server for the designate | 16:10 |
ygk_12345 | but how to fix the clock on the container ? it is ubuntu 18 | 16:11 |
mugsie | the host needs the time synced - you need to look at ntp | 16:12 |
ygk_12345 | shall I install ntp server on that container ? | 16:12 |
mugsie | this should have been set up as part of the openstack ansible install | 16:13 |
mugsie | I honestly don't know what they do - #openstack-ansible is the best place to ask | 16:13 |
jrosser | OSA installs chrony out-of-the-box | 16:46 |
mugsie | jrosser: I thought it did | 16:52 |
*** bnemec is now known as beekneemech | 16:53 | |
*** salmankhan has quit IRC | 17:13 | |
eandersson | mugsie, pretty happy with the service refactor now | 17:48 |
eandersson | if you or frickler have some time over please take a look https://review.opendev.org/#/c/678432/ | 17:48 |
eandersson | a little unfortunate that the ssl piece does not work on py3 | 17:48 |
*** brensen has quit IRC | 18:13 | |
*** ygk_12345 has quit IRC | 18:17 | |
*** KeithMnemonic has quit IRC | 19:22 | |
*** aniketh has quit IRC | 21:39 | |
*** ivve has quit IRC | 23:10 |