*** hamalq has joined #openstack-dns | 00:14 | |
*** hamalq has quit IRC | 00:54 | |
*** hamalq has joined #openstack-dns | 04:54 | |
hamalq | mgagne: jens: hi how are u, if u have time let's discuss the bug https://bugs.launchpad.net/designate/+bug/1875939 i think it will be a great feature to add and the user can always have an option to disable it through config (we can add an option for that) | 04:57 |
---|---|---|
openstack | Launchpad bug 1875939 in Designate "DNS notification based on TSIG is not supported" [Wishlist,Triaged] | 04:57 |
hamalq | mugsie_: sorry wrong mention ^^ | 04:59 |
hamalq | frickler: ^^ | 05:01 |
*** hamalq has quit IRC | 05:04 | |
*** hamalq has joined #openstack-dns | 05:06 | |
*** hamalq has quit IRC | 05:38 | |
*** hamalq has joined #openstack-dns | 05:59 | |
*** hamalq has quit IRC | 06:04 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/designate master: Imported Translations from Zanata https://review.opendev.org/731655 | 06:05 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/designate-dashboard master: Imported Translations from Zanata https://review.opendev.org/731660 | 06:26 |
*** hamalq has joined #openstack-dns | 06:35 | |
*** hamalq has quit IRC | 06:40 | |
frickler | hamalq: sorry, I wanted to answer earlier, but you were offline. for your neutron bug let's move to #openstack-neutron, as neutron cores will eventually have to decide on that patch | 06:40 |
frickler | there isn't a regular IRC meeting anymore, due to lack of interest | 06:41 |
frickler | regarding the TSIG feature I personally have no interest in that, maybe you can try to convince eandersson, nicolasbock or mugsie of it | 06:42 |
*** njohnston has quit IRC | 07:47 | |
*** salmankhan has joined #openstack-dns | 08:32 | |
*** salmankhan has quit IRC | 08:40 | |
*** salmankhan has joined #openstack-dns | 08:41 | |
*** hamalq has joined #openstack-dns | 09:33 | |
*** hamalq has quit IRC | 09:37 | |
*** sorin-mihai has joined #openstack-dns | 10:02 | |
*** sorin-mihai_ has quit IRC | 10:04 | |
*** sorin-mihai_ has joined #openstack-dns | 10:21 | |
*** sorin-mihai has quit IRC | 10:22 | |
*** sorin-mihai has joined #openstack-dns | 10:26 | |
*** sorin-mihai_ has quit IRC | 10:28 | |
*** njohnston has joined #openstack-dns | 11:02 | |
openstackgerrit | Merged openstack/designate master: Imported Translations from Zanata https://review.opendev.org/731655 | 11:31 |
*** sorin-mihai has quit IRC | 12:16 | |
*** sorin-mihai has joined #openstack-dns | 12:16 | |
openstackgerrit | Merged openstack/designate master: Cap jsonschema 3.2.0 as the minimal version https://review.opendev.org/730944 | 12:30 |
openstackgerrit | Merged openstack/designate-dashboard master: Imported Translations from Zanata https://review.opendev.org/731660 | 12:30 |
*** bnemec is now known as beekneemech | 15:25 | |
*** sorin-mihai has quit IRC | 15:28 | |
*** sorin-mihai_ has joined #openstack-dns | 15:28 | |
*** also_stingrayza has joined #openstack-dns | 15:50 | |
*** stingrayza has quit IRC | 15:53 | |
*** hamalq has joined #openstack-dns | 16:20 | |
*** hamalq has quit IRC | 16:25 | |
*** hamalq has joined #openstack-dns | 16:30 | |
openstackgerrit | Sean McGinnis proposed openstack/designate master: Use unittest.mock instead of third party mock https://review.opendev.org/721036 | 16:35 |
*** sorin-mihai_ has quit IRC | 17:02 | |
*** sorin-mihai has joined #openstack-dns | 17:02 | |
*** sorin-mihai has quit IRC | 17:11 | |
*** sorin-mihai has joined #openstack-dns | 17:12 | |
*** hamalq has quit IRC | 17:19 | |
*** hamalq has joined #openstack-dns | 17:19 | |
*** hamalq has quit IRC | 17:22 | |
*** hamalq has joined #openstack-dns | 17:22 | |
*** sorin-mihai_ has joined #openstack-dns | 17:33 | |
*** sorin-mihai has quit IRC | 17:34 | |
*** salmankhan1 has joined #openstack-dns | 17:53 | |
*** salmankhan has quit IRC | 17:57 | |
*** salmankhan1 has quit IRC | 17:57 | |
hamalq | eandersson: nicolasbock: mugsie: hi how are u, if u have time i would like to discuss https://bugs.launchpad.net/designate/+bug/1875939 | 18:05 |
openstack | Launchpad bug 1875939 in Designate "DNS notification based on TSIG is not supported" [Wishlist,Triaged] | 18:05 |
hamalq | what i am trying to do is add TSIG signing to the also_notify and add some support for split view to designate | 18:06 |
*** sorin-mihai has joined #openstack-dns | 18:07 | |
*** sorin-mihai_ has quit IRC | 18:08 | |
nicolasbock | Hi hamalq! Today is going to be difficult for me but Monday or any other time next week works. | 18:26 |
hamalq | nicolasbock: hi, sure Monday is good for me too, i will message u on Monday | 18:27 |
nicolasbock | Thanks | 18:28 |
* andrewbogott is still in the market for a designate debugging buddy if anyone needs a distraction | 18:32 | |
* andrewbogott 's city is in flames so he /really/ needs a distraction | 18:33 | |
*** sorin-mihai_ has joined #openstack-dns | 18:41 | |
*** sorin-mihai has quit IRC | 18:42 | |
*** sorin-mihai has joined #openstack-dns | 18:46 | |
*** sorin-mihai_ has quit IRC | 18:47 | |
hamalq | andrewbogott: what is the issue u are trying to solve? | 18:49 |
andrewbogott | hamalq: I'm seeing a variety of issues with pdns4 not syncing properly with designate | 18:49 |
andrewbogott | and no errors or warnings as far as I can see | 18:49 |
andrewbogott | Basically everything in designate is marked as PENDING which I take to mean it's trying to sync and waiting for a response or failing | 18:50 |
hamalq | did u check the pdns using command line to see its working right? | 18:50 |
andrewbogott | I'm just now curling to the pdns api from both of my designate hosts, and that seems to work fine | 18:51 |
andrewbogott | (for getting a list of zones at least) | 18:51 |
andrewbogott | and digging against pdns works for things that pdns knows about | 18:52 |
andrewbogott | (Is that what you mean, or is there another cli test you suggest?) | 18:52 |
hamalq | there is pdnsutil that can do that? but using APIs should be ok. do u use TSIG | 18:53 |
andrewbogott | I don't think so but I can check :) Is that a flag in pdns.conf? | 18:54 |
andrewbogott | grep -i tsig /etc/powerdns/pdns.conf returns nothing | 18:55 |
hamalq | so u did not create TSIG for zone | 18:55 |
andrewbogott | nope | 18:56 |
andrewbogott | btw, I see failures both with creating new zones AND with adding new records to an existing zone. | 18:56 |
andrewbogott | As I understand it those use different code paths? (Or they did with pdns3, maybe they're both API calls with pdns4) | 18:57 |
hamalq | https://docs.openstack.org/designate/pike/contributor/backends/pdns4.html | 18:58 |
andrewbogott | oh yeah, I've read that a dozen times in the last couple of days :) | 18:58 |
andrewbogott | lemme redact my pools.yaml and paste, hang on | 18:58 |
andrewbogott | ok, here it comes | 18:59 |
andrewbogott | https://www.irccloud.com/pastebin/sElAw8fU/ | 19:00 |
andrewbogott | that's the output of 'designate-manage pool generate_file' so should reflect the actual state in the designate db | 19:00 |
andrewbogott | I assume that any relevant warnings would be in the worker log but maybe I'm looking in the wrong place? I have my error level set to WARN but tried turning on DEBUG and didn't see anything new there (other than a million heartbeats) | 19:02 |
hamalq | mmmm designate-manage powerdns sync <pool_id> | 19:03 |
hamalq | personally i prefer this https://docs.openstack.org/designate/pike/contributor/backends/powerdns.html | 19:03 |
hamalq | but it does not work for pdns4 | 19:04 |
andrewbogott | hamalq: iirc the powerdns sync command was for pdns3 and doesn't do anything with the pdns4 backend | 19:05 |
andrewbogott | hm, is there a good cli way to test mdns? Should it respond to 'dig @'? | 19:06 |
hamalq | yeah i know, but i still like that approach , mdns (dig command should do that) | 19:07 |
andrewbogott | ok, confirmed, if I dig @ mdns it responds with correct records | 19:08 |
andrewbogott | (it knows about the things that are missing from pdns) | 19:08 |
hamalq | then try to create a record in pdns using the API | 19:09 |
andrewbogott | good idea, googling for the syntax for that... | 19:12 |
andrewbogott | it's telling me '{"error": "Key 'name' not present or not a String"}' which is surely from a typo but I can't find it | 19:18 |
andrewbogott | curl PATCH --data '{"rrsets": [ {"name":"andrewtesttest.andrewtest.example.org.", "type":"A", "ttl":30, "changetype":"REPLACE", "records": [ {"content": "192.0.5.4", "disabled": false } ] } ] }' -s -H 'X-API-Key: <redacted>' http://208.80.154.135:8081/api/v1/servers/localhost/zones | 19:18 |
hamalq | zones/zone_name | 19:20 |
andrewbogott | oh! | 19:21 |
andrewbogott | ok, trying... | 19:21 |
andrewbogott | you mean in the url? Like http://208.80.154.135:8081/api/v1/servers/localhost/zone_name ? | 19:22 |
andrewbogott | or http://208.80.154.135:8081/api/v1/servers/localhost/zone_name/andrewtest.example.org ? | 19:23 |
andrewbogott | Both get me a 'not found' | 19:23 |
hamalq | http://127.0.0.1:8081/api/v1/servers/localhost/zones/example.org. | 19:23 |
*** sorin-mihai_ has joined #openstack-dns | 19:24 | |
andrewbogott | hm, 'Method Not Allowed' — that's interesting! | 19:24 |
andrewbogott | let me see if the same happens on my working install | 19:24 |
hamalq | am not sure though about the API i never used it before ( did u check the API key is right) | 19:25 |
*** sorin-mihai has quit IRC | 19:25 | |
andrewbogott | yes, creating the zone worked, but creating the record fails | 19:25 |
andrewbogott | hm, nope, 'Method Not Allowed' on the working host as well | 19:26 |
hamalq | https://n40lab.wordpress.com/2015/05/16/centos-7-using-the-powerdns-web-api-to-add-and-edit-records/ | 19:29 |
andrewbogott | I think that agrees with what I'm doing already... | 19:33 |
eandersson | Not super familiar with the TSIG code :'( | 19:34 |
eandersson | or how TSIG works in general | 19:34 |
andrewbogott | this may be a red herring since I'm pretty sure records are sync'd via xfr and not the API anyway | 19:35 |
hamalq | it seems so https://github.com/openstack/designate/blob/master/designate/backend/impl_pdns4.py | 19:38 |
eandersson | Are you seeing any errors in the logs? | 19:38 |
andrewbogott | eandersson: complaints about stale domains but no actual errors during sync attempts | 19:38 |
andrewbogott | (this would be the -worker logs right?) | 19:38 |
eandersson | If you do an AXFR against both designate and pdns does it match? | 19:39 |
andrewbogott | e.g. 'Found 5 zones PENDING for more than 455 seconds' | 19:39 |
andrewbogott | eandersson: is there a way to do axfr by hand? | 19:39 |
andrewbogott | I mean, I can see in the pdns database that it doesn't know about a bunch of things | 19:39 |
andrewbogott | so I'd expect them to be missing from xfr as well | 19:40 |
eandersson | sure just do dig @localhost <zone> AXFR | 19:40 |
eandersson | and do the same against both pdns and designate | 19:40 |
andrewbogott | ah, ok! stay tuned... | 19:40 |
andrewbogott | by 'designate' you mean mdns? | 19:40 |
eandersson | yep | 19:41 |
eandersson | dig @208.80.153.109 -p 5354 <zone> AXFR | 19:42 |
eandersson | I believe | 19:42 |
hamalq | yup that should call the mdns service | 19:43 |
andrewbogott | against mdns I get lots of things | 19:43 |
andrewbogott | against pdns I get | 19:43 |
andrewbogott | https://www.irccloud.com/pastebin/MJDB7cQK/ | 19:43 |
andrewbogott | vs | 19:43 |
andrewbogott | https://www.irccloud.com/pastebin/6enRBvVj/ | 19:43 |
eandersson | What about with just the ip? 208.80.153.109 (e.g. dig @208.80.153.109 <zone> AXFR) | 19:44 |
eandersson | but something looks off with pdns there | 19:44 |
andrewbogott | same, Transfer failed. | 19:45 |
eandersson | Is it the same against both ns0 and ns1? | 19:45 |
andrewbogott | yep, same | 19:45 |
eandersson | Does any of the servers actually resolve? | 19:45 |
eandersson | *a records | 19:45 |
andrewbogott | they do, yes | 19:46 |
andrewbogott | log says | 19:46 |
andrewbogott | May 29 19:45:51 cloudservices2003-dev pdns_server[1167]: AXFR of domain 'codfw1dev.wikimedia.cloud' failed: 208.80.153.76 cannot request AXFR | 19:46 |
andrewbogott | so that suggests there's an ACL someplace that I need to fill in | 19:46 |
andrewbogott | full log snippet is | 19:46 |
andrewbogott | https://www.irccloud.com/pastebin/WC9oNqDU/ | 19:46 |
andrewbogott | let me see if that happens on my working system | 19:46 |
eandersson | https://github.com/openstack/designate/blob/master/devstack/designate_plugins/backend-pdns4 | 19:47 |
eandersson | This is always a good reference | 19:47 |
hamalq | allow-axfr-ips - must list the IPs of the Designate nodes, which will be located on the OpenStack API nodes | 19:47 |
andrewbogott | nope, same ' | 19:47 |
andrewbogott | same 'Transfer failed' on my working host | 19:48 |
andrewbogott | but let me try that nevertheless! | 19:48 |
eandersson | https://github.com/openstack/designate/blob/master/devstack/designate_plugins/backend-pdns4#L105 | 19:48 |
andrewbogott | sorry, is there supposed to be an 'allow-axfr-ips' config documented someplace? I don't see that in your links | 19:49 |
andrewbogott | ok, found it elsewhere, trying... | 19:50 |
eandersson | https://doc.powerdns.com/authoritative/settings.html#allow-axfr-ips | 19:50 |
eandersson | Can you try dig @localhost <zone> AXFR | 19:51 |
eandersson | on the host itself | 19:51 |
eandersson | if that works its the allow-axfr-ips that is causing the issues | 19:51 |
andrewbogott | adding that setting makes the dig AXFR work. | 19:54 |
andrewbogott | Now testing to see if that actually fixes things :) | 19:54 |
andrewbogott | still testing — so far no real sign that this has changed anything (although the change seems correct in any case) | 20:02 |
andrewbogott | yeah, created a new record and it's stuck in | PENDING | CREATE | | 20:03 |
eandersson | Do you still see the AXFR errors in the pdns logs? | 20:04 |
andrewbogott | I only saw those when doing an explicit xfr request with dig | 20:04 |
eandersson | I see | 20:04 |
andrewbogott | backing up… my understanding is that pdns does an axfr request of mdns | 20:04 |
andrewbogott | but not the other way around | 20:04 |
eandersson | Yea | 20:04 |
eandersson | You are right | 20:04 |
andrewbogott | so even though adding that allow-axfr thing makes sense, I don't think it allows a thing that we need for this particular problem | 20:05 |
andrewbogott | (although could be useful for debugging/comparing) | 20:05 |
eandersson | you may need to allow notify | 20:05 |
eandersson | but pretty sure it allows that by default | 20:05 |
* andrewbogott tries it | 20:06 | |
eandersson | Are there no errors in designate? | 20:06 |
andrewbogott | yeah, it says Default: 0.0.0.0/0,::/0 | 20:06 |
eandersson | Also, try to create a dummy record to bump the serial | 20:06 |
andrewbogott | eandersson: only complaints about domains being stuck in pending | 20:06 |
hamalq | from the pdns servers can u do the AXFR to designate? | 20:07 |
andrewbogott | I've been testing by creating a new VM and confirming that the new record appears in designate | 20:07 |
andrewbogott | that should be equivalent to creating the dummy record I think | 20:07 |
eandersson | Yep | 20:07 |
andrewbogott | hamalq: I believe that designate notifies pdns that it needs to do an axfr, then pdns initiates an axfr sync | 20:07 |
eandersson | also test what hamalq said | 20:07 |
eandersson | yep | 20:07 |
eandersson | but still worth making sure that pdns can hit both designate servers | 20:08 |
andrewbogott | i will doublecheck that | 20:08 |
eandersson | iptables etc | 20:08 |
andrewbogott | hm, connection refused if I use the AAAA address | 20:10 |
andrewbogott | does mdns have an acl for this? | 20:10 |
andrewbogott | or is that somehow in the pool config I wonder... | 20:10 |
eandersson | What version of Designate? | 20:11 |
eandersson | I believe I fixed IPV6 support in like Train | 20:11 |
andrewbogott | I'm running R | 20:12 |
eandersson | https://github.com/openstack/designate/commit/2ad08a6a0554b1166520b40d503fca5973672870 | 20:12 |
eandersson | I don't think that is it | 20:13 |
andrewbogott | how can I tell which ip/name/whatever pdns is using for its axfr request? | 20:13 |
andrewbogott | I already hacked my /etc/hosts to ensure that hostname lookups would always get v4 addresses | 20:14 |
andrewbogott | but that doesn't mean that the outgoing address from a request is v4 | 20:14 |
eandersson | It should be using what ever you put in pools, but honestly don't know | 20:14 |
eandersson | maybe you can enable debug logging for pdns | 20:15 |
eandersson | loglevel = 6 I believe | 20:16 |
andrewbogott | yeah, it was at 6 already — pdns logs don't say much | 20:17 |
andrewbogott | let me see if I can get it to say what it's doing though... | 20:17 |
eandersson | btw highly recommend upgrading Designate to U :D | 20:18 |
eandersson | I am running Train (with Nova/Neutron running Rocky) | 20:19 |
andrewbogott | eandersson: I can't upgrade past R until I upgrade my base OS to Buster | 20:20 |
andrewbogott | which is what I'm doing now — testing moving R from Stretch to Buster | 20:20 |
eandersson | You using rpms? | 20:20 |
eandersson | *debs | 20:20 |
andrewbogott | yeah | 20:20 |
eandersson | I would just install designate in a venv :D | 20:20 |
eandersson | but probably a lot of effort if you don't have the tooling | 20:21 |
andrewbogott | It's nice to use validated upstream packages when you have the option :) | 20:21 |
eandersson | Yep | 20:22 |
andrewbogott | I see /some/ axfr success in the pdns log | 20:22 |
andrewbogott | https://www.irccloud.com/pastebin/ozcsCsvH/ | 20:22 |
eandersson | Nice | 20:22 |
andrewbogott | oddly, the domain in which I just now created a record does not appear there | 20:22 |
eandersson | That looks good | 20:22 |
eandersson | lol | 20:22 |
andrewbogott | and it is still showing as PENDING | 20:22 |
andrewbogott | (sorry, I should mention — I always see /some/ activity like that) | 20:22 |
eandersson | So one domain/zone is working | 20:23 |
andrewbogott | I have two nodes, and it might be that it's working on one node and not the other, or something | 20:23 |
eandersson | but the new one isn't? | 20:23 |
eandersson | Yea - try hitting port 5354 from host1 to host2 (and host2 -> host1) | 20:23 |
eandersson | use telnet or similar | 20:23 |
andrewbogott | I have definitely tried that 30 times but will try again :) | 20:23 |
eandersson | because tcp vs udp | 20:23 |
eandersson | dig wouldn't detect that | 20:24 |
andrewbogott | and of course without knowing what the orig ip is I have to test a bunch of other things... | 20:24 |
eandersson | since it just uses udp | 20:24 |
andrewbogott | in all cases I get a telnet connection and then 'Connection closed by foreign host.' | 20:24 |
eandersson | It tends to be something silly when you finally find it :D | 20:24 |
andrewbogott | it will definitely be something silly | 20:25 |
eandersson | Did you test 53 as well with telnet? | 20:25 |
andrewbogott | doing | 20:25 |
eandersson | designate-manage pool update --delete TRUE | 20:26 |
eandersson | Might be worth running as well | 20:26 |
eandersson | and then restart all of designate | 20:27 |
andrewbogott | ok, will try | 20:27 |
andrewbogott | I still suspect this has to do with ipv6 origination IPs. That's one thing I'm pretty sure changed when I upgraded | 20:27 |
eandersson | --delete will make sure that the db matches what is the pools config | 20:28 |
eandersson | (it will delete anything that isn't supposed to be there) | 20:28 |
andrewbogott | I wonder if there's some system-wide setting I can make to just not use ipv6 at all | 20:28 |
eandersson | Yea you can just apply sysctl | 20:28 |
andrewbogott | ok, one thing at a time, will do the --delete | 20:28 |
eandersson | make sure to restart central, worker and producer after that (but probably worth just restarting all of them) | 20:29 |
eandersson | Are you using the worker/producer btw? | 20:29 |
andrewbogott | yes | 20:29 |
andrewbogott | Believe me, I already have | 20:29 |
andrewbogott | service designate-sink restart && service designate-mdns restart && service designate-central restart && service designate-producer restart && service designate-worker restart | 20:29 |
andrewbogott | in my command history :) | 20:30 |
eandersson | systemctl restart designate-* | 20:30 |
eandersson | :D | 20:30 |
andrewbogott | it takes a /really/ long time for all those services to shut down | 20:30 |
eandersson | I haven't used debian in a long time | 20:30 |
eandersson | Yea - that has been fixed in U :D | 20:30 |
eandersson | https://github.com/openstack/designate/commit/a09064a5d15859703b97d61a1f014681a17799c6 | 20:31 |
andrewbogott | nice | 20:32 |
hamalq | sysctl -w net.ipv6.conf.all.disable_ipv6=1, sysctl -w net.ipv6.conf.default.disable_ipv6=1 to disable all ipv6 in debian (not debian user myself :P) | 20:33 |
andrewbogott | cool, will try that next | 20:33 |
hamalq | i am glad i joined this discussion, it revised almost everything in designate :) | 20:34 |
andrewbogott | ok, first, going to try creating another test record after the --delete | 20:34 |
andrewbogott | there it is in openstack recordset list | 20:35 |
andrewbogott | | PENDING | CREATE | | 20:35 |
andrewbogott | pdns says "Domain 'svc.newprojectdomaintest3.codfw1dev.wikimedia.cloud' is fresh (no DNSSEC)" | 20:35 |
andrewbogott | oh nm | 20:35 |
andrewbogott | that's not the same domain I touched | 20:35 |
andrewbogott | weird that pdns is telling me about some other domain that I haven't touched in a month | 20:36 |
andrewbogott | I wonder how long I should give this to catch up before I decide it's still broken? I have so many PENDING zones at this point, even if all is well it could take a while | 20:37 |
andrewbogott | although I guess I should be seeing >0 zones change from PENDING | 20:37 |
hamalq | if pdns shows no log of AXFR it should be a sign that it's not working | 20:38 |
andrewbogott | ok, going to disable ipv6 and reboot these boxes | 20:40 |
andrewbogott | sorry, will be a long suspenseful wait now :) | 20:40 |
andrewbogott | (thank you both, btw, for talking me through this! Lots of good ideas I didn't think of yesterday) | 20:41 |
hamalq | you're welcome (but i should thank u also, i enjoyed this) | 20:42 |
andrewbogott | huh, running 'sysctl -w net.ipv6.conf.all.disable_ipv6=1' on one of my hosts works fine | 20:51 |
andrewbogott | but on the other it causes it to fall off the network entirely | 20:51 |
andrewbogott | after a reboot it comes back but is also back to using v6 | 20:51 |
*** agomez has quit IRC | 20:56 | |
*** sorin-mihai has joined #openstack-dns | 20:57 | |
*** sorin-mihai_ has quit IRC | 20:59 | |
eandersson | You need to add it to /etc/sysctl.conf | 21:03 |
eandersson | most likely | 21:03 |
eandersson | weird that you would lose network entirely | 21:03 |
*** sorin-mihai_ has joined #openstack-dns | 21:27 | |
andrewbogott | yeah, these hosts don't work at all without ipv6 enabled. Must be something in the upstream network config | 21:27 |
andrewbogott | so, that experiment isn't going to help | 21:27 |
*** sorin-mihai has quit IRC | 21:28 | |
hamalq | remove the server with ipv6 from the pool | 21:28 |
hamalq | and keep only one | 21:28 |
andrewbogott | nah, they both fall off the network, I just didn't do a proper test with the first one | 21:31 |
*** sorin-mihai has joined #openstack-dns | 21:32 | |
hamalq | can the old installation u have do AXFR from pdns servers to designate servers? | 21:32 |
*** sorin-mihai_ has quit IRC | 21:34 | |
andrewbogott | I'm unclear on which direction you mean by 'from' but things work in the old installation; records appear in pdns as soon as they're created in designate. | 21:35 |
hamalq | try the dig command and @designate-server-ip | 21:36 |
hamalq | from the pdns servers | 21:36 |
andrewbogott | I have two bare-metal hosts; each runs one instance of designate and one instance of pdns | 21:37 |
* andrewbogott tries | 21:37 | |
andrewbogott | hamalq: AXFR from pdns isn't relevant is it? We don't want records to propagate from pdns to designate, only the other way | 21:39 |
hamalq | the port should be 5453 | 21:39 |
hamalq | -p5453 | 21:39 |
andrewbogott | ok | 21:44 |
andrewbogott | I think we did this before, but here's the recap: | 21:44 |
andrewbogott | dig -p5453 with hostname works in all directions | 21:44 |
andrewbogott | it also works with the ipv4 address | 21:44 |
andrewbogott | it does NOT work with the ipv6 address | 21:44 |
andrewbogott | in the case of accessing the local host, it times out. In the case of accessing the other host, the connection is refused | 21:45 |
andrewbogott | That fits the sort of very-slow/intermittent/unpredictable nature of the failure I'm seeing | 21:45 |
andrewbogott | sorry, wrong way around: connection refused locally, times out remotely | 21:46 |
hamalq | do pdns on the two hosts share the same database? | 21:46 |
andrewbogott | no | 21:47 |
andrewbogott | hm, how do I do a telnet test with a v6 address? | 21:47 |
andrewbogott | pdns each has their own database, designate has a shared db on a different host | 21:47 |
hamalq | times out remotely ( this could be the issue u should solve) | 21:48 |
hamalq | since every pdns server will request AXFR from one of the designate servers right? | 21:49 |
andrewbogott | yeah, looking at that now | 21:49 |
andrewbogott | I'm staring right at the firewall rule that should allow it :) | 21:49 |
*** KeithMnemonic has quit IRC | 21:50 | |
andrewbogott | ok, due to ipv6 addresses being impossible to read, my test had a typo in it | 21:51 |
andrewbogott | I'm now seeing connection refused in all directions (now that I have the address right) | 21:52 |
hamalq | yub that should be the issue | 21:52 |
andrewbogott | https://www.irccloud.com/pastebin/SVhS8Gt1/ | 21:52 |
andrewbogott | so is it possible for me to tell mdns to respond to those queries? | 21:52 |
hamalq | i dont think its designate refusing | 21:54 |
hamalq | its an ACL or something in network i think | 21:55 |
hamalq | try telnet the port | 21:55 |
*** sorin-mihai_ has joined #openstack-dns | 21:55 | |
hamalq | can u check if the dig works on the old system (if its then for sure its an ACL) | 21:56 |
andrewbogott | I don't think the old system is using ipv6 | 21:57 |
andrewbogott | telnet is refused, same as dig | 21:57 |
andrewbogott | If the port were blocked it would time out wouldn't it? | 21:57 |
*** sorin-mihai has quit IRC | 21:57 | |
hamalq | telnet: Unable to connect to remote host: Connection refused. This error means that firewall is blocking connections to the specified port on the remote host. The firewall can be at the remote host or at the intermediate level. | 21:58 |
andrewbogott | hm... | 21:59 |
andrewbogott | ok, you're right, I tried it with a different port and a simple server and saw the same pattern | 22:01 |
andrewbogott | hm | 22:01 |
eandersson | What do you have in designate.conf for mdns? | 22:02 |
andrewbogott | all defaults at the moment | 22:03 |
andrewbogott | hamalq: except it also fails if I telnet to 5354 on the current host — /that/ can't be a network filter | 22:04 |
* andrewbogott starting to think he has multiple problems | 22:04 | |
eandersson | for sure some weird stuff going on | 22:05 |
andrewbogott | anyway, for the moment let's pretend like this is the issue: | 22:05 |
andrewbogott | https://www.irccloud.com/pastebin/wMpMdtLe/ | 22:05 |
hamalq | if u bind the mdns service to the designate server IP that should not work (it's expected). the real problem is if host1 receives a notify from host2 he can do the AXFR from it, based on the timeout | 22:08 |
eandersson | maybe try adding host=::1 in the mdns config | 22:08 |
hamalq | sorry he cant do the AXFR | 22:08 |
hamalq | i hope that make sense | 22:08 |
andrewbogott | eandersson: that seems to help! and also maybe doesn't break ipv4? | 22:09 |
eandersson | or :: not ::1 I believe | 22:09 |
eandersson | It shouldn't afaik | 22:09 |
eandersson | if it does | 22:10 |
eandersson | just change host to listen | 22:10 |
eandersson | and do something like | 22:10 |
eandersson | listen=[::]:5453,0.0.0.0:5453 | 22:10 |
eandersson | I just hope Rocky handles the brackets properly :D | 22:11 |
andrewbogott | so far this is promising with just ::, testing more things... | 22:11 |
andrewbogott | ok, confirmed, I can now get axfr on v4 and v6 in both directions with mdns | 22:14 |
andrewbogott | going to finalize this change before I move on to more testing | 22:14 |
* andrewbogott restarts everything everywhere, again | 22:16 | |
andrewbogott | btw, what does the designate agent do? I'm thinking I don't need it with my current setup but it keeps popping up in docs | 22:18 |
eandersson | I am a top 5 contributor to designate and I don't know | 22:23 |
eandersson | mugsie_ explained it at some point to me | 22:23 |
andrewbogott | eandersson: is my install broken if it's not running? | 22:25 |
eandersson | Nope | 22:25 |
hamalq | i dont see that in the https://docs.openstack.org/designate/train/contributor/architecture.html | 22:25 |
andrewbogott | :) | 22:25 |
andrewbogott | Back to my original issue… if axfr was failing, where would you expect the warnings to appear? the mdns log or the worker log or the producer log? Or…? | 22:26 |
eandersson | worker logs | 22:26 |
andrewbogott | 'k | 22:26 |
andrewbogott | I don't see any evidence that designate is initiating axfr | 22:27 |
andrewbogott | pdns is all 'No new unfresh slave domains, 0 queued for AXFR already, 0 in progress' | 22:27 |
eandersson | Did you compare the axfr output between designate and pdns? | 22:27 |
eandersson | using dig | 22:27 |
andrewbogott | hm, identical | 22:28 |
eandersson | that is weird | 22:29 |
eandersson | how about serial? | 22:29 |
andrewbogott | oh, wait, hang on, it's because the error message is identical | 22:29 |
*** sorin-mihai has joined #openstack-dns | 22:29 | |
andrewbogott | have to enable that in pdns again | 22:29 |
eandersson | probably easiest to just do it from the host using localhost | 22:30 |
*** sorin-mihai_ has quit IRC | 22:30 | |
andrewbogott | ok, here's a diff at last | 22:33 |
andrewbogott | https://www.irccloud.com/pastebin/YQleHn8P/ | 22:33 |
andrewbogott | pretty much what you'd expect from "isn't updating very often" | 22:34 |
eandersson | Are you seeing anything like | 22:35 |
eandersson | > Timeout on NOTIFY | 22:35 |
eandersson | In the designate-worker logs? | 22:35 |
eandersson | or maybe Could not find %(serial)s for %(zone)s on enough | 22:35 |
andrewbogott | grep -i "Could not find" designate-worker.log is totally empty | 22:36 |
andrewbogott | as is grep "Timeout on NOTIFY" designate-worker.log | 22:36 |
andrewbogott | it's like it isn't trying | 22:37 |
*** sorin-mihai_ has joined #openstack-dns | 22:38 | |
eandersson | You mean that designate-worker.log is completely empty? | 22:38 |
eandersson | or just no hits on the grep? | 22:38 |
andrewbogott | just no hits on the grep | 22:39 |
andrewbogott | the log has a lot of | 22:39 |
andrewbogott | https://www.irccloud.com/pastebin/r8MttB69/ | 22:39 |
*** sorin-mihai has quit IRC | 22:39 | |
andrewbogott | (this is with log-level WARNING) | 22:40 |
eandersson | btw when you posted the pools config | 22:41 |
eandersson | not sure if it is just copy and paste | 22:42 |
eandersson | but it looks like the spacing is off | 22:42 |
andrewbogott | here it is again, I just regenerated it | 22:42 |
andrewbogott | https://www.irccloud.com/pastebin/KNCrA8el/ | 22:42 |
andrewbogott | whoops, with my live API token, guess I'll go rotate that | 22:42 |
eandersson | haha happens | 22:42 |
eandersson | I am always terrified when pasting into irc :D | 22:43 |
andrewbogott | I even remembered to redact it and pasted a file with .redacted in its name. Must've done :q! instead of :wq! or something | 22:44 |
andrewbogott | *shrug* | 22:44 |
andrewbogott | it bugs me that designate-manage inserts those quotes around the pdns port but they aren't in the database so I'm trying to get over it :) | 22:45 |
eandersson | http://paste.openstack.org/show/0t7eh1qA6DjXBJdz7q8g/ | 22:52 |
eandersson | Can you try this just in case | 22:52 |
eandersson | followed by | 22:53 |
eandersson | designate-manage pool update --delete TRUE | 22:53 |
eandersson | and then a restart of all fun services again | 22:53 |
andrewbogott | how is that different? Just removing | 22:55 |
andrewbogott | https://www.irccloud.com/pastebin/PTeUhcYy/ | 22:55 |
andrewbogott | ? | 22:55 |
eandersson | just worried about the spacing | 22:55 |
andrewbogott | 'k | 22:55 |
* andrewbogott waits for restarts | 22:55 | |
eandersson | actually tried yours in a python script and looks fine | 22:55 |
andrewbogott | the file I pasted is generated /by/ designate-admin so I would hope it would parse :) | 22:56 |
eandersson | but does not hurt trying | 22:56 |
andrewbogott | um, designate-manate | 22:56 |
andrewbogott | ugh | 22:56 |
eandersson | hehe | 22:56 |
andrewbogott | I'm all out of types for the day | 22:56 |
andrewbogott | what does it mean to specify or not specify the pool ID in that file? | 22:56 |
andrewbogott | am I in danger of getting a new different pool now? | 22:57 |
eandersson | it just defaults to the same one | 22:57 |
eandersson | feel free to re-add it | 22:57 |
eandersson | I was just hand re-writing it | 22:57 |
eandersson | > default='794ccc2c-d751-44fe-b57f-8894c9f5c842', | 22:58 |
eandersson | This is the default pool | 22:58 |
eandersson | so if you don't put anything it will just default to that one | 22:58 |
andrewbogott | 'k | 22:58 |
andrewbogott | inspecting the db confirms, still just one pool | 22:58 |
eandersson | btw does it work after a while? | 22:59 |
eandersson | If so it's almost certainly an issue with NOTIFY | 22:59 |
andrewbogott | it's often the case that hours later I see records showing up | 23:01 |
andrewbogott | although I don't think I ever see designate move something out of PENDING | 23:02 |
eandersson | Can you enable debug and give me like 5 minutes of logs? | 23:02 |
eandersson | Feel free to PM them to me if you don't want to share with the world | 23:02 |
andrewbogott | sure, which service? | 23:02 |
andrewbogott | all? | 23:02 |
eandersson | worker | 23:02 |
andrewbogott | ok | 23:02 |
eandersson | worker and maybe producer | 23:02 |
andrewbogott | I don't have central logging so it'll be multiple files, alas | 23:02 |
eandersson | journalctl helps a lot :D | 23:03 |
andrewbogott | want me to create a record during those logs, or just give you the steady state logs? | 23:04 |
eandersson | ideally yea | 23:04 |
andrewbogott | I'll let them settle in for a minute first | 23:05 |
andrewbogott | eandersson: in those logs I waited a minute or two, then created a single domain (test8.svc.andrewtestproject.codfw1dev.wmcloud.org.), then waited another minute or so and created a VM named dnstest-34, which should have prompted at least two other record creations. | 23:13 |
andrewbogott | There aren't any new domains in there, just new records. | 23:13 |
andrewbogott | I need to stretch my legs but will return to receive whatever wisdom you learn from those logs :) | 23:15 |
*** sorin-mihai_ is now known as sorin-mihai | 23:18 | |
eandersson | How many designate hosts do you have? | 23:20 |
eandersson | Is it just one? | 23:20 |
hamalq | can u try something if u can: just re-create the zones (delete/create), that worked for me once | 23:22 |
hamalq | he has two servers, both with designate and pdns | 23:27 |
andrewbogott | eandersson: two hosts, I sent you logs from each | 23:45 |
andrewbogott | Is it possible I just have rabbit split-brain such that the worker never finds out that there are things to update? | 23:46 |
andrewbogott | lemme kill all but one of my rabbits | 23:46 |
eandersson | It's possible but would result in rpc timeouts | 23:48 |
andrewbogott | hamalq: I can't create new domains, they get stuck in 'pending' as well | 23:48 |
andrewbogott | yeah, I'd think it would show up someplace | 23:48 |
andrewbogott | rabbit murder doesn't seem to make a difference | 23:50 |
andrewbogott | eandersson: do you agree that the logs are totally cheerful about failure? Or are there warnings hiding in there that I missed? | 23:53 |
eandersson | https://zuul.opendev.org/t/openstack/build/bd3aa12677da4f76a34fb20ce3bf58af/log/controller/logs/screen-designate-worker.txt | 23:56 |
eandersson | You should be seeing a lot of these | 23:56 |
eandersson | > Attempting UPDATE on zone 932833962.com. | 23:56 |
andrewbogott | yeah, I agree :) | 23:57 |
andrewbogott | I mean, that should be showing even if pdns isn't running at all, right? | 23:58 |
andrewbogott | Because 'Attempting' | 23:58 |
eandersson | Yea afaik | 23:58 |
eandersson | I mean it's possible it is stuck trying to connect | 23:59 |
andrewbogott | But it has some kind of exponential backoff right? | 23:59 |
andrewbogott | So maybe all existing domains are in a state of despair where it's going to wait 8 hours before trying to refresh? | 23:59 |
andrewbogott | If that's in the db I can try to clear it | 23:59 |
* andrewbogott not totally sure that's how it works | 23:59 |
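The log checks discussed above (grepping designate-worker.log for "Timeout on NOTIFY", "Could not find" and "Attempting UPDATE on zone") can be sketched as one small script. This is a rough sketch, not part of Designate: the function names and pattern keys are made up for illustration, and the exact wording of these messages in any given Designate release is an assumption based only on the fragments quoted in this conversation.

```python
import re

# Message fragments quoted in the conversation. Whether a given Designate
# release logs exactly these strings is an assumption.
PATTERNS = {
    "notify_timeout": re.compile(r"Timeout on NOTIFY", re.IGNORECASE),
    "serial_not_found": re.compile(r"Could not find", re.IGNORECASE),
    "attempting_update": re.compile(r"Attempting UPDATE on zone", re.IGNORECASE),
}


def count_worker_messages(lines):
    """Count how many lines match each pattern (hypothetical helper)."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log(path):
    """Read a log file and return the per-pattern counts."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_worker_messages(handle)


if __name__ == "__main__":
    import sys

    # Usage: python scan_worker_log.py /path/to/designate-worker.log
    for name, total in scan_log(sys.argv[1]).items():
        print(f"{name}: {total}")
```

A zero count for attempting_update would match the "it's like it isn't trying" symptom. Note that such messages may only appear at DEBUG level, so a log captured at WARNING could show zero for that reason alone.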