*** hamalq has quit IRC | 00:13 | |
*** livelace has quit IRC | 01:18 | |
*** awalende has joined #openstack-dns | 02:07 | |
*** awalende has quit IRC | 02:12 | |
*** awalende has joined #openstack-dns | 06:00 | |
*** awalende has quit IRC | 06:04 | |
*** njohnston has quit IRC | 06:05 | |
*** bersace has joined #openstack-dns | 06:15 | |
*** kaveh has quit IRC | 07:13 | |
*** kaveh has joined #openstack-dns | 07:19 | |
*** awalende has joined #openstack-dns | 07:39 | |
*** livelace has joined #openstack-dns | 08:12 | |
*** salmankhan has joined #openstack-dns | 08:30 | |
*** livelace has quit IRC | 08:50 | |
*** salmankhan1 has joined #openstack-dns | 08:57 | |
*** livelace has joined #openstack-dns | 08:57 | |
*** salmankhan has quit IRC | 08:59 | |
*** salmankhan1 is now known as salmankhan | 08:59 | |
*** livelace has quit IRC | 09:02 | |
*** livelace has joined #openstack-dns | 09:08 | |
*** livelace has quit IRC | 09:13 | |
*** livelace has joined #openstack-dns | 09:34 | |
mindthecap | mugsie: dig works as expected and returns two NS records. designate-mdns shows that it's doing AXFR (version: 1). The serial for the zone is 88, which I can see in the master NS server logs (transfer of xxxx: AXFR started (serial 88... ended). | 10:51 |
mindthecap | After the zone is created in Designate, one of the designate containers (I have three controllers, so three designate containers) creates the zone in the designate backend. The backend then tries to transfer the zone from the designate container but gets the error "zone xxx has no NS records" | 10:55 |
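(A quick sanity check for the problem mindthecap describes, hedged: AXFR the zone straight from designate-mdns and confirm the NS records actually appear in the transfer. The address, zone name, and port 5354 - the usual designate-mdns default - are placeholders for your deployment.)

```sh
# Pull the zone directly from designate-mdns and list only NS answers; if none
# show up here, the backend's "zone has no NS records" error on the inbound
# transfer is exactly what you would expect.
dig @192.0.2.10 -p 5354 example.com. AXFR +noall +answer | grep -w NS
```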
*** njohnston has joined #openstack-dns | 11:01 | |
*** salmankhan1 has joined #openstack-dns | 11:28 | |
*** salmankhan has quit IRC | 11:28 | |
*** salmankhan1 is now known as salmankhan | 11:28 | |
*** livelace has quit IRC | 12:23 | |
*** livelace has joined #openstack-dns | 12:29 | |
*** ianychoi has quit IRC | 12:55 | |
openstackgerrit | Merged openstack/designate master: Fix slow zone imports. https://review.opendev.org/721793 | 13:48 |
*** livelace has quit IRC | 14:45 | |
*** hamalq has joined #openstack-dns | 16:51 | |
hamalq | can i get +1 on https://review.opendev.org/#/c/726214/ ( all code review changes done ) | 16:58 |
*** salmankhan has quit IRC | 17:22 | |
*** hamalq has quit IRC | 18:33 | |
*** hamalq has joined #openstack-dns | 18:34 | |
*** livelace has joined #openstack-dns | 19:09 | |
*** roukoswarf has joined #openstack-dns | 19:15 | |
roukoswarf | anyone know how the notify works in designate? I'm trying to fix a bug where updating the pool NS records gets everything stuck in a pending update | 19:16 |
openstackgerrit | Andreas Jaeger proposed openstack/designate-tempest-plugin master: Update hacking for Python3 https://review.opendev.org/715689 | 19:18 |
johnsom | Is the notification handler "nova_fixed" deprecated? Was that just for nova networking or does it have some purpose in the neutron networking world? | 19:52 |
mugsie | johnsom: no, it is still supported, kind of | 19:53 |
mugsie | it only uses unversioned notifications though | 19:53 |
johnsom | Is it just for nova networking deployments? | 19:53 |
mugsie | no, it just reacts when the VM is created | 19:53 |
mugsie | so it could have the VM name | 19:54 |
mugsie | where neutron may not | 19:54 |
johnsom | Ok, thanks. Yeah, I can see those timing issues. | 19:54 |
mugsie | roukoswarf: https://opendev.org/openstack/designate/src/branch/master/designate/mdns/notify.py | 19:54 |
roukoswarf | so, im having issues with update_pool | 19:55 |
roukoswarf | I fixed it breaking on tenants having their own NS records, that was easy, but it still sticks things in UPDATE/PENDING status | 19:56 |
roukoswarf | I created a _change_ns which swaps the NS records and then updates in one step without set_delayed_notify, but it still gets stuck in PENDING | 19:57 |
roukoswarf | if I do a set on the zones, they get out of PENDING immediately. | 19:57 |
mugsie | ah, that code is weird | 19:58 |
mugsie | one sec | 19:58 |
roukoswarf | s/weird/broken, trying to make it work | 19:58 |
mugsie | if you set zone.delayed_notify = False then do central.update_zone(context, zone) it should work | 19:59 |
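(Roughly what that suggestion looks like as code, hedged: zone.delayed_notify and central.update_zone come from the line above, everything else - the helper name, the idea of wrapping it in a function - is illustrative.)

```python
# Hypothetical helper: 'central' is a central service/API handle, 'context' an
# admin context, 'zone' the zone object stuck behind delayed-notify batching.
def force_zone_notify(central, context, zone):
    zone.delayed_notify = False  # stop the producer from batching this zone
    # update_zone drives the normal update path, which ends in a DNS NOTIFY
    return central.update_zone(context, zone)
```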
roukoswarf | updating a recordset with delayed_notify=False doesn't trigger that? | 20:00 |
mugsie | eh, I can't remember the flow off the top of my head - but I *think* that blocks it being sent | 20:01 |
roukoswarf | im setting the NS recordsets, then the parent function calls @notification('dns.pool.update') which doesnt trigger any sync | 20:02 |
mugsie | ah - that is for OpenStack billing notifications | 20:02 |
mugsie | not DNS notifications | 20:02 |
mugsie | (naming sucks, sorry) | 20:02 |
roukoswarf | alright, well at least now i know that was meaningless for my uses, so... i update the NS, then call update_zone to get it to move? or is there a way updating a recordset should be able to notify? | 20:03 |
mugsie | updating a recordset should trigger a notify | 20:04 |
roukoswarf | it does not. | 20:04 |
roukoswarf | at least, not in this context | 20:04 |
mugsie | what function are you calling to update it? | 20:04 |
mugsie | central.update_recordset() ? | 20:04 |
mugsie | https://opendev.org/openstack/designate/src/branch/master/designate/central/service.py#L1411 - that will trigger a DNS Notify | 20:05 |
roukoswarf | i was just following the example of _add_ns, which called self._update_recordset_in_storage(context, zone, ns_recordset) with set_delayed_notify=True | 20:05 |
roukoswarf | i removed the set_delayed_notify, and expected it to work | 20:05 |
mugsie | ah, OK. | 20:06 |
roukoswarf | this code never seems to call update_recordset directly, should i be? | 20:06 |
roukoswarf | im working under update_pool | 20:06 |
mugsie | Oh! - so you are implementing updating the NS records for all zones in a pool after the zones are created? | 20:07 |
mugsie | OK. | 20:07 |
roukoswarf | after a pools.yaml update | 20:07 |
mugsie | yeah | 20:07 |
roukoswarf | which, is currently busted | 20:07 |
roukoswarf | at least in stein, not sure if the code changed in master, doesnt look like it. | 20:08 |
mugsie | Ok. in that case call the update_recordset() function | 20:08 |
roukoswarf | instead or in addition to the update_recordset_in_storage? | 20:08 |
mugsie | the _add_ns() is part of a bigger set of code that will cause a NOTIFY elsewhere | 20:08 |
mugsie | instead of | 20:08 |
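(A hedged sketch of that advice: when update_pool rewrites a zone's NS recordset, going through update_recordset() also queues the DNS NOTIFY, while _update_recordset_in_storage() only writes the DB rows. Only update_recordset itself comes from the discussion; the helper name, arguments, and the managed-record filtering below are illustrative, not Designate's actual pool-update code.)

```python
from designate import objects

# Hypothetical replacement for the *_in_storage call in a pool NS update:
# 'central' is the central service, 'ns_recordset' the zone's managed NS
# recordset, 'new_ns_addresses' the ns_records from the updated pools.yaml.
def replace_pool_ns(central, context, ns_recordset, new_ns_addresses):
    # keep tenant-created records, drop only the pool-managed ones
    ns_recordset.records = [r for r in ns_recordset.records if not r.managed]
    for address in new_ns_addresses:
        ns_recordset.records.append(objects.Record(data=address, managed=True))
    # update_recordset (unlike _update_recordset_in_storage) triggers a NOTIFY
    return central.update_recordset(context, ns_recordset)
```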
roukoswarf | well, 80% of my test zones are now out of pending | 20:11 |
mugsie | the delayed notify chunks up the work | 20:11 |
mugsie | so it may take a little bit | 20:12 |
roukoswarf | but im not delaying notify, i removed that from my new _change_ns function which bulk updates all the pool ns changes. | 20:12 |
roukoswarf | i do all the recordset changes from the pool change in one pass, then update_recordset it. | 20:12 |
mugsie | then it could just be RMQ backlog, waiting for the workers to chew through the notify tasks | 20:14 |
roukoswarf | http://paste.openstack.org/show/793538/ | 20:15 |
roukoswarf | my code, im calling instead of _add_ns and _delete_ns in update_pool | 20:15 |
roukoswarf | my rmq is empty | 20:16 |
roukoswarf | calling openstack zone set <id> instantly unsticks the zone | 20:17 |
mugsie | yeah, that can happen if there is a long term issue. | 20:18 |
roukoswarf | guess ill run a set on every stuck zone and start from a clean active state | 20:18 |
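(One hedged way to script that with the OpenStack CLI - the column parsing and the choice of re-setting the TTL are just one approach; roukoswarf reports a plain `openstack zone set <id>` is enough on his deployment.)

```sh
# For every zone still in PENDING, push a harmless update so central re-sends
# it to the worker and the zone can go back to ACTIVE.
openstack zone list -f value -c id -c status \
  | awk '$2 == "PENDING" {print $1}' \
  | xargs -r -n1 -I{} openstack zone set {} --ttl 3600
```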
mugsie | there is a periodic task that can be run on a 24-hour basis to basically send a notify for all zones | 20:18 |
mugsie | but I need to dig that out | 20:19 |
roukoswarf | everything else has worked fine, we just hit a crash in designate on doing a pool update, so im trying to make the process smooth | 20:19 |
roukoswarf | ill have to PR the bugfix, but this sync problem seems to be real too | 20:19 |
roukoswarf | the crash was an easy fix, looked like an oversight. | 20:20 |
mugsie | thats great - more than happy to review + help merge | 20:20 |
roukoswarf | if you make a zone, then a tenant adds NS records to it, then you do a pool update that changes the NS records, it will crash. | 20:20 |
mugsie | crash? yeah - that is an oversight | 20:20 |
roukoswarf | _delete_ns doesn't filter for managed records, and does a get_recordset, singular, when in this case there will be multiple | 20:21 |
roukoswarf | add selects it right, so, must have just been missed | 20:21 |
mugsie | yeap. | 20:21 |
mugsie | we don't do functional testing of the pool modifications, so that might be a good addition to make sure this stays fixed :) | 20:22 |
roukoswarf | I thought I saw tests, but they don't test this case; it works perfectly if users never add NS recordsets of their own | 20:22 |
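(A heavily hedged sketch of the kind of fix being described - make the delete path look up the zone apex NS recordset explicitly and only strip pool-managed records, so tenant-created NS recordsets can't break it. find_recordset, update_recordset, and the managed flag on records exist in Designate, but the criterion, helper name, and surrounding logic here are assumptions, not the actual patch.)

```python
# Hypothetical shape of the fix: target the apex NS recordset by name and type,
# then remove only the managed record matching the NS server being dropped,
# leaving tenant-created NS data untouched.
def delete_pool_ns_record(central, context, zone, ns_record):
    ns_recordset = central.find_recordset(
        context, criterion={'zone_id': zone.id, 'type': 'NS', 'name': zone.name})
    ns_recordset.records = [
        r for r in ns_recordset.records
        if not (r.managed and r.data == ns_record)
    ]
    return central.update_recordset(context, ns_recordset)
```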
roukoswarf | do you know, by reading the code, why they did it with _add_ns and _delete_ns in 2 steps, and then not notifying? | 20:23 |
roukoswarf | i combined the work into a single pass, but... it must have been that way for a reason | 20:24 |
* mugsie goes to git blame to check it wasn't him | 20:24 | |
roukoswarf | doing it in 2 steps means you would spam the queue if you did notify, which is probably why this whole notify fiasco is happening to me | 20:25 |
roukoswarf | alright... got all my zones into ACTIVE/NONE, let's try with update_recordset again from a clean slate. | 20:26 |
mugsie | no, notify was for a much bigger issue | 20:26 |
roukoswarf | well, why call add and delete separately with delay_notify on both? | 20:27 |
mugsie | in super active installs (HP Cloud / Rackspace etc), the number of changes was causing issues, and it was easier to batch up a zone's changes into a single notify | 20:27 |
roukoswarf | just gets stuck 100% of the time | 20:27 |
roukoswarf | yeah, which is what i did by making each zone a single write to the db. | 20:28 |
roukoswarf | if you do a pool update, do you get out of pending state? | 20:28 |
mugsie | I honestly havent tried that in a long time | 20:29 |
mugsie | changing the NS records | 20:29 |
roukoswarf | i have 4 clusters in multiple versions, and every cluster gets all zones stuck on ns change. | 20:29 |
mugsie | yeah, I am looking at commits from 5 years ago that could have caused this | 20:30 |
mugsie | most peoples NS records are fairly stable I guess | 20:30 |
roukoswarf | updated_pool = self.storage.update_pool(context, pool) should this line be after changing the ns? would that magically fix the notify? | 20:30 |
roukoswarf | currently its before the ns changes | 20:30 |
mugsie | no - that is the DB call | 20:31 |
roukoswarf | well, there are no calls other than DB calls in that code - update_recordset, update_zone, etc. are never called; only the DB calls are there. | 20:32 |
mugsie | I think this is an artifact of the old DB schema, where records were a thing on their own | 20:32 |
roukoswarf | so all these _in_storage queries should be replaced? | 20:33 |
mugsie | the set_delayed_notify=True on add_ns should trigger the delayed notify task runner | 20:33 |
mugsie | OH | 20:33 |
roukoswarf | now yer seeing it. | 20:36 |
mugsie | no - the worker may not be reading the delay notify | 20:36 |
roukoswarf | well hey, if it's an easier fix than I thought, that'd be great. I've spent the last two days reading through the code trying to piece together how designate was written; my head hurts. | 20:38 |
mugsie | yeah. it grew organically | 20:38 |
mugsie | in designate.conf - in the [service:producer] section - do you have an "enabled_tasks" item? | 20:40 |
mugsie | try adding one with "delayed_notify" | 20:41 |
mugsie | as the value | 20:41 |
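(In designate.conf terms, the suggestion looks like the snippet below; the section and option names come straight from the conversation. Whether leaving the option unset means "run all tasks" on your release is exactly the ambiguity being hit here, so listing the task explicitly is the safe option.)

```ini
[service:producer]
# Run the producer task that picks up zones flagged with delayed_notify and
# has the workers send their NOTIFYs.
enabled_tasks = delayed_notify
```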
mugsie | I have to drop off unfortunately, but leave info here, or email the mailing list, and I can look in the morning | 20:42 |
mugsie | roukoswarf: ^ | 20:42 |
roukoswarf | heh, would it be the worker? | 20:43 |
roukoswarf | oh, producer, got it | 20:43 |
mugsie | no - so what is supposed to happen is the producer is supposed to look at the db for zones with a "delayed notify" - bundle them up and tell the workers to send the notify on a regular basis | 20:44 |
roukoswarf | enabled_tasks = None, nice. this is a kolla deployment, so maybe that's something I need to raise with them instead in this case. | 20:44 |
mugsie | and shard that out across a few producers and worker processes | 20:44 |
mugsie | I just checked - the docs and sample config are wrong | 20:45 |
mugsie | not kollas fault :) | 20:45 |
roukoswarf | None is a very pythony value for a config file; I'll check if kolla has it as a var somewhere undocumented. | 20:45 |
mugsie | I suspect it came from the config file generation - https://docs.openstack.org/designate/latest/admin/samples/config.html | 20:46 |
mugsie | but yes - that should fix it | 20:46 |
mugsie | o/ | 20:46 |
roukoswarf | thanks a bunch, I'll go find some contrib guide somewhere for the bugfix, as that's still valid, but at least I know I'm not crazy on the notify issues. | 20:46 |
mugsie | https://bugs.launchpad.net/designate | 20:55 |
roukoswarf | well, i have the fixed code, figured i could just open a PR | 21:00 |
roukoswarf | or is that not a thing | 21:00 |
*** livelace has quit IRC | 21:11 | |
*** livelace has joined #openstack-dns | 21:17 | |
*** livelace has quit IRC | 21:18 | |
roukoswarf | mugsie: so yes, after enabling the task, things work perfectly as intended with the original code. thank you very much. not sure the best way to get kolla to set it, but I'll talk to them. | 21:30 |
*** awalende has quit IRC | 21:38 | |
*** awalende has joined #openstack-dns | 21:39 | |
*** awalende has quit IRC | 21:43 | |
*** roukoswarf has quit IRC | 22:50 |