*** hamalq has quit IRC | 00:13 | |
*** livelace has quit IRC | 01:18 | |
*** awalende has joined #openstack-dns | 02:07 | |
*** awalende has quit IRC | 02:12 | |
*** awalende has joined #openstack-dns | 06:00 | |
*** awalende has quit IRC | 06:04 | |
*** njohnston has quit IRC | 06:05 | |
*** bersace has joined #openstack-dns | 06:15 | |
*** kaveh has quit IRC | 07:13 | |
*** kaveh has joined #openstack-dns | 07:19 | |
*** awalende has joined #openstack-dns | 07:39 | |
*** livelace has joined #openstack-dns | 08:12 | |
*** salmankhan has joined #openstack-dns | 08:30 | |
*** livelace has quit IRC | 08:50 | |
*** salmankhan1 has joined #openstack-dns | 08:57 | |
*** livelace has joined #openstack-dns | 08:57 | |
*** salmankhan has quit IRC | 08:59 | |
*** salmankhan1 is now known as salmankhan | 08:59 | |
*** livelace has quit IRC | 09:02 | |
*** livelace has joined #openstack-dns | 09:08 | |
*** livelace has quit IRC | 09:13 | |
*** livelace has joined #openstack-dns | 09:34 | |
mindthecap | mugsie: dig works as expected and returns two NS records. designate-mdns shows that it's doing AXFR (version: 1). The serial for the zone is 88, which I can see in the master NS server logs (transfer of xxxx: AXFR started (serial 88... ended). | 10:51 |
mindthecap | After the zone is created in Designate, one of the designate containers (I have three controllers, so three designate containers) creates the zone in the designate backend. The backend then tries to transfer the zone from the designate container but gets the error "zone xxx has no NS records" | 10:55 |
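(A quick sanity check for the problem mindthecap describes, hedged: AXFR the zone straight from designate-mdns and confirm the NS records actually appear in the transfer. The address, zone name, and port 5354 - the usual designate-mdns default - are placeholders for your deployment.)

```sh
# Pull the zone directly from designate-mdns and list only NS answers; if none
# show up here, the backend's "zone has no NS records" error on the inbound
# transfer is exactly what you would expect.
dig @192.0.2.10 -p 5354 example.com. AXFR +noall +answer | grep -w NS
```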
*** njohnston has joined #openstack-dns | 11:01 | |
*** salmankhan1 has joined #openstack-dns | 11:28 | |
*** salmankhan has quit IRC | 11:28 | |
*** salmankhan1 is now known as salmankhan | 11:28 | |
*** livelace has quit IRC | 12:23 | |
*** livelace has joined #openstack-dns | 12:29 | |
*** ianychoi has quit IRC | 12:55 | |
openstackgerrit | Merged openstack/designate master: Fix slow zone imports. https://review.opendev.org/721793 | 13:48 |
*** livelace has quit IRC | 14:45 | |
*** hamalq has joined #openstack-dns | 16:51 | |
hamalq | can i get +1 on https://review.opendev.org/#/c/726214/ ( all code review changes done ) | 16:58 |
*** salmankhan has quit IRC | 17:22 | |
*** hamalq has quit IRC | 18:33 | |
*** hamalq has joined #openstack-dns | 18:34 | |
*** livelace has joined #openstack-dns | 19:09 | |
*** roukoswarf has joined #openstack-dns | 19:15 | |
roukoswarf | anyone know how the notify works in designate? I'm trying to fix a bug where updating the pool NS records gets everything stuck in a pending update | 19:16 |
openstackgerrit | Andreas Jaeger proposed openstack/designate-tempest-plugin master: Update hacking for Python3 https://review.opendev.org/715689 | 19:18 |
johnsom | Is the notification handler "nova_fixed" deprecated? Was that just for nova networking or does it have some purpose in the neutron networking world? | 19:52 |
mugsie | johnsom: no, it is still supported, kind of | 19:53 |
mugsie | it only uses unversioned notifications though | 19:53 |
johnsom | Is it just for nova networking deployments? | 19:53 |
mugsie | no, it just reacts when the VM is created | 19:53 |
mugsie | so it could have the VM name | 19:54 |
mugsie | where neutron may not | 19:54 |
johnsom | Ok, thanks. Yeah, I can see those timing issues. | 19:54 |
mugsie | roukoswarf: https://opendev.org/openstack/designate/src/branch/master/designate/mdns/notify.py | 19:54 |
roukoswarf | so, im having issues with update_pool | 19:55 |
roukoswarf | I fixed it breaking on tenants having their own NS records, that was easy, but it still sticks things in UPDATE/PENDING status | 19:56 |
roukoswarf | I created a _change_ns which swaps the NS records and then updates in one step without set_delayed_notify, but it still gets stuck in PENDING | 19:57 |
roukoswarf | if I do a set on the zones, they get out of PENDING immediately. | 19:57 |
mugsie | ah, that code is weird | 19:58 |
mugsie | one sec | 19:58 |
roukoswarf | s/weird/broken, trying to make it work | 19:58 |
mugsie | if you set zone.delayed_notify = False then do central.update_zone(context, zone) it should work | 19:59 |
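(Roughly what that suggestion looks like as code, hedged: zone.delayed_notify and central.update_zone come from the line above, everything else - the helper name, the idea of wrapping it in a function - is illustrative.)

```python
# Hypothetical helper: 'central' is a central service/API handle, 'context' an
# admin context, 'zone' the zone object stuck behind delayed-notify batching.
def force_zone_notify(central, context, zone):
    zone.delayed_notify = False  # stop the producer from batching this zone
    # update_zone drives the normal update path, which ends in a DNS NOTIFY
    return central.update_zone(context, zone)
```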
roukoswarf | updating a recordset with delayed_notify=False doesn't trigger that? | 20:00 |
mugsie | eh, I can't remember the flow off the top of my head - but I *think* that blocks it being sent | 20:01 |
roukoswarf | im setting the NS recordsets, then the parent function calls @notification('dns.pool.update') which doesnt trigger any sync | 20:02 |
mugsie | ah - that is for OpenStack billing notifications | 20:02 |
mugsie | not DNS notifications | 20:02 |
mugsie | (naming sucks, sorry) | 20:02 |
roukoswarf | alright, well at least now i know that was meaningless for my uses, so... i update the NS, then call update_zone to get it to move? or is there a way updating a recordset should be able to notify? | 20:03 |
mugsie | updating a recordset should trigger a notify | 20:04 |
roukoswarf | it does not. | 20:04 |
roukoswarf | at least, not in this context | 20:04 |
mugsie | what function are you calling to update it? | 20:04 |
mugsie | central.update_recordset() ? | 20:04 |
mugsie | https://opendev.org/openstack/designate/src/branch/master/designate/central/service.py#L1411 - that will trigger a DNS Notify | 20:05 |
roukoswarf | i was just following the example of _add_ns, which called self._update_recordset_in_storage(context, zone, ns_recordset) with set_delayed_notify=True | 20:05 |
roukoswarf | i removed the set_delayed_notify, and expected it to work | 20:05 |
mugsie | ah, OK. | 20:06 |
roukoswarf | this code never seems to call update_recordset directly, should i be? | 20:06 |
roukoswarf | im working under update_pool | 20:06 |
mugsie | Oh! - so you are implementing updating the NS records for all zones in a pool after the zones are created? | 20:07 |
mugsie | OK. | 20:07 |
roukoswarf | after a pools.yaml update | 20:07 |
mugsie | yeah | 20:07 |
roukoswarf | which, is currently busted | 20:07 |
roukoswarf | at least in stein, not sure if the code changed in master, doesnt look like it. | 20:08 |
mugsie | Ok. in that case call the update_recordset() function | 20:08 |
roukoswarf | instead or in addition to the update_recordset_in_storage? | 20:08 |
mugsie | the _add_ns() is part of a bigger set of code that will cause a NOTIFY elsewhere | 20:08 |
mugsie | instead of | 20:08 |
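(A hedged sketch of that advice: when update_pool rewrites a zone's NS recordset, going through update_recordset() also queues the DNS NOTIFY, while _update_recordset_in_storage() only writes the DB rows. Only update_recordset itself comes from the discussion; the helper name, arguments, and the managed-record filtering below are illustrative, not Designate's actual pool-update code.)

```python
from designate import objects

# Hypothetical replacement for the *_in_storage call in a pool NS update:
# 'central' is the central service, 'ns_recordset' the zone's managed NS
# recordset, 'new_ns_addresses' the ns_records from the updated pools.yaml.
def replace_pool_ns(central, context, ns_recordset, new_ns_addresses):
    # keep tenant-created records, drop only the pool-managed ones
    ns_recordset.records = [r for r in ns_recordset.records if not r.managed]
    for address in new_ns_addresses:
        ns_recordset.records.append(objects.Record(data=address, managed=True))
    # update_recordset (unlike _update_recordset_in_storage) triggers a NOTIFY
    return central.update_recordset(context, ns_recordset)
```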
roukoswarf | well, 80% of my test zones are now out of pending | 20:11 |
mugsie | the delayed notify chunks up the work | 20:11 |
mugsie | so it may take a little bit | 20:12 |
roukoswarf | but im not delaying notify, i removed that from my new _change_ns function which bulk updates all the pool ns changes. | 20:12 |
roukoswarf | i do all the recordset changes from the pool change in one pass, then update_recordset it. | 20:12 |
mugsie | then it could just be RMQ backlog, waiting for the workers to chew through the notify tasks | 20:14 |
roukoswarf | http://paste.openstack.org/show/793538/ | 20:15 |
roukoswarf | my code, im calling instead of _add_ns and _delete_ns in update_pool | 20:15 |
roukoswarf | my rmq is empty | 20:16 |
roukoswarf | calling openstack zone set <id> instantly unsticks the zone | 20:17 |
mugsie | yeah, that can happen if there is a long term issue. | 20:18 |
roukoswarf | guess ill run a set on every stuck zone and start from a clean active state | 20:18 |
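(One hedged way to script that with the OpenStack CLI - the column parsing and the choice of re-setting the TTL are just one approach; roukoswarf reports a plain `openstack zone set <id>` is enough on his deployment.)

```sh
# For every zone still in PENDING, push a harmless update so central re-sends
# it to the worker and the zone can go back to ACTIVE.
openstack zone list -f value -c id -c status \
  | awk '$2 == "PENDING" {print $1}' \
  | xargs -r -n1 -I{} openstack zone set {} --ttl 3600
```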
mugsie | there is a periodic task that can be run on a 24-hour basis to basically send a notify for all zones | 20:18 |
mugsie | but I need to dig that out | 20:19 |
roukoswarf | everything else has worked fine, we just hit a crash in designate on doing a pool update, so im trying to make the process smooth | 20:19 |
roukoswarf | ill have to PR the bugfix, but this sync problem seems to be real too | 20:19 |
roukoswarf | the crash was an easy fix, looked like an oversight. | 20:20 |
mugsie | thats great - more than happy to review + help merge | 20:20 |
roukoswarf | if you make a zone, then a tenant adds NS records to it, then you do a pool update that changes the NS records, it will crash. | 20:20 |
mugsie | crash? yeah - that is an oversight | 20:20 |
roukoswarf | _delete_ns doesn't filter for managed records, and does a get_recordset, singular, when in this case there will be multiple | 20:21 |
roukoswarf | add selects it right, so, must have just been missed | 20:21 |
mugsie | yeap. | 20:21 |
mugsie | we don't do functional testing of the pool modifications, so that might be a good addition to make sure this stays fixed :) | 20:22 |
roukoswarf | I thought I saw tests, but they don't test this case; it works perfectly if users never add NS recordsets of their own | 20:22 |
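(A heavily hedged sketch of the kind of fix being described - make the delete path look up the zone apex NS recordset explicitly and only strip pool-managed records, so tenant-created NS recordsets can't break it. find_recordset, update_recordset, and the managed flag on records exist in Designate, but the criterion, helper name, and surrounding logic here are assumptions, not the actual patch.)

```python
# Hypothetical shape of the fix: target the apex NS recordset by name and type,
# then remove only the managed record matching the NS server being dropped,
# leaving tenant-created NS data untouched.
def delete_pool_ns_record(central, context, zone, ns_record):
    ns_recordset = central.find_recordset(
        context, criterion={'zone_id': zone.id, 'type': 'NS', 'name': zone.name})
    ns_recordset.records = [
        r for r in ns_recordset.records
        if not (r.managed and r.data == ns_record)
    ]
    return central.update_recordset(context, ns_recordset)
```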
roukoswarf | do you know, by reading the code, why they did it with _add_ns and _delete_ns in 2 steps, and then not notifying? | 20:23 |
roukoswarf | i combined the work into a single pass, but... it must have been that way for a reason | 20:24 |
* mugsie goes to git blame to check it wasn't him | 20:24 | |
roukoswarf | doing it in 2 steps means you would spam the queue if you did notify, which is probably why this whole notify fiasco is happening to me | 20:25 |
roukoswarf | alright... got all my zones into ACTIVE/NONE, let's try with update_recordset again from a clean slate. | 20:26 |
mugsie | no, notify was for a much bigger issue | 20:26 |
roukoswarf | well, why call add and delete separately with delay_notify on both? | 20:27 |
mugsie | in super active installs (HP Cloud / Rackspace etc), the number of changes was causing issues, and it was easier to batch up a zone's changes into a single notify | 20:27 |
roukoswarf | just gets stuck 100% of the time | 20:27 |
roukoswarf | yeah, which is what i did by making each zone a single write to the db. | 20:28 |
roukoswarf | if you do a pool update, do you get out of pending state? | 20:28 |
mugsie | I honestly havent tried that in a long time | 20:29 |
mugsie | changing the NS records | 20:29 |
roukoswarf | i have 4 clusters in multiple versions, and every cluster gets all zones stuck on ns change. | 20:29 |
mugsie | yeah, I am looking at commits from 5 years ago that could have caused this | 20:30 |
mugsie | most peoples NS records are fairly stable I guess | 20:30 |
roukoswarf | updated_pool = self.storage.update_pool(context, pool) should this line be after changing the ns? would that magically fix the notify? | 20:30 |
roukoswarf | currently its before the ns changes | 20:30 |
mugsie | no - that is the DB call | 20:31 |
roukoswarf | well, there are no calls other than DB calls in that code - update_recordset, update_zone, etc. are never called; only the DB calls are there. | 20:32 |
mugsie | I think this is an artifact of the old DB schema, where records were a thing on their own | 20:32 |
roukoswarf | so all these _in_storage queries should be replaced? | 20:33 |
mugsie | the set_delayed_notify=True on add_ns should trigger the delayed notify task runner | 20:33 |
mugsie | OH | 20:33 |
roukoswarf | now yer seeing it. | 20:36 |
mugsie | no - the worker may not be reading the delay notify | 20:36 |
roukoswarf | well hey, if it's an easier fix than I thought, that'd be great. I've spent the last two days reading through the code trying to piece together how designate was written; my head hurts. | 20:38 |
mugsie | yeah. it grew organically | 20:38 |
mugsie | in designate.conf - in the [service:producer] section - do you have an "enabled_tasks" item? | 20:40 |
mugsie | try adding one with "delayed_notify" | 20:41 |
mugsie | as the value | 20:41 |
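(In designate.conf terms, the suggestion looks like the snippet below; the section and option names come straight from the conversation. Whether leaving the option unset means "run all tasks" on your release is exactly the ambiguity being hit here, so listing the task explicitly is the safe option.)

```ini
[service:producer]
# Run the producer task that picks up zones flagged with delayed_notify and
# has the workers send their NOTIFYs.
enabled_tasks = delayed_notify
```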
mugsie | I have to drop off unfortunately, but leave info here, or email the mailing list, and I can look in the morning | 20:42 |
mugsie | roukoswarf: ^ | 20:42 |
roukoswarf | heh, would it be the worker? | 20:43 |
roukoswarf | oh, producer, got it | 20:43 |
mugsie | no - so what is supposed to happen is the producer is supposed to look at the db for zones with a "delayed notify" - bundle them up and tell the workers to send the notify on a regular basis | 20:44 |
roukoswarf | enabled_tasks = None, nice. this is a kolla deployment, so maybe that's something I need to raise with them instead in this case. | 20:44 |
mugsie | and shard that out across a few producers and worker processes | 20:44 |
mugsie | I just checked - the docs and sample config are wrong | 20:45 |
mugsie | not kollas fault :) | 20:45 |
roukoswarf | None is a very pythony value for a config file; I'll check if kolla has it as a var somewhere undocumented. | 20:45 |
mugsie | I suspect it came from the config file generation - https://docs.openstack.org/designate/latest/admin/samples/config.html | 20:46 |
mugsie | but yes - that should fix it | 20:46 |
mugsie | o/ | 20:46 |
roukoswarf | thanks a bunch, I'll go find some contrib guide somewhere for the bugfix, as that's still valid, but at least I know I'm not crazy on the notify issues. | 20:46 |
mugsie | https://bugs.launchpad.net/designate | 20:55 |
roukoswarf | well, i have the fixed code, figured i could just open a PR | 21:00 |
roukoswarf | or is that not a thing | 21:00 |
*** livelace has quit IRC | 21:11 | |
*** livelace has joined #openstack-dns | 21:17 | |
*** livelace has quit IRC | 21:18 | |
roukoswarf | mugsie: so yes, after enabling the task, things work perfectly as intended with the original code. thank you very much. not sure the best way to get kolla to set it, but I'll talk to them. | 21:30 |
*** awalende has quit IRC | 21:38 | |
*** awalende has joined #openstack-dns | 21:39 | |
*** awalende has quit IRC | 21:43 | |
*** roukoswarf has quit IRC | 22:50 |