*** masber has joined #openstack-dns | 00:27 | |
trungnv | abaindur, I am not sure exactly what the issue is with the 2nd or 3rd record. Feel free to share some logs of where these issues occur, either here or at https://bugs.launchpad.net/designate. | 01:18 |
abaindur | Unfortunately there aren't any logs... | 01:49 |
abaindur | I'm trying to add my own and repro | 01:49 |
abaindur | but hitting some issues printing the DB records | 01:49 |
abaindur | trungnv: https://gist.github.com/xagent003/ca15d59e61875babddb1c733c2e3dfe5 | 01:52 |
abaindur | https://gist.githubusercontent.com/xagent003/ca15d59e61875babddb1c733c2e3dfe5/raw/55105558ebadb93c7841591031b9a1d14ab5cb58/zone%2520action | 01:52 |
abaindur | one thing I do notice is, right before that log at the end of _update_zone_in_storage (which we added), where the zone is returned as NONE/ACTIVE | 01:53 |
abaindur | there are a bunch of notification logs for "dns.domain.update" and "dns.zone.update" - the queueing notification and emitting notification ones | 01:54 |
abaindur | when the records are created without issue, we don't see those | 01:54 |
abaindur | it is just the logs for _create_recordset_in_storage, Checking if ... belongs in any subzones..., Updating serial for zone..., and Updating zone | 01:55 |
abaindur | followed by that log at end of _update_zone_in_storage | 01:55 |
abaindur | But in the case where _update_zone_in_storage returns the wrong action for the zone, we see logs like that ^^ | 01:55 |
abaindur | trungnv: what causes those Queuing notification and emitting notifications? Suspect that has something to do with _update_zone_in_storage not properly returning zone from storage as UPDATE/PENDING | 01:57 |
trungnv | abaindur, when you update a zone you get "action:'UPDATE' status:'PENDING'", then the worker will push the update to the backend. If the update succeeds it returns "action:'NONE' status:'ACTIVE'", and if it fails it returns status 'ERROR' | 02:01 |
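For reference, the action/status lifecycle trungnv describes can be sketched like this (a minimal illustration using plain dicts and assumed helper names, not Designate's actual code): central marks the zone UPDATE/PENDING, the worker applies the change to the backend, and the outcome is reported back as NONE/ACTIVE on success or status ERROR on failure.

    # Simplified sketch of the lifecycle described above (assumed names).
    def mark_zone_pending(zone):
        zone['action'], zone['status'] = 'UPDATE', 'PENDING'
        return zone                      # this is what the worker should receive

    def record_worker_result(zone, backend_ok):
        if backend_ok:
            zone['action'], zone['status'] = 'NONE', 'ACTIVE'
        else:
            zone['status'] = 'ERROR'
        return zone

    zone = {'action': 'NONE', 'status': 'ACTIVE'}
    zone = mark_zone_pending(zone)           # central, before calling the worker
    zone = record_worker_result(zone, True)  # worker reports back on completion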
trungnv | I looked at your log and I don't see any issues in there. | 02:02 |
abaindur | yes, but the problem here is with "then the worker will push the update to the backend" - central itself sends the zone action as NONE to the worker, so the worker can't update | 02:02 |
abaindur | trungnv: when central returns from updating the zone in storage with UPDATE/PENDING, the zone is returned as NONE/ACTIVE | 02:02 |
abaindur | https://github.com/openstack/designate/blob/stable/newton/designate/central/service.py#L1065 | 02:02 |
abaindur | We even see here, https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309 | 02:03 |
abaindur | it has the correct UPDATE/PENDING. But when that function in storage returns the zone, it has NONE/ACTIVE | 02:04 |
abaindur | We see those Queuing/Emitting notification logs around that time period when it fails - when central correctly gets UPDATE/PENDING for the zone to send to the worker, it goes fine | 02:05 |
trungnv | in your log which you sent me then I also saw log at https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309 which is 2017-08-23 00:48:17.208 24214 DEBUG designate.storage.impl_sqlalchemy [req-f87ca390-b775-4ceb-83bb-8b3559fbc42b 332fa775cff44c0cbf4282cc4537b339 9b4e484efcb64b718944efaf05042df6 - - -] Updating zone <Zone id:'a0058952-4d23-4bf7-a87d-7184d0dbc048' | 02:06 |
trungnv | type:'PRIMARY' name:'net1.test-du-neutron-t-mx-c7-41216.platform9.horse.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1503449297' action:'UPDATE' status:'PENDING'> update_zone /opt/pf9/designate/lib/python2.7/site-packages/designate/storage/impl_sqlalchemy/__init__.py:309 | 02:06 |
trungnv | you can see that action:'UPDATE' status:'PENDING' | 02:06 |
abaindur | correct, that's still in central. | 02:06 |
abaindur | that's before the updated_zone has been returned via storage.update_zone | 02:06 |
abaindur | from designate/storage/impl_sqlalchemy/__init__.py: the update_zone() function | 02:06 |
abaindur | when https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L432 | 02:07 |
abaindur | returns, it returns updated_zone with action=NONE, status=ACTIVE | 02:07 |
abaindur | But at L #302, we see that update function was called with action=UPDATE, status=PENDING | 02:08 |
abaindur | (thats also the log you just pasted, line #'s in reference code are a little off due to extra debugs I added) | 02:09 |
abaindur | that function is not properly updating and returning the zone in DB. Or there is some race condition that changes it to ACTIVE/NONE | 02:10 |
trungnv | abaindur, Yep. Perhaps I need to investigate the update_zone method more. | 02:11 |
abaindur | as mentioned - we only see this so far when we add multiple records to a zone. It's usually the 2nd or 3rd record that fails | 02:12 |
abaindur | 1st record always goes thru fine | 02:12 |
trungnv | yep. | 02:12 |
abaindur | and we also see those Queuing and Emitting notifications in logs around that time, when failure occurs | 02:12 |
abaindur | during record creation for passing cases, they are not there (or they are present at some time before the next record has been created) | 02:13 |
trungnv | before the zone object is pushed to the worker, the action is NONE and status is ACTIVE, so the worker doesn't understand the next action for this session. Is that right? | 02:14 |
abaindur | What triggers those? Are they somehow updating the DB? We don't see any extra sqlalchemy logs though with wrong action/status | 02:14 |
abaindur | yea. so that zone is returned from storage with NONE/ACTIVE | 02:14 |
abaindur | central sends that updated zone to worker, which errors out | 02:15 |
abaindur | I am wondering if for a workaround/hack fix, we could just force set zone.action=UPDATE, status=PENDING | 02:15 |
trungnv | Did you check the new zone which was updated? | 02:15 |
abaindur | here: https://github.com/openstack/designate/blob/stable/newton/designate/central/service.py#L1065 | 02:15 |
abaindur | after the zone is returned from storage | 02:15 |
abaindur | in that case, if the DB is messed up, we at least send the correct action and status to the worker | 02:16 |
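A rough sketch of that workaround, as a hypothetical tweak inside central's _update_zone_in_storage right after the storage call abaindur links above (the surrounding code is paraphrased, and the guard itself is purely an assumption rather than a proper fix for the storage problem):

    # Hypothetical hack in designate/central/service.py, _update_zone_in_storage():
    # re-assert the values the worker expects, in case the row read back from
    # storage already says NONE/ACTIVE.
    zone = self.storage.update_zone(context, zone)
    if zone.action == 'NONE' and zone.status == 'ACTIVE':
        zone.action = 'UPDATE'
        zone.status = 'PENDING'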
abaindur | the zone remains in NONE/ACTIVE | 02:16 |
abaindur | the recordset remains forever in UPDATE/PENDING - worker never processes it due to action=NONE | 02:17 |
trungnv | No. Could you check values in new zone? | 02:17 |
abaindur | new zone? | 02:17 |
trungnv | yeah. | 02:17 |
abaindur | what do you mean? | 02:18 |
abaindur | in our tests, we create a couple of zones - per tenant network | 02:18 |
abaindur | then we create around 3 A records per zone | 02:18 |
abaindur | half the time everything goes fine | 02:18 |
abaindur | half the time, one record (the 2nd or 3rd) in one of the zones hits this issue | 02:18 |
abaindur | the zone stays as NONE and ACTIVE | 02:19 |
trungnv | abaindur, Any logs at the worker when you create the 2nd record? | 02:21 |
abaindur | as i said, worker has stack trace because central sent zone.action=NONE | 02:21 |
abaindur | so worker does nothing for that record | 02:21 |
trungnv | no logs in the worker, right? | 02:22 |
abaindur | i mean there are logs... it just gets the update_zone RPC from central with NONE action | 02:22 |
abaindur | so worker errors out | 02:22 |
abaindur | Let me report back in about 30-45 minutes... we have added more logs to https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309 | 02:25 |
abaindur | and to https://github.com/openstack/designate/blob/stable/newton/designate/sqlalchemy/base.py#L524 | 02:25 |
abaindur | The problem must lie somewhere in there, right? zone comes in with UPDATE/PENDING, but updated_zone that is returned has NONE/ACTIVE | 02:26 |
trungnv | Yep. Try printing some logs there. | 02:27 |
abaindur | I see it does a table.update, then a select to "# Refetch the row, for generated columns etc", then returns _set_object_from_model(obj, resultproxy.fetchone()) | 02:27 |
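That update-then-refetch pattern boils down to something like the following (a stripped-down SQLAlchemy sketch with a placeholder schema, not the actual base.py code). The point is that whatever the refetch SELECT sees is what gets mapped back onto the zone object and returned, so a concurrent write landing between the UPDATE and the SELECT would come back instead of the values just written.

    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine)

    # Placeholder schema with just the columns relevant to the discussion.
    metadata = MetaData()
    zones = Table('zones', metadata,
                  Column('id', String(36), primary_key=True),
                  Column('action', String(10)),
                  Column('status', String(10)),
                  Column('serial', Integer))

    engine = create_engine('sqlite://')   # in-memory stand-in for the real DB
    metadata.create_all(engine)

    def update_zone_row(conn, zone_id, values):
        # UPDATE the row with the new action/status/serial ...
        conn.execute(zones.update().where(zones.c.id == zone_id).values(**values))
        # ... then "Refetch the row, for generated columns etc" - this refetched
        # row is what gets handed back to central and then to the worker.
        return conn.execute(
            zones.select().where(zones.c.id == zone_id)).fetchone()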
trungnv | abaindur, I care more about the 2nd record in the backend than the 2nd record in the DB. | 02:30 |
trungnv | what backend are you using? | 02:30 |
abaindur | the 2nd record gets nowhere in the backend or worker - the worker throws an exception immediately because central has sent the zone with a NONE action | 02:30 |
trungnv | yep. it is an issue if the 2nd record doesn't get created in the backend. | 02:32 |
trungnv | abaindur, Cool. I will also look at this issue soon. | 02:32 |
abaindur | How can the 2nd record get created in the backend in the first place though, if central sends the wrong update_zone to the worker? | 02:33 |
*** rektide has joined #openstack-dns | 02:33 | |
abaindur | the update_zone from central -> worker is sent after central updates the zone in storage to UPDATE/PENDING, right? | 02:33 |
trungnv | Yeah. exactly. | 02:34 |
abaindur | 1. central's create_recordset calls 2. increment_zone_serial, which calls 3. _update_zone_in_storage, which calls the 4. DB storage driver's update_zone | 02:34 |
abaindur | #4 is where we are seeing issue | 02:34 |
trungnv | we get the zone object from the DB first, then central calls the worker to create it in the backend, then the worker calls back to central to update the zone. | 02:35 |
abaindur | Finally 5. worker gets the RPC req. from central. it fails because zone has wrong values due to #4 | 02:35 |
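Putting those five steps together, the failing path amounts to something like this (a toy, runnable restatement of the flow with made-up stand-in functions, only to make the ordering and the failure point explicit; none of these are Designate's real signatures):

    def update_zone_in_storage(zone):
        # Steps 2-4: central bumps the serial and asks the sqlalchemy driver to
        # persist the zone as UPDATE/PENDING. In the failing case, the object
        # that comes back here is NONE/ACTIVE instead.
        zone['serial'] += 1
        zone['action'], zone['status'] = 'UPDATE', 'PENDING'
        return zone

    def worker_update_zone(zone):
        # Step 5: the worker refuses to act on a zone with no pending action,
        # so the recordset is left UPDATE/PENDING forever.
        if zone['action'] == 'NONE':
            raise RuntimeError('worker got update_zone with action=NONE')

    zone = {'action': 'NONE', 'status': 'ACTIVE', 'serial': 1503449296}
    zone = update_zone_in_storage(zone)   # steps 1-4 (create_recordset path)
    worker_update_zone(zone)              # step 5: RPC to the worker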
abaindur | so central first has worker create the record in backend, *BEFORE* it increments the zone serial in central/DB? Then it sends another update_zone RPC to worker? | 02:36 |
trungnv | Yep. | 02:37 |
trungnv | Did you see the 2nd record in the backend (BIND9 or pdns4)? | 02:38 |
abaindur | so is there any possibility of race condition then? worker tells central to update zone to NONE/ACTIVE. Meanwhile central increments zone serial and tries to update zone in DB to UPDATE/PENDING | 02:38 |
abaindur | but that fails, so worker rejects the update_zone | 02:39 |
abaindur | No, the record is unpingable - fails to resolve | 02:39 |
trungnv | Yep. this is an issue. | 02:39 |
abaindur | and DB shows recordset itself forever remains in UPDATE and PENDING state, while zone remains NONE/ACTIVE | 02:39 |
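The suspected race can be pictured with a toy interleaving like the one below (plain Python, with a dict standing in for the zone row and sleeps forcing the bad ordering). It only shows why an overlapping last-write-wins update would produce exactly these symptoms, not that Designate actually behaves this way.

    import threading
    import time

    zone_row = {'action': 'UPDATE', 'status': 'PENDING'}  # previous record in flight

    def worker_finishes_previous_record():
        # Worker reports success for the earlier record: zone flips to NONE/ACTIVE.
        time.sleep(0.01)
        zone_row.update(action='NONE', status='ACTIVE')

    def central_starts_next_record():
        # Central marks UPDATE/PENDING for the next record ...
        zone_row.update(action='UPDATE', status='PENDING')
        time.sleep(0.02)
        # ... but by the time it reads the row back to hand to the worker,
        # the other write has landed, so it ships action=NONE/status=ACTIVE.
        print('central sends zone with', zone_row)

    t1 = threading.Thread(target=central_starts_next_record)
    t2 = threading.Thread(target=worker_finishes_previous_record)
    t1.start(); t2.start(); t1.join(); t2.join()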
trungnv | abaindur, I understand what you said. | 02:40 |
abaindur | looks like tests this time are passing... will have to re-run | 02:40 |
trungnv | I will look at this issue soon. Could you file a bug in Launchpad along with the logs? | 02:40 |
abaindur | it'll take about another 1-1.5 hrs to re-run the testbeds | 02:40 |
trungnv | abaindur, http://paste.ubuntu.com/25387060/ no action and no status on the 2nd and 3rd record. Is this the issue in your case? | 03:49 |
abaindur | No | 03:50 |
abaindur | recordset shows as PENDING | 03:50 |
abaindur | the zone as NONE and ACTIVE | 03:50 |
abaindur | we don't see it missing altogether | 03:51 |
trungnv | status is PENDING on 2nd record? | 03:52 |
trungnv | abaindur, http://paste.ubuntu.com/25387095/ - this is the worker log when I create the 4th record. | 04:00 |
abaindur | Are you creating these records quickly? | 04:08 |
abaindur | or waiting to create each one one by one | 04:08 |
abaindur | We are creating it in a loop via python API - We attach a floating IP, wait for floating IP status to become ACTIVE, then create recordset | 04:08 |
abaindur | So it happens a lot quicker than, say, manually using the CLI | 04:08 |
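The repro loop abaindur describes would look roughly like the sketch below. The client calls are the standard python-designateclient / python-neutronclient v2 APIs, but everything else (endpoint, credentials, network and port IDs, record names) is a placeholder, not the actual test script.

    import time

    from designateclient.v2.client import Client as DesignateClient
    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from neutronclient.v2_0 import client as neutron_client

    auth = v3.Password(auth_url='http://controller:5000/v3',     # placeholders
                       username='demo', password='secret', project_name='demo',
                       user_domain_id='default', project_domain_id='default')
    sess = session.Session(auth=auth)
    designate = DesignateClient(session=sess)
    neutron = neutron_client.Client(session=sess)

    ext_net_id = 'EXTERNAL_NET_UUID'                              # placeholder
    port_ids = ['PORT_UUID_1', 'PORT_UUID_2', 'PORT_UUID_3']      # placeholders

    zone = designate.zones.create('net1.example.org.', email='admin@example.org')

    for i, port_id in enumerate(port_ids):
        fip = neutron.create_floatingip(
            {'floatingip': {'floating_network_id': ext_net_id,
                            'port_id': port_id}})['floatingip']
        # Wait for the floating IP to become ACTIVE before creating the record.
        while neutron.show_floatingip(fip['id'])['floatingip']['status'] != 'ACTIVE':
            time.sleep(1)
        # Back-to-back A records against the same zone; the 2nd/3rd iteration is
        # where the zone sometimes comes back NONE/ACTIVE and the worker bails.
        designate.recordsets.create(zone['id'],
                                    'vm%d.net1.example.org.' % i, 'A',
                                    [fip['floating_ip_address']])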
trungnv | abaindur, just via devstack on a single node, without floating IPs. | 04:28 |
frickler | is there a sample devstack config somewhere that sets up neutron with designate integration? | 05:49 |
frickler | I haven't been able to find any test for that scenario in jenkins, is that correct or did I just miss it? | 05:50 |
trungnv | frickler, https://docs.openstack.org/designate/latest/contributor/integrations.html - this is the link you need about integration. | 06:09 |
trungnv | frickler, We have some test cases for notification handling: https://github.com/openstack/designate/blob/master/designate/tests/test_notification_handler/ | 06:11 |
*** abalutoiu_ has joined #openstack-dns | 06:16 | |
*** abalutoiu_ has quit IRC | 06:20 | |
frickler | trungnv: thx, but those tests seem to work with fixtures for notification only, no real interaction with neutron, right? | 06:29 |
frickler | maybe the neutron folks have a bit more, I'll ask over there | 06:32 |
*** pcaruana has joined #openstack-dns | 06:40 | |
*** trungnv has quit IRC | 07:18 | |
*** trungnv has joined #openstack-dns | 07:19 | |
*** abaindur has quit IRC | 07:30 | |
*** abalutoiu_ has joined #openstack-dns | 07:31 | |
*** egonzalez has joined #openstack-dns | 07:35 | |
*** egonzalez has quit IRC | 07:50 | |
*** abalutoiu_ has quit IRC | 07:51 | |
*** egonzalez has joined #openstack-dns | 07:51 | |
*** vipul has quit IRC | 07:54 | |
*** vipul has joined #openstack-dns | 07:57 | |
*** vipul has quit IRC | 08:23 | |
*** vipul has joined #openstack-dns | 08:25 | |
*** abalutoiu_ has joined #openstack-dns | 08:30 | |
*** carthaca_ has quit IRC | 08:37 | |
*** carthaca_ has joined #openstack-dns | 08:37 | |
*** MarkBaker has joined #openstack-dns | 08:44 | |
*** abalutoiu_ has quit IRC | 09:13 | |
frickler | kiall: https://review.openstack.org/352843 is still W-1ed by you, could you take a look please? | 09:14 |
*** abalutoiu_ has joined #openstack-dns | 09:18 | |
*** jmccarthy has joined #openstack-dns | 09:47 | |
*** abalutoiu_ has quit IRC | 09:56 | |
*** trungnv has quit IRC | 10:02 | |
*** abalutoiu_ has joined #openstack-dns | 10:34 | |
*** daidv has quit IRC | 11:00 | |
*** MarkBaker has quit IRC | 11:04 | |
*** frickler has quit IRC | 11:26 | |
*** andreaf has quit IRC | 11:41 | |
*** vcn[m] has quit IRC | 11:41 | |
*** Guest45420 has quit IRC | 11:41 | |
*** fouxm has quit IRC | 11:41 | |
*** odyssey4me has quit IRC | 11:41 | |
*** ircuser-1 has quit IRC | 11:41 | |
*** d0ugal has quit IRC | 11:41 | |
*** simonmcc has quit IRC | 11:41 | |
*** timsim has quit IRC | 11:41 | |
*** serverascode has quit IRC | 11:41 | |
*** johnsom has quit IRC | 11:41 | |
*** fyxim has quit IRC | 11:41 | |
*** pcaruana has quit IRC | 11:41 | |
*** masber has quit IRC | 11:41 | |
*** KeithMnemonic has quit IRC | 11:41 | |
*** nyloc has quit IRC | 11:41 | |
*** mrhillsman has quit IRC | 11:41 | |
*** amitry has quit IRC | 11:41 | |
*** ChanServ has quit IRC | 11:41 | |
*** abalutoiu_ has quit IRC | 11:41 | |
*** jmccarthy has quit IRC | 11:41 | |
*** harmw has quit IRC | 11:41 | |
*** bauruine has quit IRC | 11:41 | |
*** mugsie has quit IRC | 11:41 | |
*** dhellmann has quit IRC | 11:41 | |
*** asettle has quit IRC | 11:41 | |
*** hieulq has quit IRC | 11:41 | |
*** mikal has quit IRC | 11:41 | |
*** brad[] has quit IRC | 11:41 | |
*** filler has quit IRC | 11:41 | |
*** logan- has quit IRC | 11:41 | |
*** ianychoi has quit IRC | 11:41 | |
*** ChrisDevita has quit IRC | 11:41 | |
*** raginbajin has quit IRC | 11:41 | |
*** zigo has quit IRC | 11:41 | |
*** Krenair has quit IRC | 11:41 | |
*** basilAB has quit IRC | 11:41 | |
*** vaishali has quit IRC | 11:41 | |
*** clayton has quit IRC | 11:41 | |
*** JonathanD has quit IRC | 11:41 | |
*** therve_ has quit IRC | 11:41 | |
*** egonzalez has quit IRC | 11:41 | |
*** puck has quit IRC | 11:41 | |
*** lkoranda has quit IRC | 11:41 | |
*** afranc has quit IRC | 11:41 | |
*** d34dh0r53 has quit IRC | 11:41 | |
*** mordred has quit IRC | 11:41 | |
*** rackertom has quit IRC | 11:41 | |
*** andrewbogott has quit IRC | 11:41 | |
*** ying_zuo has quit IRC | 11:41 | |
*** jmccrory has quit IRC | 11:41 | |
*** mwhahaha has quit IRC | 11:41 | |
*** kbyrne has quit IRC | 11:41 | |
*** sapcc-bot has quit IRC | 11:41 | |
*** carthaca_ has quit IRC | 11:41 | |
*** nkinder has quit IRC | 11:41 | |
*** yee379 has quit IRC | 11:41 | |
*** vipul has quit IRC | 11:41 | |
*** rektide has quit IRC | 11:41 | |
*** kiall has quit IRC | 11:41 | |
*** eandersson has quit IRC | 11:41 | |
*** cliles has quit IRC | 11:41 | |
*** openstack has joined #openstack-dns | 12:56 | |
*** ChanServ sets mode: +v openstack | 12:56 | |
*** openstackstatus has joined #openstack-dns | 12:57 | |
*** ChanServ sets mode: +v openstackstatus | 12:57 | |
*** hieulq has quit IRC | 13:29 | |
frickler | I'm seeing this message when trying a default devstack installation, is this a known issue with pdns? "gmysql Connection failed: Could not prepare statement: SELECT domain_id,name,type,modified_at,account,comment FROM comments WHERE domain_id=?: Table 'designate_pdns.comments' doesn't exist" | 13:34 |
frickler | also the bugs dashboard link in the topic looks broken to me | 13:34 |
mugsie | frickler: damn - it is down. Will fix that over the weekend | 13:36 |
mugsie | it is not a known issue | 13:37 |
mugsie | can you log it @ https://bugs.launchpad.net/designate ? | 13:38 |
frickler | mugsie: possibly pdns 4 on xenial is too new? I noticed the devstack instructions talk about trusty still, but that is kinda deprecated | 13:38 |
frickler | mugsie: will do | 13:38 |
mugsie | oh, there is a different driver for xenial | 13:41 |
mugsie | it needs to be pdns4 instead of powerdns | 13:41 |
frickler | mugsie: yeah, I just found that when grepping through the devstack plugin code. maybe worth changing the default? or at least generate an error early when trying to run the old driver on xenial? | 13:46 |
mugsie | frickler: I thought we did error on xenial -_- | 13:46 |
mugsie | but yes, it definitely is | 13:46 |
mugsie | (doing both) | 13:47 |
*** d0ugal has quit IRC | 13:49 | |
frickler | hmm, there is a check in pdns4 that fails on anything other than trusty|xenial, so making that the default will break things for everything non-Ubuntu | 13:51 |
frickler | the "real" installation guides seem to use bind9 anyway, maybe make that the default for devstack, too? +1 for consistency | 13:53 |
*** saphi has joined #openstack-dns | 14:01 | |
frickler | mugsie: FYI, dumped all that into https://bugs.launchpad.net/designate/+bug/1713069 | 14:02 |
openstack | Launchpad bug 1713069 in Designate "Devstack setup with default settings fails on Ubuntu Xenial" [Undecided,New] | 14:02 |
*** saphi has quit IRC | 14:50 | |
*** saphi has joined #openstack-dns | 14:54 | |
*** pcaruana has quit IRC | 15:06 | |
*** d0ugal has joined #openstack-dns | 15:09 | |
*** egonzalez has quit IRC | 15:12 | |
*** catintheroof has joined #openstack-dns | 15:14 | |
*** abalutoiu_ has quit IRC | 15:15 | |
*** egonzalez has joined #openstack-dns | 15:24 | |
*** trungnv has joined #openstack-dns | 15:33 | |
*** trungnv has quit IRC | 15:34 | |
*** trungnv has joined #openstack-dns | 15:34 | |
*** trungnv_ has joined #openstack-dns | 15:38 | |
*** trungnv has quit IRC | 15:38 | |
*** trungnv_ has quit IRC | 15:38 | |
*** trungnv has joined #openstack-dns | 15:39 | |
*** trungnv has quit IRC | 15:47 | |
*** trungnv has joined #openstack-dns | 15:47 | |
*** d0ugal has quit IRC | 15:52 | |
*** catintheroof has quit IRC | 15:59 | |
*** trungnv has left #openstack-dns | 16:00 | |
*** trungnv_ has joined #openstack-dns | 16:00 | |
*** trungnv has joined #openstack-dns | 16:00 | |
*** trungnv has quit IRC | 16:03 | |
*** trungnv_ has quit IRC | 16:03 | |
*** trungnv has joined #openstack-dns | 16:12 | |
*** trungnv has quit IRC | 16:12 | |
*** trungnv has joined #openstack-dns | 16:14 | |
*** trungnv has quit IRC | 16:15 | |
*** jmccarthy has left #openstack-dns | 16:31 | |
*** catintheroof has joined #openstack-dns | 16:52 | |
*** catintheroof has quit IRC | 17:11 | |
*** catintheroof has joined #openstack-dns | 17:56 | |
*** masber has quit IRC | 18:31 | |
*** saphi has quit IRC | 18:37 | |
*** catintheroof has quit IRC | 20:55 | |
*** catintheroof has joined #openstack-dns | 20:56 | |
*** masber has joined #openstack-dns | 21:27 | |
*** masber has quit IRC | 21:32 | |
*** abalutoiu_ has joined #openstack-dns | 22:17 | |
*** egonzalez has quit IRC | 22:51 | |
*** catintheroof has quit IRC | 23:17 |