*** masber has joined #openstack-dns | 00:27 | |
trungnv | abaindur, I am not sure exactly what the issue is with the 2nd or 3rd record. Feel free to share some logs of where these issues occur, either here or at https://bugs.launchpad.net/designate. | 01:18 |
abaindur | Unfortunately there aren't any logs... | 01:49 |
abaindur | I'm trying to add my own and repro | 01:49 |
abaindur | but hitting some issues printing the DB records | 01:49 |
abaindur | trungnv: https://gist.github.com/xagent003/ca15d59e61875babddb1c733c2e3dfe5 | 01:52 |
abaindur | https://gist.githubusercontent.com/xagent003/ca15d59e61875babddb1c733c2e3dfe5/raw/55105558ebadb93c7841591031b9a1d14ab5cb58/zone%2520action | 01:52 |
abaindur | one thing I do notice is, right before that log at the end of _update_zone_in_storage (which we added), where the zone is returned as NONE/ACTIVE | 01:53 |
abaindur | there are a bunch of notification logs for "dns.domain.update" and "dns.zone.update" - the queueing notification and emitting notification ones | 01:54 |
abaindur | when the records are created without issue, we don't see those | 01:54 |
abaindur | it is just the logs for _create_recordset_in_storage, Checking if ... belongs in any subzones..., Updating serial for zone..., and Updating zone | 01:55 |
abaindur | followed by that log at end of _update_zone_in_storage | 01:55 |
abaindur | But in the case where _update_zone_in_storage returns the wrong action for the zone, we see logs like that ^^ | 01:55 |
abaindur | trungnv: what causes those Queuing notification and emitting notifications? Suspect that has something to do with _update_zone_in_storage not properly returning zone from storage as UPDATE/PENDING | 01:57 |
trungnv | abaindur, when you update a zone you get "action:'UPDATE' status:'PENDING'", then the worker will push the update to the backend. If the update succeeds it returns "action:'NONE' status:'ACTIVE'", and if it fails it returns status 'ERROR' | 02:01 |
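For reference, the action/status lifecycle trungnv describes can be sketched like this (a minimal illustration using plain dicts and assumed helper names, not Designate's actual code): central marks the zone UPDATE/PENDING, the worker applies the change to the backend, and the outcome is reported back as NONE/ACTIVE on success or status ERROR on failure.

    # Simplified sketch of the lifecycle described above (assumed names).
    def mark_zone_pending(zone):
        zone['action'], zone['status'] = 'UPDATE', 'PENDING'
        return zone                      # this is what the worker should receive

    def record_worker_result(zone, backend_ok):
        if backend_ok:
            zone['action'], zone['status'] = 'NONE', 'ACTIVE'
        else:
            zone['status'] = 'ERROR'
        return zone

    zone = {'action': 'NONE', 'status': 'ACTIVE'}
    zone = mark_zone_pending(zone)           # central, before calling the worker
    zone = record_worker_result(zone, True)  # worker reports back on completion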
trungnv | I looked at your log and I don't see any issues in there. | 02:02 |
abaindur | yes, but the problem here is with "then the worker will push the update to the backend" - central itself sends the zone action as NONE to the worker, so the worker can't update | 02:02 |
abaindur | trungnv: when central returns from updating the zone in storage with UPDATE/PENDING, the zone is returned as NONE/ACTIVE | 02:02 |
abaindur | https://github.com/openstack/designate/blob/stable/newton/designate/central/service.py#L1065 | 02:02 |
abaindur | We even see here, https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309 | 02:03 |
abaindur | it has the correct UPDATE/PENDING. But when that function in storage returns the zone, it has NONE/ACTIVE | 02:04 |
abaindur | We see those Queuing/Emitting notification logs around that time period when it fails - when central correctly gets UPDATE/PENDING for the zone to send to the worker, it goes fine | 02:05 |
trungnv | in your log which you sent me then I also saw log at https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309 which is 2017-08-23 00:48:17.208 24214 DEBUG designate.storage.impl_sqlalchemy [req-f87ca390-b775-4ceb-83bb-8b3559fbc42b 332fa775cff44c0cbf4282cc4537b339 9b4e484efcb64b718944efaf05042df6 - - -] Updating zone <Zone id:'a0058952-4d23-4bf7-a87d-7184d0dbc048' | 02:06 |
trungnv | type:'PRIMARY' name:'net1.test-du-neutron-t-mx-c7-41216.platform9.horse.' pool_id:'794ccc2c-d751-44fe-b57f-8894c9f5c842' serial:'1503449297' action:'UPDATE' status:'PENDING'> update_zone /opt/pf9/designate/lib/python2.7/site-packages/designate/storage/impl_sqlalchemy/__init__.py:309 | 02:06 |
trungnv | you can see that action:'UPDATE' status:'PENDING' | 02:06 |
abaindur | correct, that's still in central. | 02:06 |
abaindur | that's before the updated_zone has been returned via storage.update_zone | 02:06 |
abaindur | from designate/storage/impl_sqlalchemy/__init__.py: the update_zone() function | 02:06 |
abaindur | when https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L432 | 02:07 |
abaindur | returns, it returns updated_zone with action=NONE, status=ACTIVE | 02:07 |
abaindur | But at L #302, we see that update function was called with action=UPDATE, status=PENDING | 02:08 |
abaindur | (thats also the log you just pasted, line #'s in reference code are a little off due to extra debugs I added) | 02:09 |
abaindur | that function is not properly updating and returning the zone in DB. Or there is some race condition that changes it to ACTIVE/NONE | 02:10 |
trungnv | abaindur, Yep. Perhaps I need to investigate the update_zone method more. | 02:11 |
abaindur | as mentioned - we only see this so far when we add multiple records to a zone. It's usually the 2nd or 3rd record that fails | 02:12 |
abaindur | 1st record always goes thru fine | 02:12 |
trungnv | yep. | 02:12 |
abaindur | and we also see those Queuing and Emitting notifications in logs around that time, when failure occurs | 02:12 |
abaindur | during record creation for passing cases, they are not there (or they are present at some time before the next record has been created) | 02:13 |
trungnv | before the zone object is pushed to the worker, the action is NONE and status is ACTIVE, so the worker doesn't understand the next action for this session. Is that right? | 02:14 |
abaindur | What triggers those? Are they somehow updating the DB? We don't see any extra sqlalchemy logs though with wrong action/status | 02:14 |
abaindur | yea. so that zone is returned from storage with NONE/ACTIVE | 02:14 |
abaindur | central sends that updated zone to worker, which errors out | 02:15 |
abaindur | I am wondering if for a workaround/hack fix, we could just force set zone.action=UPDATE, status=PENDING | 02:15 |
trungnv | Did you check the new zone which was updated? | 02:15 |
abaindur | here: https://github.com/openstack/designate/blob/stable/newton/designate/central/service.py#L1065 | 02:15 |
abaindur | after the zone is returned from storage | 02:15 |
abaindur | in that case, if the DB is messed up, we at least send the correct action and status to the worker | 02:16 |
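A rough sketch of that workaround, as a hypothetical tweak inside central's _update_zone_in_storage right after the storage call abaindur links above (the surrounding code is paraphrased, and the guard itself is purely an assumption rather than a proper fix for the storage problem):

    # Hypothetical hack in designate/central/service.py, _update_zone_in_storage():
    # re-assert the values the worker expects, in case the row read back from
    # storage already says NONE/ACTIVE.
    zone = self.storage.update_zone(context, zone)
    if zone.action == 'NONE' and zone.status == 'ACTIVE':
        zone.action = 'UPDATE'
        zone.status = 'PENDING'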
abaindur | the zone remains in NONE/ACTIVE | 02:16 |
abaindur | the recordset remains forever in UPDATE/PENDING - worker never processes it due to action=NONE | 02:17 |
trungnv | No. Could you check values in new zone? | 02:17 |
abaindur | new zone? | 02:17 |
trungnv | yeah. | 02:17 |
abaindur | what do you mean? | 02:18 |
abaindur | in our tests, we create a couple of zones - per tenant network | 02:18 |
abaindur | then we create around 3 A records per zone | 02:18 |
abaindur | half the time everything goes fine | 02:18 |
abaindur | half the time, one record (the 2nd or 3rd) in one of the zones hits this issue | 02:18 |
abaindur | the zone stays as NONE and ACTIVE | 02:19 |
trungnv | abaindur, Any logs at the worker when you create the 2nd record? | 02:21 |
abaindur | as i said, worker has stack trace because central sent zone.action=NONE | 02:21 |
abaindur | so worker does nothing for that record | 02:21 |
trungnv | no logs in the worker, right? | 02:22 |
abaindur | i mean there are logs... it just gets the update_zone RPC from central with NONE action | 02:22 |
abaindur | so worker errors out | 02:22 |
abaindur | Let me report back in about 30-45 minutes... we have added more logs to https://github.com/openstack/designate/blob/stable/newton/designate/storage/impl_sqlalchemy/__init__.py#L309 | 02:25 |
abaindur | and to https://github.com/openstack/designate/blob/stable/newton/designate/sqlalchemy/base.py#L524 | 02:25 |
abaindur | The problem must lie somewhere in there, right? zone comes in with UPDATE/PENDING, but updated_zone that is returned has NONE/ACTIVE | 02:26 |
trungnv | Yep. Try printing some logs there. | 02:27 |
abaindur | I see it does a table.update, then a select to "# Refetch the row, for generated columns etc", then returns _set_object_from_model(obj, resultproxy.fetchone()) | 02:27 |
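That update-then-refetch pattern boils down to something like the following (a stripped-down SQLAlchemy sketch with a placeholder schema, not the actual base.py code). The point is that whatever the refetch SELECT sees is what gets mapped back onto the zone object and returned, so a concurrent write landing between the UPDATE and the SELECT would come back instead of the values just written.

    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine)

    # Placeholder schema with just the columns relevant to the discussion.
    metadata = MetaData()
    zones = Table('zones', metadata,
                  Column('id', String(36), primary_key=True),
                  Column('action', String(10)),
                  Column('status', String(10)),
                  Column('serial', Integer))

    engine = create_engine('sqlite://')   # in-memory stand-in for the real DB
    metadata.create_all(engine)

    def update_zone_row(conn, zone_id, values):
        # UPDATE the row with the new action/status/serial ...
        conn.execute(zones.update().where(zones.c.id == zone_id).values(**values))
        # ... then "Refetch the row, for generated columns etc" - this refetched
        # row is what gets handed back to central and then to the worker.
        return conn.execute(
            zones.select().where(zones.c.id == zone_id)).fetchone()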
trungnv | abaindur, I care more about the 2nd record in the backend than the 2nd record in the DB. | 02:30 |
trungnv | what backend are you using? | 02:30 |
abaindur | the 2nd record gets nowhere in the backend or worker - the worker throws an exception immediately because central has sent the zone with a NONE action | 02:30 |
trungnv | yep. it is an issue if the 2nd record doesn't get created in the backend. | 02:32 |
trungnv | abaindur, Cool. I will also look at this issue soon. | 02:32 |
abaindur | How can the 2nd record get created in the backend in the first place though, if central sends the wrong update_zone to the worker? | 02:33 |
*** rektide has joined #openstack-dns | 02:33 | |
abaindur | the update_zone from central -> worker is sent after central updates the zone in storage to UPDATE/PENDING, right? | 02:33 |
trungnv | Yeah. exactly. | 02:34 |
abaindur | 1. central's create_recordset calls 2. increment_zone_serial, which calls 3. _update_zone_in_storage, which calls the 4. DB storage driver's update_zone | 02:34 |
abaindur | #4 is where we are seeing issue | 02:34 |
trungnv | we get the zone object from the DB first, then central calls the worker to create it in the backend, then the worker calls back to central to update the zone. | 02:35 |
abaindur | Finally 5. worker gets the RPC req. from central. it fails because zone has wrong values due to #4 | 02:35 |
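Putting those five steps together, the failing path amounts to something like this (a toy, runnable restatement of the flow with made-up stand-in functions, only to make the ordering and the failure point explicit; none of these are Designate's real signatures):

    def update_zone_in_storage(zone):
        # Steps 2-4: central bumps the serial and asks the sqlalchemy driver to
        # persist the zone as UPDATE/PENDING. In the failing case, the object
        # that comes back here is NONE/ACTIVE instead.
        zone['serial'] += 1
        zone['action'], zone['status'] = 'UPDATE', 'PENDING'
        return zone

    def worker_update_zone(zone):
        # Step 5: the worker refuses to act on a zone with no pending action,
        # so the recordset is left UPDATE/PENDING forever.
        if zone['action'] == 'NONE':
            raise RuntimeError('worker got update_zone with action=NONE')

    zone = {'action': 'NONE', 'status': 'ACTIVE', 'serial': 1503449296}
    zone = update_zone_in_storage(zone)   # steps 1-4 (create_recordset path)
    worker_update_zone(zone)              # step 5: RPC to the worker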
abaindur | so central first has worker create the record in backend, *BEFORE* it increments the zone serial in central/DB? Then it sends another update_zone RPC to worker? | 02:36 |
trungnv | Yep. | 02:37 |
trungnv | Did you see the 2nd record in the backend (BIND9 or pdns4)? | 02:38 |
abaindur | so is there any possibility of race condition then? worker tells central to update zone to NONE/ACTIVE. Meanwhile central increments zone serial and tries to update zone in DB to UPDATE/PENDING | 02:38 |
abaindur | but that fails, so worker rejects the update_zone | 02:39 |
abaindur | No, the record is unpingable - fails to resolve | 02:39 |
trungnv | Yep. this is an issue. | 02:39 |
abaindur | and DB shows recordset itself forever remains in UPDATE and PENDING state, while zone remains NONE/ACTIVE | 02:39 |
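The suspected race can be pictured with a toy interleaving like the one below (plain Python, with a dict standing in for the zone row and sleeps forcing the bad ordering). It only shows why an overlapping last-write-wins update would produce exactly these symptoms, not that Designate actually behaves this way.

    import threading
    import time

    zone_row = {'action': 'UPDATE', 'status': 'PENDING'}  # previous record in flight

    def worker_finishes_previous_record():
        # Worker reports success for the earlier record: zone flips to NONE/ACTIVE.
        time.sleep(0.01)
        zone_row.update(action='NONE', status='ACTIVE')

    def central_starts_next_record():
        # Central marks UPDATE/PENDING for the next record ...
        zone_row.update(action='UPDATE', status='PENDING')
        time.sleep(0.02)
        # ... but by the time it reads the row back to hand to the worker,
        # the other write has landed, so it ships action=NONE/status=ACTIVE.
        print('central sends zone with', zone_row)

    t1 = threading.Thread(target=central_starts_next_record)
    t2 = threading.Thread(target=worker_finishes_previous_record)
    t1.start(); t2.start(); t1.join(); t2.join()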
trungnv | abaindur, I understand what you said. | 02:40 |
abaindur | looks like tests this time are passing... will have to re-run | 02:40 |
trungnv | I will look at this issue soon. Could you file a bug in Launchpad along with the logs? | 02:40 |
abaindur | it'll take about another 1-1.5 hrs to re-run the testbeds | 02:40 |
trungnv | abaindur, http://paste.ubuntu.com/25387060/ no action and no status on the 2nd and 3rd record. Is this the issue in your case? | 03:49 |
abaindur | No | 03:50 |
abaindur | recordset shows as PENDING | 03:50 |
abaindur | the zone as NONE and ACTIVE | 03:50 |
abaindur | we don't see it missing altogether | 03:51 |
trungnv | status is PENDING on 2nd record? | 03:52 |
trungnv | abaindur, http://paste.ubuntu.com/25387095/ - this is the worker log when I create the 4th record. | 04:00 |
abaindur | Are you creating these records quickly? | 04:08 |
abaindur | or waiting to create each one one by one | 04:08 |
abaindur | We are creating it in a loop via python API - We attach a floating IP, wait for floating IP status to become ACTIVE, then create recordset | 04:08 |
abaindur | So it happens a lot quicker than, say, manually using the CLI | 04:08 |
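The repro loop abaindur describes would look roughly like the sketch below. The client calls are the standard python-designateclient / python-neutronclient v2 APIs, but everything else (endpoint, credentials, network and port IDs, record names) is a placeholder, not the actual test script.

    import time

    from designateclient.v2.client import Client as DesignateClient
    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from neutronclient.v2_0 import client as neutron_client

    auth = v3.Password(auth_url='http://controller:5000/v3',     # placeholders
                       username='demo', password='secret', project_name='demo',
                       user_domain_id='default', project_domain_id='default')
    sess = session.Session(auth=auth)
    designate = DesignateClient(session=sess)
    neutron = neutron_client.Client(session=sess)

    ext_net_id = 'EXTERNAL_NET_UUID'                              # placeholder
    port_ids = ['PORT_UUID_1', 'PORT_UUID_2', 'PORT_UUID_3']      # placeholders

    zone = designate.zones.create('net1.example.org.', email='admin@example.org')

    for i, port_id in enumerate(port_ids):
        fip = neutron.create_floatingip(
            {'floatingip': {'floating_network_id': ext_net_id,
                            'port_id': port_id}})['floatingip']
        # Wait for the floating IP to become ACTIVE before creating the record.
        while neutron.show_floatingip(fip['id'])['floatingip']['status'] != 'ACTIVE':
            time.sleep(1)
        # Back-to-back A records against the same zone; the 2nd/3rd iteration is
        # where the zone sometimes comes back NONE/ACTIVE and the worker bails.
        designate.recordsets.create(zone['id'],
                                    'vm%d.net1.example.org.' % i, 'A',
                                    [fip['floating_ip_address']])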
trungnv | abaindur, just via devstack on a single node, without floating IPs. | 04:28 |
frickler | is there a sample devstack config somewhere that sets up neutron with designate integration? | 05:49 |
frickler | I haven't been able to find any test for that scenario in jenkins, is that correct or did I just miss it? | 05:50 |
trungnv | frickler, https://docs.openstack.org/designate/latest/contributor/integrations.html - this is the link you need about integration. | 06:09 |
trungnv | frickler, We have some test cases for notification handling: https://github.com/openstack/designate/blob/master/designate/tests/test_notification_handler/ | 06:11 |
*** abalutoiu_ has joined #openstack-dns | 06:16 | |
*** abalutoiu_ has quit IRC | 06:20 | |
frickler | trungnv: thx, but those tests seem to work with fixtures for notification only, no real interaction with neutron, right? | 06:29 |
frickler | maybe the neutron folks have a bit more, I'll ask over there | 06:32 |
*** pcaruana has joined #openstack-dns | 06:40 | |
*** trungnv has quit IRC | 07:18 | |
*** trungnv has joined #openstack-dns | 07:19 | |
*** abaindur has quit IRC | 07:30 | |
*** abalutoiu_ has joined #openstack-dns | 07:31 | |
*** egonzalez has joined #openstack-dns | 07:35 | |
*** egonzalez has quit IRC | 07:50 | |
*** abalutoiu_ has quit IRC | 07:51 | |
*** egonzalez has joined #openstack-dns | 07:51 | |
*** vipul has quit IRC | 07:54 | |
*** vipul has joined #openstack-dns | 07:57 | |
*** vipul has quit IRC | 08:23 | |
*** vipul has joined #openstack-dns | 08:25 | |
*** abalutoiu_ has joined #openstack-dns | 08:30 | |
*** carthaca_ has quit IRC | 08:37 | |
*** carthaca_ has joined #openstack-dns | 08:37 | |
*** MarkBaker has joined #openstack-dns | 08:44 | |
*** abalutoiu_ has quit IRC | 09:13 | |
frickler | kiall: https://review.openstack.org/352843 is still W-1ed by you, could you take a look please? | 09:14 |
*** abalutoiu_ has joined #openstack-dns | 09:18 | |
*** jmccarthy has joined #openstack-dns | 09:47 | |
*** abalutoiu_ has quit IRC | 09:56 | |
*** trungnv has quit IRC | 10:02 | |
*** abalutoiu_ has joined #openstack-dns | 10:34 | |
*** daidv has quit IRC | 11:00 | |
*** MarkBaker has quit IRC | 11:04 | |
*** frickler has quit IRC | 11:26 | |
*** andreaf has quit IRC | 11:41 | |
*** vcn[m] has quit IRC | 11:41 | |
*** Guest45420 has quit IRC | 11:41 | |
*** fouxm has quit IRC | 11:41 | |
*** odyssey4me has quit IRC | 11:41 | |
*** ircuser-1 has quit IRC | 11:41 | |
*** d0ugal has quit IRC | 11:41 | |
*** simonmcc has quit IRC | 11:41 | |
*** timsim has quit IRC | 11:41 | |
*** serverascode has quit IRC | 11:41 | |
*** johnsom has quit IRC | 11:41 | |
*** fyxim has quit IRC | 11:41 | |
*** pcaruana has quit IRC | 11:41 | |
*** masber has quit IRC | 11:41 | |
*** KeithMnemonic has quit IRC | 11:41 | |
*** nyloc has quit IRC | 11:41 | |
*** mrhillsman has quit IRC | 11:41 | |
*** amitry has quit IRC | 11:41 | |
*** ChanServ has quit IRC | 11:41 | |
*** abalutoiu_ has quit IRC | 11:41 | |
*** jmccarthy has quit IRC | 11:41 | |
*** harmw has quit IRC | 11:41 | |
*** bauruine has quit IRC | 11:41 | |
*** mugsie has quit IRC | 11:41 | |
*** dhellmann has quit IRC | 11:41 | |
*** asettle has quit IRC | 11:41 | |
*** hieulq has quit IRC | 11:41 | |
*** mikal has quit IRC | 11:41 | |
*** brad[] has quit IRC | 11:41 | |
*** filler has quit IRC | 11:41 | |
*** logan- has quit IRC | 11:41 | |
*** ianychoi has quit IRC | 11:41 | |
*** ChrisDevita has quit IRC | 11:41 | |
*** raginbajin has quit IRC | 11:41 | |
*** zigo has quit IRC | 11:41 | |
*** Krenair has quit IRC | 11:41 | |
*** basilAB has quit IRC | 11:41 | |
*** vaishali has quit IRC | 11:41 | |
*** clayton has quit IRC | 11:41 | |
*** JonathanD has quit IRC | 11:41 | |
*** therve_ has quit IRC | 11:41 | |
*** egonzalez has quit IRC | 11:41 | |
*** puck has quit IRC | 11:41 | |
*** lkoranda has quit IRC | 11:41 | |
*** afranc has quit IRC | 11:41 | |
*** d34dh0r53 has quit IRC | 11:41 | |
*** mordred has quit IRC | 11:41 | |
*** rackertom has quit IRC | 11:41 | |
*** andrewbogott has quit IRC | 11:41 | |
*** ying_zuo has quit IRC | 11:41 | |
*** jmccrory has quit IRC | 11:41 | |
*** mwhahaha has quit IRC | 11:41 | |
*** kbyrne has quit IRC | 11:41 | |
*** sapcc-bot has quit IRC | 11:41 | |
*** carthaca_ has quit IRC | 11:41 | |
*** nkinder has quit IRC | 11:41 | |
*** yee379 has quit IRC | 11:41 | |
*** vipul has quit IRC | 11:41 | |
*** rektide has quit IRC | 11:41 | |
*** kiall has quit IRC | 11:41 | |
*** eandersson has quit IRC | 11:41 | |
*** cliles has quit IRC | 11:41 | |
*** openstack has joined #openstack-dns | 12:56 | |
*** ChanServ sets mode: +v openstack | 12:56 | |
*** openstackstatus has joined #openstack-dns | 12:57 | |
*** ChanServ sets mode: +v openstackstatus | 12:57 | |
*** hieulq has quit IRC | 13:29 | |
frickler | I'm seeing this message when trying a default devstack installation, is this a known issue with pdns? "gmysql Connection failed: Could not prepare statement: SELECT domain_id,name,type,modified_at,account,comment FROM comments WHERE domain_id=?: Table 'designate_pdns.comments' doesn't exist" | 13:34 |
frickler | also the bugs dashboard link in the topic looks broken to me | 13:34 |
mugsie | frickler: damn - it is down. Will fix that over the weekend | 13:36 |
mugsie | it is not a known issue | 13:37 |
mugsie | can you log it @ https://bugs.launchpad.net/designate ? | 13:38 |
frickler | mugsie: possibly pdns 4 on xenial is too new? I noticed the devstack instructions talk about trusty still, but that is kinda deprecated | 13:38 |
frickler | mugsie: will do | 13:38 |
mugsie | oh, there is a different driver for xenial | 13:41 |
mugsie | it needs to be pdns4 instead of powerdns | 13:41 |
frickler | mugsie: yeah, I just found that when grepping through the devstack plugin code. maybe worth changing the default? or at least generate an error early when trying to run the old driver on xenial? | 13:46 |
mugsie | frickler: I thought we did error on xenial -_- | 13:46 |
mugsie | but yes, it definitely is | 13:46 |
mugsie | (doing both) | 13:47 |
*** d0ugal has quit IRC | 13:49 | |
frickler | hmm, there is a check in pdns4 that fails on anything other than trusty|xenial, so making that the default will break things for everything non-Ubuntu | 13:51 |
frickler | the "real" installation guides seem to use bind9 anyway, maybe make that the default for devstack, too? +1 for consistency | 13:53 |
*** saphi has joined #openstack-dns | 14:01 | |
frickler | mugsie: FYI, dumped all that into https://bugs.launchpad.net/designate/+bug/1713069 | 14:02 |
openstack | Launchpad bug 1713069 in Designate "Devstack setup with default settings fails on Ubuntu Xenial" [Undecided,New] | 14:02 |
*** saphi has quit IRC | 14:50 | |
*** saphi has joined #openstack-dns | 14:54 | |
*** pcaruana has quit IRC | 15:06 | |
*** d0ugal has joined #openstack-dns | 15:09 | |
*** egonzalez has quit IRC | 15:12 | |
*** catintheroof has joined #openstack-dns | 15:14 | |
*** abalutoiu_ has quit IRC | 15:15 | |
*** egonzalez has joined #openstack-dns | 15:24 | |
*** trungnv has joined #openstack-dns | 15:33 | |
*** trungnv has quit IRC | 15:34 | |
*** trungnv has joined #openstack-dns | 15:34 | |
*** trungnv_ has joined #openstack-dns | 15:38 | |
*** trungnv has quit IRC | 15:38 | |
*** trungnv_ has quit IRC | 15:38 | |
*** trungnv has joined #openstack-dns | 15:39 | |
*** trungnv has quit IRC | 15:47 | |
*** trungnv has joined #openstack-dns | 15:47 | |
*** d0ugal has quit IRC | 15:52 | |
*** catintheroof has quit IRC | 15:59 | |
*** trungnv has left #openstack-dns | 16:00 | |
*** trungnv_ has joined #openstack-dns | 16:00 | |
*** trungnv has joined #openstack-dns | 16:00 | |
*** trungnv has quit IRC | 16:03 | |
*** trungnv_ has quit IRC | 16:03 | |
*** trungnv has joined #openstack-dns | 16:12 | |
*** trungnv has quit IRC | 16:12 | |
*** trungnv has joined #openstack-dns | 16:14 | |
*** trungnv has quit IRC | 16:15 | |
*** jmccarthy has left #openstack-dns | 16:31 | |
*** catintheroof has joined #openstack-dns | 16:52 | |
*** catintheroof has quit IRC | 17:11 | |
*** catintheroof has joined #openstack-dns | 17:56 | |
*** masber has quit IRC | 18:31 | |
*** saphi has quit IRC | 18:37 | |
*** catintheroof has quit IRC | 20:55 | |
*** catintheroof has joined #openstack-dns | 20:56 | |
*** masber has joined #openstack-dns | 21:27 | |
*** masber has quit IRC | 21:32 | |
*** abalutoiu_ has joined #openstack-dns | 22:17 | |
*** egonzalez has quit IRC | 22:51 | |
*** catintheroof has quit IRC | 23:17 |