opendevreview | Don Kehn proposed openstack/designate-tempest-plugin master: Adds test for the multipool bind9 configuration. https://review.opendev.org/c/openstack/designate-tempest-plugin/+/871184 | 01:14 |
---|---|---|
opendevreview | Vadym Markov proposed openstack/designate-dashboard master: Fix "Masters IP Address" for Zone update form https://review.opendev.org/c/openstack/designate-dashboard/+/879737 | 12:19 |
ozzzo_work | johnsom: I discussed the patch with my team; it doesn't look like that is going to fly. Whatever we do to fix it in the lab needs to be suitable for dev/qa/prod also | 13:11 |
ozzzo_work | I tried introducing an extra delay before deletion, and that fixes the problem so it must be some kind of race condition | 13:11 |
eandersson | ozzzo_work: The most common race condition is that the delete notification happens before the create notification has been processed properly. You cannot delete a record that does not properly exist yet. | 14:39 |
eandersson | I worked around this myself by only allowing one A record per VM, and during the create portion just deleting any previous A records that have the same name. | 14:43 |
eandersson | It wouldn't fix dangling A records, but it would prevent new VMs with the same name from having old data. | 14:43 |
eandersson | A potentially hacky workaround would be to try to find the records a few times with a delay in between each attempt, but that isn't great | 14:45 |
ozzzo_work | that's what I'm trying now | 14:55 |
ozzzo_work | eandersson: That didn't fix it: https://paste.openstack.org/show/bnj6zSHrVJIJ1nhUrqVl/ | 18:35 |
ozzzo_work | it seems like 10 seconds should be plenty of time for a race condition to resolve itself. It looks like the failure is occurring here: https://github.com/openstack/designate/blob/train-eol/designate/objects/base.py#L407 | 18:38 |
ozzzo_work | When I generated a python error during this condition I got: https://paste.openstack.org/show/bgigVrtMPEK6zmQw6btx/ | 18:41 |
ozzzo_work | I don't understand how this is possible. I'm iterating through the list, and seeing the item there, and then I get "x not in list" when I try to remove it | 18:42 |
ozzzo_work | if it was a race condition, shouldn't I see the item missing when I iterate through the list? | 18:42 |
eandersson | Yea - that is odd | 18:51 |
eandersson | I don't think this will make a difference but can you try this patch just in case | 18:51 |
eandersson | https://review.opendev.org/c/openstack/designate/+/879305 | 18:51 |
eandersson | btw what version were you running again ozzzo_work? | 18:55 |
eandersson | oh train right? | 18:59 |
eandersson | I wrote a lot of tests last night. I can backport them to train and see if it behaves any differently. | 18:59 |
eandersson | btw the retry would need to re-fetch the recordset each time | 19:01 |
eandersson | Honestly you would want the retry to happen here | 19:02 |
eandersson | https://github.com/openstack/designate/blob/master/designate/notification_handler/base.py#L252 | 19:02 |
eandersson | start with requesting find_records again as a first step in the retry process | 19:02 |
ozzzo_work | We're running train but we have johnsom's "Fix race condition in the sink when deleting records" patch | 20:24 |
ozzzo_work | the train-eol code was last updated in 2020. The version we're running looks like this: https://github.com/openstack/designate/blob/60edc59ff765b406e4b936deb4d200a2d9b411ce/designate/notification_handler/base.py | 20:25 |
johnsom | FYI, I just backported that patch | 20:26 |
ozzzo_work | I wasn't re-calling find_records so that explains why it didn't work. I'll try your patch without the retry, and if that doesn't fix it I'll do a proper retry | 20:30 |
-opendevstatus- | NOTICE: The Gerrit service on review.opendev.org will be offline for extended periods between 22:00 and 00:00 UTC for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/ | 21:04 |
eandersson | btw the more logs you can provide us with the better. Ideally from worker / central / sink | 21:37 |
eandersson | But I also understand if that is difficult. | 21:47 |
-opendevstatus- | NOTICE: The Gerrit service on review.opendev.org will be offline for extended periods over the next two hours for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/ | 21:58 |
opendevreview | Merged openstack/designate master: Move to a batch model for incrementing serial https://review.opendev.org/c/openstack/designate/+/871255 | 22:51 |
johnsom | Wahooo Nice work eandersson | 22:51 |
eandersson | Exciting | 22:53 |
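
The retry approach discussed at 19:01 and 20:30 (re-fetch the records on every attempt instead of re-checking a stale list) can be sketched roughly as follows. This is only an illustration, not Designate code: `fetch_with_retry`, its parameters, and the `fetch` callable are hypothetical names. In a real handler, `fetch` would presumably wrap the `find_records` call mentioned in the log, whose exact signature is not shown here.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], List[T]],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call fetch() until it returns a non-empty list or attempts run out.

    Hypothetical helper: `fetch` is re-invoked on every attempt, so each
    retry sees fresh data rather than a list captured before the delay.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        records = fetch()  # re-fetch on every attempt
        if records:
            return records
        if attempt < attempts:
            sleep(delay)  # wait before the next lookup

    # Nothing turned up; the caller decides whether that is an error.
    return []
```

A delete handler could then act only on what the final lookup returned, which is the difference from sleeping once and reusing the originally fetched list.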