openstackgerrit | Merged openstack/designate stable/ussuri: Fix pdns4 devstack plugin https://review.opendev.org/c/openstack/designate/+/755138 | 00:38 |
---|---|---|
openstackgerrit | Merged openstack/designate stable/train: Fix pdns4 devstack plugin https://review.opendev.org/c/openstack/designate/+/775902 | 01:12 |
openstackgerrit | Nicolas Bock proposed openstack/designate stable/stein: Fix pdns4 devstack plugin https://review.opendev.org/c/openstack/designate/+/775903 | 01:15 |
openstackgerrit | Michael Chapman proposed openstack/python-designateclient stable/victoria: Increase hacking version https://review.opendev.org/c/openstack/python-designateclient/+/775956 | 02:00 |
openstackgerrit | Michael Chapman proposed openstack/python-designateclient stable/victoria: Fix lower-constrains job https://review.opendev.org/c/openstack/python-designateclient/+/775960 | 03:41 |
*** jpward has quit IRC | 04:33 | |
*** hamalq has quit IRC | 04:41 | |
michchap | eandersson, should I squash those two designateclient stable/victoria patches to get the gate going, or is there a better way? | 05:27 |
eandersson | michchap I think it's fine to just merge the working patch and then rebase the failing once after that | 05:36 |
eandersson | If you want you can just rebase the hacking change on top of the lower-req change | 05:37 |
nicolasbock | We could also drop the lower constraints as we did for designate | 13:04 |
michchap | eandersson, the lower req change is currently on top of the hacking change - they're each fixing a different job | 13:15 |
michchap | nicolasbock I could put it in with the hacking change if they both need to go back a few versions | 13:20 |
nicolasbock | Yes, I think that would work michchap | 13:21 |
openstackgerrit | Michael Chapman proposed openstack/python-designateclient stable/victoria: Increase hacking version https://review.opendev.org/c/openstack/python-designateclient/+/775956 | 13:24 |
openstackgerrit | Nicolas Bock proposed openstack/designate master: Increment zone serial instead of using time https://review.opendev.org/c/openstack/designate/+/776173 | 13:33 |
nicolasbock | eandersson, johnsom: Colud you have a look at https://review.opendev.org/c/openstack/designate/+/776173 ? This is still a draft. I wanted to get your input first to see whether I am going in the right direction. | 13:35 |
nicolasbock | eandersson: The background of this is that we discovered in Stein (since it's lacking the central global lock) the way zone serials are incremented lead to a race condition and result in missing records in the BIND9 backend | 13:36 |
nicolasbock | Since the serial increment granularity is seconds, BIND9 does not always AXFR the zone because it doesn't see a serial increment | 13:37 |
nicolasbock | We are using the serial also as a time stamp though, which means that we need to separate those two uses (as a serial and as a timestamp) to make this work properly | 13:37 |
nicolasbock | I added some more information for Designate/devstack usage at https://review.opendev.org/c/openstack/designate/+/765541 | 15:37 |
nicolasbock | I found this helpful because I don't do this often enough to remember how this works :) | 15:37 |
eandersson | nicolasbock I'll take a look, but another fix is to enable a coordinator | 20:10 |
nicolasbock | As in a global lock? | 20:11 |
eandersson | It's a global lock yea, but per domain etc | 20:11 |
nicolasbock | I was concerned about the performance impact of that though | 20:11 |
eandersson | Not super sure this would fix it thou | 20:11 |
eandersson | We probably want mysql to handle the increment | 20:12 |
nicolasbock | Your global lock fixes it | 20:12 |
nicolasbock | I confirmed that | 20:12 |
nicolasbock | Right | 20:12 |
nicolasbock | That's what my change is doing | 20:12 |
nicolasbock | So it's locked on that operation only | 20:12 |
nicolasbock | In our case a customer is using Terraform to create a log of recordsets in parallel | 20:12 |
nicolasbock | Your global lock fixes the issue but the performance is pretty much serial | 20:13 |
nicolasbock | I don't mean to criticize your change though. I think it's great :) | 20:13 |
eandersson | Yea - it would be for a single domain | 20:13 |
eandersson | Nah my fix was fixing the existing global lock :D | 20:13 |
eandersson | So no worries | 20:13 |
nicolasbock | Haha | 20:14 |
nicolasbock | Locking on the SQL UPDATE operation is a lot more fine grained | 20:14 |
nicolasbock | But I don't quite see whether this will beak the delayed actions | 20:14 |
nicolasbock | Since we are using the serial also as a timestamp | 20:14 |
eandersson | You will need to add a get operation as well | 20:15 |
eandersson | because I don't think current_serial will work right? | 20:16 |
nicolasbock | No, that's just the timestamp | 20:16 |
nicolasbock | I have https://review.opendev.org/c/openstack/designate/+/776173/1/designate/storage/impl_sqlalchemy/__init__.py#432 | 20:17 |
eandersson | Like this code https://review.opendev.org/c/openstack/designate/+/776173/1/designate/worker/tasks/zone.py | 20:17 |
nicolasbock | That updates the serial with the DB version | 20:17 |
eandersson | Yea | 20:17 |
nicolasbock | Ah right, yes, that's where I am not sure I might have broken something | 20:18 |
eandersson | but twhat is this criterion used for? | 20:18 |
eandersson | Probably best to just remove the serial here and see what breaks lol | 20:18 |
nicolasbock | Hah | 20:18 |
eandersson | (or fetch it from the db) | 20:18 |
nicolasbock | Ok I will try that | 20:18 |
nicolasbock | I'll go through that code path more carefully | 20:18 |
nicolasbock | Thanks for the help eandersson ! | 20:18 |
eandersson | I wonder if there are other race condiitons | 20:19 |
eandersson | besides just serial increment | 20:20 |
eandersson | If not we could look at removing the lock all together | 20:20 |
nicolasbock | Yes, we don't test anything in parallel as far as I know | 20:20 |
nicolasbock | I would be for removing it | 20:20 |
nicolasbock | It's a performance issue for sure | 20:20 |
nicolasbock | But on the other hand we should test the parallel aspect | 20:20 |
eandersson | There are some nasty nested calls in central that we sohuld fix as well | 20:20 |
nicolasbock | Yes, agreed | 20:21 |
eandersson | https://github.com/openstack/designate/blob/master/designate/rpc.py#L225 | 20:21 |
eandersson | I really want to remove this code someday | 20:21 |
nicolasbock | I love that naming ;) | 20:21 |
nicolasbock | Are you aware of other projects testing parallel API calls? | 20:21 |
eandersson | It might be worth asking in #openstack-qa maybe | 20:22 |
nicolasbock | True | 20:22 |
nicolasbock | Let me head over there and ask | 20:22 |
johnsom | Rally should be for some projects. | 20:23 |
eandersson | btw one of the reasons I think they moved away from incremental increases with Designate was because the serial value would get too high | 20:23 |
johnsom | Octavia also has a "beat the heck out of it" test tool for the API. I wrote it so I could test our DB locking. | 20:23 |
eandersson | We had to modify the schema at the time to bump the serial from uin32 to uint64 | 20:23 |
johnsom | Well, the protocol has a limit too | 20:24 |
eandersson | I used that a lot johnsom | 20:24 |
johnsom | I think it's uint32 | 20:24 |
nicolasbock | So how does a dynamic DNS provider deal with this issue eandersson ? | 20:24 |
eandersson | Well we can just bump it to a uint64 | 20:24 |
eandersson | Ideal would probably be to allow batch updates, but that is practically impossible to implement today :D | 20:25 |
johnsom | https://tools.ietf.org/html/rfc1035#section-3.3.13 | 20:26 |
johnsom | Serial is a uint32 | 20:26 |
eandersson | oh fun | 20:26 |
eandersson | Another approach we could do is move the increment to a periodic job :D | 20:26 |
nicolasbock | Right, just found the same as johnsom | 20:26 |
nicolasbock | We could batch zone updates | 20:27 |
nicolasbock | I guess just as you suggest eandersson | 20:27 |
nicolasbock | BIND9 can't really issue AXFRs that fast anyway | 20:27 |
nicolasbock | We already have the delayed NOTIFY setting | 20:27 |
nicolasbock | Batching could be folded into that option | 20:28 |
eandersson | Yea | 20:28 |
eandersson | That is probably the ideal | 20:28 |
eandersson | No reason to inc serial for each tiny little change | 20:29 |
eandersson | I was very confused when I first hit this bug, because I thought it was a worker / producer issue | 20:29 |
nicolasbock | Yes, it took me a while to find the problem | 20:29 |
eandersson | https://bugs.launchpad.net/designate/+bug/1768618 | 20:29 |
openstack | Launchpad bug 1768618 in Designate "Designate-Worker occasionally raises Bad Action" [Critical,Fix released] - Assigned to Erik Olof Gunnar Andersson (eandersson) | 20:29 |
eandersson | This was the initial bug I hit | 20:30 |
johnsom | The other is to let serial just increment. Everything *should* know how to handle a roll over | 20:30 |
nicolasbock | Ah, interesting | 20:30 |
johnsom | But, yeah, I don't see harm in some form of short batching. | 20:30 |
eandersson | Yea - and it should be partly implemented already | 20:31 |
nicolasbock | I'll have a look at that then | 20:31 |
eandersson | It would drastically improve performance as well central does not have to care about it. | 20:31 |
nicolasbock | I'll expand https://mxtoolbox.com/problem/dns/dns-soa-serial-number-format#:~:text=It%20has%20become%20common%20to,with%20the%20format%20of%20YYYYmmddss. | 20:31 |
nicolasbock | True | 20:31 |
eandersson | Central is such a train wreck and this code path is part of the problem | 20:31 |
nicolasbock | Haha, very diplomatically put | 20:32 |
eandersson | btw I wonder if maybe old Designate was using a signed int instead of unsigned | 20:32 |
eandersson | https://github.com/openstack/designate/blob/stable/stein/designate/backend/impl_powerdns/migrate_repo/versions/001_add_initial_schema.py#L30 | 20:32 |
johnsom | That would have been an....oversight | 20:32 |
eandersson | Not sure what Integer() equals to | 20:33 |
johnsom | Ha, and the changing tide of what sqlalchemy decided an "Integer" was for that backend.... | 20:33 |
nicolasbock | Oh did that change johnsom ? | 20:35 |
johnsom | Well, I don't know for sure, but things like that have been an issue for me in the past. | 20:36 |
nicolasbock | The documentation https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.Integer is a little unclear here | 20:36 |
eandersson | I have a hard time thinking we hit 4,294,967,295 :D | 20:36 |
nicolasbock | In Python2 there are two int types, no? | 20:37 |
nicolasbock | eandersson: Well I wouldn't put anything past an intrepid user :) | 20:37 |
nicolasbock | But that's a lot of updates | 20:37 |
nicolasbock | We could start the serial at 0 by the way | 20:38 |
nicolasbock | That would give us the full range | 20:38 |
johnsom | Yeah, we have what, 17 more years? | 20:38 |
eandersson | Yea - for new zones | 20:38 |
nicolasbock | I like to panic early johnsom :) lol | 20:38 |
johnsom | FYI, serial should start with 1. grin | 20:39 |
nicolasbock | Haha. Back in the day I wrote a lot of Fortran and C code and this always got me | 20:40 |
johnsom | eandersson So, when are you posting the RFC to bump the SOA SERIAL field to uint64? | 20:43 |
* johnsom has a nagging feeling people would be like, "just roll it over" | 20:44 | |
nicolasbock | lol | 20:44 |
nicolasbock | We should also at some point work on fixing ERROR conditions. I have a zone in ERROR. BIND9 doesn't have the zone anymore, but Designate has it in the DB. | 20:45 |
nicolasbock | There is no way to remove it without DB surgery | 20:45 |
nicolasbock | Octavia runs into something like that once in a while as well | 20:45 |
johnsom | Ah, but Octavia has two tools to fix ERROR: failover and delete | 20:46 |
nicolasbock | We could add a `--force` option to the CLI? | 20:46 |
nicolasbock | Right | 20:46 |
nicolasbock | True | 20:46 |
nicolasbock | We need something here in Designate | 20:46 |
johnsom | Now, I don't know the whole situation, but wouldn't https://docs.openstack.org/api-ref/dns/?expanded=abandon-zone-detail#abandon-zone work for you? | 20:47 |
johnsom | I.e. that is my disclaimer of "If you try this, you are on your own" | 20:47 |
nicolasbock | Wow, I had no idea this API existed :) | 20:47 |
nicolasbock | Haha | 20:47 |
nicolasbock | I will try that, but the description fits the bill | 20:48 |
johnsom | I have never used it | 20:48 |
eandersson | Never seen that api before lol | 20:51 |
nicolasbock | haha | 20:52 |
nicolasbock | I am going to try it out | 20:52 |
johnsom | Watch, it's documented but not implemented. lol | 20:53 |
nicolasbock | I am sure a user will find it at some point and blow up their Designate ;) | 20:53 |
johnsom | I *hope* it has Admin RBAC on it | 20:53 |
nicolasbock | OMG it works! | 20:53 |
nicolasbock | Well, good question johnsom | 20:53 |
eandersson | https://github.com/openstack/designate/blob/master/designate/api/v2/controllers/zones/tasks/abandon.py | 20:53 |
eandersson | Would probably be neat to add that to the cli | 20:54 |
nicolasbock | https://github.com/openstack/designate/blob/1ea6d44a51df01f5a15eca2d380c849b8e297941/designate/common/policies/zone.py#L57 | 20:55 |
nicolasbock | It's in the CLI | 20:55 |
nicolasbock | I was just using it | 20:55 |
eandersson | oh | 20:55 |
eandersson | haha | 20:55 |
nicolasbock | And yes, johnsom, it's admin | 20:55 |
nicolasbock | I had never noticed that command though eandersson | 20:55 |
eandersson | I have only touched that file once so that is probably why lol | 20:55 |
nicolasbock | https://github.com/openstack/python-designateclient/blob/e6e7b191a3b19a2af99fd2c8a860dea3083bc6b1/designateclient/v2/cli/zones.py#L260 | 20:56 |
nicolasbock | Haha | 20:57 |
* johnsom disappears to yet another meeting today | 20:58 | |
nicolasbock | o/ johnsom | 20:59 |
*** eandersson has quit IRC | 21:40 | |
*** gmann is now known as gmann_afk | 21:51 | |
openstackgerrit | Nicolas Bock proposed openstack/designate stable/stein: Adding distributed locking to central https://review.opendev.org/c/openstack/designate/+/776288 | 22:42 |
openstackgerrit | Luigi Toscano proposed openstack/designate stable/ussuri: Native Zuul v3 designate-grenade-pdns4 job https://review.opendev.org/c/openstack/designate/+/752754 | 23:17 |
*** gmann_afk is now known as gmann | 23:22 | |
openstackgerrit | Luigi Toscano proposed openstack/designate stable/train: WIP DNM more debug for devstack https://review.opendev.org/c/openstack/designate/+/776293 | 23:27 |
openstackgerrit | Luigi Toscano proposed openstack/designate stable/ussuri: WIP DNM more debug for devstack https://review.opendev.org/c/openstack/designate/+/776294 | 23:28 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!