Nick | Message | Time |
---|---|---|
__ministry | eandersson: it loaded duplicate cluster data; each cluster was loaded multiple times. I'm running 3 controllers with 32 workers for health checks on each node. On a single node, the health check process loaded the same cluster ID up to 29 times. | 01:28 |
dtruong | __ministry What detection mode are you using for the health policy? We have seen problems with NODE_STATUS_POLL_URL not scaling well to a large number of nodes. | 05:01 |
eandersson | I suspect that this is a worker bug. Can you try setting the workers to the default of 1 and see if the issue persists? | 06:10 |
eandersson | We run it with one worker, and I suspect that the CI only uses a single worker. | 06:11 |
eandersson | Actually the CI does use more than one worker. | 06:15 |
eandersson | __ministry can you add something like this and see if this gets triggered? https://paste.openstack.org/show/bAi27dkPPHyYdWviIoXm/ | 06:33 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [DNM] Testing https://review.opendev.org/c/openstack/senlin/+/839473 | 06:39 |
eandersson | Also, can you confirm if this is the policy you are using? | 07:23 |
eandersson | https://github.com/openstack/senlin/blob/master/examples/policies/health_policy_poll.yaml | 07:23 |
eandersson | Unfortunately none of my theories checked out, and I'm not able to reproduce this. | 07:38 |
__ministry | eandersson: yep. I just use the stock health policy linked above. I don't get this error often, but I've seen it a few times, and this time I found the root cause. | 08:27 |
__ministry | dtruong: I used "NODE_STATUS_POLLING" | 08:28 |
__ministry | While a cluster's health check was registered and running on node 1, I saw it automatically start and load on node 2 as well, and then the error happened. | 08:30 |
__ministry | This is the log from when the worker starts registering: https://pastebin.com/EfY3K1WR | 08:38 |
__ministry | It registers the same policy multiple times: https://pastebin.com/a8LpaLF6 | 08:40 |
__ministry | The logs above are only part of the full log. | 08:41 |
__ministry | policy healthcheck: https://pastebin.com/xE1rHq8k | 08:45 |
eandersson | Interesting. I wonder why I wasn't able to reproduce this. Maybe I need to run two separate instances and not just more workers. | 20:23 |
eandersson | Any ideas dtruong? | 20:23 |