__ministry | Today I also hit the error above: senlin-health-manager loads clusters incorrectly, which causes high load on Keystone. | 01:34 |
---|---|---|
__ministry | I only run senlin-health-manager on 1 node, and it kept automatically loading clusters many times. | 01:35 |
eandersson | __ministry: What version are you running? | 01:53 |
eandersson | Also, do you see any SQL errors anywhere? | 01:53 |
eandersson | It does not have to be on the health manager | 01:53 |
__ministry | I installed Senlin from git at commit: f99412750cc068a302150c27074818f57bcdbdba | 01:56 |
__ministry | Let me check. I will query the registration update intervals in SQL. | 01:57 |
eandersson | So based on the logs it looks like it fails to report to the database for some reason | 02:08 |
eandersson | Something must be locking up the database so that the services cannot report their own health | 02:09 |
eandersson | Do you see anything like | 02:12 |
eandersson | > Breaking locks for dead engine | 02:12 |
eandersson | In the logs? probably in central | 02:12 |
eandersson | You may be able to fix this by increasing the periodic_interval in your config | 02:12 |
eandersson | https://docs.openstack.org/senlin/ocata/configuration.html#DEFAULT.periodic_interval | 02:13 |
eandersson | Maybe try to double it to give it more time to recover safely | 02:14 |
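A minimal sketch of "double `periodic_interval`", assuming the option lives in the `[DEFAULT]` section of `senlin.conf` and assuming 60 seconds as the fallback when it is unset (the helper `double_periodic_interval` is hypothetical; in practice you would simply edit the file by hand and restart the services):

```python
import configparser
import io

# Hypothetical excerpt of senlin.conf; a real file contains many more options.
SAMPLE_CONF = """
[DEFAULT]
periodic_interval = 60
"""


def double_periodic_interval(conf_text, default=60):
    """Return conf_text with [DEFAULT] periodic_interval doubled.

    `default` is an assumed fallback used only when the option is not set.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    current = parser.getint("DEFAULT", "periodic_interval", fallback=default)
    parser.set("DEFAULT", "periodic_interval", str(current * 2))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Calling `double_periodic_interval(SAMPLE_CONF)` yields a `[DEFAULT]` section with `periodic_interval = 120`, giving each service twice as long to report before it looks dead.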
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Removed two previously unused config options https://review.opendev.org/c/openstack/senlin/+/842020 | 02:20 |
__ministry | Yep. I edited the code to log db_registries and self.registries; let me follow it. | 02:48 |
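One way to make that logging useful is to diff the two collections on every cycle. The sketch below assumes both are plain iterables of cluster ID strings (the real Senlin registry objects carry more fields) and `diff_registries` is a hypothetical helper. If the same cluster IDs show up in `to_load` cycle after cycle, the health manager is reloading clusters it should already hold:

```python
import logging

LOG = logging.getLogger("health_manager_debug")


def diff_registries(db_registries, local_registries):
    """Compare cluster IDs stored in the database with those held in memory.

    Returns (to_load, to_unload) as sorted lists of cluster IDs.
    """
    db_ids, local_ids = set(db_registries), set(local_registries)
    to_load = sorted(db_ids - local_ids)     # in the DB but not loaded yet
    to_unload = sorted(local_ids - db_ids)   # loaded but no longer in the DB
    LOG.debug("db=%d local=%d load=%s unload=%s",
              len(db_ids), len(local_ids), to_load, to_unload)
    return to_load, to_unload
```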
eandersson | btw are you still running 32 workers? Might be worth trying to cut that down. | 02:58 |
eandersson | Each worker needs to update its health in the database | 02:58 |
eandersson | If you have 32 workers for all services times 3 that is a lot of updates | 02:59 |
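Rough arithmetic for the write load described above, assuming every worker writes one heartbeat per interval and reading the "times 3" as a multiplier on the 32 workers (assumed here to be three services; the function name is hypothetical):

```python
def health_updates_per_minute(workers_per_service, services, interval_seconds=60):
    """Approximate service-health writes per minute hitting the database.

    Assumes each worker updates its own health row once per interval.
    """
    total_workers = workers_per_service * services
    return total_workers * (60 / interval_seconds)


print(health_updates_per_minute(32, 3))       # 96.0
print(health_updates_per_minute(32, 3, 120))  # 48.0
print(health_updates_per_minute(8, 3))        # 24.0
```

Both suggested knobs shrink the load: doubling the interval halves the writes, and cutting workers from 32 to 8 reduces them by a factor of four.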
__ministry | Yep. I am running 32 workers on one node. | 03:00 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups https://review.opendev.org/c/openstack/senlin/+/842025 | 03:11 |
eandersson | I wonder if there are just too many health checks here. | 03:14 |
eandersson | Each worker will basically update its own health every 60 seconds, and if your database is under load maybe it's just falling behind | 03:14 |
eandersson | Another theory could be that the worker is too busy under load and not able to update its own health fast enough | 03:15 |
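To see why a busy or blocked worker can be mistaken for a dead one, here is a sketch of a heartbeat staleness rule. It assumes a service is treated as dead once its last update is older than some multiple of `periodic_interval`; the multiplier of 2 and the function `heartbeat_is_stale` are illustrative assumptions, not Senlin's actual code:

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_stale(updated_at, now, periodic_interval=60, multiplier=2):
    """Hypothetical rule: a service counts as dead once its last heartbeat
    is older than multiplier * periodic_interval seconds."""
    return now - updated_at > timedelta(seconds=periodic_interval * multiplier)


now = datetime.now(timezone.utc)
# A busy worker whose last heartbeat landed 90 s ago is still treated as alive.
print(heartbeat_is_stale(now - timedelta(seconds=90), now))   # False
# If its database write has been stuck for 150 s, it is declared dead.
print(heartbeat_is_stale(now - timedelta(seconds=150), now))  # True
```

Under this rule, a worker that is merely slow to write (either theory above) crosses the threshold and can have its locks broken even though it is still running.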
eandersson | Can you let me know if you see any logs from here | 03:16 |
eandersson | https://github.com/openstack/senlin/blob/3b0f21972be0bb067e8c4391a6b77aa8815a0ca2/senlin/conductor/service.py#L155 | 03:16 |
eandersson | The logs would be under the conductor | 03:16 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups https://review.opendev.org/c/openstack/senlin/+/842025 | 03:19 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups https://review.opendev.org/c/openstack/senlin/+/842025 | 03:21 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Move service health update_at check to sql query https://review.opendev.org/c/openstack/senlin/+/842026 | 03:58 |
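The idea behind "move the updated_at check to the SQL query" can be illustrated with an in-memory SQLite table: let the database filter stale rows instead of loading every service and comparing timestamps in Python. The table layout (epoch seconds in `updated_at`) and the helper `stale_service_ids` are simplified assumptions; Senlin's real schema and query differ:

```python
import sqlite3
import time

# Hypothetical, heavily simplified service table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service (id TEXT PRIMARY KEY, updated_at REAL)")

now = time.time()
conn.executemany(
    "INSERT INTO service VALUES (?, ?)",
    [("svc-alive", now - 30), ("svc-dead", now - 300)],
)


def stale_service_ids(conn, now, threshold_seconds):
    """Return IDs of services whose last heartbeat is older than the threshold,
    filtering in SQL rather than in Python."""
    cutoff = now - threshold_seconds
    cur = conn.execute(
        "SELECT id FROM service WHERE updated_at < ? ORDER BY id", (cutoff,)
    )
    return [row[0] for row in cur]


print(stale_service_ids(conn, now, threshold_seconds=120))  # ['svc-dead']
```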
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups https://review.opendev.org/c/openstack/senlin/+/842025 | 03:59 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Move service health update_at check to sql query https://review.opendev.org/c/openstack/senlin/+/842026 | 04:00 |
eandersson | dtruong: If you have some time to spare: the above are just some general ideas for improving things that could go "wrong". ^ | 04:00 |
eandersson | Actually. I misunderstood how that code worked. | 05:38 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 06:05 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 06:32 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 07:06 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 07:16 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 07:19 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 07:38 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 07:46 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning https://review.opendev.org/c/openstack/senlin/+/842025 | 07:54 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Fixed service manage not cleaning properly https://review.opendev.org/c/openstack/senlin/+/842025 | 08:34 |
eandersson | __ministry: Not sure if this fixes your issue, but was able to identify some issues with the current code. https://review.opendev.org/c/openstack/senlin/+/842025 | 08:36 |
opendevreview | Erik Olof Gunnar Andersson proposed openstack/senlin master: Fixed services not cleaning properly on startup https://review.opendev.org/c/openstack/senlin/+/842025 | 08:44 |
__ministry | yep. let me test this code | 10:02 |
eandersson | __ministry: Let me know if you run into any issues. I don't know if this will solve your issue, but I am pretty sure there is a bug with how we clean up dead services that this fix addresses. | 19:48 |