opendevreview | Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 | 03:34 |
opendevreview | Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 | 03:37 |
opendevreview | Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 | 05:22 |
depasquale | hi everyone. I am facing this error running a stable/wallaby OSA deployment: https://paste.opendev.org/show/807941/ | 15:15 |
depasquale | it is something in the os-ceilometer.yml playbook | 15:15 |
depasquale | any idea? | 15:15 |
admin1 | never seen that error .. my galera install on 22.2.0 fails on "haproxy_endpoints : Set haproxy service state" with a FileNotFoundError .. logs here: https://gist.githubusercontent.com/a1git/9895beefd1c8680b7dea311781fa1637/raw/ebe44836b87e74e544b51a8bc18ed55a025f12a4/gistfile1.txt | 16:59 |
admin1 | i have done a couple of 22.2.0 installs .. but this one is new to me, and i have no idea how to solve this or what even is wrong | 16:59 |
admin1 | any help .. pointers appreciated .. | 17:00 |
admin1 | all the other setup-host and setup-infra playbooks ran just fine | 17:00 |
admin1 | i have deleted and re-created the galera containers a few times .. no help | 19:53 |
jrosser | depasquale: you need this fix on the ceilometer ansible role https://github.com/openstack/openstack-ansible-os_ceilometer/commit/87fdc3a17a211a3f896dc20c6090021bfa5c10ef | 20:36 |
jrosser | admin1: you have to look at the information it gives you about the stack trace in the error message | 20:47 |
jrosser | that points to here https://github.com/ansible-collections/community.general/blob/main/plugins/modules/net_tools/haproxy.py#L264 | 20:48 |
jrosser | so you can see that it’s failed when trying to open the haproxy unix socket inside the haproxy ansible module | 20:48 |
jrosser | that suggests that the socket isn’t there, which in turn would make me think haproxy has failed to start properly | 20:49 |
jrosser | so start with the haproxy journal | 20:49 |
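For context, a minimal sketch (not from the chat) of what the linked ansible module is doing under the hood: it opens haproxy's admin/stats unix socket and writes commands to it. The socket path below is an assumption (commonly /var/run/haproxy.sock); a deployment may configure it elsewhere via "stats socket" in haproxy.cfg. If the socket file is missing, connect() raises FileNotFoundError, which is the same symptom as in the pasted log.

```python
import socket

# Assumed path to haproxy's admin/stats socket - check "stats socket" in
# haproxy.cfg if your deployment uses a different location.
HAPROXY_SOCKET = "/var/run/haproxy.sock"

def haproxy_command(cmd, path=HAPROXY_SOCKET):
    """Send one command to the haproxy admin socket and return its reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # If haproxy never started (or the path is wrong), connect() raises
    # FileNotFoundError - the failure seen in the delegated task above.
    s.connect(path)
    s.sendall((cmd + "\n").encode())
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks).decode()

if __name__ == "__main__":
    # "show stat" is a standard haproxy admin command listing backends/servers.
    print(haproxy_command("show stat"))
```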
jrosser | I don’t really think it’s anything to do with galera at all | 20:50 |
admin1 | jrosser, thanks for replying this late | 20:56 |
admin1 | i mean on a weekend | 20:56 |
admin1 | the haproxy playbooks run well, haproxy seems to be working | 20:56 |
admin1 | wouldn't the playbooks fix this ? | 20:57 |
jrosser | a bunch of the other roles interact with haproxy, setting backends as active/not active as needed | 20:57 |
jrosser | that’s what’s failing | 20:57 |
admin1 | ok | 20:57 |
jrosser | in this case it’s in the galera role, but the failed tasks are delegated to the haproxy node | 20:58 |
jrosser | it needs to be able to access the socket to communicate with it | 20:58 |
jrosser | and I am thinking that is what’s giving you the “file not found” error | 20:58 |
jrosser | so either haproxy is broken, or the path to the socket is incorrect for some reason | 20:59 |
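A quick sketch (not from the chat) of one way to separate those two failure modes: check whether the expected socket path exists at all and whether it really is a unix socket. The path used below is a placeholder assumption.

```python
import os
import stat

# Placeholder path - substitute whatever "stats socket" is set to in haproxy.cfg
# (or the socket path the ansible task passes to the haproxy module).
SOCKET_PATH = "/var/run/haproxy.sock"

try:
    st = os.stat(SOCKET_PATH)
except FileNotFoundError:
    # Either haproxy is not running on this node or the path is simply wrong.
    print(f"{SOCKET_PATH} does not exist - check the haproxy journal / config")
else:
    if stat.S_ISSOCK(st.st_mode):
        print(f"{SOCKET_PATH} is a unix socket - haproxy appears to have created it")
    else:
        print(f"{SOCKET_PATH} exists but is not a socket - the path is probably misconfigured")
```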
admin1 | the socket will be on the active haproxy server only, right ? | 20:59 |
jrosser | no I think they all have one | 20:59 |
jrosser | haproxy itself doesn’t know if it’s the active one or not | 21:00 |
admin1 | oh yes .. it's on all of them | 21:00 |
admin1 | hmm.. my external LB IP, where cloud.domain.com points, is only active on 1 node, | 21:01 |
admin1 | so in that case, haproxy is only active on 1 node | 21:01 |
admin1 | on the rest, it does not bind | 21:01 |
admin1 | because the IP does not exist there | 21:01 |
admin1 | i see .. | 21:02 |
admin1 | it's up on one and down on the other ( both non-master ) | 21:02 |
admin1 | i mean on the servers where the bind IP is not assigned by keepalived | 21:02 |
admin1 | ok .. so in my case, the external bind is on c2, on c3 haproxy is running fine, but it was down on c1 .. and i think the playbook tried to use the one on c1 .. | 21:04 |
admin1 | one more question .. why does "galera_server : Fail if galera_cluster_name doesnt match provided value" come up when we re-run the galera role with zero changes ? | 21:04 |
jrosser | it’s not really to do with the ip binding | 21:05 |
admin1 | i understood that now | 21:05 |
jrosser | it’s a unix domain socket (looks like a file) | 21:05 |
jrosser | I think deleting and recreating the galera containers has confused the state | 21:06 |
jrosser | there is data written to the nodes, I think, which records whether the cluster is bootstrapped - that will have been done on the first deployment | 21:07 |
jrosser | but if you delete them all then that state is wrong - the expectation in a cloud is that once bootstrapped you try to keep the cluster valid | 21:08 |
jrosser | there are some vars in the galera role to force a re-bootstrap which might help | 21:09 |
admin1 | my thought was deleting all the containers and redoing it again was equivalent to doing a fresh install | 21:10 |
admin1 | i mean all the "galera" containers | 21:10 |
jrosser | it may be | 21:11 |
jrosser | the bootstrap flag is a fact | 21:11 |
jrosser | which may get cached..... blah blah blah | 21:11 |
admin1 | this time i rm -rf the facts :) | 21:11 |
admin1 | before running the galera playbook | 21:11 |
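For reference, a rough sketch of clearing cached facts programmatically, which is what "rm -rf the facts" does by hand. The cache location /etc/openstack_deploy/ansible_facts and the *galera* filename pattern are assumptions here; the real location is whatever fact_caching_connection is set to in the deployment's ansible configuration.

```python
import glob
import os

# Assumed OSA jsonfile fact-cache directory; verify against
# fact_caching_connection in the deployment's ansible configuration.
FACT_CACHE = "/etc/openstack_deploy/ansible_facts"

def clear_cached_facts(host_glob="*galera*"):
    """Remove cached fact files for hosts matching the glob so the next
    playbook run regathers facts (including any cached bootstrap flags)."""
    removed = []
    for path in glob.glob(os.path.join(FACT_CACHE, host_glob)):
        os.remove(path)
        removed.append(path)
    return removed

if __name__ == "__main__":
    for p in clear_cached_facts():
        print("removed", p)
```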
admin1 | hopefully this works this time | 21:11 |
jrosser | fingers crossed - good luck :) | 21:12 |
admin1 | the rest of the infra playbooks ran fine .. i was only stuck on this | 21:12 |
admin1 | it passed the "RUNNING HANDLER [haproxy_endpoints : Set haproxy service state]" step .. and no errors :) | 21:14 |
admin1 | thank you jrosser .. you rock \o/ | 21:14 |