opendevreview | Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 | 03:34 |
opendevreview | Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 | 03:37 |
opendevreview | Satish Patel proposed openstack/openstack-ansible-os_neutron master: Add CentOS-8-Stream OVN support https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/803798 | 05:22 |
depasquale | hi everyone. I am facing this error running a stable/wallaby OSA deployment: https://paste.opendev.org/show/807941/ | 15:15 |
depasquale | it is something in the os-ceilometer.yml playbook | 15:15 |
depasquale | any idea? | 15:15 |
admin1 | never seen that error .. my galera install on 22.2.0 fails on "haproxy_endpoints : Set haproxy service state" with a FileNotFoundError .. logs here: https://gist.githubusercontent.com/a1git/9895beefd1c8680b7dea311781fa1637/raw/ebe44836b87e74e544b51a8bc18ed55a025f12a4/gistfile1.txt | 16:59 |
admin1 | i have done a couple of 22.2.0 installs .. but this one is new to me, and i have no idea how to solve this or what even is wrong | 16:59 |
admin1 | any help .. pointers appreciated .. | 17:00 |
admin1 | all the other setup-host and setup-infra playbooks ran just fine | 17:00 |
admin1 | i have deleted and re-created the galera containers a few times .. no help | 19:53 |
jrosser | depasquale: you need this fix on the ceilometer ansible role https://github.com/openstack/openstack-ansible-os_ceilometer/commit/87fdc3a17a211a3f896dc20c6090021bfa5c10ef | 20:36 |
jrosser | admin1: you have to look at the information it gives you about the stack trace in the error message | 20:47 |
jrosser | that points to here https://github.com/ansible-collections/community.general/blob/main/plugins/modules/net_tools/haproxy.py#L264 | 20:48 |
jrosser | so you can see that it’s failed when trying to open the haproxy unix socket inside the haproxy ansible module | 20:48 |
jrosser | that suggests that the socket isn’t there, which in turn would make me think haproxy has failed to start properly | 20:49 |
jrosser | so start with the haproxy journal | 20:49 |
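For context, a minimal sketch (not from the chat) of what the linked ansible module is doing under the hood: it opens haproxy's admin/stats unix socket and writes commands to it. The socket path below is an assumption (commonly /var/run/haproxy.sock); a deployment may configure it elsewhere via "stats socket" in haproxy.cfg. If the socket file is missing, connect() raises FileNotFoundError, which is the same symptom as in the pasted log.

```python
import socket

# Assumed path to haproxy's admin/stats socket - check "stats socket" in
# haproxy.cfg if your deployment uses a different location.
HAPROXY_SOCKET = "/var/run/haproxy.sock"

def haproxy_command(cmd, path=HAPROXY_SOCKET):
    """Send one command to the haproxy admin socket and return its reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # If haproxy never started (or the path is wrong), connect() raises
    # FileNotFoundError - the failure seen in the delegated task above.
    s.connect(path)
    s.sendall((cmd + "\n").encode())
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks).decode()

if __name__ == "__main__":
    # "show stat" is a standard haproxy admin command listing backends/servers.
    print(haproxy_command("show stat"))
```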
jrosser | I don’t really think it’s anything to do with galera at all | 20:50 |
admin1 | jrosser, thanks for replying this late | 20:56 |
admin1 | i mean on a weekend | 20:56 |
admin1 | the haproxy playbooks run well, haproxy seems to be working | 20:56 |
admin1 | wouldn't the playbooks fix this ? | 20:57 |
jrosser | a bunch of the other roles interact with haproxy, setting backends as active/not active as needed | 20:57 |
jrosser | that’s what’s failing | 20:57 |
admin1 | ok | 20:57 |
jrosser | in this case it’s in the galera role, but the failed tasks are delegated to the haproxy node | 20:58 |
jrosser | it needs to be able to access the socket to communicate with it | 20:58 |
jrosser | and I am thinking that is what’s giving you the “file not found” error | 20:58 |
jrosser | so either haproxy is broken, or the path to the socket is incorrect for some reason | 20:59 |
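A quick sketch (not from the chat) of one way to separate those two failure modes: check whether the expected socket path exists at all and whether it really is a unix socket. The path used below is a placeholder assumption.

```python
import os
import stat

# Placeholder path - substitute whatever "stats socket" is set to in haproxy.cfg
# (or the socket path the ansible task passes to the haproxy module).
SOCKET_PATH = "/var/run/haproxy.sock"

try:
    st = os.stat(SOCKET_PATH)
except FileNotFoundError:
    # Either haproxy is not running on this node or the path is simply wrong.
    print(f"{SOCKET_PATH} does not exist - check the haproxy journal / config")
else:
    if stat.S_ISSOCK(st.st_mode):
        print(f"{SOCKET_PATH} is a unix socket - haproxy appears to have created it")
    else:
        print(f"{SOCKET_PATH} exists but is not a socket - the path is probably misconfigured")
```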
admin1 | the socket will be on the active haproxy server only, right ? | 20:59 |
jrosser | no I think they all have one | 20:59 |
jrosser | haproxy itself doesn’t know if it’s the active one or not | 21:00 |
admin1 | oh yes .. it's on all of them | 21:00 |
admin1 | hmm.. my external LB IP, where cloud.domain.com points, is only active on 1 node, | 21:01 |
admin1 | so in that case, haproxy is only active on 1 node | 21:01 |
admin1 | on the rest, it does not bind | 21:01 |
admin1 | because the IP does not exist there | 21:01 |
admin1 | i see .. | 21:02 |
admin1 | it's up on one and down on the other ( both non-master ) | 21:02 |
admin1 | i mean on the servers where the bind IP is not assigned by keepalived | 21:02 |
admin1 | ok .. so in my case, the external bind is on c2, on c3 haproxy is running fine, but it was down on c1 .. and i think the playbook tried to use the one on c1 .. | 21:04 |
admin1 | one more question .. why does "galera_server : Fail if galera_cluster_name doesnt match provided value" come up when we re-run the galera role with zero changes ? | 21:04 |
jrosser | it’s not really to do with the ip binding | 21:05 |
admin1 | i understood that now | 21:05 |
jrosser | it’s a unix domain socket (looks like a file) | 21:05 |
jrosser | I think deleting and recreating the galera containers has confused the state | 21:06 |
jrosser | there is data written to the nodes, I think, which records whether the cluster is bootstrapped - that will have been done on the first deployment | 21:07 |
jrosser | but if you delete them all then that state is wrong - the expectation in a cloud is that once bootstrapped you try to keep the cluster valid | 21:08 |
jrosser | there are some vars in the galera role to force a re-bootstrap which might help | 21:09 |
admin1 | my thought was deleting all the containers and redoing it again was equivalent to doing a fresh install | 21:10 |
admin1 | i mean all the "galera" containers | 21:10 |
jrosser | it may be | 21:11 |
jrosser | the bootstrap flag is a fact | 21:11 |
jrosser | which may get cached..... blah blah blah | 21:11 |
admin1 | this time i rm -rf the facts :) | 21:11 |
admin1 | before running the galera playbook | 21:11 |
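For reference, a rough sketch of clearing cached facts programmatically, which is what "rm -rf the facts" does by hand. The cache location /etc/openstack_deploy/ansible_facts and the *galera* filename pattern are assumptions here; the real location is whatever fact_caching_connection is set to in the deployment's ansible configuration.

```python
import glob
import os

# Assumed OSA jsonfile fact-cache directory; verify against
# fact_caching_connection in the deployment's ansible configuration.
FACT_CACHE = "/etc/openstack_deploy/ansible_facts"

def clear_cached_facts(host_glob="*galera*"):
    """Remove cached fact files for hosts matching the glob so the next
    playbook run regathers facts (including any cached bootstrap flags)."""
    removed = []
    for path in glob.glob(os.path.join(FACT_CACHE, host_glob)):
        os.remove(path)
        removed.append(path)
    return removed

if __name__ == "__main__":
    for p in clear_cached_facts():
        print("removed", p)
```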
admin1 | hopefully this works this time | 21:11 |
jrosser | fingers crossed - good luck :) | 21:12 |
admin1 | the rest of the infra playbooks ran fine .. i was only stuck on this | 21:12 |
admin1 | it passed the "RUNNING HANDLER [haproxy_endpoints : Set haproxy service state]" step .. and no errors :) | 21:14 |
admin1 | thank you jrosser .. you rock \o/ | 21:14 |