opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Remove old repos for Debian https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/827221 | 02:03 |
---|---|---|
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Use journald logging for RabbitMQ https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/826345 | 02:10 |
jrosser | noonedeadpunk: re. 828932 i am unsure, first it would listen on the external vip with the default hosts layout? and also https://opendev.org/openstack/openstack-manuals/src/branch/master/doc/install-guide/source/shared/edit_hosts_file.txt#L22-L25 | 09:29 |
jrosser | 127.0.1.1 resolving to the hostname feels just wrong anyway? | 09:29 |
mirek186 | hi, could someone help troubleshoot lost connections in mysql? | 12:47 |
mirek186 | random connections to the DB via haproxy fail, and then the deployment fails as well | 12:48 |
mirek186 | mysql --defaults-file=/root/.my.cnf | 12:48 |
mirek186 | ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 11 | 12:48 |
mirek186 | Feb 12 12:14:48 infra1-utility-container-c5a86fc4 ansible-community.mysql.mysql_db[5955]: Invoked with name=['magnum_service'] login_host=172.16.71.20 login_port=3306 config_file=/root/.my.cnf connect_timeout=30 encoding= collation= state=present single_transaction=False quick=True ignore_tables=[] hex_blob=False force=False | 12:49 |
mirek186 | master_data=0 skip_lock_tables=False use_shell=False unsafe_login_password=NOT_LOGGING_PARAMETER restrict_config_file=False check_implicit_admin=False config_overrides_defaults=False login_user=None | 12:49 |
mirek186 | login_password=NOT_LOGGING_PARAMETER login_unix_socket=None client_cert=None client_key=None ca_cert=None check_hostname=None target=None dump_extra_args=None | 12:49 |
mirek186 | I know the issue is around haproxy or galera, I just don't know where to start identifying it | 12:49 |
jrosser | mirek186: first thing I would check is that you are not exceeding the galera max connections limit | 14:01 |
jrosser | also please use paste.opendev.org for debug output | 14:02 |
jrosser | if you’re enabling more services you might need to increase the connection limit | 14:03 |
mirek186 | thanks jrosser, I think that's the issue, looking at haproxy log: Feb 12 14:08:50 srv4-infra-1 haproxy[138066]: Connect() failed for backend galera-back: no free ports. | 14:09 |
jrosser | oh well that feels different maybe | 14:26 |
jrosser | there is a hard limit in the galera config for connections | 14:27 |
jrosser | no free ports suggests running out of ports on the haproxy node to connect to the galera backend | 14:27 |
mirek186 | I found following recommendation from haproxy blog: https://www.haproxy.com/blog/haproxy-high-mysql-request-rate-and-tcp-source-port-exhaustion/ | 14:28 |
mirek186 | to allow quicker reuse of TIME_WAIT ports, plus expand the source port range: the default is anything above 32k, but in a busy env they recommend anything above the reserved 1024 | 14:29 |
jrosser | oh well….. | 14:29 |
mirek186 | i'll re-run the deployment and see what happens | 14:29 |
jrosser | the openstack services should use a connection pool for the db | 14:29 |
jrosser | so you should not see a heavy churn of db connections at haproxy | 14:29 |
mirek186 | It's my first time deploying using openstack-ansible so just trying to fix one error at a time | 14:30 |
jrosser | however, hitting the galera backend connection limit may make haproxy think the backend is down, and that can cause an instant 2x requirement on ports as they fail over to another backend | 14:31 |
jrosser | I would still double check the # connections that galera thinks it has | 14:31 |
mirek186 | I had it set to 4096 | 14:32 |
jrosser | did you get things working ok with the core services before moving on to extras like magnum? | 14:35 |
mirek186 | yes, it seems ok, however as I said those are my first deployments using ansible | 14:37 |
jrosser | ok cool | 14:37 |
mirek186 | In the past all my builds were done using Juju, so I haven't checked all services. Trying to get a clean install of all the components I need | 14:37 |
jrosser | does haproxy think your galera backend goes down? | 14:39 |
jrosser | anyway - I need to weekend :) make a bug on launchpad if you’re really stuck | 14:40 |
jrosser | or generally things are quite active here EU time on weekdays | 14:41 |
admin1 | mirek186, try this in user_variables: galera_max_connections: 4000 | 14:46 |
admin1 | or 1000 .. or 6000 . depends on how big the total cluster is | 14:47 |
mirek186 | thanks guys, I already had it on galera_max_connections: 4096 | 14:50 |
mirek186 | just wiped out all hosts for redeployment. I've added the following two as recommended by the haproxy blog as well. | 14:51 |
mirek186 | openstack_user_kernel_options: | 14:51 |
mirek186 | - { key: 'net.ipv4.ip_local_port_range', value: '1025 65000' } | 14:51 |
mirek186 | - { key: 'net.ipv4.tcp_tw_reuse', value: 1 } | 14:51 |
mirek186 | I also ran the deployment using --forks 10, so maybe if I do the final setup-openstack on defaults I won't hit any limits. Looking at hatop, all galera backends were fine | 14:52 |
admin1 | upgrading from 23 -> 24.0.1 i am getting: virtualenv --no-download --python=python3 --always-copy /openstack/venvs/keystone-24.0.1", "msg": "[Errno 2] No such file or directory: b'virtualenv'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} | 15:13 |
admin1 | doh ! | 15:13 |
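
As a rough illustration of the port range change discussed in the log, the sketch below counts how many source ports each `net.ipv4.ip_local_port_range` setting leaves available. The helper name `usable_source_ports` is hypothetical, and the default range of `32768 60999` is an assumption about a stock Linux kernel; the actual value on the haproxy host should be read from `/proc/sys/net/ipv4/ip_local_port_range`. The count is only an upper bound on concurrent outgoing connections from one source address to a single galera backend address and port, since sockets in TIME_WAIT also occupy ports.

```python
def usable_source_ports(port_range: str) -> int:
    """Count the ports in a 'low high' range string, both ends inclusive."""
    low, high = (int(part) for part in port_range.split())
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {port_range!r}")
    return high - low + 1


# Assumption: stock Linux default, not taken from the log.
DEFAULT_RANGE = "32768 60999"
# Value set in the log through openstack_user_kernel_options.
TUNED_RANGE = "1025 65000"

if __name__ == "__main__":
    for label, port_range in (("default", DEFAULT_RANGE), ("tuned", TUNED_RANGE)):
        print(f"{label}: {usable_source_ports(port_range)} source ports")
```

Under these assumptions the tuned range gives a little over twice as many source ports, which may delay a "no free ports" error but would not address heavy connection churn if that is the underlying cause.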