noonedeadpunk | platta: ok, so to literally reproduce the error, you can attach to the utility container (ie `lxc-attach -n ark-utility-container-e8c00157`) and run `mysql` command | 06:44 |
noonedeadpunk | I don't think it's a networking issue, given that haproxy considers galera containers as alive | 06:44 |
noonedeadpunk | do you see anything in mariadb logs? | 06:45 |
noonedeadpunk | also it could be some kind of tls mess as well - like when the client wants to connect through TLS but mariadb is configured without tls support | 06:46 |
noonedeadpunk | so if you played with some vars related to tls - it could be related to that | 06:47 |
grauzikas | Hello, finally yesterday after using git clone -b 2024.1 i got it installed. with master there were errors. after installing i instantly started implementing ceph into it and ran into issues again :). i found in the manuals https://docs.openstack.org/openstack-ansible-ceph_client/latest/config-from-file.html that i should create the directory /etc/openstack_deploy/ceph-keyrings/, put the keys there and name them like this: admin.keyring, | 07:03 |
grauzikas | cinder-backup.keyring, cinder.keyring, glance.keyring, manila.keyring, nova.keyring, but it is unclear about the ceph.conf file and which playbooks should be rerun. i tried rerunning setup-openstack.yml yesterday, but it seems it did not work. https://pastebin.com/fVjZ5qN9 . as you can see i commented out the keyring lines for now… also i have a management network where all ips are nated to get access to the internet, a ceph client network and | 07:03 |
grauzikas | a ceph cluster network. as i understand it, for storage_hosts i should use the ceph client ip addresses and for storage-infra_hosts the management ips? | 07:03 |
noonedeadpunk | grauzikas: I usually place ceph.conf in the same folder and then add to vars smth like that: `ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"` | 07:05 |
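A minimal sketch of the layout described above, assuming an externally deployed ceph cluster and the config-from-file mechanism from the linked ceph_client docs; the directory path and keyring names are the ones grauzikas lists, and only need to match the cephx users your services actually use:

```yaml
# /etc/openstack_deploy/user_variables.yml (sketch)
ceph_keyrings_dir: /etc/openstack_deploy/ceph-keyrings
# ceph.conf sits in the same folder as the keyrings
# (admin.keyring, glance.keyring, cinder.keyring, cinder-backup.keyring,
#  nova.keyring, manila.keyring) and is read in as a single blob:
ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"
```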
jrosser | grauzikas: master is the development (unreleased) branch which will form the next release - don't use that unless you are working on code for the next release | 07:06 |
noonedeadpunk | also, storage_hosts is where cinder-volume is going to spawn https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/env.d/cinder.yml#L38-L43 | 07:06 |
noonedeadpunk | I'm not sure if that's what you wanted or not | 07:07 |
noonedeadpunk | as to spawn ceph on these hosts using ceph-ansible you'd need to define ceph-mon_hosts and ceph-osd_hosts at the very least | 07:08 |
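For the path where OSA drives ceph-ansible itself, a hedged openstack_user_config.yml fragment showing the groups mentioned here next to the cinder ones; host names and IPs are placeholders:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (fragment, IPs are placeholders)
storage-infra_hosts:     # cinder-api containers
  infra1:
    ip: 172.29.236.11
storage_hosts:           # where cinder-volume will spawn
  storage1:
    ip: 172.29.236.21
ceph-mon_hosts:          # only needed if OSA/ceph-ansible is deploying ceph
  ceph1:
    ip: 172.29.236.31
ceph-osd_hosts:
  ceph1:
    ip: 172.29.236.31
```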
jrosser | i think also be sure to read the documentation / release notes to manage expectations about ceph deployed with OSA | 07:08 |
grauzikas | globally, or for every service like rbd_ceph_conf, cinder_backup_ceph_conf, glance_rbd_store_ceph_conf and so on? or is one line enough: ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}" | 07:10 |
noonedeadpunk | ceph.conf is usually very minimalistic, having just monitors and the cluster uuid - so it's kind of the same for glance/nova/cinder | 07:12 |
jrosser | why use config-from-file if osa/ceph_ansible is deploying the ceph cluster? | 07:15 |
jrosser | does that make sense at all? | 07:15 |
grauzikas | i mean what version should i use? https://pastebin.com/RZKt4A5i | 07:19 |
grauzikas | jrosser i was reading manuals and somewhere found that ceph is not part of osa and i should deploy it manually | 07:20 |
grauzikas | thats why i deployed ceph manually, separate from osa | 07:20 |
jrosser | well thats not entirely true | 07:20 |
grauzikas | yes yesterday i found ceph playbooks :) so was confused :) | 07:20 |
jrosser | https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html | 07:21 |
jrosser | openstack-ansible will call out to ceph-ansible to deploy ceph if you wish | 07:21 |
jrosser | but there is a big red warning on that page that we don't test upgrades for that | 07:21 |
jrosser | if you deployed your own ceph cluster then i would question needing storage_hosts? | 07:23 |
grauzikas | yes storage hosts is my mistake :) | 07:23 |
jrosser | the need for config-from-file for an externally deployed ceph cluster is to cover the use case where the OSA ceph_client role cannot ssh to the ceph monitor to retrieve the config | 07:24 |
jrosser | this typically occurs in deployments where there is a "cloud team" and a "storage team" and organisational boundaries prevent that ssh access being possible | 07:24 |
jrosser | if you look here https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html#integration-with-ceph | 07:25 |
jrosser | there are 3 bullet points describing 3 different scenarios under which you might need to connect to a ceph cluster | 07:26 |
noonedeadpunk | jrosser: config-from-file if osa/ceph_ansible -> that depends. If you wanna have a separate ceph cluster per AZ - that's the only way kinda | 07:26 |
jrosser | yeah | 07:27 |
jrosser | but this is a pretty good example of how OSA is a toolbox and you need to pick/choose how you want it to work | 07:27 |
noonedeadpunk | as then you can do `ceph_cluster_name: "ceph-{{ az_name }}"` and `ceph_keyrings_dir: "/etc/openstack_deploy/ceph/{{ ceph_cluster_name }}"` | 07:27 |
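A sketch of that per-AZ pattern, assuming az_name is something you already define per host or group in your own inventory:

```yaml
# user_variables / per-AZ group_vars (sketch)
ceph_cluster_name: "ceph-{{ az_name }}"
ceph_keyrings_dir: "/etc/openstack_deploy/ceph/{{ ceph_cluster_name }}"
ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"
```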
grauzikas | ok thanks, probably will try to use osa and its ceph ansible implementation… so now need to destroy current cluster :) | 07:47 |
grauzikas | btw about logs… | 07:48 |
grauzikas | now i can see logs only with journalctl -u service, and inside the lxc i can't find logs in a file https://pastebin.com/92bhD4rd | 07:50 |
grauzikas | in the configs i can see, for example for the glance api: https://pastebin.com/A4prQ3Qc | 07:52 |
grauzikas | no logging enabled | 07:52 |
grauzikas | i was looking for centralized logs, then went to their github and found that it is not supported anymore | 07:54 |
grauzikas | and i was not able to find a logging playbook | 07:54 |
jrosser | grauzikas: all logging goes to systemd journal | 07:55 |
jrosser | we have tried to migrate everything away from log files to the journal | 07:58 |
noonedeadpunk | grauzikas: well, our suggestion is usually different - have a ceph cluster independent from OSA :D | 07:59 |
noonedeadpunk | so I think you're on the right track now with having a standalone ceph | 07:59 |
jrosser | grauzikas: for centralised logging we do have support for journald-remote https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-infrastructure.yml#L60 | 08:00 |
jrosser | though most people overlay their own existing logging on top of openstack-ansible, with whichever collector they favour | 08:01 |
jrosser | for example, the elastic stack journal collector is able to get the journal entries for all the containers just by installing it on the host | 08:01 |
jrosser | but we don't provide any of this built in by default, as operators generally have their own individual preference for log collection | 08:02 |
grauzikas | ok thanks | 08:04 |
jrosser | a bunch of the components have easy ways to turn on prometheus exporters so you can integrate with metrics collection | 08:05 |
grauzikas | in the case of ceph integration, if i remove the storage_hosts section syntax-check gives an error, so probably i should leave it, but without any hosts? | 08:37 |
noonedeadpunk | we're using vector for kind of same purpose and it can deal with journals nicely as well | 08:37 |
noonedeadpunk | and then there was also journal-to-gelf for graylog.. | 08:38 |
noonedeadpunk | but not sure if that's really maintained today.... | 08:38 |
grauzikas | for example, if i want to give installing ceph via the ceph-ansible integrated in OSA a try, how do i tell OSA the public_network and cluster_network of ceph? | 09:23 |
grauzikas | probably public_network will automatically be taken from whatever network ceph-mon and ceph-osd are on | 09:24 |
noonedeadpunk | same as with plain ceph-ansible - define vars public_network and cluster_network | 09:25 |
noonedeadpunk | but then you'll need to pass cluster_network to the containers (in case you're going the LXC path) | 09:25 |
grauzikas | isn't the cluster network used only for sync between ceph nodes, and the client one to connect to all the apis and so on? | 09:27 |
noonedeadpunk | smth like that https://paste.openstack.org/show/buNvku6lKxeus4K9qOW2/ | 09:28 |
noonedeadpunk | the cluster network is used to sync between nodes, yes | 09:28 |
noonedeadpunk | and the public one is used by clients to connect to osds and mons | 09:29 |
noonedeadpunk | cluster is kind of optional and can be the same as public one | 09:29 |
noonedeadpunk | but if you wanna split them between different interfaces - that might make sense as well | 09:29 |
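A hedged example of the two ceph-ansible networking vars being discussed; the CIDRs are placeholders, and cluster_network can simply be omitted (or set equal to public_network) if you don't split the traffic:

```yaml
# /etc/openstack_deploy/user_variables.yml (sketch, CIDRs are placeholders)
public_network: 172.29.244.0/22    # clients <-> mons/osds
cluster_network: 172.29.248.0/22   # osd <-> osd replication; optional
```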
grauzikas | it's simply to have the possibility to use slower nics like 10G and not overload them with internal syncs | 09:34 |
noonedeadpunk | well, I would even say to avoid issues with internal operations due to jammed throughput... | 09:54 |
noonedeadpunk | but yeah, you're kinda right | 09:55 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Verify OS for containers installation https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 12:00 |
platta | noonedeadpunk: I was able to run the same `mysql` command pointing directly at the galera container successfully. So, I think you're right. The network connectivity is fine, but there's something happening negotiating the connection through HAProxy. I don't see anything in the MariaDB logs, but I do see the HAProxy logs show the attempted | 12:28 |
platta | connection and a termination state of SD, which looks like it means there was an error. | 12:28 |
platta | I'm going to go through my configuration again to see what might be getting in the way. The majority of my config is pulled from what AIO had, so I have to admit I don't understand all of the settings I'm applying. | 12:29 |
noonedeadpunk | so haproxy does quite stupid L4 balancing | 12:31 |
noonedeadpunk | but also haproxy does check backends based on the service running on 9100 (or smth like that) | 12:31 |
noonedeadpunk | and that service is quite strict on haproxy source IP | 12:32 |
noonedeadpunk | so it expects haproxy to talk with them not through the internal VIP but through its management address | 12:32 |
noonedeadpunk | there's a quite widespread mistake of adding the internal keepalived cidr as /24 instead of /32, which makes haproxy talk through the wrong src IP | 12:33 |
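A sketch of that pitfall using the haproxy/keepalived VIP variables from user_variables; the addresses mirror the AIO-style layout discussed later in this log:

```yaml
# wrong: adding the VIP with a /24 can make haproxy use the VIP as source
# address when talking to the galera backends, which the backend check rejects
# haproxy_keepalived_internal_vip_cidr: "172.29.236.101/24"

# right: the VIP is only a /32 alias on the internal interface
haproxy_keepalived_internal_vip_cidr: "172.29.236.101/32"
haproxy_keepalived_internal_interface: br-mgmt
```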
platta | I don't think I have any settings related to keepalived, the comments made it seem optional. | 12:38 |
noonedeadpunk | it also depends on how many hosts you have, as keepalived is enabled only when there's more than 1 host with haproxy | 12:43 |
platta | Ah, and I do have a single host. Originally I left HAProxy out entirely because of that, but then I got other errors. | 12:44 |
platta | https://opendev.org/openstack/openstack-ansible/commit/0f521b5d6d848761d5887389a067bc37bc3909ea Wondering if this could be it. My settings have it set to 0.0.0.0/0, which I would think should be wide open, but maybe I need to explicitly specify the internal load balancer ip of haproxy. | 12:51 |
noonedeadpunk | well, I'd guess that then haproxy would mark backend as down | 13:04 |
platta | Ah, good point. I'm looking for ways to up the logging so I can get more insight into where the failure is. | 13:18 |
noonedeadpunk | but are you sure that haproxy shows it as healthy? | 13:27 |
noonedeadpunk | can you share the output of "echo 'show stat' | nc -U /run/haproxy.stat | grep galera"? | 13:28 |
platta | galera-front-1,FRONTEND,,,0,2,4096,4,108,540,0,0,0,,,,,OPEN,,,,,,,,,1,6,0,,,,0,0,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,tcp,,0,1,4,,0,0,0,,,,,,,,,,,0,,,,,,,,,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, | 14:11 |
platta | galera-back,ark-galera-container-d769107e,0,0,0,1,,3,108,540,,0,,0,3,0,0,UP,1,1,0,3,1,140,4213,,1,7,1,,3,,2,0,,1,L7OK,200,95,,,,,,,,,,,0,3,,,,,4,,,0,0,0,0,,,,Layer7 check passed,,3,3,5,,,,172.29.236.166:3306,,tcp,,,,,,,,0,3,0,,,0,,0,0,0,0,0,0,0,0,1,1,,,,-,0,0,0,,,,,,,,,,,,,,,,,,,,,, | 14:11 |
platta | galera-back,BACKEND,0,0,0,1,410,4,108,540,0,0,,1,3,0,0,UP,1,1,0,,1,140,4213,,1,7,0,,3,,1,0,,1,,,,,,,,,,,,,,0,3,0,0,0,0,4,,,0,0,0,0,,,,,,,,,,,,,,tcp,leastconn,,,,,,,0,3,0,,,,,0,0,0,0,0,,,,,1,0,0,0,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3680,3680,434372,196168,0,0, | 14:11 |
platta | Oh my, sorry I didn't realize how much text that was, I would've thrown it in pastern. | 14:11 |
platta | *pastebin | 14:11 |
noonedeadpunk | huh, yeah, looks healthy | 14:14 |
noonedeadpunk | and do you have some `ssl` settings in /root/.my.cnf inside utility container? | 14:14 |
noonedeadpunk | would it help commenting these out? | 14:14 |
platta | I was able to increase the verbosity of MariaDB's logging and got this: [Warning] Aborted connection 6407 to db: 'unconnected' user: 'unauthenticated' host: '172.29.236.100' (This connection closed normally without authentication) | 14:19 |
platta | So, the connectivity is fine, I would assume even the SSL and TLS portions, but MariaDB is rejecting the authentication. | 14:20 |
noonedeadpunk | hm | 14:20 |
noonedeadpunk | that sounds like smth we were fighting on master | 14:21 |
noonedeadpunk | and that was related to certificate verification failure | 14:21 |
platta | I'm curious about that IP, my config specifies the internal load balancer IP should be 172.29.236.101 | 14:21 |
noonedeadpunk | https://jira.mariadb.org/browse/CONC-712 | 14:22 |
noonedeadpunk | yeah | 14:22 |
noonedeadpunk | but it's LB IP | 14:22 |
noonedeadpunk | but the host should access mariadb by its management address I guess | 14:23 |
noonedeadpunk | (I can be mixing up things) | 14:23 |
platta | br-mgmt is set up to have two IPs, 100 and 101, in my config. | 14:23 |
noonedeadpunk | but this really sounds like some SSL verification issue | 14:23 |
noonedeadpunk | which can be the case if you're trying to access via unexpected IP | 14:23 |
platta | I'll take a look at that ticket and see what I can find out. I'd be happy if I could end up helping resolve a known issue! | 14:25 |
noonedeadpunk | well, that's an issue with mariadb 11.4 | 14:26 |
noonedeadpunk | where they enforce usage of TLS for socket connections | 14:27 |
noonedeadpunk | but it also errors with `[Warning] Aborted connection 11 to db: 'unconnected' user: 'unauthenticated' host: 'localhost' (This connection closed normally without authentication)` | 14:27 |
noonedeadpunk | so I'd really try to connect to mariadb without tls from the utility container first | 14:27 |
noonedeadpunk | and check which IP the connections to mariadb are coming from after all, and whether that IP is in the cert SAN | 14:28 |
platta | Oh wow, it’s been ages since I’ve thought about cert SANs. I’ll work on checking all those things. Thank you. | 14:30 |
noonedeadpunk | platta: so that's the variable which defines what certificate will be generated for mariadb: https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L242 | 14:33 |
noonedeadpunk | and galera_address is internal_lb_vip_address | 14:34 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/infra.yml#L51 | 14:34 |
noonedeadpunk | so yeah, your concern about vip being 172.29.236.101 and connection happening from 172.29.236.100 is valid | 14:35 |
noonedeadpunk | platta: ^ | 14:41 |
platta | noonedeadpunk: I'm not seeing the error described in the bug report when I try to reproduce, so I'm thinking it's the vip discrepancy. The only reason I have two IPs defined is because that's what was in either AIO or sample configs. With a single node cloud, there doesn't seem to be much of a reason to assign two IPs, is there? | 14:55 |
noonedeadpunk | so on multinode deployment, you still have a management IP and then a VIP managed by keepalived | 14:57 |
noonedeadpunk | but this VIP is supposed to be added as /32 alias to the interface | 14:58 |
noonedeadpunk | but I'm somehow getting a bit confused right now... | 14:59 |
platta | This may be my mistake, I think. The dual IP config I have came from AIO: https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/prepare_networking.yml. I think in at least one other place in my config, I'm referencing 101 when I should be referencing 100. | 15:01 |
platta | Yep, "internal_lb_vip_address: 172.29.236.101" | 15:02 |
platta | I'm rather annoyed with myself now for missing that. If I change the config and re-run the playbooks should that repair it? | 15:03 |
noonedeadpunk | frankly - for the vip address I'd suggest just adding keepalived to mimic a multinode env, as this ip is not supposed to be added by hand anyway | 15:03 |
noonedeadpunk | well. you might need to run with `-e pki_regen_cert=true` | 15:04 |
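A sketch of what that could look like; the value follows platta's own conclusion above (it may not be the right choice for every layout), and the playbook shown is just one way to re-run the galera bits with certificate regeneration:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (sketch)
global_overrides:
  # must match the address services/haproxy actually use to reach mariadb,
  # since the generated certificate SAN is built from it
  internal_lb_vip_address: 172.29.236.100

# then re-run, e.g.:
#   openstack-ansible galera-install.yml -e pki_regen_cert=true
```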
platta | Ok. I will try both ways to see how everything behaves. | 15:05 |
noonedeadpunk | you can set `haproxy_use_keepalived: true` to force having keepalived | 15:06 |
platta | And I will use the other configuration options from the comments in user_variables.yml | 15:08 |
jrosser | platta: the 100 / 101 thing on an AIO is because in some situations the backend is on (lets say) port 5000 and needs to bind to an IP - the internal vip also wants to bind to port 5000, so the VIP and backends have to use unique IP/port combinations | 15:57 |
jrosser | this is not so relevant for an LXC based deployment, but is very much an issue for deployments without containers where everything is collapsed onto the host | 15:58 |
jrosser | that's why there is the distinction between 100 and 101 addresses in the AIO config | 15:58 |
jrosser | one is "things bound on this host" the other is "things bound on the internal endpoint" | 15:59 |
jrosser | just happens that for an AIO those are in the same place | 15:59 |
platta | Ah, that makes sense. | 16:09 |
platta | Trying with the `-e pki_regen_cert=true` option without keepalived settings didn't work. Forcing keepalived now. Can you sanity check the settings I'm using? external_vip_cidr: physical IP of the machine, /32. internal_vip_cidr: 172.29.236.100/32. external_interface: physical device (not sure if it should be that or br-vlan). internal_interface: | 16:14 |
platta | br-mgmt. | 16:14 |
platta | The physical NIC has no IP, br-vlan does. I'm still trying to grow my understanding of how all the virtual network/bridging concepts work. | 16:15 |
noonedeadpunk | well. keepalived won't fix anything kind of | 16:18 |
platta | Ah, ok. I'm going to re-image the server and start fresh with the updated config just to be sure I haven't mixed something up during all my troubleshooting. | 16:20 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Remove excessive bindings for uWSGI https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/924945 | 20:54 |
opendevreview | Merged openstack/openstack-ansible master: Use haproxy_endpoint_manage role from osa collection rather than common-tasks https://review.opendev.org/c/openstack/openstack-ansible/+/923368 | 20:58 |
platta | No luck. I tried a few different configurations, re-imaging in between. Maybe there's still something I'm missing that needs to be changed or removed. Here's my openstack_user_config and user_variables. If someone could give me a fresh set of eyes on it, I'd appreciate it: https://pastebin.com/LNVQ3v9v | 21:28 |
opendevreview | Merged openstack/openstack-ansible-os_neutron stable/2023.1: Correct 'neutron-policy-override' tag https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/925735 | 23:08 |