noonedeadpunk | platta: ok, so to literally reproduce the error, you can attach to the utility container (ie `lxc-attach -n ark-utility-container-e8c00157`) and run `mysql` command | 06:44 |
noonedeadpunk | I don't think it's a networking issue, given that haproxy considers galera containers as alive | 06:44 |
noonedeadpunk | do you see anything in mariadb logs? | 06:45 |
noonedeadpunk | also it could be some kind of tls mess as well - like when the client wants to connect through TLS but mariadb is configured without tls support | 06:46 |
noonedeadpunk | so if you played with some vars related to tls - it could be related to that | 06:47 |
grauzikas | Hello, finally yesterday after using git clone -b 2024.1 i got it installed. with master there were errors. after installing i instantly started implementing ceph into it and ran into issues again :). i found in the manuals https://docs.openstack.org/openstack-ansible-ceph_client/latest/config-from-file.html that i should create the directory /etc/openstack_deploy/ceph-keyrings/, put the keys there and name them like this: admin.keyring, | 07:03 |
grauzikas | cinder-backup.keyring, cinder.keyring, glance.keyring, manila.keyring, nova.keyring, but it is unclear about the ceph.conf file and which playbooks should be rerun. i tried rerunning setup-openstack.yml yesterday, but it seems it did not work. https://pastebin.com/fVjZ5qN9 . as you can see i commented out the keyring lines for now… also i have a management network where all ips are nated to get access to the internet, a ceph client network and | 07:03 |
grauzikas | a ceph cluster network. as i understand it, for storage_hosts i should use the ceph client ip addresses and for storage-infra_hosts the management ips? | 07:03 |
noonedeadpunk | grauzikas: I usually place ceph.conf in the same folder and then add to vars smth like that: `ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"` | 07:05 |
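A minimal sketch of the layout described above, assuming an externally deployed ceph cluster and the config-from-file mechanism from the linked ceph_client docs; the directory path and keyring names are the ones grauzikas lists, and only need to match the cephx users your services actually use:

```yaml
# /etc/openstack_deploy/user_variables.yml (sketch)
ceph_keyrings_dir: /etc/openstack_deploy/ceph-keyrings
# ceph.conf sits in the same folder as the keyrings
# (admin.keyring, glance.keyring, cinder.keyring, cinder-backup.keyring,
#  nova.keyring, manila.keyring) and is read in as a single blob:
ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"
```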
jrosser | grauzikas: master is the development (unreleased) branch which will form the next release - don't use that unless you are working on code for the next release | 07:06 |
noonedeadpunk | also, storage_hosts is where cinder-volume is going to spawn https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/env.d/cinder.yml#L38-L43 | 07:06 |
noonedeadpunk | I'm not sure if that's what you wanted or not | 07:07 |
noonedeadpunk | as to spawn ceph on these hosts using ceph-ansible you'd need to define ceph-mon_hosts and ceph-osd_hosts at the very least | 07:08 |
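For the path where OSA drives ceph-ansible itself, a hedged openstack_user_config.yml fragment showing the groups mentioned here next to the cinder ones; host names and IPs are placeholders:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (fragment, IPs are placeholders)
storage-infra_hosts:     # cinder-api containers
  infra1:
    ip: 172.29.236.11
storage_hosts:           # where cinder-volume will spawn
  storage1:
    ip: 172.29.236.21
ceph-mon_hosts:          # only needed if OSA/ceph-ansible is deploying ceph
  ceph1:
    ip: 172.29.236.31
ceph-osd_hosts:
  ceph1:
    ip: 172.29.236.31
```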
jrosser | i think also be sure to read the documentation / release notes to manage expectations about ceph deployed with OSA | 07:08 |
grauzikas | globally, or for every service like rbd_ceph_conf, cinder_backup_ceph_conf, glance_rbd_store_ceph_conf and so on? or is one line enough: ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}" | 07:10 |
noonedeadpunk | ceph.conf is usually very minimalistic, having just monitors and the cluster uuid - so it's kind of the same for glance/nova/cinder | 07:12 |
jrosser | why use config-from-file if osa/ceph_ansible is deploying the ceph cluster? | 07:15 |
jrosser | does that make sense at all? | 07:15 |
grauzikas | i mean what version should i use? https://pastebin.com/RZKt4A5i | 07:19 |
grauzikas | jrosser i was reading manuals and somewhere found that ceph is not part of osa and i should deploy it manually | 07:20 |
grauzikas | thats why i deployed ceph manually, separate from osa | 07:20 |
jrosser | well thats not entirely true | 07:20 |
grauzikas | yes yesterday i found ceph playbooks :) so was confused :) | 07:20 |
jrosser | https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html | 07:21 |
jrosser | openstack-ansible will call out to ceph-ansible to deploy ceph if you wish | 07:21 |
jrosser | but there is a big red warning on that page that we don't test upgrades for that | 07:21 |
jrosser | if you deployed your own ceph cluster then i would question needing storage_hosts? | 07:23 |
grauzikas | yes storage hosts is my mistake :) | 07:23 |
jrosser | the need for config-from-file for an externally deployed ceph cluster is to cover the use case where the OSA ceph_client role cannot ssh to the ceph monitor to retrieve the config | 07:24 |
jrosser | this typically occurs in deployments where there is a "cloud team" and a "storage team" and organisational boundaries prevent that ssh access being possible | 07:24 |
jrosser | if you look here https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html#integration-with-ceph | 07:25 |
jrosser | there are 3 bullet points describing 3 different scenarios under which you might need to connect to a ceph cluster | 07:26 |
noonedeadpunk | jrosser: config-from-file if osa/ceph_ansible -> that depends. If you wanna have a separate ceph cluster per AZ - that's the only way kinda | 07:26 |
jrosser | yeah | 07:27 |
jrosser | but this is a pretty good example of how OSA is a toolbox and you need to pick/choose how you want it to work | 07:27 |
noonedeadpunk | as then you can do `ceph_cluster_name: "ceph-{{ az_name }}"` and `ceph_keyrings_dir: "/etc/openstack_deploy/ceph/{{ ceph_cluster_name }}"` | 07:27 |
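A sketch of that per-AZ pattern, assuming az_name is something you already define per host or group in your own inventory:

```yaml
# user_variables / per-AZ group_vars (sketch)
ceph_cluster_name: "ceph-{{ az_name }}"
ceph_keyrings_dir: "/etc/openstack_deploy/ceph/{{ ceph_cluster_name }}"
ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"
```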
grauzikas | ok thanks, probably will try to use osa and its ceph ansible implementation… so now need to destroy current cluster :) | 07:47 |
grauzikas | btw about logs… | 07:48 |
grauzikas | now i can see logs only with journalctl -u service, and inside the lxc i can't find logs in a file https://pastebin.com/92bhD4rd | 07:50 |
grauzikas | in the configs i can see, for example for the glance api: https://pastebin.com/A4prQ3Qc | 07:52 |
grauzikas | no logging enabled | 07:52 |
grauzikas | i was looking for centralized logs, then went to their github and found that it is not supported anymore | 07:54 |
grauzikas | and i was not able to find a logging playbook | 07:54 |
jrosser | grauzikas: all logging goes to systemd journal | 07:55 |
jrosser | we have tried to migrate everything away from log files to the journal | 07:58 |
noonedeadpunk | grauzikas: well, our suggestion is usually different - have a ceph cluster independent from OSA :D | 07:59 |
noonedeadpunk | so I think you're on the right track now with having a standalone ceph | 07:59 |
jrosser | grauzikas: for centralised logging we do have support for journald-remote https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-infrastructure.yml#L60 | 08:00 |
jrosser | though most people overlay their own existing logging on top of openstack-ansible, with whichever collector they favour | 08:01 |
jrosser | for example, the elastic stack journal collector is able to get the journal entries for all the containers just by installing it on the host | 08:01 |
jrosser | but we don't provide any of this built in by default, as operators generally have their own individual preference for log collection | 08:02 |
grauzikas | ok thanks | 08:04 |
jrosser | a bunch of the components have easy ways to turn on prometheus exporters so you can integrate with metrics collection | 08:05 |
grauzikas | in the case of ceph integration, if i remove the storage_hosts section syntax-check gives an error, so probably i should leave it, but without any hosts? | 08:37 |
noonedeadpunk | we're using vector for kind of same purpose and it can deal with journals nicely as well | 08:37 |
noonedeadpunk | and then there was also journal-to-gelf for graylog.. | 08:38 |
noonedeadpunk | but not sure if that's really maintained today.... | 08:38 |
grauzikas | for example, if i want to give installing ceph via the ceph-ansible integrated in OSA a try, how do i tell OSA the public_network and cluster_network of ceph? | 09:23 |
grauzikas | probably public_network will automatically be taken from whatever network ceph-mon and ceph-osd are on | 09:24 |
noonedeadpunk | same as with plain ceph-ansible - define vars public_network and cluster_network | 09:25 |
noonedeadpunk | but then you'll need to pass cluster_network to the containers (in case you're going the LXC path) | 09:25 |
grauzikas | isn't the cluster network used only for sync between ceph nodes, and the client one to connect to all the apis and so on? | 09:27 |
noonedeadpunk | smth like that https://paste.openstack.org/show/buNvku6lKxeus4K9qOW2/ | 09:28 |
noonedeadpunk | the cluster network is used to sync between nodes, yes | 09:28 |
noonedeadpunk | and the public one is used by clients to connect to osds and mons | 09:29 |
noonedeadpunk | cluster is kind of optional and can be the same as public one | 09:29 |
noonedeadpunk | but if you wanna split them between different interfaces - that might make sense as well | 09:29 |
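A hedged example of the two ceph-ansible networking vars being discussed; the CIDRs are placeholders, and cluster_network can simply be omitted (or set equal to public_network) if you don't split the traffic:

```yaml
# /etc/openstack_deploy/user_variables.yml (sketch, CIDRs are placeholders)
public_network: 172.29.244.0/22    # clients <-> mons/osds
cluster_network: 172.29.248.0/22   # osd <-> osd replication; optional
```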
grauzikas | it's simply to have the possibility to use slower nics like 10G and not overload them with internal syncs | 09:34 |
noonedeadpunk | well, I would even say to avoid issues with internal operations due to jammed throughput... | 09:54 |
noonedeadpunk | but yeah, you're kinda right | 09:55 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Verify OS for containers installation https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 12:00 |
platta | noonedeadpunk: I was able to run the same `mysql` command pointing directly at the galera container successfully. So, I think you're right. The network connectivity is fine, but there's something happening negotiating the connection through HAProxy. I don't see anything in the MariaDB logs, but I do see the HAProxy logs show the attempted | 12:28 |
platta | connection and a termination state of SD, which looks like it means there was an error. | 12:28 |
platta | I'm going to go through my configuration again to see what might be getting in the way. The majority of my config is pulled from what AIO had, so I have to admit I don't understand all of the settings I'm applying. | 12:29 |
noonedeadpunk | so haproxy does quite stupid L4 balancing | 12:31 |
noonedeadpunk | but also haproxy does check backends based on the service running on 9100 (or smth like that) | 12:31 |
noonedeadpunk | and that service is quite strict on haproxy source IP | 12:32 |
noonedeadpunk | so it expects haproxy to talk with them not through the internal VIP but through its management address | 12:32 |
noonedeadpunk | there's a quite widespread mistake of adding the internal keepalived cidr as /24 instead of /32, which makes haproxy talk through the wrong src IP | 12:33 |
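A sketch of that pitfall using the haproxy/keepalived VIP variables from user_variables; the addresses mirror the AIO-style layout discussed later in this log:

```yaml
# wrong: adding the VIP with a /24 can make haproxy use the VIP as source
# address when talking to the galera backends, which the backend check rejects
# haproxy_keepalived_internal_vip_cidr: "172.29.236.101/24"

# right: the VIP is only a /32 alias on the internal interface
haproxy_keepalived_internal_vip_cidr: "172.29.236.101/32"
haproxy_keepalived_internal_interface: br-mgmt
```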
platta | I don't think I have any settings related to keepalived, the comments made it seem optional. | 12:38 |
noonedeadpunk | it also depends on how many hosts you have, as keepalived is enabled only when there's more than 1 host with haproxy | 12:43 |
platta | Ah, and I do have a single host. Originally I left HAProxy out entirely because of that, but then I got other errors. | 12:44 |
platta | https://opendev.org/openstack/openstack-ansible/commit/0f521b5d6d848761d5887389a067bc37bc3909ea Wondering if this could be it. My settings have it set to 0.0.0.0/0, which I would think should be wide open, but maybe I need to explicitly specify the internal load balancer ip of haproxy. | 12:51 |
noonedeadpunk | well, I'd guess that then haproxy would mark backend as down | 13:04 |
platta | Ah, good point. I'm looking for ways to up the logging so I can get more insight into where the failure is. | 13:18 |
noonedeadpunk | but are you sure that haproxy shows it as healthy? | 13:27 |
noonedeadpunk | can you share the output of "echo 'show stat' | nc -U /run/haproxy.stat | grep galera"? | 13:28 |
platta | galera-front-1,FRONTEND,,,0,2,4096,4,108,540,0,0,0,,,,,OPEN,,,,,,,,,1,6,0,,,,0,0,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,tcp,,0,1,4,,0,0,0,,,,,,,,,,,0,,,,,,,,,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, | 14:11 |
platta | galera-back,ark-galera-container-d769107e,0,0,0,1,,3,108,540,,0,,0,3,0,0,UP,1,1,0,3,1,140,4213,,1,7,1,,3,,2,0,,1,L7OK,200,95,,,,,,,,,,,0,3,,,,,4,,,0,0,0,0,,,,Layer7 check passed,,3,3,5,,,,172.29.236.166:3306,,tcp,,,,,,,,0,3,0,,,0,,0,0,0,0,0,0,0,0,1,1,,,,-,0,0,0,,,,,,,,,,,,,,,,,,,,,, | 14:11 |
platta | galera-back,BACKEND,0,0,0,1,410,4,108,540,0,0,,1,3,0,0,UP,1,1,0,,1,140,4213,,1,7,0,,3,,1,0,,1,,,,,,,,,,,,,,0,3,0,0,0,0,4,,,0,0,0,0,,,,,,,,,,,,,,tcp,leastconn,,,,,,,0,3,0,,,,,0,0,0,0,0,,,,,1,0,0,0,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3680,3680,434372,196168,0,0, | 14:11 |
platta | Oh my, sorry I didn't realize how much text that was, I would've thrown it in pastern. | 14:11 |
platta | *pastebin | 14:11 |
noonedeadpunk | huh, yeah, looks healthy | 14:14 |
noonedeadpunk | and do you have some `ssl` settings in /root/.my.cnf inside utility container? | 14:14 |
noonedeadpunk | would it help commenting these out? | 14:14 |
platta | I was able to increase the verbosity of MariaDB's logging and got this: [Warning] Aborted connection 6407 to db: 'unconnected' user: 'unauthenticated' host: '172.29.236.100' (This connection closed normally without authentication) | 14:19 |
platta | So, the connectivity is fine, I would assume even the SSL and TLS portions, but MariaDB is rejecting the authentication. | 14:20 |
noonedeadpunk | hm | 14:20 |
noonedeadpunk | that sounds like smth we were fighting on master | 14:21 |
noonedeadpunk | and that was related to certificate verification failure | 14:21 |
platta | I'm curious about that IP, my config specifies the internal load balancer IP should be 172.29.236.101 | 14:21 |
noonedeadpunk | https://jira.mariadb.org/browse/CONC-712 | 14:22 |
noonedeadpunk | yeah | 14:22 |
noonedeadpunk | but it's LB IP | 14:22 |
noonedeadpunk | but the host should access mariadb by its management address I guess | 14:23 |
noonedeadpunk | (I can be mixing up things) | 14:23 |
platta | br-mgmt is set up to have two IPs, 100 and 101, in my config. | 14:23 |
noonedeadpunk | but this really sounds like some SSL verification issue | 14:23 |
noonedeadpunk | which can be the case if you're trying to access via unexpected IP | 14:23 |
platta | I'll take a look at that ticket and see what I can find out. I'd be happy if I could end up helping resolve a known issue! | 14:25 |
noonedeadpunk | well, that's an issue with mariadb 11.4 | 14:26 |
noonedeadpunk | where they enforce usage of TLS for socket connections | 14:27 |
noonedeadpunk | but it also errors with `[Warning] Aborted connection 11 to db: 'unconnected' user: 'unauthenticated' host: 'localhost' (This connection closed normally without authentication)` | 14:27 |
noonedeadpunk | so I'd really try to connect to mariadb without tls from the utility container first | 14:27 |
noonedeadpunk | and check which IP the connections to mariadb are coming from after all, and whether that IP is in the cert SAN | 14:28 |
platta | Oh wow, it’s been ages since I’ve thought about cert SANs. I’ll work on checking all those things. Thank you. | 14:30 |
noonedeadpunk | platta: so that's the variable which defines what certificate will be generated for mariadb: https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L242 | 14:33 |
noonedeadpunk | and galera_address is internal_lb_vip_address | 14:34 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/infra.yml#L51 | 14:34 |
noonedeadpunk | so yeah, your concern about vip being 172.29.236.101 and connection happening from 172.29.236.100 is valid | 14:35 |
noonedeadpunk | platta: ^ | 14:41 |
platta | noonedeadpunk: I'm not seeing the error described in the bug report when I try to reproduce, so I'm thinking it's the vip discrepancy. The only reason I have two IPs defined is because that's what was in either AIO or sample configs. With a single node cloud, there doesn't seem to be much of a reason to assign two IPs, is there? | 14:55 |
noonedeadpunk | so on multinode deployment, you still have a management IP and then a VIP managed by keepalived | 14:57 |
noonedeadpunk | but this VIP is supposed to be added as /32 alias to the interface | 14:58 |
noonedeadpunk | but I'm somehow getting a bit confused right now... | 14:59 |
platta | This may be my mistake, I think. The dual IP config I have came from AIO: https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/prepare_networking.yml. I think in at least one other place in my config, I'm referencing 101 when I should be referencing 100. | 15:01 |
platta | Yep, "internal_lb_vip_address: 172.29.236.101" | 15:02 |
platta | I'm rather annoyed with myself now for missing that. If I change the config and re-run the playbooks should that repair it? | 15:03 |
noonedeadpunk | frankly - for the vip address I'd suggest just adding keepalived to mimic a multinode env, as this ip is not supposed to be added by hand anyway | 15:03 |
noonedeadpunk | well. you might need to run with `-e pki_regen_cert=true` | 15:04 |
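A sketch of what that could look like; the value follows platta's own conclusion above (it may not be the right choice for every layout), and the playbook shown is just one way to re-run the galera bits with certificate regeneration:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (sketch)
global_overrides:
  # must match the address services/haproxy actually use to reach mariadb,
  # since the generated certificate SAN is built from it
  internal_lb_vip_address: 172.29.236.100

# then re-run, e.g.:
#   openstack-ansible galera-install.yml -e pki_regen_cert=true
```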
platta | Ok. I will try both ways to see how everything behaves. | 15:05 |
noonedeadpunk | you can set `haproxy_use_keepalived: true` to force having keepalived | 15:06 |
platta | And I will use the other configuration options from the comments in user_variables.yml | 15:08 |
jrosser | platta: the 100 / 101 thing on an AIO is because in some situations the backend is on (lets say) port 5000 and needs to bind to an IP - the internal vip also wants to bind to port 5000, so the VIP and backends have to use unique IP/port combinations | 15:57 |
jrosser | this is not so relevant for an LXC based deployment, but is very much an issue for deployments without containers where everything is collapsed onto the host | 15:58 |
jrosser | that's why there is the distinction between 100 and 101 addresses in the AIO config | 15:58 |
jrosser | one is "things bound on this host" the other is "things bound on the internal endpoint" | 15:59 |
jrosser | just happens that for an AIO those are in the same place | 15:59 |
platta | Ah, that makes sense. | 16:09 |
platta | Trying with the `-e pki_regen_cert=true` option without keepalived settings didn't work. Forcing keepalived now. Can you sanity check the settings I'm using? external_vip_cidr: physical IP of the machine, /32. internal_vip_cidr: 172.29.236.100/32. external_interface: physical device (not sure if it should be that or br-vlan). internal_interface: | 16:14 |
platta | br-mgmt. | 16:14 |
platta | The physical NIC has no IP, br-vlan does. I'm still trying to grow my understanding of how all the virtual network/bridging concepts work. | 16:15 |
noonedeadpunk | well. keepalived won't fix anything kind of | 16:18 |
platta | Ah, ok. I'm going to re-image the server and start fresh with the updated config just to be sure I haven't mixed something up during all my troubleshooting. | 16:20 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Remove excessive bindings for uWSGI https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/924945 | 20:54 |
opendevreview | Merged openstack/openstack-ansible master: Use haproxy_endpoint_manage role from osa collection rather than common-tasks https://review.opendev.org/c/openstack/openstack-ansible/+/923368 | 20:58 |
platta | No luck. I tried a few different configurations, re-imaging in between. Maybe there's still something I'm missing that needs to be changed or removed. Here's my openstack_user_config and user_variables. If someone could give me a fresh set of eyes on it, I'd appreciate it: https://pastebin.com/LNVQ3v9v | 21:28 |
opendevreview | Merged openstack/openstack-ansible-os_neutron stable/2023.1: Correct 'neutron-policy-override' tag https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/925735 | 23:08 |