Thursday, 2024-08-08

noonedeadpunkplatta: ok, so to literally reproduce the error, you can attach to the utility container (ie `lxc-attach -n ark-utility-container-e8c00157`) and run `mysql` command06:44
noonedeadpunkI don't think it's a networking issue, given that haproxy considers galera containers as alive06:44
noonedeadpunkdo you see anything in mariadb logs?06:45
noonedeadpunkalso it could be some kind of tls mess as well - like when the client wants to connect over TLS but mariadb is configured without tls support06:46
noonedeadpunkso if you played with some vars related to tls - it could be related to that06:47
grauzikasHello, finally yesterday after using git clone -b 2024.1 i got it installed. with master there were errors. after installing i instantly started implementing ceph into it and again issues :). i found in the manuals https://docs.openstack.org/openstack-ansible-ceph_client/latest/config-from-file.html that i should create the directory /etc/openstack_deploy/ceph-keyrings/, put the keys there and name them like this: admin.keyring 07:03
grauzikascinder-backup.keyring  cinder.keyring glance.keyring  manila.keyring  nova.keyring, but it is unclear about the ceph.conf file and what playbooks should be rerun. yesterday i tried rerunning setup-openstack.yml, but it seems it didn't work. https://pastebin.com/fVjZ5qN9 . as you can see i commented out the keyring lines for now… also i have these networks: management, where all ips are nated to get access to the internet, a ceph client network and a 07:03
grauzikasceph cluster network. as i understand, for storage_hosts i should use the ceph client ip addresses, and for storage-infra_hosts use the management ips?07:03
noonedeadpunkgrauzikas: I usually place ceph.conf in the same folder and then add to vars smth like that: `ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"`07:05
jrossergrauzikas: master is the development (unreleased) branch which will form the next release - don't use that unless you are working on code for the next release07:06
noonedeadpunkalso, storage_hosts is where cinder-volume is going to spawn https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/env.d/cinder.yml#L38-L43 07:06
noonedeadpunkI'm not sure if that's what you wanted or not07:07
noonedeadpunkas to spawn ceph on these hosts using ceph-ansible, you'd need to define ceph-mon_hosts and ceph-osd_hosts at the very least07:08
jrosseri think also be sure to read the documentation / release notes to manage expectations about ceph deployed with OSA07:08
grauzikasglobally, or for every service like rbd_ceph_conf, cinder_backup_ceph_conf, glance_rbd_store_ceph_conf and so on, or is one line enough: ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"07:10
noonedeadpunkceph.conf is usually very minimalistic, having just the monitors and the cluster uuid - so it's kind of the same for glance/nova/cinder07:12
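A minimal sketch of the config-from-file layout being discussed here, combining the docs link and the ceph_conf_file var from above; the paths, fsid and monitor addresses are placeholders, not values from this deployment:

```yaml
# /etc/openstack_deploy/user_variables.yml (hedged sketch)
ceph_keyrings_dir: "/etc/openstack_deploy/ceph-keyrings"
# one global line, as suggested above - ceph.conf is effectively the same
# for glance/nova/cinder, so per-service overrides are normally unnecessary
ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"

# expected files in /etc/openstack_deploy/ceph-keyrings/:
#   admin.keyring, cinder.keyring, cinder-backup.keyring,
#   glance.keyring, manila.keyring, nova.keyring, ceph.conf
# with ceph.conf kept minimal, e.g.:
#   [global]
#   fsid = <cluster uuid>
#   mon_host = <mon1 ip>,<mon2 ip>,<mon3 ip>
```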
jrosserwhy use config-from-file if osa/ceph_ansible is deploying the ceph cluster?07:15
jrosserdoes that make sense at all?07:15
grauzikasi mean what version should i use? https://pastebin.com/RZKt4A5i07:19
grauzikasjrosser i was reading the manuals and somewhere found that ceph is not part of osa and that i should deploy it manually07:20
grauzikasthat's why i deployed ceph manually, separate from osa07:20
jrosserwell that's not entirely true07:20
grauzikasyes, yesterday i found the ceph playbooks :) so i was confused :)07:20
jrosserhttps://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html07:21
jrosseropenstack-ansible will call out to ceph-ansible to deploy ceph if you wish07:21
jrosserbut there is a big red warning on that page that we don't test upgrades for that07:21
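Roughly what the full-deploy scenario linked above looks like in openstack_user_config terms; host names and addresses below are made up for illustration:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (fragment, assumed values)
ceph-mon_hosts:
  ceph1:
    ip: 172.29.236.11
  ceph2:
    ip: 172.29.236.12
  ceph3:
    ip: 172.29.236.13
ceph-osd_hosts:
  ceph1:
    ip: 172.29.236.11
  ceph2:
    ip: 172.29.236.12
  ceph3:
    ip: 172.29.236.13
# storage_hosts only controls where cinder-volume is spawned (see the env.d link above)
storage_hosts:
  infra1:
    ip: 172.29.236.21
```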
jrosserif you deployed your own ceph cluster then i would question needing storage_hosts?07:23
grauzikasyes storage hosts is my mistake :)07:23
jrosserthe need for config-from-file for an externally deployed ceph cluster is to cover the use case where the OSA ceph_client role cannot ssh to the ceph monitor to retrieve the config07:24
jrosserthis typically occurs in deployments where there is a "cloud team" and a "storage team" and organisational boundaries prevent that ssh access being possible07:24
jrosserif you look here https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html#integration-with-ceph07:25
jrosserthere are 3 bullet points describing 3 different scenarios under which you might need to connect to a ceph cluster07:26
noonedeadpunkjrosser: config-from-file if osa/ceph_ansible -> that depends. If you wanna have a separate ceph cluster per AZ - that's the only way kinda 07:26
jrosseryeah07:27
jrosserbut this is a pretty good example of how OSA is a toolbox and you need to pick/choose how you want it to work07:27
noonedeadpunkas then you can do `ceph_cluster_name: "ceph-{{ az_name }}"` and `ceph_keyrings_dir: "/etc/openstack_deploy/ceph/{{ ceph_cluster_name }}"`07:27
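Putting those two vars together, a per-AZ arrangement might look roughly like this; az_name and the directory layout are assumptions for illustration:

```yaml
# hypothetical per-AZ vars (sketch)
az_name: "az1"
ceph_cluster_name: "ceph-{{ az_name }}"
ceph_keyrings_dir: "/etc/openstack_deploy/ceph/{{ ceph_cluster_name }}"
ceph_conf_file: "{{ lookup('file', ceph_keyrings_dir ~ '/ceph.conf') }}"
# i.e. the keyrings and ceph.conf for az1 would live under /etc/openstack_deploy/ceph/ceph-az1/
```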
grauzikasok thanks, probably will try to use osa and its ceph ansible implementation… so now need to destroy current cluster :)07:47
grauzikasbtw about logs…07:48
grauzikasnow i can see logs only with journalctl -u service, and inside lxc i can't find logs in a file https://pastebin.com/92bhD4rd07:50
grauzikasin the configs i can see, for example for the same glance api: https://pastebin.com/A4prQ3Qc07:52
grauzikasno logs enabled07:52
grauzikasi was looking for centralized logs, then went to their github and found that it is not supported anymore07:54
grauzikasand i was not able to find a logging playbook07:54
jrossergrauzikas: all logging goes to systemd journal07:55
jrosserwe have tried to migrate everything away from log files to the journal07:58
noonedeadpunkgrauzikas: well, our suggestion is usually different - have a ceph cluster independent from OSA :D07:59
noonedeadpunkso I think you're on the right track now with having a standalone ceph07:59
jrossergrauzikas: for centralised logging we do have support for journald-remote https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/setup-infrastructure.yml#L6008:00
jrosserthough most people overlay their own existing logging on top of openstack-ansible, with whichever collector they favour08:01
jrosserfor example, the elastic stack journal collector is able to get the journal entries for all the containers just by installing it on the host08:01
jrosserbut we don't provide any of this built in by default, as operators generally have their own individual preference for log collection08:02
grauzikasok thanks08:04
jrossera bunch of the components have easy ways to turn on prometheus exporters so you can integrate with metrics collection08:05
grauzikasin case of ceph integration, if i remove the storage_hosts section syntax-check gives an error, so probably i should leave it, but without any hosts?08:37
noonedeadpunkwe're using vector for kind of same purpose and it can deal with journals nicely as well08:37
noonedeadpunkand then there was also journal-to-gelf for graylog..08:38
noonedeadpunkbut not sure if that's really maintained today....08:38
grauzikasfor example if i want to give a try to installing ceph with the ceph-ansible integrated in OSA, how do i tell OSA the public_network and cluster_network of ceph?09:23
grauzikasprobably public_network will automatically be taken from whatever network ceph-mon and ceph-osd are on09:24
noonedeadpunksame as with plain ceph-ansible - define vars public_network and cluster_network09:25
noonedeadpunkbut then you'll need to pass cluster_network to containers (in case you're going LXC path)09:25
grauzikasisn't the cluster network used only for sync between ceph nodes, while clients connect to all the apis and so on?09:27
noonedeadpunksmth like that https://paste.openstack.org/show/buNvku6lKxeus4K9qOW2/09:28
noonedeadpunkcluster is used to sync between nodes, yes09:28
noonedeadpunkand public is used by clients to connect to osds and mons09:29
noonedeadpunkcluster is kind of optional and can be the same as the public one09:29
noonedeadpunkbut if you wanna split them between different interfaces - that might make sense as well09:29
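As a sketch of how those two networks could be wired up on the LXC path mentioned above; the CIDRs, bridge names, interfaces and group_binds are assumptions, not values from this deployment:

```yaml
# user_variables.yml - same vars as with plain ceph-ansible (illustrative CIDRs)
public_network: "172.29.244.0/22"
cluster_network: "172.29.248.0/22"

# openstack_user_config.yml - make both networks reachable from the containers
global_overrides:
  provider_networks:
    - network:
        container_bridge: "br-storage"       # ceph public network
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_volume
          - nova_compute
          - ceph-mon
          - ceph-osd
    - network:
        container_bridge: "br-ceph-cluster"  # ceph cluster/replication network
        container_type: "veth"
        container_interface: "eth3"
        ip_from_q: "ceph_cluster"
        type: "raw"
        group_binds:
          - ceph-osd
```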
grauzikasit's simply to have the possibility to use slower nics like 10G and not overload them with internal syncs09:34
noonedeadpunkwell, I would even say to avoid issues with internal operations due to jammed throughput...09:54
noonedeadpunkbut yeah, you're kinda right09:55
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Verify OS for containers installation  https://review.opendev.org/c/openstack/openstack-ansible/+/92597412:00
plattanoonedeadpunk: I was able to run the same `mysql` command pointing directly at the galera container successfully. So, I think you're right. The network connectivity is fine, but there's something happening negotiating the connection through HAProxy. I don't see anything in the MariaDB logs, but I do see the HAProxy logs show the attempted12:28
plattaconnection and a termination state of SD, which looks like it means there was an error.12:28
plattaI'm going to go through my configuration again to see what might be getting in the way. The majority of my config is pulled from what AIO had, so I have to admit I don't understand all of the settings I'm applying.12:29
noonedeadpunkso haproxy does quite stupid L4 balancing12:31
noonedeadpunkbut also haproxy does check backends based on the service running on 9100 (or smth like that)12:31
noonedeadpunkand that service is quite strict on haproxy source IP12:32
noonedeadpunkso it expects haproxy to talk with them not through the internal VIP but through its management address12:32
noonedeadpunkthere's a quite widespread mistake to add the internal keepalived cidr as /24 instead of /32, which makes haproxy talk through the wrong src IP12:33
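In user_variables terms, the mistake described above looks roughly like this (the VIP value is only an example):

```yaml
# wrong - a /24 alias can make haproxy reach backends from an unexpected source IP
# haproxy_keepalived_internal_vip_cidr: "172.29.236.101/24"

# right - the internal VIP should be added as a /32 alias
haproxy_keepalived_internal_vip_cidr: "172.29.236.101/32"
```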
plattaI don't think I have any settings related to keepalived, the comments made it seem optional.12:38
noonedeadpunkit also depends how many hosts you have, as keepalived is enabled only when there's more than 1 host with haproxy12:43
plattaAh, and I do have a single host. Originally I left HAProxy out entirely because of that, but then I got other errors.12:44
plattahttps://opendev.org/openstack/openstack-ansible/commit/0f521b5d6d848761d5887389a067bc37bc3909ea Wondering if this could be it. My settings have it set to 0.0.0.0/0, which I would think should be wide open, but maybe I need to explicitly specify the internal load balancer ip of haproxy.12:51
noonedeadpunkwell, I'd guess that then haproxy would mark the backend as down13:04
plattaAh, good point. I'm looking for ways to up the logging so I can get more insight into where the failure is.13:18
noonedeadpunkbut are you sure that haproxy shows it as healthy?13:27
noonedeadpunkcan you share the output of "echo 'show stat' | nc -U /run/haproxy.stat | grep galera"?13:28
plattagalera-front-1,FRONTEND,,,0,2,4096,4,108,540,0,0,0,,,,,OPEN,,,,,,,,,1,6,0,,,,0,0,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,tcp,,0,1,4,,0,0,0,,,,,,,,,,,0,,,,,,,,,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14:11
plattagalera-back,ark-galera-container-d769107e,0,0,0,1,,3,108,540,,0,,0,3,0,0,UP,1,1,0,3,1,140,4213,,1,7,1,,3,,2,0,,1,L7OK,200,95,,,,,,,,,,,0,3,,,,,4,,,0,0,0,0,,,,Layer7 check passed,,3,3,5,,,,172.29.236.166:3306,,tcp,,,,,,,,0,3,0,,,0,,0,0,0,0,0,0,0,0,1,1,,,,-,0,0,0,,,,,,,,,,,,,,,,,,,,,,14:11
plattagalera-back,BACKEND,0,0,0,1,410,4,108,540,0,0,,1,3,0,0,UP,1,1,0,,1,140,4213,,1,7,0,,3,,1,0,,1,,,,,,,,,,,,,,0,3,0,0,0,0,4,,,0,0,0,0,,,,,,,,,,,,,,tcp,leastconn,,,,,,,0,3,0,,,,,0,0,0,0,0,,,,,1,0,0,0,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3680,3680,434372,196168,0,0,14:11
plattaOh my, sorry I didn't realize how much text that was, I would've thrown it in pastern.14:11
platta*pastebin14:11
noonedeadpunkhuh, yeah, looks healthy14:14
noonedeadpunkand do you have some `ssl` settings in /root/.my.cnf inside utility container?14:14
noonedeadpunkwould it help commenting these out?14:14
plattaI was able to increase the verbosity of MariaDB's logging and got this: [Warning] Aborted connection 6407 to db: 'unconnected' user: 'unauthenticated' host: '172.29.236.100' (This connection closed normally without authentication)14:19
plattaSo, the connectivity is fine, I would assume even the SSL and TLS portions, but MariaDB is rejecting the authentication.14:20
noonedeadpunkhm14:20
noonedeadpunkthat sounds like smth we were fighting on master14:21
noonedeadpunkand that was related to certificate verification failure14:21
plattaI'm curious about that IP, my config specifies the internal load balancer IP should be 172.29.236.10114:21
noonedeadpunkhttps://jira.mariadb.org/browse/CONC-71214:22
noonedeadpunkyeah14:22
noonedeadpunkbut it's LB IP14:22
noonedeadpunkbut the host should access mariadb by its management address I guess14:23
noonedeadpunk(I can be mixing up things)14:23
plattabr-mgmt is set up to have two IPs, 100 and 101, in my config.14:23
noonedeadpunkbut this really sounds like some SSL verification issue14:23
noonedeadpunkwhich can be the case if you're trying to access via unexpected IP14:23
plattaI'll take a look at that ticket and see what I can find out. I'd be happy if I could end up helping resolve a known issue!14:25
noonedeadpunkwell, that's an issue with mariadb 11.414:26
noonedeadpunkwhere they enforce usage of TLS for socket connections14:27
noonedeadpunkbut it also errors with `[Warning] Aborted connection 11 to db: 'unconnected' user: 'unauthenticated' host: 'localhost' (This connection closed normally without authentication)`14:27
noonedeadpunkso I'd really try to connect to mariadb without tls from the utility container first14:27
noonedeadpunkand check from which IP connections are coming to mariadb after all, and if this IP is in the cert SAN14:28
plattaOh wow, it’s been ages since I’ve thought about cert SANs. I’ll work on checking all those things. Thank you.14:30
noonedeadpunkplatta: so that's the variable which defines what certificate will be generated for mariadb: https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L24214:33
noonedeadpunkand galera_address is internal_lb_vip_address14:34
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/infra.yml#L5114:34
noonedeadpunkso yeah, your concern about vip being 172.29.236.101 and connection happening from 172.29.236.100 is valid14:35
noonedeadpunkplatta: ^14:41
plattanoonedeadpunk: I'm not seeing the error described in the bug report when I try to reproduce, so I'm thinking it's the vip discrepancy. The only reason I have two IPs defined is because that's what was in either AIO or sample configs. With a single node cloud, there doesn't seem to be much of a reason to assign two IPs, is there?14:55
noonedeadpunkso on multinode deployment, you still have a management IP and then a VIP managed by keepalived14:57
noonedeadpunkbut this VIP is supposed to be added as /32 alias to the interface14:58
noonedeadpunkbut I'm somehow getting a bit confused right now... 14:59
plattaThis may be my mistake, I think. The dual IP config I have came from AIO: https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/prepare_networking.yml. I think in at least one other place in my config, I'm referencing 101 when I should be referencing 100.15:01
plattaYep, "internal_lb_vip_address: 172.29.236.101"15:02
plattaI'm rather annoyed with myself now for missing that. If I change the config and re-run the playbooks should that repair it?15:03
noonedeadpunkfrankly - for the vip address I'd suggest just adding keepalived to mimic a multinode env, as this ip is not supposed to be added by hand anyway15:03
noonedeadpunkwell. you might need to run with `-e pki_regen_cert=true`15:04
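For this single-node case, the fix being discussed could be sketched like this; the addresses come from the conversation above, while the choice of playbook to rerun is an assumption:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml
# either point the internal VIP at the address connections actually come from...
internal_lb_vip_address: 172.29.236.100
# ...or keep 172.29.236.101 and let keepalived manage it as a /32 alias
# (haproxy_use_keepalived: true in user_variables.yml, as suggested below)

# then regenerate the galera certificate so the SAN matches, e.g.:
#   openstack-ansible galera-install.yml -e pki_regen_cert=true
```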
plattaOk. I will try both ways to see how everything behaves.15:05
noonedeadpunkyou can set `haproxy_use_keepalived: true` to force having keepalived15:06
plattaAnd I will use the other configuration options from the comments in user_variables.yml15:08
jrosserplatta: the 100 / 101 thing on an AIO is because in some situations the backend is on (let's say) port 5000 and needs to bind to an IP - the internal vip also wants to bind to port 5000, so the VIP and backends have to use unique IP/port combinations15:57
jrosserthis is not so relevant for an LXC based deployment, but is very much an issue for deployments without containers where everything is collapsed onto the host15:58
jrosserthats why there is the distinction between 100 and 101 addresses in the AIO config15:58
jrosserone is "things bound on this host" the other is "things bound on the internal endpoint"15:59
jrosserjust happens that for an AIO those are in the same place15:59
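Read as a sketch, the convention described above maps to something like this (addresses from this conversation, the rest illustrative):

```yaml
# on an AIO, br-mgmt carries both addresses:
#   172.29.236.100 - "things bound on this host" (service backends, e.g. a backend on :5000)
#   172.29.236.101 - "things bound on the internal endpoint" (haproxy internal VIP, also :5000)
internal_lb_vip_address: 172.29.236.101
# backends bind to the host address, so VIP:5000 and backend:5000 never collide
```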
plattaAh, that makes sense.16:09
plattaTrying with the `-e pki_regen_cert=true` option without keepalived settings didn't work. Forcing keepalived now. Can you sanity check the settings I'm using? external_vip_cidr: physical IP of the machine, /32. internal_vip_cidr: 172.29.236.100/32. external_interface: physical device (not sure if it should be that or br-vlan). internal_interface:16:14
plattabr-mgmt.16:14
plattaThe physical NIC has no IP, br-vlan does. I'm still trying to grow my understanding of how all the virtual network/bridging concepts work.16:15
noonedeadpunkwell. keepalived won't fix anything kind of16:18
plattaAh, ok. I'm going to re-image the server and start fresh with the updated config just to be sure I haven't mixed something up during all my troubleshooting.16:20
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Remove excessive bindings for uWSGI  https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/92494520:54
opendevreviewMerged openstack/openstack-ansible master: Use haproxy_endpoint_manage role from osa collection rather than common-tasks  https://review.opendev.org/c/openstack/openstack-ansible/+/92336820:58
plattaNo luck. I tried a few different configurations, re-imaging in between. Maybe there's still something I'm missing that needs to be changed or removed. Here's my openstack_user_config and user_variables. If someone could give me a fresh set of eyes on it, I'd appreciate it: https://pastebin.com/LNVQ3v9v21:28
opendevreviewMerged openstack/openstack-ansible-os_neutron stable/2023.1: Correct 'neutron-policy-override' tag  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/92573523:08

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!