Thursday, 2023-07-20

opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Install mysqlclient devel package https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/888985	07:21
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Fix linters and metadata https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/888469	07:23
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Fix linters and metadata https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/888469	07:23
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Refactor LXC image expiration https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888278	07:25
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Fix linters issue and metadata https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180	07:26
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Fix linters issue and metadata https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180	07:27
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Add retries to LXC base build command https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888750	07:27
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server stable/zed: Add optional compression to mariabackup https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/887143	07:44
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Include proper vars_file for rally https://review.opendev.org/c/openstack/openstack-ansible/+/888656	07:45
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_rally stable/yoga: Include proper commit in rally_upper_constraints_url https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/887681	07:45
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_rally stable/yoga: Include proper commit in rally_upper_constraints_url https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/887681	07:46
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Include proper vars_file for rally https://review.opendev.org/c/openstack/openstack-ansible/+/888656	07:48
kleini	https://paste.opendev.org/show/bZj7Yq3mmW8wWi1e9pqj/ <- I have this issue during the upgrade to 26.1.2. The SSH key files are there and I can read them properly with ssh-keygen; the generated public key matches the public key file. Any hints as to what is wrong? ssh-keygen does not ask me for a passphrase for the private key when I print the public one with `ssh-keygen -y -e -f private`	08:26
noonedeadpunk	kleini: what mode is file in?	09:03
noonedeadpunk	as IIRC it does fail if it is not 0600	09:03
noonedeadpunk	And if it is stored in git - it won't be 0600	09:04
noonedeadpunk	but if you say ssh-keygen can read them... huh	09:06
noonedeadpunk	as I was thinking about this thingy https://github.com/ansible-collections/community.crypto/issues/564	09:06
noonedeadpunk	kleini: do you have `backend: cryptography` in /etc/ansible/ansible_collections/openstack/osa/roles/ssh_keypairs/tasks/standalone/create_keypair.yml ?	09:08
noonedeadpunk	as this could be fixed with https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/870997 but yeah, it's available only for Antelope and not Zed	09:09
kleini	it was the file permissions. many thanks!	09:10
noonedeadpunk	we probably can backport this patch to Zed	09:10
noonedeadpunk	or you can propose it as well ;)	09:11
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657	09:12
kleini	on the staging system, I don't have those files in Git, just the configuration without upper constraints, pki, keypairs and so on. because I deploy staging freshly every time, IPs change, container UUIDs change and so on. but for production I have all files in Git. resulting in wrong file permissions for SSH private key files. facepalm	09:13
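The permission problem described above can be checked before running the playbooks. This is a minimal sketch, assuming a POSIX system; the helper names `key_mode_ok` and `tighten_key_mode` are hypothetical and not part of OSA or the ssh_keypairs role. It only tests that a file grants nothing to group or others, which is what mode 0600 guarantees.

```python
import os
import stat
import tempfile


def key_mode_ok(path: str) -> bool:
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def tighten_key_mode(path: str) -> None:
    """Restrict the file to owner read/write (0600)."""
    os.chmod(path, 0o600)


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real private key.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        demo_path = tmp.name
    try:
        # Assumption: a fresh git checkout with a common umask leaves files at 0644.
        os.chmod(demo_path, 0o644)
        print(key_mode_ok(demo_path))
        tighten_key_mode(demo_path)
        print(key_mode_ok(demo_path))
    finally:
        os.unlink(demo_path)
```

Since git tracks only the executable bit, a check like this would have to run after every checkout of the production configuration.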
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657	09:13
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657	09:15
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657	09:16
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657	09:17
opendevreview	Marcus Klein proposed openstack/openstack-ansible-plugins stable/zed: Use cryptography backend for openssh_keypair https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/888658	09:18
kleini	too easy ;-)	09:19
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Use include_role in task to avoid lack of access to vars https://review.opendev.org/c/openstack/openstack-ansible/+/888659	09:20
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Use include_role in task to avoid lack of access to vars https://review.opendev.org/c/openstack/openstack-ansible/+/888660	09:20
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Use include_role in task to avoid lack of access to vars https://review.opendev.org/c/openstack/openstack-ansible/+/889021	09:20
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Re-enable CI jobs after rally is fixed https://review.opendev.org/c/openstack/openstack-ansible/+/889016	09:23
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Re-enable CI jobs after rally is fixed https://review.opendev.org/c/openstack/openstack-ansible/+/889018	09:24
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Pin version of setuptools https://review.opendev.org/c/openstack/openstack-ansible/+/889022	09:25
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Pin version of setuptools https://review.opendev.org/c/openstack/openstack-ansible/+/889022	09:25
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Pin version of setuptools https://review.opendev.org/c/openstack/openstack-ansible/+/889022	09:25
noonedeadpunk	kleini: nobody said it will be hard :) but this way it will get reviewed faster	09:27
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Restore an ability for HAProxy to bind on interal IP https://review.opendev.org/c/openstack/openstack-ansible/+/887577	09:28
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Restore an ability for HAProxy to bind on interal IP https://review.opendev.org/c/openstack/openstack-ansible/+/887574	09:29
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Gather facts before including common-playbooks https://review.opendev.org/c/openstack/openstack-ansible/+/889023	09:30
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins stable/zed: Skip updating service password by default https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/888153	09:30
kleini	I now have an issue with zookeeper in the production deployment. incoming connections from the other zookeeper instances (in containers) seem to come from their hosts, not the containers. what could be a possible cause for that? IPs and routing look the same as with all other LXC containers.	10:43
kleini	the certificate check fails then because the source IP is wrong and zookeeper drops the connection from the other zookeeper instance.	10:44
noonedeadpunk	kleini: so it's cluster connection that fails? Or client connection?	10:51
noonedeadpunk	As for client connection, there's a bug in tooz library (was fixed quite recently), that does not allow to enable encryption for clients	10:51
noonedeadpunk	but clustering encryption should work	10:51
kleini	it is for the cluster connection	10:51
anskiy	noonedeadpunk: https://zuul.opendev.org/t/openstack/build/42f7de398a5a42a498ffd264914301b1/log/logs/host/glance-api.service.journal-10-08-02.log.txt now it's glance that is broken. I think there is some problem with keystone, not nova	10:51
anskiy	hmm: https://zuul.opendev.org/t/openstack/build/42f7de398a5a42a498ffd264914301b1/log/logs/host/keystone-wsgi-public.service.journal-10-08-02.log.txt#17579	10:53
noonedeadpunk	anskiy: I have quite vague understanding why this can happen to be frank, and only in upgrade jobs	10:53
noonedeadpunk	ah	10:53
kleini	https://paste.opendev.org/show/bKFZxWBwt20oLxi8memH/ 10.20.150.2-4 are the infra hosts, while 127,132,184 are the zookeeper containers	10:53
noonedeadpunk	but that's kinda "expected"	10:53
noonedeadpunk	kleini: the only guess how this might happen - is that zookeeper attempts to use eth0 instead of eth1	10:54
noonedeadpunk	and eth0 has src nat	10:54
kleini	okay, so bind seems to be wrong	10:54
noonedeadpunk	But, eth0 should not be routable, as lxcbr0 is isolated	10:54
noonedeadpunk	I _think_ it binds to 0.0.0.0	10:54
noonedeadpunk	but not sure	10:55
noonedeadpunk	and default route is through eth0 actually	10:55
anskiy	I wonder why does it say 10.20.150.132 in ListenHandler, does it bind on it?	10:55
noonedeadpunk	there are different sets of settings for client and clustering	10:55
anskiy	I mean, what's that address anyway, as it seems, that's not an infra host	10:56
noonedeadpunk	I would need to read zookeeper docs to recall what is what	10:56
noonedeadpunk	but I'd check `ss` output to see where it is bound	10:56
kleini	zookeeper is bound to eth1 address in containers. strangely incoming connections seem to come in from own host not the other containers host...	11:02
noonedeadpunk	just to ensure - zookeeper is not running on hosts as well?	11:04
noonedeadpunk	as might be you ended up with 6 zookeepers or smth?	11:05
kleini	damn, it is	11:05
noonedeadpunk	ok, that's interesting	11:05
kleini	I have 6 zookeepers	11:05
anskiy	noonedeadpunk: well, for `openstack-ansible-upgrade_yoga-aio_metal-ubuntu-focal`, which succeeded glance_service_password is 47 chars, for `openstack-ansible-upgrade-aio_metal-rockylinux-9` it's 61	11:05
noonedeadpunk	is it our env.d that is failing or your own inventory weird?	11:05
noonedeadpunk	anskiy: well, there's a configuration for keystone to change hashing method to remove this issue	11:06
kleini	https://paste.opendev.org/show/biAjsSQbx4XG3GsA0CVU/ <- issue in inventory	11:06
noonedeadpunk	it's due to bcache or smth like that	11:06
noonedeadpunk	kleini: yeah, but I wonder what caused it....	11:07
noonedeadpunk	you used default env.d file?	11:07
noonedeadpunk	anskiy: password_hash_algorithm https://docs.openstack.org/keystone/latest/configuration/config-options.html#identity.password_hash_algorithm	11:07
noonedeadpunk	bcrypt has a limit of 54, scrypt does not	11:08
kleini	env.d is only modified for having cinder volume in containers for Ceph only backing storage	11:08
noonedeadpunk	anskiy: https://opendev.org/openstack/keystone/src/commit/8ad765e0230ceeb5ca7c36ec3ed6d25c57b22c9d/releasenotes/notes/bug_1543048_and_1668503-7ead4e15faaab778.yaml	11:09
anskiy	noonedeadpunk: so, it's better to change user_secrets generation?	11:10
noonedeadpunk	another way would be to ensure that our tooling does not make passwords longer than 54 by default	11:10
noonedeadpunk	I'd say that for new deployments it makes sense to start using scrypt to be frank...	11:10
noonedeadpunk	anskiy: but I really wonder why it's the issue for this specific patch only	11:10
noonedeadpunk	this should be adjusted then to be 54 max https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/pw-token-gen.py#L89	11:11
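Capping generated secrets at the 54-character bcrypt limit quoted above could look roughly like this. It is a sketch only and is not the code in `pw-token-gen.py`; the function name, the minimum length of 16 and the alphanumeric alphabet are assumptions made for illustration.

```python
import secrets
import string

# Limit quoted in the discussion for keystone's bcrypt hashing (assumption here).
BCRYPT_SAFE_LENGTH = 54


def generate_password(min_len: int = 16, max_len: int = BCRYPT_SAFE_LENGTH) -> str:
    """Return a random alphanumeric password with min_len <= length <= max_len."""
    if not 1 <= min_len <= max_len:
        raise ValueError("expected 1 <= min_len <= max_len")
    # randbelow(n) yields 0..n-1, so the length range is inclusive on both ends.
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    password = generate_password()
    print(len(password) <= BCRYPT_SAFE_LENGTH)
```

Keeping the cap in one named constant makes it easy to lift again if the hashing algorithm is switched to scrypt.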
kleini	sorry, I was wrong. there is no zookeeper instance on the hosts. the inventory looks the same in staging	11:12
noonedeadpunk	ugh... Having zookeeper on hosts would be way easier explanation	11:13
noonedeadpunk	but yeah, coordination_all should contain containers and hosts - it's "as designed"	11:14
noonedeadpunk	as playbook runs against `zookeeper_all`	11:14
noonedeadpunk	which should contain only containers	11:14
kleini	I found some iptables rules, maybe causing this	11:16
kleini	https://paste.opendev.org/show/bCO5z7LFXcb6zI383WBB/ <- some masquerading rule for computes and network nodes to access outside world through infra hosts was not strict enough...	11:26
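Why a masquerading rule that is "not strict enough" rewrites container traffic can be illustrated with plain subnet matching. The actual rules are only in the paste, so everything here is an assumption: the `/24` and `/28` source ranges and the compute node addresses are hypothetical, and only the three container addresses come from the discussion above.

```python
import ipaddress
from typing import List


def matching_sources(rule_source: str, addresses: List[str]) -> List[str]:
    """Return the addresses that fall inside the rule's source network."""
    network = ipaddress.ip_network(rule_source)
    return [addr for addr in addresses if ipaddress.ip_address(addr) in network]


# Zookeeper container addresses mentioned in the channel.
containers = ["10.20.150.127", "10.20.150.132", "10.20.150.184"]
# Hypothetical compute node addresses that are meant to be masqueraded.
computes = ["10.20.150.21", "10.20.150.22"]

if __name__ == "__main__":
    # A broad source match also catches the containers, so their traffic
    # would leave with the host's address.
    print(matching_sources("10.20.150.0/24", containers + computes))
    # A narrower, hypothetical range catches only the compute nodes.
    print(matching_sources("10.20.150.16/28", containers + computes))
```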
anskiy	noonedeadpunk: there is this thing https://review.opendev.org/c/openstack/openstack-ansible/+/887866 and it's 2023.1 too, which fails in `nova-status upgrade check` (jammy) and glance (rocky)	11:29
noonedeadpunk	kleini: I think at least one of the rules is created by lxc-hosts role	11:31
noonedeadpunk	the one for 10.0.3.0/24	11:31
noonedeadpunk	not sure about second one though	11:31
kleini	yes. and the other one was from systemd-networkd to allow computes and network nodes to access outside world through infra hosts. in my setup only infra hosts have outside world IPs and floating IPs are reachable from outside	11:32
kleini	computes and network nodes need to access outside world through management network using infra hosts as SNAT gateways	11:33
noonedeadpunk	ah, ok, I see then.	12:03
lsudre	Hi, I'm trying to deploy openstack+ceph with OSA. When I run "openstack-ansible setup-infrastructure.yml" I hit an issue with this task: [ceph-osd : wait for all osd to be up]. It skips osd1 and osd2, fails on osd3 and retries 60 times. Do you have any idea what is causing this? Thx	14:02
noonedeadpunk	lsudre: worth checking `ceph -s` or `ceph health`	14:06
lsudre	in osd?	14:07
noonedeadpunk	on monitor host	14:07
noonedeadpunk	and might be smth like `ceph osd tree`	14:07
lsudre	auth: unable to find a keyring	14:07
anskiy	lsudre: do you run this on one of controller nodes, as opposed to deploy host?	14:10
noonedeadpunk	is there even monitor service running?	14:10
lsudre	noonedeadpunk: `sudo systemctl status ceph-mon.service` returns inactive	14:11
lsudre	anskiy: in my ceph-mon host	14:11
anskiy	lsudre: service should be called like `ceph-mon@<HOST>`	14:12
lsudre	noonedeadpunk: and `sudo systemctl status ceph-mon@os-deploy-ceph-host.service` returns active and running	14:13
lsudre	anskiy: like this ? ceph-mon@os-deploy-ceph-host.service	14:13
noonedeadpunk	and what's in /etc/ceph then? There should be ceph.conf and keyrings	14:14
lsudre	my infra is: one mon(mon1) and 3 osd ([osd1, osd2, osd3])	14:14
lsudre	ok now I can run a ceph -s command	14:14
lsudre	the command returns HEALTH_WARN	14:15
lsudre	mon is allowing insecure global_id reclaim; 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive; OSD count 0 < osd_pool_default_size 3	14:15
anskiy	lsudre: so what is the status of OSDs: `systemctl status ceph-osd@<OSD ID>`?	14:16
noonedeadpunk	lsudre: `mon is allowing insecure global_id reclaim` is relatively minor	14:18
lsudre	anskiy: on the mon? I don't have this service, only a ceph-osd.target	14:18
noonedeadpunk	nah. on osd node	14:19
noonedeadpunk	saying, osd3	14:19
lsudre	same shit only ceph-osd.target and is running	14:20
lsudre	the ceph -s command returns services: mon: 1 daemons, quorum os-deploy-ceph-host (age 26m); mgr: os-deploy-ceph-host(active, since 26m); mds: 1/1 daemons up; osd: 0 osds: 0 up, 0 in	14:20
anskiy	lsudre: I would still try to run `systemctl status ceph-osd@3` on osd3 -- there could be some logs of the previous attempt to start it	14:21
anskiy	or `journalctl -u ceph-osd@3`	14:22
anskiy	or whichever is ID for osd3	14:22
lsudre	○ ceph-osd@3.service - Ceph object storage daemon osd.3; Loaded: loaded (/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: enabled); Active: inactive (dead)	14:24
noonedeadpunk	and what if you try to start it?	14:28
noonedeadpunk	or indeed - check journalctl	14:28
lsudre	it asks me for a password	14:28
lsudre	OSD data directory /var/lib/ceph/osd/ceph-3 does not exist; bailing out.	14:29
lsudre	I have no files in /var/lib/ceph/osd/ folder	14:30
lsudre	something is missing in my OSA conf?	14:32
anskiy	lsudre: could you please show us your `openstack_user_config.yml` via paste.opendev.org and user_variables.yml?	14:32
lsudre	sure	14:32
lsudre	https://paste.opendev.org/show/bGDj5FVzWWKEjkaxhKJX/ https://paste.opendev.org/show/bJBfMzednhy5Y5IOgwiL/	14:34
anskiy	I wonder how the example configurations are supposed to work without setting `lvm_volumes`...	14:48
anskiy	lsudre: so, I suppose, you need to set `lvm_volumes` variable to some disk devices (like `/dev/sdX`) that you're willing to use as OSDs.	14:52
anskiy	noonedeadpunk: so, I've forcefully set https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/pw-token-gen.py#L89 this thing to 64 and successfully bootstrapped antelope, so it's something else -_-	14:54
lsudre	ok thank you for your time	14:54
*** dviroel__ is now known as dviroel		14:55
noonedeadpunk	anskiy: from what I read in keystone code - it should just "strip" to 54 anything that is longer	14:56
noonedeadpunk	and it happens only on upgrade, so might be smth related to re-hashing... And then when we did "update_password" - it was resetting it, so it was not a concern I guess	14:57
anskiy	noonedeadpunk: wasn't the patch only applicable to nova? Glance gets 401 too	14:59
noonedeadpunk	Nope, we just disabled resetting password by default	15:00
noonedeadpunk	or updating it	15:00
noonedeadpunk	so if you need to update password - you'd need to define a variable for that	15:00
noonedeadpunk	so it could be a result of the keystone upgrade. But then it's good we've caught that	15:04
lsudre	anskiy: why do I need a `volume_group: cinder-volumes` in user_variables, like in openstack_user_config.yml.test.example, when I should use the rbd_volumes made specifically for ceph configuration?	15:23
anskiy	lsudre: Ceph needs some disks to be used as OSDs, so it can provide block storage for your cluster. Defining which devices Ceph should use is done like this: https://github.com/ceph/ceph-ansible/blob/main/group_vars/osds.yml.sample#L21-L122. I, for example, set this via the `lvm_volumes` list for each OSD node.	15:41
opendevreview	Merged openstack/openstack-ansible-openstack_hosts master: Fix linters issue and metadata https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/888455	21:59
opendevreview	Merged openstack/openstack-ansible-ceph_client master: Fix linters and metadata https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/888216	22:01
opendevreview	Merged openstack/openstack-ansible-ceph_client master: Apply tags to included tasks https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/888461	22:27
opendevreview	Merged openstack/openstack-ansible-repo_server master: Fix linters and metadata https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/888280	22:43
