*** ysandeep|out is now known as ysandeep|rover | 04:58 | |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Add upgrade path from lsyncd to shared filesystem. https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/839411 | 06:46 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Remove all code for lsync, rsync and ssh https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/837588 | 06:46 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Clean up legacy lsycnd, rsync and ssh key config https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/837859 | 06:46 |
*** ysandeep|rover is now known as ysandeep|rover|lunch | 07:20 | |
*** ysandeep|rover|lunch is now known as ysandeep|rover | 08:23 | |
jrosser | noonedeadpunk: i am not sure how we merge this https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/839411 | 09:35 |
jrosser | seems it is very dependent on this https://review.opendev.org/c/openstack/openstack-ansible/+/837589 | 09:35 |
jrosser | the filesystem is created in the integrated repo atm, because the serial: keyword in the repo_server playbook really breaks forming the gluster cluster, which can't be done with the tasks running serially | 09:41 |
jrosser | oh also i sent an email directly to the Derek guy from the ML in case all the replies are ending up in his spam filter | 09:47 |
noonedeadpunk | ^ I did the same actually... | 09:50 |
jrosser | :) | 09:51 |
jrosser | noonedeadpunk: https://github.com/jrosser/openstack-ansible-os_skyline | 09:59 |
noonedeadpunk | I can create a repo in opendev for that:) | 10:00 |
noonedeadpunk | Regarding repo - sorry, I'm away today, can't check details for that | 10:00 |
noonedeadpunk | But https://github.com/jrosser/openstack-ansible-os_skyline/blob/master/tasks/db_setup.yml and https://github.com/jrosser/openstack-ansible-os_skyline/blob/master/tasks/service_setup.yml are obsolete :D | 10:01 |
jrosser | oh yeah it's a big hack | 10:01 |
jrosser | that's how i found we still didn't tidy up os_placement, which is what this is created from | 10:01 |
jrosser | i need to fix up the copyright lines as well | 10:02 |
*** dviroel_ is now known as dviroel\ | 11:14 | |
*** dviroel\ is now known as dviroel | 11:14 | |
*** ysandeep|rover is now known as ysandeep|rover|break | 11:57 | |
opendevreview | Merged openstack/openstack-ansible stable/xena: Fix extra facts gathering with tags https://review.opendev.org/c/openstack/openstack-ansible/+/840479 | 12:07 |
*** ysandeep|rover|break is now known as ysandeep|rover | 12:57 | |
johnd_ | Hello there, I would like to ask you some question about cinder deployment. We have 3 controllers with 3 cinder-volume and 3 cinder-api each. Those services are installed in LXC and connected to a same Ceph Backend. | 14:28 |
johnd_ | When we create VMs, volumes are managed by different cinder volumes. We can see this in Horizon (os-controller-2-cinder-volumes-container-0533e258@rbd#rbd / os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd/...). Now, if we want to migrate the volume from one service to another in horizon, we can only select ceph@rbd#rbd and this fails. If we migrate with CLI, the volume always ends on os-controller-3-cinder-volumes-container-871bcbc8@r | 14:30 |
johnd_ | Is this normal ? How to fix this ? | 14:31 |
mgariepy | you can set the `backend_host` config in the ceph section of cinder. this will allow any of the 3 cinder-volume services to manage the volumes. | 14:39 |
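A minimal sketch of the setting mgariepy describes, assuming the backend section is named `[rbd]` as the host strings above suggest; in openstack-ansible this would normally be driven from the cinder_backends definition rather than edited by hand:

```shell
# Illustrative only: a shared backend_host in the rbd/ceph backend section of
# cinder.conf makes all three cinder-volume services report the same host name,
# so any of them can manage any volume on that backend. Section and value names
# below are assumptions based on the discussion, not taken from a real config.
grep -A5 '^\[rbd\]' /etc/cinder/cinder.conf
# [rbd]
# volume_driver = cinder.volume.drivers.rbd.RBDDriver
# rbd_pool = volumes
# rbd_ceph_conf = /etc/ceph/ceph.conf
# backend_host = rbd
```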
jrosser | johnd_: is this an existing/old deployment, because I think this is something that was adjusted a while ago | 14:41 |
johnd_ | this is an old deployment | 14:41 |
johnd_ | the cluster was created on pike | 14:42 |
jrosser | mgariepy: i remember this being not quite so simple https://opendev.org/openstack/openstack-ansible/src/branch/master/releasenotes/notes/enable-active-active-9af1551759468dc8.yaml | 14:45 |
jrosser | https://github.com/openstack/openstack-ansible-os_cinder/commit/c148d77e29af6faebc1c9b012ae08aed447cd179 | 14:48 |
jrosser | https://github.com/openstack/openstack-ansible-os_cinder/commit/c6b9f011b777aa0513e99a35fba7d976a4c9d4c1 | 14:50 |
johnd_ | We have "cluster = ceph" in our config | 14:50 |
jrosser | it seems a bit opaque to me how this is actually supposed to work | 14:50 |
jrosser | i see we make that config setting, but from a cinder POV i'm not sure i understand the active/active model here if the volumes are bound to a specific back end | 14:51 |
johnd_ | i have this in horizon, is this normal for you ? https://ibb.co/617q8r4 | 14:52 |
mgariepy | what is your backend config? | 14:53 |
jrosser | i have this https://paste.opendev.org/show/bYpnYSYModo1HzRP4qv6/ | 14:53 |
johnd_ | here is my config: https://paste.opendev.org/show/bfWltAz5gtdx0jAVueo0/ | 14:55 |
mgariepy | i have both config. | 14:55 |
jrosser | there is some background info about active-active here https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/storage_guide/ch-cinder#active-active-deployment-for-high-availability | 14:56 |
johnd_ | Here are my pool and cluster https://paste.opendev.org/show/bDbZ8GPJf6r2ihM6o2JQ/ | 14:57 |
jrosser | but it is still confusing that the volumes have an os-vol-host-attr:host which points to an actual host rather than the cluster | 14:58 |
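One possible follow-up to that, sketched here as an assumption rather than a verified procedure: once a shared backend_host (or cluster name) is in place, existing volumes that still carry a per-container host can be remapped with cinder-manage, for example:

```shell
# Hedged example: rewrite os-vol-host-attr:host for volumes created before the
# shared host/cluster name existed. Host strings are taken from this discussion;
# stop the cinder-volume services before attempting this on a real deployment.
cinder-manage volume update_host \
    --currenthost os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd \
    --newhost ceph@rbd#rbd
```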
johnd_ | Here are also the cinder services https://paste.opendev.org/show/b40TC8fNXkiktUJoEU7w/ | 15:02 |
johnd_ | If you have Horizon installed, how are your volumes' hosts showing ? | 15:08 |
*** dviroel is now known as dviroel|lunch | 15:10 | |
foutattoro | hi all, i have an issue with my OSA cloud after a restart. all services run correctly but I'm getting an error with glance | 15:14 |
foutattoro | Failed to contact the endpoint at http://172.29.236.15:9292 for discovery. Fallback to using that endpoint as the base url. Failed to contact the endpoint at http://172.29.236.15:9292 for discovery. Fallback to using that endpoint as the base url. The image service for :RegionOne exists but does not have any supported versions. | 15:14 |
foutattoro | I had already created images and instances. I haven't done any update since my deployment | 15:15 |
foutattoro | someone knows how to solve this please ? | 15:15 |
jrosser | foutattoro: is 172.29.236.15 your internal VIP? | 15:28 |
foutattoro | yes | 15:33 |
jrosser | have you checked what happens with something like `wget http://172.29.236.15:9292/` | 15:36 |
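A slightly fuller version of that check, useful for telling a haproxy/VIP problem apart from a glance-api problem (the backend address is a placeholder, not taken from this deployment):

```shell
# Through the internal VIP (haproxy), then directly against one glance-api
# backend; a healthy glance-api normally answers GET / with an HTTP 300 and a
# JSON list of supported API versions.
curl -si http://172.29.236.15:9292/ | head -n 5
curl -si http://<glance-container-ip>:9292/ | head -n 5
```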
*** ysandeep|rover is now known as ysandeep|out | 16:09 | |
*** dviroel|lunch is now known as dviroel | 16:24 | |
foutattoro | jrosser: I will check what's going wrong after restarting servers | 17:00 |
foutattoro | Is there any procedure to follow after restarting infra servers ? | 17:01 |
jrosser | restarting them all is not so great, as you have rabbitmq / galera clusters to consider | 17:14 |
jrosser | there are some pointers here for checking those services after some kind of disruption https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html | 17:14 |
foutattoro | jrosser: You're right, I think the issue comes from the galera cluster which is not synchronized | 17:22 |
jrosser | did you restart all the infra nodes together? | 17:22 |
foutattoro | yes | 17:23 |
foutattoro | but I get this for galera https://paste.opendev.org/show/b42nMfxPH5Utzjt0lhVA/ | 17:23 |
foutattoro | how can I synchronize the cluster | 17:24 |
jrosser | well, i think you have to follow the instructions here https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#recover-a-multi-node-failure | 17:25 |
jrosser | and tbh this is not something we ever expect openstack-ansible to be able to deal with | 17:26 |
jrosser | this needs some mariadb skills to bring the DB back online after a total outage | 17:27 |
jrosser | when rebooting/restarting control plane nodes it needs to be done very very carefully, checking the state of rabbitmq/galera at each step | 17:28 |
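One way to do that checking at each step, sketched with typical openstack-ansible names (the container name is a placeholder):

```shell
# Run from an infra host: look at the wsrep status inside one galera container.
# On a healthy cluster wsrep_cluster_status is 'Primary' and wsrep_cluster_size
# matches the number of galera containers.
lxc-attach -n infra1_galera_container-xxxxxxxx -- \
    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';"
```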
foutattoro | jrosser: thanks for this information | 17:43 |
foutattoro | I'm still dealing with this issue | 17:44 |
mgariepy | the good thing is that you have a node with `safe_to_bootstrap: 1` | 17:54 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Add CSP headers for img-src and worker-src https://review.opendev.org/c/openstack/openstack-ansible/+/841154 | 17:55 |
foutattoro | mgariepy: could you explain a bit more please | 17:56 |
mgariepy | you should restart the cluster with that node. | 17:58 |
mgariepy | https://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/ | 17:58 |
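Roughly what that article boils down to, as a hedged sketch rather than a verified runbook for this deployment:

```shell
# On each galera container, check which node Galera considers safe to bootstrap.
cat /var/lib/mysql/grastate.dat
# ...
# safe_to_bootstrap: 1        <- bootstrap from this node only

# On that node, start a new cluster with MariaDB's bootstrap helper, then start
# mariadb normally on the other nodes so they rejoin via SST/IST.
galera_new_cluster
# afterwards, on the remaining nodes:
systemctl start mariadb
```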
foutattoro | jrosser mgariepy: what is the risk of redeploying the openstack infrastructure services ? | 18:49 |
foutattoro | and backing up VMs from ceph storage | 18:50 |
mgariepy | foutattoro, not sure i understand what you mean | 18:55 |
foutattoro | since my galera cluster is not synchronized, I wonder if it is possible to deploy galera in new lxc containers | 18:56 |
mgariepy | restarting galera is not that hard. | 18:57 |
foutatoro | thanks mgariepy & jrosser: I have restarted the galera cluster | 19:51 |
mgariepy | is your rabbitmq cluster happy ? | 19:52 |
foutatoro | I think yes. | 19:52 |
foutatoro | I'm facing a ceph issue | 19:52 |
foutatoro | because openstack can't retrieve volumes from ceph https://paste.opendev.org/show/bnIUJ6sVLnsH7mnJZsLX/ | 19:53 |
mgariepy | can you paste: rabbitmq cluster_status | 19:53 |
mgariepy | galera is needed by keystone. | 19:54 |
mgariepy | which is needed for all the other services along with rabbitmq for most of them. | 19:54 |
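Roughly the check mgariepy is asking for a few lines up, run inside one of the rabbitmq containers:

```shell
# All rabbit@<container> nodes should appear as running cluster members.
rabbitmqctl cluster_status
# On RabbitMQ 3.8+ the same information is also available via:
rabbitmq-diagnostics cluster_status
```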
foutatoro | It seems that keystone works since I can list servers with "openstack server list" | 19:55 |
foutatoro | but all VMs are powered off | 19:55 |
mgariepy | restart all the cinder containers | 19:56 |
mgariepy | and services. | 19:56 |
mgariepy | how come you have 33% degraded if everything is up :/ | 19:57 |
jrosser | how many infra hosts are there? only two? | 19:57 |
jrosser | oddly there are 6 ceph mons | 19:57 |
mgariepy | an even number for quorum is not good. | 19:58 |
mgariepy | since you need a majority to be happy (4 out of 6) | 19:58 |
mgariepy | if you only have 5, your cluster can work with 3 | 19:59 |
foutatoro | yes I've 2 infra hosts | 19:59 |
jrosser | that's not so great - two nodes is going to fail just as badly as a single one for galera and ceph | 20:01 |
foutatoro | jrosser: I run 3 ceph mon containers on each infra since I only have 2 hosts | 20:05 |
foutatoro | what do you suggest for the ceph cluster ? | 20:06 |
jrosser | if you lose one of those hosts then ceph is down | 20:06 |
jrosser | as mgariepy says you need 4 of 6 mons to be up as a minimum, thats just how ceph works | 20:06 |
jrosser | it must always be >50% of them available | 20:06 |
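Two quick ways to see how many mons exist and how many are actually in quorum, assuming a node with an admin keyring such as a ceph-mon container:

```shell
# Overall health, mon/quorum summary and the degraded-objects percentage.
ceph -s
# Just the monitor map and which mons are currently in quorum.
ceph mon stat
```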
foutatoro | OK, I have to add a new infra host if I understand correctly | 20:09 |
jrosser | yeah, that would certainly make things more robust | 20:09 |
jrosser | but you should be able to recover what's happening now | 20:10 |
foutatoro | but this doesn't explain why cinder can't get volumes from ceph | 20:10 |
jrosser | it's also worth reading up on galera cluster sizing too https://galeracluster.com/library/documentation/weighted-quorum.html | 20:10 |
jrosser | no it doesn't explain cinder, the place to start is the cinder api / cinder volumes, check the logs, restart the services if necessary | 20:12 |
foutatoro | jrosser: cinder services seem to be up but with errors https://paste.opendev.org/show/bjrEenjqdbkwR1FbaDEt/ | 20:23 |
foutatoro | with this deployment I can't find logs in /var/log/cinder/* | 20:24 |
foutatoro | mgariepy: is there a way to reduce the 33% degraded in the ceph cluster ? | 20:25 |
jrosser | the logs are all in systemd journals | 20:27 |
jrosser | `journalctl -u <unit-name>` | 20:27 |
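Concretely, something along these lines from the host that runs the cinder volumes container (container and unit names are typical openstack-ansible values and may differ):

```shell
# Last chunk of the cinder-volume journal inside the container.
lxc-attach -n <cinder_volumes_container> -- \
    journalctl -u cinder-volume --since "1 hour ago" --no-pager | tail -n 100
```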
foutatoro | maybe the error comes from this message | 20:28 |
foutatoro | 2022-05-09 20:26:04.072 61 DEBUG cinder.scheduler.host_manager [req-ae393cc0-efc2-4 e2d-add6-120ae3bd3d05 - - - - -] Received volume service update from Cluster: ceph@RBD - Host: rbd:volumes@RBD: {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_s | 20:28 |
foutatoro | 'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0, | 20:28 |
foutatoro | jrosser, mgariepy: any suggestion ? | 20:43 |
*** dviroel is now known as dviroel|afk | 20:43 | |
foutatoro | jrosser, mgariepy:: I just found the root issue https://paste.opendev.org/show/buSSglduS58Tt2s7yzuG/ | 20:52 |
foutatoro | how to start cinder-volume please ? | 20:53 |
jrosser | i'm not so sure that it's stopped | 21:05 |
jrosser | restarting the cinder-volume service and looking at the journal perhaps | 21:06 |
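A hedged sketch of that, assuming the unit is named cinder-volume as in a default openstack-ansible install:

```shell
# Inside the cinder volumes container: restart the service and follow its journal
# to see whether it connects to rabbitmq/ceph and reports itself as up.
systemctl restart cinder-volume
journalctl -u cinder-volume -f
```

If the service still does not report in `openstack volume service list`, re-running the os-cinder-install.yml playbook against the cinder hosts is another option.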
foutatoro | I also wonder if my PVs are full https://paste.opendev.org/show/bMbxrFpzlZ8WWFKrVfhY/ | 21:11 |
foutatoro | jrosser: do you think it is possible to lose volumes after a reboot | 21:33 |
jrosser | i suspect that this is a result of the cinder active/active stuff | 21:33 |
jrosser | but it's late now here so i have to go | 21:34 |
foutatoro | jrosser: ok thanks for your help | 21:37 |