*** ysandeep|out is now known as ysandeep|rover | 04:58 | |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Add upgrade path from lsyncd to shared filesystem. https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/839411 | 06:46 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Remove all code for lsync, rsync and ssh https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/837588 | 06:46 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Clean up legacy lsycnd, rsync and ssh key config https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/837859 | 06:46 |
*** ysandeep|rover is now known as ysandeep|rover|lunch | 07:20 | |
*** ysandeep|rover|lunch is now known as ysandeep|rover | 08:23 | |
jrosser | noonedeadpunk: i am not sure how we merge this https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/839411 | 09:35 |
jrosser | seems it is very dependent on this https://review.opendev.org/c/openstack/openstack-ansible/+/837589 | 09:35 |
jrosser | the filesystem is created in the integrated repo atm, because the serial: keyword in the repo_server playbook really breaks forming the gluster cluster, which can't be done with the tasks running serially | 09:41 |
jrosser | oh also i sent an email directly to the Derek guy from the ML in case all the replies are ending up in his spam filter | 09:47 |
noonedeadpunk | ^ I did the same actually... | 09:50 |
jrosser | :) | 09:51 |
jrosser | noonedeadpunk: https://github.com/jrosser/openstack-ansible-os_skyline | 09:59 |
noonedeadpunk | I can create a repo in opendev for that:) | 10:00 |
noonedeadpunk | Regarding repo - sorry, I'm away today, can't check details for that | 10:00 |
noonedeadpunk | But https://github.com/jrosser/openstack-ansible-os_skyline/blob/master/tasks/db_setup.yml and https://github.com/jrosser/openstack-ansible-os_skyline/blob/master/tasks/service_setup.yml are obsolete :D | 10:01 |
jrosser | oh yeah it's a big hack | 10:01 |
jrosser | that's how i found we still didn't tidy up os_placement, which is what this is created from | 10:01 |
jrosser | i need to fix up the copyright lines as well | 10:02 |
*** dviroel_ is now known as dviroel\ | 11:14 | |
*** dviroel\ is now known as dviroel | 11:14 | |
*** ysandeep|rover is now known as ysandeep|rover|break | 11:57 | |
opendevreview | Merged openstack/openstack-ansible stable/xena: Fix extra facts gathering with tags https://review.opendev.org/c/openstack/openstack-ansible/+/840479 | 12:07 |
*** ysandeep|rover|break is now known as ysandeep|rover | 12:57 | |
johnd_ | Hello there, I would like to ask you some question about cinder deployment. We have 3 controllers with 3 cinder-volume and 3 cinder-api each. Those services are installed in LXC and connected to a same Ceph Backend. | 14:28 |
johnd_ | When we create VMs, volumes are managed by different cinder volumes. We can see this in Horizon (os-controller-2-cinder-volumes-container-0533e258@rbd#rbd / os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd/...). Now, if we want to migrate the volume from one service to another in horizon, we can only select ceph@rbd#rbd and this fails. If we migrate with CLI, the volume always ends on os-controller-3-cinder-volumes-container-871bcbc8@r | 14:30 |
johnd_ | Is this normal ? How to fix this ? | 14:31 |
mgariepy | you can set the `backend_host` config in the ceph section of cinder. this will allow any of the 3 cinder-volume services to manage the volumes. | 14:39 |
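A minimal sketch of the setting mgariepy describes, assuming the backend section is named `[rbd]` as the host strings above suggest; in openstack-ansible this would normally be driven from the cinder_backends definition rather than edited by hand:

```shell
# Illustrative only: a shared backend_host in the rbd/ceph backend section of
# cinder.conf makes all three cinder-volume services report the same host name,
# so any of them can manage any volume on that backend. Section and value names
# below are assumptions based on the discussion, not taken from a real config.
grep -A5 '^\[rbd\]' /etc/cinder/cinder.conf
# [rbd]
# volume_driver = cinder.volume.drivers.rbd.RBDDriver
# rbd_pool = volumes
# rbd_ceph_conf = /etc/ceph/ceph.conf
# backend_host = rbd
```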
jrosser | johnd_: is this an existing/old deployment, because I think this is something that was adjusted a while ago | 14:41 |
johnd_ | this is an old deployment | 14:41 |
johnd_ | the cluster was created on pike | 14:42 |
jrosser | mgariepy: i remember this being not quite so simple https://opendev.org/openstack/openstack-ansible/src/branch/master/releasenotes/notes/enable-active-active-9af1551759468dc8.yaml | 14:45 |
jrosser | https://github.com/openstack/openstack-ansible-os_cinder/commit/c148d77e29af6faebc1c9b012ae08aed447cd179 | 14:48 |
jrosser | https://github.com/openstack/openstack-ansible-os_cinder/commit/c6b9f011b777aa0513e99a35fba7d976a4c9d4c1 | 14:50 |
johnd_ | We have "cluster = ceph" in our config | 14:50 |
jrosser | it seems a bit opaque to me how this is actually supposed to work | 14:50 |
jrosser | i see we make that config setting, but from a cinder POV i'm not sure i understand the active/active model here if the volumes are bound to a specific back end | 14:51 |
johnd_ | i have this in horizon, is this normal for you ? https://ibb.co/617q8r4 | 14:52 |
mgariepy | what is your backend config? | 14:53 |
jrosser | i have this https://paste.opendev.org/show/bYpnYSYModo1HzRP4qv6/ | 14:53 |
johnd_ | here is my config: https://paste.opendev.org/show/bfWltAz5gtdx0jAVueo0/ | 14:55 |
mgariepy | i have both config. | 14:55 |
jrosser | there is some background info about active-active here https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/storage_guide/ch-cinder#active-active-deployment-for-high-availability | 14:56 |
johnd_ | Here are my pool and cluster https://paste.opendev.org/show/bDbZ8GPJf6r2ihM6o2JQ/ | 14:57 |
jrosser | but it is still confusing that the volumes have an os-vol-host-attr:host which points to an actual host rather than the cluster | 14:58 |
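One possible follow-up to that, sketched here as an assumption rather than a verified procedure: once a shared backend_host (or cluster name) is in place, existing volumes that still carry a per-container host can be remapped with cinder-manage, for example:

```shell
# Hedged example: rewrite os-vol-host-attr:host for volumes created before the
# shared host/cluster name existed. Host strings are taken from this discussion;
# stop the cinder-volume services before attempting this on a real deployment.
cinder-manage volume update_host \
    --currenthost os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd \
    --newhost ceph@rbd#rbd
```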
johnd_ | Here are also the cinder services https://paste.opendev.org/show/b40TC8fNXkiktUJoEU7w/ | 15:02 |
johnd_ | If you have Horizon installed, how are your volumes' hosts showing ? | 15:08 |
*** dviroel is now known as dviroel|lunch | 15:10 | |
foutattoro | hi all, i have an issue with my OSA cloud after a restart. all services run correctly but I'm getting an error with glance | 15:14 |
foutattoro | Failed to contact the endpoint at http://172.29.236.15:9292 for discovery. Fallback to using that endpoint as the base url. Failed to contact the endpoint at http://172.29.236.15:9292 for discovery. Fallback to using that endpoint as the base url. The image service for :RegionOne exists but does not have any supported versions. | 15:14 |
foutattoro | I had already created images and instances. I haven't done any update since my deployment | 15:15 |
foutattoro | someone knows how to solve this please ? | 15:15 |
jrosser | foutattoro: is 172.29.236.15 your internal VIP? | 15:28 |
foutattoro | yes | 15:33 |
jrosser | have you checked what happens with something like `wget http://172.29.236.15:9292/` | 15:36 |
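A slightly fuller version of that check, useful for telling a haproxy/VIP problem apart from a glance-api problem (the backend address is a placeholder, not taken from this deployment):

```shell
# Through the internal VIP (haproxy), then directly against one glance-api
# backend; a healthy glance-api normally answers GET / with an HTTP 300 and a
# JSON list of supported API versions.
curl -si http://172.29.236.15:9292/ | head -n 5
curl -si http://<glance-container-ip>:9292/ | head -n 5
```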
*** ysandeep|rover is now known as ysandeep|out | 16:09 | |
*** dviroel|lunch is now known as dviroel | 16:24 | |
foutattoro | jrosser: I will check what's going wrong after restarting servers | 17:00 |
foutattoro | Is there any procedure to follow after restarting infra servers ? | 17:01 |
jrosser | restarting them all is not so great, as you have rabbitmq / galera clusters to consider | 17:14 |
jrosser | there are some pointers here for checking those services after some kind of disruption https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html | 17:14 |
foutattoro | jrosser: You're right, I think the issue comes from the galera cluster which is not synchronized | 17:22 |
jrosser | did you restart all the infra nodes together? | 17:22 |
foutattoro | yes | 17:23 |
foutattoro | but I get this for galera https://paste.opendev.org/show/b42nMfxPH5Utzjt0lhVA/ | 17:23 |
foutattoro | how can I synchronize the cluster | 17:24 |
jrosser | well, i think you have to follow the instructions here https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#recover-a-multi-node-failure | 17:25 |
jrosser | and tbh this is not something we ever expect openstack-ansible to be able to deal with | 17:26 |
jrosser | this needs some mariadb skills to bring the DB back online after a total outage | 17:27 |
jrosser | when rebooting/restarting control plane nodes it needs to be done very very carefully, checking the state of rabbitmq/galera at each step | 17:28 |
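One way to do that checking at each step, sketched with typical openstack-ansible names (the container name is a placeholder):

```shell
# Run from an infra host: look at the wsrep status inside one galera container.
# On a healthy cluster wsrep_cluster_status is 'Primary' and wsrep_cluster_size
# matches the number of galera containers.
lxc-attach -n infra1_galera_container-xxxxxxxx -- \
    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';"
```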
foutattoro | jrosser: thanks for this information | 17:43 |
foutattoro | I'm still dealing with this issue | 17:44 |
mgariepy | the good thing is that you have a node with `safe_to_bootstrap: 1` | 17:54 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Add CSP headers for img-src and worker-src https://review.opendev.org/c/openstack/openstack-ansible/+/841154 | 17:55 |
foutattoro | mgariepy: could you explain a bit more please | 17:56 |
mgariepy | you should restart the cluster with that node. | 17:58 |
mgariepy | https://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/ | 17:58 |
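Roughly what that article boils down to, as a hedged sketch rather than a verified runbook for this deployment:

```shell
# On each galera container, check which node Galera considers safe to bootstrap.
cat /var/lib/mysql/grastate.dat
# ...
# safe_to_bootstrap: 1        <- bootstrap from this node only

# On that node, start a new cluster with MariaDB's bootstrap helper, then start
# mariadb normally on the other nodes so they rejoin via SST/IST.
galera_new_cluster
# afterwards, on the remaining nodes:
systemctl start mariadb
```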
foutattoro | jrosser mgariepy: what is the risk of redeploying the openstack infrastructure services ? | 18:49 |
foutattoro | and backing up VMs from ceph storage | 18:50 |
mgariepy | foutattoro, not sure i understand what you mean | 18:55 |
foutattoro | since my galera cluster is not synchronized, I wonder if it is possible to deploy galera in new lxc containers | 18:56 |
mgariepy | restarting galera is not that hard. | 18:57 |
foutatoro | thanks mgariepy & jrosser: I have restarted the galera cluster | 19:51 |
mgariepy | is your rabbitmq cluster happy ? | 19:52 |
foutatoro | I think yes. | 19:52 |
foutatoro | I'm facing a ceph issue | 19:52 |
foutatoro | because openstack can't retrieve volumes from ceph https://paste.opendev.org/show/bnIUJ6sVLnsH7mnJZsLX/ | 19:53 |
mgariepy | can you paste: rabbitmq cluster_status | 19:53 |
mgariepy | galera is needed by keystone. | 19:54 |
mgariepy | which is needed for all the other services along with rabbitmq for most of them. | 19:54 |
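Roughly the check mgariepy is asking for a few lines up, run inside one of the rabbitmq containers:

```shell
# All rabbit@<container> nodes should appear as running cluster members.
rabbitmqctl cluster_status
# On RabbitMQ 3.8+ the same information is also available via:
rabbitmq-diagnostics cluster_status
```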
foutatoro | It seems that keystone works since I can list servers with "openstack server list" | 19:55 |
foutatoro | but all VMs are powered off | 19:55 |
mgariepy | restart all the cinder containers | 19:56 |
mgariepy | and services. | 19:56 |
mgariepy | how come you have 33% degraded if everything is up :/ | 19:57 |
jrosser | how many infra hosts are there? only two? | 19:57 |
jrosser | oddly there are 6 ceph mons | 19:57 |
mgariepy | an even number for quorum is not good. | 19:58 |
mgariepy | since you need a majority to be happy (4 out of 6) | 19:58 |
mgariepy | if you only have 5, your cluster can work with 3 | 19:59 |
foutatoro | yes I've 2 infra hosts | 19:59 |
jrosser | that's not so great - two nodes is going to fail just as badly as a single one for galera and ceph | 20:01 |
foutatoro | jrosser: I run 3 ceph mon containers on each infra since I only have 2 hosts | 20:05 |
foutatoro | what do you suggest for the ceph cluster ? | 20:06 |
jrosser | if you lose one of those hosts then ceph is down | 20:06 |
jrosser | as mgariepy says you need 4 of 6 mons to be up as a minimum, thats just how ceph works | 20:06 |
jrosser | it must always be >50% of them available | 20:06 |
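Two quick ways to see how many mons exist and how many are actually in quorum, assuming a node with an admin keyring such as a ceph-mon container:

```shell
# Overall health, mon/quorum summary and the degraded-objects percentage.
ceph -s
# Just the monitor map and which mons are currently in quorum.
ceph mon stat
```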
foutatoro | OK, I have to add a new infra host if I understand correctly | 20:09 |
jrosser | yeah, that would certainly make things more robust | 20:09 |
jrosser | but you should be able to recover what's happening now | 20:10 |
foutatoro | but this doesn't explain why cinder can't get volumes from ceph | 20:10 |
jrosser | it's also worth reading up on galera cluster sizing too https://galeracluster.com/library/documentation/weighted-quorum.html | 20:10 |
jrosser | no it doesn't explain cinder, the place to start is the cinder api / cinder volumes, check the logs, restart the services if necessary | 20:12 |
foutatoro | jrosser: cinder services seem to be up but with errors https://paste.opendev.org/show/bjrEenjqdbkwR1FbaDEt/ | 20:23 |
foutatoro | with this deployment I can't find logs in /var/log/cinder/* | 20:24 |
foutatoro | mgariepy: is there a way to reduce the 33% degraded in the ceph cluster ? | 20:25 |
jrosser | the logs are all in systemd journals | 20:27 |
jrosser | `journalctl -u <unit-name>` | 20:27 |
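Concretely, something along these lines from the host that runs the cinder volumes container (container and unit names are typical openstack-ansible values and may differ):

```shell
# Last chunk of the cinder-volume journal inside the container.
lxc-attach -n <cinder_volumes_container> -- \
    journalctl -u cinder-volume --since "1 hour ago" --no-pager | tail -n 100
```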
foutatoro | maybe the error comes from this message | 20:28 |
foutatoro | 2022-05-09 20:26:04.072 61 DEBUG cinder.scheduler.host_manager [req-ae393cc0-efc2-4 e2d-add6-120ae3bd3d05 - - - - -] Received volume service update from Cluster: ceph@RBD - Host: rbd:volumes@RBD: {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_s | 20:28 |
foutatoro | 'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0, | 20:28 |
foutatoro | jrosser, mgariepy: any suggestion ? | 20:43 |
*** dviroel is now known as dviroel|afk | 20:43 | |
foutatoro | jrosser, mgariepy:: I just found the root issue https://paste.opendev.org/show/buSSglduS58Tt2s7yzuG/ | 20:52 |
foutatoro | how to start cinder-volume please ? | 20:53 |
jrosser | i'm not so sure that it's stopped | 21:05 |
jrosser | restarting the cinder-volume service and looking at the journal perhaps | 21:06 |
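A hedged sketch of that, assuming the unit is named cinder-volume as in a default openstack-ansible install:

```shell
# Inside the cinder volumes container: restart the service and follow its journal
# to see whether it connects to rabbitmq/ceph and reports itself as up.
systemctl restart cinder-volume
journalctl -u cinder-volume -f
```

If the service still does not report in `openstack volume service list`, re-running the os-cinder-install.yml playbook against the cinder hosts is another option.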
foutatoro | I also wonder if my PVs are full https://paste.opendev.org/show/bMbxrFpzlZ8WWFKrVfhY/ | 21:11 |
foutatoro | jrosser: do you think it is possible to lose volumes after a reboot | 21:33 |
jrosser | i suspect that this is a result of the cinder active/active stuff | 21:33 |
jrosser | but it's late now here so i have to go | 21:34 |
foutatoro | jrosser: ok thanks for your help | 21:37 |