*** odyssey4me is now known as Guest7290 | 04:14 | |
snadge | my openstack-ansible ussuri pre-production platform crashed the other day, due to disk space, and now galera won't restart.. should i just re-run the playbook or try and figure out why it won't start? | 11:31 |
---|---|---|
snadge | now it just starts and i haven't even done anything.. sigh, its just been one of those kinda days | 11:37 |
noonedeadpunk | cool that it's solved anyway:) | 11:41 |
noonedeadpunk | but for the future - I wouldn't trust galera recovery to the playbook anyway | 11:41 |
noonedeadpunk | it's smth that should be done manually | 11:42 |
*** arxcruz is now known as arxcruz|pto | 11:55 | |
snadge | i think the first start cleared a temporary table and bailed, and all i needed to do was start it again.. but i had something urgent come up and the day was ruined, then after awful day i thought maybe I should take a look at this.. reluctantly.. and easy success | 12:13 |
*** odyssey4me is now known as Guest7335 | 12:17 | |
*** odyssey4me is now known as Guest7338 | 12:31 | |
opendevreview | Merged openstack/openstack-ansible stable/wallaby: Fix ceph-ansible shallow_since date https://review.opendev.org/c/openstack/openstack-ansible/+/808999 | 13:35 |
spatel | noonedeadpunk how do you take backup of mysql database for openstack? | 13:58 |
noonedeadpunk | we use mariabackup script shipped with galera role :) | 13:58 |
spatel | mysqldump --opt --all-databases > openstack.sql ? | 13:58 |
spatel | oh! where is that script? | 13:58 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L218-L238 | 13:59 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/templates/mariabackup_script.py.j2 | 13:59 |
spatel | all i need to do add galera_mariadb_backups_enabled: true in user_variables.yml right? | 14:00 |
noonedeadpunk | kind of. just ensure you have that in your role because it landed not that long ago | 14:02 |
*** odyssey4me is now known as Guest7352 | 14:07 | |
spatel | oh wait what do you mean in your role? | 14:11 |
spatel | noonedeadpunk ^ | 14:12 |
noonedeadpunk | nah, sorry, I meant more osa release | 14:13 |
noonedeadpunk | because that landed in V | 14:13 |
spatel | i have V :) | 14:18 |
spatel | i am upgrading my prod from V-W right now so want to make sure backup is latest and gratest | 14:19 |
spatel | greatest | 14:19 |
noonedeadpunk | oh, cool | 14:19 |
noonedeadpunk | let us know how that passed | 14:19 |
noonedeadpunk | I'm a bit nervous about PKI stuff and rabbitmq | 14:20 |
spatel | why? | 14:23 |
spatel | do you have any doubt so please share before i press buttom | 14:23 |
spatel | button* | 14:24 |
noonedeadpunk | well in CI it was all working:) | 14:26 |
noonedeadpunk | and technically it should be fine | 14:26 |
spatel | i did test in lab and didn't see any issue | 14:27 |
spatel | i have multi-node lab like production | 14:27 |
noonedeadpunk | as you might know - we're generating certificate authority (you might want to define CSR details for it!) | 14:27 |
noonedeadpunk | and replace SSL used for galera and distribute CA across all hosts and containers | 14:27 |
spatel | oh wait.. why it didn't ask me to do in LAB? | 14:28 |
noonedeadpunk | because we have defaults set | 14:28 |
spatel | i am ok with default.. as far as it works | 14:28 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/all/ssl.yml#L33-L66 | 14:28 |
spatel | can i disable SSL if i don't like ? | 14:29 |
noonedeadpunk | nope | 14:29 |
spatel | hmm so SSL is mandatory | 14:29 |
noonedeadpunk | well, roles allow that, but you can't iirc because some oslo.messaging dependency now requires rabbit cert to be trusted and encryption enabled | 14:30 |
noonedeadpunk | oh, well, thinking about it, I think you can disable encryption for rabbit after all | 14:30 |
noonedeadpunk | as that's how we worked around the issue with self-signed cert during development of stuff | 14:31 |
spatel | how to disable? | 14:32 |
spatel | i just want to know all tools before i start upgrade :) | 14:33 |
noonedeadpunk | you would need setting `openstack_pki_authorities: []` and `rabbitmq_use_ssl: false` | 14:33 |
spatel | in user_variables right? | 14:33 |
noonedeadpunk | yep | 14:34 |
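A minimal sketch of how the two overrides quoted above could be appended to a user_variables file. The helper name `append_rabbit_ssl_overrides` is hypothetical, the default path assumes the usual `/etc/openstack_deploy` layout, and the helper does not check whether the keys are already defined.

```shell
# Hypothetical helper: append the two overrides named in the chat to a
# user_variables file. The default path is an assumption about the deployment.
append_rabbit_ssl_overrides() {
    vars_file="${1:-/etc/openstack_deploy/user_variables.yml}"
    cat >> "$vars_file" <<'EOF'

# Empty PKI authority list and no TLS for RabbitMQ (values quoted from the chat)
openstack_pki_authorities: []
rabbitmq_use_ssl: false
EOF
}

# Usage:
#   append_rabbit_ssl_overrides                    # default path
#   append_rabbit_ssl_overrides ./user_variables.yml
```

Appending rather than rewriting keeps the existing overrides untouched; review the file afterwards for duplicate keys.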
spatel | that is for rabbitmq what about mysql? | 14:34 |
noonedeadpunk | mysql not implemented yet | 14:35 |
spatel | great! so that is easy | 14:35 |
noonedeadpunk | well, after W you would need that anyway just in case, if you want to have live migrations | 14:35 |
spatel | we don't do live migration in our cloud | 14:36 |
spatel | we have all local storage | 14:36 |
noonedeadpunk | ah, I see | 14:36 |
noonedeadpunk | but actually local storage could live migrate as well iirc | 14:36 |
noonedeadpunk | it just needs ssl :) | 14:36 |
spatel | block :) | 14:36 |
spatel | what is the connection with live migration and SSL ? | 14:36 |
spatel | nova use SSH for block migration right? | 14:37 |
noonedeadpunk | This behaviour will be removed after W | 14:37 |
spatel | hmm | 14:37 |
noonedeadpunk | And with SSH migration you can't migrate local block storage | 14:37 |
noonedeadpunk | https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.live_migration_tunnelled | 14:37 |
spatel | oh | 14:38 |
spatel | libvirtd | 14:39 |
spatel | i have added galera_mariadb_backups_enabled: true now what playbook i should be running without restarting mysql :) | 14:39 |
noonedeadpunk | galera-install with some tag... | 14:39 |
spatel | its production so better ask stupid question | 14:39 |
noonedeadpunk | --tags galera_server-backups | 14:40 |
spatel | sweet let me run | 14:40 |
noonedeadpunk | thinking about that - it's worth moving this out of galera-server part... | 14:41 |
spatel | hmm nothing happened | 14:42 |
spatel | let me check if that tags is correct | 14:43 |
noonedeadpunk | doh | 14:43 |
noonedeadpunk | we should have backported https://opendev.org/openstack/openstack-ansible-galera_server/commit/677dddf21a6d976c88e87fd0230ec1452a18217f | 14:43 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server stable/wallaby: Improve support for tags https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/809147 | 14:44 |
spatel | noonedeadpunk so it's not available? | 14:45 |
noonedeadpunk | yeah, they're not working :( | 14:46 |
spatel | anyway.. let me use mysqldump | 14:46 |
spatel | that is ok | 14:46 |
spatel | do you have command to take full backup with all routine + user/password? | 14:46 |
noonedeadpunk | I think you can create some playbook and use tasks_from: galera_server_backups.yml | 14:46 |
spatel | sure | 14:47 |
spatel | is this enough to take a backup? mysqldump --opt --all-databases > openstack.sql | 14:47 |
spatel | or any special option you would recommend | 14:47 |
noonedeadpunk | that would block your tables during backup just in case | 14:48 |
noonedeadpunk | but it will work | 14:48 |
noonedeadpunk | also you would need to backup grants separately | 14:48 |
spatel | hmm let me see how to do that | 14:51 |
spatel | mysqldump -u root -p --routines --triggers --opt --quote-names --all-databases > openstack.sql | 14:51 |
spatel | routines should do grant also right? | 14:52 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Change pki_create_ca condition https://review.opendev.org/c/openstack/openstack-ansible/+/809205 | 14:52 |
noonedeadpunk | um, no idea honestly. I usually used another request for that | 14:53 |
spatel | do you have command, if you don't mind to share :) | 14:54 |
spatel | let me do some restore test on LAB to verify back and restore works | 14:55 |
noonedeadpunk | Not 100% sure as I guess I lost my copy of it, but it should be smth like `mysql --skip-column-names -A -e"SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''" | mysql --skip-column-names -A | sed 's/$/;/g' > MySQLUserGrants.sql` | 14:56 |
noonedeadpunk | maybe `mysqldump -u root -p mysql user` also works - dunno | 14:57 |
noonedeadpunk | last time I used mysqldump years ago... | 14:58 |
spatel | Thank you! let me give it a shot in LAB | 14:58 |
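The two dumps discussed above (all databases, then grants) can be sketched as a pair of shell functions. This is a sketch under assumptions, not a tested procedure: it assumes the `mysql` and `mysqldump` clients can authenticate without flags (for example through `~/.my.cnf`), and the function and file names are illustrative only. The flags and the query are the ones quoted in the chat.

```shell
# Sketch only: assumes the mysql/mysqldump clients can authenticate without
# flags (for example via ~/.my.cnf). Function and file names are illustrative.

# Print the query that turns every account into a SHOW GRANTS statement.
grants_query() {
    printf '%s\n' "SELECT CONCAT('SHOW GRANTS FOR ''',user,'''@''',host,''';') FROM mysql.user WHERE user<>''"
}

# Dump all databases (with routines and triggers) into the file given as $1.
dump_all_databases() {
    mysqldump --routines --triggers --opt --quote-names --all-databases > "$1"
}

# Write one GRANT statement per line, each terminated with ';', into $1.
dump_grants() {
    mysql --skip-column-names -A -e "$(grants_query)" \
        | mysql --skip-column-names -A \
        | sed 's/$/;/' > "$1"
}

# Usage:
#   dump_all_databases openstack.sql
#   dump_grants MySQLUserGrants.sql
```

Keeping the grants in their own file makes it possible to inspect them and replay them after the main restore.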
spatel | worth putting small doc in OSA (just incase someone like me struggling) | 14:59 |
noonedeadpunk | why? when we have mariabackup inside the role? | 14:59 |
spatel | in that case also we need something to let people know right? | 15:00 |
spatel | OSA doesn't have any FAQ section, it would be great to have that to just reference links in FAQ for help | 15:01 |
noonedeadpunk | oh, huh, I was pretty sure we have smth | 15:01 |
noonedeadpunk | but indeed we don't :( | 15:01 |
spatel | I haven't seen any doc in OSA related recover except rebuild galera cluster | 15:02 |
noonedeadpunk | I guess it's matter of time that we don't have | 15:02 |
noonedeadpunk | but anybody can contribute to docs :) | 15:03 |
spatel | me | 15:03 |
spatel | I would also like to create FAQ if you don't mind.. because that helps new folks to get onboard | 15:04 |
spatel | Quick and dirty Question / Answer | 15:04 |
noonedeadpunk | I think that can be discussed. What kind of stuff you see there? | 15:04 |
noonedeadpunk | ie what kind of questions? | 15:04 |
spatel | like how to run specific playbook, how to use tags, how to use -vvv to debug, how to add new compute node, how to recover rabbitmq, how to recover mysql etc... | 15:05 |
spatel | we do have doc but they are not on single page.. with reference | 15:06 |
spatel | how to disable compute node.. / how to remove compute node / and many more small and simple question | 15:07 |
spatel | something like this - https://docs.openstack.org/devstack/latest/faq.html | 15:08 |
noonedeadpunk | eventually the thing here is to define what questions should be covered there and what's not | 15:09 |
noonedeadpunk | like disablement of compute - is smth related to nova for instance. We can reference nova doc ofc there. But then also neutron comes. And cinder for some kind of deployments | 15:09 |
spatel | we don't need to decide right now but we can start with simple thing and then keep adding more stuff as we grow | 15:10 |
noonedeadpunk | I guess we tried to do smth like that in https://docs.openstack.org/openstack-ansible/latest/reference/commands/reference.html | 15:10 |
spatel | I know what you saying.. | 15:12 |
spatel | noonedeadpunk do you have any experience with rental server? | 15:25 |
noonedeadpunk | nope | 15:25 |
spatel | damn it :) | 15:25 |
noonedeadpunk | I always everywhere was working on own hardware | 15:25 |
spatel | we have huge pressure to run stuff on remote rental datacenter and trying to figure out how | 15:25 |
spatel | we need global presence and its hard to have local DC and staff | 15:26 |
spatel | japan/Singapore/south america etc. | 15:26 |
noonedeadpunk | it depends on amount of presence. Because you can rent a rack kind of everywhere | 15:27 |
spatel | i am going with server.com to see how it goes | 15:27 |
spatel | rental is easy but we want to deploy vlan provider so need bunch of vlans in fabric | 15:27 |
spatel | want to make sure they offer that too | 15:28 |
spatel | noonedeadpunk you are correct we need separate dump for grant table. so i did this and getting this error - https://paste.opendev.org/show/809343/ | 16:05 |
spatel | using your command to dump grant and then copy paste all grant to new DB but seems like something is missing | 16:06 |
noonedeadpunk | spatel: So first you need to restore your all_databases | 16:06 |
spatel | i did | 16:06 |
spatel | i can see all users in mysql.user table and all data | 16:06 |
noonedeadpunk | huh | 16:06 |
spatel | just grant part is missing in my restore | 16:07 |
noonedeadpunk | yeah, exactly | 16:07 |
noonedeadpunk | so in mysql.user you have users records? | 16:07 |
spatel | yes every one.. | 16:08 |
spatel | i compared with my db where i took backup | 16:08 |
noonedeadpunk | dunno, that should have worked... | 16:08 |
noonedeadpunk | I think you can try to google then how to backup grants | 16:08 |
spatel | https://paste.opendev.org/show/809344/ | 16:08 |
spatel | :) let me figure out | 16:09 |
noonedeadpunk | maybe smth has changed as again - I used that years ago... | 16:10 |
spatel | totally | 16:11 |
spatel | let me figure out and get back to you | 16:11 |
noonedeadpunk | but looks pretty valid to me tbh... | 16:12 |
spatel | https://www.thegeekdiary.com/mysql-how-to-backup-user-privileges-as-create-user-and-or-grant-statements/ | 16:12 |
spatel | check this out | 16:12 |
noonedeadpunk | that exactly what I did for 5.5 haha | 16:13 |
noonedeadpunk | I mean the command I gave | 16:13 |
spatel | i am running 10.5.12 | 16:13 |
noonedeadpunk | no idea what is an alternative in mysql world | 16:14 |
noonedeadpunk | I guess it's smth like 5.7... | 16:14 |
spatel | may be mariadb has something special | 16:14 |
noonedeadpunk | I doubt | 16:14 |
spatel | https://mariadb.com/kb/en/mariabackup-overview/ | 16:15 |
spatel | apt-get install mariadb-backup | 16:15 |
spatel | hmm it has special binary to do that | 16:15 |
noonedeadpunk | mariadb-backup is whole new concept | 16:16 |
spatel | hmm | 16:16 |
noonedeadpunk | that we use in role with script | 16:16 |
noonedeadpunk | it's fork from percona xtrabackup | 16:16 |
noonedeadpunk | which can make incremental backups | 16:16 |
noonedeadpunk | spatel - question: does any service comes to your mind that has same naming for ubuntu/centos/debian both for package name and systemd name? | 16:17 |
spatel | net-tools | 16:17 |
spatel | systemd name... hmm | 16:17 |
spatel | let me think | 16:18 |
noonedeadpunk | so eventually, I'm trying to write some tests for ansible systemd module... | 16:18 |
noonedeadpunk | and need smth robust not to make complicated conditions | 16:18 |
noonedeadpunk | I thought about chrony at first... | 16:19 |
noonedeadpunk | but it's tricky | 16:19 |
spatel | rsyslog | 16:20 |
spatel | you need same package and service name.. hmm still thinking | 16:20 |
spatel | crond | 16:20 |
noonedeadpunk | nah, package and service name might be different | 16:26 |
noonedeadpunk | eventually I just need some service only) | 16:26 |
noonedeadpunk | rsyslog sounds good enough! | 16:26 |
noonedeadpunk | thanks a lot! | 16:28 |
spatel | cool | 16:36 |
jrosser | noonedeadpunk: I think there is a universal package name used in the old tests for openstack-hosts | 16:45 |
jrosser | oh | 16:45 |
jrosser | service not package, ignore | 16:45 |
spatel | noonedeadpunk solution is - FLUSH PRIVILEGES; | 16:48 |
spatel | :D | 16:48 |
spatel | as soon as i did FLUSH PRIVILEGES; it accepted grant command | 16:48 |
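The restore order that worked here can be sketched as below. A likely explanation, stated as an assumption: the full dump writes account rows directly into the `mysql` schema, and the server only picks them up once the privilege tables are reloaded, so the GRANT statements are rejected until `FLUSH PRIVILEGES` has run. The function name and file arguments are illustrative, and passwordless client access (for example `~/.my.cnf`) is assumed.

```shell
# Sketch only: assumes passwordless client access (e.g. ~/.my.cnf) on the
# target server; the function name and file arguments are illustrative.
restore_with_grants() {
    # $1: full dump, $2: grants file
    mysql < "$1" || return 1
    # The dump wrote account rows straight into the mysql schema; reload the
    # in-memory privilege tables so the server knows about those accounts.
    mysql -e 'FLUSH PRIVILEGES;' || return 1
    # Only now replay the saved GRANT statements.
    mysql < "$2"
}

# Usage:
#   restore_with_grants openstack.sql MySQLUserGrants.sql
```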
noonedeadpunk | ah, lol | 16:51 |
spatel | noonedeadpunk i am seeing very odd issue in lab, i have created one mysql server outside openstack to test backup/restore and i did restore everything. now i am telling haproxy to go to my newly created db machine but getting this error https://paste.opendev.org/show/809345/ | 17:21 |
spatel | i can connect directly to remote mysql but not able to do that via haproxy vip | 17:22 |
spatel | do we have any security stuff coming between it ? | 17:22 |
spatel | noonedeadpunk there? | 18:13 |
spatel | this is very odd, i can't use other mysql with haproxy.. i wonder some security issue coming on my way | 18:14 |
spatel | everything looks ok, so why isn't haproxy able to work with the new mysql | 18:16 |
mgariepy | anyone has seen this with rabbitmq ? https://paste.openstack.org/show/809348/ | 18:20 |
mgariepy | that's after a restart.. | 18:21 |
mgariepy | before i had : Channel error on connection | 18:22 |
spatel | mgariepy is your cluster status showing healthy? | 18:23 |
spatel | i have seen that error and i believe i re-build whole rabbitMQ | 18:23 |
spatel | nuke it | 18:24 |
mgariepy | rebuilt, one node at the time ? | 18:26 |
spatel | this is what i did last time https://gist.github.com/satishdotpatel/f4f6cc2026da11fbb27a3527caba448a | 18:28 |
spatel | if its production then just extra careful.. | 18:28 |
spatel | when i say nuke means re-build RabbitMQ / destroy and re-build | 18:29 |
spatel | do you have any notification queue ? | 18:29 |
spatel | try to purge that first | 18:29 |
spatel | https://gist.github.com/satishdotpatel/df751b5281726dca77065f78eab9584a | 18:30 |
spatel | This is other way to destroy rabbitMQ and rebuild from scratch - https://gist.github.com/satishdotpatel/9f11c54e86cb0f3ad59d5feac1827b1f | 18:30 |
mgariepy | no notification queues. | 18:32 |
spatel | good | 18:32 |
mgariepy | yep it's prod. | 18:33 |
spatel | are you able to spin up vm or not? | 18:33 |
spatel | if not then your downtime already started :) | 18:33 |
mgariepy | i know ;p | 18:33 |
spatel | i would say just nuke it | 18:33 |
spatel | i did many time in my cloud, because there is no easy way to fix rabbitMQ | 18:34 |
spatel | try https://gist.github.com/satishdotpatel/f4f6cc2026da11fbb27a3527caba448a if not work then go with https://gist.github.com/satishdotpatel/9f11c54e86cb0f3ad59d5feac1827b1f | 18:35 |
mgariepy | vms are still running. so it's not too bad haha :D | 18:38 |
spatel | yes, they won't get impacted even your mysql is dead :) | 18:39 |
mgariepy | stop all rabbitmq > start all rabbitmq. seems to fix the issue. | 19:21 |
mgariepy | for now. | 19:21 |
mgariepy | from : https://groups.google.com/g/rabbitmq-users/c/q1FEA4Q0z3Q | 19:50 |
mgariepy | thanks for your help spatel. | 19:52 |
spatel | nice! | 19:52 |
mgariepy | i'll need to dig in the logs to see what happened tho. | 19:53 |
spatel | may be network isolation | 20:00 |
spatel | split-brain for few second | 20:00 |
spatel | did you stop start all node at same time or one by one? | 20:00 |
mgariepy | all at the same time | 20:07 |
mgariepy | ansible rabbitmq_all -m service -a "name=rabbitmq-server state=stopped" ; sleep 10; ansible rabbitmq_all -m service -a "name=rabbitmq-server state=started" | 20:08 |
spatel | hmm | 20:11 |
spatel | i thought you have to start first node and then second and third for quorum | 20:11 |
spatel | like Galera | 20:11 |
mgariepy | they had the data i guess. | 20:25 |
*** prometheanfire is now known as Guest1 | 21:20 | |
*** promethe- is now known as prometheanfire | 21:53 | |
snadge | im having trouble deleting a volume, it gets stuck in an error deleting state.. i've tried setting available and detaching and trying again, to no avail | 22:58 |
snadge | the volume appears to be mapped on the san and is online.. nothing in cinder logs | 23:12 |