*** tpot has quit IRC | 00:02 | |
*** cloudnautique has quit IRC | 00:03 | |
SamYaple | kfox1111: you still around? | 00:16 |
---|---|---|
*** cemason has quit IRC | 00:26 | |
*** vinkman has quit IRC | 00:30 | |
kfox1111 | SamYaple: yeah. | 00:33 |
SamYaple | hola | 00:34 |
SamYaple | you had so questions about recovery? | 00:34 |
kfox1111 | hi. :) | 00:34 |
SamYaple | some* | 00:34 |
kfox1111 | yeah. specifically, what the procedure is for galera to recover it from power failure. | 00:34 |
SamYaple | that is a bit tricky because thats dependent on galera | 00:35 |
kfox1111 | the docs for it mention doing some games finding the last written server and bringing that one up first with special args, then adding the rest. | 00:35 |
kfox1111 | but I'm not sure how that will work with the containers. | 00:35 |
SamYaple | so the official way if you are unaware is to find /var/lib/mysql/grastate.dat with the highest revision | 00:35 |
*** dimsum__ has quit IRC | 00:35 | |
SamYaple | but when it crashes that is sometimes -1 | 00:35 |
SamYaple | but basically you have to pick a node to start the cluster again with | 00:36 |
SamYaple | ideally the last node to shut down | 00:36 |
SamYaple | this cant be done automatically, so it will be in the deployers responsibilities to do this | 00:36 |
kfox1111 | I'm guessing with powerfailure, they will be basically the same. | 00:36 |
SamYaple | maybe maybe not, what if one was down ahead of time anyway | 00:37 |
kfox1111 | but how do you start it back up with the containers? do you tweak a config file and docker start it back up, or do you use ansible? | 00:37 |
kfox1111 | ah. true. | 00:37 |
SamYaple | since we use xtrabackup for the configs, i do my backups with xtrabackup | 00:38 |
SamYaple | it talks to mariadb over tcp so containers dont really play a part here | 00:38 |
kfox1111 | I don't follow. | 00:39 |
kfox1111 | you're backing up/restoring the data to a fresh cluster for recovery of power failures? | 00:40 |
SamYaple | oh im sorry | 00:40 |
SamYaple | i thought you asked how i backup the data | 00:40 |
SamYaple | you asked how to "start it back up" | 00:40 |
kfox1111 | right. | 00:40 |
kfox1111 | I see logic in the bootstrap code to do the startup dance, but not a way to do it otherwise. | 00:41 |
SamYaple | no to start the cluster fresh again the easiest thing to do is make the gcomm:// list empty and start a container | 00:41 |
SamYaple | you can do this with the override file for a single node | 00:41 |
SamYaple | its a bit of a dance, sure. but it only involves running the playbooks once if you do it right | 00:42 |
kfox1111 | so you do that on one of the hosts, and then docker start mariadb on that host first? | 00:42 |
SamYaple | kfox1111: yes, but that would require COPY_ALWAYS as your config method | 00:42 |
SamYaple | without that you have to run the playbooks twice | 00:42 |
SamYaple | the first time limited to a single host, the second time without the gcomm override on all hosts | 00:42 |
SamYaple | as much as I want to make this automated, this is really a manual process for reasons discussed before | 00:43 |
kfox1111 | if its not copy always, then how does the data stay safe? | 00:43 |
kfox1111 | is the mariadb data in a separate volume? | 00:44 |
kfox1111 | I'm ok with it being manual. I just need it documented. :) | 00:44 |
SamYaple | there is a "data container" as docker has taken to calling it for the mysql data | 00:44 |
kfox1111 | ah. so you can rebuild the container leaving the data safe. ok. | 00:44 |
SamYaple | kfox1111: ill tell you what. if you file a bug stating lack of documentation in this area and assign it to me ill get this for you in writing by next week | 00:46 |
SamYaple | i dont see the procedure too much different than a normal recovery | 00:46 |
kfox1111 | Awesome. Thanks. :) | 00:46 |
SamYaple | but i get how it is a bit more complex | 00:46 |
kfox1111 | Yeah. but the potential for getting it wrong is "poof" :) | 00:47 |
SamYaple | well thats what backups are for | 00:47 |
SamYaple | worst case you _can_ start a new cluster and restore backup | 00:47 |
SamYaple | its really hard to get it wrong in my opinion.... if you have a proper backup | 00:47 |
kfox1111 | true. but I've never seen just how bad it is to regress your database a day or two when vm's are coming and going in the cloud. no idea what happens. :/ | 00:48 |
kfox1111 | And I really really don't want to find out. :) | 00:48 |
SamYaple | its not so bad. if the database is down you aren't creating new vms anyway | 00:48 |
kfox1111 | its the vm's that are created between the backup and outage that are lost track of? | 00:49 |
SamYaple | and ideally if your database is down you keep access to your environment limited to readonly more or less until you know the maintenance is successful | 00:49 |
SamYaple | i mean if the db is down you arent creating vms | 00:49 |
kfox1111 | tenants created, quotas changed, a lot of potential for skews that I don't think have any recovery mechanisms. | 00:49 |
SamYaple | you are talking about when you bring the db back up and then vms get created and then the db crashes | 00:49 |
SamYaple | if the db is down there will not be any changes in your environment | 00:50 |
kfox1111 | no, I'm saying, if you take a backup at midnight, then it crashes at noon, you have half a day of your users doing things on the cloud that are lost if you have to rebuild the db from backups. | 00:50 |
*** alisonh has quit IRC | 00:50 | |
SamYaple | ah yes | 00:50 |
SamYaple | its not so bad actually. data is intact | 00:50 |
SamYaple | new changes are obviously lost | 00:51 |
SamYaple | vms that have migrated are also a pain | 00:51 |
SamYaple | but with shared backend its pretty quick to fix that stuff | 00:51 |
SamYaple | im not advocating for it mind you :) | 00:51 |
kfox1111 | sure. :) | 00:51 |
SamYaple | but you can always dump the raw /var/lib/mysql/ folder before trying to bring the cluster back up | 00:52 |
kfox1111 | I've had to fix up nova quota miscalculations before. just something I hope to avoid whenever possible. :) | 00:52 |
kfox1111 | ah. thats true too. | 00:52 |
kfox1111 | a good note for the procedure too. :) | 00:52 |
kfox1111 | can you cow a docker volume? | 00:52 |
SamYaple | not that im aware, but if your backend is a cow filesystem (btrfs, zfs) you can cow the subvolume | 00:53 |
larsks | kfox1111: Not directly, but if you were using host volumes you could do that manually via overlayfs... | 00:53 |
SamYaple | larsks: how would that look with overlayfs? | 00:53 |
kfox1111 | bummer. cause that would make it a really easy procedure. snapshot data volume, then proceed to restart cluster. | 00:54 |
larsks | For each container, you could make a new overlayfs mount with the same base directory. | 00:54 |
larsks | And then mount the merged volume (so no container would mount the base) | 00:54 |
SamYaple | eh. yea thats not so easy. seems like a greater risk to data of trying to do that | 00:55 |
kfox1111 | are rabbit clusters totally stateless? power off recovery is just start it all back up? | 00:56 |
SamYaple | kfox1111: ugh rabbitmq | 00:56 |
larsks | Well...I don't know that it's at all risky, but it is certainly manual. | 00:56 |
SamYaple | it should be but rabbitmq is horrible with clustering | 00:56 |
kfox1111 | yeah. probably safer to just tar dump the data volume. | 00:56 |
SamYaple | i actually have a new clustering patch in the queue for rabbitmq which _would_ make it that way | 00:56 |
SamYaple | honestly with rabbitmq i would just commit to losing those messages if it doesnt start back up perfectly | 00:57 |
kfox1111 | Yeah, when completely powered off, I'd kind of expect it. I don't think openstack services really care? | 00:58 |
kfox1111 | should I adjust the bug report to just ask for power off /restart instructions for the cloud rather than just galera? | 00:58 |
SamYaple | youll lose recent messages | 00:58 |
kfox1111 | but the services will just make a new rpc call though? | 00:58 |
SamYaple | so potentially some accepted apis that break (like volume deletions?) | 00:58 |
*** achanda has quit IRC | 00:59 | |
kfox1111 | hmm.. could be. | 00:59 |
SamYaple | well if all of your control nodes failed we are talking full datacenter power failure no? | 00:59 |
kfox1111 | yeah. most likely. | 00:59 |
kfox1111 | we had one just the other day. :/ | 00:59 |
kfox1111 | they happen rarely, but they do happen. :/ | 00:59 |
SamYaple | presumably you put your control nodes on different power lines so _all_ nodes would be down | 00:59 |
SamYaple | so everything is starting fresh | 01:00 |
kfox1111 | right. | 01:01 |
kfox1111 | thats the case I'm worried about. | 01:01 |
SamYaple | yea i think youll be more ok than you realize | 01:02 |
*** achanda has joined #kolla | 01:02 | |
SamYaple | ive done a few full stop recoveries with kolla | 01:02 |
SamYaple | not so bad | 01:02 |
SamYaple | but its not at the 50-100 node scale | 01:02 |
SamYaple | still the mechanics shouldnt change | 01:02 |
kfox1111 | Yeah. I totally believe it can be done. :) just need it documented before my management will consider kolla production worthy enough to use. :/ | 01:03 |
SamYaple | i understand | 01:03 |
SamYaple | just throw that bug my way so i dont forget | 01:03 |
kfox1111 | yup. most of the way done with it. just a sec. | 01:04 |
*** unicell has quit IRC | 01:04 | |
kfox1111 | Submitted: | 01:07 |
kfox1111 | https://bugs.launchpad.net/kolla/+bug/1507065 | 01:07 |
openstack | Launchpad bug 1507065 in kolla "documentation for power recovery" [Undecided,New] | 01:07 |
kfox1111 | having a problem assigning it to you though. | 01:07 |
*** alisonh has joined #kolla | 01:08 | |
kfox1111 | Thanks for looking into it. | 01:08 |
*** daneyon_ has quit IRC | 01:15 | |
*** achanda has quit IRC | 01:15 | |
*** vinkman has joined #kolla | 01:39 | |
*** vinkman has quit IRC | 01:40 | |
*** vinkman has joined #kolla | 01:41 | |
*** vinkman has quit IRC | 01:41 | |
*** vinkman has joined #kolla | 01:42 | |
*** vinkman has quit IRC | 01:44 | |
*** dimsum__ has joined #kolla | 01:47 | |
*** dimsum__ has quit IRC | 01:47 | |
*** dimsum__ has joined #kolla | 01:48 | |
*** tummy has joined #kolla | 02:04 | |
openstackgerrit | Kuo-tung Kao proposed openstack/kolla: add "registry" flag to "tools/build.py" https://review.openstack.org/234629 | 02:08 |
*** achanda has joined #kolla | 02:21 | |
*** dimsum__ has quit IRC | 02:37 | |
*** cemason has joined #kolla | 02:38 | |
*** cemason has quit IRC | 02:43 | |
*** tummy has quit IRC | 02:51 | |
*** bmace has quit IRC | 02:53 | |
*** bmace has joined #kolla | 03:08 | |
*** bmace has quit IRC | 03:15 | |
*** tummy has joined #kolla | 03:17 | |
*** bmace has joined #kolla | 03:30 | |
*** cloudnautique has joined #kolla | 03:32 | |
*** tummy has quit IRC | 03:33 | |
*** dimsum__ has joined #kolla | 03:38 | |
*** dimsum__ has quit IRC | 03:43 | |
*** bmace has quit IRC | 03:49 | |
*** tummy has joined #kolla | 03:53 | |
*** bmace has joined #kolla | 04:04 | |
*** tummy has quit IRC | 04:06 | |
*** bmace has quit IRC | 04:24 | |
*** bmace has joined #kolla | 04:39 | |
*** dimsum__ has joined #kolla | 04:41 | |
*** dimsum__ has quit IRC | 04:46 | |
*** dtturner has quit IRC | 04:50 | |
*** unicell has joined #kolla | 05:32 | |
*** CBR09 has joined #kolla | 05:34 | |
*** exploreshaifali has joined #kolla | 05:51 | |
*** asalkeld has quit IRC | 06:01 | |
*** exploreshaifali has quit IRC | 06:36 | |
*** exploreshaifali has joined #kolla | 06:41 | |
*** achanda has quit IRC | 06:41 | |
*** dimsum__ has joined #kolla | 06:44 | |
*** achanda has joined #kolla | 06:45 | |
*** dimsum__ has quit IRC | 06:49 | |
*** achanda has quit IRC | 07:07 | |
*** shardy_a1k has joined #kolla | 07:09 | |
*** shardy_afk has quit IRC | 07:10 | |
*** cloudnautique has quit IRC | 07:11 | |
*** shardy_a1k has quit IRC | 07:14 | |
*** shardy_afk has joined #kolla | 07:15 | |
*** exploreshaifali has quit IRC | 07:16 | |
*** exploreshaifali has joined #kolla | 07:29 | |
*** jmccarthy has quit IRC | 07:30 | |
*** jmccarthy has joined #kolla | 07:30 | |
*** cemason has joined #kolla | 08:06 | |
*** cemason has quit IRC | 08:10 | |
openstackgerrit | Michal Rostecki proposed openstack/kolla: [WIP] Use trusts in heat.conf https://review.openstack.org/236198 | 08:27 |
openstackgerrit | Michal Rostecki proposed openstack/kolla: [WIP] Use trusts in heat.conf https://review.openstack.org/236198 | 08:29 |
*** pbourke has quit IRC | 08:39 | |
*** pbourke has joined #kolla | 08:40 | |
nihilifer | o/ | 08:42 |
nihilifer | is ansible-magnum blueprint free to assign? | 08:43 |
*** dimsum__ has joined #kolla | 08:46 | |
*** dimsum__ has quit IRC | 08:52 | |
*** exploreshaifali has quit IRC | 09:30 | |
*** dimsum__ has joined #kolla | 09:49 | |
*** dimsum__ has quit IRC | 09:53 | |
*** cloudnautique has joined #kolla | 10:13 | |
*** cloudnautique has quit IRC | 10:17 | |
*** cemason has joined #kolla | 10:33 | |
*** dwalsh has joined #kolla | 10:34 | |
*** cemason has quit IRC | 10:37 | |
SamYaple | nihilifer: go for it | 10:52 |
openstackgerrit | Merged openstack/kolla: add "registry" flag to "tools/build.py" https://review.openstack.org/234629 | 10:55 |
*** achanda has joined #kolla | 11:07 | |
*** achanda has quit IRC | 11:07 | |
*** CBR09 has quit IRC | 11:23 | |
openstackgerrit | Sam Yaple proposed openstack/kolla: Remove vip for rabbitmq https://review.openstack.org/235777 | 11:39 |
*** exploreshaifali has joined #kolla | 11:49 | |
*** dimsum__ has joined #kolla | 11:50 | |
*** dimsum__ has quit IRC | 11:56 | |
*** dwalsh has quit IRC | 12:18 | |
*** diogogmt has quit IRC | 12:55 | |
*** diogogmt has joined #kolla | 12:58 | |
*** exploreshaifali has quit IRC | 13:01 | |
*** dimsum__ has joined #kolla | 13:09 | |
*** jainman has joined #kolla | 13:19 | |
jainman | bug -1502633 -Need help for gerrit review | 13:20 |
*** jainman has quit IRC | 13:35 | |
*** jainman has joined #kolla | 13:47 | |
jainman | Hi, Having issue in committing gerrit-review - Bug https://bugs.launchpad.net/kolla/+bug/1502633 | 13:48 |
openstack | Launchpad bug 1502633 in kolla "connecting to rabbitmq returns ECONNREFUSED" [Critical,Invalid] | 13:48 |
*** jainman has quit IRC | 13:54 | |
*** cloudnautique has joined #kolla | 14:35 | |
*** sdake has joined #kolla | 14:54 | |
*** cloudnautique has quit IRC | 14:55 | |
*** sdake has quit IRC | 15:20 | |
openstackgerrit | Michal Rostecki proposed openstack/kolla: [WIP] Add Ansible support for Magnum https://review.openstack.org/236223 | 16:16 |
*** daneyon has joined #kolla | 16:24 | |
*** daneyon_ has joined #kolla | 16:25 | |
*** daneyon has quit IRC | 16:29 | |
*** achanda has joined #kolla | 17:06 | |
-openstackstatus- NOTICE: Gerrit will be offline for project renames starting at 1800 UTC. | 17:06 | |
*** ChanServ changes topic to "Gerrit will be offline for project renames starting at 1800 UTC." | 17:06 | |
*** achanda has quit IRC | 17:26 | |
*** cemason has joined #kolla | 17:57 | |
-openstackstatus- NOTICE: Gerrit is offline for project renames. | 17:59 | |
*** ChanServ changes topic to "Gerrit is offline for project renames." | 17:59 | |
*** cloudnautique has joined #kolla | 18:02 | |
*** cloudnautique has quit IRC | 18:04 | |
*** cloudnautique has joined #kolla | 18:06 | |
*** cemason has quit IRC | 18:12 | |
*** ChanServ changes topic to "Kolla IRC meetings on Wednesday - Agenda @ https://wiki.openstack.org/wiki/Meetings/Kolla - IRC channel is *LOGGED* @ http://eavesdrop.openstack.org/irclogs/%23kolla/" | 18:35 | |
-openstackstatus- NOTICE: Gerrit is back online. Github transfers are in progress and should be complete by 1900 UTC. | 18:35 | |
*** achanda has joined #kolla | 18:37 | |
*** openstackgerrit has quit IRC | 18:46 | |
*** openstackgerrit has joined #kolla | 18:46 | |
*** cemason has joined #kolla | 19:13 | |
*** dimsum__ has quit IRC | 19:14 | |
*** cemason has quit IRC | 19:18 | |
*** openstackgerrit has quit IRC | 19:31 | |
*** openstackgerrit has joined #kolla | 19:31 | |
*** cemason has joined #kolla | 19:37 | |
*** cloudnautique has quit IRC | 19:46 | |
*** cloudnautique has joined #kolla | 19:47 | |
*** achanda has quit IRC | 19:49 | |
*** cloudnautique has quit IRC | 19:52 | |
*** dimsum__ has joined #kolla | 20:15 | |
*** dimsum__ has quit IRC | 20:20 | |
*** vinkman has joined #kolla | 20:42 | |
*** jtriley has joined #kolla | 20:55 | |
*** achanda has joined #kolla | 21:10 | |
*** cemason has quit IRC | 21:12 | |
*** jtriley has quit IRC | 21:24 | |
*** cemason has joined #kolla | 21:43 | |
*** dimsum__ has joined #kolla | 22:18 | |
*** dimsum__ has quit IRC | 22:23 | |
*** dimsum__ has joined #kolla | 22:30 | |
*** achanda has quit IRC | 22:32 | |
*** cloudnautique has joined #kolla | 22:50 | |
*** cloudnautique has quit IRC | 22:57 |
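
Note on the 00:35 to 00:36 exchange: the selection step SamYaple describes (read /var/lib/mysql/grastate.dat on each node, prefer the highest revision, and fall back to a manual choice when a crash left it at -1) might be sketched as below. This is only an illustration and not part of Kolla. The function names, the hostnames, and the sample file contents are hypothetical, and it assumes each grastate.dat carries a `seqno:` line and has already been copied off its node as text.

```python
import re


def parse_seqno(grastate_text):
    """Return the integer seqno found in a grastate.dat body, or None if missing."""
    match = re.search(r"^seqno:\s*(-?\d+)\s*$", grastate_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def pick_bootstrap_node(states):
    """Pick the host with the highest usable seqno.

    `states` maps a hostname to the text of that host's grastate.dat.
    Hosts whose seqno is missing or negative (the -1 left by a crash)
    are skipped. Returns (host, seqno), or None when no host has a
    usable value and the operator has to choose by hand.
    """
    usable = {}
    for host, text in states.items():
        seqno = parse_seqno(text)
        if seqno is not None and seqno >= 0:
            usable[host] = seqno
    if not usable:
        return None
    best = max(usable, key=usable.get)
    return best, usable[best]


# Hypothetical sample data for three control nodes.
SAMPLE = {
    "control01": "# GALERA saved state\nseqno: 1204\n",
    "control02": "# GALERA saved state\nseqno: 1310\n",
    "control03": "# GALERA saved state\nseqno: -1\n",
}
```

A result of None corresponds to the case raised in the log where every node reports -1; the node to start the cluster with (via the empty gcomm:// override) then has to be picked manually, ideally the last node to shut down.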