mcape | Hello all, from the Sunday emergency department! | 06:01 |
---|---|---|
mcape | We have a two-region Swift cluster, with three replicas on R1Z1, R2Z1, and R2Z2. | 06:02 |
mcape | For some still-unknown reason, three disks on R2Z2 were lost yesterday. | 06:02 |
mcape | Our storage was 97% full, and now the object nodes on R1Z1 are running out of disk space; they are already at 98-99% and growing. | 06:03 |
mcape | I do not understand why only the R1Z1 object servers are taking in handoff partitions from the lost disks. Why not R2Z1, or the remaining disks on R2Z2? | 06:04 |
mcape | Or what is going on? | 06:05 |
mcape | Perhaps someone could shed light on the processes that happen during such outages? | 06:05 |
mcape | Another interesting observation is that before, when we did a datacenter evacuation from city A to city B, | 06:05 |
mcape | and unplugged an entire zone at once for two days, this kind of swelling did not happen at all. | 06:05 |
mcape | So currently we are thinking of doing the same thing as a workaround: shut down the entire R2Z2 until the disks are fixed. Which sounds kinda unintuitive and perhaps plainly wrong. | 06:05 |