Sunday, 2024-04-21

mcape	Hello all, from the Sunday emergency department!	06:01
mcape	We have a two-region Swift storage cluster, with three replicas on R1Z1, R2Z1, and R2Z2.	06:02
mcape	For some still-unknown reason, three disks on R2Z2 were lost yesterday.	06:02
mcape	Our storage was 97% full, and now the object nodes on R1Z1 are running out of disk space; they are already at 98-99% and growing.	06:03
mcape	I do not understand why only the R1Z1 object servers are taking in handoff partitions from the lost disks. Why not R2Z1, or the remaining disks on R2Z2?	06:04
mcape	Or what is going on?	06:05
mcape	Perhaps someone could shed light on the processes that take place during such outages?	06:05
mcape	Another interesting observation is that earlier, when we did a datacenter evacuation from city A to city B,	06:05
mcape	and unplugged an entire zone at once for two days, this kind of swelling did not happen at all.	06:05
mcape	So currently we are thinking of doing the same thing as a workaround: shutting down all of R2Z2 until the disks are fixed. That sounds rather unintuitive and perhaps plainly wrong.	06:05
