reid_g | Tried out the handoffs_only setting... very nice | 12:17 |
---|---|---|
reid_g | In handoffs_only mode, if a disk fails, will handoffs for missing fragments be created? | 16:00 |
timburke__ | reid_g, new writes may land on handoffs. but the data that was on the failed disk *will not* get rebuilt elsewhere (whether it was a primary or handoff location) | 16:02 |
timburke__ | so it's the sort of thing you turn on during the massive rebalances that happen during an expansion, then turn off again quick as you can so you can ensure full durability | 16:03 |
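For reference, `handoffs_only` is a reconstructor option. A minimal sketch of enabling it, assuming a stock single-file object-server.conf (restart the reconstructor after the change):

```
# /etc/swift/object-server.conf
[object-reconstructor]
# Process only handoff partitions and skip syncing primaries.
# Enable during a large rebalance/expansion, then disable again as soon
# as possible -- full durability is not maintained while this is set.
handoffs_only = True
```

The reconstructor should also warn in its logs while this is enabled, as a reminder to turn it back off.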
reid_g | Alright. So in normal operation, a failed disk will cause handoffs to be created for EC, for both new writes and already-stored data? | 16:07 |
timburke__ | yup -- though the rebuilding-to-a-handoff is a fairly recent addition. for a while, the assumption was that the failed disk would get removed from the ring fairly quickly, and we'd only rebuild to the new primary. i should double check when we added that... | 16:10 |
reid_g | Is that recent for EC only or REP also? | 16:11 |
reid_g | Yeah would be nice to know | 16:13 |
timburke__ | got the rebuild-to-handoffs behavior back in stein: https://github.com/openstack/swift/blob/master/CHANGELOG#L863-L871 | 16:37 |
timburke__ | had it for replicated policies for a good long while | 16:37 |
reid_g | So when a swift drive fails: replication policy = create handoff copies for the missing data? EC policy (Stein+) = recreate the missing fragment on a handoff? | 16:39 |
reid_g | When the drive is replaced: replication / EC (Stein+) policy = move the data back to the primary. EC pre-Stein = recreate the missing fragment on the new primary. | 16:40 |
timburke__ | yup -- you'll want to unmount the failed drive as soon as you notice the failure. that'll cause the object server to start responding 507 to REPLICATE requests and the replicator/reconstructor will use handoffs to ensure full durability | 16:41 |
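A rough sketch of that failure-handling step, assuming devices mounted under /srv/node and `mount_check = true` (the default) in object-server.conf; the device name is just an example:

```
# On the node with the failed drive:
umount /srv/node/d123

# With mount_check enabled, the object server now answers 507 for that
# device, so REPLICATE requests fail and the replicator/reconstructor on
# the other nodes rebuild the missing copies/fragments to handoffs.
```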
timburke__ | hands in the DC go swap out drives. SRE gets a new FS on the replacement, mounts it in the old location, then swift works to fill it back up | 16:43 |
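The replacement side of that workflow might look like the following sketch, assuming XFS (the commonly recommended filesystem for Swift) and example device/label names:

```
# After the physical swap:
mkfs.xfs -L d123 /dev/sdX
mount /dev/sdX /srv/node/d123
chown -R swift:swift /srv/node/d123
# The replicator/reconstructor then refills the device from the handoffs
# and the other primaries.
```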
timburke__ | note that sometimes drives fail in subtle ways -- they start to get just a *little* FS corruption, maybe you see ENODATA tracebacks in logs. it's not enough to where SMART indicates there's anything too fishy, and you can still do a healthcheck with a full PUT/POST/GET/DELETE of some well-known object through the object-server | 16:48 |
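One hedged sketch of that kind of direct object-server healthcheck, talking straight to the backend port; the port, device name, partition, and object path below are all made-up examples (and with no X-Backend-Storage-Policy-Index header the request goes to policy 0):

```
TS=$(date +%s.%N)
# PUT a small probe object straight onto the suspect device
curl -X PUT "http://127.0.0.1:6200/d123/123/AUTH_probe/health/obj" \
     -H "X-Timestamp: $TS" -H "Content-Type: text/plain" -d "hello"
# Read it back
curl "http://127.0.0.1:6200/d123/123/AUTH_probe/health/obj"
# Clean up
curl -X DELETE "http://127.0.0.1:6200/d123/123/AUTH_probe/health/obj" \
     -H "X-Timestamp: $(date +%s.%N)"
```

swift-get-nodes can print equivalent curl commands for a real object if you would rather probe a path the ring actually maps to that device.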
timburke__ | i'm ambivalent about how to handle those cases -- on the one hand, it's tempting to keep the disk in the ring but drop its weight to zero, drain off whatever data you can still read. on the other, the drive seems to be having trouble; how much should we really trust *anything* that's still on there? | 16:50 |
timburke__ | if the cluster's generally fairly healthy, i'd lean toward just unmounting the disk and letting the other replicas/frags get it back to full durability. if it's *not* doing great, i start to worry about whether that failing drive is my last backstop against data loss | 16:52 |
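If you do go the keep-it-in-the-ring-at-zero-weight route, the drain is just a ring-builder change pushed out to the nodes; the builder path and device id here are examples:

```
swift-ring-builder object.builder set_weight d123 0
swift-ring-builder object.builder rebalance
# then distribute the updated object.ring.gz to all nodes; new data stops
# landing on the device and existing data migrates off as partitions are
# reassigned (a full drain may take more than one rebalance because of
# min_part_hours)
```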
reid_g | Yeah they are normally healthy | 17:14 |
reid_g | Our normal process now is to get the list of disks that failed in the morning and usually have them replaced by EOD. | 17:24 |