[nSLUG] hard disks can't be trusted (as if you didn't already know that)

Jeff Warnica jeff at coherentnetworksolutions.com
Sun Mar 4 17:39:21 AST 2007


One thing of note from the Google study (or the /. discussion that
followed it, hmm...) was that there are "clusters" of failures. For
example, if a particular drive in a machine/array fails, its friends -
which are presumably of the same model and manufacturing batch, have
lived in the same environment, seen the same access patterns, etc. -
are much more likely to fail than the general-case statistics would
indicate.

Putting this as a rule: if a drive in an array fails, swap it out right
then with your cold standby on the shelf, but also go about ordering an
entirely new set of drives.

On Sun, 2007-03-04 at 17:27 -0400, George N. White III wrote:
> <http://arstechnica.com/news.ars/post/20070225-8917.html>
> discusses a couple recent studies from Proceedings of the 5th USENIX
> Conference on File and Storage Technologies (FAST'07), February 2007,
> of drive failure rates in actual use.   Google had 8% of drives
> failing in 2-3 years, with rates remaining around that level in
> subsequent years.  Google's study confirms my experience: when a drive
> starts remapping sectors, it should be replaced.  The two big studies
> differ mostly in whether the mortality curve is monotonically
> increasing or "bathtub" shaped -- the latter is commonly observed in
> mortality studies, so it is curious that Google didn't find the
> effect.
> 
> One surprise not mentioned in the Ars Technica article is that there
> isn't much advantage to using server-grade disks.  Google's study is
> for consumer-grade disks, but their overall failure rates aren't out
> of line with other studies.
> 
> The documented high failure rates ought to convince large
> organizations to take a more proactive approach to disk replacement.
> Nobody should be keeping important data on a 3 year old disk.
> 

