[nSLUG] Does this mean my hard drive is failing?

D G Teed donald.teed at gmail.com
Thu Jul 31 13:47:29 ADT 2014


On Mon, Jul 28, 2014 at 10:19 PM, Gerald Ruderman
<linux at zdoit.airpost.net> wrote:
> Hi,
>
> Got these errors on a CentOS 6 box running software raid. Has been
> running for 12 months without faults. What do they mean? Or specifically
> should I replace the drive? Thanks
>
> Jul 27 01:01:45 kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0
> action 0x0
> Jul 27 01:01:45 kernel: ata3.01: BMDMA stat 0x64
> Jul 27 01:01:45 kernel: ata3.01: failed command: READ DMA EXT
> Jul 27 01:01:45 kernel: ata3.01: cmd 25/00:90:00:fa:8e/00:02:00:00:00/f0
> tag 0 dma 335872 in
> Jul 27 01:01:45 kernel:         res 51/40:00:a0:fb:8e/40:00:00:00:00/f0
> Emask 0x9 (media error)
> Jul 27 01:01:45 kernel: ata3.01: status: { DRDY ERR }
> Jul 27 01:01:45 kernel: ata3.01: error: { UNC }
> Jul 27 01:01:45 kernel: ata3.00: configured for UDMA/133
> Jul 27 01:01:45 kernel: ata3.01: configured for UDMA/133
> Jul 27 01:01:45 kernel: ata3: EH complete
> {5 times over 9 seconds}
>
> Jul 27 01:01:48 kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0
> action 0x0
> Jul 27 01:01:48 kernel: ata3.01: BMDMA stat 0x64
> Jul 27 01:01:48 kernel: ata3.01: failed command: READ DMA EXT
> Jul 27 01:01:48 kernel: ata3.01: cmd 25/00:90:00:fa:8e/00:02:00:00:00/f0
> tag 0 dma 335872 in
> Jul 27 01:01:48 kernel:         res 51/40:00:a0:fb:8e/40:00:00:00:00/f0
> Emask 0x9 (media error)
> Jul 27 01:01:48 kernel: ata3.01: status: { DRDY ERR }
> Jul 27 01:01:48 kernel: ata3.01: error: { UNC }
> Jul 27 01:01:48 kernel: ata3.00: configured for UDMA/133
> Jul 27 01:01:48 kernel: ata3.01: configured for UDMA/133
> Jul 27 01:01:48 kernel: sd 2:0:1:0: [sdb] Unhandled sense code
> Jul 27 01:01:48 kernel: sd 2:0:1:0: [sdb] Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> Jul 27 01:01:48 kernel: sd 2:0:1:0: [sdb] Sense Key : Medium Error
> [current] [descriptor]
> Jul 27 01:01:48 kernel: Descriptor sense data with sense descriptors (in
> hex):
> Jul 27 01:01:48 kernel:        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00
> 00 00
> Jul 27 01:01:48 kernel:        00 8e fb a0
> Jul 27 01:01:48 kernel: sd 2:0:1:0: [sdb] Add. Sense: Unrecovered read
> error - auto reallocate failed
> Jul 27 01:01:48 kernel: sd 2:0:1:0: [sdb] CDB: Read(10): 28 00 00 8e fa
> 00 00 02 90 00
> Jul 27 01:01:48 kernel: ata3: EH complete
>
> Jul 27 01:01:50 kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0
> action 0x0
> Jul 27 01:01:50 kernel: ata3.01: BMDMA stat 0x64
> Jul 27 01:01:50 kernel: ata3.01: failed command: READ DMA
> Jul 27 01:01:50 kernel: ata3.01: cmd c8/00:08:a0:fb:8e/00:00:00:00:00/f0
> tag 0 dma 4096 in
> Jul 27 01:01:50 kernel:         res 51/40:00:a0:fb:8e/40:00:00:00:00/f0
> Emask 0x9 (media error)
> Jul 27 01:01:50 kernel: ata3.01: status: { DRDY ERR }
> Jul 27 01:01:50 kernel: ata3.01: error: { UNC }
> Jul 27 01:01:50 kernel: ata3.00: configured for UDMA/133
> Jul 27 01:01:50 kernel: ata3.01: configured for UDMA/133
> Jul 27 01:01:50 kernel: ata3: EH complete
> {3 times over 5 seconds}
>
> --

The log shows an uncorrectable read error (UNC, "media error") at the
same sector on sdb every time, and the drive's auto reallocate failed.
This type of error never clears up by itself.  It usually snowballs until
the system is non-responsive.  You said you have software RAID, so
now is the time to use it.  I would halt the system, remove the
sdb drive, and restart.  If this is actually a server with hot-swappable
bays, then pull the drive and replace it.  I would pull it even if there
is no spare ready, because this type of error drags down system
performance while the hardware is fighting with errors and possibly
interrupts.
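
If you want to check that every error points at the same sector before
pulling the drive, here is a rough Python sketch that decodes the LBA from
a libata "res" line.  The field order (status/error:nsect:lbal:lbam:lbah/
hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) is my assumption
about how the kernel prints the result taskfile, and the function name is
made up for illustration.  It ignores the device nibble, so treat it as a
sanity check rather than a proper parser.

```python
import re

# Assumed layout of the "res" field in a libata error report:
# status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
RES_RE = re.compile(r"res ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def failing_lba(line):
    """Return the LBA encoded in a libata 'res' line, or None if absent."""
    m = RES_RE.search(line)
    if m is None:
        return None
    # Split the twelve hex bytes on either separator.
    f = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    lbal, lbam, lbah = f[3], f[4], f[5]
    hob_lbal, hob_lbam, hob_lbah = f[8], f[9], f[10]
    # High-order ("hob") bytes form the upper 24 bits of a 48-bit LBA.
    return (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
            | lbah << 16 | lbam << 8 | lbal)


# One of the "res" lines from the quoted log.
sample = "Jul 27 01:01:45 kernel:         res 51/40:00:a0:fb:8e/40:00:00:00:00/f0"
lba = failing_lba(sample)
print(hex(lba), lba)  # prints: 0x8efba0 9370528
```

That value matches the "00 8e fb a0" bytes in the sense data from the
quoted log, and every "res" line there carries the same address.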

