[nSLUG] Tip: check your BBU on raid controllers

D G Teed donald.teed at gmail.com
Fri Jan 25 14:19:58 AST 2013


Thought I'd share a server tip.  This is specific to Perc controllers
and use of Open Manage, but there could be other brands which
also disable write-back (for controller cache) in a similar way,
as the BBU degrades.

We have a few Dell 2950s that are about four years old, with Perc 5/i
controllers.  It turns out Dell's Open Manage can report the battery status
as "OK" even after the charge has dropped below the threshold needed to
support a Write Policy of "Write Back".  If your server has significant IO
load, losing write-back will kill IO performance.  I learned of one system
where the battery charge had dropped below 30%, which triggered the
controller to switch to "Write Through" mode.

Useless battery check:

# omreport storage battery controller=0
Battery 0 on Controller PERC 5/i Integrated (Embedded)

Controller PERC 5/i Integrated (Slot Embedded)
ID                  : 0
Status              : Ok
Name                : Battery 0
State               : Ready
Recharge Count      : Not Applicable
Max Recharge Count  : Not Applicable
Learn State         : Idle
Next Learn Time     : 59 days 21 hours
Maximum Learn Delay : 7 days 0 hours

State of write policy:

# omreport storage vdisk
List of Virtual Disks in the System

Controller PERC 5/i Integrated (Embedded)
ID                        : 0
Status                    : Ok
Name                      : Virtual Disk 0
State                     : Ready
Hot Spare Policy violated : Not Assigned
Encrypted                 : Not Applicable
Layout                    : RAID-5
Size                      : 836.63 GB (898319253504 bytes)
Device Name               : /dev/sda
Bus Protocol              : SAS
Media                     : HDD
Read Policy               : No Read Ahead
Write Policy              : Write Through
Cache Policy              : Not Applicable
Stripe Element Size       : 64 KB
Disk Cache Policy         : Disabled
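
Since the battery status alone can't be trusted, the write policy itself is
the thing to watch.  Here is a minimal sketch of a check that could run from
cron.  It assumes the "Key : Value" layout shown in the output above; the
function name check_write_policy is made up for illustration and is not part
of Open Manage.

```shell
#!/bin/sh
# Read `omreport storage vdisk` output on stdin and warn about every
# virtual disk whose Write Policy is "Write Through".
# Assumption: fields look like "Name   : Virtual Disk 0", as in the
# sample output above.
check_write_policy() {
    awk -F' *: *' '
        { sub(/[ \t\r]+$/, "") }              # drop trailing whitespace
        $1 == "Name" { name = $2 }            # remember the current vdisk
        $1 == "Write Policy" && $2 == "Write Through" {
            print "WARNING: " name " is in Write Through mode"
            bad = 1
        }
        END { exit bad + 0 }                  # non-zero if anything was flagged
    '
}

# Example use (needs Open Manage installed):
#   omreport storage vdisk | check_write_policy || echo "check the BBU"
```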

Hints in the messages log:

# grep 'Server Administrator' /var/log/messages
{ ... }
Jan 18 07:25:35 myserv Server Administrator: Storage Service EventID: 2335 Controller event log: BBU disabled; changing WB virtual disks to WT: Controller 0 (PERC 5/i Integrated)

BBU is the battery backup unit for the controller's cache.  WB means write
back.  WT means write through.
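
To see how often this has happened, the log lines can be counted.  This is a
rough sketch only: it matches on the "BBU disabled" wording from the sample
line above, which may differ on other controllers or Open Manage versions,
and the function name bbu_event_summary is hypothetical.

```shell
#!/bin/sh
# Read syslog lines on stdin and summarise the "BBU disabled" events
# logged by Server Administrator.
# Assumption: each event is a single line starting with a
# "Mon DD HH:MM:SS" timestamp, as in /var/log/messages above.
bbu_event_summary() {
    awk '
        /Server Administrator:.*BBU disabled/ {
            n++
            last = $1 " " $2 " " $3           # timestamp of this event
        }
        END {
            if (n) print n " event(s), most recent: " last
            else   print "no BBU events found"
        }
    '
}

# Example use:
#   bbu_event_summary < /var/log/messages
```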

These status messages can also appear at boot, as the write policy flips
back and forth, or during the BBU's "auto-learn" cycle, which runs every
90 days.  You can confirm the battery level by exporting the controller log:

# omconfig storage controller controller=0 action=exportlog

# grep Absolute /var/log/lsi_0125.log
T42:     Absolute State of Charge  : 25 %
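
The same grep can be turned into a pass/fail test.  The sketch below pulls the
percentage out of the exported log and compares it to a threshold.  The
default of 30 is only the figure seen on the one system mentioned earlier,
not a documented Perc value, and the function names are made up for this
example.

```shell
#!/bin/sh
# Print the last "Absolute State of Charge" percentage found on stdin.
# Assumption: the log line looks like the sample above,
# e.g. "T42:     Absolute State of Charge  : 25 %".
bbu_charge() {
    sed -n 's/.*Absolute State of Charge *: *\([0-9][0-9]*\) *%.*/\1/p' |
        tail -n 1
}

# Succeed only if a charge value was found on stdin and it is at or
# above the threshold given as $1 (default 30, an assumed value).
bbu_charge_ok() {
    threshold=${1:-30}
    charge=$(bbu_charge)
    [ -n "$charge" ] && [ "$charge" -ge "$threshold" ]
}

# Example use, with the exported log named above:
#   bbu_charge_ok 30 < /var/log/lsi_0125.log || echo "BBU charge is low"
```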

The solution is to replace the BBU.  Until then, the server's IO is poor,
which shows up as high load and high %util in 'iostat -x 4 4' compared to
the same server's historical performance.