When the resynchronization is complete, the array state changes to clean and both devices are
reported as active sync, as shown in the following example:
# mdadm --detail /dev/md0
/dev/md0:
.
.
.
State : clean
.
.
.
Number Major Minor RaidDevice State
0 105 96 0 active sync /dev/cciss/c1d6
1 105 32 1 active sync /dev/cciss/c1d2
While the resynchronization is running, you can check its progress by examining the event log, as follows:
sfs> show log facility=storage && age < "5m"
.
.
.
2004/11/02 10:28:56 storage n south2: mds8: /proc/mdstat:
md0 : active raid1 cciss/c1d2[2] cciss/c1d6[0]
10485504 blocks [2/1] [U_]
[=>...................] recovery = 6.8% (721344/10485504)
finish=2.9min speed=55488K/sec
----
.
.
.
When the resynchronization is complete, the /proc/mdstat output recorded in the event log shows
both devices as up ([2/2] [UU]) and no longer contains a recovery line, as shown in the following example:
sfs> show log facility=storage && age < "5m"
.
.
.
2004/11/02 10:56:41 storage n south2: mds8: /proc/mdstat:
md0 : active raid1 cciss/c1d2[1] cciss/c1d6[0]
10485504 blocks [2/2] [UU]
----
.
.
.
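The /proc/mdstat excerpts in the event log can also be checked programmatically. The following Python sketch is one possible way to do this; it is not part of the SFS software. It assumes the excerpt text has already been captured from the log, and the function names parse_mdstat and is_synced are hypothetical. It reads the [total/up] device counters and the optional recovery percentage for each md device.

```python
import re

# Sample excerpts copied from the event log examples in this section.
RESYNCING = """\
md0 : active raid1 cciss/c1d2[2] cciss/c1d6[0]
10485504 blocks [2/1] [U_]
[=>...................] recovery = 6.8% (721344/10485504)
finish=2.9min speed=55488K/sec
"""

COMPLETE = """\
md0 : active raid1 cciss/c1d2[1] cciss/c1d6[0]
10485504 blocks [2/2] [UU]
"""


def parse_mdstat(text):
    """Return {md_name: summary} for each array in /proc/mdstat-style text."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Array header, for example "md0 : active raid1 ..."
        header = re.match(r"(md\d+)\s*:\s*(\w+)\s+(\w+)", line)
        if header:
            current = {
                "state": header.group(2),
                "level": header.group(3),
                "members_total": None,
                "members_up": None,
                "recovery_pct": None,
            }
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue
        # Device counters, for example "[2/1] [U_]" (total/up).
        counts = re.search(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]", line)
        if counts:
            current["members_total"] = int(counts.group(1))
            current["members_up"] = int(counts.group(2))
        # Progress line, present only while recovery is running.
        recovery = re.search(r"recovery\s*=\s*([\d.]+)%", line)
        if recovery:
            current["recovery_pct"] = float(recovery.group(1))
    return arrays


def is_synced(summary):
    """True when all members are up and no recovery is in progress."""
    return (
        summary["members_total"] is not None
        and summary["members_up"] == summary["members_total"]
        and summary["recovery_pct"] is None
    )


if __name__ == "__main__":
    for label, text in (("resyncing", RESYNCING), ("complete", COMPLETE)):
        for name, summary in parse_mdstat(text).items():
            print(label, name, summary, "synced:", is_synced(summary))
```

The sketch treats an array as resynchronized only when the device counters match and no recovery line is present, which mirrors the two log examples above.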
9.34 Handling Disk Errors on SFS20 storage
The sfsmgr show array array_number command displays one of the following states for each
disk bay on an SFS20 array:
• ok
• removed/failed
• predict fail
• logging errors
See Section 4.5 for more information on these states.
The system log records disk issues, as shown in the following example:
sfs> show log data contains "disk bay" && facility=storage && severity>notice
2006/01/05 13:40:44 storage !! south_test5: P92CB0AMQRA684: array 4: disk bay
1: disk Y69BMY3E has been removed or failed (was online)
2006/01/06 09:32:04 storage !! south_test5: P92CB0AMQRA684: array 4: disk bay
1: disk Y69BMY3E is logging errors (was removed or failed)
2006/01/10 10:43:25 storage !! south_test2: P92CB0AMQR2618: array 1: disk bay
12: disk Y69CHCDE has been removed or failed (was online)
2006/01/26 07:11:35 storage !! south_test5: P92CB0AMQRA683: array 3: disk bay
7: disk Y69BLLYE is logging errors (was online)
sfs>
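Log entries such as those above can be turned into structured records for reporting. The following Python sketch is only an illustration based on the four sample entries shown; the field names (server, identifier, array, bay, disk, event, previous) and the function names are hypothetical, and the meaning of the identifier field that follows the server name is not stated in this section. The sketch assumes that an entry wrapped across lines continues on any line that does not begin with a timestamp.

```python
import re

# Sample entries copied from the log example in this section (wrapped as shown).
SAMPLE_LOG = """\
2006/01/05 13:40:44 storage !! south_test5: P92CB0AMQRA684: array 4: disk bay
1: disk Y69BMY3E has been removed or failed (was online)
2006/01/26 07:11:35 storage !! south_test5: P92CB0AMQRA683: array 3: disk bay
7: disk Y69BLLYE is logging errors (was online)
sfs>
"""

# A new entry starts with a "YYYY/MM/DD HH:MM:SS " timestamp.
ENTRY_START = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")

# Layout inferred from the sample entries only.
DISK_EVENT = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) storage \S+ (?P<server>[^:]+): "
    r"(?P<identifier>[^:]+): array (?P<array>\d+): disk bay (?P<bay>\d+): "
    r"disk (?P<disk>\S+) (?P<event>.+?) \(was (?P<previous>[^)]+)\)$"
)


def join_wrapped(text):
    """Rejoin entries that were wrapped across several lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and CLI prompt lines.
        if not line or line.startswith("sfs>"):
            continue
        if ENTRY_START.match(line):
            entries.append(line)
        elif entries:
            # Continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def parse_disk_events(text):
    """Return one dict per disk bay entry that matches the sample layout."""
    events = []
    for entry in join_wrapped(text):
        match = DISK_EVENT.match(entry)
        if match:
            events.append(match.groupdict())
    return events


if __name__ == "__main__":
    for event in parse_disk_events(SAMPLE_LOG):
        print(event["date"], event["server"], "array", event["array"],
              "bay", event["bay"], event["disk"], "->", event["event"])
```

Entries that do not match the assumed layout are silently skipped, so a real deployment would likely want to count or log them instead.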
In addition, if email alerts are configured on the system, disk errors trigger the default disk_errors alert
to send email to the configured recipients. The filter for the default disk_errors alert is as follows:
facility=storage && severity>notice && data contains "disk bay"