Replacing an LSI RAID disk with MegaCli
If you have identified a failed or failing disk, you can replace it using the MegaCli utility. In the example below we will replace a failed disk in a RAID 5 array of three disks.
The first thing to check is the status of the RAID 5 virtual drive:
[root@raid log]# MegaCli64 -ldinfo -lALL -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
Size : 929.458 GB
Parity Size : 464.729 GB
State : Degraded
Strip Size : 64 KB
Number Of Drives : 3
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: Yes
Cache Cade Type : Read Only
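If you only want the state line out of that output, you can filter it with grep (a quick sketch matching the field layout above):
[root@raid log]# MegaCli64 -ldinfo -lALL -aALL | grep '^State'
State : Degraded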
You can see in the example above that the array is reporting 'State : Degraded'. This means at least one disk has failed or is missing from the array. Next, look at all of the disks attached to the controller:
[root@raid log]# MegaCli64 -pdlist -aALL
The output of that command is quite long; trimmed down to the key fields, the drives in our example show:
Enclosure Device ID: 252
Slot Number: 0
….
Firmware state: Online, Spun Up
Enclosure Device ID: 252
Slot Number: 1
….
Firmware state: Online, Spun Up
Enclosure Device ID: 252
Slot Number: 2
….
Firmware state: Online, Spun Up
Enclosure Device ID: 252
Slot Number: 3
….
Firmware state: Offline <==== This is what to look for
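Since the full -pdlist output runs to many lines per disk, a quick way to trim it to just the fields shown above is to pipe it through egrep (a minimal sketch using the standard egrep utility):
[root@raid log]# MegaCli64 -pdlist -aALL | egrep 'Enclosure Device ID|Slot Number|Firmware state'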
In our example the failed disk is shown as 'Enclosure Device ID: 252' and 'Slot Number: 3', so in MegaCli syntax this drive is referenced as [252:3] in the examples below. Now that we know the enclosure device IDs and slot numbers of the drives, we can remove the failed drive.
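Before taking any action, it is worth confirming the state of that specific drive; MegaCli's -pdinfo option reports on a single physical drive (a quick sketch, with output resembling the -pdlist entry above):
[root@raid log]# MegaCli64 -pdinfo -physdrv[252:3] -a0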
Set the failed disk offline if an error has not already caused the controller to do so:
[root@raid log]# MegaCli64 -pdoffline -physdrv[252:3] -a0
Adapter: 0: EnclId-252 SlotId-3 state changed to OffLine.
Exit Code: 0x00
Mark the failed disk as missing:
[root@raid log]# MegaCli64 -pdmarkmissing -physdrv[252:3] -aAll
EnclId-252 SlotId-3 is marked Missing.
Exit Code: 0x00
Mark the failed disk as prepared for removal:
[root@raid log]# MegaCli64 -pdprprmv -physdrv[252:3] -a0
Prepare for removal Success
Exit Code: 0x00
Now you can physically replace the faulty disk. It may help to use the drive locate command to flash the disk's identification LED:
[root@raid log]# MegaCli64 -pdlocate -start -physdrv[252:3] -a0
Adapter: 0: Device at EnclId-252 SlotId-3 -- PD Locate Start Command was successfully sent to Firmware
Exit Code: 0x00
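Once you have physically located the drive, the blinking locate LED can be turned off with the matching -stop form of the same command:
[root@raid log]# MegaCli64 -pdlocate -stop -physdrv[252:3] -a0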
Depending on your setup, there are two options:
If you use hot spares and a hot spare has already been pulled into the array, set the new disk as a hot spare to replace the one that just went into service:
[root@raid log]# MegaCli64 -PDHSP -Set -PhysDrv[<enclosure#>:<disk#>] -a<adapter#>
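With the example values used throughout this article, and assuming the replacement disk went into the same enclosure and slot, that would be:
[root@raid log]# MegaCli64 -PDHSP -Set -PhysDrv[252:3] -a0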
If you don't use hot spares, you will need to add the new disk back into the array and start the rebuild manually:
[root@raid log]# MegaCli64 -PdReplaceMissing -PhysDrv[252:3] -Array0 -row0 -a0
[root@raid log]# MegaCli64 -PDRbld -Start -PhysDrv[252:3] -a0
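Once the rebuild starts, the drive's firmware state should change accordingly; you can verify this with the single-drive query from earlier, which should report something like 'Firmware state: Rebuild':
[root@raid log]# MegaCli64 -pdinfo -physdrv[252:3] -a0 | grep -i 'firmware state'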
Optional: we can watch the rebuild progress. Depending on the size of the array, the rebuild may take a considerable amount of time. The array remains usable while it rebuilds, but expect a performance hit until the rebuild completes.
[root@raid log]# MegaCli64 -PDRbld -ShowProg -PhysDrv[252:3] -a0
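Rather than re-running that command by hand, you can poll it with the standard watch utility (a minimal sketch; adjust the interval to taste):
[root@raid log]# watch -n 60 'MegaCli64 -PDRbld -ShowProg -PhysDrv[252:3] -a0'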