
Replacing an LSI raid disk with MegaCli

If you have identified a failed or failing disk, you can replace it using the MegaCli utility. The example below covers replacing a failed disk in a RAID 5 array that has three disks in total.

The first thing we want to check is the status of our RAID 5 array.

[root@raid log]# MegaCli64 -ldinfo -lALL -aALL
 Adapter 0 -- Virtual Drive Information:
 Virtual Drive: 0 (Target Id: 0)
 Name :
 RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
 Size : 929.458 GB
 Parity Size : 464.729 GB
 State : Degraded
 Strip Size : 64 KB
 Number Of Drives : 3
 Span Depth : 1
 Default Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
 Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
 Default Access Policy: Read/Write
 Current Access Policy: Read/Write
 Disk Cache Policy : Disk's Default
 Encryption Type : None
 Is VD Cached: Yes
 Cache Cade Type : Read Only
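
If you want to check the State field from a script rather than by eye, a filter along the following lines may help. This is a minimal sketch that assumes the field layout shown in the output above; `vd_states` is a hypothetical helper name, not part of MegaCli.

```shell
# Sketch: read `MegaCli64 -ldinfo -lALL -aALL` output on stdin and print
# one line per virtual drive with its state.
# vd_states is a hypothetical helper name, not a MegaCli feature.
vd_states() {
    awk -F':' '
        # Remember the number of the virtual drive being described.
        /^[ \t]*Virtual Drive:/ { split($2, parts, " "); vd = parts[1] }
        # Print the state with surrounding whitespace trimmed.
        /^[ \t]*State[ \t]*:/ {
            state = $2
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            print "VD " vd ": " state
        }
    '
}

# Usage on a live system (needs root and MegaCli64 in PATH):
#   MegaCli64 -ldinfo -lALL -aALL | vd_states
```

For the array in this example the filter would print `VD 0: Degraded`.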

The ldinfo output reports the array as ‘State : Degraded’. This means that at least one disk has failed or is not present in the array. Next we want to look at all of our disks:

[root@raid log]# MegaCli64 -pdlist -aALL

The output of that command is quite long, but in our example the key fields for each disk are:

Enclosure Device ID: 252
 Slot Number: 0
 Firmware state: Online, Spun Up

Enclosure Device ID: 252
 Slot Number: 1
 Firmware state: Online, Spun Up

Enclosure Device ID: 252
 Slot Number: 2
 Firmware state: Online, Spun Up

Enclosure Device ID: 252
 Slot Number: 3
 Firmware state: Offline <==== This is what to look for
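
Because the full pdlist output is long, it can be reduced to the disks that need attention. The sketch below assumes the three field names shown above appear in that order for each disk; `non_online_disks` is a hypothetical helper name.

```shell
# Sketch: read `MegaCli64 -pdlist -aALL` output on stdin and print the
# [enclosure:slot] address and firmware state of every disk whose state
# does not start with "Online".
# non_online_disks is a hypothetical helper name, not a MegaCli feature.
non_online_disks() {
    awk -F':' '
        /^[ \t]*Enclosure Device ID:/ { encl = $2; gsub(/[ \t]/, "", encl) }
        /^[ \t]*Slot Number:/         { slot = $2; gsub(/[ \t]/, "", slot) }
        /^[ \t]*Firmware state:/ {
            state = $2
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (state !~ /^Online/) print "[" encl ":" slot "] " state
        }
    '
}

# Usage on a live system (needs root and MegaCli64 in PATH):
#   MegaCli64 -pdlist -aALL | non_online_disks
```

Note that a state other than Online does not always mean a failure. Hot spares and unconfigured disks would also be listed, so check the reported state before acting on it.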

In our example the failed disk is shown as ‘Enclosure Device ID: 252’ and ‘Slot Number: 3’, so in MegaCli syntax this drive is referenced as [252:3] in the examples below. Now that we know the enclosure device IDs (EIDs) and slot numbers of each of the drives, we can go ahead and remove the failed drive.

  1. First we set the original disk offline, if an error has not already caused the controller to set it offline

    [root@raid log]# MegaCli64 -pdoffline -physdrv[252:3] -a0
    Adapter: 0: EnclId-252 SlotId-3 state changed to OffLine.
    Exit Code: 0x00
  2. Mark the failed disk as missing

    [root@raid log]# MegaCli64 -pdmarkmissing -physdrv[252:3] -aAll 
    EnclId-252 SlotId-3 is marked Missing. 
    Exit Code: 0x00
  3. Mark the failed disk as prepared for removal

    [root@raid log]# MegaCli64 -pdprprmv -physdrv[252:3] -a0 
    Prepare for removal Success 
    Exit Code: 0x00
  4. Now you can replace the faulty disk. It may help to run the locate command first so that you can identify the disk physically

    [root@raid log]# MegaCli64 -pdlocate -start -physdrv[252:3] -a0 
    Adapter: 0: Device at EnclId-252 SlotId-3 -- PD Locate Start Command was successfully sent to Firmware 
    Exit Code: 0x00
  5. Depending on your setup, there are two options:
    1. If you use hot spares and the original hot spare has already been brought into the raid array, set the new disk as the hot spare to replace the one that just went into service

      [root@raid log]# MegaCli64 -PDHSP -Set -PhysDrv[<enclosure#>:<disk#>] -a<adapter#>
    2. If you don’t use hot spares, you will need to add the disk to the array and start the rebuild manually

      [root@raid log]# MegaCli64 -PdReplaceMissing -PhysDrv[252:3] -Array0 -row0 -a0 
      [root@raid log]# MegaCli64 -PDRbld -Start -PhysDrv[252:3] -a0
  6. Optional: We can watch the rebuild progress. Depending on the size of the array, the rebuild may take a considerable amount of time. The raid array remains usable during the rebuild, but you can expect reduced performance until it completes.

    [root@raid log]# MegaCli64 -PDRbld -ShowProg -PhysDrv[252:3] -a0
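
To avoid typos in the drive address when repeating the steps above, the commands for the path without hot spares (steps 1 to 3, then step 5.2) can be generated for review before running them. This is only a sketch: `replacement_commands` is a hypothetical helper, it prints the commands and does not execute them, and the array and row arguments default to the `-Array0 -row0` values from our example, which may differ on your controller.

```shell
# Sketch: print (do not run) the MegaCli64 commands from steps 1-3 and 5.2
# for one disk. replacement_commands is a hypothetical helper name.
# Arguments: enclosure ID, slot number, then optionally adapter, array
# and row (each defaults to 0, as in the example above).
replacement_commands() {
    encl=$1
    slot=$2
    adapter=${3:-0}
    array=${4:-0}
    row=${5:-0}
    pd="[${encl}:${slot}]"
    # The first three commands are run before the physical swap,
    # the last two after the new disk has been inserted.
    printf '%s\n' \
        "MegaCli64 -pdoffline -physdrv${pd} -a${adapter}" \
        "MegaCli64 -pdmarkmissing -physdrv${pd} -a${adapter}" \
        "MegaCli64 -pdprprmv -physdrv${pd} -a${adapter}" \
        "MegaCli64 -PdReplaceMissing -PhysDrv${pd} -Array${array} -row${row} -a${adapter}" \
        "MegaCli64 -PDRbld -Start -PhysDrv${pd} -a${adapter}"
}

# Usage: review the generated commands for the disk from our example.
#   replacement_commands 252 3 0
```

Printing rather than executing is deliberate: the disk has to be swapped physically between the third and fourth commands, and each command's exit code should be checked before moving on to the next one.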
