Replacing a Disk in RAID
Let’s say the server has 2 disks: /dev/sda and /dev/sdb. These disks are assembled into software RAID1 using mdadm.
One of the disks has failed, for example, /dev/sdb. The failed disk must be replaced.
Please note that before replacing a disk, it is advisable to remove it from the array.
Removing a Disk From the Array
View the array state by running the following:
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3 sdb3
      975628288 blocks super 1.2 [2/2] [UU]
      bitmap: 3/8 pages [12KB], 65536KB chunk

md0 : active raid1 sda2 sdb2
      999872 blocks super 1.2 [2/2] [UU]

unused devices: <none>
In this case, the array is assembled so that md0 consists of sda2 and sdb2, and md1 consists of sda3 and sdb3.
On this server, md1 holds swap and root.
lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0              7:0    0   985M  1 loop
sda                8:0    0 931.5G  0 disk
├─sda1             8:1    0     1M  0 part
├─sda2             8:2    0   977M  0 part
│ └─md0            9:0    0 976.4M  0 raid1
└─sda3             8:3    0 930.6G  0 part
  └─md1            9:1    0 930.4G  0 raid1
    ├─vg0-swap_1 253:0    0   4.8G  0 lvm
    └─vg0-root   253:1    0 925.7G  0 lvm   /
sdb                8:16   0 931.5G  0 disk
├─sdb1             8:17   0     1M  0 part
├─sdb2             8:18   0   977M  0 part
│ └─md0            9:0    0 976.4M  0 raid1
└─sdb3             8:19   0 930.6G  0 part
  └─md1            9:1    0 930.4G  0 raid1
    ├─vg0-swap_1 253:0    0   4.8G  0 lvm
    └─vg0-root   253:1    0 925.7G  0 lvm   /
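For scripted checks, the status string in /proc/mdstat can be parsed instead of read by eye. The following is a minimal sketch and not part of the original procedure. The function name degraded_arrays is hypothetical, and it assumes the mdstat layout shown above, where a `_` in a string such as [U_] marks a missing member.

```shell
# Print the name and status string of every md array that has a missing
# member. Reads mdstat-formatted text from the file given as $1
# (default: /proc/mdstat).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                 # remember the current array
        /blocks/ && match($0, /\[[U_]+\]/) {        # status such as [UU] or [U_]
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_")) print name, status
        }
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments, it reads the live /proc/mdstat. No output means every array reports all of its members as present.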
Remove the sdb partitions from all md devices:
mdadm /dev/md0 --remove /dev/sdb2
mdadm /dev/md1 --remove /dev/sdb3
If the partitions are not removed from the array, mdadm does not consider the disk to be failed and is still using it. In that case, the removal attempt fails with an error saying that the device is in use.
In this case, mark the disk as failed before removing it:
mdadm /dev/md0 -f /dev/sdb2
mdadm /dev/md1 -f /dev/sdb3
Run the commands to remove partitions from the array again.
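On servers with more arrays, it may help to generate the fail and remove commands from /proc/mdstat rather than type them by hand. This is a sketch under assumptions: print_removal_commands is a hypothetical helper, it only prints the commands for review and runs nothing, and it assumes member names such as sdb2 or nvme0n1p2.

```shell
# Print the mdadm commands that would mark as failed and then remove every
# partition of disk $1 (e.g. sdb) found in the mdstat text in file $2
# (default: /proc/mdstat). Nothing is executed.
print_removal_commands() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                part = $i
                sub(/\[.*/, "", part)               # drop suffixes like "[1]" or "[1](F)"
                if (part ~ ("^" disk "p?[0-9]+$")) {
                    printf "mdadm /dev/%s --fail /dev/%s\n", $1, part
                    printf "mdadm /dev/%s --remove /dev/%s\n", $1, part
                }
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

Review the printed list before running any of it. A wrong disk name here would take the healthy half of the mirror out of the array.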
After removing the failed disk from the array, request a disk replacement by creating a ticket that specifies the serial number of the failed disk. Whether downtime is required depends on the server configuration.
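The ticket needs the serial number of the failed disk. One common way to read it, assuming the smartmontools package is installed and the disk still responds, is the "Serial Number" line printed by smartctl -i. The helper name extract_serial below is hypothetical.

```shell
# Print the value of the "Serial Number:" line from `smartctl -i` output
# read on standard input. SCSI devices print "Serial number:", so both
# spellings are accepted.
extract_serial() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# Usage (assumes smartmontools is installed):
#   smartctl -i /dev/sdb | extract_serial
```

If the disk no longer answers at all, the serial number has to come from another source, for example an earlier inventory record.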
Determining the Partition Table Type (GPT or MBR) and Copying It to the New Disk
After the failed disk has been replaced, the new disk must be added to the array. First, determine the partition table type: GPT or MBR.
The gdisk utility is used for this. Install it if it is not already present:
apt-get install gdisk -y
Run the following:
gdisk -l /dev/sda
/dev/sda is a healthy disk in the RAID.
The output looks as follows for MBR:
Partition table scan:
  MBR: MBR only
  BSD: not present
  APM: not present
  GPT: not present
And something like this for GPT:
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
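The same decision can be made in a script by reading the "Partition table scan" block. This is a minimal sketch: table_type is a hypothetical helper, and it relies only on the two output shapes shown above.

```shell
# Read `gdisk -l` output on standard input and print "gpt", "mbr",
# or "unknown" based on the "Partition table scan" lines.
table_type() {
    awk '
        /^ *GPT: present/  { gpt = 1 }
        /^ *MBR: MBR only/ { mbr = 1 }
        END {
            if (gpt)      print "gpt"
            else if (mbr) print "mbr"
            else          print "unknown"
        }
    '
}

# Usage:
#   gdisk -l /dev/sda | table_type
```

Anything other than a clear "gpt" or "mbr" result is a reason to stop and inspect the disk manually before copying a layout.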
Before adding a disk to the array, you need to create the same partitions as on
sda. This process varies depending on the disk layout.
Copying the Partition Layout for GPT
To copy the partition layout for GPT:
sgdisk -R /dev/sdb /dev/sda
Please note that the disk that the layout is copied to is written first, and the disk that the layout is copied from is written second (that is, the command above copies from sda to sdb). If you swap them, the layout on the initially healthy disk will be destroyed.
The second way to copy the partition layout:
sgdisk --backup=table /dev/sda
sgdisk --load-backup=table /dev/sdb
After copying, assign new random GUIDs to the disk and its partitions:
sgdisk -G /dev/sdb
Copying the Partition Layout for MBR
To copy the partition layout for MBR:
sfdisk -d /dev/sda | sfdisk /dev/sdb
Please note that the disk that the layout is copied from is written first, and the disk that the layout is copied to is the second.
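If you cannot see the partitions in the system, you can re-read the partition table by running the following:
sfdisk -R /dev/sdb
Newer util-linux releases dropped the -R option from sfdisk; there, blockdev --rereadpt /dev/sdb can be used instead.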
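Because the argument order differs between sgdisk and sfdisk, and a mistake overwrites the healthy disk, it can be safer to generate the commands from explicit source and target names. This is a sketch only: print_copy_commands is a hypothetical helper that prints the commands from the two sections above and executes nothing.

```shell
# print_copy_commands KIND SOURCE TARGET
# KIND is "gpt" or "mbr"; SOURCE is the healthy disk, TARGET the new one.
# Prints the layout-copy commands without running them.
print_copy_commands() {
    kind=$1 src=$2 dst=$3
    if [ -z "$src" ] || [ "$src" = "$dst" ]; then
        echo "source and target must be two different disks" >&2
        return 1
    fi
    case $kind in
        gpt)
            echo "sgdisk -R $dst $src"      # target first, source second
            echo "sgdisk -G $dst"           # new random GUIDs on the copy
            ;;
        mbr)
            echo "sfdisk -d $src | sfdisk $dst"   # source first, target second
            ;;
        *)
            echo "unknown partition table type: $kind" >&2
            return 1
            ;;
    esac
}
```

For the example server, print_copy_commands gpt /dev/sda /dev/sdb prints the two sgdisk commands with sdb as the target.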
Adding a Disk to the Array
Once the partitions on /dev/sdb have been created, you can add the disk to the array:
mdadm /dev/md0 -a /dev/sdb2
mdadm /dev/md1 -a /dev/sdb3
After the disk is added to the array, synchronization starts. How long it takes depends on the disk size and type (SSD/HDD):
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3 sdb3
      975628288 blocks super 1.2 [2/1] [U_]
      [============>........]  recovery = 64.7% (632091968/975628288) finish=41.1min speed=139092K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

md0 : active raid1 sda2 sdb2
      999872 blocks super 1.2 [2/2] [UU]

unused devices: <none>
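To follow the rebuild without reading the whole file each time, the progress line can be extracted. This is a minimal sketch: recovery_progress is a hypothetical helper, and it assumes the "recovery = N% ... finish=..." line format shown above.

```shell
# Print "<array> <percent> <finish=...>" for every array that is currently
# recovering or resyncing. Reads mdstat text from the file given as $1
# (default: /proc/mdstat).
recovery_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = $i
            }
            print name, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}
```

Empty output means no array is rebuilding. Combined with watch, it gives a compact live view of the synchronization.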
Installing a Boot Loader
After adding the disk to the array, you need to install a boot loader on it.
If the server is booted in normal mode or in infiltrate-root mode, this can be done by running the following:
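The command for this case is not shown in the text above. As an assumption, and not necessarily the original author's command: on a system that boots GRUB 2 in BIOS mode, the boot loader is typically written to the new disk with grub-install. The wrapper name install_grub_on is hypothetical; it only adds a check that the target is a block device.

```shell
# Install GRUB on the disk given as $1 (e.g. /dev/sdb).
# Assumption: GRUB 2 with BIOS boot; UEFI systems need a different procedure.
install_grub_on() {
    disk=$1
    if [ ! -b "$disk" ]; then
        echo "not a block device: $disk" >&2
        return 1
    fi
    grub-install "$disk"
}

# Usage:
#   install_grub_on /dev/sdb
```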
If the server is booted into Recovery or Rescue mode, i.e. from a live CD, the boot loader installation looks like this:
- Mount the root file system to /mnt (replace /dev/md2 with the device that holds your root file system):
mount /dev/md2 /mnt
- Mount the boot partition:
mount /dev/md0 /mnt/boot
- Bind-mount /dev, /proc, and /sys:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
- chroot into the mounted file system:
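The chroot and boot loader commands are not shown in the steps above. The sketch below strings the rescue-mode sequence together under stated assumptions: GRUB 2 with BIOS boot is installed inside the mounted root file system, and the function names run and reinstall_grub_from_rescue are hypothetical. With DRY_RUN=1 it only prints the commands.

```shell
# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# reinstall_grub_from_rescue ROOT_DEV BOOT_DEV TARGET_DISK
reinstall_grub_from_rescue() {
    root_dev=$1 boot_dev=$2 target_disk=$3
    run mount "$root_dev" /mnt || return 1
    run mount "$boot_dev" /mnt/boot || return 1
    for fs in dev proc sys; do
        run mount --bind "/$fs" "/mnt/$fs" || return 1
    done
    # Assumption: grub-install is available inside the chroot.
    run chroot /mnt grub-install "$target_disk" || return 1
}
```

A dry run such as DRY_RUN=1 reinstall_grub_from_rescue /dev/md2 /dev/md0 /dev/sdb lists the six commands in order, which makes it easy to check the device names before running them for real.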
Now you can try to boot into normal mode.
Replacing a Failed Disk
You can manually mark a disk in the array as failed using --fail (-f):
mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md0 -f /dev/sda1
You can remove a failed disk from the array using --remove (-r):
mdadm /dev/md0 --remove /dev/sda1
mdadm /dev/md0 -r /dev/sda1
You can add a new disk to the array using --add (-a):
mdadm /dev/md0 --add /dev/sda1
mdadm /dev/md0 -a /dev/sda1
Error while Restoring the Boot Loader after Replacing the Disk in RAID1
If the following error appears while installing the boot loader:
root #grub-install --root-directory=/boot /dev/sda
Could not find device for /boot/boot: not found or not a block device
Run the following:
root #grep -v rootfs /proc/mounts > /etc/mtab