Replacing a Disk in RAID
Let's say the server has two disks: /dev/sda and /dev/sdb. These disks are assembled into software RAID1 using mdadm --assemble.
One of the disks has failed, for example /dev/sdb. The failed disk must be replaced.
Please note that before replacing a disk, it is advisable to remove it from the array.
Removing a Disk From the Array
View the array state by running the following:
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[0] sdb3[1]
975628288 blocks super 1.2 [2/2] [UU]
bitmap: 3/8 pages [12KB], 65536KB chunk
md0 : active raid1 sda2[2] sdb2[1]
999872 blocks super 1.2 [2/2] [UU]
unused devices: <none>
In this case, the array is assembled so that md0 consists of sda2 and sdb2, and md1 consists of sda3 and sdb3.
On this server, md0 is /boot, and md1 is swap and root.
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 985M 1 loop
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 977M 0 part
│ └─md0 9:0 0 976.4M 0 raid1
└─sda3 8:3 0 930.6G 0 part
└─md1 9:1 0 930.4G 0 raid1
├─vg0-swap_1 253:0 0 4.8G 0 lvm
└─vg0-root 253:1 0 925.7G 0 lvm /
sdb 8:16 0 931.5G 0 disk
├─sdb1 8:17 0 1M 0 part
├─sdb2 8:18 0 977M 0 part
│ └─md0 9:0 0 976.4M 0 raid1
└─sdb3 8:19 0 930.6G 0 part
└─md1 9:1 0 930.4G 0 raid1
├─vg0-swap_1 253:0 0 4.8G 0 lvm
└─vg0-root 253:1 0 925.7G 0 lvm /
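Before removing anything, it can help to list exactly which array members live on the failed disk. The following is a minimal sketch, assuming /proc/mdstat has the format shown above; the function name md_members_of_disk is made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<array> <partition>" for every md member
# located on the given disk. Reads /proc/mdstat-style text on stdin.
md_members_of_disk() {
    local disk="$1"
    awk -v disk="$disk" '
        # Array header lines look like: md1 : active raid1 sda3[0] sdb3[1]
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[[0-9]+\]/) continue   # skip "active", "raid1", ...
                part = $i
                sub(/\[.*$/, "", part)             # strip "[1]" or "[1](F)"
                # Match sdb3 for disk sdb, or nvme0n1p3 for disk nvme0n1
                if (part ~ ("^" disk "p?[0-9]+$")) print $1, part
            }
        }
    '
}
```

Usage: md_members_of_disk sdb < /proc/mdstat. For the array state shown above, this lists sdb3 under md1 and sdb2 under md0.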
Remove the sdb partitions from all arrays:
mdadm /dev/md0 --remove /dev/sdb2
mdadm /dev/md1 --remove /dev/sdb3
If mdadm does not yet consider the disk failed, it keeps using the partitions, and the removal fails with an error that the device is in use.
In this case, mark the partitions as failed before removing them:
mdadm /dev/md0 -f /dev/sdb2
mdadm /dev/md1 -f /dev/sdb3
Run the commands to remove partitions from the array again.
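The fail-then-remove sequence can be wrapped in a small helper. This is only a sketch: the names run and fail_and_remove and the DRY_RUN variable are assumptions introduced here, and by default the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Print each command; execute it only when DRY_RUN is explicitly set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Mark one member partition as failed, then remove it from its array.
fail_and_remove() {
    local array="$1" part="$2"
    run mdadm "$array" --fail "$part" || return 1
    run mdadm "$array" --remove "$part"
}
```

Calling fail_and_remove /dev/md0 /dev/sdb2 prints the two mdadm commands; prefix the call with DRY_RUN=0 to actually run them as root.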
After removing the failed disk from the array, request a disk replacement by creating a ticket that specifies the serial number (s/n) of the failed disk. Whether the replacement requires downtime depends on the server configuration.
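One way to look up the serial number is smartctl from the smartmontools package, assuming it is installed and the disk still responds. The helper below is a hypothetical sketch that extracts the serial from the output of smartctl -i.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the serial number from `smartctl -i` output
# read on stdin. The label is matched case-insensitively because its
# capitalization can differ between device types.
serial_from_smartctl() {
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}
```

Usage: smartctl -i /dev/sdb | serial_from_smartctl. If the disk no longer answers, lsblk -d -o NAME,SERIAL may still show the serial of the remaining disks, which helps to identify the failed one by elimination.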
Determining the Partition Table Type (GPT or MBR) and Copying It to the New Disk
After the failed disk has been replaced, you need to add the new disk to the array. To do this, first determine the partition table type, GPT or MBR, using gdisk.
Install gdisk:
apt-get install gdisk -y
Run the following:
gdisk -l /dev/sda
Here /dev/sda is a healthy disk in the RAID.
The output looks as follows for MBR:
Partition table scan:
MBR: MBR only
BSD: not present
APM: not present
GPT: not present
And something like this for GPT:
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
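The check can also be scripted. The sketch below classifies a disk from the "Partition table scan" lines shown above; the function name partition_table_type is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "gpt", "mbr" or "unknown" based on
# `gdisk -l` output read on stdin.
partition_table_type() {
    awk '
        $1 == "GPT:" && $2 == "present"               { gpt = 1 }
        $1 == "MBR:" && $2 == "MBR" && $3 == "only"   { mbr = 1 }
        END {
            if (gpt)      print "gpt"
            else if (mbr) print "mbr"
            else          print "unknown"
        }
    '
}
```

Usage: gdisk -l /dev/sda | partition_table_type. A protective MBR next to a present GPT is reported as gpt, matching the second output above.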
Before adding the disk to the array, you need to create the same partitions on it as on sda. This process depends on the partition table type.
Copying the Partition Layout for GPT
To copy the partition layout for GPT:
sgdisk -R /dev/sdb /dev/sda
Please note the argument order: the disk that the layout is copied to is written first, and the disk that the layout is copied from is written second (that is, the command above copies from sda to sdb). If you swap them, the partition table on the healthy disk will be overwritten.
Alternatively, copy the partition layout through a backup file:
sgdisk --backup=table /dev/sda
sgdisk --load-backup=table /dev/sdb
After copying, assign new random GUIDs to the disk and its partitions:
sgdisk -G /dev/sdb
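Because swapping the two arguments overwrites the healthy disk, a guard around the copy may be worthwhile. The following is a sketch under stated assumptions: copy_gpt_layout, run and DRY_RUN are names invented here, and the only safety check is that the target disk no longer appears as an array member in /proc/mdstat.

```shell
#!/usr/bin/env bash
# Print each command; execute it only when DRY_RUN is explicitly set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Copy the GPT layout from SRC (healthy disk) to DST (new disk).
# Refuses to run if DST still has partitions listed in the mdstat file.
copy_gpt_layout() {
    local src="$1" dst="$2" mdstat="${3:-/proc/mdstat}"
    local name="${dst##*/}"                 # /dev/sdb -> sdb
    if [ "$src" = "$dst" ]; then
        echo "source and target are the same disk" >&2
        return 1
    fi
    if grep -Eq "(^|[[:space:]])${name}p?[0-9]+\[" "$mdstat"; then
        echo "$dst is still an array member; refusing to overwrite it" >&2
        return 1
    fi
    run sgdisk -R "$dst" "$src" || return 1  # target first, source second
    run sgdisk -G "$dst"                     # new random GUIDs on the target
}
```

Calling copy_gpt_layout /dev/sda /dev/sdb prints the commands it would run; with DRY_RUN=0 it executes them. The check is deliberately conservative and does not replace verifying the device names by hand.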
Copying the Partition Layout for MBR
To copy the partition layout for MBR:
sfdisk -d /dev/sda | sfdisk /dev/sdb
Please note that here the order is reversed compared to sgdisk: the disk that the layout is copied from is written first, and the disk that the layout is copied to is written second.
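If the new partitions do not appear in the system, re-read the partition table by running the following:
sfdisk -R /dev/sdb
Newer versions of sfdisk no longer support the -R option. In that case, run the following instead:
blockdev --rereadpt /dev/sdb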
Adding a Disk to the Array
Once the partitions on /dev/sdb have been created, you can add the disk to the array:
mdadm /dev/md0 -a /dev/sdb2
mdadm /dev/md1 -a /dev/sdb3
After adding the disk to the array, synchronization starts. How long it takes depends on the disk size and type (SSD or HDD):
cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[1] sdb3[0]
975628288 blocks super 1.2 [2/1] [U_]
[============>........] recovery = 64.7% (632091968/975628288) finish=41.1min speed=139092K/sec
bitmap: 3/8 pages [12KB], 65536KB chunk
md0 : active raid1 sda2[2] sdb2[1]
999872 blocks super 1.2 [2/2] [UU]
unused devices: <none>
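To follow the rebuild without reading the whole file, the progress line can be extracted. This is a minimal sketch that assumes the /proc/mdstat format shown above; md_recovery_status is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<array> <percent> <estimated finish>" for every
# array that is currently recovering or resyncing.
# Reads /proc/mdstat-style text on stdin.
md_recovery_status() {
    awk '
        $2 == ":" && $1 ~ /^md/ { array = $1 }    # remember the current array
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
            }
            print array, pct, eta
        }
    '
}
```

Usage: md_recovery_status < /proc/mdstat. The function prints nothing once no array is rebuilding, so it can serve as a loop condition in a script that waits for synchronization to finish.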
Installing a Boot Loader
After adding the disk to the array, you need to install a boot loader on it.
If the server is booted into normal mode or into infiltrate-root, this can be done by running the following:
grub-install /dev/sdb
If the server is booted into Recovery or Rescue mode, that is, from a live CD, install the boot loader as follows:
- Mount the root file system to /mnt (here /dev/md2 stands for whichever device holds the root file system; in the layout shown earlier it would be the LVM volume vg0-root):
mount /dev/md2 /mnt
- Mount boot:
mount /dev/md0 /mnt/boot
- Mount /dev, /proc and /sys:
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
- chroot into the mounted file system:
chroot /mnt
- Install grub on sdb:
grub-install /dev/sdb
Now you can try to boot into normal mode.
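The rescue-mode steps above can be collected into one function that also unmounts everything afterwards. This is a sketch, not a tested procedure for any particular rescue image: install_grub_from_rescue, run and DRY_RUN are names assumed for this example, and the device names are passed in as parameters because they depend on the server layout.

```shell
#!/usr/bin/env bash
# Print each command; execute it only when DRY_RUN is explicitly set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Mount root and boot, bind-mount the pseudo file systems, install grub
# inside the chroot, then unmount everything in reverse order.
install_grub_from_rescue() {
    local root_dev="$1" boot_dev="$2" disk="$3" mnt="${4:-/mnt}"
    local fs rc
    run mount "$root_dev" "$mnt" || return 1
    run mount "$boot_dev" "$mnt/boot" || return 1
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$mnt/$fs" || return 1
    done
    run chroot "$mnt" grub-install "$disk"
    rc=$?
    for fs in sys proc dev boot; do
        run umount "$mnt/$fs"
    done
    run umount "$mnt"
    return "$rc"
}
```

Calling install_grub_from_rescue /dev/md2 /dev/md0 /dev/sdb prints the full command sequence; with DRY_RUN=0 it is executed. If one of the mounts fails, the function stops early and leaves the earlier mounts in place for inspection.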
Replacing a Failed Disk
You can manually mark a disk in the array as failed using --fail (-f):
mdadm /dev/md0 --fail /dev/sda1
or
mdadm /dev/md0 -f /dev/sda1
You can remove a failed disk from the array using --remove (-r):
mdadm /dev/md0 --remove /dev/sda1
or
mdadm /dev/md0 -r /dev/sda1
You can add a new disk to the array using --add (-a), or return a device that was previously a member of the array using --re-add:
mdadm /dev/md0 --add /dev/sda1
or
mdadm /dev/md0 -a /dev/sda1
Error while Restoring the Boot Loader after Replacing the Disk in RAID1
If the following error appears while installing grub:
root #grub-install --root-directory=/boot /dev/sda
Could not find device for /boot/boot: not found or not a block device
Rebuild /etc/mtab from /proc/mounts by running the following, then retry the installation:
root #grep -v rootfs /proc/mounts > /etc/mtab