Mdadm


Rebuilding and Creating RAID 1 Arrays with mdadm

Creating Arrays

To create a mirrored array from two drives, sda and sdb, using partitions sda1 and sdb1:

mdadm --create --verbose /dev/md0 --level=raid1 --raid-devices=2 /dev/sda1 /dev/sdb1
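
If you are not certain the device names are correct, it is worth double-checking them before running --create, for example with:

lsblk -o NAME,SIZE,MODEL /dev/sda /dev/sdb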

Now you can monitor the progress of the array build with:

cat /proc/mdstat
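
The initial sync runs in the background and can take a while on large drives. If the watch utility is available, you can have the status refresh automatically:

watch -n 5 cat /proc/mdstat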

Once finished, save your mdadm configuration with:

mdadm --verbose --detail --scan > /etc/mdadm.conf

You may need to edit this file to remove unwanted lines or to add an email address on a MAILADDR line so that you are notified if a drive fails:

MAILADDR user1@dom1.com, user2@dom2.com

On some systems, mdadm's configuration file is /etc/mdadm/mdadm.conf rather than /etc/mdadm.conf; it is important to put the configuration in the location your distribution expects.
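
As a rough sketch (the UUID below is a placeholder, not a real value; use the ARRAY line produced by the --detail --scan command above), a cleaned-up configuration might look like:

# /etc/mdadm.conf (or /etc/mdadm/mdadm.conf, depending on the distribution)
MAILADDR user1@dom1.com
ARRAY /dev/md0 metadata=1.0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx

To confirm that mail notification works, mdadm can send a test alert for each array:

mdadm --monitor --scan --oneshot --test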

Rebuilding Arrays

If a drive ever fails, or the system is booted with a drive removed, you will need to add a drive back into the array.

Failed Drive

In this example, /dev/sda1 and /dev/sdb1 make up the RAID 1 array /dev/md0. Let us say that /dev/sdb fails.

Determine Failed Drive

Run

cat /proc/mdstat
[root@lo4 ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb1[1](F) sda1[0]
      204736 blocks super 1.0 [2/1] [U_]

unused devices: <none>

When a drive fails or is missing, you will see an underscore in the array output ([U_] instead of [UU]), and (F) is displayed next to the failed member (sdb1[1](F)).
If that is not enough to identify the drive, running lsblk or fdisk -l may help you determine which one failed.

hdparm -I /dev/sda | grep "Serial Number"

This will give you the serial number of /dev/sda, which can also help you match device names to physical disks.
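
To list the serial numbers of several drives at once, a small loop along these lines works (adjust the device glob to match your system):

for d in /dev/sd[a-z]; do
    echo -n "$d: "
    hdparm -I "$d" | grep "Serial Number"
done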

Remove Failed Drive

If a drive has failed, it should be marked as faulty and removed from the mdadm array before being replaced.

mdadm --manage /dev/md0 --fail /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0

Now, we can remove it from the array.

mdadm --manage /dev/md0 --remove /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md0

Check /proc/mdstat again. There should no longer be an (F) marker, and sda1[0] should be the only drive listed in the array.
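
mdadm --detail gives a fuller view of the array state and should now report only one active device:

mdadm --detail /dev/md0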

Power down the system.

shutdown -h now

Replace Drive

Now that the system is powered down, remove the failed drive and replace it with the new one.

Once the drive is replaced, boot the system back up.

Add New Drive to Array

Recreate the partitioning scheme of /dev/sda on the new drive.

sfdisk -d /dev/sda | sfdisk /dev/sdb

Then verify with lsblk or fdisk -l.
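
One quick way to compare the two drives' layouts side by side:

lsblk -o NAME,SIZE,TYPE /dev/sda /dev/sdb

Once the partition sizes match, add the new partition back into the array: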

mdadm --manage /dev/md0 --add /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1

Finally, check the status of the rebuild with:

cat /proc/mdstat
[root@lo4 ~]# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdb1[1] sda1[0]
      204736 blocks super 1.0 [2/1] [U_]
      [===========>.........]  recovery = 57.7% (118400/204736) finish=0.0min speed=118400K/sec

unused devices: <none>
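
If you want to block until the rebuild is done, for example in a script, mdadm can wait for recovery to finish:

mdadm --wait /dev/md0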

Missing Drive

In this example, /dev/sda1 and /dev/sdb1 make up the RAID 1 array /dev/md0. Let us say that /dev/sdb1 is missing.

Use lsblk to examine the drives' partitions and their sizes. Alternatively, you can use fdisk -l or any other utility you prefer.
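
For example, limiting the output to a few useful columns makes the partition sizes easy to compare:

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT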

Now check the status of mdadm with:

cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sda1[0]
      204736 blocks super 1.0 [2/1] [U_]

unused devices: <none>

When a drive fails or is missing, you will see an underscore in the array output ([U_] instead of [UU]).

Use the output from lsblk and /proc/mdstat to match the partition that is still present in the active mdadm array (/dev/md0) with the corresponding partition on the missing drive. For example, match /dev/sda1 with /dev/sdb1 (after verifying their sizes are the same).

Now add /dev/sdb1 back into the array:

mdadm --manage /dev/md0 --add /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1

You can view the status of the rebuilding array with:

cat /proc/mdstat
[root@lo4 ~]# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdb1[1] sda1[0]
      204736 blocks super 1.0 [2/1] [U_]
      [===========>.........]  recovery = 57.7% (118400/204736) finish=0.0min speed=118400K/sec

unused devices: <none>
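
Once recovery completes, /proc/mdstat should show [UU] again. As a final check, mdadm --detail should report a clean state with no failed devices:

mdadm --detail /dev/md0 | grep -E "State|Devices"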
