How to Monitor Software RAID on Linux Servers
Monitoring your RAID array helps to identify potential failures early, ensuring data integrity and system stability. Regular checks using tools like *mdadm* and *smartmontools* provide insights into disk health, performance, and potential failures.
By proactively monitoring RAID arrays, you can increase the chances of preventing unexpected downtime and the time-consuming data recovery procedures that follow.
# Instructions to Monitor Software RAID on Linux Servers
Before monitoring your RAID array, it is essential to identify its configuration. Use the following commands to determine your RAID setup. Identifying your RAID setup helps you understand the type of redundancy and performance improvements it provides.
# Step 1: Identify Your RAID Array
- Check active RAID devices.
Open the terminal and run the following command to check active RAID devices for any degradation or array failures:
```
cat /proc/mdstat
```
Here is an example output with healthy disks:
Output
```
root@content-kit:~# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      249916416 blocks super 1.2 [2/2] [UU]
      bitmap: 2/2 pages [8KB], 65536KB chunk

unused devices: <none>
```
To explain this output further:
- Personalities: Lists the available RAID types supported on the system. In this case, the system supports RAID1, RAID0, RAID6, RAID5, RAID4, and RAID10.
- md0: Indicates the active RAID array, in this case, md0 is configured as a RAID 1 (mirroring) array.
- Devices: The array consists of two NVMe drive partitions: nvme1n1p2 and nvme0n1p2. The numbers inside the square brackets [1] and [0] indicate their order in the array.
- Blocks and version: The RAID array contains 249916416 data blocks and uses the super 1.2 metadata format.
- [2/2] [UU]: This section shows the RAID member count and their status. [2/2] indicates that both disks are active, and [UU] means both disks are functioning correctly. If one disk fails, it will show [U_] or [_U], indicating which disk is degraded.
- Bitmap: The bitmap helps track changes to the RAID set, speeding up re-synchronization by reducing unnecessary data copying. In this example, the bitmap size is 8KB, with a chunk size of 65536KB.
- Unused devices: Indicates that no additional devices are currently unused within the RAID setup.
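If you want to turn the [UU] check described above into something scriptable, a minimal sketch is to grep /proc/mdstat for the underscore that marks a missing disk. The log path /var/log/raid_alert.log below is only an illustrative choice.
```
#!/bin/bash
# Minimal sketch: report degraded arrays by looking for a "_" inside the
# status brackets of /proc/mdstat (e.g. [U_] or [_U]).
# Run as root so the log file under /var/log is writable.
if grep -q '\[.*_.*\]' /proc/mdstat; then
    echo "$(date): a RAID array appears degraded" >> /var/log/raid_alert.log
else
    echo "All RAID arrays report a healthy status."
fi
```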
- Identify RAID partitions.
To identify RAID partitions and their layout, run:
```
lsblk
```
This will visualize your disk layout, showing RAID devices, partitions, and how storage is allocated. An example is:
Output
```
root@content-kit:~# lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1       259:0    0 238.5G  0 disk
├─nvme0n1p1   259:1    0     1M  0 part
└─nvme0n1p2   259:2    0 238.5G  0 part
  └─md0         9:0    0 238.3G  0 raid1 /
nvme1n1       259:3    0 238.5G  0 disk
├─nvme1n1p1   259:4    0     1M  0 part
└─nvme1n1p2   259:5    0 238.5G  0 part
  └─md0         9:0    0 238.3G  0 raid1 /
```
Further explained, this shows:
- NAME: Lists devices and their partitions. Here, nvme0n1 and nvme1n1 are NVMe drives, each with partitions (nvme0n1p2 and nvme1n1p2) forming the RAID array md0.
- SIZE: Displays device capacity. Both disks are 238.5G, and md0 reflects the combined RAID size.
- TYPE: Identifies the device type - disk for physical drives, part for partitions, and raid1 for the RAID array.
- MOUNTPOINTS: Shows where devices are mounted. The RAID array md0 is mounted at /.
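As a small convenience, lsblk can be limited to just the columns discussed above with the -o option (on older util-linux releases the last column is named MOUNTPOINT rather than MOUNTPOINTS):
```
# Show only the columns relevant to RAID layout
lsblk -o NAME,SIZE,TYPE,MOUNTPOINTS
```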
- Gather detailed RAID information.
To gather detailed information about a specific RAID array, run this command, and replace /dev/md0 with your actual RAID device to retrieve crucial information such as RAID level, disk health, and recovery status:
```
sudo mdadm --detail /dev/md0
```
An example of this would be:
Output
```
root@content-kit:~# sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Tue Jan 21 09:26:48 2025
        Raid Level : raid1
        Array Size : 249916416 (238.34 GiB 255.91 GB)
     Used Dev Size : 249916416 (238.34 GiB 255.91 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Jan 22 06:56:07 2025
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : 246013:0
              UUID : fd3e2b9a:da14efcd:73e749f8:50e44710
            Events : 911

    Number   Major   Minor   RaidDevice State
       0     259        2        0      active sync   /dev/nvme0n1p2
       1     259        5        1      active sync   /dev/nvme1n1p2
```
Explanation of the output:
- Version: The RAID metadata version, here 1.2, which defines the format used to store RAID information.
- Creation Time: Indicates when the RAID array was created.
- RAID Level: Specifies the type of RAID configuration; in this case, RAID 1 (mirroring).
- Array Size: Displays the total capacity of the RAID array, which is 238.34 GiB.
- Used Dev Size: Shows the storage utilized by each device.
- Raid Devices / Total Devices: Number of active and total devices in the RAID setup.
- Persistence: Confirms that the RAID superblock is persistent, meaning it retains configuration across reboots.
- State: Displays the current status of the array, clean indicates no issues.
- Active / Working / Failed Devices: Provides counts of functioning, operational, and failed devices, respectively.
- Consistency Policy: Indicates that a bitmap is used to track changes and speed up rebuilds.
- Device list: Shows associated storage devices with their respective RAID roles.
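For use in scripts, mdadm can also signal array health through its exit code: when --detail is combined with --test, mdadm exits non-zero if the array is degraded, failed, or missing. A minimal sketch, assuming your array is /dev/md0:
```
#!/bin/bash
# Minimal sketch: rely on mdadm's exit status instead of parsing text.
# With --detail --test, a non-zero exit code indicates an unhealthy array.
if ! sudo mdadm --detail --test /dev/md0 > /dev/null; then
    echo "WARNING: /dev/md0 is not in a clean state" >&2
fi
```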
- Verify the RAID configuration file.
To verify and check RAID configurations stored on your system, run:
```
sudo cat /etc/mdadm/mdadm.conf
```
Which should return:
Output
```
root@content-kit:~# sudo cat /etc/mdadm/mdadm.conf
ARRAY /dev/md0 metadata=1.2 name=246013:0 UUID=fd3e2b9a:da14efcd:73e749f8:50e44710
MAILADDR alerts@internal-mx.cherryservers.com
```
An explanation of the output is:
- ARRAY /dev/md0: Specifies the RAID array device managed by *mdadm*. In this case, the array is identified as /dev/md0.
- metadata=1.2: Indicates the metadata version used to store RAID configuration details. The metadata helps the system recognize and rebuild the RAID array upon reboots.
- name=246013:0: This field assigns a unique name to the RAID array, which can help track and manage multiple RAID arrays.
- UUID=fd3e2b9a:da14efcd:73e749f8:50e44710: The unique identifier assigned to the RAID array. This UUID identifies the correct array, even if the device name changes.
- MAILADDR alerts@internal-mx.cherryservers.com: Defines the email address where notifications and alerts regarding RAID events (such as failures or degradations) will be sent.
Importance of the configuration:
The mdadm.conf file ensures that the RAID array is assembled automatically during system boot. The MAILADDR setting allows system administrators to receive critical RAID alerts proactively, helping to prevent data loss. For more details on creating and managing RAID arrays, refer to our dedicated guide to creating different types of RAID arrays.
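A quick sanity check, sketched below, is to compare the ARRAY line stored in mdadm.conf with what the running array actually reports; if the UUIDs differ, the file is stale and should be regenerated.
```
# Print the stored definition and the live definition; the UUIDs should match.
grep '^ARRAY' /etc/mdadm/mdadm.conf
sudo mdadm --detail --scan
```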
# Step 2: Monitor Your RAID Arrays
Once you have identified your RAID setup, the next step is to monitor it continuously to ensure optimal performance and prevent unexpected failures.
- Install monitoring tools.
To monitor RAID health, you will need to install the necessary tools using the package manager for your Linux distribution. The commands for popular Linux distributions are:
- Debian/Ubuntu-based distributions:
```
sudo apt update && sudo apt install mdadm smartmontools -y
```
- RHEL/CentOS-based distributions:
```
sudo dnf install mdadm smartmontools -y
```
Or for older CentOS versions:
```
sudo yum install mdadm smartmontools -y
```
- Arch Linux:
```
sudo pacman -S mdadm smartmontools --noconfirm
```
- openSUSE:
```
sudo zypper install mdadm smartmontools
```
Once the tools are installed, you can check the status and health of your RAID array. The following are crucial commands for monitoring your RAID status.
- Check RAID sync and failures.
To detect any degraded or syncing issues in the RAID array in real time, run:
```
cat /proc/mdstat
```
- Check disk health with smartmontools.
The smartctl utility provides detailed health reports for individual RAID disks. Run this command, replacing /dev/sda with the appropriate disk identifier for your system (e.g., /dev/nvme0n1 or /dev/sdb):
```
sudo smartctl -a /dev/sda
```
Some key things to watch for here are:
- Overall health status (e.g., PASSED or FAILED)
- Disk temperature and SMART attributes
- Reallocated sectors and potential failure indicators
Other useful smartctl options include:
- -H – Quick health check of the disk.
- -i – View basic disk information (model, serial, firmware).
- -t short|long – Perform self-tests to detect errors.
- -l error – Display recent error logs.
You can identify your drives using the lsblk command:
```
lsblk
```
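To check every member of the array in one pass, you can loop smartctl over the underlying disks. The sketch below hard-codes the two NVMe drives from the earlier examples; substitute the device names that lsblk reports on your system.
```
#!/bin/bash
# Minimal sketch: quick SMART health verdict for each RAID member disk.
# Device names are taken from the example array above; adjust as needed.
for disk in /dev/nvme0n1 /dev/nvme1n1; do
    echo "=== SMART health for $disk ==="
    sudo smartctl -H "$disk"
done
```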
# Step 3: Automate Monitoring with Cron Jobs
- To ensure regular monitoring, you can automate checks using cron jobs. To start, run:
```
crontab -e
```
If it's your first time using crontab, you will be prompted to select an editor.
Output
```
root@content-kit:~# crontab -e
no crontab for root - using an empty one

Select an editor.  To change later, run 'select-editor'.
  1. /bin/nano        <---- easiest
  2. /usr/bin/vim.basic
  3. /usr/bin/vim.tiny
  4. /bin/ed

Choose 1-4 [1]:
```
- Add the following entry to check RAID health daily at 3 AM, and log it. Ensure that you replace /dev/md0 with your actual RAID array (e.g., /dev/md127):
```
0 3 * * * /usr/sbin/mdadm --detail /dev/md0 >> /var/log/raid_status.log
```
To break this down:
- 0 3 * * * – This specifies the schedule for running the command: 0 – minute (0 minutes past the hour); 3 – hour (3 AM); * * * – every day of the month, every month, and every day of the week.
- /usr/sbin/mdadm --detail /dev/md0 – This command checks the detailed status of the RAID array.
- >> /var/log/raid_status.log – This appends the output to the specified log file for later review. You may opt to change the log location by modifying /var/log/raid_status.log to any preferred path (e.g., /home/user/raid_log.txt).
An example configuration would look like this:
```
  GNU nano 7.2                /tmp/crontab.J3M99T/crontab
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command
0 3 * * * /usr/sbin/mdadm --detail /dev/md0 >> /var/log/raid_status.log
```
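You can extend the same crontab with additional checks. For example, the entries below (illustrative device names and log path) would run a weekly SMART short self-test on each member disk every Sunday morning, using the -t short option described earlier:
```
# Illustrative extra crontab entries: weekly SMART short self-tests.
0 4 * * 0 /usr/sbin/smartctl -t short /dev/nvme0n1 >> /var/log/smart_test.log 2>&1
5 4 * * 0 /usr/sbin/smartctl -t short /dev/nvme1n1 >> /var/log/smart_test.log 2>&1
```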
# Step 4: OPTIONAL - Set up Email Alerts
If desired, you can configure email notifications in the mdadm.conf file to receive automatic alerts in case of RAID issues.
- Edit the configuration file by running:
```
sudo nano /etc/mdadm/mdadm.conf
```
- Add or modify the following line to specify an email address for alerts. Replace the example with your desired email:
```
MAILADDR alerts@yourdomain.com
```
- Save and update the RAID configuration using:
```
sudo mdadm --detail --scan >> /etc/mdadm/mdadm.conf
sudo update-initramfs -u
```
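After updating the configuration, you can confirm that alert mail is actually delivered by asking mdadm to send a one-off test message for each array (this assumes a working mail setup on the server):
```
# Send a TestMessage alert for every array to the MAILADDR address, then exit.
sudo mdadm --monitor --scan --test --oneshot
```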
By implementing these monitoring solutions and automation methods, you can effectively ensure that your RAID arrays remain healthy and perform optimally. For further guidance on replacing failed disks, please visit our dedicated removing, replacing, and resyncing a disk guide.