How to Monitor Software RAID on Linux Servers

Monitoring your RAID array helps to identify potential failures early, ensuring data integrity and system stability. Regular checks using tools like mdadm and smartmontools provide insights into disk health, performance, and early warning signs of failure.

By proactively monitoring RAID arrays, you can prevent unexpected downtime and avoid the time-consuming data recovery procedures that follow a disk failure.

#Instructions to Monitor Software RAID on Linux Servers

Before monitoring your RAID array, it is essential to identify its configuration. Use the following commands to determine your RAID setup. Identifying your RAID setup helps you understand the type of redundancy and performance improvements it provides.

#Step 1: Identify Your RAID Array

  1. Check active RAID devices.

    Open the terminal and run the following command to check active RAID devices for any degradation or array failures:

    Command Line
    cat /proc/mdstat
    

    Here is an example output with healthy disks:

    Output
    root@content-kit:~# cat /proc/mdstat
    Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
          249916416 blocks super 1.2 [2/2] [UU]
          bitmap: 2/2 pages [8KB], 65536KB chunk
    
    unused devices: <none>
    

    To explain this output further:

    • Personalities: Lists the available RAID types supported on the system. In this case, the system supports RAID1, RAID0, RAID6, RAID5, RAID4, and RAID10.
    • md0: Indicates the active RAID array; in this case, md0 is configured as a RAID 1 (mirroring) array.
    • Devices: The array consists of two NVMe drive partitions: nvme1n1p2 and nvme0n1p2. The numbers inside the square brackets [1] and [0] indicate their order in the array.
    • Blocks and version: The RAID array contains 249916416 data blocks and uses the super 1.2 metadata format.
    • [2/2] [UU]: This section shows the RAID member count and their status. [2/2] indicates that both disks are active, and [UU] means both disks are functioning correctly. If one disk fails, it will show [U_] or [_U], indicating which disk is degraded.
    • Bitmap: The bitmap helps track changes to the RAID set, speeding up re-synchronization by reducing unnecessary data copying. In this example, the bitmap size is 8KB, with a chunk size of 65536KB.
    • Unused devices: Lists md-capable devices that are not currently part of any array; <none> means there are no idle or spare devices outside the active arrays.
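
    Because a failed member shows up as an underscore inside the status brackets, this file is easy to check from a script. The following is a minimal sketch (the grep pattern and messages are illustrative, not part of mdadm itself):

    Command Line
    # Flag any array whose member status brackets contain a failed slot, e.g. [U_] or [_U].
    if grep -q '\[U*_' /proc/mdstat; then
        echo "WARNING: a RAID array appears degraded"
    else
        echo "All RAID arrays report healthy member disks"
    fi
    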
  2. Identify RAID partitions.

    To identify RAID partitions and their layout, run:

    Command Line
    lsblk
    

    This will visualize your disk layout, showing RAID devices, partitions, and how storage is allocated. An example is:

    Output
    root@content-kit:~# lsblk
    NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
    nvme0n1     259:0    0 238.5G  0 disk
    ├─nvme0n1p1 259:1    0     1M  0 part
    └─nvme0n1p2 259:2    0 238.5G  0 part
      └─md0       9:0    0 238.3G  0 raid1 /
    nvme1n1     259:3    0 238.5G  0 disk
    ├─nvme1n1p1 259:4    0     1M  0 part
    └─nvme1n1p2 259:5    0 238.5G  0 part
      └─md0       9:0    0 238.3G  0 raid1 /
    

    This output shows:

    • NAME: Lists devices and their partitions. Here, nvme0n1 and nvme1n1 are NVMe drives, each with partitions (nvme0n1p2 and nvme1n1p2) forming the RAID array md0.
    • SIZE: Displays device capacity. Both disks are 238.5G, and md0 reflects the usable size of the mirrored array (238.3G, slightly less due to RAID metadata).
    • TYPE: Identifies the device type - disk for physical drives, part for partitions, and raid1 for the RAID array.
    • MOUNTPOINTS: Shows where devices are mounted. The RAID array md0 is mounted at /.
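
    If you only want the RAID entries from this listing, you can filter the output. A minimal sketch using lsblk's flat list mode (the column selection is just an example):

    Command Line
    # -l prints a flat list instead of a tree; awk keeps rows whose TYPE starts with "raid".
    lsblk -l -o NAME,TYPE,SIZE,MOUNTPOINTS | awk '$2 ~ /^raid/'
    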
  3. Gather detailed RAID information.

    To gather detailed information about a specific RAID array, run the following command, replacing /dev/md0 with your actual RAID device. It retrieves crucial information such as RAID level, disk health, and recovery status:

    Command Line
    sudo mdadm --detail /dev/md0
    

    An example of this would be:

    Output
    root@content-kit:~# sudo mdadm --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Tue Jan 21 09:26:48 2025
         Raid Level : raid1
         Array Size : 249916416 (238.34 GiB 255.91 GB)
      Used Dev Size : 249916416 (238.34 GiB 255.91 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
    
        Intent Bitmap : Internal
    
        Update Time : Wed Jan 22 06:56:07 2025
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
    
    Consistency Policy : bitmap
    
               Name : 246013:0
               UUID : fd3e2b9a:da14efcd:73e749f8:50e44710
             Events : 911
    
        Number   Major   Minor   RaidDevice State
           0     259        2        0      active sync   /dev/nvme0n1p2
           1     259        5        1      active sync   /dev/nvme1n1p2
    

    Explanation of the output:

    • Version: The RAID metadata version, here 1.2, which defines the format used to store RAID information.
    • Creation Time: Indicates when the RAID array was created.
    • RAID Level: Specifies the type of RAID configuration; in this case, RAID 1 (mirroring).
    • Array Size: Displays the total capacity of the RAID array, which is 238.34 GiB.
    • Used Dev Size: Shows the storage utilized by each device.
    • Raid Devices / Total Devices: Number of active and total devices in the RAID setup.
    • Persistence: Confirms that the RAID superblock is persistent, meaning it retains configuration across reboots.
    • State: Displays the current status of the array; clean indicates no issues.
    • Active / Working / Failed Devices: Provides counts of functioning, operational, and failed devices, respectively.
    • Consistency Policy: Indicates that a bitmap is used to track changes and speed up rebuilds.
    • Device list: Shows associated storage devices with their respective RAID roles.
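
    mdadm also accepts a --test flag alongside --detail, which sets the command's exit code according to array health (0 means the array is healthy). Here is a minimal sketch you could drop into a script, assuming /dev/md0 is your array:

    Command Line
    # --test makes the exit status reflect array health: 0 = healthy, non-zero = degraded/failed.
    sudo mdadm --detail --test /dev/md0 > /dev/null
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "/dev/md0 is healthy"
    else
        echo "/dev/md0 reports a problem (mdadm exit code $status)"
    fi
    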
  4. Verify the RAID configuration file.

    To verify and check RAID configurations stored on your system, run:

    Command Line
    sudo cat /etc/mdadm/mdadm.conf
    

    This should return output similar to:

    Output
    root@content-kit:~# sudo cat /etc/mdadm/mdadm.conf
    ARRAY /dev/md0 metadata=1.2 name=246013:0 UUID=fd3e2b9a:da14efcd:73e749f8:50e44710
    MAILADDR alerts@internal-mx.cherryservers.com
    

    An explanation of the output is:

    • ARRAY /dev/md0: Specifies the RAID array device managed by mdadm. In this case, the array is identified as /dev/md0.
    • metadata=1.2: Indicates the metadata version used to store RAID configuration details. The metadata helps the system recognize and rebuild the RAID array upon reboots.
    • name=246013:0: This field assigns a unique name to the RAID array, which can help track and manage multiple RAID arrays.
    • UUID=fd3e2b9a:da14efcd:73e749f8:50e44710: The unique identifier assigned to the RAID array. This UUID identifies the correct array, even if the device name changes.
    • MAILADDR alerts@internal-mx.cherryservers.com: Defines the email address where notifications and alerts regarding RAID events (such as failures or degradations) will be sent.

    Why this configuration matters:

    The mdadm.conf file ensures that the RAID array is assembled automatically during system boot. The MAILADDR setting allows system administrators to receive critical RAID alerts proactively, helping to prevent data loss. For more details on creating and managing RAID arrays, refer to our dedicated guide to creating different types of RAID arrays.

#Step 2: Monitor Your RAID Arrays

Once you have identified your RAID setup, the next step is to monitor it continuously to ensure optimal performance and prevent unexpected failures.

  1. Install monitoring tools.

    To monitor RAID health, you will need to install the necessary tools using the package manager for your Linux distribution. The commands for popular Linux distributions are:

    • Debian/Ubuntu-based distributions:
    Command Line
    sudo apt update && sudo apt install mdadm smartmontools -y
    
    • RHEL/CentOS-based distributions:
    Command Line
    sudo dnf install mdadm smartmontools -y
    

    Or for older CentOS versions:

    Command Line
    sudo yum install mdadm smartmontools -y
    
    • Arch Linux:
    Command Line
    sudo pacman -S mdadm smartmontools --noconfirm
    
    • openSUSE:
    Command Line
    sudo zypper install mdadm smartmontools
    

    Once the tools are installed, you can check the status and health of your RAID array. The following are crucial commands for monitoring your RAID status.
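
    You can confirm that the tools are available by printing their versions:

    Command Line
    mdadm --version
    smartctl --version
    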

  2. Check RAID sync and failures.

    To detect degraded arrays or ongoing resync activity in real time, run:

    Command Line
    cat /proc/mdstat
    

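    To keep this view open and refreshing automatically, you can wrap the command in watch; this example refreshes every 5 seconds (press Ctrl+C to exit):

    Command Line
    watch -n 5 cat /proc/mdstat
    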
  3. Check disk health with smartmontools.

    The smartctl utility provides detailed health reports for individual RAID member disks. Run the following command, replacing /dev/sda with the appropriate disk identifier for your system (e.g., /dev/nvme0n1 or /dev/sdb):

    Command Line
    sudo smartctl -a /dev/sda
    

    Some key things to watch for here are:

    • Overall health status (e.g., PASSED or FAILED)
    • Disk temperature and SMART attributes
    • Reallocated sectors and potential failure indicators

    Other useful smartctl options include:

    • -H – Quick health check of the disk.
    • -i – View basic disk information (model, serial, firmware).
    • -t short|long – Perform self-tests to detect errors.
    • -l error – Display recent error logs.

    You can identify your drives using the lsblk command:

    Command Line
    lsblk
    
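    To get a quick health verdict across several disks at once, you can loop over them. This is a minimal sketch; adjust the device name patterns to match what lsblk reported on your system:

    Command Line
    # Run a quick SMART health check on every matching disk (patterns are examples).
    for disk in /dev/sd? /dev/nvme?n1; do
        [ -e "$disk" ] || continue
        echo "=== $disk ==="
        sudo smartctl -H "$disk"
    done
    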

#Step 3: Automate Monitoring with Cron Jobs

  1. To ensure regular monitoring, you can automate checks using cron jobs. To start, run:

    Command Line
    crontab -e
    

    If it's your first time using crontab, you will be prompted to select an editor.

    Output
    root@content-kit:~# crontab -e
    no crontab for root - using an empty one
    
    Select an editor. To change later, run 'select-editor'.
      1. /bin/nano        <---- easiest
      2. /usr/bin/vim.basic
      3. /usr/bin/vim.tiny
      4. /bin/ed
    
    Choose 1-4 [1]:
    
  2. Add the following entry to check RAID health daily at 3 AM and log the result. Ensure that you replace /dev/md0 with your actual RAID array (e.g., /dev/md127):

    0 3 * * * /usr/sbin/mdadm --detail /dev/md0 >> /var/log/raid_status.log
    

    To break this down:

    • 0 3 * * * – This specifies the schedule for running the command: 0 is the minute (0 minutes past the hour), 3 is the hour (3 AM), and the three remaining * fields mean every day of the month, every month, and every day of the week.
    • /usr/sbin/mdadm --detail /dev/md0 – This command checks the detailed status of the RAID array.
    • >> /var/log/raid_status.log – This appends the output to the specified log file for later review. You may change the log location by modifying /var/log/raid_status.log to any preferred path (e.g., /home/user/raid_log.txt).

    An example configuration would look like this:

    GNU nano 7.2                   /tmp/crontab.J3M99T/crontab
    
    # Edit this file to introduce tasks to be run by cron.
    #
    # Each task to run has to be defined through a single line
    # indicating with different fields when the task will be run
    # and what command to run for the task
    #
    # To define the time you can provide concrete values for
    # minute (m), hour (h), day of month (dom), month (mon),
    # and day of week (dow) or use '*' in these fields (for 'any').
    #
    # Notice that tasks will be started based on the cron's system
    # daemon's notion of time and timezones.
    #
    # Output of the crontab jobs (including errors) is sent through
    # email to the user the crontab file belongs to (unless redirected).
    #
    # For example, you can run a backup of all your user accounts
    # at 5 a.m every week with:
    # 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
    #
    # For more information see the manual pages of crontab(5) and cron(8)
    #
    # m h  dom mon dow   command
    
    0 3 * * * /usr/sbin/mdadm --detail /dev/md0 >> /var/log/raid_status.log
    
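    If you would rather log only problems instead of appending a full report every day, one possible variation uses mdadm's --test exit code (the alert log path is just an example):

    0 3 * * * /usr/sbin/mdadm --detail --test /dev/md0 > /dev/null || echo "$(date): RAID problem detected on /dev/md0" >> /var/log/raid_alert.log
    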

#Step 4: OPTIONAL - Set up Email Alerts

If desired, you can configure email notifications in the mdadm.conf file to receive automatic alerts when RAID issues occur.

  1. Edit the configuration file by running:

    Command Line
    sudo nano /etc/mdadm/mdadm.conf
    
  2. Add or modify the following line to specify an email address for alerts. Replace the example with your desired email:

    MAILADDR alerts@yourdomain.com
    
  3. Save the file, then apply the configuration:

    Command Line
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
    
    Command Line
    sudo update-initramfs -u
    

    The first command is only needed if your array does not already have an ARRAY line in mdadm.conf; running it again would append a duplicate entry. It pipes through sudo tee because a plain >> redirect would be performed by your unprivileged shell and fail with a permission error.
    
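    To confirm that alert delivery works end to end, you can ask mdadm to send a test message for every array; this assumes a working mail transfer agent is configured on the server:

    Command Line
    sudo mdadm --monitor --scan --oneshot --test
    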

    By implementing these monitoring solutions and automation methods, you can effectively ensure that your RAID arrays remain healthy and perform optimally. For further guidance on replacing failed disks, please visit our dedicated guide to removing, replacing, and resyncing a disk.

