How to Remove, Replace, and Resync a Disk in a Degraded RAID Array

Cherry Servers data centers continuously monitor RAID arrays on servers with pre-installed operating systems. If we detect that a RAID array has degraded or failed, we promptly notify the customer and recommend scheduling maintenance to replace the failed disk.

The replacement process usually takes around 15-20 minutes and should not lead to data loss. As long as the remaining disks continue to function, the array can rebuild onto the new disk from the data they hold and keep running throughout.

This guide covers Linux software RAID arrays managed with mdadm and applies to various RAID configurations, including RAID 1, RAID 5, and RAID 10. You will need “smartmontools” to retrieve the serial number of the failed drive; Step 3 covers installing it.

Furthermore, please contact our dedicated support team and inform them that you will be replacing a degraded RAID disk. As part of the process, they will perform the replacement once the disk has been detached.

#Instructions to Remove, Replace, and Resync a Disk in a Degraded RAID Array

#Step 1: Check the Current RAID Array Status

Open a terminal and connect to your server. Check the current status of your RAID arrays with:

```bash command
cat /proc/mdstat
```

This command displays the status of your RAID arrays. The faulty drive may be marked with (F), or it may have disappeared from the array entirely. You’ll see something like this:

```bash output
root@ijsvkuoqaf-rufkgeaosw:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb2[2] sda2[0]
      244059136 blocks super 1.2 [2/1] [_U]
      bitmap: 2/2 pages [8KB], 65536KB chunk
```

Here, [2/1] means the array expects two members but only one is active, and the underscore in [_U] marks the slot of the missing member.
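If you want to script this check, the degraded state can be read from the [total/active] counter on each array's status line. The following is a minimal sketch, assuming the usual /proc/mdstat layout shown above; the function name `degraded_arrays` is illustrative and not part of mdadm.

```bash
#!/usr/bin/env bash
# degraded_arrays is a hypothetical helper, not an mdadm command.
# It prints the name of every md array whose status line reports
# fewer active members than expected, e.g. "[2/1]".
degraded_arrays() {
    local mdstat="${1:-/proc/mdstat}"   # optional path, useful for testing
    awk '
        # An array header looks like: "md0 : active raid1 ..."
        /^md[^ ]* :/ { name = $1; next }
        # The first following line with "[total/active]" is the status line.
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
            name = ""
        }
    ' "$mdstat"
}
```

Running `degraded_arrays` with no arguments reads /proc/mdstat directly; an empty result means no array reports a missing member.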

#Step 2: Mark the Faulty Disk as Failed

As shown before, you can identify the faulty disk by running the following command:
```
cat /proc/mdstat
```

If a drive is missing from the output, use the following command to identify the missing disk:

```bash command
lsblk
```

Mark the disk as failed with this command. Replace "/dev/md0" with your RAID array device and "/dev/sdb2" with the faulty disk.

```bash command
mdadm --manage /dev/md0 --fail /dev/sdb2 
```
A confirmation will be shown:
```bash output
root@ijsvkuoqaf-rufkgeaosw:~# mdadm --manage /dev/md0 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md0
```
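Once a member has been marked as failed, /proc/mdstat lists it with an (F) suffix, for example `sdb2[2](F)`. As a sketch of how to confirm this from a script, the hypothetical helper below (the name `failed_members` is an assumption, not an mdadm feature) prints the failed members of one array.

```bash
#!/usr/bin/env bash
# failed_members is a hypothetical helper, not an mdadm command.
# Usage: failed_members md0 [mdstat-file]
# Prints /dev/<member> for every member of the array flagged (F).
failed_members() {
    local array="$1" mdstat="${2:-/proc/mdstat}"
    awk -v arr="$array" '
        # Header line: "md0 : active raid1 sdb2[2](F) sda2[0]"
        $1 == arr && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[[0-9]+\].*$/, "", dev)   # strip "[2](F)"
                    print "/dev/" dev
                }
            }
        }
    ' "$mdstat"
}
```

For example, `failed_members md0` should list the partition you just marked as failed; if it prints nothing, re-check the array and device names.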

#Step 3: Install “smartmontools” and Retrieve the Serial Number of the Faulty Disk

Install “smartmontools” using the package manager for your Linux distribution:
- For Debian/Ubuntu operating systems:
```
sudo apt-get install smartmontools
```
- For CentOS/RHEL operating systems:
```
sudo yum install smartmontools
```
- For Fedora:
```
sudo dnf install smartmontools
```
- For Arch Linux:
```
sudo pacman -S smartmontools
```
Find the serial number of the faulty disk using the following command, replacing “/dev/sdb2” with the faulty disk's device name:
```
smartctl -a /dev/sdb2 | grep "Serial Number"
```
You should see the serial number displayed like this:
```
root@ijsvkuoqaf-rufkgeaosw:~# smartctl -a /dev/sdb2 | grep "Serial Number"
Serial Number:    S3YJNC1K903109A
```
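If you need only the serial value, for instance to paste it into a support ticket, the label can be stripped with a small filter. This is a sketch under the assumption that the output contains a line in the `Serial Number:` form shown above; some drives are believed to report the label as `Serial number:` instead, so both spellings are accepted. The function name `serial_from_smartctl` is illustrative.

```bash
#!/usr/bin/env bash
# serial_from_smartctl is a hypothetical helper, not part of smartmontools.
# It reads smartctl output on stdin and prints only the serial value.
serial_from_smartctl() {
    # Split on the colon plus any following spaces, then print the
    # value of the first "Serial Number:" / "Serial number:" line.
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}
```

A possible invocation is `smartctl -a /dev/sdb2 | serial_from_smartctl`, with the device name replaced as in the command above.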

#Step 4: Remove the Faulty Disk from the Array

Use this command to remove the disk from the array:

```
mdadm --manage /dev/md0 --remove /dev/sdb2
```

A confirmation will appear:

```
root@ijsvkuoqaf-rufkgeaosw:~# mdadm --manage /dev/md0 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md0
```

You can verify the disk’s removal using the same command as in Step 1; the removed partition should no longer be listed as a member of the array:
```
cat /proc/mdstat 
```

#Step 5: Contact Support and Power off the Server

Contact our dedicated support team and let them know that you have completed all of the above steps and that the server is ready for maintenance. Include the serial number retrieved in Step 3 so that the correct drive can be identified.

Power off the server with:
```
sudo shutdown -h now
```
Our technical support team will replace the faulty drive with a new one and power the server back on.

#Step 6: Add the New Disk

Power on the server if it has not already been powered on.

Copy the partition table from a remaining operational disk to the new disk. In the following command, the first device is the good disk (e.g., /dev/sda) and the second is the new disk (e.g., /dev/sdb); replace both with your actual disk names. Double-check the order, because the partition table of the second device is overwritten.
```
sfdisk -d /dev/sda | sfdisk /dev/sdb
```

Verify that the partition table on the new disk matches that of the other disks in the array with:

```
fdisk -l
```

You should see something similar to this:

```
root@ijsvkuoqaf-rufkgeaosw:~# fdisk -l
Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8D869982-27C0-42BC-A398-BE04CD523E2F

Device       Start       End   Sectors   Size Type
/dev/sdb1     2048      4095      2048     1M BIOS boot
/dev/sdb2     4096 488386559 488382464 232.9G Linux filesystem

Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F05BFD55-29C9-4075-B716-195CFFB2398F

Device       Start       End   Sectors   Size Type
/dev/sda1     2048      4095      2048     1M BIOS boot
/dev/sda2     4096 488386559 488382464 232.9G Linux filesystem

Disk /dev/md0: 232.75 GiB, 249916555264 bytes, 488118272 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
```
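Comparing two listings by eye is error-prone on disks with many partitions. As a rough sketch, the filter below reduces an `sfdisk -d` dump to one line per partition (number, start, size, type) so that two disks can be compared with `diff`. It assumes the dump uses the usual `/dev/sda1 : start= ..., size= ..., type= ...` line format; the function name `partition_layout` is illustrative.

```bash
#!/usr/bin/env bash
# partition_layout is a hypothetical helper, not part of util-linux.
# It reads an "sfdisk -d" dump on stdin and prints, per partition:
#   <partition number> <start sector> <size in sectors> <type>
# Device names and UUIDs are dropped so two disks can be compared.
partition_layout() {
    awk '
        /^\/dev\// && / : / {
            part = $1
            sub(/^.*[^0-9]/, "", part)        # keep the trailing number
            start = size = type = ""
            line = $0
            sub(/^[^:]*: */, "", line)        # drop "/dev/sda1 : "
            n = split(line, kv, / *, */)      # "key=value" pairs
            for (i = 1; i <= n; i++) {
                split(kv[i], p, /= */)
                if (p[1] == "start") start = p[2]
                if (p[1] == "size")  size  = p[2]
                if (p[1] == "type")  type  = p[2]
            }
            print part, start, size, type
        }
    '
}
```

One way to use it is `diff <(sfdisk -d /dev/sda | partition_layout) <(sfdisk -d /dev/sdb | partition_layout)`, with the device names replaced by your own; no output from `diff` indicates that the two layouts match.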

Add the new disk to the RAID array with this command, replacing "/dev/sdb2" with the new disk's actual partition name:
```
mdadm --manage /dev/md0 --add /dev/sdb2 
```

#Step 7: Synchronize the RAID Array

The array begins rebuilding onto the new disk automatically once it has been added. You can check the synchronization status with:

```
cat /proc/mdstat
```

This may take some time depending on the size of the disks and the RAID level.
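While the rebuild runs, /proc/mdstat shows a progress line such as `recovery = 12.6% (...) finish=...` beneath the array. To poll only the percentage, something like the following sketch could be used; it assumes that progress-line format, and the function name `resync_progress` is illustrative.

```bash
#!/usr/bin/env bash
# resync_progress is a hypothetical helper, not an mdadm command.
# Usage: resync_progress md0 [mdstat-file]
# Prints the rebuild percentage (e.g. "12.6%") for the given array,
# or nothing when no recovery or resync is in progress.
resync_progress() {
    local array="$1" mdstat="${2:-/proc/mdstat}"
    awk -v arr="$array" '
        /^md/ { current = $1 }               # track which array we are in
        current == arr && match($0, /(recovery|resync) *= *[0-9.]+%/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/^[a-z]+ *= */, "", s)       # drop the "recovery = " label
            print s
            exit
        }
    ' "$mdstat"
}
```

For example, `while [ -n "$(resync_progress md0)" ]; do sleep 60; done` blocks until the progress line disappears. mdadm also provides a `--wait` option that waits for a resync or recovery to finish.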

These steps should ensure minimal downtime for your server and maintain data integrity. If you encounter any issues during these steps, please don’t hesitate to contact our dedicated technical support team.
