How to Remove, Replace, and Resync a Disk in a Degraded RAID Array
Cherry Servers data centers continuously monitor RAID arrays for servers with pre-installed operating systems. If we detect a RAID array degradation or failure, we promptly notify the customer and recommend scheduling maintenance to replace the failed disk.
The replacement usually takes around 15-20 minutes and should not lead to data loss. As long as the remaining disks continue to function, the data is rebuilt onto the new disk and the array returns to full redundancy without data loss.
This guide covers the process for a software-type RAID array, and applies to various RAID configurations, including RAID 1, RAID 5, and RAID 10. Before you begin, please ensure that you have “smartmontools” installed, as it is required to retrieve the serial number of the failed drive.
Furthermore, please contact our dedicated support team and inform them that you will be replacing a degraded RAID disk. As part of the process, they will perform the replacement once the disk has been detached.
#Instructions to Remove, Replace, and Resync a Disk in a Degraded RAID Array
#Step 1: Check the Current RAID Array Status
Open the terminal and connect to your server.
Check the current RAID array status of your disks, using:
```bash command
cat /proc/mdstat
```
This command displays the status of your RAID arrays. The faulty drive may be marked with (F), or it may have disappeared from the RAID array entirely. You’ll see something like this:
```bash output
root@ijsvkuoqaf-rufkgeaosw:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb2[2](F) sda2[0]
      244059136 blocks super 1.2 [2/1] [_U]
      bitmap: 2/2 pages [8KB], 65536KB chunk
```
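For unattended checks, the same information can be pulled out of the mdstat text with a small script. The following is a minimal sketch, not part of the official procedure: the function name `degraded_arrays` is hypothetical, and it assumes the usual layout in which each array's status line carries a slot map such as [UU] or [_U], where an underscore marks a missing or failed member.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list md arrays whose slot map (e.g. [_U]) shows a missing member.
# Reads /proc/mdstat by default; pass another file to check saved output instead.
degraded_arrays() {
  local mdstat="${1:-/proc/mdstat}"
  awk '
    /^md[^ ]* :/ { array = $1 }              # remember which array the next lines describe
    /blocks/ && match($0, /\[[U_]+\]/) {     # slot map: one U (up) or _ (down) per member
      map = substr($0, RSTART, RLENGTH)
      if (map ~ /_/) print array, map
    }
  ' "$mdstat"
}
```

Called with no arguments, `degraded_arrays` prints one line per degraded array and nothing at all when every array is healthy, which makes it easy to use in a cron job or monitoring check.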
#Step 2: Mark the Faulty Disk as Failed
1. As shown before, you can identify the faulty disk by checking the RAID status again:
cat /proc/mdstat
If you notice that a drive is missing, use the following command to identify the missing disk:
```bash command
lsblk
```
2. Mark the disk as failed with this command. Replace "/dev/md0" with your RAID array device and "/dev/sdb2" with the faulty disk.
```bash command
mdadm --manage /dev/md0 --fail /dev/sdb2
```
A confirmation will be shown:
```bash output
root@ijsvkuoqaf-rufkgeaosw:~# mdadm --manage /dev/md0 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md0
```
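When a server has several arrays built on the same disks, it can help to list every member the kernel has already flagged before running `--fail` on the remaining partitions of that disk. A rough sketch, assuming the standard mdstat notation in which a failed member carries an (F) suffix, for example `sdb2[2](F)`; the function name `failed_members` is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<array> <device>" for every member flagged (F) in mdstat.
# Reads /proc/mdstat by default; pass another file to check saved output instead.
failed_members() {
  local mdstat="${1:-/proc/mdstat}"
  awk '
    /^md[^ ]* :/ {
      for (i = 3; i <= NF; i++) {
        if ($i ~ /\(F\)/) {
          dev = $i
          sub(/\[.*/, "", dev)     # strip the "[2](F)" suffix, leaving e.g. sdb2
          print $1, "/dev/" dev
        }
      }
    }
  ' "$mdstat"
}
```

A disk that has dropped off the bus entirely will not show up in this list, so an empty result on a degraded array still calls for the `lsblk` comparison described above.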
#Step 3: Install “smartmontools” and Retrieve the Serial Number of the Faulty Disk
1. Install “smartmontools”. The instructions to do this using common Linux distributions are:
- For Debian/Ubuntu operating systems:
sudo apt-get install smartmontools
- For CentOS/RHEL operating systems:
sudo yum install smartmontools
- For Fedora:
sudo dnf install smartmontools
- For Arch Linux:
sudo pacman -S smartmontools
2. Find the serial number of the faulty disk using the following command, replacing “/dev/sdb2” with the faulty disk's device name:
smartctl -a /dev/sdb2 | grep "Serial Number"
You should see the serial number displayed like this:
Output
root@ijsvkuoqaf-rufkgeaosw:~# smartctl -a /dev/sdb2 | grep "Serial Number"
Serial Number: S3YJNC1K903109A
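If you want only the serial number itself, for instance to paste into a support ticket from a script, it can be separated from its label. This is a small sketch that assumes the label reads "Serial Number" followed by a colon, as in the output above; the match is case-insensitive in case the capitalization differs on your drive. The function name `serial_from_smartctl` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `smartctl -a` output on stdin and print only the
# value of the first "Serial Number" line.
serial_from_smartctl() {
  awk -F': *' '!found && tolower($1) == "serial number" { print $2; found = 1 }'
}
```

It would be used as `smartctl -a /dev/sdb2 | serial_from_smartctl`, again replacing the device name with your faulty disk.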
#Step 4: Remove the Faulty Disk from the Array
1. Use this command to remove the disk from the array:
mdadm --manage /dev/md0 --remove /dev/sdb2
Confirmation will appear as:
Output
root@ijsvkuoqaf-rufkgeaosw:~# mdadm --manage /dev/md0 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md0
2. You can verify the disk’s removal using the same command we used at the beginning:
cat /proc/mdstat
#Step 5: Contact Support and Power off the Server
1. Contact technical support. Please inform our dedicated support team that you have completed all the above steps and the server is ready for maintenance.
2. Power off the server with:
sudo shutdown -h now
3. Our technical support team will replace the faulty drive with a new one and power the server back on.
#Step 6: Add the New Disk
1. Power on the server if it has not already been powered on.
2. Copy the partition table from a remaining operational disk to the new disk. In the following command, /dev/sda is the good disk and /dev/sdb is the new disk; replace both with your actual disk names. The good disk must come first, because the command overwrites the partition table of the second disk.
sfdisk -d /dev/sda | sfdisk /dev/sdb
3. Verify that the partition table on the new disk matches that of the other disks in the array with:
fdisk -l
You should see something similar to this:
Output
root@ijsvkuoqaf-rufkgeaosw:~# fdisk -l
Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8D869982-27C0-42BC-A398-BE04CD523E2F

Device     Start       End   Sectors   Size Type
/dev/sdb1   2048      4095      2048     1M BIOS boot
/dev/sdb2   4096 488386559 488382464 232.9G Linux filesystem

Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 860
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F05BFD55-29C9-4075-B716-195CFFB2398F

Device     Start       End   Sectors   Size Type
/dev/sda1   2048      4095      2048     1M BIOS boot
/dev/sda2   4096 488386559 488382464 232.9G Linux filesystem

Disk /dev/md0: 232.75 GiB, 249916555264 bytes, 488118272 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
4. Add the new disk to the RAID array with this command, replacing "/dev/sdb2" with the new disk's actual partition name:
mdadm --manage /dev/md0 --add /dev/sdb2
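Comparing the two `fdisk -l` listings above by eye is easy to get wrong on disks with many partitions, so it can be worth scripting the check before the new partition is added to the array. A minimal sketch, assuming GPT-style rows in the form shown above (Device, Start, End, Sectors, and so on); `partition_layout` and `same_layout` are hypothetical names, and MBR listings that include a Boot column would need different field numbers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `fdisk -l` output on stdin and print
# "<partition suffix> <start> <end> <sectors>" for each partition of one disk.
partition_layout() {
  local disk="$1"
  awk -v d="$disk" '
    index($1, d) == 1 && $1 != d {
      print substr($1, length(d) + 1), $2, $3, $4
    }
  '
}

# Succeeds only if both disks have partitions and their layouts are identical.
# $1: file holding saved `fdisk -l` output, $2: good disk, $3: new disk.
same_layout() {
  local listing="$1" good new
  good="$(partition_layout "$2" < "$listing")"
  new="$(partition_layout "$3" < "$listing")"
  [ -n "$good" ] && [ "$good" = "$new" ]
}
```

For example, after saving the listing with `fdisk -l > /tmp/fdisk.txt`, running `same_layout /tmp/fdisk.txt /dev/sda /dev/sdb && echo "layouts match"` prints the message only when the start, end, and sector counts of every partition agree.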
#Step 7: Synchronize the RAID Array
The array starts rebuilding onto the new disk as soon as it is added. You can check the synchronization progress using the command:
cat /proc/mdstat
This may take some time depending on the size of the disks and the RAID level.
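To follow the rebuild without re-reading the whole file each time, the progress line can be reduced to a one-line summary per array. This is a sketch under the assumption that the kernel reports progress in its usual form, with a line containing `recovery =` or `resync =`, a percentage, and a `finish=` estimate; the function name `rebuild_progress` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<array> <recovery|resync> <percent> <eta>" for each
# array that is currently rebuilding; prints nothing once no rebuild is running.
rebuild_progress() {
  local mdstat="${1:-/proc/mdstat}"
  awk '
    /^md[^ ]* :/ { array = $1 }              # remember which array the next lines describe
    /(recovery|resync) *=/ {
      kind = ($0 ~ /recovery/) ? "recovery" : "resync"
      pct = "?"; eta = "?"                   # placeholders if a field is not present
      if (match($0, /[0-9]+\.[0-9]+%/)) pct = substr($0, RSTART, RLENGTH)
      if (match($0, /finish=[^ ]+/))    eta = substr($0, RSTART + 7, RLENGTH - 7)
      print array, kind, pct, eta
    }
  ' "$mdstat"
}
```

A loop such as `while sleep 5; do rebuild_progress; done` refreshes the summary every five seconds; once it stops printing anything, confirm with `cat /proc/mdstat` that the slot map shows all members up.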
These steps should ensure minimal downtime for your server and maintain data integrity. If you encounter any issues during these steps, please don’t hesitate to contact our dedicated technical support team.