RAID Glossary
By Mike Cobb, Director of Engineering
Thinking about using RAID configurations in your data storage strategy but not sure what the alphabet soup acronym means, and which type to use? Start here and learn the basic language.
RAID DEFINED
RAID is shorthand for redundant array of independent disks, and refers to drives that work together to store data across two or more disks. The term dates back to 1987, when University of California, Berkeley researchers noted that mainframe computers could benefit from the “inexpensive disks” then flooding the personal computer market. Indeed, the original paper referred to “redundant array of inexpensive disks.” This method allows data to be mapped across disks to provide redundancy. The technology was first applied to storage media for mainframes, with the idea being that if any one of the drives failed, the data would be preserved. While RAID initially applied to hard disks, SSD storage can be configured in this format as well, either with other SSDs or with hard disks.
At its most basic, RAID describes how data can be distributed across different storage devices. How data is distributed or accessed will depend upon which level you choose.
HOW RAID WORKS
In several levels of RAID, the technology relies on “parity” in the service of “fault tolerance.” Parity refers to error protection: extra information calculated from the data that allows lost data to be reconstructed. Fault tolerance allows a system to keep operating in the event of a component failure. Levels with parity stripe blocks of data, along with the parity information, across three or more drives.
There are six core levels: 0, 1, 5, 6, 10 and 50. Levels 2, 3 and 4 are either obsolete or rarely used. In all of these, the idea is that multiple storage devices are presented as a single drive volume. Depending upon the implementation, you will see an increase in total volume capacity based on the use of multiple disks. (Only a RAID 1 mirror yields no more capacity than a single drive.) However, in every level except RAID 0, usable capacity is less than the raw capacity of the drives: mirroring gives up half, while parity gives up one drive’s worth of space in RAID 5 and two drives’ worth in RAID 6.
SOFTWARE AND HARDWARE RAID
RAID requires a disk controller or software that governs how the data should be written to disk. The hardware option uses the physical RAID controller, which includes a dedicated CPU. Software RAID is built into the operating system, where specific operating systems may support specific levels and nestings. A solution may be hardware, software or a combination.
WHY USE RAID?
The reasons for choosing RAID over other data storage approaches come back to the technology’s use of redundancy, whether through mirroring or “parity,” to provide fault tolerance. With RAID, you have a level of data redundancy that’s designed to keep your data safe in the event of a hardware (storage media) failure. This type of system won’t remove the need for a separate backup, but it does offer an extra safeguard. It’s a good hedge against hardware failure, as opposed to data corruption caused by the operating system or another software glitch.
RAID’S ROLE IN A BACKUP STRATEGY
It cannot be stressed enough that using a RAID is not a backup strategy unto itself. Think of it as a data survival strategy as opposed to a backup strategy. Depending upon the level you use, RAID can provide a performance boost or an additional layer of protection against a hardware failure; however, this technology should not be equated with backup. It’s just one component within a backup strategy that should include cloud backup and a second hardware backup as well. Particularly for home- and small-office implementations, these redundant systems are now easier to back up than they used to be, thanks to ever-increasing hard drive capacity and the fact that some NAS boxes include a USB port for backing up the array to another device.
Parity
Parity bits are binary data that act as checksums for detecting errors and reconstructing lost data. In the simplest form, a parity bit (a 1 or a 0) is added to each block of data so that the total number of 1s is either odd or even. In a RAID, parity calculated across the drives enables the contents of a damaged disk to be reconstructed without any loss of data. To function properly, a RAID runs error-checking algorithms that ensure data integrity and detect errors using logical operations on the data.
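As a minimal sketch of the odd/even idea described above, the snippet below appends an even-parity bit to a short run of data bits and shows that flipping a single bit is detected. The function names are illustrative only and are not taken from any RAID product.

```python
def even_parity_bit(bits: list[int]) -> int:
    """Return the bit that makes the total number of 1s even."""
    return sum(bits) % 2


def passes_even_parity(bits_with_parity: list[int]) -> bool:
    """A word passes the check when it holds an even number of 1s."""
    return sum(bits_with_parity) % 2 == 0


data = [1, 0, 1, 1, 0, 0, 1, 0]         # eight data bits, four of them set
word = data + [even_parity_bit(data)]    # append the parity bit

corrupted = word.copy()
corrupted[2] ^= 1                        # flip one bit to simulate an error

print(passes_even_parity(word))       # True
print(passes_even_parity(corrupted))  # False
```

A single parity bit can only reveal that an odd number of bits changed; it cannot say which one, which is why RAID pairs parity with knowledge of which drive has failed.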
Fault Tolerance
In RAID, the idea is that fault tolerance comes from the fact that data parity is spread across two or more drives—or can be on one dedicated “parity drive,” depending upon the RAID level. Should a designated data drive fail, the parity data can be used to rebuild the data—and rebuild the drive array.
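The rebuild described above can be sketched with XOR, the logical operation commonly used for single parity. This is a simplified illustration, assuming three tiny data blocks and one parity block; the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One block per data drive (illustrative contents).
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])  # True
```

Because XOR is its own inverse, the same function both creates the parity block and recovers any one missing block.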
In levels 1 and higher, there is less usable storage than the actual drives add up to. On a similar principle, RAID levels work from the lowest common denominator, i.e. the smallest drive capacity in play. That means that if, for some reason, you mix a 1TB drive with three 2TB drives, every drive will act as if it has only 1TB of space. This is less of an issue than it used to be, given the ever-dropping cost of storage, but it remains worth noting that you’ll maximize your storage capacity by using drives with matching capacity.
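The capacity arithmetic can be sketched as follows. This is a rough calculator under stated assumptions: every drive is truncated to the size of the smallest one, RAID 1 is treated as a single mirror set, and controller overhead is ignored. The function name and the minimum-disk table are illustrative.

```python
# Minimum number of drives per level, as described in this glossary.
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity_tb(level: str, drive_sizes_tb: list[float]) -> float:
    """Estimate usable capacity, treating every drive as the smallest one."""
    n = len(drive_sizes_tb)
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported level: {level}")
    if n < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    smallest = min(drive_sizes_tb)
    if level == "0":
        return n * smallest            # striping only, nothing given up
    if level == "1":
        return smallest                # every drive holds the same copy
    if level == "5":
        return (n - 1) * smallest      # one drive's worth of parity
    if level == "6":
        return (n - 2) * smallest      # two drives' worth of parity
    return (n // 2) * smallest         # RAID 10: half the drives are mirrors


print(usable_capacity_tb("5", [1, 2, 2, 2]))  # 3
print(usable_capacity_tb("5", [2, 2, 2, 2]))  # 6
```

Swapping the lone 1TB drive for a 2TB drive doubles the usable space in this example, which is the point about matching capacities.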
RAID LEVELS EXPLAINED
RAID has, historically, been one of the more complex and confusingly described technologies in storage. While RAID is generally recognized to include levels 0, 1, 5, 6, 10 and 50, these levels do not all provide the same kind of redundancy and fault tolerance: level 0 provides none, levels 1 and 10 rely on mirroring, and levels 5, 6 and 50 rely on parity. So let’s consider the different levels and which ones are no longer relevant.
RAID 0
Often referred to as “data striping,” this is the one RAID level for which the name is a misnomer, since it inherently lacks redundancy and parity. The total capacity of a RAID 0 volume equals the combined capacity of the drives. For example, if two 2TB drives are striped together in RAID 0, they would become a single 4TB volume. If one drive fails, the entire volume fails, since two drives “striped” together are seen as a single volume. In this example, the data is written evenly across both drives, which boosts both the read and write throughput. This setup is often used for storage in performance-driven systems, such as gaming PCs, as well as for live streaming and video-on-demand applications in which data reliability takes a backseat to throughput. RAID 0 works with a minimum of two disks.
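To illustrate how striping spreads data evenly, here is a minimal sketch that maps a logical block number to a drive and an offset on that drive. It assumes a stripe unit of exactly one block, which real controllers rarely use; the function name is hypothetical.

```python
def locate_block(logical_block: int, num_drives: int) -> tuple[int, int]:
    """Return (drive index, block offset on that drive) for a striped volume."""
    return logical_block % num_drives, logical_block // num_drives


for block in range(6):
    drive, offset = locate_block(block, num_drives=2)
    print(f"logical block {block} -> drive {drive}, offset {offset}")
```

Consecutive blocks alternate between the two drives, so a large sequential read or write keeps both drives busy at once. It also shows why losing either drive leaves every file with holes in it.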
RAID 1
Known as “data mirroring,” this level provides redundancy but lacks parity and striping. RAID 1 involves the same data being written to a pair of drives, hence the name. You don’t look to this type of setup for performance enhancements; rather, you can expect somewhat slower write performance since data has to be written to both drives. If one drive fails, the data is duplicated on the second drive and can be rebuilt to a new disk. However, a mirror is not well-suited for backup: if one drive ends up with a problem due to software corruption, that corruption will be reflected on the mirrored disk, rendering the failover that RAID 1 provides moot. This setup works with a minimum of two disks.
RAID 2
As of 2014, this level was no longer being used commercially.
RAID 3
This level is another relic of the past; while it does exist in the wild, it is not a commonly used setup. RAID 3 is a byte-level striping system, with parity data written to a dedicated parity drive. Disks are synchronized while in rotation, and the data stripe size scales up to the size of the exported block size.
RAID 4
RAID 4 is another variety that is no longer in vogue. It is described as block-level striping, with a dedicated parity drive. This is the first level where multiple input/output read operations can happen in parallel, as opposed to having one read operation across all drives. This results in better performance over levels 2 and 3, for example, especially when dealing with small files. Since every write operation requires a write to the parity drive, the parity drive can experience more wear and tear than the data drives.
RAID 5
This RAID level supports both fault tolerance and parity, and was designed as an alternative to RAID 4’s dedicated parity drive. With this setup, the array stripes parity data across all drives, reserving the equivalent of one drive’s capacity, spread across all drives in the array, for parity. Distributing the parity blocks so that they rotate evenly among the drives reduces wear and tear on any one drive, and read performance improves since data is accessed from multiple disks (although, with servers and large data sets, parity calculations can slow write performance). The data stripe size is at least the same size as the exported block size, but could exceed the block size.
RAID 5 gives up one disk’s worth of capacity to parity, which means the minimum disk configuration is three drives. This will protect against a single drive failure; however, there is one gotcha: after you replace a failed drive, the rebuild puts an extreme load on the other drives, which in turn could lead to a second drive failure. The rebuild can also fail if it encounters an unrecoverable read error on one of the surviving drives (at which point, reconstruction ends). It is for this reason that RAID 5 is not recommended for use with enterprise storage, though it is well-suited for file storage servers, including home and small-office NAS systems.
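The rotation of parity among the drives can be pictured with a small layout generator. This sketch assumes one particular rotation (parity starts on the last drive and moves one drive to the left with each stripe); actual controllers and operating systems use several different rotation schemes, and the function name is hypothetical.

```python
def raid5_layout(num_drives: int, num_stripes: int) -> list[list[str]]:
    """Build a table of stripes; each row lists what every drive holds."""
    layout = []
    next_block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity moves one drive to the left per stripe.
        parity_drive = (num_drives - 1 - stripe) % num_drives
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{next_block}")
                next_block += 1
        layout.append(row)
    return layout


for row in raid5_layout(num_drives=4, num_stripes=4):
    print("  ".join(f"{cell:>3}" for cell in row))
```

With four drives and four stripes, each drive ends up holding exactly one parity block (P0 to P3) and three data blocks, so no single drive absorbs every parity write the way a dedicated parity drive would.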
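To put a rough number on the rebuild risk, the sketch below estimates the chance of hitting at least one unrecoverable read error while reading every surviving drive in full. It rests on loud assumptions: a rate of one error per 10^14 bits read (a figure often quoted on consumer drive spec sheets, used here only as an illustrative default), errors that are independent of one another, and drives sized in decimal terabytes. Real drives may behave better or worse than this simple model.

```python
import math


def rebuild_ure_probability(
    surviving_drives: int,
    drive_tb: float,
    ure_per_bit: float = 1e-14,  # assumed spec-sheet rate, not a measurement
) -> float:
    """Probability of at least one unrecoverable read error during a rebuild."""
    bits_to_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))


# Hypothetical four-drive RAID 5 of 3TB drives with one drive failed.
print(f"{rebuild_ure_probability(surviving_drives=3, drive_tb=3):.2f}")  # 0.51
```

Under these assumptions the estimate comes out at roughly even odds, which is the reasoning behind preferring double parity as drive capacities grow.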
RAID 6
This could be considered the most fault tolerant of the standard levels, but it gives up two drives’ worth of capacity to parity. Like RAID 5, RAID 6 uses block-level data striping; however, it differs from RAID 5 by having what’s called “double parity,” or an extra parity block in each stripe. Double parity provides fault tolerance for two drives, as opposed to the one drive in RAID 5. This protects against the possibility of both a drive failure and an unrecoverable read error during the rebuild, and is why RAID 6 is especially appropriate for larger disk capacities, file storage servers and application servers. Four disks is the minimum number for this setup.
RAID 10
Some would define RAID 10 as the first of the “nested” RAID levels. It is often alternately referred to as 1+0 or 0+1, depending on how it is implemented. This setup offers the best combination of security and performance by combining the striping of RAID 0 with the data mirroring of RAID 1.
If RAID 10 is used to stripe mirrored drives, the mirroring nomenclature comes first, and it is referred to as 1+0. If it is used to accomplish the reverse, where you’re mirroring striped drives, the striped nomenclature dominates and it is called 0+1. As with striping in RAID 0, RAID 10 will exhibit performance improvements over straightforward mirroring. To use this setup in either nested variation, you’ll need a minimum of four disk drives, and with four drives your usable capacity will be the equivalent of two of those drives.
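The two nesting orders tolerate drive failures differently, which a short enumeration can show. This is a simplified model of a four-drive array, assuming drives 0 and 1 form one group and drives 2 and 3 the other, and assuming that a stripe set with any failed member is treated as entirely offline. The function names are illustrative.

```python
from itertools import combinations

GROUPS = [{0, 1}, {2, 3}]  # assumed grouping of the four drives


def survives_raid10(failed: set[int]) -> bool:
    """1+0: a stripe over mirrors survives unless a whole mirror pair is lost."""
    return all(not pair <= failed for pair in GROUPS)


def survives_raid01(failed: set[int]) -> bool:
    """0+1: a mirror of stripes survives while one stripe set is fully intact."""
    return any(not (stripe & failed) for stripe in GROUPS)


two_drive_failures = [set(c) for c in combinations(range(4), 2)]
print(sum(survives_raid10(f) for f in two_drive_failures), "of", len(two_drive_failures))  # 4 of 6
print(sum(survives_raid01(f) for f in two_drive_failures), "of", len(two_drive_failures))  # 2 of 6
```

Both arrangements survive any single failure, but in this model 1+0 survives more of the possible two-drive failures than 0+1 does.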
RAID 50
Like RAID 10, RAID 50 is a nested level. It may also be referred to as 5+0. This setup combines striping with distributed parity by striping data across two or more RAID 5 arrays. This combination tolerates one failed drive in each RAID 5 array within the 5+0; a configuration built from four RAID 5 arrays, for example, allows up to four failed drives, provided each failed drive is on a different RAID 5. However, if multiple drives fail on the same array, you may not be so lucky.
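The survival rule for a 5+0 can be written as a one-line check: the volume holds as long as no RAID 5 sub-array has lost more than one drive. The sketch below assumes a hypothetical six-drive layout split into two three-drive sub-arrays; the function name and drive numbering are illustrative.

```python
def raid50_survives(failed_drives: set[int], sub_arrays: list[set[int]]) -> bool:
    """True while every RAID 5 sub-array has lost at most one drive."""
    return all(len(group & failed_drives) <= 1 for group in sub_arrays)


# Hypothetical layout: drives 0-2 form one RAID 5, drives 3-5 the other.
groups = [{0, 1, 2}, {3, 4, 5}]

print(raid50_survives({0, 4}, groups))  # True: one failure in each sub-array
print(raid50_survives({0, 1}, groups))  # False: two failures in the same sub-array
```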
Further Reading
Interested in learning more about RAID? Here are some articles you may find helpful.
RAID Data Recovery Guide and Tips
BACK UP YOUR DATA
Backing up your data seems like a no-brainer, and it should be! But there are still people out there who don’t.
According to Backblaze, 76 percent of users do back up their information. That’s not too shabby, but in reality most of those users only back up once a year. Think about everything you generate in a year’s time. Do you really think protecting your data only once a year is OK?
With easy access to personal cloud storage, there’s really no excuse not to back up your data. There are even several applications available to automatically handle the task. This is a “set it and forget it” situation, meaning you don’t even have to get involved! Or you can stick to the old-fashioned way, and use a good old external hard drive. It really doesn’t matter how you back up your data, just do it!
Of course, no matter how you choose to back up your information, understand that a RAID array is not a backup plan. So get on a backup schedule, follow that schedule and protect your data!
PREPARE FOR FAILURE
The question is not if your hard drive will fail, but when. RAID arrays are usually composed of hard drives made at the same time, with the same materials, by the same manufacturer. It stands to reason they’ll all probably fail at or around the same time.
Not even three or more hard drives can protect your information if they fail all at once. So keep your data safe by regularly monitoring the status of the RAID drives. Track your RAID array use, and create a plan of action to swiftly respond to any RAID failure.
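As one hedged example of status monitoring, the sketch below scans text in the style of Linux software RAID’s /proc/mdstat for degraded arrays. It assumes a Linux md setup, and the sample text is made up for illustration; hardware controllers and NAS boxes report status through their own tools instead.

```python
import re

# Illustrative sample in the style of /proc/mdstat (not real output).
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[2](F) sdb1[1] sda1[0]
      8790400 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UU_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> dict[str, str]:
    """Map each degraded array name to its member map, e.g. 'UU_U'."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"^(md\S*) :", line)
        if name:
            current = name.group(1)
            continue
        # The status line ends with "[total/active] [UU_U]"; "_" marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current and "_" in status.group(3):
            degraded[current] = status.group(3)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # {'md0': 'UU_U'}
```

On a real system the text would come from reading /proc/mdstat, and a non-empty result would be the cue to put the plan of action into motion.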
If you notice one drive failure in a RAID array, replace the drive immediately. Wait too long, and you may have two or three hard drives down by the time you get around to fixing it. When that happens, for the best chance of recovering your data, definitely call in a data recovery professional.
TOO MANY FAILED DISKS
When you’re confronted with a failing drive in a RAID, you may be tempted to perform a rebuild operation. In a rebuild, the failed drive is replaced and its contents are reconstructed onto the new drive from the data on the remaining drives. After the fix, the array is whole again, and you can get back to work.
You may have success with a rebuild if there’s only one disk to replace. In such cases, RAID enclosures usually offer “hot plug” bays, where hard drives can be (somewhat) easily switched out. Once the bad drive is replaced, the user can run the rebuild operation and essentially start over without losing any data.
But this solution isn’t completely foolproof. There’s always the chance of something going wrong during the rebuild that causes additional damage. Before you do something you can’t undo, consider using a data recovery specialist to extract your information.
When two or more disks fail in your RAID array, your best bet to prevent data loss is to send the drives to a data recovery specialist. If you attempt to replace more than one RAID drive yourself and run a rebuild, you risk losing your data forever. So don’t!
TAKE A PICTURE, IT LASTS LONGER
If you do attempt to rebuild your RAID, create an image of each hard disk’s contents before you begin.
Again, rebuilds aren’t a guarantee, especially when you take on the task by yourself. But with an image of each disk, you stand a chance of recovering the data should the rebuild fail. In that situation, a data recovery specialist can use your images to recreate the RAID. Just be sure to label each image so it is clear which disk it came from.
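For illustration only, a disk image is simply a byte-for-byte copy of the source device. The sketch below copies a source to an image file in chunks and returns a SHA-256 checksum that can be recorded alongside the label. It assumes the source is fully readable; it does not retry or skip bad sectors, so it is not a substitute for a dedicated imaging tool on a failing drive. The function name and the paths in the comment are hypothetical.

```python
import hashlib


def image_disk(source_path: str, image_path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Copy source to image in chunks and return the SHA-256 of the copied bytes."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


# Example call (hypothetical paths; reading a raw device usually needs admin rights):
# checksum = image_disk("/dev/sdb", "/mnt/backup/array-disk-1.img")
```

Writing each image to storage that is not part of the array, and noting the checksum next to the disk’s position in the array, makes it easier for a specialist to verify and reassemble the set later.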
While there are many advantages to using a RAID, no piece of hardware is ever perfect. Like all technology, RAID systems are fallible, complex and will, at some point, fail.
Take steps to prevent RAID failure and data loss. But if you do find yourself in need of RAID drive data recovery, contact DriveSavers to get your data back!