Monday, June 9, 2008

An Introduction to RAID

Written by Gary Hendricks

Capacity, reliability, and performance are important for file servers or other machines where you're storing large or important files. Disk drives are vulnerable to failure, though, and when they do fail, data written since the last backup is lost. Disks are also limited in how fast they can transfer data, although disk speed usually becomes a bottleneck only on heavily loaded servers.

What is RAID?
You can get much greater capacities, avoid losing data from disk failure, and do all that at reasonable cost using a technology called Redundant Array of Inexpensive Disks (RAID), invented at the University of California at Berkeley by D. A. Patterson, G. Gibson, and R. H. Katz. The industry also uses the phrase Redundant Array of Independent Disks, so you'll probably see both. RAID uses conventional disks with specialized host adapters to change how data goes onto your disk.

What RAID Does
The idea behind RAID is to take the conventional disks in personal computers and gang them together in parallel. The resulting assembly gives you the low cost of disks manufactured in high volume plus good reliability and a multiplier on the performance of individual disks.

The host adapter (frequently called a controller in RAID systems) sits between one high-rate data stream (on the computer side) and several lower-rate streams (on the disk side). When the computer writes to the disk, the host adapter takes high-rate data and breaks it into multiple synchronized streams, one for each disk, in a process called striping. Reads by the computer cause the host adapter to take a data stream from each disk, multiplex the set of streams into one stream, and send that resulting stream on to the computer.

For example, with four disks, the one high-speed stream splits into four separate disk data streams, each running at one-fourth the rate of the combined stream.
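The striping and multiplexing steps can be sketched in a few lines of Python. This is only an illustration, not how a real host adapter is built: the function names `stripe` and `unstripe`, the two-byte block size, and the in-memory lists standing in for disks are all assumptions made for the example.

```python
def stripe(data: bytes, n_disks: int, block_size: int) -> list:
    """Deal fixed-size blocks of data round-robin onto n_disks lists."""
    disks = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % n_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: list) -> bytes:
    """Multiplex the per-disk streams back into one stream."""
    out = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):  # the last row may be only partly filled
                out.append(disk[row])
    return b"".join(out)


stream = b"ABCDEFGHIJKLMNOP"
disks = stripe(stream, n_disks=4, block_size=2)
print(disks)                      # each disk holds one quarter of the blocks
print(unstripe(disks) == stream)  # reading reassembles the original stream
```

Each of the four lists receives every fourth block, which is why each disk only has to keep up with a quarter of the combined rate.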

There are six different levels of RAID functionality. The simplest RAID system, RAID level 0, merely stripes the data onto multiple disks for better performance. There is no overhead for redundant data storage and no protection against failure. The highest level is RAID 5, which provides both striping for performance and redundancy for failure protection.

RAID Level 0
RAID level 0 spreads the data stream across multiple disks. You can get a similar effect without a RAID host adapter by installing multiple disks and using features in Windows 2000 or Windows XP to simulate RAID in the operating system. Suppose your computer sends a sequence of data to a RAID 0 host adapter connected to two disks. The host adapter will interleave the data to the two drives, sending odd blocks to one drive and even blocks to the other.

Because the data volume and rate to any specific disk are a fraction of the aggregate, you get better capacity and performance from RAID 0 than from a single conventional disk. There is no error correction or redundant data written to the array, however, so RAID 0 cannot survive a disk failure. You would use RAID 0 only in situations where you needed the capacity or performance gain, but not the enhanced data reliability.
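The trade-off can be put into rough numbers. The sketch below assumes ideal scaling (no controller overhead) and independent disk failures; the function name `raid0_estimate` and the sample figures (250 GB, 60 MB/s, a 95% chance that a disk survives some period) are made up for illustration and are not measurements.

```python
def raid0_estimate(n_disks, disk_capacity_gb, disk_rate_mb_s, p_disk_survives):
    """Return idealized capacity, transfer rate, and survival odds for RAID 0."""
    capacity_gb = n_disks * disk_capacity_gb   # every disk holds unique data
    rate_mb_s = n_disks * disk_rate_mb_s       # all disks transfer in parallel
    # With no redundancy, the array survives only if every disk survives.
    p_array_survives = p_disk_survives ** n_disks
    return capacity_gb, rate_mb_s, p_array_survives


capacity, rate, survival = raid0_estimate(2, 250, 60, 0.95)
print(capacity, rate, round(survival, 4))
```

Under these assumptions capacity and rate double with two disks, while the chance of the array surviving is lower than that of a single disk.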

RAID Level 1
In the same way that RAID 0 focuses solely on capacity and performance with no concession to reliability, RAID 1 focuses on reliable data storage with no concession to capacity or performance. RAID 1, also called disk mirroring, uses disks in pairs with both disks of a pair storing the identical data. The redundant copy protects your data against hardware failures, but you're still vulnerable to user error deleting important files.

Suppose your computer sends a sequence of data to the RAID 1 host adapter connected to two disks. The host adapter will write all the data to each of the two drives. The identical data is stored on both drives, so if one fails, the data is still available. The operation completes when both drives have written the data, so the write can take longer than for one disk alone because of delays for unsynchronized rotation and for I/O bus contention.
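A toy model makes the mirroring behavior concrete. The `Mirror` class below is hypothetical: it uses Python dictionaries in place of disks and is only meant to show that a write goes to both drives and that a read still succeeds after one of them fails.

```python
class Mirror:
    """Toy RAID 1 pair: every write goes to both disks."""

    def __init__(self):
        self.disks = [{}, {}]          # block number -> data, one dict per disk
        self.failed = [False, False]

    def write(self, block_no, data):
        # The write returns only after every working disk has the block.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data

    def fail(self, i):
        """Simulate a hardware failure of disk i."""
        self.failed[i] = True
        self.disks[i].clear()

    def read(self, block_no):
        # Any surviving disk can satisfy the read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise IOError("both disks of the mirror have failed")


pair = Mirror()
pair.write(0, b"payroll")
pair.fail(0)
print(pair.read(0))  # still readable from the second disk
```

Note that the sketch offers no protection against a user overwriting or deleting a block: the change is simply mirrored to both disks.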

RAID 1 offers better reliability than RAID 0 or conventional disk setups, but does not increase performance.

RAID Level 2, Level 3, and Level 4
RAID 2 adds one or more disks to hold an error correction code with which lost data from a failed disk can be reconstructed. When your computer sends a sequence of data to a RAID 2 host adapter connected to two data disks and an ECC disk, the host adapter interleaves the data to the two data drives. Odd blocks go to one drive, and even to the other. The host adapter computes the error correction code for the data written to the data drives and writes it to the ECC drive.

RAID 3 is the same as RAID 2, except that it uses a simpler code: parity instead of ECC. RAID 3 has the same small-transfer performance limitations as RAID 2, but less storage overhead.
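The article does not spell out how the parity is calculated; in practice it is commonly a byte-wise XOR of the data blocks, and the sketch below assumes that. The helper name `xor_blocks` and the sample bytes are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data disks and one parity disk, one block each.
data = [b"\x0f\xa0", b"\x33\x11"]
parity = xor_blocks(data)

# If data disk 0 fails, XOR-ing the survivors with parity regenerates it.
rebuilt = xor_blocks([data[1], parity])
print(rebuilt == data[0])
```

Because XOR-ing all blocks of a group together (data plus parity) yields zero, any single missing block equals the XOR of the remaining ones, which is why one parity disk is enough to cover one failed disk.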

RAID 4 is nearly the same as RAID 3, but instead of striping across disks at the byte level, it operates at the sector level. This makes RAID 4 like RAID 2 except that it uses parity rather than ECC, and it interleaves sectors. RAID 4 therefore has good data reliability and storage efficiency, as do RAID 2 and 3, and retains fast writes for large data blocks.
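The difference in storage overhead between mirroring and a single parity disk is simple arithmetic. The function below is a sketch with a hypothetical name, `usable_fraction`; it leaves out RAID 2 because the number of ECC disks depends on the code used.

```python
def usable_fraction(level, n_disks):
    """Fraction of total disk space left for data, for the levels covered here."""
    if level == 0:
        return 1.0                      # striping only, nothing redundant
    if level == 1:
        return 0.5                      # every block is stored twice
    if level in (3, 4):
        if n_disks < 2:
            raise ValueError("need at least one data disk and one parity disk")
        return (n_disks - 1) / n_disks  # one disk's worth of space holds parity
    raise ValueError("level not covered by this sketch")


for level, n in [(0, 4), (1, 2), (3, 5), (4, 5)]:
    print(level, n, usable_fraction(level, n))
```

With five disks, a parity-based array keeps four fifths of its space for data, compared with one half for a mirrored pair.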

RAID Level 5
RAID 5 is the same as RAID 4, except that instead of dedicating a single disk to storing parity, the parity data stream is striped across all the disks along with the data. Suppose your computer sends a sequence of data to a RAID 5 host adapter connected to four disks. The host adapter interleaves the data to the drives, ensuring that no one drive ever holds two blocks of a group protected by a parity block.

The host adapter inserts the new parity information in the data stream that it sends to the disks, mixing the parity information in with the original data. As long as there is at least one more disk than there are original data streams, the loss of a disk can take out only one data stream, and so parity is enough to regenerate the lost data.
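The layout and recovery described above can be sketched as follows. The details are assumptions for illustration: parity is taken to be a byte-wise XOR, the parity block moves to the next disk on each stripe (real controllers use various rotation schemes), and the names `raid5_write` and `raid5_rebuild` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_write(blocks, n_disks):
    """Lay out blocks in stripes of n_disks - 1 data blocks plus one parity block."""
    per_stripe = n_disks - 1
    if len(blocks) % per_stripe != 0:
        raise ValueError("this sketch only handles whole stripes")
    disks = [[] for _ in range(n_disks)]
    for start in range(0, len(blocks), per_stripe):
        group = blocks[start:start + per_stripe]
        parity_disk = (start // per_stripe) % n_disks  # rotate parity per stripe
        remaining = iter(group)
        for d in range(n_disks):
            if d == parity_disk:
                disks[d].append(xor_blocks(group))
            else:
                disks[d].append(next(remaining))
    return disks


def raid5_rebuild(disks, lost):
    """Regenerate the lost disk by XOR-ing each row of the surviving disks."""
    survivors = [disk for i, disk in enumerate(disks) if i != lost]
    return [xor_blocks(row) for row in zip(*survivors)]


blocks = [b"AA", b"BB", b"CC", b"DD", b"EE", b"FF"]
disks = raid5_write(blocks, n_disks=4)
print(all(raid5_rebuild(disks, lost) == disks[lost] for lost in range(4)))
```

Each stripe places exactly one block on each disk, so losing any one disk removes only one block per parity group, and the XOR of the survivors restores it.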

Conclusion
RAID technology can be difficult to understand, especially for beginners. Review the levels described above so that you can make better purchase decisions when building your next computer system.

Gary Hendricks runs a hobby site on building computers. Visit his website at http://www.build-your-own-computers.com for tips and tricks on assembling a PC, as well as buying good computer components.
