I think a long time ago I posted about external disks... it seems more and more people consider them bad news compared to the longevity of internally mounted disks?
Okay, the following is a bit choppy, because I'm trying to describe a few random concepts. Sorry about this:
RAID1 is easier to understand, as it's a simple copy of the data. If one disk dies, we know the other contains an exact copy. I rarely use RAID1 because of its poor storage efficiency, described later.
My most recent failure was a Hitachi P7K500 500G disk. It simply would no longer spin up. I'd had the disk for only 4 weeks; fortunately everything on it was redundant in two ways: it's part of a 4-disk Linux Software RAID5, and that RAID5 is itself a copy of another RAID5 array that it will eventually replace. The recovery of the affected array was fairly smooth. Data on it was still fully accessible as if nothing had happened, and the rebuild when I inserted the replacement disk was likewise transparent, minus the performance hit.
RAID5/6 parity is a weird concept that takes a bit of math to understand. XOR is an odd function: change any one input and the output changes, like two 3-way switches controlling a light bulb. In fact, the way 3-way switches work with a lamp is a great metaphor for how XOR works.
Let's say we have two 3-way switches and a lamp, and nothing is obscured from view (being "obscured" will be our "broken" state, as if we "lost" data). Because of the special property of these 3-way switches, flipping either switch changes the light. The two switches are "data" and the light bulb is "parity", metadata computed from the switch "data". In a normal working system we can set the switches any way we want; that is our "data". The lamp turns on or off depending on how we set the switches. Let's say that setting both switches the same way, both up or both down, turns on the light.
Now let's say we have a "failure". In this example a "failure" isn't the switch breaking or the lamp burning out, but rather the switch or light bulb being obscured from view. Here's the magic: suppose the lamp is off, switch 2 is in the "up" state, and we can't see switch 1 (as in, we lost that data). Do we have enough information from this system to know whether switch 1 was "up" or "down"? By the rule above, by golly we do: the light is off and switch 2 is up, so switch 1 must have been "down". Without looking at the switch, we recovered its state from the data we could see.
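The switch-and-lamp recovery above can be sketched in a few lines of Python. This is a toy illustration, not real RAID code, and the function names are mine:

```python
def parity(switch1: int, switch2: int) -> int:
    # The "lamp": XOR of the two data bits. (In the lamp analogy the
    # light is on when both switches match; XOR inverts that convention,
    # but the recovery math works identically either way.)
    return switch1 ^ switch2

def recover(known_switch: int, lamp: int) -> int:
    # XOR's magic property: if lamp = s1 ^ s2, then s1 = lamp ^ s2.
    return known_switch ^ lamp

# We lose sight of switch 1; recover it from switch 2 and the lamp.
s1, s2 = 1, 0
lamp = parity(s1, s2)
assert recover(s2, lamp) == s1
```

The same one-liner recovers either switch, which is exactly why a RAID array doesn't care which disk died.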
As a corollary to this: if the lamp is on and we can't see either switch 1 or switch 2, can we determine the state of the two switches? In this case we have lost data. We know the switches must be either both up or both down, but there's no way to tell which for certain.
RAID3 is the closest match to this example. RAID5 works on the same principle but spreads the "switches" and "lights" across the disks. And just like the 'corollary' case of the lamp and switches, in a real 3-disk RAID5, if two disks die, we lose data.
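The same XOR trick scales up from single bits to whole disk blocks; parity RAID just XORs the disks' contents byte by byte (RAID5 additionally rotates which disk holds the parity for each stripe). A rough Python sketch with made-up block contents:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte -- the parity computation."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

disk1 = b"hello"
disk2 = b"world"
disk3 = b"raid5"
par = xor_blocks(disk1, disk2, disk3)  # what the parity block would hold

# disk2 dies; rebuild its contents from the survivors plus parity.
rebuilt = xor_blocks(disk1, disk3, par)
assert rebuilt == disk2
```

Lose two of the data blocks, though, and no amount of XOR-ing gets them back, which is the 'corollary' case above.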
The major difference between RAID1 and RAID5 is storage efficiency. In RAID1 we lose 50% of capacity to redundancy (one whole disk spent copying another). RAID5 has (N-1)/N efficiency, so in a 3-disk system we lose 33% of our total disk space to redundancy information.
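Those percentages come straight out of the (N-1)/N formula. A quick sanity check in Python (the helper name is mine):

```python
def raid5_efficiency(n_disks: int) -> float:
    # One disk's worth of capacity goes to parity; the rest is usable.
    return (n_disks - 1) / n_disks

# RAID1 mirror: always 50% usable, no matter the disk size.
# RAID5: 3 disks -> ~67%, 4 disks -> 75%, 8 disks -> ~88%.
for n in (3, 4, 8):
    print(f"{n} disks: {raid5_efficiency(n):.0%} usable")
```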
While it seems like raising N will push our efficiency asymptotically toward 100%, realize the risk: if one disk dies in a RAID5, we've lost our redundancy, and if a second disk breaks, we lose the data on the whole array. As the number of disks increases, the chance of any one of them breaking increases. As disks get bigger while their reliability stays the same, the current trend is that the maximum sensible number of disks per array is coming down, forcing us toward less storage-efficient arrays.