
Triple-parity RAID


mgz:
I think it's kind of silly to even discuss it; for some people, and for some data, there is no such thing as being too careful.

nstgc:
The discussion isn't about individuals using it. It's about corporations and, as Proin mentioned, the military/government.

kureshii:

--- Quote from: nstgc on December 22, 2009, 02:37:07 PM ---I'm far from an expert, but that doesn't make that much sense. If the problem is errors that accumulate during the reconstruction time, which is getting exponentially longer, then nested arrays should take care of that. I remember going over the calculations once for failure rate in a nested system versus an n-parity system. I can't remember exactly what I came to, but I concluded that a nested system was better. I did not take reconstruction time into consideration, however. If, as you said, the size of drives is increasing at a great rate, then it seems that the best thing to do is not use RAID 5/6 at all for the top level. The problem with a nested array is the hardware to run it. With more controllers, more things can go wrong. If you have a duplex of two nested arrays (let's say a large RAID 6 array of small RAID 5s with a spare RAID 5 array), wouldn't that take care of it? The chances of that many HDDs dying seem very slim, but the chance of enough HDDs plus a controller (which may be harder to detect in advance) failing seems like a possibility.
--- End quote ---
We've been through nested RAID before, and you can probably add to that thread, but the inevitable conclusion is that with more nested levels, it simply becomes cost-ineffective to be using entire RAID arrays for parity data. So that's not a scalable solution (or not a cost-effective one at scale, anyway).

As far as I can tell from the article, the 2 main issues are:
1) Throughput not increasing at the same scale as disk capacity
2) Scalable efficient n-level RAID algorithm not in existence yet

The first issue leads to longer RAID rebuild times (not errors accumulating during rebuild), and for sufficiently large arrays the rebuild time can stretch to days, or even weeks. The article mainly attributes this to the low throughput of the drives relative to their capacity. Many RAID arrays are also left serving requests in a degraded state while they're being rebuilt (because for some enterprises the point of a RAID is maximum uptime), and that does not leave much throughput available for the rebuild.

With RAID rebuild times stretching that long, the chance of a second or even third disk failing during the rebuild period increases as well. And hence the increasing need for triple-parity RAID. Extrapolating from that, a scalable n-level RAID algorithm (the second issue) would eliminate the need to write a new RAID algorithm each time an additional parity level is required.
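To put rough numbers on that, here's a quick sketch (mine, not the article's) of how the chance of losing another disk grows with the rebuild window, assuming independent exponential failure times. The MTBF figure and the 11-disk surviving set are made-up illustrative values:

```python
import math

def p_additional_failure(n_disks, rebuild_hours, mtbf_hours=1_000_000):
    """P(at least one of n surviving disks fails during the rebuild window),
    assuming independent, exponentially distributed failure times."""
    rate = 1 / mtbf_hours  # per-disk failures per hour
    return 1 - math.exp(-n_disks * rate * rebuild_hours)

# A one-day rebuild vs. a one-week rebuild, 11 surviving disks:
print(p_additional_failure(11, 24))
print(p_additional_failure(11, 168))
```

The absolute numbers are worthless (as nstgc says below, any numbers are); the point is that the risk grows roughly linearly with the length of the rebuild window.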

Additionally, there are other factors affecting rebuild times, such as time required for a human operator to procure a new disk and insert it into the array, but let's ignore those for now.



So at heart, it's really a throughput problem: low throughput (relative to disk capacity) leads to longer rebuild times; longer rebuild times increase the risk of additional disks failing during the rebuild; and higher-level parity is needed to offset that increased risk.

Or to put things in perspective, it takes ~3.5hrs to fill a 2TB hard disk at 150MB/s write speed (I assume you're all astute enough to correct me on the issues and not the numbers). In the near future, we'll have 4TB disks, but disk read/write speeds are hardly doubling; a 4TB disk will take 7 hours to fill, and so on.
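The arithmetic behind those estimates is just capacity divided by sustained throughput (using decimal TB); a quick sketch:

```python
def rebuild_hours(capacity_tb, throughput_mb_s):
    """Hours to sequentially write a full disk at a given sustained throughput."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / throughput_mb_s / 3600

print(round(rebuild_hours(2, 150), 1))  # 2 TB at 150 MB/s -> ~3.7 h
print(round(rebuild_hours(4, 150), 1))  # 4 TB at 150 MB/s -> ~7.4 h
```

And that's the best case: a real rebuild shares the drives' throughput with live traffic, so actual times are considerably worse.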

But I don't find throughput issues that interesting, which is why I'm focusing on the second issue: scalable n-level RAID algorithms and the associated computational costs.


--- Quote from: nstgc on December 22, 2009, 11:52:15 PM ---One thing I don't understand is why you can't just use something like par2 (an archive repair tool that uses Reed-Solomon codes), but with drives. It scales very well.

--- End quote ---
par2 helps repair corrupted data; it doesn't increase drive reliability, or prevent the whole array from going down when a disk fails.
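For context on why parity reconstruction works at all: the simplest RAID-style scheme is a plain XOR across data blocks, which lets you rebuild any one lost block; Reed-Solomon (what par2 uses) generalizes this to survive multiple losses. A minimal sketch, with toy 2-byte blocks:

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length data blocks together to form a single parity block."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_parity(data)

# Lose block 1; reconstruct it from the survivors plus the parity block.
recovered = xor_parity([data[0], data[2], parity])
assert recovered == data[1]
```

This is exactly what RAID 5 does across disks, which is why a single-parity array can only ever survive one disk failure.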

nstgc:
And what about my suggestion to use an SSD as an auto-swap drive?

I understand that any numbers anyone provides would be worthless (even though I gave some already) since we are talking some arbitrary distance into the future.

I didn't read that entire "RAID over RAID" thread, but I still hold to the belief that you should have at least two levels so you can swap out an array if need be. I don't know if that was addressed; I'll look it up tomorrow (it's 1:26 am here and I'm quite tired).

[edit] Another option I just thought of that scales as well as n-parity RAID would be to make the spare a RAID instead of a drive. Admittedly it seems (is clearly) more costly, but it is a technology that exists now, and would work. I doubt any of us are actually actively trying to solve this problem for personal gain, so I think this is a valid option in Fairyland.

kureshii:

--- Quote from: nstgc on December 22, 2009, 02:37:07 PM ---I'm far from an expert, but that doesn't make that much sense. If the problem is errors that accumulate during the reconstruction time, which is getting exponentially longer, then nested arrays should take care of that. I remember going over the calculations once for failure rate in a nested system versus an n-parity system. I can't remember exactly what I came to, but I concluded that a nested system was better. I did not take reconstruction time into consideration, however. If, as you said, the size of drives is increasing at a great rate, then it seems that the best thing to do is not use RAID 5/6 at all for the top level. The problem with a nested array is the hardware to run it. With more controllers, more things can go wrong. If you have a duplex of two nested arrays (let's say a large RAID 6 array of small RAID 5s with a spare RAID 5 array), wouldn't that take care of it? The chances of that many HDDs dying seem very slim, but the chance of enough HDDs plus a controller (which may be harder to detect in advance) failing seems like a possibility.
--- End quote ---
It seems like a big waste of parity data to me. Despite using the equivalent of 2 whole arrays, plus 1 disk in each of the remaining arrays, for parity, the failure of just 2 disks in each of 3 arrays would bring the whole nested RAID down.

Of course, it is not likely that such a thing would happen, but what can be accomplished with such a setup that can't be accomplished with a 7-level RAID and less hardware?
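Out of curiosity, we can put rough numbers on that comparison. A sketch with made-up parameters: an assumed 2% per-disk failure probability over the rebuild window, a hypothetical layout of a RAID 6 across seven 5-disk RAID 5 sub-arrays, versus flat triple parity over the same 35 disks:

```python
from math import comb

def survive(n, p, tolerate):
    """P(at most `tolerate` of n units fail), each failing independently with prob p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(tolerate + 1))

p = 0.02  # assumed per-disk failure probability during the rebuild window

# Nested: RAID 6 of seven 5-disk RAID 5 sub-arrays (20 of 35 disks usable).
sub_fail = 1 - survive(5, p, 1)  # a RAID 5 sub-array dies if >= 2 of its disks fail
nested = survive(7, sub_fail, 2)  # the top-level RAID 6 tolerates 2 dead sub-arrays

# Flat triple parity over the same 35 disks (32 of 35 usable): tolerates any 3 failures.
flat = survive(35, p, 3)

print(f"nested: {nested:.6f}, flat triple-parity: {flat:.6f}")
```

On these made-up numbers the nested layout does survive more often, but it spends 15 of 35 disks on parity versus 3 of 35 for flat triple parity, which is the cost-effectiveness problem above.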



I don't get what you mean by using an SSD as a swap drive. What advantages would that provide? Because if the sole advantage is a lower risk of failure, it still does not address the issue of RAID rebuild time.
