RAID over RAID

Xiong Chiamiov:
So, I thought I'd share my RAID setup with y'all.  I have three hardware RAID 5s of three disks each, and then I software RAID 5 those.  Awesome, neh?

You should all try it.  Now.

kureshii:
Math question time!

Let's start by defining some terminology:

Level 1 RAID: 5 disks in RAID5
Level 2 RAID: 5 Level 1 RAIDs in RAID5
Level 3 RAID: 5 Level 2 RAIDs in RAID5

and so on...

Notice that a full Level 1 RAID will have 1 disk of parity data (divided among the 5 disks, of course). A full Level 2 RAID will have 9 disks of parity data (1 Level 1 RAID's worth, i.e. 5 disks, plus 1 disk's worth inside each of the other 4 Level 1 RAIDs).

The question: In a full N-level RAID, how many disks of parity data will there be? Give a formulation in terms of N.

Additional question: What level RAID (by the above definition) will have a data redundancy ratio (redundant i.e. parity data / total data) lower than a traditional RAID 1 (which has a data redundancy ratio of 0.5)?

(Solution will be posted later, if I can work it out)

[disclaimer: I am aware that standard RAID levels are obviously not defined the way I have defined them here; I adopt these definitions only in this question as it makes it easier to visualise. I am also aware that RAID 5 does not use a dedicated disk for parity, unlike RAID 3/4, and that parity data is distributed across all disk members, but let's not complicate things more just yet]

Jarudin:
Regular RAID5:

Disk 1   Disk 2   parity

Your setup:

Disk 1   Disk 3   parity
Disk 2   Disk 4   parity
parity   parity   parity

This means you're only using 4/9 of your disks, this is LESS than the 1/2 ratio of RAID 1.

I don't know what this means for read/write speeds but with the added software RAID it can't be good.

Is it just me or does this seem like a stupid idea?

As for the formula.
If you have X disks per level, you will use 1 disk for parity for that level.
So the formula would be something like ((X-1)/X)^N where N is the 'depth'. Very inefficient for small X and large N I'd say.
This is not the number of parity disks but the ratio of non-parity disks to total disks; the parity count can be derived from it easily.
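
For anyone who wants to plug in numbers, here is a minimal Python sketch of that ratio. The function name `usable_fraction` is made up for illustration, and it assumes every level uses the same X with exactly one member's worth of parity per cluster:

```python
from fractions import Fraction


def usable_fraction(x: int, n: int) -> Fraction:
    """Fraction of raw capacity left for data with n nested levels of x-wide RAID5.

    Hypothetical helper: assumes the same width x at every level and one
    member's worth of parity per cluster.
    """
    if x < 2 or n < 0:
        raise ValueError("need x >= 2 and n >= 0")
    return Fraction(x - 1, x) ** n


print(usable_fraction(3, 1))  # 2/3  (plain 3-disk RAID5)
print(usable_fraction(3, 2))  # 4/9  (the 3x3 setup above)
print(usable_fraction(5, 2))  # 16/25
```

Using `Fraction` keeps the results exact, so 4/9 comes out as 4/9 rather than 0.4444.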

--Jarudin--

xShadow:

--- Quote from: Jarudin on December 15, 2009, 07:31:32 AM ---
Regular RAID5:

Disk 1   Disk 2   parity

Your setup:

Disk 1   Disk 3   parity
Disk 2   Disk 4   parity
parity   parity   parity

This means you're only using 4/9 of your disks, this is LESS than the 1/2 ratio of RAID 1.

I don't know what this means for read/write speeds but with the added software RAID it can't be good.

Is it just me or does this seem like a stupid idea?

As for the formula.
If you have X disks per level, you will use 1 disk for parity for that level.
So the formula would be something like ((X-1)/X)^N where N is the 'depth'. Very inefficient for small X and large N I'd say.
This is not the number of parity disks but the ratio of non-parity disks to total disks; the parity count can be derived from it easily.

--Jarudin--

--- End quote ---

That formula looks about right.

If you switch it around to say P=(X)^N-(X-1)^N , P should be the number of parity disks.

Works according to what Kureshii said, because 5^2-4^2=9 , and 5-4=1, which is right for the lower level.

The third level is 125-64=61... that's pretty bad. >_>
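
Those numbers can be double-checked with a rough Python sketch that compares the closed form against a level-by-level count. Both function names are invented here, and the recursive count assumes that the parity member of a higher-level array is a whole lower-level array's worth of disks:

```python
def parity_closed_form(x: int, n: int) -> int:
    """P = x^n - (x-1)^n, the closed form suggested above."""
    return x ** n - (x - 1) ** n


def parity_recursive(x: int, n: int) -> int:
    """Count parity disks level by level (illustrative assumption, see lead-in)."""
    if n == 0:
        return 0  # a bare disk holds no parity
    # One whole member of x^(n-1) disks is parity at this level, and each of
    # the other x-1 members carries its own lower-level parity inside it.
    return x ** (n - 1) + (x - 1) * parity_recursive(x, n - 1)


for n in range(1, 5):
    assert parity_closed_form(5, n) == parity_recursive(5, n)
    print(n, parity_closed_form(5, n))
# 1 1
# 2 9
# 3 61
# 4 369
```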

Now, I don't know what the heck this raid stuff is, because I've never really researched it, but I'm not quite understanding why that formula works. Way I see it, you have X-1 disks that aren't "parity" disks (I don't know what a parity disk is, besides that it's used for error checking and data recovery; all I know is it subtracts one from the number of disks you have 'available') in an array of X disks (according to what you said), so a level 1 raid has X-1 available disks. Soooo... why wouldn't having an X array of those X arrays just add one more parity disk for the entire array of arrays, supposing each array in the new higher-level array is just counted as one disk (in other words, have you end up with X+1 parities for a level 2 RAID)? >___>;
I guess my knowledge is lacking somewhere in that department, so I'm basically just interested; Wikipedia's not giving me any good info.

Actually...

Is it because when you get to the higher-level array, the size of a "disk" has been upgraded to the ("working") size of an entire lower-level array?

Just wondering. For some reason, I'm on track to get out of here with a degree in Comp E, and I realized I don't know that much technical crap about computers. >_>;

kureshii:
Solution:
Writing out the number of disks of parity data level by level for clarity (or if you're familiar enough with multi-level RAID setups you can use Jarudin's shortcut), we start from the fact that in each 5-element RAID5 cluster, 20% (or 0.2) of the cluster elements are used for parity.

Number of disks of parity for each RAID level (assuming the array is fully filled to capacity):

Diagrammatically:
Level 1: 0000I
Level 2: 4x(0000I), IIIII
Level 3: 4x(4x(0000I), IIIII), IIIII IIIII IIIII IIIII IIIII
Where 0 indicates data, and I indicates disks with parity data. Again, note that parity data is not actually stored entirely on 1 disk in each cluster, but distributed across the entire cluster.
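
The same diagrams can be generated mechanically. This is only a sketch (the `layout` function is a made-up name), and it keeps the simplification used above of drawing parity as whole dedicated members rather than distributed stripes:

```python
def layout(level: int, width: int = 5) -> str:
    """Flat string of width**level characters: '0' for data, 'I' for parity."""
    if level == 0:
        return "0"  # a single plain disk
    data_member = layout(level - 1, width)
    parity_member = "I" * width ** (level - 1)  # one whole member of parity
    return data_member * (width - 1) + parity_member


print(layout(1))             # 0000I
print(layout(2))             # 0000I0000I0000I0000IIIIII
print(layout(2).count("I"))  # 9
print(layout(3).count("I"))  # 61
```

Counting the `I` characters gives the number of disks of parity data at each level.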

Formulaically:
Level 1: 0.2(5^1) = 5^0
Level 2: 0.2(5^2) + 4x0.2(5^1) = (4^0)(5^1) + (4^1)(5^0)
Level 3: 0.2(5^3) + 4x(0.2(5^2) + 4x0.2(5^1)) = (4^0)(5^2) + (4^1)(5^1) + (4^2)(5^0)
Level 4: 0.2(5^4) + 4x(0.2(5^3) + 4x(0.2(5^2) + 4x0.2(5^1))) = (4^0)(5^3) + (4^1)(5^2) + (4^2)(5^1) + (4^3)(5^0)
...
Level N: (4^0)(5^[N-1]) + (4^1)(5^[N-2]) + ... + (4^[N-1])(5^0)

We notice that for a Level N RAID, each term is 0.8 times the one before it, i.e. multiply the first term by 0.8 to get the second term, and so on. We can thus write the number of disks of parity data as the sum of a geometric series, whose closed form we already know from high school.

Sum[0->n](ar^k) = a(r^[n+1] - 1)/(r-1)

Our first term, a, is just (4^0)(5^[N-1]) = 5^[N-1]
The common ratio, r, is 4/5 = 0.8
The upper index, n, is N-1 (so there are N terms)

Substituting them in, we get
Sum[0->N-1](5^[N-1]*(0.8^k)) = 5^N(1-0.8^N), the number of parity data disks in a Level N RAID as defined in this question.
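
Spelled out step by step in LaTeX, for anyone who wants to follow the substitution. The symbol P(N) for the number of disks of parity data is introduced here only for convenience, and the geometric sum is written in the equivalent form a(1 - r^[n+1])/(1 - r):

```latex
\[
P(N) \;=\; \sum_{k=0}^{N-1} 5^{\,N-1}\left(\frac{4}{5}\right)^{k}
     \;=\; 5^{\,N-1}\cdot\frac{1-\left(\frac{4}{5}\right)^{N}}{1-\frac{4}{5}}
     \;=\; 5^{\,N}\left(1-\left(\frac{4}{5}\right)^{N}\right)
     \;=\; 5^{\,N}-4^{\,N}
\]
```

The last form is the same as xShadow's P=(X)^N-(X-1)^N with X=5, e.g. P(2) = 25 - 16 = 9.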

The data redundancy ratio is just (1-0.8^N). (Jarudin is right)

To get a data redundancy ratio less than a traditional RAID 1, we need (1-0.8^N) < 0.5
so 0.8^N > 0.5

By taking logarithms (or just trying out values of N) we find that N must be at most 3.
Of course, that is assuming that each RAID5 cluster consists of 5 subclusters. If one is using 3 subclusters per RAID5, that ratio would be (1-(2/3)^N), which already exceeds 0.5 at N=2 (5/9 parity, matching Jarudin's 4/9 of usable space).
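
For the "just trying out values of N" route, a small Python sketch along these lines would do. The names `parity_ratio` and `deepest_level_below` are hypothetical, and the same cluster width is assumed at every level:

```python
def parity_ratio(width: int, level: int) -> float:
    """Data redundancy ratio 1 - ((width-1)/width)^level."""
    return 1 - ((width - 1) / width) ** level


def deepest_level_below(width: int, limit: float = 0.5) -> int:
    """Largest level whose parity ratio is still under `limit` (0 if none)."""
    if not 0 < limit < 1:
        raise ValueError("limit must be strictly between 0 and 1")
    level = 0
    # The ratio climbs towards 1 as the level grows, so this loop terminates.
    while parity_ratio(width, level + 1) < limit:
        level += 1
    return level


print(deepest_level_below(5))  # 3
print(deepest_level_below(3))  # 1
```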

In any case, with a sufficiently high number for N, you end up with mostly parity data in your array (at N=10, that's about 89% parity data!). And yet it does not necessarily provide you with more reliability: in a Level 2 array, for example, losing just 2 disks each from 2 of the Level 1 clusters will result in your house of cards collapsing :3
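
To make the house-of-cards point concrete, here is a toy Python model of which disk failures a nested array can ride out. It is a sketch under simplifying assumptions: the `survives` function and the disk numbering are invented for illustration, each cluster is treated as plain single-parity (it dies once 2 of its members are gone), and rebuilds, hot spares and timing are ignored:

```python
from itertools import combinations


def survives(failed: set, level: int, width: int = 5, offset: int = 0) -> bool:
    """True if the (sub)array whose first disk is `offset` can still be read."""
    if level == 0:
        return offset not in failed  # a bare disk is either alive or not
    member_size = width ** (level - 1)
    dead_members = sum(
        not survives(failed, level - 1, width, offset + i * member_size)
        for i in range(width)
    )
    return dead_members <= 1  # single parity tolerates one lost member


# Level 2 array: disks numbered 0..24, cluster k holds disks 5k..5k+4.
print(survives({0, 1, 5, 6}, 2))        # False: two clusters each lost 2 disks
print(survives({0, 1, 2, 3, 4, 5}, 2))  # True: one dead cluster, one degraded
print(all(survives(set(c), 2) for c in combinations(range(25), 3)))  # True
```

In this model a Level 2 array shrugs off any 3 failures, but 4 badly placed ones out of 25 disks are enough to lose everything, despite 9 disks' worth of parity.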

Besides, filling a RAID entirely with parity data is like adding armour on top of a pile of scrap armour.

Now, calculating failure probability rates would be an interesting extension of this...

[btw, yes, we are aware that xiong is trolling, but we'll take any excuse we can get for an interesting tech discussion that isn't about formatting hard disks *ahem*]
