Parity

From Hackepedia
Revision as of 10:56, 28 March 2006 by Franks (talk | contribs) (removed image link, need to re-add)
Jump to navigationJump to search

Parity is a sort of checksum of data that allows reconstruction of data that was lost. RAID, RAM and multicasting protocols make use of parity.

XOR Parity

One simple way to implement parity is to use XOR. Consider a RAID 5 where you have three disks with parity on one of the three disks. Consider that a byte on disk one has a value of 43 (hexadecimal) and a byte on disk two has a value of 98 (hexadecimal), the parity on disk three is the XOR'ed product of the prior two. 0x43 XOR 0x98 = 0xDB. Now pretend you lose disk two, you just XOR the parity with the remaining good disks and you reconstruct what disk two was. So: 0xDB XOR 0x43 = 0x98. Same concept if you had four disks, let's pretend it has a byte value of 15 (hexadecimal), So: 0x43 XOR 0x98 XOR 0x15 = 0xCE. If you lose disk one you XOR the XOR'ed product of the remaining disks with the parity so: (0x98 XOR 0x15) XOR 0xCE = 0x43.

Parity based on Polynomials

RAID 6 uses the Reed-Solomon method for computing parity. We don't know anything about math so you won't see much explanation here.

Here is the homebrew method of a commoner:

(My strong point isn't math so you may spot problems with this method, if so please let me know) A RAID array can use the Quadratic Formula to achieve RAID 6 functionality. For 32 disks and each writing one byte all there is needed is 6 bytes of Parity for the checksums. With this we can compute the values of 2 missing disks. Here is how it's done. Loop through all the disks (0 through 31) and do some work. Take the sequence of the disk being worked on and leftshift it by 9 bits, then fill the first 9 bits with the value of the byte of the disk plus one. The extra one is needed because if the value is 0 then our method will not work. Square the result and add it to a 32 bit number which we'll call S. Also add the value of the byte plus one to a 16 bit number which we'll call C. Loop until all disks are done. Now write the 16 bit number called C and the 32 bit number called S as the parity (6 bytes). If we lose one disk, we can recreate the value of its byte with the value of C and doing subtraction against the recomputed sum of all the good disks. If we lose two disks we do some computation. First we count up all the squares like before on all the good disks. We keep the difference of S and the squares we just counted up, this value shall be called A. We also count up the values of all the good disks and keep the difference of what we had stored in C.


We calculate how much longer this difference would have to be with the 9 bit leftshifted positions of the bad disks. This value shall be called B. Given this information we can devise a formula for calculating the values including the positional information of the disk and we'll call this value X.

Here is the solution:

A = X^2 + (B - X)^2

If we factor this we get:

A = X^2 + B^2 - BX - BX + X^2

equals

A = 2X^2 - 2BX + B^2

equals

2X^2 - 2BX + B^2 - A = 0

We can stick this into the quadratic formula now to solve for the value of X:

The quadratic formula will give us 2 answers we check which one of those causes the equation 2X^2 - 2BX + B^2 - A = 0 to hold true, and if it matches we have the values of one missing disk. If the first disk is D1, the second is D2. We can find the value of D2 the following way: D2 = B - D1.

Here is how a sample program utilising this method looks like (raid6.c).

With 33 disks the ratio of parity to data is best at 18.75%, so if you have 33 disks with a capacity for 250 Gigabytes, you'd have 1547 Gigabytes allocated to Parity and 6703 Gigabytes for your data. Reed-Solomon can surely beat that, but this method is understandable for the common drop-out.

Note: This can be compacted a bit. The extra bit with the value of the data isn't needed saving us a lot of space. Then the sum of all the values can be compacted into a CRC of 10 bits (instead of 16) giving us savings in the parity. Also coincidentally we can save the two most significant bits of the sum of the squared values, giving us a total saving of 8 bits or exactly one byte off the parity. This gives us 5 bytes parity for every 32 bytes data, still ugly and nowhere near as good as reed-solomon but better than 6 bytes. That's a savings of over 3% on the parity of before. Some would prefer an XOR parity every 6 bytes than be able to reproduce two bytes with a checksum of 5 bytes within a 37 byte block of data. Of course being able to reproduce 2 lost bytes adjacent to each other can also be a good thing.