Python erasure coding library that can handle larger strings?


I'm looking for a Python erasure coding library that works for larger inputs. So far I've checked out:

  • unireedsolomon: fails for 256-byte inputs; unmaintained
  • reedsolo/reedsolomon: silently fails for a 300-byte input
  • Reed-Solomon: clearly a learning project; its bug tracker is disabled
  • pyeclib: fails for a 100-byte input with Reed-Solomon encoding, and provides no documentation on valid parameters, so I couldn't figure out how to test the other algorithms (liberasurecode doesn't document them either)

I want something that can handle n = 10,000, k = 2,000 or so, ideally larger.

1 Answer

rcgldr:

Only the field polynomial has to be irreducible or primitive, not the generating polynomial. If you wanted an RS(10000, 8000, 2000) code (n = 10000, k = 8000, n-k = 2000), GF(2^16) with the primitive reducing polynomial x^16 + x^12 + x^3 + x + 1 could be used. The generating polynomial would be of degree 2000: assuming the first consecutive root is 2, the generating polynomial is (x-2)(x-4)(x-8) ... (x-2^2000), with all of this math done in GF(2^16), where + and - are both XOR. Correction would involve generating 2000 syndromes and running a Berlekamp-Massey or Sugiyama extended-Euclid decoder. I don't know if there are Python libraries that support GF(2^16); a minimal sketch of the field arithmetic appears after the links below.

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Berlekamp%E2%80%93Massey_decoder

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Euclidean_decoder
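Here is a minimal pure-Python sketch of the arithmetic described above: GF(2^16) multiplication under that reducing polynomial, generator-polynomial construction from consecutive roots of 2, and systematic encoding by polynomial division. It assumes 2 is a primitive element (it is for this polynomial); gen_poly, rs_encode, and poly_eval are made-up helper names, not from any library:

    # GF(2^16) with reducing polynomial x^16 + x^12 + x^3 + x + 1 (0x1100B)
    PRIM = 0x1100B

    def gf_mul(a, b):
        """Multiply two field elements; '+' and '-' are both XOR."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x10000:            # reduce whenever degree reaches 16
                a ^= PRIM
            b >>= 1
        return r

    def gen_poly(nroots, alpha=2):
        """g(x) = (x - alpha)(x - alpha^2)...(x - alpha^nroots),
        coefficients lowest degree first; leading coefficient stays 1."""
        g = [1]
        root = 1
        for _ in range(nroots):
            root = gf_mul(root, alpha)
            ng = [0] * (len(g) + 1)
            for i, c in enumerate(g):
                ng[i] ^= gf_mul(c, root)   # the '- root' term (XOR)
                ng[i + 1] ^= c             # the 'x' term
            g = ng
        return g

    def rs_encode(msg, nroots, alpha=2):
        """Systematic encode: parity = (msg(x) * x^nroots) mod g(x)."""
        g = gen_poly(nroots, alpha)[::-1]   # highest degree first, monic
        rem = list(msg) + [0] * nroots
        for i in range(len(msg)):
            coef = rem[i]
            if coef:
                for j in range(1, len(g)):
                    rem[i + j] ^= gf_mul(coef, g[j])
        return list(msg) + rem[len(msg):]   # data symbols + parity symbols

    def poly_eval(p, x):
        """Horner evaluation, highest-degree coefficient first."""
        y = 0
        for c in p:
            y = gf_mul(y, x) ^ c
        return y

    # Sanity check on a small code: every codeword evaluates to zero at
    # the consecutive roots 2^1 .. 2^nroots, i.e. all syndromes are zero.
    cw = rs_encode([1, 2, 3, 4, 5], nroots=4)
    root = 1
    for _ in range(4):
        root = gf_mul(root, 2)
        assert poly_eval(cw, root) == 0

Decoding would compute the n-k syndromes with the same poly_eval and feed them to Berlekamp-Massey or Euclid per the links above. At n-k = 2000 this pure-Python double loop would be far too slow; a serious implementation would precompute log/antilog tables for GF(2^16) and vectorize with NumPy, but the structure is the same.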

Large n and k can be avoided by interleaving. Tape drives like LTO treat large data blocks as matrices, with ECC applied across rows (called C1) and down columns (called C2) using GF(2^8). LTO-8 uses RS(249, 237, 13), which I assume is the code used down columns to correct rows. With 32 read/write heads, there's an interleave of 32, probably across rows. I don't know what the RS() code across rows is, or what the interleave down columns is. A sketch of the interleaving idea follows.
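This is not LTO's actual layout, just a sketch of the idea, assuming the reedsolo package's RSCodec API; the depth, block size, and parity count are made-up illustrative values. The point is that every row stays within the 255-symbol GF(2^8) limit, which is how small-field libraries sidestep large n:

    from reedsolo import RSCodec

    def interleave(data, depth=32):
        """Round-robin consecutive bytes across `depth` rows, so a burst of
        B consecutive bad bytes hits each row at most ceil(B / depth) times."""
        rows = [bytearray() for _ in range(depth)]
        for i, b in enumerate(data):
            rows[i % depth].append(b)
        return rows

    def encode_block(data, depth=32, nsym=32):
        """RS-encode each row independently; with short rows every
        codeword stays at n <= 255 symbols, within GF(2^8) limits."""
        rsc = RSCodec(nsym)              # nsym parity bytes per codeword
        return [bytes(rsc.encode(bytes(r))) for r in interleave(data, depth)]

    # 6400-byte block -> 32 rows of 200 data + 32 parity bytes (n = 232)
    codewords = encode_block(bytes(6400))

A C1/C2 scheme like the one described above would additionally encode down the columns of the same matrix, so errors that overwhelm the code in one direction can still be corrected by the other.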