Decompressing large files using multiple computers


We have to deal with extracting gzip/bzip2 files downloaded over the internet; sometimes they are many gigabytes in size (e.g. a 15 GB wiki dump).

Is there a way for those to be extracted by multiple computers instead of by one? For example, could each node in the cluster read the header plus the bytes between X and Y and write its part into a shared folder?

Or is there any other way to accelerate that process?


There is 1 answer

Answered by alephnerd

Have you considered using a parallelized alternative to gzip/bzip2?

If you are using bzip2, pbzip2 is a parallelized alternative that uses pthreads to speed up compression and decompression. Similarly, pgzip is a parallel alternative to gzip.
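As a minimal sketch (not part of the original answer), here is one way to drive pbzip2 from Python; it assumes pbzip2 is installed and on the PATH, and the file name and processor count are just placeholders:

```python
# Minimal sketch: decompress a .bz2 file with pbzip2 from Python.
# Assumes pbzip2 is installed and on the PATH; the file name and
# processor count below are placeholders.
import shutil
import subprocess

def decompress_bz2_parallel(path: str, processors: int = 8) -> None:
    """Run pbzip2 to decompress `path` using multiple threads."""
    if shutil.which("pbzip2") is None:
        raise RuntimeError("pbzip2 not found; install it or fall back to bzip2")
    # -d: decompress, -k: keep the compressed input, -p<N>: number of processors
    subprocess.run(["pbzip2", "-d", "-k", f"-p{processors}", path], check=True)

if __name__ == "__main__":
    decompress_bz2_parallel("enwiki-latest-pages-articles.xml.bz2", processors=8)
```

The reason this helps for bzip2 in particular is that bzip2 data is organized into blocks, which is what makes parallelized handling feasible; gzip, by contrast, is a single DEFLATE stream, so splitting it by byte ranges is much harder.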