Burrows-Wheeler Transform (BWT) - Stored Data

Question

399 views Asked by Macro At 03 May 2013 at 17:52

After applying the BWT, what data do we need to store in the encoded output? Do we also need to encode (or export) the suffix array?

Input:

stackoverflow

BWT Output:

wtavrcfkle$soo

Suffix Array:

13, 2, 3, 7, 9, 4, 10, 5, 11, 8, 0, 1, 6, 12

There are 5 answers

flanglet On 04 December 2013 at 06:26

To be clear, once you have the input, the suffix array and the BWT output are two views of the same thing. If you look at the suffix array in your example, each entry gives the position in the BWT input (counting from 1) of the corresponding letter in the BWT output: 13 -> w, 2 -> t, 3 -> a, etc. The entry 0 wraps around to the end marker $. Using a suffix array is just a mechanism for computing the BWT output efficiently (a suffix array can be built in linear time). The suffix array on its own holds only positions, not characters, so the BWT output is what you transmit; the suffix array can be rebuilt from it if needed.
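As a rough illustration of that relationship, here is a minimal Python sketch. It assumes `$` is appended as a unique end marker that sorts before every other character, and it builds the suffix array by naive sorting rather than with a linear-time algorithm. The function names `suffix_array` and `bwt_from_suffix_array` are made up for this example.

```python
def suffix_array(text):
    """Naive suffix array: sort the suffix start positions by suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """Each output character is the one just before the suffix start.

    For start 0, Python's text[-1] wraps around to the end marker.
    """
    return "".join(text[i - 1] for i in sa)


text = "stackoverflow" + "$"  # assumed end marker, smaller than any letter
sa = suffix_array(text)
print(sa)
print(bwt_from_suffix_array(text, sa))
```

For the input in the question this should print the same suffix array and output string that the question lists.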

rob mayoff On 03 May 2013 at 18:09

All you need to invert the transform is the output string (wtavrcfkle$soo in your example).
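A minimal Python sketch of inverting from the output string alone, assuming the input was terminated with a unique `$` that sorts before every other character (as in the question's example). The name `inverse_bwt` and the last-to-first table `lf` are illustrative choices, not taken from any particular library.

```python
def inverse_bwt(bwt):
    """Rebuild the text from the BWT output alone (end marker stripped)."""
    n = len(bwt)
    # A stable sort of the positions by character gives the first column:
    # order[j] is the position in bwt of the j-th character of that column.
    order = sorted(range(n), key=lambda i: bwt[i])
    lf = [0] * n
    for row, pos in enumerate(order):
        lf[pos] = row  # last-to-first mapping

    # Row 0 starts with the end marker, so bwt[0] is the text's last character.
    row = 0
    chars = []
    for _ in range(n - 1):
        chars.append(bwt[row])
        row = lf[row]
    return "".join(reversed(chars))


print(inverse_bwt("wtavrcfkle$soo"))  # stackoverflow
```

The walk collects the characters from last to first, which is why the list is reversed at the end.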

Peter de Rivaz On 03 May 2013 at 18:09

You only need to transmit the BWT output.

The surprising thing about this transform is that the original string can be reconstructed from just the permuted output string.

The Wikipedia article contains example code for the inverse transform.

Note that the normal mode of operation is to apply run-length coding (or a similar scheme) to the BWT output before transmission. The transform by itself is only a permutation of the input, so on its own it achieves no compression.

The nice thing about the transform is that it tends to produce long runs of identical characters (if there is structure in the source material), so the run-length coding works well.
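To make the run-length step concrete, here is a small Python sketch. The functions `rle_encode` and `rle_decode` are hypothetical helpers written for this example, and the sample string is the BWT output of "banana$" (again assuming `$` as the end marker), where the repeated letters end up next to each other.

```python
from itertools import groupby


def rle_encode(s):
    """Collapse each run of identical characters into (char, count)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


encoded = rle_encode("annb$aa")  # BWT output of "banana$"
print(encoded)  # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
print(rle_decode(encoded))  # annb$aa
```

On an input as short as the one in the question there is almost nothing to gain (only the final "oo" forms a run); the benefit shows up on longer, repetitive inputs.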

comingstorm On 03 May 2013 at 18:18

To reverse the BWT, the only thing you need besides the output string is the index of the original last character, not the entire suffix array. If you don't have this index, I believe choosing an arbitrary index will result in a rotated version of your original string.

Note that, if you include an end-of-string marker (the $ in your example), the original last character is obvious, so the index doesn't need to be provided separately...
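The role of the index can be sketched in Python with the rotation-based form of the transform. This is a deliberately naive version that rebuilds the whole sorted rotation table, and the names `bwt_rotations` and `inverse_with_index` are made up for the example.

```python
def bwt_rotations(text):
    """Return (last column, row index of the original text)."""
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    last = "".join(rot[-1] for rot in rotations)
    return last, rotations.index(text)


def inverse_with_index(last, index):
    """Rebuild the sorted rotation table, then pick one row."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(ch + row for ch, row in zip(last, table))
    return table[index]


last, index = bwt_rotations("stackoverflow")  # no end marker
print(inverse_with_index(last, index))  # the original string
print(inverse_with_index(last, 0))      # a rotation of it

marked, _ = bwt_rotations("stackoverflow$")
print(marked.index("$"))  # with a marker, the index can be read off the output
```

Every row of the rebuilt table is a rotation of the input, which is why a wrong index still yields a rotated copy rather than garbage.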

richselian On 03 May 2013 at 18:19 BEST ANSWER

The suffix array is only needed to compute the BWT; once the transform is done, it can be discarded.

BWT("stackoverflow")="wtavrcfkle$soo"

UNBWT("wtavrcfkle$soo")="stackoverflow"

You can also restore the suffix array from the transformed output if you like :)
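One way to do that restoration, sketched in Python under the assumption that the output contains a unique `$` end marker that sorts before every other character. The function name `suffix_array_from_bwt` is hypothetical.

```python
def suffix_array_from_bwt(bwt):
    """Recover the suffix array from the BWT output alone."""
    n = len(bwt)
    # A stable sort by character gives the last-to-first mapping between rows.
    order = sorted(range(n), key=lambda i: bwt[i])
    lf = [0] * n
    for row, pos in enumerate(order):
        lf[pos] = row

    sa = [0] * n
    row = 0  # row 0 is the suffix consisting of only the end marker
    for start in range(n - 1, -1, -1):
        sa[row] = start  # this row's suffix begins at position `start`
        row = lf[row]    # move to the row of the suffix one position earlier
    return sa


print(suffix_array_from_bwt("wtavrcfkle$soo"))
# [13, 2, 3, 7, 9, 4, 10, 5, 11, 8, 0, 1, 6, 12]
```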
