After using BWT, which set of data do we need in the encoded data? Do we need to encode (or export) the Suffix Array?
Input:
stackoverflow
BWT Output:
wtavrcfkle$soo
Suffix Array:
13, 2, 3, 7, 9, 4, 10, 5, 11, 8, 0, 1, 6, 12
After using BWT, which set of data do we need in the encoded data? Do we need to encode (or export) the Suffix Array?
Input:
stackoverflow
BWT Output:
wtavrcfkle$soo
Suffix Array:
13, 2, 3, 7, 9, 4, 10, 5, 11, 8, 0, 1, 6, 12
To be clear, the suffix array and the BWT output are the same thing. If you look at the suffix array in your example, it contains the indexes of the letters in the BWT output taken from the BWT input (starting with 1): 13 -> w, 2 -> t, 3 -> a, etc... Using a suffix array is just a mechanism to calculate the output of the BWT in linear time. Transmitting the suffix array or the BWT output means transmitting the same information.
You only need to transmit the BWT output.
The surprising thing about this transform is that the original string can be reconstructed from just the permuted output string.
The wikipedia article contains example code for doing this inverse.
Note that the normal mode of operation is to use run length coding to encode the BWT output before transmission (or you have not achieved any compression).
The nice thing about the transform is that it tends to produce long runs of similar characters (if there is structure in the source material) and so the run length coding works well.
To reverse the BWT, you only need the index of the original last character, not the entire suffix array. If you don't have this index, I believe choosing an arbitrary index will result in a rotated version of your original string.
Note that, if you include an end-of-line code (as in your example), the original last character is obvious, so the index doesn't need to be provided separately...
suffix array is only needed to compute bwt transform, after transform done it can be dropped away.
You can also restore the suffix array from transformed output if you like:)