Fast implementation of BWT in Lua

246 views Asked by At
local function fShallowCopy(tData)
    local tOutput = {}
    for k,v in ipairs(tData) do
        tOutput[k] = v
    end
    return tOutput
end

local function fLexTblSort(tA,tB) --sorter for tables
    for i=1,#tA do 
        if tA[i]~=tB[i] then 
            return tA[i]<tB[i]
        end
    end 
    return false 
end

function fBWT(tData)

    --setup--
    local iSize = #tData
    local tSolution = {}
    local tSolved = {}


    --key table--
    for n=1,iSize do 
        tData[iSize] = fRemove(tData,1)
        tSolution[n] = fShallowCopy(tData)
    end
    table.sort(tSolution,fLexTblSort)


    --encode output--
    for i=1,iSize do
        tSolved[i] = tSolution[i][iSize]
    end


    --finalize--
    for i=1,iSize do
        if fIsEqual(tSolution[i],tData) then
            return i,tSolved
        end
    end
    return false
end

Above is my current code for achieving BWT encoding in Lua. The issue is because of the size of the tables and lengths of loops it takes a long time to run. For a 1000 character input the average encoding time is about 1.15 seconds. Does anyone have suggestions for making a faster BWT encoding function?

the biggest slowdowns appear to be in fLexTblSort and fShallowCopy. I have included both above the BWT function as well.

1

There are 1 answers

3
Jakuje On

If I see right, your algorithm has complexity O(n^2 log n), if the sort is quicksort. The comparator function fLexTblSort takes O(n) itself for each pair of values you compare.

As I checked with my implementation from few years back, I see possible space to improve. You create all the possible rotations of the tData, which takes also a lot of time. I used only single data block and I stored only starting positions of particular rotations. You also use a lot of loops which can shrink into less.

Mine implementation was in C, but the concept can be used also in Lua. The idea in some hybrid pseudocode between your Lua and C.

function fBWT(tData)

  local n = #tData
  local tSolution = {}
  for(i = 0; i < n; i++)
    tSolution[i] = i;

  --table.sort(tSolution, fLexTblSort)
  quicksort(tData, n, tSolution, 0, n)

  for(i = 0; i < n; i++){
    tSolved[i] = tData[(tSolution[i]+n-1)%n];
    if( tSolution[i] == 0 )
        I = i;
  }

  return I, tSolved
end

You will also need your own sort function, because the standard does not offer enough flexibility for this magic. Quicksort is a good idea (you might avoid some of the arguments, but I pasted just the C version I was using):

void swap(int array[], int left, int right){
    int tmp = array[right]; 
    array[right] = array[left];
    array[left] = tmp;         
}

void quicksort(uint8_t data[], int length, int array[], int left, int right){
    if(left < right){ 
        int boundary = left;
        for(int i = left + 1; i < right; i++){ 
            if( offset_compare(data, length, array, i, left) < 0 ){
                swap(array, i, ++boundary);
            }
        }
        swap(array, left, boundary);
        quicksort(data, length, array, left, boundary);
        quicksort(data, length, array, boundary + 1, right);
    }     
}

The last step is your own comparator function (similar to your original, but working on the rotations, again in C):

/**
 *  compare one string (fixed length) with different rotations.
 */
int offset_compare(uint8_t *data, int length, int *array, int first, int second){
    int res;
    for(int i = 0; i < length; i++){
        res = data[(array[first]+i)%length] - data[(array[second]+i)%length];
        if( res != 0 ){
            return res;
        }
    }
    return 0;
}

This is the basic idea I came up with few years ago and which worked for me. Let me know if there is something not clear or some mistake.