I am trying to optimize the QTC video codec to run on the Raspberry Pi with decent performance. One important bottleneck is a 32-bit integer division in the range decoder, which accounts for 18% of the decoding time. Since the device's ARM processor apparently lacks a hardware integer division instruction, this looks like a promising target for optimization. The division has to be exact.
Both the dividend and the divisor change on every call, but the divisor is known to always be smaller than 65536. I thought about building a lookup table of fixed-point reciprocals of all possible divisors, so that each division becomes a multiplication. With one 32-bit entry per divisor, the lookup table would be 256 kibibytes.
Questions
- Is it a good idea to perform that optimization?
- Are there better ways to get rid of the software division?
- Is there a different way to implement the algorithm such that there is no division?
- Other ideas?
If you want to use a magic multiplication + LUT, here's some code.
Simple tester testing random divisors i. I did not exhaustively test all i's, but it has worked for the short period I ran it; it seems to give correct results for all 32-bit dividends (j = 0..2^32-1) for the i's I tested.
In reality, you would precompute a lookup table for i = 2..65535 or some range like that (i = 0 won't work because value/0 is undefined, and i = 1 won't work because its magic multiplier is just outside the range of a 32-bit number). Then use the equation with i as your lookup index to get the magic multiplier 'm'. Change as needed & don't hate on my style. :P