How can I best improve the execution time of a bicubic interpolation algorithm?

Question

How can I best improve the execution time of a bicubic interpolation algorithm?

5k views Asked by neuviemeporte At 28 January 2011 at 16:34

I'm developing some image processing software in C++ on Intel which has to run a bicubic interpolation algorithm on small (about 1kpx) images over and over again. This takes a lot of time, and I'm aiming to speed it up. What I have now is a basic implementation based on the literature, a somewhat-improved (with regard to speed) version which doesn't do matrix multiplication, but rather uses pre-calculated formulas for parts of the interpolating polynomial and last, a fixed-point version of the matrix-multiplying code (works slower actually). I also have an external library with an optimized implementation, but it's still too slow for my needs. What I was considering next is:

vectorization using MMX/SSE stream processing, on both the floating and fixed-point versions
doing the interpolation in the Fourier domain using convolution
shifting the work onto a GPU using OpenCL or similar

Which of these approaches could yield greatest performance gains? Could you suggest another? Thanks.

Original Q&A

There are 4 answers

denis On 04 February 2011 at 16:55

Not an answer for bicubic, but maybe an alternative:
if I understand you, you have 32 x 32 xy, 1024 x 768 image, and want interpolated image[xy].
Just rounding xy, image[ int( xy )], would be too grainy.
But wait — you could make a smoothed double image 2k x 1.5k, once, and take
image2[ int( 2*xy )]: less grainy, very fast. Or similarly,
image4[ int( 4*xy )] in a smoothed 4k x 3k image.
How well this works depends on ...

Chris O On 28 January 2011 at 16:56

There's the Intel IPP libraries, which use SIMD internally for faster processing. The Intel IPP also uses OpenMP, if configured, you can gain benefit of relatively easy multiprocessing.

These libraries do support bicubic interpolation and are payware (you buy a development license but redistribs are free).

Krypes On 28 January 2011 at 17:00

Be careful with going the GPU route. If your convolution kernel is too fast, you're going to end up being IO bound. You won't know for sure which is the fastest unless you implement both.

GPU Gems 2 has a chapter on Fast Third-Order Texture Filtering which should be a good starting point for your GPU solution.

A combination of Intel Threading Building Blocks and SSE instructions would make a decent CPU solution.

**detunized** · Accepted Answer · 2011-01-28T16:37:59+00:00

I think GPU is the way to go. It's probably the most natural task for this type of hardware. I would start by looking into CUDA or OpenCL. Older techniques like simple DirectX/OpenGL pixel/fragment shaders should work just fine as well.

Some links I found, maybe they could help you:

TechQA.

How can I best improve the execution time of a bicubic interpolation algorithm?

There are 4 answers

Related Questions in C++

Related Questions in PERFORMANCE

Related Questions in IMAGE-PROCESSING

Related Questions in INTERPOLATION

Related Questions in BICUBIC

Popular Questions

Popular Tags

Trending Questions