FFT using C++ fixed-point for optimizing performance for ARM devices

Question

Asked by Jav_Rock on 17 May 2012 at 15:22 · 2k views

I am using OpenCV's DFT on mobile phones and tablets, that is, ARM devices. The code is in C++. I expected to be able to improve FFT performance by using ARM registers and fixed-point arithmetic, but my implementation takes about twice as long as OpenCV's; I cannot even match its time.

I use a radix-4, 256-point FFT.

Does anybody know what OpenCV does, and why it is so difficult to beat? Which FFT configuration is fastest on ARM devices: radix-4, radix-8, 256 points, 1024 points...?
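For readers who want a concrete picture of the setup described above, here is a minimal sketch of a fixed-point radix-4 butterfly. It is not the asker's code: the Q15 format, the per-stage scaling by 1/4, and the names `CQ15`, `cmul_q15`, and `radix4_butterfly` are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical Q15 complex sample: each part is an int16_t representing [-1, 1).
struct CQ15 { int16_t re; int16_t im; };

// Clamp a wide intermediate to the int16_t range.
inline int16_t sat16(int64_t x) {
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return static_cast<int16_t>(x);
}

// Q15 complex multiply with rounding, used to apply a twiddle factor w.
// 64-bit intermediates keep the sum of two products from overflowing.
// Assumes arithmetic right shift of negative values (guaranteed since C++20,
// and the behavior of mainstream compilers before that).
inline CQ15 cmul_q15(CQ15 a, CQ15 w) {
    int64_t re = int64_t(a.re) * w.re - int64_t(a.im) * w.im;
    int64_t im = int64_t(a.re) * w.im + int64_t(a.im) * w.re;
    return { sat16((re + (1 << 14)) >> 15), sat16((im + (1 << 14)) >> 15) };
}

// Forward radix-4 butterfly on inputs that already had twiddles applied.
// Outputs are scaled by 1/4 so a stage cannot overflow int16_t.
inline void radix4_butterfly(CQ15& x0, CQ15& x1, CQ15& x2, CQ15& x3) {
    int32_t t0r = x0.re + x2.re, t0i = x0.im + x2.im;  // x0 + x2
    int32_t t1r = x0.re - x2.re, t1i = x0.im - x2.im;  // x0 - x2
    int32_t t2r = x1.re + x3.re, t2i = x1.im + x3.im;  // x1 + x3
    int32_t t3r = x1.re - x3.re, t3i = x1.im - x3.im;  // x1 - x3

    x0 = { static_cast<int16_t>((t0r + t2r) >> 2), static_cast<int16_t>((t0i + t2i) >> 2) };
    x2 = { static_cast<int16_t>((t0r - t2r) >> 2), static_cast<int16_t>((t0i - t2i) >> 2) };
    // y1 = t1 - j*t3 and y3 = t1 + j*t3; multiplying by j only swaps parts and flips a sign.
    x1 = { static_cast<int16_t>((t1r + t3i) >> 2), static_cast<int16_t>((t1i - t3r) >> 2) };
    x3 = { static_cast<int16_t>((t1r - t3i) >> 2), static_cast<int16_t>((t1i + t3r) >> 2) };
}
```

A 256-point transform would run four such stages (256 = 4^4). The multiplications by ±j are free in radix-4, which is one reason it is often preferred over radix-2; whether it beats a tuned library on a given device still has to be measured.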

Original Q&A

There is 1 answer

**Dan Hulme** · Accepted Answer · 2012-06-28T01:23:49+00:00

OpenCV's implementation uses device-specific optimizations on Tegra, Tegra 2, and Tegra 3 devices. On Tegra and Tegra 2 the implementation is parallelized, and some operations use GLSL shaders to run on the GPU; on Tegra 3 it also uses NEON SIMD instructions to vectorize some operations on the CPU, and CUDA for even better GPU performance. Given that NVIDIA lent manpower to the optimization effort, bringing in-depth knowledge of the platform, outperforming it on anything more than the odd uncommon operation would probably be a big task.

This article is mostly Tegra 3 specific, but talks a lot about the kind of techniques they used and the performance speedup they got over optimized but device-independent code.
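To illustrate the kind of NEON vectorization mentioned above, here is a hedged sketch of an element-wise Q15 multiply that uses NEON intrinsics when they are available and falls back to scalar code otherwise. It is not taken from OpenCV; the function names `mul_q15` and `mul_q15_array` are hypothetical, and the Q15 format is an assumption carried over from the question's fixed-point approach.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Scalar Q15 multiply with rounding and saturation.
// Computes the same result as NEON's saturating rounding doubling multiply-high.
inline int16_t mul_q15(int16_t a, int16_t b) {
    int32_t p = (int32_t(a) * b + (1 << 14)) >> 15;
    // Only (-1) * (-1) can exceed the range, so only the upper bound needs a clamp.
    return static_cast<int16_t>(p > 32767 ? 32767 : p);
}

// Multiply two Q15 arrays element by element.
void mul_q15_array(const int16_t* a, const int16_t* b, int16_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Process eight 16-bit lanes per iteration.
    for (; i + 8 <= n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);
        int16x8_t vb = vld1q_s16(b + i);
        vst1q_s16(out + i, vqrdmulhq_s16(va, vb));
    }
#endif
    // Scalar tail (or the whole array when NEON is not available).
    for (; i < n; ++i) out[i] = mul_q15(a[i], b[i]);
}
```

Eight multiplies per instruction is the main attraction of 16-bit fixed point on NEON. A hand-written FFT that keeps its butterflies scalar gives up that advantage, which may be part of why it loses to a library that vectorizes.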
