My program runs on an Android device with an ARM processor that supports NEON.
At first I used libjpeg to compress an RGB image (800*480) to JPEG. It took about 70 ms per image, which was too slow for me. Later I found libjpeg-turbo, which is supposed to speed up compression by using NEON on ARM.
But after compiling and testing it, I found that the compression speed of the two was almost the same. Changing the quality and the flags passed to tjCompress2 also had no effect. I don't know whether something is wrong or missing in my program. The code is below:
tjhandle _jpegCompressor = tjInitCompress();
tjCompress2(_jpegCompressor, (unsigned char*)in, PARAM_WIDTH,
            PARAM_WIDTH*PERSIZE, PARAM_HEIGHT, PERSIZE,
            (unsigned char**)&out, (long unsigned int*)outlen, TJSAMP_444, 100,
            TJFLAG_FASTDCT);
tjDestroy(_jpegCompressor);
The JPEG buffer (out) is allocated and freed by my own code.
The version of libjpeg-turbo I am using is 1.4.2.
As far as I know, libjpeg-turbo has SIMD code (SSE2, MMX) for x86 processors. I've looked at some of the assembly code and didn't see any for other CPU architectures.
I'm surprised it even worked. I think the library keeps the original non-SIMD code, which would explain why it was able to run at all.
If you're looking for optimizations, you may want to look at what you can tune in libjpeg itself. It comes with several documentation files, one of which has instructions for optimizing on ARM processors. You can also tweak the memory manager. You'll find a lot more information there than I can type here.