I am developing an Android x86 based framework for the Intel Atom processor. I have implemented the entire framework, but I am facing problems with the SIMD implementation of my code. When I run the basic C code, performance is comparable on the emulator and on the hardware. However, when I enable the intrinsics option for the code, there is no actual gain, only a negligible loss in performance. When I run the same code on an Intel i7 processor, there is approximately a 200% gain. I certainly take into consideration the difference in frequency and number of cores between a PC and a tablet, but there should still be some gain when I enable the SIMD code on the Android framework.
Possible problems which I have analyzed so far:
1) Local C flags (can anyone suggest suitable C flags for the Intel Atom processor?).
2) Is it advisable to use a .so file instead of the source code in the framework?
3) A suitable NDK toolchain for Intel Atom; I am using 4.8.
4) Whether the optimization level should be set to O2 or O3.
If there are any other reasons that may hinder the performance, please let me know. Thank you in advance.
All the Intel Atom platforms support at least SSSE3.
To see what the compiler was able to vectorize, you can use the
-ftree-vectorizer-verbose
flag.
1) You can compile your code using
-mtune=atom -mssse3 -mfpmath=sse
to fully use SSSE3, including for FP math. (When compiling for 32 bits, -mfpmath defaults to 387, which is a lot slower.)
It's safer to provide only up to SSSE3 code for the x86 ABI. If you only need to support specific platforms, all 64-bit Atoms support SSE4.2; to optimize for these you can use
-mtune=slm -msse4.2 -mfpmath=sse
2) I'm not sure I understand your question 2), but if you're using a precompiled .so file, it will not be further optimized when you compile code that links against it.
3) The latest NDK is usually the best; the current version is r9d. GCC 4.8 also brings a lot of performance optimizations compared to the default GCC 4.6; you can use it by setting
NDK_TOOLCHAIN_VERSION:=4.8
inside Application.mk.
4) -O3 is quite safe and brings more performance; you should use it.
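Putting the points above together, here is a rough sketch of how the build files could look with ndk-build. The module name and source file name are hypothetical placeholders, and the `=2` verbosity level on the vectorizer flag is an assumption (pick whatever level gives you useful output); the other flags are the ones mentioned above.

```make
# Android.mk (sketch; module and file names are hypothetical)
LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)

LOCAL_MODULE    := simd_kernels
LOCAL_SRC_FILES := simd_kernels.c

# Optimization level for every ABI
LOCAL_CFLAGS += -O3

# x86-specific flags are only added when the x86 ABI is being built
ifeq ($(TARGET_ARCH_ABI),x86)
LOCAL_CFLAGS += -mtune=atom -mssse3 -mfpmath=sse
# Ask GCC to report on loop vectorization (the level is an assumption)
LOCAL_CFLAGS += -ftree-vectorizer-verbose=2
endif

include $(BUILD_SHARED_LIBRARY)
```

```make
# Application.mk (sketch)
APP_ABI               := x86
APP_OPTIM             := release
NDK_TOOLCHAIN_VERSION := 4.8
```

Keeping the x86 flags inside the `ifeq` block means the same Android.mk still builds if you later add ARM ABIs to APP_ABI. Once the vectorizer output looks right, you will probably want to drop the verbose flag again, since it only adds noise to the build log.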