How do I know whether Armadillo is using OpenBLAS on my ARM board?
Background
I am writing a program with Armadillo. I installed OpenBLAS and use a cross compiler to build the program. I expected the build with Armadillo and OpenBLAS to be faster than the build with Armadillo alone, but both take the same time to run. So, on my ARMv7 board, how do I know whether Armadillo is actually using OpenBLAS?
Environment
PC: Ubuntu 16.04; cross compiler: arm-linux-g++; board: ARMv7
Compiler commands:
1. Only with Armadillo:
arm-linux-g++ -mtune=cortex-a7 -std=c++11 -I/home/sgks/SGKS6802_LinuxSDK/sdk_build/system/01_software/armadillo-8.100.0_install/include -L/opt/sgks/rootfs/usr/lib -L/home/sgks/SGKS6802_LinuxSDK/sdk_build/system/01_software/armadillo-8.100.0_install/lib -o ./cmake-build-debug/armadillo_test ./main.cpp -larmadillo -O3
2. With Armadillo and OpenBLAS:
arm-linux-g++ -mtune=cortex-a7 -std=c++11 -I/home/sgks/SGKS6802_LinuxSDK/sdk_build/system/01_software/armadillo-8.100.0_install/include -I/home/sgks/SGKS6802_LinuxSDK/sdk_build/system/01_software/OpenBLAS-0.2.20-install-arm/include -L/home/sgks/SGKS6802_LinuxSDK/sdk_build/system/01_software/armadillo-8.100.0_install/lib -L/home/sgks/SGKS6802_LinuxSDK/sdk_build/system/01_software/OpenBLAS-0.2.20-install-arm/lib -o ./cmake-build-debug/armadillo_test ./main.cpp -DARMA_DONT_USE_WRAPPER -lopenblas -O3
Test code:

```cpp
#include <armadillo>
#include <cstdio>
#include <ctime>
#include <iostream>

using namespace arma;

int main() {
    clock_t start, stop;
    double dur;

    // Weight matrices; each layer's input carries one extra bias column.
    fmat weigth_layer1(2801, 2642, fill::randu);
    fmat weigth_layer2(2643, 2645, fill::randu);
    fmat weigth_layer3(2646, 2527, fill::randu);
    fmat weigth_layer4(2528, 607, fill::randu);

    fmat input(1, 2801, fill::randu);
    fmat layer1_output(1, 2643);
    fmat layer2_output(1, 2646);
    fmat layer3_output(1, 2528);
    fmat layer4_output(1, 607);

    // ---- layer 1 ----
    start = clock();
    layer1_output(0, span(0, 2641)) = input * weigth_layer1;
    layer1_output(0, 2642) = 1.0; // bias
    // ReLU: subtracting each negative element from itself sets it to zero
    layer1_output.elem(find(layer1_output < 0)) -= layer1_output.elem(find(layer1_output < 0));
    std::cout << "layer1: " << layer1_output.n_cols << std::endl;

    // ---- layer 2 ----
    layer2_output(0, span(0, 2644)) = layer1_output * weigth_layer2;
    layer2_output(0, 2645) = 1.0; // bias
    std::cout << "layer2: " << layer2_output.n_cols << std::endl;
    layer2_output.elem(find(layer2_output < 0)) -= layer2_output.elem(find(layer2_output < 0)); // ReLU

    // ---- layer 3 ----
    layer3_output(0, span(0, 2526)) = layer2_output * weigth_layer3;
    layer3_output(0, 2527) = 1.0; // bias
    std::cout << "layer3: " << layer3_output.n_cols << std::endl;
    layer3_output.elem(find(layer3_output < 0)) -= layer3_output.elem(find(layer3_output < 0)); // ReLU

    // ---- layer 4 ----
    layer4_output = layer3_output * weigth_layer4;
    std::cout << "layer4: " << layer4_output.n_cols << std::endl;
    stop = clock();

    dur = stop - start;
    printf("time : %f\n", dur / CLOCKS_PER_SEC);
    return 0;
}
```
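If OpenBLAS is installed correctly and the library path is correct, it should be the one used. You can check the configuration Armadillo was compiled with through its `arma_config` struct, for example `arma_config::blas` and `arma_config::wrapper`.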
Another way to test is to disable BLAS at compile time and compare the performance; in that case Armadillo falls back to its own emulated routines. Note that the define (`ARMA_DONT_USE_BLAS`) must appear before you include the Armadillo header.