I have a trained tensorflow.keras model. I'm loading the model and doing inference from my C code on the CPU, on Ubuntu 18.04. For performance reasons, I'm comparing different builds of TensorFlow.
The first build is the precompiled version that I downloaded from https://www.tensorflow.org/install/lang_c:
https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-linux-x86_64-2.4.0.tar.gz
Then I built TensorFlow 2.4 from source, following the installation procedure here. I used the defaults in ./configure, then ran:
bazel build --config=opt //tensorflow/tools/lib_package:libtensorflow
Then I untarred the generated file below and did the necessary exports (I didn't untar into /usr/local):
~/tensorflow/bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz
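Concretely, the untar-and-export step I do looks roughly like this (LIBDIR is just where I chose to extract the tarball; adjust to your layout):

```shell
# Where I extracted the tarball (an arbitrary choice, not /usr/local):
LIBDIR=$HOME/libtensorflow-src
mkdir -p "$LIBDIR"
tar -xzf ~/tensorflow/bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz -C "$LIBDIR"

# Make the headers and shared objects visible to the compiler and the loader:
export C_INCLUDE_PATH="$LIBDIR/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="$LIBDIR/lib:$LIBRARY_PATH"
export LD_LIBRARY_PATH="$LIBDIR/lib:$LD_LIBRARY_PATH"
```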
Lastly, I rebuilt TensorFlow 2.4 from source, again using the defaults in ./configure, and ran:
bazel build --config=mkl --config=noaws --config=nogcp --config=nohdfs -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 //tensorflow/tools/lib_package:libtensorflow
From what I read, this command should build TensorFlow with Intel MKL support and with the AVX, AVX2, FMA, SSE4.1 and SSE4.2 instruction sets. I checked that my CPU supports those instructions. Then I untarred the generated file and did the exports as before.
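For reference, this is roughly how I verified the CPU support (flag names follow the /proc/cpuinfo convention, so sse4.1 appears as sse4_1):

```shell
# Check that the CPU advertises each instruction set named in the build flags.
for flag in avx avx2 fma sse4_1 sse4_2; do
  if grep -qw "$flag" /proc/cpuinfo; then
    echo "$flag: supported"
  else
    echo "$flag: missing"
  fi
done
```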
The performance results show that the precompiled library is nearly twice as fast as the ones I built from source. What am I doing wrong here? I couldn't find anything else to try beyond
--config=mkl -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2
Building with these options made no measurable difference in performance. Is there a way to learn which flags the precompiled versions were built with, and how can I reproduce them?
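The only probe I could come up with is to disassemble a library and count instructions belonging to each extension. This only tells me whether an extension is actually used, not the exact build flags, and the path and instruction choices below are just examples (vfmadd231ps is an FMA instruction, vpermd is AVX2):

```shell
# Hypothetical location of the extracted prebuilt library; adjust as needed.
SO=$HOME/libtensorflow-prebuilt/lib/libtensorflow.so.2

# A nonzero count suggests the library contains code compiled for that extension.
for insn in vfmadd231ps vpermd; do
  n=$(objdump -d "$SO" 2>/dev/null | grep -cw "$insn" || true)
  echo "$insn: $n"
done
```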
Thanks,