I am trying to analyse the scaling behaviour of a C++ program that I have parallelised with Intel OpenMP and the Intel Composer XE 2014. When I run an "Advanced Hotspot Analysis", the result shows that a library function called "kmp print storage map gtip" consumes the second-largest share of the total runtime. I googled for the meaning of this routine, but didn't find anything. Is this routine related to the std::map data structures that I am using in this part of the algorithm? Thanks in advance!
EDIT: I have now removed one barrier, which sped everything up. But now a new hotspot comes into play. When I run a Locks & Wait analysis, the top entries are "OMP Join Barrier mkl_blas_daxpy_omp:115" and "OMP Join Barrier mkl_blas_dcopy:155". But I don't call any MKL routine explicitly. How can I investigate this further?
`__kmp_print_storage_map_gtid` gets called whenever the environment variable `KMP_STORAGE_MAP` is set to `true` or `verbose`. It prints to the standard error stream the location of various objects used by the OpenMP runtime library. As I/O operations are slow in general, it is not surprising that it takes a lot of your program's execution time, especially when it comes to short test cases.

Since `KMP_STORAGE_MAP` is undocumented and its default value is `false`, it is safe to assume that it is there only to be used in special cases by other tools, e.g. by VTune while doing hotspot analysis. When your program runs normally, the function won't get called at all.
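If you want to confirm whether the variable is set in the environment your program actually sees (for example when it is launched under the profiler), a minimal sketch along the following lines may help. The helper names `storage_map_requested` and `report_storage_map_setting` are made up for illustration, and the check only compares against the two exact strings mentioned above; the real runtime may well accept other spellings.

```cpp
#include <cstdlib>
#include <cstring>
#include <iostream>

// Hypothetical helper: returns true if `value` is one of the two settings
// named above ("true" or "verbose"). Assumption: only these exact lowercase
// strings are checked; the OpenMP runtime itself may accept other forms.
bool storage_map_requested(const char* value) {
    if (value == nullptr) {
        return false;  // variable not set at all
    }
    return std::strcmp(value, "true") == 0 ||
           std::strcmp(value, "verbose") == 0;
}

// Hypothetical helper: prints what the current process sees for
// KMP_STORAGE_MAP, so a normal run can be compared with a profiled run.
void report_storage_map_setting(std::ostream& out) {
    const char* value = std::getenv("KMP_STORAGE_MAP");
    out << "KMP_STORAGE_MAP=" << (value ? value : "(unset)") << '\n';
    if (storage_map_requested(value)) {
        out << "storage map printing appears to be requested\n";
    }
}
```

Calling `report_storage_map_setting(std::cerr)` at the top of `main`, once in a plain run and once under the profiler, should show whether the setting differs between the two.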