I'm studying some Java code (the SOR algorithm and LU factorisation). The main goal is to study the impact of running such algorithms on a NUMA architecture. For the C versions of the same algorithms I have already found tools such as numactl, and affinity environment variables such as GOMP_CPU_AFFINITY (GCC) and KMP_AFFINITY (ICC), which pin threads to cores. However, I don't know what alternatives I have for studying NUMA in Java. For Java I have only used numactl, and I see performance gains with the --interleave=all flag, but I have no real control over what happens at the JVM level.
I found another tool called numastat, which reports NUMA counters such as the allocations that were a "hit" (numa_hit) or a "miss" (numa_miss) on each NUMA node. However, I'm not sure how to use it to measure these counters for my Java application. What kind of tests (and programming techniques) should I use to study the impact of NUMA on Java applications?
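To make the question concrete, here is a minimal sketch of the kind of test harness that could be run under different numactl policies. It is not SOR or LU; it uses a stand-in memory-bound kernel in which each thread repeatedly sums its own slice of a large array. The class name `NumaProbe`, the method `parallelSum`, and the default sizes are all hypothetical choices made for illustration. It assumes Java 9 or later (for `ProcessHandle`), and it prints the JVM's PID so that a per-process tool can be pointed at it.

```java
// Hypothetical harness: names and defaults are illustrative, not from any library.
public class NumaProbe {

    // Each worker sums its own contiguous slice of `data`, `passes` times.
    // Returns the sum accumulated over all passes and all workers.
    static double parallelSum(double[] data, int threads, int passes)
            throws InterruptedException {
        double[] partial = new double[threads];
        Thread[] workers = new Thread[threads];
        int chunk = (data.length + threads - 1) / threads; // ceiling division
        for (int t = 0; t < threads; t++) {
            final int id = t;
            final int from = Math.min(data.length, id * chunk);
            final int to = Math.min(data.length, from + chunk);
            workers[t] = new Thread(() -> {
                double sum = 0.0;
                for (int p = 0; p < passes; p++) {
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                }
                partial[id] = sum; // each worker writes only its own slot
            });
            workers[t].start();
        }
        double total = 0.0;
        for (int t = 0; t < threads; t++) {
            workers[t].join(); // join makes the worker's write visible here
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed defaults: 4M doubles (32 MB), one thread per available CPU, 20 passes.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1 << 22;
        int threads = args.length > 1 ? Integer.parseInt(args[1])
                                      : Runtime.getRuntime().availableProcessors();
        int passes = args.length > 2 ? Integer.parseInt(args[2]) : 20;

        double[] data = new double[n];
        java.util.Arrays.fill(data, 1.0);

        // PID of this JVM, so an external tool can be attached from another shell.
        System.out.println("pid=" + ProcessHandle.current().pid());

        parallelSum(data, threads, 2); // warm-up so the JIT compiles the kernel

        long start = System.nanoTime();
        double sum = parallelSum(data, threads, passes);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("threads=" + threads + " sum=" + sum + " ms=" + elapsedMs);
    }
}
```

One possible way to use it is to compare the reported time across policies, for example `numactl --interleave=all java NumaProbe` against `numactl --cpunodebind=0 --membind=0 java NumaProbe`. With a large enough array and pass count to keep the process alive, `numastat -p <pid>` from a second shell should show how that JVM's memory is spread over the nodes, whereas plain `numastat` reports the system-wide numa_hit and numa_miss counters rather than per-process ones.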
Thanks for your help.