I have two physical, "identical" Linux RedHat servers. I ran a small program on both of them. My problem: the CPU usage of my program varies between both servers. I am not a Linux expert. I am wondering what could lead to that performance difference?
I wrote the program in C++ and in java to see if the inconsistency comes from the programming language chosen. The program itself does a little bit of integer calculation over time to consume a constant amount of CPU time. Both program versions have the same percentual CPU usage difference.
The environmental variables I have already thought of and could be excluded:
- identical server type
- identical processor (both have two sockets, single core)
- both Intel Hyper-Threading-Technology enabled
- identical clock speed
- identical OS version (Red Hat Enterprise Linux Server release 5.9)
- identical Java version, Java RE, JVM
- Intel Demand based Switching can be ignored since the measurement tool uses the default value of clock speed for CPU capacity
- processor affinity can be excluded as well I think. I ran multiple measurement series and I always retrieve exactly the same CPU usage values.
Is there maybe a C library or something like that, that has an impact on the CPU usage of C++ and Java programs which needs to be updated separately from the actual OS version? Or could there be a different thread scheduler?
There are a variety of things that can differ even for "identical" systems. Different compilers being used to build various libraries, as well as different versions of compilers. For example, there are continuous improvements from generation to generation of the ability of Intel compilers to optimize. Other differences can occur due to airflow differences causing one machine to run hotter than the other resulting in a drop in frequency occasionally. There are a whole host of other issues that can cause identical systems to run differently.
Here's my recommendation: Create an OS image and use that same image for both systems. Disconnect both from any network. Run compute bound (which you are). Bind your app to a certain core. Verify the exit air temperatures are well within specification. Disable any turbo capability. If there are still differences, do a memory speed check.
Also, use a more sophisticated profiling and analysis tool such as Intel Vtune. You can dig into actual cycles, measure cache misses, branch mispredicts, etc. They should also be identical. If they aren't, the analysis should give you an idea of where the problem lies.