I am doing performance analysis on Linux for large-scale, memory-intensive programs (tens of gigabytes of memory).
I am wondering whether it is possible to configure Linux or the hardware to better suit this kind of large program, but I am not familiar with this area.
Does anybody have pointers on how to configure:
- the OS memory allocation strategy
- the CPU cache configuration
- anything else
Any comments are appreciated.
This is the typical CPU model (4 Opteron processors, each dual-core):
processor : 3
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2218
stepping : 2
cpu MHz : 2600.000
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5200.09
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
Useful for investigating memory / caching on a multi-socket system:
- hwloc's lstopo
- numactl / libnuma (but only if it really is a NUMA system)
- sysfs, procfs:
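As a rough illustration of the sysfs / procfs route, here is a minimal Python sketch that reads the CPU list of each NUMA node, the cache details of one CPU, and a few fields from /proc/meminfo. The helper names (`read_text`, `numa_nodes`, `cpu_caches`, `meminfo`) and the `sys_root` / `proc_root` parameters are made up for this example. The paths are the usual Linux ones, but some may be missing on a kernel built without NUMA support or on older kernels, in which case the functions simply return empty results.

```python
import glob
import os


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def numa_nodes(sys_root="/sys"):
    """Map each NUMA node name (e.g. 'node0') to its CPU list string."""
    pattern = os.path.join(sys_root, "devices/system/node/node[0-9]*")
    return {
        os.path.basename(node_dir): read_text(os.path.join(node_dir, "cpulist"))
        for node_dir in sorted(glob.glob(pattern))
    }


def cpu_caches(cpu=0, sys_root="/sys"):
    """Return (level, type, size, shared_cpu_list) for each cache of one CPU."""
    pattern = os.path.join(
        sys_root, f"devices/system/cpu/cpu{cpu}/cache/index[0-9]*"
    )
    fields = ("level", "type", "size", "shared_cpu_list")
    return [
        tuple(read_text(os.path.join(index_dir, name)) for name in fields)
        for index_dir in sorted(glob.glob(pattern))
    ]


def meminfo(keys=("MemTotal", "MemFree", "HugePages_Total", "Hugepagesize"),
            proc_root="/proc"):
    """Return the selected 'Key: value' entries from meminfo as a dict."""
    text = read_text(os.path.join(proc_root, "meminfo")) or ""
    selected = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in keys:
            selected[key] = value.strip()
    return selected


if __name__ == "__main__":
    print("NUMA nodes:", numa_nodes())
    print("cpu0 caches:", cpu_caches(0))
    print("meminfo:", meminfo())
```

If `numa_nodes()` reports only one node (or none), the machine is not exposed to the kernel as NUMA, and numactl / libnuma tuning is unlikely to help.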