I'm running some small MPI jobs across nodes in a computer lab at my university. There's no queuing system installed, so I have to generate an MPI hostfile myself each time I want to run a job, then launch it like so:
mpirun --hostfile mpi_hostfile -n 32 ./mpi_program
I use Open MPI, so right now my hostfiles look something like this:
localhost slots=4
hydra13 slots=4
hydra14 slots=4
hydra2 slots=4
hydra22 slots=4
hydra24 slots=4
hydra26 slots=4
hydra1 slots=4
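For reference, I generate the hostfile with a small script along these lines (node names are just the examples above; the ping check is how I skip machines that are switched off):

```shell
# Write a fresh Open MPI hostfile: localhost first, then every lab node
# that answers a ping, each with 4 slots (one per physical core).
printf 'localhost slots=4\n' > mpi_hostfile
for node in hydra1 hydra2 hydra13 hydra14 hydra22 hydra24 hydra26; do
    if ping -c1 -W1 "$node" >/dev/null 2>&1; then
        printf '%s slots=4\n' "$node" >> mpi_hostfile
    fi
done
```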
My question is this: each of the nodes has an Intel® Core™ i7-3770 processor, which is quad-core, but also hyper-threaded. What's considered best practice for Open MPI hostfiles where hyperthreading is concerned? Should I list four or eight slots for each node?
Thanks.
It depends on your usage. You'll probably want to experiment with several configurations, but the common practice for MPI+OpenMP (I'm assuming you meant OpenMP, the threading library, not Open MPI, the MPI library, even though your question is tagged OpenMPI) is to run one MPI process per node and one OpenMP thread per physical core. I'm not sure how hyperthreading weighs in here, but that's the usual starting point.
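For the hybrid MPI+OpenMP case on your eight nodes, the launch might look like this. This is a sketch, not a definitive recipe: the `--map-by ppr:1:node:pe=4` and `--bind-to core` options assume a reasonably recent Open MPI (1.8 or newer), and the program name is yours from above.

```shell
# Hybrid sketch: one MPI rank per node (8 nodes -> -n 8), four OpenMP
# threads per rank, each rank bound to four physical cores on its node.
export OMP_NUM_THREADS=4
mpirun --hostfile mpi_hostfile --map-by ppr:1:node:pe=4 --bind-to core -n 8 ./mpi_program
```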
If, indeed, you meant Open MPI everywhere you wrote OpenMP, then the answer is different. With pure MPI (no threading inside the ranks), people usually run one MPI process per physical core.
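For pure MPI, here is a sketch of both options you asked about, assuming a reasonably recent Open MPI; with eight quad-core nodes that's 32 ranks on physical cores or 64 on hardware threads:

```shell
# One rank per physical core: keep slots=4 per node in the hostfile
# and bind each rank to its own core.
mpirun --hostfile mpi_hostfile --bind-to core -n 32 ./mpi_program

# To experiment with hyperthreads: either list slots=8 per node, or keep
# the file as-is and tell Open MPI to count hardware threads as slots.
mpirun --hostfile mpi_hostfile --use-hwthread-cpus --bind-to hwthread -n 64 ./mpi_program
```

Whether the hyperthreaded run helps depends heavily on how memory-bound your code is, which is exactly what the benchmarking below will tell you.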
In the end, you'll need to benchmark your application across a range of setups and see which is fastest on your machines. There is no silver bullet.