MPI Hostfiles with Hyperthreading

I'm running some small MPI jobs across nodes in a computer lab at my university. There's no queuing system installed, so I have to generate MPI hostfiles myself each time I want to run a job, then run them like so:

mpirun --hostfile mpi_hostfile -n 32 ./mpi_program

I use Open MPI, so right now my hostfiles look something like this:

localhost slots=4
hydra13 slots=4
hydra14 slots=4
hydra2 slots=4
hydra22 slots=4
hydra24 slots=4
hydra26 slots=4
hydra1 slots=4

My question is this: each of the nodes has an Intel® Core™ i7-3770 processor, which is quad-core, but also hyper-threaded. What's considered best practice for Open MPI hostfiles where hyperthreading is concerned? Should I list four or eight slots for each node?

Thanks.

There are 2 answers

1
Wesley Bland On

It depends on your usage. You'll probably want to experiment with several configurations. If you are using MPI+OpenMP (I'm assuming you meant OpenMP, the threading library, and not Open MPI, the MPI library, even though your question is tagged OpenMPI), the usual practice is to run one MPI process per node and one OpenMP thread per core. I'm not sure how hyperthreading weighs in here, but that's the common starting point.
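As a rough sketch of that hybrid layout, the script below assembles an mpirun command that places one rank on each node and gives each rank four OpenMP threads. The node count (8) and core count (4) are taken from the question; that ./mpi_program is built with OpenMP is an assumption. The script only prints the command so it can be inspected before launching.

```shell
#!/bin/sh
# Hypothetical hybrid launch: one MPI rank per node, one OpenMP thread per core.
# Assumptions: 8 nodes listed in mpi_hostfile, 4 physical cores per node,
# and ./mpi_program compiled with OpenMP support.
NODES=8
THREADS_PER_RANK=4

# --map-by ppr:1:node      place exactly one rank on each node
# --bind-to none           do not pin the rank to one core, so its threads can spread
# -x OMP_NUM_THREADS=...   export the thread count to every rank
CMD="mpirun --hostfile mpi_hostfile -n $NODES --map-by ppr:1:node --bind-to none -x OMP_NUM_THREADS=$THREADS_PER_RANK ./mpi_program"

# Print the command instead of running it (dry run).
echo "$CMD"
```

Setting THREADS_PER_RANK to 8 instead of 4 would be the way to try the hyper-threaded variant of the same layout.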

If you did mean Open MPI everywhere you mentioned OpenMP, then it's different: when only MPI processes are involved, people usually run one MPI process per core.

In the end, you'll need to test out your application with a range of setups and see which is fastest for your machines and your application. There is no silver bullet.
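One way to set up such a comparison is to generate one hostfile per slot count and time the same program against each. The sketch below does this for 4 and 8 slots using the node names from the question; the make_hostfile helper and the hostfile_slotsN file names are made up for illustration. It prints the timing commands rather than running them.

```shell
#!/bin/sh
# make_hostfile SLOTS NODE...
# Prints one "node slots=SLOTS" line per node, in Open MPI hostfile format.
make_hostfile() {
    slots=$1
    shift
    for node in "$@"; do
        printf '%s slots=%s\n' "$node" "$slots"
    done
}

# Node names taken from the question; adjust to the machines you have.
NODES="localhost hydra13 hydra14 hydra2 hydra22 hydra24 hydra26 hydra1"
n_nodes=$(echo $NODES | wc -w)

for slots in 4 8; do
    # $NODES is left unquoted on purpose so each name becomes its own argument.
    make_hostfile "$slots" $NODES > "hostfile_slots$slots"
    np=$((slots * n_nodes))
    # Print the command to time for this configuration (dry run).
    echo "time mpirun --hostfile hostfile_slots$slots -n $np ./mpi_program"
done
```

Running each printed command a few times and comparing wall-clock times shows whether the extra four slots per node help or hurt for a given program.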

0
Maxim Masiutin On

You can pass the --use-hwthread-cpus command-line option to mpirun.

With this option, Open MPI treats each hardware thread provided by hyperthreading as a processor. Without it, Open MPI treats each CPU core as a processor, which is the default behavior.

For example, on Xeon Phi (Knights Landing microarchitecture), each core has four hardware threads instead of two. Therefore, if you run Open MPI on Xeon Phi with --use-hwthread-cpus, it will count four Open MPI processors for each core.

When using this option, Open MPI refers to the threads provided by Hyper-Threading as "hardware threads". With this technique, you will not oversubscribe, and if some Open MPI processes run in a virtual machine, Open MPI will use the number of threads actually assigned to that virtual machine.
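To make the two counting rules concrete, here is a small sketch for a single i7-3770 node (4 cores, 2 hardware threads per core, as in the question). The processors_per_node helper is hypothetical and only does the arithmetic; the script prints the matching mpirun commands rather than running them.

```shell
#!/bin/sh
# processors_per_node CORES THREADS_PER_CORE USE_HWTHREADS
# Prints how many processors a node offers: cores by default,
# or cores * threads-per-core when hardware threads are counted.
processors_per_node() {
    cores=$1
    threads_per_core=$2
    use_hwthreads=$3
    if [ "$use_hwthreads" = "yes" ]; then
        echo $((cores * threads_per_core))
    else
        echo "$cores"
    fi
}

# i7-3770 from the question: 4 cores, 2 hardware threads per core.
default_np=$(processors_per_node 4 2 no)
hwthread_np=$(processors_per_node 4 2 yes)

# Dry run: show the process count that fits one node under each rule.
echo "mpirun -n $default_np ./mpi_program"
echo "mpirun --use-hwthread-cpus -n $hwthread_np ./mpi_program"
```

Whether the eight-process run is actually faster than the four-process run still depends on the application, so it is worth timing both.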