I want to use IPython's MPI abilities for distributed computing. Specifically, I would like MPI to be run with a machinefile so I can add multiple machines.
EDIT:
I forgot to include my configuration.
Configuration
~/.ipython/profile_default/ipcluster_config.py
# The command line arguments to pass to mpiexec.
c.MPILauncher.mpi_args = ["-machinefile ~/.ipython/profile_default/machinefile"]
# The mpiexec command to use in starting the process.
c.MPILauncher.mpi_cmd = ['mpiexec']
Bash Execution
$ dacluster start -n20
2015-06-10 16:16:46.661 [IPClusterStart] Starting ipcluster with [daemon=False]
2015-06-10 16:16:46.661 [IPClusterStart] Creating pid file: /home/aidan/.ipython/profile_default/pid/ipcluster.pid
2015-06-10 16:16:46.662 [IPClusterStart] Starting Controller with MPI
2015-06-10 16:16:46.700 [IPClusterStart] ERROR | IPython cluster: stopping
2015-06-10 16:16:47.667 [IPClusterStart] Starting 20 Engines with MPIEngineSetLauncher
2015-06-10 16:16:49.701 [IPClusterStart] Removing pid file: /home/aidan/.ipython/profile_default/pid/ipcluster.pid
Machinefile
~/.ipython/profile_default/machinefile
localhost slots=8
aidan-slave slots=16
I should mention that it works when I run
mpiexec -machinefile machinefile mpi_hello
and the output of that execution includes the hostnames of both machines, so I am sure it is actually distributing. I also confirmed this by watching the processes in top.
Thank you,
I guess I asked too soon. The problem was in the mpi_args line of my ipcluster_config.py shown above: the arguments should have been split on the spaces into separate list elements, with an absolute path to the machinefile instead of ~.
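For reference, the corrected line would look something like this (the absolute path is taken from the log output above; adjust it for your own home directory):

```python
# ~/.ipython/profile_default/ipcluster_config.py
# One list element per argument, and an absolute path instead of ~,
# since the launcher does not pass the arguments through a shell.
c.MPILauncher.mpi_args = ["-machinefile",
                          "/home/aidan/.ipython/profile_default/machinefile"]
```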
I hope this can help someone. Note that this solves only the problem in the bash output. With the fix, the MPI connection to the remote server (namely aidan-slave) is made: if I start the dacluster, I see a bunch of Python sessions start in top, symptomatic of an IPython session running remotely.
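The failure mode can be demonstrated without mpiexec at all. This is a minimal sketch (the raw string is the one from my original config) showing how to turn a single command-line string into a proper argument list; shlex and os.path.expanduser here are illustrative, not what the launcher itself uses:

```python
import os
import shlex

# The original single-string value from ipcluster_config.py. The launcher
# passes each list element to mpiexec as one argv entry, so mpiexec would
# receive this entire string as a single, meaningless token.
raw = "-machinefile ~/.ipython/profile_default/machinefile"

# Split on whitespace and expand ~ to an absolute path, which is what the
# fixed config does by hand.
args = [os.path.expanduser(tok) for tok in shlex.split(raw)]
print(args)  # ["-machinefile", "/home/<user>/.ipython/profile_default/machinefile"]
```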
Unfortunately, the DistArray examples, at least pi_montecarlo, hang indefinitely. I traced the issue back and found that the hang occurs at line 736 of context.py in DistArray's globalapi module.
I think this is a symptom of a broken or bad MPI connection, because that line seems to execute a command on all the slave processes. I don't know how to fix it.