I'm working on code that runs on the Epiphany processor (http://www.parallella.org/), and the host-side program needs sudo privileges to run Epiphany code. There is no escape from sudo!
Now I need to run this code across several nodes. To do that I'm using MPI, but MPI won't function properly with sudo:
#sudo mpirun -n 12 --hostfile hosts -x LD_LIBRARY_PATH=${ELIBS} -x EPIPHANY_HDF=${EHDF} ./hello-mpi.elf
Even a simple program that only does node communication does not work. Every rank comes up as 0 if I use sudo. Communication between threads works, but not across nodes. This is important because I want to divide the workload properly across the cards.
Here is the simple code:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("Hello World from MPI Process %d on machine %s\n", rank, processor_name);
    MPI_Finalize();
    return 0;
}
This code should print a different rank number on each node, but it does not when run with sudo.
Any help on this would be great.
Here is the output from running the above code without sudo.
mpirun -n 3 --hostfile $MPI_HOSTS ./mpitest
output:
Hello world from processor work1, rank 1 out of 3 processors
Hello world from processor command, rank 0 out of 3 processors
Hello world from processor work2, rank 2 out of 3 processors
This is as expected.
Here is the output from running the above code with sudo.
sudo mpirun -n 3 --hostfile $MPI_HOSTS ./mpitest
output:
Hello world from processor command, rank 0 out of 1 processors
Hello world from processor work1, rank 0 out of 1 processors
Hello world from processor work2, rank 0 out of 1 processors
This is not.
Edit:-
I think @Hristo Iliev got the right answer but I'm not going to be able to test this out
Short answer: instead of running sudo mpirun -n 12 ... ./hello-mpi.elf, have mpirun launch the executable through sudo -E, so that sudo wraps the program on each node rather than mpirun itself.
For that to work properly, you have to modify the sudo configuration (via visudo) on all hosts and enable passwordless operation for your user. The entry should allow your user to run sudo without first authenticating, which is important since only the standard input of rank 0 is redirected. It should also allow you to execute sudo with the -E option so that it passes the special Open MPI variables (OMPI_...) to the executable (without those variables in the environment, the executables cannot connect to each other and instead run as singletons).
Long answer: Running mpirun with sudo results in the former being executed with effective user root. The way mpirun creates an MPI job is by first launching the requested number of executables and then waiting for them to get to know each other during the MPI_Init call. Depending on the content of the host list file, mpirun either spawns a child process (for host entries that match the host mpirun is executed on) or starts a process remotely using rsh, ssh or some other mechanism (e.g. many cluster resource management systems have their own mechanisms for that). When the rsh/ssh mechanism is used, since the program runs as root, mpirun attempts to log into the other host(s) as root. This usually fails for one or both of two reasons:
That's why you see rank 0 coming up (it's a local fork()-based spawn) and the other ranks missing. Since enabling remote root login is considered a security risk by many, I would rather go the way described in the short answer.
Another option would be to make hello-mpi.elf owned by root and set the Set UID bit via chmod u+s hello-mpi.elf. Then you won't need sudo at all. This will not work if the filesystem is mounted with the nosuid option or if some other security mechanism is active. Also, root-owned suid binaries pose security risks since they always execute with root permissions, no matter what user runs them.
I wonder why you need root permissions in order to talk to the Epiphany board. Is the SDK doing some fancy privileged operations or is it simply accessing a device file in /dev that is only writeable by root? If it's the latter, perhaps the device node could be created with different permissions.
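As a rough sketch of the short answer above: the idea is that mpirun runs as the normal user and sudo -E wraps only the executable. The rank count, hostfile name and executable are taken from the question's command line; the launch helper, the RUN dry-run switch and the youruser placeholder are made up for illustration, and the sudoers line in the comment is one plausible form of the passwordless entry, not necessarily the exact one intended.

```shell
#!/bin/sh
# Sketch only: sudo wraps the executable on every node, while mpirun itself
# runs as the normal user.
#
# Assumed sudoers entry, added with visudo on every host ("youruser" is a
# placeholder). NOPASSWD skips the password prompt, SETENV permits sudo -E:
#   youruser ALL=(ALL) NOPASSWD: SETENV: ALL

# Dry run by default: with RUN unset, the command is printed instead of
# executed. Start the script with RUN= (empty) to launch for real.
RUN=${RUN-echo}

launch() {
    # $1 = number of ranks, $2 = hostfile, $3 = executable (all hypothetical
    # parameter names for this sketch)
    $RUN mpirun -n "$1" --hostfile "$2" sudo -E "$3"
}

launch 12 hosts ./hello-mpi.elf
```

The -x options from the question's command line are left out here for brevity. The dry-run default makes it easy to check the generated command before anything is executed as root.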