I am new to parallel computing, so I decided to start with a hello world program compiled against MPICH2. Here is the code:
/* helloworld.c */
#include <stdio.h>
/* You MUST include this for the MPI_* functions */
#include "mpi.h"
int main(int argc, char **argv) {
    int rank;
    char host[150];
    int namelen;

    /* Initialize MPI. This handles mpich-specific command line arguments */
    MPI_Init(&argc, &argv);

    /* Get my rank. My rank number gets stored in the 'rank' variable */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Look up what computer I am running on. Store it in 'host' */
    MPI_Get_processor_name(host, &namelen);

    printf("Hello world (Rank: %d / Host: %s)\n", rank, host);
    fflush(stdout);

    /* Finalize: close connections to the other children, clean up memory
     * the MPI library has allocated, etc. */
    MPI_Finalize();
    return 0;
}
I compile and run it like this:
mpicc helloworld.c -o myhello
mpirun -np 2 ./myhello
It works. However, I noticed that increasing the number of processes increases the wall-clock time, when I expected it to decrease. Moreover, there seems to be no limit on the number of processes: my laptop has 5 cores, yet I can request as many processes as I want, and I expected an error if I exceeded the number of physical CPUs.
First off, in theory parallel processing should work the way you expect: more processors, less wall-clock time. In practice, however, it is a completely different story.
Without knowing anything about your project, I would guess that the program has little parallelizable work and/or is so small and fast that the message passing it has to do actually slows it down. Don't forget that it's not just your program running; each core also has plenty of other processes running in the background.
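If you want to see that overhead for yourself, you can bracket the work with MPI_Wtime. This is only a rough sketch; the empty "work" section stands in for whatever each rank actually does, and for a hello world it is essentially nothing, so startup and communication cost dominate:

/* timing.c - sketch: measure per-run wall-clock time with MPI_Wtime */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv) {
    int rank;
    double start, end;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);   /* line all ranks up before timing */
    start = MPI_Wtime();

    /* ... the actual work would go here ... */

    MPI_Barrier(MPI_COMM_WORLD);   /* wait until every rank is done */
    end = MPI_Wtime();

    if (rank == 0)
        printf("Elapsed: %f seconds\n", end - start);

    MPI_Finalize();
    return 0;
}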
What I would advise is trying something that can really show the usefulness of a parallel process, for example splitting a (very large) array into segments and computing each segment on a different processor, or reading different text files at the same time and doing some work on each of them. A sketch of the array idea is shown below.
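Here is one way that array idea could look with MPI_Scatter and MPI_Reduce. The size N and the all-ones data are made up for illustration, and it assumes N divides evenly by the number of processes:

/* scatter_sum.c - sketch: split an array across ranks and sum the pieces */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define N 100000000   /* big enough that the work outweighs the messaging */

int main(int argc, char **argv) {
    int rank, size;
    double *data = NULL, *chunk, local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = N / size;                 /* assumes N divides evenly */
    chunk = malloc(n * sizeof(double));

    if (rank == 0) {                  /* root fills the full array */
        data = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) data[i] = 1.0;
    }

    /* hand each rank its own segment of the array */
    MPI_Scatter(data, n, MPI_DOUBLE, chunk, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < n; i++)       /* each rank works only on its segment */
        local += chunk[i];

    /* combine the partial sums back on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("Total: %f\n", total);

    free(chunk);
    if (rank == 0) free(data);
    MPI_Finalize();
    return 0;
}

With something like this, the per-rank loop is long enough that adding processes (up to the number of physical cores) should actually reduce the wall-clock time.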
As a final note, you should really look into Amdahl's law, which describes how much speedup you can expect from parallelizing a program.
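Roughly, if a fraction p of the program can be parallelized and you run it on n processors, the best speedup you can hope for is

speedup = 1 / ((1 - p) + p / n)

For a hello world, p is essentially 0, so adding processes can only add overhead.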