MPI in C, Segmentation fault: 11

I am running Mac OS X Yosemite 10.10.1 (14B25).

I have a problem with the following code, which crashes when I run it. Here it is:

#include <stdio.h>
#include <mpi.h>

#define n 3
#define repeats 1

double abs(double item)
{
    return (item > 0) ? item : -item;
}

int swap_raws (double **a, int p, int q)
{
    if (p >= 0 && p < n && q >= 0 && q < n)
    {
        if (p == q)
            return 0;    

        for (int i = 0; i < n; i++)
        {
            double temp = a[p][i];
            a[p][i] = a[q][i];
            a[q][i] = temp;
        }

        return 0;
    }
    else
        return -1;
}

double f_column (int rank, int size, double *least)
{
    double t1, t2, tbeg, tend, each_least = 1, least0;
    int map[n];
    double **a = malloc (sizeof (*a) * n);
    int i, j, k;    

    for (i = 0; i < n; i++)
        a[i] = malloc (sizeof (*a[i]) * n);    

    if (rank == 0)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                a[i][j] = 1.0 / (i + j + 1);

    MPI_Bcast (a, n * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = 0; i < n; i++)
        map[i] = i % size;

    MPI_Barrier (MPI_COMM_WORLD);

    t1 = MPI_Wtime ();

    for (k = 0; k < n - 1; k++)
    {
        double max = abs (a[k][k]);
        int column = k;

        for (j = k + 1; j < n; j++)
        {
            double absv = abs (a[k][j]);

            if (absv > max)
            {
                max = absv;
                column = j;
            }
        }

        if (map[k] == rank && column != k && swap_raws (a, k, column))
        {
            printf("ERROR SWAPPING %d and %d columns\n", k, column);
            return -1;
        }

        MPI_Bcast (&a[k], n, MPI_DOUBLE, map[k], MPI_COMM_WORLD);
        MPI_Bcast (&a[column], n, MPI_DOUBLE, map[k], MPI_COMM_WORLD);

        if (map[k] == rank)
            for (i = k + 1; i < n; i++)
                a[k][i] /= a[k][k];

        MPI_Barrier (MPI_COMM_WORLD);
        MPI_Bcast (&a[k][k+1], n - k - 1, MPI_DOUBLE, map[k], MPI_COMM_WORLD);

        for (i = k + 1; i < n; i++)
            if (map[i] == rank)
                for (j = k + 1; j < n; j++)
                    a[j][i] -= a[j][k] * a[i][j];
    }

    t2 = MPI_Wtime ();

    for (i = 0; i < n; i++)
        if (map[i] == rank)
            for (j = 0; j < n; j++)
            {
                double absv = abs (a[i][j]);

                if (each_least > absv)
                    each_least = absv;

                //printf ("a[%d][%d] = %lg\n", j, i, a[i][j]);
            }

    MPI_Reduce (&each_least, &least0, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce (&t1, &tbeg, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce (&t2, &tend, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    for (i = 0; i < n; i++)
        free (a[i]);
    free (a);

    if (rank == 0)
    {
        *least = least0;
        return (tend - tbeg);
    }
}

int main (int argc, char *argv[])
{
    int rank, size;
    double min, max, aver, least;

    if (n == 0)
        return 0;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    // It works!
    //double try = f_column_non_parallel (rank, size, &least);
    double try = f_column (rank, size, &least);
    aver = max = min = try;

    for (int i = 1; i < repeats; i++)
    {
        //double try = f_column_non_parallel (rank, size, &least);
        double try = f_column (rank, size, &least);

        if (try < min)
            min = try;
        else if (try > max)
            max = try;

        aver += try;
    }
    aver /= repeats;

    MPI_Finalize ();

    if (rank == 0)
        printf("N: %d\nMIN: %f\nMAX: %f\nAVER: %f\nLEAST: %lg\n", size, min, max, aver, least);

    return 0;
}

I use the Hilbert matrix: a[i][j] = 1 / (i + j + 1) for i, j from 0 to n - 1.

This code should compute the LU decomposition in parallel using MPI.

Process 0 initialises the array and then broadcasts it to the other processes.
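A minimal sketch of that initialisation step, assuming a single contiguous row-major buffer of n * n doubles (the layout recommended in the accepted answer) instead of the original double ** array. With such a buffer, the whole matrix can be sent with one MPI_Bcast (a, n * n, MPI_DOUBLE, 0, MPI_COMM_WORLD). The name hilbert_fill is hypothetical and not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a contiguous row-major n x n buffer with the Hilbert matrix:
   a[i*n + j] = 1 / (i + j + 1) for 0 <= i, j < n.
   hilbert_fill is a hypothetical helper, not from the original code. */
void hilbert_fill(double *a, int n)
{
    assert(a != NULL && n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[(size_t)i * n + j] = 1.0 / (i + j + 1);
}
```

Only rank 0 would call it; the other ranks receive the filled buffer through the broadcast.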

Then I find the maximum in the row and swap the corresponding columns. After that I want to broadcast the data to every process and synchronise them with MPI_Barrier (MPI_COMM_WORLD);, but the program crashes with "Segmentation fault: 11".

I don't know what happened or how to fix it. The same algorithm works in a sequential version without MPI processes, but the parallel version doesn't.
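The pivoting step described above (find the largest entry in row k, then swap columns) can be sketched on a contiguous row-major matrix as follows. This assumes rows are stored contiguously in the usual orientation, unlike the question's code, which stores the transpose and therefore swaps storage rows in swap_raws. The names pivot_column and swap_columns are hypothetical, and a complete LU with column pivoting would also have to record the column permutation, which is left out here.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Step k of column pivoting on a contiguous row-major n x n matrix:
   return the column c >= k with the largest |a[k][c]|. The strict '>'
   keeps the leftmost column on ties, like the loop in the question. */
int pivot_column(const double *a, int n, int k)
{
    assert(a != NULL && k >= 0 && k < n);
    int best = k;
    double max = fabs(a[(size_t)k * n + k]);
    for (int j = k + 1; j < n; j++) {
        double v = fabs(a[(size_t)k * n + j]);
        if (v > max) {
            max = v;
            best = j;
        }
    }
    return best;
}

/* Swap columns p and q in every row of the matrix. */
void swap_columns(double *a, int n, int p, int q)
{
    assert(a != NULL && p >= 0 && p < n && q >= 0 && q < n);
    if (p == q)
        return;
    for (int i = 0; i < n; i++) {
        double t = a[(size_t)i * n + p];
        a[(size_t)i * n + p] = a[(size_t)i * n + q];
        a[(size_t)i * n + q] = t;
    }
}
```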

If you find the solution, the example should behave like this (I calculated it by hand; you can check it too, but I am fairly sure it is correct). In the matrices below, j varies vertically and i horizontally. That is less convenient to read, but please bear with it:

1   1/2 1/3    1   1/2  1/3     1   1/2  1/3      |1   1/2  1/3  |
1/2 1/3 1/4 -> 1/2 1/12 1/12 -> 1/2 1/12 1     -> |1/2 1/12 1/12 | <- answer
1/3 1/4 1/5    1/3 1/12 4/45    1/3 1/12 1/180    |1/3 1    1/180|
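To check the hand-computed numbers, here is a sketch of a sequential in-place LU factorisation in Doolittle form (unit lower-triangular L with its multipliers stored below the diagonal, U on and above it), which matches the layout of the answer matrix above. It assumes a contiguous row-major buffer and does no pivoting; lu_inplace is a hypothetical name and is not the question's f_column.

```c
#include <assert.h>
#include <stddef.h>

/* In-place Doolittle LU (no pivoting) of a contiguous row-major n x n
   matrix. Afterwards the strictly lower part holds the multipliers of
   the unit lower-triangular L, and the upper part (with the diagonal)
   holds U. Returns -1 if a zero pivot is met during elimination. */
int lu_inplace(double *a, int n)
{
    assert(a != NULL && n > 0);
    for (int k = 0; k < n - 1; k++) {
        double pivot = a[(size_t)k * n + k];
        if (pivot == 0.0)
            return -1;
        for (int i = k + 1; i < n; i++) {
            double l = a[(size_t)i * n + k] / pivot; /* multiplier L[i][k] */
            a[(size_t)i * n + k] = l;
            /* eliminate: row i -= l * row k, right of column k */
            for (int j = k + 1; j < n; j++)
                a[(size_t)i * n + j] -= l * a[(size_t)k * n + j];
        }
    }
    return 0;
}
```

Applied to the 3 x 3 Hilbert matrix, it should reproduce the answer matrix above up to rounding: 1/2, 1/3 and 1 below the diagonal, and 1, 1/12, 1/180 on it.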

So the source matrix factors as:

    |1   0 0|   |1 1/2  1/3  |   |1   1/2 1/3|
A = |1/2 1 0| * |0 1/12 1/12 | = |1/2 1/3 1/4|
    |1/3 1 1|   |0 0    1/180|   |1/3 1/4 1/5|
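The factorisation A = L * U can also be verified numerically by multiplying the packed factors back together. A sketch, assuming the same packed row-major layout as the answer matrix (multipliers below the diagonal, U on and above it); lu_residual is a hypothetical helper that returns the largest absolute entry of L * U - A.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Multiply the unit lower-triangular L and the upper-triangular U packed
   together in lu, and return the largest |(L*U)[i][j] - a[i][j]|. */
double lu_residual(const double *lu, const double *a, int n)
{
    assert(lu != NULL && a != NULL && n > 0);
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            /* (L*U)[i][j] = sum of L[i][k] * U[k][j] for k <= min(i, j);
               L[i][i] = 1 is implied and not stored. */
            int kmax = i < j ? i : j;
            double sum = 0.0;
            for (int k = 0; k <= kmax; k++) {
                double l = (k == i) ? 1.0 : lu[(size_t)i * n + k];
                sum += l * lu[(size_t)k * n + j];
            }
            double d = fabs(sum - a[(size_t)i * n + j]);
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

With the hand-computed factors above and the 3 x 3 Hilbert matrix, the residual is at the level of rounding error; replacing 1/180 by 0 makes it about 1/180.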

Can you help me find my mistake? Thank you in advance :)

There are 2 answers

Answer by Pavan Balaji (accepted answer)

Your program has a bug in the following part of the code:

double **a = malloc (sizeof (*a) * n);
[...snip...]
MPI_Bcast (a, n * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

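You are allocating 'n' pointers in "a", not an 'n * n' array of doubles. So when you do an 'n * n' size MPI_Bcast of "a", you are asking MPI to transfer data from and to memory that was never allocated for it: the root reads past the end of the pointer array, and on the other ranks the broadcast overwrites the row pointers themselves and then writes past the end of the array. This memory corruption is what causes the segfault.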

You can change "a" to simply "double *" instead of "double **" and allocate 'n * n' doubles in there to fix this issue.
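A variant of this fix that keeps the a[i][j] syntax is to allocate the n * n doubles as one contiguous block and point an array of row pointers into it. The whole matrix is then broadcast with MPI_Bcast (a[0], n * n, MPI_DOUBLE, ...), and a single row with MPI_Bcast (a[k], n, ...). Note that the question's MPI_Bcast (&a[k], n, ...) has the same kind of bug: &a[k] is the address of the k-th pointer, not of row k. The row pointers are local to each process; only the data block is sent. The names matrix_alloc and matrix_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an n x n matrix as ONE contiguous, zeroed block of n*n doubles
   plus an array of row pointers into it, so a[i][j] still works and a[0]
   is a valid n*n buffer for MPI. Returns NULL on allocation failure. */
double **matrix_alloc(int n)
{
    assert(n > 0);
    double **a = malloc(sizeof(double *) * (size_t)n);
    double *block = calloc((size_t)n * n, sizeof(double));
    if (a == NULL || block == NULL) {
        free(a);
        free(block);
        return NULL;
    }
    for (int i = 0; i < n; i++)
        a[i] = block + (size_t)i * n; /* row i starts i*n doubles in */
    return a;
}

/* Free a matrix created by matrix_alloc: the data block, then the pointers. */
void matrix_free(double **a)
{
    if (a != NULL) {
        free(a[0]);
        free(a);
    }
}
```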

Answer by xbug

What grieves me the most is that f_column() is supposed to return a double, but the return value is undefined when rank != 0.
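One way to make the return value defined on every rank is to give the non-root path an explicit return. A sketch of the tail of f_column() pulled out into a hypothetical helper, finish_f_column, assuming non-root ranks only need some defined value; if every rank needed the real timing, rank 0 could broadcast it with MPI_Bcast instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tail of f_column(): only rank 0 holds the reduced results,
   but every rank must return a defined double. Non-root ranks return 0.0
   and leave *least untouched. */
double finish_f_column(int rank, double tbeg, double tend,
                       double least0, double *least)
{
    if (rank == 0) {
        assert(least != NULL);
        *least = least0;
        return tend - tbeg;
    }
    return 0.0; /* defined value on non-root ranks */
}
```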

This comment caught my attention:

// It works!
//double try = f_column_non_parallel (rank, size, &least);
double try = f_column (rank, size, &least);

It suggests that the previous version of f_column() was working, and that you ran into troubles when attempting to parallelize it (I'm guessing that's what you're doing now).

How this could lead to a segfault is not immediately apparent to me though. I'd expect a floating point exception.

A couple of other points:

  • I'm not too comfortable with your memory allocation code (I'd probably use calloc() instead of malloc(), and sizeof() on explicit data types, etc...); it just freaks me out to see things like a[i] = malloc(sizeof (*a[i]) * n);, but it's just a matter of style, really.

  • You appear to have proper bounds checking (indices into a are always non-negative and < n).

  • Oh, and you're redefining abs(), which is probably not a good idea.

  • Try to compile your code in debug mode and run it with gdb; also run it through valgrind if you can (Mac OS X should be supported by now).

  • You should probably take a closer look at your compiler warnings ;-)
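On the abs() point: abs() is already a standard library function, int abs(int), declared in <stdlib.h>, so defining your own double abs(double) clashes with it; <math.h> provides fabs() for doubles. A sketch of the each_least loop written with fabs(), assuming the values sit in one contiguous array; min_abs is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Smallest absolute value among len doubles, using the standard fabs()
   rather than a home-made abs(). */
double min_abs(const double *v, size_t len)
{
    assert(v != NULL && len > 0);
    double best = fabs(v[0]);
    for (size_t i = 1; i < len; i++) {
        double x = fabs(v[i]);
        if (x < best)
            best = x;
    }
    return best;
}
```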