I have a C code that solves the Traveling Salesman Problem using a greedy algorithm. However, the current implementation is sequential, and I'd like to parallelize it using OpenMP to achieve better performance.
Here is the existing code:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <math.h>
#define N 5000
#define LARGE (2 * pow(N * 10, 2))
int X[N];
int Y[N];
float distance[N][N + 1];
int main(int na, char * arg[]) {
assert(na == 2);
printf("Dimension %s\n", arg[1]);
int nn = atoi(arg[1]);
assert(nn <= N);
for (int i = 0; i < nn; i++)
X[i] = rand() % (nn * 10);
for (int i = 0; i < nn; i++)
Y[i] = rand() % (nn * 10);
for (int i = 0; i < nn; i++)
distance[i][i] = 0;
for (int i = 0; i < nn; i++)
for (int j = i + 1; j < nn; j++)
distance[i][j] = distance[j][i] = sqrt(pow(X[i] - X[j], 2) + pow(Y[i] - Y[j], 2));
float best = LARGE;
int good[nn];
for (int first = 0; first < nn; first++) {
float dist = 0;
int path[nn];
for (int i = 0; i < nn; i++)
path[i] = -1;
path[first] = 0;
int current = first;
for (int i = 1; i < nn; i++) {
float dmin = LARGE;
int index = 0;
for (int j = 0; j < nn; j++) {
if (path[j] == -1 && current != j && distance[current][j] < dmin) {
dmin = distance[current][j];
index = j;
}
}
current = index;
path[current] = i;
dist += dmin;
if (dist >= best) {
dist = 0;
break;
}
}
if (dist) {
float dmin = distance[current][first];
dist += dmin;
if (dist < best) {
for (int i = 0; i < nn; i++)
good[path[i]] = i;
best = dist;
}
distance[first][nn] = dist;
}
}
printf("Solution :\n");
for (int i = 0; i < nn; i++)
printf("%d\n", good[i]);
printf("Distance %g == %g\n", best, distance[good[0]][nn]);
exit(0);
}
I'm looking for guidance on how to effectively parallelize this code using OpenMP to achieve the best possible speedup. Are there specific sections of the code that are more amenable to parallelization? What OpenMP constructs should I consider using, and are there any potential pitfalls I should be aware of?
Any insights, code snippets, or recommendations on how to approach parallelizing this code would be greatly appreciated. Thank you!
If your objective is to parallelize your code using OpenMP and you prefer not to alter the algorithm, then you must utilize either a user-defined reduction or a manual reduction. Below, I've provided an example in response to your request in the comments using manual reduction.