Is there false sharing on sum[] in the following snippet, which computes the row-wise sums of a sparse CSR matrix? Independent threads update distinct elements of the array, and those elements could be mapped to the same cache line.
If yes, how can we avoid it, assuming sum[] is pre-allocated and cannot be redefined so that its elements map to distinct cache lines?
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    float row_sum = 0.f;
    for (int k = rowOffsets[i]; k < rowOffsets[i+1]; k++) {
        row_sum += values[k];
    }
    sum[i] = row_sum;
}
Yes, this is precisely a case of potential false sharing. However, it's really not bad here, because 1) there is only a single write to sum[] at the end of each outer iteration, and 2) the default OpenMP schedule (implementation-defined, but typically static) gives each thread one large contiguous chunk of iterations, so cache-line conflicts can only occur near the boundaries between chunks.
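If you would rather not rely on the implementation-defined default, the schedule can be spelled out. The following is only a minimal sketch: the function name `csr_row_sums_static` and its signature are assumptions made for illustration and do not come from the question.

```c
/* Hypothetical wrapper (name and signature are assumptions).
 * schedule(static) without a chunk size splits the N iterations into
 * one contiguous block per thread, so neighbouring elements of sum[]
 * are written by different threads only near the block boundaries. */
void csr_row_sums_static(int N, const int *rowOffsets,
                         const float *values, float *sum)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) {
        float row_sum = 0.f;
        for (int k = rowOffsets[i]; k < rowOffsets[i + 1]; k++) {
            row_sum += values[k];
        }
        sum[i] = row_sum;  /* single write per row */
    }
}
```

By contrast, something like schedule(static, 1) or schedule(dynamic, 1) would interleave neighbouring rows across threads and make false sharing on sum[] far more likely.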
You could reduce the false sharing further by delaying the writes to sum[] until the end of each thread's work. But frankly you don't need this kind of complication in the above case. It can help in some other cases, though.
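One way to delay the writes could look like the sketch below. It is not necessarily the exact code meant above: the function name `csr_row_sums_buffered`, the manual split of the rows into one contiguous block per thread, and the malloc'ed thread-private buffer are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (name and signature are assumptions).
 * Each thread accumulates its row sums in a private buffer and copies
 * them into the shared sum[] in one burst at the end of its work. */
void csr_row_sums_buffered(int N, const int *rowOffsets,
                           const float *values, float *sum)
{
    #pragma omp parallel
    {
        int nthreads = 1, tid = 0;
#ifdef _OPENMP
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
#endif
        /* Contiguous block of rows [begin, end) owned by this thread. */
        int begin = (int)((long long)N * tid / nthreads);
        int end   = (int)((long long)N * (tid + 1) / nthreads);
        size_t count = (size_t)(end - begin);

        /* Thread-private buffer; if the allocation fails (or count is 0
         * and malloc returns NULL), fall back to writing sum[] directly. */
        float *local = malloc(count * sizeof *local);
        float *dst = (local != NULL) ? local : sum + begin;

        for (int i = begin; i < end; i++) {
            float row_sum = 0.f;
            for (int k = rowOffsets[i]; k < rowOffsets[i + 1]; k++) {
                row_sum += values[k];
            }
            dst[i - begin] = row_sum;
        }

        if (local != NULL) {
            /* Single bulk write to the shared array per thread. */
            memcpy(sum + begin, local, count * sizeof *local);
            free(local);
        }
    }
}
```

The boundary cache lines are still shared between neighbouring threads, but each thread now touches them in a single short copy instead of at scattered moments during the computation. Compiled without OpenMP support, the pragma is ignored and the function simply runs serially.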