I am trying to parallelize this piece of C code with OpenMP:
    for (i = 0; i < openSetSize; i++) {
        tmpF = arrayCells[openSet[i]].f;
        if (tmpF <= arrayCells[openSet[best]].f && tmpF <= arrayCells[bestPath[0]].f) {
            isThereBest = true;
            best = i;
        }
    }
I tried this way:
    #pragma omp parallel
    {
        int best_private = 0;
        #pragma omp for nowait
        for (int i = 0; i < openSetSize; i++) {
            double tmpF = arrayCells[openSet[i]].f;
            if (tmpF <= arrayCells[openSet[best_private]].f && tmpF <= arrayCells[bestPath[0]].f) {
                isThereBest = true;
                best_private = i;
            }
        }
        #pragma omp critical
        {
            if (best_private > best) {
                best = best_private;
            }
        }
    }
but the performance is not satisfactory at all (the OpenMP version takes much longer than the serial one).
Does anyone have better hints? Or do you know where I went wrong? Thank you so much.
best_private scalar at the same time. Isn't there some special OMP code to signal that (and get round it)?
best_private is declared inside the omp parallel construct, so every thread will have its own separate variable of that name, i.e. the variable is private to every thread. Therefore, it should not be a problem if several threads write to best_private at the same time, because they will not be writing to the same variable. However, in contrast to best_private, the variable isThereBest does seem to be shared between all threads, which could cause thread contention when several threads write to it at the same time.
best_private a while after I posted that comment; also about the isThereBest. But, even within that OMP block (and the critical part), there will be overheads in resolving the contention; those overheads are likely to wipe out any speed-up due to parallel running of such a 'simple' loop.
The omp critical construct is executed outside the loop, so I don't think it should be a problem. It will only require every thread to acquire the mutex a single time.
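For what it's worth, here is a minimal sketch of how the points raised in these comments could be combined: each thread keeps its own candidate index and its own f value, nothing shared is written inside the loop, and the per-thread results are merged once in the critical section by comparing f values rather than indices. The names Cell, find_best and threshold are hypothetical and not from the question; threshold stands in for arrayCells[bestPath[0]].f, and the sketch assumes the serial loop is meant to pick the smallest f, with ties going to the largest index.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the question's cell type; only f is used here. */
typedef struct { double f; } Cell;

/* Sketch: returns true if some open-set entry has f <= threshold.
   On success, *best receives the index into openSet of the smallest f
   (ties go to the largest index, mirroring the <= in the serial loop).
   On failure, *best is left untouched. */
static bool find_best(const Cell *arrayCells, const int *openSet,
                      int openSetSize, double threshold, int *best)
{
    bool found = false;   /* shared, but only written inside the critical section */
    int best_idx = -1;
    double best_f = 0.0;

    #pragma omp parallel
    {
        /* Declared inside the parallel region, so private to each thread. */
        int local_idx = -1;
        double local_f = 0.0;

        #pragma omp for nowait
        for (int i = 0; i < openSetSize; i++) {
            double tmpF = arrayCells[openSet[i]].f;
            if (tmpF > threshold)
                continue;
            if (local_idx < 0 || tmpF < local_f ||
                (tmpF == local_f && i > local_idx)) {
                local_idx = i;
                local_f = tmpF;
            }
        }

        /* Each thread enters this once, after finishing its share of the loop. */
        #pragma omp critical
        {
            if (local_idx >= 0 &&
                (!found || local_f < best_f ||
                 (local_f == best_f && local_idx > best_idx))) {
                found = true;
                best_idx = local_idx;
                best_f = local_f;
            }
        }
    }

    if (found)
        *best = best_idx;
    return found;
}
```

A call might look like isThereBest = find_best(arrayCells, openSet, openSetSize, arrayCells[bestPath[0]].f, &best);. This only removes the shared write and the index comparison; as noted in the comments, for a loop this simple the cost of starting and synchronizing the threads may still outweigh any gain unless openSetSize is large, so it is worth timing both versions.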