3

I am trying to parallelize this piece of C code with OpenMP:

for (i = 0; i < openSetSize; i++) {
    tmpF = arrayCells[openSet[i]].f;
    if (tmpF <= arrayCells[openSet[best]].f && tmpF <= arrayCells[bestPath[0]].f){
        isThereBest = true;
        best = i;
    }
}

I tried this way:

#pragma omp parallel {
            
    int best_private = 0;
    #pragma omp for nowait
    for (int i = 0; i < openSetSize; i++) {
        double tmpF = arrayCells[openSet[i]].f;
        if (tmpF <= arrayCells[openSet[best_private]].f && tmpF <= arrayCells[bestPath[0]].f) {
            isThereBest = true;
            best_private = i;
        }
    }
            
    #pragma omp critical
    {
        if(best_private > best){
            best = best_private;
        }
    }
}

but the performance are not satisfactory at all (much more time spent with the omp version).

Does anyone have better hints? Or do know where I am wrong? Thank you so much

5
  • As far as i can tell, you are not doing much in your loop. Depending on openSetSize, the overhead of creating and maintaining multiple threads might outweigh the benefit of parallelizing the loop. Commented May 20, 2021 at 19:04
  • Different 'threads' would be attempting to access the best_private scalar at the same time. Isn't there some special OMP code to signal that (and get round it)? Commented May 20, 2021 at 19:05
  • @AdrianMole: Since best_private is declared inside the omp parallel construct, every thread will have its own separate variable of that name, i.e. the variable is private to every thread. Therefore, it should not be a problem if several threads write to best_private at the same time, because they will not be writing to the same variable. However, in contrast to best_private, the variable isThereBest does seem to be shared between all threads, which could cause thread contention when several threads write to it at the same time. Commented May 20, 2021 at 20:02
  • @AndreasWenzel Yeah - I realized about the best_private a while after I posted that comment; also about the isThereBest. But, even within that OMP block (and the critical part), there will be overheads in resolving the contention; those overheads are likely to wipe out any speed-up due to parallel running of such a 'simple' loop. Commented May 20, 2021 at 20:05
  • @AdrianMole: The omp critical construct is executed outside the loop, so I don't think it should be a problem. It will only require every thread to acquire the mutex a single time. Commented May 20, 2021 at 20:11

1 Answer 1

2

The only problem I see is that all threads are writing to the shared variable isThereBest.

The simplest solution would be to you give every thread its own private version of the variable, just like you did with the variable best:

#pragma omp parallel
{
    bool isThereBest_private = false;
    int best_private = 0;

    #pragma omp for nowait
    for (int i = 0; i < openSetSize; i++) {
        double tmpF = arrayCells[openSet[i]].f;
        if (tmpF <= arrayCells[openSet[best_private]].f && tmpF <= arrayCells[bestPath[0]].f) {
            isThereBest_private = true;
            best_private = i;
        }
    }

    #pragma omp critical
    {
        if ( isThereBest_private ) {
            isThereBest = true;
        if ( best_private > best ) {
            best = best_private;
        }
    }
}

However, a cleaner solution would be to use the reduction clause instead, which makes the entire omp critical construct redundant:

#pragma omp parallel for reduction(||:isThereBest) reduction(max:best)
for (int i = 0; i < openSetSize; i++) {
    double tmpF = arrayCells[openSet[i]].f;
    if (tmpF <= arrayCells[openSet[best]].f && tmpF <= arrayCells[bestPath[0]].f) {
        isThereBest = true;
        best = i;
    }
}

Using the reduction clause instead of an omp critical construct will likely also improve performance.

Sign up to request clarification or add additional context in comments.

19 Comments

More problematic is the race condition of the variable best_private
@dreamcrash: How can a private variable have a race condition? As far as I know, race conditions are only possible on shared data?
Oh, sorry about that I missed that best_private is declared inside the parallel region
Btw a possible improvement (performance-wise and readability -wise), although it depends a lot on other factors, would be to use openmp reduce with the 'max' and the '&&' operators similar to this stackoverflow.com/a/67549626/1366871
@dreamcrash: If the loop only runs for a few microseconds, then you are probably right that thread contention in the omp critical construct could be a problem. However, if it runs for several milliseconds or more, then I believe it should not be a problem. Either way, your recommendation of using the reduction clause instead of an omp critical construct probably is the cleaner solution.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.