
I am having a race condition with the code below. I am trying to create as many threads as there are cores and to use locks in OpenMP. I have tried #pragma omp critical and also #pragma omp atomic, and neither of them helps; I am getting some crazy negative numbers. I know how to do it using private, but I want to actually synchronize the threads, not create multiple variables for the threads and then combine them at the end.

// multiplication
#pragma omp parallel for
for (i = 0; i < N; i++)
{
     for (j = 0; j < N; j++)
     {
          c[i][j] = 0.;
          for (k = 0; k < N; k++) 
          {        
               omp_set_lock(&lock);
               //#pragma omp critical
               //#pragma omp atomic
               c[i][j] += a[i][k] * b[k][j];
               threadTasks[omp_get_thread_num()]++;
               omp_unset_lock(&lock);
          }
          printf("[%d, %d]: Thread ID %d\n", i, j, omp_get_thread_num());
     }
}
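
(For completeness: omp_set_lock/omp_unset_lock require the lock to be initialized before first use; a minimal sketch of that setup, assuming lock is a file-scope omp_lock_t:)

omp_lock_t lock;            // assumed declaration, matching &lock in the loop above

omp_init_lock(&lock);       // call once before the parallel region
/* ... parallel multiplication shown above ... */
omp_destroy_lock(&lock);    // clean up afterwards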
3 Comments
  • Why do you call omp_unset_lock at the end of the second loop? Commented Feb 14, 2018 at 6:22
  • I don't understand the reason you give for not doing this the straightforward way. Nesting the outer loop backwards, as well as your usage of threadTasks, creates false sharing and probably makes parallelization useless. Your compiler may not scalarize c[i][j] even with the loop nesting fixed, even if it would do so without OpenMP. This would be particularly important if you haven't used restrict. Commented Feb 14, 2018 at 12:48
  • @hisener That was an error; I fixed it. Thanks for pointing it out. Commented Feb 15, 2018 at 1:12

2 Answers


You don't need any protection (locks, atomics, critical sections) against race conditions for this matrix multiplication. In fact, doing so would kill your performance. Since each thread executes a different i, two threads can never write to or read from the same element of c[i][j].

However, you do need private loop variables for your inner loops, or everything goes wrong. In general, declare variables as locally as possible; they are then implicitly private, which is almost always the right thing for a variable that is not needed outside the parallel region.
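
A minimal sketch of the race-free version, assuming the same N, a, b, and c as in the question (C99-style loop declarations):

#pragma omp parallel for
for (int i = 0; i < N; i++)          // i of the work-shared loop is private to each thread
{
    for (int j = 0; j < N; j++)      // j and k are declared locally, hence implicitly private
    {
        c[i][j] = 0.;
        for (int k = 0; k < N; k++)
            c[i][j] += a[i][k] * b[k][j];
    }
}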

I know how to do it using private, but I want to actually synchronize the threads, not create multiple variables for the threads and then combine them at the end.

For the inner loop variables, there is no alternative to making them private. Many times a reduction (private copies, aggregated at the end) can perform better than a shared result variable; in this case, a shared result is fine.
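
To illustrate the reduction idea, here is a sketch that counts the total number of inner-loop updates (the totalTasks counter is hypothetical, standing in for the question's threadTasks bookkeeping):

long totalTasks = 0;

// Each thread accumulates into its own private copy of totalTasks;
// OpenMP sums the copies into the shared variable when the loop finishes.
#pragma omp parallel for reduction(+:totalTasks)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            totalTasks++;            // no lock or atomic required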


1 Comment

Yeah, it only works if you make the inner loop variables private. Thanks for the heads-up. I was just working with pthreads, so OpenMP was somewhat hard to understand for me, but I think I've got it now. Thank you

For multiple nested loops with various operations, I would not use #pragma omp parallel for; instead, I break it down into #pragma omp parallel and #pragma omp for collapse(2). See the full code below:

#pragma omp parallel
{
#pragma omp for collapse(2)
    for (int i = 0; i < N; i++) {            // iterations of the two collapsed loops are distributed over the threads
        for (int j = 0; j < N; j++) {
            c_parallel[i][j] = 0.;
            for (int k = 0; k < N; k++) {    // k is declared locally, so it is private to each thread
                c_parallel[i][j] += a_parallel[i][k] * b_parallel[k][j];
            }
        }
    }

} // end parallel region

I am using vector<vector<float>> for the matrices and got roughly 300% faster performance on a 4-core i5 under Windows 10.
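
If you want to verify the speedup yourself, a simple sketch (not part of the original answer) is to time the region with omp_get_wtime:

double t0 = omp_get_wtime();     // wall-clock time before the parallel region
/* ... run the multiplication above ... */
double t1 = omp_get_wtime();
printf("elapsed: %f s\n", t1 - t0);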

