
I am having a race condition with the code below. I am trying to create as many threads as there are cores and to use locks in OpenMP. I have tried #pragma omp critical and also #pragma omp atomic, and neither of them helps; I am getting some crazy negative numbers. I know how to do it using private, but I want to actually synchronize the threads, not create multiple variables for the threads and then combine them at the end.

// multiplication
#pragma omp parallel for
for (i = 0; i < N; i++)
{
     for (j = 0; j < N; j++)
     {
          c[i][j] = 0.;
          for (k = 0; k < N; k++) 
          {        
               omp_set_lock(&lock);
               //#pragma omp critical
               //#pragma omp atomic
               c[i][j] += a[i][k] * b[k][j];
               threadTasks[omp_get_thread_num()]++;
               omp_unset_lock(&lock);
          }
          printf("[%d, %d]: Thread ID %d\n", i, j, omp_get_thread_num());
     }
}
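
(For completeness: omp_set_lock/omp_unset_lock require the lock to be initialized before first use; a minimal sketch of that setup, assuming lock is a file-scope omp_lock_t:)

omp_lock_t lock;            // assumed declaration, matching &lock in the loop above

omp_init_lock(&lock);       // call once before the parallel region
/* ... parallel multiplication shown above ... */
omp_destroy_lock(&lock);    // clean up afterwards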
3 Comments
  • Why do you call omp_unset_lock at the end of the second loop? Commented Feb 14, 2018 at 6:22
  • I don't understand the reason you give for not doing this the straightforward way. Nesting the outer loop backwards, as well as your usage of threadTasks, creates false sharing and probably makes parallelization useless. Your compiler may not scalarize c[i][j] even with the loop nesting fixed, even if it would do so without OpenMP. This would be particularly important if you haven't used restrict. Commented Feb 14, 2018 at 12:48
  • @hisener That was an error; I fixed it. Thanks for pointing it out. Commented Feb 15, 2018 at 1:12

2 Answers


You don't need any protection (locks, atomics, critical sections) against race conditions for this matrix multiplication. In fact, doing so would kill your performance. Since each thread executes a different i, two threads can never write to or read from the same element of c[i][j].

However, you do need private loop variables for your inner loops, or everything goes wrong. In general, declare variables as locally as possible; they are then implicitly private, which is almost always the right thing for a variable that is not needed outside the parallel region.
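
A minimal sketch of the race-free version, assuming the same N, a, b, and c as in the question (C99-style loop declarations):

#pragma omp parallel for
for (int i = 0; i < N; i++)          // i of the work-shared loop is private to each thread
{
    for (int j = 0; j < N; j++)      // j and k are declared locally, hence implicitly private
    {
        c[i][j] = 0.;
        for (int k = 0; k < N; k++)
            c[i][j] += a[i][k] * b[k][j];
    }
}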

I know how to do it using private, but I want to actually synchronize the threads, not create multiple variables for the threads and then combine them at the end.

For the inner loop variables, there is no alternative to making them private. Many times a reduction (private copies, aggregated at the end) can perform better than a shared result variable; in this case, a shared result is fine.
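
To illustrate the reduction idea, here is a sketch that counts the total number of inner-loop updates (the totalTasks counter is hypothetical, standing in for the question's threadTasks bookkeeping):

long totalTasks = 0;

// Each thread accumulates into its own private copy of totalTasks;
// OpenMP sums the copies into the shared variable when the loop finishes.
#pragma omp parallel for reduction(+:totalTasks)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            totalTasks++;            // no lock or atomic required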


1 Comment

Yeah, it only works if you make the inner loop variables private. Thanks for the heads-up. I was just working with pthreads, so OpenMP was somewhat hard to understand for me, but I think I've got it now. Thank you

For multiple nested loops with various operations, I would not use #pragma omp parallel for; instead, I break it down into #pragma omp parallel and #pragma omp for collapse(2). See the full code below:

#pragma omp parallel
{
#pragma omp for collapse(2)
    for (int i = 0; i < N; i++) {            // iterations of the two collapsed loops are distributed over the threads
        for (int j = 0; j < N; j++) {
            c_parallel[i][j] = 0.;
            for (int k = 0; k < N; k++) {    // k is declared locally, so it is private to each thread
                c_parallel[i][j] += a_parallel[i][k] * b_parallel[k][j];
            }
        }
    }

} // end parallel region

I am using vector<vector<float>> for the matrices and got roughly 300% faster performance on a 4-core i5 under Windows 10.
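
If you want to verify the speedup yourself, a simple sketch (not part of the original answer) is to time the region with omp_get_wtime:

double t0 = omp_get_wtime();     // wall-clock time before the parallel region
/* ... run the multiplication above ... */
double t1 = omp_get_wtime();
printf("elapsed: %f s\n", t1 - t0);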

