Assume we have two nested loops. The inner loop should be parallelized, but the outer loop needs to be executed sequentially. Then the following code does what we want:

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    // Do some work
  }
}

Now assume that each thread has to obtain some thread-local object to carry out the work in the inner loop, and that getting these thread-local objects is costly. Therefore, we don't want to do the following:

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly!
    // Do some work with the help of obj
  }
}

How can I solve this issue while satisfying the following requirements?

  1. Each thread should ask for its local object only once.

  2. The inner loop should be parallelized among all threads.

  3. The iterations of the outer loop should be executed one after the other.

My idea is the following, but does it really do what I want?

#pragma omp parallel
{
  ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly, but only once per thread
  for (int i = 0; i < N; ++i) {
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
      // Do some work with the help of obj
    }
  }
}
  • @HighPerformanceMark, typos aside, I think the OP's question is interesting, which is rare for OpenMP questions lately. Do you have any comments on my solution using threadprivate? Did I make a mistake (I almost only use C now and my C++ is super rusty)? Commented Apr 15, 2015 at 7:41
  • Your method is fine. Could you tell me why you need a thread local object initialized to the thread number? Commented Apr 16, 2015 at 13:33

1 Answer

I don't really get why the complication of threadprivate should be necessary, when you can simply use a pool of objects. The basic idea should go along these lines:

#pragma omp parallel
{      
  // Will hold a handle to the object pool
  auto pool = shared_ptr<ObjectPool>(nullptr); 
  #pragma omp single copyprivate(pool)
  {
    // A single thread creates a pool of num_threads objects
    // Copyprivate broadcasts the handle
    pool = create_object_pool(omp_get_num_threads());
  }
  for (int i = 0; i < N; ++i) 
  {
    // Worksharing only: "parallel for" here would open a nested region
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j) 
    {
        // The object is not re-created, just a reference to it
        // is returned from the pool
        auto & r = pool->get( omp_get_thread_num() );
        // Do work with r
    }
  }
}
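
For anyone who wants to check the pattern, here is a minimal self-contained sketch. It is not the OP's actual code: `ThreadLocalObject`, the triangular bounds in `first`/`last`, `RunStats`, and `run` are all hypothetical stand-ins, and the "pool" is assumed to be a plain `std::vector` with one pre-built object per possible thread. It counts how many inner iterations ran and how many times an object was looked up, so you can see that each thread fetches its object once while the inner loop is shared across the team.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the costly thread-local object.
struct ThreadLocalObject {
    long served = 0;  // inner iterations handled with this object
};

// Hypothetical loop bounds standing in for first(i) and last(i).
inline int first(int) { return 0; }
inline int last(int i) { return i + 1; }

struct RunStats {
    long work_items;   // inner iterations executed in total
    long served;       // sum of the per-object counters
    int acquisitions;  // object lookups performed (one per thread)
};

RunStats run(int N) {
    int max_threads = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();
#endif
    // Assumed pool: one pre-built object per possible thread.
    std::vector<ThreadLocalObject> pool(max_threads);
    long work_items = 0;
    int acquisitions = 0;

    #pragma omp parallel reduction(+ : work_items, acquisitions)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        assert(tid < max_threads);
        ThreadLocalObject &obj = pool[tid];  // looked up once per thread
        ++acquisitions;

        for (int i = 0; i < N; ++i) {  // every thread runs the outer loop
            // Worksharing inside the existing team; no new parallel region.
            #pragma omp for schedule(static)
            for (int j = first(i); j < last(i); ++j) {
                ++obj.served;
                ++work_items;
            }
            // The implicit barrier at the end of "omp for" ensures outer
            // iteration i is complete before any thread starts i + 1.
        }
    }

    long served = 0;
    for (const ThreadLocalObject &o : pool) served += o.served;
    return RunStats{work_items, served, acquisitions};
}
```

Compiled with OpenMP enabled (for example `-fopenmp` on GCC or Clang), `run(100)` should report 5050 work items in total and as many acquisitions as there are threads in the team. Without OpenMP the pragmas are ignored and the same code runs serially with a single acquisition. Adjacent counters in the vector may suffer from false sharing, so pad the objects if this matters for your measurements.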