Parallelizing an inner loop with OpenMP

Question

Assume we have two nested loops. The inner loop should be parallelized, but the outer loop needs to be executed sequentially. Then the following code does what we want:

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    // Do some work
  }
}

Now assume that each thread has to obtain some thread-local object to carry out the work in the inner loop, and that getting these thread-local objects is costly. Therefore, we don't want to do the following:

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly!
    // Do some work with the help of obj
  }
}

How can I solve this issue?

Each thread should ask for its local object only once.
The inner loop should be parallelized among all threads.
The iterations of the outer loop should be executed one after the other.

My idea is the following, but does it really do what I want?

#pragma omp parallel
{
  ThreadLocalObject &obj = GetTLO(omp_get_thread_num());
  for (int i = 0; i < N; ++i) {
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
      // Do some work with the help of obj
    }
  }
}
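The approach above is valid OpenMP as long as every thread sees the same first(i) and last(i). All threads run the outer loop, each thread fetches its object once, and the implicit barrier at the end of `#pragma omp for` keeps the outer iterations in order. The sketch below checks this on a small case. `ThreadLocalObject`, `GetTLO`, `first`, `last`, `Result` and `run_hoisted` are hypothetical stand-ins (here GetTLO only counts its calls), and the `#ifdef _OPENMP` fallbacks exist only so the sketch also compiles without OpenMP.

```cpp
#include <atomic>
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical stand-in for the question's costly per-thread object
struct ThreadLocalObject { long work_done = 0; };

static std::vector<ThreadLocalObject> g_objects;  // one slot per thread
static std::atomic<int> g_lookups{0};             // counts the "costly" calls

// Hypothetical GetTLO: just counts how often it is called
ThreadLocalObject &GetTLO(int tid) {
  ++g_lookups;
  return g_objects[tid];
}

// Hypothetical loop bounds that change with the outer index
int first(int i) { return i; }
int last(int i) { return 10 * i + 5; }

struct Result { int lookups; int team_size; bool each_once; };

Result run_hoisted(int N) {
  g_objects.assign(omp_get_max_threads(), ThreadLocalObject{});
  g_lookups = 0;
  // visits[i][j] counts how often iteration (i, j) was executed
  std::vector<std::vector<int>> visits(N);
  for (int i = 0; i < N; ++i) visits[i].assign(last(i), 0);
  int team_size = 1;

  #pragma omp parallel
  {
    // Costly lookup hoisted out of both loops: once per thread
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num());
    #pragma omp single
    team_size = omp_get_num_threads();
    for (int i = 0; i < N; ++i) {  // every thread walks the outer loop
      // The team splits the j range; the implicit barrier at the end of
      // "omp for" stops any thread from starting i + 1 before i is complete
      #pragma omp for schedule(static)
      for (int j = first(i); j < last(i); ++j) {
        ++visits[i][j];  // each (i, j) belongs to exactly one thread
        ++obj.work_done;
      }
    }
  }

  bool each_once = true;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < last(i); ++j)
      if (visits[i][j] != (j >= first(i) ? 1 : 0)) each_once = false;
  return {g_lookups.load(), team_size, each_once};
}
```

With N = 8 these hypothetical bounds give 292 inner iterations, so the per-iteration lookup in the second snippet would call GetTLO 292 times, whereas here it is called once per thread. Adding `nowait` to the inner `omp for` would remove the barrier and let threads run ahead into the next outer iteration.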

@HighPerformanceMark, typos aside, I think the OP's question is interesting, which is rare for OpenMP questions lately. Do you have any comments on my solution using threadprivate? Did I make a mistake? (I almost only use C now and my C++ is super rusty.)
– Z boson, Commented Apr 15, 2015 at 7:41
Your method is fine. Could you tell me why you need a thread-local object initialized to the thread number?
– Z boson, Commented Apr 16, 2015 at 13:33
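The threadprivate-based answer mentioned in these comments is not included on this page. As a hedged illustration of the general technique only, not a reconstruction of that answer, one option is a threadprivate pointer that each thread fills on first use, which keeps the question's original loop structure. `MakeObject`, `tl_obj` and `run_threadprivate` are hypothetical names. Part of the complication is that OpenMP only guarantees threadprivate values persist between parallel regions under conditions such as an unchanged team size and dynamic thread adjustment being off, which is why the sketch calls `omp_set_dynamic(0)`.

```cpp
#include <atomic>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp
inline void omp_set_dynamic(int) {}
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical stand-in for the costly per-thread object
struct ThreadLocalObject { long work_done = 0; };

static std::atomic<int> g_creations{0};

// Hypothetical costly factory; counts how often it runs
static ThreadLocalObject *MakeObject() {
  ++g_creations;
  return new ThreadLocalObject();
}

// Every OpenMP thread gets its own copy of this pointer
static ThreadLocalObject *tl_obj = nullptr;
#pragma omp threadprivate(tl_obj)

int run_threadprivate(int N, int inner) {
  // Threadprivate values only persist across parallel regions if, among
  // other conditions, dynamic thread adjustment is off and the team size
  // does not change, so switch dynamic adjustment off explicitly
  omp_set_dynamic(0);
  g_creations = 0;
  for (int i = 0; i < N; ++i) {
    // The question's original structure: a new parallel region per i
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < inner; ++j) {
      if (!tl_obj) tl_obj = MakeObject();  // first use in this thread only
      ++tl_obj->work_done;
    }
  }
  // Free each thread's object, using a team of the same size
  #pragma omp parallel
  {
    delete tl_obj;
    tl_obj = nullptr;
  }
  return g_creations.load();
}
```

Under those conditions the factory runs at most once per thread, however many outer iterations there are.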

Massimiliano · Accepted Answer · 2015-04-19 19:43:24Z


I don't really see why the complication of threadprivate would be necessary when you can simply use a pool of objects. The basic idea goes along these lines:

#pragma omp parallel
{
  // Will hold a handle to the object pool
  auto pool = std::shared_ptr<ObjectPool>(nullptr);
  #pragma omp single copyprivate(pool)
  {
    // A single thread creates a pool of num_threads objects
    // Copyprivate broadcasts the handle
    pool = create_object_pool(omp_get_num_threads());
  }
  for (int i = 0; i < N; ++i)
  {
    // Worksharing "omp for", not "omp parallel for": we are already inside
    // the parallel region, so the existing threads share the j iterations
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j)
    {
        // The object is not re-created, just a reference to it
        // is returned from the pool
        auto & r = pool->get( omp_get_thread_num() );
        // Do work with r
    }
  }
}
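A self-contained version of the pool idea, under the assumption that `ObjectPool` is simply a vector holding one object per thread and that `create_object_pool` builds it (both hypothetical, as are `Worker`, `first`, `last` and `run_pool`). It uses the worksharing `omp for` inside the single parallel region and counts how often the pool is built.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical costly per-thread object
struct Worker { long work_done = 0; };

// Hypothetical pool: one Worker per thread, built once, handed out by index
class ObjectPool {
 public:
  explicit ObjectPool(int n) : objects_(n) {}
  Worker &get(int tid) { return objects_.at(tid); }  // no construction here
  long total_work() const {
    long sum = 0;
    for (const Worker &w : objects_) sum += w.work_done;
    return sum;
  }

 private:
  std::vector<Worker> objects_;
};

static std::atomic<int> g_pools_created{0};

std::shared_ptr<ObjectPool> create_object_pool(int n) {
  ++g_pools_created;
  return std::make_shared<ObjectPool>(n);
}

// Hypothetical loop bounds that change with the outer index
int first(int i) { return i; }
int last(int i) { return 10 * i + 5; }

// Returns the number of inner iterations executed across all threads
long run_pool(int N) {
  g_pools_created = 0;
  long total = 0;
  #pragma omp parallel
  {
    std::shared_ptr<ObjectPool> pool;  // private: declared inside the region
    // One thread builds the pool; copyprivate copies the handle to the others
    #pragma omp single copyprivate(pool)
    pool = create_object_pool(omp_get_num_threads());
    for (int i = 0; i < N; ++i) {
      #pragma omp for schedule(static)
      for (int j = first(i); j < last(i); ++j) {
        Worker &r = pool->get(omp_get_thread_num());
        ++r.work_done;
      }
    }
    // The barrier after the last "omp for" means all work is done here
    #pragma omp single
    total = pool->total_work();
  }
  return total;
}
```

`run_pool(8)` returns 292, the total number of inner iterations for these bounds, while the pool is built exactly once. Compared with the question's own approach, the difference is mainly where the objects live: they are created up front in one place and only indexed inside the loop.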
