Is it possible to selectively enable an OpenMP directive with a template parameter or a run-time variable?

For example, I want to switch between this (the threads share the iterations of a single for loop):
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 10; ++i) { /*...*/ }
}
versus this (each thread runs the entire for loop itself):
#pragma omp parallel
{
    for (int i = 0; i < 10; ++i) { /*...*/ }
}

Update (testing the if clause)

test.cpp:

#include <iostream>
#include <omp.h>

int main() {
    bool var = true;
    #pragma omp parallel 
    {
        #pragma omp for if (var)
        for (int i = 0; i < 4; ++i) {
            std::cout << omp_get_thread_num() << "\n";
        }
    }
}

Error message (g++ 6, compiled with g++ test.cpp -fopenmp):

test.cpp: In function ‘int main()’:
test.cpp:8:25: error: ‘if’ is not valid for ‘#pragma omp for’
         #pragma omp for if (var)
                         ^~
  • #pragma omp parallel if(variable) (see the sketch after these comments) Commented Feb 15, 2017 at 16:12
  • Both versions are parallel; mostly I want to selectively enable the #pragma omp for line. I will look up whether the if clause can work with the for directive. Thanks. Commented Feb 15, 2017 at 16:14
  • It does: msdn.microsoft.com/en-us/library/5187hzke.aspx. Hopefully this is true for all compilers. Commented Feb 15, 2017 at 16:15
  • Tried g++ 6: ‘if’ is not valid for ‘#pragma omp for’. Commented Feb 15, 2017 at 16:47
  • @Zulan I implemented different data structures (sparse and dense arrays, etc.; some save more memory than others). The data structures can be divided into two kinds, and the two kinds need two different ways to loop over them. Then there is a class template that implements algorithms that work on all of the data structures. I don't want to implement the loops for each kind of data structure or repeat lots of code. Commented Feb 15, 2017 at 18:01
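
A minimal sketch (not from the original thread) of what that first suggestion does: the if clause attaches to the parallel directive and controls whether a thread team is created at all, not how a loop inside the region is shared.

#include <omp.h>
#include <cstdio>

int main() {
    bool var = false;
    // With var == false the region runs with a single thread;
    // with var == true a full team executes the whole block.
    #pragma omp parallel if (var)
    {
        std::printf("thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
}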

3 Answers


Sort of a workaround; I don't know whether it is possible to get rid of the conditional for getting the thread id.

#include <iostream>
#include <omp.h>
#include <sstream>
#include <vector>
int main() {
    constexpr bool var = true;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    // Outer parallel region: spawns a team only when var is true.
    #pragma omp parallel if (var)
    {

        const int thread_id0 = omp_get_thread_num();
        // Inner parallel region: nested parallelism is disabled by default,
        // so this stays serial whenever the outer region is already parallel.
        #pragma omp parallel
        {
            // Take the id from whichever region actually spawned the threads.
            int thread_id1;
            if (var) {
                thread_id1 = thread_id0;
            } else {
                thread_id1 = omp_get_thread_num();
            }

            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id1] << i << ", ";
            }
        }
    }

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}

Output (when var == true):

n_threads: 8
thread 0: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 1: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 2: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 3: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 5: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 6: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 7: 0, 1, 2, 3, 4, 5, 6, 7,

Output (when var == false):

n_threads: 8
thread 0: 0, 
thread 1: 1, 
thread 2: 2, 
thread 3: 3, 
thread 4: 4, 
thread 5: 5, 
thread 6: 6, 
thread 7: 7, 

2 Comments

This works for both clang and g++; not sure about the Intel compiler.
It won't work as expected if nested parallelism is enabled (see the sketch below).
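
To illustrate that caveat, a minimal sketch (not from the original answer): once nesting is allowed, the inner parallel region spawns its own team inside every outer thread, so the thread ids the workaround relies on no longer line up.

#include <omp.h>
#include <cstdio>

int main() {
    omp_set_max_active_levels(2);  // enable nesting (omp_set_nested(1) pre-OpenMP 5.0)
    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            // Four threads in total: each outer thread now owns an inner team.
            std::printf("outer %d / inner %d\n",
                        omp_get_ancestor_thread_num(1),
                        omp_get_thread_num());
        }
    }
}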

I think the idiomatic C++ solution is to hide the different OpenMP pragmas behind algorithmic overloads.

#include <iostream>
#include <sstream>
#include <vector>
#include <omp.h>
#include <type_traits>

template <bool ALL_PARALLEL>
struct impl;

template<>
struct impl<true>
{
  template<typename ITER, typename CALLABLE>
  void operator()(ITER begin, ITER end, const CALLABLE& func) {
    // Each thread executes the entire loop itself.
    #pragma omp parallel
    {
      for (ITER i = begin; i != end; ++i) {
        func(i);
      }
    }
  }
};

template<>
struct impl<false>
{
  template<typename ITER, typename CALLABLE>
  void operator()(ITER begin, ITER end, const CALLABLE& func) {
    // The loop iterations are divided among the threads.
    #pragma omp parallel for
    for (ITER i = begin; i < end; ++i) {
      func(i);
    }
  }
};

// This is just so we don't have to write impl<ALL_PARALLEL>()(...) at the call site
template <bool ALL_PARALLEL, typename ITER, typename CALLABLE>
void parallel_foreach(ITER begin, ITER end, const CALLABLE& func)
{
    impl<ALL_PARALLEL>()(begin, end, func);
}

int main()
{
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    parallel_foreach<var>(0, 8, [&s](auto i) {
        s[omp_get_thread_num()] << i << ", ";
    });

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}

If you use specific types, you can overload by type instead of using the bool template parameter, and iterate through elements rather than over numerical indices. Note that you can use C++ random access iterators in OpenMP worksharing loops (sketched below)! Depending on your types, you may well be able to implement an iterator that hides everything about the internal data access from the caller.
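
A minimal sketch of such a loop (assuming OpenMP 3.0 or later, where the canonical loop form admits random access iterators):

#include <omp.h>
#include <vector>
#include <iostream>

int main() {
    std::vector<double> v(16, 1.0);
    // The iterator is a valid loop variable in a worksharing loop;
    // the iterations are divided among the threads as usual.
    #pragma omp parallel for
    for (std::vector<double>::iterator it = v.begin(); it < v.end(); ++it) {
        *it *= 2.0;
    }
    std::cout << v.front() << ", " << v.back() << "\n";  // prints 2, 2
}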

4 Comments

I thought the overhead was quite big for iterators: stackoverflow.com/questions/2513988/… Not sure if that is still true now. After reading that, I have avoided writing iterators for classes when they are meant for OpenMP for loops.
You misread the linked answer. The example he gives is for std::set, which has no random access iterator. Hence he does not use a loop worksharing construct (#pragma omp (parallel) for), but a hand-made loop. If you use a normal #pragma omp for on a random access iterator, there is no inherent overhead. Your optimization mileage may vary, so do measure and compare.
Thanks. I guess I will add random access iterators in the next project ...
Another way is to make a macro function (sketched below).
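
A hypothetical sketch of that macro approach (all macro names made up), using the _Pragma operator so a directive can be emitted from a macro; the loop body is passed as __VA_ARGS__ so commas inside it survive expansion:

#include <omp.h>
#include <cstdio>

#define FOREACH_SHARED(i, n, ...)          \
    _Pragma("omp parallel for")            \
    for (int i = 0; i < (n); ++i) { __VA_ARGS__; }

#define FOREACH_PER_THREAD(i, n, ...)      \
    _Pragma("omp parallel")                \
    { for (int i = 0; i < (n); ++i) { __VA_ARGS__; } }

int main() {
    // Iterations shared among the team:
    FOREACH_SHARED(i, 4, std::printf("shared %d on thread %d\n",
                                     i, omp_get_thread_num()))
    // Each thread runs the whole loop:
    FOREACH_PER_THREAD(i, 4, std::printf("own %d on thread %d\n",
                                         i, omp_get_thread_num()))
}
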
A third option: branch on the flag inside the parallel region, at the cost of duplicating the loop body.

#include <omp.h>
#include <sstream>
#include <vector>
#include <iostream>
int main() {
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        if (var) {
            // var is the same for every thread, so all of them encounter the
            // worksharing construct and the iterations are divided among them.
            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            }
        } else {
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            } // code duplication (see the lambda sketch below)
        }
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}
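
One way to avoid the duplicated body, as a sketch rather than part of the original answer: hoist the body into a lambda so only the directive differs between the branches.

#include <omp.h>
#include <sstream>
#include <vector>

int main() {
    constexpr bool var = false;
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        // The loop body is written once; only the directive differs.
        auto body = [&](int i) { s[thread_id] << i << ", "; };
        if (var) {
            #pragma omp for
            for (int i = 0; i < 8; ++i) body(i);
        } else {
            for (int i = 0; i < 8; ++i) body(i);
        }
    }
}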

3 Comments

You realise that the code in the else block actually creates a nested parallel region, which might lead to a surprising result? The only reason it might seem to work as the OP wants is that nested parallelism is disabled by default and the region will execute in serial in each thread.
Thanks. I fixed that by removing #pragma omp parallel for in the else block.
Sorry, I didn't realise that you are the OP. You should really combine both your answers into one.
