Is it possible to selectively enable an OpenMP directive with a template parameter or a run-time variable?

For example, I want to switch between this (the threads share the iterations of a single for loop):
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 10; ++i) { /*...*/ }
}
versus this (each thread runs the entire for loop itself):
#pragma omp parallel
{
    for (int i = 0; i < 10; ++i) { /*...*/ }
}

Update (testing the if clause)

test.cpp:

#include <iostream>
#include <omp.h>

int main() {
    bool var = true;
    #pragma omp parallel 
    {
        #pragma omp for if (var)
        for (int i = 0; i < 4; ++i) {
            std::cout << omp_get_thread_num() << "\n";
        }
    }
}

Error message (g++ 6, compiled with g++ test.cpp -fopenmp):

test.cpp: In function ‘int main()’:
test.cpp:8:25: error: ‘if’ is not valid for ‘#pragma omp for’
         #pragma omp for if (var)
                         ^~
  • #pragma omp parallel if(variable) (see the sketch after these comments) Commented Feb 15, 2017 at 16:12
  • Both versions are parallel; mostly I want to selectively enable the #pragma omp for line. I will look up whether the if clause can work with the for directive. Thanks. Commented Feb 15, 2017 at 16:14
  • It does: msdn.microsoft.com/en-us/library/5187hzke.aspx. Hopefully this is true for all compilers. Commented Feb 15, 2017 at 16:15
  • Tried g++ 6: ‘if’ is not valid for ‘#pragma omp for’. Commented Feb 15, 2017 at 16:47
  • @Zulan I implemented different data structures (sparse and dense arrays, etc.; some save more memory than others). The data structures can be divided into two kinds, and the two kinds need two different ways to loop over them. Then there is a class template that implements algorithms that work on all of the data structures. I don't want to implement the loops for each kind of data structure or repeat lots of code. Commented Feb 15, 2017 at 18:01
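
A minimal sketch (not from the original thread) of what that first suggestion does: the if clause attaches to the parallel directive and controls whether a thread team is created at all, not how a loop inside the region is shared.

#include <omp.h>
#include <cstdio>

int main() {
    bool var = false;
    // With var == false the region runs with a single thread;
    // with var == true a full team executes the whole block.
    #pragma omp parallel if (var)
    {
        std::printf("thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
}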

3 Answers


Sort of a workaround; I don't know whether it is possible to get rid of the conditional for getting the thread id.

#include <iostream>
#include <omp.h>
#include <sstream>
#include <vector>
int main() {
    constexpr bool var = true;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    // Outer parallel region: spawns a team only when var is true.
    #pragma omp parallel if (var)
    {

        const int thread_id0 = omp_get_thread_num();
        // Inner parallel region: nested parallelism is disabled by default,
        // so this stays serial whenever the outer region is already parallel.
        #pragma omp parallel
        {
            // Take the id from whichever region actually spawned the threads.
            int thread_id1;
            if (var) {
                thread_id1 = thread_id0;
            } else {
                thread_id1 = omp_get_thread_num();
            }

            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id1] << i << ", ";
            }
        }
    }

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}

Output (when var == true):

n_threads: 8
thread 0: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 1: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 2: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 3: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 4: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 5: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 6: 0, 1, 2, 3, 4, 5, 6, 7, 
thread 7: 0, 1, 2, 3, 4, 5, 6, 7,

Output (when var == false):

n_threads: 8
thread 0: 0, 
thread 1: 1, 
thread 2: 2, 
thread 3: 3, 
thread 4: 4, 
thread 5: 5, 
thread 6: 6, 
thread 7: 7, 

2 Comments

This works for both clang and g++; not sure about the Intel compiler.
It won't work as expected if nested parallelism is enabled (see the sketch below).
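
To illustrate that caveat, a minimal sketch (not from the original answer): once nesting is allowed, the inner parallel region spawns its own team inside every outer thread, so the thread ids the workaround relies on no longer line up.

#include <omp.h>
#include <cstdio>

int main() {
    omp_set_max_active_levels(2);  // enable nesting (omp_set_nested(1) pre-OpenMP 5.0)
    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            // Four threads in total: each outer thread now owns an inner team.
            std::printf("outer %d / inner %d\n",
                        omp_get_ancestor_thread_num(1),
                        omp_get_thread_num());
        }
    }
}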

I think the idiomatic C++ solution is to hide the different OpenMP pragmas behind algorithmic overloads.

#include <iostream>
#include <sstream>
#include <vector>
#include <omp.h>
#include <type_traits>

template <bool ALL_PARALLEL>
struct impl;

template<>
struct impl<true>
{
  template<typename ITER, typename CALLABLE>
  void operator()(ITER begin, ITER end, const CALLABLE& func) {
    // Each thread executes the entire loop itself.
    #pragma omp parallel
    {
      for (ITER i = begin; i != end; ++i) {
        func(i);
      }
    }
  }
};

template<>
struct impl<false>
{
  template<typename ITER, typename CALLABLE>
  void operator()(ITER begin, ITER end, const CALLABLE& func) {
    // The loop iterations are divided among the threads.
    #pragma omp parallel for
    for (ITER i = begin; i < end; ++i) {
      func(i);
    }
  }
};

// This is just so we don't have to write impl<ALL_PARALLEL>()(...) at the call site
template <bool ALL_PARALLEL, typename ITER, typename CALLABLE>
void parallel_foreach(ITER begin, ITER end, const CALLABLE& func)
{
    impl<ALL_PARALLEL>()(begin, end, func);
}

int main()
{
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    parallel_foreach<var>(0, 8, [&s](auto i) {
        s[omp_get_thread_num()] << i << ", ";
    });

    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}

If you use specific types, you can overload by type instead of using the bool template parameter, and iterate through elements rather than over numerical indices. Note that you can use C++ random access iterators in OpenMP worksharing loops (sketched below)! Depending on your types, you may well be able to implement an iterator that hides everything about the internal data access from the caller.
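
A minimal sketch of such a loop (assuming OpenMP 3.0 or later, where the canonical loop form admits random access iterators):

#include <omp.h>
#include <vector>
#include <iostream>

int main() {
    std::vector<double> v(16, 1.0);
    // The iterator is a valid loop variable in a worksharing loop;
    // the iterations are divided among the threads as usual.
    #pragma omp parallel for
    for (std::vector<double>::iterator it = v.begin(); it < v.end(); ++it) {
        *it *= 2.0;
    }
    std::cout << v.front() << ", " << v.back() << "\n";  // prints 2, 2
}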

4 Comments

I thought the overhead was quite big for iterators: stackoverflow.com/questions/2513988/… Not sure if that is still true now. After reading that, I have avoided writing iterators for classes when they are meant for OpenMP for loops.
You misread the linked answer. The example he gives is for std::set, which has no random access iterator. Hence he does not use a loop worksharing construct (#pragma omp (parallel) for), but a hand-made loop. If you use a normal #pragma omp for on a random access iterator, there is no inherent overhead. Your optimization mileage may vary, so do measure and compare.
Thanks. I guess I will add random access iterators in the next project ...
Another way is to make a macro function (sketched below).
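
A hypothetical sketch of that macro approach (all macro names made up), using the _Pragma operator so a directive can be emitted from a macro; the loop body is passed as __VA_ARGS__ so commas inside it survive expansion:

#include <omp.h>
#include <cstdio>

#define FOREACH_SHARED(i, n, ...)          \
    _Pragma("omp parallel for")            \
    for (int i = 0; i < (n); ++i) { __VA_ARGS__; }

#define FOREACH_PER_THREAD(i, n, ...)      \
    _Pragma("omp parallel")                \
    { for (int i = 0; i < (n); ++i) { __VA_ARGS__; } }

int main() {
    // Iterations shared among the team:
    FOREACH_SHARED(i, 4, std::printf("shared %d on thread %d\n",
                                     i, omp_get_thread_num()))
    // Each thread runs the whole loop:
    FOREACH_PER_THREAD(i, 4, std::printf("own %d on thread %d\n",
                                         i, omp_get_thread_num()))
}
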
A third option: branch on the flag inside the parallel region, at the cost of duplicating the loop body.

#include <omp.h>
#include <sstream>
#include <vector>
#include <iostream>
int main() {
    constexpr bool var = false;
    int n_threads = omp_get_num_procs();
    std::cout << "n_threads: " << n_threads << "\n";
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        if (var) {
            // var is the same for every thread, so all of them encounter the
            // worksharing construct and the iterations are divided among them.
            #pragma omp for
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            }
        } else {
            for (int i = 0; i < 8; ++i) {
                s[thread_id] << i << ", ";
            } // code duplication (see the lambda sketch below)
        }
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::cout << "thread " << i << ": " 
                  << s[i].str() << "\n";
    }
}
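
One way to avoid the duplicated body, as a sketch rather than part of the original answer: hoist the body into a lambda so only the directive differs between the branches.

#include <omp.h>
#include <sstream>
#include <vector>

int main() {
    constexpr bool var = false;
    std::vector<std::stringstream> s(omp_get_num_procs());

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        // The loop body is written once; only the directive differs.
        auto body = [&](int i) { s[thread_id] << i << ", "; };
        if (var) {
            #pragma omp for
            for (int i = 0; i < 8; ++i) body(i);
        } else {
            for (int i = 0; i < 8; ++i) body(i);
        }
    }
}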

3 Comments

You realise that the code in the else block actually creates a nested parallel region, which might lead to a surprising result? The only reason it might seem to work as the OP wants is that nested parallelism is disabled by default and the region will execute in serial in each thread.
Thanks. I fixed that by removing #pragma omp parallel for in the else block.
Sorry, I didn't realise that you are the OP. You should really combine both your answers into one.
