I have a piece of code with two nested for loops. When the outer loop has few iterations, the inner one has many, and vice versa. I can parallelize either loop on its own with an omp for directive and get consistent results (and some speedup). However, I'd like to:

  1. Run the first one in parallel if it has 16 steps or more
  2. Else run the second one in parallel (but not the first one even if it has 8 steps)

This is not nested parallelism, because only one of the two loops is parallel at any time. If I run them independently and use top -H to watch the threads, I sometimes see only one thread and sometimes more (in each case). Does what I want to do make sense, and would it actually improve performance?

So far I did something like this:

#pragma omp parallel
{
    #pragma omp for schedule(static,16)
    for(...){
        /* some declarations */
        #pragma omp for schedule(static,16) nowait
        for(...){
            /* ... */
        }
    }
}

which does not compile (work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region) and would not behave as I described anyway. I also tried collapse, but had problems with the "/* some declarations */" part, and I'd like to avoid it since it requires OpenMP 3.0 and I'm not sure the target hardware's compiler supports that.

Any ideas?

1 Answer

You cannot nest work-sharing constructs that bind to the same parallel region, but you can use nested parallelism and selectively deactivate the regions with the if(condition) clause. If condition evaluates to true at run time, the region is active; otherwise it executes serially. It would look like this:

/* Make sure nested parallelism is enabled */
omp_set_nested(1);

#pragma omp parallel for schedule(static) if(outer_steps>=16)
for(...){
    /* some declarations */
    #pragma omp parallel for if(outer_steps<16)
    for(...){
        /* ... */
    }
}

The drawback here is that the inner region introduces a small overhead if it is not active at run time. If you desire efficiency and are ready to sacrifice maintainability for that, then you can write two different implementations of the nested loop and branch to the appropriate one based on the value of outer_steps.
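The two-implementation variant could look roughly like the sketch below. It is only an illustration: the function name fill_grid, the out buffer, the inner_steps parameter and the loop body are hypothetical stand-ins for the real work, and the threshold of 16 is taken from the question. Only one parallel for is ever active per call, so nested parallelism does not need to be enabled.

```c
#include <stddef.h>

/* Hypothetical kernel: writes out[i * inner_steps + j] = i + j.
 * The real loop body and declarations would go in its place. */
void fill_grid(double *out, int outer_steps, int inner_steps)
{
    if (outer_steps >= 16) {
        /* Enough outer iterations: share the outer loop among threads,
         * keep the inner loop serial inside each thread. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < outer_steps; ++i) {
            for (int j = 0; j < inner_steps; ++j) {
                out[(size_t)i * inner_steps + j] = (double)(i + j);
            }
        }
    } else {
        /* Few outer iterations: run the outer loop serially and
         * share each pass of the inner loop among threads. */
        for (int i = 0; i < outer_steps; ++i) {
            #pragma omp parallel for schedule(static)
            for (int j = 0; j < inner_steps; ++j) {
                out[(size_t)i * inner_steps + j] = (double)(i + j);
            }
        }
    }
}
```

In the second branch a parallel region is entered once per outer iteration, which should be acceptable when the outer loop is short. The price is the duplicated loop body, which is the maintainability cost mentioned above.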

1 Comment

Hi and thanks. Actually I ended up writing two different parallel regions within an if(){}else{}, as you suggested. It's ugly but quite efficient. I'll try your solution when I'm looking for something more elegant.
