
I have a program that works with arrays and outputs a single number. To parallelize it, I added OpenMP directives, but after doing so the program started producing results that differ from those of the serial version. Can anyone tell me where I made a mistake?

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <math.h>
#include <float.h>

#define LOOP_COUNT 100
#define MIN_1_ARRAY_VALUE 1
#define MAX_1_ARRAY_VALUE 10
#define MIN_2_ARRAY_VALUE 10
#define MAX_2_ARRAY_VALUE 100
#define PI 3.1415926535897932384626433
#define thread_count 4

double generate_random(unsigned int* seed, int min, int max) {
    return (double)((rand_r(seed) % 10) + (1.0 / (rand_r(seed) % (max - min + 1))));
}
double map_1(double value) {
    return pow(value / PI, 3);
}
double map_2(double value) {
    return fabs(tan(value));
}
void dwarf_sort(int n, double mass[]) {
    int i = 1;
    int j = 2;
    while (i < n) {
        if (mass[i-1]<mass[i]) {
            i = j;
            j = j + 1;
        }
        else
        {
            double tmp = mass[i];
            mass[i] = mass[i - 1];
            mass[i - 1] = tmp;
            --i;
            if (i==0)
            {
                i = j;
                j = j + 1;
            }
        }
    }
}
int main(int argc, char* argv[]) {
    printf("lab work 3 in processing...!\n");

    int trial_counter;
    int array_size = atoi(argv[1]);

    struct timeval before, after;
    long time_diff;
    gettimeofday(&before, NULL);

    for (trial_counter = 0; trial_counter < LOOP_COUNT; trial_counter++) {
        double arr1[array_size];
        double arr2[array_size / 2];
        double arr2_copy[array_size / 2];
        double arr2_min = DBL_MAX;
        unsigned int tempValue = trial_counter;
        unsigned int *currentSeed = &tempValue;
        //stage 1 - init

#pragma omp parallel num_threads(thread_count)
        {

#pragma omp parallel for default(none) shared(arr1, currentSeed, array_size) schedule(guided, thread_count)
            for (int i = 0; i < array_size; i++) {
                arr1[i] = generate_random(currentSeed, MIN_1_ARRAY_VALUE, MAX_1_ARRAY_VALUE);
                // printf("arr[%d] = %f\n", i, arr1[i]);
            }
#pragma omp parallel for default(none) shared(arr2, arr2_copy, array_size, currentSeed, arr2_min) schedule(guided, thread_count)
            for (int i = 0; i < array_size / 2; i++) {
                double value = generate_random(currentSeed, MIN_2_ARRAY_VALUE, MAX_2_ARRAY_VALUE);
                arr2[i] = value;
                arr2_copy[i] = value;
                if (value < arr2_min) {
                    arr2_min = value;
                }
            }
#pragma omp parallel for default(none) shared(arr1, array_size) schedule(guided, thread_count)
            for (int i = 0; i < array_size; i++) {
                arr1[i] = map_1(arr1[i]);
            }
#pragma omp parallel for default(none) shared(arr2, arr2_copy, array_size) schedule(guided, thread_count)
            for (int i = 1; i < array_size / 2; i++) {
#pragma omp critical
                arr2[i] = map_2(arr2_copy[i] + arr2_copy[i - 1]);
            }
#pragma omp parallel for default(none) shared(arr2, arr1, array_size) schedule(guided, thread_count)
            for (int i = 0; i < array_size / 2; i++) {
                arr2[i] = pow(arr1[i], arr2[i]);
            }
#pragma omp parallel sections
            {
#pragma omp section
                {
                    dwarf_sort((int) array_size / 2, arr2);
                }
            }
            double final_sum = 0;
            for (int i = 0; i < array_size / 2; i++) {
                if (((int) arr2[i]) / 2 == 0) {
                    final_sum += sin(arr2[i]);
                }
            }
            // printf("Iteration %d, value: %f\n", trial_counter, final_sum);
        }
    }
    gettimeofday(&after, NULL);
    time_diff = 1000 * (after.tv_sec - before.tv_sec) + (after.tv_usec - before.tv_usec) / 1000;
    printf("\nN=%d. Milliseconds passed: %ld\n", array_size, time_diff);
    return 0;
}
  • Removing the C++ tag, as this is pure C (these are separate languages, and whilst OpenMP exists in both worlds, things would, from a design point of view, be handled differently than in your program). Commented May 29, 2022 at 14:14
  • Oh, and by the way, rand_r is both not very fast and a pretty bad random generator. Also, the modulo method of limiting the range of generated numbers will not yield good uniform samples (see the sketch after these comments). No offense (this is rarely taught well!), but that generate_random is a bit of a "how not to do it" :) It doesn't matter much here, though (because the range is small, and your need for good randomness is probably not very strong). Commented May 29, 2022 at 14:18
  • @MarcusMüller Thanks, I'll keep that in mind. But the assignment requires me to use rand_r. The point of the generator is to produce pseudo-random numbers so that I can later check whether the program works correctly on one or more processors, and also measure its execution speed. Commented May 29, 2022 at 14:27
  • As said, it's fine here :) It's more that this is a fairly complex program and I'm having a hard time understanding how it's parallelized. Commented May 29, 2022 at 14:29
  • omp parallel sections with only one section doesn't make sense to me. Commented May 30, 2022 at 16:25
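
As a side note on the modulo issue raised in the comments: a common way to avoid the skew is to scale the draw into the target range with floating point. A minimal sketch (the helper name uniform_in_range is illustrative, not part of the question's code):

#include <stdlib.h>

/* Map a rand_r draw into [min, max) by scaling rather than taking a
 * remainder, which avoids the bias of the modulo method. */
double uniform_in_range(unsigned int *seed, double min, double max) {
    double u = (double)rand_r(seed) / ((double)RAND_MAX + 1.0); /* u in [0, 1) */
    return min + (max - min) * u;
}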

1 Answer


rand_r is thread-safe only if each thread has its own seed, or if the threads are guaranteed to operate on the seed exclusively (eg. inside an expensive critical section). This is not the case in your code: currentSeed is shared between threads, so multiple threads can mutate it simultaneously, which is a race condition. You need to use a thread-private seed, with a different value per thread so that the threads do not all produce the same sequence. The seed of each thread can be initialized from a shared array filled by the main thread (eg. naively [0, 1, 2, 3, etc.]).
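
For example, something like this (a sketch; the seeding formula combining the trial index with the thread id is just one possible choice):

#include <omp.h>

#pragma omp parallel num_threads(thread_count)
{
    /* Each thread derives its own seed from the trial index and its
     * thread id, so the threads generate different sequences. */
    unsigned int seed = (unsigned int)(trial_counter * thread_count
                                       + omp_get_thread_num());

#pragma omp for schedule(guided, thread_count)
    for (int i = 0; i < array_size; i++)
        arr1[i] = generate_random(&seed, MIN_1_ARRAY_VALUE, MAX_1_ARRAY_VALUE);
}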

Even with that fix, you will still get different results when using a different number of threads. One solution is to split your data set into independent chunks, each with an associated seed, and then process the chunks in parallel.
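
A sketch of that idea (CHUNK_SIZE and the per-chunk seed formula are illustrative assumptions): every value depends only on the trial and the chunk index, never on which thread processes the chunk, so the output is the same for any thread count.

#define CHUNK_SIZE 1024

int n_chunks = (array_size + CHUNK_SIZE - 1) / CHUNK_SIZE;

#pragma omp parallel for num_threads(thread_count) schedule(dynamic)
for (int c = 0; c < n_chunks; c++) {
    /* The seed depends only on the trial and the chunk, not on the thread. */
    unsigned int chunk_seed = (unsigned int)(trial_counter * n_chunks + c);
    int begin = c * CHUNK_SIZE;
    int end = begin + CHUNK_SIZE < array_size ? begin + CHUNK_SIZE : array_size;
    for (int i = begin; i < end; i++)
        arr1[i] = generate_random(&chunk_seed, MIN_1_ARRAY_VALUE, MAX_1_ARRAY_VALUE);
}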

Note that nesting a #pragma omp parallel for inside a #pragma omp parallel region causes many additional threads to be created (ie. over-subscription). This is generally very inefficient. You should use #pragma omp for instead.
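
For example, the map stages of your code could share one parallel region (a sketch reusing the arrays from the question; the implicit barrier after each omp for keeps the stages ordered):

#pragma omp parallel num_threads(thread_count)
{
    /* omp for reuses the threads of the enclosing region instead of
     * spawning a new team for every loop. */
#pragma omp for schedule(guided, thread_count)
    for (int i = 0; i < array_size; i++)
        arr1[i] = map_1(arr1[i]);

    /* Implicit barrier here: arr1 is fully updated before it is read. */
#pragma omp for schedule(guided, thread_count)
    for (int i = 0; i < array_size / 2; i++)
        arr2[i] = pow(arr1[i], arr2[i]);
}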
