
I am currently working on an assignment for my parallel programming class in which I need to write the same program sequentially, then parallelized using OpenMP, then parallelized using MPI.

For context, the assignment is about searching for palindromes in a matrix of random characters. I already have most of the code working; my question is about how to structure, compile, and run the project.

I could create three separate programs and run them independently, but I would like to combine them all in the same project so the three versions run one after the other and on the same initial matrix. This allows me to time each version to compare them.

I am using CMake as the build tool.

My CMakeLists.txt:

cmake_minimum_required(VERSION 3.21)
project(<project-name> C)

set(CMAKE_C_STANDARD 23)

find_package(MPI REQUIRED)
include_directories(SYSTEM ${MPI_INCLUDE_PATH})

find_package(OpenMP REQUIRED)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")

add_executable(<project-name> <source-files>)
target_link_libraries(<project-name> ${MPI_C_LIBRARIES})
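
(For what it's worth, I believe recent CMake (3.9+) also exports imported targets for both packages, so the flag handling above could be delegated to CMake entirely. Something like this should be equivalent, placeholders kept:

cmake_minimum_required(VERSION 3.21)
project(<project-name> C)

set(CMAKE_C_STANDARD 23)

find_package(MPI REQUIRED)
find_package(OpenMP REQUIRED)

add_executable(<project-name> <source-files>)
# The imported targets carry their own include paths, compile flags
# and link libraries, so no manual flag plumbing is needed.
target_link_libraries(<project-name> PRIVATE MPI::MPI_C OpenMP::OpenMP_C)

)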

I build the project using the following commands:

mkdir build && cd build && cmake .. && make

My main function:

// Header inclusions: the standard and runtime headers this main needs,
// plus the project header declaring create_matrix_of_chars,
// find_palindromes_sequentially, find_palindromes_using_openmp
// and free_matrix.
#include <stdio.h>
#include <stdlib.h>     // srand
#include <time.h>       // time
#include <sys/types.h>  // ushort (POSIX)
#include <omp.h>
#include <mpi.h>

int main(int argc, char **argv) {

    // Initialisation.
    srand(time(NULL));
    omp_set_num_threads(omp_get_num_procs());
    double start_time;
    ushort number_of_palindromes = 0;
    ushort palindrome_length = 5;
    ushort rows = 25000;
    ushort cols = 25000;
    char **matrix = create_matrix_of_chars(rows, cols);
    printf("Matrix of size %dx%d, searching for palindromes of size %d.\n", rows, cols, palindrome_length);

    // Run sequentially.
    printf("%-45s", "Running sequentially ... ");
    start_time = omp_get_wtime();
    number_of_palindromes = find_palindromes_sequentially(matrix, rows, cols, palindrome_length);
    printf("Found %4d palindromes in %7.4f seconds.\n", number_of_palindromes, omp_get_wtime() - start_time);

    // Run using OpenMP.
    printf("Running with OpenMP on %d %-20s", omp_get_num_procs(), "threads ... ");
    start_time = omp_get_wtime();
    number_of_palindromes = find_palindromes_using_openmp(matrix, rows, cols, palindrome_length);
    printf("Found %4d palindromes in %7.4f seconds.\n", number_of_palindromes, omp_get_wtime() - start_time);

    // Run using MPI.
    int num_procs, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("%d: hello (p=%d)\n", rank, num_procs);
    MPI_Finalize();

    // Cleanup and exit.
    free_matrix(matrix, rows);
    return 0;

}

When running ./<project-name>, the sequential and OpenMP versions run one after the other correctly. However, when running mpirun --use-hwthread-cpus ./<project-name>, the program starts 8 instances of the entire program (the line "Matrix of size ..." gets printed 8 times).

My understanding was that the MPI region is delimited by MPI_Init(...) and MPI_Finalize(), but that does not seem to be the case. How would I go about solving this?

Thank you in advance for your answers.


2 Answers


There is no such thing as an "MPI region". MPI uses independent processes that only communicate/synchronize through the network. Meaning: the whole of your executable runs in as many instances as you start it. Each and every statement, even those before MPI_Init, is executed by each and every instance.
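
A minimal sketch of that consequence, and of the usual workaround (guarding the single-process work behind a rank check, as also suggested in the comments below); it assumes nothing about your helper functions:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Without this guard, the block below would run once per MPI process.
    if (rank == 0) {
        printf("rank 0: sequential and OpenMP versions go here\n");
    }

    MPI_Barrier(MPI_COMM_WORLD);   // the other ranks wait for rank 0

    // The MPI version itself is executed by every rank.
    printf("rank %d: taking part in the MPI version\n", rank);

    MPI_Finalize();
    return 0;
}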


3 Comments

Oh okay, I see. It doesn't work the same way as OpenMP then. Is there a way to achieve what I'm trying to do, or should I resign myself to creating an entirely separate program for the MPI version?
No, MPI is completely different from OpenMP. It's meant to run on clusters that are connected with network cables... Anyway, write your MPI program, and then execute the OpenMP part only where MPI_Comm_rank gives you zero.
I see, thank you for your feedback. Marking the question as resolved.

I need to write the same program sequentially, then parallelized using OpenMP, then parallelized using MPI.

This could have several implications:

  • You're creating three separate programs for those three versions, one by one. (I know this is what you're trying to avoid, but read on, as this is what you would actually want if a time-based comparison between these three approaches is what you're after.)
  • You're creating a parallelized version of the sequential program, first using OpenMP and then using MPI (or vice versa), but as two separate programs. The difference from the first approach is that each parallelized version can also carry the sequential code, if you want 'sequential vs. parallel' comparisons inside the same program; that wouldn't be too inaccurate. For instance, in the MPI version you could run the sequential code before MPI_Init, or make the process with rank 0 run it (though that leads to inaccurate timings for your parallelized code, and heavy load imbalance, with rank 0 doing much more work). For the OpenMP version, the sequential baseline is simply a separate instance of the same code with no OpenMP directives.
  • You're creating a single program with both multiple MPI ranks and OpenMP threads, i.e. a hybrid setup, which is more complicated and needs a proper setup. If you want to throw the sequential version in here as well, combine the two ideas above: run it prior to initialization of the MPI environment and with no use of OpenMP directives and clauses.

Based on exactly what you're trying to achieve, you will need to think about and tweak your code (with an added configuration for the third case, which I'll come to soon).

I could create three separate programs and run them independently, but I would like to combine them all in the same project so the three versions run one after the other and on the same initial matrix. This allows me to time each version to compare them.

Bad idea. Multiple MPI processes will be running your program, and you wouldn't get accurate timing for a thread's share of the work (even assuming you get your code to work with the setup you have). I'm assuming you intend to time the components of shared-memory parallelism, i.e. the threads and not the processes, given that you're using omp_get_wtime().

To get closest to the actual timings for each approach, you would ideally want to keep the three programs separate and time each on its own, either using internal function calls such as MPI_Wtime() for MPI ranks (you might want a reduction such as MPI_MAX on top of that) and omp_get_wtime() for OpenMP threads, or using an external tool that can measure time, such as perf (ideal if you want to time the entire program rather than a particular section of your code), which you can incorporate into the binary execution statement in your makefile.
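
A sketch of the MPI side of that, with the measured work elided; the MPI_MAX reduction reports the slowest rank's time, since a run is only as fast as its slowest rank:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);        // line the ranks up before timing
    double start = MPI_Wtime();

    /* ... the parallel work being measured ... */

    double elapsed = MPI_Wtime() - start;

    // Reduce the per-rank timings to the maximum on rank 0.
    double slowest;
    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Parallel section took %.4f seconds.\n", slowest);

    MPI_Finalize();
    return 0;
}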

But if you really want to use both and go for the hybrid setup, then you would have to initialize MPI with MPI_Init_thread() instead of MPI_Init(), supplying your desired level of thread support as an added argument; there are three options for multi-threading with processes (MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED and MPI_THREAD_MULTIPLE). For more information on these levels, check the documentation for the option you would want to go with. For your case, I would recommend the funneled configuration.
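
A sketch of what the funneled initialization looks like:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    // Ask for FUNNELED: threads may exist, but only the thread that
    // initialized MPI (the OpenMP master thread) makes MPI calls.
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        fprintf(stderr, "MPI library lacks MPI_THREAD_FUNNELED support\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... hybrid MPI + OpenMP work ... */

    MPI_Finalize();
    return 0;
}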

You could also write your program in a clever way that incorporates all three approaches by placing each within a #if parameterName == <value> ... #endif block, so as to run only one approach (or maybe two, depending on how you go about it) at a time, based on a parameter which you specify and can set (with different values for the different blocks) when you compile, by adding -D<parameterName>=<value> to your other compilation flags. A sketch follows below.
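
For instance (VERSION is a name picked here purely for illustration; select a block with -DVERSION=1, 2 or 3 at compile time):

#include <stdio.h>

int main(void) {
#if VERSION == 1
    printf("sequential version\n");   // cc -DVERSION=1 ...
#elif VERSION == 2
    printf("OpenMP version\n");       // cc -DVERSION=2 -fopenmp ...
#elif VERSION == 3
    printf("MPI version\n");          // mpicc -DVERSION=3 ...
#else
#error "compile with -DVERSION=1, 2 or 3"
#endif
    return 0;
}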

