
I am working on a C++ application, where I use the MPI C bindings to send and receive data over a network. I understand that sending

const int VECTOR_SIZE = 1e6;
std::vector<int> vector(VECTOR_SIZE, 0);

via

// Version A
MPI_Send(vector.data(), static_cast<int>(vector.size()), MPI_INT, 1, 0, MPI_COMM_WORLD);

is much more efficient than

// Version B
for (const auto &element : vector)
    MPI_Send(const_cast<int *>(&element), 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

due to the per-call latency of MPI_Send. However, if I want to send a data structure that is not contiguous in memory (a std::list<int>, for instance), I cannot use Version A. I either have to resort to Version B, or copy the list's contents into a contiguous container (such as a std::vector<int>) first and then use Version A. Since I want to avoid that extra copy, I wonder whether there are any options/other functions in MPI that allow an efficient use of Version B (or at least a similar, loop-like construct) without incurring the latency on every MPI_Send call.

    You might look at Boost MPI, since it supports STL containers. Commented Jan 17, 2016 at 18:11
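For reference, here is a minimal sketch of what the Boost.MPI route from the comment above could look like. The ranks, tag, and payload are made up for illustration; note that Boost.MPI serializes the std::list behind the scenes, which still involves internal buffering rather than a zero-copy send.

// Hypothetical Boost.MPI sketch: sending a std::list<int> directly.
#include <boost/mpi.hpp>
#include <boost/serialization/list.hpp>  // enables serialization of std::list
#include <list>

namespace mpi = boost::mpi;

int main(int argc, char **argv) {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        std::list<int> data(1000000, 42);   // example payload
        world.send(1, 0, data);             // Boost.MPI serializes the list for us
    } else if (world.rank() == 1) {
        std::list<int> data;
        world.recv(0, 0, data);             // deserialized back into a std::list
    }
    return 0;
}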

1 Answer


Stepping through your std::list and sending the elements one by one would indeed cause significant communication overhead.

The MPI specification is designed to be language independent, which is why it uses language-agnostic MPI datatypes. As a consequence, it can only send from contiguous buffers (a feature most languages offer) and not from more complex data structures such as linked lists.

To avoid the communication overhead of sending one by one, there are two alternatives:

  • copy all the list elements into a std::vector and send the vector. However, this creates a memory overhead AND makes the sending completely sequential (and during that time some MPI nodes could be idle).

  • or iterate through your list, building smaller vectors/buffers, and send these smaller chunks (possibly dispatching them to several destination nodes). This approach better hides I/O latency and exploits parallelism through a pipelining effect. You will, however, have to experiment a bit to find the optimal size of the intermediate chunks. A sketch of this chunked approach is given below the list.
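To make the second alternative concrete, here is a minimal sketch of a chunked send between two ranks. The chunk size of 1024, the rank assignment, and the end-of-data convention (a final, shorter chunk marks the end of the transfer) are assumptions for illustration, not part of the original answer, and the chunk size should be tuned experimentally.

#include <mpi.h>
#include <list>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int CHUNK_SIZE = 1024;  // tuning parameter

    if (rank == 0) {
        std::list<int> data(1000000, 42);  // example payload

        // Fill a small reusable buffer from the list and send it as soon as it is full.
        std::vector<int> chunk;
        chunk.reserve(CHUNK_SIZE);
        for (int element : data) {
            chunk.push_back(element);
            if (chunk.size() == static_cast<std::size_t>(CHUNK_SIZE)) {
                MPI_Send(chunk.data(), CHUNK_SIZE, MPI_INT, 1, 0, MPI_COMM_WORLD);
                chunk.clear();
            }
        }
        // Final (possibly empty) chunk; its smaller size tells the receiver to stop.
        MPI_Send(chunk.data(), static_cast<int>(chunk.size()), MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        std::list<int> received;
        std::vector<int> chunk(CHUNK_SIZE);
        while (true) {
            MPI_Status status;
            MPI_Recv(chunk.data(), CHUNK_SIZE, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            int count;
            MPI_Get_count(&status, MPI_INT, &count);
            received.insert(received.end(), chunk.begin(), chunk.begin() + count);
            if (count < CHUNK_SIZE)  // last chunk was shorter: transfer complete
                break;
        }
    }

    MPI_Finalize();
    return 0;
}

With non-blocking sends (MPI_Isend) the sender could already refill the next chunk while the previous one is in flight, which is where the pipelining effect mentioned above comes from.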
