
I have an MPI program in which the worker ranks (rank != 0) make a large number of MPI_Send calls, and the master rank (rank == 0) receives all of these messages. However, I run into the error: Fatal error in MPI_Recv: MPI_Recv(...) failed, Out of memory.

Here is the code that I am compiling in Visual Studio 2010. I run the executable like so:

mpiexec -n 3 MPIHelloWorld.exe

#include <mpi.h>

int main(int argc, char* argv[]){
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    if(rank == 0){
        // Master: drain all messages from one worker before moving to the next
        for(int k=1; k<numprocs; k++){
            for(int i=0; i<1000000; i++){
                double x;
                MPI_Recv(&x, 1, MPI_DOUBLE, k, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
        }
    }
    else{
        // Workers: send one million doubles to rank 0, tagged by iteration number
        for(int i=0; i<1000000; i++){
            double x = 5;
            MPI_Send(&x, 1, MPI_DOUBLE, 0, i, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

If I run with only 2 processes, the program does not crash. So the problem seems to appear when MPI_Send calls from a third rank (i.e. a second worker) start to accumulate.

If I decrease the number of iterations to 100,000, I can run with 3 processes without crashing. However, the amount of data sent over one million iterations is only about 8 MB per worker (8 bytes per double × 1,000,000 iterations), so I don't think the "Out of memory" refers to physical memory like RAM.

Any insight is appreciated, thanks!

  • For that particular question it is extremely important to know which MPI implementation you are using and in what configuration. Commented Nov 25, 2016 at 11:11
  • Using MS-MPI v 7.1 on Windows 7 Commented Nov 25, 2016 at 15:17

1 Answer


The MPI_Send operation stores the data in a system buffer until it can be delivered. The size of this buffer and where it lives are implementation specific (I remember hearing that it can even be in the interconnect hardware). In my case (Linux with MPICH) I don't get a memory error. One way to take explicit control of this buffering is to use MPI_Buffer_attach together with MPI_Bsend. There may also be a way to change the system buffer size (e.g. the MP_BUFFER_MEM environment variable on IBM systems).
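
As a sketch only (not code from the original post), and assuming the attached buffer fits in each worker's memory, the buffered-send variant of the program could look like this; NUM_MSGS is a name I introduced for the loop count:

/* Sketch: worker ranks attach an explicit buffer and use MPI_Bsend, so
   pending messages live in memory the program owns rather than in the MPI
   implementation's internal buffers. The sizing is an illustrative
   assumption for one million 8-byte messages (roughly 100 MB with
   typical per-message overheads). */
#include <mpi.h>
#include <stdlib.h>

#define NUM_MSGS 1000000

int main(int argc, char* argv[]){
    int numprocs, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if(rank == 0){
        for(int k=1; k<numprocs; k++){
            for(int i=0; i<NUM_MSGS; i++){
                double x;
                MPI_Recv(&x, 1, MPI_DOUBLE, k, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
        }
    }
    else{
        /* Each buffered message needs its packed size plus MPI_BSEND_OVERHEAD. */
        int packed_size;
        MPI_Pack_size(1, MPI_DOUBLE, MPI_COMM_WORLD, &packed_size);
        int buf_size = NUM_MSGS * (packed_size + MPI_BSEND_OVERHEAD);
        void* buf = malloc(buf_size);
        MPI_Buffer_attach(buf, buf_size);

        for(int i=0; i<NUM_MSGS; i++){
            double x = 5;
            MPI_Bsend(&x, 1, MPI_DOUBLE, 0, i, MPI_COMM_WORLD);
        }

        /* Detach blocks until every buffered message has been delivered. */
        void* detached;
        int detached_size;
        MPI_Buffer_detach(&detached, &detached_size);
        free(detached);
    }

    MPI_Finalize();
    return 0;
}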

However, this build-up of sent-but-not-yet-received messages should probably not occur in practice. In your example above, the order of the k and i loops on the receiving side could be swapped to prevent messages from one worker piling up while another worker is being drained (see the sketch below).
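
For example, a sketch of the swapped receive loop (my interpretation of the suggestion, not code from the post): rank 0 takes message i from every worker before asking any worker for message i+1.

if(rank == 0){
    /* i outermost: receive round-robin across workers rather than draining
       one worker completely while the others keep sending. */
    for(int i=0; i<1000000; i++){
        for(int k=1; k<numprocs; k++){
            double x;
            MPI_Recv(&x, 1, MPI_DOUBLE, k, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
}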


2 Comments

Thanks for the info about the system buffer. However, I tried switching the order of the k and i loops, and the behavior of the program remained the same.
From here technet.microsoft.com/en-us/library/… it looks like MS-MPI doesn't offer any control over setting the system buffer size.
