
I wrote a Python script that spawns N sub-processes running an executable, child, written in Fortran. I use an MPI broadcast from the Python main process to send a message (the string CNTRL) to the Fortran sub-processes on an HPC cluster. This works smoothly as long as the spawned processes fit on a single node. As soon as I ask for more cores, even though SLURM has allocated them to me, the code fails. The nodes on this cluster have 48 cores each.

main.py is this:

from mpi4py import MPI
from env import myclass

# Define message to send to Fortran
req = b'CNTRL'
# Define class locally
C = myclass(req)
# Call class function to perform communication
C.myfunc()

print('Python done')

The file containing the class, env.py, is this:

from mpi4py import MPI

class myclass:
  def __init__(self,req):

    # sub_comm is an MPI intercommunicator
    self.sub_comm = MPI.COMM_SELF.Spawn('./child', args=[], maxprocs=32)
    # common_comm is an intracommunicator across the python process and the spawned processes.
    self.common_comm=self.sub_comm.Merge(False)
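    # high=False in Merge puts this parent process first in the merged group,
    # so the parent is rank 0 of common_comm and is used as the root of the Bcast below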
    self.size = self.common_comm.Get_size()
    self.rank = self.common_comm.Get_rank()
    
    # define message to send to fortran
    self.req = req 

  def myfunc(self):
    # Just to try, send a message
    self.common_comm.Bcast([self.req,  MPI.CHAR],   root=0)
    print('Python sent a broadcast message: ',self.req)
    self.closeMPI()

  def closeMPI(self):
    # free the (merged) intra communicator
    self.common_comm.Free()
    # disconnecting the intercommunicator is required so the spawned processes can finalize.
    self.sub_comm.Disconnect()

The Fortran code is the following:

program child
  !
  use mpi
  !
  implicit none
  !
  integer :: parentcomm,intracomm,group,newgroup,newcomm
  integer :: rank,rank2,size,ierr,i
  character(len=5) :: request
  integer, allocatable, dimension(:) :: ranks
  !
  call MPI_INIT(ierr)
  call MPI_COMM_GET_PARENT(parentcomm, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
  call MPI_COMM_GROUP(MPI_COMM_WORLD, group, ierr)
  allocate(ranks(size))
  do i=1,size
    ranks(i) = i-1
  end do
  call MPI_GROUP_INCL(group,size,ranks,newgroup,ierr)
  call MPI_GROUP_SIZE(newgroup, size, ierr)
  call MPI_COMM_CREATE(MPI_COMM_WORLD, newgroup, newcomm, ierr)
  call MPI_INTERCOMM_MERGE(parentcomm, .true., intracomm, ierr)
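  ! high=.true. here, so the spawned processes are ordered after the parent:
  ! the Python parent is rank 0 of intracomm and is the root of the Bcast below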
  call MPI_COMM_RANK(intracomm, rank2, ierr)
  call MPI_COMM_SIZE(intracomm, size, ierr)

  call MPI_BCAST(request,5,MPI_CHAR,0,intracomm,ierr)
  if (rank == 0) print*, 'Message received in fortran: ',request

  if (rank == 1) print*, 'Child frees intracomm'
  call MPI_COMM_FREE(intracomm, ierr)
  if (rank == 1) print*, 'Child disconnects intercomm'
  call MPI_COMM_DISCONNECT(parentcomm, ierr)
  if (rank == 1) print*, 'Child finalises'
  call MPI_FINALIZE(ierr)
end program child

The code works as expected, and the output I get is the following:

 Message received in fortran: CNTRL
 Child frees intracomm
 Child disconnects intercomm
 Child finalises
Python sent a broadcast message:  b'CNTRL'
Python done

but if, for example, I ask to spawn 64 processes - while requesting 65 cores (64+1) across two nodes through SLURM - the code fails with this error message:

--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   node038
  Local device: mlx5_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 64
slots that were requested by the application:

  ./child

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/scratch/users/MY_USERNAME/tests/multinode_debug_notworking/main.py", line 7, in <module>
    C = myclass(req)
  File "/mnt/scratch/users/MY_USERNAME/tests/multinode_debug_notworking/env.py", line 10, in __init__
    self.sub_comm = MPI.COMM_SELF.Spawn('./child', args=[], maxprocs=64)
  File "mpi4py/MPI/Comm.pyx", line 1931, in mpi4py.MPI.Intracomm.Spawn
mpi4py.MPI.Exception: MPI_ERR_SPAWN: could not spawn processes

This also happens if I ask for 33 cores on 2 nodes (16 and 16+1). This is what I have tried so far:

  • using the --use-hwthread-cpus and --bind-to-core options when spawning the MPI processes
  • switching from Open MPI to MPICH, installing mpi4py accordingly in the environment and recompiling the Fortran source code
  • specifying a nodelist as an input parameter for the MPI spawn call (see the sketch below)

but nothing worked.
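
For the third point in the list above, this is roughly what I mean (a minimal sketch, not my exact code; the info keys are the ones Open MPI documents for MPI_Comm_spawn, and the node names are placeholders for the ones SLURM assigns me):

from mpi4py import MPI

# Sketch: pass an MPI.Info object to Spawn so the runtime knows it may place
# the children on more than one node. Open MPI documents the "host",
# "add-host" and "hostfile" info keys for MPI_Comm_spawn; the node names
# below are placeholders.
info = MPI.Info.Create()
info.Set('add-host', 'node037,node038')   # or info.Set('hostfile', 'spawn_hosts')

sub_comm = MPI.COMM_SELF.Spawn('./child', args=[], maxprocs=64, info=info)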

I am using Python 3.10.12 with mpi4py 3.1.5. I tried with Open MPI 4.1.1 and MPICH 4.1.2, because I couldn't run anything more recent on my HPC facility. Is this something that I could aim to overcome, am I missing something crucial, or am I asking for something too hard to achieve with these libraries alone?

Comment: If you simply run ./main.py, try mpirun -np 1 ./main.py (Feb 12, 2024)
