
I have a Python function that calls a package method which is not safe to run concurrently (the package writes to three temporary files that always have the same names). Because the input data are large and I have many input sets, I distribute the work with the mpi4py library so that each rank handles a different group of input data at any given time. My problem is that when calls to this function are mapped across MPI ranks, several ranks occasionally enter the function at the same time. This produces a race condition in which two concurrent calls overwrite each other's data, and the script then errors out.

Since the package method is not safe to run concurrently, how can I put a mutex-style lock around the function so that only one MPI rank works inside it at a time?

For example:

from mpi4py import MPI

def mpi_call(args):
    comm = MPI.COMM_WORLD
    # Need to mutex lock here
    non_threadsafe_method(args)
    # Need to unlock here
    return True

I have tried calling Barrier() here, but that deadlocks the program: Barrier() is a collective operation that every rank in the communicator must reach, and only some of the ranks ever enter the function that calls the package method.
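
To illustrate why the barrier hangs, here is a small stand-in that uses threading.Barrier from the standard library rather than MPI (an analogy only, not mpi4py code). The barrier expects three participants but only two ever arrive, so nobody gets through. The timeout is there only so the example terminates instead of hanging the way the MPI program does.

```python
import threading

EXPECTED = 3  # participants the barrier waits for (like every rank in the communicator)
ARRIVING = 2  # participants that actually reach the barrier

barrier = threading.Barrier(EXPECTED)
results = [None] * ARRIVING


def arrive(idx):
    """Wait at the barrier and record whether this participant got through."""
    try:
        barrier.wait(timeout=0.5)
        results[idx] = "passed"
    except threading.BrokenBarrierError:
        # Raised when the wait times out or another waiter has timed out.
        results[idx] = "stuck"


threads = [threading.Thread(target=arrive, args=(i,)) for i in range(ARRIVING)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```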

I would like to know the best way to handle a mutex-style lock for this type of function.

Thanks!

  • Mutexes are only needed if you have shared data access. Since MPI has distributed memory, there is no shared data access. Where does your data race come from? Commented Jan 1, 2022 at 23:22
  • The package method (Fortran) that is not thread-safe writes three temporary files which share the same name (.inter1, .inter2, .inter3). These files are filled with data before being written to the final files. The race condition occurs when two or more ranks try to run the method at the same time causing these files to be overwritten by different ranks while processing is underway with them. Commented Jan 1, 2022 at 23:29
  • You are writing "thread" everywhere, do you actually mean "process"? Commented Jan 1, 2022 at 23:33
  • @psarka Essentially yes. The package method cannot be run by multiple instances at the same time, whether they are threads or MPI processes. It uses three temporary files to store data that are then passed to Fortran routines, so running the method on multiple ranks at once leads to a race condition where the data are overwritten and both calls fail. Commented Jan 1, 2022 at 23:38

1 Answer


Try a filesystem lock. This approach relies on your conflict being between processes rather than threads (long story). Using the fasteners library, your code would look like this:

import fasteners
from mpi4py import MPI

def mpi_call(args):
    comm = MPI.COMM_WORLD
    # Only one process at a time can hold the lock; the others block here
    with fasteners.InterProcessLock('/tmp/tmp_lock_file'):
        non_threadsafe_method(args)
    # The lock is released when the with block exits
    return True

See more here: https://fasteners.readthedocs.io/en/latest/examples.html#interprocess-locks
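
For reference, here is a minimal stdlib-only sketch of the same idea, assuming a POSIX system (it uses fcntl.flock instead of fasteners). The names file_lock and guarded_call and the lock file name are hypothetical, chosen for illustration. On a multi-node run the lock directory has to be on a filesystem that every rank can see, and advisory locks are not reliable on every network or parallel filesystem, so treat this as a starting point rather than a guaranteed fix.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def guarded_call(func, args, lock_dir="."):
    """Run func(args) while holding the lock file in lock_dir (hypothetical helper)."""
    lock_path = os.path.join(lock_dir, "non_threadsafe_method.lock")
    with file_lock(lock_path):
        return func(args)
```

Inside mpi_call this could be used as guarded_call(non_threadsafe_method, args, lock_dir=shared_dir), where shared_dir is a placeholder for a directory visible to all ranks.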


2 Comments

MPI is generally used to run processes on multiple nodes, so /tmp is generally not a good fit. YMMV when using locks on parallel filesystems.
This recommended solution worked for my application. Thanks! Also, per Gilles' comment, I made sure to write the lock file to the directory the script is running in.
