
I am testing toy code that parallelizes a process with Python's multiprocessing module. The code works on my home computer, but after I migrated it to the remote server I work on, it raises an error.

I first define the functions in defs.py:

import numpy as np

def g(n):
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    
    return A * B

def run_complex_operations(operation, input, pool):
    result = pool.map(operation, input)
    
    return result

Python seems to find defs.py, because running the two lines below returns the expected result:

import defs
print(defs.g(1))

However, when I run the following code to call my function through multiprocessing, Python raises an error:

import defs
import numpy as np
import time
import multiprocessing as mp


x = 10
n = 10000
l = [n] * x


start = time.time()

if __name__ ==  '__main__':
    processes_pool = mp.Pool(3)
    l[:] = defs.run_complex_operations(defs.g, range(x), processes_pool)

The error is:

Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 358, in get
    return _ForkingPickler.loads(res)
ModuleNotFoundError: No module named 'defs'

What could be the reason for the problem? It seems related to multiprocessing, because Python has no trouble finding the other function in the defs module when I call it directly.

FWIW, the server runs Python 3.8.5; my local Python is 3.9.7.

  • If the path used to import modules is different between the two computers, this problem could happen. See for example bic-berkeley.github.io/psych-214-fall-2016/sys_path.html. You can check for path differences by adding the lines import sys; print(sys.path) at the very top of the script and comparing the output on both machines. Commented Dec 14, 2022 at 17:08
  • Thank you! This hint was decisive in solving the issue, see answer below. Commented Dec 15, 2022 at 8:43

3 Answers


This error usually depends on the directory you execute from. When you start the Python interpreter in a terminal, it runs in your current directory and can find the source files located there. But when you execute something like python myscript.py, the interpreter might be pointing to another folder (this depends heavily on your system settings). An easy way to check is to start your Python script with

from os import getcwd
print(getcwd())

and make sure that the interpreter is pointing to the right directory.
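To compare what the main process and a pool worker actually see, a small diagnostic along these lines may help. It is only a sketch: import_context and diff_paths are hypothetical helper names, not part of multiprocessing, and it assumes the script is run as a file (not pasted into an interactive session) so that the worker can import the function.

```python
import multiprocessing as mp
import os
import sys


def import_context(_=None):
    """Return the settings that decide where `import` looks for modules."""
    return {
        "pid": os.getpid(),
        "cwd": os.getcwd(),
        "sys_path": list(sys.path),
        "pythonpath": os.environ.get("PYTHONPATH"),
    }


def diff_paths(parent, child):
    """List sys.path entries that appear in only one of the two processes."""
    only_parent = [p for p in parent["sys_path"] if p not in child["sys_path"]]
    only_child = [p for p in child["sys_path"] if p not in parent["sys_path"]]
    return only_parent, only_child


if __name__ == "__main__":
    # Collect the context once in the main process and once in a worker.
    parent = import_context()
    with mp.Pool(1) as pool:
        child = pool.map(import_context, [None])[0]

    print("parent cwd:", parent["cwd"])
    print("child cwd: ", child["cwd"])
    print("parent PYTHONPATH:", parent["pythonpath"])
    print("child PYTHONPATH: ", child["pythonpath"])

    only_parent, only_child = diff_paths(parent, child)
    print("sys.path entries only in parent:", only_parent)
    print("sys.path entries only in child: ", only_child)
```

If the folder that holds defs.py shows up only on the parent side, the worker has no way to import the module, which would match the ModuleNotFoundError in the question.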

If it is not, you can either change the working directory with os.chdir(yourPathHere), or add an __init__.py file to your subfolders (if any) and run pip install . to install your module into your local environment (this needs a setup.py or pyproject.toml in the project root). You should then be able to import the module from anywhere.

good luck


4 Comments

Also, I have a faint feeling that I stumbled upon a similar scenario once, but there it was because the import paths were defined in the PYTHONPATH environment variable and that variable was not passed down to the child process. I might remember wrong though, so take this comment with a grain of salt.
Thanks. The interpreter is set to the correct working directory. Neither "import defs.py" nor using one function from the module raises an error. Only trying to access the module through multiprocessing raises an error.
multiprocessing starts a new process with a new instance of the interpreter, so that new instance might not point to the same path. Maybe you could check whether the spawned process still gets the environment variable from the parent process?
Thanks. I will look into that. Do you know how I could check that?

When I saved defs.py in the current working directory (see import os; print(os.getcwd())), the main script found the module but the child processes started by multiprocessing did not.

When I saved defs.py in a directory listed in sys.path (see import sys; print(sys.path)), both the main script and the child processes found the module.
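If moving defs.py is inconvenient, another option that follows from this observation is to put its folder on sys.path as an absolute path before the pool is created. The sketch below is an assumption-laden illustration, not something tested on the server in question: ensure_on_sys_path is a hypothetical helper, Path.cwd() is only a placeholder for the folder that holds defs.py, and it relies on the worker processes receiving the parent's sys.path at start-up, which is worth verifying on your setup.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory):
    """Prepend `directory` (made absolute) to sys.path unless already present."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


if __name__ == "__main__":
    # Placeholder: replace Path.cwd() with the folder that contains defs.py.
    module_dir = ensure_on_sys_path(Path.cwd())
    print("import search path now starts with:", module_dir)
    # Create the multiprocessing pool only after this point, so that
    # workers are started with the updated search path.
```

An absolute path is used on purpose: a relative entry would depend on the working directory again, which is the very thing that differs between the two machines.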


I do not have enough reputation to comment on your last post, so I am posting another answer here. Having to put the module on your path matches what I mentioned: when you create a child process with multiprocessing, it is not necessarily "aware" of the environment/context of the parent. An easy way to avoid editing your path for each new module is to use pip install . and let pip do the job for you :D

Good luck ~
