When using multiprocessing on Windows, we must have an if __name__ == '__main__': check. For example:

# Script1.py

import multiprocessing
class Test(object):
    def __init__(self, x):
        self.x = x

    def square(self, i, return_jobs, x):
        # Store the square of x under the job index i
        return_jobs[i] = x**2

    def run(self):
        if __name__ == "__main__":
            manager = multiprocessing.Manager()
            return_jobs = manager.dict()
            jobs = []
            for i in range(len(self.x)):
                p = multiprocessing.Process(target=self.square, args=(i, return_jobs, self.x[i]))
                jobs.append(p)
                p.start()
            for proc in jobs:
                print(proc)
                proc.join()
            print('result',return_jobs.values())

Test([2, 3]).run()

This simple example script ran fine and returned something like: result [4, 9]. However, if I have a different script that imports Script1 and uses Test, it won't work. That is,

# Script2.py

from Script1 import Test
class Test2(object):
    def __init__(self, y):
        self.y = y
        
    def run(self):
        z = Test(self.y).run()

will not invoke the function Test(self.y).run() at all. However, if I place the class Test2 in the same script as Test (Script1.py), everything works.

What is the best way to fix this? Script1.py is only one step of the overall code, and I don't want to have to combine these scripts.

I should also note that I am using Spyder, which could be part of the problem.

1 Answer

First, in Script1.py the if __name__ == "__main__": check is in the wrong place. It should not be inside the run method; it should be at global scope, around the statement that starts everything:

if __name__ == "__main__":
    Test([2, 3]).run()

This is for two reasons. First, when the new processes are created on Windows, every statement at global scope is executed again by each of those processes. Without the check placed as above, each new process would needlessly create its own Test object. It's true that when run is invoked on those objects it returns immediately because of where you did place the check, but why create the objects to begin with?

But the real reason for moving the check is that you only want to execute the statement Test([2, 3]).run() when Script1.py is run as the "main" script, not when it is imported by some other script. With the check placed as above, the module's __name__ is no longer "__main__" when it is imported, so the statement is not executed. That gives you more flexibility.
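Putting both points together, a corrected Script1.py might look like the sketch below. It is an illustration rather than the only possible layout: the check is removed from run and placed at global scope, and square stores its result under the job index i (a small change from the question's code, which overwrote i). Everything else follows the question's script.

```python
# Script1.py (sketch of the corrected layout)
import multiprocessing


class Test(object):
    def __init__(self, x):
        self.x = x

    def square(self, i, return_jobs, x):
        # Store the square of x under the job index i.
        return_jobs[i] = x ** 2

    def run(self):
        # No __name__ check here: run() is not at global scope,
        # so it only executes when somebody calls it.
        manager = multiprocessing.Manager()
        return_jobs = manager.dict()
        jobs = []
        for i in range(len(self.x)):
            p = multiprocessing.Process(target=self.square,
                                        args=(i, return_jobs, self.x[i]))
            jobs.append(p)
            p.start()
        for proc in jobs:
            print(proc)
            proc.join()
        print('result', return_jobs.values())


if __name__ == "__main__":
    # Executed only when Script1.py is the main script, not when it is
    # imported and not when a newly created process re-reads this file.
    Test([2, 3]).run()
```

With this layout, importing Test from another script has no side effects, and calling run() from that script starts the worker processes as expected.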

This now allows you in Script2.py to add your own if __name__ == '__main__': check as follows:

from Script1 import Test

class Test2(object):
    def __init__(self, y):
        self.y = y

    def run(self):
        z = Test(self.y).run()

if __name__ == '__main__':
    Test2([3, 6]).run()

Prints:

<Process name='Process-2' pid=9200 parent=4492 started>
<Process name='Process-3' pid=16428 parent=4492 started>
result [9, 36]

This way, when Script2.py is the "main" script being executed, you have control over which object gets created and run.

Explanation

The important thing to remember with Windows is that when a script launches a new process, that process starts executing the source from the top, so all statements at global scope (import statements, function declarations, variable assignments, etc.) are executed. You therefore want to keep unnecessary work out of global scope: it will be re-executed by every new process, so a calculation or a large data structure that the new process never uses just wastes CPU cycles and memory.

More importantly, you absolutely must not have any statement at global scope that, when executed, ends up recursively re-creating the process you just created. That is why such statements need to be wrapped in if __name__ == "__main__": (__name__ will not be "__main__" in the newly created process). So there is no need for the check inside the run method, which is not at global scope. But in whatever script you run to start things off, you will need the check around any global-scope code that creates a process or invokes a function or method that creates a process.
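The re-execution described above can be observed directly. The following is a minimal sketch, not part of the original scripts: it writes a small hypothetical script (demo_spawn.py) to a temporary directory, forces the "spawn" start method that Windows uses, and has a module-level statement append __name__ to a log file whose path is passed through a made-up NAME_LOG environment variable.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical script, written to disk so it can run as a real "__main__".
SCRIPT = textwrap.dedent('''
    import multiprocessing
    import os

    # Module-level statement: executed by the parent and again by the child.
    with open(os.environ["NAME_LOG"], "a") as log:
        log.write(__name__ + "\\n")

    def work():
        pass

    if __name__ == "__main__":
        # Force the Windows-style start method on every platform.
        ctx = multiprocessing.get_context("spawn")
        p = ctx.Process(target=work)
        p.start()
        p.join()
''')


def names_seen_at_module_level():
    """Run the script once and return every __name__ value it logged."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "demo_spawn.py")
        log = os.path.join(tmp, "names.log")
        with open(script, "w") as f:
            f.write(SCRIPT)
        env = dict(os.environ, NAME_LOG=log)
        subprocess.run([sys.executable, script], check=True, env=env)
        with open(log) as f:
            return f.read().split()


if __name__ == "__main__":
    print(names_seen_at_module_level())
```

The log should contain two entries, "__main__" from the parent and "__mp_main__" from the spawned child, which shows both that the module-level code ran twice and that the guarded block was skipped in the child.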

Note that when Script2.py imports Script1.py, Script1.py is now a module and its __name__ value will be "Script1", so again the code Test([2, 3]).run() will not execute. That also explains why, when we create a module, we can place testing code within an if __name__ == "__main__": block: it will not be executed when the module is imported.
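The difference between importing a file and running it can be checked with a small experiment. The sketch below is only an illustration: mymodule.py, square, and the "self-test" message are hypothetical names, and the file is created in a temporary directory so that nothing is left behind.

```python
import importlib.util
import os
import subprocess
import sys
import tempfile

# Hypothetical module with testing code inside the __main__ guard.
MODULE_SOURCE = '''
def square(x):
    return x ** 2

if __name__ == "__main__":
    print("self-test:", square(3))
'''


def import_versus_run():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mymodule.py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)

        # 1) Import the file as a module: the guarded block is skipped.
        spec = importlib.util.spec_from_file_location("mymodule", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # 2) Run the same file as a script: the guarded block executes.
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, check=True)
        return module.__name__, module.square(4), result.stdout.strip()


if __name__ == "__main__":
    print(import_versus_run())
```

When imported, the module reports its own name ("mymodule") and its functions are usable, but the self-test line is produced only by the run that executes the file as a script.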

3 Comments

Thanks for the answer (+1). I understand your point about placing if __name__ == "__main__": outside the function run. It was actually what I did at first as well. The reason I wrote it inside the function is that Script1.py is invoked by Script2.py through calling Test.run(). So if the check were placed outside the class, wouldn't that be a problem? It seems like the problem is that I am using Spyder and Windows. Not sure though.
It has nothing to do with Spyder or Windows. It is caused by using "spawn" rather than "fork" as the method of starting a child process ("spawn" is your only option on Windows, and is the default on macOS). The child process will import the main file to get access to all the necessary objects, so your "main" file shouldn't "do" anything when it's imported beyond importing and defining things. if __name__ == "__main__": belongs in your "main" file (Script2.py).
I've added an explanation to the answer that I hope answers your question.
