Understanding if current process is part of a multiprocessing pool with --multiprocessing-fork

Question

I need to find a way for a Python process to figure out if it was launched as part of a multiprocessing pool.

I am using dask to parallelize calculations with dask.distributed.LocalCluster. For UX reasons (this is part of a library for a specialized scientific task), I want the dask cluster setup to happen in a module that the user can import.

This means that I cannot use the usual guard:

import dask.distributed as dd

if __name__=='__main__':
    dd.LocalCluster()

to prevent child processes from starting their own cluster, since I need to start the cluster from within a module that is itself imported.

By digging around with psutil, I found that the child processes are launched with a --multiprocessing-fork command-line option and that they run the multiprocessing.spawn.spawn_main function. I am thinking of checking for the presence of the --multiprocessing-fork flag to tell whether the current process is part of the pool.

Is this the right approach? Is there a better way? I could not find any obvious documentation on the multiprocessing.spawn.spawn_main function.
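For concreteness, the flag check described above might look like the minimal sketch below. The helper name launched_by_multiprocessing_spawn and its cmdline parameter are hypothetical, introduced only for illustration. The flag itself is an undocumented implementation detail, so this relies on observed behaviour rather than a supported API. The fallback to sys.orig_argv (Python 3.10+) is an assumption: plain sys.argv is likely not reliable here, since the spawn machinery appears to replace it with the parent's arguments.

```python
import sys

# Flag observed on the command line of spawned children (undocumented detail).
SPAWN_FLAG = "--multiprocessing-fork"


def launched_by_multiprocessing_spawn(cmdline=None):
    """Return True if the given command line contains the spawn flag.

    `cmdline` is a list of strings, e.g. the result of
    psutil.Process().cmdline(). When omitted, fall back to sys.orig_argv
    (Python 3.10+), or to sys.argv on older versions (assumption: the
    latter may not contain the flag).
    """
    if cmdline is None:
        cmdline = getattr(sys, "orig_argv", sys.argv)
    return SPAWN_FLAG in cmdline
```

With psutil installed, the command line seen in the question would be passed in as launched_by_multiprocessing_spawn(psutil.Process().cmdline()).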

Thanks a lot!

mdurant · Accepted Answer · 2024-10-30 15:18:14Z

-1

The simplest thing I can think of is to check whether distributed.worker.Worker._instances has any entries; worker subprocesses should always have at least one. This is essentially what distributed.get_worker() does, and it raises ValueError if not running on a worker.


2 Comments

ShadowRanger Over a year ago

Given that _instances isn't part of the public API, I'd suggest calling distributed.get_worker() inside a try block instead: except ValueError: handles the "not in a worker" case, and a subsequent else: handles the "in a worker" case. That would be the better approach for stability.
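A minimal sketch of that try/except/else pattern follows, relying on the answer's statement that get_worker() raises ValueError when not running on a worker. The helper name running_in_dask_worker and its injectable get_worker parameter are hypothetical; the parameter exists only so the logic can be exercised without a running cluster.

```python
def running_in_dask_worker(get_worker=None):
    """Return True if get_worker() succeeds, False if it raises ValueError."""
    if get_worker is None:
        # Imported lazily so that importing this helper does not require dask.
        from distributed import get_worker
    try:
        get_worker()
    except ValueError:
        # Per the answer: raised when not running on a worker.
        return False
    else:
        return True
```

As the next comment reports, this check may still return False inside a worker if it runs at import time, before the worker has been fully set up.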

pnjun Over a year ago

Hey! I actually tried that approach first, but it seems that distributed.get_worker() raises ValueError in the worker as well, probably because the import happens before dask is fully set up.
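One check that does not depend on dask being initialised is the standard library's multiprocessing.parent_process() (Python 3.8+), which returns None in the main process. This is a different technique from the ones discussed above, not something the answer proposes. It assumes that the LocalCluster workers are started through multiprocessing, which the spawn_main observation in the question suggests. The helper name is_multiprocessing_child is hypothetical.

```python
import multiprocessing


def is_multiprocessing_child():
    """Return True if this process was started by the multiprocessing module."""
    # parent_process() returns None in the main process and a process
    # object in children created by multiprocessing.
    return multiprocessing.parent_process() is not None


# Possible use at module import time (sketch only):
# if not is_multiprocessing_child():
#     cluster = dask.distributed.LocalCluster()
```

Note that this flags any multiprocessing child, not only dask workers, so it would also skip cluster setup in unrelated pools that import the module.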
