
I am doing a reinforcement learning simulation in Python that involves running many episodes and updating the agent's weights after each episode. To speed up the process, I wish to run the simulations (which draw random numbers) in parallel and update the weights in parallel.

The problem is, I noticed that even when using the experimental GIL-free Python 3.13t, the main bottleneck is the random number generation. Basically, when I use N threads to run N simulations, it takes N times as long as one simulation, as if the random number generation were serialized.

Here is a simple program to illustrate this:

# test.py
import threading
import time
import random

def generate_randoms(n):
    """Generate n random floats between 0 and 1"""
    rng = random.Random()
    for _ in range(n):
        rng.random()

def run_tasks_in_parallel(num_tasks, numbers):
    threads = []
    for i in range(num_tasks):
        t = threading.Thread(target=generate_randoms, args=(numbers[i],))
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()

if __name__ == "__main__":
    num_tasks = 2
    numbers = [1000000] * num_tasks
    
    start_time = time.time()
    run_tasks_in_parallel(num_tasks, numbers)
    end_time = time.time()
    
    print(f"Time taken: {end_time - start_time} seconds")

Benchmarking results (num_tasks=4 unless stated otherwise):

python3.13 test.py: ~11s

python3.13t test.py: ~6s

python3.13t -X gil=0 test.py: ~6s

python3.13t -X gil=1 test.py: ~16s

python3.13 test.py [num_tasks=1]: ~1s

python3.13t test.py [num_tasks=1]: ~1s

The program takes twice as long with num_tasks=2 as with num_tasks=1. Using numpy.random.default_rng is even slower. I thought it was a problem of shared random state, so I instantiated an rng for each thread as above, but the problem persisted. I have ruled out a general threading issue: if I remove the random number generation, the threads do run in parallel (N threads do not take N times as long).
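For reference, the NumPy variant I tried looked roughly like this (one Generator instantiated per thread, drawing one float at a time to mirror the pure-Python loop):

```python
import numpy as np

def generate_randoms_np(n):
    """NumPy variant of generate_randoms: one Generator per thread."""
    rng = np.random.default_rng()  # fresh, independent state; nothing shared across threads
    for _ in range(n):
        rng.random()  # scalar draw per iteration, like the pure-Python loop
```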

Is there a way to parallelize random number generation in GIL-free Python? Why doesn't random number generation (in both CPython's random module and NumPy) scale with threads?

I do not need the threads to share random state (I even prefer them not to), nor do I need thread-safe random number generation. I just need parallel random numbers to speed up my training.
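To be concrete about "not sharing state": what I'd ideally want is explicitly independent per-thread streams, e.g. spawned from a single NumPy SeedSequence (a sketch of the setup, not something I have benchmarked):

```python
import numpy as np

# One parent SeedSequence; spawn() yields statistically independent children,
# so each thread can own its own Generator with no shared state at all.
parent = np.random.SeedSequence(12345)
streams = [np.random.default_rng(child) for child in parent.spawn(4)]
```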

  • Could you also add your CPython configuration options? sysconfig.get_config_var('CONFIG_ARGS') Commented Apr 7 at 10:26
  • What is the output of sys._is_gil_enabled()? Commented Apr 7 at 12:32
  • @Dunes Output is False Commented Apr 7 at 12:56
  • @cards "'--prefix=~/.pyenv/versions/3.13.2t' '--enable-shared' '--libdir=~/.pyenv/versions/3.13.2t/lib' '--disable-gil' 'LDFLAGS=-L~/.pyenv/versions/3.13.2t/lib -Wl,-rpath,~/.pyenv/versions/3.13.2t/lib' 'LIBS=-L~/.pyenv/versions/3.13.2t/lib -Wl,-rpath,~/.pyenv/versions/3.13.2t/lib' 'CPPFLAGS=-I~/.pyenv/versions/3.13.2t/include'" Commented Apr 7 at 13:00
  • 1
    @user1446642 maybe you could add the results of the time executions in your question: python3.13 test.py, python3.13t test.py, python3.13t -X gil=0 test.py, python3.13t -X gil=1 test.py Commented Apr 7 at 17:08
