4

I've been digging into FastAPI's handling of synchronous and asynchronous endpoints, and I've come across a few things that I'm trying to understand more clearly, especially with regard to how blocking operations behave in Python.

From what I understand, when a synchronous route (defined with def) is called, FastAPI offloads it to a separate thread from the thread pool to avoid blocking the main event loop. This makes sense, as the thread can be blocked (e.g., time.sleep()), but the event loop itself doesn’t get blocked because it continues handling other requests.

But here’s my confusion: If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?

Here is an example:

from fastapi import APIRouter
import asyncio

app = APIRouter()

@app.get('/sync')
def tarefa_sincrona():
    print('Sync')
    total = 0
    for i in range(10223424*1043):
        total += i
    print('Sync task done')

@app.get('/async')
async def tarefa_assincrona():
    print('Async task')
    await asyncio.sleep(5)
    print('Async task done')

If I make two requests — the first one to the sync endpoint and the second one to the async endpoint — almost at the same time, I expected the event loop to be blocked. However, in reality, what happens is that the two requests are executed "in parallel."

  • You block the thread, but the whole point of threading is that others can continue - see e.g. stackoverflow.com/q/92928/3001761 Commented Jan 23 at 21:40
  • You're misunderstanding Python's Global Interpreter Lock. Only one thread can be actively running Python code at any one time. But if one thread is sleeping, others can run. Likewise, if a thread is executing C code (e.g. numpy), it can release the lock if it wants, and then wait for the lock before returning to the Python caller. Commented Jan 23 at 21:53
  • @Chris, I think this is a different question. The other post's focus was on understanding how FastAPI works. Here, however, the focus is on why blocking code in one thread doesn't make other threads wait for its completion before executing. I apologize if I wasn't clear about my doubt. Anyway, I found the answer in another forum. Here it is: "Yes, if you're doing math or string manipulation or whatever, then the GIL will be exchanged between bytecodes, but you'll never get two bytecodes running at the same time." and this makes sense to me. Thank you for your response nonetheless! 😊 Commented Jan 24 at 12:23
  • That answer is much more than "how FastAPI works." Please read it thoroughly, in order to get a complete answer to your question. I have now added a section, complementing the rest, related to your query above. Also, the answer from another forum that you are referring to is not entirely correct. Please have a look at the latest section added in the duplicate question about GIL (but again, please read the whole answer), where you will find when the GIL is released, as well as that certain math operations do not guarantee its release. Commented Jan 25 at 17:55

3 Answers

4

If the function is truly blocking (e.g., it’s waiting for something like time.sleep()), how is the event loop still able to execute other tasks concurrently? Isn’t the Python interpreter supposed to execute just one thread at a time?

Only one thread is indeed executed at a time. The flaw in the quoted question is to assume that time.sleep() keeps the thread active - as another answerer has pointed out, it does not.

The TL;DR is that time.sleep() does block the thread, but it uses a C macro to release its lock on the global interpreter while it waits, and reacquires it before returning.

Concurrency in Python (with GIL)

  • A thread can acquire a lock on the global interpreter, but only if the interpreter isn't already locked
  • A lock cannot be forcibly removed, it has to be released by the thread that has it
  • CPython will periodically release the running thread's GIL if there are other threads waiting for execution time
  • Functions can also voluntarily release their locks
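
To make that periodic hand-off concrete, here is a minimal sketch (illustrative code, not part of the original example): two purely CPU-bound threads both make progress because CPython keeps forcing the running thread to give up the GIL, even though neither thread ever releases it voluntarily.

import threading

def count(name: str, n: int = 5_000_000) -> None:
    total = 0
    for i in range(n):
        total += i
        if i % 1_000_000 == 0:
            # The two threads' prints typically interleave, showing that
            # the GIL is handed back and forth between them.
            print(f"{name}: reached {i}")

t1 = threading.Thread(target=count, args=("thread-1",))
t2 = threading.Thread(target=count, args=("thread-2",))
t1.start(); t2.start()
t1.join(); t2.join()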

Voluntarily releasing locks is pretty common. In C-extensions, it's practically mandatory:

  1. Py_BEGIN_ALLOW_THREADS is a macro for { PyThreadState *_save; _save = PyEval_SaveThread();
  2. PyEval_SaveThread() releases the GIL.

time.sleep() voluntarily releases the lock on the global interpreter with the macro mentioned above.
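
You can observe the effect of that macro from plain Python - a minimal sketch (illustrative, not from the original answer): because sleeping threads have released the GIL, two one-second sleeps overlap and the total wall time stays close to one second rather than two.

import threading
import time

def sleeper() -> None:
    time.sleep(1)  # the GIL is released for the duration of the sleep

start = time.perf_counter()
threads = [threading.Thread(target=sleeper) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Elapsed: {time.perf_counter() - start:.2f}s")  # ~1.0s, not ~2.0s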

Synchronous threading:

As mentioned earlier, Python will regularly try to release the GIL so that other threads can get a bit of execution time.

For threads with a varied workload, this is smart. If a thread is waiting for I/O but the code doesn't voluntarily release the GIL, this mechanism will still result in the GIL being handed to another thread.

For threads that are entirely or primarily CPU-bound, it works... but it doesn't speed up execution. I'll include code that proves this at the end of the post.

The reason it doesn't provide a speed-up in this case is that CPU-bound operations aren't waiting on anything, so pausing func_1 to give execution time to func_2 just means that func_1 is idle for no reason - with the result that func_1's completion time gets staggered by however much execution time is granted to func_2.
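
A smaller, standalone illustration of the same point (a hedged sketch; the exact numbers will vary by machine): running two CPU-bound loops in threads takes roughly as long as running them one after the other.

import threading
import time

def burn(n: int = 20_000_000) -> int:
    total = 0
    for i in range(n):
        total += i
    return total

# Sequential: one loop after the other.
start = time.perf_counter()
burn(); burn()
print(f"sequential: {time.perf_counter() - start:.2f}s")

# Threaded: the GIL interleaves the two loops, so the wall time is roughly
# the same (often slightly worse, due to the switching overhead).
start = time.perf_counter()
threads = [threading.Thread(target=burn) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threaded:   {time.perf_counter() - start:.2f}s")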

Inside of an event loop:

asyncio's event loop is single-threaded, which is to say that it doesn't spawn new threads. Each coroutine that runs uses the main thread (the same thread the event loop lives in). The way this works is that the event loop and its coroutines cooperatively pass control (and with it, the thread's hold on the GIL) among themselves.

But why aren't coroutines offloaded to threads, so that CPython can step in and release the GIL to other threads?

There are many reasons, but the easiest to grasp is maybe this: in practice it would have meant running the risk of significantly lagging the event loop. Instead of immediately resuming its own work (scheduling the next coroutine) when the current coroutine finishes, the loop might now have to wait for execution time because the GIL has been passed off to another thread. Similarly, coroutines would take longer to finish due to constant context-switching.

Which is a long-winded way of saying that if time.sleep() didn't release its lock, or if you were running a long CPU-bound thing, a single thread would indeed block the entire event loop (by hogging the GIL).
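
As a minimal sketch of that last point (illustrative code, separate from the test further down): a coroutine that calls time.sleep() holds on to the single event-loop thread, so another coroutine scheduled at the same time cannot even start until it finishes.

import asyncio
import time

async def blocking_task() -> None:
    print("blocking task: start")
    time.sleep(2)             # never yields; the event loop is stuck here
    print("blocking task: done")

async def friendly_task() -> None:
    print("friendly task: start")
    await asyncio.sleep(0.1)  # yields control back to the event loop
    print("friendly task: done")

async def main() -> None:
    # friendly_task only runs after blocking_task has finished, even though
    # both were scheduled "concurrently".
    await asyncio.gather(blocking_task(), friendly_task())

asyncio.run(main())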

So what now?

Inside of GIL-bound Python, whether it's sync or async, the only way to execute CPU-bound code (that doesn't actively release its lock) with true parallelism is at the process level, so either multiprocessing or concurrent.futures.ProcessPoolExecutor, as each process will have its own GIL.

So:

async functions running CPU-bound code (with no voluntary yields) will run to completion before yielding the GIL.

sync functions in separate threads running CPU-bound code with no voluntary yields will get paused periodically, and the GIL gets passed off elsewhere.

(For clarity:) sync functions in the same thread will have no concurrency whatsoever.
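
As a hedged sketch of the process-level route mentioned above (the helper names are illustrative, not from the post): with concurrent.futures.ProcessPoolExecutor each worker process gets its own GIL, and run_in_executor keeps the event loop free while the CPU-bound work runs elsewhere.

import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total

async def main() -> None:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Both calls run in separate processes, in parallel, while the
        # event loop stays responsive.
        results = await asyncio.gather(
            loop.run_in_executor(pool, cpu_heavy, 50_000_000),
            loop.run_in_executor(pool, cpu_heavy, 50_000_000),
        )
    print(results)

if __name__ == "__main__":
    asyncio.run(main())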

multiprocessing docs also hint very clearly at the above descriptions:

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.

As well as threading docs:

threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously

Reading between the lines, this is much the same as saying that tasks bound by anything other than I/O won't achieve any noteworthy concurrency through threading.


Testing it yourself:

# main.py

from fastapi import FastAPI
import time
import os
import threading

app = FastAPI()

def bind_cpu(id: int):
    thread_id = threading.get_ident()

    print(f"{time.perf_counter():.4f}:   BIND GIL for ID: {id}, internals: PID({os.getpid()}), thread({thread_id})")

    start = time.perf_counter()
    total = 0
    for i in range(100_000_000):
        total += i

    end = time.perf_counter()
    print(f"{time.perf_counter():.4f}:   REL  GIL for ID: {id}, internals: PID({os.getpid()}), thread({thread_id}). Duration: {end-start:.4f}s")

    return total

def endpoint_handler(method: str, id: int):
    print(f"{time.perf_counter():.4f}: Worker reads {method} endpoint with ID: {id} - internals: PID({os.getpid()}), thread({threading.get_ident()})")
    result = bind_cpu(id)
    print(f"{time.perf_counter():.4f}: Worker finished ID: {id} - internals: PID({os.getpid()}), thread({threading.get_ident()})")
    return f"ID: {id}, {result}"


@app.get("/async/{id}")
async def async_endpoint_that_gets_blocked(id: int):
    return endpoint_handler("async", id)

@app.get("/sync/{id}")
def sync_endpoint_that_gets_blocked(id: int):
    return endpoint_handler("sync", id)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True, workers=1)

# test.py

import asyncio
import httpx
import time

async def send_requests():
    async with httpx.AsyncClient(timeout=httpx.Timeout(25.0)) as client:
        tasks = []
        for i in range(1, 5):
            print(f"{time.perf_counter():.4f}: Sending HTTP request for id: {i}")
            if i % 2 == 0:
                tasks.append(client.get(f"http://localhost:8000/async/{i}"))
            else:
                tasks.append(client.get(f"http://localhost:8000/sync/{i}"))
        responses = await asyncio.gather(*tasks)
        for response in responses:
            print(f"{time.perf_counter():.4f}: {response.text}")

asyncio.run(send_requests())

  1. Launch FastAPI (python main.py)
  2. Fire off some requests (python test.py)

You will get results looking something like this:

[...]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

10755.6897: Sending HTTP request for id: 1
10755.6900: Sending HTTP request for id: 2
10755.6902: Sending HTTP request for id: 3
10755.6904: Sending HTTP request for id: 4

10755.9722: Worker reads async endpoint with ID: 4 - internals: PID(24492), thread(8972)
10755.9725:   BIND GIL for ID: 4, internals: PID(24492), thread(8972)
10759.4551:   REL  GIL for ID: 4, internals: PID(24492), thread(8972). Duration: 3.4823s
10759.4554: Worker finished ID: 4 - internals: PID(24492), thread(8972)
INFO:     127.0.0.1:56883 - "GET /async/4 HTTP/1.1" 200 OK

10759.4566: Worker reads async endpoint with ID: 2 - internals: PID(24492), thread(8972)
10759.4568:   BIND GIL for ID: 2, internals: PID(24492), thread(8972)
10762.6428:   REL  GIL for ID: 2, internals: PID(24492), thread(8972). Duration: 3.1857s
10762.6431: Worker finished ID: 2 - internals: PID(24492), thread(8972)
INFO:     127.0.0.1:56884 - "GET /async/2 HTTP/1.1" 200 OK

10762.6446: Worker reads sync endpoint with ID: 3 - internals: PID(24492), thread(22648)
10762.6448:   BIND GIL for ID: 3, internals: PID(24492), thread(22648)
10762.6968: Worker reads sync endpoint with ID: 1 - internals: PID(24492), thread(9144)
10762.7127:   BIND GIL for ID: 1, internals: PID(24492), thread(9144)
10768.9234:   REL  GIL for ID: 3, internals: PID(24492), thread(22648). Duration: 6.2784s
10768.9338: Worker finished ID: 3 - internals: PID(24492), thread(22648)
INFO:     127.0.0.1:56882 - "GET /sync/3 HTTP/1.1" 200 OK
10769.2121:   REL  GIL for ID: 1, internals: PID(24492), thread(9144). Duration: 6.4835s
10769.2124: Worker finished ID: 1 - internals: PID(24492), thread(9144)
INFO:     127.0.0.1:56885 - "GET /sync/1 HTTP/1.1" 200 OK

10769.2138: "ID: 1, 4999999950000000"
10769.2141: "ID: 2, 4999999950000000"
10769.2143: "ID: 3, 4999999950000000"
10769.2145: "ID: 4, 4999999950000000"

Interpretation

Going over the timestamps and the durations, two things are immediately clear:

  1. The async endpoints are executing de-facto synchronously
  2. The sync endpoints are executing concurrently and finish nearly at the same time BUT each request takes twice as long to complete compared to the async ones

Both of these results are expected, re: the explanations earlier.

The async endpoints become de-facto synchronous because the function we built hoards the GIL, and so the event loop gets no execution time until the coroutine returns.

The sync endpoints become faux-asynchronous because CPython periodically forces the running thread to release the GIL (the switch interval is roughly 5 ms by default), which means that the first request progresses by x%, then the second request progresses by x% - repeat until both finish at roughly the same time.
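
If you want to check that interval yourself (a small aside, not part of the test above), CPython exposes it through the sys module:

import sys

print(sys.getswitchinterval())   # 0.005 seconds by default
sys.setswitchinterval(0.01)      # e.g. make thread switching less frequent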



2

time.sleep() blocks the current thread, but it doesn't completely render the interpreter useless, since it still needs to keep track of the time. So it keeps working.

Think of it like a person looking at their clock and waiting. The person is capable of doing other things, and keeps breathing for example, but their main focus is to wait for some time. Maybe they are waiting for their meal to cook.

In your scenario where you use asynchronous code, the Python interpreter just pauses one task and looks at another. So it is not completely useless. Think of it like round-robin scheduling: it works on one task for a limited slice of time (waiting for the time.sleep in this example), then pauses it and looks at another task. "The function is truly blocking" doesn't mean it renders the interpreter unable to do anything else; it just tells it to wait for something.

So the person in our example does some other task, like loading the dishes into the dishwasher, and after every few dishes placed they check their clock to see if their meal is ready. Cooking the meal is a blocking step in preparing dinner, since you need to wait for it to be cooked, but you can asynchronously load the dishes and check the time to see if the meal is ready.
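
Here is a minimal sketch of that analogy in code (illustrative names, not from the original question): every await asyncio.sleep() is the moment the person checks the clock, which is when the interpreter pauses one task and looks at the other.

import asyncio

async def cook_meal() -> None:
    print("meal: in the oven")
    await asyncio.sleep(2)        # waiting, so the other task can run
    print("meal: ready")

async def load_dishes() -> None:
    for dish in range(4):
        print(f"dishes: loaded dish {dish + 1}")
        await asyncio.sleep(0.4)  # yield back to the loop between dishes

async def main() -> None:
    await asyncio.gather(cook_meal(), load_dishes())

asyncio.run(main())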

4 Comments

Thank you, I understand what you're saying. However, in this case, if my synchronous code is performing heavy computations instead of using time.sleep(), the interpreter should be locked and unable to execute anything else until the code is completed, right? For async code, I can use await to instruct the interpreter when to switch to another task, but for sync code, I still don't fully understand how it works.
This would be accurate for multiprocess concurrency, but it's less accurate for threading in a GIL context. Threads that execute code will block the event loop, but threads that wait for some kind of I/O will not. So it is actually the case that if a function is truly blocking, e.g. executes continually on the CPU, the GIL will be locked to that thread until such a time that the function either finishes or waits for an I/O operation.
@Vegard, yes, that's what I thought as well. However, I would appreciate it if you could take a look at the example in my question, as it’s what’s confusing me. :)
@JoãoPedroZimmermann No matter what - whether you do a hard calculation, wait for input, or wait for time to pass - your PC does not just focus on that and mindlessly do only that job. If some process overwhelms your system you see the "program not responding" error, so if your program/code executes without freezing, it means it is still responsive even while performing a hard calculation. There are approaches like round-robin and priority-based scheduling: your OS divides CPU resources among processes, and the interpreter's event loop divides its focus among async tasks.
2

If you go to the implementation of time.sleep in the C code, you will see that a portion of the code is wrapped in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS.

Normally, only one thread can run at once. But these two macros specially set up a section of code in which other threads are allowed to grab the Global Interpreter Lock.

