I was having some issues with my Flask app: some requests ended up using too much memory when sending a response with all the data in one go. Reading the Flask docs, I saw that I can stream the response, so I put together the following exercise to compare the memory usage and timing between the way I usually handle the request/response and the streamed way.
The thing is, the non-streamed version takes less than 1 second while the streamed version takes around 19 seconds. I was able to find some information on other use cases, but nothing explaining why this happens. I think I'm misunderstanding something, because such a big time difference between the two methods seems off.
Thanks!
This is the test code:
from flask import Flask, Response, jsonify, stream_with_context
import time
import json
from memory_profiler import memory_usage

app = Flask(__name__)

BIG_SIZE = 400_000


# --------- NON-STREAMED VERSION ----------
@app.route("/normal")
def normal_response():
    start_time = time.time()
    mem_before = memory_usage()[0]

    # Build everything in memory first
    data = [{"id": i, "value": f"Item-{i}"} for i in range(BIG_SIZE)]

    mem_after = memory_usage()[0]
    elapsed = time.time() - start_time
    print(f"[NORMAL] Memory Before: {mem_before:.2f} MB, After: {mem_after:.2f} MB, Elapsed: {elapsed:.2f}s")

    return jsonify(data)


# --------- STREAMED VERSION ----------
@app.route("/streamed")
def streamed_response():
    start_time = time.time()
    mem_before = memory_usage()[0]

    def generate():
        yield "["
        first = True
        for i in range(BIG_SIZE):
            record = {"id": i, "value": f"Item-{i}"}
            if not first:
                yield ","
            yield json.dumps(record)
            first = False
        yield "]"

    mem_after = memory_usage()[0]
    elapsed = time.time() - start_time
    print(f"[STREAMED] Memory Before: {mem_before:.2f} MB, After: {mem_after:.2f} MB, Elapsed: {elapsed:.2f}s")

    return Response(stream_with_context(generate()), mimetype="application/json")


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=8080)
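
In case it helps to reproduce the numbers, this is roughly how the two endpoints can be timed from the client side. It's a minimal sketch using the requests library (my own measuring script, not part of the app); the host and port just match the app.run call above, and stream=True makes sure the streamed response is actually consumed chunk by chunk rather than buffered.

import time
import requests

BASE = "http://localhost:8080"

for endpoint in ("/normal", "/streamed"):
    start = time.time()
    # stream=True so the body is read incrementally as it arrives
    with requests.get(BASE + endpoint, stream=True) as resp:
        total = sum(len(chunk) for chunk in resp.iter_content(chunk_size=8192))
    print(f"{endpoint}: {total} bytes in {time.time() - start:.2f}s")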