I was having some issues with my Flask app: some requests ended up using too much memory because the response was built with all the data in one go. Reading the Flask docs, I saw that I can stream the response instead, so I put together the following exercise to compare memory usage and timing between the way I usually handle the request/response and the streamed way.

The thing is, the non-streamed version takes less than 1 second while the streamed version takes around 19 seconds. I was able to find some information on other use cases, but nothing explaining the workings of this. I think I'm missing something, given such a big time difference between the two methods.

Thanks!

This is the test code:

from flask import Flask, Response, jsonify, stream_with_context
import time
import json
from memory_profiler import memory_usage


app = Flask(__name__)

BIG_SIZE = 400_000  # number of records in the response

# --------- NON-STREAMED VERSION ----------
@app.route("/normal")
def normal_response():
    start_time = time.time()
    mem_before = memory_usage()[0]

    # Build everything in memory first
    data = [{"id": i, "value": f"Item-{i}"} for i in range(BIG_SIZE)]

    mem_after = memory_usage()[0]
    elapsed = time.time() - start_time  # measured before jsonify(), so serialization isn't included

    print(f"[NORMAL] Memory Before: {mem_before:.2f} MB, After: {mem_after:.2f} MB, Elapsed: {elapsed:.2f}s")

    return jsonify(data)


# --------- STREAMED VERSION ----------
@app.route("/streamed")
def streamed_response():
    start_time = time.time()
    mem_before = memory_usage()[0]

    def generate():
        yield "["
        first = True
        for i in range(BIG_SIZE):
            record = {"id": i, "value": f"Item-{i}"}
            if not first:
                yield ","
            yield json.dumps(record)  # one tiny chunk per record (400_000 separate yields)
            first = False
        yield "]"

        mem_after = memory_usage()[0]
        elapsed = time.time() - start_time
        print(f"[STREAMED] Memory Before: {mem_before:.2f} MB, After: {mem_after:.2f} MB, Elapsed: {elapsed:.2f}s")

    return Response(stream_with_context(generate()), mimetype="application/json")


if __name__ == "__main__":
    app.run(debug=True,
            host='0.0.0.0', 
            port=8080)
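
For reference, the end-to-end times can be compared from the client side with a small script like this (a minimal sketch using the requests library, which is an extra dependency, assuming the app is running on localhost:8080 as configured above):

import time
import requests  # extra dependency, not used by the app itself

for endpoint in ("normal", "streamed"):
    start = time.time()
    resp = requests.get(f"http://localhost:8080/{endpoint}", stream=True)
    # drain the response so the streamed endpoint is fully consumed
    nbytes = sum(len(chunk) for chunk in resp.iter_content(chunk_size=64 * 1024))
    print(f"/{endpoint}: {nbytes} bytes in {time.time() - start:.2f}s")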
  • furas: The generator has to execute the function (and yield) 400_000 times; the normal version doesn't have to do that. Did you try to yield 2 or more records at once? (See the sketch after these comments.)
  • You sir, are absolutely right. I did the yield in batches of records and it was way faster.
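
To see that per-yield cost outside Flask, here is a minimal standalone sketch (not from the thread; absolute numbers depend on the machine, and this only captures the Python-side work, since in the app each yielded chunk also makes a separate trip through the WSGI layer):

import json
import time

N = 400_000

def one_by_one():
    # one json.dumps call and one yield per record
    for i in range(N):
        yield json.dumps({"id": i, "value": f"Item-{i}"})

def in_batches(size=1000):
    # one json.dumps call and one yield per batch of records
    batch = []
    for i in range(N):
        batch.append({"id": i, "value": f"Item-{i}"})
        if len(batch) >= size:
            yield json.dumps(batch)
            batch = []
    if batch:
        yield json.dumps(batch)

for name, gen in (("one_by_one", one_by_one()), ("in_batches", in_batches())):
    start = time.time()
    for _ in gen:  # drain the generator, as the WSGI server would
        pass
    print(f"{name}: {time.time() - start:.2f}s")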

1 Answer


Thanks furas for the explanation, it helped me realize the problem. I'll leave a version here that yields in batches; by adjusting the batch size you can play with the trade-off between memory usage and response time.

@app.route("/streamed_batches")
def streamed_response_batches():
    start_time = time.time()
    mem_before = memory_usage()[0]

    BATCH_SIZE = 20

    def generate():
        yield "["
        first = True
        batch = []

        for i in range(BIG_SIZE):
            batch.append({"id": i, "value": f"Item-{i}"})

            if len(batch) >= BATCH_SIZE or i == BIG_SIZE - 1:
                # Flush this batch
                chunk = json.dumps(batch)
                if not first:
                    yield ","
                yield chunk[1:-1]  # strip the batch's own "[" and "]" so the pieces join into one array
                batch = []
                first = False

        yield "]"

        mem_after = memory_usage()[0]
        elapsed = time.time() - start_time
        print(f"[STREAMED_BATCHES] Memory Before: {mem_before:.2f} MB, "
              f"After: {mem_after:.2f} MB, Elapsed: {elapsed:.2f}s")

    return Response(stream_with_context(generate()), mimetype="application/json")
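
A possible refinement, not part of the answer above: take the batch size from the query string so different values can be compared without editing the code. The route name and the batch query parameter are hypothetical.

from flask import request

@app.route("/streamed_batches_param")
def streamed_batches_param():
    # Hypothetical variant of the route above, e.g. /streamed_batches_param?batch=500
    batch_size = request.args.get("batch", default=20, type=int)

    def generate():
        yield "["
        first = True
        batch = []
        for i in range(BIG_SIZE):
            batch.append({"id": i, "value": f"Item-{i}"})
            if len(batch) >= batch_size or i == BIG_SIZE - 1:
                if not first:
                    yield ","
                yield json.dumps(batch)[1:-1]  # strip the batch's own brackets
                batch = []
                first = False
        yield "]"

    return Response(stream_with_context(generate()), mimetype="application/json")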