Environment

  • Flask: 2.0.3
  • Flask-SQLAlchemy: 2.5.1
  • SQLAlchemy: 1.4.41
  • Deployment: Gunicorn with thread workers
  • Traffic: ~4 RPS in production
  • Observed Issue: Memory grows from 35-40% to 90% over 30 hours, forcing pod restarts

Problem Description

I'm experiencing a progressive memory leak in my Flask application. Memory usage grows continuously from ~1.5 GB to ~4 GB over 30 hours of operation, eventually causing the pod to run out of memory.

After profiling, I suspect two contributors:

  • SQLALCHEMY_RECORD_QUERIES and the sqlalchemy_queries list not being cleared properly
  • glogger thread-local logger objects never being cleared

Current Implementation

Configuration

# config.py
class ProductionConfig(BaseConfig):
    SQLALCHEMY_RECORD_QUERIES = True  # Enabled for slow query logging
    DATABASE_QUERY_TIMEOUT = 0.05     # 50ms threshold

Slow Query Logging (Called on EVERY Request)

# utils/logging.py
import logging

from flask import request, current_app
from flask_sqlalchemy import get_debug_queries

logger = logging.getLogger(__name__)  # in our code this is a glogger struct logger

def log_slow_db_query(min_time=0):
    """
    Logs slow database queries.
    This is called in an after_request hook for EVERY request.
    """
    # "query_logged" is never set in this function; it is assumed
    # to be set elsewhere once a request has been handled.
    if not request.environ.get("query_logged"):
        for query in get_debug_queries():
            if query.duration >= min_time:
                logger.warning(
                    "SLOW QUERY: {statement} | Duration: {duration}s | "
                    "Params: {params} | Context: {context}".format(
                        statement=query.statement,
                        duration=query.duration,
                        params=query.parameters,
                        context=query.context,
                    )
                )
        # ❌ NOTE: We never clear the queries list here!

After Request Hook

# api.py
@app.after_request
def after_request(response):
    log_slow_db_query(min_time=current_app.config["DATABASE_QUERY_TIMEOUT"])
    return response
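
As a sanity check on hook ordering, this standalone toy app (not our service; the names are throwaway) confirms that after_request runs while the app context is still alive, which is why get_debug_queries() can still see the per-context list there:

from flask import Flask

toy = Flask(__name__)

@toy.route("/")
def index():
    return "ok"

@toy.after_request
def mark_after(response):
    print("after_request: app context still active")
    return response

@toy.teardown_appcontext
def mark_teardown(exc):
    print("teardown_appcontext: context is about to be popped")

client = toy.test_client()
client.get("/")  # prints the after_request line first, then the teardown line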

With Gunicorn's thread pool (see the sketch after this list):

  1. Threads are reused across requests
  2. glogger's _local (a threading.local()) persists within each thread
  3. Each new_logger() call creates:
    • A new structlog logger instance
    • A new GrofersRequestLogSchemaAdapter containing:
      • The full Flask request object
      • Request body, headers, query params
      • Trace IDs, user context
  4. Old loggers are never deleted from _local.logger
  5. Memory accumulates linearly with request count per thread
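
Here is a minimal, self-contained sketch of that reuse effect (toy code, not our app; it assumes per-request state is appended rather than overwritten, which is my reading of new_logger):

import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def handle_request(n):
    # Stand-in for glogger.new_logger(): stash per-request state in the
    # thread-local. With a single pool thread, the list survives between tasks.
    if not hasattr(_local, "loggers"):
        _local.loggers = []
    _local.loggers.append(f"logger-for-request-{n}")
    return len(_local.loggers)

with ThreadPoolExecutor(max_workers=1) as pool:
    print(list(pool.map(handle_request, range(3))))  # [1, 2, 3]: state accumulates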

The Suspected Issue

Looking at Flask-SQLAlchemy's source code:

def get_debug_queries():
    """Returns list of queries recorded for this request."""
    return getattr(_app_ctx_stack.top, 'sqlalchemy_queries', [])

The queries are stored in _app_ctx_stack.top.sqlalchemy_queries - a list attached to the Flask application context.
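
To test whether that list outlives a single request, I'm thinking of a diagnostic along these lines (a sketch; app is our Flask instance). Flask runs teardown functions before popping the context, so the attribute is still reachable here:

from flask import _app_ctx_stack

@app.teardown_appcontext
def report_recorded_queries(exc):
    # If this count ever exceeds the queries issued by one request,
    # the context (and its list) is being reused across requests.
    queries = getattr(_app_ctx_stack.top, "sqlalchemy_queries", None)
    if queries is not None:
        app.logger.info("recorded queries in this context: %d", len(queries))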

The second suspect is our glogger setup:

def attach_logger(func):
    """Attach glogger to the request instance."""

    @wraps(func)
    def wrapped(*args, **kwargs):
        if not hasattr(request, "logger"):
            request.logger = get_struct_logger()
        return func(*args, **kwargs)

    return wrapped

def initialize_logger(**kwargs):
    glogger.new_logger(**kwargs)

With Gunicorn thread workers, threading.local() storage persists across requests, so whatever new_logger() stores in _local stays alive for the lifetime of the thread.

My Understanding (Please Confirm):

  1. Per Flask docs: Application context is pushed when a request arrives and popped when the request ends
  2. Per Flask-SQLAlchemy docs: Recorded queries should be cleared when app context is torn down
  3. However: In Flask-SQLAlchemy 2.5.1, I don't see explicit cleanup of this list
  4. With Gunicorn thread workers: Threads are reused across requests, and if the app context is reused or not properly torn down, the sqlalchemy_queries list could persist and grow
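
To check point 4 empirically, I'm considering logging which context object each request sees (a rough check; id() values can be recycled after garbage collection, so repeated ids are a hint, not proof):

import threading

from flask import _app_ctx_stack

@app.before_request
def trace_app_context():
    # The same live context object showing up repeatedly for one thread
    # would suggest contexts are being reused rather than torn down.
    app.logger.info(
        "thread=%s app_ctx_id=%s",
        threading.current_thread().name,
        id(_app_ctx_stack.top),
    )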

Questions

  1. Does Flask-SQLAlchemy 2.5.1 automatically clear the sqlalchemy_queries list at the end of each request?

    • If yes, under what conditions might this fail?
    • If no, should I manually clear it?
  2. Is the app context truly torn down after each request with Gunicorn thread workers?

    • Or can _app_ctx_stack.top point to the same object across multiple requests in the same thread?
  3. Should I be explicitly clearing the list after reading it?

    queries = get_debug_queries()
    # ... process queries ...
    
    # Clear the list to prevent leak?
    from flask import _app_ctx_stack
    if hasattr(_app_ctx_stack.top, 'sqlalchemy_queries'):
        _app_ctx_stack.top.sqlalchemy_queries.clear()
    
  4. Is SQLALCHEMY_RECORD_QUERIES safe for production use?

    • The docs say it's for debugging, but many use it for slow query logging
    • Are there better alternatives for production slow query logging? (An event-based idea is sketched after this list.)
  5. With Gunicorn thread workers, does threading.local() storage persist across requests?
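
For question 4, the alternative I have in mind is SQLAlchemy's documented cursor-execute events, roughly like this (a sketch, not tested in our app; db stands for the Flask-SQLAlchemy instance, db.engine requires an app context, and with our multiple binds each engine would need its own listeners):

import logging
import time

from sqlalchemy import event

slow_logger = logging.getLogger("slow_queries")
SLOW_QUERY_THRESHOLD = 0.05  # seconds, matching DATABASE_QUERY_TIMEOUT

@event.listens_for(db.engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault("query_start_time", []).append(time.monotonic())

@event.listens_for(db.engine, "after_cursor_execute")
def _log_if_slow(conn, cursor, statement, parameters, context, executemany):
    total = time.monotonic() - conn.info["query_start_time"].pop()
    if total >= SLOW_QUERY_THRESHOLD:
        # Nothing accumulates per request here: no sqlalchemy_queries list.
        slow_logger.warning("SLOW QUERY (%.3fs): %s", total, statement)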

Should I try this?

  1. Clear the logger context from thread-local storage
      @app.teardown_request
      def cleanup_thread_locals(exc):
          # Clear request.logger reference
          if hasattr(request, 'logger'):
              delattr(request, 'logger')
    
          # Clear glogger thread-local storage
          if hasattr(glogger, '_local'):
              if hasattr(glogger._local, 'logger'):
                  del glogger._local.logger
              if hasattr(glogger._local, 'trace_id'):
                  del glogger._local.trace_id
  2. Clear the recorded queries list
      if hasattr(_app_ctx_stack.top, 'sqlalchemy_queries'):
          _app_ctx_stack.top.sqlalchemy_queries.clear()
  3. Remove the scoped sessions (session.remove()) during app-context teardown after each API request
      @app.teardown_appcontext
      def shutdown_session(exception=None):
          """Remove scoped sessions after each request."""
          app.db.session.remove()       # Clears the write session
          app.db_read.session.remove()  # Clears the read session
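
As a stopgap independent of the root cause, I'm also wondering whether recycling workers would at least cap the growth while I debug. Gunicorn's max_requests and max_requests_jitter are real settings; the values below are placeholders:

# gunicorn.conf.py
max_requests = 10000        # restart a worker after it has served this many requests
max_requests_jitter = 1000  # randomize the restart point so workers don't recycle together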

Related Information

  • log_slow_db_query is called on every single request (~4 RPS ≈ 345,600 calls/day)
  • Each request typically executes 5-15 database queries
  • We have multiple database binds (read/write separation)
  • Running in Kubernetes with resource limits

Seeking Advice

Is my understanding correct that the sqlalchemy_queries list can grow indefinitely in Flask-SQLAlchemy 2.5.1? If so:

  • What's the proper way to prevent this leak?
  • glogger creates a new logger in threading.local() on every request; should we clear it after the request, since the same thread is reused?
  • Should I switch to SQLAlchemy event-based query logging instead?
  • Has anyone else encountered this issue with similar setup?

Any insights would be greatly appreciated!


Update: I've also identified that db.session.remove() is missing from our teardown, which could contribute to identity_map growth. But the sqlalchemy_queries list seems to be a separate issue.
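
For context on that update: with a scoped session, the same Session object is reused by a thread until remove() is called, so anything the session strongly references (pending changes, the transaction, its connection) lives on between requests in that thread. A minimal sketch of the reuse (assumes db is the Flask-SQLAlchemy instance, run inside an app context):

s1 = db.session()            # scoped_session returns the thread-local Session
s2 = db.session()
assert s1 is s2              # the same Session is reused until remove()
db.session.remove()
assert db.session() is not s1  # remove() discards it; a fresh Session is created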

  • I set SQLALCHEMY_RECORD_QUERIES to False, and the hunch that sqlalchemy_queries is not getting cleared after the API context turned out to be false; we can look into other potential contributors to the leak. Commented Oct 27 at 14:37
  • Could this be due to workers not releasing memory? We haven't enabled Gunicorn's max_requests = n option, which restarts a worker after n requests. Commented Oct 28 at 18:25
  • Or could it be due to the ddtrace library's traces leaking memory? Commented Oct 28 at 18:52
