Environment
- Flask: 2.0.3
- Flask-SQLAlchemy: 2.5.1
- SQLAlchemy: 1.4.41
- Deployment: Gunicorn with thread workers
- Traffic: ~4 RPS in production
- Observed Issue: Memory grows from 35-40% to 90% over 30 hours, forcing pod restarts
Problem Description
I'm experiencing a progressive memory leak in my Flask application. Memory usage grows continuously from ~1.5 GB to ~4 GB over 30 hours of operation, eventually causing the pod to run out of memory.
After profiling, I suspect two contributing issues:
- SQLALCHEMY_RECORD_QUERIES and the sqlalchemy_queries list not being cleared properly
- glogger thread-local logger objects never being cleared
Current Implementation
Configuration
# config.py
class ProductionConfig(BaseConfig):
    SQLALCHEMY_RECORD_QUERIES = True  # Enabled for slow query logging
    DATABASE_QUERY_TIMEOUT = 0.05     # 50 ms threshold
Slow Query Logging (Called on EVERY Request)
# utils/logging.py
from flask import request, current_app
from flask_sqlalchemy import get_debug_queries

# `logger` here is our module-level glogger/structlog instance (defined elsewhere)

def log_slow_db_query(min_time=0):
    """
    Log slow database queries.

    Called from the after_request hook on EVERY request.
    """
    if not request.environ.get("query_logged"):
        for query in get_debug_queries():
            if query.duration >= min_time:
                logger.warning(
                    "SLOW QUERY: {statement} | Duration: {duration}s | "
                    "Params: {params} | Context: {context}".format(
                        statement=query.statement,
                        duration=query.duration,
                        params=query.parameters,
                        context=query.context,
                    )
                )
        # ❌ NOTE: We never clear the queries list here!
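One detail I noticed while writing this up: nothing ever sets the query_logged flag that the guard checks. If the flag is meant to prevent double-processing, I assume the intent was something like the condensed variant below (my assumption, not what we currently run; imports and logger as in the block above):

def log_slow_db_query(min_time=0):
    if not request.environ.get("query_logged"):
        for query in get_debug_queries():
            if query.duration >= min_time:
                logger.warning("SLOW QUERY: %s | Duration: %ss", query.statement, query.duration)
        # Mark this request as processed so a second hook invocation within the
        # same request is a no-op (the environ dict is per-request, per WSGI)
        request.environ["query_logged"] = True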
After Request Hook
# api.py
@app.after_request
def after_request(response):
    log_slow_db_query(min_time=current_app.config["DATABASE_QUERY_TIMEOUT"])
    return response
With Gunicorn's thread pool:
- Threads are reused across requests
- glogger's threading.local() storage (_local) therefore persists within each thread
- Each new_logger() call creates:
  - a new structlog logger instance
  - a new GrofersRequestLogSchemaAdapter containing:
    - the full Flask request object
    - the request body, headers, and query params
    - trace IDs and user context
- Old loggers are never deleted from _local.logger
- Memory accumulates linearly with the request count per thread (see the standalone sketch below)
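To convince myself that reused worker threads really keep their thread-local state, I wrote this standalone sketch. It has nothing to do with Flask or glogger (the accumulation is modelled as a list of dummy objects), it only shows that whatever a "request" stashes in threading.local() survives for the lifetime of the thread:

import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def handle_request(_):
    # Mimics the suspected glogger behaviour: create a per-request object
    # and keep a reference to it in thread-local storage
    if not hasattr(_local, "loggers"):
        _local.loggers = []
    _local.loggers.append(object())  # stand-in for a logger/adapter instance
    return threading.current_thread().name, len(_local.loggers)

with ThreadPoolExecutor(max_workers=2) as pool:  # like Gunicorn's thread workers
    results = list(pool.map(handle_request, range(10)))

for thread_name, count in results:
    print(thread_name, count)  # the count climbs per thread: nothing is ever released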
The Suspected Issue
Looking at Flask-SQLAlchemy's source code:
def get_debug_queries():
    """Returns list of queries recorded for this request."""
    return getattr(_app_ctx_stack.top, 'sqlalchemy_queries', [])
The queries are stored in _app_ctx_stack.top.sqlalchemy_queries - a list attached to the Flask application context.
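So if the application context that _app_ctx_stack.top points at outlives a single request, the list just keeps appending. A sketch of what I mean, assuming a plain Flask-SQLAlchemy 2.5.1 setup where app is the Flask app and db is the Flask-SQLAlchemy instance with SQLALCHEMY_RECORD_QUERIES enabled:

from sqlalchemy import text
from flask_sqlalchemy import get_debug_queries

# Hypothetical illustration: one long-lived app context accumulating recorded queries
with app.app_context():                      # context is never popped between "requests"
    for _ in range(3):
        db.session.execute(text("SELECT 1")) # each statement appends to sqlalchemy_queries
    print(len(get_debug_queries()))          # 3 -- entries survive until the context is popped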
The glogger side of the setup attaches a structured logger to each request:

# logger attachment (glogger)
from functools import wraps

def attach_logger(func):
    """Attach glogger to the request instance."""
    @wraps(func)
    def wrapped(*args, **kwargs):
        if not hasattr(request, "logger"):
            request.logger = get_struct_logger()
        return func(*args, **kwargs)
    return wrapped

def initialize_logger(**kwargs):
    glogger.new_logger(**kwargs)
With Gunicorn thread workers, threading.local() storage persists across requests, because the same worker threads handle request after request.
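One alternative I'm considering is keeping the per-request logger on flask.g instead of in threading.local(), since g lives on the application context and is discarded when the context is popped. This is only a sketch, reusing the get_struct_logger() factory from above, not a drop-in replacement for glogger's internals:

from flask import g

def get_request_logger():
    # g is tied to the app context, so this reference disappears when the
    # context is popped at the end of the request -- no thread-local residue
    if "logger" not in g:
        g.logger = get_struct_logger()
    return g.logger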
My Understanding (Please Confirm):
- Per Flask docs: Application context is pushed when a request arrives and popped when the request ends
- Per Flask-SQLAlchemy docs: Recorded queries should be cleared when app context is torn down
- However: In Flask-SQLAlchemy 2.5.1, I don't see explicit cleanup of this list
- With Gunicorn thread workers: Threads are reused across requests, and if the app context is reused or not properly torn down, the sqlalchemy_queries list could persist and grow
Questions
Does Flask-SQLAlchemy 2.5.1 automatically clear the sqlalchemy_queries list at the end of each request?
- If yes, under what conditions might this fail?
- If no, should I manually clear it?

Is the app context truly torn down after each request with Gunicorn thread workers?
- Or can _app_ctx_stack.top point to the same object across multiple requests in the same thread?

Should I be explicitly clearing the list after reading it?

queries = get_debug_queries()
# ... process queries ...

# Clear the list to prevent the leak?
from flask import _app_ctx_stack
if hasattr(_app_ctx_stack.top, 'sqlalchemy_queries'):
    _app_ctx_stack.top.sqlalchemy_queries.clear()

Is SQLALCHEMY_RECORD_QUERIES safe for production use?
- The docs say it's for debugging, but many use it for slow query logging
- Are there better alternatives for production slow query logging? (See the event-listener sketch after this list.)
With Gunicorn thread workers, does threading.local() storage persist across requests?
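For reference, this is the kind of event-based alternative I have in mind, roughly following SQLAlchemy's documented before_cursor_execute / after_cursor_execute recipe. The threshold and logger are placeholders; listening on the Engine class should cover both of our binds:

import logging
import time

from sqlalchemy import event
from sqlalchemy.engine import Engine

logger = logging.getLogger("slow_query")  # placeholder; glogger would slot in here
SLOW_QUERY_THRESHOLD = 0.05               # seconds, same 50 ms budget as DATABASE_QUERY_TIMEOUT

@event.listens_for(Engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    # Stash the start time on the connection; nothing is kept on the app context
    conn.info.setdefault("query_start_time", []).append(time.monotonic())

@event.listens_for(Engine, "after_cursor_execute")
def _log_if_slow(conn, cursor, statement, parameters, context, executemany):
    elapsed = time.monotonic() - conn.info["query_start_time"].pop()
    if elapsed >= SLOW_QUERY_THRESHOLD:
        logger.warning("SLOW QUERY: %s | Duration: %.3fs", statement, elapsed)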
Should I try the following cleanup?

- Remove the logger context from thread locals:

@app.teardown_request
def cleanup_thread_locals(exc):
    # Clear the request.logger reference
    if hasattr(request, 'logger'):
        delattr(request, 'logger')
    # Clear glogger's thread-local storage
    if hasattr(glogger, '_local'):
        if hasattr(glogger._local, 'logger'):
            del glogger._local.logger
        if hasattr(glogger._local, 'trace_id'):
            del glogger._local.trace_id

- Clear the recorded queries list after reading it:

if hasattr(_app_ctx_stack.top, 'sqlalchemy_queries'):
    _app_ctx_stack.top.sqlalchemy_queries.clear()

- Remove the scoped sessions (session.remove()) during app-context teardown after each API request:

@app.teardown_appcontext
def shutdown_session(exception=None):
    """Remove scoped sessions after each request."""
    app.db.session.remove()       # clears the write session
    app.db_read.session.remove()  # clears the read session
Related Information
- This function is called on every single request (~4 RPS = ~345,600 calls/day)
- Each request typically executes 5-15 database queries
- We have multiple database binds (read/write separation)
- Running in Kubernetes with resource limits
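Given those constraints, a minimal way I'm thinking of confirming exactly which call sites grow between restarts is a standard-library tracemalloc snapshot behind an internal-only endpoint. The /_debug/memory route name is hypothetical and just for illustration:

import tracemalloc
from flask import jsonify

tracemalloc.start(10)  # keep 10 frames per allocation for more useful tracebacks

@app.route("/_debug/memory")  # hypothetical, internal-only endpoint
def memory_snapshot():
    snapshot = tracemalloc.take_snapshot()
    top = snapshot.statistics("lineno")[:10]
    # Report the ten call sites holding the most memory right now;
    # comparing two calls a few hours apart shows what is actually growing
    return jsonify([str(stat) for stat in top])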
Seeking Advice
Is my understanding correct that the sqlalchemy_queries list can grow indefinitely in Flask-SQLAlchemy 2.5.1? If so:
- What's the proper way to prevent this leak?
- glogger creates a new logger in threading.local() on every request; should we clear it after the request, given that the same thread is reused?
- Should I switch to SQLAlchemy event-based query logging instead?
- Has anyone else encountered this issue with similar setup?
Any insights would be greatly appreciated!
Update: I've also identified that db.session.remove() is missing from our teardown, which could contribute to identity_map growth. But the sqlalchemy_queries list seems to be a separate issue.