Why Python does not release memory (under mod_wsgi + Django)

Question

I have an Apache + mod_wsgi + Django app. mod_wsgi runs in daemon mode.

I have one view that fetches a large queryset from the DB, builds an array from the results of that queryset, and returns the array. I'm not using thread-local storage, global variables, or anything of that kind.

The problem is that my app's memory usage grows in proportion to the number of threads I configure for mod_wsgi.

I ran a small experiment: I set various numbers of threads in mod_wsgi, hit my view with curl, and checked how high the memory of the wsgi process could climb.

It goes like this:

1 thread  - 256 MB
2 threads - 400 MB
3 threads - 535 MB
4 threads - 650 MB

So each additional thread adds roughly 120-140 MB to the peak memory usage.
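As a quick check of the per-thread figure, the increments can be computed directly from the measurements listed above. This is only arithmetic on the numbers reported in the question (taken as megabytes); the variable names are illustrative.

```python
# Peak memory of the mod_wsgi daemon process for each thread count,
# copied from the measurements in the question (megabytes).
peak_mb = {1: 256, 2: 400, 3: 535, 4: 650}

# Extra memory attributed to each additional thread.
threads = sorted(peak_mb)
increments = [peak_mb[b] - peak_mb[a] for a, b in zip(threads, threads[1:])]

print(increments)                         # per-thread increments
print(sum(increments) / len(increments))  # average increment
```

The individual increments vary somewhat from step to step, so the quoted range is best read as an approximation.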

It seems that the memory allocated for the first request is never freed. In the single-threaded scenario, it is reused when a second request (to the same view) arrives. With that I can live.

But with multiple threads, when a request is processed by a thread that has never handled this request before, that thread "saves" another 140 MB somewhere locally.

How can I fix this?
Probably Django saves some data in TLS (thread-local storage). If that is the case, how can I disable it?
Alternatively, as a workaround, is it possible to bind request execution to a certain thread in mod_wsgi?

Thanks.

PS. DEBUG is set to False in settings.py

Do you a) only create the array when a dataset arrives and b) delete it when you have finished with it so the garbage collector can get to it?
– Steve Barnes, Commented Oct 22, 2013 at 12:22
a) - yes; b) - I don't del it explicitly. I convert it to JSON and return the JSON string.
– Zaar Hai, Commented Oct 22, 2013 at 12:33
That isn't true. So long as things aren't cached explicitly in some way, they should be able to be cleaned up straight away when the last scope using them is exited. The only time this may not be the case is if the data creates a reference cycle between objects, in which case one needs to wait for the garbage collector to kick in and break the cycle if it can.
– Graham Dumpleton, Commented Oct 22, 2013 at 13:58
What the OP hasn't made clear is whether, in the multithreaded case, the requests were serialised or run concurrently. If serialised, mod_wsgi daemon mode should have activated at most two threads, not four, because of the way it manages threads. In any case, it is likely due to some sort of caching at the application level, as mod_wsgi itself doesn't do anything to retain data.
– Graham Dumpleton, Commented Oct 22, 2013 at 14:00
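The distinction drawn in the comments above, between objects freed immediately by reference counting and objects held in a reference cycle until the collector runs, can be sketched as follows. This is a minimal illustration that assumes CPython; the `Node` class and the helper names are hypothetical and have nothing to do with Django's internals.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a large per-request object."""

    def __init__(self):
        self.other = None


def is_alive(ref):
    """Return True if the weakly referenced object still exists."""
    return ref() is not None


# Disable automatic collection so the demonstration is deterministic.
# This is for illustration only, not something to do in production.
gc.disable()

# Case 1: no cycle. The object is freed as soon as its last reference goes away.
plain = Node()
plain_ref = weakref.ref(plain)
del plain
freed_without_gc = not is_alive(plain_ref)

# Case 2: a reference cycle. The reference counts never drop to zero by themselves.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_ref = weakref.ref(a)
del a, b
alive_before_collect = is_alive(cycle_ref)

gc.collect()  # the cycle collector finds and frees the unreachable pair
alive_after_collect = is_alive(cycle_ref)

gc.enable()

print(freed_without_gc, alive_before_collect, alive_after_collect)
```

Whether freed objects also shrink the process size as seen by the operating system is a separate matter that this sketch does not measure.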

Graham Dumpleton · Accepted Answer · 2013-10-22 14:08:11Z

11

In this sort of situation, what you should do is vertically partition your web application so that it runs across multiple mod_wsgi daemon process groups. That way you can tailor the configuration of the mod_wsgi daemon processes to the requirements of the subset of URLs that you delegate to each. As the admin interface URLs of a Django application often have high transient memory requirements, yet aren't used very often, a common recommendation is:

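# Route all requests to the Django WSGI entry point, in the main interpreter.
WSGIScriptAlias / /my/path/site/wsgi.py
WSGIApplicationGroup %{GLOBAL}

# Default daemon process group: persistent processes for regular URLs.
WSGIDaemonProcess main processes=3 threads=5
WSGIProcessGroup main

# Separate group for /admin: a single process, restarted after 60 seconds idle.
WSGIDaemonProcess admin threads=2 inactivity-timeout=60
<Location /admin>
    WSGIProcessGroup admin
</Location>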

This creates two daemon process groups. By default, URLs are handled in the main daemon process group, where the processes are persistent.

URLs for the admin interface, however, are directed to the admin daemon process group, which can be set up with a single process and a reduced number of threads, plus an inactivity timeout so that the process is restarted automatically if the admin interface hasn't been used for 60 seconds, thereby reclaiming any excessive transient memory usage.

This means that a request to the admin interface can be slightly slower if the process has been recycled since the last request, as everything has to be loaded again, but since it is the admin interface and not a public URL, this is generally acceptable.
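To decide which URLs are worth delegating to a separate daemon process group, it can help to see which requests push the process's peak memory up. The following is a rough sketch of a WSGI wrapper that logs the peak resident set size after each request. It is not part of mod_wsgi or Django: `log_peak_memory` and `demo_app` are hypothetical names, the `resource` module is only available on Unix-like systems, and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import threading


def log_peak_memory(app, log=print):
    """Wrap a WSGI app and report the process's peak RSS after each request."""

    def wrapper(environ, start_response):
        try:
            return app(environ, start_response)
        finally:
            # ru_maxrss is the high-water mark for the whole process,
            # so it only ever grows; look for the requests where it jumps.
            peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            log("%s thread=%s ru_maxrss=%d" % (
                environ.get("PATH_INFO", ""),
                threading.current_thread().name,
                peak,
            ))

    return wrapper


def demo_app(environ, start_response):
    """Tiny stand-in application used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


messages = []
wrapped = log_peak_memory(demo_app, log=messages.append)
body = wrapped({"PATH_INFO": "/admin/"}, lambda status, headers: None)
print(body, messages)
```

In a Django project, the wrapper could be applied in wsgi.py around the object returned by `django.core.wsgi.get_wsgi_application()`. Note that it logs when the application callable returns, so memory allocated while a streaming response body is being iterated would only show up in a later log line.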


1 Comment

Zaar Hai Over a year ago

Thanks for the approach. It should localize the problem. However, I'm still interested in why Django does not release the fetched objects. But I'll ask that in a separate question.
