I'm having memory issues because it looks like Django is loading the objects into memory when using delete(). Is there any way to prevent Django from doing that?

From the Django docs:

Django needs to fetch objects into memory to send signals and handle cascades. However, if there are no cascades and no signals, then Django may take a fast-path and delete objects without fetching into memory. For large deletes this can result in significantly reduced memory usage. The amount of executed queries can be reduced, too.

https://docs.djangoproject.com/en/1.8/ref/models/querysets/#delete

I don't use signals. I do have foreign keys on the model I'm trying to delete, but I don't see why Django would need to load the objects into memory. It seems it does, because memory usage keeps rising while the query runs.

4 Comments

  • Since you have foreign keys, Django needs to load the objects in order to resolve how the relation should handle the deletion: docs.djangoproject.com/en/1.8/ref/models/fields/… Commented Jul 17, 2015 at 14:59
  • @petkostas I did try to put on_delete=models.DO_NOTHING on my ForeignKey fields, but that didn't help. It wouldn't be a good solution for me anyway: I want to disable loading the objects into memory for this specific query, not make all my queries ignore the ForeignKey constraints. Commented Jul 17, 2015 at 15:36
  • Using raw sql could be a solution. Commented Jul 17, 2015 at 15:39
  • @rednaw That is expected behavior, Django needs to resolve the Foreign Key cascade policy, another option would be to send the task to a queue (celery) and rate limit the operations from there. Commented Jul 17, 2015 at 16:35
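As the comments note, Django fetches rows in order to send signals and resolve `on_delete` cascades. One possible escape hatch, sketched below, is `QuerySet._raw_delete()`, which issues a single DELETE without fetching anything. Be warned that this is private, undocumented Django API (the leading underscore is the hint), and it skips signals and Python-level cascade handling entirely, leaving database-level constraints as the only guard on related rows.

```python
def fast_delete(queryset):
    """Issue one DELETE for every row matched by `queryset`.

    Sketch only: _raw_delete() is PRIVATE Django API and may change
    between releases. It does not fetch objects, send signals, or
    resolve on_delete cascades.
    """
    return queryset._raw_delete(queryset.db)

# Usage inside a configured Django project (model name hypothetical):
#   fast_delete(HugeModel.objects.filter(created__lt=cutoff))
```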

2 Answers

You can use a function like this to iterate over a huge number of objects without using too much memory:

import gc

def queryset_iterator(qs, batchsize=500, gc_collect=True):
    iterator = qs.values_list('pk', flat=True).order_by('pk').distinct().iterator()
    eof = False
    while not eof:
        primary_key_buffer = []
        try:
            while len(primary_key_buffer) < batchsize:
                primary_key_buffer.append(next(iterator))  # iterator.next() on Python 2
        except StopIteration:
            eof = True
        for obj in qs.filter(pk__in=primary_key_buffer).order_by('pk').iterator():
            yield obj
        if gc_collect:
            gc.collect()

Then you can use the function to iterate over the objects to delete:

for obj in queryset_iterator(HugeQueryset.objects.all()):
    obj.delete()

For more information you can check this blog post.
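A variation on the same idea (a sketch, not part of the original answer): instead of calling delete() once per object, delete in primary-key batches, so each SQL DELETE removes many rows while memory stays bounded. The `batched` helper below is plain Python; the Django calls are shown in comments and reuse the answer's hypothetical `HugeQueryset` model. Note that a batched QuerySet.delete() still runs signal and cascade handling, just per batch rather than per row.

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# With Django (model name follows the answer's example):
#   pks = HugeQueryset.objects.values_list('pk', flat=True).iterator()
#   for chunk in batched(pks, 500):
#       HugeQueryset.objects.filter(pk__in=chunk).delete()

# The chunking logic itself, shown on plain integers:
print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```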

3 Comments

That blog post link is offline, sadly.
Thank you Adam, I fixed the URL of the blog post.
If you get 'generator' object has no attribute 'next' in python 3, replace iterator.next() by next(iterator) and it should do the trick.

You can import the Django database connection and use it with raw SQL to delete. I had the exact same problem as you do, and this helped me a lot. Here's a snippet (I'm using MySQL, by the way, but you can run any SQL statement):

from django.db import connection

cursor = connection.cursor()
try:
    # Parameterized query: the driver escapes `date`, avoiding SQL injection.
    cursor.execute("DELETE FROM usage WHERE date < %s", [date])
finally:
    cursor.close()

This should execute only the delete operation on that table without affecting any of your model relationships.

1 Comment

It is likely that the query will fail at the database level if other rows hold foreign keys pointing to the deleted ones, because Django creates its foreign-key columns with a database-level constraint that behaves like ON DELETE RESTRICT (cascades are emulated in Python, not in the schema).
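The failure mode this comment describes can be reproduced outside Django. A minimal sketch with sqlite3 standing in for the production database (table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, "
             "author_id INTEGER REFERENCES author(id))")
conn.execute("INSERT INTO author VALUES (1)")
conn.execute("INSERT INTO book VALUES (1, 1)")

blocked = False
try:
    # A raw DELETE bypasses any ORM logic, so the database-level
    # constraint is the only thing that can stop it.
    conn.execute("DELETE FROM author WHERE id = 1")
except sqlite3.IntegrityError:
    blocked = True

print("delete blocked:", blocked)  # delete blocked: True
```

The exact constraint behavior varies by backend, but since Django 1.8 emulates cascades in Python, the schema-level default is typically a plain RESTRICT/NO ACTION, as the comment says.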
