
I am trying to perform memory profiling of list vs numpy arrays.

%%file memory.py

import numpy as np

@profile
def allocate():
    vector_list = [float(i) for i in range(10000)]
    np.arange(0,10000,dtype='d')

allocate()

Running the memory profiler from the shell:

!python -m memory_profiler memory.py

gives the following output:

Line #    Mem usage    Increment   Line Contents
================================================
     4   39.945 MiB    0.000 MiB   @profile
     5                             def allocate():
     6   39.949 MiB    0.004 MiB       vector_list = [float(i) for i in range(10000)]
     7   40.039 MiB    0.090 MiB       np.arange(0,10000,dtype='d')

The memory increments for line 6 vs line 7 suggest that the numpy array was far more expensive than the list. What am I doing wrong?

  • I cannot reproduce your results... Commented Jul 19, 2017 at 22:35
  • Since you're largely interested in those two, you could just check the sizes of both objects using sys.getsizeof (which should work reasonably well for a list and a np.arange object), instead of relying extensively on a memory profiling tool. Commented Jul 19, 2017 at 22:40
  • @MosesKoledoye yeah, but you have to have a grip on CPython internals to use sys.getsizeof correctly. For example, you would need sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list) to get an accurate picture of the memory usage of vector_list. And sys.getsizeof(np.arange(0,10000)) Commented Jul 19, 2017 at 22:42
  • @MosesKoledoye in other words, sys.getsizeof does not work reasonably well, naively, with a list. If you did it with vector_list, it would be off by about 240000 bytes Commented Jul 19, 2017 at 22:44
  • 1
    @MosesKoledoye yep. Check out my answer here, although, the original question was about the memory usage of a bunch of dicts. But I go into how numpy can be extremely memory efficient, but it also demonstrates the subtleties of getting the actual memory usage of a Python container. E.g. string interning, small-int caching, etc. Commented Jul 19, 2017 at 22:54
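The deep-size check discussed in these comments can be sketched as follows (a minimal sketch of the comments' suggestion, not from the original post; exact byte counts depend on the Python version and platform):

```python
import sys
import numpy as np

vector_list = [float(i) for i in range(10000)]
arr = np.arange(0, 10000, dtype='d')

# A bare sys.getsizeof(vector_list) counts only the list object itself
# (the array of pointers), not the 10000 float objects it refers to.
shallow = sys.getsizeof(vector_list)

# Deep size: the container plus every float object it holds.
deep = sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list)

# The numpy array stores 10000 C doubles contiguously: 8 bytes each.
print("list (shallow):", shallow)
print("list (deep):   ", deep)
print("array (nbytes):", arr.nbytes)
```

On a typical 64-bit CPython the deep list size comes out several times larger than the array's 80000 data bytes, which is the opposite of what the profiler's per-line increments seemed to show.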

1 Answer

I do not know what memory_profiler is reporting in your case - I get very different numbers than you do:

Line #    Mem usage    Increment   Line Contents
================================================
     3   41.477 MiB    0.000 MiB   @profile
     4                             def allocate():
     5   41.988 MiB    0.512 MiB       vector_list = [float(i) for i in range(10000)]
     6   41.996 MiB    0.008 MiB       np.arange(0,10000,dtype='d')

I would recommend the following two links for further reading: Python memory usage of numpy arrays and Size of list in memory.

I have also modified your code as follows:

import numpy as np
import sys

@profile
def allocate():
    vector_list = [float(i) for i in range(10000)]
    npvect = np.arange(0,10000,dtype='d')
    listsz = sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list)
    print("numpy array size: {}\nlist size: {}".format(npvect.nbytes, listsz)) 
    print("getsizeof(numpy array): {}\n".format(sys.getsizeof(npvect))) 

allocate()

and it outputs:

numpy array size: 80000
list size: 327632
getsizeof(numpy array): 80096
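As a side note (my addition, not part of the original answer): the 96-byte gap between `getsizeof(numpy array)` and `nbytes` in the output above is the array object's own fixed header (shape, strides, dtype bookkeeping). When an array owns its data buffer, numpy's `__sizeof__` includes the buffer, so `sys.getsizeof` is a reasonable check for an array, unlike for a list. A small sketch; the exact overhead varies across numpy builds:

```python
import sys
import numpy as np

arr = np.arange(0, 10000, dtype='d')

# For an array that owns its buffer, sys.getsizeof reports the
# 80000 data bytes plus a small fixed per-object overhead.
overhead = sys.getsizeof(arr) - arr.nbytes
print("data bytes:", arr.nbytes)
print("object overhead:", overhead)  # 96 bytes in the output above; varies by build
```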

2 Comments

This does not correctly account for the memory required for that list. You need to use sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list)
I have edited my answer to account for your comment. Thanks! @juanpa.arrivillaga
