I am trying to profile the memory usage of a list vs. a numpy array.
%%file memory.py
import numpy as np

@profile
def allocate():
    vector_list = [float(i) for i in range(10000)]
    np.arange(0, 10000, dtype='d')

allocate()
Running the memory profiler in the shell:
!python -m memory_profiler memory.py
gives the following output:
Line #    Mem usage      Increment    Line Contents
================================================
     4   39.945 MiB      0.000 MiB    @profile
     5                                def allocate():
     6   39.949 MiB    **0.004 MiB**      vector_list = [float(i) for i in range(10000)]
     7   40.039 MiB    **0.090 MiB**      np.arange(0,10000,dtype='d')
The increment for line 6 vs. line 7 suggests that the numpy array was far more expensive than the list. What am I doing wrong?
Rather than relying extensively on a memory profiling tool here, I would suggest using sys.getsizeof, which works reasonably well for both a list and an np.arange object as long as you apply it correctly. sys.getsizeof(np.arange(0,10000)) already accounts for the array's data buffer, so it can be used directly. For a list it does not work naively: the list object only holds pointers to separately allocated float objects, so you would need sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list) to get an accurate picture of the memory usage of vector_list. Used naively on vector_list, the result would be off by about 240000 bytes (10000 float objects at 24 bytes each on 64-bit CPython), and the same caveat applies to dicts and other containers. This comparison shows how memory efficient numpy can be, but it also demonstrates the subtleties of getting the actual memory usage of a Python container, e.g. string interning, small-int caching, etc.
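For concreteness, here is a minimal sketch of that measurement (the variable name vector_array is mine, and the exact byte counts will vary with the Python build and platform):

import sys
import numpy as np

vector_list = [float(i) for i in range(10000)]
vector_array = np.arange(0, 10000, dtype='d')

# A list only stores pointers; each float object is allocated separately,
# so the container and its elements have to be counted together.
list_total = sys.getsizeof(vector_list) + sum(map(sys.getsizeof, vector_list))

# An ndarray owns its data buffer, so getsizeof already includes it.
array_total = sys.getsizeof(vector_array)

print("list (container + elements):", list_total, "bytes")
print("numpy array:", array_total, "bytes")

On 64-bit CPython the list total comes out to roughly 320 KB (the ~240000 bytes of float objects mentioned above plus the list's pointer array), versus a little over 80 KB for the double-precision array (10000 * 8 bytes plus a small header).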