I have two NumPy arrays, a and b, each with shape (100, 2048). sys.getsizeof(a) returns 112, and so does sys.getsizeof(b).

But when I run c = np.concatenate((a, b), axis=0), the shape of c is (200, 2048) and sys.getsizeof(c) returns 1638512.

Why?

  • Could you include the code you use? I can't reproduce your example. (Commented Sep 1, 2018 at 15:41)

2 Answers

getsizeof has limited value. For lists it can be way off, since it counts only the list object and its pointers, not the objects the list refers to. For arrays it's better, but you have to understand how arrays are stored.
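A minimal illustration of the list case in plain CPython (exact byte counts vary by Python version and platform): the list holds one pointer to a large string, and getsizeof on the list does not include that string.

```python
import sys

# One large string, referenced from a one-element list.
big = "x" * 1_000_000
lst = [big]

# Size of the list object itself: header plus one pointer slot.
list_size = sys.getsizeof(lst)
# Size of the string it points to, which getsizeof(lst) does not include.
str_size = sys.getsizeof(big)

print(list_size, str_size)
```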

In [447]: import sys
In [448]: a = np.arange(100)
In [449]: sys.getsizeof(a)
Out[449]: 896

But look at the size of a view:

In [450]: b = a.reshape(10,10)
In [451]: sys.getsizeof(b)
Out[451]: 112

This shows the size of the array object, but not the size of the shared databuffer. b doesn't have its own databuffer.
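A sketch of how to check this yourself. The dtype is pinned to int64 here (an assumption, so the byte counts match the 100*8 used below); with the default integer type the numbers could differ on some platforms.

```python
import sys
import numpy as np

a = np.arange(100, dtype=np.int64)  # owns its data buffer
b = a.reshape(10, 10)               # a view: same buffer, new shape/strides

# b borrows a's buffer instead of allocating its own.
print(b.base is a)                              # True
print(a.flags["OWNDATA"], b.flags["OWNDATA"])   # True False
print(np.shares_memory(a, b))                   # True

# nbytes reports the logical data size for both arrays...
print(a.nbytes, b.nbytes)                       # 800 800
# ...but getsizeof only adds it for the array that owns the buffer.
print(sys.getsizeof(a), sys.getsizeof(b))
```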

In [453]: a.size
Out[453]: 100
In [454]: b.size
Out[454]: 100

So my guess is that your a and b are views of some other arrays. But concatenate produces a new array with its own databuffer; it can't be a view of the other two, so its getsizeof reflects that.
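To make that guess concrete, here is a sketch that assumes, hypothetically, that the library hands back a and b as slices of one larger array. The array big, its shape, and the float32 dtype are made up for illustration; the real library may do something else.

```python
import sys
import numpy as np

# Hypothetical: one large result array, with a and b returned as slices.
big = np.zeros((300, 2048), dtype=np.float32)
a = big[:100]      # view: no data buffer of its own
b = big[100:200]   # view

print(a.flags["OWNDATA"], b.flags["OWNDATA"])   # False False

# concatenate has to copy both inputs into a freshly allocated buffer.
c = np.concatenate((a, b), axis=0)
print(c.shape, c.flags["OWNDATA"])              # (200, 2048) True

# Both are 2-D arrays, so the only difference in getsizeof between
# the owning c and the view a is c's data buffer.
print(sys.getsizeof(c) - sys.getsizeof(a) == c.nbytes)   # True
```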

In [457]: c = np.concatenate((a,b.ravel()))
In [459]: c.shape
Out[459]: (200,)
In [460]: c.size
Out[460]: 200
In [461]: sys.getsizeof(c)
Out[461]: 1696

The databuffer for a is 100*8 = 800 bytes, so the 'overhead' is 896 - 800 = 96. For c it's 200*8 = 1600 bytes, again with a 96-byte 'overhead'.
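That 'overhead' is not a fixed number, though. As far as I can tell from NumPy's source, ndarray.__sizeof__ counts the array object plus one shape value and one stride value per dimension, and adds the data buffer only when the array owns it. That is consistent with the 2-D view above reporting 112 rather than 96. A quick check, using a small helper (overhead, defined here only for illustration); the absolute values depend on the NumPy version and platform, so only the difference is checked:

```python
import sys
import numpy as np

def overhead(x):
    """getsizeof minus the data buffer, for an array that owns its data."""
    if not x.flags["OWNDATA"]:
        raise ValueError("only meaningful for arrays that own their data")
    return sys.getsizeof(x) - x.nbytes

one_d = np.zeros(100, dtype=np.int64)
two_d = np.zeros((10, 10), dtype=np.int64)

o1, o2 = overhead(one_d), overhead(two_d)
print(o1, o2)

# Each extra dimension stores one more shape value and one more stride.
print(o2 - o1 == 2 * np.dtype(np.intp).itemsize)   # True
```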

I can't reproduce your example:

import numpy as np
import sys

a = np.random.rand(100, 2048)
b = np.random.rand(100, 2048)

print(sys.getsizeof(a), sys.getsizeof(b))
# 1638512 1638512

c = np.concatenate((a,b), axis=0)
print(sys.getsizeof(c))
# 3276912, about 1638512 + 1638512: the data doubles, the fixed overhead does not
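A detail worth noticing: the 1638512 the question reports for c, a (200, 2048) array, is exactly what this float64 (100, 2048) array reports. Both hold 1638400 bytes of data, since 200*2048*4 = 100*2048*8. That suggests the question's c is float32, though this is an inference the question doesn't confirm. A sketch comparing the two:

```python
import sys
import numpy as np

q_c = np.zeros((200, 2048), dtype=np.float32)  # guessed dtype of the question's c
f64 = np.zeros((100, 2048), dtype=np.float64)  # dtype used in the code above

print(q_c.nbytes, f64.nbytes)                     # 1638400 1638400
print(sys.getsizeof(q_c) == sys.getsizeof(f64))   # True: same ndim, same data size
```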

2 Comments

Thanks. The code I used is from someone else, see: github.com/Maluuba/gensen. It's a sentence-embedding model that converts a string (sentence) into a 2048-dimensional vector, so every 100 sentences give a (100, 2048) array. I was also confused about why the size of a is only 112; maybe they used some compression technique I don't know about. Anyway, when I convert to a list and then back to a NumPy array, the size is normal. Thanks.
Here are some details about getsizeof: stackoverflow.com/a/17574104/8069403. Maybe it helps explain what happens.
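Converting to a list and back works because the new array owns its buffer. A lighter-weight way to get the same effect, sketched here assuming a is a view as described in the first answer (big is only a stand-in for whatever the library returns), is a.copy(), which allocates a fresh buffer directly instead of building 204,800 Python floats first. And if the goal is just to measure memory, nbytes reports the data size whether or not the array owns it.

```python
import sys
import numpy as np

# Stand-in for a view returned by some library.
big = np.random.rand(300, 2048)
a = big[:100]

owned = a.copy()   # new array with its own data buffer
print(owned.flags["OWNDATA"])                                   # True
print(sys.getsizeof(owned) - sys.getsizeof(a) == owned.nbytes)  # True

# nbytes gives the data size regardless of ownership.
print(a.nbytes == owned.nbytes)                                 # True
```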