
I was working with numpy.ndarray and something interesting happened. I created an array with the shape (2, 2) and left everything else at the default values. It gave me an array with these values:

array([[2.12199579e-314, 0.00000000e+000],
       [5.35567160e-321, 7.72406468e-312]])

I created another array with the same default values and it also gave me the same result.

Then I created a new array (using the default values and the shape (2, 2)) and filled it with zeros using the 'fill' method. The interesting part is that now whenever I create a new array with ndarray it gives me an array with 0 values. So what is going on behind the scenes?
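The steps above can be sketched as follows (the exact values printed for the first array are arbitrary, since they depend on whatever happened to be in the allocated memory):

```python
import numpy as np

# An array created via np.ndarray is uninitialized: its contents are
# whatever bytes were already in the allocated memory chunk.
a = np.ndarray((2, 2))
print(a)  # arbitrary values, different on each machine/run

a.fill(0)  # overwrite the buffer with zeros

# A later allocation may happen to reuse the same (now zeroed) chunk,
# which is why subsequent np.ndarray((2, 2)) calls can *appear* to be
# filled with zeros -- but nothing guarantees this.
b = np.ndarray((2, 2))
print(b)
```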

  • Creating an array without predefined values gives no guarantee about its contents. If it uses memory the process hasn't touched yet, the contents are usually zeros, because the operating system typically clears memory before handing it to a process; otherwise it could leak sensitive information from another process or from the OS itself. Commented Dec 24, 2022 at 10:42

2 Answers


See https://numpy.org/doc/stable/reference/generated/numpy.empty.html#numpy.empty: (Precisely as @Michael Butscher commented)

np.empty([2, 2]) creates an array without touching the contents of the memory chunk allocated for the array; thus, the array may look as if filled with some more or less random values.

np.ndarray([2, 2]) does the same.

Other creation methods, however, fill the memory with some values:

np.zeros([2, 2]) fills the memory with zeros, np.full([2, 2], 9) fills the memory with nines, etc.

Now, if you create a new array via np.empty() after creating (and disposing of, i.e. automatically garbage collected) an array filled with e.g. ones, your new array may be allocated the same chunk of memory and thus look as if "filled" with ones.
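The reuse effect described above can be observed like this (a sketch: whether the allocator actually hands back the same chunk is an implementation detail, so we only inspect the buffer addresses rather than rely on the contents):

```python
import numpy as np

ones = np.ones((2, 2))
addr_ones = ones.__array_interface__['data'][0]
del ones  # the buffer becomes available for reuse

fresh = np.empty((2, 2))
addr_fresh = fresh.__array_interface__['data'][0]

# If the allocator reused the same chunk, `fresh` may "look like" it is
# filled with ones -- but none of this is guaranteed by NumPy.
print(addr_ones == addr_fresh)
print(fresh)
```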


3 Comments

Thanks! This was useful! At first I guessed that maybe, as with integers (class int), creating an array with the same default values would point to the same memory as an array created earlier, with the pointer changing only when we apply some change. I used the id function to test this, and the addresses were different.
id of an array does not tell us where its data buffer is located.
As @hpaulj commented, I'd try to avoid np.empty() unless you have a good reason; it bit me once: my code with np.empty() passed all tests locally on my machine (which apparently zeroed its memory) but then failed on a server because of the random uninitialized values :)

np.empty explicitly says it returns:

Array of uninitialized (arbitrary) data of the given shape, dtype, and
    order.  Object arrays will be initialized to None.

It's compiled code so I can't say for sure, but I strongly suspect it just calls np.ndarray with shape and dtype.

ndarray describes itself as a low-level function, and lists many better alternatives.

In an ipython session I can make two arrays:

In [2]: arr = np.empty((2,2), dtype='int32'); arr
Out[2]: 
array([[  927000399,  1267404612],
       [ 1828571807, -1590157072]])

In [3]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[3]: 
array([[  927000399,  1267404612],
       [ 1828571807, -1590157072]])

The values are the same, but when I check the "location" of their data buffers, I see that they are different:

In [4]: arr.__array_interface__['data'][0]
Out[4]: 2213385069328
In [5]: arr1.__array_interface__['data'][0]
Out[5]: 2213385068176

We can't use that number in code to fiddle with the values, but it's useful as a human-readable indicator of where the data is stored. (Do you understand the basics of how arrays are stored, with shape, dtype, strides, and data-buffer?)
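For reference, those pieces can all be inspected directly (a sketch; the buffer address will vary between runs):

```python
import numpy as np

arr = np.empty((2, 2), dtype='int32')

# The metadata that, together with the data buffer, defines the array:
print(arr.shape)    # (2, 2)
print(arr.dtype)    # int32
print(arr.strides)  # (8, 4): 8 bytes to the next row, 4 to the next column
print(arr.__array_interface__['data'][0])  # buffer address, varies per run
```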

Why the "uninitialized values" are the same is anyone's guess; my guess is that it's just an artifact of how that bit of memory was used before. np.empty stresses that we shouldn't attach any significance to those values.

Calling ndarray again produces different values and a different location:

In [9]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[9]: 
array([[1469865440,        515],
       [         0,          0]])
In [10]: arr1.__array_interface__['data'][0]
Out[10]: 2213403372816

apparent reuse

If I don't assign the array to a variable, or otherwise "hang on to it", numpy may reuse the data buffer memory:

In [17]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[17]: 2213403374512
In [18]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[18]: 2213403374512
In [19]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[19]: 2213403374512
In [20]: np.empty((2,2), dtype='int').__array_interface__['data'][0]
Out[20]: 2213403374512

Again, we shouldn't place too much significance on this reuse, and certainly shouldn't count on it for any calculations.

object dtype

If we specify the object dtype, then the values are initialized to None. This dtype contains references/pointers to objects in memory, and "random" pointers wouldn't be safe.

In [14]: arr1 = np.ndarray((2,2), dtype='object'); arr1
Out[14]: 
array([[None, None],
       [None, None]], dtype=object)

In [15]: arr1 = np.ndarray((2,2), dtype='U3'); arr1
Out[15]: 
array([['', ''],
       ['', '']], dtype='<U3')

