
I was working with numpy.ndarray and something interesting happened. I created an array with the shape (2, 2) and left everything else at the default values. It gave me an array with these values:

array([[2.12199579e-314, 0.00000000e+000],
       [5.35567160e-321, 7.72406468e-312]])

I created another array with the same default values and it also gave me the same result.

Then I created a new array (using the default values and the shape (2, 2)) and filled it with zeros using the 'fill' method. The interesting part is that now whenever I create a new array with ndarray it gives me an array with 0 values. So what is going on behind the scenes?
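The steps above can be sketched as follows (the exact values printed for the first array are arbitrary, since they depend on whatever happened to be in the allocated memory):

```python
import numpy as np

# An array created via np.ndarray is uninitialized: its contents are
# whatever bytes were already in the allocated memory chunk.
a = np.ndarray((2, 2))
print(a)  # arbitrary values, different on each machine/run

a.fill(0)  # overwrite the buffer with zeros

# A later allocation may happen to reuse the same (now zeroed) chunk,
# which is why subsequent np.ndarray((2, 2)) calls can *appear* to be
# filled with zeros -- but nothing guarantees this.
b = np.ndarray((2, 2))
print(b)
```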

  • Creating an array without predefined values gives no guarantee about its contents. If it uses memory the process hasn't touched yet, the contents are usually zeros, because the operating system typically clears memory before handing it to a process; otherwise it could leak sensitive information from another process or from the OS itself. Commented Dec 24, 2022 at 10:42

2 Answers


See https://numpy.org/doc/stable/reference/generated/numpy.empty.html#numpy.empty: (Precisely as @Michael Butscher commented)

np.empty([2, 2]) creates an array without touching the contents of the memory chunk allocated for the array; thus, the array may look as if filled with some more or less random values.

np.ndarray([2, 2]) does the same.

Other creation methods, however, fill the memory with some values:

np.zeros([2, 2]) fills the memory with zeros, np.full([2, 2], 9) fills the memory with nines, etc.

Now, if you create a new array via np.empty() after creating (and disposing of, i.e. automatically garbage collected) an array filled with e.g. ones, your new array may be allocated the same chunk of memory and thus look as if "filled" with ones.
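The reuse effect described above can be observed like this (a sketch: whether the allocator actually hands back the same chunk is an implementation detail, so we only inspect the buffer addresses rather than rely on the contents):

```python
import numpy as np

ones = np.ones((2, 2))
addr_ones = ones.__array_interface__['data'][0]
del ones  # the buffer becomes available for reuse

fresh = np.empty((2, 2))
addr_fresh = fresh.__array_interface__['data'][0]

# If the allocator reused the same chunk, `fresh` may "look like" it is
# filled with ones -- but none of this is guaranteed by NumPy.
print(addr_ones == addr_fresh)
print(fresh)
```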


3 Comments

Thanks! This was useful! At first I guessed that maybe, as with integers (class int), creating an array with the same default values would point to the same memory as an array created earlier, with the pointer changing only when we apply some change. I used the id function to test this, and the addresses were different.
id of an array does not tell us where its data buffer is located.
As @hpaulj commented, I'd try to avoid np.empty() unless you have a good reason; it bit me once: my code with np.empty() passed all tests locally on my machine (which apparently zeroed its memory) but then failed on a server because of the random uninitialized values :)

np.empty explicitly says it returns:

Array of uninitialized (arbitrary) data of the given shape, dtype, and
    order.  Object arrays will be initialized to None.

It's compiled code so I can't say for sure, but I strongly suspect it just calls np.ndarray with shape and dtype.

ndarray describes itself as a low-level function, and lists many better alternatives.

In an ipython session I can make two arrays:

In [2]: arr = np.empty((2,2), dtype='int32'); arr
Out[2]: 
array([[  927000399,  1267404612],
       [ 1828571807, -1590157072]])

In [3]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[3]: 
array([[  927000399,  1267404612],
       [ 1828571807, -1590157072]])

The values are the same, but when I check the "location" of their data buffers, I see that they are different:

In [4]: arr.__array_interface__['data'][0]
Out[4]: 2213385069328
In [5]: arr1.__array_interface__['data'][0]
Out[5]: 2213385068176

We can't use that number in code to fiddle with the values, but it's useful as a human-readable indicator of where the data is stored. (Do you understand the basics of how arrays are stored, with shape, dtype, strides, and data-buffer?)
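For reference, those pieces can all be inspected directly (a sketch; the buffer address will vary between runs):

```python
import numpy as np

arr = np.empty((2, 2), dtype='int32')

# The metadata that, together with the data buffer, defines the array:
print(arr.shape)    # (2, 2)
print(arr.dtype)    # int32
print(arr.strides)  # (8, 4): 8 bytes to the next row, 4 to the next column
print(arr.__array_interface__['data'][0])  # buffer address, varies per run
```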

Why the "uninitialized values" are the same is anyone's guess; my guess is that it's just an artifact of how that bit of memory was used before. np.empty stresses that we shouldn't attach any significance to those values.

Calling ndarray again produces different values and a different location:

In [9]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[9]: 
array([[1469865440,        515],
       [         0,          0]])
In [10]: arr1.__array_interface__['data'][0]
Out[10]: 2213403372816

apparent reuse

If I don't assign the array to a variable, or otherwise "hang on to it", numpy may reuse the data buffer memory:

In [17]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[17]: 2213403374512
In [18]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[18]: 2213403374512
In [19]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[19]: 2213403374512
In [20]: np.empty((2,2), dtype='int').__array_interface__['data'][0]
Out[20]: 2213403374512

Again, we shouldn't place too much significance on this reuse, and certainly shouldn't count on it for any calculations.

object dtype

If we specify the object dtype, then the values are initialized to None. This dtype contains references/pointers to objects in memory, and "random" pointers wouldn't be safe.

In [14]: arr1 = np.ndarray((2,2), dtype='object'); arr1
Out[14]: 
array([[None, None],
       [None, None]], dtype=object)

In [15]: arr1 = np.ndarray((2,2), dtype='U3'); arr1
Out[15]: 
array([['', ''],
       ['', '']], dtype='<U3')

