
I am seeing some behavior with Boolean indexing that I do not understand, and I was hoping to find some clarification here.

First off, this is the behavior I am seeking...

>>> import numpy as np
>>> a = np.zeros(10, dtype=np.ndarray)
>>> a
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=object)
>>> b = np.arange(10).reshape(2,5)
>>> b
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> a[5] = b
>>> a
array([0, 0, 0, 0, 0, array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]), 0,
       0, 0, 0], dtype=object)
>>>

I chose an ndarray of ndarrays because I will be appending to the arrays stored in the super array, and they will all have different lengths. I chose ndarray rather than list for the super array so that I have access to all of NumPy's clever indexing features.
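A minimal sketch of such a "super array", assuming the stored arrays are 1-D and of different lengths. Creating the container with np.empty(n, dtype=object) and filling slots one at a time with scalar indices keeps it 1-D. By contrast, np.array(list_of_arrays, dtype=object) can silently produce a 2-D array when the inner arrays happen to share a length.

```python
import numpy as np

# Outer container: a 1-D object array; every slot starts as None.
super_array = np.empty(4, dtype=object)

# Scalar-index assignment stores each (ragged) array as a single element.
super_array[0] = np.arange(3)
super_array[1] = np.arange(5)

# Pitfall: equal-length inner arrays get stacked into a 2-D object array.
stacked = np.array([np.arange(3), np.arange(3)], dtype=object)

print(super_array.shape, stacked.shape)  # (4,) (2, 3)
```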

Anyway, if I make a Boolean indexer and use it to assign, say, b+5 at position 1, it does something I didn't expect:

>>> indexer = np.zeros(10,dtype='bool')
>>> indexer
array([False, False, False, False, False, False, False, False, False, False], dtype=bool)
>>> indexer[1] = True
>>> indexer
array([False,  True, False, False, False, False, False, False, False, False], dtype=bool)
>>> a[indexer] = b+5
>>> a
array([0, 5, 0, 0, 0, array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]), 0,
       0, 0, 0], dtype=object)
>>>

Can anyone help me understand what's going on? I would like the result to be

>>> a[1] = b+5
>>> a
array([0, array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]]), 0, 0,
       0, array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]), 0, 0, 0, 0], dtype=object)
>>>

The final goal is to have many "b" arrays stored in B and to assign them to a like this:

>>> a[indexer] = B[indexer]

EDIT:

I found a possible workaround based on the discussion below: I can wrap my data in a class if I need to.

>>>
>>> class myclass:
...     def __init__(self):
...             self.data = np.random.rand(1)
...
>>>
>>> b = myclass()
>>> b
<__main__.myclass object at 0x000002871A4AD198> 
>>> b.data
array([ 0.40185378])
>>>
>>> a[indexer] = b
>>> a
array([None, <__main__.myclass object at 0x000002871A4AD198>, None, None,
       None, None, None, None, None, None], dtype=object)
>>> a[1].data
array([ 0.40185378])

EDIT: this actually fails. I cannot assign anything to the data field through the indexed array.
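A short sketch of why the indexed form cannot reach the attribute, reusing the myclass wrapper from above: a[indexer] returns a new 1-D object array, not the stored myclass instance, so attribute access has to go through its elements. (ndarray also has its own, unrelated data attribute, a buffer over the array's memory.) Because an object array holds references, modifying each selected element in a loop does change what a stores.

```python
import numpy as np

class myclass:
    def __init__(self):
        self.data = np.random.rand(1)

a = np.empty(10, dtype=object)
indexer = np.zeros(10, dtype=bool)
indexer[1] = True

# A single non-array object fills the one True slot.
a[indexer] = myclass()

# a[indexer] is an object array of shape (1,); iterate over its elements,
# which are references to the same myclass objects stored in a.
for obj in a[indexer]:
    obj.data = np.arange(3)

print(a[1].data)  # [0 1 2]
```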

Comment (Apr 26, 2017 at 23:08): It does not :( it fails... but thanks for the info! I will do that in the future.

1 Answer

In [203]: a = np.empty(5, object)
In [204]: a
Out[204]: array([None, None, None, None, None], dtype=object)
In [205]: a[3]=np.arange(3)
In [206]: a
Out[206]: array([None, None, None, array([0, 1, 2]), None], dtype=object)

So simple indexing works with this object array.

Boolean indexing works for reading:

In [207]: a[np.array([0,0,0,1,0], dtype=bool)]
Out[207]: array([array([0, 1, 2])], dtype=object)
In [208]: a[np.array([0,0,1,0,0], dtype=bool)]
Out[208]: array([None], dtype=object)

But it has problems when writing:

In [209]: a[np.array([0,0,1,0,0], dtype=bool)]=np.arange(2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-209-c1ef5580972c> in <module>()
----> 1 a[np.array([0,0,1,0,0], dtype=bool)]=np.arange(2)

ValueError: NumPy boolean array indexing assignment cannot assign 2 
input values to the 1 output values where the mask is true
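A minimal script reproducing the same check, assuming a current NumPy version: with Boolean-mask assignment, the number of values on the right must match the number of True entries (or broadcast to it). Two values for one selected slot are rejected; one value is accepted.

```python
import numpy as np

a = np.empty(5, dtype=object)
mask = np.array([0, 0, 1, 0, 0], dtype=bool)   # exactly one True slot

raised = False
try:
    a[mask] = np.arange(2)      # 2 values for 1 selected slot: rejected
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

a[mask] = np.arange(1)          # 1 value for 1 selected slot: accepted
print(a)
```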

Indexing with np.where(<boolean>) or with a list such as [2] also gives problems:

In [221]: a[[2]]=np.arange(3)
/usr/local/bin/ipython3:1: DeprecationWarning: assignment will raise an 
error in the future, most likely because your index result shape does 
not match the value array shape. You can use `arr.flat[index] = values`    
to keep the old behaviour.

So, for whatever reason, indexed assignment to an object-dtype array does not work as well as it does with regular arrays.

Even the recommended flat doesn't do what we want; it stores only the first value of the RHS:

In [226]: a.flat[[2]]=np.arange(3)
In [227]: a
Out[227]: array([None, None, 0, array([0, 1, 2]), None], dtype=object)

I can, however, assign an object that is not a list or array:

In [228]: a[[2]]=None
In [229]: a
Out[229]: array([None, None, None, array([0, 1, 2]), None], dtype=object)
In [230]: a[[2]]={3:4}
In [231]: a
Out[231]: array([None, None, {3: 4}, array([0, 1, 2]), None], dtype=object)
In [232]: idx=np.array([0,0,1,0,0],bool)
In [233]: a[idx]=set([1,2,3])
In [234]: a
Out[234]: array([None, None, {1, 2, 3}, array([0, 1, 2]), None], dtype=object)

Object-dtype arrays are at the edge of NumPy's array functionality.


Look at what we get with getitem. With a scalar index we get the object stored in that slot (in my latest case, a set). But with [[2]] or a boolean mask, we get another object array.

In [235]: a[2]
Out[235]: {1, 2, 3}
In [236]: a[[2]]
Out[236]: array([{1, 2, 3}], dtype=object)
In [237]: a[idx]
Out[237]: array([{1, 2, 3}], dtype=object)
In [238]: a[idx].shape
Out[238]: (1,)

I suspect that when a[idx] is on the LHS, it tries to convert the RHS to an object array first:

Out[241]: array([0, 1, 2], dtype=object)
In [242]: _.shape
Out[242]: (3,)
In [243]: np.array(set([1,2,3]), object)
Out[243]: array({1, 2, 3}, dtype=object)
In [244]: _.shape
Out[244]: ()

In the case of a set, the resulting array has a single element (shape ()) and can be put in the (1,) slot. But when the RHS is a list or array, the result is an n-element array, e.g. shape (3,), which does not fit in the (1,) slot.
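A small sketch of that reasoning, under the assumption above that the assignment coerces the RHS roughly as np.asarray(value, dtype=object) does (a working hypothesis, not something confirmed from NumPy's source). The helper fits is hypothetical: it checks whether the coerced value broadcasts into n selected slots.

```python
import numpy as np

def fits(value, n):
    """Hypothetical helper: would value, coerced to an object array,
    broadcast into n selected slots?"""
    converted = np.asarray(value, dtype=object)
    try:
        np.broadcast_to(converted, (n,))
    except ValueError:
        return False
    return True

for value in ({1, 2, 3}, [0, 1, 2], np.arange(3)):
    converted = np.asarray(value, dtype=object)
    # set -> shape () fits one slot; list/array -> shape (3,) does not
    print(type(value).__name__, converted.shape, fits(value, 1))
```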

Solution (sort of)

If you want to assign a list/array to a slot in an object array with some form of advanced indexing (boolean or list), first put that item in an object array of the correct size:

In [255]: b=np.empty(1,object)
In [256]: b[0]=np.arange(3)
In [257]: b
Out[257]: array([array([0, 1, 2])], dtype=object)
In [258]: b.shape
Out[258]: (1,)
In [259]: a[idx]=b
In [260]: a
Out[260]: array([None, None, array([0, 1, 2]), array([0, 1, 2]), None], dtype=object)
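The wrapping step can be packaged as a small helper. This is a sketch; as_object_array is a hypothetical name. It fills slots with scalar-index assignment, which stores each item as-is, instead of passing the items to np.array, which could unpack or stack them.

```python
import numpy as np

def as_object_array(items):
    """Hypothetical helper: pack items into a 1-D object array,
    one item per slot, without NumPy unpacking or stacking them."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item          # scalar-index assignment stores the item itself
    return out

a = np.empty(5, dtype=object)
idx = np.array([0, 0, 1, 0, 0], dtype=bool)

a[idx] = as_object_array([np.arange(3)])   # shape (1,) matches one True slot
print(a[2])  # [0 1 2]
```

The loop is deliberate: even two equal-length arrays stay two separate elements, so the wrapper's shape is always (len(items),).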

Or, working with your slightly larger arrays:

In [264]: a = np.zeros(10, dtype=object)
In [265]: b = np.arange(10).reshape(2,5)
In [266]: a[5] = b
In [267]: c = np.zeros(1, dtype=object)  # intermediate object wrapper
In [268]: c[0] = b+5
In [269]: idx = np.zeros(10,bool)
In [270]: idx[1]=True
In [271]: a[idx] = c
In [272]: a
Out[272]: 
array([0, array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]]), 0, 0,
       0, array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]]), 0, 0, 0, 0], dtype=object)

If idx has n True items, c must have a shape that broadcasts to (n,).
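A sketch with two True slots, so the RHS needs shape (2,). It also covers the question's goal a[indexer] = B[indexer], assuming B (hypothetical here) is itself a 1-D object array of the same length. In that case B[idx] is already an object array of shape (number of True items,), so no extra wrapping is needed.

```python
import numpy as np

n = 10
idx = np.zeros(n, dtype=bool)
idx[[1, 3]] = True                  # two True slots: the RHS must broadcast to (2,)

# Option 1: an explicit object wrapper holding one item per True slot.
a = np.zeros(n, dtype=object)
c = np.empty(2, dtype=object)
c[0] = np.arange(10).reshape(2, 5) + 5
c[1] = np.arange(4)
a[idx] = c

# Option 2: B is a 1-D object array, so B[idx] has shape (2,) already.
B = np.empty(n, dtype=object)       # hypothetical store of "b" arrays
for i in range(n):
    B[i] = np.arange(i + 1)         # ragged: slot i holds i + 1 values
a2 = np.zeros(n, dtype=object)
a2[idx] = B[idx]
```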


3 Comments

It looks less buggy when I make sure that the RHS dtype matches the LHS (i.e. object dtype). Then it's just the standard business of broadcastable shapes. It comes back to that old question - how to unambiguously turn a list or array into an object array of known shape.
Is it possible to assign and append with this indexing as well? If we had C with the right shape and values in it, would a[idx].append(C[idx]) make any sense?
a[idx] is an object array, not a list. It does not have an append method. a[2] could be a list, and thus be appendable. You could put a large list or array in c, and then assign that.
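A sketch of the slot-by-slot alternative, assuming each slot holds a 1-D array and a hypothetical object array C holds the data to append. NumPy arrays cannot grow in place, so "appending" here means building a new array with np.concatenate and storing it back with a scalar index. np.flatnonzero turns the mask into the positions to visit.

```python
import numpy as np

a = np.empty(5, dtype=object)
for i in range(5):
    a[i] = np.arange(i + 1)             # ragged arrays of length 1..5

C = np.empty(5, dtype=object)           # hypothetical per-slot data to append
for i in range(5):
    C[i] = np.array([100 + i])

idx = np.array([0, 1, 0, 1, 0], dtype=bool)

# a[idx] is a new object array with no append method, so update slot by slot.
for i in np.flatnonzero(idx):
    a[i] = np.concatenate([a[i], C[i]])

print(a[1], a[3])
```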
