I have a somewhat weird problem that probably stems from how indexing works in numpy. For some reason I don't understand it, let alone achieve the behavior I'm expecting:

>>> a = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')
>>> a
array([['a', 'b'],
       ['c', 'd']], dtype='<U10')
>>> a[0] = ['e']
>>> a
array([['e', 'e'],
       ['c', 'd']], dtype='<U10')

What I expected to obtain was

array([['e'], ['c', 'd']], dtype='<U10')

Can someone give me a hint as to why this doesn't work the way I expected, and how to achieve the expected behavior?

Also, in response to roganjosh's comment:

>>> a = np.array([np.array(['a', 'b']), np.array(['c', 'd'])])
>>> a[0] = 'e'
>>> a
array([['e', 'e'],
       ['c', 'd']], dtype='<U1')

However:

>>> a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])
>>> a[0] = 'e'
>>> a
array(['e', array(['c', 'd'], dtype='<U1')], dtype=object)

which feels sort of weird.

Thanks in advance!

Comments

  • You can't have that expected output because that would be a jagged array. Numpy is extending the dimensions of the single-item list you've given to be an array to keep the same shape you initialised a with. (Commented Dec 20, 2020 at 20:19)
  • You can test this with arr = np.array([[1], [2, 3]]) and you'll get array([list([1]), list([2, 3])], dtype=object), at which point you really need to be questioning why you're using numpy at all. It's now just a nested list with the overhead of numpy, yet none of the decent parts of numpy will work on an object dtype. (Commented Dec 20, 2020 at 20:25)
  • You are expecting a list behavior, replacing the first element of a list with another list. For this array the assignment is actually a[0,:] = ['e'], assigning 'e' to all elements of the 0 row. (Commented Dec 20, 2020 at 20:27)
  • I can see why you think it's confusing, but actually the underlying cause is that you're misusing the numpy library (I don't mean that in a harsh way). That leads on to the answer for your second point: don't use numpy if that's what you expect to happen. Just use regular lists. Numpy is not an enhanced Python list; arrays have specific purposes. (Commented Dec 20, 2020 at 20:44)
  • If you see object as the dtype, you're almost certainly doing something wrong. This also applies to Pandas. (Commented Dec 20, 2020 at 20:50)
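The comments above can be checked directly. On a 2d array, a[0] = ['e'] is row assignment with broadcasting, equivalent to a[0, :] = 'e', while the jagged result from the question is ordinary nested-list behavior. A minimal sketch contrasting the two (the variable names are only for illustration):

```python
import numpy as np

# A 2x2 string array: a[0] selects the whole first row (shape (2,)).
a = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')
a[0] = ['e']                 # the 1-element list is broadcast across the row

b = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')
b[0, :] = 'e'                # the explicit form of the same assignment

print(a.tolist())            # [['e', 'e'], ['c', 'd']]
print(np.array_equal(a, b))  # True

# A plain nested list replaces the first element (the whole inner list) instead.
rows = [['a', 'b'], ['c', 'd']]
rows[0] = ['e']
print(rows)                  # [['e'], ['c', 'd']]
```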

Answer


In your last example, you make a 2-element object dtype array. Each element can be anything: a string, a list, or an array.

In [113]: a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])
<ipython-input-113-3010d1b297e2>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])

Passing dtype=object avoids the warning (in NumPy 1.24 and later, omitting it raises a ValueError instead):

In [114]: a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])],object)
     ...: 
In [115]: a.shape
Out[115]: (2,)
In [116]: a
Out[116]: 
array([array(['a', 'b', 'l'], dtype='<U1'),
       array(['c', 'd'], dtype='<U1')], dtype=object)
In [117]: a[0]
Out[117]: array(['a', 'b', 'l'], dtype='<U1')
In [118]: a[0] = ['foobar']
In [119]: a
Out[119]: array([list(['foobar']), array(['c', 'd'], dtype='<U1')], dtype=object)
In [120]: a[0] = 'foobar'
In [121]: a
Out[121]: array(['foobar', array(['c', 'd'], dtype='<U1')], dtype=object)

This array behaves very much like a 2-element list. In fact, I'd question the value of using such an array instead of a list.
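To make the list comparison concrete, here is a small sketch that performs the same element replacement on an object array (built with dtype=object as above) and on a plain list holding the same items:

```python
import numpy as np

# Object array with two ragged elements, plus the equivalent plain list.
obj = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])], dtype=object)
lst = [np.array(['a', 'b', 'l']), np.array(['c', 'd'])]

# Replacing an element just swaps the stored Python object.
obj[0] = 'foobar'
lst[0] = 'foobar'

print(obj.shape, len(lst))    # (2,) 2
print(obj[0] == lst[0])       # True: both now hold the plain string 'foobar'
print(type(obj[1]).__name__)  # ndarray: the second element is untouched
```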

Creating an object dtype array from arrays that all have the same shape can be tricky, because np.array tries to make a multidimensional array where possible (as in your original example).

In [133]: a = np.empty(2,object)     # 'blank' array with desired shape
In [134]: a
Out[134]: array([None, None], dtype=object)
In [135]: a[:] = [['a','b'],['c','d']]    # assign 2 lists to it
In [136]: a
Out[136]: array([list(['a', 'b']), list(['c', 'd'])], dtype=object)
In [137]: a[1] = np.array(['a','b'])     # assign an array to an element
In [138]: a
Out[138]: array([list(['a', 'b']), array(['a', 'b'], dtype='<U1')], dtype=object)

The display tells you the type of each element: list([...]) marks a stored list, array(...) marks a stored array.
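If you need this often, the np.empty step can be wrapped in a small helper. The name to_object_array is hypothetical, not a NumPy function; this sketch fills each slot one at a time, so np.array never gets the chance to merge equal-length rows into a 2d array:

```python
import numpy as np

def to_object_array(items):
    """Hypothetical helper: a 1d object array with one slot per item."""
    out = np.empty(len(items), dtype=object)  # 'blank' array of None
    for i, item in enumerate(items):
        out[i] = np.asarray(item)             # store each item as its own array
    return out

same = to_object_array([['a', 'b'], ['c', 'd']])         # equal lengths
ragged = to_object_array([['a', 'b', 'l'], ['c', 'd']])  # different lengths

print(same.shape, same[0].shape)      # (2,) (2,)
print(ragged.shape, ragged[0].shape)  # (2,) (3,)
```

The same shape (2,) comes out whether or not the items are ragged, which is the point of avoiding np.array here.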

2d array

Your original example is a 2d array. The fact that it has a string dtype (or object) doesn't make much difference; it could just as well be a numeric array. You can't change its shape by assignment.

In [122]: b = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')
In [123]: b
Out[123]: 
array([['a', 'b'],
       ['c', 'd']], dtype='<U10')
In [124]: b.shape
Out[124]: (2, 2)

The regular multidimensional array indexing rules apply, including broadcasting.

In [125]: b[0]
Out[125]: array(['a', 'b'], dtype='<U10')
In [126]: _.shape
Out[126]: (2,)
In [127]: b[0] = 'd'         # broadcast to the whole row
In [128]: b
Out[128]: 
array([['d', 'd'],
       ['c', 'd']], dtype='<U10')

In [129]: b[0] = ['d','e']    # assign separate elements to the row
In [130]: b
Out[130]: 
array([['d', 'e'],
       ['c', 'd']], dtype='<U10')

In [131]: b[:,1] = ['x','y']   # assign to a column
In [132]: b
Out[132]: 
array([['d', 'x'],
       ['c', 'y']], dtype='<U10')
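Broadcasting also explains both halves of the original question: a one-element list is stretched to fill the row, and a value whose length doesn't match the row is rejected rather than resizing it. A sketch on a fresh copy of b (the try block just captures the error NumPy raises for the mismatch):

```python
import numpy as np

b = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')

b[0] = ['e']            # shape (1,) broadcasts to the row's shape (2,)
print(b[0].tolist())    # ['e', 'e']

mismatch_error = None
try:
    b[0] = ['x', 'y', 'z']  # shape (3,) cannot broadcast to (2,)
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)

print(b.shape)          # (2, 2): assignment never changes the shape
```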

Look at how these arrays are converted to a list:

In [139]: a.tolist()
Out[139]: [['a', 'b'], array(['a', 'b'], dtype='<U1')]
In [140]: b.tolist()
Out[140]: [['d', 'x'], ['c', 'y']]

numpy is optimized for numeric multidimensional arrays; all the fast compiled code works on numeric values. It can store strings and general Python objects, but processing them runs at Python speed, not compiled speed.
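One way to see the Python-speed point: with dtype=object, NumPy applies an operator by calling Python's own operator on each element, so the result depends on what the elements are. A small sketch with made-up values:

```python
import numpy as np

nums = np.array([1, 2, 3])
print(nums * 2)          # [2 4 6]: compiled integer multiply

objs = np.empty(2, dtype=object)
objs[0] = [1, 2]         # store Python lists as elements
objs[1] = [3]
doubled = objs * 2       # applies Python's * to each list element
print(doubled.tolist())  # [[1, 2, 1, 2], [3, 3]]: list repetition, not arithmetic
```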
