In your last example, you make a 2-element array. Each element can be anything: a string, a list, or an array.
In [113]: a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])
<ipython-input-113-3010d1b297e2>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])])
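Specifying dtype=object avoids the warning (NumPy 1.24 and later raise a ValueError instead of warning when it is omitted):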
In [114]: a = np.array([np.array(['a', 'b', 'l']), np.array(['c', 'd'])],object)
...:
In [115]: a.shape
Out[115]: (2,)
In [116]: a
Out[116]:
array([array(['a', 'b', 'l'], dtype='<U1'),
       array(['c', 'd'], dtype='<U1')], dtype=object)
In [117]: a[0]
Out[117]: array(['a', 'b', 'l'], dtype='<U1')
In [118]: a[0] = ['foobar']
In [119]: a
Out[119]: array([list(['foobar']), array(['c', 'd'], dtype='<U1')], dtype=object)
In [120]: a[0] = 'foobar'
In [121]: a
Out[121]: array(['foobar', array(['c', 'd'], dtype='<U1')], dtype=object)
This array behaves very much like a 2-element list. In fact, I'd question the value of using such an array instead of a list.
Creating an object dtype array from arrays that all have the same shape can be tricky, because np.array tries to make a multidimensional array where possible (as in your original example). A reliable approach is to create an empty object array of the desired shape and then fill it:
In [133]: a = np.empty(2,object) # 'blank' array with desired shape
In [134]: a
Out[134]: array([None, None], dtype=object)
In [135]: a[:] = [['a','b'],['c','d']] # assign 2 lists to it
In [136]: a
Out[136]: array([list(['a', 'b']), list(['c', 'd'])], dtype=object)
In [137]: a[1] = np.array(['a','b']) # assign an array to an element
In [138]: a
Out[138]: array([list(['a', 'b']), array(['a', 'b'], dtype='<U1')], dtype=object)
The display shows the type of each element: list(...) for a list, array(...) for an array.
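The create-then-fill pattern above can be wrapped in a small helper. This is only a sketch: make_object_array is a hypothetical name, not a numpy function, and it assigns element by element so the result does not depend on how slice assignment interprets nested sequences.

```python
import numpy as np

def make_object_array(items):
    """Hypothetical helper: return a 1-D object array with one element per item."""
    out = np.empty(len(items), dtype=object)  # 'blank' array, filled with None
    for i, item in enumerate(items):
        out[i] = item                         # store the item itself, no unpacking
    return out

rows = [np.array(['a', 'b']), np.array(['c', 'd'])]

obj = make_object_array(rows)   # 2 elements, each one an array
plain = np.array(rows)          # np.array merges them into one 2d string array

print(obj.shape, obj.dtype)
print(plain.shape, plain.dtype)
```

Here obj has shape (2,) with object dtype, while plain has shape (2, 2), which is the multidimensional result np.array prefers when the shapes match.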
2d array
The original example is a 2d array. Whether it has a string dtype or object dtype doesn't make much difference; it could just as well be a numeric array. You can't change the shape by assignment.
In [122]: b = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')
In [123]: b
Out[123]:
array([['a', 'b'],
       ['c', 'd']], dtype='<U10')
In [124]: b.shape
Out[124]: (2, 2)
The regular multidimensional array indexing rules apply, including broadcasting.
In [125]: b[0]
Out[125]: array(['a', 'b'], dtype='<U10')
In [126]: _.shape
Out[126]: (2,)
In [127]: b[0] = 'd' # broadcast to the whole row
In [128]: b
Out[128]:
array([['d', 'd'],
       ['c', 'd']], dtype='<U10')
In [129]: b[0] = ['d','e'] # assign separate elements to the row
In [130]: b
Out[130]:
array([['d', 'e'],
       ['c', 'd']], dtype='<U10')
In [131]: b[:,1] = ['x','y'] # assign to a column
In [132]: b
Out[132]:
array([['d', 'x'],
       ['c', 'y']], dtype='<U10')
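To make the "can't change the shape by assignment" point concrete, here is a minimal sketch of what happens when the assigned value does not fit the row. A length-1 list broadcasts across the row, but a length-3 list cannot broadcast into a row of length 2, so numpy raises instead of resizing.

```python
import numpy as np

b = np.array([['a', 'b'], ['c', 'd']], dtype='<U10')

b[1] = ['e']                  # shape (1,) broadcasts to the whole row

try:
    b[0] = ['x', 'y', 'z']    # shape (3,) cannot broadcast into shape (2,)
except ValueError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)   # the failed assignment raised
print(b.shape)                # the shape is unchanged
print(b.tolist())
```

The failed assignment leaves b untouched; only the row that received the broadcast value has changed.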
Look at how these arrays are converted to a list:
In [139]: a.tolist()
Out[139]: [['a', 'b'], array(['a', 'b'], dtype='<U1')]
In [140]: b.tolist()
Out[140]: [['d', 'x'], ['c', 'y']]
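The difference is that tolist walks the dimensions of the array itself but does not descend into the objects stored in an object dtype array. If fully nested lists are wanted, each element has to be converted separately. A minimal sketch, where the names shallow and nested are just for illustration:

```python
import numpy as np

a = np.empty(2, dtype=object)
a[0] = np.array(['a', 'b', 'l'])
a[1] = np.array(['c', 'd'])

shallow = a.tolist()                                 # inner arrays are left as arrays
nested = [np.asarray(item).tolist() for item in a]   # convert each element as well

print(type(shallow[0]).__name__)
print(nested)
```

Using np.asarray on each element means lists, arrays, and plain strings are all handled the same way.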
numpy is optimized for numeric multidimensional arrays. All the fast compiled code works on numeric values. It can store strings and general objects, but the processing then runs at Python speed, not compiled speed.
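One way to see this is that an operation on an object dtype array is applied by calling each element's own Python operator, so the result depends on what the elements are. The sketch below only illustrates that dispatch; it does not measure speed.

```python
import numpy as np

nums = np.array([1, 2, 3])                    # native integer dtype
objs = np.array([1, 2, 3], dtype=object)      # same values stored as Python ints
strs = np.array(['ab', 'cd'], dtype=object)   # Python str objects

doubled_nums = nums * 2   # handled by compiled integer code
doubled_objs = objs * 2   # int.__mul__ called per element
doubled_strs = strs * 2   # str.__mul__ called per element: string repetition

print(doubled_nums.dtype.kind)
print(doubled_objs.dtype)
print(doubled_strs.tolist())
```

The object results stay object dtype, and the string case repeats each string rather than doing any numeric work.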
... a with arr = np.array([[1], [2, 3]]) and you'll get array([list([1]), list([2, 3])], dtype=object), at which point you really need to be questioning why you're using numpy at all. It's now just a nested list with the overhead of numpy, yet none of the decent parts of numpy will work on an object dtype
... a[0,:] = ['e'], assigning 'e' to all elements of the 0 row.
... object as the dtype, you're almost certainly doing something wrong. This also applies to Pandas