I'm really giving up on this. I would like to pre-allocate a huge 2-D NumPy array with shape (10000000, 3), with one specific dtype per column.
Example:
a         b         c
uint32    float32   uint8
--------  --------  --------
90        2.43      4
100       2.42      2
123       2.33      1
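For reference, the example table can be written directly as a NumPy structured array with one record per row. This is a minimal sketch that assumes the column names a, b, c are used as field names and that row_dtype and table are illustrative names. Note that the shape is (3,), not (3, 3): each record carries all three columns.

```python
import numpy as np

# Compound dtype: one named field per column, each with its own type.
row_dtype = np.dtype([("a", np.uint32), ("b", np.float32), ("c", np.uint8)])

# The three example rows from the table, one tuple per record.
table = np.array([(90, 2.43, 4), (100, 2.42, 2), (123, 2.33, 1)], dtype=row_dtype)

print(table.shape)       # (3,)
print(table["a"])        # [ 90 100 123]
print(table["a"].dtype)  # uint32
```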
From the docs, I can create a 2-D array like this:
arr = np.zeros((4,3))
arr
Out[6]:
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
Good so far, but what about dtypes?
In [16]: arr.dtype
Out[16]: dtype('float64')
Everything is float64, so let's define a dtype:
dtype_L1 = np.dtype({'names': ['a', 'b', 'c'],
                     'formats': [np.uint32, np.float32, np.uint8]})
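Before comparing, it helps to see what dtype_L1 actually describes: a single record type whose three fields are packed back to back, 4 + 4 + 1 = 9 bytes per element (no padding, since align defaults to False). A quick inspection sketch:

```python
import numpy as np

# Same compound dtype as above, written in the dict form.
dtype_L1 = np.dtype({"names": ["a", "b", "c"],
                     "formats": [np.uint32, np.float32, np.uint8]})

print(dtype_L1.names)     # ('a', 'b', 'c')
print(dtype_L1.itemsize)  # 9 -> 4 (uint32) + 4 (float32) + 1 (uint8)
for name in dtype_L1.names:
    # fields maps each name to (field dtype, byte offset[, title])
    field_dtype, offset = dtype_L1.fields[name][:2]
    print(name, field_dtype, offset)  # a uint32 0 / b float32 4 / c uint8 8
```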
And compare both:
In [25]: arr_dtype = np.zeros((4,3), dtype=dtype_L1)
In [26]: arr = np.zeros((4,3))
In [27]: arr[0,0]
Out[27]: 0.0
In [28]: arr_dtype[0,0]
Out[28]: (0, 0., 0)
In [29]: type(arr_dtype[0,0])
Out[29]: numpy.void
In [30]: type(arr[0,0])
Out[30]: numpy.float64
In [31]: arr.shape
Out[31]: (4, 3)
In [32]: arr_dtype.shape
Out[32]: (4, 3)
So I don't see why arr_dtype is not the same as arr, just with a different dtype per column. Can somebody point me in the right direction, please? It looks like I am creating an array with too many dimensions:
**Update: one dimension too deep?**
>>> arr[0,0]
0.0  # correct
>>> arr_dtype[0,0]
(0, 0., 0)
So each element really holds a full record with all three dtypes? Looking one level deeper:
>>> type(arr_dtype[0,0][0])
<class 'numpy.uint32'>
>>> type(arr_dtype[0,0][1])
<class 'numpy.float32'>
>>> type(arr_dtype[0,0][2])
<class 'numpy.uint8'>
# all correct, but one level too deep.
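The "extra level" is the record itself, and its fields can also be reached by name instead of by position. A minimal sketch, assuming the same dtype_L1 and a (4, 3) array as in the session, that also shows why selecting one field still yields a (4, 3) array:

```python
import numpy as np

dtype_L1 = np.dtype({"names": ["a", "b", "c"],
                     "formats": [np.uint32, np.float32, np.uint8]})
arr_dtype = np.zeros((4, 3), dtype=dtype_L1)

# Positional access into one record, as in the session above ...
print(type(arr_dtype[0, 0][1]))    # <class 'numpy.float32'>
# ... reaches the same field that can be selected by name.
print(type(arr_dtype[0, 0]["b"]))  # <class 'numpy.float32'>

# Selecting a field across the whole array keeps the (4, 3) shape:
# each of the 12 records contributes one value of that field.
print(arr_dtype["b"].shape, arr_dtype["b"].dtype)  # (4, 3) float32
```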
- Expected: NumPy builds a 4x3 matrix where each element is a number, so 12 numbers in total.
- Observed: NumPy builds a 4x3 matrix where each element is a structure with 3 fields, so I have 4x3x3 fields = 36 numbers.
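The 36 can be confirmed from the array metadata: the shape passed to np.zeros counts records, not scalars. A minimal sketch comparing the (4, 3) array from the session with a (4,) array of the same 9-byte records (observed and expected are illustrative names):

```python
import numpy as np

dtype_L1 = np.dtype({"names": ["a", "b", "c"],
                     "formats": [np.uint32, np.float32, np.uint8]})

observed = np.zeros((4, 3), dtype=dtype_L1)  # 12 records
expected = np.zeros(4, dtype=dtype_L1)       # 4 records = 4 table rows

n_fields = len(dtype_L1.names)
print(observed.size * n_fields, observed.nbytes)  # 36 108
print(expected.size * n_fields, expected.nbytes)  # 12 36
```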
So is it possible to apply the dtype in another way?
**Final solution**
You basically need to decide what is more important: saving space or having all data in one array. A plain 2-D array can only hold one dtype. So if you need different data types, use multiple arrays with the same length along the first axis, and make sure each array gets the correct dtype. Otherwise, simply create a single array with one dtype, e.g. arr_dtype = np.zeros((4,3), dtype=np.float32). Thanks for the comments!
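The trade-off above can be sketched for the 10,000,000-row case from the question. This is a minimal sketch: col_a, col_b, col_c and the other variable names are illustrative, only the three column arrays are actually allocated, and the per-row cost of each layout is read from itemsize.

```python
import numpy as np

n_rows = 10_000_000
dt = np.dtype([("a", np.uint32), ("b", np.float32), ("c", np.uint8)])

# Option 1: three separate, equally long 1-D arrays, one dtype each.
col_a = np.zeros(n_rows, dtype=np.uint32)
col_b = np.zeros(n_rows, dtype=np.float32)
col_c = np.zeros(n_rows, dtype=np.uint8)

# Bytes per row for each layout (computed, not allocated).
separate = col_a.itemsize + col_b.itemsize + col_c.itemsize  # 4 + 4 + 1
structured = dt.itemsize                                     # packed record
plain_f32 = 3 * np.dtype(np.float32).itemsize                # np.zeros((N, 3), dtype=np.float32)
plain_f64 = 3 * np.dtype(np.float64).itemsize                # np.zeros((N, 3)), default float64

for label, per_row in [("separate", separate), ("structured", structured),
                       ("float32 2-D", plain_f32), ("float64 2-D", plain_f64)]:
    print(f"{label:12s} {per_row * n_rows / 1e6:6.0f} MB")
```

With these dtypes the loop reports 90 MB for the separate arrays and for a packed structured array, 120 MB for a float32 2-D array and 240 MB for the default float64 2-D array.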
Comments:

arr_dtype and arr have different shape and dtype. The fields of one aren't the same as the columns of the other. Only the compound dtype allows a mix of dtypes.

... uint32, first column float32 and second one uint8. I think it will be clearer if I could see how to do that.

... structured array with three different column-wise dtypes? I'm still trying to figure out why the code is wrong (based on your comment). From one of many examples, it looks like the dtype property as applied here is correct? Happy for advice.

dt = np.dtype(...); arr = np.zeros((2000,), dtype=dt) makes the structured array. arr = np.zeros((2000,3), dtype=float) makes the 2-D float array. A structured array makes most sense when one or more of the columns are of string dtype, and/or a mix of float and int. It's really just an alternative to creating 3 separate arrays, each with its own dtype. You can't do math across the fields, so there's little computational advantage to using the compound dtype.
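The structured-array route from the last comment can be sketched end to end. This is a minimal sketch assuming NumPy 1.16 or later (for numpy.lib.recfunctions.structured_to_unstructured); the 2000-row size follows the comment and the fill values are the example table. "Math across the fields" is read here as row-wise operations such as sum(axis=1); element-wise arithmetic between two named fields still works.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([("a", np.uint32), ("b", np.float32), ("c", np.uint8)])

# Pre-allocate: the shape is (n_rows,), not (n_rows, 3).
arr = np.zeros(2000, dtype=dt)

# Fill columns by field name; each field is a 1-D view with its own dtype.
arr["a"][:3] = [90, 100, 123]
arr["b"][:3] = [2.43, 2.42, 2.33]
arr["c"][:3] = [4, 2, 1]

# Element-wise math between two fields works (the result is upcast) ...
scaled = arr["a"][:3] * arr["b"][:3]

# ... but row-wise reductions need a plain 2-D array first.
plain = rfn.structured_to_unstructured(arr[:3], dtype=np.float64)
print(plain.shape)        # (3, 3)
print(plain.sum(axis=1))  # one sum per row
```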