Python (NumPy): making an array from a list of tuples gives 1D if dtype is supplied and 2D if not

Question

I am trying to build an array from a list of tuples. It should be a 2D array with 28193 rows and 28 columns. The last 3 columns are floats; the others are ints.

This is my code:

import numpy as np

# cur is an already-open SQLite cursor
results = cur.execute('SELECT * from matches').fetchall()

# 25 int64 fields followed by 3 float64 fields
array_type = np.dtype('int64, int64, int64, int64, int64, int64, int64, int64, int64, int64,'
                      ' int64, int64, int64, int64, int64, int64, int64, int64, int64, int64,'
                      ' int64, int64, int64, int64, int64, float64, float64, float64'
                      )

arr = np.array(results, dtype=array_type)

What I get is an array of shape (28193,).

The strange part is that if I remove the dtype parameter in the array definition it gets created properly. I've compared and counted the columns multiple times...

Here is a sample row:

1   735083  1   1   1   24  0   4   2   0   1   2   6   22  15  0   9   10  8   5   8   1   1   1   0   3   3.4 2.5

And datatypes in the DB are the same: int*25, float*3

Thanks!

It's SQLite. The float columns are now declared as float in the DB; they used to be Real, but I changed them for the sake of the test. I also tried something simpler and got the same result: alist = [(1,2.3),(2,3.2)]; arr = np.array(alist, dtype='int,float'); arr.shape returns (2,)
– Dimitur Epitropov, Commented Apr 13, 2017 at 20:30
You wrote "if I remove the dtype parameter in the array definition it gets created properly." So what's the downside of doing that? Are you concerned the dtypes may not be reliably correct?
– Max Power, Commented Apr 13, 2017 at 20:36
I would like to be more efficient and not unnecessarily store 25 float columns instead of 25 int columns. I would also like to understand why it behaves this way.
– Dimitur Epitropov, Commented Apr 13, 2017 at 20:40
The NumPy docs seem to suggest that without a dtype, np.array() will handle it pretty well: "If [dtype] not given, then the type will be determined as the minimum type required to hold the objects in the sequence." docs.scipy.org/doc/numpy/reference/generated/numpy.array.html
– Max Power, Commented Apr 13, 2017 at 20:41

JohanL · Accepted Answer · 2017-04-13 20:47:27Z

When you add the dtype=array_type parameter, you create a structured array with the implicit field names f0..f27. That structured array is 1D, and each of its elements is a record holding 28 separate fields.

When you omit the dtype parameter, NumPy instead infers a single common dtype for all values. Since the data mixes ints and floats, every value is promoted to float. Thus in this case you get a 28193x28 array of float64.
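The difference can be reproduced with the two-row example from the question's comments. This is only a minimal sketch; the names rows, structured, and plain are illustrative and not from the original code.

```python
import numpy as np

# Small stand-in for the rows fetched from the database (illustrative data)
rows = [(1, 2.3), (2, 3.2)]

# With a compound dtype: a 1D structured array, one record per tuple
structured = np.array(rows, dtype='int64, float64')

# Without a dtype: a plain 2D array with everything promoted to float64
plain = np.array(rows)

print(structured.shape)        # (2,)
print(structured.dtype.names)  # ('f0', 'f1')
print(plain.shape)             # (2, 2)
print(plain.dtype)             # float64
```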

Now, it is up to you to decide whether you need to keep the type information or whether it is OK to promote everything to floats. If you need to keep the types, you will have to index the array with arr[n][m].
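As a rough illustration of indexing a structured array, again using a hypothetical two-field stand-in for the 28-field case: positional indexing as arr[n][m] works, and a whole "column" can also be pulled out by its implicit field name.

```python
import numpy as np

# Illustrative two-field array standing in for the 28-field one
rows = [(1, 2.3), (2, 3.2)]
arr = np.array(rows, dtype='int64, float64')

# Row n, field m: index the record first, then the field position
first_int = arr[0][0]      # value of field f0 in row 0
second_float = arr[1][1]   # value of field f1 in row 1

# All values of one field, returned as an ordinary 1D array
col = arr['f1']

print(first_int, second_float)
print(col.dtype, col.shape)
```

Note that arr[n, m] does not work here, because the structured array has only one dimension.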

3 Comments

Dimitur Epitropov Over a year ago

Thanks! If by "keep the type information" you mean creating the array with the dtype, then type(arr[0]) returns numpy.void. Is that bad? In that case the array is full of numpy.void objects rather than floats; is that bad memory-wise?

JohanL Over a year ago

I don't know, but I would imagine that using structured arrays is using more memory than a standard float64 array. Why are you concerned with memory usage in the first place? Do you have a reason for that?

hpaulj Over a year ago

One record of that 28-field dtype is a numpy.void object.
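On the memory question raised in these comments, a quick check suggests the structured layout costs nothing extra here, since int64 and float64 are both 8 bytes wide. The sketch below builds the same 28-field layout explicitly, assuming NumPy's default packed (unaligned) dtype; the 4-row size and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# 25 int64 fields followed by 3 float64 fields, named f0..f27
dtype28 = np.dtype(
    [(f'f{i}', np.int64) for i in range(25)]
    + [(f'f{i}', np.float64) for i in range(25, 28)]
)

structured = np.zeros(4, dtype=dtype28)       # 4 records
plain = np.zeros((4, 28), dtype=np.float64)   # 4 x 28 floats

print(type(structured[0]))   # <class 'numpy.void'>
print(structured.itemsize)   # 224 bytes per record (28 fields * 8 bytes)
print(structured.nbytes)     # 896
print(plain.nbytes)          # 896
```

The numpy.void scalar is only created when a single record is pulled out; the array itself stores the raw 8-byte values back to back.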
