I am trying to build an array from a list of tuples. It should be a 2D array with 28193 rows and 28 columns. The last 3 columns are floats; the others are ints.

This is my code:

import numpy as np

# cur is an open SQLite cursor
results = cur.execute('SELECT * from matches').fetchall()
# 25 integer columns followed by 3 float columns
array_type = np.dtype('int64, int64, int64, int64, int64, int64, int64, int64, int64, int64,'
                      ' int64, int64, int64, int64, int64, int64, int64, int64, int64, int64,'
                      ' int64, int64, int64, int64, int64, float64, float64, float64')

arr = np.array(results, dtype=array_type)

What I get is an array of shape (28193,).

The strange part is that if I remove the dtype parameter in the array definition it gets created properly. I've compared and counted the columns multiple times...

Here is a sample row:

1   735083  1   1   1   24  0   4   2   0   1   2   6   22  15  0   9   10  8   5   8   1   1   1   0   3   3.4 2.5

The datatypes in the DB match: 25 int columns and 3 float columns.

Thanks!

  • What type of database is cur connected to? Commented Apr 13, 2017 at 20:25
  • SQLite. As for the float values: they are now FLOAT in the DB but used to be REAL; I changed them for the sake of the test. I've also tried something simpler and got the same result: alist = [(1,2.3),(2,3.2)], arr = np.array(alist, dtype='int,float'), arr.shape -> returns (2,) Commented Apr 13, 2017 at 20:30
  • You wrote "if I remove the dtype parameter in the array definition it gets created properly." So what's the downside of doing that? Are you concerned the dtypes may not be reliably correct? Commented Apr 13, 2017 at 20:36
  • I would like to be more efficient and not unnecessarily store 25 float columns instead of 25 int columns. And I would like to find out why it behaves this way. Commented Apr 13, 2017 at 20:40
  • The NumPy docs seem to suggest that without specifying the dtype, np.array() will handle it pretty well. "If [dtype] not given, then the type will be determined as the minimum type required to hold the objects in the sequence." docs.scipy.org/doc/numpy/reference/generated/numpy.array.html Commented Apr 13, 2017 at 20:41

1 Answer

When you add the dtype=array_type parameter, you are creating a structured array with the implicit field names f0..f27. A structured array built this way is a 1D array in which each element is a record containing 28 fields, which is why the shape is (28193,).
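A minimal sketch of that behaviour, using the two-column example from the question's comments as a stand-in for the real 28-column query result (the variable name rows is only illustrative):

```python
import numpy as np

# Small stand-in for cur.fetchall(): a list of (int, float) tuples.
rows = [(1, 2.3), (2, 3.2)]

# A comma-separated dtype string creates a structured dtype
# with automatically named fields f0, f1, ...
arr = np.array(rows, dtype='int64, float64')

print(arr.shape)        # (2,)  one record per tuple, not (2, 2)
print(arr.dtype.names)  # ('f0', 'f1')
print(arr['f0'])        # [1 2]      the integer column
print(arr['f1'])        # [2.3 3.2]  the float column
```

Each tuple becomes one record, so the array has one dimension whose length is the number of rows.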

When you omit the dtype parameter, NumPy instead infers a single common dtype for all values. When ints and floats are mixed, every value is promoted to float, so in this case you get a 28193x28 array of float64.
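The same small stand-in data shows the promotion when no dtype is given (again, rows is just an illustrative name):

```python
import numpy as np

rows = [(1, 2.3), (2, 3.2)]

# No dtype: NumPy picks one common type for every element.
arr = np.array(rows)

print(arr.shape)  # (2, 2)
print(arr.dtype)  # float64
print(arr[0, 0])  # 1.0  the integer 1 was promoted to a float
```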

Now it is up to you to decide whether you need to keep the type information or whether it is OK to promote everything to floats. If you need to keep the types, you will have to index using arr[n][m], or select a whole column by its field name, for example arr['f25'].
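A short sketch of what that indexing looks like on a structured array, using a hypothetical three-column example. The last two lines show one possible way to get plain 2D integer data back if that is what is needed; this is an assumption about the use case, not something the question asks for.

```python
import numpy as np

# Hypothetical three-column stand-in: two int columns, one float column.
rows = [(1, 7, 2.5), (2, 9, 3.4)]
arr = np.array(rows, dtype='int64, int64, float64')

print(arr[1][0])     # 2    record 1, field 0 (still an int64)
print(arr[1][2])     # 3.4  record 1, field 2 (a float64)
print(arr['f2'][1])  # 3.4  same value, selected by field name first

# One option if a regular 2D array is wanted: stack the integer fields
# into an int64 matrix and keep the float field separately.
ints = np.column_stack([arr['f0'], arr['f1']])
floats = arr['f2']
print(ints.shape, ints.dtype)  # (2, 2) int64
```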

3 Comments

Thanks! If by "keep the type information" you mean creating the array with dtype, then type(arr[0]) returns numpy.void. Is that bad? In that case the array is full of numpy.void objects and not even floats; is this bad memory-wise?
I don't know, but I would imagine that structured arrays use more memory than a standard float64 array. Why are you concerned with memory usage in the first place? Do you have a reason for that?
One record of that 28-field dtype is a numpy.void object.
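The memory question raised in these comments can be checked directly. The sketch below builds the 25-int, 3-float dtype from the question and compares it with a plain float64 array of the same shape; n_rows is an arbitrary illustrative value, and the arrays are zero-filled rather than loaded from the database.

```python
import numpy as np

n_rows = 1000  # arbitrary illustrative row count

# Same layout as in the question: 25 int64 fields, then 3 float64 fields.
structured_dtype = np.dtype(', '.join(['int64'] * 25 + ['float64'] * 3))
structured = np.zeros(n_rows, dtype=structured_dtype)
plain = np.zeros((n_rows, 28), dtype=np.float64)

print(structured_dtype.itemsize)  # 224  bytes per record (28 fields * 8 bytes)
print(structured.nbytes)          # 224000
print(plain.nbytes)               # 224000
print(type(structured[0]))        # <class 'numpy.void'>
```

With this particular dtype every field is 8 bytes wide, so both layouts occupy the same amount of memory. The numpy.void type is simply what NumPy returns when a single record is pulled out of a structured array.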
