How do you create a multidimensional numpy array from an iterable of tuples?

Question

I would like to create a numpy array from an iterable that yields tuples of values, such as the result of a database query.

Like so:

data = db.execute('SELECT col1, col2, col3, col4 FROM data')
A = np.array(list(data))

Is there a faster way of doing so, without converting the iterable to a list first?

Not sure if this works: docs.scipy.org/doc/numpy/reference/generated/…
– Fabricator, Commented Jun 25, 2014 at 6:11
@Fabricator The documentation says that it creates a 1d array from the iterable. In this case it would create an array of objects instead of a 2d array with 4 columns.
– Bakuriu, Commented Jun 25, 2014 at 7:46
np.loadtxt is an example of creating an array from an iterable, namely a file. In simplified terms, it reads a line, makes a list from its substrings, and appends that to a list. At the end it converts the list of lists to an array.
– hpaulj, Commented Jun 28, 2014 at 16:35

newtover · Accepted Answer · 2014-06-27 13:45:29Z

I am not an experienced user of numpy, but here is a possible solution for the general question:

>>> i = iter([(1, 11), (2, 22)])
>>> i
<listiterator at 0x5b2de30>                    # a sample iterable of tuples
>>> rec_array = np.fromiter(i, dtype='i4,i4')  # mind the dtype
>>> rec_array                                  # rec_array is a structured array
array([(1, 11), (2, 22)], 
    dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> rec_array['f0'], rec_array[0]              # each field has a default name
(array([1, 2]), (1, 11))
>>> a = rec_array.view(np.int32).reshape(-1,2) # let's create a view
>>> a
array([[ 1, 11],
       [ 2, 22]])
>>> rec_array[0][1] = 23
>>> a                                          # a is a view, not a copy!
array([[ 1, 23],
       [ 2, 22]])

I assume that all columns are of the same type; otherwise, rec_array is already what you want.
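For the mixed-type case mentioned above, here is a minimal sketch of keeping the result as a structured array. The generator `rows` and the field names `id` and `score` are made up for illustration and stand in for whatever the real iterable yields.

```python
import numpy as np

def rows():
    # Hypothetical stand-in for a query result: (id, score) tuples
    yield (1, 0.5)
    yield (2, 1.5)
    yield (3, 2.5)

# One named field per column, each with its own type
mixed_dtype = np.dtype([("id", "i4"), ("score", "f8")])

# np.fromiter consumes the generator directly, without building a list first
records = np.fromiter(rows(), dtype=mixed_dtype)

# Each column is available as an ordinary 1-D array, accessed by field name
ids = records["id"]
scores = records["score"]
```

With differing column types there is no single element type to view the data as, so the `view(...).reshape(...)` step does not apply; indexing by field name is the usual way to get at a column.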

As for your particular case, I do not completely understand what db is in your example. If it is a cursor object, you can just call its fetchall method and get a list of tuples. In most cases, the database library does not want to keep a partially read query result around while your code processes each row; that is, by the time the execute method returns, all the data is already stored in a list, so there is hardly any downside to using fetchall instead of iterating over the cursor instance.
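A minimal sketch of the fetchall route, assuming a DB-API style driver. It uses the standard library's sqlite3 with an in-memory table purely as a stand-in, since the question does not say which database library db comes from; the table contents are invented for the example, and whether a given driver buffers results internally depends on that driver.

```python
import sqlite3
import numpy as np

# In-memory database as a stand-in for the real data source (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 REAL, col2 REAL, col3 REAL, col4 REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)],
)

cursor = conn.execute("SELECT col1, col2, col3, col4 FROM data")
rows = cursor.fetchall()              # a list of 4-tuples, one per row
A = np.array(rows, dtype=np.float64)  # 2-D array of shape (n_rows, 4)

conn.close()
```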

Maarten · Accepted Answer · 2014-06-26 17:49:06Z

Although technically not an answer to my question, I found a way to do what I am trying to do:

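import numpy as np

def get_cols(db, cols):
    def get_col(col):
        # One query per column; each row comes back as a 1-tuple
        data = db.execute('SELECT ' + col + ' FROM data')
        # dtype is a required argument of np.fromiter, not of db.execute
        return np.fromiter((v[0] for v in data), dtype=np.float64)

    # Stack the 1-D columns as rows, then transpose to (n_rows, n_cols)
    return np.vstack([get_col(col) for col in cols]).T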

edited Jun 26, 2014 at 17:49

answered Jun 26, 2014 at 16:02


Benny Jobigan · Accepted Answer · 2024-05-04 22:18:15Z

I know the question was asked 10 years ago, but I was trying to do something similar and thought I'd share a possible solution. Use chain (or chain.from_iterable) and reshape.

from itertools import chain
import numpy as np

NUM_COLS = 3  # or whatever for your data
with db.GetJunk() as cursor:
    # chain.from_iterable flattens the rows lazily; chain(*cursor) would unpack every row up front
    data = np.fromiter(chain.from_iterable(cursor), dtype=float)  # or other dtype for your data
data = data.reshape(-1, NUM_COLS)  # -1 lets numpy infer the number of rows
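The chain-and-reshape idea can be tried without a database. Below is a minimal sketch in which the helper `array_from_rows` and the sample generator are hypothetical names introduced only for this illustration, with the generator standing in for a cursor.

```python
from itertools import chain
import numpy as np

def array_from_rows(rows, num_cols, dtype=float):
    """Build a 2-D array from an iterable of equal-length tuples."""
    # Flatten the tuples lazily into one stream of scalars
    flat = np.fromiter(chain.from_iterable(rows), dtype=dtype)
    # reshape raises ValueError if the total is not a multiple of num_cols
    return flat.reshape(-1, num_cols)

# Hypothetical stand-in for a cursor: a generator of 3-tuples
rows = ((i, i * 10, i * 100) for i in range(4))
data = array_from_rows(rows, num_cols=3)
```

This only fits when every column can share one dtype, because the flattened stream is a single 1-D array before it is reshaped.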

answered May 4, 2024 at 22:18
