I would like to create a numpy array from an iterable, which yields tuples of values, such as a database query.

Like so:

data = db.execute('SELECT col1, col2, col3, col4 FROM data')
A = np.array(list(data))

Is there a faster way of doing so, without converting the iterable to a list first?

  • not sure if this works: docs.scipy.org/doc/numpy/reference/generated/… Commented Jun 25, 2014 at 6:11
  • @Fabricator The documentation says that it creates a 1d array from the iterable. In this case it would create an array of objects instead of a 2d array with 4 columns. Commented Jun 25, 2014 at 7:46
  • np.loadtxt is an example of creating an array from an iterable, namely a file. In simplified terms, it reads a line, makes a list from its substrings, and appends that to a list. At the end it converts the list of lists to an array. Commented Jun 28, 2014 at 16:35
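The loadtxt-style pattern described in the last comment can be sketched as follows. lines_to_array is a hypothetical, simplified helper written for illustration, not NumPy's actual loadtxt implementation:

```python
import io
import numpy as np

def lines_to_array(lines, delimiter=None, dtype=float):
    """Hypothetical loadtxt-like reader: a simplified sketch, not NumPy's code."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split the line into substrings and convert each one (dtype doubles as converter)
        rows.append([dtype(tok) for tok in line.split(delimiter)])
    # convert the accumulated list of lists to a 2-D array in one step at the end
    return np.array(rows, dtype=dtype)

arr = lines_to_array(io.StringIO("1 2 3\n4 5 6\n"))
```

This still builds an intermediate list of lists, which is exactly the cost the question hopes to avoid.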

3 Answers

I am not an experienced user of numpy, but here is a possible solution for the general question:

>>> import numpy as np
>>> i = iter([(1, 11), (2, 22)])                 # a sample iterable of tuples
>>> i
<listiterator at 0x5b2de30>
>>> rec_array = np.fromiter(i, dtype='i4,i4')  # mind the dtype
>>> rec_array                                  # a structured array (not an np.recarray)
array([(1, 11), (2, 22)], 
    dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> rec_array['f0'], rec_array[0]              # each field has a default name
(array([1, 2]), (1, 11))
>>> a = rec_array.view(np.int32).reshape(-1,2) # let's create a view
>>> a
array([[ 1, 11],
       [ 2, 22]])
>>> rec_array[0][1] = 23
>>> a                                          # a is a view, not a copy!
array([[ 1, 23],
       [ 2, 22]])

This assumes that all columns are of the same type; if they are not, the structured array rec_array is probably already what you want.
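The session above generalizes to any number of same-typed columns. A minimal sketch, assuming float64 columns; rows_to_2d is a hypothetical helper name, and the field names f0, f1, ... are chosen to match NumPy's defaults:

```python
import numpy as np

def rows_to_2d(rows, ncols, dtype=np.float64, count=-1):
    # One field per column, all of the same type, so each record is tightly packed
    rec_dtype = np.dtype([(f"f{i}", dtype) for i in range(ncols)])
    # fromiter consumes the iterable of tuples directly, without an intermediate list;
    # pass count if the number of rows is known so the output is allocated once
    rec = np.fromiter(rows, dtype=rec_dtype, count=count)
    # Reinterpret the packed records as plain values (a view, no copy) and reshape
    return rec.view(dtype).reshape(-1, ncols)

A = rows_to_2d(iter([(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)]), 4)
```

When the columns have different types, numpy.lib.recfunctions.structured_to_unstructured (NumPy 1.16 and later) may be worth a look for converting the structured array to a plain 2-D array, though that involves a cast and therefore a copy.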

Concerning your particular case, I do not completely understand what db is in your example. If it is a cursor object, you can simply call its fetchall method to get a list of tuples. In most cases the database library does not keep a partially read query result around while your code processes each row: by the time the execute method returns, all the data is already stored in memory, so there is hardly any cost to calling fetchall instead of iterating over the cursor.
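For example, with Python's built-in sqlite3 module (used here only because it is self-contained; the table and its contents are hypothetical), fetchall returns a list of tuples that np.array turns into a 2-D array directly. Whether rows are already buffered when execute returns depends on the driver, so treat the performance argument above as driver-specific.

```python
import sqlite3
import numpy as np

# Hypothetical in-memory table mirroring the question's columns col1..col4
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 REAL, col2 REAL, col3 REAL, col4 REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)",
                 [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)])

cursor = conn.execute("SELECT col1, col2, col3, col4 FROM data")
rows = cursor.fetchall()               # a list of tuples, one per row
A = np.array(rows, dtype=np.float64)   # 2-D array of shape (n_rows, 4)
conn.close()
```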


Although technically not an answer to my question, I found a way to do what I am trying to do:

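import numpy as np

def get_cols(db, cols):
    def get_col(col):
        # Column names cannot be bound as SQL parameters, so col must be trusted
        data = db.execute('SELECT ' + col + ' FROM data')
        # Each row is a 1-tuple; fromiter needs an explicit dtype
        return np.fromiter((v[0] for v in data), dtype=np.float64)

    # Stack the 1-D columns and transpose to get shape (n_rows, n_cols)
    return np.vstack([get_col(col) for col in cols]).T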
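This approach runs one query per column, so it relies on every query returning the rows in the same order, which SQL only guarantees with an ORDER BY clause. A minimal runnable sketch using Python's built-in sqlite3 module, assuming a hypothetical table data with columns col1..col4 and using SQLite's implicit rowid for ordering:

```python
import sqlite3
import numpy as np

def get_cols(db, cols, table="data"):
    def get_col(col):
        # ORDER BY rowid keeps rows aligned across the separate queries;
        # names are interpolated into the SQL, so they must come from a trusted source
        cur = db.execute(f"SELECT {col} FROM {table} ORDER BY rowid")
        return np.fromiter((v[0] for v in cur), dtype=np.float64)

    return np.vstack([get_col(col) for col in cols]).T

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 REAL, col2 REAL, col3 REAL, col4 REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)",
                 [(1, 2, 3, 4), (5, 6, 7, 8)])
A = get_cols(conn, ["col1", "col2", "col3", "col4"])
conn.close()
```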


I know the question was asked 10 years ago, but I was trying to do something similar and thought I'd share a possible solution. Use chain (or chain.from_iterable) and reshape.

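from itertools import chain
import numpy as np

NUM_COLS = 3  # or whatever for your data
with db.GetJunk() as cursor:  # however you obtain a cursor from your database
    # chain.from_iterable flattens the rows lazily; chain(*cursor) would first
    # unpack every row into arguments, materializing the whole result
    data = np.fromiter(chain.from_iterable(cursor), dtype=float)  # or other dtype for your data
# -1 lets NumPy infer the number of rows
data = data.reshape(-1, NUM_COLS)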
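A self-contained variant of the chain approach, assuming Python's built-in sqlite3 module and a hypothetical three-column table. It also passes count to np.fromiter, which lets NumPy allocate the output once when the number of rows is known; getting that number from a separate COUNT(*) query is an extra assumption that only holds if the table does not change between the two queries.

```python
import sqlite3
from itertools import chain
import numpy as np

NUM_COLS = 3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a REAL, b REAL, c REAL)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])

# Optional: knowing the row count lets fromiter preallocate the flat output
(n_rows,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()

cursor = conn.execute("SELECT a, b, c FROM data")
# Flatten the row tuples lazily into one stream of values
flat = np.fromiter(chain.from_iterable(cursor), dtype=np.float64,
                   count=n_rows * NUM_COLS)
data = flat.reshape(-1, NUM_COLS)
conn.close()
```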
