
This is similar to How to convert an array of strings to an array of floats in numpy.

I have a list of strings:

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]

Here are my failed attempts to make a numpy record array:

import numpy as np
dat_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('val1', 'd'),
    ('val2', 'd'),
]

# Attempt 1
np.array(dat, dat_dtype)
# looks like garbage

# Attempt 2
np.array([x.split() for x in dat], dtype=dat_dtype)
# looks like different garbage

# Attempt 3
string_ndarray = np.array([x.split() for x in dat], dtype='|S15')
# looks good so far
string_ndarray.astype(dat_dtype)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.040000e+005'
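The traceback is consistent with astype casting every individual string into the whole record, so a value such as '1.040000e+005' also gets pushed into the integer field I. That reading is inferred from the error message and is not stated in the thread. A minimal sketch of one workaround, assuming a 2-D string array shaped like the one in attempt 3 (built here with the default str dtype instead of '|S15'), converts one column at a time into its matching field. The names out and column are illustrative:

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# 2-D array of strings: one row per line, one column per field.
string_ndarray = np.array([x.split() for x in dat])

# Fill each field from its own column, so every string is converted
# only to the type of the field it belongs to.
out = np.empty(len(string_ndarray), dtype=dat_dtype)
for name, column in zip(out.dtype.names, string_ndarray.T):
    out[name] = column.astype(out.dtype[name])

print(out)
```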

I give up. Here's the only way I can get the expected output:

dat_ndarray = np.zeros(len(dat), dat_dtype)
for i, line in enumerate(dat):
    dat_ndarray[i] = tuple(line.split())

print(dat_ndarray)  # [(1, 2, 104000.0, 0.03) (2, 7, 0.0, 0.03) (3, 15, 0.0, 0.03)]

Is there a more direct method to get the expected record array?

  • Well, you should also indicate what output you are expecting. Commented Aug 26, 2015 at 3:51

2 Answers


Your input is lines of text, so you can use a text reader to convert it to an array (structured or plain). Here's one way to do that with numpy.genfromtxt:

np.genfromtxt(dat, dtype=dat_dtype)

For example,

In [204]: dat
Out[204]: 
['  1  2  1.040000e+005  0.030000\n',
 '  2  7  0.000000e+000  0.030000\n',
 '  3  15  0.000000e+000  0.030000\n']

In [205]: dat_dtype
Out[205]: [('I', 'i'), ('J', 'i'), ('val1', 'f'), ('val2', 'f')]

In [206]: np.genfromtxt(dat, dtype=dat_dtype)
Out[206]: 
array([(1, 2, 104000.0, 0.029999999329447746), (2, 7, 0.0, 0.029999999329447746), (3, 15, 0.0, 0.029999999329447746)], 
      dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f4'), ('val2', '<f4')])
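A self-contained version of the genfromtxt call, using the question's dat_dtype with 'd' (float64) fields. The session above used 'f' (float32), which most likely explains why 0.03 prints there as 0.029999999329447746. This sketch assumes a NumPy version whose genfromtxt accepts a list of strings, a feature the comments below say is now documented:

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
# 'd' = float64, as in the question (the session above used 'f' = float32).
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# genfromtxt takes the list of lines directly; with the default
# delimiter (None) each line is split on runs of whitespace.
arr = np.genfromtxt(dat, dtype=dat_dtype)
print(arr)
```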

2 Comments

This looks undocumented, as the first argument, fname, is neither a file nor a string.
The documentation has since been corrected to describe this feature.

With your dat and dat_dtype this works:

In [667]: np.array([tuple(x.strip().split()) for x in dat],dtype=dat_dtype)
Out[667]: 
array([(1, 2, 104000.0, 0.03), (2, 7, 0.0, 0.03), (3, 15, 0.0, 0.03)], 
  dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f8'), ('val2', '<f8')])

Structured arrays are best created from lists of tuples. I stripped off the \n, split each line on whitespace, and then formed tuples:

In [668]: [tuple(x.strip().split()) for x in dat]
Out[668]: 
[('1', '2', '1.040000e+005', '0.030000'),
 ('2', '7', '0.000000e+000', '0.030000'),
 ('3', '15', '0.000000e+000', '0.030000')]

I let the dat_dtype take care of the string to number conversion.
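A runnable sketch of this approach with the question's original dat_dtype. It also checks the point raised in the comments below that .strip() is unnecessary, and shows one way (an addition, not part of this answer) to get attribute access by viewing the result as an np.recarray, since the question asked for a record array:

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# One tuple per line: numpy maps each tuple onto the fields of one record
# and converts each string to that field's type.
records = [tuple(x.split()) for x in dat]
arr = np.array(records, dtype=dat_dtype)

# str.split() with no arguments already drops leading/trailing whitespace,
# including the trailing '\n', so .strip() makes no difference here.
assert records == [tuple(x.strip().split()) for x in dat]

# View the same data as a recarray (no copy) for attribute access.
rec = arr.view(np.recarray)
print(rec.val1)
```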

2 Comments

This is the same as my attempt 2, but with tuple(x.split()). Whitespace doesn't seem to matter.
Yes, the .strip() wasn't needed. It's just a habit from reading text lines: remove the \n before splitting into words. Without the tuple, the best you get is an array of strings.
