How to convert an array of strings to a record array?

Question

This is similar to "How to convert an array of strings to an array of floats in numpy".

I have a list of strings:

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]

Here are my failed attempts to make a numpy record array:

import numpy as np
dat_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('val1', 'd'),
    ('val2', 'd'),
]

# Attempt 1
np.array(dat, dat_dtype)
# looks like garbage

# Attempt 2
np.array([x.split() for x in dat], dtype=dat_dtype)
# looks like different garbage

# Attempt 3
string_ndarray = np.array([x.split() for x in dat], dtype='|S15')
# looks good so far
string_ndarray.astype(dat_dtype)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1.040000e+005'

I give up. Here's the only way I can get the expected output:

dat_ndarray = np.zeros(len(dat), dat_dtype)
for i, line in enumerate(dat):
    dat_ndarray[i] = tuple(line.split())

print(dat_ndarray)  # [(1, 2, 104000.0, 0.03) (2, 7, 0.0, 0.03) (3, 15, 0.0, 0.03)]

Is there a more direct method to get the expected record array?

You should also indicate what output you are expecting.
– Anand S Kumar, Commented Aug 26, 2015 at 3:51

Warren Weckesser · Accepted Answer · 2015-08-26 04:14:30Z

Your input is lines of text, so you can use a text reader to convert it to an array (structured or plain). Here's one way to do that with numpy.genfromtxt:

np.genfromtxt(dat, dtype=dat_dtype)

For example,

In [204]: dat
Out[204]: 
['  1  2  1.040000e+005  0.030000\n',
 '  2  7  0.000000e+000  0.030000\n',
 '  3  15  0.000000e+000  0.030000\n']

In [205]: dat_dtype
Out[205]: [('I', 'i'), ('J', 'i'), ('val1', 'f'), ('val2', 'f')]

In [206]: np.genfromtxt(dat, dtype=dat_dtype)
Out[206]: 
array([(1, 2, 104000.0, 0.029999999329447746), (2, 7, 0.0, 0.029999999329447746), (3, 15, 0.0, 0.029999999329447746)], 
      dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f4'), ('val2', '<f4')])
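
The session above uses 'f' (32-bit float) for the value fields, which is why 0.03 prints as 0.029999999329447746. As a minimal sketch, assuming the dtype from the question ('d', a 64-bit float) is what is wanted, the same call looks like this:

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]

# dtype from the question: 'i' is a C int, 'd' is a 64-bit float
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# genfromtxt accepts a list of strings and treats each item as one line of text
rec = np.genfromtxt(dat, dtype=dat_dtype)

print(rec.shape)    # one record per input line
print(rec['J'])     # fields are accessed by name
print(rec['val2'])
```

Each field comes back as an ordinary 1-D array, so rec['val1'] can be used directly in numeric code.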

edited Aug 26, 2015 at 4:14

answered Aug 26, 2015 at 3:58

Warren Weckesser

2 Comments

Mike T Over a year ago

This looks undocumented, as the first argument fname is neither a file nor a string.

Mike T Over a year ago

The documentation is now corrected to describe this feature.

hpaulj · Accepted Answer · 2015-08-26 04:10:48Z

With your dat and dat_dtype this works:

In [667]: np.array([tuple(x.strip().split()) for x in dat],dtype=dat_dtype)
Out[667]: 
array([(1, 2, 104000.0, 0.03), (2, 7, 0.0, 0.03), (3, 15, 0.0, 0.03)], 
  dtype=[('I', '<i4'), ('J', '<i4'), ('val1', '<f8'), ('val2', '<f8')])

Structured arrays are best created from lists of tuples. I stripped off the \n, split each line on white space, and then formed tuples:

In [668]: [tuple(x.strip().split()) for x in dat]
Out[668]: 
[('1', '2', '1.040000e+005', '0.030000'),
 ('2', '7', '0.000000e+000', '0.030000'),
 ('3', '15', '0.000000e+000', '0.030000')]

I let the dat_dtype take care of the string to number conversion.
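
If relying on the dtype to parse strings feels too implicit, the conversion can also be done by hand before building the array. This is only a sketch of that variant, not what the answer above does; parse_line is a hypothetical helper name introduced for illustration.

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]


def parse_line(line):
    """Hypothetical helper: split one line and convert each token explicitly."""
    i, j, val1, val2 = line.split()
    return int(i), int(j), float(val1), float(val2)


# A list of tuples, one tuple per record, already holding numbers
rows = [parse_line(line) for line in dat]
rec = np.array(rows, dtype=dat_dtype)

print(rows[0])
print(rec['val1'])
```

The key point is the same as above: each record must be a tuple, not a list, for NumPy to map it onto the fields of the structured dtype.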

answered Aug 26, 2015 at 4:10

hpaulj

2 Comments

Mike T Over a year ago

This is the same as my attempt 2, but with tuple(x.split()). It doesn't seem that white space matters.

hpaulj Over a year ago

Yes, the .strip() wasn't needed. It's just a habit from reading text lines - remove the \n before splitting into words. Without the tuple the best you get is an array of strings.
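
If you do end up with a plain 2-D array of strings, as in attempt 3 of the question, one possible way forward is to convert it column by column into a preallocated structured array. The following is a rough sketch under that assumption; the names strings and rec are chosen here for illustration.

```python
import numpy as np

dat = [
    '  1  2  1.040000e+005  0.030000\n',
    '  2  7  0.000000e+000  0.030000\n',
    '  3  15  0.000000e+000  0.030000\n',
]
dat_dtype = [('I', 'i'), ('J', 'i'), ('val1', 'd'), ('val2', 'd')]

# Without tuples this is just a 2-D array of strings, shape (3, 4)
strings = np.array([line.split() for line in dat])

# Preallocate the structured array, then fill one field per string column,
# casting each column to that field's own dtype
rec = np.zeros(len(dat), dtype=dat_dtype)
for col, name in enumerate(rec.dtype.names):
    rec[name] = strings[:, col].astype(rec.dtype[name])

print(rec['I'])
print(rec['val1'])
```

This avoids the ValueError from attempt 3 because each column is cast to a single target type instead of casting the whole string array to the structured dtype at once.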
