I have some data in a CSV file formatted like this (I deleted some columns for simplicity):

Year,Region,Round,Diff
2014,South,Second Round,-24
2015,West,First Round,48
# ...lots of rows of this

I want to use both the string data in the Region and Round columns and the integer data in the Diff column.

Here is my relevant code:

import sklearn
import numpy as np
from numpy import genfromtxt
from StringIO import StringIO

# Some other code...

my_dtype = [('Year', int), ('Region', str), ('Round', str), ('Diff', int)]
data = np.genfromtxt(my_file, delimiter=',', names=True, dtype=my_dtype)
print(data)

When I print my data, I get the following. NumPy is making every string an empty string.

[ ( 2014, '', '', -24)
( 2010, '', '', 48)
...]

Does anyone know how I could fix this? Am I using the dtype attribute wrong? Or something else? Thanks in advance.

1 Answer

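A bare str in a dtype carries no length, so NumPy treats those fields as zero-length strings and every value is truncated to an empty string. Instead of putting str for the data type of the text fields, use the S format with a maximum string length: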

In [10]: my_dtype = [('Year', int), ('Region', 'S8'), ('Round', 'S16'), ('Diff', int)] 

In [11]: data = np.genfromtxt('regions.csv', delimiter=',', names=True, dtype=my_dtype)

In [12]: data
Out[12]: 
array([(2014, b'South', b'Second Round', -24),
       (2015, b'West', b'First Round',  48)], 
      dtype=[('Year', '<i8'), ('Region', 'S8'), ('Round', 'S16'), ('Diff', '<i8')])
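
To see why str ends up as an empty string, it can help to compare item sizes directly. This is only a minimal sketch, assuming a recent NumPy on Python 3; the variable names are illustrative and not part of the question's code:

```python
import numpy as np

# A bare str carries no length, so NumPy maps it to a zero-width string type.
bare = np.dtype(str)
print(bare.itemsize)

# 'S8' reserves 8 bytes per value; 'U8' reserves 8 Unicode characters.
sized_bytes = np.dtype('S8')
sized_text = np.dtype('U8')
print(sized_bytes.itemsize, sized_text.itemsize)

# Inside a structured dtype, each field keeps its own width.
record = np.dtype([('Region', 'S8'), ('Round', 'S16')])
print(record['Region'].itemsize, record['Round'].itemsize)
```

Values longer than the declared width are cut off, so pick a length that covers the longest entry in each column.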

You can also use dtype=None and let genfromtxt() determine the data type for you:

In [13]: data = np.genfromtxt('regions.csv', delimiter=',', names=True, dtype=None)

In [14]: data
Out[14]: 
array([(2014, b'South', b'Second Round', -24),
       (2015, b'West', b'First Round',  48)], 
      dtype=[('Year', '<i8'), ('Region', 'S5'), ('Round', 'S12'), ('Diff', '<i8')])
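
In the output above the text fields come back as bytes (the b'...' prefix). If you would rather work with ordinary str values, one option is the encoding argument, which genfromtxt() accepts from NumPy 1.14 onward. The sketch below assumes the two sample rows from the question, held in an in-memory buffer as a stand-in for regions.csv:

```python
import io
import numpy as np

# Stand-in for regions.csv, using the sample rows from the question.
csv_text = """Year,Region,Round,Diff
2014,South,Second Round,-24
2015,West,First Round,48
"""

# dtype=None lets genfromtxt infer each column's type; passing an encoding
# asks for Unicode (str) fields instead of bytes.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True,
                     dtype=None, encoding='utf-8')

# Columns of a structured array are accessed by field name.
regions = data['Region']
rounds = data['Round']
diffs = data['Diff']

print(regions.dtype.kind)  # 'U' means Unicode text
print(rounds.tolist())
print(diffs.sum())
```

The same encoding argument can be combined with an explicit dtype that uses 'U8' and 'U16' in place of 'S8' and 'S16' if you prefer fixed widths.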