I have some data in a CSV file formatted like this (I removed some columns for simplicity):
Year,Region,Round,Diff
2014,South,Second Round,-24
2015,West,First Round,48
# ...lots of rows of this
I want to use both the string data in the Region and Round columns and the integer data in the Diff column.
Here is my relevant code:
import sklearn
import numpy as np
from numpy import genfromtxt
from StringIO import StringIO
# Some other code...
my_dtype = [('Year', int), ('Region', str), ('Round', str), ('Diff', int)]
data = np.genfromtxt(my_file, delimiter=',', names=True, dtype=my_dtype)
print data
When I print the data, I get the following. NumPy turns every string into an empty string:
[ ( 2014, '', '', -24)
( 2010, '', '', 48)
...]
Does anyone know how I could fix this? Am I using the dtype argument incorrectly, or is it something else? Thanks in advance.
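For what it's worth, here is a minimal sketch of one possible cause, assuming Python 3 and NumPy 1.14 or later (needed for the `encoding` argument). A bare `str` in a dtype describes a zero-width string field, so there may simply be no room to store any characters. Giving the string fields an explicit width, here an arbitrary `U20`, appears to keep the text. The inline `csv_text` is a stand-in for my real file and only holds the two sample rows.

```python
import io

import numpy as np

# Stand-in for the real file: just the two sample rows shown above.
csv_text = (
    "Year,Region,Round,Diff\n"
    "2014,South,Second Round,-24\n"
    "2015,West,First Round,48\n"
)

# A bare `str` dtype has no width, which is likely why the fields came back empty.
zero_width = np.dtype(str).itemsize

# Same field names as before, but the string fields get an explicit width.
# 'U20' (up to 20 characters) is an assumed limit, not something from my data.
sized_dtype = [("Year", int), ("Region", "U20"), ("Round", "U20"), ("Diff", int)]

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,
    dtype=sized_dtype,
    encoding="utf-8",
)

print(data["Region"])
print(data["Round"])
print(data["Diff"])
```

Passing `dtype=None` instead should let `genfromtxt` guess each column's type, at the cost of less control over the field widths.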