
I'm trying to read a CSV file with the following line:

raw_data = genfromtxt(datafile,delimiter='\t',dtype=None)

This function reads the file into a record array when it encounters string data in the file. As far as I understand, when dtype is None, the file should be read into a record array as well. Is that correct?

However, if there is no string data and only numeric data is present, the function reads the data into a plain ndarray.

If not, is there a convenient way to force this function to read the file as a record array?

The problem with an ndarray is that all my code is built to process record arrays.

UPD1: In case someone else needs to do this, here is a brief solution. It may not be the best one, but at least it works:

Read the file from CSV as an ndarray:

raw_data = genfromtxt(datafile,delimiter='\t',dtype=None)

Generate default names and datatypes for columns:

names_=['f'+str(i) for i in range(raw_data.shape[1])]
names=[(name,raw_data.dtype) for name in names_]

And finally, create the record array:

raw_data_as_ra = raw_data.ravel().view(names)
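For anyone who wants to try the steps above end to end, here is a minimal sketch. It assumes the file held only numbers, so genfromtxt returned a 2-D ndarray; the sample values are made up for illustration, and the final .view(np.recarray) is my own addition so that fields can also be reached as attributes.

```python
import numpy as np

# Stand-in for what genfromtxt returns on an all-numeric file
# (illustrative values, not taken from a real data file).
raw_data = np.array([[1.1, 2.4, 3.2],
                     [4.1, 5.2, 6.3]])

# Default field names f0, f1, ... all sharing the array's own dtype.
field_names = ['f' + str(i) for i in range(raw_data.shape[1])]
record_dtype = np.dtype([(name, raw_data.dtype) for name in field_names])

# Reinterpret each row as one record. ravel() yields a contiguous 1-D
# array, which the view needs in order to regroup the bytes row by row.
structured = raw_data.ravel().view(record_dtype)

# Optional: expose fields as attributes (records.f0) as well as keys.
records = structured.view(np.recarray)

print(records.dtype.names)
print(records.f1)
```

Note that this only works when every column has the same dtype, which is exactly the case in which genfromtxt returns a plain ndarray.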
  • Just specify the desired dtype maybe? Commented Apr 14, 2014 at 9:08
  • I read different CSV files every time. They can have thousands of columns, and I don't know in advance what data a file will contain. Commented Apr 14, 2014 at 9:10
  • And what exactly is the problem with the ndarray? Is it that it converts ints to floats? or am I missing something bigger? Commented Apr 14, 2014 at 9:13
  • Sorry, I forgot to mention that all my further analysis of this file is built around record arrays in order to cover the general case, where not only numeric data is present. Commented Apr 14, 2014 at 9:18
  • Maybe it's worth showing what exactly doesn't work in your processing code. Commented Apr 14, 2014 at 9:20

1 Answer

You could use recfromcsv, which is derived from genfromtxt, instead:

If your file looks like:

col1,col2,col3
1.1, 2.4, 3.2
4.1, 5.2, 6.3

Then this:

a = np.recfromcsv('yourfile.csv')

gives:

rec.array([(1.1, 2.4, 3.2), (4.1, 5.2, 6.3)], 
      dtype=[('col1', '<f8'), ('col2', '<f8'), ('col3', '<f8')])

Note that recfromcsv uses the first row as column/record names.

Also, you can use the same input parameters as genfromtxt (e.g. the delimiter parameter). Your line of code might look like this if your file is tab delimited:

np.recfromcsv(datafile, delimiter='\t')
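If the file has no header row, one alternative is to stay with genfromtxt and pass explicit names: as far as I know, supplying names makes it build a structured dtype even when every column is numeric. The sketch below is only an illustration under that assumption. The in-memory sample text and the generated f0, f1, ... names are mine, and it calls genfromtxt directly because recent NumPy releases have, to my knowledge, deprecated recfromcsv.

```python
import io
import numpy as np

# Hypothetical tab-delimited, header-less, all-numeric content.
text = "1.1\t2.4\t3.2\n4.1\t5.2\t6.3\n"

# Without names, an all-numeric file comes back as a plain 2-D ndarray.
plain = np.genfromtxt(io.StringIO(text), delimiter='\t',
                      dtype=None, encoding='utf-8')

# Count the columns from the first line and generate default names.
n_cols = len(text.splitlines()[0].split('\t'))
field_names = ['f' + str(i) for i in range(n_cols)]

# With names given, the result is a 1-D structured array, one record per row.
structured = np.genfromtxt(io.StringIO(text), delimiter='\t', dtype=None,
                           names=field_names, encoding='utf-8')

# View it as a record array so fields are available as attributes.
records = structured.view(np.recarray)

print(plain.dtype.names)
print(records.dtype.names)
print(records.f2)
```

For a real file you would pass the file name instead of the StringIO object and read its first line to count the columns.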

1 Comment

Thanks! I almost forgot about this function. Hope this function is able to read csv without names. Thanks again!
