Reading data file with unequal number of columns using numpy

Question

I have a .dat file containing numbers. Its first row has five columns, and every subsequent row has four. I want to read this file with numpy, but at present I get the following error:

In [3]: F1 = np.loadtxt('file.dat')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-c0f31adaf29a> in <module>()
----> 1 F1 = np.loadtxt('file.dat')

/Users/usr/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding)
   1090         # converting the data
   1091         X = None
-> 1092         for x in read_data(_loadtxt_chunksize):
   1093             if X is None:
   1094                 X = np.array(x, dtype)

/Users/usr/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.pyc in read_data(chunk_size)
   1014                 line_num = i + skiprows + 1
   1015                 raise ValueError("Wrong number of columns at line %d"
-> 1016                                  % line_num)
   1017 
   1018             # Convert each value according to its column and store

ValueError: Wrong number of columns at line 2

How can I read all rows of the file except the first one in Python? I have attached an example file here.
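For the first part, loadtxt's skiprows parameter may already be enough: it skips the five-column first line, and loadtxt ignores empty lines. A minimal sketch, assuming a whitespace-separated file; the sample contents below are made up, since the attached file is not available here:

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: a 5-column first row, then 4-column rows
# (with one empty line), standing in for the real file.dat.
sample = """0.1 0.2 0.3 0.4 0.5
1 10 20 30
2 11 21 31
3 12 22 32

4 13 23 33
5 14 24 34
6 15 25 35
7 16 26 36
8 17 27 37
9 18 28 38
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.dat")
    with open(path, "w") as f:
        f.write(sample)

    # skiprows=1 drops the 5-column first line, so every remaining
    # non-empty line has 4 columns and loadtxt can build a 2-D array.
    data = np.loadtxt(path, skiprows=1)

print(data.shape)  # (9, 4)
```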

Additionally, the first column of this file (excluding the first row) has n^2 rows (in the example, n = 3 and the entries of the column are 1, 2, 3, 4, 5, 6, 7, 8, 9). I want to read the first column (excluding the first row) and save it as a text file of shape (n, n), i.e. with n rows and n columns, with the entries in the following order:

1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0
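The requested layout is NumPy's default row-major (C-order) reshape. A minimal sketch with the example values hard-coded (n = 3 comes from the question; the variable names are illustrative):

```python
import numpy as np

# First column (minus the first row) from the example: 1, 2, ..., 9
first_col = np.arange(1, 10, dtype=float)
n = 3

# reshape fills row by row by default (C order), giving the layout
# 1 2 3 / 4 5 6 / 7 8 9 asked for above.
by_rows = first_col.reshape((n, n))

# order='F' would fill column by column instead: 1 4 7 / 2 5 8 / 3 6 9.
by_cols = first_col.reshape((n, n), order="F")

print(by_rows)
```

So no transposing is needed as long as the column is read in file order.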

Any help would be appreciated.

genfromtxt is a little better at this, but it still needs the right number of delimiters, which is a big problem with the default whitespace delimiter. usecols can be used to limit the load to the minimum number of columns. You can also tell it to skip the bad rows. – hpaulj, Commented Oct 11, 2018 at 22:03
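A sketch of the genfromtxt route, assuming the same whitespace-separated layout (the inline sample text is made up): skip_header drops the five-column row and usecols keeps only the first column, which is all the second part of the question needs.

```python
import io

import numpy as np

# Hypothetical stand-in for file.dat: a 5-column first row, 4-column
# data rows, and an empty line (the real file apparently has some).
text = """10 20 30 40 50
1 0.1 0.2 0.3

2 0.4 0.5 0.6
3 0.7 0.8 0.9
"""

# skip_header=1 drops the 5-column row; usecols=0 keeps only the first
# column. genfromtxt skips empty lines on its own.
first_col = np.genfromtxt(io.StringIO(text), skip_header=1, usecols=0)
print(first_col)
```

As far as I can tell, relying on invalid_raise=False alone to "skip the bad rows" would backfire here: without skip_header, genfromtxt takes the column count (five) from the first line and would then discard the four-column rows instead.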
You could read the file line by line, split each line, and apply your own corrections where the lines are too short. – hpaulj, Commented Oct 11, 2018 at 22:04
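One way to follow this line-by-line suggestion while keeping the first row is to pad the shorter rows. A minimal sketch; padding with NaN up to the widest row is my assumption of what "corrections" could mean, and the sample text is made up:

```python
import io

import numpy as np

# Hypothetical stand-in for file.dat (5 columns in the first row, 4 after).
text = """10 20 30 40 50
1 0.1 0.2 0.3
2 0.4 0.5 0.6
"""

rows = []
for line in io.StringIO(text):
    values = line.split()
    if not values:  # skip empty lines
        continue
    rows.append([float(v) for v in values])

# "Correct" the short rows by padding them with NaN up to the widest row,
# so every row has the same length and the result is a regular 2-D array.
width = max(len(r) for r in rows)
padded = np.array([r + [np.nan] * (width - len(r)) for r in rows])

print(padded.shape)  # (3, 5)
```

Slicing padded[1:, :4] then recovers the four-column block, if the first row is not wanted after all.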
@hpaulj Can you please give an example? I am new to Python and could not implement your suggestion. I would really appreciate it. – Ji Won Song, Commented Oct 12, 2018 at 2:55

Zheng Liu · Accepted Answer · 2018-10-15 06:58:30Z

3

Some experiments to try (not optimized). First, read in the lines of the file:

Edit: file.dat contains empty lines; the if line.strip() != '' clause skips them.

with open('file.dat', 'r') as fhand:
    # strip the trailing newline (safe even if the last line has none) and skip empty lines
    file_lines = [line.rstrip('\n') for line in fhand if line.strip() != '']

Drop the first row, the one with five columns:

file_lines.pop(0)

The remaining lines now all have the same number of numeric columns, so you can split each line into entries and convert them to float:

mat_raw = [[float(term) for term in line.split()] for line in file_lines]
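Before converting, it can help to check that every parsed row has the same length. As far as I know, older NumPy versions turn a ragged nested list (for example one containing an empty row from a blank line) into a 1-D object array, so mat[:, 0] fails with "IndexError: too many indices for array", while NumPy 1.24 and later raise a ValueError instead. A sketch with a hypothetical helper, check_rectangular, and made-up rows:

```python
# Hypothetical parsed rows; in the answer these come from file_lines.
mat_raw = [[1.0, 0.1, 0.2, 0.3], [2.0, 0.4, 0.5, 0.6], [3.0, 0.7, 0.8, 0.9]]


def check_rectangular(rows):
    """Hypothetical helper: return the common row length, or raise a readable error."""
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError(f"expected one row length, got {sorted(lengths)}")
    return lengths.pop()


ncols = check_rectangular(mat_raw)  # 4 for this data
```

An empty row left over from a blank line shows up immediately as a length of 0 in the error message.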

This gives a nested list of floats. For convenient slicing, convert it into a NumPy array:

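import numpy

mat = numpy.array(mat_raw)
# then you can do whatever you like, e.g. take the first column
first_col = mat[:, 0]
# n is the side length: the first column has n^2 entries (n = 3 in the example)
n = int(round(len(first_col) ** 0.5))
# reshape it to an n-by-n matrix (filled row by row)
res = first_col.reshape((n, n))
...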
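To finish the second part of the question, the reshaped matrix can be written out with numpy.savetxt. A minimal sketch; the output file name is hypothetical and res is rebuilt from the example values so the block runs on its own:

```python
import os
import tempfile

import numpy

# Stand-in for `res` above: the 3-by-3 matrix built from the first column.
res = numpy.arange(1, 10, dtype=float).reshape((3, 3))

with tempfile.TemporaryDirectory() as tmp:
    # 'first_col_matrix.txt' is a hypothetical output name.
    out_path = os.path.join(tmp, "first_col_matrix.txt")
    # fmt='%.1f' writes 1.0 rather than the default scientific notation
    # (fmt='%.18e'); the default delimiter is a single space.
    numpy.savetxt(out_path, res, fmt="%.1f")
    with open(out_path) as f:
        saved = f.read()

print(saved)
```

Each row of res becomes one line of the file, so the file has n rows and n columns in the requested order.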

Depending on the format of your file and your goal, you may want to optimize this code for your own use.

edited Oct 15, 2018 at 6:58

answered Oct 12, 2018 at 3:14

Zheng Liu



4 Comments

Ji Won Song Over a year ago

Thanks Zheng for the helpful answer. I still get this small error when I try the technique you suggested: `first_col = mat[:, 0] IndexError: too many indices for array`. Apparently the code doesn't like the line `first_col = mat[:, 0]`. How can I get around this? Thanks in advance.

Zheng Liu Over a year ago

@JiWonSong Hi. Did it complain when you ran mat = numpy.array(mat_raw)? If that line executed, can you tell me what mat.shape and mat.dtype return?

Zheng Liu Over a year ago

@JiWonSong Ah, I see the problem. Somehow there are empty lines in the file.dat file. Adding an if clause when defining file_lines will fix it. I've updated the answer.

Ji Won Song Over a year ago

Thanks very much. After the edit, the code works fine for me.
