I have a .dat file with numbers. In the first row, this file has five columns, and in all subsequent rows, it has four columns. I want to be able to read this file using numpy. I encounter the following error when I try to read this file at present:

In [3]: F1 = np.loadtxt('file.dat')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-c0f31adaf29a> in <module>()
----> 1 F1 = np.loadtxt('file.dat')

/Users/usr/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding)
   1090         # converting the data
   1091         X = None
-> 1092         for x in read_data(_loadtxt_chunksize):
   1093             if X is None:
   1094                 X = np.array(x, dtype)

/Users/usr/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.pyc in read_data(chunk_size)
   1014                 line_num = i + skiprows + 1
   1015                 raise ValueError("Wrong number of columns at line %d"
-> 1016                                  % line_num)
   1017 
   1018             # Convert each value according to its column and store

ValueError: Wrong number of columns at line 2 

How can I read all the rows of the file except the first row using Python? I have attached an example file here.

Additionally, the first column of this file (minus the first row) has n^2 rows (in the example, n=3 and the entries of the column are 1,2,3,4,5,6,7,8,9). I want to read the first column (minus the first row) and save it as a text file of shape (n,n) (i.e. the text file should have n rows and n columns). That is to say, I want the saved matrix to have the entries in the following order:

1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0

I will be thankful to have help.

  • genfromtxt is a little better at this, but it still needs the right number of delimiters, which is a big problem with the default white-space. usecols can be used to limit the load to the minimum number of columns. You can also tell it to skip the bad rows. Commented Oct 11, 2018 at 22:03
  • You could read the file line by line, split it, and perform your own corrections in the cases where the lines are too short. Commented Oct 11, 2018 at 22:04
  • hpaulj: Can you please give an example? I am new to python and I could not implement your suggestion. I will really appreciate it. Commented Oct 12, 2018 at 2:55
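
A minimal sketch of the genfromtxt route mentioned in the first comment, assuming the odd row is always the first one and the first column is the only one needed. The sample file contents written here are a hypothetical stand-in for file.dat, not the attached file:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for file.dat: five columns in row one, four afterwards.
path = os.path.join(tempfile.mkdtemp(), "file.dat")
with open(path, "w") as fhand:
    fhand.write("9 9 9 9 9\n")
    for i in range(1, 10):
        fhand.write("%d 0.1 0.2 0.3\n" % i)

# skip_header=1 drops the mismatched first row before parsing;
# usecols=0 limits the load to the first column.
first_col = np.genfromtxt(path, skip_header=1, usecols=0)
print(first_col)
```

Because the first row is skipped before any column counting happens, the remaining rows all have the same number of fields and the load should go through.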

1 Answer

Some experiments to try (not optimized). First, read in the lines of the file:

edit: The 'file.dat' file has empty lines. The if line.strip()... clause is to deal with the empty lines.

with open('file.dat', 'r') as fhand:
    # Strip the trailing newline from each line and skip empty lines.
    file_lines = [line.rstrip('\n') for line in fhand if line.strip() != '']

If you don't like the first row, drop it.

file_lines.pop(0)

Now that the remaining lines have the same number of columns of numerical entries, you can split entries in each line, and do the type conversion:

mat_raw = [[float(term) for term in line.split()] for line in file_lines]

You then get a float matrix. For convenience in slicing, convert it into numpy array.

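import numpy

mat = numpy.array(mat_raw)
# then you can do whatever you like. eg: first column
first_col = mat[:, 0]
# n is the side length of the square matrix (n = 3 in the example file)
n = int(round(first_col.size ** 0.5))
# reshape it to an n by n matrix:
res = first_col.reshape((n, n))
...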

Depending on the format of your file and your goal, you may optimise this code for your own use.
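
For the saving step the question asks about, here is a rough end-to-end sketch using np.loadtxt with skiprows and usecols, then np.savetxt. It is not the approach above, just an alternative under stated assumptions: the sample file contents, the output name matrix.txt, and the "%.1f" format are all illustrative choices.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for file.dat (five columns in row one, four afterwards).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file.dat")
with open(src, "w") as fhand:
    fhand.write("9 9 9 9 9\n")
    for i in range(1, 10):
        fhand.write("%d 0.1 0.2 0.3\n" % i)

# Skip the first row and keep only the first column.
first_col = np.loadtxt(src, skiprows=1, usecols=(0,))

# Infer n from the column length and check that it is a perfect square.
n = int(round(np.sqrt(first_col.size)))
if n * n != first_col.size:
    raise ValueError("column length %d is not a perfect square" % first_col.size)

# reshape fills the matrix row by row.
res = first_col.reshape((n, n))

# Output file name is an assumption; fmt="%.1f" writes entries like 1.0.
out = os.path.join(workdir, "matrix.txt")
np.savetxt(out, res, fmt="%.1f")
```

With the sample data above, matrix.txt holds three rows of three space-separated values in the order shown in the question.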

4 Comments

Thanks Zheng for the helpful answer. I still get this small error when I try the technique that you suggested: `first_col = mat[:, 0] IndexError: too many indices for array`. Apparently the code doesn't like the line `first_col = mat[:, 0]`. How can I get around this? Thanks in advance.
@JiWonSong Hi. Did it complain when you typed mat = numpy.array(mat_raw)? If this line executed, can you tell me the results of these two expressions: mat.shape and mat.dtype?
@JiWonSong Ah, I see the problem. Somehow, there are empty lines in the file.dat file. Adding an if clause when defining file_lines will do. I've updated the answer.
Thanks very much. After the edit, the code works fine for me.
