3

I have a CSV containing numbers which I am trying to convert to floats.

import os

filename = "filename.csv"
enclosed_folder = "path/to/Folder"
full_path = os.path.join(enclosed_folder,filename)

with open(full_path) as input_data:
    temp = input_data.readlines()
    n = len(temp) #int(temp.pop(0))
    matrix = [x.split(" ") for x in temp]
    for i in range(n):
        for j in range(n):
            matrix[i][j] = float(matrix[i][j])
    input_data.close()

When I open the file in any text editor, it does not show the \n at the end of each row.

[Screenshot: the CSV file opened in a text editor]

But running the Python code raises 'ValueError: could not convert string to float', which I assume is because of the '\n' present at the end of each row.

Traceback (most recent call last):
  File "hierarchical-clustering.py", line 37, in <module>
    matrix[i][j] = float(matrix[i][j])
ValueError: could not convert string to float: '1,0.058824,0.076923,0.066667,0.055556,0.058824,0.071429,0.052632,0.076923,0.0625,0.0625,0.055556,0.055556,0.05,0.066667,0,0,0.055556,0.0625,0.058824,0.058824,0.047619,0.055556,0.0625,0,0.052632,0.066667,0.055556,0.0625,0.058824,0.041667,0.066667,0.058824,0.071429,0.066667,0.076923,0,0.083333,0.052632,0.071429,0.076923,0,0.0625,0.076923,0.058824,0.076923,0.055556,0,0.0625,0.071429,0.0625,0.0625,0.083333,0,0,0,0.058824,0.0625,0,0.058824,0.0625,0.0625,0.066667,0.0625,0.052632,0.066667,0.076923,0.058824,0.071429,0.066667,0.058824,0.071429,0.058824,0.071429,0.058824,0.071429,0.071429\n'

So, how do I fix this error?

EDIT: I used strip() as well as rstrip(), as suggested in some of the answers, to remove whitespace, but the error still does not go away:

Traceback (most recent call last):
  File "hierarchical-clustering.py", line 37, in <module>
    matrix[i][j] = float(matrix[i][j].rstrip())
ValueError: could not convert string to float: '1,0.058824,0.076923,0.066667,0.055556,0.058824,0.071429,0.052632,0.076923,0.0625,0.0625,0.055556,0.055556,0.05,0.066667,0,0,0.055556,0.0625,0.058824,0.058824,0.047619,0.055556,0.0625,0,0.052632,0.066667,0.055556,0.0625,0.058824,0.041667,0.066667,0.058824,0.071429,0.066667,0.076923,0,0.083333,0.052632,0.071429,0.076923,0,0.0625,0.076923,0.058824,0.076923,0.055556,0,0.0625,0.071429,0.0625,0.0625,0.083333,0,0,0,0.058824,0.0625,0,0.058824,0.0625,0.0625,0.066667,0.0625,0.052632,0.066667,0.076923,0.058824,0.071429,0.066667,0.058824,0.071429,0.058824,0.071429,0.058824,0.071429,0.071429'
4
  • 1
    I don't think float cares about newlines. I just tried float("1.0\n") on my machine and it happily gives me 1.0. I think the problem is your commas. float("1,2") does not work, for instance. Commented Jun 29, 2017 at 13:19
  • 1
    Have you considered using the csv module to read your csv file? If you use that instead of trying to parse the file manually, IIRC it will perform rudimentary type conversion on your behalf. Then you don't need to call float at all. Commented Jun 29, 2017 at 13:20
  • 1
    @Kevin - No, Python's csv will not assume any types. It deliberately considers everything a string. (This is both more Pythonic (explicit is better than implicit) and avoids one of the things that programmers hate most about Excel.) Commented Jun 29, 2017 at 13:30
  • Oops. Perhaps I was thinking of a third-party csv parser, then. Still, the module is useful even without providing type conversion. Commented Jun 29, 2017 at 13:36
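
For reference, here is a minimal sketch of the csv-module approach discussed in these comments. The sample data is made up and read from an in-memory string to keep the example self-contained. By default csv.reader yields every field as a string, so the conversion is done explicitly; passing quoting=csv.QUOTE_NONNUMERIC is an opt-in that asks the reader to convert unquoted fields to float itself:

```python
import csv
import io

# Made-up sample standing in for the real file contents.
sample = "1,0.058824,0.076923\n0.058824,1,0.0625\n"

# Default behaviour: every field comes back as a string.
rows_as_text = list(csv.reader(io.StringIO(sample)))

# Explicit conversion, one row at a time.
matrix = [[float(value) for value in row] for row in rows_as_text]

# Opt-in conversion: unquoted fields are returned as floats.
converted = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONNUMERIC))

print(rows_as_text[0])
print(matrix[0])
```

With a real file, the csv documentation recommends opening it with newline="" before handing it to csv.reader.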

3 Answers

6

The error is due to your line parsing. You are splitting on spaces, but according to your screenshot the values are separated by commas. The key is to look closely at the error message: it shows that float() is being asked to convert the entire line, as a single string, into a float.
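
To see why the whole line reaches float(), it can help to compare the two splits on a shortened row. This is only a sketch; the `line` value is a made-up three-value sample, not the real data:

```python
# Hypothetical shortened row, standing in for one line of the CSV.
line = "1,0.058824,0.076923\n"

# Splitting on a space finds no separator, so the result is a
# one-element list holding the entire line.
by_space = line.split(" ")

# Splitting on a comma yields one string per value.
by_comma = line.split(",")

print(by_space)
print(by_comma)
```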

Change:

matrix = [x.split(" ") for x in temp]

To:

matrix = [x.split(",") for x in temp]
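
Put in context, the whole read loop could look something like the sketch below. The function name `read_matrix` is hypothetical, and the code assumes every line holds only comma-separated numbers with no header row. It converts each row by its own length rather than assuming the matrix is square:

```python
def read_matrix(path):
    """Read a comma-separated file of numbers into a list of float rows."""
    matrix = []
    with open(path) as input_data:  # the with block closes the file itself
        for line in input_data:
            line = line.strip()
            if not line:  # skip blank lines, e.g. a trailing empty one
                continue
            matrix.append([float(value) for value in line.split(",")])
    return matrix
```

Calling read_matrix(full_path) would then replace the readlines() call and the nested loops from the question.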

1 Comment

@Kristada673, it happens to us all. The best thing to do is read the error messages very carefully to determine the root cause. Otherwise, you're likely to go down a rabbit hole and waste a ton of time before realizing how simple the mistake was.
2

You can use strip() to remove whitespace from the string.

matrix[i][j] = float(matrix[i][j].strip())

If the commas are the problem, split on commas instead of spaces with .split(','):

matrix = [x.strip().split(",") for x in temp]
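
As a quick check of what each change contributes, the sketch below (using made-up sample strings) suggests that float() already ignores surrounding whitespace, including a trailing newline, so strip() is mostly tidiness here, while a comma inside the string is what makes the conversion fail:

```python
# float() accepts leading and trailing whitespace, newline included.
with_newline = float("0.071429\n")
padded = float("  1.0  ")

# A comma-separated string is not a valid float literal,
# whether or not it has been stripped first.
try:
    float("1,0.058824".strip())
    comma_ok = True
except ValueError:
    comma_ok = False

print(with_newline, padded, comma_ok)
```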


1

Remove the newline char with rstrip() like this:

matrix[i][j] = float(matrix[i][j].rstrip())

