I am trying to read this file with read_csv in pandas (Python), but I am not able to capture all the columns. Can you help?

Here is the code:

import pandas as pd

file = r'path of file'
df = pd.read_csv(file, encoding='cp1252', on_bad_lines='skip')
  • What exactly do you mean by "not able to capture all columns"? What's the expected result? What result are you actually getting? What's the difference between the two? Commented Sep 19, 2022 at 11:11
  • If you open the file in Excel or Notepad++, you will see that there are 161 columns, but the code captures only 7. Commented Sep 19, 2022 at 11:15

1 Answer

I tried to read your file and first noticed that the encoding you specified does not match the one the file actually uses. I also noticed that the separator is not a comma (,) but a tab (\t).

First, to get the file encoding (on Linux), you just need to run:

$ file -i kopie.csv 
kopie.csv: text/plain; charset=utf-16le

In Python:

import pandas as pd

path_to_file = 'kopie.csv'
df = pd.read_csv(path_to_file, encoding='utf-16le', sep='\t')

Printing the shape of the loaded DataFrame then shows all 161 columns:

>>> df.shape
(869, 161)
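
If you are not sure which separator a file uses, you can count candidate characters in its header line before calling read_csv. The snippet below is only a rough sketch: the function name guess_delimiter and the list of candidates are my own illustrative choices, not part of pandas, and the heuristic can be fooled by quoted fields that contain separators.

```python
def guess_delimiter(header_line, candidates=("\t", ";", ",", "|")):
    """Return the candidate separator that occurs most often in header_line.

    Hypothetical helper for illustration; returns None if no candidate occurs.
    """
    counts = {sep: header_line.count(sep) for sep in candidates}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


# Usage sketch (assumes the encoding is already known):
# with open('kopie.csv', encoding='utf-16le') as fh:
#     sep = guess_delimiter(fh.readline())
# df = pd.read_csv('kopie.csv', encoding='utf-16le', sep=sep)
```

Passing the detected character as sep keeps the read_csv call the same as above; for this file it should come out as a tab.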

2 Comments

How do I find the encoding on Windows?
It is less obvious; check this SO page: stackoverflow.com/questions/3710374/…
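
One platform-independent option, offered only as a sketch, is to look at the first bytes of the file from Python and check for a byte order mark (BOM). This assumes the file actually starts with a BOM, which is common for UTF-16 exports but not guaranteed; if none is found the function returns None and you need another method. The names sniff_bom and sniff_file_encoding are hypothetical helpers, not a pandas or standard-library API.

```python
import codecs

# UTF-32 is checked first because its little-endian BOM starts with the
# same two bytes as the UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a Python codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


def sniff_file_encoding(path):
    """Read the first four bytes of the file and check them for a BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))


# Usage sketch:
# encoding = sniff_file_encoding('kopie.csv')
# if encoding is not None:
#     df = pd.read_csv('kopie.csv', encoding=encoding, sep='\t')
```

The returned names ("utf-16", "utf-32", "utf-8-sig") are the Python codecs that consume the BOM themselves, so the mark does not end up in the first column name.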
