I'm using Python 3.12.6 and pandas==2.2.3.

This is the simple code I have always used to read the first sheet of an Excel file, and it has always worked:

df = pd.read_excel(file_path, engine='openpyxl', sheet_name=0, index_col=None)

However, I have one Excel file that is behaving strangely. Its header row has these columns:

"NOME", "DATA INSCRIÇÃO ", "PROVA OBJETIVA", "PROVA DISCURSIVA"

Note, however, that some of the header cells contain line breaks, which I suspect may not play well with UTF-8 encoding.

read_excel() only reads up to the column "DATA INSCRIÇÃO".

print(df.columns)

Index(['NOME', 'DATA INSCRIÇÃO '],
      dtype='object')

When I save this sheet as .csv and open it with Notepad, this is what I see:

NOME;DATA INSCRIÇÃO ;"PROVA 
OBJETIVA";"PROVA 
DISCURSIVA"

I've noticed that the line breaks and the quotes appear precisely on the problematic columns. Does anyone have any idea why it's breaking, or know a better way to read all the columns in Python?

If I save the sheet as .csv and call read_csv(), it fails with an encoding error, which I suspect is the underlying problem. But if I try this:

df = pd.read_csv(csv_path, delimiter=';', encoding='latin1')

It works! If I'm interpreting this correctly, this tells me that there might be a latin1 encoded line break that read_excel can't read. The problem is: read_excel() has no encoding argument. I've looked at the other possible arguments to read_excel, but nothing seems to help. Any help would be greatly appreciated.
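
One way to check the encoding theory above is to rebuild the header row in memory. This is a minimal sketch that assumes the CSV was saved as cp1252 (a common default for Excel's plain CSV export on Windows; the post does not say which encoding was actually used). Under that assumption, it is the accented characters in "INSCRIÇÃO", not the line breaks, that fail to decode as UTF-8, and the quoted line breaks parse fine once the bytes are decoded:

```python
import csv
import io

# Header row from the question, encoded as cp1252.
# The encoding is an assumption; the post does not state it.
raw = 'NOME;DATA INSCRIÇÃO ;"PROVA \nOBJETIVA";"PROVA \nDISCURSIVA"\r\n'.encode("cp1252")

# Decoding as UTF-8 fails on the bytes for the accented letters,
# not on the line breaks.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin1 maps every possible byte to a character, so it always decodes.
text = raw.decode("latin1")

# newline="" keeps line breaks inside quoted fields intact for the csv module.
header = next(csv.reader(io.StringIO(text, newline=""), delimiter=";"))

print(utf8_ok)
print(header)
```

If this matches the real file, the read_csv() encoding error and the missing columns from read_excel() may be two separate issues. An .xlsx file stores its text as XML with the encoding declared inside the file, which would explain why read_excel() has no encoding argument.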

  • Try passing names=["NOME", "DATA INSCRIÇÃO ","PROVA OBJETIVA", "PROVA DISCURSIVA"], including the new lines. Commented Feb 27 at 2:40
  • Are you saying that there are line breaks in the column names? You'll need to remove these manually before trying to save to CSV. TBH, for Excel to CSV I wouldn't use pandas at all; just chain openpyxl and csv.writer objects together and go row by row. Commented Feb 27 at 10:12
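
The suggestion in the comment above, chaining openpyxl and csv.writer row by row, could look roughly like the sketch below. The helper names clean_header, rows_to_csv, and xlsx_to_csv are hypothetical, the sample rows are made up for the demonstration, and collapsing line breaks in header cells into single spaces (which also trims the trailing space in "DATA INSCRIÇÃO ") is an assumption about the desired output.

```python
import csv
import io
import re


def clean_header(value):
    """Collapse line breaks and repeated whitespace in a header cell."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", str(value)).strip()


def rows_to_csv(rows, out):
    """Write rows to `out`, cleaning only the first (header) row."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # nothing to write for an empty sheet
    writer.writerow([clean_header(cell) for cell in header])
    for row in rows:
        writer.writerow(row)


def xlsx_to_csv(xlsx_path, csv_path):
    """Hypothetical wrapper: first sheet of an .xlsx file to a UTF-8 CSV."""
    from openpyxl import load_workbook  # third-party; imported only when called

    wb = load_workbook(xlsx_path, data_only=True)
    ws = wb.worksheets[0]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        rows_to_csv(ws.iter_rows(values_only=True), f)


# Demonstration with in-memory rows shaped like ws.iter_rows(values_only=True).
# The data row is invented for illustration.
sample = [
    ("NOME", "DATA INSCRIÇÃO ", "PROVA \nOBJETIVA", "PROVA \nDISCURSIVA"),
    ("Ana", "2024-01-15", 7.5, 8.0),
]
buf = io.StringIO()
rows_to_csv(sample, buf)
print(buf.getvalue())
```

With a real file, a call such as xlsx_to_csv("file.xlsx", "file.csv") would write a UTF-8 CSV that read_csv() can load with delimiter=';' and no encoding argument. Whether openpyxl sees all four columns in the problematic workbook is not something the post confirms.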

1 Answer

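import pandas as pd
# Option 1: read only selected columns (Excel-style letter ranges).
df = pd.read_excel('file.xlsx', usecols='A:C,E:G')
# Option 2: use the second row (0-indexed) as the header.
df = pd.read_excel('file.xlsx', header=1)
# Option 3: read a specific sheet by name.
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')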