Pandas read_excel() not reading all columns

Question

I'm using Python 3.12.6 and pandas==2.2.3.

This is a simple line of code that I've always used, and that has always worked, to read the first sheet of an Excel file:

df = pd.read_excel(file_path, engine='openpyxl', sheet_name=0, index_col=None)

However, I have an Excel file that is behaving strangely. Its sheet header has these columns:

"NOME", "DATA INSCRIÇÃO ", "PROVA OBJETIVA", "PROVA DISCURSIVA"

Note, however, that some of the header cells contain line breaks, which might be a problem for UTF-8 encoding.

read_excel() only reads up to the column "DATA INSCRIÇÃO".

print(df.columns)

Index(['NOME', 'DATA INSCRIÇÃO '],
      dtype='object')

When I save this sheet to .csv and open with notepad, this is what I see:

NOME;DATA INSCRIÇÃO ;"PROVA 
OBJETIVA";"PROVA 
DISCURSIVA"

I've noticed there are line breaks, as well as quotes, precisely in the problematic columns. Does anyone have any idea why it's breaking, or know a better way to read all the columns in Python?

If I save the sheet to .csv and read it with read_csv(), it fails with an encoding error, which I suspect is the problem. But if I try this:

df = pd.read_csv(csv_path, delimiter=';', encoding='latin1')

It works! If I'm interpreting this correctly, it suggests that there might be a latin1-encoded line break that read_excel() can't read. The problem is that read_excel() has no encoding argument. I've looked at the other arguments to read_excel(), but nothing seems to help. Any help would be greatly appreciated.
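One way to narrow this down is to read the sheet with no header row at all and check how many columns come back, then clean the header cells separately. The following is only a sketch under assumptions, not a confirmed fix: `clean_header` and `read_with_clean_headers` are hypothetical helper names, and the header is assumed to sit on the first row of the first sheet.

```python
import math
import re


def clean_header(value):
    """Collapse line breaks and repeated whitespace in a header cell."""
    # Empty cells arrive as None (openpyxl) or NaN (pandas).
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    # \s matches spaces, tabs, \n and \r, so "PROVA \nOBJETIVA" -> "PROVA OBJETIVA".
    return re.sub(r"\s+", " ", str(value)).strip()


def read_with_clean_headers(file_path):
    """Read the first sheet without a header row, then promote a cleaned first row."""
    import pandas as pd  # imported here so clean_header is usable without pandas

    raw = pd.read_excel(file_path, engine="openpyxl", sheet_name=0, header=None)
    print("columns parsed:", raw.shape[1])  # diagnostic: is the data there at all?

    headers = [clean_header(v) for v in raw.iloc[0]]
    df = raw.iloc[1:].reset_index(drop=True)
    df.columns = headers
    # Columns are object dtype because the header row was mixed in; try to re-infer.
    return df.infer_objects()
```

If the printed column count is still 2, the header text is not the cause and the remaining columns are not being parsed from the file at all. Note that `clean_header` also strips the trailing space from "DATA INSCRIÇÃO ".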

Try passing names=["NOME", "DATA INSCRIÇÃO ", "PROVA OBJETIVA", "PROVA DISCURSIVA"], including the new lines.
– LMC, Commented Feb 27 at 2:40
Are you saying that there are line breaks in the column names? You'll need to remove these manually before trying to save to CSV. TBH, for Excel to CSV I wouldn't use pandas at all; just chain openpyxl and csv.writer objects together and go line by line.
– Charlie Clark, Commented Feb 27 at 10:12
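The openpyxl plus csv.writer approach from the last comment could look roughly like this. It is a minimal sketch, not the commenter's own code: `write_rows_as_csv` and `xlsx_to_csv` are hypothetical names, the `;` delimiter and UTF-8 output are assumptions, and the workbook is opened in openpyxl's normal (not read-only) mode.

```python
import csv


def write_rows_as_csv(rows, out_file, delimiter=";"):
    """Write an iterable of row tuples to an open text file; return the row count."""
    writer = csv.writer(out_file, delimiter=delimiter)
    count = 0
    for row in rows:
        # csv.writer writes None as an empty field and quotes any value
        # that contains a line break or the delimiter.
        writer.writerow(row)
        count += 1
    return count


def xlsx_to_csv(xlsx_path, csv_path):
    """Dump the first worksheet of xlsx_path to csv_path, row by row."""
    from openpyxl import load_workbook  # imported here: only needed for real files

    wb = load_workbook(xlsx_path, data_only=True)  # cached values, not formulas
    ws = wb.worksheets[0]
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        return write_rows_as_csv(ws.iter_rows(values_only=True), f)
```

Because the output is written as UTF-8, the resulting file should be readable with pd.read_csv(csv_path, delimiter=';') without passing encoding='latin1'.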

General Grievance · Accepted Answer · 2025-02-27 14:11:09Z

0

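import pandas as pd
# Three independent examples (each one overwrites df).
# Read only the columns in the given Excel letter ranges:
df = pd.read_excel('file.xlsx', usecols='A:C,E:G')
# Use the second row (index 1) as the header:
df = pd.read_excel('file.xlsx', header=1)
# Read a specific sheet by name:
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')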

answered Feb 27 at 7:37 by Wale; edited Feb 27 at 14:11 by General Grievance
