7

Reading a simple .xlsx file returns an empty DataFrame, and I can't figure it out for the life of me:

path = ('c:/Users/Desktop/Stuff/Ready')
files = os.listdir(path)
print(files)

files_xlsx = [f for f in files if f[-3:] == 'xlsx']

readyorders = pd.DataFrame()
for filename in files_xlsx:
    with open(os.path.join(path, filename)) as f:
        data = pd.read_excel(f)
        readyorders = readyorders.append(data)

print(readyorders)

The Excel file is just two simple columns. Is it just too early in the day?

2
  • 1
    In general, pd.read_excel returns a map sheetname -> dataframe. You may use sheetname=None as arg. This should read the dataframe in the first (and possibly only) sheet Commented Sep 12, 2017 at 16:02
  • 1
    By default it's the first sheet, but even with the sheetname arg defined the result is still empty. Commented Sep 12, 2017 at 16:05

5 Answers 5

7

I had a similar issue, and it turns out that Excel's "Save as type" list offers two kinds of XLSX: "Excel Workbook" (at the top of the list) and "Strict Open XML Spreadsheet". The latter came back as an empty spreadsheet in pandas, so save the file as an Excel Workbook (.xlsx) and you won't have problems.
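
One way to tell the two variants apart without opening Excel is to look inside the file, since an .xlsx is a zip archive. The sketch below is only an illustration: `is_strict_ooxml` is a hypothetical helper name, and it assumes the workbook part sits at the conventional `xl/workbook.xml` location, which is where Excel writes it. Strict files declare the `purl.oclc.org` spreadsheet namespace there, while regular workbooks use the `schemas.openxmlformats.org` one.

```python
import zipfile

# Main spreadsheet namespace declared by Strict Open XML workbooks.
STRICT_NS = b"http://purl.oclc.org/ooxml/spreadsheetml/main"


def is_strict_ooxml(source):
    """Return True if the workbook part declares the Strict namespace.

    `source` may be a file path or a binary file-like object.
    Assumption: the workbook part is stored at xl/workbook.xml.
    """
    with zipfile.ZipFile(source) as archive:
        workbook_xml = archive.read("xl/workbook.xml")
    return STRICT_NS in workbook_xml
```

If this returns True for a file that pandas reads as empty, re-saving it as "Excel Workbook" is the fix described above.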

3

I had the same issue, and I later discovered that it was because the Excel file has many sheets and I didn't specify the sheet name.

3

Sometimes there is also a "hidden sheet", the result of a bad export. In that case, pass the name of your sheet via the sheet_name parameter, or use sheet_name=None. With None you get a dict that contains the empty DataFrame of the hidden sheet alongside the other data.
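
A minimal sketch of what to do with that dict, assuming it has the shape `pd.read_excel(..., sheet_name=None)` returns (sheet name mapped to DataFrame). `non_empty_sheets` is a hypothetical helper, and the dict is built by hand here so the example runs without a workbook on disk.

```python
import pandas as pd


def non_empty_sheets(sheets):
    """Drop the sheets whose DataFrame is empty (no rows or no columns)."""
    return {name: df for name, df in sheets.items() if not df.empty}


# In real use the dict would come from something like
#   sheets = pd.read_excel("orders.xlsx", sheet_name=None)  # hypothetical file name
sheets = {
    "Hidden": pd.DataFrame(),
    "Orders": pd.DataFrame({"id": [1, 2], "qty": [5, 3]}),
}
print(list(non_empty_sheets(sheets)))
```

Printing the surviving sheet names first makes it easy to decide which one to pass as sheet_name afterwards.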

1 Comment

This was it for me. It's certainly worth checking which sheets are present in the spreadsheet you're trying to open and specifying the correct one with sheet_name.
1

f[-3:] == 'xlsx' will never be true, because you are taking the last three characters and comparing them to a string of four characters.

Try f[-4:] == 'xlsx'
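
To avoid counting characters at all, `str.endswith` is an alternative to slicing. The file names below are made up for illustration, and lower-casing the name first is an extra assumption (that upper-case extensions should match too).

```python
# Hypothetical directory listing, standing in for os.listdir(path).
files = ["orders.xlsx", "notes.txt", "legacy.xls", "REPORT.XLSX"]

# The original test compares three characters with a four-character string.
print("orders.xlsx"[-3:])  # the slice is 'lsx', which never equals 'xlsx'

# endswith checks the whole suffix, including the dot.
files_xlsx = [f for f in files if f.lower().endswith(".xlsx")]
print(files_xlsx)
```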

As an aside, appending dataframes is very slow. Try concatenating instead:

readyorders = pd.concat([pd.read_excel(os.path.join(path, f)) for f in files if f[-5:] == '.xlsx'])
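
A slightly fuller sketch of the same idea, under a few assumptions: `list_xlsx` and `read_all` are hypothetical helper names, the directory is whatever `path` points to in the question, and each workbook is read from its first sheet. Passing the full path to `pd.read_excel`, instead of a handle from a text-mode `open()`, also removes one plausible source of Unicode decode errors, since an .xlsx file is binary.

```python
import os

import pandas as pd


def list_xlsx(path):
    """Return full paths of the .xlsx files directly inside `path`."""
    return [
        os.path.join(path, name)
        for name in sorted(os.listdir(path))
        if name.lower().endswith(".xlsx")
    ]


def read_all(path):
    """Read every workbook in `path` and stack the rows into one DataFrame."""
    # pd.read_excel opens each path itself, so no text-mode open() is involved.
    frames = [pd.read_excel(p) for p in list_xlsx(path)]
    # pd.concat raises ValueError on an empty list, so fall back to an empty frame.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Usage would be along the lines of `readyorders = read_all(path)`; concatenating once at the end avoids the repeated copying that appending inside the loop causes.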

1 Comment

This appears to be the right issue, but now I'm getting Unicode decode errors. This exact code works fine elsewhere... am I losing my mind?
0

Mine returned an empty DataFrame and I checked:

xl = pd.ExcelFile(path)
print(xl.sheet_names)  # see all sheet names

and I found a hidden sheet.
