
I have approximately 150 workbooks (.xlsx) in a folder that I would like to read into a single pandas DataFrame for analysis.

Each workbook is set up identically with the same sheet names and column names.

I need to load the first sheet of each workbook ("Keywords Rankings") into that DataFrame. For the first worksheet, I want to start reading at row 11 so that the column headers are kept; for every worksheet after that, I want to append the data starting at row 12.

I am new to Python and have been reading some instructions online but am stuck. From my understanding, I could use the xlrd library to facilitate this.

I've been playing around with the below code but haven't gotten far. 'Keywords Rankings' is the sheet name I want to append.

import pandas as pd
import numpy as np
import glob

all_data = pd.DataFrame()
all_data = pd.ExcelFile("C:\\Users\\John Smith\\Documents\\Analysis\\FPR Nov - Mar 2018\\Dec_1_General.xlsx")
print(all_data.sheet_names)
all_d = all_data.parse('Keywords Rankings')

for f in glob.glob("C:\\Users\\John Smith\\Documents\\Analysis\\FPR Nov - Mar 2018\\*.xlsx", recursive=True):
    df = pd.read_excel(f)
    all_d = all_d.append(df,ignore_index=True)

1 Answer


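You should not repeatedly append to an existing pd.DataFrame: every append copies all of the rows collected so far, so the total work grows quadratically with the number of files. DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.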

Instead, read each file into its own DataFrame and call pandas.concat once on the resulting list.
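A minimal sketch of the collect-then-concat pattern, using two small in-memory frames in place of real worksheets. The column names "Keyword" and "Rank" and the helper `combine` are hypothetical. The second frame lists its columns in a different order to show that concat matches columns by name, not by position.

```python
import pandas as pd


def combine(frames):
    """Concatenate an iterable of DataFrames in one step (hypothetical helper)."""
    pieces = []
    for frame in frames:
        # list.append is cheap: no DataFrame data is copied until the concat below.
        pieces.append(frame)
    # A single concat at the end; ignore_index renumbers the rows 0..n-1.
    return pd.concat(pieces, ignore_index=True)


# Hypothetical stand-ins for two worksheets; the second has its columns
# in a different order to show that concat aligns them by name.
sheet_a = pd.DataFrame({"Keyword": ["alpha", "beta"], "Rank": [1, 2]})
sheet_b = pd.DataFrame({"Rank": [3], "Keyword": ["gamma"]})

combined = combine([sheet_a, sheet_b])
print(combined)
```

The loop only grows a Python list, which is cheap; the expensive DataFrame construction happens exactly once.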

This can be facilitated by a list comprehension:

df = pd.concat([pd.read_excel(f, sheet_name='Keywords Rankings', skiprows=range(10))
                for f in files], ignore_index=True)

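pd.concat aligns columns by name, so this works as long as every worksheet has its headers in row 11: skiprows=range(10) skips Excel rows 1 to 10, so row 11 is read as the header and the data starts at row 12 in every file.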
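A fuller sketch that builds `files` with the standard-library glob module and then applies the same concat. The folder path is taken from the question; the names FOLDER, SHEET, list_workbooks and load_keyword_rankings are hypothetical. It assumes openpyxl is installed, since pandas reads .xlsx files through openpyxl (current xlrd releases only handle legacy .xls files). Skipping "~$" files is an extra precaution against the lock files Excel creates next to workbooks that are currently open.

```python
import glob
import os

import pandas as pd

# The folder from the question; change it to wherever the workbooks live.
FOLDER = r"C:\Users\John Smith\Documents\Analysis\FPR Nov - Mar 2018"
SHEET = "Keywords Rankings"


def list_workbooks(folder):
    """Return sorted full paths of the .xlsx files directly inside `folder`."""
    paths = glob.glob(os.path.join(folder, "*.xlsx"))
    # Skip the "~$name.xlsx" lock files Excel leaves beside open workbooks.
    return sorted(p for p in paths if not os.path.basename(p).startswith("~$"))


def load_keyword_rankings(folder, sheet=SHEET):
    """Stack one sheet from every workbook in `folder` into one DataFrame."""
    files = list_workbooks(folder)
    if not files:
        # pd.concat([]) would fail with the vaguer "No objects to concatenate".
        raise FileNotFoundError(f"no .xlsx files found in {folder!r}")
    frames = [
        # skiprows=range(10) drops Excel rows 1-10, so row 11 is read as the
        # header and the data starts at row 12 in every file.
        pd.read_excel(f, sheet_name=sheet, skiprows=range(10))
        for f in files
    ]
    return pd.concat(frames, ignore_index=True)
```

Usage: df = load_keyword_rankings(FOLDER). Building the list of full paths first and checking that it is non-empty makes a wrong folder path fail with a clear message instead of an obscure pandas error.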


4 Comments

When I create a variable for files (files = "Documents\Analysis\FPR Nov - Mar 2018*"), I get the error "FileNotFoundError: [Errno 2] No such file or directory: 'D'". I have checked that my current directory is correct. Am I supposed to input something different for the files variable?
files should be a list of full paths to your files. You have only included folder names. So, yes, you should look up how to retrieve full paths.
I've tried with both the full path name starting from C: drive and the partial path name. I have been using "\*" at the end to indicate that I want all files within the final folder. Is that the correct notation?
I don't know. There are many questions on SO on how to extract filenames using standard libraries, I suggest you look them up.
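The 'D' in that error comes from iterating over a plain string: `for f in files` yields one character at a time, so the first "path" handed to read_excel is "D". pd.read_excel also does not expand wildcards, so a "\*" pattern has to be expanded by glob or pathlib first. A minimal sketch using pathlib; the helper name workbook_paths is hypothetical.

```python
from pathlib import Path

# A str is iterable one character at a time, so looping over
# files = "Documents\\Analysis\\..." passes "D", then "o", ... to read_excel.
files = "Documents\\Analysis"
print([f for f in files][:3])  # ['D', 'o', 'c']


def workbook_paths(folder):
    """Return a sorted list of absolute .xlsx paths in `folder` (hypothetical helper)."""
    # The wildcard is expanded here; read_excel itself never expands "*".
    return sorted(str(p.resolve()) for p in Path(folder).glob("*.xlsx"))
```

Passing workbook_paths(r"C:\Users\John Smith\Documents\Analysis\FPR Nov - Mar 2018") as files gives the list comprehension real file paths to read.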
