
I am trying to import all the Excel files in a given folder so that the same sheet from every file ends up in one dataframe. That part works.

Where I am stuck: after I read a sheet from a workbook, and before I append it to the master dataframe for that sheet, I want to add a column containing the name of the file it was imported from.

```python
import os
import pandas as pd

path = "C:/backup/vp/apr 11 2018 and related files/All Files/"

# Full paths of the .xlsx workbooks in the folder
accepted_extensions = ["xlsx"]
xlsx_extension_file = [path + fn for fn in os.listdir(path)
                       if fn.split(".")[-1] in accepted_extensions]

df1 = pd.DataFrame()  # master frame for the "vInfo" sheets
df2 = pd.DataFrame()  # master frame for the "vDatastore" sheets
for file in xlsx_extension_file:
    print(file)
    data = pd.ExcelFile(file)
    try:
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
        df1 = pd.concat([df1, data.parse("vInfo", header=0)], ignore_index=True)
        df2 = pd.concat([df2, data.parse("vDatastore", header=0)], ignore_index=True)
        # Attempted: tag the rows with the source file name
        #df1['filename'] = os.path.basename(file)
        #df2['filename'] = os.path.basename(file)
    except Exception:
        pass  # e.g. a workbook without one of these sheets is skipped

# Leaving the with-block saves and closes output1.xlsx
with pd.ExcelWriter('output1.xlsx') as writer:
    df1.to_excel(writer, sheet_name='vInfo')
    df2.to_excel(writer, sheet_name='vDatastore')
input()  # wait for Enter before exiting
```
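If the commented-out `df1['filename'] = os.path.basename(file)` line is enabled, it assigns a single value to a column of the master frame, which sets it for every row collected so far; after the loop, every row carries the last file's name. A minimal sketch of that effect, using two hypothetical in-memory sheets (`a.xlsx` and `b.xlsx` are made-up names; no Excel files are read), next to tagging each sheet before it is appended:

```python
import pandas as pd

# Hypothetical stand-ins for the "vInfo" sheet parsed from two workbooks
sheets = {
    "a.xlsx": pd.DataFrame({"VM": ["vm1", "vm2"]}),
    "b.xlsx": pd.DataFrame({"VM": ["vm3"]}),
}

# Tagging the master frame after appending: the scalar overwrites ALL rows so far
master = pd.DataFrame()
for name, sheet in sheets.items():
    master = pd.concat([master, sheet], ignore_index=True)
    master["filename"] = name

# Tagging each sheet before appending: only that workbook's rows get its name
tagged = pd.DataFrame()
for name, sheet in sheets.items():
    sheet = sheet.copy()  # avoid modifying the original frame
    sheet["filename"] = name
    tagged = pd.concat([tagged, sheet], ignore_index=True)

print(master["filename"].tolist())  # ['b.xlsx', 'b.xlsx', 'b.xlsx']
print(tagged["filename"].tolist())  # ['a.xlsx', 'a.xlsx', 'b.xlsx']
```

The fix is therefore to attach the filename to the per-workbook frame, not to the accumulated one.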

Answer

Consider DataFrame.assign to add a column holding the filename. Also, consider collecting all the dataframes in lists and concatenating them once at the end, instead of growing the same master dataframes within the loop (each append copies every row collected so far):

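```python
...
df1_list = []  # one DataFrame per workbook for the "vInfo" sheet
df2_list = []  # one DataFrame per workbook for the "vDatastore" sheet

for file in xlsx_extension_file:
    print(file)
    data = pd.ExcelFile(file)

    try:
        # assign() returns a copy with a new "filename" column set to this workbook's path
        # (use os.path.basename(file) to store only the file name)
        df1_list.append(data.parse("vInfo", header=0).assign(filename=file))
        df2_list.append(data.parse("vDatastore", header=0).assign(filename=file))
    except Exception:
        pass  # e.g. the workbook lacks one of the sheets; skip it

# Concatenate once, after the loop
df1 = pd.concat(df1_list, ignore_index=True)
df2 = pd.concat(df2_list, ignore_index=True)
...
```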
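The same tag-then-concatenate pattern can be tried without any Excel files. This is a minimal sketch, assuming the parsed sheets are already held in a mapping from workbook path to DataFrame; `combine_with_filename` and the `C:/data/...` paths are hypothetical. It also guards the case where no workbook had the sheet, since `pd.concat` raises `ValueError` on an empty list:

```python
import os
import pandas as pd


def combine_with_filename(frames_by_path):
    """Tag each frame with its source file name, then concatenate once.

    frames_by_path: hypothetical mapping {workbook path: DataFrame parsed from one sheet}.
    """
    pieces = [
        # assign() returns a new frame; the input frame is left unchanged
        df.assign(filename=os.path.basename(path))
        for path, df in frames_by_path.items()
    ]
    if not pieces:
        # pd.concat([]) raises ValueError, e.g. when no workbook had the sheet
        return pd.DataFrame()
    return pd.concat(pieces, ignore_index=True)


frames = {
    "C:/data/a.xlsx": pd.DataFrame({"Name": ["ds1", "ds2"]}),
    "C:/data/b.xlsx": pd.DataFrame({"Name": ["ds3"]}),
}
combined = combine_with_filename(frames)
print(combined)
```

Collecting into a list and calling `pd.concat` once copies the data a single time, whereas concatenating inside the loop recopies the growing master frame on every iteration.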