Batch processing multiple excel files in folder via python script and outputting into folder, while ignoring other file types

Question

Each week our inventory reporting software outputs 15+ different inventory reports (one for each vendor respectively) in .xlsx format into one folder.

All the excel sheets are in an identical format but require a fair bit of manipulation prior to being “usable” to the end user. After trying to learn python over the last week, and piecing together various bits of code I've managed to come up with a VERY rough script to perform the processing required.

The script works as desired but as it stands currently I have to manually input the individual excel sheet file path, and change the output file name, within the script and then run the script for each corresponding excel file. Doing this for 15+ files each week renders the script redundant as would take less time to just reformat within excel manually.

After more extensive forum reading, and trying to plug my script into others code solutions (https://python-forum.io/thread-10841.html) I'm at an impasse. I'm struggling come up with a function to batch process all excel files in one folder, and ignore all other files. Ideally the manipulated files would replace the original excel files in the same folder but if that's not possible then it's not a deal breaker.

To throw another curveball into the mix I have been writing my script and deploying my code on my mac (using a copy of the .xlsx file which I sent myself from work, and a mac os filepath) to test my script on, but I hope to deploy this script on the Windows system at work eventually.

I've included only my original, currently functioning script below.

Thanks all

    #import pandas library
    import pandas as pd

    #import numpy
    import numpy as np

    #importing our excel to dataframe
    df = pd.read_excel('/Users/christiane/Downloads/AGLC CJ Test.xls.xlsx')

    #renaming our headers for each column
    df = df.set_axis(['vendor code', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'current inventory', 'incoming inventory', 'm', 'Sales: Previous 4 weeks', 'o'], axis=1)

    #convert 'vendor code' column data to string data
    df['vendor code'] = df['vendor code'].astype(str)

    #delete everything that doesn't start with a number from the bottom of the table
    df = df[~df['vendor code'].str.contains("[a-zA-Z]").fillna(False)]

    #convert 'vendor code' column data to integer data
    df['vendor code'] = df['vendor code'].astype(int)

    #sort values in column #1 from smallest to largest
    df.sort_values(by='vendor code', ascending=True, inplace=True)

    #add new column at the end input formula and populate downwards (=K1+L1)
    df['expectedinventory'] = df.loc[:,['current inventory', 'incoming inventory']].sum(axis=1)

    #move 'expected inventory' column from the end position to correct column position
    name_col = df.pop('expectedinventory')
    df.insert(12, 'expectedinventory', name_col)

    #delete columns 'm' and 'o'
    df = df.drop(['m', 'o'], axis=1)

    #run formula for 4 weeks of back orders and populate column “sold in last 4 weeks” with the results 
    df['Sales Advice'] = df['expectedinventory'] - (( df['Sales: Previous 4 weeks']*1) + (0.1*( df['Sales: Previous 4 weeks'])*1)) 


    #Don't know what this does but it works!
    pd.set_option('future.no_silent_downcasting', True)

    #change NaN values to Zero
    df['Sales Advice'] = df['Sales Advice'].fillna(0)


    inventory_data = pd.DataFrame(df)
    cols = ['Sales Advice']
    inventory_data.loc[:, cols] = inventory_data[cols].astype('float64').round()

    #delete unnecessary columns
    df = df.drop(['b', 'c', 'e', 'f','g','h', 'i', 'j'], axis=1)

    #adding a right-most column with NaN values
    df = df.reindex(columns=df.columns.tolist() + ['Ordering Now'])

    #output excel file
    df.to_excel('filteredaglctest.xlsx', index = False)

I assume the file AGLC CJ Test.xls.xlsx is one of the files you need to process and you would want to loop your script to process 14 other files just like it, possibly with some variation regarding the format of input or output? What varies between runs? (other than the input file name and the output file name)? — Grismar
– Grismar, Commented Nov 22, 2024 at 4:59
If all of your spreadsheets are in the same directory and if there's a well-known pattern to the filenames you could glob the directory in a loop to acquire and process each file appropriately. See this document — jackal
– jackal, Commented Nov 22, 2024 at 9:17
I plugged my script into the batch loop solution from this link stackoverflow.com/questions/72189803/…. Only problem is that now I'm running into a directory problem where my current "working directory" isn't the right one later on in the script (hard to tell where exaclty). — Christian E
– Christian E, Commented Nov 22, 2024 at 16:38

problemsolverROL · Accepted Answer · 2024-12-01 20:48:35Z

0

from pathlib import Path 
# APIs get imported here 

# Main folder where excel files are found 
FOLDER = "/home/user/Documents/excel_files" #Change this to your folder path!
EXT = ".xlsx"

def updateExcel(file):
    print(f"---> Processing {file} <----")
    #Insert your code here:

def main():
    #Specift the path in which the files are found 
    folder_path = Path(FOLDER)
    #Get a list of files with a specific extension
    files_ext = [files_ext for file in folder_path.iterdir() if file.is_file() and file.sufix = EXT]
    #Process the files 
    for file in files_ext:
        updateExcel(FOLDER+"/"+file)

if __name__ == '__main__':
    main()

answered Dec 1, 2024 at 20:48

problemsolverROL

1

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Batch processing multiple excel files in folder via python script and outputting into folder, while ignoring other file types

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related