Each week our inventory reporting software outputs 15+ different inventory reports (one for each vendor respectively) in .xlsx format into one folder.
All the excel sheets are in an identical format but require a fair bit of manipulation prior to being “usable” to the end user. After trying to learn python over the last week, and piecing together various bits of code I've managed to come up with a VERY rough script to perform the processing required.
The script works as desired but as it stands currently I have to manually input the individual excel sheet file path, and change the output file name, within the script and then run the script for each corresponding excel file. Doing this for 15+ files each week renders the script redundant as would take less time to just reformat within excel manually.
After more extensive forum reading, and trying to plug my script into others code solutions (https://python-forum.io/thread-10841.html) I'm at an impasse. I'm struggling come up with a function to batch process all excel files in one folder, and ignore all other files. Ideally the manipulated files would replace the original excel files in the same folder but if that's not possible then it's not a deal breaker.
To throw another curveball into the mix I have been writing my script and deploying my code on my mac (using a copy of the .xlsx file which I sent myself from work, and a mac os filepath) to test my script on, but I hope to deploy this script on the Windows system at work eventually.
I've included only my original, currently functioning script below.
Thanks all
#import pandas library
import pandas as pd
#import numpy
import numpy as np
#importing our excel to dataframe
df = pd.read_excel('/Users/christiane/Downloads/AGLC CJ Test.xls.xlsx')
#renaming our headers for each column
df = df.set_axis(['vendor code', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'current inventory', 'incoming inventory', 'm', 'Sales: Previous 4 weeks', 'o'], axis=1)
#convert 'vendor code' column data to string data
df['vendor code'] = df['vendor code'].astype(str)
#delete everything that doesn't start with a number from the bottom of the table
df = df[~df['vendor code'].str.contains("[a-zA-Z]").fillna(False)]
#convert 'vendor code' column data to integer data
df['vendor code'] = df['vendor code'].astype(int)
#sort values in column #1 from smallest to largest
df.sort_values(by='vendor code', ascending=True, inplace=True)
#add new column at the end input formula and populate downwards (=K1+L1)
df['expectedinventory'] = df.loc[:,['current inventory', 'incoming inventory']].sum(axis=1)
#move 'expected inventory' column from the end position to correct column position
name_col = df.pop('expectedinventory')
df.insert(12, 'expectedinventory', name_col)
#delete columns 'm' and 'o'
df = df.drop(['m', 'o'], axis=1)
#run formula for 4 weeks of back orders and populate column “sold in last 4 weeks” with the results
df['Sales Advice'] = df['expectedinventory'] - (( df['Sales: Previous 4 weeks']*1) + (0.1*( df['Sales: Previous 4 weeks'])*1))
#Don't know what this does but it works!
pd.set_option('future.no_silent_downcasting', True)
#change NaN values to Zero
df['Sales Advice'] = df['Sales Advice'].fillna(0)
inventory_data = pd.DataFrame(df)
cols = ['Sales Advice']
inventory_data.loc[:, cols] = inventory_data[cols].astype('float64').round()
#delete unnecessary columns
df = df.drop(['b', 'c', 'e', 'f','g','h', 'i', 'j'], axis=1)
#adding a right-most column with NaN values
df = df.reindex(columns=df.columns.tolist() + ['Ordering Now'])
#output excel file
df.to_excel('filteredaglctest.xlsx', index = False)
AGLC CJ Test.xls.xlsxis one of the files you need to process and you would want to loop your script to process 14 other files just like it, possibly with some variation regarding the format of input or output? What varies between runs? (other than the input file name and the output file name)?