I have a script that pulls in data from a CSV file, manipulates it and creates an output Excel file. But it's a tedious process, as I need to do it for multiple files.

Question: Is there a way to run this script across multiple CSV files in one go and create a separate Excel output file for each input file?

I'm not sure what to try out here. I've read that I need to use a module called glob but I'm not sure how to go about it.

This script works for a single file:

# Import libraries
import pandas as pd
import xlsxwriter

# Set system paths
INPUT_PATH = 'SystemPath//Downloads//'
INPUT_FILE = 'rawData.csv'

OUTPUT_PATH = 'SystemPath//Downloads//Output//'
OUTPUT_FILE = 'rawDataOutput.xlsx'

# Get data
df = pd.read_csv(INPUT_PATH + INPUT_FILE)

# Clean data
cleanedData = df[['State','Campaigns','Type','Start date','Impressions','Clicks','Spend(INR)',
                  'Orders','Sales(INR)','NTB orders','NTB sales']]
cleanedData = cleanedData[cleanedData['Impressions'] != 0].sort_values('Impressions', 
                                                                       ascending= False).reset_index()
cleanedData.loc['Total'] = cleanedData.select_dtypes('number').sum()
cleanedData['CTR(%)'] = (cleanedData['Clicks'] / 
                         cleanedData['Impressions']).astype(float).map("{:.2%}".format)
cleanedData['CPC(INR)'] = (cleanedData['Spend(INR)'] / cleanedData['Clicks'])
cleanedData['ACOS(%)'] = (cleanedData['Spend(INR)'] / 
                          cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
cleanedData['% of orders NTB'] = (cleanedData['NTB orders'] / 
                                  cleanedData['Orders']).astype(float).map("{:.2%}".format)
cleanedData['% of sales NTB'] = (cleanedData['NTB sales'] / 
                                 cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
cleanedData = cleanedData[['State','Campaigns','Type','Start date','Impressions','Clicks','CTR(%)',
                           'Spend(INR)','CPC(INR)','Orders','Sales(INR)','ACOS(%)',
                           'NTB orders','% of orders NTB','NTB sales','% of sales NTB']]

# Create summary
summaryData = cleanedData.groupby(['Type'])[['Spend(INR)','Sales(INR)']].agg('sum')
summaryData.loc['Overall Snapshot'] = summaryData.select_dtypes('number').sum()
summaryData['ROI'] = summaryData['Sales(INR)'] / summaryData['Spend(INR)']

# Push to excel
writer = pd.ExcelWriter(OUTPUT_PATH + OUTPUT_FILE, engine='xlsxwriter')
summaryData.to_excel(writer, sheet_name='Summary')
cleanedData.to_excel(writer, sheet_name='Overall Report')
writer.close()

I've never tried anything like this before, and I would appreciate your help figuring this out.

4 Answers

2

You can use Python's glob.glob() to get all of the CSV files from a given folder. For each filename that is returned, you could derive a suitable output filename. The file processing could be moved into a function as follows:

# Import libraries
import pandas as pd
import xlsxwriter
import glob
import os

def process_csv(input_filename, output_filename):
    # Get data
    df = pd.read_csv(input_filename)

    # Clean data
    cleanedData = df[['State','Campaigns','Type','Start date','Impressions','Clicks','Spend(INR)',
                    'Orders','Sales(INR)','NTB orders','NTB sales']]
    cleanedData = cleanedData[cleanedData['Impressions'] != 0].sort_values('Impressions', 
                                                                        ascending= False).reset_index()
    cleanedData.loc['Total'] = cleanedData.select_dtypes('number').sum()
    cleanedData['CTR(%)'] = (cleanedData['Clicks'] / 
                            cleanedData['Impressions']).astype(float).map("{:.2%}".format)
    cleanedData['CPC(INR)'] = (cleanedData['Spend(INR)'] / cleanedData['Clicks'])
    cleanedData['ACOS(%)'] = (cleanedData['Spend(INR)'] / 
                            cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
    cleanedData['% of orders NTB'] = (cleanedData['NTB orders'] / 
                                    cleanedData['Orders']).astype(float).map("{:.2%}".format)
    cleanedData['% of sales NTB'] = (cleanedData['NTB sales'] / 
                                    cleanedData['Sales(INR)']).astype(float).map("{:.2%}".format)
    cleanedData = cleanedData[['State','Campaigns','Type','Start date','Impressions','Clicks','CTR(%)',
                            'Spend(INR)','CPC(INR)','Orders','Sales(INR)','ACOS(%)',
                            'NTB orders','% of orders NTB','NTB sales','% of sales NTB']]

    # Create summary
    summaryData = cleanedData.groupby(['Type'])[['Spend(INR)','Sales(INR)']].agg('sum')
    summaryData.loc['Overall Snapshot'] = summaryData.select_dtypes('number').sum()
    summaryData['ROI'] = summaryData['Sales(INR)'] / summaryData['Spend(INR)']

    # Push to excel
    writer = pd.ExcelWriter(output_filename, engine='xlsxwriter')
    summaryData.to_excel(writer, sheet_name='Summary')
    cleanedData.to_excel(writer, sheet_name='Overall Report')
    writer.close()

# Set system paths
INPUT_PATH = 'SystemPath//Downloads//'
OUTPUT_PATH = 'SystemPath//Downloads//Output//'

for csv_filename in glob.glob(os.path.join(INPUT_PATH, "*.csv")):
    name, ext = os.path.splitext(os.path.basename(csv_filename))
    # Create an output filename based on the input filename
    output_filename = os.path.join(OUTPUT_PATH, f"{name}Output.xlsx")
    process_csv(csv_filename, output_filename)

os.path.join() can be used as a safer way to join file paths together.
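
To illustrate the point about os.path.join(), here is a minimal sketch of the filename derivation on its own, without pandas. The helper name derive_output_path, its suffix parameter and the example paths are hypothetical and only serve the illustration.

```python
import os

def derive_output_path(csv_path, output_dir, suffix="Output"):
    """Build '<output_dir>/<name><suffix>.xlsx' from a CSV path (hypothetical helper)."""
    # Drop the directory part, then split e.g. 'rawData.csv' into ('rawData', '.csv')
    name, _ext = os.path.splitext(os.path.basename(csv_path))
    # os.path.join adds a separator only where one is needed,
    # so output_dir may or may not end with a slash
    return os.path.join(output_dir, f"{name}{suffix}.xlsx")

# Example paths are placeholders
print(derive_output_path(os.path.join("Downloads", "rawData.csv"), "Output"))
```

With plain string concatenation, a missing or doubled separator in the folder name would silently produce a wrong path, which is what os.path.join() avoids.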

1

Something like:

import os
import glob
import pandas as pd

os.chdir(r'path\to\folder')    # make the folder the working directory
filelist = glob.glob('*.csv')  # list of all CSV files in that folder
for file in filelist:          # loop through the files
    df = pd.read_csv(file)     # add any read_csv options you need
    # Do something with df and create a final_df
    final_df = df              # placeholder: replace with your processed DataFrame
    # Excel file with the same name plus '_output'
    final_df.to_excel(file[:-4] + '_output.xlsx', index=False)

1

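You can run this script inside a for loop:

import os

for file in os.listdir(INPUT_PATH):
    if file.endswith('.csv') or file.endswith('.CSV'):
        INPUT_FILE = INPUT_PATH + '/' + file
        # file[:-4] drops the '.csv' extension, so the dot has to be added back
        OUTPUT_FILE = INPUT_PATH + '/Outputs/' + file[:-4] + '.xlsx'
        # ... run the rest of the script here using INPUT_FILE and OUTPUT_FILE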
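
The loop above assumes that an Outputs folder already exists under INPUT_PATH. Below is a minimal sketch of the same idea that creates the folder first and returns the input/output pairs. The function name csv_jobs and its output_folder parameter are assumptions made for illustration and are not part of the original script.

```python
import os

def csv_jobs(input_path, output_folder="Outputs"):
    """Return (input_file, output_file) pairs for every CSV in input_path (hypothetical helper)."""
    output_path = os.path.join(input_path, output_folder)
    # Create the output folder if it is missing; do nothing if it already exists
    os.makedirs(output_path, exist_ok=True)
    jobs = []
    for file in sorted(os.listdir(input_path)):
        # Compare in lower case so that both '.csv' and '.CSV' match
        if file.lower().endswith(".csv"):
            name = os.path.splitext(file)[0]
            jobs.append((os.path.join(input_path, file),
                         os.path.join(output_path, name + ".xlsx")))
    return jobs
```

Each pair can then be fed to the processing code, for example with `for INPUT_FILE, OUTPUT_FILE in csv_jobs(INPUT_PATH):` followed by the body of the script.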

1 Comment

Thank you for this, this was the easiest way to do it!
0

Try this:

import glob
import os
import pandas as pd

files = glob.glob(INPUT_PATH + "*.csv")

for file in files:
    # Get data
    df = pd.read_csv(file)

    # Clean data
    # your cleaning code

    # Push to excel
    output_name = os.path.basename(file).replace(".csv", "_OUTPUT.xlsx")
    writer = pd.ExcelWriter(OUTPUT_PATH + output_name, engine='xlsxwriter')
