I'm trying to run my script on several .csv files and output the result for each file. A snippet of my code is as follows:

import sys
import os
import logging
import subprocess
import argparse
import pandas as pd
import glob


files = glob.glob('/scratch/*/*.csv')

for file in files:
        df = pd.read_csv(file,delimiter = ',',skiprows=range(1,11))

#do some calculation on each file


#calculate the final value
metric = (max(max(dif_r1a),max(dif_r1c),max(dif_r1g),max(dif_r1t),max(dif_r2a),max(dif_r2c),max(dif_r2g),max(dif_r2t)))

#output the final value for each csv file
print(os.path.basename(file) + ' ' + str(metric))

The output I get is for only a single csv file:

file1.csv 0.25

How do I iterate over the files so that the value is output for every csv file?

Thank you

Is the code you wrote exactly as you posted above? Did you not mean to indent the rest of the code to be in the for loop? Commented Aug 25, 2021 at 14:04

1 Answer

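From the code above, it looks like you create a DataFrame for each .csv file inside the loop, but calculate the final value and print it only once, after the for loop has finished. At that point file and df refer to the last file that was read, which is why you get a single line of output. To get a value for each DataFrame, the calculation and the print need to be inside the for loop. Your code as posted: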

import sys
import os
import logging
import subprocess
import argparse
import pandas as pd
import glob


files = glob.glob('/scratch/*/*.csv')

for file in files:
        df = pd.read_csv(file,delimiter = ',',skiprows=range(1,11))

#do some calculation on each file


#calculate the final value
metric = (max(max(dif_r1a),max(dif_r1c),max(dif_r1g),max(dif_r1t),max(dif_r2a),max(dif_r2c),max(dif_r2g),max(dif_r2t)))

#output the final value for each csv file
print(os.path.basename(file) + ' ' + str(metric))

This is what you have at the moment, but you would want to change it to:

import sys
import os
import logging
import subprocess
import argparse
import pandas as pd
import glob


files = glob.glob('/scratch/*/*.csv')

for file in files:
    df = pd.read_csv(file,delimiter = ',',skiprows=range(1,11))

    #do some calculation on each file


    #calculate the final value
    metric = max(max(dif_r1a), max(dif_r1c), max(dif_r1g), max(dif_r1t),
                 max(dif_r2a), max(dif_r2c), max(dif_r2g), max(dif_r2t))

    #output the final value for each csv file
    print(os.path.basename(file) + ' ' + str(metric))

However, the missing indentation could also just be a result of how the code was formatted when it was pasted into the question.
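If the indentation in your real script turns out to be the problem, one way to make the per-file structure harder to get wrong is to move the calculation into a function and call it once per file. The following is only a minimal sketch: file_metric and metrics_for are hypothetical names, and the calculation (the largest value in any numeric column) is a placeholder assumption standing in for your dif_r1a ... dif_r2t logic, which the question does not show.

```python
import glob
import os

import pandas as pd


def file_metric(path, skiprows=None):
    """Return one summary value for a single CSV file.

    Placeholder calculation (an assumption, not the original logic):
    the largest value found in any numeric column.
    """
    df = pd.read_csv(path, delimiter=",", skiprows=skiprows)
    return df.select_dtypes("number").max().max()


def metrics_for(pattern, skiprows=None):
    """Return a list of (file name, metric) pairs, one per matching file."""
    results = []
    for path in sorted(glob.glob(pattern)):
        # Everything that depends on the current file happens inside the loop.
        results.append((os.path.basename(path), file_metric(path, skiprows)))
    return results


if __name__ == "__main__":
    # Same glob pattern and skiprows as in the question.
    for name, metric in metrics_for("/scratch/*/*.csv", skiprows=range(1, 11)):
        print(name + " " + str(metric))
```

Because each call to file_metric only sees one path, a value computed for one file cannot leak into the next, and the loop prints one line per file.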
