I have a list of five heights, and I want to loop over it to create five separate DataFrames, one per height. Each iteration should build the column names from the current height, read a CSV file using those column names, and then drop the unused columns. I currently have multiple blocks of the same code to do this, but I want to learn how to do it with a loop so I can clean up my script.

I get a NameError: name 'colNames' is not defined.

    i = 0
    height = ['0', '5', '15', '25', '50']
    while i < len(height):
        colNames["height{}".format(i)] = ["A", "B_%s" % height, "C", "D"]
        df["height{}".format(i)] = pd.read_csv("test%s.csv" % height, names = colNames["height{}".format(i)])
        df["height{}".format(i)].drop(labels = ["A", "C"],axis = 1, inplace = True)

        i += 1

Expected results

colNames0 = ["A", "B_0", "C", "D"]
df0 = pd.read_csv("test0.csv", names = colNames0)
df0.drop(labels = ["A", "C"], axis = 1, inplace = True)

...

colNames50 = ["A", "B_50", "C", "D"]
df50 = pd.read_csv("test50.csv", names = colNames50)
df50.drop(labels = ["A", "C"], axis = 1, inplace = True)

  • What is the problem, then? You seem to have a loop. Commented Jan 14, 2022 at 20:50
  • Sorry, I forgot to type in the error message. I get an error saying it's not defined, but that's the variable I want to define. Commented Jan 14, 2022 at 21:12

1 Answer

Trying to name separate DataFrames in this way is a bit unwieldy in Python, but here is how I might go about writing a loop for the problem you pose:

import pandas as pd

dflist = []

for num, height in enumerate(['0', '5', '15', '25', '50']):
    b_col = 'B_{}'.format(height)  # column name for this height, e.g. 'B_0'
    dflist.append(pd.read_csv('test{}.csv'.format(height), names=['A', b_col, 'C', 'D'])[[b_col, 'D']])

Instead of DataFrames named df0, df5, ..., you get a list of DataFrames that you access by position (dflist[0], dflist[1], and so on). Unless there is a reason to save the various column names, you can name your columns directly in the call to pd.read_csv. Additionally, selecting only the columns you want to keep at the end of the line is a little more streamlined than dropping the others in a separate command. As a side note,

df['newname'] = value

is a way to make a new column in an existing DataFrame, not a way to define a DataFrame.
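If you would rather look up each DataFrame by its height than by its position in a list, a dictionary keyed by height is one option. The following is only a minimal sketch: the in-memory `sample_csv` text is a hypothetical stand-in for the test0.csv, test5.csv, ... files, and the names `frames` and `b_col` are my own, not something from the question.

```python
import io

import pandas as pd

heights = ["0", "5", "15", "25", "50"]

# Hypothetical stand-in for test0.csv, test5.csv, ...:
# four unnamed columns and two data rows.
sample_csv = "1,2,3,4\n5,6,7,8\n"

frames = {}  # the dictionary must exist before frames[height] = ... runs
for height in heights:
    b_col = "B_{}".format(height)
    df = pd.read_csv(io.StringIO(sample_csv), names=["A", b_col, "C", "D"])
    frames[height] = df[[b_col, "D"]]  # keep only the wanted columns

print(list(frames["50"].columns))  # ['B_50', 'D']
```

With real files you would pass `"test{}.csv".format(height)` to `pd.read_csv` in place of the `io.StringIO` object, and `frames["50"]` then plays the role of the `df50` variable from the question.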

You get a NameError because the syntax

colNames[x] = value

assigns into a pre-existing object named "colNames"; it does not create one. Since nothing named colNames was defined before the loop, Python has no object to assign into.
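To make the NameError concrete, here is a small sketch in plain Python (no pandas needed). The names `undefined_names`, `col_names`, and `error_message` are hypothetical and chosen only for illustration; the point is that item assignment works once the name is bound to a container such as an empty dict.

```python
heights = ["0", "5", "15", "25", "50"]

# Item assignment on a name that was never bound raises NameError.
try:
    undefined_names["0"] = ["A", "B_0", "C", "D"]
except NameError as err:
    error_message = str(err)

# Binding the name to an empty dict first makes the same assignment work.
col_names = {}
for height in heights:
    col_names[height] = ["A", "B_{}".format(height), "C", "D"]

print(col_names["50"])  # ['A', 'B_50', 'C', 'D']
```

The same applies to `df` in the question's loop: it would also need to be defined (for example as an empty dict) before `df[...] = ...` can run.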

5 Comments

Thanks! This is a much cleaner way to write it. However, I'm getting an index out of range error. This is likely because the first two rows contain stray spaces that don't belong. Is there a way to select only row 3 to the end for columns B and D?
If I understand you correctly, something like dflist[num] = pd.read_csv('test{}.csv'.format(height), names=['A', 'B{}'.format(height), 'C', 'D']).loc[3:, ['B{}'.format(height), 'D']] should work.
That worked perfectly. But I am still getting an index out of range. Turns out the list assignment index out of range is from dflist[height]. Without [height] the code works but I want it to be assigned to a different variable for each new height (e.g. dflist0, dflist5, dflist15, dflist25, and dflist50.)
I see the issue! It was my mistake, I have corrected the post and the code should work now.
Is it possible to append onto the same dataframe?
