How do I loop variable names based on values in a list

Question

I have a list of five heights, and I want to loop over it to create five separate DataFrames, one per height. For each height, the loop should build a list of column names that includes the height, read the matching CSV file with those column names, and drop the unused columns. I currently repeat the same block of code for each height, but I want to learn how to do this with a loop so I can clean up my script.

I get a NameError: name 'colNames' is not defined.

    i = 0
    height = ['0', '5', '15', '25', '50']
    while i < len(height):
        colNames["height{}".format(i)] = ["A", "B_%s" % height, "C", "D"]
        df["height{}".format(i)] = pd.read_csv("test%s.csv" % height, names = colNames["height{}".format(i)])
        df["height{}".format(i)].drop(labels = ["A", "C"],axis = 1, inplace = True)

        i += 1

Expected results

colNames0 = ["A", "B_0", "C", "D"]
df0 = pd.read_csv("test0.csv", names = colNames0)
df0.drop(labels = ["A", "C"], axis = 1, inplace = True)

...

colNames50 = ["A", "B_50", "C", "D"]
df50 = pd.read_csv("test50.csv", names = colNames50)
df50.drop(labels = ["A", "C"], axis = 1, inplace = True)

Sorry, I forgot to type in the error message. I get an error saying it's not defined, but that's the variable that I want to define.
– Yogi, Commented Jan 14, 2022 at 21:12

Tanner Eastmond · Accepted Answer · 2022-01-14 23:05:20Z

Trying to name separate DataFrames in this way is a bit unwieldy in Python, but here is how I might go about writing a loop for the problem you pose:

import pandas as pd

dflist = []

# Read each file with its own column names, then keep only the B and D columns.
for num, height in enumerate(['0', '5', '15', '25', '50']):
    dflist.append(pd.read_csv('test{}.csv'.format(height), names=['A', 'B{}'.format(height), 'C', 'D'])[['B{}'.format(height), 'D']])

You would not have DataFrames named df0, df5, ..., but will rather have a list of DataFrames. Unless there is a reason to save the various column names, you can just name your columns directly in the call to pd.read_csv. Additionally, selecting only the columns you want to keep at the end of the line is a little more streamlined than dropping the others in a separate command. As a side note,

df['newname'] = value

is a way to make a new column in an existing DataFrame, not a way to define a DataFrame.
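If looking each DataFrame up by its height matters more than its position in a list, a dictionary keyed by height is one possible alternative. The sketch below is only an illustration under stated assumptions: the names `dfs` and `data_dir` are hypothetical, the column is named `B_<height>` as in the question, and the placeholder CSV files are written to a temporary directory purely so the example runs on its own.

```python
import os
import tempfile
import pandas as pd

heights = ['0', '5', '15', '25', '50']
dfs = {}  # hypothetical name: maps each height string to its DataFrame

with tempfile.TemporaryDirectory() as data_dir:
    for height in heights:
        path = os.path.join(data_dir, 'test{}.csv'.format(height))
        # Placeholder data so the sketch runs on its own; the real files already exist.
        with open(path, 'w') as f:
            f.write('1,2,3,4\n5,6,7,8\n')

        col = 'B_{}'.format(height)
        # Name the columns while reading, then keep only the two that are needed.
        dfs[height] = pd.read_csv(path, names=['A', col, 'C', 'D'])[[col, 'D']]
```

With this layout, `dfs['50']` plays the role that `df50` had in the question, and iterating over `dfs.items()` visits every height without any generated variable names.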

You are getting a NameError because the syntax

colNames[x] = value

assumes you are trying to assign the value to a pre-existing object named "colNames".
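A small sketch of that behavior, assuming a dictionary is an acceptable container for the column-name lists (the names `col_names` and `message` are made up for this illustration): item assignment fails while the name is unbound and succeeds once the dictionary exists.

```python
# Item assignment needs an existing container on the left-hand side.
try:
    col_names["height0"] = ["A", "B_0", "C", "D"]  # col_names does not exist yet
except NameError as err:
    message = str(err)  # the error text mentions the missing name

# Creating the dict first makes the same kind of assignment valid.
col_names = {}
for height in ['0', '5', '15', '25', '50']:
    col_names[height] = ["A", "B_%s" % height, "C", "D"]
```

Note that the loop formats each column name with the single `height` value, whereas the code in the question passes the whole `height` list to `%`.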

edited Jan 14, 2022 at 23:05

answered Jan 14, 2022 at 21:12

5 Comments

Yogi Over a year ago

Thanks! This is a much cleaner way to write it. However, I'm having an issue with index out of range. This is likely because the first two rows contain spaces that do not belong. Is there a way to select only from row 3 to the end for columns B and D?

Tanner Eastmond Over a year ago

If I understand you correctly, something like

dflist[num] = pd.read_csv('test{}.csv'.format(height), names=['A', 'B{}'.format(height), 'C', 'D']).loc[3:, ['B{}'.format(height), 'D']]

should work.
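For reference, a minimal sketch of what the `.loc[3:, ...]` selection does, assuming the default integer index that `pd.read_csv` produces. The in-memory `csv_text` is a hypothetical stand-in for one of the test files, and `subset` is a name chosen for this example.

```python
import io
import pandas as pd

# Hypothetical stand-in for one of the test files: five rows, four columns.
csv_text = "a0,b0,c0,d0\na1,b1,c1,d1\na2,b2,c2,d2\na3,b3,c3,d3\na4,b4,c4,d4\n"
height = '5'
col = 'B{}'.format(height)

df = pd.read_csv(io.StringIO(csv_text), names=['A', col, 'C', 'D'])

# .loc slices by index label, so 3: keeps the rows labeled 3 and 4
# and leaves out the rows labeled 0, 1, and 2.
subset = df.loc[3:, [col, 'D']]
```

Because the slice is label-based, the result keeps its original labels (3, 4) rather than being renumbered from 0.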

Yogi Over a year ago

That worked perfectly. But I am still getting an index out of range. Turns out the list assignment index out of range is from dflist[height]. Without [height] the code works but I want it to be assigned to a different variable for each new height (e.g. dflist0, dflist5, dflist15, dflist25, and dflist50.)

Tanner Eastmond Over a year ago

I see the issue! It was my mistake, I have corrected the post and the code should work now.

Yogi Over a year ago

Is it possible to append onto the same dataframe?
