How to read Pandas Dataframe column information through string variable iteration

Question

I have a Pandas DataFrame with columns 'Var_1_Access', 'Var_2_Access',... 'Var_N_Access' and there is other information/columns between these columns that I would like to look for. For example:

import pandas as pd

df = pd.read_csv('File')  # read_csv already returns a DataFrame
print(df.columns)


Index(['Var_1', 'Var_1_Access', 'Var_1_comp1', 'Var_1_comp2', 'Var_2', 'Var_2_Access', 'Var_2_comp1', 'Var_2_comp2'], dtype='object')

I would like to write a for loop that goes through the range of N and pulls out 'Var_1_Access' up to 'Var_N_Access'.

I've tried:

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.f"Var_%i_Access" % i)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_{i}_Access)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_[i]_Access)

These all result in errors. Yes it would be possible to just write them in as N is small, but N will grow large and I really don't want to have to type every variable name in individually, and would rather index it. The end goal is to read the Pandas dataframe information for N variables and have the Access_Matrix be of shape [len(Var_N_Access), N]. Also, there may be the need to add more information between these specific variable names later, so that is the reason I would like to index it by string variable names vs. column indices and look for a pattern.

I can provide more information if necessary, but I think that this covers the necessary information.

I apologize for the format; I'm not sure how to change it so the post reads as two sections of code and three text blocks.
– cwalde, Commented Apr 6, 2020 at 16:48
Thank you for fixing the format. I'll be sure to look back to this post in the future.
– cwalde, Commented Apr 6, 2020 at 16:57

David Buck · Accepted Answer · 2020-04-06 17:10:59Z

1

Attribute ('.') notation only works with a literal column name, so it cannot take a name built at run time. Indexing with square brackets accepts any string, including an f-string:

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df[f"Var_{i}_Access"])
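The loop leaves a list of N Series, not yet the [len(Var_N_Access), N] layout the question asks for. One way to get there, sketched on made-up sample data (the values and the name access_df are assumptions for illustration), is to concatenate the Series as columns:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question.
df = pd.DataFrame({
    "Var_1": [1, 2, 3],
    "Var_1_Access": [10, 20, 30],
    "Var_1_comp1": [0, 0, 0],
    "Var_2": [4, 5, 6],
    "Var_2_Access": [40, 50, 60],
    "Var_2_comp1": [1, 1, 1],
})
N = 2

# Bracket lookup takes any string, so the name can be built at run time.
Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df[f"Var_{i}_Access"])

# Each list element is a Series; concatenating along axis=1 places them
# side by side as columns of a new DataFrame.
access_df = pd.concat(Access_Matrix, axis=1)
print(access_df.shape)  # (3, 2): rows by N
```

Each column keeps its original name, so the result can still be indexed by 'Var_1_Access' and so on.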

Or, perhaps a better approach would be to build up a list of the column names and extract them into a new dataframe in one go from df, e.g.:

cols = [f"Var_{i}_Access" for i in range(1, N+1)]
all_cols = df[cols]
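If a plain array of shape [len(Var_N_Access), N] is wanted, the list-based selection can probably be finished off with to_numpy(). The sketch below uses made-up sample data; the missing-column check is an addition of mine, included because selecting with a list raises KeyError when any name is absent.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question.
df = pd.DataFrame({
    "Var_1": [1, 2, 3],
    "Var_1_Access": [10, 20, 30],
    "Var_2": [4, 5, 6],
    "Var_2_Access": [40, 50, 60],
})
N = 2

# Build every wanted column name up front.
cols = [f"Var_{i}_Access" for i in range(1, N + 1)]

# Report absent columns explicitly before selecting.
missing = [c for c in cols if c not in df.columns]
if missing:
    raise KeyError(f"columns not found: {missing}")

# Selecting with a list returns a DataFrame; to_numpy() gives a 2-D array
# with one row per record and one column per variable.
access_matrix = df[cols].to_numpy()
print(access_matrix.shape)  # (3, 2)
```

Because the selection is by name, extra columns added between the Access columns later do not affect the result.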

edited Apr 6, 2020 at 17:10

answered Apr 6, 2020 at 16:52

David Buck


3 Comments

cwalde Over a year ago

Brilliant! Worked like a charm, I'll accept the answer as soon as stack lets me. Thank you!

cwalde Over a year ago

I started with that approach but couldn't figure out how to write the correct syntax to pull the data from the dataframe. How would you do it that way?

David Buck Over a year ago

I've added some code to do that to the answer. To be honest, the other answer using a regex to filter the dataframe is probably the best approach to achieve the same result in a single command.

Vishnudev Krishnadas · Answer · 2020-04-06 17:15:59Z

1

Use pandas.DataFrame.filter

It selects the columns whose names match a regular expression and returns the filtered DataFrame:

access_df = df.filter(regex=r'Var_\d+_Access')

To restrict the match to variables 1 through N (this character-class form only works for N up to 9):

access_df = df.filter(regex=f'Var_[1-{N}]_Access')

This selects all matching columns in a single call, with no explicit loop.
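A lone \d and a character class such as [1-9] each match exactly one digit, so they stop working once N reaches 10. A minimal sketch of two alternatives follows, on a made-up 12-variable frame; the names N_total, pattern, all_access, first_n and single_digit are mine, and the explicit alternation is just one way to express a numeric range in a regex.

```python
import pandas as pd

# Hypothetical frame with 12 variables, each with an Access column.
N_total = 12
data = {}
for i in range(1, N_total + 1):
    data[f"Var_{i}"] = [i]
    data[f"Var_{i}_Access"] = [i * 10]
df = pd.DataFrame(data)

# Every Access column, whatever the number of digits in the index.
all_access = df.filter(regex=r"^Var_\d+_Access$")

# Only variables 1..N: spell the allowed numbers out as an alternation,
# since a character class cannot describe a multi-digit range.
N = 10
pattern = "^Var_(" + "|".join(str(i) for i in range(1, N + 1)) + ")_Access$"
first_n = df.filter(regex=pattern)

# For comparison: a single \d misses Var_10_Access and above.
single_digit = df.filter(regex=r"Var_\d_Access")

print(all_access.shape[1], first_n.shape[1], single_digit.shape[1])
```

The ^ and $ anchors matter here because filter matches anywhere in the column name; without them, names that merely contain the pattern would also be selected.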

edited Apr 6, 2020 at 17:15

answered Apr 6, 2020 at 16:56

Vishnudev Krishnadas
