
I have a Pandas DataFrame with columns 'Var_1_Access', 'Var_2_Access', ..., 'Var_N_Access'. Other columns sit between them, and I would like to pick out only the '_Access' columns. For example:

import pandas as pd

df = pd.read_csv('File')  # read_csv already returns a DataFrame
print(df.columns)


Index(['Var_1', 'Var_1_Access', 'Var_1_comp1', 'Var_1_comp2', 'Var_2', 'Var_2_Access', 'Var_2_comp1', 'Var_2_comp2'], dtype='object')

I would like to write a for loop that goes through the range of N and pulls out 'Var_1_Access' up to 'Var_N_Access'.

I've tried:

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.f"Var_%i_Access" % i)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_{i}_Access)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_[i]_Access)

These all result in errors. While N is small I could type the names out by hand, but N will grow large, so I would rather build the names programmatically. The end goal is to read the data for N variables into an Access_Matrix of shape [len(Var_N_Access), N]. More columns may be added between these variables later, which is why I want to select by a name pattern rather than by column position.

I can provide more information if necessary, but I think that this covers the necessary information.

  • I apologize for the format, not sure how to change it to an acceptable format so the post reads as two sections of code and three text blocks. Commented Apr 6, 2020 at 16:48
  • Thank you for fixing the format. I'll be sure to look back to this post in the future. Commented Apr 6, 2020 at 16:57

2 Answers


You won't be able to do it with '.' notation, because attribute access needs a literal name and cannot take a computed string. Square brackets accept any string, so you can build the column name with an f-string:

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df[f"Var_{i}_Access"])

Or, perhaps a better approach would be to build up a list of the column names and extract them into a new dataframe in one go from df, e.g.:

cols = [f"Var_{i}_Access" for i in range(1, N+1)]
all_cols = df[cols]
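
To get the shape the question asks for, [len(Var_N_Access), N], the selected columns can be converted to a NumPy array. Here is a minimal sketch, assuming a small made-up DataFrame in place of the CSV file; the values and N = 2 are illustrative only, and `access_matrix` is a name chosen for this example:

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; column names follow the question.
df = pd.DataFrame({
    "Var_1": [1, 2, 3],
    "Var_1_Access": [10, 20, 30],
    "Var_1_comp1": [0, 0, 0],
    "Var_2": [4, 5, 6],
    "Var_2_Access": [40, 50, 60],
    "Var_2_comp1": [1, 1, 1],
})
N = 2  # assumed number of variables

# Build the column names, select them in one go, and convert to an array.
cols = [f"Var_{i}_Access" for i in range(1, N + 1)]
access_matrix = df[cols].to_numpy()  # one row per DataFrame row, one column per variable

print(access_matrix.shape)
```

Selecting with a list of names keeps the columns in the order of the list, so column i - 1 of the array corresponds to Var_i_Access regardless of where it sits in the file.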

3 Comments

Brilliant! Worked like a charm, I'll accept the answer as soon as stack lets me. Thank you!
I started with that approach but couldn't figure out how to write the correct syntax to pull the data from the dataframe. How would you do it that way?
I've added some code to do that to the answer. To be honest, the other answer using a regex to filter the dataframe is probably the best approach to achieve the same result in a single command.

Use pandas.DataFrame.filter

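It selects the columns whose names match a regular expression and returns a new DataFrame containing only those columns. Use a raw string so that \d is passed to the regex engine unchanged, and \d+ so that multi-digit numbers also match:

access_df = df.filter(regex=r'Var_\d+_Access')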

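To restrict the match to variables 1 through N, a character class works only while N is a single digit (N <= 9), because [1-{N}] matches exactly one character:

access_df = df.filter(regex=f'Var_[1-{N}]_Access')

For N >= 10, build the list of column names explicitly instead.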

This avoids an explicit Python loop and selects all matching columns in a single call.
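
When N can exceed 9, one option is to match any number with \d+ and then keep only the columns whose number falls in range. The sketch below assumes a made-up frame with 12 variables; `access_columns` is a hypothetical helper written for this example, not a pandas function:

```python
import re
import pandas as pd

# Hypothetical frame with 12 variables to show the multi-digit case.
data = {}
for i in range(1, 13):
    data[f"Var_{i}"] = [i]
    data[f"Var_{i}_Access"] = [i * 10]
df = pd.DataFrame(data)
N = 11  # assumed upper bound on the variable number

ACCESS_PATTERN = re.compile(r"^Var_(\d+)_Access$")

def access_columns(columns, n):
    """Return the 'Var_i_Access' names with 1 <= i <= n, in their original order."""
    selected = []
    for name in columns:
        match = ACCESS_PATTERN.match(name)
        if match and 1 <= int(match.group(1)) <= n:
            selected.append(name)
    return selected

access_df = df[access_columns(df.columns, N)]
print(list(access_df.columns))
```

Comparing the captured number as an integer sidesteps the single-character limit of a class such as [1-9], at the cost of a short loop over the column names.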
