Return custom variables for each dataframe in pandas

Question

I feel like this is a super simple question; I just don't have the vocabulary to articulate it in Google. Here goes:

I have a dataframe that I want to slice and split into several dataframes, so I created a function and a for loop for this.

Sample table

     col1 col2 col3 col4 col5
row1 A    Hi   my   name is
row2 A    Bye  see  you  later
row3 B    Bike on   side walk
row4 B    Car  on   str  drive
row5 C    Dog  on   grs  poop

My code looks like this:

list_ = list(df['col1'].drop_duplicates())
for i in list_:
    dataframe_creator(i)

My function looks like this:

def dataframe_creator(i):
    df = df[df['col1'] == i]
    return df

The result is that it creates a dataframe for each slice but assigns every one to the same variable, which isn't what I want. I want a separate variable for each iteration. Basically, I'd like to end up with three dataframes named dfA, dfB, and dfC, each holding one slice.

How about a dict: {f'df{k}': v for k, v in df.groupby('col1')}, with keys dfA, dfB, etc. and the values being the associated DataFrame slices?
– Chris Adams, Commented Mar 11, 2020 at 15:40
How about a list comprehension to generate a list of DataFrames? [dataframe_creator(i) for i in list_]
– dspencer, Commented Mar 11, 2020 at 15:41
Check out this post for why a dict is best for this sort of thing.
– Chris Adams, Commented Mar 11, 2020 at 15:45
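
The dictionary comprehension suggested in the comments can be tried on the sample table as follows. This is only a minimal sketch: the DataFrame is rebuilt by hand from the sample table, and the names `df` and `frames` are assumptions made for illustration. Only the `groupby` comprehension itself comes from the comment.

```python
import pandas as pd

# Sample table from the question, rebuilt by hand (an assumption for this sketch).
df = pd.DataFrame(
    [
        ["A", "Hi", "my", "name", "is"],
        ["A", "Bye", "see", "you", "later"],
        ["B", "Bike", "on", "side", "walk"],
        ["B", "Car", "on", "str", "drive"],
        ["C", "Dog", "on", "grs", "poop"],
    ],
    columns=["col1", "col2", "col3", "col4", "col5"],
    index=["row1", "row2", "row3", "row4", "row5"],
)

# Iterating over a groupby yields (group key, sub-DataFrame) pairs,
# so each distinct col1 value becomes one dictionary entry.
frames = {f"df{key}": group for key, group in df.groupby("col1")}

print(list(frames))   # the dictionary keys
print(frames["dfA"])  # the rows whose col1 is "A"
```

Each slice is then reached by key, for example `frames["dfB"]`, rather than through a separately named variable.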

Martijniatus · Accepted Answer · 2020-03-11 15:57:08Z

A dictionary would be ideal for this case:

df_slicer = {}
for i in df.col1.unique():  # visit each distinct col1 value once
    df_slicer[i] = df[df.col1 == i]

# dfA:
df_slicer['A']
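
To work with every slice in turn, the dictionary can be walked with `.items()`. The sketch below assumes a reduced two-column stand-in for the question's DataFrame, and `row_counts` is a hypothetical name used only to show the loop doing something with each slice.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame (two columns only).
df = pd.DataFrame(
    {
        "col1": ["A", "A", "B", "B", "C"],
        "col2": ["Hi", "Bye", "Bike", "Car", "Dog"],
    }
)

# One entry per distinct col1 value, in order of first appearance.
df_slicer = {key: df[df["col1"] == key] for key in df["col1"].unique()}

# Walk the dictionary: each key maps to the rows sharing that col1 value.
row_counts = {}
for key, frame in df_slicer.items():
    row_counts[key] = len(frame)
    print(key, len(frame))
```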


1 Comment

David 54321 Over a year ago

Thank you for the code, this helped out a lot. Can you extend it further? How can I iterate through the dictionary to create dfA, dfB, and dfC to have 3 separate dataframes outside of a dictionary?

David 54321 · Accepted Answer · 2020-03-11 19:23:44Z

Here is what I ultimately did to go from slices of a dataframe to separate dataframes stored in their own variables.

Create the raw data:

data = [['A', 'Hi', 'my', 'name', 'is'],
        ['A', 'Bye', 'see', 'you', 'later'],
        ['B', 'Bike', 'on', 'side', 'walk'],
        ['B', 'Car', 'on', 'str', 'drive'],
        ['C', 'Dog', 'on', 'grs', 'poop']]

Convert it to a dataframe:

import pandas as pd

test_df = pd.DataFrame(data)

Create a list of the unique values in the first column:

list_ = list(test_df[0].drop_duplicates())

Create the dictionary of slices

df_slicer = {}
for i in list_:
    df_slicer[i] = test_df[test_df[0] == i]

Create my variables based on the keys in the dictionary:

for key, val in df_slicer.items():
    # Runs e.g. "dfA=val", binding the name at module level
    exec('df' + key + '=val')

At the end, dfA, dfB, and dfC are each a dataframe holding their respective slice.
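
If the keys are known in advance, the same three names can also be bound without `exec` by unpacking from the dictionary. This is a sketch under that assumption (keys exactly 'A', 'B', 'C'), not a replacement for the approach above. `test_df` is rebuilt here from a shortened, hypothetical data set so the block runs on its own.

```python
import pandas as pd

# Shortened stand-in for test_df (two columns only, an assumption for this sketch).
test_df = pd.DataFrame(
    [["A", "Hi"], ["A", "Bye"], ["B", "Bike"], ["B", "Car"], ["C", "Dog"]]
)

# Dictionary of slices keyed by the value in column 0.
df_slicer = {key: test_df[test_df[0] == key] for key in test_df[0].unique()}

# Plain unpacking binds one name per key; this also works inside a function,
# where exec cannot create local variables.
dfA, dfB, dfC = (df_slicer[key] for key in ("A", "B", "C"))

print(dfB)
```

The unpacking raises a `KeyError` if one of the expected keys is missing, which makes a mismatch between the data and the variable names visible immediately.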
