I feel like this is a super simple question; I just don't have the vocabulary to articulate it in Google. Here goes:

I have a dataframe that I want to slice and split into several dataframes, so I created a function and a for loop for this.

Sample table

     col1 col2 col3 col4 col5
row1 A    Hi   my   name is
row2 A    Bye  see  you  later
row3 B    Bike on   side walk
row4 B    Car  on   str  drive
row5 C    Dog  on   grs  poop

My code is like this

list_ = list(df['col1'].drop_duplicates())
for i in list_:
    dataframe_creator(i)

My function looks like this

def dataframe_creator(i):
    df = df[df['col1'] == i]
    return df

The result is that it just creates a dataframe for each slice and then assigns it to the same variable, which isn't what I want. I want a variable for each iteration. Basically, I'd like to end up with 3 dataframes labelled dfA, dfB, and dfC, each holding one slice.

  • How about a dict: {f'df{k}': v for k, v in df.groupby('col1')}, with keys dfA, dfB, etc. and the values being the associated DataFrame slices? Commented Mar 11, 2020 at 15:40
  • How about a list comprehension to generate a list of DataFrames: [dataframe_creator(i) for i in list_]? Commented Mar 11, 2020 at 15:41
  • Check out this post for why a dict is best for this sort of thing. Commented Mar 11, 2020 at 15:45
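
The dict suggestion in the first comment can be sketched roughly as follows. This assumes the sample table from the question is rebuilt by hand; the name `frames` is only an illustration. Iterating over a `groupby` yields (key, sub-frame) pairs, so the comprehension produces one entry per distinct value in `col1`:

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption for illustration).
df = pd.DataFrame(
    [
        ["A", "Hi", "my", "name", "is"],
        ["A", "Bye", "see", "you", "later"],
        ["B", "Bike", "on", "side", "walk"],
        ["B", "Car", "on", "str", "drive"],
        ["C", "Dog", "on", "grs", "poop"],
    ],
    columns=["col1", "col2", "col3", "col4", "col5"],
    index=["row1", "row2", "row3", "row4", "row5"],
)

# Iterating over a groupby yields (group key, sub-DataFrame) pairs.
frames = {f"df{key}": group for key, group in df.groupby("col1")}

print(sorted(frames))  # ['dfA', 'dfB', 'dfC']
print(frames["dfA"])
```

Each slice is then available as `frames['dfA']`, `frames['dfB']`, and so on, without creating separate variable names.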

2 Answers

A dictionary would be ideal for this case:

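df_slicer = {}
# Loop over the unique values of col1 rather than every row,
# so each slice is computed only once.
for i in df['col1'].unique():
    df_slicer[i] = df[df['col1'] == i]

# dfA:
df_slicer['A']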
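
If you'd rather keep the loop inside a function, as in the question, here is a minimal sketch. `split_by_column` is a hypothetical helper name, not a pandas function. It takes the DataFrame as an argument instead of reassigning the outer `df`, and returns the dictionary of slices. The two-column table is a cut-down stand-in for the question's data:

```python
import pandas as pd


def split_by_column(frame, column):
    """Return {value: rows of `frame` where `column` equals that value}."""
    slices = {}
    for value in frame[column].unique():
        # The boolean mask keeps only the rows matching this value.
        slices[value] = frame[frame[column] == value]
    return slices


# Small stand-in for the question's table (assumed for illustration).
df = pd.DataFrame(
    {
        "col1": ["A", "A", "B", "B", "C"],
        "col2": ["Hi", "Bye", "Bike", "Car", "Dog"],
    }
)

df_slicer = split_by_column(df, "col1")

# Walk through the slices without needing one variable per slice.
for key, part in df_slicer.items():
    print(key, len(part))
```

Because the function never assigns to `df`, the original DataFrame is left untouched and every slice stays reachable through its key.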
1 Comment

Thank you for the code, this helped out a lot. Can you extend it further? How can I iterate through the dictionary to create dfA, dfB, and dfC to have 3 separate dataframes outside of a dictionary?

Here is what I did to ultimately go from slices of a dataframe to separate dataframes stored in variables.

Create my dataframe:

data = [['A', 'Hi', 'my', 'name', 'is'],
        ['A', 'Bye', 'see', 'you', 'later'],
        ['B', 'Bike', 'on', 'side', 'walk'],
        ['B', 'Car', 'on', 'str', 'drive'],
        ['C', 'Dog', 'on', 'grs', 'poop']]

Set it as a dataframe

import pandas as pd

test_df = pd.DataFrame(data)

Create my list of unique column1 names

list_ = list(test_df[0].drop_duplicates())

Create the dictionary of slices

df_slicer = {}
for i in list_:
    df_slicer[i] = test_df[test_df[0] == i]

Create my variables based on the key value in the dictionary

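# exec runs the statements 'dfA=val', 'dfB=val', ... built from each key.
# The new names are created only when this runs at module level, not inside a function.
for key, val in df_slicer.items():
    exec('df' + key + '=val')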

At the end, dfA, dfB, and dfC are each a dataframe holding its respective slice.
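
As an alternative to `exec`, when the keys are known in advance, plain assignment from the dictionary gives the same three variables without building code from strings. A minimal sketch, assuming the data from the question's sample table; the integer column label `0` is what `pd.DataFrame(data)` assigns when no column names are given:

```python
import pandas as pd

data = [
    ["A", "Hi", "my", "name", "is"],
    ["A", "Bye", "see", "you", "later"],
    ["B", "Bike", "on", "side", "walk"],
    ["B", "Car", "on", "str", "drive"],
    ["C", "Dog", "on", "grs", "poop"],
]
test_df = pd.DataFrame(data)

# One slice per unique value in the first column (label 0).
df_slicer = {key: test_df[test_df[0] == key] for key in test_df[0].unique()}

# Explicit assignment: no exec, and the names are visible to linters and editors.
dfA, dfB, dfC = df_slicer["A"], df_slicer["B"], df_slicer["C"]

print(dfB)
```

This only works when the set of keys is fixed; if the keys vary with the data, keeping the slices in the dictionary is usually the simpler route.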
