Looping over a pandas DataFrame and returning multiple DataFrames

Question

This question builds on an earlier one: Split dataframe into grouped chunks

I have been trying to break a big dataset into different chunks and was using the solution proposed in the question above to do this. This is the code I'm referring to:

import pandas as pd

df = pd.DataFrame(data=['a', 'a', 'b', 'c', 'a', 'a', 'b', 'v', 'v', 'f'], columns=['A'])

def iter_by_group(df, column, num_groups):
    groups = []
    for i, group in df.groupby(column):
        groups.append(group)
        # Once num_groups groups are collected, emit them as one chunk
        if len(groups) == num_groups:
            yield pd.concat(groups)
            groups = []
    # Emit any leftover groups as a final, smaller chunk
    if groups:
        yield pd.concat(groups)

for group in iter_by_group(df, 'A', 2):
    print(group)

The result of the print is:

The problem is that I can't then access each of these chunks individually. If I just call group after the loop, it returns only the last chunk, and if I use return instead of print in the last for loop, I get only the first chunk. How can I alter the code so that I can access each chunk individually?
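
For reference, a minimal sketch of one way to keep every chunk reachable while still using the generator from the question: collect its output into a list. The names chunks, first_chunk and last_chunk are illustrative only and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame(data=['a', 'a', 'b', 'c', 'a', 'a', 'b', 'v', 'v', 'f'], columns=['A'])

def iter_by_group(df, column, num_groups):
    # Generator from the question: yields num_groups groups at a time
    groups = []
    for _, group in df.groupby(column):
        groups.append(group)
        if len(groups) == num_groups:
            yield pd.concat(groups)
            groups = []
    if groups:
        yield pd.concat(groups)

# Materialize every chunk so each one can be looked up by position later
chunks = list(iter_by_group(df, 'A', 2))

first_chunk = chunks[0]   # rows for keys 'a' and 'b'
last_chunk = chunks[-1]   # rows for key 'v'
```

The loop variable group is rebound on every iteration, which is why only the last chunk is left after the loop ends; a list keeps a reference to each chunk instead.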

ALollz · Accepted Answer · 2019-07-24 19:38:55Z

Use pd.factorize to form groups, then store the grouped object in a dict. Here the groups are based on the order of occurrence. Add sort=True to pd.factorize to form groups based on the sorted key ordering instead.

N = 2
col = 'A'

d = dict(tuple(df.groupby((pd.factorize(df[col])[0]+N)//N)))

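Output (this is the grouping produced with sort=True; in order of occurrence, d[2] would instead hold c and v, and d[3] would hold f):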

d[1]
#   A
#0  a
#1  a
#2  b
#4  a
#5  a
#6  b

d[2]
#   A
#3  c
#9  f

d[3]
#   A
#7  v
#8  v
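
To make the one-liner easier to follow, here is a sketch that splits it into intermediate steps. It assumes the df from the question and passes sort=True, the option mentioned above; the names codes, uniques and chunk_ids are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(data=['a', 'a', 'b', 'c', 'a', 'a', 'b', 'v', 'v', 'f'], columns=['A'])
N = 2

# factorize assigns each distinct key an integer code (sorted key order here)
codes, uniques = pd.factorize(df['A'], sort=True)

# Integer division maps N consecutive codes to one chunk id, starting at 1
chunk_ids = (codes + N) // N

# Each (chunk id, sub-frame) pair produced by groupby becomes one dict entry
d = dict(tuple(df.groupby(chunk_ids)))

for chunk_id, chunk in d.items():
    print(chunk_id, chunk['A'].tolist())
```

Because the dict is keyed by chunk id, any chunk can be retrieved later with d[1], d[2] and so on, without re-running the grouping.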

edited Jul 24, 2019 at 19:38

answered Jul 24, 2019 at 19:21

ALollz
