This question builds on an earlier one: Split dataframe into grouped chunks
I have been trying to split a large dataset into chunks, using the solution proposed in that question. This is the code I'm referring to:
import pandas as pd

df = pd.DataFrame(data=['a', 'a', 'b', 'c', 'a', 'a', 'b', 'v', 'v', 'f'], columns=['A'])

def iter_by_group(df, column, num_groups):
    # Collect whole groups and yield them num_groups at a time
    groups = []
    for i, group in df.groupby(column):
        groups.append(group)
        if len(groups) == num_groups:
            yield pd.concat(groups)
            groups = []
    # Yield whatever is left over as a final, smaller chunk
    if groups:
        yield pd.concat(groups)

for group in iter_by_group(df, 'A', 2):
    print(group)
The result of the print is:
   A
0  a
1  a
4  a
5  a
2  b
6  b
   A
3  c
9  f
   A
7  v
8  v
The problem is that I can't then access each of these chunks individually. If I reference group after the loop, I only get the last chunk, and if I use return instead of print in the final for loop, I only get the first chunk. How can I change the code so that I can access each chunk individually?
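
To make "access each chunk individually" concrete, here is a minimal sketch of the kind of usage I'm after. It simply materialises the generator into a list so that chunks can be indexed. The name chunks is made up for illustration, and the approach assumes that all chunks fit in memory at once, which may not hold for a very large dataset.

```python
import pandas as pd

df = pd.DataFrame(data=['a', 'a', 'b', 'c', 'a', 'a', 'b', 'v', 'v', 'f'], columns=['A'])

def iter_by_group(df, column, num_groups):
    # Same generator as above: yields num_groups whole groups at a time
    groups = []
    for _, group in df.groupby(column):
        groups.append(group)
        if len(groups) == num_groups:
            yield pd.concat(groups)
            groups = []
    if groups:
        yield pd.concat(groups)

# Assumption: every chunk fits in memory, so the generator can be
# consumed into a list and each chunk addressed by position.
chunks = list(iter_by_group(df, 'A', 2))

first_chunk = chunks[0]   # rows for groups 'a' and 'b'
last_chunk = chunks[-1]   # rows for group 'v'

print(len(chunks))
print(first_chunk)
```

If the chunks are too large to hold together, calling next() on the generator object, or doing the per-chunk work inside the for loop, would avoid keeping them all at once.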