How to use rolling functions for GroupBy objects

Question

I have a grouped time series object of type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>. grouped.sum() gives the desired result, but I cannot get rolling_sum to work with the groupby object. Is there any way to apply rolling functions to groupby objects? For example:

import pandas as pd

x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
df = pd.DataFrame(list(zip(id, x)), columns=['id', 'x'])
df.groupby('id').sum()
id    x
a    3
b   12

However, I would like to have something like:

How exactly do you expect a rolling function to work on grouped objects (I mean, write out the math you want to do in symbols)?
– tacaswell, Commented Dec 21, 2012 at 20:06
So you want to do a cumsum on each of the groups and then stitch the whole thing back into a single data frame?
– tacaswell, Commented Dec 21, 2012 at 20:34
Yes, ideally cumsum and any rolling function (mean, sum, std).
– user1642513, Commented Dec 21, 2012 at 20:43

Kevin Wang · Accepted Answer · 2016-12-16 19:31:54Z

145

For the Googlers who come upon this old question:

Regarding @kekert's comment on @Garrett's answer to use the new

df.groupby('id')['x'].rolling(2).mean()

rather than the now-deprecated

df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

curiously, the new .rolling().mean() approach returns a multi-indexed series, indexed by the groupby column first and then by the original index, whereas the old approach simply returned a series indexed by the original df index alone. That perhaps makes less sense, but it made it very convenient to add the series as a new column to the original dataframe.

So I think I've figured out a solution that uses the new rolling() method and still works the same:

df.groupby('id')['x'].rolling(2).mean().reset_index(0,drop=True)

which should give you the series

which you can add as a column:

df['x'] = df.groupby('id')['x'].rolling(2).mean().reset_index(0,drop=True)
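
For anyone who wants to run this end to end, here is a minimal sketch using the question's data. The column name x_roll is my own choice for illustration (the line above overwrites x instead), and I am assuming a reasonably recent pandas where groupby().rolling() is available.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})

# groupby().rolling() returns a Series with a two-level index: (id, original index)
rolled = df.groupby('id')['x'].rolling(2).mean()
print(rolled.index.nlevels)

# Dropping the group level leaves only the original index,
# so the assignment below aligns row by row with df
df['x_roll'] = rolled.reset_index(level=0, drop=True)
print(df)
```

With the default min_periods, the first row of each group has no full window, so it comes out as NaN.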

answered Dec 16, 2016 at 19:31


9 Comments

TMrtSmith Over a year ago

I think you can use .transform rather than reset_index?

Kartik Sreenivasan Over a year ago

This actually fails if you're grouping by multiple columns. Dropping the first argument (levels) solves this though as it removes all levels by default. So the line becomes df['x'] = df.groupby('id')['x'].rolling(2).mean().reset_index(drop=True)

Hendy Over a year ago

As another maddening nuance, use groupby(..., sort=False) if your group variable is not already sorted. I was getting really bizarre results when adding this rolling mean as a new column because the order didn't match the original df.
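
Both of the last two points (multiple group columns, groups not sorted) seem to come down to index alignment. Here is a small sketch, assuming hypothetical columns g1 and g2 and deliberately interleaved groups: it drops only the group levels, so the original row index survives and the assignment aligns on it regardless of group order. Note that reset_index(drop=True) with no level argument discards the original index too, so the assignment then relies on the result happening to be in the same order as df.

```python
import pandas as pd

# Hypothetical data: two group columns, rows of the two groups interleaved
df = pd.DataFrame({
    'g1': ['a', 'a', 'b', 'b', 'a', 'b'],
    'g2': ['u', 'u', 'v', 'v', 'u', 'v'],
    'x': [0, 1, 2, 3, 4, 5],
})

keys = ['g1', 'g2']
rolled = df.groupby(keys)['x'].rolling(2, min_periods=1).sum()

# Drop only the group levels; the remaining level is the original row index
df['x_roll'] = rolled.reset_index(level=keys, drop=True)

# Same calculation via transform, which keeps the original index by construction
via_transform = df.groupby(keys)['x'].transform(
    lambda s: s.rolling(2, min_periods=1).sum()
)
print(df)
```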

smci Over a year ago

Very useful information. a) They should add this to their pandas Cookbook b) Can you raise some pandas bugs on the change in functionality? They should consider the consequences better before they deprecate.

uniquegino Over a year ago

Could you elaborate on why we should put .rolling(2), i.e. why window=2 here? Is it because there are 2 groups 'a' and 'b'?


Garrett · Accepted Answer · 2021-03-18 13:16:59Z

84

cumulative sum

To answer the question directly, the cumsum method produces the desired series:

In [17]: df
Out[17]:
  id  x
0  a  0
1  a  1
2  a  2
3  b  3
4  b  4
5  b  5

In [18]: df.groupby('id').x.cumsum()
Out[18]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x, dtype: int64

pandas rolling functions per group

More generally, any rolling function can be applied to each group as follows (using the new .rolling method as commented by @kekert). Note that the return type is a multi-indexed series, which is different from previous (deprecated) pd.rolling_* methods.

In [10]: df.groupby('id')['x'].rolling(2, min_periods=1).sum()
Out[10]:
id
a   0   0.00
    1   1.00
    2   3.00
b   3   3.00
    4   7.00
    5   9.00
Name: x, dtype: float64

To apply the per-group rolling function and receive the result in the original dataframe order, use transform instead:

In [16]: df.groupby('id')['x'].transform(lambda s: s.rolling(2, min_periods=1).sum())
Out[16]:
0    0
1    1
2    3
3    3
4    7
5    9
Name: x, dtype: int64

deprecated approach

For reference, here's how the now deprecated pandas.rolling_mean behaved:

In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]: 
0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5
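
Since pd.rolling_mean is gone from current pandas, the output above can no longer be reproduced with that call. As far as I can tell, the following sketch gives the same values in the same original-index layout using transform; treat it as one possible equivalent rather than an official migration path.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})

# Per-group rolling mean with min_periods=1, returned on the original index
result = df.groupby('id')['x'].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)
print(result.tolist())  # [0.0, 0.5, 1.5, 3.0, 3.5, 4.5]
```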

edited Mar 18, 2021 at 13:16

answered Dec 21, 2012 at 23:41


3 Comments

kekert Over a year ago

pd.rolling_mean is now deprecated for Series and will be removed, use df.groupby('id')['x'].rolling(2).mean() instead

nrcjea001 Over a year ago

in case you need it sorted to the original index efficiently:

df.reset_index().groupby('id', sort=False)['x'].rolling(2, min_periods=1).mean().sort_index(level=1).reset_index(drop=True)

nrcjea001 Over a year ago

if original index is already sorted then replace df.reset_index() with df

Sean McCarthy · Accepted Answer · 2018-09-27 19:22:02Z

11

Here is another way that generalizes well and uses pandas' expanding method.

It is very efficient and also works perfectly for rolling window calculations with fixed windows, such as for time series.

# Import pandas library
import pandas as pd

# Prepare columns
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']

# Create dataframe from columns above
df = pd.DataFrame({'id':id, 'x':x})

# Calculate rolling sum with infinite window size (i.e. all rows in group) using "expanding"
df['rolling_sum'] = df.groupby('id')['x'].transform(lambda x: x.expanding().sum())

# Output as desired by original poster
print(df)
  id  x  rolling_sum
0  a  0            0
1  a  1            1
2  a  2            3
3  b  3            3
4  b  4            7
5  b  5           12

answered Sep 27, 2018 at 19:22


5 Comments

bwest87 Over a year ago

Do you have anything to back up that this is "very efficient"? Generally with pandas, doing any sort of iteration (eg. "transform", or "apply") is a major performance hit, compared to doing the same thing with vector operations (which the built-ins of ".sum", ".rolling", etc. will all be). I know Pandas does do some pre-inspection on the iteration loops to see if it can optimize it for you, but in general iteration should be avoided if performance is a concern.
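
For comparison, here is a sketch of two ways to get the same numbers without passing a Python lambda to transform. I have not benchmarked these, so no speed claim is made here; the point is only that the built-in grouped cumsum and the grouped expanding window produce the same values as the lambda version.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})

# The approach from the answer: a Python function called once per group
via_lambda = df.groupby('id')['x'].transform(lambda s: s.expanding().sum())

# Built-in grouped cumulative sum, already on the original index
via_cumsum = df.groupby('id')['x'].cumsum()

# Built-in grouped expanding window; drop the group level to recover the original index
via_expanding = (
    df.groupby('id')['x'].expanding().sum().reset_index(level=0, drop=True)
)
```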

sousben Over a year ago

I am sorry I can only give you one upvote, I'm considering creating new accounts to give more credit to this answer. It's the only one that worked for me grouping on multiple columns, thanks!

Darkhan Over a year ago

Cool. This can apply exponential moving average. q['exponential_ave'] = q.groupby('id')['x'].transform(lambda x: x.ewm(com=0.2).mean())

liang Over a year ago

What's the difference between this using expanding vs using rolling?

Sean McCarthy Over a year ago

@liang this article explains it better than I can. In rolling functions the window size remains constant whereas in expanding functions it changes. See this answer as well.
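
A tiny sketch of that difference on a plain Series (the values are made up for illustration): the rolling window never covers more than two rows, while the expanding window covers every row seen so far.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Fixed-size window: at most the current row and the one before it
rolling_sum = s.rolling(2, min_periods=1).sum()

# Growing window: all rows from the start up to the current row
expanding_sum = s.expanding().sum()

print(rolling_sum.tolist())    # [1.0, 3.0, 5.0, 7.0]
print(expanding_sum.tolist())  # [1.0, 3.0, 6.0, 10.0]
```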

yoav_aaa · Accepted Answer · 2020-10-01 07:02:36Z

6

If you need to assign the result of a grouped rolling function back to the original DataFrame while keeping row order and groups, you can use the transform function.

df.sort_values(by='date', inplace=True)
grpd = df.groupby('group_key')
# center=False assigns each value to the last row of its window
df['val_rolling_7_mean'] = grpd['val'].transform(lambda x: x.rolling(7, center=False).mean())
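
The snippet above assumes a dataframe with date, group_key and val columns that is not shown. Here is a self-contained sketch with made-up data; I use a 3-row window instead of 7 only to keep the example small.

```python
import pandas as pd

# Hypothetical data: two groups with five daily observations each
df = pd.DataFrame({
    'date': pd.to_datetime(
        ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05'] * 2
    ),
    'group_key': ['a'] * 5 + ['b'] * 5,
    'val': [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# Sort by date so each group's rows are in time order before rolling
df = df.sort_values(by='date')

# transform keeps the dataframe's index, so the result lines up with the sorted rows
df['val_rolling_3_mean'] = df.groupby('group_key')['val'].transform(
    lambda s: s.rolling(3, center=False).mean()
)
print(df.sort_index())
```

The first two rows of each group are NaN because the window is not yet full; pass min_periods=1 to rolling if partial windows are acceptable.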

answered Oct 1, 2020 at 7:02


Zelazny7 · Accepted Answer · 2012-12-21 23:07:35Z

I'm not sure of the mechanics, but this works. Note, the returned value is just an ndarray. I think you could apply any cumulative or "rolling" function in this manner and it should have the same result.

I have tested it with cumprod, cummax and cummin and they all returned an ndarray. I think pandas is smart enough to know that these functions return a series and so the function is applied as a transformation rather than an aggregation.

In [35]: df.groupby('id')['x'].cumsum()
Out[35]:
0     0
1     1
2     3
3     3
4     7
5    12

Edit: I found it curious that this syntax does return a Series:

In [54]: df.groupby('id')['x'].transform('cumsum')
Out[54]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x
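
For what it's worth, on a recent pandas the other cumulative methods can be checked the same way. This is a sketch with made-up values (chosen so that max and min actually differ); in the versions I am aware of, each call returns a Series on the original index rather than an ndarray.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': [1, 3, 2, 4, 6, 5]})
g = df.groupby('id')['x']

cumprod = g.cumprod()  # running product within each group
cummax = g.cummax()    # running maximum within each group
cummin = g.cummin()    # running minimum within each group

print(cumprod.tolist())  # [1, 3, 6, 4, 24, 120]
print(cummax.tolist())   # [1, 3, 3, 4, 6, 6]
print(cummin.tolist())   # [1, 1, 1, 4, 4, 4]
```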
