
I have a DataFrame that looks like this:

                     ACCEL_X  ACCEL_Y  ACCEL_Z
DATETIME                                      
2021-05-11 16:12:56      160       32    16392
2021-05-11 16:12:57       20     -192    16548
2021-05-11 16:12:57      128      224    16212
2021-05-11 16:12:57     -148     -132    16624
2021-05-11 16:12:57      -40      204    16132
2021-05-11 16:12:57       72     -132    16536
2021-05-11 16:12:57      220       48    16292
2021-05-11 16:12:57     -132      236    16332
2021-05-11 16:12:57     -232     -132    16628
2021-05-11 16:12:57      192      140    16172
2021-05-11 16:12:57      200      -96    16684
2021-05-11 16:12:57        0       64    16020
2021-05-11 16:12:57     -144      -24    16524
2021-05-11 16:12:57     -160       24    16336
2021-05-11 16:12:57       96       56    16252
2021-05-11 16:12:57       68      -44    16544
2021-05-11 16:12:57       12       76    16308
2021-05-11 16:12:57     -228     -132    16668
2021-05-11 16:12:57       72      -96    16244
2021-05-11 16:12:57       48      -96    16536

According to the documentation, a rolling window can be specified as a time offset in seconds, so I computed a 3-second rolling mean with this code:

df = df.rolling('3s').mean()
df

which returns,

                        ACCEL_X    ACCEL_Y       ACCEL_Z
DATETIME                                                
2021-05-11 16:12:56  160.000000  32.000000  16392.000000
2021-05-11 16:12:57   90.000000 -80.000000  16470.000000
2021-05-11 16:12:57  102.666667  21.333333  16384.000000
2021-05-11 16:12:57   40.000000 -17.000000  16444.000000
2021-05-11 16:12:57   24.000000  27.200000  16381.600000
2021-05-11 16:12:57   32.000000   0.666667  16407.333333
2021-05-11 16:12:57   58.857143   7.428571  16390.857143
2021-05-11 16:12:57   35.000000  36.000000  16383.500000
2021-05-11 16:12:57    5.333333  17.333333  16410.666667
2021-05-11 16:12:57   24.000000  29.600000  16386.800000
2021-05-11 16:12:57   40.000000  18.181818  16413.818182
2021-05-11 16:12:57   36.666667  22.000000  16381.000000
2021-05-11 16:12:57   22.769231  18.461538  16392.000000
2021-05-11 16:12:57    9.714286  18.857143  16388.000000
2021-05-11 16:12:57   15.466667  21.333333  16378.933333
2021-05-11 16:12:57   18.750000  17.250000  16389.250000
2021-05-11 16:12:57   18.352941  20.705882  16384.470588
2021-05-11 16:12:57    4.666667  12.222222  16400.222222
2021-05-11 16:12:57    8.210526   6.526316  16392.000000
2021-05-11 16:12:57   10.200000   1.400000  16399.200000

When I print df after the rolling operation, the result is not what I expected.

As I understand rolling windows, each window should take the data that falls within an N-second interval (3s in my case) and compute the mean over it.

Since almost every row has exactly the same 'DATETIME' value, I expected those rows to get the same result. That is not the case. Can anyone explain how the rolling window in pandas works?

--- EDIT 1 ---

running

df.index.inferred_type == "datetime64"

returns

True
  • I've checked that it is a DatetimeIndex. Sorry for the misunderstanding; I've only applied the 3s rolling mean once. As for the 3-element running mean, it doesn't match up either once I worked out a few more values. Commented Jul 26, 2021 at 11:41
  • yup I've checked with your example, can reproduce the issue (pandas v1.3.0). My guess would be that pandas falls back to simple element-wise averaging if the total time window of your DatetimeIndex spans less than the specified period. In any case, it seems this corner-case is not well-documented (not at all...). Commented Jul 26, 2021 at 11:50

1 Answer


Your three-second window covers all of the data points. You can see that the last row in your result is the mean of the whole DataFrame. Perhaps you were expecting this:

In [194]: df.rolling('3s', center=True).mean()
Out[194]:
                     ACCEL_X  ACCEL_Y  ACCEL_Z
DATETIME
2021-05-11 16:12:56     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2
2021-05-11 16:12:57     10.2      1.4  16399.2

From the documentation: "By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True."

When center=False, the window at the first element covers just that element, at the second element it covers the first and second elements, and so on. At the last element it covers every element, because all of them lie within the three seconds behind the current element. When center=True, the center of the window is placed at each element in turn, so the window covers the current element plus all the elements up to one second behind it and one second ahead of it. I am still unsure what happens when the offset is even, for example '2s', and center=True.
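
The center=False case can be checked on a small sample. The sketch below assumes only the first five ACCEL_X values from the question; the names demo, rolled, and expanded are made up for illustration. Because every row lies within three seconds of the first one, the trailing '3s' window never drops a row, so it should behave like an expanding mean.

```python
import numpy as np
import pandas as pd

# First five ACCEL_X values from the question; `demo` is a hypothetical name.
idx = pd.DatetimeIndex(
    ["2021-05-11 16:12:56"] + ["2021-05-11 16:12:57"] * 4,
    name="DATETIME",
)
demo = pd.DataFrame({"ACCEL_X": [160, 20, 128, -148, -40]}, index=idx)

# Trailing 3-second window (center=False is the default).
rolled = demo.rolling("3s").mean()

# Expanding mean: row i is the mean of rows 0..i.
expanded = demo.expanding().mean()

# All rows span one second, so no row ever leaves the 3s window
# and the two results agree.
print(np.allclose(rolled, expanded))  # True
print(rolled)
```

The printed values match the first five ACCEL_X rows of the output in the question, which is consistent with the window growing one row at a time rather than treating rows with equal timestamps as one group.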

Look at what happens when I add another data point at 2021-05-11 16:12:58 (the extended frame is called df3 here). I use the count aggregate for a better illustration:

In [214]: df3.rolling('3s', center=True).count()
Out[214]:
                     ACCEL_X  ACCEL_Y  ACCEL_Z
DATETIME
2021-05-11 16:12:56     20.0     20.0     20.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:57     21.0     21.0     21.0
2021-05-11 16:12:58     20.0     20.0     20.0

The key insight here is that the window changes its size depending on how many elements fall within the specified offset and also rolls across the data (because it is a rolling window). The center parameter controls the span of the window with respect to each data point.
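
To see the window size change, here is a minimal sketch on made-up timestamps (not the data from the question) that contain one duplicate and one gap longer than the offset. The names toy, trailing, and centered are hypothetical. It relies on the default closed='right' for offset windows, under which a trailing window at time t covers the interval (t - 3s, t].

```python
import pandas as pd

# Made-up timestamps: a duplicate at :01 and a 3-second gap before :05.
idx = pd.DatetimeIndex(
    [
        "2021-05-11 16:12:00",
        "2021-05-11 16:12:01",
        "2021-05-11 16:12:01",
        "2021-05-11 16:12:02",
        "2021-05-11 16:12:05",
    ],
    name="DATETIME",
)
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5]}, index=idx)

# Trailing window: counts grow while rows stay within 3s, then reset
# after the gap, because the row at :02 sits exactly on the open left edge.
trailing = toy.rolling("3s").count()
print(trailing["x"].tolist())  # [1.0, 2.0, 3.0, 4.0, 1.0]

# Centered window on the same data, for comparison (pandas 1.3.2 or later
# is assumed, given the bug described in this answer).
centered = toy.rolling("3s", center=True).count()
print(centered["x"].tolist())
```

The number of rows per window is decided by the timestamps, not by a fixed row count, which is why the trailing result drops back to 1.0 at the last row.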


Edit: pandas bug (fixed in 1.3.2)

pandas 1.3.1 gives inconsistent results for the code in this answer. In this case the inconsistency arises on the first line of the output, which sometimes is:

                     ACCEL_X  ACCEL_Y  ACCEL_Z
DATETIME
2021-05-11 16:12:56    160.0     32.0  16392.0

That is, the first line is sometimes taken by itself. The correct results are as shown above. This was fixed in version 1.3.2 and was documented in issue #42753.


1 Comment

Hey dicristina, by any chance you know why is this happening (another question) stackoverflow.com/questions/68555338/…
