I want to apply some statistics on records within a time window with an offset. My data looks something like this:

                             lon        lat  stat  ...   speed  course  head
ts                                                 ...                      
2016-09-30 22:00:33.272  5.41463  53.173161    15  ...     0.0     0.0   511
2016-09-30 22:01:42.879  5.41459  53.173180    15  ...     0.0     0.0   511
2016-09-30 22:02:42.879  5.41461  53.173161    15  ...     0.0     0.0   511
2016-09-30 22:03:44.051  5.41464  53.173168    15  ...     0.0     0.0   511
2016-09-30 22:04:53.013  5.41462  53.173141    15  ...     0.0     0.0   511

[5 rows x 7 columns]

I need the records within time windows of 600 seconds, with steps of 300 seconds. For example, these windows:

start                     end
2016-09-30 22:00:00.000   2016-09-30 22:10:00.000
2016-09-30 22:05:00.000   2016-09-30 22:15:00.000
2016-09-30 22:10:00.000   2016-09-30 22:20:00.000

I have looked at pandas rolling for this, but it does not seem to have an option for the step offset described above. Am I overlooking something, or should I write a custom function for this?

1 Answer

What you want to achieve should be possible by combining DataFrame.resample with DataFrame.shift.

import pandas as pd

index = pd.date_range('1/1/2000', periods=9, freq='min')
series = pd.Series(range(9), index=index)
df = pd.DataFrame(series)

That gives you a simple time series (example taken from the API docs for DataFrame.resample).

2000-01-01 00:00:00  0
2000-01-01 00:01:00  1
2000-01-01 00:02:00  2
2000-01-01 00:03:00  3
2000-01-01 00:04:00  4
2000-01-01 00:05:00  5
2000-01-01 00:06:00  6
2000-01-01 00:07:00  7
2000-01-01 00:08:00  8

Now resample by your step size, 90 seconds in this example (see DataFrame.resample).

sampled = df.resample('90s').sum()

This will give you non-overlapping windows of the step size.

2000-01-01 00:00:00   1
2000-01-01 00:01:30   2
2000-01-01 00:03:00   7
2000-01-01 00:04:30   5
2000-01-01 00:06:00  13
2000-01-01 00:07:30   8

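Finally, shift the resampled frame by one step and add it to the unshifted resampled frame (see DataFrame.shift). This works because the window size is exactly twice the step size, so every window is the sum of two adjacent bins: each row then covers the bin at its label plus the bin before it. The first row has no previous bin, so it only covers a single step.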

sampled.shift(1, fill_value=0) + sampled

This will yield:

2000-01-01 00:00:00   1
2000-01-01 00:01:30   3
2000-01-01 00:03:00   9
2000-01-01 00:04:30  12
2000-01-01 00:06:00  18
2000-01-01 00:07:30  21
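
As a rough sketch of how this could generalise: shift-and-add is the same as a rolling sum over two consecutive bins, so rolling(n) on the resampled frame should handle any window that is a whole multiple of the step. The helper name stepped_window_sum and the column name value are made up for illustration. Note that this only works for statistics that can be rebuilt from per-bin sums and counts (sum, count, mean), not for something like a median.

```python
import pandas as pd


def stepped_window_sum(df, step, n_steps=2):
    """Sum over windows of n_steps * step that advance by one step.

    Hypothetical helper, not part of pandas. Each row is labelled with the
    start of the last step-sized bin in its window.
    """
    binned = df.resample(step).sum()
    # min_periods=1 keeps the leading partial windows instead of NaN
    return binned.rolling(n_steps, min_periods=1).sum()


index = pd.date_range("2000-01-01", periods=9, freq="min")
df = pd.DataFrame({"value": range(9)}, index=index)

# Same numbers as the shift-and-add version above (as floats)
sums = stepped_window_sum(df, "90s", n_steps=2)

# A mean per window can be rebuilt from windowed sums and windowed counts
counts = df.resample("90s").count().rolling(2, min_periods=1).sum()
means = sums / counts

print(sums)
print(means)
```

For the windows in the question this would be stepped_window_sum(df, "300s", n_steps=2). The label of each row is the start of its last bin, so the window itself starts one step before the label.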

There may be a more elegant solution, but I hope this helps.
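
If you need the actual records per window, or a statistic that cannot be combined from bins, one alternative is to slice each window explicitly. The following is only a sketch under a few assumptions: windowed_apply is a hypothetical helper, the sample data is synthetic (one speed value per minute), and windows are aligned to multiples of the step as in the question. It loops in Python, so it will be slower than resample on large data.

```python
import pandas as pd


def windowed_apply(df, window, step, func):
    """Apply func to the rows inside each window [start, start + window).

    Hypothetical helper. Window starts are aligned to multiples of step,
    beginning at the floor of the first timestamp; func should return a Series.
    """
    width = pd.Timedelta(window)
    first = df.index.min().floor(step)
    starts = pd.date_range(first, df.index.max(), freq=step)

    results = {}
    for start in starts:
        chunk = df[(df.index >= start) & (df.index < start + width)]
        if not chunk.empty:  # skip windows without any records
            results[start] = func(chunk)
    return pd.DataFrame.from_dict(results, orient="index")


# Synthetic data: 20 records, one per minute, starting at 22:00:30
index = pd.date_range("2016-09-30 22:00:30", periods=20, freq="60s")
data = pd.DataFrame({"speed": range(20)}, index=index)

# 600 second windows that advance in steps of 300 seconds
out = windowed_apply(data, "600s", "300s", lambda chunk: chunk.median())
print(out)
```

Here each row of the result is labelled with the start of its window, which matches the start column in the question.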

1 Comment

Thanks! I think this is it
