
I have sunlight data coming from the field. I want to check whether the sunlight changed by more than a given value over the last 1 min and the next 1 min. Below is an example case in which I check whether the data value changed by more than 4 in the last 10 s. Code:

import numpy as np
import pandas as pd

xdf = pd.DataFrame({'data':np.random.randint(10,size=10)},index=pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:45', freq='5s'))
# The data frequency is 5 s, so to check the last 10 s
# I have to consider the present row and the last 2 rows.
# Perform rolling max and min over 3 rows
nrows = 3
# Allowable change
ac = 4
xdf['back_max'] = xdf['data'].rolling(nrows).max()
xdf['back_min'] = xdf['data'].rolling(nrows).min()
xdf['back_min_max_dif'] = (xdf['back_max'] - xdf['back_min'])
xdf['back_<4'] = (xdf['back_max'] - xdf['back_min']).abs().le(ac)
print(xdf)

## Again repeat the above for the future nrows
## Don't know how?

expected output:

                     data  back_max  back_min  back_min_max_dif  back_<4
2022-06-03 00:00:00     7       NaN       NaN               NaN    False
2022-06-03 00:00:05     7       NaN       NaN               NaN    False
2022-06-03 00:00:10     5       7.0       5.0               2.0     True
2022-06-03 00:00:15     8       8.0       5.0               3.0     True
2022-06-03 00:00:20     6       8.0       5.0               3.0     True
2022-06-03 00:00:25     2       8.0       2.0               6.0    False
2022-06-03 00:00:30     3       6.0       2.0               4.0     True
2022-06-03 00:00:35     1       3.0       1.0               2.0     True
2022-06-03 00:00:40     5       5.0       1.0               4.0     True
2022-06-03 00:00:45     5       5.0       1.0               4.0     True

Is there a way to simplify the above procedure? Also, I need to perform a rolling max over the future nrows. How can I do that?

  • It's pretty much what I'd do. However, I'd skip attaching the intermediate columns to the dataframe. Also, do you need the two separate values for future and past changes? Commented Jun 3, 2022 at 19:26
  • Unrelated, there's an option of rolling on time windows as well. It might be slower than rolling by rows, however. Commented Jun 3, 2022 at 19:28
  • @QuangHoang I really appreciate your comment. I feel like there is an improvement in my coding. I don't want those columns; they take unnecessary memory. I only included them here to explain better. However, I have done it for the past. The question is, how to do it for the future? Commented Jun 3, 2022 at 19:30
  • @QuangHoang Also, .rolling(-3) is not working. I thought it would do the forward rolling. How do we do forward rolling? Commented Jun 3, 2022 at 19:36
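
The time-window option mentioned in the comments can be sketched as follows. This is only an illustration, not the asker's code: the fixed sample values are copied from the expected output in the question, and `closed='both'` and `min_periods=nrows` are choices assumed here so that a 10 s window over 5 s data covers the same three rows as `rolling(3)`.

```python
import pandas as pd

# Fixed sample values copied from the expected output (instead of random data)
idx = pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:45', freq='5s')
xdf = pd.DataFrame({'data': [7, 7, 5, 8, 6, 2, 3, 1, 5, 5]}, index=idx)

nrows = 3  # rows per window in the row-based version
ac = 4     # allowable change

# Row-based window: current row plus the previous two rows
by_rows = xdf['data'].rolling(nrows)
rows_ok = (by_rows.max() - by_rows.min()).le(ac)

# Time-based window: closed='both' makes the window [t - 10s, t], so the row
# exactly 10 s back is included; min_periods mirrors the NaN start of rolling(3)
by_time = xdf['data'].rolling('10s', closed='both', min_periods=nrows)
time_ok = (by_time.max() - by_time.min()).le(ac)

print(pd.DataFrame({'rows_ok': rows_ok, 'time_ok': time_ok}))
```

With the default `closed='right'`, the 10 s window would exclude the row exactly 10 s back and cover only two rows, so the two versions would disagree.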

1 Answer


For future/forward roll, you can roll on the reversed data. This might not work with time-window roll:

rolling = xdf['data'].rolling(nrows)
xdf['pass_<'] = (rolling.max()-rolling.min()).le(ac)

future_roll = xdf['data'][::-1].rolling(nrows)
xdf['future_<'] = future_roll.max().sub(future_roll.min()).le(ac)

Output:

                     data  pass_<  future_<
2022-06-03 00:00:00     7   False      True
2022-06-03 00:00:05     7   False      True
2022-06-03 00:00:10     5    True      True
2022-06-03 00:00:15     8    True     False
2022-06-03 00:00:20     6    True      True
2022-06-03 00:00:25     2   False      True
2022-06-03 00:00:30     3    True      True
2022-06-03 00:00:35     1    True      True
2022-06-03 00:00:40     5    True     False
2022-06-03 00:00:45     5    True     False
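
An alternative to reversing the data, not used in the answer above, is a forward-looking window indexer. The sketch below assumes a pandas version that provides `pandas.api.indexers.FixedForwardWindowIndexer`; the fixed sample values are copied from the output above, and the column name `future_ok` is made up for illustration.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Fixed sample values copied from the output table (instead of random data)
idx = pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:45', freq='5s')
xdf = pd.DataFrame({'data': [7, 7, 5, 8, 6, 2, 3, 1, 5, 5]}, index=idx)

nrows = 3  # current row plus the next two rows
ac = 4     # allowable change

# Each window starts at the current row and extends nrows - 1 rows forward
indexer = FixedForwardWindowIndexer(window_size=nrows)

# min_periods=nrows leaves NaN where fewer than nrows future rows exist,
# so the last two rows compare as False, as in the reversed-data approach
forward = xdf['data'].rolling(indexer, min_periods=nrows)
xdf['future_ok'] = (forward.max() - forward.min()).le(ac)

print(xdf)
```

Because the data is never reversed, the result keeps the original row order without relying on index alignment during assignment.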

Comments

I thought about it. Really nice solution. Guess what: we can actually roll pass_result back and don't have to do the rolling max and min again for the future. Am I correct? For example, look at the 2nd and 3rd columns in your answer. Both hold the same values: rolling the 2nd column 3 rows backwards gives the 3rd column.
Yes, you're totally correct, forward roll is essentially shift(-nrows).rolling(nrows).do_something().shift(). But you're gonna miss some of the beginning/end windows.
Sounds good. I am doing this: xdf['forward_<4'] = xdf['back_<4'][::-3], but I am getting a mix of True and NaNs. Am I wrong?
Do you mean xdf['back_<4'].shift(-3)? Otherwise you would have the same column due to index alignment. Another thing is that we can do this because min/max are direction-agnostic. Gotta go the shift route if it's cumsum, for example.
Oh, [::-3] meaning you reverse and skip 3 rows at a time. So yeah, you're getting mixed NaN results (2 out of 3 rows?). So it's wrong.
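
The shift idea discussed in these comments can be checked with a small sketch. It makes one assumption worth flagging: because a 3-row window includes the current row, the offset used here is `nrows - 1` rows rather than `nrows`. The sample values are copied from the answer's output, and the variable names are made up for illustration.

```python
import pandas as pd

# Fixed sample values copied from the answer's output (instead of random data)
idx = pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:45', freq='5s')
s = pd.Series([7, 7, 5, 8, 6, 2, 3, 1, 5, 5], index=idx, name='data')

nrows = 3
ac = 4

# Backward range: max - min over the current row and the previous nrows - 1 rows
back = s.rolling(nrows)
back_range = back.max() - back.min()

# The backward window ending at row i + nrows - 1 covers rows i .. i + nrows - 1,
# which is the forward window starting at row i, so shift up by nrows - 1
future_by_shift = back_range.shift(-(nrows - 1)).le(ac)

# Reference: the reversed-data approach from the answer, re-sorted by time
rev = s[::-1].rolling(nrows)
future_by_reverse = (rev.max() - rev.min()).le(ac).sort_index()

print((future_by_shift == future_by_reverse).all())
```

Shifting the numeric range before comparing, rather than shifting the boolean column, keeps the result a plain boolean column: the NaN rows introduced by the shift simply compare as False.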