I have a dataset like:
ID DATETIME CODE Value
999 1/2/2024 16:22 TX 100
123 1/2/2024 16:47 IP 100
666 1/2/2024 17:13 IP 85
666 1/2/2024 17:38 IP 100
123 1/2/2024 18:03 TX 90
666 1/2/2024 18:28 TX 85
666 1/2/2024 18:54 IP 100
123 1/2/2024 19:19 CA 100
666 1/2/2024 19:44 OX 95
999 1/2/2024 20:09 18 75
123 1/2/2024 20:35 12 100
654 1/2/2024 21:00 IP 85
Here's a reprex of the above:
structure(list(ID = c("999", "123", "666", "666", "123", "666",
"666", "123", "666", "999", "123", "654"), DATETIME = structure(c(1706804520,
1706806020, 1706807580, 1706809080, 1706810580, 1706812080, 1706813640,
1706815140, 1706816640, 1706818140, 1706819700, 1706821200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), CODE = c("TX", "IP", "IP", "IP", "TX",
"TX", "IP", "CA", "OX", "18", "12", "IP"), Value = c(100, 100,
85, 100, 90, 85, 100, 100, 95, 75, 100, 85)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -12L))
I'd like to add a column with the cumulative sum of Value, by ID, over the last 2 hours, counting only rows that meet a condition (CODE == "IP"). Rows that don't meet the condition get 0. Like this:
ID DATETIME CODE Value cum_IP
999 1/2/2024 16:22 TX 100 0
123 1/2/2024 16:47 IP 100 100
666 1/2/2024 17:13 IP 85 85
666 1/2/2024 17:38 IP 100 185
123 1/2/2024 18:03 TX 90 0
666 1/2/2024 18:28 TX 85 0
666 1/2/2024 18:54 IP 100 285
123 1/2/2024 19:19 CA 100 0
666 1/2/2024 19:44 OX 95 0
999 1/2/2024 20:09 18 75 0
123 1/2/2024 20:35 12 100 0
654 1/2/2024 21:00 IP 85 85
I hope I haven't made any mistakes computing the column by hand, but the idea should be clear. I'm looking for a function that computes an aggregate (a sum or a simple count) over rows, given a grouping variable (ID), a predicate (CODE == "IP" in this case, but it could be, e.g., is.number(CODE)) and a window (the 2 hours up to the current row).
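To make the request concrete, here is a rough base R sketch of the behaviour I'm after. It is only an illustration, not an efficient or final solution: rolling_agg and its argument names are made up, the data is rebuilt as a plain data.frame rather than a tibble, and it assumes the window is closed on both ends (from 2 hours before the current row up to and including it) and that rows failing the predicate get 0.

```r
# Same data as the reprex, rebuilt as a plain data.frame (base R only).
df <- data.frame(
  ID = c("999", "123", "666", "666", "123", "666",
         "666", "123", "666", "999", "123", "654"),
  DATETIME = as.POSIXct(
    c(1706804520, 1706806020, 1706807580, 1706809080, 1706810580, 1706812080,
      1706813640, 1706815140, 1706816640, 1706818140, 1706819700, 1706821200),
    origin = "1970-01-01", tz = "UTC"),
  CODE = c("TX", "IP", "IP", "IP", "TX", "TX",
           "IP", "CA", "OX", "18", "12", "IP"),
  Value = c(100, 100, 85, 100, 90, 85, 100, 100, 95, 75, 100, 85),
  stringsAsFactors = FALSE
)

# Hypothetical helper (name and arguments are my own invention).
# For each row i that satisfies `keep`, apply `fun` to `value` over the rows
# that share the same id, also satisfy `keep`, and fall in the closed window
# [time[i] - window_secs, time[i]]. Rows that fail `keep` get 0.
rolling_agg <- function(id, time, value, keep, window_secs, fun = sum) {
  t <- as.numeric(time)  # POSIXct -> seconds since the epoch
  vapply(seq_along(id), function(i) {
    if (!keep[i]) return(0)
    in_window <- keep & id == id[i] & t <= t[i] & t >= t[i] - window_secs
    fun(value[in_window])
  }, numeric(1))
}

df$cum_IP <- rolling_agg(df$ID, df$DATETIME, df$Value,
                         keep = df$CODE == "IP",
                         window_secs = 2 * 60 * 60)
df
```

Passing fun = length instead of sum would give the simple count. This compares every row against every other row, so it is quadratic in the number of rows; I'd prefer something that scales better and fits into a dplyr pipeline.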