I have a dataset containing event ids, event types, and event times. Each event is either a "start" or a "pause". I would like to identify "pause" events that are not followed by a "start" event within 7 days and classify these as "stops".
Here is the code for the test dataset:
library(lubridate)  # provides dmy() and days()

test <- data.frame("id" = 1:5,
                   "event" = c("pause",
                               "pause",
                               "start",
                               "pause",
                               "start"),
                   "time" = dmy("03-11-2012",
                                "05-11-2012",
                                "06-11-2012",
                                "21-11-2012",
                                "30-11-2012"))
So far, I have used lead() to check whether the following event was a "start" event AND happened within 7 days. However, I realized that a "pause" event is sometimes followed by another "pause" event and then a "start" event, all within 7 days. In that case, neither "pause" event should be considered a stop. This means that I need to check all events/rows that occurred within 7 days of the "pause" event and look for a "start" event among them.
I am looking for a function I can use within dplyr (I'll use non-dplyr solutions if I have to) where I can check the value of multiple rows.
My solution so far uses lead(), which checks only the immediately following row:
library(dplyr)  # provides %>%, mutate() and lead(); days() comes from lubridate

test2 <- test %>%
  mutate(stop = ifelse(event == "pause" &
                         !((time + days(7) > lead(time)) &
                             lead(event) == "start"),
                       "yes",
                       "no"))
This gives:
| id | event | time       | stop |
|----|-------|------------|------|
| 1  | pause | 2012-11-03 | yes  |
| 2  | pause | 2012-11-05 | no   |
| 3  | start | 2012-11-06 | no   |
| 4  | pause | 2012-11-21 | yes  |
| 5  | start | 2012-11-30 | no   |
I would like the stop column value for the first "pause" to also be a "no" because it has a "start" event within 7 days of it.
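To make the desired logic concrete, here is a rough sketch of the check I have in mind, which compares each row against every "start" time instead of only the next row. The helper `start_within()` and the temporary column `followed` are hypothetical names I made up for illustration. The sketch assumes that "within 7 days" means strictly after the pause and strictly less than 7 days later (the same boundary as my lead() attempt), and it may well not be the most idiomatic dplyr approach.

```r
library(dplyr)
library(lubridate)

test <- data.frame(
  id    = 1:5,
  event = c("pause", "pause", "start", "pause", "start"),
  time  = dmy("03-11-2012", "05-11-2012", "06-11-2012",
              "21-11-2012", "30-11-2012")
)

# Hypothetical helper: TRUE if any "start" time falls strictly after t
# and strictly before t + window (assumed meaning of "within 7 days").
start_within <- function(t, start_times, window = days(7)) {
  any(start_times > t & start_times < t + window)
}

# All "start" times in the data, regardless of row position.
start_times <- test$time[test$event == "start"]

test3 <- test %>%
  mutate(
    # For each row, look at every "start" event, not just the next row.
    followed = vapply(seq_along(time),
                      function(i) start_within(time[i], start_times),
                      logical(1)),
    stop = ifelse(event == "pause" & !followed, "yes", "no")
  ) %>%
  select(-followed)

print(test3)
```

On the test data this marks only the "pause" on 2012-11-21 as a stop, which is the result I am after. It compares every row against every "start", so it may be slow on a large dataset.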