2

I have a dataset of event ids, the event type, and the time of the event. The events consist of "start" and "pause". I would like to identify "pause" events that are not followed by a "start" event within 7 days and classify these as "stops".

Here is the code for the test dataset:

test <- data.frame("id" = 1:5,
               "event" = c("pause",
                           "pause",
                           "start",
                           "pause",
                           "start"),
               "time" = dmy("03-11-2012",
                            "05-11-2012",
                            "06-11-2012",
                            "21-11-2012",
                            "30-11-2012"))  

So far, I used lead() to check if the following event was a "start" event AND happened within 7 days. However, I realized that sometimes a "pause" event was followed by another "pause" event and then a "start" event, all within 7 days. Both "pause" events in this case should not be considered to be a stop. This means that I need to check all events/rows that occurred within 7 days of the "pause" event and look for a "start" event.

I am looking for a function I can use within dplyr (I'll use non-dplyr solutions if I have to) where I can check the value of multiple rows.

My solution so far using lead(), which checks the immediate next row only.

test2 <- test %>%
mutate(stop = ifelse(event == "pause" &
                     !((time + days(7) > lead(time)) & 
                          lead(event) == "start"),
                   "yes",
                   "no"))

This gives

|id|event|time      |stop|
|------------------------|
|1 |pause|2012-11-03|yes |
|2 |pause|2012-11-05|no  |
|3 |start|2012-11-06|no  |
|4 |pause|2012-11-21|yes |
|5 |start|2012-11-30|no  |

I would like the stop column value for the first "pause" to also be a "no" because it has a "start" event within 7 days of it.

2 Answers 2

2

If you want to do this inside a dplyr function, you can sapply inside a mutate:

test %>% 
  mutate(stop = sapply(seq_along(time),
          function(i) {
              if(event[i] != "pause") return(FALSE)
              ind <- which(time > time[i] & event == "start")
              if(length(ind) == 0) return(FALSE)
              as.numeric(difftime(time[ind[1]], time[i], units = "day")) > 7
            }))

#>   id event       time  stop
#> 1  1 pause 2012-11-03 FALSE
#> 2  2 pause 2012-11-05 FALSE
#> 3  3 start 2012-11-06 FALSE
#> 4  4 pause 2012-11-21  TRUE
#> 5  5 start 2012-11-30 FALSE

Sign up to request clarification or add additional context in comments.

2 Comments

Did not know I could use apply functions within mutate! The answer from @dan also works, I tested both with microbenchmark and this solution seems to be more efficient.
actually shouldn't be if(length(ind) == 0) return(TRUE) ? Both work with the test data, but in the case where a "pause" was followed by more "pause" events only, then the initial "pause" event should be considered a "stop". Sorry if I didn't make that clear in my question
0

Although it might get slow with large dataset, this might do the work:

library(dplyr)
library(purrr)
test %>% 
  mutate(
    stop = ifelse(event=="pause" & !((time + days(7) > lead(time)) & lead(event) == "start"),
                  "yes", "no"),
    stop2 = ifelse(map_lgl(row_number(), 
                           ~any(event=="start" & time>=time[.x] & time<=time[.x] + days(7))),
                   "no", "yes")
  )

#   id event       time stop stop2
# 1  1 pause 2012-11-03  yes    no
# 2  2 pause 2012-11-05   no    no
# 3  3 start 2012-11-06   no    no
# 4  4 pause 2012-11-21  yes   yes
# 5  5 start 2012-11-30   no    no

Using row_number() and time[.x], this allows to consider every row independently. Then, we just check if there is any "start" between "now" and "in 7 days" and put the right value accordingly. purrr::map_lgl allows to loop over every row and return a logical vector.

The slowness comes from the fact that you have to check for all the rows each time you want to compute the value for one row.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.