I need help creating a function that flags a record as TRUE if it falls within 7 days of the previous record for the same User. Note that the User and DateTime fields are not sorted in my actual data; I sorted them here only to make the dataset easier to read.

 User     DateTime              Result
 A        2015-05-27 17:13      FALSE
 A        2015-06-23 14:17      FALSE
 A        2015-06-24 15:44      TRUE
 A        2015-06-27 12:16      TRUE
 B        2015-03-04 18:07      FALSE
 C        2015-07-27 08:26      FALSE
 D        2015-03-26 18:13      FALSE
 D        2015-05-20 10:35      FALSE
 D        2015-05-25 18:07      TRUE

My attempt below does not work, because it returns a single logical value instead of one value per record:

 repeatfun <- function(x) {ifelse(sum(diff(x) < 7), TRUE, FALSE)}

Here's the data for easier replication:

User <- c('A', 'A', 'A', 'A', 'A', 'B', 'C', 'D', 'D', 'D', 'D', 'D')
DateTime <- c('2015-05-27', '2015-06-23', '2015-06-24', '2015-06-27', '2015-07-08',
          '2015-03-04', '2015-07-27',
          '2015-03-26', '2015-05-20', '2015-05-25', '2015-06-17', '2015-08-13')
df <- as.data.frame(cbind(User, DateTime))
df$DateTime <- as.Date(df$DateTime)

2 Answers

With dplyr, we can sort the rows by User and DateTime (earliest first) and group by User. To create Result, the previous date within each User, obtained with lag(), is subtracted from DateTime. The first record of each User has no previous date, so its difference is NA; coalesce() replaces that NA with FALSE. The difference in days is then tested with (x < 7).

library(dplyr)
df %>% arrange(User, DateTime) %>% group_by(User) %>% 
  mutate(Result=coalesce(difftime(DateTime, lag(DateTime), units="days") < 7, FALSE))
# Source: local data frame [9 x 3]
# Groups: User [4]
# 
#     User   DateTime Result
#   (fctr)     (date)  (lgl)
# 1      A 2015-05-27  FALSE
# 2      A 2015-06-23  FALSE
# 3      A 2015-06-24   TRUE
# 4      A 2015-06-27   TRUE
# 5      B 2015-03-04  FALSE
# 6      C 2015-07-27  FALSE
# 7      D 2015-03-26  FALSE
# 8      D 2015-05-20  FALSE
# 9      D 2015-05-25   TRUE
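
The same sort, lag and compare logic can be cross-checked without any packages. The following is only a sketch in base R, run on the 12-row replication data from the question (so it yields more rows than the 9-row output above). The use of ave() and the Inf placeholder for each User's first record are assumptions made for this illustration, not part of the dplyr code.

```r
# Replication data from the question (12 rows, dates only)
User <- c('A', 'A', 'A', 'A', 'A', 'B', 'C', 'D', 'D', 'D', 'D', 'D')
DateTime <- as.Date(c('2015-05-27', '2015-06-23', '2015-06-24', '2015-06-27', '2015-07-08',
                      '2015-03-04', '2015-07-27',
                      '2015-03-26', '2015-05-20', '2015-05-25', '2015-06-17', '2015-08-13'))
df <- data.frame(User, DateTime)

# Sort by User, then by date, so each record directly follows its predecessor
df <- df[order(df$User, df$DateTime), ]

# Days since the previous record of the same User;
# Inf marks a User's first record, so it can never be < 7
gap <- ave(as.numeric(df$DateTime), df$User,
           FUN = function(x) c(Inf, diff(x)))

df$Result <- gap < 7
print(df)
```

On this data, only the records dated 2015-06-24 and 2015-06-27 (User A) and 2015-05-25 (User D) come out TRUE.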

Another solution, using data.table and a slightly modified version of your function:

library(data.table)

#your function without the sum function; diff(x) < 7 is already logical
repeatfun <- function(x) diff(x) < 7

#data.table solution: sort by User and DateTime, then flag within each User
setDT(df, key=c('User', 'DateTime'))
df[, Result := c(FALSE, repeatfun(DateTime)), by=User]

Output:

> df
   User   DateTime Result
1:    A 2015-05-27  FALSE
2:    A 2015-06-23  FALSE
3:    A 2015-06-24   TRUE
4:    A 2015-06-27   TRUE
5:    B 2015-03-04  FALSE
6:    C 2015-07-27  FALSE
7:    D 2015-03-26  FALSE
8:    D 2015-05-20  FALSE
9:    D 2015-05-25   TRUE
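
To see why dropping sum() matters and why a FALSE is prepended, here is a small sketch using three of User A's dates from the question (the variable names are illustrative only). diff() returns one gap fewer than there are dates, and sum() collapses those gaps into a single number.

```r
# Three of User A's dates from the question
x <- as.Date(c('2015-06-23', '2015-06-24', '2015-06-27'))

gaps <- diff(x)        # 2 gaps for 3 dates: 1 day and 3 days
length(gaps)           # 2

# Original version: sum() reduces the comparisons to one number,
# so ifelse() can only return one value
collapsed <- ifelse(sum(gaps < 7), TRUE, FALSE)
length(collapsed)      # 1

# Without sum(): one value per gap, plus FALSE for the first date,
# which has no earlier record to compare against
per_record <- c(FALSE, gaps < 7)
per_record             # FALSE TRUE TRUE
```

Because the first record of every User has no predecessor, the prepended FALSE keeps the result the same length as the group, which is what := requires here.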
