I have data with a grouping variable 'id' and a date column with missing values:

id time    date
a   1   2004-01-13
a   2   2004-05-04
a   3       NA
a   4   2007-03-20
a   5       NA
b   1   2004-01-11
b   2   2004-05-04
b   3       NA
b   4   2006-10-10
b   5       NA
c   1   2004-05-23
c   2   2004-10-14
c   3       NA
c   4       NA
c   5       NA

Within each 'id', I would like to find the difference between each consecutive pair of dates:

id time    date                 difftime
a   1   2004-01-13                 NA
a   2   2004-05-04      (2004-05-04)-(2004-01-13)
a   3       NA                     NA  
a   4   2007-03-20      (2007-03-20)-(2004-05-04)
a   5       NA                     NA
b   1   2004-01-11                 NA
b   2   2004-05-04      (2004-05-04)-(2004-01-11)
b   3       NA                     NA
b   4   2006-10-10      (2006-10-10)-(2004-05-04)
b   5       NA                     NA
c   1   2004-05-23                 NA
c   2   2004-10-14      (2004-10-14)-(2004-05-23)
c   3       NA                     NA
c   4       NA                     NA
c   5       NA                     NA

I tried the following, but none of these attempts gives the result I want.

data$difftime <- aggregate(date ~ id, data, diff)

library(data.table)
setDT(data)[ , difftime := diff(data$date), by = id] 
  
diff(data$date)
  • Building on your data.table attempt, you may restrict your diff calculation to the non-NA rows (put !is.na(date) in the i slot). Also remember that the result of diff is one element shorter than the original data, so you need to pad it with one NA: d[!is.na(date), dif := c(NA, diff(date)), id] Commented Feb 26, 2021 at 0:14
  • I've tried both of your suggestions, thank you so much! Commented Feb 26, 2021 at 0:25
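
To illustrate the point made in the comment, here is a minimal base R sketch using only the dates of id 'a' from the question. The names ok and gap are just illustrative and not part of the original post.

```r
# Dates for a single id, with gaps (id "a" from the question)
date <- as.Date(c("2004-01-13", "2004-05-04", NA, "2007-03-20", NA))

# diff() returns one element fewer than its input,
# so it cannot be assigned back to a column of the same length
length(diff(date))   # 4, not 5

# Work only on the non-missing positions and pad with a leading NA
ok  <- !is.na(date)
gap <- rep(NA_real_, length(date))
gap[ok] <- c(NA, diff(date[ok]))   # differences in days

gap
```

Here gap has the same length as date, with NA wherever the date is missing and at the first observed date.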

1 Answer

This data.table option might help:

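setDT(df)[
  ,
  difftime := replace(
    # start from an all-NA numeric vector so every group returns the same type
    rep(NA_real_, .N),
    # positions of the non-missing dates, except the first one in the group
    which(!is.na(date))[-1],
    # differences (in days) between consecutive non-missing dates
    diff(na.omit(date))
  ),
  id
]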

or a shorter one (thanks to @Henrik):

setDT(df)[!is.na(date), difftime := c(NA, diff(date)), id]

which gives

    id time       date difftime
 1:  a    1 2004-01-13       NA
 2:  a    2 2004-05-04      112
 3:  a    3       <NA>       NA
 4:  a    4 2007-03-20     1050
 5:  a    5       <NA>       NA
 6:  b    1 2004-01-11       NA
 7:  b    2 2004-05-04      114
 8:  b    3       <NA>       NA
 9:  b    4 2006-10-10      889
10:  b    5       <NA>       NA
11:  c    1 2004-05-23       NA
12:  c    2 2004-10-14      144
13:  c    3       <NA>       NA
14:  c    4       <NA>       NA
15:  c    5       <NA>       NA
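
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the example data from the question and applies the shorter variant. The as.numeric() call is my own addition (an assumption, not in the original answer) to make it explicit that the result is stored as a plain number of days.

```r
library(data.table)

# Rebuild the example data from the question
df <- data.frame(
  id   = rep(c("a", "b", "c"), each = 5),
  time = rep(1:5, times = 3),
  date = as.Date(c(
    "2004-01-13", "2004-05-04", NA, "2007-03-20", NA,
    "2004-01-11", "2004-05-04", NA, "2006-10-10", NA,
    "2004-05-23", "2004-10-14", NA, NA, NA
  ))
)

# Only rows with a date take part in the calculation;
# the first such row within each id gets NA, all skipped rows stay NA
setDT(df)[!is.na(date), difftime := c(NA, as.numeric(diff(date))), by = id]

df
```

Rows excluded by the i condition are never assigned, so data.table leaves them as NA in the new column.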

3 Comments

just tried it and got exactly what I need. Thank you so much!
Or d[!is.na(date), dif := c(NA, diff(date)), id]
@Henrik That's smart! Thanks!
