I have data with a grouping variable 'id' and a date column with missing values:
id time date
a 1 2004-01-13
a 2 2004-05-04
a 3 NA
a 4 2007-03-20
a 5 NA
b 1 2004-01-11
b 2 2004-05-04
b 3 NA
b 4 2006-10-10
b 5 NA
c 1 2004-05-23
c 2 2004-10-14
c 3 NA
c 4 NA
c 5 NA
Within each 'id', I would like to find the difference between each consecutive pair of dates:
id time date difftime
a 1 2004-01-13 NA
a 2 2004-05-04 (2004-05-04)-(2004-01-13)
a 3 NA NA
a 4 2007-03-20 (2007-03-20)-(2004-05-04)
a 5 NA NA
b 1 2004-01-11 NA
b 2 2004-05-04 (2004-05-04)-(2004-01-11)
b 3 NA NA
b 4 2006-10-10 (2006-10-10)-(2004-05-04)
b 5 NA NA
c 1 2004-05-23 NA
c 2 2004-10-14 (2004-10-14)-(2004-05-23)
c 3 NA NA
c 4 NA NA
c 5 NA NA
I tried the following, but none of these attempts gives the result I want:
data$difftime <- aggregate(date ~ id, data, diff)
library(data.table)
setDT(data)[ , difftime := diff(data$date), by = id]
diff(data$date)
In your data.table attempt, you may restrict the diff calculation to the non-NA rows (put !is.na(date) in the i slot). Also remember that the result of diff is one element shorter than its input, so you need to pad it with one leading NA:
d[!is.na(date), dif := c(NA, diff(date)), id]
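To make the suggestion concrete, here is a minimal sketch that rebuilds the example data from the question and applies the same call. It assumes the date column is stored as class Date (the question does not say how it is stored) and reuses the name d from the answer; the data.table package must be installed.

```r
library(data.table)

# Rebuild the question's example data.
# Assumption: 'date' is a Date column, with missing values as NA.
d <- data.table(
  id   = rep(c("a", "b", "c"), each = 5),
  time = rep(1:5, times = 3),
  date = as.Date(c(
    "2004-01-13", "2004-05-04", NA, "2007-03-20", NA,
    "2004-01-11", "2004-05-04", NA, "2006-10-10", NA,
    "2004-05-23", "2004-10-14", NA, NA, NA
  ))
)

# i slot:  keep only rows that have a date
# j slot:  difference between consecutive remaining dates,
#          padded with one leading NA so the length matches the group
# by:      do this separately within each id
d[!is.na(date), dif := c(NA, diff(date)), by = id]

print(d)
```

Rows that were excluded by the i condition are left as NA in dif, which matches the layout asked for in the question. The differences are in days, since diff on a Date vector works in days.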