I have a data.table whose age column contains missing values; rdate is of class Date. For each horsenum, I want to fill a missing age by finding the next non-missing age and its rdate, then computing: next non-missing age minus the ceiling of the year difference between that non-missing rdate and the current record's rdate. I use the ceiling because I assume the next non-missing rdate falls on the horse's birthday. I also want rdate.fill to stay a Date. How can I write this in data.table?
My idea for computing age.fill is below, but it gives an error:
library(lubridate)
data[, rdate.fill := ifelse(is.na(age), as.Date(rdate[na.lacf(age)]), NA), by=horsenum]
data[, age.fill := ifelse(is.na(age), ind4- ceiling(time_length(difftime(rdate.fill, rdate, "years"), age), by=horsenum]
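A side note on keeping rdate.fill as a Date: base ifelse() drops attributes, so a Date passed through it comes back as a plain number, while data.table's fifelse() keeps the class of its `yes` argument. The small sketch below only illustrates that difference; the vectors `d` and `keep` are made-up examples, not part of the data in the question.

```r
library(data.table)

# Hypothetical example vectors, only to show the class behaviour
d    <- as.Date(c("2015-09-06", "2015-10-10"))
keep <- c(TRUE, FALSE)

base_res <- ifelse(keep, d, NA)            # attributes are dropped here
dt_res   <- fifelse(keep, d, as.Date(NA))  # `no` must share the class of `yes`

class(base_res)  # no longer a Date
class(dt_res)    # still a Date
```

Note that fifelse() expects `yes` and `no` to have the same type and class, which is why the missing value is written as as.Date(NA) rather than a bare NA.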
Input:
   index      rdate horsenum age ind4
1: 14704 2009-03-01     K123  NA   10
2: 14767 2009-03-01     K212  NA    9
3: 39281 2011-10-09     K123  NA   10
4: 39561 2011-10-19     K212  NA    9
5: 74560 2015-04-07     K212  NA    9
6: 77972 2015-09-06     K123  10   NA
7: 79111 2015-10-10     K212   9   NA
8: 84233 2016-03-28     K212  10   NA
structure(list(index = c(14704L, 14767L, 39281L, 39561L, 74560L,
77972L, 79111L, 84233L), rdate = structure(c(14304, 14304, 15256,
15266, 16532, 16684, 16718, 16888), class = "Date"), horsenum = c("K123",
"K212", "K123", "K212", "K212", "K123", "K212", "K212"), age = c(NA,
NA, NA, NA, NA, 10, 9, 10), ind4 = c(10, 9, 10, 9, 9, NA, NA,
NA)), row.names = c(NA, -8L), class = c("data.table", "data.frame"))
Expected output:
   index      rdate horsenum age ind4 rdate.fill age.fill
1: 14704 2009-03-01     K123  NA   10 2015-09-06        3
2: 14767 2009-03-01     K212  NA    9 2015-10-10        2
3: 39281 2011-10-09     K123  NA   10 2015-09-06        6
4: 39561 2011-10-19     K212  NA    9 2015-10-10        5
5: 74560 2015-04-07     K212  NA    9 2015-10-10        8
6: 77972 2015-09-06     K123  10   NA       <NA>       10
7: 79111 2015-10-10     K212   9   NA       <NA>        9
8: 84233 2016-03-28     K212  10   NA       <NA>       10
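For reference, here is one possible sketch of the calculation described above, under a few stated assumptions. It looks up the "next non-missing" record per horsenum by back-filling row positions with nafill(type = "nocb"), and it approximates a year as 365.25 days instead of using lubridate. The names `dt`, `pos` and `age.next` are hypothetical helpers introduced only for this illustration, and the table is rebuilt with data.table() rather than from the dput output.

```r
library(data.table)

# Rebuild the sample input (same values as the table in the question)
dt <- data.table(
  index    = c(14704L, 14767L, 39281L, 39561L, 74560L, 77972L, 79111L, 84233L),
  rdate    = as.Date(c("2009-03-01", "2009-03-01", "2011-10-09", "2011-10-19",
                       "2015-04-07", "2015-09-06", "2015-10-10", "2016-03-28")),
  horsenum = c("K123", "K212", "K123", "K212", "K212", "K123", "K212", "K212"),
  age      = c(NA, NA, NA, NA, NA, 10, 9, 10),
  ind4     = c(10, 9, 10, 9, 9, NA, NA, NA)
)

# The back-fill below assumes rows are in date order within each horse
setorder(dt, rdate)

dt[, c("rdate.fill", "age.next") := {
  # Position (within the horse) of the next record that has an age;
  # rows that already have an age point at themselves
  pos <- nafill(fifelse(is.na(age), NA_integer_, seq_len(.N)), type = "nocb")
  list(rdate[pos], age[pos])   # indexing a Date vector keeps the Date class
}, by = horsenum]

# Assumption: one year is treated as 365.25 days
dt[, age.fill := age.next - ceiling(as.numeric(rdate.fill - rdate) / 365.25)]

# Only rows with a missing age should show a fill date
dt[!is.na(age), rdate.fill := as.Date(NA)]
dt[, age.next := NULL]

print(dt)
```

Rows that already have an age get a year difference of zero, so age.fill equals age for them. A horse with no later non-missing age would end up with NA in both rdate.fill and age.fill.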