I have a df in R with multiple events per individual. Each event is currently defined by a unique date. The column diff_earliest_date is the number of days between the earliest date (by unique id) and the subsequent date.
data <- data.frame(
id = c(1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 5678, 5678, 5678, 5678, 5678, 5678),
date = as.Date(c("1997-04-08", "1997-04-12", "1997-04-19", "1997-09-01", "1997-09-14", "1997-10-22", "1997-12-10",
"1998-06-16", "1998-06-27", "1998-06-29", "1998-09-13", "1998-09-25", "1999-05-17", "1999-07-18",
"1997-04-08", "1997-04-22", "1997-09-14", "1997-10-25", "1998-05-13", "1998-10-07")),
event = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6),
diff_earliest_date = c("0 days", "4 days", "11 days", "146 days", "159 days", "197 days", "246 days", "434 days",
"445 days", "447 days", "523 days", "535 days", "769 days", "831 days", "0 days", "10 days",
"159 days", "200 days", "400 days", "547 days"))
> data
id date event diff_earliest_date
1 1234 1997-04-08 1 0 days
2 1234 1997-04-12 2 4 days
3 1234 1997-04-19 3 11 days
4 1234 1997-09-01 4 146 days
5 1234 1997-09-14 5 159 days
6 1234 1997-10-22 6 197 days
7 1234 1997-12-10 7 246 days
8 1234 1998-06-16 8 434 days
9 1234 1998-06-27 9 445 days
10 1234 1998-06-29 10 447 days
11 1234 1998-09-13 11 523 days
12 1234 1998-09-25 12 535 days
13 1234 1999-05-17 13 769 days
14 1234 1999-07-18 14 831 days
15 5678 1997-04-08 1 0 days
16 5678 1997-04-22 2 10 days
17 5678 1997-09-14 3 159 days
18 5678 1997-10-25 4 200 days
19 5678 1998-05-13 5 400 days
20 5678 1998-10-07 6 547 days
For each id, I want to define unique events separated by 243 days apart. When the interval between the earliest date (by id) and each subsequent date is > 243, it will reset the difference in days for a new unique event. The data with the new columns (diff_reset and unique_event) should look like this:
> data
id date event diff_earliest_date diff_reset unique_event
1 1234 1997-04-08 1 0 days 0 days 1
2 1234 1997-04-12 2 4 days 4 days 1
3 1234 1997-04-19 3 11 days 11 days 1
4 1234 1997-09-01 4 146 days 146 days 1
5 1234 1997-09-14 5 159 days 159 days 1
6 1234 1997-10-22 6 197 days 197 days 1
7 1234 1997-12-10 7 246 days 0 days 2
8 1234 1998-06-16 8 434 days 189 days 2
9 1234 1998-06-27 9 445 days 200 days 2
10 1234 1998-06-29 10 447 days 202 days 2
11 1234 1998-09-13 11 523 days 0 days 3
12 1234 1998-09-25 12 535 days 13 days 3
13 1234 1999-05-17 13 769 days 0 days 4
14 1234 1999-07-18 14 831 days 63 days 4
15 5678 1997-04-08 1 0 days 0 days 1
16 5678 1997-04-22 2 14 days 14 days 1
17 5678 1997-09-14 3 159 days 159 days 1
18 5678 1997-10-25 4 200 days 0 days 1
19 5678 1998-05-13 5 400 days 200 days 2
20 5678 1998-10-07 6 547 days 0 days 3
I think I can only achieve that by running a loop through my data. I have tried mutate using date operations and lag(date) using dplyr with no success so far. Very frustrating! I would very much appreciate your help.
Thanks.