In a database containing firm and patent class values, I want to calculate the following variables:
Technological abandonment: Number of previously active technological patent classes abandoned annually.
Specifically, I want to create variables that calculate the number of patent classes (variable = class) that the firm has used in the past 3 years (t-3, t-2, and t-1) (min observation of one year prior is acceptable if the firm history initially doesn't have 3 years) but are missing in this year (t) I would like to do the same with a 5 year window as well.
I have a dataset containing millions of rows, so a fast data.table solution is much preferred.
In the following dataset:
df <- data.table(year=c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
category = c("A","A","B","C","A","D","F","F","C","A","B"))
The desired outcome would be (for a three year window):
year class tech_aband_3
1: 1979 A 0
2: 1979 A 0
3: 1980 B 1
4: 1980 C 1
5: 1981 A 2
6: 1981 D 2
7: 1982 F 4
8: 1983 F 3
9: 1983 C 3
10: 1984 A 3
11: 1984 B 3
Many thanks in advance.
gvkey) in the sample data. Please update.