Comparing the values of a certain number of previous rows with the current row [closed]

Question

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 8 months ago.

In a database containing firm and patent class values, I want to calculate the following variables:

Technological abandonment: Number of previously active technological patent classes abandoned annually.

Specifically, I want to create variables that count the number of patent classes (variable = class; the category column in the sample data below) that the firm used in the past 3 years (t-3, t-2, and t-1) but does not use in the current year (t). If the firm's history does not yet span 3 years, a minimum of one prior year is acceptable. I would like to do the same with a 5-year window as well.

I have a dataset containing millions of rows, so a fast data.table solution is much preferred.
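One way to express this definition with data.table, sketched here under stated assumptions: the class column is `category` and `year` holds whole calendar years. A class used in year y counts as previously active in each of the target years y+1, ..., y+k, and it is abandoned in year t if it is previously active in t but not used in t. Because the window is built from year values rather than row positions, gaps in a firm's history are handled, and a firm's first years simply use whatever earlier years exist. The helper name `tech_aband` and its `by` argument (for a firm id such as `gvkey`) are hypothetical, and its speed on millions of rows has not been measured; the expansion step multiplies the number of distinct firm-year-class rows by k.

```r
library(data.table)

# Hypothetical helper (assumes columns `year` and `category`): for each firm-year t,
# count classes used in any of the years t-k, ..., t-1 that are not used in year t.
# `by` names the firm id column(s), e.g. "gvkey"; leave it empty for a single firm.
tech_aband <- function(dt, k = 3L, by = character(0)) {
  # One row per (firm, year, class) actually observed
  used <- unique(dt[, c(by, "year", "category"), with = FALSE])

  # A class used in year y is "previously active" in target years y+1, ..., y+k
  window <- used[, .(target = year + seq_len(k)), by = c(by, "year", "category")]
  window <- unique(window[, c(by, "target", "category"), with = FALSE])
  setnames(window, "target", "year")

  # Anti-join: keep only classes that are NOT used again in the target year
  abandoned <- window[!used, on = c(by, "year", "category")]

  # Number of abandoned classes per firm-year
  counts <- abandoned[, .(tech_aband = .N), by = c(by, "year")]

  # Attach to the original rows: target years absent from `dt` are dropped,
  # and firm-years with nothing abandoned get 0
  out <- counts[dt, on = c(by, "year")]
  out[is.na(tech_aband), tech_aband := 0L]
  setnames(out, "tech_aband", paste0("tech_aband_", k))
  out[]
}

df <- data.table(year = c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))
tech_aband(df, k = 3)  # 3-year window
tech_aband(df, k = 5)  # 5-year window
```

For a firm panel the call would be along the lines of `tech_aband(panel, k = 3, by = "gvkey")`, so every join and count happens within a firm.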

In the following dataset:

library(data.table)
df <- data.table(year=c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))

The desired outcome would be (for a three-year window):

    year     category tech_aband_3
 1: 1979        A     0
 2: 1979        A     0
 3: 1980        B     1
 4: 1980        C     1
 5: 1981        A     2
 6: 1981        D     2
 7: 1982        F     4
 8: 1983        F     3
 9: 1983        C     3
10: 1984        A     3
11: 1984        B     3
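As a worked check of one row of this table: for 1983 the 3-year window covers 1980 to 1982, where the sample data uses B and C (1980), A and D (1981) and F (1982), while 1983 itself uses F and C. The previously active classes that are not used in 1983 are the abandoned ones; `setdiff()` also drops any duplicates.

```r
# Classes used in the window 1980-1982 (taken from the sample data)
prior <- c("B", "C", "A", "D", "F")
# Classes used in 1983
current <- c("F", "C")
# Previously active classes that are not used in 1983
abandoned <- setdiff(prior, current)
abandoned          # "B" "A" "D"
length(abandoned)  # 3, the tech_aband_3 value for 1983
```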

Many thanks in advance.

I do not see (id = gvkey) in the sample data. Please update. – Friede, Mar 9 at 10:37
Thanks for the catch, just updated it. I just meant to clarify that I'll need to run it by firm id in the end. – lovestacksflow, Mar 9 at 21:40
I'm not clear on why the question was closed. It has been clear enough to draw high-quality answers. If the one-sentence reference to the group id was the issue, it has been removed. I respectfully ask that the question be reopened again. – lovestacksflow, Mar 13 at 14:21

iroha · Accepted Answer · 2025-03-08 06:02:12Z

Assuming that all years are represented in the data (if not, you'd need to fill missing years for the following to work), you can try:

library(data.table)  
  
df[, .(category = list(unique(category))), by = year
   ][, tech_aband_3 := lengths(mapply(\(x, y) setdiff(unlist(x), y), 
                                      transpose(shift(list(category), 1:3, fill = first(category[[1]]))), 
                                      category))
     ][, .(category = unlist(category)), by = .(year, tech_aband_3)
       ][ df, on = .(year, category)
       ]

     year tech_aband_3 category
    <num>        <int>   <char>
 1:  1979            0        A
 2:  1979            0        A
 3:  1980            1        B
 4:  1980            1        C
 5:  1981            2        A
 6:  1981            2        D
 7:  1982            4        F
 8:  1983            3        F
 9:  1983            3        C
10:  1984            3        A
11:  1984            3        B
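The answer assumes every year appears in the data: the row-based `shift(..., 1:3)` treats consecutive rows as consecutive years, so a year without patents would stretch the window. A minimal sketch of the padding step, on a made-up table `gap` in which 1980 is missing: it adds an NA placeholder row for each missing year so that the class list for that year is empty. It assumes `year` is numeric, as in the question; for a firm panel the padding would have to be done within each firm.

```r
library(data.table)

# Made-up toy data with a gap: no patents in 1980
gap <- data.table(year = c(1979, 1979, 1981), category = c("A", "A", "B"))

# Calendar years between the first and last observed year that have no rows
missing_years <- setdiff(seq(min(gap$year), max(gap$year), by = 1), gap$year)

# One placeholder row per missing year, with an NA class, then sort by year
padded <- rbind(gap, data.table(year = missing_years, category = NA_character_))
setorder(padded, year)

# One row per calendar year; a missing year gets an empty class set
yearly <- padded[, .(category = list(unique(category[!is.na(category)]))), by = year]
yearly
```

The resulting `yearly` table has one row per calendar year and the same shape as the first `by = year` step of the chain above; whether the rest of that chain handles the empty list elements as intended has not been checked here.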

FJCC · Accepted Answer · 2025-03-08 04:23:45Z

Here is a method that works with your example data. I can't say how fast it will be with a large data set.

library(data.table)
library(purrr)
df <- data.table(year=c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))

GetCount <- function(CurrYear) {
  Prev <- unique(df[(CurrYear - year) <= 3 & (CurrYear - year) > 0, "category"])
  Current <- unique(df[year == CurrYear, "category"])
  return(nrow(Prev[!Current, on = "category"]))
}

YEARS <- unique(df$year)       
COUNTS <- map_dbl(YEARS, GetCount)
YearsCounts <- data.table(year = YEARS, tech_aband_3 = COUNTS)

FINAL <- YearsCounts[df, on = "year"]
FINAL

     year tech_aband_3 category
    <num>        <num>   <char>
 1:  1979            0        A
 2:  1979            0        A
 3:  1980            1        B
 4:  1980            1        C
 5:  1981            2        A
 6:  1981            2        D
 7:  1982            4        F
 8:  1983            3        F
 9:  1983            3        C
10:  1984            3        A
11:  1984            3        B
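`GetCount()` reads the global `df` and hard-codes the 3-year window, while the comments on the question say the count will eventually be needed per firm id. A sketch of the same logic with the data and the window length `k` passed in, applied within each firm via `by = gvkey`; `get_count`, `panel` and the firm 2 rows are made up for illustration, and `gvkey` is the firm id named in the comments. It still scans each firm's rows once per firm-year, so it is not expected to be faster than the original on millions of rows.

```r
library(data.table)

# Hypothetical variant of GetCount(): the years, classes and window length k are
# passed in, instead of reading the global `df` with a hard-coded 3.
get_count <- function(years, classes, curr_year, k = 3) {
  in_window <- curr_year - years <= k & curr_year - years > 0
  prev <- unique(classes[in_window])
  curr <- unique(classes[years == curr_year])
  length(setdiff(prev, curr))
}

df <- data.table(year = c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                 category = c("A","A","B","C","A","D","F","F","C","A","B"))
# Made-up two-firm panel: firm 1 is the sample data, firm 2 starts in 1982
panel <- rbind(copy(df)[, gvkey := 1L], df[year >= 1982][, gvkey := 2L])

# One count per firm-year, using only that firm's `year` and `category` values
counts <- panel[, {
  yrs <- unique(year)
  list(year = yrs,
       tech_aband_3 = vapply(yrs, function(y) get_count(year, category, y, k = 3),
                             integer(1)))
}, by = gvkey]

FINAL <- counts[panel, on = .(gvkey, year)]
FINAL
```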
