Identify first instance of repeated values within a group

Question

I have the following dataframe with three columns - FIRM, YEAR and DUMMY (0,1). For every FIRM, I wanted to scan all years and identify the first case where the DUMMY value of 1 is repeated more than once (in consecutive rows). Then, I want to create a new column which contains 0, for all the years in which the DUMMY was 1 and contains -1,-2,-3 for the years before it, and 1,2,3 for the years after it.

------------------------------
| FIRM | YEAR | DUMMY| NEW_COL
------------------------------
|  A   | 2006 |   0  |   0   |
------------------------------
|  A   | 2007 |   1  |   0   |
------------------------------
|  A   | 2008 |   0  |   0   |
------------------------------
|  B   | 2006 |   0  |   0   |
------------------------------
|  B   | 2007 |   0  |  -1   |
------------------------------
|  B   | 2008 |   1  |   0   |
------------------------------
|  B   | 2009 |   1  |   0   |
------------------------------
|  B   | 2010 |   0  |   1   |
------------------------------
|  B   | 2011 |   0  |   2   |
------------------------------
|  B   | 2012 |   1  |   3   |
------------------------------
|  B   | 2013 |   1  |   4   |
------------------------------

Frank Zhang · Accepted Answer · 2020-04-25 04:34:19Z

a data.table solution.

I think the year 2006 for firm B should be -2 based on your description.

library(data.table)

dt <- fread(' FIRM  YEAR  DUMMY NEW_COL
 A  2006  0  0 
 A  2007  1  0 
 A  2008  0  0 
 B  2006  0  0 
 B  2007  0  -1 
 B  2008  1  0 
 B  2009  1  0 
 B  2010  0  1 
 B  2011  0  2 
 B  2012  1  3 
 B  2013  1  4 ')


dt[,c("flag","grp"):=.((.N>1) & (DUMMY==1),
                       .GRP),by=.(FIRM,rleid(DUMMY))]
dt
#>     FIRM YEAR DUMMY NEW_COL  flag grp
#>  1:    A 2006     0       0 FALSE   1
#>  2:    A 2007     1       0 FALSE   2
#>  3:    A 2008     0       0 FALSE   3
#>  4:    B 2006     0       0 FALSE   4
#>  5:    B 2007     0      -1 FALSE   4
#>  6:    B 2008     1       0  TRUE   5
#>  7:    B 2009     1       0  TRUE   5
#>  8:    B 2010     0       1 FALSE   6
#>  9:    B 2011     0       2 FALSE   6
#> 10:    B 2012     1       3  TRUE   7
#> 11:    B 2013     1       4  TRUE   7

dt[flag==TRUE,result:=fifelse(grp==min(grp),0,99),by=.(FIRM)]
dt
#>     FIRM YEAR DUMMY NEW_COL  flag grp result
#>  1:    A 2006     0       0 FALSE   1     NA
#>  2:    A 2007     1       0 FALSE   2     NA
#>  3:    A 2008     0       0 FALSE   3     NA
#>  4:    B 2006     0       0 FALSE   4     NA
#>  5:    B 2007     0      -1 FALSE   4     NA
#>  6:    B 2008     1       0  TRUE   5      0
#>  7:    B 2009     1       0  TRUE   5      0
#>  8:    B 2010     0       1 FALSE   6     NA
#>  9:    B 2011     0       2 FALSE   6     NA
#> 10:    B 2012     1       3  TRUE   7     99
#> 11:    B 2013     1       4  TRUE   7     99



dt[,result:=lapply(.SD,function(x){
  if (any(!is.na(x==0))){
    position_0_head <- head(which(x==0),1)
    position_0_tail <- tail(which(x==0),1)
    x[1:position_0_head] <- 0 - (YEAR[position_0_head]-YEAR[1:position_0_head])
    x[position_0_tail:length(x)] <- 0 + (YEAR[position_0_tail:length(x)]-YEAR[position_0_tail])
  } else{
    x <- 0
  }
  x
}),.SDcols="result",by=.(FIRM)]

dt[,.SD,.SDcols = !c("flag","grp")]
#>     FIRM YEAR DUMMY NEW_COL result
#>  1:    A 2006     0       0      0
#>  2:    A 2007     1       0      0
#>  3:    A 2008     0       0      0
#>  4:    B 2006     0       0     -2
#>  5:    B 2007     0      -1     -1
#>  6:    B 2008     1       0      0
#>  7:    B 2009     1       0      0
#>  8:    B 2010     0       1      1
#>  9:    B 2011     0       2      2
#> 10:    B 2012     1       3      3
#> 11:    B 2013     1       4      4

^{Created on 2020-04-25 by the reprex package (v0.3.0)}

Collectives™ on Stack Overflow

Identify first instance of repeated values within a group

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related