Create a variable days after treatment

Question

Here's my dataset:

df = data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2), 
                treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0),
                date = lubridate::ymd(c("2019-07-07", "2019-07-07", "2019-07-07", "2019-07-07",
                                        "2019-07-07", "2019-07-06", "2019-07-06", "2019-07-05",
                                        "2019-07-05", "2019-04-20", "2019-04-20", "2019-04-20",
                                        "2019-04-20", "2019-04-19", "2019-04-19", "2019-03-14",
                                        "2019-03-14", "2019-03-14", "2019-03-14", "2019-03-14")))

I need to create a variable which reflects date after treatment for each id. Like this:

df = data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2), 
                treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0),
                date = lubridate::ymd(c("2019-07-07", "2019-07-07", "2019-07-07", "2019-07-07", "2019-07-07", "2019-07-06", "2019-07-06", "2019-07-05",
                         "2019-07-05", "2019-04-20", "2019-04-20", "2019-04-20", "2019-04-20", "2019-04-19", "2019-04-19", "2019-03-14",
                         "2019-03-14", "2019-03-14", "2019-03-14", "2019-03-14")),
                dat = c(0,0,0,1,2,3,0,1,2,3,4,5,6,0,0,1,0,1,2,3)
)

Can you help me with this?

It's not quite clear what your expected result is. Can you add that to your question?
– VvdL, Commented Aug 24, 2022 at 11:56
Is treatment binary? If so, then the 10 in row 17 is a typo.
– Rui Barradas, Commented Aug 24, 2022 at 12:02
I added the expected result. Yes, treatment is binary, I corrected, thanks!
– Petr, Commented Aug 24, 2022 at 12:05
You say "date after treatment", but (1) the data is sorted in reverse, and (2) you seem to be counting rows after treatment, not the days themselves. Especially since many date values are repeated within one id, it seems that you're hoping to count rows instead. Is that right? Is it safe to assume that the order of rows is controlled externally?
– r2evans, Commented Aug 24, 2022 at 12:33

r2evans · Accepted Answer · 2022-08-24 12:13:37Z

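library(dplyr)
df %>%
  # grp increases by 1 at every treatment row, so each treatment starts a new group
  group_by(id, grp = cumsum(treatment)) %>%
  # count the rows that follow a treatment row within the group
  mutate(dat2 = cumsum(cumany(lag(treatment > 0, default = FALSE)))) %>%
  ungroup()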
# # A tibble: 20 x 6
#       id treatment date         dat   grp  dat2
#    <dbl>     <dbl> <date>     <dbl> <dbl> <int>
#  1     1         0 2019-07-07     0     0     0
#  2     1         0 2019-07-07     0     0     0
#  3     1         1 2019-07-07     0     1     0
#  4     1         0 2019-07-07     1     1     1
#  5     1         0 2019-07-07     2     1     2
#  6     1         0 2019-07-06     3     1     3
#  7     1         1 2019-07-06     0     2     0
#  8     1         0 2019-07-05     1     2     1
#  9     1         0 2019-07-05     2     2     2
# 10     1         0 2019-04-20     3     2     3
# 11     1         0 2019-04-20     4     2     4
# 12     1         0 2019-04-20     5     2     5
# 13     1         0 2019-04-20     6     2     6
# 14     2         0 2019-04-19     0     2     0
# 15     2         1 2019-04-19     0     3     0
# 16     2         0 2019-03-14     1     3     1
# 17     2         1 2019-03-14     0     4     0
# 18     2         0 2019-03-14     1     4     1
# 19     2         0 2019-03-14     2     4     2
# 20     2         0 2019-03-14     3     4     3

You can of course delete grp after this.
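
As a minimal sketch of that cleanup, assuming the same id and treatment vectors as in the question (date is left out here because the counter depends only on row order), select(-grp) drops the helper column once the counter exists. The name res is only illustrative.

```r
library(dplyr)

# Same id/treatment vectors as in the question; date is omitted on purpose.
df <- data.frame(
  id        = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0)
)

res <- df %>%
  group_by(id, grp = cumsum(treatment)) %>%
  mutate(dat2 = cumsum(cumany(lag(treatment > 0, default = FALSE)))) %>%
  ungroup() %>%      # grp is an ordinary column again after ungrouping
  select(-grp)       # drop the helper grouping column

names(res)
```

Ungrouping first matters: while grp is still a grouping variable, dplyr keeps it in the result even if you try to deselect it.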

An alternative:

df %>%
  group_by(id, grp = cumsum(treatment)) %>%
  # a group that starts with a treatment row counts up from 0; any other group stays at 0
  mutate(dat2 = if (first(treatment) > 0) row_number() - 1 else 0) %>%
  ungroup()
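
To see why the alternative works, here is a rough sketch, assuming treatment is strictly 0/1 as confirmed in the comments. Under that assumption every (id, grp) group either begins with a treatment row or contains no treatment row at all, so checking the first row is enough. The column starts_with_trt and the name res are illustrative additions, not part of the approach itself.

```r
library(dplyr)

# Same id/treatment vectors as in the question; date is omitted on purpose.
df <- data.frame(
  id        = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0)
)

res <- df %>%
  group_by(id, grp = cumsum(treatment)) %>%
  mutate(
    # TRUE when the group's first row is a treatment row (illustrative helper)
    starts_with_trt = first(treatment) > 0,
    # count up from 0 in such groups; otherwise recycle a single 0 over the group
    dat2 = if (first(treatment) > 0) row_number() - 1 else 0
  ) %>%
  ungroup()

res
```

Rows before the first treatment of an id (for example row 14, the first row of id 2) land in a group with starts_with_trt equal to FALSE and therefore get 0.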

Rui Barradas · Accepted Answer · 2022-08-24 12:40:47Z

Here is a way.

df <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0),
                date = lubridate::ymd(c("2019-07-07", "2019-07-07", "2019-07-07", "2019-07-07", 
                                        "2019-07-07", "2019-07-06", "2019-07-06", "2019-07-05",
                                        "2019-07-05", "2019-04-20", "2019-04-20", "2019-04-20", 
                                        "2019-04-20", "2019-04-19", "2019-04-19", "2019-03-14",
                                        "2019-03-14", "2019-03-14", "2019-03-14", "2019-03-14")))

suppressPackageStartupMessages(library(dplyr))

df %>%
  group_by(id) %>%
  mutate(days = cumsum(treatment)) %>%
  group_by(id, days) %>%
  mutate(days = ifelse(days > 0, row_number() - 1L, 0)) %>%
  ungroup()
#> # A tibble: 20 × 4
#>       id treatment date        days
#>    <dbl>     <dbl> <date>     <dbl>
#>  1     1         0 2019-07-07     0
#>  2     1         0 2019-07-07     0
#>  3     1         1 2019-07-07     0
#>  4     1         0 2019-07-07     1
#>  5     1         0 2019-07-07     2
#>  6     1         0 2019-07-06     3
#>  7     1         1 2019-07-06     0
#>  8     1         0 2019-07-05     1
#>  9     1         0 2019-07-05     2
#> 10     1         0 2019-04-20     3
#> 11     1         0 2019-04-20     4
#> 12     1         0 2019-04-20     5
#> 13     1         0 2019-04-20     6
#> 14     2         0 2019-04-19     0
#> 15     2         1 2019-04-19     0
#> 16     2         0 2019-03-14     1
#> 17     2         1 2019-03-14     0
#> 18     2         0 2019-03-14     1
#> 19     2         0 2019-03-14     2
#> 20     2         0 2019-03-14     3

Created on 2022-08-24 by the reprex package (v2.0.1)
