Here's my dataset:

df = data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2), 
                treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0),
                date = lubridate::ymd(c("2019-07-07", "2019-07-07", "2019-07-07", "2019-07-07", 
                                        "2019-07-07", "2019-07-06", "2019-07-06", "2019-07-05",
                                        "2019-07-05", "2019-04-20", "2019-04-20", "2019-04-20", 
                                        "2019-04-20", "2019-04-19", "2019-04-19", "2019-03-14",
                                        "2019-03-14", "2019-03-14", "2019-03-14", "2019-03-14")))

I need to create a variable that reflects the time elapsed since the last treatment for each id, like this:

df = data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2), 
                treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0),
                date = lubridate::ymd(c("2019-07-07", "2019-07-07", "2019-07-07", "2019-07-07",
                                        "2019-07-07", "2019-07-06", "2019-07-06", "2019-07-05",
                                        "2019-07-05", "2019-04-20", "2019-04-20", "2019-04-20",
                                        "2019-04-20", "2019-04-19", "2019-04-19", "2019-03-14",
                                        "2019-03-14", "2019-03-14", "2019-03-14", "2019-03-14")),
                dat = c(0,0,0,1,2,3,0,1,2,3,4,5,6,0,0,1,0,1,2,3)
)
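The expected dat column can be read as a simple rule: within each id, a treatment row gets 0, each following row gets one more than the row before, and rows before the first treatment of an id get 0. As a rough sketch of that reading only (it counts rows, not calendar days, and rows_since_treatment is a hypothetical helper, not something from the question), a plain base R loop reproduces the expected values:

```r
# Hypothetical helper: counts rows since the most recent treatment in one id.
# Rows before the first treatment get 0, matching the expected output.
rows_since_treatment <- function(treatment) {
  out <- numeric(length(treatment))
  counter <- NA  # NA means no treatment has been seen yet in this id
  for (i in seq_along(treatment)) {
    if (treatment[i] == 1) {
      counter <- 0                 # a treatment row restarts the count
    } else if (!is.na(counter)) {
      counter <- counter + 1       # one more row after the last treatment
    }
    out[i] <- if (is.na(counter)) 0 else counter
  }
  out
}

id        <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2)
treatment <- c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0)

# ave() applies the helper separately to each id and keeps the original row order.
dat <- ave(treatment, id, FUN = rows_since_treatment)
dat
# [1] 0 0 0 1 2 3 0 1 2 3 4 5 6 0 0 1 0 1 2 3
```

Like the expected output, this relies on the existing row order and ignores the date column.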

Can you help me with this?

  • What's your expected result? Commented Aug 24, 2022 at 11:56
  • it's not quite clear what your expected result is. Can you add that to your question? Commented Aug 24, 2022 at 11:56
  • Is treatment binary? If so, then the 10 in row 17 is a typo. Commented Aug 24, 2022 at 12:02
  • I added the expected result. Yes, treatment is binary; I corrected it, thanks! Commented Aug 24, 2022 at 12:05
  • You say "date after treatment", but (1) the data is sorted in reverse, and (2) you seem to be counting rows after treatment, not the days themselves. Especially since many date values are repeated within one id, it seems that you're hoping to count rows instead. Is that right? Is it safe to assume that the order of rows is controlled externally? Commented Aug 24, 2022 at 12:33

2 Answers

library(dplyr)
df %>%
  group_by(id, grp = cumsum(treatment)) %>%
  mutate(dat2 = cumsum(cumany(lag(treatment > 0, default = FALSE)))) %>%
  ungroup()
# # A tibble: 20 x 6
#       id treatment date         dat   grp  dat2
#    <dbl>     <dbl> <date>     <dbl> <dbl> <int>
#  1     1         0 2019-07-07     0     0     0
#  2     1         0 2019-07-07     0     0     0
#  3     1         1 2019-07-07     0     1     0
#  4     1         0 2019-07-07     1     1     1
#  5     1         0 2019-07-07     2     1     2
#  6     1         0 2019-07-06     3     1     3
#  7     1         1 2019-07-06     0     2     0
#  8     1         0 2019-07-05     1     2     1
#  9     1         0 2019-07-05     2     2     2
# 10     1         0 2019-04-20     3     2     3
# 11     1         0 2019-04-20     4     2     4
# 12     1         0 2019-04-20     5     2     5
# 13     1         0 2019-04-20     6     2     6
# 14     2         0 2019-04-19     0     2     0
# 15     2         1 2019-04-19     0     3     0
# 16     2         0 2019-03-14     1     3     1
# 17     2         1 2019-03-14     0     4     0
# 18     2         0 2019-03-14     1     4     1
# 19     2         0 2019-03-14     2     4     2
# 20     2         0 2019-03-14     3     4     3

You can of course delete grp after this.
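The core of this answer is cumsum(cumany(lag(treatment > 0, default = FALSE))), evaluated separately for each (id, grp) block. Running it by hand on two toy blocks (the vectors below are made up for illustration) shows why the treatment row gets 0 and later rows count up, while a block before an id's first treatment stays at 0:

```r
library(dplyr)

# A grp block that starts with a treatment row.
treated_block <- c(1, 0, 0, 0)
shifted <- lag(treated_block > 0, default = FALSE)  # FALSE TRUE FALSE FALSE
seen    <- cumany(shifted)                            # FALSE TRUE TRUE TRUE
cumsum(seen)                                          # 0 1 2 3

# A block of rows before the first treatment in an id never sees a TRUE,
# so every row stays at 0.
pre_block <- c(0, 0)
cumsum(cumany(lag(pre_block > 0, default = FALSE)))   # 0 0
```

The lag() shifts the treatment flag down by one row so the treatment row itself is not counted, cumany() turns it into "a treatment has already happened in this block", and cumsum() turns that into a running row count.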

An alternative:

df %>%
  group_by(id, grp = cumsum(treatment)) %>%
  mutate(dat2 = if (first(treatment)) row_number() - 1 else 0) %>%
  ungroup()
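The same idea also works without dplyr. The sketch below swaps in base R's ave() (a different tool from the answers here) and mirrors the if (first(treatment)) logic: it builds the same grp index, then numbers the rows of each (id, grp) block only when the block starts with a treatment. It drops the date column, which the calculation does not use.

```r
# Only the columns the calculation needs; dates omitted.
df <- data.frame(
  id        = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0)
)

# Same block index as in the answer: it increases by 1 at every treatment row.
grp <- cumsum(df$treatment)

# Within each (id, grp) block: if the block starts with a treatment, number
# its rows 0, 1, 2, ...; otherwise it precedes any treatment, so all rows get 0.
# isTRUE() guards against empty id/grp combinations, where t[1] is NA.
df$dat2 <- ave(df$treatment, df$id, grp, FUN = function(t) {
  if (isTRUE(t[1] == 1)) seq_along(t) - 1 else rep(0, length(t))
})
df$dat2
# [1] 0 0 0 1 2 3 0 1 2 3 4 5 6 0 0 1 0 1 2 3
```

The guard is there because ave() crosses id and grp, which can include combinations that have no rows (for example id 2 with grp 0).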

Here is a way.

df <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0),
                date = lubridate::ymd(c("2019-07-07", "2019-07-07", "2019-07-07", "2019-07-07", 
                                        "2019-07-07", "2019-07-06", "2019-07-06", "2019-07-05",
                                        "2019-07-05", "2019-04-20", "2019-04-20", "2019-04-20", 
                                        "2019-04-20", "2019-04-19", "2019-04-19", "2019-03-14",
                                        "2019-03-14", "2019-03-14", "2019-03-14", "2019-03-14")))

suppressPackageStartupMessages(library(dplyr))

df %>%
  group_by(id) %>%
  mutate(days = cumsum(treatment)) %>%
  group_by(id, days) %>%
  mutate(days = ifelse(days > 0, row_number() - 1L, 0)) %>%
  ungroup()
#> # A tibble: 20 × 4
#>       id treatment date        days
#>    <dbl>     <dbl> <date>     <dbl>
#>  1     1         0 2019-07-07     0
#>  2     1         0 2019-07-07     0
#>  3     1         1 2019-07-07     0
#>  4     1         0 2019-07-07     1
#>  5     1         0 2019-07-07     2
#>  6     1         0 2019-07-06     3
#>  7     1         1 2019-07-06     0
#>  8     1         0 2019-07-05     1
#>  9     1         0 2019-07-05     2
#> 10     1         0 2019-04-20     3
#> 11     1         0 2019-04-20     4
#> 12     1         0 2019-04-20     5
#> 13     1         0 2019-04-20     6
#> 14     2         0 2019-04-19     0
#> 15     2         1 2019-04-19     0
#> 16     2         0 2019-03-14     1
#> 17     2         1 2019-03-14     0
#> 18     2         0 2019-03-14     1
#> 19     2         0 2019-03-14     2
#> 20     2         0 2019-03-14     3

Created on 2022-08-24 by the reprex package (v2.0.1)
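In this answer, the first mutate() stores a running count of treatments within each id in days, and the second group_by() then uses that count as a block key. Stopping after that first step (a sketch on the id and treatment columns only) shows the intermediate values:

```r
suppressPackageStartupMessages(library(dplyr))

# Only the columns the calculation needs; dates omitted.
df <- data.frame(
  id        = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  treatment = c(0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0)
)

# First step of the answer only: running treatment count per id.
step1 <- df %>%
  group_by(id) %>%
  mutate(days = cumsum(treatment)) %>%
  ungroup()

step1$days
# [1] 0 0 1 1 1 1 2 2 2 2 2 2 2 0 1 1 2 2 2 2
```

A days value of 0 marks rows before an id's first treatment, which is why the final ifelse() sets those rows to 0 and numbers every other block from 0 upward.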
