I have the following run-length encoded data.
df1 <- structure(list(lengths = c(2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L), values = c(10, 9, NA, 5, 4, 3, NA, 2, NA, 1, 0, NA, 0)), row.names = c(NA, -13L), class = "data.frame")
df1
# > df1
#    lengths values
# 1        2     10
# 2        3      9
# 3        2     NA
# 4        1      5
# 5        1      4
# 6        1      3
# 7        1     NA
# 8        1      2
# 9        2     NA
# 10       1      1
# 11       1      0
# 12       3     NA
# 13       1      0
Using a particular threshold (0.01), I create a new variable in this data frame.
df1$Below_Threshold <- df1$values <= 0.01
df1
# > df1
#    lengths values Below_Threshold
# 1        2     10           FALSE
# 2        3      9           FALSE
# 3        2     NA              NA
# 4        1      5           FALSE
# 5        1      4           FALSE
# 6        1      3           FALSE
# 7        1     NA              NA
# 8        1      2           FALSE
# 9        2     NA              NA
# 10       1      1           FALSE
# 11       1      0            TRUE
# 12       3     NA              NA
# 13       1      0            TRUE
I now want to run-length encode this new variable, but instead of returning the number of rows in each run, I want the sum of the corresponding values in the lengths column of df1. The result should look like the sum column of the df2 data frame in the following chunk of code.
df2 <- structure(list(values = c(FALSE, NA, FALSE, NA, FALSE, NA, FALSE, TRUE, NA, TRUE), sum = c(5, 2, 3, 1, 1, 2, 1, 1, 3, 1)), class = "data.frame", row.names = c(NA, -10L))
df2
# > df2
#    values sum
# 1   FALSE   5
# 2      NA   2
# 3   FALSE   3
# 4      NA   1
# 5   FALSE   1
# 6      NA   2
# 7   FALSE   1
# 8    TRUE   1
# 9      NA   3
# 10   TRUE   1
Is there a nice, efficient way of achieving this result? Base R solutions are preferred, but all are welcome.
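For reference, here is a rough base R sketch of the grouping described above. It is only one possible approach, not necessarily the nicest or fastest. The names `below`, `code`, `run_id` and `res` are made up for illustration, and the sketch assumes that consecutive NA rows should be merged into a single run (plain `rle()` would treat every NA as its own run, which makes no difference for this sample but could for other data).

```r
# Sample data from the question
df1 <- data.frame(
  lengths = c(2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L),
  values  = c(10, 9, NA, 5, 4, 3, NA, 2, NA, 1, 0, NA, 0)
)

# Threshold flag (NA stays NA)
below <- df1$values <= 0.01

# Recode to integers so NA can be compared like any other value:
# -1 = NA, 0 = FALSE, 1 = TRUE (assumption: adjacent NAs form one run)
code <- ifelse(is.na(below), -1L, as.integer(below))

# A new run starts at the first row and wherever the code changes
run_id <- cumsum(c(TRUE, diff(code) != 0))

# One row per run: the run's value and the summed original lengths
res <- data.frame(
  values = below[!duplicated(run_id)],
  sum    = as.vector(tapply(df1$lengths, run_id, sum))
)
res
```

On the sample data this should give the same values and sum columns as df2, although sum comes back as integer rather than double.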