4

I have a dataframe with a list of consecutive measurements of temperature over a period of time. I would like to split these up into separate dataframes based on when the temperature is consecutively above zero. For example:

Date Time Temp
2022-05-20 4:16 32
2022-05-20 4:17 16
2022-05-20 4:18 0
2022-05-20 4:19 7
2022-05-20 4:34 75
2022-05-20 4:39 5
2022-05-20 4:48 0
2022-05-20 4:49 0
2022-05-20 4:59 83

The desired output would be:

Date Time Temp
2022-05-20 4:16 32
2022-05-20 4:17 16

and

Date Time Temp
2022-05-20 4:19 7
2022-05-20 4:34 75
2022-05-20 4:39 5

and

Date Time Temp
2022-05-20 4:59 83

3 Answers 3

3

A base R approach would be to use split with cumsum:

split(df[df$Temp != 0, ], 
      cumsum(df$Temp == 0)[df$Temp != 0])

Or a much more readable format suggested by @thelatemail

sel <- df$Temp == 0

split(df[!sel,], cumsum(sel)[!sel])

Output for both:

# $`0`
#   Date Time Temp
# 1 2022-05-20 4:16   32
# 2 2022-05-20 4:17   16
# 
# $`1`
#   Date Time Temp
# 4 2022-05-20 4:19    7
# 5 2022-05-20 4:34   75
# 6 2022-05-20 4:39    5
# 
# $`3`
#   Date Time Temp
# 9 2022-05-20 4:59   83

(Using these data)

df <- read.table(text = "Date   Time    Temp
2022-05-20  4:16    32
2022-05-20  4:17    16
2022-05-20  4:18    0
2022-05-20  4:19    7
2022-05-20  4:34    75
2022-05-20  4:39    5
2022-05-20  4:48    0
2022-05-20  4:49    0
2022-05-20  4:59    83", header = TRUE)
Sign up to request clarification or add additional context in comments.

2 Comments

Or to separate the selection part out - sel <- df$Temp == 0; split(df[!sel,], cumsum(sel)[!sel])
@thelatemail thanks! Thats way more readable - I've included this improvement in my edit. Thanks again!
2
  • With base R, you can try rle
split(
    subset(
        transform(
            df,
            grp = with(
                rle(Temp > 0),
                rep(cumsum(values), lengths)
            )
        ), Temp > 0
    ), ~grp
)

which gives

$`1`
        Date Time Temp grp
1 2022-05-20 4:16   32   1
2 2022-05-20 4:17   16   1

$`2`
        Date Time Temp grp
4 2022-05-20 4:19    7   2
5 2022-05-20 4:34   75   2
6 2022-05-20 4:39    5   2

$`3`
        Date Time Temp grp
9 2022-05-20 4:59   83   3
  • With dplyr, you can try consecutive_id
library(dplyr)
df %>%
    mutate(grp = consecutive_id(Temp > 0)) %>%
    filter(Temp > 0) %>%
    group_split(grp, .keep = FALSE)

which gives

[[1]]
# A tibble: 2 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:16     32
2 2022-05-20 4:17     16

[[2]]
# A tibble: 3 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:19      7
2 2022-05-20 4:34     75
3 2022-05-20 4:39      5

[[3]]
# A tibble: 1 × 3
  Date       Time   Temp
  <chr>      <chr> <int>
1 2022-05-20 4:59     83

data

> dput(df)
structure(list(Date = c("2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20", "2022-05-20",
"2022-05-20"), Time = c("4:16", "4:17", "4:18", "4:19", "4:34",
"4:39", "4:48", "4:49", "4:59"), Temp = c(32L, 16L, 0L, 7L, 75L,
5L, 0L, 0L, 83L)), class = "data.frame", row.names = c(NA, -9L
))

Comments

2

You can do this in dplyr where a group column remains

x = data.frame(num = 1:20, t = rep(c(23,16,5,0),5))

library(dplyr)    
x <- x %>%
  mutate(group = cumsum(lag(t, default = 0) == 0 & t > 0))%>% 
  filter(t > 0)    
split_list <- split(x, x$group)

or in base R

sl <- split(x[x$t > 0, ] , cumsum(c(1, diff(x$t == 0) > 0))[x$t > 0])

giving

$`1`
  num  t
1   1 23
2   2 16
3   3  5

$`2`
  num  t
5   5 23
6   6 16
7   7  5

$`3`
   num  t
9    9 23
10  10 16
11  11  5

$`4`
   num  t
13  13 23
14  14 16
15  15  5

$`5`
   num  t
17  17 23
18  18 16
19  19  5

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.