
I have a dataset that contains a different measurement in each column, and the last column contains the values 0, 1 or 2.

For example, let's say my data frame looks like this (ignore the values of v1:v5):

   v1 v2 v3 v4 v5 v6
2. 24 76 98 89 87  2
3. 24 76 98 89 87  2
4. 24 76 98 89 87  1
5. 24 76 98 89 87  2
6. 24 76 98 89 87  2

I am interested in the values of the v6 column, and I want to extract the rows where the value equals 2. In the example above, I would like to extract the first two rows (labelled 2 and 3) and save them as a new data frame, and also extract the rows labelled 5 and 6 and save them as a separate data frame. To be more precise: whenever consecutive rows have the value 2, I need that block saved as a new data frame. When the value is different, the loop should skip it and look for the next occurrence of the value of interest (2). If the last column of my data frame contains 70 blocks of consecutive 2s, I need to end up with 70 data frames.
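The example table can be rebuilt in R to experiment with. This is a minimal sketch: the row names "2." to "6." are an assumption chosen to mirror the labels in the table. Base R's `rle()` (run-length encoding) already exposes the "blocks of consecutive 2s" described above: each run of equal values becomes one entry.

```r
# Rebuild the example; row names mirror the table labels (an assumption)
df <- data.frame(
  v1 = rep(24, 5), v2 = rep(76, 5), v3 = rep(98, 5),
  v4 = rep(89, 5), v5 = rep(87, 5),
  v6 = c(2, 2, 1, 2, 2),
  row.names = c("2.", "3.", "4.", "5.", "6.")
)

# Run-length encoding of the last column: one entry per run of equal values
runs <- rle(df$v6)
runs$lengths  # 2 1 2
runs$values   # 2 1 2

# Each run whose value is 2 is one block that should become a data frame
sum(runs$values == 2)  # 2 blocks
```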

I tried a for loop, but I am fairly new to R and programming, and I am stuck.

This is what I have tried so far:

x = 1
for (i in 1:nrow(dataframe)) {
    if (dataframe[i, lastcolumn] == 2 && x == 1) {
        start.event <- dataframe[i, ]
    }
    if (dataframe[i, lastcolumn] != 2) {
        end.event <- dataframe[i - 1, ]
    } else {
        df[1] <- dataframe(start.event:end.event, )
        x = 1
    }
}

I would really appreciate any help.

Thanks in advance
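For comparison with the loop attempt above, here is a minimal sketch of how an explicit loop could work. It assumes the value of interest (2) is in the last column and collects each block into a list instead of separately named objects. The helper name `split_runs_loop` is hypothetical. The key points are remembering where the current run started, closing the run when a different value appears, and handling a run that is still open at the last row.

```r
# Hypothetical helper: collect consecutive runs of `target` in the last
# column of `dat` into a list of data frames, using an explicit loop
split_runs_loop <- function(dat, target = 2) {
  last <- ncol(dat)
  blocks <- list()
  start <- NA  # row where the current run began (NA = not inside a run)
  for (i in seq_len(nrow(dat))) {
    if (dat[i, last] == target) {
      if (is.na(start)) start <- i  # a new run begins here
    } else if (!is.na(start)) {
      # the run ended on the previous row: store it and reset
      blocks[[length(blocks) + 1]] <- dat[start:(i - 1), , drop = FALSE]
      start <- NA
    }
  }
  # a run that reaches the final row is still open when the loop ends
  if (!is.na(start)) {
    blocks[[length(blocks) + 1]] <- dat[start:nrow(dat), , drop = FALSE]
  }
  blocks
}

dat <- data.frame(v1 = 1:7, v6 = c(2, 2, 1, 2, 2, 0, 2))
res <- split_runs_loop(dat)
length(res)  # 3 blocks: rows 1-2, rows 4-5, row 7
```

Storing the blocks in a list means 70 blocks simply give a list of length 70, accessed as `res[[1]]`, `res[[2]]`, and so on.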

  • Will each saved data.frame be two rows, or should the program grab all consecutive rows? Commented May 31, 2017 at 13:04
  • The program should grab all the consecutive rows. Commented May 31, 2017 at 13:11
  • Not sure if this helps, but maybe something to do with: which( diff(as.integer(rownames(dataframe[dataframe$v6 == 2,])) ) == 1) Commented May 31, 2017 at 13:21
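A variation on the idea in the last comment, working with row positions from `which()` rather than row names: take the positions where v6 equals 2, and start a new block wherever the gap to the previous position is larger than 1. This is a sketch on a hypothetical v6 vector with blocks of different lengths.

```r
# Hypothetical last column with blocks of 2s of lengths 2, 3 and 1
v6 <- c(2, 2, 1, 2, 2, 2, 0, 2)

# Positions of the rows whose value is 2
pos <- which(v6 == 2)                 # 1 2 4 5 6 8

# A gap larger than 1 between consecutive positions starts a new block
block <- cumsum(c(1, diff(pos) > 1))  # 1 1 2 2 2 3

# Row positions grouped by block; each element could index a data frame
split(pos, block)
```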

2 Answers


Here is one way using base R:

#use rle to set indicator variable for groups of 2
rl <- rle(df$v6)
#number each run of 2s (runs of other values keep the previous number)
rl$values <- cumsum(rl$values == 2)
df$ind <- inverse.rle(rl)

#filter out other values from df
df <- df[df$v6==2,]

#split by indicator (and remove it)
dflist <- split(df[,-ncol(df)],df$ind)

dflist #elements of list are named after number of 2-group
$`1`
   v1 v2 v3 v4 v5 v6
2. 24 76 98 89 87  2
3. 24 76 98 89 87  2

$`2`
   v1 v2 v3 v4 v5 v6
5. 24 76 98 89 87  2
6. 24 76 98 89 87  2
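The same `rle()` idea can be wrapped in a small function and checked on data whose blocks of 2s have different lengths (3, 1 and 2 rows). This is a hedged sketch: the function name `split_runs_rle` and its `col`/`target` arguments are hypothetical. The run numbers are built from `rl$values == target`, that is, from the value of each run rather than its length, so runs of any length are numbered correctly.

```r
# Hypothetical wrapper around the rle()/inverse.rle() approach
split_runs_rle <- function(dat, col = "v6", target = 2) {
  rl <- rle(dat[[col]])
  # number each run of `target`; other runs inherit the previous number
  rl$values <- cumsum(rl$values == target)
  ind <- inverse.rle(rl)       # expand run numbers back to one per row
  keep <- dat[[col]] == target
  split(dat[keep, , drop = FALSE], ind[keep])
}

d <- data.frame(v1 = 1:8, v6 = c(2, 2, 2, 1, 2, 0, 2, 2))
res <- split_runs_rle(d)
sapply(res, nrow)  # block sizes 3, 1, 2
```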


Thanks, I will give it a try and come back to you.

One way is to create groups (grp) based on when v6 changes, filter out all rows where v6 != 2, and then split on grp:

new_d <- subset(transform(df, grp = cumsum(c(1, diff(v6) != 0))), v6 == 2)
split(new_d, new_d$grp)

#$`1`
#  v1 v2 v3 v4 v5 v6 grp
#1 24 76 98 89 87  2   1
#2 24 76 98 89 87  2   1

#$`3`
#  v1 v2 v3 v4 v5 v6 grp
#4 24 76 98 89 87  2   3
#5 24 76 98 89 87  2   3
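Printing the intermediate group vector shows why this works. `diff(v6) != 0` is TRUE wherever the value changes, and the cumulative sum turns those change points into run numbers. Runs of other values also get a number, which is why the list elements in the output above are named 1 and 3 rather than 1 and 2. A small sketch on the v6 column from the data used here:

```r
v6 <- c(2, 2, 1, 2, 2)

# TRUE where a row differs from the row before it
changed <- diff(v6) != 0        # FALSE TRUE TRUE FALSE

# Prepend 1 for the first row, then count change points cumulatively
grp <- cumsum(c(1, changed))    # 1 1 2 3 3

# Group ids of the rows that survive the v6 == 2 filter
grp[v6 == 2]                    # 1 1 3 3
```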

Or, via dplyr:

library(dplyr)

new_d <- df %>% 
   mutate(grp = cumsum(c(1, diff(v6) != 0))) %>% 
   filter(v6 == 2) 

split(new_d, new_d$grp)

DATA USED

df <- structure(list(v1 = c(24L, 24L, 24L, 24L, 24L), v2 = c(76L, 76L, 
76L, 76L, 76L), v3 = c(98L, 98L, 98L, 98L, 98L), v4 = c(89L, 
89L, 89L, 89L, 89L), v5 = c(87L, 87L, 87L, 87L, 87L), v6 = c(2L, 
2L, 1L, 2L, 2L)), .Names = c("v1", "v2", "v3", "v4", "v5", "v6"
), class = "data.frame", row.names = c(NA, -5L))
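Since the question asks for each block to be saved, one option is to write every element of the resulting list to its own file. This is a minimal sketch, assuming CSV output is acceptable. The `block_<n>.csv` naming scheme and the use of `tempdir()` as the output folder are hypothetical choices. The example data is a trimmed version of the data used above.

```r
# Trimmed example data with the same v6 pattern as the data used above
df <- data.frame(v1 = 24L, v2 = 76L, v6 = c(2L, 2L, 1L, 2L, 2L))

# Group and filter as in this answer, then split into blocks
new_d <- subset(transform(df, grp = cumsum(c(1, diff(v6) != 0))), v6 == 2)
blocks <- split(new_d, new_d$grp)

# Hypothetical output location and file names: one CSV per block
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("block_", seq_along(blocks), ".csv"))
for (k in seq_along(blocks)) {
  # drop the helper grp column before saving
  write.csv(blocks[[k]][, names(blocks[[k]]) != "grp"], paths[k],
            row.names = FALSE)
}
```

With 70 blocks this loop writes 70 files. If the blocks are only needed within the R session, keeping them in the list is usually simpler than creating 70 separate objects.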

