Subsetting dataframe using conditions and saving each subset as a new dataframe

Question

I have a dataset in which each column holds a different measurement, and the last column holds one of the values 0, 1, or 2.

For example, let's say my dataframe looks like this (ignore the values of v1:v5):

 1. v1 v2 v3 v4 v5 v6 
 2. 24 76 98 89 87 2
 3. 24 76 98 89 87 2
 4. 24 76 98 89 87 1
 5. 24 76 98 89 87 2
 6. 24 76 98 89 87 2

I am interested in the v6 column, and I want to extract the rows where its value equals 2. In the example above, I would like to extract the first two data rows (labelled 2 and 3) and save them as a new dataframe, and also extract the rows labelled 5 and 6 and save them as a separate dataframe. To be clear: whenever the values are equal to 2 and consecutive, I need that block saved as a new dataframe. When the value is different, the loop should ignore it and look for the next occurrence of the value of interest (2). If my dataframe has 70 blocks of consecutive 2s in the last column, I need to end up with 70 dataframes.

I tried a for loop, but I am fairly new to R and programming and I am stuck.

This is what I have tried so far:

x=1
for (i in 1:nrow(dataframe)) {

    if (dataframe[i,lastcolumn] == 2 && x==1) {
        start.event <- dataframe[i,]
    }

    if (dataframe[i,lastcolumn] != 2) {
        end.event <- dataframe[i-1,]
    }

    else {
        df[1] <- dataframe( start.event:end.event , )
        x = 1
    }
}

I would really appreciate any help.

Thanks in advance

Will each saved data.frame be two rows, or should the program grab all consecutive rows?
– bouncyball, Commented May 31, 2017 at 13:04
Not sure if this helps, but maybe something to do with: which(diff(as.integer(rownames(dataframe[dataframe$V6 == 2,]))) == 1)
– boraas, Commented May 31, 2017 at 13:21

Andrew Gustar · Accepted Answer · 2017-05-31 13:14:25Z

Here is one way using base R

#use rle to set indicator variable for groups of 2
rl <- rle(df$v6)
rl$values <- cumsum(rl$values==2) #test the run's value, not its length, so runs of any length are counted
df$ind <- inverse.rle(rl)

#filter out other values from df
df <- df[df$v6==2,]

#split by indicator (and remove it)
dflist <- split(df[,-ncol(df)],df$ind)

dflist #elements of list are named after number of 2-group
$`1`
   v1 v2 v3 v4 v5 v6
2. 24 76 98 89 87  2
3. 24 76 98 89 87  2

$`2`
   v1 v2 v3 v4 v5 v6
5. 24 76 98 89 87  2
6. 24 76 98 89 87  2
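
To make the run-length step easier to follow, here is a minimal sketch on a hypothetical v6 vector (not the asker's data) whose runs of 2s have different lengths. It only illustrates what rle, cumsum and inverse.rle produce at each stage; the vector and the name runs are assumptions made for the example.

```r
# Hypothetical v6 column with runs of 2s of length 3, 1 and 2
v6 <- c(2, 2, 2, 1, 0, 2, 1, 2, 2)

rl <- rle(v6)
rl$lengths   # 3 1 1 1 1 2
rl$values    # 2 1 0 2 1 2

# The counter goes up by one each time a run of 2s starts
rl$values <- cumsum(rl$values == 2)
ind <- inverse.rle(rl)
ind          # 1 1 1 2 2 2 ... expanded per row: 1 1 1 1 1 2 2 3 3

# Row positions where v6 == 2, grouped by the run they belong to
runs <- split(which(v6 == 2), ind[v6 == 2])
runs
```

Rows that are not 2 inherit the number of the preceding run, which is harmless because they are filtered out before the split.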

user7963676 Over a year ago

Thanks, I will give it a try and come back to you.

Sotos · Accepted Answer · 2017-05-31 13:21:11Z

One way is to create groups (grp) based on when v6 changes, then filter out all rows where v6 != 2 and split on grp.

new_d <- subset(transform(df, grp = cumsum(c(1, diff(v6) != 0))), v6 == 2)
split(new_d, new_d$grp)

#$`1`
#  v1 v2 v3 v4 v5 v6 grp
#1 24 76 98 89 87  2   1
#2 24 76 98 89 87  2   1

#$`3`
#  v1 v2 v3 v4 v5 v6 grp
#4 24 76 98 89 87  2   3
#5 24 76 98 89 87  2   3

Or via dplyr,

library(dplyr)

new_d <- df %>% 
   mutate(grp = cumsum(c(1, diff(v6) != 0))) %>% 
   filter(v6 == 2) 

split(new_d, new_d$grp)
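
Since the question asks for each block as its own dataframe, the following sketch shows one possible way to renumber the list elements consecutively and, if really needed, expose them as separate variables. It uses a two-column cut-down of the sample data, and the names events, env and the "event" prefix are hypothetical choices for illustration.

```r
# Cut-down version of the sample data: one measurement column plus v6
df <- data.frame(v1 = rep(24L, 5), v6 = c(2L, 2L, 1L, 2L, 2L))

# grp increases by one every time v6 changes value
grp <- cumsum(c(1, diff(df$v6) != 0))   # 1 1 2 3 3

# Keep only the rows with v6 == 2 and split them by group
keep <- df$v6 == 2
events <- split(df[keep, ], grp[keep])

# Rename the elements so they are numbered 1, 2, ... regardless of grp
names(events) <- paste0("event", seq_along(events))

# Optional: one variable per dataframe, kept in a separate environment
env <- list2env(events, envir = new.env())
ls(env)   # "event1" "event2"
```

With many blocks it is usually easier to leave the dataframes in the list and work on them with lapply than to create 70 separate variables.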

DATA USED

df <- structure(list(v1 = c(24L, 24L, 24L, 24L, 24L), v2 = c(76L, 76L, 
76L, 76L, 76L), v3 = c(98L, 98L, 98L, 98L, 98L), v4 = c(89L, 
89L, 89L, 89L, 89L), v5 = c(87L, 87L, 87L, 87L, 87L), v6 = c(2L, 
2L, 1L, 2L, 2L)), .Names = c("v1", "v2", "v3", "v4", "v5", "v6"
), class = "data.frame", row.names = c(NA, -5L))
