Merge dataframes in R, using shared columns and differing rows

Question

I tried using the merge function here, but I am stumped. I apologize, because this seems basic, but the by.x and by.y arguments are quite confusing to me. I would like to extract the columns shared between dataframe A and dataframe B, and then combine the two dataframes. The dataframes do not share any Taxa (the first column), but they share a portion of the columns X1 - X10000, etc. Each dataframe has ~8,000 columns and a few hundred rows. In this example, variables X2 and X5 are shared, but X1 and X3 are not. By intersecting the column-name vectors, I know that the dataframes share ~3,000 columns.

Dataframe A:

 Taxa   X1      X2      X5
 118    T       N       A
 113    N       N       A
 60     C       Y       G
 121    N       N       N

Dataframe B:

 Taxa  X2      X3      X5
 200   C       G       N
 119   T       N       G
 30    C       G       G
 21    C       N       N

Desired merged dataframe:

 Taxa    X2      X5
 118     N       A
 113     N       A
 60      Y       G
 121     N       N
 200     C       N
 119     T       G
 30      C       G
 21      C       N

When I try using the merge function, in a variety of ways, I get this (with my actual column numbers here):

      Taxa      X408050  X995019   
NA    <NA>     <NA>     <NA>       
NA.1  <NA>     <NA>     <NA>     
NA.2  <NA>     <NA>     <NA>       
NA.3  <NA>     <NA>     <NA>      
NA.4  <NA>     <NA>     <NA>     
NA.5  <NA>     <NA>     <NA>      
NA.6  <NA>     <NA>     <NA>

jazzurro · Accepted Answer · 2016-01-15 02:17:48Z

6

Taking PierreLafortune's advice, I will leave my suggestion as an answer. Since you said both data frames have about 8,000 columns, you first want to find which column names they have in common. You can do that with intersect(). Once you have the shared column names, subset each data frame to those columns, then stack the two subsets with rbind().

ind <- intersect(names(mydf), names(mydf2))

rbind(mydf[, ind], mydf2[, ind])

#  Taxa X2 X5
#1  118  N  A
#2  113  N  A
#3   60  Y  G
#4  121  N  N
#5  200  C  N
#6  119  T  G
#7   30  C  G
#8   21  C  N
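
For contrast, here is a minimal sketch of why merge() is probably not the right tool for this. merge() is a join: by default it matches rows on every column the two data frames share, so when no row agrees on all of those columns the result is empty. rbind() simply stacks rows. The objects a and b are cut-down, illustrative copies of the example data, not the asker's real data.

```r
# Illustrative two-row versions of the example data frames (an assumption,
# trimmed from the DATA section for brevity).
a <- data.frame(Taxa = c(118L, 113L), X1 = c("T", "N"),
                X2 = c("N", "N"), X5 = c("A", "A"),
                stringsAsFactors = FALSE)
b <- data.frame(Taxa = c(200L, 119L), X2 = c("C", "T"),
                X3 = c("G", "N"), X5 = c("N", "G"),
                stringsAsFactors = FALSE)

# Column names present in both data frames, in the order they appear in a.
shared <- intersect(names(a), names(b))

# merge() joins on all shared columns by default. No row of a matches a row
# of b on Taxa, X2 and X5 together, so the default (inner) join has no rows.
joined <- merge(a, b)
nrow(joined)

# rbind() stacks the rows instead; drop = FALSE keeps a data frame even if
# only one column happens to be shared.
stacked <- rbind(a[, shared, drop = FALSE], b[, shared, drop = FALSE])
stacked
```

With these toy inputs, joined has zero rows while stacked has all four rows and only the shared columns.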

DATA

mydf <- structure(list(Taxa = c(118L, 113L, 60L, 121L), X1 = c("T", "N", 
"C", "N"), X2 = c("N", "N", "Y", "N"), X5 = c("A", "A", "G", 
"N")), .Names = c("Taxa", "X1", "X2", "X5"), class = "data.frame", row.names = c(NA, 
-4L))

mydf2 <- structure(list(Taxa = c(200L, 119L, 30L, 21L), X2 = c("C", "T", 
"C", "C"), X3 = c("G", "N", "G", "N"), X5 = c("N", "G", "G", 
"N")), .Names = c("Taxa", "X2", "X3", "X5"), class = "data.frame", row.names = c(NA, 
-4L))
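
If there are ever more than two data frames, the same idea can be extended. The following is only a sketch: stack_shared() is a hypothetical helper name, not a base R function, and d1, d2, d3 are made-up data. It uses Reduce() to intersect the column names across all inputs and do.call() to rbind the subsets.

```r
# Hypothetical helper (not part of the original answer): stack any number of
# data frames on the columns that all of them share.
stack_shared <- function(dfs) {
  # Column names common to every data frame in the list.
  shared <- Reduce(intersect, lapply(dfs, names))
  # Subset each data frame to the shared columns, then stack the rows.
  do.call(rbind, lapply(dfs, function(d) d[, shared, drop = FALSE]))
}

# Made-up inputs for illustration only.
d1 <- data.frame(Taxa = 1:2, X1 = c("T", "N"), X2 = c("N", "Y"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Taxa = 3:4, X2 = c("C", "T"), X3 = c("G", "N"),
                 stringsAsFactors = FALSE)
d3 <- data.frame(Taxa = 5:6, X2 = c("A", "G"), X9 = c("N", "N"),
                 stringsAsFactors = FALSE)

res <- stack_shared(list(d1, d2, d3))
res
```

Passing an unnamed list keeps the row names as plain sequence numbers; with a named list, rbind() would prefix the row names with the list names.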

