I tried using the merge function here, but I am stumped. I apologize, because this seems basic, but the by.x and by.y arguments are quite confusing to me. I would like to extract the columns shared by dataframe A and dataframe B, and then combine the two dataframes by stacking their rows. The dataframes do not share any Taxa (the first column), but they share a portion of the columns X1 - X10000, etc. Each dataframe has ~8,000 columns and a few hundred rows. In the example below, X2 and X5 are shared, while X1 and X3 are not. From intersecting the column name vectors, I know that the dataframes share ~3000 columns.

Dataframe A:

 Taxa   X1      X2      X5
 118    T       N       A
 113    N       N       A
 60     C       Y       G
 121    N       N       N

Dataframe B:

 Taxa  X2      X3      X5
 200   C       G       N
 119   T       N       G
 30    C       G       G
 21    C       N       N

Desired merged dataframe:

 Taxa    X2      X5
 118     N       A
 113     N       A
 60      Y       G
 121     N       N
 200     C       N
 119     T       G
 30      C       G
 21      C       N

When I try using the merge function, in a variety of ways, I get this (with my actual column numbers here):

      Taxa      X408050  X995019   
NA    <NA>     <NA>     <NA>       
NA.1  <NA>     <NA>     <NA>     
NA.2  <NA>     <NA>     <NA>       
NA.3  <NA>     <NA>     <NA>      
NA.4  <NA>     <NA>     <NA>     
NA.5  <NA>     <NA>     <NA>      
NA.6  <NA>     <NA>     <NA>      

1 Answer

Taking PierreLafortune's advice, I will leave my suggestion as an answer. Since each of your data frames has about 8,000 columns, the first step is to find which column names the two have in common, which you can do with intersect(). Once you have those names, subset both data frames to them and stack the rows with rbind(). Note that merge() joins rows on matching key values rather than appending them, so it is not the right tool when the two data frames share no Taxa.

# Column names present in both data frames
ind <- intersect(names(mydf), names(mydf2))

# Keep only the shared columns, then stack the rows
rbind(mydf[, ind], mydf2[, ind])

#  Taxa X2 X5
#1  118  N  A
#2  113  N  A
#3   60  Y  G
#4  121  N  N
#5  200  C  N
#6  119  T  G
#7   30  C  G
#8   21  C  N
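
With thousands of columns it may help to guard against a couple of edge cases. The sketch below wraps the same intersect-and-rbind idea in a small helper. The name stack_shared is hypothetical, the two small data frames are stand-ins for the example data, and the sketch assumes the shared columns hold compatible types in both inputs.

```r
# Hypothetical helper: keep only the columns present in both data frames,
# then stack the rows. Assumes shared columns have compatible types.
stack_shared <- function(a, b) {
  shared <- intersect(names(a), names(b))
  if (length(shared) == 0L) stop("no shared column names")
  # drop = FALSE keeps a data frame even if only one column is shared
  rbind(a[, shared, drop = FALSE], b[, shared, drop = FALSE])
}

# Small stand-ins for the example data
a <- data.frame(Taxa = c(118L, 113L), X1 = c("T", "N"), X2 = c("N", "N"),
                X5 = c("A", "A"), stringsAsFactors = FALSE)
b <- data.frame(Taxa = c(200L, 119L), X2 = c("C", "T"), X3 = c("G", "N"),
                X5 = c("N", "G"), stringsAsFactors = FALSE)

res <- stack_shared(a, b)
dim(res)                       # rows from both inputs, shared columns only
setdiff(names(a), names(res))  # columns of `a` that were left out
```

The setdiff() call at the end is one way to check which columns were dropped from each input before relying on the result.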

DATA

mydf <- structure(list(Taxa = c(118L, 113L, 60L, 121L), X1 = c("T", "N", 
"C", "N"), X2 = c("N", "N", "Y", "N"), X5 = c("A", "A", "G", 
"N")), .Names = c("Taxa", "X1", "X2", "X5"), class = "data.frame", row.names = c(NA, 
-4L))

mydf2 <- structure(list(Taxa = c(200L, 119L, 30L, 21L), X2 = c("C", "T", 
"C", "C"), X3 = c("G", "N", "G", "N"), X5 = c("N", "G", "G", 
"N")), .Names = c("Taxa", "X2", "X3", "X5"), class = "data.frame", row.names = c(NA, 
-4L))
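
For comparison, here is a small sketch of how merge() behaves on data shaped like this. It may not be exactly what produced the NA rows in the question, but it shows why a join is not the same as stacking rows. The data frames x and y are illustrative stand-ins, not the real data.

```r
# Illustrative stand-ins that share column names but no Taxa values
x <- data.frame(Taxa = c(118L, 113L), X2 = c("N", "N"), X5 = c("A", "A"),
                stringsAsFactors = FALSE)
y <- data.frame(Taxa = c(200L, 119L), X2 = c("C", "T"), X5 = c("N", "G"),
                stringsAsFactors = FALSE)

# merge() is a join: by default it matches rows on all shared columns,
# so with no rows in common the inner join is empty.
inner <- merge(x, y)
nrow(inner)

# rbind() appends rows instead, which is the layout asked for.
stacked <- rbind(x, y)
nrow(stacked)
```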
1 Comment

@user3545679 Please avoid "thanks" comments. Accept the answer instead.
