I'm trying to merge two data frames in R, but they have two ID columns, each holding a different type of ID. Some rows have a value in one of those columns but not the other. I want the merge to consider both columns, so that if a row is missing one ID in either frame, the other ID is used to match instead.
> df1 <- data.frame(first = c('a', 'b', NA), second = c(NA, 'q', 'r'))
> df1
  first second
1     a   <NA>
2     b      q
3  <NA>      r
> df2 <- data.frame(first = c('a', NA, 'c'), second = c('p', 'q', NA))
> df2
  first second
1     a      p
2  <NA>      q
3     c   <NA>
I want to merge these two data frames and get 2 rows:
- row 1, because it has the same value for "first"
- row 2, because it has the same value for "second"
- row 3 would be dropped, because df1 has a value for "second", but not "first", and df2 has the reverse
It's important that NAs are ignored and don't "match" in this case.
I can get kinda close:
> merge(df1,df2, by='first', incomparables = c(NA))
  first second.x second.y
1     a     <NA>        p
> merge(df1,df2, by='second', incomparables = c(NA))
  second first.x first.y
1      q       b    <NA>
But I can't rbind these two results together because they have different column names, and merging once per ID column doesn't seem like the "R" way to do it anyway (in the near future, I'll have a 3rd, 4th and even 5th type of ID).
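For reference, here is a minimal sketch of the workaround I'd like to avoid: merge once per ID column, collapse the `.x`/`.y` pairs back into a single column, then rbind the pieces. `merge_on_id` is just a name I made up for this sketch, and it assumes that when both frames have a value for a non-key column, the one from df1 wins.

```r
# Example data from above (stringsAsFactors = FALSE so the IDs stay character)
df1 <- data.frame(first = c('a', 'b', NA), second = c(NA, 'q', 'r'),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first = c('a', NA, 'c'), second = c('p', 'q', NA),
                  stringsAsFactors = FALSE)

# Hypothetical helper (not an existing function): merge on one ID column,
# ignoring NAs, then rebuild the other ID columns from their .x/.y pairs.
merge_on_id <- function(x, y, id, all_ids) {
  m <- merge(x, y, by = id, incomparables = NA)
  for (col in setdiff(all_ids, id)) {
    from_x <- m[[paste0(col, ".x")]]
    from_y <- m[[paste0(col, ".y")]]
    # Assumption: prefer the value from x, fall back to y when x is NA
    m[[col]] <- ifelse(is.na(from_x), from_y, from_x)
  }
  m[, all_ids, drop = FALSE]
}

ids <- c("first", "second")
pieces <- lapply(ids, function(id) merge_on_id(df1, df2, id, ids))

# Stack the per-ID results; unique() drops rows that matched on more than one ID
df3 <- unique(do.call(rbind, pieces))
rownames(df3) <- NULL
df3
```

This only needs `ids` extended when more ID types show up, but it still runs one merge per ID column, which is the part that feels clumsy.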
Is there a less clumsy way to do this?
Edit: Ideally, the output would look like this:
> df3 <- data.frame(first = c('a', 'b'), second = c('p','q'))
> df3
  first second
1     a      p
2     b      q
- row 1 matched because the column "first" has the same value in both data frames, and the value for "second" is filled in from df2
- row 2 matched because the column "second" has the same value in both data frames, and the value for "first" is filled in from df1
- there is no row 3, because there is no column that has a value in both data frames