a<-data.frame(cbind("Sample"=c("100","101","102","103"),"Status"=c("Y","","","partial")))
b<-data.frame(cbind("Sample"=c("100","101","102","103","106"),"Status"=c("NA","Y","","","Y")))

desired<-data.frame(cbind("Sample"=c("100","101","102","103","106"),"Status"=c("Y","Y","","partial","Y")))

I have sample processing data in multiple sources and I'd like to combine them into a master list. How can I merge the "Status" column of two data frames so that a non-empty value in a overrules b, collating "Y" and "partial" for each sample as in desired? Thank you in advance.

  • Both variables of a and of b are factors. Working with factors like this is a pain in the neck. You should consider converting these to character and numeric, which are easier to work with. Commented Jun 7, 2017 at 15:50
  • Just use data.frame without the cbind, or you're making a matrix before converting it to a data.frame, which will sooner or later screw up types. Also, using NA instead of "" will make your life easier. Commented Jun 7, 2017 at 15:52
  • Alistaire, you're right, my example is a bit sloppy with the cbind. The example is an over-simplification as there are ~10 non ""/NA strings that can exist (not just partial/Y). This makes Mudskipper's solution a bit trickier. I'm not familiar with Simone's ":=" syntax, and it doesn't appear to run. Commented Jun 7, 2017 at 19:56
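Following the two comments above, here is a minimal sketch of building the example inputs without cbind and without factors. stringsAsFactors = FALSE only matters on R versions before 4.0.0, where it defaulted to TRUE. Turning "" and the string "NA" into real NA values is the second commenter's suggestion and is optional; it is not part of the asker's data.

```r
# Plain character columns: no cbind (which builds a matrix first), no factors.
# stringsAsFactors = FALSE only matters on R < 4.0.0, where the default was TRUE.
a <- data.frame(Sample = c("100", "101", "102", "103"),
                Status = c("Y", "", "", "partial"),
                stringsAsFactors = FALSE)
b <- data.frame(Sample = c("100", "101", "102", "103", "106"),
                Status = c("NA", "Y", "", "", "Y"),
                stringsAsFactors = FALSE)

# Optional (per the comment above): use real NA for "no status recorded".
a$Status[a$Status %in% c("", "NA")] <- NA
b$Status[b$Status %in% c("", "NA")] <- NA

str(a)  # both columns should now be chr
```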

2 Answers

library(data.table)

a <- data.table(Sample = c("100","101","102","103"), Status = c("Y","","","partial"))
b <- data.table(Sample = c("100","101","102","103","106"), Status = c("NA","Y","","","Y"))

m <- merge(a, b, by = "Sample", all = TRUE)
# keep a's status unless it is missing (NA from the merge) or empty; otherwise use b's
m[, Status := ifelse(!is.na(Status.x) & Status.x != "", Status.x, Status.y)]
m[, `:=`(Status.x = NULL, Status.y = NULL)]
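The same "a wins unless it is empty or missing, otherwise take b" rule can be sketched in base R without data.table. This is a minimal sketch assuming character (not factor) columns; note that if a is empty and b holds the string "NA", the result is "NA".

```r
a <- data.frame(Sample = c("100", "101", "102", "103"),
                Status = c("Y", "", "", "partial"), stringsAsFactors = FALSE)
b <- data.frame(Sample = c("100", "101", "102", "103", "106"),
                Status = c("NA", "Y", "", "", "Y"), stringsAsFactors = FALSE)

# Full outer join on Sample; columns become Status.x (from a) and Status.y (from b).
m <- merge(a, b, by = "Sample", all = TRUE)

# a's value is usable only if the sample exists in a (not NA) and the status is non-empty.
# For a missing sample, NA != "" is NA, and FALSE & NA is FALSE, so b is used.
use_a <- !is.na(m$Status.x) & m$Status.x != ""
m$Status <- ifelse(use_a, m$Status.x, m$Status.y)

m <- m[, c("Sample", "Status")]
m
```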

2 Comments

Hi Simone, I like that this approach is more generalized, but := doesn't seem to work. Where is the syntax error?
@sm002 I updated the answer. You need to load data.table first.

I assume you want to combine the values from a and b by order of priority: "Y" beats "partial", which beats "NA", which beats nothing ("").

d <- merge(a, b, by = "Sample", all = TRUE)
st <- as.matrix(d[, c("Status.x", "Status.y")]) # just the two status columns, as character
d$Status <- "" # default; this also covers the real NAs introduced by the merge
d$Status[apply(st, 1, `%in%`, x = "NA")] <- NA # or "NA" if you want to keep it this way, or "" if you want to get rid of them
d$Status[apply(st, 1, `%in%`, x = "partial")] <- "partial"
d$Status[apply(st, 1, `%in%`, x = "Y")] <- "Y"
d <- d[, c("Sample", "Status")]

# Sample  Status
# 1    100       Y
# 2    101       Y
# 3    102        
# 4    103 partial
# 5    106       Y
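With ~10 possible status strings, one %in% line per status gets long. A minimal sketch of the same priority idea driven by a single ordered vector follows; priority is a hypothetical name, and only "Y", "partial", "NA" and "" come from the question, so the real list would need the other strings added in the right order. Unlike "a overrules b", this picks the highest-priority status from either source; on exact ties a wins, because order() keeps tied rows in their original (a before b) order.

```r
a <- data.frame(Sample = c("100", "101", "102", "103"),
                Status = c("Y", "", "", "partial"), stringsAsFactors = FALSE)
b <- data.frame(Sample = c("100", "101", "102", "103", "106"),
                Status = c("NA", "Y", "", "", "Y"), stringsAsFactors = FALSE)

# Hypothetical priority list, highest first; extend with the remaining status strings.
priority <- c("Y", "partial", "NA", "")

both <- rbind(a, b)                            # a's rows come first
both$rank <- match(both$Status, priority)      # NA for strings not in the list
both <- both[order(both$Sample, both$rank), ]  # best rank first; NA ranks sort last
res <- both[!duplicated(both$Sample), c("Sample", "Status")]  # keep the best row per sample
rownames(res) <- NULL
res
```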

1 Comment

My merge introduces some NAs, though (real NAs, not the string "NA"), so if your data set contains real NAs and you want to keep them for some reason, you'll have to replace them with something else in a and b before merging (like "NA", Inf or whatever).
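A minimal sketch of the replacement described in that comment, using small hypothetical inputs: the sources' own NAs are turned into the string "NA" before merging, so any real NA left afterwards can only mean the sample was absent from that source.

```r
# Hypothetical inputs that contain real NAs.
a <- data.frame(Sample = c("100", "101"), Status = c("Y", NA),
                stringsAsFactors = FALSE)
b <- data.frame(Sample = c("100", "101", "106"), Status = c(NA, "Y", "Y"),
                stringsAsFactors = FALSE)

# Replace the sources' own missing values with a sentinel string before merging.
a$Status[is.na(a$Status)] <- "NA"
b$Status[is.na(b$Status)] <- "NA"

d <- merge(a, b, by = "Sample", all = TRUE)

# Any real NA now comes from the merge, i.e. the sample is missing from that source.
absent_from_a <- is.na(d$Status.x)
d
```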
