1

I have many columns in a table where there is missing data. I want to be able to pull in the information from another table if the data is missing for a particular record based on ID. I thought about possibly joining the two tables and writing a for loop where if column X is NA then pull in information from column Y, however, I have many columns and would require writing many of these conditions.

I want to create a function or a loop where I can pass in the data column names with the missing data and be able to pass in the column name from another table to get the information from.

Reproducible Example:

ID <- c(1,2,3,4,5,6)
Year <- c(1990,1987,NA,NA,1968,1992)
Month <- c(1,NA,8,12,NA,5)
Day <- c(3,NA,NA,NA,NA,30)

New_Data = data.frame(ID=ID,Year=Year,Month=Month,Day=Day)

ID <- c(2,3,4,5)
Year <- c(NA,1994,1967,NA)
Month <- c(4,NA,NA,10)
Day <- c(23,12,16,9)

Old_Data = data.frame(ID=ID,Year=Year,Month=Month,Day=Day)

Expected Output:

ID <- c(1,2,3,4,5,6)
Year <- c(1990,1987,1994,1967,1968,1992)
Month <- c(1,4,8,12,10,5)
Day <- c(3,23,12,16,9,30)
New_Data = data.frame(ID=ID,Year=Year,Month=Month,Day=Day)

3 Answers 3

2

Using rbind combine two dataframe , then we using group_by with summarise_all

library(dplyr)
rbind(New_Data,Old_Data)%>%group_by(ID)%>%dplyr::summarise_all(function(x) x[!is.na(x)][1])
# A tibble: 6 x 4
     ID  Year Month   Day
  <dbl> <dbl> <dbl> <dbl>
1     1  1990     1     3
2     2  1987     4    23
3     3  1994     8    12
4     4  1967    12    16
5     5  1968    10     9
6     6  1992     5    30
Sign up to request clarification or add additional context in comments.

1 Comment

This works but I need a function that matches on ID and only pulls in records from Old_Data table that has matches to the New_Data based on ID, I do not want any records from Old_Data that does not exist in New_Data table.
1

An option using dplyr::left_join and dplyr::coalesce can be as:

library(dplyr)

New_Data %>% left_join(Old_Data, by="ID") %>%
  mutate(Year = coalesce(Year.x, Year.y),
         Month = coalesce(Month.x, Month.y),
         Day = coalesce(Day.x, Day.y)) %>%
  select(ID, Year, Month, Day)

#   ID Year Month Day
# 1  1 1990     1   3
# 2  2 1987     4  23
# 3  3 1994     8  12
# 4  4 1967    12  16
# 5  5 1968    10   9
# 6  6 1992     5  30

Comments

1

Here's a solution using only base functions from another SO question

I modified it to your needs (created a function, and made an argument for the key column name):

fill_missing_data = function(df1, df2, keyColumn) {
  commonNames <- names(df1)[which(colnames(df1) %in% colnames(df2))]
  commonNames <- commonNames[commonNames != keyColumn]
  dfmerge<- merge(df1,df2,by="ID",all=T)
  for(i in commonNames){
    left <- paste(i, ".x", sep="")
    right <- paste(i, ".y", sep="")
    dfmerge[is.na(dfmerge[left]),left] <- dfmerge[is.na(dfmerge[left]),right]
    dfmerge[right]<- NULL
    colnames(dfmerge)[colnames(dfmerge) == left] <- i
  }
  return(dfmerge)
}

result = fill_missing_data(New_Data, Old_Data, "ID")

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.