replace missing column values with values from 'matching' columns

Question

This is a follow-up question from replace missing values with a value from another column which was adequately solved. My problem is with regards to multiple matching columns.

Example dataset:

s <- data.frame(ID=c(191, 282, 202, 210), 
            Group.1=c(NA, "A", NA, "B"), 
            Back.1=c("DD", "AA", "DD", "BB"), 
            Group.2=c("D","A", NA, "B"),
            Back.2=c("DD", "BB", "CC", "AA"),
            stringsAsFactors=FALSE)

   ID Group.1 Back.1 Group.2 Back.2
1 191    <NA>     DD       D     DD
2 282       A     AA       A     BB
3 202    <NA>     DD    <NA>     CC
4 210       B     BB       B     AA

If I wanted to replace the NAs with matching 'Back' columns I would use:

s$Group.1 <- ifelse(test = !is.na(s$Group.1), yes = s$Group.1, no = s$Back.1)
s$Group.2 <- ifelse(test = !is.na(s$Group.2), yes = s$Group.2, no = s$Back.2)
s

   ID Group.1 Back.1 Group.2 Back.2
1 191      DD     DD       D     DD
2 282       A     AA       A     BB
3 202      DD     DD      CC     CC
4 210       B     BB       B     AA

As posted by Akrun, another approach would be:

library(data.table)
setDT(s)[is.na(Group.1), Group.1:= Back.1]
setDT(s)[is.na(Group.2), Group.2:= Back.2]

So if I have many matching columns I want to be able to map, loop or apply or whatever across them. Trying out a loop functions yields:

for (i in 1:2){
  s[paste0("Group.", i)] <- ifelse(test = !is.na(s[paste0("Group.", i)]), 
                                   yes = s[paste0("Group.", i)], 
                                   no = s[paste0("Back.", i)])
}

Warning messages:
1: In `[<-.data.frame`(`*tmp*`, paste0("Group.", i), value = list(c("DD",  :
  provided 4 variables to replace 1 variables
2: In `[<-.data.frame`(`*tmp*`, paste0("Group.", i), value = list(c("D",  :
  provided 4 variables to replace 1 variables
> s
   ID Group.1 Back.1 Group.2 Back.2
1 191      DD     DD       D     DD
2 282      AA     AA       A     BB
3 202      DD     DD    <NA>     CC
4 210      BB     BB       B     AA

Which appears to work for Group.1 and Back.1 but not Group.2, and the warning message is difficult to understand from my angle.

If someone can solve this with an appropriate loop would be most grateful. Even more helpful would be the ability to generalize to other named columns so that the numerically matching columns to Back.x can also have missing values imputed by Back.x. i.e.

s <- data.frame(ID=c(191, 282, 202, 210), 
            Group.1=c(NA, "A", NA, "B"), 
            Back.1=c("DD", "AA", "DD", "BB"), 
            Group.2=c("D","A", NA, "B"),
            Back.2=c("DD", "BB", "CC", "AA"),
            Donk.1 =c("PP", "ZZ", NA, "QQ"),
            stringsAsFactors=FALSE)

It will infinitely easier if you do this in long format long <- reshape(s, direction="long", idvar="ID", varying=-1) then just replace long$Group with long$Back where it is missing. — thelatemail
– thelatemail, Commented Feb 28, 2017 at 6:28
Sorry @waterling picked up back.1 is added twice. Should only be once. Will edit later — mkrasmus
– mkrasmus, Commented Feb 28, 2017 at 6:40
It is not clear in the last example which one should replace from which columns? You have Back.1, Back.2 and Back.1.1 while there is Group.1, Group.2, and Donk.1 — akrun
– akrun, Commented Feb 28, 2017 at 7:04

akrun · Accepted Answer · 2017-02-28 07:17:12Z

1

We can use

gr1 <- grep("Group", names(s), value = TRUE)
bc1 <- grep("Back", names(s), value = TRUE)
setDT(s)
for(j in seq_along(gr1)){
    s[is.na(get(gr1[j])), (gr1[j]) := get(bc1[j])]
}

s
#    ID Group.1 Back.1 Group.2 Back.2
#1: 191      DD     DD       D     DD
#2: 282       A     AA       A     BB
#3: 202      DD     DD      CC     CC
#4: 210       B     BB       B     AA

For the updated dataset

gr1 <- names(s)[seq(2, ncol(s), by = 2)]
bc1 <- names(s)[seq(3, ncol(s), by = 2)]

setDT(s)
for(j in seq_along(gr1)){
    s[is.na(get(gr1[j])), (gr1[j]) := get(bc1[j])][]
}
s
#    ID Group.1 Back.1 Group.2 Back.2 Donk.1 Back.1.1
#1: 191      DD     DD       D     DD     PP       DD
#2: 282       A     AA       A     BB     ZZ       AA
#3: 202      DD     DD      CC     CC     DD       DD
#4: 210       B     BB       B     AA     QQ       BB

data

s <- data.frame(ID=c(191, 282, 202, 210), 
        Group.1=c(NA, "A", NA, "B"), 
        Back.1=c("DD", "AA", "DD", "BB"), 
        Group.2=c("D","A", NA, "B"),
        Back.2=c("DD", "BB", "CC", "AA"),
        Donk.1 =c("PP", "ZZ", NA, "QQ"),
        Back.1=c("DD", "AA", "DD", "BB"), 
        stringsAsFactors=FALSE)

edited Feb 28, 2017 at 7:17

answered Feb 28, 2017 at 6:37

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Jean Over a year ago

Replace gr1 with gr1<- grep("^(?!Back).*\\.[0-9]+$",names(s), perl=T, value=TRUE) and bc1 with bc1<-paste("Back",sub(".*\\.([0-9])+", "\\1", rInd),sep=".")

akrun Over a year ago

@waterling Sorry, I didn't understand why it was needed. I was trying based on the logic setDT(s)[is.na(Group.1), Group.1:= Back.1] provided in the OP's code

Jean Over a year ago

OP wants to be able to replace Donk.1 with Back.1, or some arbitrarytext.Number with Back.Number. From the sentence: "generalize to other named columns so that the numerically matching columns to Back.x can also have missing values imputed by Back.x"

akrun Over a year ago

@waterling Thanks I missed that part. I updated the post assuming that the columns are alternating

Collectives™ on Stack Overflow

replace missing column values with values from 'matching' columns

1 Answer 1

data

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

data

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related