I guess other people have already asked about this, but I couldn't find what I'm looking for.
I want to replace NA values with the value of the row above, but only when all other values in the two rows are the same. Bonus points for a data.table solution.
Right now, I've managed to do it only with a (very inefficient) loop.
In addition, my current code does not replace the NA when there are two NAs in the same row.
I have a strong feeling that I'm overthinking this problem. Any ideas for making this easier?
library(data.table)

ex <- data.table(
  id = c(1, 1, 2, 2),
  attr1 = c(NA, NA, 3, 3),
  attr2 = c(2, 2, NA, 3),
  attr3 = c(NA, 2, 2, 1),
  attr4 = c(1, 1, 1, 3)
)
desired_ex <- data.table(
  id = c(1, 1, 2, 2),
  attr1 = c(NA, NA, 3, 3),
  attr2 = c(2, 2, NA, 3),
  attr3 = c(2, 2, 2, 1),
  attr4 = c(1, 1, 1, 3)
)
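col_names <- paste0("attr", 1:4)
# Stop at the second-to-last row, because row r is compared with row r + 1
for (r in seq_len(nrow(ex) - 1)) {
  print(r)
  # Columns of row r that contain an NA
  to_check <- col_names[colSums(is.na(ex[r, .SD, .SDcols = col_names])) > 0]
  if (length(to_check) == 0) {
    print("no NA - next")
    next
  }
  for (col_check in to_check) {
    # Rows r and r + 1, without the column being checked
    .ex <- copy(ex)[seq(from = r, to = r + 1), ]
    .ex[[col_check]] <- NULL
    # If the two rows are otherwise identical, take the value from row r + 1
    if (nrow(unique(.ex)) == 1) {
      ex[[col_check]][r] <- ex[[col_check]][r + 1]
    }
  }
}
all.equal(ex, desired_ex)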
In desired_ex, attr2 has an NA but it is replaced in attr3?
In rows 1:2, apart from the NA in attr3, they are the same. Thus I would like to replace the NA with the value in the other line. However, this is not the case for rows 3:4: I see them as different rows because, apart from the NA in attr2, they differ in attr3 and attr4. Does it make more sense now?
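To make the rule concrete, here is a rough sketch of how the same check could be written without an explicit row loop. It mirrors the loop above, so it compares each row with the next one. The helper name fill_from_next_row is made up for illustration, and the sketch assumes all columns are numeric (they are converted to a matrix) and that NA counts as equal to NA, as unique() does in the loop. It is one possible direction, not a definitive solution.

```r
library(data.table)

# Hypothetical helper (not part of data.table): fill an NA in `cols` from the
# next row when the two rows agree on every other column.
# Assumes all columns are numeric, since the rows are compared as a matrix.
fill_from_next_row <- function(dt, cols) {
  out <- copy(dt)
  n <- nrow(out)
  if (n < 2L) return(out)

  cur <- as.matrix(out[-n])   # rows 1 .. n-1
  nxt <- as.matrix(out[-1L])  # rows 2 .. n

  # Cell-wise agreement between a row and the next one; NA matches NA
  same <- (is.na(cur) & is.na(nxt)) |
    (!is.na(cur) & !is.na(nxt) & cur == nxt)

  for (j in cols) {
    # TRUE where every column except j agrees between row r and row r + 1
    others_same <- rowSums(!same[, colnames(same) != j, drop = FALSE]) == 0
    idx <- which(is.na(cur[, j]) & others_same)
    if (length(idx) > 0) {
      set(out, i = idx, j = j, value = out[[j]][idx + 1L])
    }
  }
  out
}

# Usage with the example data from the question
ex <- data.table(
  id = c(1, 1, 2, 2),
  attr1 = c(NA, NA, 3, 3),
  attr2 = c(2, 2, NA, 3),
  attr3 = c(NA, 2, 2, 1),
  attr4 = c(1, 1, 1, 3)
)
res <- fill_from_next_row(ex, paste0("attr", 1:4))
res$attr3  # 2 2 2 1
```

Like the loop, this does not fill anything when a row has two NAs that differ from the next row, because each column is only filled when all the other columns already match.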