
Given a data.table like the following:

library(data.table)

a <- data.table(col1 = c(1, 2, 3, NA, NA),
                col2 = c(NA, NA, NA, 4, 5),
                col3 = c("a", "b", "c", NA, NA),
                col4 = c(NA, NA, NA, "d", "e"))

I would like to merge col1 with col2, and col3 with col4, skipping the NAs and keeping only the non-missing values, so that the output looks like this:

    col1   col2
   <num> <char>
1:     1      a
2:     2      b
3:     3      c
4:     4      d
5:     5      e

Is there a way to achieve this? I thought of using sum(), but of course that doesn't work with character columns.
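As a rough sketch of why the sum idea only goes part of the way: rowSums() with na.rm = TRUE does merge the numeric pair, but it turns a row where both values are NA into 0 rather than NA, and it refuses character input outright. The variable names num_sum, both_na and res are illustrative only.

```r
library(data.table)

a <- data.table(col1 = c(1, 2, 3, NA, NA),
                col2 = c(NA, NA, NA, 4, 5),
                col3 = c("a", "b", "c", NA, NA),
                col4 = c(NA, NA, NA, "d", "e"))

# Row-wise sum of the numeric pair; na.rm = TRUE skips the NA in each row
num_sum <- rowSums(a[, .(col1, col2)], na.rm = TRUE)
print(num_sum)

# Caveat: a row where both values are NA becomes 0 instead of staying NA
both_na <- rowSums(data.table(x = NA_real_, y = NA_real_), na.rm = TRUE)
print(both_na)

# The character pair cannot be summed at all: rowSums() raises an error
res <- tryCatch(rowSums(a[, .(col3, col4)], na.rm = TRUE),
                error = function(e) "error: character columns cannot be summed")
print(res)
```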


2 Answers


Using data.table's fcoalesce over two columns:

a[, .(col1 = fcoalesce(col1, col2),
      col2 = fcoalesce(col3, col4)) ]

#     col1   col2
#    <num> <char>
# 1:     1      a
# 2:     2      b
# 3:     3      c
# 4:     4      d
# 5:     5      e
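A small sketch of how fcoalesce resolves values, using made-up vectors x, y and z: it scans its arguments left to right and keeps the first non-NA value at each position, a length-1 value can act as a final fallback, and arguments of different types are rejected rather than silently coerced.

```r
library(data.table)

x <- c(1, NA, NA, NA)
y <- c(NA, 2, NA, NA)
z <- c(9, 9, 3, NA)

# Left to right: the first non-NA value at each position wins
first_non_na <- fcoalesce(x, y, z)
print(first_non_na)  # 1 2 3 NA

# A single value can fill positions that are NA in every argument
filled <- fcoalesce(x, y, z, 0)
print(filled)        # 1 2 3 0

# A double vector and a character vector cannot be coalesced together
mixed <- tryCatch(fcoalesce(c(1, NA), c("a", "b")),
                  error = function(e) "error")
print(mixed)
```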

A more automated way to apply fcoalesce to every group of n consecutive columns (here n = 2):

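# group the column names into consecutive blocks of n: (col1, col2), (col3, col4)
n <- 2
cc <- split(colnames(a), (seq_len(ncol(a)) - 1) %/% n)

# add one coalesced column per group to a, by reference
for (i in seq_along(cc)) {
  a[, (paste0("newCol", i)) := fcoalesce(.SD), .SDcols = cc[[i]]]
}
a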

#     col1  col2   col3   col4 newCol1 newCol2
#    <num> <num> <char> <char>   <num>  <char>
# 1:     1    NA      a   <NA>       1       a
# 2:     2    NA      b   <NA>       2       b
# 3:     3    NA      c   <NA>       3       c
# 4:    NA     4   <NA>      d       4       d
# 5:    NA     5   <NA>      e       5       e
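The same loop can be wrapped in a reusable function. This is only a sketch: coalesce_groups and its prefix argument are hypothetical names, and it assumes the column count is an exact multiple of n and that each group's columns share a type.

```r
library(data.table)

# Hypothetical helper: coalesce every consecutive group of `n` columns of `dt`
# into one new column per group (newCol1, newCol2, ...), modifying dt by reference
coalesce_groups <- function(dt, n, prefix = "newCol") {
  cols <- names(dt)
  stopifnot(length(cols) %% n == 0)  # assumption: every group is complete
  groups <- split(cols, (seq_along(cols) - 1) %/% n)
  for (i in seq_along(groups)) {
    dt[, (paste0(prefix, i)) := fcoalesce(.SD), .SDcols = groups[[i]]]
  }
  invisible(dt)
}

a <- data.table(col1 = c(1, 2, 3, NA, NA),
                col2 = c(NA, NA, NA, 4, 5),
                col3 = c("a", "b", "c", NA, NA),
                col4 = c(NA, NA, NA, "d", "e"))

coalesce_groups(a, n = 2)
print(a[, .(newCol1, newCol2)])
```

Computing the groups with (index - 1) %/% n, rather than a fixed divisor, keeps the grouping correct for any number of columns that is a multiple of n.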


Using fifelse from data.table (a faster alternative to base ifelse that also preserves the class of its inputs, as mentioned in the comments):

result <- a[, .(
  col1 = fifelse(is.na(col1), col2, col1),
  col2 = fifelse(is.na(col3), col4, col3)
)]

print(result)
   col1 col2
1:    1    a
2:    2    b
3:    3    c
4:    4    d
5:    5    e
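To illustrate the class-safety point raised in the comments, here is a short sketch on a made-up Date vector d with an arbitrary fallback date: base ifelse returns bare day counts, while fifelse and fcoalesce keep the Date class.

```r
library(data.table)

d <- as.Date(c("2024-01-01", NA, "2024-03-01"))
fallback <- as.Date("2000-01-01")

# base::ifelse drops the Date class and returns the underlying day counts
base_res <- ifelse(is.na(d), fallback, d)
print(class(base_res))  # "numeric"

# data.table::fifelse keeps the class of its yes/no arguments
dt_res <- fifelse(is.na(d), fallback, d)
print(class(dt_res))    # "Date"

# fcoalesce gives the same values without spelling out the is.na() test
print(fcoalesce(d, fallback))
```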

Alternatively, use mapply with a helper function that picks the first non-NA value:

choose_first <- function(x, y) {
  if (!is.na(x)) x else y
}

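result <- a[, .(
  # USE.NAMES = FALSE stops mapply from naming the result after a character first argument
  col1 = mapply(choose_first, col1, col2, USE.NAMES = FALSE),
  col2 = mapply(choose_first, col3, col4, USE.NAMES = FALSE)
)]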

print(result)
   col1 col2
1:    1    a
2:    2    b
3:    3    c
4:    4    d
5:    5    e
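One of the comments notes that the two-argument mapply can be reduced to a single-argument loop over row indices. A sketch of that idea with vapply, which also pins the result type of each column; as the comments point out, vectorised fcoalesce or fifelse is preferable in practice, so this is for illustration only.

```r
library(data.table)

a <- data.table(col1 = c(1, 2, 3, NA, NA),
                col2 = c(NA, NA, NA, 4, 5),
                col3 = c("a", "b", "c", NA, NA),
                col4 = c(NA, NA, NA, "d", "e"))

choose_first <- function(x, y) {
  if (!is.na(x)) x else y
}

# Loop once over the row indices; FUN.VALUE enforces numeric / character output
result <- a[, .(
  col1 = vapply(seq_len(.N), function(i) choose_first(col1[i], col2[i]), numeric(1)),
  col2 = vapply(seq_len(.N), function(i) choose_first(col3[i], col4[i]), character(1))
)]
print(result)
```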

Comments

data.table::fcoalesce is optimized for this and will be much faster. Furthermore, base::ifelse is not class-safe: the biggest risk in this question is that it silently accepts col1 and col2 being of different classes, which can corrupt the result. Additionally (though not relevant to this question), it drops the class of objects such as POSIXt and Date; try ifelse(TRUE, Sys.Date(), Sys.Date()). stackoverflow.com/q/6668963/3358272 is a good reference for that discussion.
And there is fifelse in {data.table}. Your multivariate apply can be reduced to a univariate one (iterating over the indices of the equal-length vectors). However, such loop-hiding techniques are unnecessary here because fully vectorised alternatives exist.
I didn't know about data.table::fifelse until now; the answer has been updated.
