Inconsistent data.table assignment by reference behaviour

Question

When assigning by reference in a data.table using a column from a second data.table, the results are inconsistent. When no rows match on the key columns of the two data.tables, the assignment expression y := y appears to be ignored entirely: not even NAs are returned.

library(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
print(dt1[dt2, y := y])
##    id x     # Would have also expected column:   y
## 1:  1 3     #                                   NA
## 2:  2 4     #                                   NA

However, when there is a partial match, the non-matching rows get a placeholder NA.

dt2[, id := 2:3]
print(dt1[dt2, y := y])
##    id x  y
## 1:  1 3 NA    # <-- placeholder NA here
## 2:  2 4  5

This wreaks havoc on later code that assumes a y column exists in all cases. Otherwise I keep having to write cumbersome additional checks to take into account both cases.

Is there an elegant way around this inconsistency?

You could create the y column first... dt1[, y:=NA_integer_];dt1[dt2, y:=y][]
– GSee, Commented Aug 5, 2014 at 21:27
I think the most elegant way is to submit a feature request on github ;)
– eddi, Commented Aug 5, 2014 at 21:48
@eddi - thanks. See github.com/Rdatatable/data.table/issues/759
– mchen, Commented Aug 6, 2014 at 15:27
OTOH when y isn't in the joined table, I notice I made a mistake in my code more quickly. But yes, what you suggested is probably better as it's more robust.
– MichaelChirico, Commented Jul 11, 2015 at 15:36
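
The workaround suggested in the first comment can be sketched as below. This is a minimal illustration rather than a definitive fix, and it makes one change that is not in the comment: the right-hand side is written as i.y, because once dt1 has its own y column, a bare y inside the join may resolve to dt1's column rather than dt2's. The name no_match is introduced here only to keep a snapshot of the first result.

```r
library(data.table)

dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")

# Pre-create y as integer NA so the column exists even when nothing matches.
dt1[, y := NA_integer_]

# Case 1: no ids in common. The join assigns nothing, but y is still there.
dt1[dt2, y := i.y]
no_match <- copy(dt1)   # snapshot (hypothetical name): y is NA, NA
print(no_match)

# Case 2: partial match on id 2. Only the matching row is overwritten.
dt2[, id := 2:3]
dt1[dt2, y := i.y]
print(dt1)              # y is NA, 5
```

With the column created up front, later code can rely on y being present in both cases, at the cost of one extra assignment before the join.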

Arun · Accepted Answer · 2016-03-08 21:04:07Z

2

With this recent commit, this issue, #759, is now fixed in v1.9.7. It works as expected when nomatch=NA (the current default).

require(data.table)
dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")
dt1[dt2, y := y][]
#    id x  y
# 1:  1 3 NA
# 2:  2 4 NA

Tchotchke · Answer · 2015-08-16 17:04:24Z

1

Using merge with all.x = TRUE works:

dt3 <- merge(dt1, dt2, by = "id", all.x = TRUE)
dt3
#    id x  y
# 1:  1 3 NA
# 2:  2 4 NA
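
One difference from the := approach is worth keeping in mind: merge returns a new data.table and leaves dt1 untouched, so it is not an assignment by reference. The sketch below, using the same dt1 and dt2 as in the question, shows that only the merged result carries the y column.

```r
library(data.table)

dt1 <- data.table(id = 1:2, x = 3:4, key = "id")
dt2 <- data.table(id = 3:4, y = 5:6, key = "id")

# Left join: keep every row of dt1, fill y with NA where dt2 has no match.
dt3 <- merge(dt1, dt2, by = "id", all.x = TRUE)

# dt1 itself is unchanged; the y column lives only in the merged copy.
print("y" %in% names(dt1))
print("y" %in% names(dt3))
```

If the rest of the code expects dt1 to hold the result, the merged table has to be assigned back (dt1 <- merge(...)), which copies the data instead of updating it in place.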
