This question has since become the target of a duplicate question, and I felt that the existing answers could be improved to help novice data.table users.
1. What is the difference between DT[.()] and DT[CJ()]?
According to ?data.table, .() is an alias for list(), and a list supplied as parameter i is converted into a data.table internally. So, DT[.(1, c(3, 4), c(2, 4))] is equivalent to DT[data.table(1, c(3, 4), c(2, 4))], where
data.table(1, c(3, 4), c(2, 4))
# V1 V2 V3
#1: 1 3 2
#2: 1 4 4
The data.table consists of two rows, the length of the longest vector; the 1 is recycled. Hence, DT[.(1, c(3, 4), c(2, 4))] looks up only the two combinations (1, 3, 2) and (1, 4, 4).
This is different from a cross join, which creates all combinations of the supplied vectors:
CJ(1, c(3, 4), c(2, 4))
# V1 V2 V3
#1: 1 3 2
#2: 1 3 4
#3: 1 4 2
#4: 1 4 4
Note that setDT(expand.grid(1, c(3, 4), c(2, 4))) produces the same combinations, although in a different row order and without a key.
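As a quick check (an illustrative sketch; expand.grid() varies the first column fastest, so the row order differs from CJ(), and the columns are named Var1 to Var3):
setDT(expand.grid(1, c(3, 4), c(2, 4)))[]  # trailing [] prints, as setDT() returns invisibly
# Var1 Var2 Var3
#1: 1 3 2
#2: 1 4 2
#3: 1 3 4
#4: 1 4 4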
This explains why the OP gets two different results:
DT[.(1, c(3, 4), c(2, 4))]
# mpg cyl disp hp drat wt qsec vs am gear carb
#1: NA NA NA NA NA NA NA NA 1 3 2
#2: 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#3: 21 6 160 110 3.9 2.875 17.02 0 1 4 4
DT[CJ(1, c(3, 4), c(2, 4))]
# mpg cyl disp hp drat wt qsec vs am gear carb
#1: NA NA NA NA NA NA NA NA 1 3 2
#2: NA NA NA NA NA NA NA NA 1 3 4
#3: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#4: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#5: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#6: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Note that supplying nomatch = 0 removes the non-matching rows, i.e., the rows containing NA.
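For example, repeating the cross join with nomatch = 0 keeps only the four matching rows from the output above:
DT[CJ(1, c(3, 4), c(2, 4)), nomatch = 0]
# mpg cyl disp hp drat wt qsec vs am gear carb
#1: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#2: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#3: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#4: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4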
2. Using %in%
Besides CJ() and am == 1 & (gear == 3 | gear == 4) & (carb == 2 | carb == 4), there is a third equivalent option using value matching:
DT[am == 1 & gear %in% c(3, 4) & carb %in% c(2, 4)]
# mpg cyl disp hp drat wt qsec vs am gear carb
#1: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#2: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#3: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#4: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Note that CJ() requires the data.table to be keyed, while the other two variants also work with unkeyed data.tables.
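For completeness, a minimal sketch of the keying step assumed for the CJ() variant in section 1 (assuming DT is created from mtcars as in the question):
library(data.table)
DT <- data.table(mtcars)
setkey(DT, am, gear, carb)  # key columns must match the order of the CJ() vectors
DT[CJ(1, c(3, 4), c(2, 4)), nomatch = 0]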
3. Benchmarking
Data
To test the execution speed of the three options, we need a much larger data.table than the 32 rows of mtcars. This is achieved by repeatedly doubling mtcars until 1 million rows (89 MB) are reached. Then this data.table is copied to get a keyed version of the same input data.
library(data.table)
# create unkeyed data.table
DT_unkey <- data.table(mtcars)
for (i in 1:15) {
  DT_unkey <- rbindlist(list(DT_unkey, DT_unkey))
  print(nrow(DT_unkey))
}
# create keyed data.table
DT_keyed <- copy(DT_unkey)
setkeyv(DT_keyed, c("am", "gear", "carb"))
# show data.tables
tables()
# NAME NROW NCOL MB COLS KEY
#[1,] DT_keyed 1,048,576 11 89 mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb am,gear,carb
#[2,] DT_unkey 1,048,576 11 89 mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
#Total: 178MB
Run
To get a fair comparison, the setkey() operations are included in the timing. Also, the data.tables are explicitly copied to exclude effects from data.table's update by reference.
With
# DT holds the unkeyed source data; every branch starts from a fresh copy
# so that copying, keying and subsetting are all included in the timing
DT <- copy(DT_unkey)
result <- microbenchmark::microbenchmark(
  setkey = {
    DT_keyed <- copy(DT)
    setkeyv(DT_keyed, c("am", "gear", "carb"))},
  cj_keyed = {
    DT_keyed <- copy(DT)
    setkeyv(DT_keyed, c("am", "gear", "carb"))
    DT_keyed[CJ(1, c(3, 4), c(2, 4)), nomatch = 0]},
  or_keyed = {
    DT_keyed <- copy(DT)
    setkeyv(DT_keyed, c("am", "gear", "carb"))
    DT_keyed[am == 1 & (gear == 3 | gear == 4) & (carb == 2 | carb == 4)]},
  or_unkey = {
    DT_unkey <- copy(DT)
    DT_unkey[am == 1 & (gear == 3 | gear == 4) & (carb == 2 | carb == 4)]},
  in_keyed = {
    DT_keyed <- copy(DT)
    setkeyv(DT_keyed, c("am", "gear", "carb"))
    DT_keyed[am %in% c(1) & gear %in% c(3, 4) & carb %in% c(2, 4)]},
  in_unkey = {
    DT_unkey <- copy(DT)
    DT_unkey[am %in% c(1) & gear %in% c(3, 4) & carb %in% c(2, 4)]},
  times = 10L)
we get
print(result)
#Unit: milliseconds
# expr min lq mean median uq max neval
# setkey 198.23972 198.80760 209.0392 203.47035 213.7455 245.8931 10
# cj_keyed 210.03574 212.46850 227.6808 216.00190 254.0678 259.5231 10
# or_keyed 244.47532 251.45227 296.7229 287.66158 291.3811 404.8678 10
# or_unkey 69.78046 75.61220 103.6113 89.32464 111.5240 231.6814 10
# in_keyed 269.82501 270.81692 302.3453 274.42716 321.2935 431.9619 10
# in_unkey 93.75537 95.86832 119.4371 100.19446 126.6605 251.4172 10
ggplot2::autoplot(result)

Apparently, setkey() is a rather costly operation, so for a one-time task the vector scan operations might be faster than using binary search on a keyed table.
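If the keyed table is reused for several queries, the one-off setkey() cost amortises. A follow-up timing that skips the copy and keying steps (a sketch only, reusing DT_keyed from the setup above; results not shown here) would isolate the pure lookup cost:
microbenchmark::microbenchmark(
  cj_keyed_only = DT_keyed[CJ(1, c(3, 4), c(2, 4)), nomatch = 0],
  in_keyed_only = DT_keyed[am %in% 1 & gear %in% c(3, 4) & carb %in% c(2, 4)],
  times = 10L)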
The benchmark was run with R version 3.3.2 (x86_64, mingw32), data.table 1.10.4, microbenchmark 1.4-2.1.