I am seeing unexpected behavior when modifying a column by group in a data.table:
library(data.table)
# creating a data.frame
data <- data.frame(sequence = rep(c("A","B","C","D"), c(2,3,3,2)), trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1
# copying data to data_temp
data_temp <- data
# assigning some random value to data_temp so that it should no longer be a
# copy of "data"
data_temp[1, "random_value"] <- rnorm(1)
# converting data_temp to data.table
setDT(data_temp)
# expanding trim parameter to group and subsetting
data_temp <- data_temp[, trim := sum(trim), by = sequence][trim == 0]
data_temp comes out as expected with only the "C" sequence entries remaining. However, I would also expect the "data" object to remain unchanged. This is not the case. The "data" object looks as follows:
   sequence trim random_value
1         A    2           NA
2         A    2           NA
3         B    2           NA
4         B    2           NA
5         B    2           NA
6         C    0           NA
7         C    0           NA
8         C    0           NA
9         D    1           NA
10        D    1           NA
So the assignment by reference of the "trim" variable also happened in the original data.frame.
I am using data.table_1.11.4 and R version 3.4.3 for compatibility reasons.
Is the error a result of using old versions or am I doing something wrong / do I need to change the code to avoid that error?
help("copy").

data_temp[1, "random_value"] <- rnorm(1) does not copy the entire data.frame, but only the "random_value" vector. So, after this line, the sequence and trim variables of the two data.frames still point to the same objects in memory. I verified this with .Internal(inspect(.)). I wonder how long this behavior has been the default in base R. Maybe since lists were allowed to hold pointers?

-> differs for data.frame and data.table objects. You can see this by repeating Matt Dowle's example with data.frames and inspecting the memory location of the vectors. This would more accurately mirror the above situation.
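A minimal sketch of the point made in these comments, assuming the data.table package is installed. It uses address(), which data.table exports, to print where each column lives, and swaps setDT() for as.data.table() so that the grouped := works on copied columns. The addresses are machine-specific, and which columns end up shared may depend on the R version, so no output is shown here.

```r
library(data.table)

# Same setup as in the question
data <- data.frame(sequence = rep(c("A", "B", "C", "D"), c(2, 3, 3, 2)),
                   trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1

data_temp <- data
data_temp[1, "random_value"] <- rnorm(1)

# Print the memory address of every column in both objects.
# Columns with identical addresses are shared between the two.
print(sapply(data, address))
print(sapply(data_temp, address))

# Convert with as.data.table(), which copies the columns, instead of
# setDT(), which converts in place and keeps any shared columns.
data_temp <- as.data.table(data_temp)
data_temp <- data_temp[, trim := sum(trim), by = sequence][trim == 0]

print(data$trim)   # compare with the values assigned at the top
print(data_temp)
```

Calling copy() on the data.frame before setDT(), as described in help("copy"), is the other documented way to get an independent object before using :=.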