A colleague doing some analysis came up with code from ChatGPT that gives wrong results, and I don't understand what it actually does.
Here is the example:
Let's consider a first table (drugs: each patient has an id and starts a drug at x):
library(data.table)
df1 <- data.table(id = rep(LETTERS[1:5],each = 3))
set.seed(125)
df1[,x := sample(1:10,.N,replace = TRUE)]
id x
<char> <int>
1: A 10
2: A 8
3: A 8
4: B 3
5: B 9
Let's consider a second (and main) table (hospital visits: same patients, each with several hospital stays between two dates y1 and y2):
df2 <- data.table(id = rep(LETTERS[1:5],each = 2),y1 = c(2,4),y2 = c(6,8))
# unique identifier
df2[,eds_id := 1:.N]
id y1 y2 eds_id
<char> <num> <num> <int>
1: A 2 6 1
2: A 4 8 2
3: B 2 6 3
4: B 4 8 4
Now, for each hospital stay, I want to know whether any drug was prescribed to the patient during the stay, i.e. whether x is between y1 and y2 for at least one drug.
I would do a non-equi join:
df2[df1,xinbetween_true := TRUE,on = .(id,y1 <= x, y2 >= x)]
df2[is.na(xinbetween_true),xinbetween_true := FALSE]
Which works.
ChatGPT came up with:
df2[df1,on = "id",xinbetween := x >= y1 & x <= y2]
Which produces wrong answers:
df2[xinbetween_true != xinbetween]
id y1 y2 eds_id xinbetween xinbetween_true
<char> <num> <num> <int> <lgcl> <lgcl>
1: B 2 6 3 FALSE TRUE
2: C 4 8 6 FALSE TRUE
For these two stays, the ChatGPT code says FALSE, even though some of the patient's df1 entries do satisfy the condition:
df2[df1,on = "id",allow.cartesian = TRUE][xinbetween_true != xinbetween]
id y1 y2 eds_id xinbetween xinbetween_true x
<char> <num> <num> <int> <lgcl> <lgcl> <int>
1: B 2 6 3 FALSE TRUE 3
2: B 2 6 3 FALSE TRUE 9
3: B 2 6 3 FALSE TRUE 9
4: C 4 8 6 FALSE TRUE 3
5: C 4 8 6 FALSE TRUE 4
6: C 4 8 6 FALSE TRUE 3
So here is my question:
What does the df2[df1,on = "id",xinbetween := x >= y1 & x <= y2] script do? It does not do a proper non-equi merge, but I don't get what it does.
And in what case can it be used?
What does the df2[df1,on = "id",xinbetween := x >= y1 & x <= y2] script do?
df2[df1,on = "id", xinbetween := {print(data.table(id, x, y1, y2, x >= y1 & x <= y2)); x >= y1 & x <= y2}]
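To make the behaviour easier to see, here is a minimal sketch on a toy table. The tables `drugs` and `stays` and the columns `flag_last` and `flag_any` are made up for illustration and are not part of the question. As far as I understand it, with a join on `id` alone the expression is evaluated on every matching (stay, drug) pair, and when several drug rows hit the same stay the assignments overwrite one another, so the stay ends up with the value computed for the last matching drug row. That would also mean the ChatGPT version is only safe when each id appears at most once in the drug table. One possible fix, if you want to keep the join on `id`, is to reduce with any() per stay before assigning:

```r
library(data.table)

# Toy data (hypothetical): one patient, two drug starts, one hospital stay
drugs <- data.table(id = c("A", "A"), x = c(5L, 10L))
stays <- data.table(id = "A", y1 = 2, y2 = 6, eds_id = 1L)

# Update join on id only: the single stay is matched by both drug rows,
# and the value computed for the last drug row (x = 10) overwrites
# the value computed for the first one (x = 5)
stays[drugs, on = "id", flag_last := x >= y1 & x <= y2]

# Possible fix: build all (stay, drug) pairs, reduce with any() per stay,
# then bring the result back with a one-to-one update join on eds_id
pairs <- stays[drugs, on = "id", allow.cartesian = TRUE]
hits <- pairs[, .(flag_any = any(x >= y1 & x <= y2)), by = eds_id]
stays[hits, on = "eds_id", flag_any := i.flag_any]

print(stays)
```

Here `flag_last` only reflects the drug started at x = 10, while `flag_any` reflects both drug rows. Stays of patients with no row at all in the drug table would be left as NA by this sketch and would still need to be set to FALSE, as in the second line of the non-equi version in the question.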