Since my terrible execution and interpretation of my previous question I'll start over and will try to formulate the question as short and general possible.
I have two dataframes (see the examples below). Each dataset contains the same number of columns.
tc <- textConnection('
ID Track1 Track2 Track3 Track4 Time Loc
4 15 "" "" 50 40 1
5 17 115 109 55 50 1
6 17 115 109 55 60 1
7 13 195 150 60 70 1
8 13 195 150 60 80 1
9 "" "" 181 70 90 2 #From this row, example data added
10 "" "" 182 70 92 2
11 429 31 "" 80 95 3
12 480 31 12 80 96 3
13 118 "" "" 90 100 4
14 120 16 213 90 101 4
')
MATCHINGS <- read.table(tc, header=TRUE)
tc <- textConnection('
ID Track1 Track2 Track3 Track4 Time Loc
"" 15 "" "" 50 40 1
"" 17 "" 109 55 50 1
"" 17 432 109 55 65 1
"" 17 115 109 55 59 1
"" 13 195 150 60 68 1
"" 13 195 150 60 62 1
"" 10 5 1 10 61 3
"" 13 195 150 60 72 1
"" 40 "" 181 70 82 2 #From this row, example data added
"" "" "" 182 70 85 2
"" 429 "" "" 80 90 3
"" "" 31 12 80 92 3
"" "" "" "" 90 95 4
"" 118 16 213 90 96 4
')
INVOLVED <- read.table(tc, header=TRUE)
The goal is to place the least recent ID from MATCHINGS into INVOLVED by matching on Track1 to Track4 and Loc. An extra condition is that the Time of the matching INVOLVED entry may not be higher than the Time of the entry in MATCHING. Furthermore a match on Track1 is most preferred, a match on Track4 is least preferred. However only Track4 is always available (all other Track-columns can be empty). Thus the expected results are:
ID Track1 Track2 Track3 Track4 Time Loc
4 15 "" "" 50 40 1
5 17 "" 109 55 50 1
"" 17 432 109 55 65 1
6 17 115 109 55 59 1
7 13 195 150 60 68 1
7 13 195 150 60 62 1
"" 10 5 1 10 61 3
8 13 195 150 60 72 1
9 40 "" 181 70 82 2 #From this row, example data added
10 "" "" 182 70 85 2
11 429 "" "" 80 90 3
12 "" 31 12 80 92 3
13 "" "" "" 90 95 4
13 118 16 213 90 96 4
I tried to this with the data.table package, but fail in doing this efficient. Is it possible to get rid of the vector scans and efficiently go through the data without looping?
dat <- data.table(MATCHINGS)
for(i in 1:nrow(INVOLVED)){
row <- INVOLVED[i,]
match <- dat[Time>=row$Time][Loc==row$Loc][Track4==row$Track4][Track4!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track3==row$Track3][Track3!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track2==row$Track2][Track2!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track1==row$Track1][Track1!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
}
update
Updated the example data showing the need for Track 1 to 3. As shown Track1 is most important and Track4 least important. Even if Track1 to 3 match to MATCHINGS x and Track4 matches to MATCHINGS y, the ID of y should be assigned to that INVOLVED row. So: Track3 match overrides Track4 match, Track2 match overrides Track3 match, Track1 match overrides Track2 match.
byetc. At the moment your code is not using any of thedata.tableefficiency!