I have three tables in an Excel workbook that I query with SQL.
There is an Inscriptions table that holds the AGENT_ID and MLS_ID, a PHOTOS table that holds all the photos that came in the most recent feed for each MLS_ID, and a PHOTOS_CURRENT table that holds all the photos currently in the system for each MLS_ID.
The goal is to find out whether the new feed contains photos that are not currently in the system.
I tried both a NOT EXISTS and a NOT IN approach. Both take too long to run (sometimes 2 minutes per AGENT_ID).
NOT EXISTS approach:
sqlQuery = "SELECT DISTINCT INSCR.MLS_ID FROM [INSCRIPTIONS_CURRENT$] INSCR, [PHOTOS$] P1 " & _
"WHERE INSCR.AGENT_ID = " & inpAgentId & _
" AND INSCR.MLS_ID = P1.MLS_ID AND NOT exists (select 1 from [PHOTOS_CURRENT$] PC1 where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID)"
NOT IN approach:
sqlQuery = "SELECT DISTINCT INSCR.MLS_ID FROM [INSCRIPTIONS_CURRENT$] INSCR, [PHOTOS$] P1 " & _
"WHERE INSCR.AGENT_ID = " & inpAgentId & _
" AND INSCR.MLS_ID = P1.MLS_ID AND INSCR.MLS_ID NOT IN (select MLS_ID from [PHOTOS_CURRENT$] PC1 where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID)"
The DB connection is set up as follows:
Sub Connect()
Set objConnection = CreateObject("ADODB.Connection")
objConnection.CommandTimeout = 120
End Sub
The query is sent to the procedure for processing as follows:
Function select_query(sqlQuery As String) As ADODB.Recordset
Dim objRecordset As ADODB.Recordset
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H1
Set objRecordset = CreateObject("ADODB.Recordset")
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & ThisWorkbook.FullName & _
";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
objRecordset.Open sqlQuery, objConnection, adOpenStatic, adLockOptimistic, adCmdText
Set select_query = objRecordset
End Function
Any suggestions to improve the performance?
Consider moving the data to MS Access, which can index fields for faster table scans.
Alternatively, run the NOT EXISTS query once and join on it instead of repeating that query for every agent.
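The idea of running the query once rather than once per agent could look roughly like the sketch below. It rewrites the anti-join as a LEFT JOIN ... IS NULL and returns AGENT_ID alongside MLS_ID, so a single pass covers every agent. It assumes the sheet and column names from the question and reuses the select_query function shown above; FindNewPhotosForAllAgents is a hypothetical name. This is untested against the actual workbook, and since Jet cannot index worksheet data, any speed-up would come mainly from avoiding the repeated scans.

```vba
' Sketch only: fetch every (AGENT_ID, MLS_ID) pair that has a photo in the
' new feed with no matching row in PHOTOS_CURRENT, in one query.
' Assumes the sheet/column names from the question and the select_query
' function above; FindNewPhotosForAllAgents is a hypothetical name.
Function FindNewPhotosForAllAgents() As ADODB.Recordset
    Dim sqlQuery As String

    ' Jet SQL requires parentheses around the first join when chaining joins.
    ' Rows where PC1.PHOTO_ID is Null are feed photos with no match in the system.
    sqlQuery = "SELECT DISTINCT INSCR.AGENT_ID, INSCR.MLS_ID " & _
               "FROM ([INSCRIPTIONS_CURRENT$] AS INSCR " & _
               "INNER JOIN [PHOTOS$] AS P1 ON INSCR.MLS_ID = P1.MLS_ID) " & _
               "LEFT JOIN [PHOTOS_CURRENT$] AS PC1 " & _
               "ON (P1.MLS_ID = PC1.MLS_ID AND P1.PHOTO_ID = PC1.PHOTO_ID) " & _
               "WHERE PC1.PHOTO_ID IS NULL " & _
               "ORDER BY INSCR.AGENT_ID, INSCR.MLS_ID"

    Set FindNewPhotosForAllAgents = select_query(sqlQuery)
End Function
```

The caller can then walk the returned recordset once and group the MLS_ID values by AGENT_ID, instead of issuing a separate query for each inpAgentId.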