2

I need help with a repetitive task that I use vlookup for huge dataset, a bunch of tables of >1 million rows from multiple columns, multiple sheets of another workbook.

ID TA ISRC Result
ID1 TA1 ISRC1
ID2 TA2 ISRC2

Here is my vlookup formula

=IFERROR(VLOOKUP([@ID],'[data.xlsx]Sheet1'!$A:$G,7,0),IFERROR(VLOOKUP([@ta],'[data.xlsx]Sheet2'!$E:$G,3,0),VLOOKUP([@ISRC],'[data.xlsx]Sheet3'!$F:$G,2,0)))

The vlookup took me about 30 minutes to complete. I read a few discussion about VBA, dictionary and array that will significantly improve the speed to vlookup formula. I am new to this. Could someone help me to convert the above formula to VBA code with dictionary and array.

12
  • Using Excel for large scale database work is really not recommended. Commented Oct 15 at 8:39
  • Use Index/Match to make the lookup dynamic in case a column is inserted/deleted in the source. Also lookup all values in one step, outside the table with =IFERROR(INDEX('[data.xlsx]Sheet1'!$G:$G,MATCH([ID],'[data.xlsx]Sheet1'!$A:$A,0)),IFERROR(INDEX('[data.xlsx]Sheet2'!$G:$G,MATCH([ta],'[data.xlsx]Sheet2'!$E:$E,0)),INDEX('[data.xlsx]Sheet3'!$G:$G,MATCH([ISRC],'[data.xlsx]Sheet3'!$F:$F,0)))). Then you can simply link your table column to the outside column Commented Oct 15 at 9:07
  • Just to note, a dictionary approach is not much faster than MATCH so not really worth involving VBA here. Of course, in the above suggested formula, if you link from outside the table then MATCH([ID],... is not valid, so just replace with the table name e.g. MATCH(myTableName[ID],.... Same for [ta] and ISRC. Commented Oct 15 at 9:09
  • Powerquery might be an alternative approach. Commented Oct 15 at 9:17
  • How many formulas were evaluated in that time range? Millions of rows is very vague, furthermore excel sheet has at least 1048576 rows. Commented Oct 15 at 11:18

2 Answers 2

1

Here's a fully-worked out demo (including test data creation)

Add a class module with this code and name it KeyMap

Option Explicit

Dim col As Collection

Sub AddData(data, keyCol As Long, valueCol As Long)
    Dim r As Long, k As String, v
    If col Is Nothing Then Set col = New Collection
    For r = 1 To UBound(data, 1)
        k = CStr(data(r, keyCol))
        If Len(k) > 0 Then
            v = data(r, valueCol)
            On Error Resume Next
            col.Add v, k
            On Error GoTo 0
        End If
    Next r
End Sub

Function GetValue(ByVal k)
    Dim result
    If Len(k) > 0 Then
        On Error Resume Next   '### ignore error on duplicate key
        result = col(CStr(k))
        On Error GoTo 0        'stop ignoring errors
    End If
    GetValue = IIf(IsEmpty(result), CVErr(xlErrNA), result)
End Function

Property Get Count() As Long
    Count = col.Count
End Property

Here's how you can use the class as a lookup after adding data from each of your 3 worksheets:

Sub LookupTester()
    
    Dim map As KeyMap, t, i As Long, result, fmt, wb As Workbook
    Dim lookups, rng As Range, ws As Worksheet, col As Long
    
    Set wb = ThisWorkbook
    Set ws = wb.Worksheets("Lookups")
    
    Set map = New KeyMap
    
    t = Timer
    'Note my dummy data has only two columns: keys in A and values in B
    map.AddData DataNoHeader(wb.Worksheets("Listing1").Range("A1")), 1, 2
    map.AddData DataNoHeader(wb.Worksheets("Listing2").Range("A1")), 1, 2
    map.AddData DataNoHeader(wb.Worksheets("Listing3").Range("A1")), 1, 2
    
    Debug.Print "Initialized map in " & Round(Timer - t, 1) & _
                " sec (" & map.Count & " items)"
    
    'read the lookup data from the sheet
    Set rng = ws.Range("A2:D" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
    lookups = rng.Value
    
    t = Timer
    'do some lookups
    Debug.Print "Performing " & UBound(lookups, 1) & " lookups"
    For i = 1 To UBound(lookups, 1)
        For col = 1 To 3
            result = map.GetValue(lookups(i, col))
            If Not IsError(result) Then Exit For
        Next col
        lookups(i, 4) = result
    Next i
    rng.Value = lookups
    
    Debug.Print "Lookups done in " & Round(Timer - t, 4) & " sec"
    
End Sub

'return data as an array, excluding the first row of headers
Function DataNoHeader(cTopLeft As Range)
    Dim rng As Range
    Set rng = cTopLeft.CurrentRegion 'assumes no empty rows/columns in the data
    Set rng = rng.Offset(1, 0).Resize(rng.Rows.Count - 1)
    DataNoHeader = rng.Value
End Function

Notes

  • It's much quicker to set up a keyed Collection than a Dictionary when you have a large number of keys, and the lookups are also much faster
  • Assumes all keys can be string-type, and unique for all data added to the lookup (duplicates just get ignored), and that lookups are case-insensitive.
  • Runtime in my testing as about 15 seconds to set up the mapping (3 sheets x 1M rows each), and about 0.5 seconds to lookups for 20k rows

Here's the code I used to create test data and the lookup table:

'################# All code below was for setting up test data"
Sub SetupTestData()
    Const NUM_DATA_ROWS As Long = 1000000#  'rows per sheetin the 3 listings
    Const NUM_LOOKUP_ROWS As Long = 20000   'rows to perform lookups on
    
    Dim wb As Workbook, ws As Worksheet, i As Long, col As Long, rw As Long, c As Long, v
    
    Set wb = ThisWorkbook
    CreateTestDataSheet wb, "Listing1", "ID", NUM_DATA_ROWS
    CreateTestDataSheet wb, "Listing2", "TA", NUM_DATA_ROWS
    CreateTestDataSheet wb, "Listing3", "ISRC", NUM_DATA_ROWS
    
    Set ws = wb.Worksheets.Add(before:=wb.Worksheets("Listing1"))
    ws.Name = "Lookups"
    ws.Range("A1").Resize(1, 4).Value = Array("Listing1", "Listing2", "Listing3", "Result")
    For i = 1 To NUM_LOOKUP_ROWS
        col = Application.RandBetween(1, 3)
        rw = Application.RandBetween(1, NUM_DATA_ROWS)
        For c = 1 To 3
            If c <> col Then
                ws.Cells(i + 1, c).Value = "xxxxxxxx" 'no match
            Else
                ws.Cells(i + 1, c).Value = wb.Worksheets("listing" & col).Cells(rw, 1).Value
            End If
        Next c
    Next i
    ws.Columns.AutoFit
End Sub

Sub CreateTestDataSheet(wb As Workbook, sheetName As String, idPrefix As String, _
                                                             numRows As Long)
    Dim data, ws As Worksheet
    
    Set ws = wb.Worksheets.Add(after:=wb.Worksheets(wb.Worksheets.Count))
    ws.Name = sheetName
    data = TestData(idPrefix, numRows)
    ws.Range("A1").Resize(1, 3).Value = Array("ID", "Value", "Sort") 'add headers
    ws.Range("A2").Resize(UBound(data, 1), UBound(data, 2)).Value = data
    With ws.Range("C2").Resize(UBound(data, 1))
        .Formula = "=RAND()"
        .Value = .Value
    End With
    With ws.Sort         'randomize the listing order
        .SortFields.Clear
        .SortFields.Add2 Key:=ws.Range("C2"), SortOn:=xlSortOnValues, Order:=xlAscending
        .SetRange ws.Range("A1").CurrentRegion
        .Header = xlYes
        .Apply
    End With
    ws.Columns(3).Delete 'delete "sort" column
    ws.Range("A:B").Columns.AutoFit
End Sub

'Returns an array of size `numRows` x 2 with dummy data (keys and values) 
Function TestData(label As String, numRows As Long)
    Dim arr(), r As Long, k
    ReDim arr(1 To numRows, 1 To 2)
    For r = 1 To numRows
        k = label & "_" & Format(r, "0000000")
        arr(r, 1) = k
        arr(r, 2) = k & "_value"
    Next r
    TestData = arr
End Function
Sign up to request clarification or add additional context in comments.

2 Comments

15 seconds is a lot for mapping 3 million keys. On my machine, it's just over 1 seconds using VBA-FastDictionary
Definitely could be faster, but 15sec is still a big jump on 30min 🤔
0

Working with Arrays in Excel VBA is a way of doing things very quickly. I don't actually think that Dictionaries are the right thing for you here. What you're going to need to do is open up Visual Basic Editor from the Developer tab. In the lefthand window, right click and insert a new module. Double click into the newly created Module1.

Now to the bulk of what you want to do, which breaks down into a few steps:

  • Read the table you want to lookup to (i.e. the one you've shared at the top here) into an array. Then you can loop through it quicker than reading from the sheet each time.

  • Read the 3 tables you want to lookup from into 3 separate arrays.

  • Lookup the values from the first array in the 3 arrays (rather than IFERROR, we do an equivalent "IFFOUND". I.e. you continue searching if you haven't found it, and go to the next one if you have), and write to another array.

  • Write the values from this array back to the original sheet.

The below code achieves this. Paste this into the blank Module1. Then when you go back into your workbook, you'll see it come back in your list of possible macros to run.

Option Explicit ' Look into this later - you'll want it at the top of all VBA projects.

Sub LookupIDTAISRC() ' This line starts the declaration of the subroutine we are going to use.
Application.ScreenUpdating = False ' Useful as it stops things flashing around on the screen and slowing things down while working.

' By convention, variables are declared at the start. Declaring more along the way won't break things, so long as you do so before they are used.
Dim LookupTo()         As Variant ' Adding the brackets tells VBA it will be an array.
Dim LookupFromID()     As Variant
Dim LookupFromTA()     As Variant
Dim LookupFromISRC()   As Variant
Dim Result()           As Variant
Dim DataWorkbook       As Workbook

Dim MySheetName        As String ' String means text.
Dim MyLastRow          As Long   ' Long is the type VBA needs to be a number

' XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
' XXXXX Do the initial setup portion. XXXXX
' XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

' Change Sheet1 to whatever the name of your to sheet is, and 30 to the actual last row.
MySheetName = "Sheet1"
MyLastRow = 30
' You can also get VBA to select the last row dynamically, but I'm keeping it simple for now.
' This reads the cell contents into an array, that VBA can deal with faster than repeatedly reading from the worksheet.
LookupTo = ThisWorkbook.Sheets(MySheetName).Range("A2:C" & CStr(MyLastRow)).Value ' CStr() function makes the row number text so it can be read.

' It looks like your other sheets are in a different workbook, so we are going to open that in the background, and close it when we are done.
' Replace "C:\Files\data.xlsx" with the full path of where your file is stored.
' This makes the 3 arrays to lookup from.
Set DataWorkbook = Workbooks.Open("C:\Files\data.xlsx")
With DataWorkbook
    LookupFromID() = .Sheets("Sheet1").Range("A:G").Value
    LookupFromTA() = .Sheets("Sheet2").Range("E:G").Value
    LookupFromISRC() = .Sheets("Sheet3").Range("F:G").Value
End With
' Use false to stop auto saving of the data workbook.
' Delete the below line if you want to have the data workbook open anyway.
DataWorkbook.Close False

' XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
' XXXXX Do the actual lookup portion. XXXXX
' XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
' Declare some variables that we are going to use for looping.
Dim i As Long, j As Long, k As Long, m As Long
' Declare a variable for seeing if a result has been found.
Dim ValueFound        As Boolean
Dim LookupResult      As Variant ' Change to Long or String depending what you expect the result to be if desired.

' Set the size of the Result array to have rows equal to LookupTo.
' Note that it is nominally a 2-D array, but with a length of 1 in the second dimension.
' This is just so it is easier to write back to the sheet.
ReDim Result(1 To UBound(LookupTo), 0)

' Loop through the "To" array, looking up all the values.
For i = 1 To UBound(LookupTo, 1)
    ValueFound = False ' Set to false for each new row.
    ' Loop through the ID array, looking for the value in row i.
    For j = 1 To UBound(LookupFromID, 1)
        If LookupTo(i, 1) = LookupFromID(j, 1) Then
            ValueFound = True
            LookupResult = LookupFromID(j, 7)
            Exit For ' Exit the j For loop.
        End If
    Next j
    If ValueFound Then GoTo NextIteration ' Skip to the next lookup value if found.
    For k = 1 To UBound(LookupFromTA, 1)
        If LookupTo(i, 2) = LookupFromTA(k, 1) Then
            ValueFound = True
            LookupResult = LookupFromTA(k, 3)
            Exit For ' Exit the k For loop.
        End If
    Next k
    If ValueFound Then GoTo NextIteration ' Skip to the next lookup value if found.
    For m = 1 To UBound(LookupFromISRC, 1)
        If LookupTo(i, 3) = LookupFromISRC(m, 1) Then
            ValueFound = True
            LookupResult = LookupFromISRC(m, 2)
            Exit For ' Exit the m For loop.
        End If
    Next m
    If ValueFound Then GoTo NextIteration ' Skip to the next lookup value if found.
    LookupResult = "Not found"
' The below is a label that we have put in. It's sole purpose is to be gone to when the loop needs to end early.
NextIteration:
    ' Add the lookup result to the array.
    Result(i, 0) = LookupResult
Next i

' XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
' XXXXXXXX Writing back in portion. XXXXXXX
' XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

' Write the results back to the original spreadsheet.
ThisWorkbook.Sheets(MySheetName).Range("D2:D" & MyLastRow).Value = Result

Application.ScreenUpdating = True 'Turns this back on at the end.
End Sub ' Demarcates the end of the subroutine.



Once you get more familiar with VBA, you'll see that there are loads of ways to make this more flexible. For example, seeing how many rows there actually are on each of the data sheets, rather than just looping through all 1048576 rows that could possibly exist.

However, this does run in a couple of seconds.

2 Comments

You're right of course and I've deleted my earlier comment - my poor performance was due to an "optimization" I'd added where instead of a separate loop for each array I'd used allLookups = Array(LookupFromID, LookupFromTA, LookupFromISRC) then For Each arr in allLookups and then looped over arr. Seems that's not a wise approach. 😳

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.