I built a VBA Sub that matches an input string against the equivalent string within a large array and returns a specific string that is bound to the matched string.

However, while the code works well with around 100 entries (about 12 seconds), 1000 entries take about 1 minute and 1500 entries can take 3 minutes.

So, I was wondering if there is anything I could improve to make the code run faster with a large number of entries.

The VBA Sub:

Sub searchISIN()

    Dim StartTime As Double
    StartTime = Timer

    lRow = getlastrow(ws_universe, 1)
    Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
    Dim z As Long: z = 1
    Dim i As Long: i = 1
    Dim j As Long

    For Each cell In rngISIN
        z = z + 1
        For j = LBound(MatchingArr) To UBound(MatchingArr)
            If InStr(1, CStr(MatchingArr(j)), CStr(cell.Value), vbTextCompare) Then
                ws_universe.Cells(z, 2).Value = Left(MatchingArr(j), 18)
                i = i + 1
                GoTo NextIteration
            End If
        Next j

        ws_universe.Cells(z, 2).Value = "k.A."
        i = i + 1

NextIteration:
    Next cell

    MsgBox "Search ISINs: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")

End Sub

The array that gets parsed has around 150k entries, and each entry is a string that looks like the following:

"IID00XXXXXXXXXXXX|Magna International Inc.|US55922PF576;US559222AQ72;CA559222AT14;US559222AV67;US55922PRV75;US55922PF329;CA5592224011;XS1689185426;US55922PUW12;US559222AR55"

The code takes an input string, for example CA559222AT14, searches for it with the built-in InStr function, and returns the first 18 characters of the matching array entry. In this example the return value would be "IID00XXXXXXXXXXXX".

I'm open to any idea that improves the code's runtime. There are no constraints: rearranging the array layout, restructuring the complete code, or whatever else.

  • Your code is making 150k accesses to the spreadsheet, which is a very slow operation. The first step in speeding things up is to read the whole range into an array in VBA and then work on the array. If necessary, build up a second array in VBA and only paste that back into Excel when your code has completed. Good luck. Commented Mar 8, 2022 at 12:03
  • You might also find that adding Option Explicit at the start of each module is helpful. At the moment it's very difficult to work with your code because you have a number of undeclared variables. Commented Mar 8, 2022 at 12:14
  • Thanks for the tips so far. Yeah, sorry I didn't reveal the full code. Some variables are public and some others get passed. Commented Mar 8, 2022 at 12:22

3 Answers

I think what you're doing is a good use case for a dictionary (instead of an array).

Dim MatchingArr As New Dictionary

If you're looking for matches between the semi-colons, then you should split those out when you build the dictionary instead of trying to match after the fact. You don't show what the search strings are supposed to be, so I'm just guessing.

For example, if you're looking for the following matches:

US55922PF576, US559222AQ72, or CA559222AT14 each returning IID00XXXXXXXXXXXX, then they should be separate dictionary entries:

MatchingArr.Add "US55922PF576", "IID00XXXXXXXXXXXX"
MatchingArr.Add "US559222AQ72", "IID00XXXXXXXXXXXX"
MatchingArr.Add "CA559222AT14", "IID00XXXXXXXXXXXX"

I'm not sure exactly how you need to build the dictionary, but you can loop through your strings, split things out, and add the matches as needed.
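If the source strings keep their original layout, one way to build it is to split each entry apart. A minimal sketch, assuming `MatchingArr` holds the "ID|Name|isin;isin;…" strings from the question and that a reference to the Microsoft Scripting Runtime is set:

```vba
Sub BuildIsinDictionary()
    Dim dict As New Dictionary
    Dim j As Long, k As Long
    Dim parts() As String, isins() As String
    For j = LBound(MatchingArr) To UBound(MatchingArr)
        parts = Split(CStr(MatchingArr(j)), "|")   'parts(0) = ID, parts(2) = ISIN list
        isins = Split(parts(2), ";")
        For k = LBound(isins) To UBound(isins)
            'Each ISIN maps to the ID at the start of the entry
            If Not dict.Exists(isins(k)) Then dict.Add isins(k), parts(0)
        Next k
    Next j
End Sub
```

This turns every ISIN into its own key, so each later lookup is a single `Exists` call instead of an InStr scan over 150k strings.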

Then, dump your range to an array and use your dictionary. Create a "paste range" array and fill it out as you loop:

Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
Dim rngArr() As Variant: rngArr = rngISIN.Value
Dim pasteArr() As Variant: pasteArr = rngArr
Dim x As Long
For x = LBound(rngArr, 1) To UBound(rngArr, 1)
    If MatchingArr.Exists(rngArr(x, 1)) Then pasteArr(x, 1) = MatchingArr(rngArr(x, 1))
Next x

Finally, just set the range next to your original range equal to the paste-range array:

ws_universe.Range("B2:B" & lRow) = pasteArr

The following code runs through 500,000 rows in about 10 seconds (this includes building the dictionary).

If the dictionary is already built, it's faster.

The result of the first run with 1,048,576 rows was 00:00:56 using this code:

Dim myDict As New Dictionary

Sub TestBigRange()
    Dim StartTime As Double
    StartTime = Timer
    Dim rangeArr() As Variant
    rangeArr = Range("A1:A1048576")
    Dim x As Long
    'Build dictionary if needed
    If myDict.Count = 0 Then
        For x = LBound(rangeArr, 1) To UBound(rangeArr, 1)
            myDict.Add x, "A" & x
        Next x
    End If
    Dim pasteRng() As Variant
    pasteRng = rangeArr
    For x = LBound(rangeArr, 1) To UBound(rangeArr, 1)
        If myDict.Exists(rangeArr(x, 1)) Then pasteRng(x, 1) = myDict(rangeArr(x, 1))
    Next x
    Range("A1:A1048576") = rangeArr
    Range("B1:B1048576") = pasteRng
    MsgBox "Search Dictionary: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
End Sub

Without having to rebuild the dictionary, the runtime was about 4 seconds.

Building the dictionary with 1,048,576 entries took 24 seconds.

Reading/Writing the dictionary to/from a text file is pretty quick as well:

'If we write to separate lines, we don't have to split:
Sub WriteDictionary()
    Open ThisWorkbook.Path & "\DictionaryFile.txt" For Output As #1
    Dim x As Long
    For x = 1 To 1048576
        Print #1, x & ""
        Print #1, "A" & x
    Next x
    Close #1
End Sub

Sub ReadDictionary()
    Set myDict = New Dictionary
    Dim StartTime As Double
    StartTime = Timer
    Open ThisWorkbook.Path & "\DictionaryFile.txt" For Input As #1
    Dim key As String, val As String
    Do Until EOF(1)
        Line Input #1, key
        Line Input #1, val
        myDict.Add key, val
    Loop
    Close #1
    MsgBox "Read Dictionary from File [1048576 rows]: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
End Sub

EDIT: Per Tim Williams, keeping your dictionaries smaller results in even faster creation/lookup time.

Here we do the complete build dictionary/lookup on 1048576 rows in 31 seconds!

Dim aDicts() As Dictionary

Sub BuildDictionaries(ARange As Range, Optional MaxSize As Long = 100000)
    '100,000 is arbitrary, but seems to be a pretty good number.
    'Feel free to experiment: too small/big = slower.
    'Note: VBA doesn't allow arrays declared "As New", so each
    'dictionary is instantiated explicitly.
    ReDim aDicts(Int(ARange.Cells.Count / MaxSize))
    Dim x As Long, r() As Variant, curDict As Integer
    curDict = 0: r = ARange.Value
    Set aDicts(curDict) = New Dictionary
    For x = LBound(r, 1) To UBound(r, 1)
        If aDicts(curDict).Count >= MaxSize Then
            curDict = curDict + 1
            Set aDicts(curDict) = New Dictionary
        End If
        aDicts(curDict).Add x, "A" & x
    Next x
End Sub

The code to search through each dictionary:

For x = LBound(rangeArr, 1) To UBound(rangeArr, 1)
    For Each z In aDicts
        If z.Exists(rangeArr(x, 1)) Then
            pasteRng(x, 1) = z(rangeArr(x, 1))
            Exit For
        End If
    Next z
Next x

Here's the output from my last round of testing (reading in dictionary from text file):

Need 11 dictionaries [100,000 split]
Dictionary(0) size: 100000
Dictionary(1) size: 100000
Dictionary(2) size: 100000
Dictionary(3) size: 100000
Dictionary(4) size: 100000
Dictionary(5) size: 100000
Dictionary(6) size: 100000
Dictionary(7) size: 100000
Dictionary(8) size: 100000
Dictionary(9) size: 100000
Dictionary(10) size: 48576
Read Dictionary from File [1048576 rows]: 00:00:02
Search Dictionary [100000]: 00:00:02
Search Dictionary [200000]: 00:00:03
Search Dictionary [300000]: 00:00:04
Search Dictionary [400000]: 00:00:05
Search Dictionary [500000]: 00:00:07
Search Dictionary [600000]: 00:00:10
Search Dictionary [700000]: 00:00:13
Search Dictionary [800000]: 00:00:17
Search Dictionary [900000]: 00:00:22
Search Dictionary [1000000]: 00:00:27
Search Dictionary [Finished]: 00:00:31

8 Comments

Interesting approach; I tried something similar as well. Basically I already have the dictionary structure, like this: "US55922PF576", "IID00XXXXXXXXXXXX". However, it's a large txt file with more than 2 million entries. I'm afraid that it will take VBA a long time to read the txt file first and then convert the data into a dictionary. That's why I chose the layout shown above, the initial string layout.
Have you tried? It doesn't take very long to build a dictionary in my experience - even with tons of values (in my example edit, I test with 500,000 rows and it takes less than 5 seconds). You may be able to save it as well; it's an interesting thing I haven't thought about (I'd be interested in looking into it more). Searching through strings takes far longer in my experience.
@rin_o please see my edits
thanks for the input. I will try the approach and give you an update later!
Populating a Dictionary gets slower in a non-linear way as you add more entries: e.g. I can add 100k entries in ~0.8 sec, but adding 1M entries takes 64 sec, not 8 sec. If speed is important you can create multiple smaller dictionaries and (for example) put them in a Collection. That may end up faster overall than using a single dictionary for all your data.

Looping Through Arrays Instead of Ranges

  • Not tested. It will fail if lRow is less than 3.
Option Explicit


Sub searchISIN()
       
    Dim StartTime As Double: StartTime = Timer
           
    Dim lRow As Long: lRow = getlastrow(ws_universe, 1)
    Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
    
    Dim aData As Variant: aData = rngISIN.Value
    Dim bData As Variant: bData = rngISIN.EntireRow.Columns("B").Value
    
    Dim aOffset As Long: aOffset = 1 - LBound(MatchingArr)
    
    Dim aIndex As Variant
    Dim a As Long
    Dim i As Long: i = 1
   
    For a = 1 To UBound(aData, 1)
        aIndex = Application.Match("*" & CStr(aData(a, 1)) & "*", MatchingArr, 0)
        If IsNumeric(aIndex) Then
            bData(a, 1) = Left(MatchingArr(aIndex - aOffset), 18)
            i = i + 1
        Else
            bData(a, 1) = "k.A."
            i = i + 1
        End If
    Next a
    
    rngISIN.EntireRow.Columns("B").Value = bData
    
    MsgBox "Search ISINs: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")

End Sub


Sub searchISINFirst()
       
    Dim StartTime As Double: StartTime = Timer
           
    Dim lRow As Long: lRow = getlastrow(ws_universe, 1)
    Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
    
    Dim aData As Variant: aData = rngISIN.Value
    Dim bData As Variant: bData = rngISIN.EntireRow.Columns("B").Value
    
    Dim a As Long
    
    Dim i As Long: i = 1
    Dim j As Long
    Dim jFound As Boolean
    
    For a = 1 To UBound(aData, 1)
        For j = LBound(MatchingArr) To UBound(MatchingArr)
            If InStr(1, CStr(MatchingArr(j)), CStr(aData(a, 1)), vbTextCompare) Then
                bData(a, 1) = Left(MatchingArr(j), 18)
                i = i + 1
                jFound = True
                Exit For
            End If
        Next j
        If jFound Then
            jFound = False
        Else
            bData(a, 1) = "k.A."
            i = i + 1
        End If
    Next a
    
    rngISIN.EntireRow.Columns("B").Value = bData
    
    MsgBox "Search ISINs: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")

End Sub

3 Comments

Thanks for your input! After the comment from @freeflow, I tried the same as you have done above. For around 1500 entries the code ran approx. 2 minutes. Thus, the runtime can be reduced by approx. 1 minute, which is quite nice.
So, would you say that large operations like these should be solved by using arrays? Or in other words, should I avoid performing large operations directly in the spreadsheet?
I've added two solutions. The first should be the fastest. The second is like the previous, only additionally illustrates how to get rid of the label (GoTo). Let me know how it goes. When processing large data, immediately read what you need into arrays and process there. When done, write to the worksheet. Ideally, one read and one write. BTW, where does the funny string array come from? If it is from a worksheet, you should leave it there because Application.Match is much faster on a range than on an array (in regards to the first procedure).
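The read-once / process-in-memory / write-once pattern the last comment describes can be sketched as follows (the sheet reference and range sizes are illustrative, not from the question):

```vba
Sub ArrayRoundTrip()
    Dim arr As Variant, r As Long
    arr = ws_universe.Range("A2:A1001").Value      'one read from the sheet
    For r = LBound(arr, 1) To UBound(arr, 1)
        arr(r, 1) = UCase$(CStr(arr(r, 1)))        'process entirely in memory
    Next r
    ws_universe.Range("B2:B1001").Value = arr      'one write back
End Sub
```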
Not an answer, but just to illustrate a point about loading a Scripting Dictionary if you have a lot of data...

The chart below compares actual load times vs. number of entries with a linear extrapolation based on the time to load the first 500k entries. It's clear that once you get past a few hundred thousand entries, the load time gets very long.

[Chart: Scripting Dictionary load time vs. number of entries, actual vs. linear extrapolation]

As commented above, splitting your data over multiple dictionaries (stored in a Collection for example) may result in faster run times (depending on your exact use case).
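A rough sketch of that idea, storing several capped dictionaries in a Collection (the function name, parameter layout, and ChunkSize are assumptions for illustration):

```vba
Function BuildChunkedDictionaries(keys As Variant, vals As Variant) As Collection
    Const ChunkSize As Long = 100000   'arbitrary cap per dictionary
    Dim dicts As New Collection
    Dim d As Dictionary, x As Long
    Set d = New Dictionary
    For x = LBound(keys, 1) To UBound(keys, 1)
        If d.Count >= ChunkSize Then   'start a new dictionary when the cap is hit
            dicts.Add d
            Set d = New Dictionary
        End If
        d.Add keys(x, 1), vals(x, 1)
    Next x
    dicts.Add d                        'add the final (partial) dictionary
    Set BuildChunkedDictionaries = dicts
End Function
```

Lookups then loop over the Collection and stop at the first dictionary where `Exists` returns True, as in the accepted answer's multi-dictionary search loop.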

Also worth noting that the Dictionary can accept [pretty much] any datatype as keys, and some types load faster than others (eg. Long keys can be added about 2-3x faster than String keys)
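A minimal timing sketch for that key-type comparison (actual numbers will vary by machine; 500,000 is an arbitrary sample size):

```vba
Sub CompareKeyTypes()
    Dim dLong As New Dictionary, dStr As New Dictionary
    Dim x As Long, t As Double
    t = Timer
    For x = 1 To 500000
        dLong.Add x, Empty             'Long keys
    Next x
    Debug.Print "Long keys:   " & Format(Timer - t, "0.00") & " s"
    t = Timer
    For x = 1 To 500000
        dStr.Add CStr(x), Empty        'String keys
    Next x
    Debug.Print "String keys: " & Format(Timer - t, "0.00") & " s"
End Sub
```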

