
I have an array of ~5,000 unique IDs loaded from a CSV file:

Dim wb As Workbook
Dim idRng As Variant

Set wb = Workbooks.Open(Filename:=ThisWorkbook.path & "\DataSource\ID.csv")
    
With wb.Sheets(1)
    idRng = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2
End With
    
wb.Close

Alongside this, I also load ~100,000 rows of data containing non-unique IDs, with many possible duplicates. My aim is to loop through the 100,000 rows, check whether each row's ID is contained in the smaller array, and if so, add that row's data to a collection. Both IDs are stored as Longs. I have done this using the code below:

Dim dataRng As Variant
Set wb = Workbooks.Open(Filename:=ThisWorkbook.path & "\DataSource\data.csv")
    
With wb.Sheets(1)
    dataRng = .Range("A2:H" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2
        
    For i = LBound(dataRng) To UBound(dataRng)
        If mUtil.IsInArray(dataRng(i, 1), idRng) Then
            'Add object to collection
        End If
    Next
End With

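'mUtil
'Linear search: returns True if v appears in the first column of the 2-D array arr.
Public Function IsInArray(v As Variant, arr As Variant) As Boolean
    Dim i As Long 'declared explicitly so the code compiles under Option Explicit

    For i = LBound(arr) To UBound(arr)
        If arr(i, 1) = v Then
            IsInArray = True
            Exit Function
        End If
    Next
    
    IsInArray = False
End Function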

This works, but as you can imagine, iterating through the 5,000 unique IDs up to 100,000 times takes a fair amount of time. On top of that, the larger file can end up being much bigger.

Is there a more efficient way of performing this task, with the ultimate aim of reducing the run time?

  • I had given this a thought - I've just implemented this now and it has made an impact, albeit a very minor one Commented Feb 12, 2022 at 3:00
  • What does Add object to collection mean? What exactly are you adding? Commented Feb 12, 2022 at 3:23
  • Adding the contents of the row (in this case the array where the ID within the row is found within the smaller array) to a collection. It's simply just creating a new class object to capture the data. e.g. If mUtil.IsInArray(dataRng(i, 1), idRng) Then dataColl.Add mFactory.CreateDataObject(dataRng(i, 1), dataRng(i, 2), dataRng(i, 3), dataRng(i, 4), dataRng(i, 5), dataRng(i, 6), dataRng(i, 7), dataRng(i, 8)) End If Commented Feb 12, 2022 at 3:29
  • If you use a 'binary search', sometimes known as a 'binary chop', 100k searches of a 5k array would be trivial. First though your search array must be sorted, though if only 5k a 'QuickSort' would also be trivial. Commented Feb 12, 2022 at 14:52
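
The binary search suggested in the last comment could look roughly like the sketch below. It is only an illustration under assumptions: BinaryContains is a hypothetical helper name, and sortedIds is assumed to be a 1-D array of Longs that has already been sorted in ascending order (VBA has no built-in sort, so that step has to be written separately, for example with a QuickSort as the comment says).

```vba
' Hypothetical helper, not part of the original code.
' Assumes sortedIds is a non-empty 1-D array of Longs sorted in ascending order.
Public Function BinaryContains(sortedIds() As Long, ByVal target As Long) As Boolean
    Dim lowIdx As Long, highIdx As Long, midIdx As Long

    lowIdx = LBound(sortedIds)
    highIdx = UBound(sortedIds)

    Do While lowIdx <= highIdx
        ' Midpoint computed this way cannot overflow a Long.
        midIdx = lowIdx + (highIdx - lowIdx) \ 2

        If sortedIds(midIdx) = target Then
            BinaryContains = True
            Exit Function
        ElseIf sortedIds(midIdx) < target Then
            lowIdx = midIdx + 1   ' target can only be in the upper half
        Else
            highIdx = midIdx - 1  ' target can only be in the lower half
        End If
    Loop

    ' Falls through with the default return value of False.
End Function
```

Each lookup halves the remaining range, so a 5,000-element array needs at most about 13 comparisons per row instead of up to 5,000.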

2 Answers


I'd suggest loading your 5,000 IDs into a dictionary and then using the Exists method to check whether each ID from the larger data set is present.

Public Sub DictionaryTest()
    Dim lngKey As Long, objDict As Object
    
    Set objDict = CreateObject("Scripting.Dictionary")
    
    lngKey = 123456
    
    objDict.Add lngKey, 0
    
    Debug.Print objDict.Exists(lngKey)
End Sub

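This saves you from looping over the 5,000 IDs for every row. A dictionary lookup is a hash lookup, so its cost typically does not grow with the number of keys, and it should be substantially faster than a linear scan. The exact speed-up will depend on your data.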
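
Applied to the arrays from the question, the idea might look like the sketch below. It is a rough illustration, not the asker's actual code: FilterRowsById is a hypothetical name, idRng and dataRng are assumed to be the 2-D Variant arrays read with .Value2, and every ID is assumed to be a whole number that fits in a Long. Because .Value2 returns numbers as Doubles, both sides are converted with CLng so the keys and the lookups use the same type.

```vba
' Hypothetical helper; assumes idRng and dataRng are 2-D Variant arrays
' (as returned by Range.Value2) whose first column holds numeric IDs.
Public Function FilterRowsById(idRng As Variant, dataRng As Variant) As Collection
    Dim idSet As Object
    Dim matches As New Collection
    Dim i As Long

    Set idSet = CreateObject("Scripting.Dictionary")

    ' Build the lookup once. Only the keys matter, so the item is a dummy 0.
    ' Assigning through the default Item property also tolerates repeated IDs.
    For i = LBound(idRng, 1) To UBound(idRng, 1)
        idSet(CLng(idRng(i, 1))) = 0
    Next i

    ' Single pass over the large array with one hash lookup per row.
    For i = LBound(dataRng, 1) To UBound(dataRng, 1)
        If idSet.Exists(CLng(dataRng(i, 1))) Then
            ' Placeholder: stores the row index. Replace this with whatever
            ' object you build from the row in your own code.
            matches.Add i
        End If
    Next i

    Set FilterRowsById = matches
End Function
```

The returned collection holds the indices of the matching rows in dataRng, so the caller can read the other seven columns from the array without touching the worksheet again.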


2 Comments

There is definitely an improvement: this is ~10 seconds quicker than the Match method in the other answer.
Curious, how long did each approach take?

You can try something as simple as the following. Instead of writing two nested loops, loop over just one of the arrays and use Match to check whether each item is found in the other array. I tested this with random numbers.

Looping over the unique values and matching each one against the larger array would only give you the first match for each ID. If you want all the matches, do it the other way round: loop over the 100k non-unique array and match each value against the unique one.

We declare MatchArr as a Variant and use it to hold the result of Application.Match. If the function finds a match, it returns the position at which it found the value. If it doesn't find a match, it returns an error value rather than raising a run-time error, and because MatchArr is a Variant it can hold that error value without stopping the code. We then check the result with IsError and, if it is an error, simply move on to the next row.

This is what I tried (Change as needed):

EDIT: I've updated the code to loop over the bigger array, which is the one that needs to be filtered.

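Sub FindValues()

    Dim Arr1, Arr2, MatchArr, i As Long, Col As New Collection

    Arr1 = Sheet1.Range("A1:A50").Value     'smaller array of unique values
    Arr2 = Sheet1.Range("C1:C1000").Value   'larger array to be filtered

    For i = LBound(Arr2, 1) To UBound(Arr2, 1)
        'Application.Match returns an error value (not a run-time error) when nothing is found
        MatchArr = Application.Match(Arr2(i, 1), Arr1, 0)
        If Not IsError(MatchArr) Then
            Col.Add Arr2(i, 1)
        End If
    Next i

    'Write the matched values out to column E
    For i = 1 To Col.Count
        Sheet1.Range("E" & i).Value = Col(i)
    Next i

End Sub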

2 Comments

I need to loop over the larger of the two arrays, as all matches are required (each row is added to a collection if its ID exists in the smaller array). Unfortunately, this method takes longer.
If this method is slower than you looping both arrays then something is wrong with what you're doing. It should be much faster. I updated it to loop the bigger array.
