3
\$\begingroup\$

After spending a considerable amount of time poking at the internals of the Collection class, I have discovered 2 methods to iterate a private internal array within a VBA class using a For Each loop on the class instance.

Both methods:

  • work without the need for assembly code or external DLL references
  • make use of the already available IEnumVariant returned by a Collection via it's _NewEnum method
  • are safe (no Application crashes)

I must note that I was already aware of the MEnumerator approach. This approach can lead to crashes, epecially within the 3 custom methods for IUnknown. There is absolutely no need to redo what IUnknown does - on the contrary this approach leads to crashes when state is lost and I found it unstable when updated for x64.

Prerequisites

Since we need to read/write some memory, I will use LibMemory. However, all calls to LibMemory can be easily replaced with calls to RtlMoveMemory (Win) / memmove (Mac).

The internal structure of a Collection's IEnumVariant

After some testing, this is how the structure looks like:

Const LONG_SIZE As Long = 4

Private Type CollEnumVariant
    vTblPtr As LongPtr
    refCount As Long
    nextItem As LongPtr
    unkPtr As LongPtr
    collPtr As LongPtr
End Type

Private Enum CollEnumVariantOffsets
    vTblPtrOffset = 0
    refCountOffset = PTR_SIZE
    nextItemOffset = refCountOffset + PTR_SIZE
    unkPtrOffset = nextItemOffset + PTR_SIZE
    collPtrOffset = unkPtrOffset + PTR_SIZE
End Enum

Let's test. Create a EnumHelper class with the following code. Note that you either need Rubberduck to syncronize the @Enumerator annotation or you have to copy paste the code into a text editor (e.g. Notepad) and then import the saved text file.

VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "EnumHelper"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit

Private m_enum As IEnumVARIANT

Public Property Set EnumVariant(newEnum_ As IEnumVARIANT)
    Set m_enum = newEnum_
End Property
'@Enumerator
Public Property Get EnumVariant() As IEnumVARIANT
Attribute EnumVariant.VB_UserMemId = -4
    Set EnumVariant = m_enum
End Property

EnumHelper can also be found here.

Then, run TestCollEnumVariantStructure method from a standard module:

Option Explicit

Private Enum CollEnumVariantOffsets
    vTblPtrOffset = 0
    refCountOffset = PTR_SIZE
    nextItemOffset = refCountOffset + PTR_SIZE
    unkPtrOffset = nextItemOffset + PTR_SIZE
    collPtrOffset = unkPtrOffset + PTR_SIZE
End Enum

Sub TestCollEnumVariantStructure()
    Dim c As New Collection
    '
    c.Add 111
    c.Add 222
    '
    Dim i As stdole.IEnumVARIANT: Set i = c.[_NewEnum]
    Dim e As New EnumHelper:      Set e.EnumVariant = i
    Dim iPtr As LongPtr:          iPtr = ObjPtr(i)
    Dim v As Variant
    Dim addr As LongPtr
    Dim j As Long
    '
    Debug.Print "Virtual table: " & MemLongPtr(iPtr + vTblPtrOffset)
    Debug.Print "Reference count: " & MemLong(iPtr + refCountOffset)
    Debug.Print "Next item ptr: " & MemLongPtr(iPtr + nextItemOffset)
    Debug.Print "Unknown ptr: " & MemLongPtr(iPtr + unkPtrOffset)
    Debug.Print "Collection ptr: " & MemLongPtr(iPtr + collPtrOffset) & " (same as: " & ObjPtr(c) & ")"
    Debug.Print
    '
    j = 0
    For Each v In e
        j = j + 1
        Debug.Print "Item value for index " & j & ": " & v
        If j = 1 Then
            addr = MemLongPtr(iPtr + nextItemOffset)
            Debug.Print "Address of second item is: " & addr
            MemLong(addr + 8) = 555 '+8 as offset within the Variant
        End If
    Next v
    Debug.Print
    '
    Debug.Print "Next item ptr (after loop): " & MemLongPtr(iPtr + nextItemOffset)
    i.Reset
    Debug.Print "Next item ptr (after Reset): " & MemLongPtr(iPtr + nextItemOffset)
    Set c = Nothing
    Debug.Print "Next item ptr (after Collection destroyed): " & MemLongPtr(iPtr + nextItemOffset)
    Debug.Print "Collection ptr (after Collection destroyed): " & MemLongPtr(iPtr + collPtrOffset)
End Sub

I get the following result in the Immediate window:
structureTest

As you can see we can even change the items inside the collection as there is a pointer to the next item in the enumeration.

The more interesting part is that once we destroy the collection, it does the cleanup for us i.e. it sets to zero both the pointer for the next item as well as the pointer for the collection. This is the very safe mechanism that Collections have in place to avoid crashes when you destroy the collection while still inside the For Each loop.

I've tested the offsets on both x32 and x64 VBA7 as well as VBA6. Would be grateful if someone can test this on VB6 which I don't have.


#1 Hijack calls to IEnumVARIANT::Next

We cannot swap the function pointer for the IEnumVARIANT::Next method with our own for 2 reasons:

  • memory is write-protected in the original virtual table
  • it would affect all instances of the class which we surely do not want

However, by creating our own virtual table we solve both problems. We will continue using the original functions except the one we actually need replacing. In other words, all the memory management is left to the original implementation while we make use of a custom IEnumVARIANT::Next function.

We will also make use of the 'nextItem' bytes so that we call call back into our desired class.

There are really 2 main ways to implement this approach:
a) Use a direct approach i.e. the regular Public Function NewEnum() As IEnumVariant with the Attribute NewEnum.VB_UserMemId = -4 in each class that we want to iterate
b) Use an indirect approach, like the EnumHelper class we used in TestCollEnumVariantStructure which requires Public Function NewEnum() As EnumHelper in any class that we want to iterate

#1a Direct approach - Public Function NewEnum() As IEnumVariant

The most widespread solution on the internet, to enumerate a private collection within a class, is to just add a method like this:

'@Enumerator
Public Function NewEnum() As IEnumVARIANT
Attribute NewEnum.VB_UserMemId = -4
    Set NewEnum = internalCollection.[_NewEnum]
End Function 

For our exercise - since we want to iterate an array (not a Collection), this implies that we have to call into our array class using late-binding or an interface. I will not implement late-binding because it's slower and it provides no guarantee that the class has the desired method. For example, the MEnumerator (line 152) approach, that I mentioned at the start, expects a default property that can receive an index and return a value. I will instead use an interface to make things explicit.

Create the following IEnumerable_1a interface:

Option Explicit

Public Function ItemByIndex(ByVal iIndex As Long _
                          , ByRef outVar As Variant) As Boolean
    Err.Raise 5, TypeName(Me), "This class should be used as an interface only"
End Function

Create a standard module called LibEnumerable_1a:

Option Explicit

Private vtbl(0 To 6) As LongPtr

Private Type CollEnumVariant
    vTblPtr As LongPtr
    refCount As Long
    enumInterface As IEnumerable_1a
    unkPtr As LongPtr
    coll As Collection
End Type

Private Enum CollEnumVariantOffsets
    vTblPtrOffset = 0
    refCountOffset = PTR_SIZE
    enumInterfaceOffset = refCountOffset + PTR_SIZE
    unkPtrOffset = enumInterfaceOffset + PTR_SIZE
    collOffset = unkPtrOffset + PTR_SIZE
End Enum

Private m_ptrToIndex As New Collection
Private m_index() As Long

'*******************************************************************************
'We 'inject' the instance of a class implementing 'IEnumerable_1a' into the
'   'nextItem' member' (3rd member after vTbl and refCount) of the IEnumVariant
'   instance returned by the relevant collection
'*******************************************************************************
Public Function NewEnumerable(ByRef c As Collection _
                            , ByRef i As IEnumerable_1a) As IEnumVARIANT
    Set NewEnumerable = c.[_NewEnum]
    Dim ptr As LongPtr: ptr = ObjPtr(NewEnumerable)
    If vtbl(0) = 0 Then 'We swap the 'Next' and 'Reset' method with our own
        'We keep the 3 IUnknown methods
        'We replace Next
        'Skip cannot be called from VBA anyway due to the unsupported type so can be left as is
        'We replace Reset just in case, to avoid crashes when coercing to 'IEnumerable_1a'
        'Clone can be left as is
        '
        MemCopy VarPtr(vtbl(0)), MemLongPtr(ptr), PTR_SIZE * 7 'Copy entire original virtual function table
        vtbl(3) = VBA.Int(AddressOf IEnumVARIANT_Next)         'Replace 'Next'
        vtbl(5) = VBA.Int(AddressOf IEnumVARIANT_Reset)        'Replace 'Reset'
    End If
    '
    'Swap original vTable with our own (5 out of 7 are still original functions)
    MemLongPtr(ptr) = VarPtr(vtbl(0))
    '
    'We overwrite the 'nextItem' value (3rd member after vTbl and refCount) with
    '   the pointer to the class instance implementing 'IEnumerable_1a'
    MemLongPtr(ptr + enumInterfaceOffset) = ObjPtr(i)
    '
    'We store the instance pointer as a collection key so we can quickly point
    '   to an array postition where we store the index of the next item to retrieve
    'This is not the fastest approach but rather a quick demo
    Dim s As String: s = CStr(ptr)
    Dim p As Long
    On Error Resume Next
    p = UBound(m_index) + 1
    ReDim Preserve m_index(0 To p)
    m_ptrToIndex.Remove s
    m_ptrToIndex.Add p, s
    On Error GoTo 0
End Function

'*******************************************************************************
'When IEnumVariant.Next is called, we coerce the first argument, which is the
'   pointer to the IEnumVariant class instance, into our custom type so we can
'   take advantage of having access to all values at the instance address
'   without the need to copy memory ourselfs i.e. VB does the copying for us
'*******************************************************************************
Private Function IEnumVARIANT_Next(ByRef thisEnum As CollEnumVariant _
                                 , ByVal celt As Long _
                                 , ByRef rgVar As Variant _
                                 , ByVal pceltFetched As LongPtr) As Long
    If thisEnum.enumInterface Is Nothing Then GoTo CloseLoop
    '
    Dim mapIndex As Long
    '
    'We map the 'IEnumVARIANT' pointer to the index in the 'm_index' array
    '   where we store the nextItem positin
    mapIndex = m_ptrToIndex(CStr(VarPtr(thisEnum)))
    '
    Dim nextIndex As Long: nextIndex = m_index(mapIndex)
    '
    'We retrive the next item and write it to 'rgVar' By Ref
    If Not thisEnum.enumInterface.ItemByIndex(nextIndex, rgVar) Then GoTo CloseLoop
    '
    'We increment the index for the next item to be read
    m_index(mapIndex) = nextIndex + 1
Exit Function
CloseLoop:
    Const S_FALSE = 1
    IEnumVARIANT_Next = S_FALSE
End Function

'*******************************************************************************
'Should not be called. This is just to prevent the user calling the 'Reset'
'   method in the IEenumVariant on purpose, which would overwrite the 'nextItem'
'   member i.e. 'enumInterface' in our representation
'*******************************************************************************
Private Function IEnumVARIANT_Reset(ByVal ptr As Long) As Long
    Const E_NOTIMPL = &H80004001
    IEnumVARIANT_Reset = E_NOTIMPL
End Function

Create the following DemoClass_1a class:

VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "DemoClass_1a"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit

Implements IEnumerable_1a

Private m_coll As New Collection
Private m_items() As Variant
Private m_nextIndex As Long
Private m_bufferUBound As Long

Public Sub Add(ByRef Item As Variant)
    If m_nextIndex >= m_bufferUBound Then
        m_bufferUBound = m_bufferUBound * 2 + 1
        ReDim Preserve m_items(0 To m_bufferUBound)
    End If
    If IsObject(Item) Then
        Set m_items(m_nextIndex) = Item
    Else
        m_items(m_nextIndex) = Item
    End If
    m_nextIndex = m_nextIndex + 1
End Sub

Public Property Get Count() As Long
    Count = m_nextIndex
End Property

'@Enumerator
Public Function NewEnum() As IEnumVARIANT
Attribute NewEnum.VB_UserMemId = -4
    If m_bufferUBound > m_nextIndex - 1 Then
        m_bufferUBound = m_nextIndex - 1
        ReDim Preserve m_items(0 To m_bufferUBound)
    End If
    '
    'We 'inject' the instance of this class into the 'nextItem' member
    '   of the IEnumVariant instance returned by the internal collection
    'This is to enable the IEnumVariant to call back the 'ItemByIndex'
    '   method via the 'IEnumerable_1a' interface
    Set NewEnum = NewEnumerable(m_coll, Me)
End Function

Private Function IEnumerable_1a_ItemByIndex(ByVal iIndex As Long, ByRef outVar As Variant) As Boolean
    If m_nextIndex = 0 Then Exit Function
    If iIndex < LBound(m_items) Or iIndex > UBound(m_items) Then Exit Function
    If IsObject(m_items(iIndex)) Then
        Set outVar = m_items(iIndex)
    Else
        outVar = m_items(iIndex)
    End If
    IEnumerable_1a_ItemByIndex = True
End Function

Finally, run TestArrayEnumerator_1a in a standard module:

Option Explicit

Sub TestArrayEnumerator_1a()
    Dim d As New DemoClass_1a
    Dim i As Long
    Dim v As Variant
    '
    For i = 1 To 3
        d.Add i
    Next i
    '
    For Each v In d
        Debug.Print v
    Next v
End Sub

This will print 1, 2, 3 to the Immediate window.

It works, but there are a few serious disadvantages with this approch:

  • the class is exposed to this x64 bug . For me this was a major issue over the years, until I completely switched to the indirect approach (e.g. EnumHelper).
  • the 'nextIndex' for each instance of IEnumVariant has to be stored somewhere. In the above example I used the m_ptrToIndex collection and m_index array inside the LibEnumerable_1a. The alternative would be to store and manage this inside the demo class itself which does not scale well. Moreover, we cannot use the existing memory space under CollEnumVariant because if we overwrite any of the vTblPtr or unkPtr or coll pointers we can have crashes if state is lost or the user has Exit For within the For Each loop. In other words, extra management just to store a Long value
  • overall this is slow. Try this:
Sub TestEnumeratorSpeed_1a()
    Dim d As New DemoClass_1a
    Dim i As Long
    Dim v As Variant
    Const size As Long = 1000000
    Dim arr() As Variant: ReDim arr(1 To size)
    Dim t As Double
    '
    For i = 1 To size
        arr(i) = i
        d.Add i
    Next i
    '
    Debug.Print "Running 'For Each' on " & Format$(size, "#,##0") & " elements"
    Debug.Print String$(40, "-")
    '
    t = Timer
    For Each v In arr
    Next v
    Debug.Print "Array: " & Round(Timer - t, 3) & " (seconds)"
    '
    t = Timer
    For Each v In d
    Next v
    Debug.Print "Class: " & Round(Timer - t, 3) & " (seconds)"
End Sub

I get something like:

Running 'For Each' on 1,000,000 elements
----------------------------------------
Array: 0.02 (seconds)
Class: 1.031 (seconds)

which is a x50 factor. Surely, it is faster to just make a copy of the internal array and iterate that copy.

#1b Indirect approach - Public Function NewEnum() As EnumHelper

No need for late binding or an interface. Create a standard module called LibEnumerable_1b:

Option Explicit

Private vtbl(0 To 6) As LongPtr

Private Type CollEnumVariant
    vTblPtr As LongPtr
    refCount As Long
    enumH As EnumHelper_1b
    unkPtr As LongPtr
    coll As Collection
End Type

Private Enum CollEnumVariantOffsets
    vTblPtrOffset = 0
    refCountOffset = PTR_SIZE
    enumHelperOffset = refCountOffset + PTR_SIZE
    unkPtrOffset = enumHelperOffset + PTR_SIZE
    collOffset = unkPtrOffset + PTR_SIZE
End Enum

'*******************************************************************************
'We add the instance of the 'IEnumVariant' returned by the collection as
'   well as a reference to the desired array, both to the 'EnumHelper_1b' instance
'*******************************************************************************
Public Function NewEnumHelper(ByRef c As Collection _
                            , ByRef arr As Variant) As EnumHelper_1b
    With New EnumHelper_1b
        .Init c.[_NewEnum], arr
        Set NewEnumHelper = .Self
        Dim ptr As LongPtr: ptr = ObjPtr(.EnumVariant)
    End With
    If vtbl(0) = 0 Then 'We swap the 'Next' and 'Reset' method with our own
        'We keep the 3 IUnknown methods
        'We replace Next
        'Skip cannot be called from VBA anyway due to the unsupported type so can be left as is
        'We replace Reset just in case, to avoid crashes when coercing to 'EnumHelper_1b'
        'Clone can be left as is
        '
        MemCopy VarPtr(vtbl(0)), MemLongPtr(ptr), PTR_SIZE * 7 'Copy entire original virtual function table
        vtbl(3) = VBA.Int(AddressOf IEnumVARIANT_Next)         'Replace 'Next'
        vtbl(5) = VBA.Int(AddressOf IEnumVARIANT_Reset)        'Replace 'Reset'
    End If
    '
    'Swap original vTable with our own (5 out of 7 are still original functions)
    MemLongPtr(ptr) = VarPtr(vtbl(0))
    '
    'We overwrite the 'nextItem' value (3rd member after vTbl and refCount) with
    '   the pointer to the instance of 'EnumHelper_1b'
    MemLongPtr(ptr + enumHelperOffset) = ObjPtr(NewEnumHelper)
End Function

'*******************************************************************************
'When IEnumVariant.Next is called, we coerce the first argument, which is the
'   pointer to the IEnumVariant class instance, into our custom type so we can
'   take advantage of having access to all values at the instance address
'   without the need to copy memory ourselfs i.e. VB does the copying for us
'*******************************************************************************
Private Function IEnumVARIANT_Next(ByRef thisEnum As CollEnumVariant _
                                 , ByVal celt As Long _
                                 , ByRef rgVar As Variant _
                                 , ByVal pceltFetched As LongPtr) As Long
    Const S_FALSE = 1
    If thisEnum.enumH Is Nothing Then
        IEnumVARIANT_Next = S_FALSE
    ElseIf Not thisEnum.enumH.GetNext(rgVar) Then
        IEnumVARIANT_Next = S_FALSE
    End If
End Function

'*******************************************************************************
'Should not be called. This is just to prevent the user calling the 'Reset'
'   method in the IEenumVariant on purpose, which would overwrite the 'nextItem'
'   member i.e. 'enumInterface' in our representation
'*******************************************************************************
Private Function IEnumVARIANT_Reset(ByVal ptr As Long) As Long
    Const E_NOTIMPL = &H80004001
    IEnumVARIANT_Reset = E_NOTIMPL
End Function

Create the updated EnumHelper_1b class:

VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "EnumHelper_1b"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit

Private m_enum As IEnumVARIANT
Private m_arrByRef As Variant
Private m_currIndex As Long

'@Enumerator
Public Property Get EnumVariant() As IEnumVARIANT
Attribute EnumVariant.VB_UserMemId = -4
    Set EnumVariant = m_enum
End Property

Public Property Get Self() As EnumHelper_1b
    Set Self = Me
End Property

Public Sub Init(ByRef newEnum_ As IEnumVARIANT _
              , ByRef arr As Variant)
    Set m_enum = newEnum_
    m_arrByRef = GetArrayByRef(arr)
    m_currIndex = LBound(arr)
    '
    'We overwrite the 'nextItem' value (3rd member after vTbl and refCount)
    '   with a pointer to this instance so that 'enumH' member in
    '   'CollEnumVariant' representation is coerced to this class
    MemLongPtr(ObjPtr(newEnum_) + PTR_SIZE * 2) = ObjPtr(Me)
End Sub

Public Function GetNext(ByRef v As Variant) As Boolean
    If m_currIndex < LBound(m_arrByRef) Or _
       m_currIndex > UBound(m_arrByRef) Then Exit Function
    If IsObject(m_arrByRef(m_currIndex)) Then
        Set v = m_arrByRef(m_currIndex)
    Else
        v = m_arrByRef(m_currIndex)
    End If
    m_currIndex = m_currIndex + 1
    GetNext = True
End Function

Create DemoClass_1b class:

Option Explicit

Private m_coll As New Collection
Private m_items() As Variant
Private m_nextIndex As Long
Private m_bufferUBound As Long

Public Sub Add(ByRef Item As Variant)
    If m_nextIndex >= m_bufferUBound Then
        m_bufferUBound = m_bufferUBound * 2 + 1
        ReDim Preserve m_items(0 To m_bufferUBound)
    End If
    If IsObject(Item) Then
        Set m_items(m_nextIndex) = Item
    Else
        m_items(m_nextIndex) = Item
    End If
    m_nextIndex = m_nextIndex + 1
End Sub

Public Property Get Count() As Long
    Count = m_nextIndex
End Property

Public Function NewEnum() As EnumHelper_1b
    If m_bufferUBound > m_nextIndex - 1 Then
        m_bufferUBound = m_nextIndex - 1
        ReDim Preserve m_items(0 To m_bufferUBound)
    End If
    '
    'We push the items array (By Ref) into the 'EnumHelper_1b' instance
    '   which will also store the 'IEnumVariant'
    'Basically all calls coming from 'For Each...' will go into
    '   the 'EnumHelper_1b' instance and not this one (DemoClass_1b)
    Set NewEnum = NewEnumHelper(m_coll, m_items)
End Function

Finally, run the TestArrayEnumerator from a standard module:

Option Explicit

Sub TestArrayEnumerator_1b()
    Dim d As New DemoClass_1b
    Dim i As Long
    Dim v As Variant
    '
    For i = 1 To 3
        d.Add i
    Next i
    '
    For Each v In d.NewEnum
        Debug.Print v
    Next v
End Sub

This will print 1, 2, 3 to the Immediate window.

This approach is better than #1a for the following reasons:

  • the class is bypassing this x64 bug
  • no class needs the VB_UserMemId = -4 attribute. In fact a class can now expose multiple EnumHelper functions each pointing to a different array
  • the 'currentIndex' for each instance of IEnumVariant is stored within it's corresponding 'parent' EnumHelper class under the m_currIndex variable. This scales well
  • the array is stored by reference in a Variant which points to the actual array variable. The address does not change for the m_items variable itself even if the array is resized or destroyed. m_arrByRef will always point to the current array whatever that is. We do NOT have to call into the demo class be that with late-binding or via interface. Of course, we would not need to store the array by reference if we were to implement an interface or use late-binding but that is much slower
  • this is faster than #1a: Try this:
Sub TestEnumeratorSpeed_1b()
    Dim d As New DemoClass_1b
    Dim i As Long
    Dim v As Variant
    Const size As Long = 1000000
    Dim arr() As Variant: ReDim arr(1 To size)
    Dim t As Double
    '
    For i = 1 To size
        arr(i) = i
        d.Add i
    Next i
    '
    Debug.Print "Running 'For Each' on " & Format$(size, "#,##0") & " elements"
    Debug.Print String$(40, "-")
    '
    t = Timer
    For Each v In arr
    Next v
    Debug.Print "Array: " & Round(Timer - t, 3) & " (seconds)"
    '
    t = Timer
    For Each v In d.NewEnum
    Next v
    Debug.Print "Class: " & Round(Timer - t, 3) & " (seconds)"
End Sub

I get this:

Running 'For Each' on 1,000,000 elements
----------------------------------------
Array: 0.02 (seconds)
Class: 0.305 (seconds)

which is a x15 factor compared to x50 (#1a).


Both #1a and #1b are safe for the following reasons:

  1. If the DemoClass instance is destroyed, then so is the internal collection destroyed which in turn clears the pointers in the IEnumVariant structure thus stopping the enumeration (thanks to the If thisEnum.enumInterface Is Nothing Then / If thisEnum.enumH Is Nothing Then lines from our swap methods).
  2. There are no crashes when state is lost even if the 'Reset' button is pressed in the IDE while code is running regardless which method is the last in the stack
  3. Cleanup is performed by the original implementation that handles all the Collection instances. We don't have to manage memory storage (e.g. CoTaskMemAlloc/CoTaskMemFree) nor handle reference count or internal interfaces

Overall, I think #1b scales well. However, this is not as fast as I would like it to be and this is mainly because of the extra function calls to IEnumVARIANT_Next and then the follow up to get the actual value from the array. There is a faster way, see section #2.

#2 Hijack the nextItemPtr within the IEnumVARIANT storage

As well detailed on VB Forums or here, there is a specific offset between items in the underlying structures making up a collection (linked-list). If we offset our items carefully, we can make use of the original speed of iterating a Collection.

For the purpose of this example, I will skip using the EnumHelper class so the below example is vulnerable to the previously mentioned x64 bug. We can easily implement the most basic EnumHelper (like the one under the 'The internal structure' section).

Create a class called DemoClass_2:

VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "DemoClass_2"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Option Explicit

Private Type VariantInfo
    IsObject As Boolean
    DummyOffset1 As Integer
    DummyOffset2 As Long
    #If Win64 Then
        DummyOffset3 As LongLong
    #End If
End Type

'We are mimicking the structure of an 'VbCollectionItem' as detailed here:
'https://gist.github.com/wqweto/39822f4fb7090fa086aeff1e2e06e630
'or on VB Forums.
'We only need the item value which is the 1st member (Variant) and the
'   4th member which is the address of the next item structure.
'Since we don't need the pointer to the key (2nd member) or the pointer
'   to the previous item (3rd member), we replace with 'VariantInfo'
'   which has the same byte size as the 2 pointers but allows us to store
'   some useful metadata like 'IsObject' bool for the Variant member
Private Type EnumerableVariant
    Value As Variant
    Info As VariantInfo
    NextPtr As LongPtr
End Type

Private m_coll As New Collection
Private m_items() As EnumerableVariant
Private m_nextIndex As Long
Private m_bufferUBound As Long

Public Sub Add(ByRef Item As Variant)
    If m_nextIndex >= m_bufferUBound Then
        m_bufferUBound = m_bufferUBound * 2 + 1
        ReDim Preserve m_items(0 To m_bufferUBound)
    End If
    With m_items(m_nextIndex)
        .Info.IsObject = IsObject(Item) 'This can be used later to speed up item retrieval
        If .Info.IsObject Then Set .Value = Item Else .Value = Item
    End With
    m_nextIndex = m_nextIndex + 1
End Sub

Public Property Get Count() As Long
    Count = m_nextIndex
End Property

'@Enumerator
Public Function NewEnum() As IEnumVARIANT
Attribute NewEnum.VB_UserMemId = -4
    Set NewEnum = m_coll.[_NewEnum]
    Dim ptr As LongPtr: ptr = VarPtr(m_items(0))
    '
    'We overwrite the 'nextItem' pointer in the 'IEnumVARIANT' so that
    '   it points to our desired 'EnumerableVariant' member which will allow
    '   the collection to iterate the values in the array
    MemLongPtr(ObjPtr(NewEnum) + PTR_SIZE * 2) = ptr
    '
    'We link each 'EnumerableVariant' member to the next
    '   one via the 'NextPtr' member
    If m_nextIndex > 1 Then
        #If Win64 Then
            Const size As Long = 48
        #Else
            Const size As Long = 28
        #End If
        Dim i As Long
        For i = 0 To m_nextIndex - 2
            ptr = ptr + size
            m_items(i).NextPtr = ptr
        Next i
    End If
End Function

Finally, run TestArrayEnumerator_2 in a standard module:

Option Explicit

Sub TestArrayEnumerator_2()
    Dim d As New DemoClass_2
    Dim i As Long
    Dim v As Variant
    '
    For i = 1 To 3
        d.Add i
    Next i
    '
    For Each v In d
        Debug.Print v
    Next v
End Sub

This will print 1, 2, 3 to the Immediate window.

How does this perform? Run TestEnumeratorSpeed_2 from a standard module:

Sub TestEnumeratorSpeed_2()
    Dim d As New DemoClass_2
    Dim i As Long
    Dim v As Variant
    Const size As Long = 1000000
    Dim arr() As Variant: ReDim arr(1 To size)
    Dim t As Double
    '
    For i = 1 To size
        arr(i) = i
        d.Add i
    Next i
    '
    Debug.Print "Running 'For Each' on " & Format$(size, "#,##0") & " elements"
    Debug.Print String$(40, "-")
    '
    t = Timer
    For Each v In arr
    Next v
    Debug.Print "Array: " & Round(Timer - t, 3) & " (seconds)"
    '
    t = Timer
    For Each v In d
    Next v
    Debug.Print "Class: " & Round(Timer - t, 3) & " (seconds)"
End Sub

I get this:

Running 'For Each' on 1,000,000 elements
----------------------------------------
Array: 0.02 (seconds)
Class: 0.047 (seconds)

which is way faster that both approaches in #1.

However, this approach has 2 main drawbacks:

  1. The array needs to be locked somehow to prevent reallocation while the loop is running. I haven't implemented this for the purpose of this example, but it is definitely achievable even if it proves to be complex.
  2. The internal array is not a contiguous array anymore. If I want to have a function that exposes a copy of the array, then I will have to copy each item one by one. I must mention that I already tried a 'workaround' that makes the array copy and then increases the 'cbElements' member of the underlying SAFEARRAY structure (of the returned copy) to accommodate the extra members in the struct. However, this does not work well in 3 out of 4 cases (For Each loop on Variant(), For Each loop on Variant and For.. To.. loop on Variant()). It only works for a For.. To.. loop on Variant. I can provide more details if needed but wanted to say it's a dead-end. I also already tried to use a fake ParamArray that points to each element by Reference which then gets copied with the build-in functionality (for param arrays) into an array by Value - however, this is much slower than iterating elements one by one. Also, I've tried referencing elements via ByRef Variants but that does not work well because in the For Each loop the iterator Variant then needs to be set to a value (like Empty) before getting the next value - so, not a good solution either. Here is a quick test:
Option Explicit

Private Type TagVariant
    vt As Integer
    wReserved1 As Integer
    wReserved2 As Integer
    wReserved3 As Integer
    ptr1 As LongPtr
    ptr2 As LongPtr
End Type

Private Type nextItemAddr
    itemByRef As TagVariant
    dummy1 As LongPtr
    dummy2 As LongPtr
    nextPtr As LongPtr
End Type

#If Win64 Then
    Const sizeOfNext = 48
#Else
    Const sizeOfNext = 28
#End If

Sub TestByRef()
    Const size As Long = 10
    '
    Dim c As Collection:        Set c = New Collection
    Dim e As IEnumVARIANT:      Set e = c.[_NewEnum]
    Dim arr() As Variant:       ReDim arr(0 To size - 1)
    Dim arr2() As nextItemAddr: ReDim arr2(0 To size - 1)
    Dim h As New EnumHelper:    Set h.EnumVariant = e
    Dim i As Long
    Dim v As Variant
    Dim ptr1 As LongPtr
    Dim ptr2 As LongPtr
    Dim vt As Integer
    '
    ptr1 = VarPtr(arr(0))
    ptr2 = VarPtr(arr2(1))
    vt = vbVariant + VT_BYREF
    For i = 0 To UBound(arr2)
        arr(i) = i 'doesn't matter what values
        arr2(i).itemByRef.ptr1 = ptr1
        arr2(i).itemByRef.vt = vt 'could use the actual var type here but then the ptr needs updated with +8
        arr2(i).nextPtr = ptr2
        ptr1 = ptr1 + VARIANT_SIZE
        ptr2 = ptr2 + sizeOfNext
    Next i
    arr2(UBound(arr2)).nextPtr = 0
    '
    MemLongPtr(ObjPtr(e) + PTR_SIZE * 2) = VarPtr(arr2(0))
    '
    For Each v In h
        Debug.Print v
        v = Empty 'Required to avoid 'Type Mismatch' error that gets raised on the 'Next v' line
    Next v
End Sub

Of course, any feedback on this would be great.

I do have some questions though:
Q1) Is any of the presented approaches viable (in your opinion, of course)? Would you use any of them? Why (both for yes or no)?
Q2) For approach #2a, I would be very interested if there is a way to be efficient in copying the internal Private m_items() As EnumerableVariant array into and array of Variant() without iterating one by one.
Q3) I would be interested in any improvements to any of the presented approaches

Thank you for your time!

\$\endgroup\$
21
  • \$\begingroup\$ This is a very original idea I don't think I've seen it anywhere. I think your question is dying for some context I've got to know; why are you doing this, what particular use case do you have in mind. Is it a performance thing, or a quality of life change, or it paves the way to some trick you can only do with arrays? \$\endgroup\$ Commented Nov 18, 2023 at 10:40
  • \$\begingroup\$ (Otherwise this is a very good question with the demo code at the top showing how it works under the hood, where you could've just presented the final 3 implementations on their own but it wouldn't have been nearly as informative!) \$\endgroup\$ Commented Nov 18, 2023 at 10:43
  • \$\begingroup\$ Thanks @Greedo for the kind words. I have a Dictionary class (not yet posted anywhere). My goal is to have a drop-in replacement for the Scripting.Runtime dict, even for Windows. I know there is one here that works on Mac but performance is awful, and it has major bugs e.g. an int key '2' is considered equal to a text key '2__2' in text comparison. Arrays seem the most efficient internal storage for the dict. The problem I have is getting a For Each loop to work and be efficient while also returning the Items and Keys arrays. \$\endgroup\$ Commented Nov 19, 2023 at 20:20
  • 1
    \$\begingroup\$ @CristianBuse Great, I'll take a look. I do have some thoughts, particularly as I was looking at using assembly to solve the stability issues you identified implementing IEnumVariant manually - just need to gather them. Performance of your idea seems orders of magnitude better on first glance \$\endgroup\$ Commented Dec 14, 2023 at 1:02
  • 1
    \$\begingroup\$ @Greedo Nice. I just took a peek and was able to run the countdown generator example. When I wrote you, I was basically stuck with deciding what to do but in the meanwhile I went for the Collection's NewEnum hijack approach which works pretty great. This is the repo I've been working on for more than a year: VBA-FastDictionary - will send you a link. It's now fully working and properly tested on all platforms. I will make it public once I have time to write documentation and a CR post. \$\endgroup\$ Commented Apr 3, 2024 at 7:16

0

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.