1

The following macro does a great job of grouping files by folder, however, it is very slow when it is run on a directory with tens of thousands of files (like 'My Pictures'). Is there any way to speed it up?

Option Explicit
Sub cmdList()
Dim objShell    As Object
Dim objFolder   As Object
Dim sPath       As String
Dim fOut        As Variant
Dim r           As Integer
Dim listRng     As Range
Dim cell        As Range
Dim i           As Integer
Dim j           As Integer

Set objShell = CreateObject("Shell.Application")
Set objFolder = objShell.BrowseForFolder(0, "Select Folder", 0, 17)
If objFolder Is Nothing Then Exit Sub
Application.ScreenUpdating = False
sPath = objFolder.self.Path
Set objFolder = Nothing: Set objShell = Nothing

r = 6: Range(r & ":" & Rows.Count).Delete
Cells(r - 1, 1) = sPath

fOut = Split(CreateObject("WScript.Shell").exec("cmd /c dir """ & sPath & """ /a:-h-s /b /s").StdOut.ReadAll, vbNewLine)

Cells(r, 1).Resize(UBound(fOut), 1) = WorksheetFunction.Transpose(fOut)

Set listRng = Cells(r, 1).CurrentRegion
listRng.Sort Key1:=Cells(r, 1), Order1:=xlAscending, Header:=xlYes

For i = 1 To listRng.Count
    For j = i + 1 To listRng.Count
        If InStr(listRng.Cells(j), listRng.Cells(i)) Then
            With listRng.Cells(j)
                .Rows.Group
                .IndentLevel = .Rows.OutlineLevel - 1
            End With
        Else
            Exit For
        End If
    Next j
Next i
Application.ScreenUpdating = True
End Sub

The output I am looking to achieve is this:

Level 1....

Level 1

Level 2...

Level 2

Level 3...

Level 3

6
  • At which stage is the slowdown occurring? Commented May 31, 2019 at 12:32
  • It looks like you are having a nested loop, over the same range. Maybe you can rewrite the conditions to not have to do that twice. And/Or allocate the range to an array, and only touch the rows where needed, should probably help a bit. Commented May 31, 2019 at 12:37
  • The slow down is definitely in the nested loop, Ron. I am not sure how to implement DarXyde's suggestion, though. Commented May 31, 2019 at 12:49
  • Looping through 10K rows twice is 100M of iterations. What is the exact idea of the loop? How many are the grouped rows? If they are 2 and they are next to each other, this could be achieved with 1 loop only, thus a linear complexity is quite ok. Commented May 31, 2019 at 13:00
  • That's why I am here, Vityata. No idea how to implement....is there a 'beginner;s VBA' thread I should be using??? Commented May 31, 2019 at 13:07

1 Answer 1

2

EDIT: answer is updated to change the grouping to above and to correct a condition that incorrectly indented and grouped the rows.

So it was an interesting problem to solve. In addition to the actual solution, I have some other tips that I normally include in my code that I'll call out as well. My solution is VERY fast as well. When I parsed the C:\Program Files\ directory tree (18,017 files), it ran in under 5 seconds.

  1. Declare your variables as close as possible to the point where they are used for the first time. This makes it much easier to determine the variable type and definition, and also helps to functionally group the code.
  2. Those logical groups can then be functionally isolated into separate functions and subs. This will make the main logic of your code much easier to grasp in a single quick view, rather than requiring the reader (probably YOU in a few months) to re-read large logic sections and digest it in order to understand it.

In my example code, I start off with three quick functions that tell you exactly what's going on:

Dim rootFolder As String
rootFolder = SelectFolder

Dim pathArray As Variant
pathArray = GetAllFiles(rootFolder)

Dim folderGroups As Object
Set folderGroups = BuildFolderDictionary(pathArray)

The first function is straightforward and closely follows your approach to selecting the root folder:

Private Function SelectFolder() As String
    '--- returns the user-selected folder as a string
    Dim objShell As Object
    Dim objFolder As Object
    Set objShell = CreateObject("Shell.Application")
    Set objFolder = objShell.BrowseForFolder(0, "Select Folder", 0, 17)
    If Not objFolder Is Nothing Then
        SelectFolder = objFolder.self.path
    End If
End Function

The next function (GetAllFiles) also uses your approach but instead of putting the results directly into the worksheet, it keeps the results in a memory-based array (at the bottom of this answer, I include the whole module in a single code block for copy/paste later):

Private Function GetAllFiles(ByVal rootPath As String, _
                             Optional onlyFolders As Boolean = False) As Variant
    '--- returns a sorted array of all filepaths in the given directory path
    Dim dirOptions As String
    If onlyFolders Then
        dirOptions = """ /a:d-h-s /b /s"
    Else
        dirOptions = """ /a:-h-s /b /s"
    End If
    Dim fOut() As String
    fOut = Split(CreateObject("WScript.Shell").exec("cmd /c dir """ & _
                                                    rootPath & _
                                                    dirOptions).StdOut.ReadAll, _
                 vbNewLine)
    QuickSort fOut, LBound(fOut), UBound(fOut)

    '--- the pathArray skips the first position from the fOut array
    '    because it's always blank, but add the root folder as the first entry
    Dim pathArray As Variant
    ReDim pathArray(1 To UBound(fOut) + 1, 1 To 1)
    pathArray(1, 1) = rootPath
    Dim i As Long
    For i = 2 To UBound(fOut) + 1
        pathArray(i, 1) = fOut(i - 1)
    Next i
    GetAllFiles = pathArray
End Function

Memory-based arrays are MUCH, MUCH faster than working directly off the worksheet with Cells or Ranges.

The next function (BuildFolderDictionary) works using the array of paths and works to build a list (a Dictionary) of unique folders within the folder hierarchy. Along the way, it also creates a "span" of the rows that the subfolder encompasses. This will be very useful later. Remember, we're doing all this in memory so it's quick.

Private Function BuildFolderDictionary(ByRef paths As Variant) As Object
    Dim folders As Object
    Set folders = CreateObject("Scripting.Dictionary")

    '--- scan all paths and create a dictionary of each folder and subfolder
    '    noting which items (rows) map into each dictionary
    Dim i As Long
    For i = LBound(paths) To UBound(paths)
        Dim folder As String
        Dim pos1 As Long
        If Not IsEmpty(paths(i, 1)) Then
            pos1 = InStrRev(paths(i, 1), "\")   'find the last folder separator
            folder = Left$(paths(i, 1), pos1)
            If Not folders.Exists(folder) Then
                '--- new (sub)folder, create a new entry
                folders.Add folder, CStr(i) & ":" & CStr(i)
            Else
                '--- extisting (sub)folder, add to the row range
                Dim rows As String
                rows = folders(folder)
                rows = Left$(rows, InStr(1, rows, ":"))
                rows = rows & CStr(i)
                folders(folder) = rows
            End If
        End If
    Next i

    '--- final fixup: the root folder group should always encompass all
    '    the entries (runs from the second row to the end)...
    '    and we'll also determine the indent level using the first entry
    '    as the baseline (level 1).  stored as "rows,level" e.g. "2:7,1"
    Dim rootSlashes As Long
    rootSlashes = Len(root) - Len(Replace(root, "\", "")) - 1
    folders(root) = "2:" & UBound(paths) & ",1"

    Dim slashes As Long
    folder = folders.Keys
    For i = 1 To UBound(folder)
        slashes = Len(folder(i)) - Len(Replace(folder(i), "\", ""))
        folders(folder(i)) = folders(folder(i)) & "," & _
                                     CStr(slashes - rootSlashes)
    Next i

    For Each folder In folders
        Debug.Print folder & " - " & folders(folder)
    Next folder

    Set BuildFolderDictionary = folders
End Function

The final two parts are to copy the memory array (of file paths) to the worksheet...

    Const START_ROW As Long = 6
    Dim pathRange As Range
    Set pathRange = Sheet1.Range("A" & START_ROW).Resize(UBound(pathArray) + 1, 1)
    pathRange = pathArray

and then apply both the indentation and grouping of the rows. We're using the dictionary of folder groups we created that has all of the subfolder rows nicely defined for us already...

    Const MAX_GROUP_LEVEL As Long = 8
    Dim rowGroup As Variant
    Dim level As Long
    Dim folderData As Variant
    Dim theseRows As String
    For Each rowGroup In folderGroups
        folderData = Split(folderGroups(rowGroup), ",")
        theseRows = folderData(0)
        level = folderData(1)
        With pathRange.rows(theseRows)
            .IndentLevel = level
            If level < MAX_GROUP_LEVEL Then
                .Group
            End If
        End With
    Next rowGroup

(I ran into an issue during testing when the program errored with a group level deeper than 8. So I put a limit in the logic to prevent the error.)

So now, the whole module in a single block:

Option Explicit

Public Sub ShowFilePaths()
    Dim rootFolder As String
    rootFolder = SelectFolder
    If rootFolder = vbNullString Then Exit Sub

    '--- quick fixup if needed
    rootFolder = rootFolder & IIf(Right$(rootFolder, 1) = "\", vbNullString, "\")

    Dim pathArray As Variant
    pathArray = GetAllFiles(rootFolder)

    Dim folderGroups As Object
    Set folderGroups = BuildFolderDictionary(rootFolder, pathArray)

    '--- when debugging, this block just clears the worksheet to make it
    '    easier to rerun and test the code
    On Error Resume Next
    With Sheet1
        .UsedRange.ClearOutline
        .UsedRange.Clear
        .Outline.SummaryRow = xlAbove
    End With
    Err.Clear
    On Error GoTo 0

    '--- copy the array to the worksheet
    Const START_ROW As Long = 6
    Dim pathRange As Range
    Set pathRange = Sheet1.Range("A" & START_ROW).Resize(UBound(pathArray), 1)
    pathRange = pathArray

    '------ now apply the indention levels to each line on the sheet
    '       and group the same rows
    Const MAX_GROUP_LEVEL As Long = 8
    Dim rowGroup As Variant
    Dim level As Long
    Dim folderData As Variant
    Dim theseRows As String
    For Each rowGroup In folderGroups
        folderData = Split(folderGroups(rowGroup), ",")
        theseRows = folderData(0)
        level = folderData(1)
        With pathRange.rows(theseRows)
            .IndentLevel = level
            If level < MAX_GROUP_LEVEL Then
                .Group
            End If
        End With
    Next rowGroup
End Sub

Private Function SelectFolder() As String
    '--- returns the user-selected folder as a string
    Dim objShell As Object
    Dim objFolder As Object
    Set objShell = CreateObject("Shell.Application")
    Set objFolder = objShell.BrowseForFolder(0, "Select Folder", 0, 17)
    If Not objFolder Is Nothing Then
        SelectFolder = objFolder.self.Path
    End If
End Function

Private Function GetAllFiles(ByVal rootPath As String, _
                             Optional onlyFolders As Boolean = False) As Variant
    '--- returns a sorted array of all filepaths in the given directory path
    Dim dirOptions As String
    If onlyFolders Then
        dirOptions = """ /a:d-h-s /b /s"
    Else
        dirOptions = """ /a:-h-s /b /s"
    End If
    Dim fOut() As String
    fOut = Split(CreateObject("WScript.Shell").exec("cmd /c dir """ & _
                                                    rootPath & _
                                                    dirOptions).StdOut.ReadAll, _
                 vbNewLine)
    QuickSort fOut, LBound(fOut), UBound(fOut)

    '--- the pathArray skips the first position from the fOut array
    '    because it's always blank, but add the root folder as the first entry
    Dim pathArray As Variant
    ReDim pathArray(1 To UBound(fOut) + 1, 1 To 1)
    pathArray(1, 1) = rootPath
    Dim i As Long
    For i = 2 To UBound(fOut) + 1
        pathArray(i, 1) = fOut(i - 1)
    Next i
    GetAllFiles = pathArray
End Function

Private Function BuildFolderDictionary(ByVal root As String, _
                                       ByRef paths As Variant) As Object
    Dim folders As Object
    Set folders = CreateObject("Scripting.Dictionary")

    '--- scan all paths and create a dictionary of each folder and subfolder
    '    noting which items (rows) map into each dictionary
    Dim folder As Variant
    Dim i As Long
    For i = LBound(paths) To UBound(paths)
        Dim pos1 As Long
        If Not IsEmpty(paths(i, 1)) Then
            pos1 = InStrRev(paths(i, 1), "\")   'find the last folder separator
            folder = Left$(paths(i, 1), pos1)
            If Not folders.Exists(folder) Then
                '--- new (sub)folder, create a new entry
                folders.Add folder, CStr(i) & ":" & CStr(i)
            Else
                '--- extisting (sub)folder, add to the row range
                Dim rows As String
                rows = folders(folder)
                rows = Left$(rows, InStr(1, rows, ":"))
                rows = rows & CStr(i)
                folders(folder) = rows
            End If
        End If
    Next i

    '--- final fixup: the root folder group should always encompass all
    '    the entries (runs from the second row to the end)...
    '    and we'll also determine the indent level using the first entry
    '    as the baseline (level 1).  stored as "rows,level" e.g. "2:7,1"
    Dim rootSlashes As Long
    rootSlashes = Len(root) - Len(Replace(root, "\", "")) - 1
    folders(root) = "2:" & UBound(paths) & ",1"

    Dim slashes As Long
    folder = folders.Keys
    For i = 1 To UBound(folder)
        slashes = Len(folder(i)) - Len(Replace(folder(i), "\", ""))
        folders(folder(i)) = folders(folder(i)) & "," & _
                                     CStr(slashes - rootSlashes)
    Next i

    For Each folder In folders
        Debug.Print folder & " - " & folders(folder)
    Next folder

    Set BuildFolderDictionary = folders
End Function

Private Sub QuickSort(ByRef Field() As String, ByVal LB As Long, ByVal UB As Long)
    '--- from https://stackoverflow.com/a/152333/4717755
    Dim P1 As Long, P2 As Long, Ref As String, TEMP As String

    P1 = LB
    P2 = UB
    Ref = Field((P1 + P2) / 2)

    Do
        Do While (Field(P1) < Ref)
            P1 = P1 + 1
        Loop

        Do While (Field(P2) > Ref)
            P2 = P2 - 1
        Loop

        If P1 <= P2 Then
            TEMP = Field(P1)
            Field(P1) = Field(P2)
            Field(P2) = TEMP

            P1 = P1 + 1
            P2 = P2 - 1
        End If
    Loop Until (P1 > P2)

    If LB < P2 Then Call QuickSort(Field, LB, P2)
    If P1 < UB Then Call QuickSort(Field, P1, UB)
End Sub
Sign up to request clarification or add additional context in comments.

17 Comments

PeterT, at "Private Function BuildFolderDictionary(ByRef paths As Variant) As Dictionary" I get a 'Compile error: User-defined type not defined'? I think I need to implement late-binding, because I don't want to have the end-user to have to enable the 'Microsoft Scripting Runtime' library.
That's fine. I've updated the answer to show the Dictionary implemented with late-binding.
You were right about the speed, PeterT! Fantastically fast. However, the last group in the outputted list is not including the folders/files contained within the last folder. The indent level looks good, but the contents of the last folder are grouped in with the first level.
I'm not sure if that's an artifact of your folder structure or in the code. I don't see that issue when I run it against a few different folder hierarchies on my drive. You can try commenting out the QuickSort call and see if that makes any difference, but beyond that I would need more information to help resolve the issue if it's in the code.
Actually, you must leave the QuickSort in the logic in order for the grouping to work out correctly. So I'm still not sure if there's an issue with the code.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.