I have a PowerShell script that reads a 4,000 KB text file (approximately 88,500 lines). This is the first time I have had my code do this much work. The script below took over two minutes to run and consumed around 20% CPU (as shown in the Task Manager screenshot below).
Can I improve performance using different code choices?
# extractUniqueBaseNames.ps1 --- copy first UPPERCASE word in each line of text, remove duplicates & store
$listing = 'C:\roll-conversion dump\LINZ Place Street Index\StreetIndexOutput.txt'
[array]$tempStorage = $null
[array]$Storage = $null
# select only CAPITALISED first string (at least two chars or longer) from listings
Select-String -Pattern '(\b[A-Z]{2,}\b[$\s])' -Path $listing -CaseSensitive |
    ForEach-Object {
        $newStringValue = $_.Matches.Value -replace '$\s', '\n'
        $tempStorage += $newStringValue
    }
$Storage += $tempStorage | Select-Object -Unique
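For reference, here is a minimal sketch of what the pattern matches, run against a hypothetical input line (the real StreetIndexOutput.txt data may differ). Inside a character class, $ is a literal dollar sign rather than an end-of-line anchor, and -cmatch is used here to mirror Select-String's -CaseSensitive switch:

# Hypothetical sample line: the pattern captures the leading run of two or
# more capital letters plus the character (whitespace or '$') that follows.
'MAIN Street, Smallville' -cmatch '(\b[A-Z]{2,}\b[$\s])'   # True
$Matches[1]   # 'MAIN ' -- note the trailing space captured by [$\s]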
I have also added the following line to output the results to a new text file (this was not included in the previous Task Manager reading):
$Storage | Out-File -Append atest.txt
Since I am at an early stage of my development, I would appreciate any suggestions that would improve the performance of this kind of PowerShell script.
Regarding $Storage += $tempStorage: $tempStorage contains the matches of the regex. These UPPERCASE 'words' are the base names of streets, that is, just the identifying name. Because towns have some street names in common (e.g. Main Street), they get duplicated. $Storage stores the data piped through Select-Object -Unique to remove the duplicates, so it holds only the unique street base names.
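To make the dedup step concrete, here is a small hypothetical illustration of what Select-Object -Unique does with repeated base names (the first occurrence wins and input order is preserved):

# Hypothetical duplicates, e.g. MAIN appearing in several towns
$tempStorage = 'MAIN', 'HIGH', 'MAIN', 'QUEEN'
$Storage = $tempStorage | Select-Object -Unique   # -> MAIN, HIGH, QUEEN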