
I have this PowerShell script to parse many text files at once (about 1 MB) that look like configuration files:

Script:

$counter = ($false,0,0)
$objcounter = 0
$global:files = [ordered]@{}
$txt = [System.IO.File]::ReadAllLines($opath)

foreach($line in $txt){
    if ($counter[2] -eq "spline"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="spline"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "object"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="sceneryobject"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "splineh"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="splineh"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "attachedobject"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="attachedobject"}}};$counter = ($false,0,0)}}
    elseif ($counter[2] -eq "splineattachement"){if ($counter[1] -eq 1){$counter[1]++}else{$key=$global:files.Keys;if (-not($global:files.Contains($line))){$global:files+=[ordered]@{$line=@{path=$line;type="splineattachement"}}};$counter = ($false,0,0)}}

    if ($line -eq "[spline]"){
        $counter = @($true,1,"spline");$objcounter++} 
    if ($line -eq "[splineh]"){
        $counter = @($true,1,"splineh");$objcounter++}
    if ($line -eq "[object]"){
        $counter = @($true,1,"object");$objcounter++}
    if ($line -eq "[attachObj]"){
        $counter = @($true,1,"attachedobject");$objcounter++}
    if ($line -eq "[splineAttachement]"){
        $counter = @($true,1,"splineattachement");$objcounter++}
}

(I know that it isn't well-structured.)

File:

[spline]
0
apath\path\file3.ext
8947
8946
8992
0.0584106412565594
0.250000081976033
195.973568100565
90.0000020235813
39.99999937227
0
0
0
0
0
0
0
180.853118555128


[spline_h]
0
apath\path\file2.ext
8949
8948
9022
0.0565795901830857
0.250000202235118
202.972286028874
90.0000020235813
39.99999937227
0
0
0
0
0
0
0
183.907441598005
mirror

[spline]
0
apath\path\file.ext
8951
0
9019
0.0585327145350332
0.0999999434550936
201.971026072961
90.0000020235813
39.99999937227
0
0
0
0
0
0
0
183.47110728047
mirror

(and so on…)

The script works fine, but it takes very long to parse the files and after a while I get “no response” and the app crashes.

This is the output that I need:

$global:files = [ordered]@{path=@{path="path";type="type"}}

Where 'path' is the file path, like: apath\path\file.ext and 'type' is the mesh type, like: spline or spline_h.

What can I change to make the parsing faster?

  • Could you please add an example of what your desired output needs to be? For me, this is not clear. Commented Nov 29, 2021 at 11:06
  • @Theo Sure. The needed output is an [ordered]. Commented Nov 29, 2021 at 11:56
  • Could you explain what the script is doing and what its conditions are? At this point we might be able to recreate it in a more efficient way, but the if conditions seem awfully over-complicated. I understand it's a hashtable with the path of each keyword between [..] as the key, and the values are the path and the keyword. Commented Nov 29, 2021 at 13:24
  • @Theo I'm sure you'll be able to improve on my answer by far, especially the regex part, though there are certain points the OP would need to clarify first. Commented Nov 29, 2021 at 14:30

1 Answer

Here is an example of how it could be improved, mainly using regex plus some string manipulation. Note that I'm nowhere near good with regex and I'm quite sure it could be improved greatly, but as is, it works for me.

It was not clear to me what should happen when two or more types ([keyword]) have the same path (the path being the hashtable key). Right now the code assumes there are no duplicate paths in the file.
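
If duplicate paths can occur, one option is to keep the first type seen for a path, which is what the Contains() check in the question's script does. A minimal sketch with made-up sample entries ($entries is hypothetical, not part of the original data):

```powershell
# Hypothetical sample data: two entries share the same path.
$entries = @(
    @{ path = 'apath\path\file.ext'; type = 'spline' }
    @{ path = 'apath\path\file.ext'; type = 'spline_h' }
)

$result = [ordered]@{}
foreach ($e in $entries) {
    # Keep the first type seen for a path and ignore later duplicates.
    if (-not $result.Contains($e.path)) {
        # Index assignment never throws, unlike Add() on an existing key.
        $result[$e.path] = [ordered]@{ path = $e.path; type = $e.type }
    }
}

$result['apath\path\file.ext'].type   # spline
```

Dropping the Contains() check would instead keep the last type seen, because index assignment overwrites an existing key.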

For the regex explanation see: https://regex101.com/r/aN4WNR/1
NOTE: This only works because the paths end with .ext. If that is not the case, you should clarify that too.
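
If the paths do not reliably end in .ext, a pattern that relies on position instead might work. This sketch assumes, as the question's script does, that the path is always the second line after the [keyword] header. The pattern and the named groups type and path are my own illustration, and the sample text is a shortened version of the file in the question:

```powershell
# Shortened sample in the question's format, joined into one multi-line string.
$txt = @(
    '[spline]', '0', 'apath\path\file3.ext', '8947', '',
    '[spline_h]', '0', 'apath\path\file2.ext', '8949'
) -join "`n"

# Header line, then one line that is skipped, then the path line.
$pattern = '(?m)^\[(?<type>[^\]\r\n]+)\]\r?\n[^\r\n]*\r?\n(?<path>[^\r\n]+)'

$result = [ordered]@{}
foreach ($m in [regex]::Matches($txt, $pattern)) {
    $path = $m.Groups['path'].Value
    $result[$path] = [ordered]@{
        path = $path
        type = $m.Groups['type'].Value
    }
}

$result['apath\path\file2.ext'].type   # spline_h
```

Because the groups are named, no splitting or Replace(']','') is needed afterwards.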

The regex expects a single multi-line string to work properly, so you would need to read the file with one of these (which should also improve the efficiency of the script):

  • Get-Content -Raw
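  • [System.IO.File]::ReadAllText(...)

# Read the whole file as one string so the regex can match across lines.
$txt = Get-Content -Raw ./test.txt
# Each match runs from just after '[' to the next '.ext'.
$re = [regex]::Matches($txt, '(?ms)\b(?<=(\[)).*?\.ext\b')
$result = [ordered]@{}

foreach($r in $re)
{
    # The first line of a match is 'keyword]', the last line is the path.
    $parse = $r.Value -split '\r?\n'
    $type = $parse[0].Replace(']','')
    $path = $parse[-1]

    # Add() throws if the same path occurs twice.
    $result.Add(
        $path,
        [ordered]@{
            path = $path
            type = $type
        }
    )
}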

Result:

PS /> $result

Name                           Value
----                           -----
apath\path\file3.ext           {path, type}
apath\path\file2.ext           {path, type}
apath\path\file.ext            {path, type}

PS /> $result['apath\path\file2.ext']

Name                           Value
----                           -----
path                           apath\path\file2.ext
type                           spline_h
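
If the regex turns out not to be faster on your files, a line-based loop may be worth trying. The likely bottleneck in the question's script is $global:files += [ordered]@{...}: adding two dictionaries in PowerShell builds a new one and copies every existing entry, so the cost grows with each added path. The sketch below assigns by key instead. It assumes the path is the second line after the header and stores the raw keyword as the type (the question's script renames some of them, for example object to sceneryobject). $lines and $files are illustrative names:

```powershell
# Sample lines; for a real file use [System.IO.File]::ReadAllLines($opath).
$lines = @(
    '[spline]', '0', 'apath\path\file3.ext', '8947', '',
    '[spline_h]', '0', 'apath\path\file2.ext', '8949', 'mirror', '',
    '[spline]', '0', 'apath\path\file.ext', '8951'
)

$files = [ordered]@{}
for ($i = 0; $i -lt $lines.Count - 2; $i++) {
    # A header is a whole line of the form [keyword].
    if ($lines[$i] -match '^\[(?<type>[^\]]+)\]$') {
        $type = $Matches['type']
        $path = $lines[$i + 2]   # assumption: path is the second line after the header
        if (-not $files.Contains($path)) {
            # Index assignment changes the dictionary in place instead of copying it.
            $files[$path] = @{ path = $path; type = $type }
        }
        $i += 2                  # skip the two lines just consumed
    }
}

$files.Count   # 3
```

I have not timed this against the 1 MB files from the question, so treat the speed-up as something to measure rather than a given.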

2 Comments

@jjb Well then, if your script performs so fast, why ask the question to begin with?
The problem is that I have about 50 or more files that need to be parsed, and 2 minutes per file is a very long time. EDIT: I have checked the speed a second time, and your script needs 30 seconds longer than mine. My last comment was wrong because I had tested the script with Get-Content -Raw, and that gave me no output...
