
I have a block of text I need to parse (saved in a variable) but I'm unsure how to go about it. This block of text, saved in a variable we can call $block for simplicity's sake, includes all the whitespace shown below.

I would like the result to be an iterable list, the first value being Health_AEPOEP_Membership_Summary - Dev and the second one being Health_AEPOEP_YoY_Comparison_Summary - Dev. Assume this list of workbooks can be longer (up to 50) or shorter (minimum 1 workbook), and that all workbooks are formatted similarly (that is, name_with_underscores - Dev). I'd try the $block.split(" ") method, but it returns many empty strings, one for each run of spaces, which are hard to enumerate and account for.


                    Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
                                Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]



Any help is much appreciated!


2 Answers


You could write a multi-line regex pattern and try to extract the names, but it might be easier to reason about if you break it into simple(r) steps:

$string = @'

                    Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
                                Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]



'@

# Split into one string per line
$strings = $string -split '\r?\n'

# Remove leading whitespace
$strings = $strings -replace '^\s*' 

# Remove `Workbooks : ` prefix (strings that don't match will be left untouched)
$strings = $strings -replace '^Workbooks :\s*' 

# Remove `[Project: $NAME]` suffix
$strings = $strings -replace '\s*\[Project: [^\]]+\]'

# Get rid of empty lines
$strings = $strings |Where-Object Length

$strings now contains the two workbook names.
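
For what it's worth, the same steps can be chained into a single pipeline. This is only a sketch and assumes every line follows the layout of the sample in the question. The $workbooks name and the @( ) wrapper are my additions: the wrapper keeps the result an array even when the block holds just one workbook, which the question says can happen.

```powershell
# Sample input copied from the question; the exact amount of indentation does not matter
$block = @'

                    Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
                                Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]

'@

# @( ) forces an array, so a single workbook still yields a one-element list
$workbooks = @(
    $block -split '\r?\n' |
        # Strip leading whitespace plus the optional prefix, then the [Project: ...] suffix
        ForEach-Object { $_ -replace '^\s*(Workbooks :\s*)?' -replace '\s*\[Project: [^\]]+\]\s*$' } |
        # Drop the lines that are empty after trimming
        Where-Object Length
)

# The result can be iterated like any other array
foreach ($name in $workbooks) {
    $name
}
```

With the sample input this should leave two entries in $workbooks, one per workbook.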


3 Comments

And I've heard SO should not be a free code writing service. ;-)
when i run this, the [Project...] text remains. Why is that?
@AmeeraKhan There was a typo in the regex pattern (I forgot to escape the [), I've updated it now

If the text is in a file, this gets a little easier, and I would recommend this approach:

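switch -Regex -File $file {
    # Lazy .+? stops at the first "- Dev", before the [Project: ...] part
    '(\w+_.+?- Dev)' { $Matches[1] }
}

Regex details

() - capture group

\w+ - match one or more word characters (letters, digits, underscore)

_ - match literal underscore

.+? - match one or more of any character, as few as possible (lazy), so the match ends at the first - Dev instead of running on to the one inside [Project: ...]

- Dev - literal match of dash space Dev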
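
Whether the quantifier in front of - Dev is greedy or lazy matters here, because - Dev appears twice on each line of the sample (once in the workbook name, once inside the project brackets). A small sketch comparing the two on one line copied from the question; the $line, $greedy and $lazy variable names are just for illustration:

```powershell
# One line of the sample input from the question
$line = 'Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]'

# Greedy .+ extends to the LAST "- Dev" on the line, so the project text is captured too
if ($line -match '(\w+_.+- Dev)') { $greedy = $Matches[1] }

# Lazy .+? ends at the FIRST "- Dev" after the underscore
if ($line -match '(\w+_.+?- Dev)') { $lazy = $Matches[1] }

$greedy   # Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev
$lazy     # Health_AEPOEP_Membership_Summary - Dev
```

The greedy result drops only the closing ], which is the symptom described in the comment on this answer.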

If it's already in a variable, it depends on whether it's a string array or a single string. Assuming it's a single string, I'd recommend this approach:

$regex = [regex]'(\w+_.+)(?=(\s\[.+))'

$regex.Matches($block).value

Health_AEPOEP_Membership_Summary - Dev
Health_AEPOEP_YoY_Comparison_Summary - Dev

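Regex details

\w+_.+ - one or more word characters, a literal underscore, then one or more of any character

(?=...) - lookahead: requires that the text after the match fits the inner pattern, without including it in the match

\s\[.+ - match a whitespace character, a left square bracket, one or more characters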

Simply add a variable assignment $strings = before either of these to capture the output. Either would work on one or 500 workbooks.
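
To make the single-string variant concrete, here is a sketch that captures the matches into an array and loops over it. It assumes $block holds the sample text from the question; the $workbooks name and the @( ) wrapper are my additions, the latter so that a block with only one workbook still gives an array.

```powershell
# Sample input copied from the question, held as one multi-line string
$block = @'

                    Workbooks : Health_AEPOEP_Membership_Summary - Dev [Project: Health - Dev]
                                Health_AEPOEP_YoY_Comparison_Summary - Dev [Project: Health - Dev]

'@

# The lookahead requires " [" right after the name but leaves it out of the match
$regex = [regex]'\w+_.+(?=\s\[.+)'

# Collect the matched text; @( ) guarantees an array even for a single match
$workbooks = @($regex.Matches($block) | ForEach-Object { $_.Value })

foreach ($name in $workbooks) {
    "Workbook: $name"
}
```

Because . does not match a newline by default, each match stays within its own line, so no explicit line splitting is needed.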

1 Comment

Hmm... when I run the second option, it seems like the [Project...] portion remains. When I run the first suggestion, only the ] character is removed, not everything else within brackets. Why would that be? EDIT: the first solution works when I add a space after the - Dev in the regex expression.
