I have a file that has multiple instances of the following:

<password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password>

But for each instance the password is different.

I would like the output to delete the encrypted password:

<password encrypted="True"></password>

What is the best method using PowerShell to loop through all instances of the pattern within the file and output to a new file?

Something like:

gc file1.txt | (regex here) > new_file.txt

where (regex here) is something like:

s/"True">.*<\/pass//

Answer

This one is fairly easy in regex, and you can do it that way, or you can parse it as actual XML, which may be more appropriate. I'll demonstrate both ways. In each case, we'll start with this common bit:

$raw = @"
<xml>
    <something>
        <password encrypted="True">hudhisd8sd9866786863rt</password>
    </something>
    <another>
        <thing>
            <password encrypted="True">nhhs77378hd8y3y8y282yr892</password>
        </thing>
    </another>
    <test>
        <password encrypted="False">plain password here</password>
    </test>
</xml>
"@

Regex

$raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'

or:

$raw -ireplace '(?<=<password encrypted="True">).+?(?=</password>)', ''
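Neither line above stores its result, so here is a small self-contained sketch that captures both outputs and compares them. The sample data is made up for illustration; it includes one unencrypted password to show that it is left alone.

```powershell
# Illustrative sample: one encrypted password and one plain one.
$sample = @"
<cfg>
    <password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password>
    <password encrypted="False">plain password here</password>
</cfg>
"@

# Approach 1: capture the tags and write only them back.
$byGroups = $sample -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'

# Approach 2: lookarounds keep the tags out of the match, so only the value is replaced.
$byLookaround = $sample -ireplace '(?<=<password encrypted="True">).+?(?=</password>)', ''

# Both approaches should give identical text.
$byGroups -eq $byLookaround   # True
$byGroups
```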

XML

$xml = [xml]$raw

foreach($password in $xml.SelectNodes('//password')) {
    $password.InnerText = ''
}

Only replace the encrypted passwords:

$xml = [xml]$raw

foreach($password in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $password.InnerText = ''
}
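The loops above only change the document in memory. A sketch of getting the result back out, assuming you want either a string (OuterXml) or a file (XmlDocument.Save). The file name new_file.xml is hypothetical, and the temp folder is used only so the sketch runs anywhere.

```powershell
$raw = @"
<xml>
    <test>
        <password encrypted="True">hudhisd8sd9866786863rt</password>
    </test>
    <test>
        <password encrypted="False">plain password here</password>
    </test>
</xml>
"@
$xml = [xml]$raw

foreach ($password in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $password.InnerText = ''
}

# OuterXml returns the modified document as a single string.
$result = $xml.OuterXml

# Save() is a .NET method: a relative path is resolved against the process
# working directory, which can differ from PowerShell's $PWD, so pass a full path.
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'new_file.xml'
$xml.Save($outPath)
```

Note that the [xml] cast drops insignificant whitespace by default, so the saved file may be indented differently from the input.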

Explanations

Regex 1

(<password encrypted="True">)[^<]+(</password>)

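The first regex method uses two capture groups to capture the opening and closing tags, while [^<]+ matches the password text between them (one or more characters that are not <). It replaces the entire match with just those two tags ($1$2), so the middle is omitted. The replacement string is in single quotes so that PowerShell doesn't try to expand $1 and $2 as variables.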
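To see what the two groups actually hold, here is a quick sketch using .NET's [regex]::Match on a single line taken from the question.

```powershell
$line = '<password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password>'
$m = [regex]::Match($line, '(<password encrypted="True">)[^<]+(</password>)')

# Groups[0] is the whole match; Groups[1] and Groups[2] are the two tags.
$m.Groups[1].Value   # <password encrypted="True">
$m.Groups[2].Value   # </password>

# Replacing the whole match with '$1$2' keeps only the tags.
$line -replace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'
# <password encrypted="True"></password>
```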

Regex 2

(?<=<password encrypted="True">).+?(?=</password>)

The second regex method uses a positive lookbehind and a positive lookahead. It lazily matches one or more characters (.+? takes as few characters as possible) that are preceded by the opening tag and followed by the closing tag. Since lookarounds are zero-width, the tags are not part of the match, so they don't get replaced.
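The lazy quantifier matters when more than one element sits on the same line. A sketch with made-up values, comparing .+? with a greedy .+:

```powershell
# Illustrative data: two password elements on one line.
$line = '<password encrypted="True">aaa</password><password encrypted="True">bbb</password>'

# Lazy .+? stops at the first closing tag, so each value is matched separately.
$lazy = [regex]::Matches($line, '(?<=<password encrypted="True">).+?(?=</password>)') |
    ForEach-Object { $_.Value }
$lazy     # aaa, bbb

# Greedy .+ runs on to the LAST closing tag and swallows the tags in between.
$greedy = [regex]::Match($line, '(?<=<password encrypted="True">).+(?=</password>)').Value
$greedy   # aaa</password><password encrypted="True">bbb

# With the lazy pattern, replacing with '' empties both elements and keeps all tags.
$cleared = $line -replace '(?<=<password encrypted="True">).+?(?=</password>)', ''
$cleared
```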

XML

Here we're using a simple XPath query to find all of the password nodes. We iterate through each one with a foreach loop and set its InnerText to an empty string.

The second version adds the predicate [@encrypted="True"], so it only operates on password nodes whose encrypted attribute is True.
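One difference from the -ireplace approach: XPath string comparisons are case-sensitive. A sketch with illustrative data showing how many nodes each query selects, plus an XPath 1.0 translate() workaround if you need to ignore case.

```powershell
# Illustrative data: note the lowercase "true" on the second element.
$raw = @"
<xml>
    <password encrypted="True">a</password>
    <password encrypted="true">b</password>
    <password encrypted="False">c</password>
</xml>
"@
$xml = [xml]$raw

# Every password node, regardless of attributes.
$allCount = $xml.SelectNodes('//password').Count                       # 3

# Only the exact value "True" matches.
$trueCount = $xml.SelectNodes('//password[@encrypted="True"]').Count   # 1

# Case-insensitive regex counts both "True" and "true".
$regexCount = [regex]::Matches($raw, '<password encrypted="True">', 'IgnoreCase').Count   # 2

# XPath 1.0 has no lower-case(), but translate() can fold the letters of "true".
$ciCount = $xml.SelectNodes("//password[translate(@encrypted,'TRUE','true')='true']").Count   # 2
```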

Which to Choose

I personally think that the XML method is more appropriate, because it works on the parsed document rather than the raw text, so you don't have to account for variations in XML syntax such as attribute order, extra attributes, whitespace inside tags, or single versus double quotes. It is also easier to account for different attributes on the nodes or different attribute values.

By using XPath you have a lot more flexibility than with regex for processing XML.
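A sketch of that point with two made-up variations: an extra attribute before encrypted, and single-quoted attribute values. Both are valid XML, the first regex misses both, and the XPath version handles both.

```powershell
# Illustrative data: valid XML that differs textually from the pattern.
$raw = @"
<xml>
    <password id="1" encrypted="True">abc123</password>
    <password encrypted='True'>def456</password>
</xml>
"@

# The regex expects the exact text <password encrypted="True">, so neither element matches.
$viaRegex = $raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'

# XPath works on parsed attributes, so attribute order and quote style don't matter.
$xml = [xml]$raw
foreach ($password in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $password.InnerText = ''
}
$viaXml = $xml.OuterXml
```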

File operations

I noticed your sample used gc (short for Get-Content) to read the data. Be aware that by default it reads the file line by line and returns an array of strings, one per line, rather than a single string.

You can use this to get your raw content in one string, for conversion to XML or processing by regex:

$raw = Get-Content file1.txt -Raw
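A small sketch of the difference, using a throwaway file in the temp folder (the name gc-demo.txt is arbitrary):

```powershell
# Hypothetical temp file used only for this demo.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-demo.txt'
Set-Content -Path $path -Value 'line one', 'line two', 'line three'

$lines = Get-Content $path         # array of strings, one per line
$text  = Get-Content $path -Raw    # one string that still contains the line breaks

$lines.Count                       # 3
$text.GetType().FullName           # System.String

Remove-Item $path
```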

You can write it out pretty easily too:

$raw | Out-File file1.txt
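Putting it together in the gc file1.txt | ... > new_file.txt shape from the question, here is a sketch of the regex route. The file names come from the question but are placed in the temp folder so the sketch runs anywhere. It uses Set-Content rather than Out-File because in Windows PowerShell 5.1 Out-File writes UTF-16 by default; pass -Encoding explicitly if the file's encoding matters.

```powershell
# Hypothetical locations for the question's file1.txt and new_file.txt.
$dir = [System.IO.Path]::GetTempPath()
$in  = Join-Path $dir 'file1.txt'
$out = Join-Path $dir 'new_file.txt'

# Create a small input file for the demo.
Set-Content -Path $in -Value @(
    '<password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password>',
    '<password encrypted="True">somethingElse123</password>'
)

# Read as one string, blank the encrypted passwords, write a new file.
# -NoNewline avoids adding an extra line break after the text already read with -Raw.
(Get-Content $in -Raw) -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2' |
    Set-Content $out -NoNewline
```

Because each password element here sits on a single line, applying -ireplace to the array from plain Get-Content would also work (the operator is applied to every element); -Raw matters for the [xml] cast and for patterns that span lines.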

Comments

Thank you for the detailed answer. Is the correct usage in this case:

$raw = Get-Content file1.xml -Raw
$xml = [xml]$raw
foreach($password in $xml.SelectNodes('//password')) { $password.InnerText = '' }
I used the second REGEX command you provided and it worked perfectly. Thanks again for the excellent and detailed response. A1
