
I am looking for a way to randomize a specific string in a huge file by using predefined strings from an array, without having to write a temporary file to disk.

The file contains the same string, e.g. "ABC123456789", in many places:

<Id>ABC123456789</Id><tag1>some data</tag1><Id>ABC123456789</Id><Id>ABC123456789</Id><tag2>some data</tag2><Id>ABC123456789</Id><tag1>some data</tag1><tag3>some data</tag3><Id>ABC123456789</Id><Id>ABC123456789</Id>

I am trying to randomize that "ABC123456789" string using an array or list of defined strings, e.g. @('foo','bar','baz','foo-1','bar-1'). Each ABC123456789 should be replaced by a randomly picked string from the array/list.

I have ended up with the following solution, which works "fine". But it is definitely not the right approach, as it saves the file to disk once for each replaced string and is therefore very slow:

$inputFile = Get-Content 'c:\temp\randomize.xml' -raw
$checkString = Get-Content -Path 'c:\temp\randomize.xml' -Raw | Select-String -Pattern '<Id>ABC123456789'
[regex]$pattern = "<Id>ABC123456789"

while($checkString -ne $null) {
    $pattern.replace($inputFile, "<Id>$(Get-Random -InputObject @('foo','bar','baz','foo-1','bar-1'))", 1) | Set-Content 'c:\temp\randomize.xml' -NoNewline
    $inputFile = Get-Content 'c:\temp\randomize.xml' -raw
    $checkString = Get-Content -Path 'c:\temp\randomize.xml' -Raw | Select-String -Pattern '<Id>ABC123456789'
}
Write-Host All finished

The output is randomized, e.g.:

<Id>foo
<Id>bar
<Id>foo
<Id>foo-1

However, I would like to achieve this kind of output without having to write the file to disk at each step. For thousands of occurrences of the string it takes a lot of time. Any idea how to do it?
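
For reference, the per-replacement disk round trip is not inherent to the text-based approach. A minimal sketch, assuming the whole file fits in memory as one string: [regex]::Replace accepts a script block as a MatchEvaluator, which is called once per match, so each occurrence gets its own random pick and the file only needs to be written once at the end. The inline $text sample and the $evaluator name are illustrative stand-ins, not part of the original script, and this remains plain text processing that depends on how the XML is formatted.

```powershell
# Assumption: a short inline sample stands in for the real file content,
# which would normally be read once with Get-Content -Raw.
$text = '<Id>ABC123456789</Id><tag1>some data</tag1><Id>ABC123456789</Id><Id>ABC123456789</Id>'
$replacements = 'foo', 'bar', 'baz', 'foo-1', 'bar-1'

# The script block is converted to a MatchEvaluator delegate and invoked
# once per match, so every occurrence is replaced by a separate random pick.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($match)
    Get-Random -InputObject $replacements
}

# Lookarounds restrict the match to the text between <Id> and </Id>.
$result = [regex]::Replace($text, '(?<=<Id>)ABC123456789(?=</Id>)', $evaluator)

# Print the result; a single Set-Content call could write it to disk instead.
$result
```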

========================= Edit 2023-02-16

I tried the solution from zett42 and it works fine with a simple XML structure. In my case there is a complication which was not important in my text-processing approach: the names of the root element and some other elements in the processed XML file contain a colon, and there must be some special setting for "-XPath" for this situation. Or maybe the solution is outside the scope of PowerShell.

<?xml version='1.0' encoding='UTF-8'?>
<C23A:SC777a xmlns="urn:C23A:xsd:$SC777a" xmlns:C23A="urn:C23A:xsd:$SC777a" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:C23A:xsd:$SC777a SC777a.xsd">
    <C23A:FIToDDD xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.008.001.02">
        <CxAAA>
            <DxBBB>
                <ABC>
                    <Id>ZZZZZZ999999</Id>
                </ABC>
            </DxBBB>
            <CxxCCC>
                <ABC>
                    <Id>ABC123456789</Id>
                </ABC>
            </CxxCCC>
        </CxAAA>
        <CxAAA>
            <DxBBB>
                <ABC>
                    <Id>ZZZZZZ999999</Id>
                </ABC>
            </DxBBB>
            <CxxCCC>
                <ABC>
                    <Id>ABC123456789</Id>
                </ABC>
            </CxxCCC>
        </CxAAA>
    </C23A:FIToDDD>
    <C23A:PmtRtr xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.004.001.02">
        <GrpHdr>
            <TtREEE Abc="XV">123.45</TtREEE>
            <SttlmInf>
                <STTm>ABCA</STTm>
                <CLss>
                    <PRta>SIII</PRta>
                </CLss>
            </SttlmInf>
        </GrpHdr>
        <TxInf>
            <OrgnlTxRef>
                <DxBBB>
                    <ABC>
                        <Id>YYYYYY888888</Id>
                    </ABC>
                </DxBBB>
                <CxxCCC>
                    <ABC>
                        <Id>ABC123456789</Id>
                    </ABC>
                </CxxCCC>
            </OrgnlTxRef>
        </TxInf>
    </C23A:PmtRtr>
</C23A:SC777a>
  • Are you trying to perform data masking for the XML file? Commented Feb 15, 2023 at 9:11
  • It is not about data masking. I have a set of data in an XML file which has a repetitive part, and I need to make it less "homogeneous" for testing purposes, using a provided set of strings to achieve it. Commented Feb 15, 2023 at 9:26
  • Peeking and poking directly into a serialized string (e.g. XML) using string methods (like -Replace) is a bad idea. Instead you should use the related parser for searching and replacing. See e.g.: Powershell regex for replacing text between two strings Commented Feb 15, 2023 at 9:59
  • Regarding your edit, use Select-Xml with the -Namespace parameter like this: Select-Xml -XPath '//a:Id/text()' -Namespace @{a = 'urn:iso:std:iso:20022:tech:xsd:pacs.008.001.02'} Commented Feb 17, 2023 at 11:08
  • @zett42 I already did it when you mentioned this in your answer. I also tested the code on different sets of XML and after modifying "Namespace" it worked really well. Thanks once again. Commented Feb 20, 2023 at 8:01

1 Answer

As commented, it is not recommended to process XML like a text file. This is a brittle approach that depends too much on the formatting of the XML. Instead, use a proper XML parser to load the XML and then process its elements in an object-oriented way.

# Use XmlDocument (alias [xml]) to load the XML
$xml = [xml]::new(); $xml.Load(( Convert-Path -LiteralPath input.xml ))

# Define the ID replacements
$searchString = 'ABC123456789'
$replacements = 'foo','bar','baz','foo-1','bar-1'

# Process the text of all Id elements that match the search string, regardless of how deeply nested they are.
$xml | Select-Xml -XPath '//Id/text()' | ForEach-Object Node |
       Where-Object Value -eq $searchString | ForEach-Object {

    # Replace the text of the current element with a randomly chosen string
    $_.Value = Get-Random $replacements
}

# Save the modified document to a file
$xml.Save( (New-Item output.xml -Force).Fullname )
  • $xml | Select-Xml -XPath '//Id/text()' selects the text nodes of all Id elements, regardless of how deeply nested they are in the XML DOM, using the versatile Select-Xml command. The XML nodes are selected by specifying an XPath expression.
    • Regarding your edit, when you have to deal with XML namespaces, use the parameter -Namespace to specify a namespace prefix to use in the XPath expression for the given namespace URI. In this example I've simply chosen a as the namespace prefix:
      $xml | Select-Xml -XPath '//a:Id/text()' -Namespace @{a = 'urn:iso:std:iso:20022:tech:xsd:pacs.008.001.02'}
      
  • ForEach-Object Node selects the Node property from each result of Select-Xml. This simplifies the following code.
  • Where-Object Value -eq $searchString selects the text nodes that match the search string.
  • Within ForEach-Object, the variable $_ stands for the current text node. Assign to its Value property to change the text.
  • The Convert-Path and New-Item calls make it possible to use a relative PowerShell path (PSPath) with the .NET XmlDocument class. In general, .NET APIs don't know anything about the current directory of PowerShell, so we have to convert the paths before passing them to a .NET API.
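
Note that in the sample from the question's edit, the Id elements sit under two different default namespaces (pacs.008.001.02 and pacs.004.001.02), so an XPath with a single prefix would only reach one group. A minimal sketch of one way to cover both, assuming a trimmed-down inline document in place of the real file: map one prefix per namespace URI and join the two paths with the XPath union operator. The prefixes p8 and p4 and the $nodes variable are arbitrary names chosen for illustration.

```powershell
# Assumption: a reduced inline document stands in for the real file.
# The here-string is single-quoted so that $SC777a is not expanded.
$xml = [xml] @'
<C23A:SC777a xmlns:C23A="urn:C23A:xsd:$SC777a">
    <C23A:FIToDDD xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.008.001.02">
        <ABC><Id>ZZZZZZ999999</Id></ABC>
        <ABC><Id>ABC123456789</Id></ABC>
    </C23A:FIToDDD>
    <C23A:PmtRtr xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.004.001.02">
        <ABC><Id>ABC123456789</Id></ABC>
    </C23A:PmtRtr>
</C23A:SC777a>
'@

$searchString = 'ABC123456789'
$replacements = 'foo', 'bar', 'baz', 'foo-1', 'bar-1'

# One prefix per namespace URI; the prefix names themselves are arbitrary.
$ns = @{
    p8 = 'urn:iso:std:iso:20022:tech:xsd:pacs.008.001.02'
    p4 = 'urn:iso:std:iso:20022:tech:xsd:pacs.004.001.02'
}

# Collect the matching text nodes first, then modify them in a second step.
$nodes = @(
    $xml | Select-Xml -XPath '//p8:Id/text() | //p4:Id/text()' -Namespace $ns |
        ForEach-Object Node |
        Where-Object Value -eq $searchString
)

foreach ($node in $nodes) {
    $node.Value = Get-Random -InputObject $replacements
}

# Show the modified document as a string
$xml.OuterXml
```

Collecting the nodes into an array before changing them keeps the selection and the modification as separate steps; saving to a file works the same way as in the code above.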

1 Comment

In the end I found your solution to be OK for my purpose, as I can separate out the relevant part of the XML code, apply this replacement, and then insert that part back into the final XML file, between the opening and closing parts. And this XML parser way is really much quicker! Thanks.
