I'm using a PowerShell script to automate the replacement of some troublesome characters in an XML file, such as & ' - £
The script I have works well for these characters, but I also want to remove the double quote character " when it appears inside an XML attribute value. Unfortunately the attribute value is itself enclosed in double quotes, so I cannot simply remove every double quote from the file, as that would break the attributes.
My PowerShell script is below:
(Get-Content C:\test\communication.xml) |
Foreach-Object {$_ -replace "&", "+" -replace "£", "GBP" -replace "'", "" -replace "–", " "} |
Set-Content C:\test\communication.xml
What I'd like to be able to do is remove ONLY the double quotes that appear inside an XML attribute value, which is itself enclosed by a pair of double quotes, as in the example below. I know that PowerShell treats each line as a separate object, so I suspect this should be quite easy, possibly by using conditions?
An example XML file is below:
<?xml version="1.0" encoding="UTF-8"?>
<Portal>
<communication updates="Text data with no double quotes in the attribute" />
<communication updates="Text data that "includes" double quotes within the double quotes for the attribute" />
</Portal>
In the above example I'd like to remove only the double quotes that immediately surround the word includes, BUT not the double quote to the left of the word Text or the one to the right of the word attribute. The words used in the XML attributes will change on a regular basis, but the left double quote will always be immediately to the right of the = symbol, and the right double quote will always be immediately to the left of a space and forward slash combination ( /). Thanks
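The rule described above (opening quote right after =, closing quote right before the space and forward slash) could be sketched roughly as follows. This is only a sketch under some assumptions: each element sits on one line, carries a single attribute, and is self-closing, as in the example. The function name Remove-InnerQuote is made up for illustration.

```powershell
# Hypothetical helper: strips double quotes found inside an attribute value
# while keeping the two delimiting quotes.
# Assumes one attribute per element, on a single line, ending in " /> .
function Remove-InnerQuote {
    param([string]$Line)

    # Match the text between the opening =" and the final " that precedes />.
    $pattern = '(?<==")(.*)(?="\s*/>)'

    # For each match, delete every double quote inside the matched span.
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator]{
        param($m)
        $m.Value.Replace('"', '')
    }

    [regex]::Replace($Line, $pattern, $evaluator)
}

$sample = '<communication updates="Text data that "includes" double quotes" />'
Remove-InnerQuote $sample
# -> <communication updates="Text data that includes double quotes" />
```

Lines without a self-closing element, such as the XML declaration or the Portal tags, do not match the pattern and pass through unchanged. In the existing pipeline this could be called as Remove-InnerQuote $_ inside the Foreach-Object block. If an element ever carries more than one attribute, the greedy match would span all of them and this sketch would need a stricter pattern.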
-replace operations ($_ -replace "&","+" -replace "£","GBP" ...). A separate loop for each replacement is not required.
<communication updates="Text data the "includes" double quotes within the double quotes." />