I need to repair several huge, buggy XML files. Because they are buggy, I cannot simply load them with:
[xml]$xml = Get-Content .\data.xml
Instead, I want to parse them with a regex and named capture groups. However, I don't know how to handle nested tags.
Here is a simple example to illustrate my problem.
$xml = '<tag><tag><tag>Anything</tag><tag>Something else</tag></tag><tag><tag>Another value</tag><tag>And another one...</tag></tag></tag>'
$Pattern = '<tag>(?<Content>.+?)</tag>'
([regex]::Matches($Xml, $Pattern)).Value
This piece of code returns:
<tag><tag><tag>Anything</tag>
<tag>Something else</tag>
<tag><tag>Another value</tag>
<tag>And another one...</tag>
How can I change my regex pattern to get this output instead?
<tag>Anything</tag>
<tag>Something else</tag>
<tag>Another value</tag>
<tag>And another one...</tag>
It seems that regex recursion would fit my needs. However, I couldn't find anyone explaining how it works in PowerShell (if it is supported at all).
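On the recursion point: the .NET regex engine that PowerShell uses does not offer recursive constructs such as (?R), but it does have balancing groups, which can track nesting depth. The following is only a minimal sketch, assuming a single element name and input on one line; the group name open and the variable names are illustrative. Note that it matches the outermost balanced element, not the leaf tags, so for the sample it should match the whole string.

```powershell
# Sample input from the question.
$xml = '<tag><tag><tag>Anything</tag><tag>Something else</tag></tag><tag><tag>Another value</tag><tag>And another one...</tag></tag></tag>'

# (?<open><tag>)    pushes a capture onto the 'open' stack for each nested <tag>
# (?<-open></tag>)  pops one capture for each nested </tag> (fails if the stack is empty)
# (?!</?tag>).      consumes any other single character
# (?(open)(?!))     forces a failure if some nested <tag> was never closed
$balanced = '<tag>(?>(?<open><tag>)|(?<-open></tag>)|(?!</?tag>).)*(?(open)(?!))</tag>'

$outer = [regex]::Match($xml, $balanced)
$outer.Value
```

Because it relies on balanced tags, this approach may not help where the files are broken precisely by unbalanced tags.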
<tag>(?<Content>[^<]*)</tag> matches the inner <tag>Anything</tag> in <tag><tag>Anything</tag> and in <tag><tag><tag>Anything</tag></tag>. Not sure what you need.
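The pattern from the comment can be tried against the sample like this. It is a minimal sketch: replacing .+? with [^<]* stops the captured content from running across another tag, so only the innermost elements match. It assumes the leaf content never contains a literal < character, which may not hold for the real files.

```powershell
# Sample input from the question.
$xml = '<tag><tag><tag>Anything</tag><tag>Something else</tag></tag><tag><tag>Another value</tag><tag>And another one...</tag></tag></tag>'

# [^<]* matches any run of characters except '<', so a match cannot span a nested tag.
$pattern = '<tag>(?<Content>[^<]*)</tag>'
$found = [regex]::Matches($xml, $pattern)

# Full text of each match, one per line.
$found.Value

# Only the captured content of each match.
$found | ForEach-Object { $_.Groups['Content'].Value }
```

For the sample string this should list the four leaf elements given as the desired output.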