I'm trying to extract some strings from a file. Here's a simplified example of the text in the file:

<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>
<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>
<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>

I want to extract just this part of each line: <gtin>xxxxxxx</gtin> and put them into another file.

I do not want the whole line, just the gtin.

Here's what I tried:

Get-Content -Path C:\firstFile.xml -Readcount 1000 | foreach { $_ -match "<gtin1>*</gtin1>" } | out-file C:\gtins.txt

But, as you can probably guess, it's not working.

Any help is greatly appreciated. I have a feeling this is embarrassingly easy.

Thanks!

2 Answers

(Edit: Ansgar Wiechers is right that you shouldn't parse XML with a regular expression; proper XML parsing is strongly preferred.)

You can extract substrings using Select-String and a regular expression. Example:

# .+? is non-greedy, so each match stops at the first closing </gtin> tag
Get-Content "C:\firstfile.xml" | Select-String '(<gtin>.+?</gtin>)' | ForEach-Object {
  $_.Matches[0].Groups[1].Value
}

If you want just the value between the tags, move the ( and ) so that they surround only the part of the expression between <gtin> and </gtin>.
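
As a rough sketch of that variation, here is the same pipeline with the capture group around only the text between the tags. The $lines array is a stand-in for the file contents (sample data from the question), and the sketch assumes each line contains at most one gtin element:

```powershell
# Sample lines standing in for the file contents (data taken from the question).
$lines = @(
  '<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>',
  '<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>'
)

# Capture group 1 now wraps only the text between the tags.
$values = $lines | Select-String '<gtin>(.+?)</gtin>' | ForEach-Object {
  $_.Matches[0].Groups[1].Value
}

$values   # emits 123456789 and 789456123
```

To write the results to a file as the question asks, the output can be piped to Out-File (for example, $values | Out-File C:\gtins.txt).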

More information about regular expressions:

PS C:\> help about_Regular_Expressions

Do not parse XML with regular expressions.

Use an actual XML parser for extracting data from XML files.

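# Load the file as an XML document (the file must be well-formed XML)
[xml]$xml = Get-Content 'C:\firstfile.xml'
# Select every <gtin> element and output its text content
$xml.SelectNodes('//gtin') | Select-Object -ExpandProperty '#text'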
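
The following self-contained sketch shows the same approach on inline data. The <products> and <product> wrapper elements are assumptions, added only because an XML document needs a single root element; the real file's structure may differ. It also shows OuterXml, which returns the element including its tags, in case the <gtin>...</gtin> form is what should end up in the output file:

```powershell
# Hypothetical well-formed document; the <products>/<product> wrappers are assumed.
[xml]$xml = @'
<products>
  <product><modelName>thing1</modelName><gtin>123456789</gtin><description>blah</description></product>
  <product><modelName>thing2</modelName><gtin>789456123</gtin><description>blah</description></product>
</products>
'@

# InnerText returns only the value; OuterXml returns the element with its tags.
$gtinValues = $xml.SelectNodes('//gtin') | ForEach-Object { $_.InnerText }
$gtinTags   = $xml.SelectNodes('//gtin') | ForEach-Object { $_.OuterXml }

$gtinValues   # 123456789 and 789456123
$gtinTags     # <gtin>123456789</gtin> and <gtin>789456123</gtin>
```

Either collection can then be piped to Out-File to produce the second file.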
