I am a regex/PowerShell beginner and am struggling to get this working. I am working with some HTML data and need to extract the string between given characters. In the case below, I need to extract the string found between the characters > and <, but only if it contains my search string. I have provided multiple examples here, and I hope my question is clear. Any help is greatly appreciated.

For example -

$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'

$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'


$searchstring = "ILOM"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string1 -match $regex
# Expected result in $matches[x]: ILOM 2.6.1.6.a

Similarly -

$searchstring = "BIOS"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string1 -match $regex
# Expected result in $matches[x]: BIOS vers. 0CDAN860

$searchstring = "T5120"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string2 -match $regex
# Expected result in $matches[x]: Enterprise T5120 Server

$searchstring = "T5220"
$regex = ".+>(.*$searchstring.+)<" # Tried this
$string2 -match $regex
# Expected result in $matches[x]: Enterprise T5220 Server
  • No Regex for HTML! Use a HTML parser. Commented Mar 9, 2015 at 17:13
  • What is $matches[x] currently returning? Commented Mar 9, 2015 at 17:23
  • $matches[1] currently matches ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV> for my 1st example Commented Mar 9, 2015 at 17:27

1 Answer

You need to add the lazy ? modifier to the quantifier (the "wildcard") after your search string so that it stops at the first occurrence of <.

.*< = any characters, as many as possible, up to the last <

.*?< = any characters, as few as possible, up to the first <
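
To see the difference in isolation, here is a minimal sketch. The sample string is made up for illustration and is not taken from the question; $sample, $greedy and $lazy are hypothetical names.

```powershell
# Hypothetical sample string, chosen only to show the two behaviours.
$sample = '>first<second<'

# Greedy: .* grabs as much as it can, then backs off to the LAST <.
if ($sample -match '>(.*)<')  { $greedy = $Matches[1] }

# Lazy: .*? grabs as little as it can, so it stops at the FIRST <.
if ($sample -match '>(.*?)<') { $lazy = $Matches[1] }

"greedy: $greedy"   # greedy: first<second
"lazy:   $lazy"     # lazy:   first
```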

I would also make the "wildcard" before your search string lazy, just to be safe, even though it isn't necessary in this particular situation.

The minimum required modification:

".+>(.*$searchstring.+?)<"

I would recommend:

".+>(.*?$searchstring.+?)<"

Sample:

$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'

$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'


$searchstring = "ILOM"
$regex = ".+>(.*?$searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }

# Custom regex: the search string must come directly after the >
$searchstring = "BIOS"
$regex = ".+>($searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }

#Or the original regex with different search string
$searchstring = "BIOS vers"
$regex = ".+>(.*?$searchstring.+?)<"
if($string1 -match $regex) { $matches[1] }

$searchstring = "T5120"
$regex = ".+>(.*?$searchstring.+?)<"
if($string2 -match $regex) { $matches[1] }

$searchstring = "T5220"
$regex = ".+>(.*?$searchstring.+?)<"
if($string2 -match $regex) { $matches[1] }

Output:

ILOM 2.6.1.6.a 
BIOS vers. 0CDAN860 
BIOS vers. 0CDAN860 
Enterprise T5120 Server
Enterprise T5220 Server
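
As an alternative sketch (not part of the original answer), the "wildcards" can be swapped for a negated character class, [^<>]*, which can never cross a tag boundary. With that pattern, a plain "BIOS" search string returns the expected line. Get-TagText is a hypothetical helper name, and the [regex]::Escape and Trim() calls are additions that assume the search string should be matched literally and that the trailing space before <BR> is unwanted.

```powershell
# Hypothetical helper, shown only as a sketch of the negated-class approach.
function Get-TagText {
    param(
        [string]$Html,
        [string]$SearchString
    )
    # Escape the search string so characters such as . or ( are matched literally.
    $escaped = [regex]::Escape($SearchString)
    # [^<>]* matches a run of characters that are neither < nor >,
    # so the capture stays inside one stretch of text between two tags.
    $pattern = '>([^<>]*' + $escaped + '[^<>]*)<'
    if ($Html -match $pattern) {
        # Trim() drops the space that precedes <BR> in the sample data.
        $Matches[1].Trim()
    }
}

$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'
$string2 = '<P><A id=T5220 name=T5220></A><A href="http://mywebserver/index.html">Enterprise T5120 Server</A> <BR><A href="http://mywebserver/index.html">Enterprise T5220 Server</A></P></DIV></TD>'

Get-TagText $string1 'ILOM'    # ILOM 2.6.1.6.a
Get-TagText $string1 'BIOS'    # BIOS vers. 0CDAN860
Get-TagText $string2 'T5120'   # Enterprise T5120 Server
Get-TagText $string2 'T5220'   # Enterprise T5220 Server
```

Because -match returns the first match from the left, this picks the first text run that contains the search string. It also ignores T5220 inside the id and name attributes, since that text is preceded by < rather than >.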

2 Comments

Thanks @Frode F. However it doesn't work with my second example. I was wanting it to match with "BIOS vers. 0CDAN860".
That requires a different regex or search string. Updated with two BIOS examples now.
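
The reason a bare "BIOS" search string misses the expected line is the leading greedy .+> in the recommended regex: it pushes the start of the capture as far right as it can, so the last occurrence of BIOS in the string wins. A small sketch using $string1 from the question ($result is an illustrative name):

```powershell
$string1 = '<P><STRONG><SPAN style="COLOR: rgb(255,0,0)">ILOM 2.6.1.6.a <BR>BIOS vers. 0CDAN860 <BR>LSI MPT SAS firmware MPT BIOS 1.6.00</SPAN></STRONG></P></DIV></TD>'

$searchstring = "BIOS"
$regex = ".+>(.*?$searchstring.+?)<"

# The greedy .+> backtracks from the end of the string and settles on the
# last > that still has "BIOS" somewhere after it, i.e. the second <BR>.
if ($string1 -match $regex) { $result = $Matches[1] }

$result   # LSI MPT SAS firmware MPT BIOS 1.6.00
```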
