Regex for string enclosed in <*>, C#

Question

I am trying to get all strings enclosed in <*> using the following Regex:

Regex regex = new Regex(@"\<(?<name>\S+)\>", RegexOptions.IgnoreCase);
string name = e.Match.Groups["name"].Value;

But in some cases, where I have text like:

<Vendors><Vtitle/>  <VSurname/></Vendors>

it returns two strings instead of four, i.e. the above Regex outputs:

<Vendors><Vtitle/> //as one string and 
<VSurname/></Vendors> //as second string

Whereas I am expecting four strings:

<Vendors>
<Vtitle/>
<VSurname/>
</Vendors>

Could you please tell me what change I need to make to my Regex?

I tried adding '\b' to specify a word boundary

new Regex(@"\b\<(?<name>\S+)\>\b", RegexOptions.IgnoreCase);

, but that didn't help.

Agreed with Marc; use an XML parser. Unless you want to build one.
– Fragsworth, Commented Dec 14, 2009 at 16:58
Are you parsing an XML document, or do you have angle bracket tags inside a mostly plain text document? XML parsers are particular about having well-formed XML documents. They wouldn't work for finding a few angle bracket tags sprinkled throughout a text document.
– CoderDennis, Commented Dec 15, 2009 at 17:41
OK, I just saw OP's comment on Andrew's answer. These tags happen to look like XML, but this isn't about parsing XML. This is about finding angle bracket delimited text within a mostly plain text document.
– CoderDennis, Commented Dec 15, 2009 at 17:52
Here is the best answer ever to your question. It has 2302 upvotes. stackoverflow.com/questions/1732348/…
– Vasyl Boroviak, Commented Dec 15, 2009 at 21:35

Community · Accepted Answer · 2017-05-23 12:10:51Z

10

You'll get most of what you want by using the regex /<([^>]*)>/. (There's no need to escape the angle brackets, as they aren't special characters in most regex engines, including the .NET engine.) The regex I provided will also capture trailing whitespace and any attributes on the tag; parsing those things reliably is way, way beyond the scope of a reasonable regex.

However, be aware that if you're trying to parse XML/HTML with a regex, that way lies madness.
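
As a rough illustration of the pattern above in C#, here is a minimal sketch. The class name TagFinder and the method FindTags are hypothetical names chosen for this example; they are not part of the question's code. The slashes around the pattern are only delimiter notation and are left out of the .NET string.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TagFinder
{
    // [^>]* matches any run of characters that are not '>',
    // so each match ends at the first closing bracket it meets.
    private static readonly Regex TagPattern = new Regex(@"<([^>]*)>");

    // Returns the full text of every <...> span, brackets included.
    public static List<string> FindTags(string input)
    {
        var tags = new List<string>();
        foreach (Match m in TagPattern.Matches(input))
        {
            tags.Add(m.Value);
        }
        return tags;
    }
}
```

Calling TagFinder.FindTags on the question's sample text, <Vendors><Vtitle/>  <VSurname/></Vendors>, should yield the four separate strings the question asks for. Use m.Groups[1].Value instead of m.Value if only the text between the brackets is wanted.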

edited May 23, 2017 at 12:10

answered Dec 14, 2009 at 16:58

JSBձոգչ

1 Comment

Fragsworth Over a year ago

By answering this question, however, the OP might use this regex (and more regexes) instead of the better methods. Then 2-3 years down the road someone's going to have to maintain it.

Jay Bazuzi · Accepted Answer · 2009-12-15 17:49:45Z

6

Regexes are the wrong tool for parsing XML. Try using the System.Xml.Linq (XElement) API.
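
For the case where the input really is a well-formed XML fragment with a single root element, a minimal sketch of the System.Xml.Linq approach might look like the following. The class name XmlNames and the method ElementNames are hypothetical; XElement.Parse throws an XmlException on malformed input, so this does not apply to angle-bracket tags scattered through plain text.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class XmlNames
{
    // Parses a well-formed XML fragment and returns the element
    // names in document order, starting with the root element.
    public static List<string> ElementNames(string xml)
    {
        XElement root = XElement.Parse(xml);
        return root.DescendantsAndSelf()
                   .Select(e => e.Name.LocalName)
                   .ToList();
    }
}
```

Note that this returns one name per element (Vendors, Vtitle, VSurname for the question's sample), not one string per tag, so the closing </Vendors> does not appear as a separate item.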

edited Dec 15, 2009 at 17:49

Jay Bazuzi

answered Dec 14, 2009 at 16:58

Manu

1 Comment

Cheeso Over a year ago

See Dennis Palmer's comment on the original question. This isn't XML.

Cheeso · Accepted Answer · 2009-12-15 21:31:02Z

4

Your regex uses \S+ as the wildcard. In English, this is "a series of one or more characters, none of which is whitespace". Angle brackets are non-whitespace, so \S+ runs straight through them. In other words, when the regex <(?<name>\S+)> is applied to a string such as <Vendors><Vtitle/>, the regex will match the entire string.

I think what you want is "a series of one or more characters, none of which is an angle bracket".

The regex for that is <(?<name>[^>]+)> .

Ahhh, regular expressions. The language designed to look like cartoon swearing.
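
To see the difference between the two character classes side by side, here is a small sketch in C#. The class name NamedGroupDemo and the method Names are hypothetical names for this example; the two patterns are the ones discussed above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class NamedGroupDemo
{
    // Returns the value of the "name" group for every match of
    // the given pattern in the input, in order of appearance.
    public static string[] Names(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["name"].Value)
                    .ToArray();
    }
}
```

With the question's sample text, Names(@"<(?<name>\S+)>", text) should return two values, because \S+ swallows the inner angle brackets, while Names(@"<(?<name>[^>]+)>", text) should return four. The captured names still include any slash (for example Vtitle/ and /Vendors), so trim it afterwards if only the bare name is needed.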

answered Dec 15, 2009 at 21:31

Cheeso

1 Comment

kenny Over a year ago

+2 if I could for cartoon swearing.
