How to exclude a line break from regex character class?

Question

Given this PCRE pattern:

/(<name>[^<>]*<\/name>[^<>]*<phone>[^<>]*<\/phone>)/

And this subject text:

<name>John Stevens</name>  <phone>888-555-1212</phone>
<name>Peter Wilson</name>  
<phone>888-555-2424</phone>

How can I get the Regular Expression to match the first name-phone pair but not the second? I don't want to match pairs that are separated by line breaks. I tried including an end-of-line in the negated character class like so [^<>$]* but nothing changed.
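To reproduce the problem, here is a minimal sketch using Python's `re` module as a stand-in for PCRE (an assumption: the `/.../` delimiters are dropped, and Python needs no `\/` escape). It shows that the original pattern and the attempted `[^<>$]*` version both find two pairs, including the one split across lines:

```python
import re

# The subject text from the question ("\n" marks the line breaks).
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# The question's pattern, without the PCRE "/.../" delimiters.
original = r"(<name>[^<>]*</name>[^<>]*<phone>[^<>]*</phone>)"
# The attempted fix: inside [...] the "$" is a literal dollar sign, not end-of-line.
attempted = r"(<name>[^<>]*</name>[^<>$]*<phone>[^<>]*</phone>)"

original_matches = re.findall(original, subject)
attempted_matches = re.findall(attempted, subject)

print(len(original_matches))   # 2
print(len(attempted_matches))  # 2
```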

You can use the following online tools to test your expressions:
http://rubular.com/
http://www.regextester.com/
Thank you.

Inside a character class, the $ loses its special meaning and becomes simply a literal dollar sign. What you want is [^<>\r\n], as sawa suggests. – ridgerunner, Commented Apr 24, 2011 at 4:14
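ridgerunner's point can be checked character by character. This sketch uses Python's `re` (assumed to treat `$`, `\r` and `\n` inside a class the same way PCRE does, which holds for these cases):

```python
import re

# "$" inside [...] is just the literal dollar character.
dollar_is_literal = re.fullmatch(r"[$]", "$") is not None
# So [^<>$] excludes "$" but still accepts a newline ...
dollar_class_allows_newline = re.fullmatch(r"[^<>$]", "\n") is not None
# ... while [^<>\r\n] rejects both line-break characters.
escaped_class_allows_newline = re.fullmatch(r"[^<>\r\n]", "\n") is not None
escaped_class_allows_cr = re.fullmatch(r"[^<>\r\n]", "\r") is not None

print(dollar_is_literal)             # True
print(dollar_class_allows_newline)   # True
print(escaped_class_allows_newline)  # False
print(escaped_class_allows_cr)       # False
```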

sawa · Accepted Answer · 2011-04-24 04:29:42Z

4

I think this will do it:

/<name>[^<>]*<\/name>[^<>\r\n]*<phone>[^<>]*<\/phone>/

Whatever you put in the class [ ] must represent a single character. Inside a class, $ is interpreted as a literal $, probably because $ as an end-of-line anchor is zero-width and so cannot be interpreted as such within a class. (Edited after comment by ridgerunner)
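A quick check of this pattern against the question's subject, sketched with Python's `re` in place of PCRE (an assumption; the delimiters and the `\/` escape are dropped). Only the pair that sits on one line is found:

```python
import re

subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# Between </name> and <phone> only characters other than "<", ">", "\r"
# and "\n" are allowed, so a pair cannot span a line break.
pattern = r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>"

matches = re.findall(pattern, subject)
print(matches)
# ['<name>John Stevens</name>  <phone>888-555-1212</phone>']
```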

By the way, I took off the parentheses that surround your regex, because whatever it matches can be referred to as the whole match.
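For illustration, in Python's `re` (used here as an assumed stand-in for PCRE) the whole match is always available as `group(0)`, so an outer capturing group only duplicates it:

```python
import re

subject = "<name>John Stevens</name>  <phone>888-555-1212</phone>"

with_parens = re.search(r"(<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>)", subject)
without_parens = re.search(r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>", subject)

# group(0) is the entire match; the outer group 1 captures exactly the same text.
print(with_parens.group(1) == with_parens.group(0))     # True
print(without_parens.group(0) == with_parens.group(1))  # True
```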

edited Apr 24, 2011 at 4:29

answered Apr 24, 2011 at 3:58

sawa



3 Comments

ridgerunner Over a year ago

+1 (but the $ does have an effect inside a char class - it matches a dollar sign.)

sawa Over a year ago

@ridgerunner Thanks for pointing out. I will correct my answer.

sawa Over a year ago

I also added \r, as pointed out by ridgerunner. I only had Unix in mind.
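Why `\r` matters: with Windows-style `\r\n` endings, excluding `\n` alone already blocks the match, because the `\n` still sits between the tags; `\r` in the class matters when a bare carriage return is the only separator. A sketch in Python's `re` (assumed PCRE stand-in), where `[^<>\n]` is presumably the class before `\r` was added and the two subjects are hypothetical:

```python
import re

# Class excluding only "\n" (presumably the Unix-only version).
unix_only = r"<name>[^<>]*</name>[^<>\n]*<phone>[^<>]*</phone>"
# Class excluding both "\r" and "\n" (the edited answer).
portable = r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>"

# Hypothetical subjects: a "\r\n" line break and a bare "\r" line break.
crlf = "<name>Peter Wilson</name>  \r\n<phone>888-555-2424</phone>"
bare_cr = "<name>Peter Wilson</name>  \r<phone>888-555-2424</phone>"

unix_only_crlf = re.search(unix_only, crlf) is not None        # the "\n" still blocks it
unix_only_bare_cr = re.search(unix_only, bare_cr) is not None  # the lone "\r" slips through
portable_crlf = re.search(portable, crlf) is not None
portable_bare_cr = re.search(portable, bare_cr) is not None

print(unix_only_crlf, unix_only_bare_cr, portable_crlf, portable_bare_cr)
# False True False False
```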

anubhava · Accepted Answer · 2011-04-24 14:58:38Z

1

If you don't want to match pairs separated by line breaks, then the following regex will do the job:

/(<name>[^<>]*<\/name>.*?<phone>[^<>]*<\/phone>)/

It matches only the first name-phone pair, because the dot . does not match a line break, whereas [^<>] does.
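The claim rests on the dot's default behaviour. This sketch uses Python's `re` (an assumed stand-in for Ruby/PCRE; in Python the "dot matches newline" switch is `re.DOTALL`) and shows that turning that switch on brings the split pair back:

```python
import re

subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

pattern = r"(<name>[^<>]*</name>.*?<phone>[^<>]*</phone>)"

default_matches = re.findall(pattern, subject)                  # "." stops at "\n"
dotall_matches = re.findall(pattern, subject, flags=re.DOTALL)  # "." also matches "\n"

print(len(default_matches))  # 1
print(len(dotall_matches))   # 2
```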

Tested it on http://rubular.com/r/amXvq20sl8

edited Apr 24, 2011 at 14:58

answered Apr 24, 2011 at 4:19

anubhava


3 Comments

Andrew Over a year ago

Thank you. But I also needed to exclude <> to prevent capturing other tags.

anubhava Over a year ago

It wouldn't really hurt to make it [^<>]* above; however, once we are already inside <name>, I think we just need [^<]* to capture everything up to </name>.

Andrew Over a year ago

Right, and I like that change. What I omitted from the subject text is that there could be other tags between name and phone that I don't want to capture if they're there, e.g. <name>Mark</name><name>Bill</name><phone>888...</phone>. The .* would capture both names on that same line. I know I could make it lazy instead of greedy, but that could negatively affect other parts of my pattern. I think the \r\n as stated above will work for me, with the addition of your change: [^<\r\n].
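Andrew's case can be sketched in Python's `re` (assumed PCRE stand-in). The phone number is a hypothetical placeholder, and the second pattern is one reading of his combination: `[^<\r\n]*` inside the tags and `[^<>\r\n]*` between them. Even the lazy `.*?` steps over the extra `<name>` tag, because the match starts at the leftmost `<name>`; the character-class version cannot, so the match moves to Bill:

```python
import re

# Hypothetical subject with an extra tag between the name and the phone.
subject = "<name>Mark</name><name>Bill</name><phone>888-555-1212</phone>"

# anubhava's pattern (without the outer group): ".*?" may cross other tags on the line.
dot_pattern = r"<name>[^<>]*</name>.*?<phone>[^<>]*</phone>"
# Character-class version: nothing between </name> and <phone> may be a tag or a line break.
class_pattern = r"<name>[^<\r\n]*</name>[^<>\r\n]*<phone>[^<\r\n]*</phone>"

dot_match = re.search(dot_pattern, subject).group(0)
class_match = re.search(class_pattern, subject).group(0)

print(dot_match)    # <name>Mark</name><name>Bill</name><phone>888-555-1212</phone>
print(class_match)  # <name>Bill</name><phone>888-555-1212</phone>
```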

Christo · Accepted Answer · 2011-04-24 04:27:30Z

0

Those sites don't seem to support the whole PCRE syntax. I used this site: http://lumadis.be/regex/test_regex.php

And this worked:

/^(<name>[^<>]*<\/name>[^<>$]*<phone>[^<>]*<\/phone>)/

This one is probably better:

/(?-s)(<name>[^<>]*<\/name>.*<phone>[^<>]*<\/phone>)/
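Both patterns are sketched here in Python's `re` (an assumed PCRE stand-in). The first pattern finds only John's pair because `^` without multiline mode matches only at the start of the string. `[^<>$]*` itself still consumes a newline, as the hypothetical reordered subject shows. Python does not accept a bare `(?-s)`, so the second pattern uses the scoped form `(?-s:...)` (Python 3.6 and later):

```python
import re

subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# Without re.MULTILINE, "^" matches only at the start of the string,
# so at most one pair can ever be found.
anchored = r"^(<name>[^<>]*</name>[^<>$]*<phone>[^<>]*</phone>)"
first_run = re.findall(anchored, subject)
print(first_run)  # ['<name>John Stevens</name>  <phone>888-555-1212</phone>']

# Hypothetical reordering: with the split pair first, [^<>$]* consumes
# the "\n", because "$" in the class is only a literal dollar sign.
reordered = (
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
)
reordered_run = re.findall(anchored, reordered)
print(reordered_run)  # ['<name>Peter Wilson</name>  \n<phone>888-555-2424</phone>']

# Scoped "(?-s:...)" switches DOTALL off inside the group even when re.DOTALL is passed.
scoped = r"(?-s:(<name>[^<>]*</name>.*<phone>[^<>]*</phone>))"
scoped_run = re.findall(scoped, subject, flags=re.DOTALL)
print(scoped_run)  # ['<name>John Stevens</name>  <phone>888-555-1212</phone>']
```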

answered Apr 24, 2011 at 4:27

Christo

