1

Given this PCRE pattern:

/(<name>[^<>]*<\/name>[^<>]*<phone>[^<>]*<\/phone>)/

And this subject text:

<name>John Stevens</name>  <phone>888-555-1212</phone>
<name>Peter Wilson</name>  
<phone>888-555-2424</phone>

How can I get the Regular Expression to match the first name-phone pair but not the second? I don't want to match pairs that are separated by line breaks. I tried including an end-of-line in the negated character class like so [^<>$]* but nothing changed.

You can use the following online tools to test your expressions:
http://rubular.com/
http://www.regextester.com/
Thank you.

    Inside a character class, the $ loses its special meaning and becomes simply a literal dollar sign. What you want is: [^<>\r\n] as sawa suggests. Commented Apr 24, 2011 at 4:14
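
To see the point of the comment above in action, here is a minimal sketch in Python. This assumes Python's re module rather than PCRE, which as far as I know treats $ inside a character class the same way. The variable names are only for illustration.

```python
import re

# Inside a character class, "$" is a literal dollar sign, not an end-of-line anchor.
dollar_class = re.compile(r"[^<>$]*")

# The class still accepts newlines, so the match runs across the line break.
across_newline = dollar_class.match("ab\ncd<").group()    # 'ab\ncd'

# It stops at a literal dollar sign instead.
stops_at_dollar = dollar_class.match("ab$cd").group()     # 'ab'

# Excluding \r and \n explicitly is what stops the match at the line break.
newline_class = re.compile(r"[^<>\r\n]*")
stops_at_newline = newline_class.match("ab\ncd<").group() # 'ab'

print(repr(across_newline), repr(stops_at_dollar), repr(stops_at_newline))
```

So adding $ to the class only excludes dollar signs from the match; it does nothing about line breaks.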

3 Answers

4

I think this will do it

/<name>[^<>]*<\/name>[^<>\r\n]*<phone>[^<>]*<\/phone>/

Everything inside a character class [ ] must stand for a single character. Within a class, $ is a literal dollar sign, probably because $ as an end-of-line anchor is zero-width and so cannot be one of the characters a class matches. (Edited after comment by ridgerunner)

By the way, I removed the parentheses surrounding your regex, because the text it matches is already available as the whole match.
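
A quick way to check this against the subject text from the question is the sketch below. It uses Python's re module as a stand-in for PCRE (an assumption on my part; the constructs used here behave the same in both as far as I know). Python has no regex delimiters, so the \/ escapes are not needed.

```python
import re

# Subject text from the question: the second pair is split across two lines.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

# Original pattern: [^<>]* between the tags also matches line breaks.
original = re.compile(r"<name>[^<>]*</name>[^<>]*<phone>[^<>]*</phone>")

# Revised pattern: \r and \n are excluded between </name> and <phone>.
same_line = re.compile(r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>")

original_matches = original.findall(subject)
same_line_matches = same_line.findall(subject)

print(len(original_matches))   # 2
print(same_line_matches)
```

With no capture groups in the pattern, findall returns the whole match for each hit, which is why the outer parentheses are not needed.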


3 Comments

+1 (but the $ does have an effect inside a char class - it matches a dollar sign.)
@ridgerunner Thanks for pointing that out. I will correct my answer.
I also added \r, as pointed out by ridgerunner. I only had Unix in mind.
1

If you don't want to match pairs separated by line breaks, then the following regex will do the job:

/(<name>[^<>]*<\/name>.*?<phone>[^<>]*<\/phone>)/

It matches only the first name-phone pair, because the dot (.) does not match a line break by default, whereas [^<>] does.

Tested it on http://rubular.com/r/amXvq20sl8

3 Comments

Thank you. But I also needed to exclude <> to prevent capturing other tags.
It wouldn't really hurt to make it [^<>]* above; however, I think that once we are already inside <name>, we just need [^<]* to capture everything up to </name>.
Right, and I like that change. What I omitted from the subject text is that there could be other tags between name and phone that I don't want to capture if they're there, e.g. <name>Mark</name><name>Bill</name><phone>888...</phone>. The .* would capture both names on that same line. I know I could make it lazy instead of greedy, but that could negatively affect other parts of my pattern. I think the \r\n as stated above will work for me, with the addition of your change: [^<\r\n].
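
The two-names case from this comment can be tried with a short sketch like the one below. It again uses Python's re module in place of PCRE (an assumption), and the phone number is a made-up placeholder, since the comment only gives "888...".

```python
import re

# Hypothetical line with two names before one phone; the number is a placeholder.
line = "<name>Mark</name><name>Bill</name><phone>888-555-0000</phone>"

# Dot-based pattern: .*? can run through the second <name> element.
dot_pattern = re.compile(r"<name>[^<>]*</name>.*?<phone>[^<>]*</phone>")

# Class-based pattern: no tags or line breaks allowed between </name> and <phone>.
class_pattern = re.compile(r"<name>[^<>]*</name>[^<>\r\n]*<phone>[^<>]*</phone>")

dot_match = dot_pattern.search(line).group()
class_match = class_pattern.search(line).group()

print(dot_match)    # starts at <name>Mark</name> and includes both names
print(class_match)  # <name>Bill</name><phone>888-555-0000</phone>
```

Note that the dot version here is already lazy and still picks up both names, because the match starts at the first <name>. The negated class is what keeps another tag from sitting between the name and the phone.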
0

Those sites don't seem to support the whole PCRE syntax. I used this site: http://lumadis.be/regex/test_regex.php

And this worked:

/^(<name>[^<>]*<\/name>[^<>$]*<phone>[^<>]*<\/phone>)/

/(?-s)(<name>[^<>]*<\/name>.*<phone>[^<>]*<\/phone>)/

is probably better
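
For what it's worth, the effect of the dot-all setting that (?-s) switches off can be sketched in Python. This is only an approximation: re.DOTALL is Python's counterpart of the PCRE s modifier, and I pass it as a flag because I would not rely on Python accepting a bare (?-s) at the start of a pattern.

```python
import re

# Subject text from the question.
subject = (
    "<name>John Stevens</name>  <phone>888-555-1212</phone>\n"
    "<name>Peter Wilson</name>  \n"
    "<phone>888-555-2424</phone>\n"
)

pattern_text = r"<name>[^<>]*</name>.*<phone>[^<>]*</phone>"

# Default: the dot does not match "\n", so a match cannot cross a line break.
default_matches = re.findall(pattern_text, subject)

# With DOTALL the greedy .* runs across lines, up to the last <phone> element.
dotall_matches = re.findall(pattern_text, subject, flags=re.DOTALL)

print(default_matches)      # only the John Stevens pair
print(len(dotall_matches))  # 1, a single match spanning all three lines
```

So the dot-based pattern only keeps pairs on one line as long as dot-all mode is off, which is what the (?-s) prefix is meant to guarantee.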
