0

I'm trying to get the text between heading tags using the following PHP script:

$search_string= < h1 >testing here< /h1 >;

$text = preg_match('<%TAG%[^>]*>(.*?)</%TAG%>',$search_string, $matches);

echo $matches[0]; 

When I try to run this script, no value is returned. Instead there is a warning message: Warning: preg_match() [function.preg-match]: Unknown modifier '(' in C:\xampp\htdocs\check_for_files.php on line 10

Can anyone help with this please?

2
  • See [RegEx match open tags except XHTML self-contained tags](stackoverflow.com/questions/1732348/…). Commented Oct 14, 2010 at 3:32
  • True, you'll want to use a real tag name (e.g. 'h1') in your expression, and quoting your $search_string will also help. Commented Oct 14, 2010 at 3:36

3 Answers

2

Your expression needs delimiters. / is the most common, but # is a better fit here, since the pattern itself contains a / that would otherwise need escaping.

$text = preg_match('#<%TAG%[^>]*>(.*?)</%TAG%>#',$search_string, $matches);
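As a minimal sketch, assuming %TAG% stands for a real tag name such as h1 (as the comment under the question suggests) and that the search string is quoted, the delimited pattern could be used roughly like this. The variable names $tag and $pattern are made up for illustration. Note that the text captured by the parentheses lands in $matches[1], while $matches[0] holds the whole match including the tags.

```php
<?php
// Assumption: %TAG% is replaced by a concrete tag name, here 'h1'.
$tag = 'h1';
$search_string = '<h1>testing here</h1>';

// '#' is the delimiter, so the '/' in the closing tag needs no escaping.
// preg_quote() escapes any regex-special characters in the tag name.
$quoted  = preg_quote($tag, '#');
$pattern = '#<' . $quoted . '[^>]*>(.*?)</' . $quoted . '>#';

if (preg_match($pattern, $search_string, $matches)) {
    echo $matches[1], "\n"; // text captured by (.*?)
}
```

With the input above this should print testing here.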

2

The warning appears because you haven't enclosed your regex in delimiters. Try:

$text = preg_match('#<%TAG%[^>]*>(.*?)</%TAG%>#',$search_string, $matches);

Understanding the warning.

Consider your regex:

'<%TAG%[^>]*>(.*?)</%TAG%>'
 ^          ^
start      end 

Since you haven't explicitly put the regex between delimiters, PHP assumes you are using < and > as the delimiters, because < is the first character of the regex. Hence when it sees an unescaped > it takes it as the end of the pattern. After the closing delimiter there can be a few modifiers, which alter the behavior of the pattern matching. Some common modifiers are:

  • i for case insensitive
  • m for multi line match

Now in your case there is a ( after the closing delimiter which is not a valid modifier, hence the warning.
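To illustrate where modifiers go, here is a small sketch using an explicit # delimiter, once without and once with the i modifier after it. The input string and variable names are invented for the example.

```php
<?php
$html = '<H1>Testing Here</H1>';

// No modifier: matching is case sensitive, so 'h1' does not match 'H1'.
$sensitive = preg_match('#<h1[^>]*>(.*?)</h1>#', $html, $m1);

// The 'i' after the closing '#' is a modifier: case-insensitive matching.
$insensitive = preg_match('#<h1[^>]*>(.*?)</h1>#i', $html, $m2);

var_dump($sensitive);   // int(0)
var_dump($insensitive); // int(1)
echo $m2[1], "\n";      // Testing Here
```

Anything that follows the closing delimiter is read as a modifier, which is why a stray ( in that position triggers the "Unknown modifier" warning.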


1

/^<[^>]+>(.*)<\/[^>]+>$/ should do the trick.
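A rough sketch of how this pattern might be used with preg_match, assuming the input is a single element on one line (the sample string with a class attribute is made up for illustration):

```php
<?php
$search_string = '<h1 class="title">testing here</h1>';

// Anchored pattern: opening tag, captured content, closing tag.
// '/' is the delimiter here, so the '/' of the closing tag is escaped as '\/'.
if (preg_match('/^<[^>]+>(.*)<\/[^>]+>$/', $search_string, $matches)) {
    echo $matches[1], "\n"; // testing here
}
```

Because the pattern is anchored with ^ and $, it only matches when the string consists of one opening tag, the content, and one closing tag; surrounding text before the opening tag would make it fail.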

3 Comments

hi, I'm very interested in this approach. Could you please explain this? Thank you.
It's a pretty basic expression: <[^>]+> means 'one or more of any character except >, enclosed within <>'; (.*) matches anything; and <\/[^>]+> is similar to the first in that it means 'one or more of any character except >, enclosed within </>'. The first and the last are structured this way so you don't have to write complex rules to match what might be in the tag (attributes, etc.); we assume > will not appear in it (because it isn't valid in class names or element ids, for example). Not the most efficient expression, but it gets the job done.
Also: there are parentheses around .* (i.e., (.*)) so that the group is returned as a specific match within the results.
